WEBVTT Kind: captions; language: en-us

NOTE Confidence: 91% (HIGH)

00:00:00.199 --> 00:00:07.050
In this video we will illustrate graphical displays of single variables.

NOTE Confidence: 67% (MEDIUM)

00:00:07.050 --> 00:00:12.700
So first, we are going to open our data set in Jamovi.

NOTE Confidence: 91% (HIGH)

00:00:15.100 --> 00:00:20.200
And then see what we can display about these variables.

NOTE Confidence: 91% (HIGH)

00:00:20.200 --> 00:00:27.800
The displays of the individual variables are a part of the descriptives. So again, we go into

00:00:27.800 --> 00:00:32.000
Analyses, Exploration, and Descriptives.

NOTE Confidence: 91% (HIGH)

00:00:32.200 --> 00:00:36.000
Let us begin with

NOTE Confidence: 91% (HIGH)

00:00:36.300 --> 00:00:44.200
the three variables that are categorical, or qualitative, measured at the nominal level.

NOTE Confidence: 91% (HIGH)

00:00:45.500 --> 00:00:53.900
We do not want the statistics at this point, so we can uncheck these.

NOTE Confidence: 89% (HIGH)

00:00:53.900 --> 00:01:02.500
And then the whole result disappears. We saw that we can get frequency tables in terms of

00:01:02.500 --> 00:01:09.900
descriptives, but we can get similar information graphically if we go down here and open the Plots

00:01:09.900 --> 00:01:19.700
section. The only thing we can use for categorical variables is bar plots. And this is the only

00:01:19.700 --> 00:01:24.900
good use of bar plots. Please don't use bar plots for anything else.

NOTE Confidence: 91% (HIGH)

00:01:26.200 --> 00:01:33.000
In this case, where you only have two labels, they're not much more informative than the tables. But

00:01:33.000 --> 00:01:41.800
if we had lots of different labels, or if we had ordinal-level variables, it might be informative to

00:01:41.800 --> 00:01:49.100
actually produce a graphical display. So we can scroll down and look at the proportions of different

00:01:49.100 --> 00:01:55.600
values in our three variables.
So we can see that there is about the same number of children in the

00:01:55.600 --> 00:01:56.300
two conditions.

NOTE Confidence: 85% (HIGH)

00:01:56.300 --> 00:02:04.300
There are many more children with the majority home language than with a minority language, and there

00:02:04.300 --> 00:02:09.000
is an approximately equal number of girls and boys.

NOTE Confidence: 91% (HIGH)

00:02:10.500 --> 00:02:20.400
Some people like to get pie charts, which are kind of horrible, but Jamovi doesn't produce them

00:02:20.400 --> 00:02:26.900
because they're not really used outside of newspapers. If you really want to produce a pie chart for

00:02:26.900 --> 00:02:28.649
one of your variables,

NOTE Confidence: 85% (HIGH)

00:02:28.649 --> 00:02:34.800
you can use this trick: click on R, then Rj Editor.

NOTE Confidence: 91% (HIGH)

00:02:37.300 --> 00:02:41.900
Delete this thing, which we don't need.

NOTE Confidence: 87% (HIGH)

00:02:41.900 --> 00:02:49.649
Highlight and delete, and then type pie, open parenthesis,

NOTE Confidence: 80% (HIGH)

00:02:49.649 --> 00:02:55.500
table, open parenthesis, data,

NOTE Confidence: 86% (HIGH)

00:02:56.300 --> 00:03:03.600
dollar, sex, close parenthesis, close parenthesis.

NOTE Confidence: 88% (HIGH)

00:03:03.600 --> 00:03:12.600
So this instructs Jamovi to use only the sex variable from your data; that's what the dollar is

00:03:12.600 --> 00:03:20.800
doing there. Produce a table of values from that variable, and then produce a pie chart from that.

00:03:20.800 --> 00:03:28.300
It's very important to have one parenthesis opening here, one opening here, and both of them closing

00:03:28.300 --> 00:03:34.300
there. There are two instances of closing parentheses there.

NOTE Confidence: 85% (HIGH)

00:03:34.300 --> 00:03:43.350
And then click on the green triangle to run this. And this is the result, which looks very silly, as

00:03:43.350 --> 00:03:45.300
expected.

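NOTE
A minimal R sketch of the pie-chart command dictated above. Inside the Rj Editor the data set is already available as `data`; the small data frame built here is a hypothetical stand-in with made-up values, so that the snippet also runs in plain R.
```r
# Hypothetical stand-in for the data set that the Rj Editor exposes as `data`
data <- data.frame(sex = c("girl", "boy", "girl", "boy", "girl", "boy", "girl"))
# `$` picks the sex column and table() counts how often each label occurs
counts <- table(data$sex)
# pie() draws one slice per label; this is pie(table(data$sex)) in two steps
pie(counts)
```
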
NOTE Confidence: 83% (HIGH)

00:03:46.000 --> 00:03:53.600
To delete something you don't want from the report, you can right-click on it and then go on

00:03:53.600 --> 00:03:56.050
and click on Remove.

NOTE Confidence: 90% (HIGH)

00:03:56.050 --> 00:04:00.399
So we can forget it; it's as if it never happened.

NOTE Confidence: 91% (HIGH)

00:04:00.399 --> 00:04:05.150
What about graphs for the numeric variables?

NOTE Confidence: 90% (HIGH)

00:04:05.150 --> 00:04:15.100
Analyses, Exploration: click on Exploration, click on Descriptives. So we start a new section of

00:04:15.100 --> 00:04:23.900
the report. Since we're not going to be asking for any descriptive statistics, we uncheck these so that

00:04:23.900 --> 00:04:35.000
section disappears, and then we select which variable we want to see graphically. Let's run our

NOTE Confidence: 76% (HIGH)

00:04:35.000 --> 00:04:43.550
examples first on the matrices0K variable. Let me close this Statistics section by clicking on it

00:04:43.550 --> 00:04:48.800
and make sure the Plots section is open. If it's not, you click on this.

NOTE Confidence: 89% (HIGH)

00:04:49.700 --> 00:04:57.900
So the first thing that we always look at when we have any quantitative data is a histogram.

NOTE Confidence: 89% (HIGH)

00:04:59.800 --> 00:05:06.800
At any point, you can add variables, and this will automatically produce the same graphs for the

00:05:06.800 --> 00:05:14.100
other variables. So I can now add the vocabulary variable here, and it will immediately produce the

00:05:14.100 --> 00:05:17.000
histogram for the vocabulary.

NOTE Confidence: 90% (HIGH)

00:05:18.500 --> 00:05:30.300
In addition to the histogram, I can plot the density.
You can think of the density as a line that

00:05:30.300 --> 00:05:39.700
tries to go through the histogram as if smoothing the tops of the different bars, and here it's kind

00:05:39.700 --> 00:05:46.000
of in the middle because it has to be smooth and it has to be between this low value and these high

00:05:46.000 --> 00:05:47.150
values.

NOTE Confidence: 91% (HIGH)

00:05:47.150 --> 00:05:54.700
And similar things go on here. You can even make the histogram disappear and retain only the

00:05:54.700 --> 00:06:02.300
density, which we don't do, because the point of having the histogram is to see where your actual

00:06:02.300 --> 00:06:08.900
data are, and the density hides your data and retains only the shape. This is useful in some

00:06:08.900 --> 00:06:17.300
situations, but not when you want to examine the data for possible problems.

NOTE Confidence: 85% (HIGH)

00:06:18.400 --> 00:06:26.900
So this is the basic graph you should always produce for all your numeric variables. Another very

00:06:26.900 --> 00:06:37.000
useful graph is the box plot. To understand how the box plot works, let me first plot the data.

NOTE Confidence: 78% (HIGH)

00:06:39.100 --> 00:06:47.400
So you can see here, these are the actual values for the matrices data. They are just scattered

00:06:47.400 --> 00:06:55.100
here. And this tells us that there are three values around 10, there's a bunch of values between 15

00:06:55.100 --> 00:07:04.250
and 20-something, and there are a few values around 25. So these are the actual data.

00:07:04.250 --> 00:07:07.799
One way to look at these data is called the

NOTE Confidence: 90% (HIGH)

00:07:07.799 --> 00:07:10.000
violin plot.

NOTE Confidence: 78% (HIGH)

00:07:11.100 --> 00:07:20.600
And the violin plot shows you how much data you have around each region of values.

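NOTE
A minimal R sketch of the histogram with a density line discussed earlier, using base R graphics rather than Jamovi's own plotting. The `scores` values are made up for illustration and only loosely mimic the pattern described for the matrices variable.
```r
# Hypothetical scores standing in for a numeric variable
scores <- c(9, 10, 11, 15, 16, 17, 17, 18, 19, 20, 21, 22, 24, 25, 26)
# freq = FALSE puts the bars on the density scale so a curve can be overlaid
h <- hist(scores, freq = FALSE, main = "Histogram with density", xlab = "Score")
# density() estimates a smooth curve from the data; lines() draws it over the bars
d <- density(scores)
lines(d)
```
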
NOTE Confidence: 91% (HIGH)

00:07:20.600 --> 00:07:30.500
So it is not a coincidence that the violin plot looks like a symmetric version of the density.

NOTE Confidence: 91% (HIGH)

00:07:30.500 --> 00:07:38.600
These are the same values for the same variable here and here. So the violin plot is

00:07:38.600 --> 00:07:46.100
basically your density line, which is derived from the histogram, except it's plotted vertically,

00:07:46.100 --> 00:07:51.950
and it's plotted both left and right. So if you didn't have the data plotted,

NOTE Confidence: 91% (HIGH)

00:07:51.950 --> 00:07:58.900
this would tell you in what regions of values your data are concentrated.

NOTE Confidence: 88% (HIGH)

00:08:02.900 --> 00:08:12.000
What we usually plot, though, is something much more straightforward, and it's called a box plot.

NOTE Confidence: 84% (HIGH)

00:08:12.000 --> 00:08:15.200
This is the box plot.

NOTE Confidence: 82% (HIGH)

00:08:15.300 --> 00:08:24.049
And indeed, it's called that because it contains a box. The box is the 50% of your data that is

00:08:24.049 --> 00:08:32.299
the central half. Remember, when we were discussing the interquartile range, we split the data

00:08:32.299 --> 00:08:39.200
into four quarters. So this is what you're looking at here. The black line is the median.

NOTE Confidence: 90% (HIGH)

00:08:39.200 --> 00:08:47.600
This line is the first quartile and this is the third quartile. So within this range,

NOTE Confidence: 91% (HIGH)

00:08:47.600 --> 00:08:56.800
half of the matrices values will be found. So if you were to count the number of points in here,

00:08:56.800 --> 00:09:01.700
there would be as many as here plus here.

NOTE Confidence: 91% (HIGH)

00:09:01.700 --> 00:09:08.500
And these lines extend to the range of the data. So they go from the minimum

NOTE Confidence: 85% (HIGH)

00:09:08.500 --> 00:09:17.700
to the maximum.
In some cases the variable will have outlying values that will be far from the

00:09:17.700 --> 00:09:19.700
central data.

NOTE Confidence: 80% (HIGH)

00:09:20.400 --> 00:09:29.800
Let me add the vocabulary measure at grade one, in which case you see here there is a value that

00:09:29.800 --> 00:09:39.200
is a bit outside of the distribution. And so, in this case, plotting this box plot to the full range

00:09:39.200 --> 00:09:48.400
would require this line to go all the way up to here. Let me remove the data and only show the box

00:09:48.400 --> 00:09:49.800
plot.

NOTE Confidence: 85% (HIGH)

00:09:50.600 --> 00:09:59.200
So the box plot, in the case of such data that are potentially outlying, which would cause extremely

00:09:59.200 --> 00:10:06.100
long lines, only plots the line up to some point,

NOTE Confidence: 86% (HIGH)

00:10:06.100 --> 00:10:14.400
which is one and a half times, or twice, the length of the box. And the rest is plotted individually, so you can see how

00:10:14.400 --> 00:10:22.599
many possible outliers you have. The box plot is one of the most useful plots you can get,

00:10:22.599 --> 00:10:29.500
especially when you want to make comparisons. The first and most important plot is the histogram,

00:10:29.500 --> 00:10:35.900
where you check the distribution, its shape, and potential problems, and then you can use the box

00:10:35.900 --> 00:10:36.400
plots

NOTE Confidence: 91% (HIGH)

00:10:36.400 --> 00:10:39.200
for further comparisons.

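NOTE
A minimal R sketch of the whisker rule described above, assuming the common convention of 1.5 times the length of the box (the transcript does not say which multiplier Jamovi applies). The `vocab` values are made up for illustration.
```r
# Hypothetical vocabulary scores with one value far above the rest
vocab <- c(10, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 25, 26, 45)
# The box runs from the first quartile to the third quartile
q1 <- unname(quantile(vocab, 0.25))
q3 <- unname(quantile(vocab, 0.75))
box_length <- q3 - q1
# Assumed rule: whiskers stop at the last data point within 1.5 box lengths
upper_fence <- q3 + 1.5 * box_length
lower_fence <- q1 - 1.5 * box_length
whisker_top <- max(vocab[vocab <= upper_fence])
whisker_bottom <- min(vocab[vocab >= lower_fence])
# Anything beyond the fences is drawn as an individual point
outliers <- vocab[vocab < lower_fence | vocab > upper_fence]
# boxplot() uses the same multiplier by default (range = 1.5)
boxplot(vocab, range = 1.5)
```
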