WEBVTT
Kind: captions
Language: en-us

NOTE Accuracy: 91% (HIGH)

00:00:00.000 --> 00:00:05.400
In this video we will talk about the histogram.

NOTE Accuracy: 91% (HIGH)

00:00:07.100 --> 00:00:15.050
So far we have used this kind of graph to show all the measurements that we have made.

NOTE Accuracy: 91% (HIGH)

00:00:15.050 --> 00:00:23.700
For example, say we had a set of measurements of children's knowledge of letters, that is, how many

00:00:23.700 --> 00:00:32.500
letters children know the sounds or the names of. For each child we have a number for how

00:00:32.500 --> 00:00:38.800
many letters they actually know, like this. This means that

NOTE Accuracy: 90% (HIGH)

00:00:38.800 --> 00:00:45.100
there were two children who knew two letters,

NOTE Accuracy: 91% (HIGH)

00:00:45.700 --> 00:00:56.900
three children who knew three letters, there weren't any children who knew only four letters, there

00:00:56.900 --> 00:01:04.900
were five children who knew five letters, one child who knew six letters, and so on for each possible

00:01:04.900 --> 00:01:08.550
value of the number of letters known.

NOTE Accuracy: 91% (HIGH)

00:01:08.550 --> 00:01:17.050
This kind of graph contains a lot of information, but it is not very informative if you want to get a sense

00:01:17.050 --> 00:01:25.900
of the distribution of your data, especially if you have very large samples, or many different

00:01:25.900 --> 00:01:30.500
possible values that your observations could take.

NOTE Accuracy: 91% (HIGH)

00:01:30.900 --> 00:01:39.500
So here is another example, where we have measured the reading skill of a bunch of children with

00:01:39.500 --> 00:01:48.400
the word building fluency test and calculated a words-per-minute metric for each child. So there was

00:01:48.400 --> 00:01:55.300
one child who couldn't read at all and so got zero words per minute. And there were two children who got

00:01:55.300 --> 00:01:59.199
20 words per minute, and so on.
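The frequency plot described above simply counts, for every possible value, how many children had exactly that value. A minimal Python sketch of that counting step, assuming integer scores held in a plain list; the function name `frequency_table` is hypothetical, and `letters_known` only reproduces the counts read off in the letter-knowledge example (the rest of that data set is not given).

```python
from collections import Counter

def frequency_table(values):
    """Count how many observations have each possible value.

    Assumes integer values, so every integer between the smallest and the
    largest observation is a possible value, including ones nobody had.
    """
    counts = Counter(values)
    # Counter returns 0 for values that never occur (e.g. nobody knew 4 letters).
    return {v: counts[v] for v in range(min(values), max(values) + 1)}

# Counts read off the letter-knowledge example: two children knew 2 letters,
# three knew 3, none knew 4, five knew 5 and one knew 6 (remaining data not given).
letters_known = [2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 6]

print(frequency_table(letters_known))  # {2: 2, 3: 3, 4: 0, 5: 5, 6: 1}
```

With scores spread over many possible values, most entries in such a table end up as 0 or 1, which is why the words-per-minute plot looks flat.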

NOTE Accuracy: 91% (HIGH)

00:01:59.199 --> 00:02:07.300
Because there are many different values, this graph is quite flat; it doesn't really tell us how the

00:02:07.300 --> 00:02:15.300
measurements are distributed. And you can imagine that if we had 500 children, this would not look very

00:02:15.300 --> 00:02:17.100
interpretable.

NOTE Accuracy: 90% (HIGH)

00:02:17.100 --> 00:02:27.050
Instead of showing every single value we obtained, what we do is put ranges of values

00:02:27.050 --> 00:02:37.400
into buckets. These buckets are called bins, and bins are formally defined as consecutive, non-overlapping

00:02:37.400 --> 00:02:46.500
intervals, that is, ranges of values that touch at their ends. So for a bin width of five, the

NOTE Accuracy: 79% (HIGH)

00:02:46.500 --> 00:02:55.900
first bucket, the first bin, would go from a value of zero to a value of five words per minute.

NOTE Accuracy: 91% (HIGH)

00:02:56.400 --> 00:03:09.500
So how many children had reading rates from 0 to 5 words per minute? Well: one, two, three. So these

00:03:09.500 --> 00:03:19.649
three children go into the first bin, which represents values between 0 and 5, and there are three

00:03:19.649 --> 00:03:24.000
values inside this bin, this bucket.

NOTE Accuracy: 91% (HIGH)

00:03:24.000 --> 00:03:29.450
The next bin would be between 5 and 10,

NOTE Accuracy: 77% (HIGH)

00:03:29.450 --> 00:03:42.100
so one here and five here; that is, six values were observed between 5 and 10. And this is our second

00:03:42.100 --> 00:03:46.300
bucket, which contains six measurements.

NOTE Accuracy: 91% (HIGH)

00:03:47.000 --> 00:03:53.450
And so the next one would be between 10 and 15.

NOTE Accuracy: 88% (HIGH)

00:03:53.450 --> 00:04:05.500
We have only one child with a rate between 10 and 15 words per minute, so this is one.
We have

NOTE Accuracy: 70% (MEDIUM)

00:04:05.500 --> 00:04:12.750
two, plus three, plus two, plus two,

NOTE Accuracy: 84% (HIGH)

00:04:12.750 --> 00:04:16.299
that is, nine

NOTE Accuracy: 91% (HIGH)

00:04:16.299 --> 00:04:29.049
values in the range between 15 and 20, and so on. Now, one important thing here is what happens with

00:04:29.049 --> 00:04:37.600
the border values. We should not count them twice, because the total number of observations in the histogram must

00:04:37.600 --> 00:04:43.400
equal the total number of observations in the original graph, which is how many we have. So if we add up the

00:04:43.400 --> 00:04:46.800
heights of all these bars, the sum would be

NOTE Accuracy: 91% (HIGH)

00:04:46.800 --> 00:04:50.100
how many children were assessed.

NOTE Accuracy: 85% (HIGH)

00:04:50.100 --> 00:05:00.800
So it's important to be consistent. A common convention is that the border value goes into the bin on the left, so 20 words per minute

00:05:00.800 --> 00:05:11.700
goes into the 15 to 20 box, and not into the 20 to 25 box. So five words per minute would be here, 10 words

00:05:11.700 --> 00:05:20.850
per minute would be here, and so on. And the result of this binning of values is that we take this

NOTE Accuracy: 86% (HIGH)

00:05:20.850 --> 00:05:28.100
representation, which is called a frequency plot, and create this one here, which is called a

00:05:28.100 --> 00:05:29.750
histogram.

NOTE Accuracy: 91% (HIGH)

00:05:29.750 --> 00:05:38.000
The histogram is probably the most important graphical display there is. We always use it when we

00:05:38.000 --> 00:05:47.400
have quantitative variables, because it gives us at a glance a very good indication of the shape of

00:05:47.400 --> 00:05:57.050
our data distribution, that is, how our data are distributed over values, in a way that is not so much affected

00:05:57.050 --> 00:06:00.299
by how many different values there can be, or how

NOTE Accuracy: 91% (HIGH)

00:06:00.299 --> 00:06:09.000
many measurements we have made.
And we can also easily see if there are peaks, if there are gaps, or if

00:06:09.000 --> 00:06:14.800
there are any outliers far from the other measurements using the histogram.
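The binning steps walked through in the video can be sketched in Python. This is a minimal sketch under stated assumptions: scores are non-negative, all bins have the same width and start at 0, and a value that falls exactly on a border goes into the bin on its left (so 20 lands in the 15 to 20 bin), with 0 placed in the first bin. The function name `bin_counts` and the scores in `wpm` are hypothetical; the scores were made up so that the first four bins reproduce the counts from the video (3, 6, 1 and 9), since the full data set is not listed.

```python
import math

def bin_counts(values, width):
    """Count how many values fall into each bin of equal width, starting at 0.

    Border convention (assumed here, following the video): a value exactly on
    a border goes into the bin on its left, so bins are (lo, hi], except that
    0 is placed in the first bin [0, width]. Assumes all values are >= 0.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not values:
        return {}
    if min(values) < 0:
        raise ValueError("values must be non-negative")
    # Enough bins to cover the largest value (at least one bin).
    n_bins = max(1, math.ceil(max(values) / width))
    counts = [0] * n_bins
    for v in values:
        # ceil(v / width) - 1 is the index of the bin (lo, hi] holding v;
        # clamping at 0 puts v == 0 into the first bin.
        i = max(math.ceil(v / width) - 1, 0)
        counts[i] += 1
    # Label each bin by its (lower, upper) edges.
    return {(i * width, (i + 1) * width): c for i, c in enumerate(counts)}

# Hypothetical words-per-minute scores, invented so that the first four bins
# match the counts from the video: 3, 6, 1 and 9 children.
wpm = [0, 3, 5, 7, 9, 9, 9, 9, 9, 12, 16, 16, 17, 17, 17, 18, 18, 20, 20]

hist = bin_counts(wpm, width=5)
# Every child is counted exactly once, so the bar heights sum to n.
print(sum(hist.values()) == len(wpm))  # True
for (lo, hi), count in hist.items():
    print(f"{lo:>2}-{hi:<2} {'#' * count}")
# Prints:
#  0-5  ###
#  5-10 ######
# 10-15 #
# 15-20 #########
```

The sum check mirrors the point made in the video: because each border value is counted in exactly one bin, the bar heights add up to the number of children assessed. Library defaults differ on the border rule; for example, R's `hist()` puts border values in the left bin by default (`right = TRUE`), while NumPy's `numpy.histogram` uses bins of the form [a, b) and closes only the last bin, so a score of 5 would be counted in the 5 to 10 bin instead.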