WEBVTT Kind: captions; language: en-us NOTE Accuracy: 89% (HIGH) 00:00:00.000 --> 00:00:08.800 In this video we will talk about standard scores. We will see how standard scores are calculated, what 00:00:08.800 --> 00:00:17.000 they represent, and why they are so useful. To motivate the idea of a standard score, let us look at 00:00:17.000 --> 00:00:24.900 some of our actual data. Let's go to jamovi. Let us first load our data set NOTE Accuracy: 91% (HIGH) 00:00:27.500 --> 00:00:33.900 and let us look at some of our vocabulary scores. NOTE Accuracy: 91% (HIGH) 00:00:33.900 --> 00:00:43.000 The first child listed here has a vocabulary score of 47. This means that the child answered 47 00:00:43.000 --> 00:00:49.000 questions correctly, or selected 47 pictures correctly. NOTE Accuracy: 91% (HIGH) 00:00:49.000 --> 00:00:59.300 Okay, is that a good thing or a bad thing? Is that a good performance or a bad performance? This 47 is 00:00:59.300 --> 00:01:09.100 essentially meaningless unless we know what kids are expected to do with this test at this age. Let 00:01:09.100 --> 00:01:17.000 us see what the other kids in our small sample did. I'll go to Exploration, Descriptives NOTE Accuracy: 91% (HIGH) 00:01:17.600 --> 00:01:21.300 and choose to look at NOTE Accuracy: 91% (HIGH) 00:01:21.300 --> 00:01:33.950 the mean, median, minimum and maximum performance. So for vocabulary in kindergarten, we see that scores 00:01:33.950 --> 00:01:38.900 ranged between 38 and 91.
NOTE Accuracy: 84% (HIGH) 00:01:38.900 --> 00:01:48.050 And that the mean score was 61.3. So compared to 61.3, NOTE Accuracy: 85% (HIGH) 00:01:48.050 --> 00:01:52.500 our score of 47 seems quite low, NOTE Accuracy: 91% (HIGH) 00:01:52.500 --> 00:01:55.699 but how low is it really? NOTE Accuracy: 91% (HIGH) 00:01:55.699 --> 00:02:03.800 What proportion of children score higher or lower than 47? That would be a much more useful kind of 00:02:03.800 --> 00:02:12.300 information for understanding the meaning of this 47. Let us look at the standard deviation for our 00:02:12.300 --> 00:02:13.900 sample. NOTE Accuracy: 91% (HIGH) 00:02:18.300 --> 00:02:36.200 The standard deviation is 12.1. Our score of 47 is 47 - 61.3, that is, more than 14 points below the 00:02:36.200 --> 00:02:37.350 mean. NOTE Accuracy: 91% (HIGH) 00:02:37.350 --> 00:02:51.100 And 14 points below the mean is more than one standard deviation. In fact, by dividing 14.3 by 12.1, it 00:02:51.100 --> 00:03:01.900 turns out that this child with the score of 47 correct choices is actually 1.18 standard deviations 00:03:01.900 --> 00:03:07.050 below the mean. What does that mean? To understand NOTE Accuracy: 91% (HIGH) 00:03:07.050 --> 00:03:14.400 the proportion that is associated with each score, we can go to our normal probability display. So 00:03:14.400 --> 00:03:25.800 this is our normal probability display, and let us go to a score of -1.18, that is, 1.18 standard 00:03:25.800 --> 00:03:28.300 deviations below the mean. NOTE Accuracy: 91% (HIGH) 00:03:28.900 --> 00:03:37.900 This is where this child lies with respect to the normal distribution, when we think about it in 00:03:37.900 --> 00:03:40.500 terms of standard deviations. NOTE Accuracy: 91% (HIGH) 00:03:40.900 --> 00:03:51.700 And if we click "up to", we see that 1.18 standard deviations below the mean is associated with a 00:03:51.700 --> 00:04:01.700 probability of about 12%.
So this means that our child is at the 12th percentile. This means that 88 00:04:01.700 --> 00:04:09.899 percent of children score higher, and 12 percent of children score at 47 or below. NOTE Accuracy: 87% (HIGH) 00:04:09.899 --> 00:04:20.300 Indeed, if you go back to jamovi and count how many of the vocabulary in kindergarten values are 00:04:20.300 --> 00:04:30.800 higher than 47, you will see that there are 42 of them. 42 out of 47 children in the sample is about 89 00:04:30.800 --> 00:04:38.000 percent, which is very close to what we expected based on looking at the percentile from the normal 00:04:38.000 --> 00:04:39.250 distribution. NOTE Accuracy: 91% (HIGH) 00:04:39.250 --> 00:04:48.799 So what we did by using information from the mean and standard deviation was to convert this 00:04:48.799 --> 00:04:57.700 meaningless raw score of 47 into a value that is directly interpretable as a proportion of children 00:04:57.700 --> 00:05:07.000 scoring higher. So we turned an almost useless number into a highly informative quantity that tells 00:05:07.000 --> 00:05:09.450 us how high or how low this NOTE Accuracy: 76% (HIGH) 00:05:09.450 --> 00:05:18.000 child scores on this test in comparison with the same age range, or at least in comparison with 00:05:18.000 --> 00:05:19.950 a relevant sample. NOTE Accuracy: 91% (HIGH) 00:05:19.950 --> 00:05:28.900 This is the distribution of vocabulary raw scores in kindergarten for our entire sample of 47 00:05:28.900 --> 00:05:37.500 children. Here is the histogram, and on top of the histogram I have plotted a normal curve with the 00:05:37.500 --> 00:05:43.150 same mean and standard deviation as the sample, for comparison.
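The arithmetic behind the 12th percentile can be checked with a short Python sketch. It uses only the numbers quoted in the video (raw score 47, mean 61.3, standard deviation 12.1, and 42 of 47 children scoring higher). The helper name `normal_cdf` is an illustrative choice, and the normal-curve proportion assumes the scores are approximately normally distributed.

```python
import math

def normal_cdf(z):
    """Proportion of a standard normal distribution at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

raw_score = 47
mean = 61.3   # sample mean quoted in the video
sd = 12.1     # sample standard deviation quoted in the video

z = (raw_score - mean) / sd      # number of standard deviations from the mean
below = normal_cdf(z)            # expected proportion scoring at or below 47
observed_above = 42 / 47         # children counted above 47 in the sample

print(f"z = {z:.2f}")
print(f"expected at or below: {below:.1%}, expected above: {1 - below:.1%}")
print(f"observed above: {observed_above:.1%}")
```

The z value comes out near -1.18 and the expected proportion below comes out near 12%, in line with the normal probability display, while the observed proportion above 47 is about 89%.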
NOTE Accuracy: 85% (HIGH) 00:05:43.150 --> 00:05:52.000 So it looks like the histogram is a fairly good approximation of this normal curve, although there 00:05:52.000 --> 00:06:00.350 seems to be a slight excess of children scoring between 50 and 60 compared to children scoring 00:06:00.350 --> 00:06:02.500 around 60. NOTE Accuracy: 91% (HIGH) 00:06:02.600 --> 00:06:12.000 Anyway, let us use this sample to see how we go from the raw score values, which as we saw are 00:06:12.000 --> 00:06:18.900 essentially uninformative, to those values that are informative, that are interpretable as 00:06:18.900 --> 00:06:27.700 proportions of the relevant sample. Recall that the mean of this sample was 61.3, NOTE Accuracy: 91% (HIGH) 00:06:27.700 --> 00:06:31.750 and the standard deviation was 12.1. NOTE Accuracy: 91% (HIGH) 00:06:31.750 --> 00:06:38.900 So the first thing we do is we subtract the mean from each value. NOTE Accuracy: 91% (HIGH) 00:06:39.100 --> 00:06:50.200 If we subtract 61.3 from each value, then we get this histogram. Now, as you will notice, this is 00:06:50.200 --> 00:06:57.700 identical. Of course, the distribution of values hasn't changed. What has changed is that every value 00:06:57.700 --> 00:07:08.200 is 61.3 less than it used to be. So the dispersion is unchanged, and indeed the new set of numbers has 00:07:08.200 --> 00:07:09.299 the same standard deviation NOTE Accuracy: 82% (HIGH) 00:07:09.299 --> 00:07:18.000 as before. But since we subtracted the mean from every value, the mean is now zero, because a value 00:07:18.000 --> 00:07:27.350 that used to be around 61 is now around zero after subtracting 61.3, and the same is true for every value.
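The centering step can be sketched in Python. The scores below are made-up illustrative values, not the vocabulary data from the video. The point is only that subtracting the mean moves the mean to zero and leaves the standard deviation unchanged.

```python
import math
import statistics

# Hypothetical raw scores for illustration only (not the video's data set)
scores = [38, 47, 52, 58, 61, 63, 70, 75, 91]

mean = statistics.mean(scores)
centered = [x - mean for x in scores]   # subtract the mean from every value

# The mean of the centered values is zero (up to floating-point rounding)
print("centered mean is zero:",
      math.isclose(statistics.mean(centered), 0.0, abs_tol=1e-9))

# The dispersion is untouched: both standard deviations agree
print("sd unchanged:",
      math.isclose(statistics.stdev(scores), statistics.stdev(centered)))
```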
00:07:27.350 --> 00:07:38.200 So values that used to be around 61 are now around 0. It's like shifting the histogram so that the 00:07:38.200 --> 00:07:39.250 mean is now NOTE Accuracy: 77% (HIGH) 00:07:39.250 --> 00:07:40.550 at zero. NOTE Accuracy: 91% (HIGH) 00:07:40.550 --> 00:07:43.850 That's what subtracting the mean did. NOTE Accuracy: 87% (HIGH) 00:07:43.850 --> 00:07:49.800 Let us now divide every value by the standard deviation. NOTE Accuracy: 91% (HIGH) 00:07:51.600 --> 00:08:00.200 Again, it should not be a surprise that the histogram is identical. We did not change the relative 00:08:00.200 --> 00:08:11.100 standing of any number, we just divided every number by 12.1. The mean remains unchanged because it's 00:08:11.100 --> 00:08:19.400 not affected by the division: what used to be zero is still zero. What did change was the actual 00:08:19.400 --> 00:08:22.150 values of the numbers, so that NOTE Accuracy: 91% (HIGH) 00:08:22.150 --> 00:08:26.350 a number that used to be 12 points above the mean, NOTE Accuracy: 86% (HIGH) 00:08:26.350 --> 00:08:34.600 which was about one standard deviation above the mean, well, divided by 12.1 it is now around 00:08:34.600 --> 00:08:43.150 one. So being 12 points above the mean is now one. So being one above the mean is now NOTE Accuracy: 91% (HIGH) 00:08:43.150 --> 00:08:52.600 one standard deviation above the mean. Indeed, the numbers here are interpretable as how many standard 00:08:52.600 --> 00:08:57.650 deviations above or below the mean the original value was, NOTE Accuracy: 91% (HIGH) 00:08:57.650 --> 00:09:05.800 and that's what a standard score is.
So we have calculated standard scores, otherwise known as z- 00:09:05.800 --> 00:09:09.300 scores, using the formula: NOTE Accuracy: 79% (HIGH) 00:09:09.500 --> 00:09:18.800 raw score, which is our original measurement, minus the mean of all the measurements, divided by the 00:09:18.800 --> 00:09:29.200 standard deviation, and this is called a z-score or standard score. And a z-score is exactly a number 00:09:29.200 --> 00:09:37.400 that expresses how many standard deviations away from the mean the original measurement was. So a 00:09:37.400 --> 00:09:39.200 positive z-score means the NOTE Accuracy: 86% (HIGH) 00:09:39.200 --> 00:09:45.800 original measurement was above the mean, and a negative z-score means the original measurement was 00:09:45.800 --> 00:09:47.800 below the mean. NOTE Accuracy: 91% (HIGH) 00:09:48.100 --> 00:09:57.900 The reason this is so useful is that if our data are approximately normally distributed, or can be 00:09:57.900 --> 00:10:05.700 brought to be approximately normally distributed, then the number of standard deviations above or 00:10:05.700 --> 00:10:14.400 below the mean is directly linked to a probability, because of the direct link of probabilities to 00:10:14.400 --> 00:10:17.349 positions in the normal distribution. NOTE Accuracy: 89% (HIGH) 00:10:17.349 --> 00:10:27.300 And as you recall, probability is thought of as frequency, or proportions: how often things happen, 00:10:27.300 --> 00:10:38.700 or in what proportion of instances things happen. So a probability of 12% is the same as a 12 00:10:38.700 --> 00:10:41.500 percent proportion of the sample, NOTE Accuracy: 91% (HIGH) 00:10:41.500 --> 00:10:46.800 or of the population, depending on how things were computed.
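The formula stated in this part of the video, z = (raw score - mean) / standard deviation, can be sketched as a small Python function. The function name `z_scores` and the scores are hypothetical and for illustration only, and the sketch assumes the sample standard deviation (n - 1 denominator), which may differ from how a given tool computes it.

```python
import statistics

def z_scores(values):
    """Standardize values: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 denominator), an assumption
    return [(x - mean) / sd for x in values]

# Hypothetical raw scores for illustration only (not the video's data set)
scores = [38, 47, 52, 58, 61, 63, 70, 75, 91]
zs = z_scores(scores)

for raw, z in zip(scores, zs):
    side = "above" if z > 0 else "below"
    print(f"raw {raw}: z = {z:+.2f} ({abs(z):.2f} SD {side} the mean)")
```

After standardizing, the z-scores have a mean of zero and a standard deviation of one, and the sign of each z-score shows whether the original measurement was above or below the mean.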
NOTE Accuracy: 91% (HIGH) 00:10:46.800 --> 00:10:57.400 So by using a z-score, through the normal distribution, we can link to percentiles, the proportion of 00:10:57.400 --> 00:11:03.050 children scoring below and above our original raw score. NOTE Accuracy: 86% (HIGH) 00:11:03.050 --> 00:11:12.100 And therefore we have computed a very informative measure out of the raw score that tells us the 00:11:12.100 --> 00:11:20.200 relative standing of this child as a proportion of the sample, and that's what a z-score or standard 00:11:20.200 --> 00:11:24.000 score is and why it is so useful.