WEBVTT Kind: captions; language: en-us NOTE Treffsikkerhet: 87% (HØY) 00:00:00.000 --> 00:00:08.700 In this video we will talk about test standardization. In particular, we will talk about rescaling of 00:00:08.700 --> 00:00:16.300 standard scores to easily understood numbers, and about norming, or standardization, samples, whose 00:00:16.300 --> 00:00:22.900 measurements are used to derive the conversion. These are essential elements of every assessment 00:00:22.900 --> 00:00:29.750 test. As you recall from our presentation of standard scores, NOTE Treffsikkerhet: 88% (HØY) 00:00:29.750 --> 00:00:41.400 when we have a set of measurements on a test, we can subtract the mean and get a centered set of 00:00:41.400 --> 00:00:47.200 measurements with a mean of zero and the same standard deviation as the original, and then by 00:00:47.200 --> 00:00:53.900 dividing by the standard deviation we get standard scores, which have a mean of 0 and a 00:00:53.900 --> 00:00:58.700 standard deviation of one. These are the z-scores, NOTE Treffsikkerhet: 91% (HØY) 00:00:58.700 --> 00:01:05.800 and they are essentially the original scores expressed as the number of standard deviations away from 00:01:05.800 --> 00:01:14.300 the mean. Now these kinds of numbers are not very easy to perceive or remember for many people, 00:01:14.300 --> 00:01:21.600 because they have decimal digits and they're also often negative, indicating performance below the 00:01:21.600 --> 00:01:28.600 mean. So negatives and decimals make these numbers a bit cumbersome for many people, NOTE Treffsikkerhet: 90% (HØY) 00:01:28.600 --> 00:01:35.100 and therefore what we can do to correct that and make them more convenient without changing their 00:01:35.100 --> 00:01:44.300 essence is to rescale them to any mean and standard deviation we would like.
So if we multiply by a number such as 15 00:01:44.300 --> 00:01:52.800 and then add a hundred to every one of them, then we derive a distribution with exactly the same shape again, 00:01:52.800 --> 00:01:58.699 and of course the same shape as the original raw scores, but this distribution NOTE Treffsikkerhet: 81% (HØY) 00:01:58.699 --> 00:02:04.900 has a mean of a hundred and a standard deviation of 15, the two numbers that we used to derive 00:02:04.900 --> 00:02:05.850 it. NOTE Treffsikkerhet: 84% (HØY) 00:02:05.850 --> 00:02:14.600 So someone who was one standard deviation above the mean is now 15 above the mean, because we 00:02:14.600 --> 00:02:22.600 multiplied everything by 15. Someone who was 2 standard deviations below the mean is now 2 times 15, 00:02:22.600 --> 00:02:27.650 that's 30, below the mean, and 30 below a hundred is 70. NOTE Treffsikkerhet: 80% (HØY) 00:02:27.650 --> 00:02:39.100 Someone who had exactly average performance would be at 100. So what we did was to transform our actual 00:02:39.100 --> 00:02:47.000 distribution of raw scores that we measured into an identically shaped distribution with a mean of a hundred 00:02:47.000 --> 00:02:54.400 and a standard deviation of 15, which are numbers that are easy to remember when we know and do not 00:02:54.400 --> 00:02:57.500 forget what they represent. NOTE Treffsikkerhet: 86% (HØY) 00:02:58.500 --> 00:03:08.100 So standardized scores can be scaled. The original standard score is the z-score, but we can 00:03:08.100 --> 00:03:15.800 scale them to be like an IQ scale, that is, with a mean of a hundred and a standard deviation of 00:03:15.800 --> 00:03:25.700 15. This is what IQ scores actually are: they are standardized numbers that are made to have a mean of 00:03:25.700 --> 00:03:29.300 100 and a standard deviation of 15.
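NOTE A minimal sketch of the rescaling just described, not part of the spoken lecture. The raw scores and the function names `to_z_scores` and `rescale` are hypothetical, and using the population standard deviation (`pstdev`) is an assumption, since the lecture does not say which estimator is meant.

```python
from statistics import mean, pstdev


def to_z_scores(raw_scores):
    """Center on the mean, then divide by the standard deviation."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD of the measured group
    return [(x - m) / sd for x in raw_scores]


def rescale(z_scores, new_mean=100, new_sd=15):
    """Multiply each z-score by the new SD, then add the new mean."""
    return [new_mean + new_sd * z for z in z_scores]


raw = [12, 15, 18, 21, 24]  # hypothetical raw scores, for illustration only
iq_like = rescale(to_z_scores(raw))

# The rescaled scores have mean 100 and SD 15, up to floating-point rounding.
print(mean(iq_like), pstdev(iq_like))

# The lecture's examples: z = +1, z = -2 and z = 0 on the IQ-like scale.
print(rescale([1, -2, 0]))  # [115, 70, 100]
```

Passing `new_mean=10, new_sd=3` to `rescale` would give subscale-style scores by the same operation.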
NOTE Treffsikkerhet: 91% (HØY) 00:03:29.300 --> 00:03:36.200 They are based on the actual raw scores of how many questions were answered, which are uninformative numbers 00:03:36.200 --> 00:03:42.100 in themselves, because in order to interpret those properly we would need to know the mean and 00:03:42.100 --> 00:03:47.050 standard deviation of everyone answering the questions on the test. NOTE Treffsikkerhet: 91% (HØY) 00:03:47.050 --> 00:03:54.600 There are other scales you are probably familiar with, such as WISC subscales, which are made to 00:03:54.600 --> 00:04:01.000 have a mean of 10 and a standard deviation of three using an operation similar to the one we just 00:04:01.000 --> 00:04:07.750 demonstrated, and there are many other possibilities of scaled scores that are used in assessment tests, 00:04:07.750 --> 00:04:16.100 and they all have the same interpretation: remember, always think of percentiles. So all kinds of 00:04:16.100 --> 00:04:17.299 scaled scores NOTE Treffsikkerhet: 73% (MEDIUM) 00:04:17.299 --> 00:04:25.200 are informative to the extent you know what percentile, what proportion of the population, each score 00:04:25.200 --> 00:04:26.950 corresponds to. NOTE Treffsikkerhet: 91% (HØY) 00:04:26.950 --> 00:04:35.200 Standardization of a test begins with selecting a sample, that is, a group of people, adults or 00:04:35.200 --> 00:04:37.250 children, to be measured. NOTE Treffsikkerhet: 89% (HØY) 00:04:37.250 --> 00:04:44.100 This is called the norming sample, and it is extremely important for the norming sample to be 00:04:44.100 --> 00:04:51.100 representative of the population, that is, to have the same mean and standard deviation as the whole 00:04:51.100 --> 00:04:58.200 population.
Of course we can never measure everyone, so we have to carefully select our sample so that 00:04:58.200 --> 00:05:04.300 we can reasonably expect it to have a mean and standard deviation that will be very close to those of the 00:05:04.300 --> 00:05:07.600 whole population, so they must have the same average NOTE Treffsikkerhet: 90% (HØY) 00:05:07.600 --> 00:05:14.100 performance and a similar dispersion of performance as the whole population. Once the norming 00:05:14.100 --> 00:05:21.400 sample is selected, everyone in this sample is administered the test, and the distribution of scores 00:05:21.400 --> 00:05:27.300 on the test is checked to make sure it conforms to the normal distribution, or is brought to conform to 00:05:27.300 --> 00:05:28.150 it. NOTE Treffsikkerhet: 82% (HØY) 00:05:28.150 --> 00:05:34.400 And after that we can calculate a mean and standard deviation, expecting that to express the 00:05:34.400 --> 00:05:42.900 population, and then calculate z-scores and provide conversion tables from raw scores to z-scores, other standardized 00:05:42.900 --> 00:05:51.100 scores, scaled scores and eventually percentiles. So for each raw score we can express the 00:05:51.100 --> 00:05:58.400 corresponding z-score or scaled score or percentile that it corresponds to, and these are the tables NOTE Treffsikkerhet: 80% (HØY) 00:05:58.400 --> 00:06:05.500 you see accompanying every assessment test, where you look up your raw score and derive a scaled score, or 00:06:05.500 --> 00:06:07.900 a percentile, or both. NOTE Treffsikkerhet: 87% (HØY) 00:06:07.900 --> 00:06:17.600 So the result of the test is a percentile rank, the proportion of people in the norming sample this 00:06:17.600 --> 00:06:26.800 person performs better or worse than, so a relative standing. And this percentile is generalizable to 00:06:26.800 --> 00:06:32.049 the population to the extent the norming sample was representative.
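NOTE A minimal sketch of how such a conversion table could be derived from a norming sample, not part of the spoken lecture. The sample values and the name `build_conversion_table` are hypothetical. Percentiles are read off the normal curve, which assumes the scores conform to the normal distribution as described above; real test manuals may use other procedures.

```python
from statistics import NormalDist, mean, pstdev


def build_conversion_table(norming_scores, scale_mean=100, scale_sd=15):
    """Map each raw score in the observed range to z, scaled score and percentile."""
    m = mean(norming_scores)
    sd = pstdev(norming_scores)  # assumption: population SD of the norming sample
    table = {}
    for raw in range(min(norming_scores), max(norming_scores) + 1):
        z = (raw - m) / sd
        table[raw] = {
            "z": round(z, 2),
            "scaled": round(scale_mean + scale_sd * z),
            # Percentile from the standard normal curve (normality assumed).
            "percentile": round(100 * NormalDist().cdf(z)),
        }
    return table


# Hypothetical norming sample of raw scores, symmetric around a mean of 15.
norming_sample = [10, 12, 13, 14, 14, 15, 15, 15, 16, 16, 17, 18, 20]
table = build_conversion_table(norming_sample)

for raw, row in table.items():
    print(raw, row)
```

Looking up a raw score of 15 in this table gives a z-score of 0, a scaled score of 100 and the 50th percentile, because 15 is the mean of the hypothetical sample.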
NOTE Treffsikkerhet: 87% (HØY) 00:06:32.049 --> 00:06:40.300 Remember, percentiles derived from a test actually correspond to rankings with respect to the 00:06:40.300 --> 00:06:45.750 norming sample from which the conversion tables were derived. NOTE Treffsikkerhet: 91% (HØY) 00:06:45.750 --> 00:06:51.600 Now let us look at some examples of interpreting such scores. NOTE Treffsikkerhet: 91% (HØY) 00:06:51.800 --> 00:06:59.800 Let's start with the case of full scale IQ, which is scaled to a mean of 100 and a standard deviation 00:06:59.800 --> 00:07:01.250 of 15. NOTE Treffsikkerhet: 91% (HØY) 00:07:01.250 --> 00:07:11.100 So what does a score of 100 on an IQ test mean? Well, 100 is equal to the standardization mean: by 00:07:11.100 --> 00:07:16.650 definition we make IQ scaled scores have a mean of 100, NOTE Treffsikkerhet: 91% (HØY) 00:07:16.650 --> 00:07:24.700 so this is a z-score of zero, not away from the mean at all, exactly equal to the mean. Therefore it 00:07:24.700 --> 00:07:34.600 corresponds to the 50th percentile. This means half the people score better and half score worse or 00:07:34.600 --> 00:07:40.400 equal to this score, so an IQ of 100 means you are in the middle. NOTE Treffsikkerhet: 87% (HØY) 00:07:40.400 --> 00:07:55.400 What does an IQ of 130 mean? Well, 130 is 30 above 100, 30 above the mean, and 30 is twice the standard 00:07:55.400 --> 00:08:06.600 deviation of 15, so an IQ of 130 means two standard deviations above the mean, that is, a z-score of +2, 00:08:06.600 --> 00:08:11.500 which corresponds to the 98th percentile. NOTE Treffsikkerhet: 88% (HØY) 00:08:11.500 --> 00:08:19.700 Note that we don't usually use decimal places for percentiles; we round them to integers so that 00:08:19.700 --> 00:08:25.700 they're easier to interpret. So we say the 98th percentile here. NOTE Treffsikkerhet: 86% (HØY) 00:08:26.700 --> 00:08:40.100 What about an IQ of 80?
Well, 80 is 20 points below the mean of 100, and 20 divided by 15 is one and 00:08:40.100 --> 00:08:47.800 one third, so one and one third standard deviations below the mean; this is what 80 means. 80 means a 00:08:47.800 --> 00:08:57.400 z-score of -1.33, which corresponds to the 9th percentile, so someone with an IQ of 80 NOTE Treffsikkerhet: 74% (MEDIUM) 00:08:57.400 --> 00:09:06.300 has a score that is expected to be lower than ninety-one percent of the population and equal to or 00:09:06.300 --> 00:09:14.700 better than nine percent of the population, and this is the meaning of an IQ equal to 80. NOTE Treffsikkerhet: 91% (HØY) 00:09:15.600 --> 00:09:23.800 Let's look at a different scaling, turning to WISC subscale type scaling, which is standardized to 00:09:23.800 --> 00:09:28.150 have a mean of 10 and a standard deviation of three. NOTE Treffsikkerhet: 75% (MEDIUM) 00:09:28.150 --> 00:09:37.000 What does a subscale score of 10 mean? Well, that is equal to the standardization mean, that is, a z-score 00:09:37.000 --> 00:09:44.250 of 0, corresponding to the 50th percentile. So a scaled score of 10 means that you're doing 00:09:44.250 --> 00:09:49.500 better than half, and worse than half the sample; you are in the middle. NOTE Treffsikkerhet: 69% (MEDIUM) 00:09:49.600 --> 00:10:00.200 What about a subscale score of 13? Well, 13 is three above the mean, which is 10, so that is one 00:10:00.200 --> 00:10:02.800 standard deviation above the mean. NOTE Treffsikkerhet: 80% (HØY) 00:10:02.800 --> 00:10:10.700 A z-score of plus 1, one standard deviation above the mean, corresponds to the 84th percentile, so a 00:10:10.700 --> 00:10:20.900 scaled score of 13 means that you are doing better than or as well as 84% of the relevant population. So 00:10:20.900 --> 00:10:25.600 children of your age, if we're talking about the WISC. NOTE Treffsikkerhet: 85% (HØY) 00:10:26.100 --> 00:10:35.500 What about a scaled score of 8 on a WISC subscale?
Well, 8 is 2 below the mean of 10, NOTE Treffsikkerhet: 82% (HØY) 00:10:35.500 --> 00:10:41.450 and that is actually two thirds of a standard deviation, 2/3 of three. NOTE Treffsikkerhet: 79% (HØY) 00:10:41.450 --> 00:10:51.950 So 2/3 of a standard deviation below the mean is z equal to minus 0.67, that's 2/3, 00:10:51.950 --> 00:10:59.900 which corresponds to the 25th percentile. So a scaled score of eight means you are doing worse than 00:10:59.900 --> 00:11:08.900 75% of the norming sample and as well as or better than 25% of it. NOTE Treffsikkerhet: 82% (HØY) 00:11:09.300 --> 00:11:20.600 What about a scaled score of 4? 4 is 6 below the mean of 10, and 6 is twice three, so two standard 00:11:20.600 --> 00:11:29.000 deviations below the mean. This means a z-score of -2, which corresponds to the second percentile, so 00:11:29.000 --> 00:11:36.900 a scaled score of 4 means you're doing worse than 98% of the norming sample, or of the corresponding 00:11:36.900 --> 00:11:39.700 comparison population. NOTE Treffsikkerhet: 91% (HØY) 00:11:40.500 --> 00:11:48.800 We must be very careful with interpretation of test scores, and remember that although they look like 00:11:48.800 --> 00:11:56.050 numbers, they're very easy to misinterpret because they don't mean what actual numbers mean. NOTE Treffsikkerhet: 82% (HØY) 00:11:56.050 --> 00:12:03.400 What I mean to say with that is that scaled scores are not on a ratio or interval scale. NOTE Treffsikkerhet: 90% (HØY) 00:12:03.600 --> 00:12:13.349 First of all, distances aren't comparable, so the difference between two people who have 115 NOTE Treffsikkerhet: 84% (HØY) 00:12:13.349 --> 00:12:22.900 and 105 scaled scores of IQ, or intelligence quotients, and the difference between two people who have 00:12:22.900 --> 00:12:30.550 80 and 70 are not comparable. They both look like 10-point differences, they are 10-point differences, 00:12:30.550 --> 00:12:36.400 but that doesn't mean they're equal differences.
There is nothing that can be said about comparing 00:12:36.400 --> 00:12:42.800 these differences, because these numbers aren't on a scale that has distances. NOTE Treffsikkerhet: 74% (MEDIUM) 00:12:43.000 --> 00:12:52.600 And much more obviously, ratios are completely meaningless because there is no 0 on this scale, so 00:12:52.600 --> 00:13:01.100 there is no sense in which you can say that an IQ of 120 is 50% higher than an IQ of 80, or that an 00:13:01.100 --> 00:13:09.200 IQ of 150 is a hundred percent higher than an IQ of 75, or double that, or anything of that 00:13:09.200 --> 00:13:13.750 sort. These are complete nonsense statements. NOTE Treffsikkerhet: 91% (HØY) 00:13:13.750 --> 00:13:20.600 The only valid interpretation of test scores is as percentiles, NOTE Treffsikkerhet: 88% (HØY) 00:13:20.600 --> 00:13:28.400 and that's why you always need to have a conversion table handy so that you can convert the scaled 00:13:28.400 --> 00:13:36.300 scores to z-scores, or directly to percentiles, and know what your performance, your test performance, 00:13:36.300 --> 00:13:42.400 means in terms of the proportion of the population that does better or worse.
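NOTE A minimal sketch of the conversion from scaled scores to percentiles used in the worked examples, not part of the spoken lecture. The function name `scaled_to_percentile` is hypothetical, and the percentile is read from the standard normal curve, which assumes normally distributed scores; a published conversion table for the specific test should take precedence.

```python
from statistics import NormalDist


def scaled_to_percentile(score, scale_mean, scale_sd):
    """Convert a scaled score to a z-score, then to a whole-number percentile."""
    z = (score - scale_mean) / scale_sd
    return round(100 * NormalDist().cdf(z))


# IQ-type scale: mean 100, standard deviation 15.
for iq in (100, 130, 80):
    print(iq, scaled_to_percentile(iq, 100, 15))  # 50, 98, 9

# Subscale-type scale: mean 10, standard deviation 3.
for scaled in (10, 13, 8, 4):
    print(scaled, scaled_to_percentile(scaled, 10, 3))  # 50, 84, 25, 2
```

These reproduce the percentiles quoted in the examples: 50, 98 and 9 for IQ scores of 100, 130 and 80, and 50, 84, 25 and 2 for subscale scores of 10, 13, 8 and 4.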