WEBVTT Kind: captions; language: en-us

NOTE Confidence: 82% (HIGH)

00:00:00.000 --> 00:00:05.000
In this video I want to clarify what percentiles are.

NOTE Confidence: 91% (HIGH)

00:00:05.100 --> 00:00:13.800
According to the definition, a percentile is the data value that corresponds to a given proportion

00:00:13.800 --> 00:00:26.000
of the data. For example, the median is the 50th percentile. So the median is the middle value in a

00:00:26.000 --> 00:00:33.200
sorted data set. It's the one in the middle that is greater than half of the values and less than half

00:00:33.200 --> 00:00:35.800
of the values. So it corresponds

NOTE Confidence: 79% (HIGH)

00:00:35.800 --> 00:00:43.850
to half of the data, or in other words to 50%. So, the median is the 50th percentile.

NOTE Confidence: 84% (HIGH)

00:00:43.850 --> 00:00:53.700
The first quartile is, by definition, the data value at the 25 percent point, so one quarter of the data

00:00:53.700 --> 00:01:05.099
points, sorted in increasing order, lie at or below it. That is the 25th percentile. The 10th percentile is the value

00:01:05.099 --> 00:01:13.300
of the data, or the score in a set of scores, that is greater than or equal to at least 10 percent

00:01:13.300 --> 00:01:14.350
of the data.

NOTE Confidence: 78% (HIGH)

00:01:14.350 --> 00:01:23.100
So, the 10th percentile is the value that is greater than one tenth of the data and is less than 90%

00:01:23.100 --> 00:01:24.449
of the data.

NOTE Confidence: 84% (HIGH)

00:01:24.449 --> 00:01:35.400
And conversely, the 90th percentile is the score that is exceeded by up to 10% of the scores. So it

00:01:35.400 --> 00:01:45.800
is less than 10% of the scores and greater than 90%. So, percentiles are data values associated with given

00:01:45.800 --> 00:01:47.449
proportions.

NOTE Confidence: 83% (HIGH)

00:01:47.449 --> 00:01:56.100
Let's look at some more examples in this simulated data set. These are 50 data points. There are ten

00:01:56.100 --> 00:02:03.100
on each row.
We first sort them in ascending order, so the first one is the smallest and the last

00:02:03.100 --> 00:02:05.200
one is the largest.

NOTE Confidence: 83% (HIGH)

00:02:05.200 --> 00:02:14.200
And so in these sorted values, 20% of the data is on the first row, 40% of the data is in

00:02:14.200 --> 00:02:17.149
the first two rows, and so on.

NOTE Confidence: 84% (HIGH)

00:02:17.149 --> 00:02:29.050
So in this data set, the 10th percentile is 83, because 10% of the data, the 10%

NOTE Confidence: 91% (HIGH)

00:02:29.050 --> 00:02:37.500
smallest values in the data, are these: the first five values. The first five values in a data

00:02:37.500 --> 00:02:45.600
set of 50 values are 10% of the values, and because they're sorted in ascending order, this is the

00:02:45.600 --> 00:02:54.500
smallest. This is the fifth, and so it's the 10th percentile. So the 10th percentile is equal to 83.

NOTE Confidence: 84% (HIGH)

00:02:55.200 --> 00:03:05.100
In the same way, we define the 20th percentile to be the value that is at the 20 percent point. So

00:03:05.100 --> 00:03:16.200
it's 88. The 50th percentile is the value in the middle. So it's 97. That's also the median in this

00:03:16.200 --> 00:03:24.149
set. The 75th percentile, which is the same as the third quartile, is 111.

NOTE Confidence: 91% (HIGH)

00:03:24.149 --> 00:03:36.800
That is above 75% of the data and below 25%. And then the 90th percentile is 119. 119 is the score that is

00:03:36.800 --> 00:03:43.200
above 90% of scores and below 10% of scores. So these are the percentiles.

NOTE Confidence: 91% (HIGH)

00:03:43.200 --> 00:03:53.700
A very closely related term is the percentile rank. The percentile rank is the percentage of values

00:03:53.700 --> 00:03:58.300
that are lower than or equal to a data value.
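NOTE The percentile lookups in the sorted data set above can be sketched in code. This is a minimal sketch, not taken from the video. It assumes the nearest-rank convention that the walkthrough appears to follow (the fifth of 50 sorted values is the 10th percentile). The function name `percentile` and the stand-in scores 1 to 50 are hypothetical; the 50 values shown in the video are not reproduced in this transcript.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the value at position ceil(p/100 * n) in sorted order."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    # 1-based position of the value at the p percent point (at least 1, so p = 0 works).
    rank = max(1, math.ceil(p * n / 100))
    return ordered[rank - 1]


# Hypothetical stand-in data: the integers 1..50 (NOT the data set shown in the video).
scores = list(range(1, 51))

tenth = percentile(scores, 10)           # fifth smallest of the 50 values
median = percentile(scores, 50)          # 25th smallest of the 50 values
third_quartile = percentile(scores, 75)  # 38th smallest, since ceil(37.5) = 38
print(tenth, median, third_quartile)
```

Statistical packages often interpolate between neighbouring values instead, so their results can differ slightly from this convention.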
NOTE Confidence: 81% (HIGH)

00:03:58.500 --> 00:04:07.600
For example, the percentile rank of the median is 50 because, by definition, the median is greater

00:04:07.600 --> 00:04:15.500
than or equal to 50 percent of the data. Therefore its percentile rank is 50. The percentile rank of the first

00:04:15.500 --> 00:04:19.500
quartile is, by definition, 25.

NOTE Confidence: 77% (HIGH)

00:04:19.500 --> 00:04:30.000
The percentile rank of a score that is less than 90 percent of all scores is 10, because if the score

00:04:30.000 --> 00:04:36.400
is less than 90 percent of all scores, it's greater than or equal to 10 percent of all scores. So

00:04:36.400 --> 00:04:45.799
it's 10. The percentile rank of a score that is greater than or equal to 90 percent of all scores is

00:04:45.799 --> 00:04:47.500
90.

NOTE Confidence: 79% (HIGH)

00:04:48.800 --> 00:04:53.350
And going back to our simulated data set.

NOTE Confidence: 91% (HIGH)

00:04:53.350 --> 00:04:57.850
The percentile rank of 83

NOTE Confidence: 83% (HIGH)

00:04:57.850 --> 00:05:07.700
is the proportion or the percentage of scores that are less than or equal to it. So 83 is greater than

00:05:07.700 --> 00:05:18.550
or equal to six values out of 50, which is 12%. Therefore, the percentile rank of 83 is 12.

NOTE Confidence: 81% (HIGH)

00:05:18.550 --> 00:05:28.000
The percentile rank of 102, this value here, which is greater than or equal to 60 percent of the

00:05:28.000 --> 00:05:39.000
data, is 60. The percentile rank of 120 is 96. There is four percent of the data above it

00:05:39.000 --> 00:05:47.799
and ninety-six percent of the data below or equal to it. So the percentile rank of 120 is

NOTE Confidence: 83% (HIGH)

00:05:47.799 --> 00:05:49.700
96.

NOTE Confidence: 91% (HIGH)

00:05:49.700 --> 00:05:56.100
Obviously, the percentile rank can be between 0 and 100.
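NOTE The percentile rank arithmetic above (six values out of 50 gives 12) can be sketched as follows. This is a minimal sketch, not taken from the video: the function name `percentile_rank` and the stand-in scores 1 to 50 are hypothetical, chosen only so that the counts mirror the spoken examples.

```python
def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    if not values:
        raise ValueError("values must not be empty")
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)


# Hypothetical stand-in data: the integers 1..50 (NOT the data set shown in the video).
scores = list(range(1, 51))

rank_low = percentile_rank(scores, 6)    # six of the 50 values are <= 6
rank_mid = percentile_rank(scores, 30)   # thirty of the 50 values are <= 30
rank_high = percentile_rank(scores, 48)  # only two of the 50 values lie above 48
print(rank_low, rank_mid, rank_high)
```

With this definition, a value below every data point has rank 0 and the largest data point has rank 100, which matches the range stated above.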
NOTE Confidence: 91% (HIGH)

00:05:57.200 --> 00:06:06.700
Now in practice, we use the word percentile a bit more loosely to mean related, but not exactly the

00:06:06.700 --> 00:06:14.500
same things. So we will use the word percentile when we actually mean percentile, or when we mean

00:06:14.500 --> 00:06:21.600
percentile rank, or when we mean a percentage or a proportion of the data. And which one we mean

00:06:21.600 --> 00:06:27.550
is generally quite clear from the context once you're familiar with the notion of the percentile

00:06:27.550 --> 00:06:28.000
and its

NOTE Confidence: 67% (MEDIUM)

00:06:28.000 --> 00:06:30.350
converse, the percentile rank.

NOTE Confidence: 90% (HIGH)

00:06:30.350 --> 00:06:41.200
So as an example, we might say a child is at the 10th percentile. When we say that, we mean that the

00:06:41.200 --> 00:06:48.600
score that this child got on some test is less than 90% of scores.

NOTE Confidence: 91% (HIGH)

00:06:48.600 --> 00:06:57.200
So 90% of children score higher, and 10% of children score less than or equal to the score of this

00:06:57.200 --> 00:06:58.250
child.

NOTE Confidence: 91% (HIGH)

00:06:58.250 --> 00:07:05.800
So we say then that this child is at the 10th percentile. This is not according to the formal

00:07:05.800 --> 00:07:09.700
definition, but this is standard usage.

NOTE Confidence: 85% (HIGH)

00:07:09.700 --> 00:07:16.549
You may also hear something like "its score is in the 5th percentile".

NOTE Confidence: 91% (HIGH)

00:07:16.549 --> 00:07:25.000
When we use an expression like this, we actually refer to a proportion of data. So this would mean

00:07:25.000 --> 00:07:34.000
that the score we're talking about is within the lowest 5%. It could be lower than the 5th percentile

00:07:34.000 --> 00:07:40.900
or it could be at the 5th percentile, but it's within the lowest 5% when I say it's in the 5th

00:07:40.900 --> 00:07:42.600
percentile.
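NOTE The loose phrase "in the 5th percentile" can be read as a check that a score is at or below the 5th percentile. This is a minimal sketch, not taken from the video: it assumes a nearest-rank percentile, and the names `percentile` and `in_lowest` as well as the stand-in scores 1 to 100 are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the value at position ceil(p/100 * n) in sorted order."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def in_lowest(values, score, p):
    """True if score is at or below the p-th percentile, i.e. within the lowest p percent."""
    return score <= percentile(values, p)


# Hypothetical stand-in data: the integers 1..100 (not data from the video).
scores = list(range(1, 101))

below_fifth = in_lowest(scores, 3, 5)  # lower than the 5th percentile
at_fifth = in_lowest(scores, 5, 5)     # exactly at the 5th percentile
above_fifth = in_lowest(scores, 6, 5)  # just above it, so outside the lowest 5%
print(below_fifth, at_fifth, above_fifth)
```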
NOTE Confidence: 91% (HIGH)

00:07:44.500 --> 00:07:52.650
Finally, we might say something like: the 16th percentile is one standard deviation below the mean.

00:07:52.650 --> 00:08:00.100
This is actually proper usage. It means that 16% of values are lower than the value one standard deviation below

00:08:00.100 --> 00:08:01.350
the mean.

NOTE Confidence: 91% (HIGH)

00:08:01.350 --> 00:08:07.900
So all of these uses of the word percentile, which are slightly different, are related to the same

00:08:07.900 --> 00:08:15.800
concept: the idea of the proportion of the data that is associated with a given value.
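NOTE The statement about the 16th percentile can be checked numerically. This is a minimal sketch, not taken from the video. The video does not name a distribution; the sketch assumes normally distributed scores, for which the statement holds approximately. It uses `NormalDist` from Python's standard `statistics` module, and the variable names are hypothetical.

```python
from statistics import NormalDist

# Assumption: scores follow a normal distribution (standardised to mean 0, SD 1).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of values lying below the point one standard deviation below the mean.
share_below = standard_normal.cdf(-1.0)

# Conversely, the 16th percentile expressed in standard deviations from the mean.
z_at_16 = standard_normal.inv_cdf(0.16)

print(f"share below mean - 1 SD: {share_below:.3f}")
print(f"16th percentile in SD units: {z_at_16:.3f}")
```

Under this assumption the share comes out just under 16%, and the 16th percentile sits very close to one standard deviation below the mean.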