WEBVTT Kind: captions; language: en-us NOTE Treffsikkerhet: 91% (H?Y) 00:00:00.000 --> 00:00:08.700 In this video we will talk about normality, that is, evaluating the degree to which a certain set of 00:00:08.700 --> 00:00:16.200 observations can be said to approximate the normal distribution. To do that we have three sets of 00:00:16.200 --> 00:00:25.500 tools, and we usually apply them all. The first way to evaluate normality is to look at graphs. So the 00:00:25.500 --> 00:00:29.900 graphical evaluation of normality includes looking at the NOTE Treffsikkerhet: 81% (H?Y) 00:00:29.900 --> 00:00:35.300 histogram and what is called the Q-Q plot, or quantile-quantile plot. NOTE Treffsikkerhet: 91% (H?Y) 00:00:35.300 --> 00:00:43.500 The second way of evaluating normality is to calculate certain numerical indices whose values we 00:00:43.500 --> 00:00:51.400 know for data that are truly normally distributed, and these are called skewness and kurtosis, and then see 00:00:51.400 --> 00:00:57.650 if the values we get for our data set deviate from those expected for a normal distribution 00:00:57.650 --> 00:01:05.050 very much. And the third way to evaluate normality is to run statistical tests of NOTE Treffsikkerhet: 86% (H?Y) 00:01:05.050 --> 00:01:13.900 normality. These tests answer a slightly different question, that is, they answer the question: can this 00:01:13.900 --> 00:01:21.400 set of measurements have come from sampling from a normally distributed population? So let us start 00:01:21.400 --> 00:01:25.900 by looking at the graphical evaluation of normality. NOTE Treffsikkerhet: 89% (H?Y) 00:01:27.900 --> 00:01:37.600 This small data set includes 47 measurements of word reading fluency in grade 1, so there are 47 00:01:37.600 --> 00:01:45.400 children. They were asked to read a list of words within 45 seconds, and then at the end of 45 seconds we 00:01:45.400 --> 00:01:48.550 counted how many words they read correctly.
NOTE Treffsikkerhet: 87% (H?Y) 00:01:48.550 --> 00:01:54.300 And as you see there are some children who didn't really know how to read, read zero or very few 00:01:54.300 --> 00:02:01.400 words and there were also some children who read quite a bunch of words almost up to 50, so 00:02:01.400 --> 00:02:05.400 these are actually pretty good readers for first grade. NOTE Treffsikkerhet: 90% (H?Y) 00:02:05.400 --> 00:02:12.600 Now what does this distribution look like, does it look like normally distributed this is a very 00:02:12.600 --> 00:02:18.900 realistic and reasonable shape to get from actual data, but we need to know whether it is 00:02:18.900 --> 00:02:25.300 approximating the normal distribution sufficiently well in order to be able to run statistical tests 00:02:25.300 --> 00:02:29.700 that depend on the Assumption of normally distributed data. NOTE Treffsikkerhet: 91% (H?Y) 00:02:30.400 --> 00:02:37.000 Let us look into this distribution in some more detail to understand how we evaluate it, the first 00:02:37.000 --> 00:02:44.800 thing we do is look at the histogram. This looks fairly nice except for the Gap here, it seems that we 00:02:44.800 --> 00:02:52.200 didn't get enough values between 10 and 15 correct words per minute whereas the rest of the shape 00:02:52.200 --> 00:02:58.100 looks kind of okay or it could be that there are a bit more values NOTE Treffsikkerhet: 91% (H?Y) 00:02:58.100 --> 00:03:06.300 slightly below 10 and slightly above 15 and this is kind of a minor little Gap that doesn't really 00:03:06.300 --> 00:03:13.700 mean that the data aren't normally distributed. 
So the histogram looks possibly okay NOTE Treffsikkerhet: 73% (MEDIUM) 00:03:15.900 --> 00:03:22.800 let us see if the actual values that were obtained the actual measurement, so each yellow line here 00:03:22.800 --> 00:03:30.399 indicates one data point, we can see that there was one child who did not read any words NOTE Treffsikkerhet: 74% (MEDIUM) 00:03:30.399 --> 00:03:36.900 so this child read nothing, could not read any words by the time of measurement in the middle of 00:03:36.900 --> 00:03:44.900 the first grade falling behind a bit there are a couple of children reading 49 words, there are a 00:03:44.900 --> 00:03:53.300 bunch of children reading nine words only one child between 10 and 15. A bunch of children between 15 00:03:53.300 --> 00:03:59.950 and 20 and so on, so each yellow line is one data point and they have been slightly NOTE Treffsikkerhet: 71% (MEDIUM) 00:03:59.950 --> 00:04:05.900 so that you can see more data points when they actually fall on top of one another. NOTE Treffsikkerhet: 91% (H?Y) 00:04:08.100 --> 00:04:17.300 This is the normal distribution with the same mean and standard deviation as our data set, so a mean 00:04:17.300 --> 00:04:21.750 of about 23 and a standard deviation of about 12 NOTE Treffsikkerhet: 91% (H?Y) 00:04:21.750 --> 00:04:26.500 and this is a perfectly normally distributed curve. NOTE Treffsikkerhet: 91% (H?Y) 00:04:27.100 --> 00:04:35.300 It is not entirely straightforward to directly compare the histogram with this curve, but we can more 00:04:35.300 --> 00:04:43.400 easily compare the data set with an idealized data set that is normally distributed by taking into 00:04:43.400 --> 00:04:52.100 account the distances between the data points, so if you look at the data points, the yellow lines 00:04:52.100 --> 00:04:53.900 down here. 
NOTE Treffsikkerhet: 91% (H?Y) 00:04:53.900 --> 00:05:00.900 You can see that they are distributed such that the first two data points have this distance between 00:05:00.900 --> 00:05:02.150 them NOTE Treffsikkerhet: 75% (MEDIUM) 00:05:02.150 --> 00:05:11.000 the second from the third has a slightly larger distance, and then a slightly smaller distance and so 00:05:11.000 --> 00:05:12.300 on. NOTE Treffsikkerhet: 89% (H?Y) 00:05:12.300 --> 00:05:19.200 So the way to evaluate this is to compare it directly to the distances that would be expected from 00:05:19.200 --> 00:05:28.600 the normal distribution here is a set of 47 idealized values, these aren't actual data points and 00:05:28.600 --> 00:05:35.700 they're not randomly sampled from some population, they are mathematically created to have the exact 00:05:35.700 --> 00:05:41.850 distances that are expected for normally distributed data, so as you can see NOTE Treffsikkerhet: 82% (H?Y) 00:05:41.850 --> 00:05:51.000 the data are more tightly packed around the mean of 0 and they are becoming sparser and sparser as 00:05:51.000 --> 00:05:53.299 you go farther from the mean. NOTE Treffsikkerhet: 82% (H?Y) 00:05:53.299 --> 00:06:00.400 So the largest distances are between the first pair of data, and then the distance is get smaller as 00:06:00.400 --> 00:06:07.400 we approach the mean, tightest at the mean, and then again spreading out. And then this rate of 00:06:07.400 --> 00:06:14.500 tightening and spreading out is exactly equal to what would be expected from the mathematical normal 00:06:14.500 --> 00:06:16.300 distribution. NOTE Treffsikkerhet: 83% (H?Y) 00:06:16.400 --> 00:06:21.750 And these are our actual measurements, these are real data. NOTE Treffsikkerhet: 83% (H?Y) 00:06:21.750 --> 00:06:29.350 And what we can do is you can graph these distances in a way that makes it very easily visible 00:06:29.350 --> 00:06:34.150 whether they are reasonable approximations of each other. 
NOTE Treffsikkerhet: 91% (H?Y) 00:06:34.150 --> 00:06:38.800 So here is what is called a quantile-quantile plot, NOTE Treffsikkerhet: 89% (H?Y) 00:06:38.800 --> 00:06:47.299 and here is the line on which every point would fall if the distances were exactly the same 00:06:47.299 --> 00:06:50.200 between the actual data set NOTE Treffsikkerhet: 91% (H?Y) 00:06:50.200 --> 00:06:59.600 and the idealized set of values that are normally distributed. The way this graph is created is that 00:06:59.600 --> 00:07:03.650 we plot each data point in one set against the corresponding point in the other, NOTE Treffsikkerhet: 91% (H?Y) 00:07:03.650 --> 00:07:10.500 starting from the left with the lowest value in each of these. NOTE Treffsikkerhet: 91% (H?Y) 00:07:10.500 --> 00:07:21.800 We plot one data point here, which on this axis has the value from the idealized set, the 00:07:21.800 --> 00:07:31.800 mathematically expected value; that's why it says here theoretical value. And on the vertical axis it 00:07:31.800 --> 00:07:37.600 has the value of the first data point here, so it's zero. NOTE Treffsikkerhet: 87% (H?Y) 00:07:37.800 --> 00:07:50.100 And then the next data point has on the horizontal axis the second theoretical value, and on the vertical 00:07:50.100 --> 00:07:54.100 axis the second actual measurement, NOTE Treffsikkerhet: 86% (H?Y) 00:07:55.000 --> 00:08:04.500 and so on for each. This is the third point, and then up to the last point, in which we plot the value 00:08:04.500 --> 00:08:13.900 of the largest theoretically expected point for 47 points against the largest measurement we actually 00:08:13.900 --> 00:08:17.700 obtained, which is 49.
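The pairing just described can be sketched in a few lines of Python. This is only an illustration, not the procedure used to draw the lecture's plot: the function name `qq_points`, the scores, and the plotting positions `(i - 0.5) / n` are assumptions, and statistics packages may use a slightly different formula for the theoretical values.

```python
from statistics import NormalDist


def qq_points(data):
    """Pair each sorted measurement with its theoretical normal quantile."""
    n = len(data)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Lowest measurement goes with the lowest theoretical value, and so on.
    return list(zip(theoretical, sorted(data)))


# Hypothetical reading scores, not the lecture's data set.
scores = [23, 0, 9, 49, 17, 9, 30, 21, 36, 12, 25]
points = qq_points(scores)
for theoretical_value, measurement in points:
    print(f"{theoretical_value:6.2f}  {measurement}")
```

Each printed pair is one dot on the QQ plot: the first number goes on the horizontal (theoretical) axis and the second on the vertical (measurement) axis.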
NOTE Treffsikkerhet: 80% (H?Y) 00:08:17.800 --> 00:08:29.300 So if the distances between these actual data points were becoming smaller and then larger at exactly 00:08:29.300 --> 00:08:36.700 the same rate as the theoretically expected distances, then all of these yellow data points would 00:08:36.700 --> 00:08:42.000 fall on the red line, they would form a straight line, NOTE Treffsikkerhet: 91% (H?Y) 00:08:42.000 --> 00:08:47.849 and then we would know that our data were perfectly normally distributed. NOTE Treffsikkerhet: 91% (H?Y) 00:08:47.849 --> 00:08:54.900 That doesn't ever happen in practice because data are never perfectly normally distributed, but 00:08:54.900 --> 00:09:01.950 they're often approximately normally distributed, as in this case, where we can see that NOTE Treffsikkerhet: 80% (H?Y) 00:09:01.950 --> 00:09:09.100 the points generally lie very close to the line and do not form another shape that might be 00:09:09.100 --> 00:09:12.400 distinctly different from a straight line. NOTE Treffsikkerhet: 87% (H?Y) 00:09:12.400 --> 00:09:21.100 This is a QQ plot as produced by jamovi, which is exactly the same as the one I showed you on the 00:09:21.100 --> 00:09:29.750 previous page. The only difference is that on the vertical axis it's using the z-scores for the data 00:09:29.750 --> 00:09:37.400 rather than the raw values. As we know from our z-score calculation, this makes absolutely no 00:09:37.400 --> 00:09:40.500 difference in the shape of the distribution, NOTE Treffsikkerhet: 89% (H?Y) 00:09:40.500 --> 00:09:48.300 it just makes these values more directly comparable with the theoretically expected values, but 00:09:48.300 --> 00:09:56.700 otherwise the graph is exactly the same. And by looking at this we can see that our points fall 00:09:56.700 --> 00:10:04.600 reasonably close to the theoretically expected distribution, which is indicated essentially by this 00:10:04.600 --> 00:10:06.500 straight line.
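To see why switching to z-scores leaves the shape untouched, here is a small sketch with hypothetical scores. It assumes the sample standard deviation (n - 1 in the denominator); the helper name `z_scores` is made up for the illustration.

```python
from statistics import fmean, stdev


def z_scores(data):
    """Subtract the mean and divide by the standard deviation."""
    m = fmean(data)
    s = stdev(data)  # assumption: sample standard deviation (n - 1)
    return [(x - m) / s for x in data]


# Hypothetical reading scores, not the lecture's data set.
scores = sorted([23, 0, 9, 49, 17, 9, 30, 21, 36, 12, 25])
z = z_scores(scores)

# Distances between neighbouring points, before and after the transformation.
raw_gaps = [b - a for a, b in zip(scores, scores[1:])]
z_gaps = [b - a for a, b in zip(z, z[1:])]

# Every non-zero distance shrinks by the same factor, the standard deviation,
# so the pattern of distances (the shape) is unchanged.
ratios = [rg / zg for rg, zg in zip(raw_gaps, z_gaps) if rg != 0]
print(ratios)
print(stdev(scores))
```

Because all distances are rescaled by one common factor, the dots on the QQ plot keep the same pattern; only the numbers on the vertical axis change.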
NOTE Treffsikkerhet: 91% (H?Y) 00:10:09.400 --> 00:10:20.700 So here are three different measurements from these 47 children and the corresponding QQ plots on 00:10:20.700 --> 00:10:27.900 the bottom row, so histograms on the top row, QQ plots on the bottom row NOTE Treffsikkerhet: 89% (H?Y) 00:10:27.900 --> 00:10:36.700 and these are the reference lines and on the top row on the histograms I have added the density 00:10:36.700 --> 00:10:47.400 lines. So this is the density in line for the matrices test, this is the density line for reading 00:10:47.400 --> 00:10:55.000 fluency in kindergarten where most children actually cannot read at all so they get a score of 0, and 00:10:55.000 --> 00:10:58.500 this is the histogram with the density line for the first NOTE Treffsikkerhet: 72% (MEDIUM) 00:10:58.500 --> 00:11:00.150 grade NOTE Treffsikkerhet: 91% (H?Y) 00:11:00.150 --> 00:11:03.600 which are the data we were looking at NOTE Treffsikkerhet: 91% (H?Y) 00:11:03.600 --> 00:11:11.700 where most kids can actually read somewhere between 20 and 25 words per minute. NOTE Treffsikkerhet: 84% (H?Y) 00:11:11.800 --> 00:11:23.400 So if we add the theoretical normal distribution curves with the same mean and standard deviation as 00:11:23.400 --> 00:11:33.200 each data set, we can look at the comparison between the theoretical curve, the red line, and the 00:11:33.200 --> 00:11:37.950 actual density of our data, this purple line. 
NOTE Treffsikkerhet: 91% (H?Y) 00:11:37.950 --> 00:11:44.750 So if our data were normally distributed these two lines would be identical, now they're pretty close 00:11:44.750 --> 00:11:52.250 so we can see that the comparison between these two curves is very similar to looking at these data 00:11:52.250 --> 00:11:56.650 points here plotted in comparison to the reference line NOTE Treffsikkerhet: 91% (H?Y) 00:11:56.650 --> 00:12:01.900 This is easier to interpret than comparing these two curves NOTE Treffsikkerhet: 91% (H?Y) 00:12:01.900 --> 00:12:09.400 we can see that for the data set in the middle the shape of the distribution has nothing to do with 00:12:09.400 --> 00:12:16.000 the normal distribution and of course the QQ plot also shows a completely different shape for the 00:12:16.000 --> 00:12:19.250 data points compared to the reference line. NOTE Treffsikkerhet: 91% (H?Y) 00:12:19.250 --> 00:12:24.750 So these data are most certainly not normally distributed at all. NOTE Treffsikkerhet: 91% (H?Y) 00:12:24.750 --> 00:12:33.000 And for these data we see that the two curves are reasonably close to one another, and of course the 00:12:33.000 --> 00:12:41.600 points on the QQ plot are reasonably close to the reference line and do not form a different shape. NOTE Treffsikkerhet: 89% (H?Y) 00:12:42.200 --> 00:12:48.050 So that's how we evaluate histograms and QQ plots. NOTE Treffsikkerhet: 91% (H?Y) 00:12:48.050 --> 00:12:56.000 Let us now look at the indices that we can calculate that tell us the degree to which a distribution 00:12:56.000 --> 00:13:01.300 has properties that are similar to those of the normal distribution. 
NOTE Treffsikkerhet: 91% (H?Y) 00:13:04.100 --> 00:13:13.400 These are data that are theoretically created to be perfectly normally distributed for this sample 00:13:13.400 --> 00:13:17.450 size of 47, and as you can see NOTE Treffsikkerhet: 83% (H?Y) 00:13:17.450 --> 00:13:24.700 the data fall exactly on this straight line, the straight reference line of the QQ plot, confirming 00:13:24.700 --> 00:13:31.900 that their distances are as theoretically expected and this is the normally distributed shape of the 00:13:31.900 --> 00:13:33.350 histogram NOTE Treffsikkerhet: 91% (H?Y) 00:13:33.350 --> 00:13:37.400 which has the properties that it is symmetric, NOTE Treffsikkerhet: 91% (H?Y) 00:13:37.400 --> 00:13:46.400 and it is of a certain rate of decline, not too fast, not too slow. NOTE Treffsikkerhet: 90% (H?Y) 00:13:46.500 --> 00:13:53.400 This actually if you look very carefully does not look perfectly symmetric, but that's because there 00:13:53.400 --> 00:14:03.300 is an odd number of data points. So at some boundary, one point must go to one of the two sides and 00:14:03.300 --> 00:14:07.800 that makes one bar looks slightly longer than the other one NOTE Treffsikkerhet: 91% (H?Y) 00:14:07.800 --> 00:14:11.800 but the data are in fact perfectly symmetric. NOTE Treffsikkerhet: 91% (H?Y) 00:14:13.100 --> 00:14:17.050 So these are normally distributed data NOTE Treffsikkerhet: 73% (MEDIUM) 00:14:17.050 --> 00:14:19.800 and here we can see NOTE Treffsikkerhet: 91% (H?Y) 00:14:19.800 --> 00:14:27.800 the reference line in red, on top here we can see the individual data points plotted below the 00:14:27.800 --> 00:14:29.200 histogram NOTE Treffsikkerhet: 91% (H?Y) 00:14:29.200 --> 00:14:38.900 the density line which is this light purple here, and the red line which is the theoretical normal 00:14:38.900 --> 00:14:42.950 distribution with the same mean and standard deviation. 
NOTE Treffsikkerhet: 82% (H?Y) 00:14:42.950 --> 00:14:51.600 And we can already see that although the data conform to the expected distances, the theoretical 00:14:51.600 --> 00:14:57.600 normal distribution with the same mean and standard deviation doesn't actually have the exact same 00:14:57.600 --> 00:15:00.100 shape as the density, NOTE Treffsikkerhet: 91% (H?Y) 00:15:00.100 --> 00:15:07.599 and that's because you can never perfectly approach the normal distribution with such a small sample, 00:15:07.599 --> 00:15:16.750 because there aren't enough data far from the mean. The normal distribution doesn't end abruptly, 00:15:16.750 --> 00:15:25.200 it never ends, it just drags on to infinity, except with very few data out there. But you only 00:15:25.200 --> 00:15:30.250 have 47 data points, so these two curves here NOTE Treffsikkerhet: 85% (H?Y) 00:15:30.250 --> 00:15:40.700 do not perfectly match, although the data points are perfectly matched to the expected distances. NOTE Treffsikkerhet: 91% (H?Y) 00:15:41.300 --> 00:15:48.600 So the two properties that we can evaluate numerically are, first, the symmetry, NOTE Treffsikkerhet: 91% (H?Y) 00:15:48.600 --> 00:15:54.950 and second, the rate of data points as we go far from the mean. NOTE Treffsikkerhet: 91% (H?Y) 00:15:54.950 --> 00:16:03.100 So the first one is called skewness. The degree of skewness is the degree to which the distribution 00:16:03.100 --> 00:16:10.500 departs from perfect symmetry, so a skewed distribution is an asymmetrical distribution. The normal 00:16:10.500 --> 00:16:16.000 distribution is perfectly symmetrical and so it has no skewness; the skewness of the normal 00:16:16.000 --> 00:16:18.400 distribution is 0. NOTE Treffsikkerhet: 76% (H?Y) 00:16:18.400 --> 00:16:27.500 So any symmetrical distribution will have a skewness of zero, whether or not its shape is at all 00:16:27.500 --> 00:16:36.800 similar to the normal distribution.
The second index is called kurtosis, and kurtosis evaluates the 00:16:36.800 --> 00:16:45.350 ratio of points that are far from the mean: are there enough points at large distances from the mean, 00:16:45.350 --> 00:16:47.750 as we would expect from the normal NOTE Treffsikkerhet: 68% (MEDIUM) 00:16:47.750 --> 00:16:55.500 distribution, or are there too few or too many compared to the normal distribution? So kurtosis 00:16:55.500 --> 00:17:03.000 doesn't actually evaluate the shape of the curve, but it evaluates whether there are too few or too 00:17:03.000 --> 00:17:08.949 many data points far from the mean compared to the normal distribution. NOTE Treffsikkerhet: 83% (H?Y) 00:17:08.949 --> 00:17:19.700 For the normal distribution we normalize this value to be 0, so 0 excess kurtosis indicates that 00:17:19.700 --> 00:17:29.200 there are as many data far from the mean as we would expect. Again, having a kurtosis of zero does not 00:17:29.200 --> 00:17:37.300 imply that the distribution looks like the normal distribution; it only evaluates a single property. 00:17:37.300 --> 00:17:38.450 So skewness NOTE Treffsikkerhet: 90% (H?Y) 00:17:38.450 --> 00:17:47.000 and kurtosis are two numerical indices that can be used to tell us if these two specific properties 00:17:47.000 --> 00:17:54.700 of the normal distribution hold in our data set. If they do not hold, then we know our data aren't 00:17:54.700 --> 00:17:56.550 normally distributed. NOTE Treffsikkerhet: 91% (H?Y) 00:17:56.550 --> 00:18:03.500 If they do hold, we do not know that our data are normally distributed, because there could be other 00:18:03.500 --> 00:18:09.900 deviations from the normal distribution that are not captured by these indices, so that's why we need 00:18:09.900 --> 00:18:15.250 to use all the different approaches to evaluate normality simultaneously.
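As a rough sketch of how these two indices can be computed, the following Python uses the simple moment-based formulas. That choice is an assumption: statistics packages often apply small-sample corrections, so their values can differ somewhat from these. The idealized set is rebuilt here with assumed plotting positions `(i - 0.5) / 47`, and the right-skewed values are invented for contrast.

```python
from statistics import NormalDist, fmean


def skewness(data):
    """Moment-based skewness: 0 for a perfectly symmetric data set."""
    m = fmean(data)
    m2 = fmean([(x - m) ** 2 for x in data])
    m3 = fmean([(x - m) ** 3 for x in data])
    return m3 / m2 ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so the normal distribution gives 0."""
    m = fmean(data)
    m2 = fmean([(x - m) ** 2 for x in data])
    m4 = fmean([(x - m) ** 4 for x in data])
    return m4 / m2 ** 2 - 3.0


# An idealized set of 47 values with normal-quantile spacing
# (plotting positions are an assumption, see the note above).
idealized = [NormalDist().inv_cdf((i - 0.5) / 47) for i in range(1, 48)]

# Hypothetical values with one point far out on the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

print(skewness(idealized), excess_kurtosis(idealized))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

With these formulas the idealized set has a skewness of essentially zero and a negative excess kurtosis, while the invented right-skewed set has positive skewness.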
NOTE Treffsikkerhet: 89% (H?Y) 00:18:15.250 --> 00:18:25.600 So what are the values for these data, for this made-up, theoretically perfect data set? The skewness 00:18:25.600 --> 00:18:34.000 is 0 because this was made to be perfectly symmetrical. Kurtosis is not exactly zero, although the 00:18:34.000 --> 00:18:40.700 distances are exactly as the theoretically expected ones, for the reason that there aren't enough 00:18:40.700 --> 00:18:45.600 data points far from the mean because the sample is too small, so even though the NOTE Treffsikkerhet: 91% (H?Y) 00:18:45.600 --> 00:18:48.000 distances are the right ones, NOTE Treffsikkerhet: 91% (H?Y) 00:18:48.000 --> 00:18:56.700 with only 47 points the distribution is cut too short and so there is negative kurtosis. Negative 00:18:56.700 --> 00:19:04.800 kurtosis means too few data points far from the mean, whereas positive kurtosis would indicate too many 00:19:04.800 --> 00:19:07.250 data points far from the mean. NOTE Treffsikkerhet: 88% (H?Y) 00:19:07.250 --> 00:19:18.949 However, this is a very small number, so it is not as far from 0 as 1. In general, skewness and kurtosis 00:19:18.949 --> 00:19:27.100 between minus 1 and 1 are considered to be close enough to what would be expected from normally 00:19:27.100 --> 00:19:36.150 distributed data, so in general we do not worry if we see skewness and kurtosis between minus 1 and 1. 00:19:36.150 --> 00:19:37.250 So these NOTE Treffsikkerhet: 83% (H?Y) 00:19:37.250 --> 00:19:44.100 values are consistent with a normal distribution, which is a relief because these data were made to be as 00:19:44.100 --> 00:19:54.650 perfectly normally distributed as possible with n equals 47. On the bottom here I have also indicated 00:19:54.650 --> 00:20:00.250 a statistical test of normality that is called the Shapiro-Wilk test.
NOTE Treffsikkerhet: 84% (H?Y) 00:20:00.250 --> 00:20:08.900 A Shapiro-Wilk test produces two numbers. One is an index called W NOTE Treffsikkerhet: 82% (H?Y) 00:20:09.300 --> 00:20:18.600 that varies between 0 and 1, 0 indicating that the distribution is completely unlike the normal 00:20:18.600 --> 00:20:26.800 distribution and 1 indicating exactly like the normal distribution. However, we do not interpret 00:20:26.800 --> 00:20:34.700 this number directly. The reason for that will become clearer after we've discussed sampling, but for 00:20:34.700 --> 00:20:40.000 now you should remember that when we evaluate the results of a Shapiro-Wilk test NOTE Treffsikkerhet: 90% (H?Y) 00:20:40.000 --> 00:20:45.300 we do not refer to W but we refer to p. NOTE Treffsikkerhet: 85% (H?Y) 00:20:45.300 --> 00:20:54.400 The p-value is the probability that a distribution like this one could have arisen by random 00:20:54.400 --> 00:20:58.300 sampling from a normally distributed population. NOTE Treffsikkerhet: 91% (H?Y) 00:20:58.300 --> 00:21:06.800 So this test is answering a different question. The question is: could we get a shape like this if we 00:21:06.800 --> 00:21:10.350 were sampling from a perfectly normal distribution? NOTE Treffsikkerhet: 88% (H?Y) 00:21:10.350 --> 00:21:18.850 And so if this value is large (it's a probability, so it will be between 0 and 1), NOTE Treffsikkerhet: 91% (H?Y) 00:21:18.850 --> 00:21:24.550 if it's a large value, and here it's the largest value possible, NOTE Treffsikkerhet: 82% (H?Y) 00:21:24.550 --> 00:21:31.400 it indicates that this data set is consistent with sampling from a normal distribution. Of course 00:21:31.400 --> 00:21:37.200 this is a relief because these were data that were made to be perfectly normally distributed for the 00:21:37.200 --> 00:21:42.000 sample size, so we get perfect values for W and p.
NOTE Treffsikkerhet: 91% (H?Y) 00:21:42.000 --> 00:21:48.600 If this value is small, is smaller than 0.05 NOTE Treffsikkerhet: 91% (H?Y) 00:21:48.600 --> 00:21:55.900 this means that it would have been unlikely to obtain a distribution such as this from a normally 00:21:55.900 --> 00:21:58.300 distributed population NOTE Treffsikkerhet: 80% (H?Y) 00:21:58.300 --> 00:22:06.400 in that case we would say that our data set fails the Shapiro Wilks test, and that our data are not 00:22:06.400 --> 00:22:13.650 likely to have come from a normally distributed population. But in this case we get perfect indices 00:22:13.650 --> 00:22:16.900 consistent with the perfect data. NOTE Treffsikkerhet: 88% (H?Y) 00:22:17.200 --> 00:22:23.250 Let us look at a variation now, what about asymmetrical data NOTE Treffsikkerhet: 83% (H?Y) 00:22:23.250 --> 00:22:32.800 this is a slight asymmetry, these are left skewed data so there is a skew on the left side like the 00:22:32.800 --> 00:22:41.850 data are kind of leaning, they're not symmetrical but this is not a huge deviation from the Symmetry. 00:22:41.850 --> 00:22:49.900 We can see the effect very clearly on the histogram and we can see it clearly on the QQ plot because 00:22:49.900 --> 00:22:53.600 we see that the dots now do not form a line NOTE Treffsikkerhet: 91% (H?Y) 00:22:53.600 --> 00:22:59.300 and aren't dispersed around the line, but rather they form a curve NOTE Treffsikkerhet: 89% (H?Y) 00:22:59.300 --> 00:23:06.100 on the right side of the reference line which indicates left skewness. NOTE Treffsikkerhet: 87% (H?Y) 00:23:07.200 --> 00:23:14.900 Here are the reference curves for both the histogram and the q-q plot in red, NOTE Treffsikkerhet: 91% (H?Y) 00:23:14.900 --> 00:23:23.600 and we can see that the density is slightly skewed, consistent with the histogram, and we can see that 00:23:23.600 --> 00:23:33.400 the data points have some more values to the left of the most dense area here. 
What are the indices 00:23:33.400 --> 00:23:36.250 associated with this shape? NOTE Treffsikkerhet: 78% (H?Y) 00:23:36.250 --> 00:23:50.500 The skewness for this data set is -0.73, so minus means that the data are skewed to the left. It means 00:23:50.500 --> 00:23:58.600 that it is on this side that the data drag on, and this is the side that is more abrupt. NOTE Treffsikkerhet: 80% (H?Y) 00:23:59.400 --> 00:24:11.300 0.73 is closer to 0 than 1 is, so it's between minus 1 and 1, which indicates that the skewness is mild; 00:24:11.300 --> 00:24:15.000 it is not a serious deviation from normality NOTE Treffsikkerhet: 85% (H?Y) 00:24:15.000 --> 00:24:18.100 as far as skewness is concerned. NOTE Treffsikkerhet: 86% (H?Y) 00:24:18.100 --> 00:24:24.000 Kurtosis for this data set also comes to 0.73, NOTE Treffsikkerhet: 86% (H?Y) 00:24:24.000 --> 00:24:31.650 which is between minus 1 and 1, indicating that it's not a serious deviation, and is positive, 00:24:31.650 --> 00:24:38.900 indicating that there is a slight excess of distant data points, and this is mostly because of this 00:24:38.900 --> 00:24:47.300 data point, which is a bit far from the mean given the size of the sample. NOTE Treffsikkerhet: 66% (MEDIUM) 00:24:47.800 --> 00:24:59.800 The Shapiro-Wilk test produces a probability of about 25% to 26%. W is very high, near 1, and p is also 00:24:59.800 --> 00:25:08.350 quite high, so 26% for a set of data with this shape, or a more extreme one, to have come from sampling 00:25:08.350 --> 00:25:11.500 a perfectly normal population. NOTE Treffsikkerhet: 83% (H?Y) 00:25:11.500 --> 00:25:19.900 Therefore this data set does not fail the Shapiro-Wilk test for normality and is sufficiently 00:25:19.900 --> 00:25:26.800 normally distributed for these criteria, all of these criteria: skewness is within minus 1 and 1, 00:25:26.800 --> 00:25:35.850 kurtosis is within minus 1 and 1, and the Shapiro-Wilk probability is greater than 0.05.
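The three numerical criteria can be written down as a small checklist. This is a sketch only: the function name `numeric_normality_checks` and its parameters are made up for the illustration, the skewness, kurtosis, and Shapiro-Wilk p-value are assumed to have been computed already by a statistics package, and the second set of inputs is invented. Passing these checks does not replace looking at the histogram and the QQ plot.

```python
def numeric_normality_checks(skew, kurt, shapiro_p, limit=1.0, alpha=0.05):
    """Apply the rule-of-thumb cut-offs to already computed indices."""
    return {
        "skewness_ok": -limit < skew < limit,   # between minus 1 and 1
        "kurtosis_ok": -limit < kurt < limit,   # between minus 1 and 1
        "shapiro_ok": shapiro_p > alpha,        # p greater than 0.05
    }


# Values quoted for the mildly left-skewed data set.
mild = numeric_normality_checks(skew=-0.73, kurt=0.73, shapiro_p=0.26)
print(mild)

# Hypothetical values for a strongly skewed data set (invented numbers).
severe = numeric_normality_checks(skew=-1.9, kurt=5.0, shapiro_p=0.0001)
print(severe)
```

The cut-offs are kept as parameters because they are conventions rather than fixed rules.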
NOTE Treffsikkerhet: 91% (H?Y) 00:25:35.850 --> 00:25:41.100 This doesn't mean that we accept our data is normally distributed, NOTE Treffsikkerhet: 90% (H?Y) 00:25:41.100 --> 00:25:50.000 it means that these three criteria are fulfilled, we pass these three, but we still have to contend 00:25:50.000 --> 00:25:53.150 with the QQ plot and the histogram. NOTE Treffsikkerhet: 84% (H?Y) 00:25:53.150 --> 00:26:00.600 Which suggests a mild deviation and in this case there are some times where we might want to correct 00:26:00.600 --> 00:26:05.400 this deviation and other times that we might not want to. NOTE Treffsikkerhet: 91% (H?Y) 00:26:07.000 --> 00:26:14.600 This is the exact same situation except now that the skewness is on the right side, so this is a 00:26:14.600 --> 00:26:17.000 right skewed data set NOTE Treffsikkerhet: 91% (H?Y) 00:26:17.000 --> 00:26:26.350 so it's asymmetric there are excess distant values on the right side and the QQ plot is curving 00:26:26.350 --> 00:26:33.400 upwards is forming a shape that is distinctly different from the reference line, however not very far 00:26:33.400 --> 00:26:41.400 from it, so this is not a clear violation, it's a mild deviation here we can see the reference Curves 00:26:41.400 --> 00:26:46.650 in red and we can see that the density is reasonably close to the normal NOTE Treffsikkerhet: 90% (H?Y) 00:26:46.650 --> 00:26:53.850 distribution with the same mean and standard deviation with this exception, and we can see that 00:26:53.850 --> 00:26:57.000 the numerical indices are the same. NOTE Treffsikkerhet: 87% (H?Y) 00:26:57.000 --> 00:27:03.400 Because these are the same data but symmetrically mirrored on the other side, so now skewness is 00:27:03.400 --> 00:27:13.600 positive, is plus 0.73, kurtosis is of course the same and the Shapiro Wilkes test also is the same 00:27:13.600 --> 00:27:17.100 because the shape of the distribution is the same. 
NOTE Treffsikkerhet: 91% (H?Y) 00:27:17.500 --> 00:27:24.100 Let us look at a more severe deviation, so more skewed data. NOTE Treffsikkerhet: 91% (H?Y) 00:27:24.100 --> 00:27:27.600 These data are skewed to the left NOTE Treffsikkerhet: 91% (H?Y) 00:27:27.600 --> 00:27:37.300 again, and so we have a more obvious deviation here. The curve formed by these points on the QQ plot 00:27:37.300 --> 00:27:45.200 starts to depart substantially from the reference line, and here are the reference curves in red, and 00:27:45.200 --> 00:27:48.200 we can see that both the density here NOTE Treffsikkerhet: 87% (H?Y) 00:27:48.200 --> 00:27:55.750 and the line here are getting a bit far from the theoretically expected lines. NOTE Treffsikkerhet: 70% (MEDIUM) 00:27:55.750 --> 00:28:07.000 And the indices now are a larger negative skewness, which is almost -2; it's outside the range of minus 1 00:28:07.000 --> 00:28:13.550 to 1, so this is moderate skewness, to the left because it's negative. NOTE Treffsikkerhet: 61% (MEDIUM) 00:28:13.550 --> 00:28:16.500 Kurtosis is very large NOTE Treffsikkerhet: 80% (H?Y) 00:28:16.500 --> 00:28:24.200 because of this data point, which is very far from the mean compared to what would be expected for 47 00:28:24.200 --> 00:28:33.400 points. Then we can see that the Shapiro-Wilk W is now much lower than in the previous cases, and this 00:28:33.400 --> 00:28:40.200 one, which is what we should be looking at, the p-value for this Shapiro-Wilk test, is much lower than 00:28:40.200 --> 00:28:41.650 0.001. NOTE Treffsikkerhet: 91% (H?Y) 00:28:41.650 --> 00:28:49.600 So it is much lower than 0.05, it is actually a very small number, which means that it is very 00:28:49.600 --> 00:28:57.300 unlikely to get a shape like this if you sample a normally distributed population. It's not 00:28:57.300 --> 00:29:00.100 impossible, but it's unlikely.
NOTE Treffsikkerhet: 91% (H?Y) 00:29:00.100 --> 00:29:10.000 So this data set fails all three tests, it doesn't pass any of these criteria, so this data set is 00:29:10.000 --> 00:29:14.850 most certainly not a good approximation to the normal distribution NOTE Treffsikkerhet: 89% (H?Y) 00:29:14.850 --> 00:29:23.200 and this is the same on the other side so a right skewed moderate deviation, moderate skewness we can 00:29:23.200 --> 00:29:31.600 see here the reference curves and the indices and statistical tests that have the same values except 00:29:31.600 --> 00:29:34.600 the skewness is now positive. NOTE Treffsikkerhet: 91% (H?Y) 00:29:37.100 --> 00:29:43.250 What about an even more extreme deviation from normality NOTE Treffsikkerhet: 91% (H?Y) 00:29:43.250 --> 00:29:52.000 this data set produces a density curve that is markedly different from the reference normal 00:29:52.000 --> 00:30:01.400 distribution and the data are now quite far from the reference line, there is a clear curve formed so 00:30:01.400 --> 00:30:08.200 the data on the QQ plot do not look like a straight line at all, they're not scattered around the 00:30:08.200 --> 00:30:13.700 reference line instead they form a clear and sharp curve that NOTE Treffsikkerhet: 91% (H?Y) 00:30:13.700 --> 00:30:21.400 curves away from the reference line at both ends and consistent with these observations we see that 00:30:21.400 --> 00:30:27.500 there is high right skewness so this is positive and greater than 3, NOTE Treffsikkerhet: 82% (H?Y) 00:30:27.500 --> 00:30:36.200 very high kurtosis because there is a data point very far from the mean, much farther than would ever 00:30:36.200 --> 00:30:44.700 be expected for 47 data points, and accordingly the Shapiro Wilkes test produces a low W and a very 00:30:44.700 --> 00:30:50.300 very low probability the number is actually much smaller than that, but all we need to know is that 00:30:50.300 --> 00:30:53.500 it's smaller than 0.05. 
NOTE Treffsikkerhet: 91% (H?Y) 00:30:53.500 --> 00:31:01.000 So this artificial data set clearly fails the test of normality; it is not approximately normally 00:31:01.000 --> 00:31:03.000 distributed. NOTE Treffsikkerhet: 75% (MEDIUM) 00:31:03.900 --> 00:31:16.100 Here is another artificial data set. This one is symmetric but is made to look chunkier, so it doesn't 00:31:16.100 --> 00:31:23.800 gradually fall off as you might expect for normally distributed data; it's bulkier. And this you can 00:31:23.800 --> 00:31:33.500 see on the QQ plot because of the way that the points trail off the reference line, indicating that 00:31:33.500 --> 00:31:34.300 it starts NOTE Treffsikkerhet: 74% (MEDIUM) 00:31:34.300 --> 00:31:39.750 off too steep and then it goes off too steep again. NOTE Treffsikkerhet: 85% (H?Y) 00:31:39.750 --> 00:31:45.600 And these are the red reference curves, where you can see that NOTE Treffsikkerhet: 91% (H?Y) 00:31:45.600 --> 00:31:49.500 the density of this distribution NOTE Treffsikkerhet: 84% (H?Y) 00:31:49.500 --> 00:32:00.900 approaches the normal distribution but gets flat here, because it rises too fast, then stays high for 00:32:00.900 --> 00:32:06.949 too long, then drops too fast in comparison to the normal distribution. NOTE Treffsikkerhet: 91% (H?Y) 00:32:06.949 --> 00:32:16.200 However, this doesn't look like an extreme deviation. What are the indices for this one? Skewness is 0 00:32:16.200 --> 00:32:20.250 because this is a symmetric distribution. NOTE Treffsikkerhet: 81% (H?Y) 00:32:20.250 --> 00:32:27.750 Kurtosis is minus one point something, so it's at the limit NOTE Treffsikkerhet: 91% (H?Y) 00:32:27.750 --> 00:32:37.300 for being acceptable; it is borderline acceptable. And it's negative, meaning that there aren't enough 00:32:37.300 --> 00:32:44.400 data points far from the mean, indeed because this mountain here, this mountain of data, is cut short 00:32:44.400 --> 00:32:51.700 and there aren't any values where they would be expected for a
normal distribution. So this data set 00:32:51.700 --> 00:32:55.500 fails the kurtosis test, but only very mildly. NOTE Treffsikkerhet: 76% (H?Y) 00:32:55.500 --> 00:33:03.200 If we look at the Shapiro-Wilk test we get a high W and a very high p, 10 times higher than the 00:33:03.200 --> 00:33:11.350 limit of 0.05, so we pass this Shapiro-Wilk test for normality, meaning that it would be quite 00:33:11.350 --> 00:33:21.500 plausible to have obtained a set of values like these by sampling from a normal distribution. So if 00:33:21.500 --> 00:33:24.000 we had this data set NOTE Treffsikkerhet: 91% (H?Y) 00:33:24.000 --> 00:33:30.800 we would normally not need to do anything about it, even though we have a borderline kurtosis 00:33:30.800 --> 00:33:32.300 value. NOTE Treffsikkerhet: 84% (H?Y) 00:33:35.800 --> 00:33:43.550 What about this situation? This doesn't look very normally distributed because it looks very pointed, 00:33:43.550 --> 00:33:51.450 and indeed we have a somewhat worrisome QQ plot that suggests that the line formed by the data 00:33:51.450 --> 00:34:00.900 deviates systematically from the reference line. Let us look at the comparison between the density and 00:34:00.900 --> 00:34:06.250 the reference normal distribution. So this data set has a lot of NOTE Treffsikkerhet: 85% (H?Y) 00:34:06.250 --> 00:34:14.649 values farther from the mean than expected and also a lot of values around the mean. NOTE Treffsikkerhet: 88% (H?Y) 00:34:14.649 --> 00:34:24.900 What are the indices for this?
Skewness is 0 because it's symmetric; kurtosis is in excess of one, so 00:34:24.900 --> 00:34:27.850 it's again a mild deviation, NOTE Treffsikkerhet: 86% (H?Y) 00:34:27.850 --> 00:34:38.199 but it's outside of the limits minus 1 and 1. The Shapiro-Wilk test is actually not worrisome: we don't 00:34:38.199 --> 00:34:43.000 fail the test, with a high p-value of 0.45 NOTE Treffsikkerhet: 91% (H?Y) 00:34:43.000 --> 00:34:50.300 indicating that it is not unlikely to obtain this shape by sampling from a normally distributed 00:34:50.300 --> 00:34:51.899 population. NOTE Treffsikkerhet: 83% (H?Y) 00:34:51.899 --> 00:34:59.900 This does not mean that our data are a good approximation to the normal distribution, but we need to 00:34:59.900 --> 00:35:06.000 evaluate further if there's something to do about it, and with these indices in most cases we 00:35:06.000 --> 00:35:11.550 wouldn't actually need to do anything in order to obtain reliable results. NOTE Treffsikkerhet: 82% (H?Y) 00:35:11.550 --> 00:35:21.300 So we would probably not act on this value of kurtosis alone, although the shape of the QQ plot is 00:35:21.300 --> 00:35:23.850 not exactly reassuring, NOTE Treffsikkerhet: 91% (H?Y) 00:35:23.850 --> 00:35:30.100 and we should take that into account in the interpretation of our findings, that there is a mild 00:35:30.100 --> 00:35:33.600 violation of the normality assumption here. NOTE Treffsikkerhet: 82% (H?Y) 00:35:35.100 --> 00:35:42.500 What about these data? We can see that this is a more extreme version of the previous set, where we 00:35:42.500 --> 00:35:51.300 have a very pointy histogram with many values near the mean and also a lot of values spread out from 00:35:51.300 --> 00:35:54.350 the mean, and we can see that NOTE Treffsikkerhet: 87% (H?Y) 00:35:54.350 --> 00:36:05.000 the line formed in the QQ plot jumps off the reference line at both ends, and the density is too flat 00:36:05.000 --> 00:36:09.900 here and too pointy here and again too flat here.
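To make concrete what "jumping off the reference line at both ends" means, here is a minimal sketch of how the points of a normal QQ plot can be computed. It is not the software used in the video. The function name `qq_points`, the plotting positions (i - 0.5) / n, and the small made-up sample are all assumptions for illustration only.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, observed) pairs for a normal QQ plot.

    The theoretical quantiles come from a normal distribution with the
    same mean and standard deviation as the sample, so points that lie
    on the line y = x agree with the normal shape.
    """
    xs = sorted(values)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    return [(reference.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]

# Hypothetical symmetric sample with many values near the mean and two far away.
sample = [-10, -1, -0.5, -0.2, 0, 0, 0.2, 0.5, 1, 10]
points = qq_points(sample)

for theoretical, observed in points:
    print(f"{theoretical:8.2f} {observed:8.2f}")
```

In this made-up sample the smallest value lies below its theoretical quantile and the largest lies above it, which is the pattern of a pointy, heavy-tailed distribution leaving the reference line at both ends.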
NOTE Treffsikkerhet: 83% (H?Y) 00:36:10.500 --> 00:36:19.600 The indices associated with this distribution are 0 skewness, because it's symmetric, and a kurtosis of 00:36:19.600 --> 00:36:21.900 positive 2.3, NOTE Treffsikkerhet: 88% (H?Y) 00:36:21.900 --> 00:36:29.200 so this is a moderate deviation from normality, which is substantial, and in this case the Shapiro- 00:36:29.200 --> 00:36:37.900 Wilk test indicates that the probability of obtaining a data set of this shape by sampling from a 00:36:37.900 --> 00:36:47.900 normally distributed population is actually quite low: it is less than 0.05, it's 0.03. So this data set 00:36:47.900 --> 00:36:52.300 fails the Shapiro-Wilk test and we can say that these data are not NOTE Treffsikkerhet: 91% (H?Y) 00:36:52.300 --> 00:36:54.900 normally distributed. NOTE Treffsikkerhet: 91% (H?Y) 00:36:57.000 --> 00:37:05.100 Let us now look at some actual data sets. All the previous data sets we have looked at were created to 00:37:05.100 --> 00:37:11.900 produce these special situations, so you can train your eyes a little bit on what shapes of data are 00:37:11.900 --> 00:37:19.700 associated with what values of the indices and the test. What about our real data? NOTE Treffsikkerhet: 85% (H?Y) 00:37:19.700 --> 00:37:29.250 Here is a histogram of real data with the associated QQ plot. Both the histogram and the QQ plot 00:37:29.250 --> 00:37:35.800 look somewhat noisy, but they do not seem to indicate significant deviations from the normal 00:37:35.800 --> 00:37:37.600 distribution. NOTE Treffsikkerhet: 91% (H?Y) 00:37:37.600 --> 00:37:46.800 Indeed, if we compare to the reference curves, we see that there is some deviation here, but it doesn't 00:37:46.800 --> 00:37:51.400 look like very much, and the same is the case down here.
NOTE Treffsikkerhet: 91% (H?Y) 00:37:51.500 --> 00:38:03.200 The indices are plus 0.23 for skewness and -0.52 for kurtosis. Both of these are well within the 00:38:03.200 --> 00:38:12.100 limits of minus 1 to 1 and do not cause concern. The Shapiro-Wilk test also indicates no cause for 00:38:12.100 --> 00:38:16.900 concern, as the p-value is 0.39. NOTE Treffsikkerhet: 82% (H?Y) 00:38:18.300 --> 00:38:25.500 So we can accept this data set as sufficiently close to the normal distribution for the purpose of 00:38:25.500 --> 00:38:35.000 statistical analyses that require an assumption of normality. Here is another set of real data, NOTE Treffsikkerhet: 81% (H?Y) 00:38:35.000 --> 00:38:43.100 and this is the one that really doesn't conform to the normal distribution, as most children receive 00:38:43.100 --> 00:38:50.400 a score of zero, and therefore the density of the data has nothing to do with the normal 00:38:50.400 --> 00:38:52.500 distribution curve, NOTE Treffsikkerhet: 83% (H?Y) 00:38:52.500 --> 00:38:59.800 and also the QQ plot suggests a completely different shape for the data points than the reference 00:38:59.800 --> 00:39:01.149 line. NOTE Treffsikkerhet: 85% (H?Y) 00:39:01.149 --> 00:39:10.000 Indeed, the indices for this data set are a very large positive skewness of 2.2, a very large positive 00:39:10.000 --> 00:39:15.550 kurtosis of 3.9 because of all these values far from the mean, NOTE Treffsikkerhet: 89% (H?Y) 00:39:15.550 --> 00:39:25.500 and a quite significant Shapiro-Wilk p-value, well below the limit of 0.05 due to this deviation 00:39:25.500 --> 00:39:29.450 from normality, and a low W at 0.5. NOTE Treffsikkerhet: 77% (H?Y) 00:39:29.450 --> 00:39:36.500 So this distribution fails all tests: the visual ones, the indices, and the statistical test. These 00:39:36.500 --> 00:39:42.400 data are in no way normally distributed and cannot be used in any analysis that requires a normality 00:39:42.400 --> 00:39:44.000 assumption.
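The two indices can be sketched in a few lines of code. This is only an illustration under assumptions: it uses the simple moment-based formulas, whereas statistical packages often apply small-sample corrections, so their numbers may differ slightly from these. The function names and the zero-heavy sample are made up and are not the data shown in the video.

```python
from statistics import fmean

def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])
    m3 = fmean([(x - m) ** 3 for x in values])
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives about 0."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])
    m4 = fmean([(x - m) ** 4 for x in values])
    return m4 / m2 ** 2 - 3

# Hypothetical scores where most children score zero and a few score high.
zero_heavy = [0] * 12 + [1, 2, 3, 5, 8, 20]
print(skewness(zero_heavy), excess_kurtosis(zero_heavy))

# A flat, symmetric sample: skewness of zero and negative kurtosis.
flat = [1, 2, 3, 4, 5]
print(skewness(flat), excess_kurtosis(flat))
```

For the zero-heavy sample both indices come out positive and well above 1, which matches the direction of the values described for the data set where most children scored zero. The flat sample gives zero skewness and a negative kurtosis, the pattern of a distribution with too few values far from the mean.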
NOTE Treffsikkerhet: 91% (H?Y) 00:39:45.300 --> 00:39:49.150 Here is another real data set. NOTE Treffsikkerhet: 91% (H?Y) 00:39:49.150 --> 00:39:57.100 These are the same reading fluency values that we saw before, but in a histogram with wider bars, NOTE Treffsikkerhet: 91% (H?Y) 00:39:57.100 --> 00:40:03.900 and we can see that both the histogram and the QQ plot don't look exactly like the normal 00:40:03.900 --> 00:40:08.800 distribution, but don't look severely different from it either. NOTE Treffsikkerhet: 81% (H?Y) 00:40:09.400 --> 00:40:18.900 The indices associated with this one are a slight positive skewness of 0.2 and a negative 00:40:18.900 --> 00:40:27.100 kurtosis of minus 0.5, both of which are within the range of minus 1 to 1 and therefore are no cause 00:40:27.100 --> 00:40:35.399 for concern, and likewise the Shapiro-Wilk test produces a very high W and a high p-value of 0.53, 00:40:35.399 --> 00:40:40.650 which is well above the criterion of 0.05. NOTE Treffsikkerhet: 91% (H?Y) 00:40:40.650 --> 00:40:47.300 Therefore we can use this data set in analyses that require an assumption of normality. NOTE Treffsikkerhet: 83% (H?Y) 00:40:50.500 --> 00:40:59.000 Here is one more data set, another realistic data set, actual data. In this one we see something a 00:40:59.000 --> 00:41:07.800 little strange. The QQ plot doesn't look horrible, although we have some deviation here; this in itself 00:41:07.800 --> 00:41:16.000 is no cause for concern. However, this data set looks more like it has two peaks NOTE Treffsikkerhet: 88% (H?Y) 00:41:16.100 --> 00:41:24.400 rather than a single one, and it doesn't look like this is caused by a small gap somewhere, where 00:41:24.400 --> 00:41:31.800 there is a slightly dense region of extra values and a slightly less dense region with fewer values 00:41:31.800 --> 00:41:39.500 adjacent to it. This looks like a real two-peak situation, which is a substantial deviation from the 00:41:39.500 --> 00:41:41.700 normal distribution.
NOTE Treffsikkerhet: 89% (H?Y) 00:41:41.700 --> 00:41:51.700 What are the indices associated with that? Well, we have essentially no skewness, it's 0.06, so this is an 00:41:51.700 --> 00:41:59.900 essentially symmetrical distribution. However, we have a lack of values away from the mean, as the 00:41:59.900 --> 00:42:09.400 kurtosis is farther from 0 than minus one, it's minus 1.3. This is only a mild deviation actually, but we 00:42:09.400 --> 00:42:12.200 also fail the Shapiro-Wilk test, as the NOTE Treffsikkerhet: 91% (H?Y) 00:42:12.200 --> 00:42:23.600 p-value is less than 0.05, it's 0.035, and this is caused by the double-peak situation. So in this case 00:42:23.600 --> 00:42:31.700 we would not reject this set on the basis of the QQ plot or the skewness, and we would only put a 00:42:31.700 --> 00:42:38.500 question mark based on the kurtosis, but the histogram in conjunction with the Shapiro-Wilk test NOTE Treffsikkerhet: 91% (H?Y) 00:42:38.500 --> 00:42:45.900 is actually a cause of concern in case of analyses that require a normality assumption. Unfortunately 00:42:45.900 --> 00:42:53.700 there is not much one can do with a bimodal distribution, that is, one that has more than one peak, so in some 00:42:53.700 --> 00:42:59.700 cases we might be forced to use these data as they are, but we would need to discuss the implications 00:42:59.700 --> 00:43:04.900 of the violation of the normality assumption in interpreting our findings. NOTE Treffsikkerhet: 91% (H?Y) 00:43:05.700 --> 00:43:15.700 To sum up, we often need to evaluate whether our data set conforms to the expectations for normally 00:43:15.700 --> 00:43:23.700 distributed data.
This is because many statistical procedures assume that the data are normally 00:43:23.700 --> 00:43:31.100 distributed, that is, they have been sampled from a normally distributed population and aren't a weird 00:43:31.100 --> 00:43:35.550 sample of it that actually is far from a normally distributed NOTE Treffsikkerhet: 91% (H?Y) 00:43:35.550 --> 00:43:37.650 shape itself. NOTE Treffsikkerhet: 91% (H?Y) 00:43:37.650 --> 00:43:43.600 We generally need to evaluate normality with all of the data we collect, NOTE Treffsikkerhet: 88% (H?Y) 00:43:45.000 --> 00:43:52.000 and as we saw there are three ways to do that, and the three offer different kinds of information, so 00:43:52.000 --> 00:43:54.400 we should carry out all three of them. NOTE Treffsikkerhet: 87% (H?Y) 00:43:54.400 --> 00:44:01.700 The first one is the visual appraisal of normality. That means we look at the histogram, we look 00:44:01.700 --> 00:44:03.850 at the QQ plot, NOTE Treffsikkerhet: 91% (H?Y) 00:44:03.850 --> 00:44:12.600 and evaluate the degree to which they may indicate deviations from the normally distributed shape. NOTE Treffsikkerhet: 89% (H?Y) 00:44:13.100 --> 00:44:21.700 The second one is to calculate indices, the values of which are known for the normal distribution, and 00:44:21.700 --> 00:44:28.600 the common indices of this sort are skewness and kurtosis. Skewness evaluates whether a 00:44:28.600 --> 00:44:36.400 distribution is symmetrical and kurtosis evaluates the proportion of data far from the mean.
If the 00:44:36.400 --> 00:44:41.200 values of these indices are not within minus 1 and 1 NOTE Treffsikkerhet: 83% (H?Y) 00:44:41.200 --> 00:44:48.800 then we have reason to be concerned that our data aren't sufficiently normally distributed. If the 00:44:48.800 --> 00:44:56.400 values are within minus 1 and 1 there could still be problems with our distribution, but the specific 00:44:56.400 --> 00:45:03.000 properties that are assessed by these indices are not violated, are not far from what is expected for 00:45:03.000 --> 00:45:05.400 normally distributed data. NOTE Treffsikkerhet: 77% (H?Y) 00:45:05.400 --> 00:45:13.800 Finally, there are statistical tests of normality, and a commonly used one, but not the only one, is the 00:45:13.800 --> 00:45:15.750 Shapiro-Wilk test, NOTE Treffsikkerhet: 85% (H?Y) 00:45:15.750 --> 00:45:20.450 and for this test we look at the p-value NOTE Treffsikkerhet: 91% (H?Y) 00:45:20.450 --> 00:45:31.500 and conclude that our data violate the normality assumption if the p-value is less than 0.05, whereas 00:45:31.500 --> 00:45:39.700 this test is passed if the p-value is equal to or greater than 0.05. But in the end we have to draw a 00:45:39.700 --> 00:45:47.600 conclusion about our data based on all three kinds of evaluations taken together, in conjunction with 00:45:47.600 --> 00:45:50.649 the specific requirements of the test NOTE Treffsikkerhet: 79% (H?Y) 00:45:50.649 --> 00:45:52.200 we need to run. NOTE Treffsikkerhet: 91% (H?Y) 00:45:52.200 --> 00:46:01.200 And if our data are not sufficiently close to the normal distribution, then we may need to decide to 00:46:01.200 --> 00:46:07.700 run different kinds of statistical tests that do not depend on the normality assumption.
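The numerical part of this summary can be written down as a small checklist. This is a sketch under assumptions, not a procedure given in the video. The function name `normality_checks` is made up, and treating a value of exactly 1 or minus 1 as still within the limits is an assumption. The p-value is taken as an input from whatever software runs the Shapiro-Wilk test (in Python, `scipy.stats.shapiro` is one option). The visual appraisal of the histogram and QQ plot still has to be done by eye.

```python
def normality_checks(skewness, kurtosis, p_value, limit=1.0, alpha=0.05):
    """Apply the rules of thumb from the summary to already-computed numbers.

    skewness, kurtosis: indices whose expected value is 0 for normal data
    p_value: p-value of a normality test such as Shapiro-Wilk
    """
    return {
        # Indices within minus `limit` to plus `limit` raise no concern
        # (including the endpoints is an assumption of this sketch).
        "skewness_ok": abs(skewness) <= limit,
        "kurtosis_ok": abs(kurtosis) <= limit,
        # The test is passed when p is equal to or greater than alpha.
        "test_passed": p_value >= alpha,
    }

# Values quoted for two of the real data sets discussed earlier.
first_real = normality_checks(0.23, -0.52, 0.39)
two_peaks = normality_checks(0.06, -1.3, 0.035)

print(first_real)
print(two_peaks)
```

With the quoted values, the first real data set raises no flags, while the two-peak data set is flagged on kurtosis and on the test but not on skewness. The final judgement still combines these flags with the plots and with the requirements of the analysis to be run.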