WEBVTT Kind: captions; language: en-us

NOTE Confidence: 91% (HIGH)

00:00:00.000 --> 00:00:09.400
In this video, we will go into how the mean is defined and what that means. We're going to use this

00:00:09.400 --> 00:00:17.100
example of five students measuring the height of one other student with a measuring tape and

00:00:17.100 --> 00:00:26.400
producing a set of estimates that we have from a previous class. And the question is, what should we

00:00:26.400 --> 00:00:29.900
report as the person's height?

NOTE Confidence: 90% (HIGH)

00:00:30.100 --> 00:00:38.600
So here we have a graphical display of measurements of this person's height. On the horizontal

00:00:38.600 --> 00:00:46.600
axis we have the values that could be reported, and on the vertical axis we have the number of times

00:00:46.600 --> 00:00:55.600
that each value was reported. So in this particular case, the first person reported 163.5. So we

00:00:55.600 --> 00:01:00.200
draw a line from 0 to 1 to indicate that the

NOTE Confidence: 87% (HIGH)

00:01:00.200 --> 00:01:09.650
value 163.5 right here occurred once after the first person had measured it.

NOTE Confidence: 85% (HIGH)

00:01:09.650 --> 00:01:19.600
After the second person, we have one occurrence of the value 162 in addition to the first one, and

00:01:19.600 --> 00:01:30.300
then three more people measure the height. And so in the end we have two measurements at 162. That's why

00:01:30.300 --> 00:01:38.350
this line goes all the way up to 2. So this just shows the distribution of the actual values reported

00:01:38.350 --> 00:01:40.650
as the measured height for this person.

NOTE Confidence: 72% (MEDIUM)

00:01:40.650 --> 00:01:46.500
And the question is, how can we derive an answer from these?

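NOTE A minimal Python sketch of the tally behind the display described above. Only the values 163.5 (reported once) and 162 (reported twice) come from the video; the other two entries in `heights` are assumed purely for illustration, and the text bar chart stands in for the plot on screen.

```python
from collections import Counter

# Hypothetical data set: 163.5 and the two 162s are stated in the video;
# 161.0 and 163.0 are assumed values, not taken from the recording.
heights = [163.5, 162.0, 161.0, 163.0, 162.0]

# Number of times each value was reported (the vertical axis of the display).
counts = Counter(heights)

# One row per reported value, in axis order, with one mark per occurrence.
for value in sorted(counts):
    print(f"{value:6.1f} | {'#' * counts[value]}")
```

Each row corresponds to one vertical line in the display, and the row for 162 is the only one with two marks.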
NOTE Confidence: 88% (HIGH)

00:01:46.500 --> 00:01:55.100
So to see how we can calculate a reasonable answer, we will imagine we have that answer, and we will

00:01:55.100 --> 00:02:03.100
just draw a line to represent it and use the symbol μ to indicate this value that we don't know

00:02:03.100 --> 00:02:04.250
yet.

NOTE Confidence: 91% (HIGH)

00:02:04.250 --> 00:02:10.400
So let's imagine that our answer would be somewhere around here.

NOTE Confidence: 91% (HIGH)

00:02:13.000 --> 00:02:22.700
How much do the actual observations differ from this desired answer? We can subtract this from each

00:02:22.700 --> 00:02:31.300
observation to find the distances. So these d values are the distances. d1 is the difference

00:02:31.300 --> 00:02:34.700
between the first measurement,

NOTE Confidence: 79% (HIGH)

00:02:36.100 --> 00:02:47.500
designated x1, and this value, which we don't know yet, but we can still write that d1 equals x1

00:02:47.500 --> 00:02:56.800
minus μ. So the first distance equals the first measurement minus this position, whatever it turns

00:02:56.800 --> 00:03:06.350
out to be, and we can do the same for the other values that were reported. So we have five distances

NOTE Confidence: 84% (HIGH)

00:03:06.850 --> 00:03:14.500
from this hypothetical line, and these are the five differences for the actual observations.

NOTE Confidence: 76% (HIGH)

00:03:14.500 --> 00:03:24.400
Notice that, the way these are displayed, two of the measurements are to the right of this

00:03:24.400 --> 00:03:33.800
hypothetical line. So if we subtract this from that, we get a positive number.

NOTE Confidence: 75% (MEDIUM)

00:03:34.400 --> 00:03:43.500
The same is true for this one. x1 minus μ will produce a positive number. This is a positive

00:03:43.500 --> 00:03:51.700
distance, and x4 minus μ again will be a positive distance, because this is more than that. It's to

00:03:51.700 --> 00:03:56.200
the right on the axis.
In contrast,

NOTE Confidence: 67% (MEDIUM)

00:03:56.200 --> 00:04:06.900
x3 minus μ will produce a negative number, because x3 here is less than μ the way they are shown

00:04:06.900 --> 00:04:13.600
right now in this example. So this is to the left of that. Therefore, their difference will be

00:04:13.600 --> 00:04:15.400
negative.

NOTE Confidence: 91% (HIGH)

00:04:17.500 --> 00:04:28.550
So now we can begin to think of this line as a line that is pulled by the different observations.

NOTE Confidence: 85% (HIGH)

00:04:28.550 --> 00:04:37.800
And we want this line to just settle wherever all the different pulls balance out.

NOTE Confidence: 88% (HIGH)

00:04:38.100 --> 00:04:48.800
We will accept this as an optimal answer from the five data points, when we allow each data point to

00:04:48.800 --> 00:04:57.950
pull and let this line settle wherever the pulls are equal in the two directions.

NOTE Confidence: 81% (HIGH)

00:04:57.950 --> 00:05:09.000
Another way to state that is that we demand the sum of all these distances to equal zero. When is

00:05:09.000 --> 00:05:16.300
the sum of all the distances going to be zero? It's going to be zero when the positive ones pull

00:05:16.300 --> 00:05:22.300
to the right exactly as much as the negative ones pull to the left.

NOTE Confidence: 90% (HIGH)

00:05:22.300 --> 00:05:29.100
So, because these are positive and these are negative, when all the positives sum up to exactly the

00:05:29.100 --> 00:05:37.900
opposite of these negative ones, then the sum of all of them together is going to be 0. So we demand

00:05:37.900 --> 00:05:43.900
this, we set this to be the case. It's not that we know something about the distances. All we know

00:05:43.900 --> 00:05:50.300
about the distances is that they're defined by a subtraction, but we don't know μ yet.

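NOTE A minimal Python sketch of the signed distances for a trial position of the line, before μ has been solved for. The helper name `signed_distances`, the two trial positions, and the values 161.0 and 163.0 in `heights` are assumptions made for illustration; only 163.5 and the two 162s are stated in the video.

```python
def signed_distances(xs, mu):
    """Return d_i = x_i - mu for every observation x_i."""
    return [x - mu for x in xs]


# Hypothetical data set: 161.0 and 163.0 are assumed, not from the recording.
heights = [163.5, 162.0, 161.0, 163.0, 162.0]

for trial_mu in (162.0, 163.0):
    d = signed_distances(heights, trial_mu)
    total = sum(d)
    # A positive total means the observations pull the line to the right,
    # a negative total means they pull it to the left.
    direction = "right" if total > 0 else "left" if total < 0 else "balanced"
    print(f"trial mu = {trial_mu}: distances = {d}, sum = {total} ({direction})")
```

Neither trial position balances the pulls: the sum is positive for the first and negative for the second, so the balance point has to lie between them.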
NOTE Confidence: 84% (HIGH)

00:05:50.300 --> 00:05:58.900
So we say we want the distances to add up to zero, and that's how we define where we want this line

00:05:58.900 --> 00:06:10.100
to be. So, to proceed, all we have to do is substitute the d values from their definitions.

NOTE Confidence: 82% (HIGH)

00:06:13.000 --> 00:06:21.600
So the first distance equals x1 minus μ. The second distance equals x2 minus μ, and of course

00:06:21.600 --> 00:06:28.000
we retain the demand that this sum has to equal zero.

NOTE Confidence: 89% (HIGH)

00:06:28.000 --> 00:06:35.150
Now, all we have to do is group together all the x values here

NOTE Confidence: 84% (HIGH)

00:06:35.150 --> 00:06:45.000
and all the μ terms, not changing anything, just moving them around on the same side, so that

00:06:45.000 --> 00:06:56.900
we see that there are five x values and five μ terms. And we can take all the μ terms to the other side by

00:06:56.900 --> 00:06:58.950
changing their sign.

NOTE Confidence: 83% (HIGH)

00:06:58.950 --> 00:07:05.600
So instead of subtracting them on the left, we add them on the right, which is exactly the same

00:07:05.600 --> 00:07:14.000
thing. And then, well, if there are five of them, we just write this as 5 times μ equals the sum of all

00:07:14.000 --> 00:07:15.800
these values.

NOTE Confidence: 90% (HIGH)

00:07:15.800 --> 00:07:23.200
Just take these five and discard the zero, which doesn't add anything.

NOTE Confidence: 75% (MEDIUM)

00:07:23.600 --> 00:07:34.200
So based on this, we can solve for μ just by taking the 5 to the other side. So, if five times μ

00:07:34.200 --> 00:07:43.799
is equal to the sum, then μ is equal to the sum divided by 5, which is the formula for the mean

00:07:43.799 --> 00:07:46.200
that you already knew.

NOTE Confidence: 90% (HIGH)

00:07:47.200 --> 00:08:00.000
What this means is that the mean expresses the idea of a balance point.
It's the point where

00:08:00.000 --> 00:08:03.850
the pulls from all the available data

NOTE Confidence: 91% (HIGH)

00:08:03.850 --> 00:08:09.049
settle and counteract each other exactly.

NOTE Confidence: 77% (HIGH)

00:08:09.049 --> 00:08:16.800
And in mathematical terms, this is exactly the same as saying that the total signed distance equals zero.

00:08:16.800 --> 00:08:25.000
That is, the positive and the negative distances exactly balance out. Remember, we started by just

00:08:25.000 --> 00:08:33.000
drawing these distances and this line but not giving them any value. The value is derived from the

00:08:33.000 --> 00:08:37.850
measurements after we demand that they all add up to zero.

NOTE Confidence: 89% (HIGH)

00:08:37.850 --> 00:08:46.200
So this is how the mean arises, and its actual value in this case is indeed 162.3.

NOTE Confidence: 91% (HIGH)

00:08:46.300 --> 00:08:54.600
And if you want to add up the values of these distances, now that you know this, you can subtract

00:08:54.600 --> 00:09:01.500
it from each observation to find these distances here. And if you add these two, you'll see that they

00:09:01.500 --> 00:09:04.700
exactly counteract these three.

NOTE Confidence: 82% (HIGH)

00:09:04.700 --> 00:09:15.100
So that is what the mean is and why it is equally affected by every observation that we have. This

00:09:15.100 --> 00:09:23.200
means that unless the observations that we have are measures of the same thing, and in fact equally

00:09:23.200 --> 00:09:29.600
good measures of the same thing, allowing them to go into the calculation of the mean would be

00:09:29.600 --> 00:09:30.849
misleading.

NOTE Confidence: 88% (HIGH)

00:09:30.849 --> 00:09:38.000
And this is the important assumption for calculating a meaningful mean, which also has the useful

00:09:38.000 --> 00:09:45.800
properties of minimizing errors and other ones that we will encounter later in this course.
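NOTE A minimal Python sketch of the balance check described above: solve 5 times μ equals the sum of the values, then confirm that the positive distances cancel the negative ones. The values 161.0 and 163.0 in `heights` are assumed for illustration, chosen so that the data agree with the 162.3 reported in the video; only 163.5 and the two 162s are stated there.

```python
import math

# Hypothetical data set: 161.0 and 163.0 are assumed, not from the recording.
heights = [163.5, 162.0, 161.0, 163.0, 162.0]

# n * mu = x1 + ... + xn, so mu is the sum divided by n.
mu = sum(heights) / len(heights)

# Signed distances d_i = x_i - mu.
distances = [x - mu for x in heights]

# Total pull to the right (positive distances) and to the left (negative ones).
pull_right = sum(d for d in distances if d > 0)
pull_left = sum(d for d in distances if d < 0)

print(f"mu = {mu:.1f}")
print(f"pull to the right = {pull_right:.2f}, pull to the left = {pull_left:.2f}")

# Floating-point arithmetic leaves a tiny rounding residue, so the sum is
# compared with zero using a tolerance instead of exact equality.
print("balanced:", math.isclose(sum(distances), 0.0, abs_tol=1e-9))
```

With these assumed values, two distances are positive and three are negative, matching the "these two" and "these three" pointed to in the video.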