WEBVTT Kind: captions; language: en-us

NOTE Accuracy: 91% (HIGH)

00:00:00.000 --> 00:00:08.600
In this video we will go through some preliminary remarks regarding probability. In particular we

00:00:08.600 --> 00:00:16.400
will consider the questions: what is probability, and why should we care?

NOTE Accuracy: 91% (HIGH)

00:00:16.400 --> 00:00:24.100
Let's start with the second question: why is probability important to understand?

NOTE Accuracy: 88% (HIGH)

00:00:24.300 --> 00:00:33.300
First of all, probability is how we deal with variability. As we've said before, nothing is certain,

00:00:33.300 --> 00:00:41.800
things are variable, we can never be completely sure. But not everything is equally likely: some things

00:00:41.800 --> 00:00:48.599
are likely and expected, and some other uncertain things are less likely and unexpected.

NOTE Accuracy: 89% (HIGH)

00:00:48.599 --> 00:00:56.500
And we need to be able to handle this variability so that we make good decisions based on justified

00:00:56.500 --> 00:01:03.000
expectations. When are our expectations justified? When they reflect the actual probability that

00:01:03.000 --> 00:01:05.099
something will happen.

NOTE Accuracy: 90% (HIGH)

00:01:05.700 --> 00:01:13.050
More specifically, in a research context there is the issue of variability coming from sampling.

00:01:13.050 --> 00:01:20.900
Sampling means that, because we can't measure everyone we're interested in, we have to sample and measure

00:01:20.900 --> 00:01:29.700
some people or some instances or some cases. And we want to be able to draw conclusions from our

00:01:29.700 --> 00:01:35.950
sample that will be valid for the whole population, that is, we want our

NOTE Accuracy: 78% (HIGH)

00:01:35.950 --> 00:01:40.300
conclusions to be valid for all those we didn't happen to sample.

NOTE Accuracy: 85% (HIGH)

00:01:40.300 --> 00:01:49.150
Would we get the same result if we had used another sample?
If we run a study with 20 or 50 people

00:01:49.150 --> 00:01:57.600
and find something, would we have gotten the same finding with a different group of 20 or 50 people? This

00:01:57.600 --> 00:02:03.000
is very important in order to be able to make the right decision based on our study.

NOTE Accuracy: 91% (HIGH)

00:02:03.000 --> 00:02:12.200
So in general we need to understand the variability of sampling, the variability that comes from sampling,

00:02:12.200 --> 00:02:18.500
what it implies for our conclusions, the implications of this sampling variability, which are

00:02:18.500 --> 00:02:21.500
inherently probabilistic.

NOTE Accuracy: 81% (HIGH)

00:02:22.700 --> 00:02:30.400
Now what exactly is probability? There are different philosophical approaches to this question and

00:02:30.400 --> 00:02:36.800
there are formal definitions, and we're not going to go into any of that. The question for us is

00:02:36.800 --> 00:02:44.200
how to think about probability so that we have reasonable intuitions and our thinking makes sense.

NOTE Accuracy: 91% (HIGH)

00:02:44.200 --> 00:02:54.600
So the basic idea is that probability refers to how likely something is to happen, but this is not

00:02:54.600 --> 00:02:56.700
very easy to think about.

NOTE Accuracy: 83% (HIGH)

00:02:56.700 --> 00:03:05.700
So the easier and equivalent way of thinking about it is how frequently something happens. So if

00:03:05.700 --> 00:03:11.800
there were many opportunities for an event to occur, how many of these would actually lead to the

00:03:11.800 --> 00:03:14.300
occurrence of the event?

NOTE Accuracy: 85% (HIGH)

00:03:14.800 --> 00:03:23.100
High probability means that something happens often, so it's a frequent occurrence. This does not mean

00:03:23.100 --> 00:03:32.400
that it's certain, we can never be certain, but we can expect it to happen, so a high probability event

00:03:32.400 --> 00:03:36.700
is something that doesn't surprise us when it happens.
NOTE Accuracy: 91% (HIGH)

00:03:36.700 --> 00:03:44.000
What about low probability? Well, that's the opposite: it's something that doesn't happen very often, so

00:03:44.000 --> 00:03:51.900
it's an infrequent event. That does not mean that it's impossible; it actually will happen, but not

00:03:51.900 --> 00:03:59.600
very frequently. So a low probability event is something unexpected, that tends to surprise us when it

00:03:59.600 --> 00:04:06.550
does happen. It doesn't surprise us that it happens at all, but it surprises us

NOTE Accuracy: 73% (MEDIUM)

00:04:06.550 --> 00:04:16.600
more than a high probability event because it's expected less. And the question then arises: when

00:04:16.600 --> 00:04:24.400
exactly are low probability events expected? They're not zero probability, so they will happen at some

00:04:24.400 --> 00:04:27.900
point. Can we say something more about that?

NOTE Accuracy: 91% (HIGH)

00:04:27.900 --> 00:04:35.300
Let's ask this kind of question in a little more concrete context. It's not a special needs education

00:04:35.300 --> 00:04:42.200
context, but it's something that's very easy to understand and think about, and then we'll go back to

00:04:42.200 --> 00:04:45.900
more relevant examples later in the course.

NOTE Accuracy: 88% (HIGH)

00:04:46.300 --> 00:04:52.000
So the easy question is: why don't I win the lottery?

NOTE Accuracy: 91% (HIGH)

00:04:52.200 --> 00:04:57.350
And there is also another question: if everybody

NOTE Accuracy: 82% (HIGH)

00:04:57.350 --> 00:05:05.200
can ask themselves "why don't I win the lottery", then how come anyone wins the lottery? If it's so

00:05:05.200 --> 00:05:10.900
unlikely for any one of us, why is it likely for someone?
NOTE Accuracy: 89% (HIGH)

00:05:11.300 --> 00:05:16.000
So you're probably familiar with this type of lottery.

NOTE Accuracy: 90% (HIGH)

00:05:16.000 --> 00:05:25.950
In this particular case there are 34 numbers and you are supposed to pick seven numbers

NOTE Accuracy: 85% (HIGH)

00:05:25.950 --> 00:05:34.300
and note these seven numbers, and then there is some sort of draw that is a completely random event:

00:05:34.300 --> 00:05:44.900
there is some device that ensures that seven out of 34 numbers are selected. So in this Norwegian

00:05:44.900 --> 00:05:52.600
case, and also in many other countries, there is a rotating device that uses air or something else to

00:05:52.600 --> 00:05:56.250
just mix up these numbered balls,

NOTE Accuracy: 91% (HIGH)

00:05:56.250 --> 00:06:04.400
and some of them are caught and rolled out, and then you have the winners. So there are seven numbers

00:06:04.400 --> 00:06:11.500
that are drawn at random, and if you happen to have picked exactly these seven numbers, then you can

00:06:11.500 --> 00:06:13.600
win a lot of money.

NOTE Accuracy: 84% (HIGH)

00:06:13.600 --> 00:06:19.200
So that sounds enticing and that's why people actually pay to do this,

NOTE Accuracy: 91% (HIGH)

00:06:19.300 --> 00:06:28.000
but as you probably know you're not very likely to win. And why aren't you likely to win? Well, this

00:06:28.000 --> 00:06:33.900
can be answered by thinking of what it takes to have a winning sequence.

NOTE Accuracy: 91% (HIGH)

00:06:34.700 --> 00:06:44.400
If you are to win this game, this is what needs to happen: when there are still all 34 balls inside

00:06:44.400 --> 00:06:53.700
that device, the one that comes out must be one of the seven you have picked, so the chances of getting

00:06:53.700 --> 00:07:03.200
the first ball correct are seven divided by 34.
And after this the following must happen:

NOTE Accuracy: 81% (HIGH)

00:07:03.200 --> 00:07:06.800
of the 33 balls that are remaining in there

NOTE Accuracy: 81% (HIGH)

00:07:07.000 --> 00:07:15.000
whichever one comes out must be one of the six left that you have guessed, and because you want both of

00:07:15.000 --> 00:07:23.300
these to be true, these have to be multiplied. And then there are 32 balls in there and the one that

00:07:23.300 --> 00:07:31.600
comes out must be one of your five remaining choices, and so on and so forth. And the answer is: for

00:07:31.600 --> 00:07:35.950
all of this to happen so that you win, that you get the

NOTE Accuracy: 82% (HIGH)

00:07:35.950 --> 00:07:45.500
seven balls correct by your single choice, the chances are one in about five and a half million.

NOTE Accuracy: 89% (HIGH)

00:07:47.700 --> 00:07:51.450
To put this number in context,

NOTE Accuracy: 88% (HIGH)

00:07:51.450 --> 00:08:02.500
the chances of being hit by lightning in Norway are estimated to be about 1 in 150 thousand, and if

00:08:02.500 --> 00:08:13.500
you dial completely random numbers on your phone, the chance of calling someone you know can

00:08:13.500 --> 00:08:22.550
be estimated to be about one in 167,000, assuming that you know about 600 people on average,

NOTE Accuracy: 84% (HIGH)

00:08:22.550 --> 00:08:31.800
and this is how many numbers are possible to ring in Norway. And I didn't make up these estimates,

00:08:31.800 --> 00:08:39.799
there are some published estimates for that. So how likely do you think it is to actually reach

00:08:39.799 --> 00:08:42.600
someone you know by randomly dialing?

NOTE Accuracy: 91% (HIGH)

00:08:42.600 --> 00:08:51.400
How likely do you think it is to be hit by lightning? Well, winning the lottery is way less likely

00:08:51.400 --> 00:08:56.200
than that, so that's why you don't win the lottery.
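NOTE The ball-by-ball multiplication described for the 7-out-of-34 lottery can be sketched in Python. This is a minimal illustration and not part of the lecture: the function name `jackpot_probability` and its parameters are assumptions introduced here. It multiplies 7/34 by 6/33 by 5/32 and so on, then checks the product against the count of all possible sets of seven numbers.

```python
from fractions import Fraction
from math import comb


def jackpot_probability(total_balls: int = 34, picks: int = 7) -> Fraction:
    """Chance that one chosen set of `picks` numbers matches the whole draw."""
    probability = Fraction(1)
    for already_matched in range(picks):
        # Of the balls still in the device, this many are still "ours".
        ours_left = picks - already_matched
        balls_left = total_balls - already_matched
        probability *= Fraction(ours_left, balls_left)
    return probability


p = jackpot_probability()
# The product equals one over the number of possible 7-number sets.
assert p == Fraction(1, comb(34, 7))
print(f"1 in {p.denominator:,}")
```

The exact figure is 1 in 5,379,616, which matches the rounded "about five and a half million" quoted in the cues.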
NOTE Accuracy: 91% (HIGH)

00:08:58.100 --> 00:09:03.950
An important note in this context is that

NOTE Accuracy: 91% (HIGH)

00:09:03.950 --> 00:09:11.400
this calculation doesn't take into account which numbers you selected, because it doesn't matter in

00:09:11.400 --> 00:09:20.200
any way: any possible set of seven numbers has exactly the same probability of winning. This means

00:09:20.200 --> 00:09:28.500
that these numbers are exactly equally likely to win as these numbers.

NOTE Accuracy: 91% (HIGH)

00:09:28.500 --> 00:09:37.200
And the reason I'm saying this is because these don't look very likely, these look more likely, these

00:09:37.200 --> 00:09:44.300
look more random, but that makes absolutely no difference: any set of seven numbers has exactly the

00:09:44.300 --> 00:09:45.900
same probability.

NOTE Accuracy: 91% (HIGH)

00:09:45.900 --> 00:09:54.500
So the probability of winning the lottery, a lottery of this sort, is best thought of if you are

00:09:54.500 --> 00:10:02.250
thinking of this sequence. You get a more accurate feeling than by thinking about these.

NOTE Accuracy: 91% (HIGH)

00:10:02.250 --> 00:10:10.600
Moreover, which ball comes out at any given instant does not depend on which one came out before,

00:10:10.600 --> 00:10:19.650
either in the same or in a previous draw. The random choice of numbers is independent and memoryless,

NOTE Accuracy: 91% (HIGH)

00:10:19.650 --> 00:10:27.000
regardless of what happened before or what happened in the immediately previous draw. So even if

00:10:27.000 --> 00:10:36.150
these numbers came up last week, this doesn't affect their likelihood of coming up again this week.

NOTE Accuracy: 89% (HIGH)

00:10:36.150 --> 00:10:42.800
So this should give you a better sense of how likely it is to win the lottery, although you already

00:10:42.800 --> 00:10:51.400
knew that you are not likely to win it at all.
Which brings us to the next question: how come anyone

00:10:51.400 --> 00:10:54.300
wins if it is so unlikely?

NOTE Accuracy: 91% (HIGH)

00:10:54.300 --> 00:11:03.800
To answer this question we have to think of a different approach. So suppose you were not just calling one

00:11:03.800 --> 00:11:12.800
random number, but you dialed a million different random numbers; you had a lot of time, and for many

00:11:12.800 --> 00:11:17.650
years all you did was just dial random numbers.

NOTE Accuracy: 91% (HIGH)

00:11:17.650 --> 00:11:27.250
After 1 million dials, would you expect to have reached someone you know? Probably more than one.

NOTE Accuracy: 75% (MEDIUM)

00:11:27.250 --> 00:11:34.900
If you played a million different 7-number sets, would you expect to win the lottery? You wouldn't be

00:11:34.900 --> 00:11:38.300
certain, but your chances would be much better.

NOTE Accuracy: 89% (HIGH)

00:11:38.300 --> 00:11:45.500
If you played all possible 7-number sets then you'd be sure to win, but then of course you would lose

00:11:45.500 --> 00:11:52.200
money instead of winning, it would be too expensive to do that. But think of everything that's between

00:11:52.200 --> 00:11:59.300
selecting just one set of seven numbers and all possible sets of seven numbers. You move from

00:11:59.300 --> 00:12:03.000
very high uncertainty to near certainty.

NOTE Accuracy: 91% (HIGH)

00:12:03.000 --> 00:12:08.750
So whether or not you expect to win depends on how many times you try.

NOTE Accuracy: 91% (HIGH)

00:12:08.750 --> 00:12:11.750
Okay, now we're getting somewhere.

NOTE Accuracy: 91% (HIGH)

00:12:11.750 --> 00:12:18.900
How many times you try: it doesn't have to be you, it can be different people. So whether you can

00:12:18.900 --> 00:12:27.000
expect a winning set depends on how many people play.
It actually depends on how many 7-number sets are

00:12:27.000 --> 00:12:34.500
chosen, but if everyone chooses just one then the number of players determines how likely it is to

00:12:34.500 --> 00:12:36.400
have a winner.

NOTE Accuracy: 91% (HIGH)

00:12:36.500 --> 00:12:47.700
So large numbers, in this case a large number of trials, lead to a predictability of patterns of random

00:12:47.700 --> 00:12:55.500
low chance events. So here we have low chance events that are completely unpredictable, but if you

00:12:55.500 --> 00:13:02.150
have a large number of trials then you can predict how often you can expect someone to win,

00:13:02.150 --> 00:13:07.000
approximately. You can calculate if you can expect a

NOTE Accuracy: 91% (HIGH)

00:13:07.000 --> 00:13:13.900
winner at every draw, if you know how many sets of seven numbers are played before the draw. You can

00:13:13.900 --> 00:13:21.100
never predict who will win, but you can predict how often someone will win. And for the research

00:13:21.100 --> 00:13:28.100
situation that we are interested in, this kind of probabilistic thinking is very very useful, because

00:13:28.100 --> 00:13:34.800
we're not interested in individual events, we are interested in long-term outcomes of sets of

00:13:34.800 --> 00:13:36.400
events.

NOTE Accuracy: 77% (HIGH)

00:13:37.300 --> 00:13:45.400
So what's the point of all this? Is this about gambling? Well, knowing about probability does help

00:13:45.400 --> 00:13:55.000
with gambling, but that's not all. Outcomes of random events, or any kind of events that are variable: we

00:13:55.000 --> 00:14:02.000
don't control the variability, and we can think of them as being random.
So any variability can be

00:14:02.000 --> 00:14:07.000
ascribed to a random process and we treat our variable data as being

NOTE Accuracy: 91% (HIGH)

00:14:07.000 --> 00:14:14.600
random in some way, and then we can use ideas from probability to predict what can happen on the

00:14:14.600 --> 00:14:17.000
basis of these data.

NOTE Accuracy: 91% (HIGH)

00:14:19.800 --> 00:14:30.349
A bit closer to home: if we have a study based on data from a sample, so we study a special population

00:14:30.349 --> 00:14:36.500
and we want to understand a clinical population for example, or we study the effects of an

00:14:36.500 --> 00:14:44.000
intervention, either way we don't study everyone, we use a sample of people and we measure something

00:14:44.000 --> 00:14:45.450
about them.

NOTE Accuracy: 91% (HIGH)

00:14:45.450 --> 00:14:55.000
So how far from the population value can we expect to be based on our sample? We'd like to draw a

00:14:55.000 --> 00:15:02.800
conclusion that is good for everyone in the population, including everyone we didn't measure. If we

00:15:02.800 --> 00:15:09.849
only measure a sample, how far from that population value can we expect to be?

NOTE Accuracy: 91% (HIGH)

00:15:09.849 --> 00:15:16.600
And how big a sample do we need in order to not be very far?

NOTE Accuracy: 91% (HIGH)

00:15:16.600 --> 00:15:23.400
This kind of question you can first think of in terms of something else you're familiar with, which

00:15:23.400 --> 00:15:32.800
is polling of voting preferences or voting intentions. So polling companies go and ask people what

00:15:32.800 --> 00:15:39.300
they plan to vote, who they plan to vote for, and then they report the results and give a margin of

00:15:39.300 --> 00:15:46.200
error. They say this is the percentage of votes for each party with a margin of plus or minus, for

NOTE Accuracy: 53% (MEDIUM)

00:15:46.200 --> 00:15:50.550
example, two percentage points, or one or five percentage points.
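NOTE The question of how far a sample can land from the population value can be explored with a small simulation. This is only a sketch under stated assumptions, not something shown in the lecture: the population share of 30%, the function name `sample_proportions`, and the fixed random seed are all hypothetical choices made for illustration. It repeats the same "study" many times with different random samples and reports how much the results vary for samples of 20, 50 and 1000 people.

```python
import random
import statistics


def sample_proportions(true_share, sample_size, n_samples=500, seed=1):
    """Run `n_samples` simulated studies and return the share found in each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(n_samples):
        # Each sampled person has the trait with probability `true_share`.
        hits = sum(rng.random() < true_share for _ in range(sample_size))
        results.append(hits / sample_size)
    return results


for size in (20, 50, 1000):
    shares = sample_proportions(0.30, size)
    spread = statistics.pstdev(shares)  # typical distance from the average result
    print(f"sample of {size}: lowest {min(shares):.2f}, "
          f"highest {max(shares):.2f}, spread {spread:.3f}")
```

Under these assumptions the spread between repeated studies shrinks as the sample grows, which is the sense in which a bigger sample keeps us closer to the population value.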
NOTE Accuracy: 88% (HIGH)

00:15:50.550 --> 00:16:00.000
So they give a range within which they expect to be with respect to the actual voting intention for

00:16:00.000 --> 00:16:06.800
the rest of the population. And they can calculate how many people they have to go around and ask in

00:16:06.800 --> 00:16:14.000
order to have a range of their choice. Do they want to be certain to within one percentage point, or

00:16:14.000 --> 00:16:17.200
is within five percentage points enough?

NOTE Accuracy: 89% (HIGH)

00:16:17.200 --> 00:16:25.900
So all of this comes from understanding the role of probability in affecting long-term outcomes of

00:16:25.900 --> 00:16:32.800
many events, and sampling is very very important for us because all research is based on sampling.

00:16:32.800 --> 00:16:39.000
It's very rare that we can actually study the whole population that we're interested in.

NOTE Accuracy: 88% (HIGH)

00:16:40.600 --> 00:16:50.100
And more specifically, it's understanding probability that helps us decide if the finding from our

00:16:50.100 --> 00:16:57.900
study can be trusted and should be interpreted and generalized, or it should rather be ignored, or

00:16:57.900 --> 00:17:06.300
augmented with more findings. So this is why probability is very important to understand, not just in

00:17:06.300 --> 00:17:11.300
the context of a statistics course, but in the context of understanding

NOTE Accuracy: 91% (HIGH)

00:17:11.300 --> 00:17:15.300
research and practice in special needs education.
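NOTE The polling question of how many people to ask for a chosen margin of error can be sketched with the common textbook approximation n = z² · p(1 − p) / e². The lecture does not say which formula polling companies use, so everything here is an assumption: a simple random sample, a 95% confidence level (z = 1.96), the worst-case share p = 50%, and the hypothetical function name `required_sample_size`. Exact fractions are used so the rounding up is not affected by floating-point error.

```python
from fractions import Fraction
from math import ceil


def required_sample_size(margin_points, z=Fraction(196, 100), share=Fraction(1, 2)):
    """People to ask so the margin of error is at most `margin_points` percentage points.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to 95% confidence, share = 1/2 is the worst case.
    """
    margin = Fraction(margin_points, 100)  # percentage points as a proportion
    return ceil(z**2 * share * (1 - share) / margin**2)


for points in (1, 2, 5):
    people = required_sample_size(points)
    print(f"plus or minus {points} percentage points: about {people} people")
```

Under these assumptions the three margins mentioned in the cues need about 9604, 2401 and 385 respondents, so narrowing the range from five points to one point takes roughly 25 times as many people.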