WEBVTT
Kind: captions
Language: en-us

NOTE Accuracy: 88% (HIGH)

00:00:00.199 --> 00:00:07.900
In this video, we will talk about statistical associations. I want to give you a sense of what they

00:00:07.900 --> 00:00:14.300
mean and what they look like before we go on to any actual statistical analyses. So there are

00:00:14.300 --> 00:00:20.000
several words you can use to refer to statistical associations, and they all mean essentially

00:00:20.000 --> 00:00:21.650
the same thing.

NOTE Accuracy: 78% (HIGH)

00:00:21.650 --> 00:00:29.900
The first and simplest word is relationship. Two variables can be related, or there is a

00:00:29.900 --> 00:00:36.500
relationship between variables. The forms that relationships can take depend on the measurement

00:00:36.500 --> 00:00:43.000
scales of the variables, and we'll see some examples of different kinds of scales. When the

00:00:43.000 --> 00:00:50.100
variables are numeric or quantitative, the relationships can be linear, or proportional, which means

00:00:50.100 --> 00:00:51.849
the same thing, or they can be

NOTE Accuracy: 76% (HIGH)

00:00:51.849 --> 00:00:56.000
nonlinear, and we will also see some examples of what that means.

NOTE Accuracy: 80% (HIGH)

00:00:56.000 --> 00:01:02.700
Another word we can use to describe statistical associations is effect, which in

00:01:02.700 --> 00:01:09.700
statistics means the same thing as a relationship. The strength of a relationship is termed

00:01:09.700 --> 00:01:18.200
effect size. Actually, effect sizes have more specific meanings depending on the type of test, and

00:01:18.200 --> 00:01:24.900
they can be precisely calculated as quantities, or they can be characterized as small, medium, or

00:01:24.900 --> 00:01:25.650
large.

NOTE Accuracy: 90% (HIGH)

00:01:25.650 --> 00:01:35.600
Another word that we use to refer to the same idea is prediction.
This means that if two variables

00:01:35.600 --> 00:01:43.000
are related, then you can predict one based on the other. Of course, you cannot predict

00:01:43.000 --> 00:01:48.700
perfectly, because the two variables aren't measuring exactly the same thing, but you can predict to

00:01:48.700 --> 00:01:55.699
some degree, and how far off your prediction is, is your prediction error, which is also

NOTE Accuracy: 85% (HIGH)

00:01:55.699 --> 00:02:03.300
called a residual. So a residual is a measure of how far off your prediction is when you use one

00:02:03.300 --> 00:02:07.200
variable to predict the value of another.

NOTE Accuracy: 85% (HIGH)

00:02:08.000 --> 00:02:17.600
One final term that is used is shared variance, or common variance. The idea here is that two

00:02:17.600 --> 00:02:24.700
variables that are related have some common variance; that's what the relationship consists of. The

00:02:24.700 --> 00:02:30.800
changes in one variable are associated with changes in the other variable. That's why they're

00:02:30.800 --> 00:02:38.600
related. And the degree to which these changes are consistent and proportional is reflected

NOTE Accuracy: 91% (HIGH)

00:02:38.600 --> 00:02:44.000
in the proportion of variance that is shared between these two variables.

NOTE Accuracy: 88% (HIGH)

00:02:45.000 --> 00:02:53.000
So a simple way to think about statistical associations is to imagine a guessing game. The

00:02:53.000 --> 00:03:01.200
guessing game is one where you have values of one variable, and you see if you can use them to

00:03:01.200 --> 00:03:09.000
improve your guesses of the values of another variable. Let's start with an example using two

00:03:09.000 --> 00:03:12.450
qualitative, or categorical, variables.

NOTE Accuracy: 85% (HIGH)

00:03:12.450 --> 00:03:18.850
Let's say we are outside a high school (videregående skole).

NOTE Accuracy: 85% (HIGH)

00:03:18.850 --> 00:03:27.600
And we just pick a random student from a vocational program who is walking past.

NOTE Accuracy: 82% (HIGH)

00:03:27.600 --> 00:03:33.850
Try to guess this random student's sex. Is it a man or a woman?

NOTE Accuracy: 91% (HIGH)

00:03:33.850 --> 00:03:42.500
What should you answer? If you knew the proportion of women attending vocational programs in high

00:03:42.500 --> 00:03:45.800
school, then that should determine your answer.

NOTE Accuracy: 91% (HIGH)

00:03:45.800 --> 00:03:51.400
And it would also indicate how often you would be guessing correctly.

NOTE Accuracy: 83% (HIGH)

00:03:51.400 --> 00:03:59.000
Now think of the variable specialization, that is, which actual study program they have chosen. Would

00:03:59.000 --> 00:04:01.750
that help you guess any better?

NOTE Accuracy: 91% (HIGH)

00:04:01.750 --> 00:04:10.000
If it does, then sex and specialization are related variables. Let's look at some

00:04:10.000 --> 00:04:20.899
actual data. It turns out that in 2019, the proportion of women attending vocational programs

00:04:20.899 --> 00:04:25.500
in high school in Norway was 42 percent.

NOTE Accuracy: 91% (HIGH)

00:04:27.400 --> 00:04:36.600
So if I picked a random student and asked you to guess their sex, you should guess man, because you would be

00:04:36.600 --> 00:04:44.500
right more often than you would be wrong. However, look at the variation among the different

00:04:44.500 --> 00:04:46.800
specializations.

NOTE Accuracy: 78% (HIGH)

00:04:47.200 --> 00:04:55.150
Construction and electrician programs have very low percentages of women,

NOTE Accuracy: 91% (HIGH)

00:04:55.150 --> 00:05:04.850
whereas design and health-related occupations have very high percentages of women.
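
The guessing game can be scored directly: count how often you would be right when always guessing the overall majority sex, and how often when guessing the majority sex within each specialization. Below is a minimal Python sketch with hypothetical counts (invented for illustration; they are not the Norwegian 2019 figures cited in the video). The final step expresses the gain as a proportional reduction in wrong guesses, which is Goodman and Kruskal's lambda; the video does not name this statistic, so treat it as one possible summary.

```python
# Hypothetical counts of students by specialization and sex
# (invented for illustration; not the Norwegian 2019 figures).
counts = {
    "construction": {"man": 90, "woman": 10},
    "electrician": {"man": 95, "woman": 5},
    "design": {"man": 20, "woman": 80},
    "health": {"man": 15, "woman": 85},
}

def accuracy_without_info(table):
    """Always guess the sex that is most common overall."""
    totals = {}
    for row in table.values():
        for sex, n in row.items():
            totals[sex] = totals.get(sex, 0) + n
    return max(totals.values()) / sum(totals.values())

def accuracy_with_info(table):
    """Guess the most common sex within the student's specialization."""
    correct = sum(max(row.values()) for row in table.values())
    total = sum(sum(row.values()) for row in table.values())
    return correct / total

p_without = accuracy_without_info(counts)
p_with = accuracy_with_info(counts)

# Proportional reduction in wrong guesses (Goodman and Kruskal's lambda).
errors_without = 1 - p_without
errors_with = 1 - p_with
reduction = (errors_without - errors_with) / errors_without

print(p_without, p_with, round(reduction, 3))
```

With these counts, always guessing "man" is right 55% of the time, using the specialization raises that to 87.5%, and the number of wrong guesses drops by about 72%. If the majority sex were the same in every specialization, both accuracies would be equal and lambda would be zero even if the percentages still differed; that is a known limitation of this particular statistic.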

NOTE Accuracy: 91% (HIGH)

00:05:04.850 --> 00:05:15.250
So this means that if you knew the study direction of the random person I picked, you could guess

00:05:15.250 --> 00:05:23.800
their sex with a higher probability of being correct. It means that the variable specialization,

00:05:23.800 --> 00:05:28.050
that is, which of these study directions the person is attending,

NOTE Accuracy: 91% (HIGH)

00:05:28.050 --> 00:05:37.400
helps you predict the sex variable more accurately. So the two variables are

00:05:37.400 --> 00:05:42.500
related. What does this look like in graphical form?

NOTE Accuracy: 89% (HIGH)

00:05:42.500 --> 00:05:49.150
This is a mosaic plot that plots sex

NOTE Accuracy: 91% (HIGH)

00:05:49.150 --> 00:06:00.200
against specialization. This column here indicates the men,

NOTE Accuracy: 91% (HIGH)

00:06:01.200 --> 00:06:06.800
and this column here indicates the women.

NOTE Accuracy: 87% (HIGH)

00:06:07.200 --> 00:06:15.900
You can see that the column for men is wider, because there are more men in the vocational

00:06:15.900 --> 00:06:20.400
programs than there are women.

NOTE Accuracy: 91% (HIGH)

00:06:20.400 --> 00:06:29.700
The important property here is that the area of each of these rectangles is proportional to the

00:06:29.700 --> 00:06:37.700
number of people of the corresponding sex attending that study direction. So we can see that the

00:06:37.700 --> 00:06:45.900
area of the rectangle for men in construction is much larger than the one for women.

NOTE Accuracy: 91% (HIGH)

00:06:46.000 --> 00:06:50.100
The rectangle for men in the electrician program

NOTE Accuracy: 91% (HIGH)

00:06:50.100 --> 00:07:00.000
is also much larger than the one for women, but the rectangle for health-related occupations is much larger

00:07:00.000 --> 00:07:02.600
for women than for men.
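
The property that each rectangle's area is proportional to its count follows from how a mosaic plot is built. Below is a minimal Python sketch, assuming hypothetical counts for just two specializations and a plot drawn in a unit square: each sex gets a column whose width is its share of all students, and each tile's height is that specialization's share within the sex, so width times height equals the cell's share of all students. Plotting libraries typically also add gaps between tiles, which this sketch ignores.

```python
# Hypothetical counts for a two-specialization mosaic plot.
counts = {
    ("man", "construction"): 90,
    ("woman", "construction"): 10,
    ("man", "health"): 15,
    ("woman", "health"): 85,
}

def mosaic_tiles(cell_counts):
    """Return {(sex, specialization): (width, height, area)} for a unit square."""
    total = sum(cell_counts.values())
    column_totals = {}
    for (sex, _), n in cell_counts.items():
        column_totals[sex] = column_totals.get(sex, 0) + n
    tiles = {}
    for (sex, specialization), n in cell_counts.items():
        width = column_totals[sex] / total  # wider column = more people of that sex
        height = n / column_totals[sex]     # share of that sex in this specialization
        tiles[(sex, specialization)] = (width, height, width * height)
    return tiles

for cell, (width, height, area) in mosaic_tiles(counts).items():
    print(cell, round(width, 3), round(height, 3), round(area, 3))
```

With these counts the men's construction tile covers 0.45 of the square and the women's 0.05, the same 9-to-1 ratio as the counts themselves.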

NOTE Accuracy: 70% (MEDIUM)

00:07:02.600 --> 00:07:14.400
The fact that these sets of rectangles look so different in size indicates that the

00:07:14.400 --> 00:07:23.900
proportion of women varies widely between study directions, which is a visual way to say

00:07:23.900 --> 00:07:33.200
that the variables specialization and sex are related, in the sense that knowing one helps you

NOTE Accuracy: 91% (HIGH)

00:07:33.200 --> 00:07:35.850
guess the other one more accurately.

NOTE Accuracy: 90% (HIGH)

00:07:35.850 --> 00:07:43.400
And this works both ways. Suppose I pick a random student outside the high school

NOTE Accuracy: 91% (HIGH)

00:07:43.400 --> 00:07:51.700
and ask you to guess their specialization: what are they studying? Of course, you can try to guess

00:07:51.700 --> 00:08:00.000
based on the overall proportions, but would your guess be different, and more accurate, if you

00:08:00.000 --> 00:08:07.200
knew their sex? If I told you this random person is a man, you should probably not guess that

00:08:07.200 --> 00:08:13.700
they're studying design or health, although, of course, there are plenty of men who do take these

NOTE Accuracy: 91% (HIGH)

00:08:13.700 --> 00:08:22.350
specializations. But there are many more men who have chosen other specializations. If I told you

00:08:22.350 --> 00:08:29.800
that the random person is a woman, you should probably guess that they are in the health-related

00:08:29.800 --> 00:08:36.700
occupations specialization, because that is the study direction with the most women.

NOTE Accuracy: 91% (HIGH)

00:08:36.700 --> 00:08:45.200
So the fact that knowing one variable would make you change your guess about the other variable

00:08:45.200 --> 00:08:54.500
means that the two variables are related, and this is what a visual display of two related qualitative

00:08:54.500 --> 00:08:57.400
or categorical variables looks like.
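
Running the game in the other direction means guessing the most common specialization, either overall or among students of the sex you have been told. Below is a minimal Python sketch with invented counts, chosen so that the overall favorite and the favorite among women differ; `best_guess` is a hypothetical helper name.

```python
# Hypothetical counts of students by specialization and sex, chosen so that
# the most common specialization overall differs from the one among women.
counts = {
    "construction": {"man": 90, "woman": 10},
    "electrician": {"man": 120, "woman": 5},
    "design": {"man": 20, "woman": 80},
    "health": {"man": 15, "woman": 85},
}

def best_guess(table, sex=None):
    """Most common specialization overall, or among students of the given sex."""
    def size(specialization):
        row = table[specialization]
        return row[sex] if sex is not None else sum(row.values())
    return max(table, key=size)

print(best_guess(counts))           # knowing nothing about the student
print(best_guess(counts, "man"))    # told the student is a man
print(best_guess(counts, "woman"))  # told the student is a woman
```

With these counts the best guess knowing nothing is "electrician", it stays the same if you learn the student is a man, and it switches to "health" if you learn the student is a woman. Knowing one variable changes the best guess about the other, which is what it means for them to be related.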

NOTE Accuracy: 83% (HIGH)

00:08:57.400 --> 00:09:03.100
Let's see what the guessing game looks like for different kinds of variables.

NOTE Accuracy: 91% (HIGH)

00:09:03.100 --> 00:09:08.100
So now let's turn to numerical, or quantitative, variables.

NOTE Accuracy: 91% (HIGH)

00:09:08.100 --> 00:09:16.300
Let's go outside an elementary school now and pick a random kid, one pupil.

NOTE Accuracy: 84% (HIGH)

00:09:16.300 --> 00:09:20.600
Guess their vocabulary score.

NOTE Accuracy: 91% (HIGH)

00:09:20.600 --> 00:09:29.500
That is, guess how many correct answers they will get on a test of receptive vocabulary, where we show

00:09:29.500 --> 00:09:36.000
them pictures and say a word, and they have to choose the picture that corresponds to the

00:09:36.000 --> 00:09:46.599
word. So how would you do that? Would knowing the age of the child help you guess more accurately?

NOTE Accuracy: 91% (HIGH)

00:09:46.599 --> 00:09:52.200
Here is what the relationship between these two variables looks like.

NOTE Accuracy: 91% (HIGH)

00:09:52.200 --> 00:09:58.000
On the horizontal axis is the age of the children in months,

NOTE Accuracy: 89% (HIGH)

00:09:58.000 --> 00:10:05.300
and on the vertical axis is the number of correct answers on this vocabulary test, that is, their

00:10:05.300 --> 00:10:07.700
vocabulary score.

NOTE Accuracy: 91% (HIGH)

00:10:07.900 --> 00:10:18.000
You can see that this graph indicates a relationship, in the sense that there aren't many low scores

00:10:18.000 --> 00:10:21.000
for older children,

NOTE Accuracy: 91% (HIGH)

00:10:21.200 --> 00:10:26.200
but there are many low scores for younger children.

NOTE Accuracy: 91% (HIGH)

00:10:26.200 --> 00:10:32.400
And there aren't very many very high scores for younger children,

NOTE Accuracy: 91% (HIGH)

00:10:32.400 --> 00:10:37.850
whereas there are many high scores for older children.

NOTE Accuracy: 83% (HIGH)

00:10:37.850 --> 00:10:46.300
So there seems to be a clear relationship between age and vocabulary. If you have to guess the

00:10:46.300 --> 00:10:53.300
vocabulary score of a random child about whom you know nothing, then your best guess would be

00:10:53.300 --> 00:11:00.600
the mean vocabulary score, which is about 122. That should be your answer, because it would

00:11:00.600 --> 00:11:07.900
minimize your error. It would minimize your long-run average squared distance from the correct answer, that is, how far

00:11:07.900 --> 00:11:08.250
off you

NOTE Accuracy: 88% (HIGH)

00:11:08.250 --> 00:11:17.100
get. So this would be your best bet. However, if you knew the age, then your prediction should be

00:11:17.100 --> 00:11:24.500
different. If you knew that the child was 8 years old,

NOTE Accuracy: 91% (HIGH)

00:11:25.500 --> 00:11:30.050
or that the child was 12 years old,

NOTE Accuracy: 89% (HIGH)

00:11:30.050 --> 00:11:37.500
then your guess of their vocabulary score should be very different, because of this relationship.

00:11:37.500 --> 00:11:47.250
As you can see, for eight-year-olds most scores are between 70 and 140, and the mean for

00:11:47.250 --> 00:11:51.650
eight-year-olds is about 105.

NOTE Accuracy: 89% (HIGH)

00:11:51.650 --> 00:11:59.900
In contrast, scores for 12-year-olds are between 120 and 150, and

00:11:59.900 --> 00:12:04.200
the mean is just above 140.

NOTE Accuracy: 83% (HIGH)

00:12:04.300 --> 00:12:14.400
So if the random kid was 12 years old, you should guess 141. If the random kid was 8 years old, you

00:12:14.400 --> 00:12:16.600
should guess 105.

NOTE Accuracy: 91% (HIGH)

00:12:16.600 --> 00:12:24.000
And that precisely reflects the relationship between age and vocabulary.

NOTE Accuracy: 91% (HIGH)

00:12:24.900 --> 00:12:29.500
Again, this works backwards as well.
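
The claim that the mean minimizes your error holds for squared error (for absolute error, the median is the best single guess instead). Below is a minimal Python sketch with made-up scores, chosen only so that the group means echo the 105 and 141 quoted in the video, that scores both strategies: one overall guess for every child versus the mean of the child's age group. Each score minus its guess is a residual, and the share of squared error removed by knowing age is one way to express the variance that age and vocabulary share in this toy example.

```python
from statistics import mean

# Made-up vocabulary scores for two age groups (not the lecture's data).
scores_by_age = {
    8: [95, 100, 105, 110, 115],
    12: [135, 138, 141, 144, 147],
}

def mean_squared_error(guess, values):
    """Average squared distance between one fixed guess and each value."""
    return mean((value - guess) ** 2 for value in values)

all_scores = [s for group in scores_by_age.values() for s in group]

# Knowing nothing: guess the overall mean for every child.
overall_guess = mean(all_scores)
error_without_age = mean_squared_error(overall_guess, all_scores)

# Knowing the age: guess the mean of that child's age group.
conditional_guess = {age: mean(group) for age, group in scores_by_age.items()}
error_with_age = mean(
    (score - conditional_guess[age]) ** 2  # squared residual
    for age, group in scores_by_age.items()
    for score in group
)

# Share of the squared error removed by knowing age, i.e. the variance
# that age and score have in common in this toy example.
shared_variance = 1 - error_with_age / error_without_age

print(overall_guess, conditional_guess)
print(error_without_age, error_with_age, round(shared_variance, 3))
```

With these ten scores the average squared error falls from 358 to 34 when age is used, so about 90% of the variation in this toy sample is shared with age; and when age is unknown, no single guess does better than the overall mean (123 here).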

NOTE Accuracy: 90% (HIGH)

00:12:29.500 --> 00:12:38.500
If I said, "Here is a random kid; can you guess their age?", you should guess the average age of kids

00:12:38.500 --> 00:12:40.850
attending elementary school.

NOTE Accuracy: 82% (HIGH)

00:12:40.850 --> 00:12:47.150
But if I told you that the kid's vocabulary score was 140,

NOTE Accuracy: 90% (HIGH)

00:12:47.150 --> 00:12:56.250
you should not guess the average age. You should guess the age at which 140 is an average score. So

00:12:56.250 --> 00:13:03.900
you should probably guess around 12 years old, and not around 8 or 10 years old.

NOTE Accuracy: 91% (HIGH)

00:13:04.500 --> 00:13:13.000
One final guessing game we can play is with one quantitative and one qualitative variable, so one

00:13:13.000 --> 00:13:20.700
categorical and one numeric variable. So now let's stand on a university campus and pick a random

00:13:20.700 --> 00:13:24.200
person. Guess how old they are.

NOTE Accuracy: 91% (HIGH)

00:13:24.200 --> 00:13:33.200
If I tell you whether this person is a student or an employee, would you still guess the same age?

00:13:33.200 --> 00:13:41.500
You probably shouldn't, because these two variables are related. Let's look at just one way this can

00:13:41.500 --> 00:13:46.700
be visualized; later we will see more ways for this combination.

NOTE Accuracy: 84% (HIGH)

00:13:46.700 --> 00:13:55.000
These are actual data for all Norwegian universities: employees versus students by age range.

NOTE Accuracy: 91% (HIGH)

00:13:55.100 --> 00:14:03.000
We can see that most students are younger than 30,

NOTE Accuracy: 91% (HIGH)

00:14:03.000 --> 00:14:10.700
so for a student you should probably guess an age in their 20s, whereas most employees are older than 30 years

00:14:10.700 --> 00:14:15.000
old, so for an employee you should probably guess an age in their 40s.

NOTE Accuracy: 82% (HIGH)

00:14:15.000 --> 00:14:23.200
That's because the two variables, age and status, are related.
And again, it works the other way as

00:14:23.200 --> 00:14:33.300
well. If you had to guess whether a random person on campus is an employee or a student, would

00:14:33.300 --> 00:14:36.100
it help you to know their age?

NOTE Accuracy: 77% (HIGH)

00:14:37.300 --> 00:14:41.550
It would, because the variables are related.

NOTE Accuracy: 91% (HIGH)

00:14:41.550 --> 00:14:54.550
So for any age below 55, you should guess it's a student, because students outnumber employees

00:14:54.550 --> 00:15:04.300
in all of these age ranges, except for ages between 55 and 69, where employees clearly

00:15:04.300 --> 00:15:06.349
outnumber students.

NOTE Accuracy: 91% (HIGH)

00:15:06.349 --> 00:15:13.400
So if I told you this person was 60 years old, you should guess employee. If I said the person

00:15:13.400 --> 00:15:17.750
is 35 years old, then you should guess student.

NOTE Accuracy: 91% (HIGH)

00:15:17.750 --> 00:15:24.700
And that's because the two variables are related. When two variables are related, each

00:15:24.700 --> 00:15:34.800
of them can help predict the other one better. Let's look at some properties of statistical

00:15:34.800 --> 00:15:42.400
associations involving quantitative variables, namely linearity and strength: what they mean and

00:15:42.400 --> 00:15:44.500
what they look like.

NOTE Accuracy: 90% (HIGH)

00:15:46.300 --> 00:15:54.400
This is a measure of word reading efficiency: the number of words read correctly within 45

00:15:54.400 --> 00:16:04.100
seconds, as a function of age, from 88 to almost 150 months. So these are data for

00:16:04.100 --> 00:16:05.500
children.

NOTE Accuracy: 91% (HIGH)

00:16:05.500 --> 00:16:12.100
Again, we see that we have a lot of low rates

NOTE Accuracy: 91% (HIGH)

00:16:12.100 --> 00:16:14.650
for the younger kids.

NOTE Accuracy: 91% (HIGH)

00:16:14.650 --> 00:16:23.800
There are not many high rates for young kids, there are higher rates for the older kids, and not very many low

00:16:23.800 --> 00:16:31.400
rates for the older kids, because, of course, kids read faster as they grow up and become more

00:16:31.400 --> 00:16:33.950
experienced and skilled readers.

NOTE Accuracy: 91% (HIGH)

00:16:33.950 --> 00:16:42.900
So the relationship between age and word reading efficiency looks like a band of values that is

00:16:42.900 --> 00:16:51.200
sloping upwards, and this is very clear in this graph from the lack, the relative lack, of data points in

00:16:51.200 --> 00:16:54.800
this corner and in this corner.

NOTE Accuracy: 91% (HIGH)

00:16:55.600 --> 00:17:03.800
The actual shape of this relationship appears to be close to a straight line. So I can draw a

00:17:03.800 --> 00:17:11.599
straight line through these points, and it would seem to fit the points at all age

00:17:11.599 --> 00:17:21.349
levels with similar success. That means it leaves about as many data points above it as it does

00:17:21.349 --> 00:17:22.900
below.

NOTE Accuracy: 91% (HIGH)

00:17:23.900 --> 00:17:32.949
So here we have a relationship that appears to be linear. Linear means proportional: it means that

00:17:32.949 --> 00:17:41.400
equal changes in age are associated with equal changes in expected performance in word reading

00:17:41.400 --> 00:17:43.000
efficiency.

NOTE Accuracy: 91% (HIGH)

00:17:43.000 --> 00:17:51.700
The fact that the points in this band stay relatively close to the straight line suggests that the relationship is

00:17:51.700 --> 00:18:00.250
strong. This means that if you know the age of a child, you will not make a huge mistake predicting

00:18:00.250 --> 00:18:08.300
their actual performance. Of course, there is still a wide range.
A child at 100 months of age

00:18:08.300 --> 00:18:12.600
can read as few as 20 words or as many as

NOTE Accuracy: 72% (MEDIUM)

00:18:12.600 --> 00:18:22.500
80. That is a big range, but most of them are between 30 and 55. So if you guess the mean

00:18:22.500 --> 00:18:29.500
performance, around 50, you're not going to be far off for most children. But you shouldn't guess

00:18:29.500 --> 00:18:33.900
55 for a child who is 12 years old, because then

NOTE Accuracy: 91% (HIGH)

00:18:33.900 --> 00:18:42.000
you would be very unlikely to get it right. You should guess something more like 70, and

00:18:42.000 --> 00:18:46.500
then you'd be near most of the actually observed scores.

NOTE Accuracy: 91% (HIGH)

00:18:47.300 --> 00:18:56.000
So this is what a strong linear relationship between two numerical variables looks like.

NOTE Accuracy: 91% (HIGH)

00:18:57.700 --> 00:19:04.100
What about a measure of reading comprehension for these same children?

NOTE Accuracy: 90% (HIGH)

00:19:04.100 --> 00:19:14.000
Well, again, we can see that there are few points here, which means that there are few older kids

00:19:14.000 --> 00:19:21.950
with very low scores. Here, low scores mean that few questions about the meaning of the passages were

00:19:21.950 --> 00:19:28.400
answered correctly. This is a reading comprehension test, so low scores mean that you answered

00:19:28.400 --> 00:19:34.600
the comprehension questions incorrectly. And there are also relatively few very

NOTE Accuracy: 82% (HIGH)

00:19:34.600 --> 00:19:37.800
high scores for the youngest children.

NOTE Accuracy: 91% (HIGH)

00:19:37.800 --> 00:19:44.500
But we see that the shape of this relationship is very different from the previous one. It's

00:19:44.500 --> 00:19:51.800
different in two ways. The first way in which it's different is that the band, or the range of

00:19:51.800 --> 00:20:00.600
performance, is much wider.
So if you know the age of the child, you can still make a better

00:20:00.600 --> 00:20:07.500
guess than without it, but the improvement from knowing the age is smaller.

NOTE Accuracy: 91% (HIGH)

00:20:07.500 --> 00:20:09.950
So you can guess

NOTE Accuracy: 72% (MEDIUM)

00:20:09.950 --> 00:20:16.150
11 correct answers for a child at 90 months,

NOTE Accuracy: 91% (HIGH)

00:20:16.150 --> 00:20:22.500
or you can guess it for a child at 120 months, and although it's not optimal in either

00:20:22.500 --> 00:20:28.699
case, you would not be making a huge mistake for many children.

NOTE Accuracy: 91% (HIGH)

00:20:28.699 --> 00:20:37.800
And the difference between the score you'd guess for a young child and for a much older child would be

00:20:37.800 --> 00:20:48.100
much smaller. So this is a weaker relationship, indicated by the scattered values in this graph, in

00:20:48.100 --> 00:20:52.500
comparison to the tightly clustered values of the previous graph.

NOTE Accuracy: 91% (HIGH)

00:20:52.500 --> 00:21:01.600
It's also different in one more way: we shouldn't fit a straight line here. If we try to fit

00:21:01.600 --> 00:21:09.000
a line that goes through the average performance at each age, we will see that the best line would

00:21:09.000 --> 00:21:16.700
not be completely straight. This line isn't going straight up like the previous one. Instead, it's

00:21:16.700 --> 00:21:22.050
decelerating: it's increasing less and less steeply at

NOTE Accuracy: 79% (HIGH)

00:21:22.050 --> 00:21:30.100
higher ages. That's in large part because there are only 18 questions on this test, so older kids

00:21:30.100 --> 00:21:36.300
tend to answer all or almost all of them correctly, and there's not much room to grow. But here the question

00:21:36.300 --> 00:21:43.300
is about the relationship, not about its cause, and it looks like this relationship is not exactly

00:21:43.300 --> 00:21:50.400
linear. It's not proportional.
The change in expected score

NOTE Accuracy: 91% (HIGH)

00:21:50.400 --> 00:21:59.400
for a child going from 90 to 100 months old is larger than the expected change for a child

00:21:59.400 --> 00:22:05.600
going from 130 to 140 months old. So the expected change here is smaller

NOTE Accuracy: 91% (HIGH)

00:22:05.600 --> 00:22:08.600
than the expected change here,

NOTE Accuracy: 79% (HIGH)

00:22:08.600 --> 00:22:16.300
for the same amount of time. So this is not a straight line, and this is called a curvilinear

00:22:16.300 --> 00:22:18.050
relationship.

NOTE Accuracy: 75% (MEDIUM)

00:22:18.050 --> 00:22:26.949
So it's a weak curvilinear relationship, and we can look at the two graphs side by side. This is

00:22:26.949 --> 00:22:33.600
reading comprehension by age, and this is word reading efficiency by age, and you can see that they're

00:22:33.600 --> 00:22:38.900
different in shape and in how spread out the points are.

NOTE Accuracy: 91% (HIGH)

00:22:38.900 --> 00:22:47.500
We can plot this relationship in a different way. Let's say that we didn't

00:22:47.500 --> 00:22:55.000
have the actual age of the children, but just recorded their grade: what school grade they were

00:22:55.000 --> 00:22:57.600
in when they were measured.

NOTE Accuracy: 89% (HIGH)

00:22:57.600 --> 00:23:05.700
This is not a continuous variable. There is no grade two and a half; grades go in whole steps, from 1 to 2 and so on. So

00:23:05.700 --> 00:23:07.350
these are categories:

NOTE Accuracy: 84% (HIGH)

00:23:07.350 --> 00:23:17.500
ordered categories at equal distances. So this is an interval-scale variable, but there aren't any

00:23:17.500 --> 00:23:24.900
continuous values, so it's much more informative to plot them using box plots.

NOTE Accuracy: 88% (HIGH)

00:23:24.900 --> 00:23:36.300
And we can see that the boxes for comprehension are much taller.
This indicates a larger

00:23:36.300 --> 00:23:42.150
spread of values, relative to the size of the changes observed,

NOTE Accuracy: 89% (HIGH)

00:23:42.150 --> 00:23:51.700
than in the fluency boxes. We can also see that the rate of increase of the medians, which

00:23:51.700 --> 00:23:57.900
are displayed here, seems to be decelerating more strongly for comprehension than for word reading

00:23:57.900 --> 00:24:01.850
efficiency. Actually, using this graph,

NOTE Accuracy: 91% (HIGH)

00:24:01.850 --> 00:24:09.000
we can see that, in fact, neither of these two relationships is exactly linear, because there is a

00:24:09.000 --> 00:24:16.800
slight deceleration for word reading efficiency as well. As children grow older, their rate of

00:24:16.800 --> 00:24:24.600
increase in fluency becomes a bit smaller. So they improve at increasingly slower rates to some

00:24:24.600 --> 00:24:31.450
extent, but this effect is larger for comprehension, which seems to deviate more

NOTE Accuracy: 89% (HIGH)

00:24:31.450 --> 00:24:39.700
from the straight line, because the shape of the boxes also changes. These boxes

NOTE Accuracy: 89% (HIGH)

00:24:39.800 --> 00:24:50.400
are much taller than these boxes, which is not the case in the word reading efficiency graph.

NOTE Accuracy: 89% (HIGH)

00:24:50.400 --> 00:24:59.300
So this is a more clearly nonlinear relationship, and this one is also slightly nonlinear when you look

00:24:59.300 --> 00:25:01.800
at it in this graph.

NOTE Accuracy: 91% (HIGH)

00:25:04.800 --> 00:25:12.850
Let us see what differences between specific categories look like and how you can evaluate

00:25:12.850 --> 00:25:20.200
the relationships between variables using a box plot. If you don't remember exactly what the box

00:25:20.200 --> 00:25:26.500
plot shows, please go back and review the information about box plots.
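
Whether a trend like the one in these box plots is decelerating can be checked by looking at the steps between consecutive medians: roughly equal steps fit a linear, proportional trend, while shrinking steps fit a curvilinear one. Below is a minimal Python sketch with hypothetical comprehension scores for four grades, assumed to come from a test capped at 18 points like the one described in the video.

```python
from statistics import median

# Hypothetical comprehension scores per grade (made up for illustration).
scores_by_grade = {
    1: [2, 4, 5, 7, 9],
    2: [6, 8, 10, 12, 13],
    3: [9, 11, 13, 15, 16],
    4: [11, 13, 15, 16, 17],
}

grades = sorted(scores_by_grade)
medians = [median(scores_by_grade[g]) for g in grades]

# Increase in the median from each grade to the next.
steps = [later - earlier for earlier, later in zip(medians, medians[1:])]

# Every step smaller than the one before: the trend is decelerating.
decelerating = all(b < a for a, b in zip(steps, steps[1:]))

print(medians, steps, decelerating)
```

Here the median rises by 5, then 3, then 2 points, so each extra grade adds less than the one before. With real data the steps are noisy, so this is a descriptive check, not a statistical test.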

NOTE Accuracy: 91% (HIGH)

00:25:28.300 --> 00:25:36.900
Here is the graph of word reading efficiency by grade, and we can see the boxes enclosing the middle half

00:25:36.900 --> 00:25:43.100
of the data, marked by the first, second, and third quartiles.

NOTE Accuracy: 90% (HIGH)

00:25:44.300 --> 00:25:54.200
Let us concentrate on the difference between second grade and third grade. Is there a relationship

00:25:54.200 --> 00:26:03.200
between grade and word reading efficiency for this subset of the data? So, just for grades two and

00:26:03.200 --> 00:26:11.000
three, is there a relationship between grade and word reading efficiency?

NOTE Accuracy: 91% (HIGH)

00:26:11.000 --> 00:26:16.900
Does it look like we can predict one from the other?

NOTE Accuracy: 90% (HIGH)

00:26:16.900 --> 00:26:27.300
The box plots help us evaluate that if we focus on the indicated lines. These two boxes are

00:26:27.300 --> 00:26:38.200
actually quite far apart, as you can see by comparing the median for grade 3 with the third quartile

00:26:38.200 --> 00:26:40.150
for grade 2.

NOTE Accuracy: 88% (HIGH)

00:26:40.150 --> 00:26:46.600
You see that this line is actually a bit higher. This means that

NOTE Accuracy: 86% (HIGH)

00:26:46.600 --> 00:26:50.900
half of third graders

NOTE Accuracy: 84% (HIGH)

00:26:51.000 --> 00:27:02.200
have higher word reading efficiency than three-quarters of second graders. So here is 75% of the second-

00:27:02.200 --> 00:27:05.199
grade data, below this red line,

NOTE Accuracy: 90% (HIGH)

00:27:05.199 --> 00:27:14.700
and here is half of the third-grade data, above this red line. So this is a substantial difference in

00:27:14.700 --> 00:27:21.800
proportions, which is consistent with a relationship between these two variables. It means that you

00:27:21.800 --> 00:27:30.300
can use one to predict the other. And this looks like a clear difference between grades 2 and 3.
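
The comparison just described, grade 3's median against grade 2's third quartile, takes a few lines with Python's standard `statistics.quantiles` function. The scores below are hypothetical. Note that software packages differ slightly in how they compute quartiles; `method="inclusive"` is one of the two rules the standard library offers.

```python
from statistics import quantiles

def quartiles(values):
    """Return (Q1, median, Q3) using the standard library's 'inclusive' rule."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

# Hypothetical words-read scores for two grades (not the lecture's data).
grade2 = [20, 25, 28, 30, 32, 35, 38, 40, 45]
grade3 = [30, 36, 40, 44, 47, 50, 53, 56, 60]

q1_2, median_2, q3_2 = quartiles(grade2)
q1_3, median_3, q3_3 = quartiles(grade3)

# The check described in the video: is grade 3's median above grade 2's Q3?
clear_difference = median_3 > q3_2

# Share of third graders in this sample who score above grade 2's Q3.
share_above = sum(score > q3_2 for score in grade3) / len(grade3)

print(q3_2, median_3, clear_difference, round(share_above, 2))
```

In this made-up sample grade 3's median (47) is above grade 2's third quartile (38), and 7 of the 9 third graders score above that line.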

NOTE Accuracy: 91% (HIGH)

00:27:37.600 --> 00:27:49.000
What about grades 5 and 6? Is there a clear difference? This time we see that the grade 6 median

00:27:49.000 --> 00:27:53.550
is only above the grade 5 median, not its third quartile. So we can say that

NOTE Accuracy: 66% (MEDIUM)

00:27:53.550 --> 00:28:04.000
50 percent of sixth graders read with higher efficiency than 50% of fifth graders. So this

00:28:04.000 --> 00:28:11.300
doesn't look like as clear a difference as the one between second and third grade.

NOTE Accuracy: 91% (HIGH)

00:28:11.300 --> 00:28:16.900
However, there are more points we can visually compare.

NOTE Accuracy: 88% (HIGH)

00:28:16.900 --> 00:28:27.300
If we compare the first, second, and third quartiles for these two groups, we

00:28:27.300 --> 00:28:31.250
see that for grade 6 they're all higher.

NOTE Accuracy: 82% (HIGH)

00:28:31.250 --> 00:28:43.250
So this box plot, with all of its parts, is systematically above this box and its parts.

00:28:43.250 --> 00:28:53.200
All the quartiles for grade 6 are higher than the corresponding quartiles for grade 5, and

00:28:53.200 --> 00:28:58.700
this suggests that there probably is a relationship between these two variables that can be

00:28:58.700 --> 00:29:01.650
discerned if the samples are large enough.

NOTE Accuracy: 76% (HIGH)

00:29:01.650 --> 00:29:09.600
So if we have enough fifth graders and enough sixth graders, that is, if we have lots of kids, this

00:29:09.600 --> 00:29:17.500
situation suggests that there probably is a relationship between the two variables, although

00:29:17.500 --> 00:29:22.600
it's not as clear as the one between second and third grade.

NOTE Accuracy: 87% (HIGH)

00:29:25.300 --> 00:29:33.200
Let's look at the other variable, reading comprehension. Are there differences between specific

00:29:33.200 --> 00:29:37.500
grades? What about between grades three and four?

NOTE Accuracy: 78% (HIGH)

00:29:37.500 --> 00:29:46.650
Well, here we see that, in the box plot for grade 4, the middle 50%

NOTE Accuracy: 77% (HIGH)

00:29:46.650 --> 00:29:52.200
falls within the range of the middle 50% for third grade.

NOTE Accuracy: 88% (HIGH)

00:29:52.200 --> 00:30:01.700
So the difference, if it exists, is even less clear. These two box plots do not suggest a clear

00:30:01.700 --> 00:30:09.900
difference. Now, if we look at the individual lines, we see that the first quartile and the median

00:30:09.900 --> 00:30:17.700
are higher for grade 4 than for grade 3, which suggests that there might be a relationship between

00:30:17.700 --> 00:30:21.150
these two variables for these two grades.

NOTE Accuracy: 91% (HIGH)

00:30:21.150 --> 00:30:29.700
But it's not very clear at all, and the third quartile is exactly the same. So this is only a slight

00:30:29.700 --> 00:30:36.500
possibility of a relationship, and it would take a lot of children for it to be

00:30:36.500 --> 00:30:39.000
statistically discernible.

NOTE Accuracy: 82% (HIGH)

00:30:40.300 --> 00:30:47.800
Let's contrast this situation with a comparison between grades 2 and 6.

NOTE Accuracy: 91% (HIGH)

00:30:47.800 --> 00:30:51.699
In this case, we see that

NOTE Accuracy: 57% (MEDIUM)

00:30:51.699 --> 00:30:55.900
70 percent of sixth graders

NOTE Accuracy: 91% (HIGH)

00:30:55.900 --> 00:31:07.500
score higher than 70% of second graders. So if we only had these two boxes to work with, we would

00:31:07.500 --> 00:31:15.200
say that there is a very clear difference, or in other words a very clear relationship, between grade and

00:31:15.200 --> 00:31:22.250
reading comprehension; that there is an effect of grade on reading comprehension; or that there is

00:31:22.250 --> 00:31:26.700
shared variance between grade and reading comprehension.
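
The box-plot comparisons in this part of the video (all quartiles higher for grades 5 and 6, one box inside another for grades 3 and 4, and non-overlapping boxes for grades 2 and 6) can each be written as a simple test on quartiles. Below is a minimal Python sketch; the scores are hypothetical values on a 0 to 18 scale, chosen to mimic the patterns described, and the helper names are made up for this example.

```python
from statistics import quantiles

def quartiles(values):
    """Q1, median and Q3 using the standard library's 'inclusive' rule."""
    return tuple(quantiles(values, n=4, method="inclusive"))

def box_inside(inner, outer):
    """True if the inner group's box (Q1 to Q3) lies within the outer group's box."""
    i1, _, i3 = quartiles(inner)
    o1, _, o3 = quartiles(outer)
    return o1 <= i1 and i3 <= o3

def boxes_separated(lower, upper):
    """True if the upper group's Q1 is above the lower group's Q3."""
    return quartiles(upper)[0] > quartiles(lower)[2]

def all_quartiles_higher(lower, upper):
    """True if each of Q1, median and Q3 is higher in the upper group."""
    return all(u > l for l, u in zip(quartiles(lower), quartiles(upper)))

# Hypothetical comprehension scores (0-18), made up to mimic the patterns described.
grade2 = [1, 2, 3, 4, 5, 6, 8, 9, 11]
grade3 = [2, 4, 6, 8, 10, 12, 14, 16, 18]
grade4 = [3, 5, 7, 9, 11, 12, 14, 15, 17]
grade6 = [8, 10, 11, 12, 13, 14, 15, 16, 17]

print(quartiles(grade3), quartiles(grade4))
print(box_inside(grade4, grade3))            # grade 4's box sits inside grade 3's
print(all_quartiles_higher(grade3, grade4))  # False: the Q3 values are equal
print(boxes_separated(grade2, grade6))       # grade 6's Q1 is above grade 2's Q3
```

Here grade 4's box (7 to 14) lies inside grade 3's box (6 to 14), with the same third quartile, so not every quartile is higher. Grade 6's first quartile (11) is above grade 2's third quartile (8), the clearest of the patterns. None of these checks replaces a statistical test; as the video notes, whether a small shift is discernible depends on how many children are measured.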
NOTE Accuracy: 91% (HIGH) 00:31:30.600 --> 00:31:37.900 Let us now go back to how different strengths and directions of relationships appear on 00:31:37.900 --> 00:31:43.650 scatter plots. Here is a plot of NOTE Accuracy: 91% (HIGH) 00:31:43.650 --> 00:31:53.700 word fluency, that is, words read correctly per minute, as a function of the number of spelling errors on a 00:31:53.700 --> 00:31:55.400 spelling test. NOTE Accuracy: 91% (HIGH) 00:31:55.400 --> 00:32:03.300 We see that there aren't any points here in this corner, and there aren't any points in this corner, 00:32:03.300 --> 00:32:09.250 but there are plenty of points in this corner, and in this corner. NOTE Accuracy: 91% (HIGH) 00:32:09.250 --> 00:32:17.900 Indeed, the points are relatively clustered around an imaginary line going like this, which is 00:32:17.900 --> 00:32:26.200 consistent with a relatively strong relationship between spelling and fluency. More specifically, we 00:32:26.200 --> 00:32:35.600 see that higher word reading efficiency, or higher fluency, is associated with fewer spelling errors. 00:32:35.600 --> 00:32:39.900 This makes a lot of sense. It means that kids who read more NOTE Accuracy: 68% (MEDIUM) 00:32:39.900 --> 00:32:48.400 efficiently are also better spellers. This is what this graph shows us. There is a clear relationship 00:32:48.400 --> 00:32:57.000 between spelling and word reading efficiency, or fluency, and we can see that this relationship is 00:32:57.000 --> 00:33:03.850 negative for these two variables. Negative means that as one variable increases, NOTE Accuracy: 91% (HIGH) 00:33:03.850 --> 00:33:07.550 the other variable decreases. NOTE Accuracy: 62% (MEDIUM) 00:33:07.550 --> 00:33:13.350 So as the variable number of spelling errors gets higher, NOTE Accuracy: 91% (HIGH) 00:33:13.350 --> 00:33:18.500 the variable words per minute gets lower.
NOTE Accuracy: 88% (HIGH) 00:33:19.900 --> 00:33:28.500 These are data for one grade only: they are only for third-grade kids. So this is not an age 00:33:28.500 --> 00:33:37.200 effect. It's an effect of reading and spelling skill, and it shows that more spelling errors, as well 00:33:37.200 --> 00:33:44.700 as lower efficiency in word reading, are associated and are both indices of reading and spelling 00:33:44.700 --> 00:33:46.200 skills. NOTE Accuracy: 91% (HIGH) 00:33:49.900 --> 00:33:56.500 This is the corresponding graph for the relationship between the variable reading comprehension 00:33:56.500 --> 00:34:03.600 score, measured as the number of correct answers to comprehension questions, and again the number of spelling 00:34:03.600 --> 00:34:05.100 errors. NOTE Accuracy: 84% (HIGH) 00:34:05.900 --> 00:34:15.400 Again, we see that this looks like a negative relationship. Looking at the points here, a large 00:34:15.400 --> 00:34:23.699 number of spelling errors seems to be somewhat associated with a lower score on comprehension, and a 00:34:23.699 --> 00:34:31.300 low number of spelling errors seems to be associated with a high score on comprehension. There aren't 00:34:31.300 --> 00:34:36.100 many points in this corner. However, these points are NOTE Accuracy: 74% (MEDIUM) 00:34:36.100 --> 00:34:42.699 much more scattered; they're spread over a much larger range. So this is not a very strong 00:34:42.699 --> 00:34:50.400 relationship. It's a weak negative relationship between the variables of spelling errors and 00:34:50.400 --> 00:34:55.800 number of correct comprehension answers for third graders. NOTE Accuracy: 91% (HIGH) 00:34:58.000 --> 00:35:07.000 Here is another pair of variables.
On the vertical axis here we have the same fluency metric, that is, 00:35:07.000 --> 00:35:13.900 words per minute, or word reading efficiency, and on the horizontal axis we have the vocabulary 00:35:13.900 --> 00:35:19.800 score, that is, the number of pictures correctly chosen for the spoken words. NOTE Accuracy: 91% (HIGH) 00:35:20.400 --> 00:35:28.700 We don't really see much of a relationship here. It's not really the case that there are many points 00:35:28.700 --> 00:35:34.800 in only two of the four corners, and we don't see a very clear shape formed by these points. They 00:35:34.800 --> 00:35:43.000 seem to be scattered all over the panel. Maybe there are a few more points around the top-right 00:35:43.000 --> 00:35:49.000 corner, but if there is a relationship, it's a really, really weak one. NOTE Accuracy: 91% (HIGH) 00:35:51.600 --> 00:36:02.300 In contrast, suppose we plot comprehension by vocabulary. The horizontal axis is again the number of 00:36:02.300 --> 00:36:05.800 correct pictures chosen in the vocabulary test, NOTE Accuracy: 91% (HIGH) 00:36:05.800 --> 00:36:13.600 and the vertical axis is the number of correct answers in the reading comprehension test. NOTE Accuracy: 86% (HIGH) 00:36:13.600 --> 00:36:21.200 We see that there aren't many points in this corner, or in this corner. NOTE Accuracy: 91% (HIGH) 00:36:21.400 --> 00:36:25.550 There are plenty of points in this corner. NOTE Accuracy: 90% (HIGH) 00:36:25.550 --> 00:36:34.000 And there are also some points down here. The spread of these points is substantial, so this is not 00:36:34.000 --> 00:36:41.300 a very strong relationship. But it's still a clear relationship between vocabulary and reading 00:36:41.300 --> 00:36:49.100 comprehension. And there is actually a quantity that we can calculate from these points that 00:36:49.100 --> 00:36:52.650 expresses the strength of the relationship.
NOTE Accuracy: 91% (HIGH) 00:36:52.650 --> 00:36:58.400 It is called correlation, and we will talk about it more specifically. NOTE Accuracy: 89% (HIGH) 00:37:00.800 --> 00:37:07.800 It is called correlation, and we will see it in more detail later in the course. For now it's enough 00:37:07.800 --> 00:37:14.800 to say that it goes from -1 to 1, and that 0 means that there is no relationship between the 00:37:14.800 --> 00:37:16.050 variables. NOTE Accuracy: 81% (HIGH) 00:37:16.050 --> 00:37:25.500 -1 means a perfect negative relationship, and +1 means a perfect positive relationship. The 00:37:25.500 --> 00:37:35.149 correlation values for these sets of variables range from 0.15, NOTE Accuracy: 80% (HIGH) 00:37:35.149 --> 00:37:43.400 which is a very low value, near zero. It's a slightly positive relationship, but it's really 00:37:43.400 --> 00:37:45.050 very weak. NOTE Accuracy: 77% (HIGH) 00:37:45.050 --> 00:37:55.900 The largest value here is 0.71, and it's negative. It's -0.71, indicating that as one variable 00:37:55.900 --> 00:38:03.300 increases, the other variable decreases. Hence, this cloud of points slopes downwards in the panel. NOTE Accuracy: 91% (HIGH) 00:38:03.300 --> 00:38:12.400 The relatively large value here is consistent with these points being clustered around an imaginary 00:38:12.400 --> 00:38:15.950 line rather than spread over the whole panel. NOTE Accuracy: 91% (HIGH) 00:38:15.950 --> 00:38:19.600 The other two cases are intermediate. NOTE Accuracy: 91% (HIGH) 00:38:19.600 --> 00:38:30.000 There is a low to moderate relationship between spelling and comprehension, which is negative. So 00:38:30.000 --> 00:38:38.200 more spelling errors are, to a slight extent, associated with fewer correct responses to the 00:38:38.200 --> 00:38:40.100 comprehension questions.
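NOTE The correlation described above is, in its most common form, Pearson's r: the covariance of two variables divided by the product of their standard deviations. Below is a minimal Python sketch, assuming that Pearson's r is the coefficient meant here and using invented spelling-error and words-per-minute values (not the lecture's data). It shows that r stays between -1 and 1 and that its sign gives the direction of the relationship. Pearson's r measures linear association specifically. The helper `pearson_r` is a hypothetical name; Python 3.10+ also offers `statistics.correlation` for the same calculation.
```python
import math
def pearson_r(xs, ys):
    """Pearson correlation: cross-products of deviations over the root of the sums of squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)
# Invented third-grade data: spelling errors and words read correctly per minute.
spelling_errors = [2, 4, 5, 7, 9, 12]
words_per_minute = [80, 74, 70, 61, 57, 45]
r = pearson_r(spelling_errors, words_per_minute)
print(f"r = {r:.3f}")  # negative: more errors go with fewer words per minute
```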
NOTE Accuracy: 89% (HIGH) 00:38:40.100 --> 00:38:47.100 And there is a moderate to strong relationship between reading comprehension and vocabulary, which is a 00:38:47.100 --> 00:38:48.899 positive one, NOTE Accuracy: 91% (HIGH) 00:38:48.899 --> 00:38:57.200 as higher vocabulary scores are associated with a larger number of correct answers to the 00:38:57.200 --> 00:39:07.100 comprehension questions. So this is a representative range of strengths and directions for 00:39:07.100 --> 00:39:14.300 relationships among pairs of quantitative, that is, numerical, variables. NOTE Accuracy: 84% (HIGH) 00:39:17.300 --> 00:39:28.500 Here is one final case. This is the time spent looking at a new word while reading. These data are from kids in 00:39:28.500 --> 00:39:37.500 grade five reading sentences that contain some made-up words they've never seen before, but 00:39:37.500 --> 00:39:46.000 they encounter these same novel words in several sentences, and we use eye tracking to measure how 00:39:46.000 --> 00:39:47.200 long they look NOTE Accuracy: 83% (HIGH) 00:39:47.200 --> 00:39:58.900 at each word, and we plot the duration of their gaze in thousandths of a second, so 1000 is one 00:39:58.900 --> 00:40:03.300 second. So this is one second, and this is two seconds. NOTE Accuracy: 89% (HIGH) 00:40:03.600 --> 00:40:11.800 And we plot this variable as a function of how many times they've seen the new word. So 1 is the 00:40:11.800 --> 00:40:20.700 first time they encounter it, then the second time, the third time, and so on up to six times. Is there 00:40:20.700 --> 00:40:24.100 a relationship between repetition, NOTE Accuracy: 80% (HIGH) 00:40:24.100 --> 00:40:33.300 how many times they have seen the word, and gaze duration, how long they look at it?
There seems to be a 00:40:33.300 --> 00:40:41.600 relationship, because if we look at the difference between the first and second time they see the 00:40:41.600 --> 00:40:50.800 word, we can see that the lines that join the corresponding quartiles all slope downwards. NOTE Accuracy: 91% (HIGH) 00:40:50.800 --> 00:40:57.350 And it looks like half of the data for the first encounter NOTE Accuracy: 89% (HIGH) 00:40:57.350 --> 00:41:05.000 are higher than almost 3/4 of the data for the second encounter. Almost. NOTE Accuracy: 87% (HIGH) 00:41:05.000 --> 00:41:11.200 So this is a relatively clear difference between the first and the second time you look at a new 00:41:11.200 --> 00:41:14.400 word, if you're in fifth grade. NOTE Accuracy: 91% (HIGH) 00:41:14.600 --> 00:41:21.600 If you look at the difference between the second and third time, it seems smaller than the preceding 00:41:21.600 --> 00:41:30.400 one, but probably real, although it would require a relatively large number of data points to 00:41:30.400 --> 00:41:33.900 verify statistically that it exists. NOTE Accuracy: 88% (HIGH) 00:41:35.000 --> 00:41:42.800 The situation is less clear for the difference between the third and fourth repetitions. Here, it 00:41:42.800 --> 00:41:51.500 seems to level off. There may be a bit of a difference, so you may be getting a bit faster in the 00:41:51.500 --> 00:42:00.300 fourth encounter, but it's certainly not very big and may not be statistically reliable. NOTE Accuracy: 85% (HIGH) 00:42:00.300 --> 00:42:08.700 After the fourth encounter, there is really not much evidence for a relationship. This looks like a 00:42:08.700 --> 00:42:16.700 bit of noise: lines go a bit up or a bit down, and the different quartiles aren't coordinated. So 00:42:16.700 --> 00:42:22.700 it looks like once you've seen a new word four times, you don't get any faster.
NOTE Accuracy: 91% (HIGH) 00:42:22.700 --> 00:42:30.500 And we can visualize the whole relationship using a smooth curve of this kind of shape, which shows 00:42:30.500 --> 00:42:36.450 two things. The first is that this curve slopes downwards, NOTE Accuracy: 90% (HIGH) 00:42:36.450 --> 00:42:46.650 which indicates a negative relationship: the more times you've seen the word, the less you look at it. NOTE Accuracy: 87% (HIGH) 00:42:46.650 --> 00:42:54.600 As one variable increases, the other variable decreases, so this line slopes downwards. The second 00:42:54.600 --> 00:43:01.800 is that this line is not straight at all. It's markedly curved. The differences between the first 00:43:01.800 --> 00:43:09.400 couple of encounters with a new word are much larger than any differences further along, 00:43:09.400 --> 00:43:17.200 with subsequent encounters. So this line levels off, and this is a curvilinear negative NOTE Accuracy: 80% (HIGH) 00:43:17.200 --> 00:43:24.700 relationship between the number of times you have encountered a new word and the time you spend 00:43:24.700 --> 00:43:26.600 looking at it. NOTE Accuracy: 83% (HIGH) 00:43:28.700 --> 00:43:37.800 And that's the kind of information that you can obtain visually by looking at a box plot like this. NOTE Accuracy: 79% (HIGH) 00:43:38.800 --> 00:43:48.700 So, when we talk about relationship types and shapes, the things that you can evaluate graphically 00:43:48.700 --> 00:43:55.500 include whether the relationship is linear, meaning that changes in one variable are proportional to changes in the other, NOTE Accuracy: 87% (HIGH) 00:43:55.600 --> 00:44:04.000 or whether it's nonlinear, for example curvilinear, where the line that indicates the relationship 00:44:04.000 --> 00:44:08.250 between two variables is curved instead of straight.
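NOTE A curved but consistent relationship like this one can look weaker than it is when it is summarized with a straight-line measure. Below is a minimal Python sketch, assuming invented median gaze durations that follow the made-up curve 400 + 600 / encounter milliseconds (not the lecture's eye-tracking data). The drops between successive encounters shrink, which is the leveling off, and Pearson's r is clearly negative but well short of -1, even though gaze duration here is an exact function of encounter number. The helper `pearson_r` is a hypothetical name.
```python
import math
def pearson_r(xs, ys):
    """Pearson correlation, which measures only the linear part of an association."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
encounters = [1, 2, 3, 4, 5, 6]
# Invented medians (ms) from an assumed curve that drops fast, then levels off.
gaze_ms = [400 + 600 / n for n in encounters]
# Change from each encounter to the next: all negative, shrinking in size.
drops = [later - earlier for earlier, later in zip(gaze_ms, gaze_ms[1:])]
print("gaze (ms):", [round(g) for g in gaze_ms])
print("drops (ms):", [round(d) for d in drops])
print(f"Pearson r: {pearson_r(encounters, gaze_ms):.2f}")
```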
NOTE Accuracy: 90% (HIGH) 00:44:08.250 --> 00:44:16.500 A relationship can be strong or weak, which is indicated by the spread of the data points on 00:44:16.500 --> 00:44:18.850 the two-variable graph. NOTE Accuracy: 91% (HIGH) 00:44:18.850 --> 00:44:26.900 And it can be a positive or a negative relationship, indicating whether the two variables increase 00:44:26.900 --> 00:44:34.500 together or whether an increase in one variable is associated with a decrease in the other variable. And 00:44:34.500 --> 00:44:41.250 these are all things that you can get from looking at graphs plotting two quantitative variables. NOTE Accuracy: 80% (HIGH) 00:44:41.250 --> 00:44:47.600 To sum up, taking into account all the different kinds of variables: NOTE Accuracy: 91% (HIGH) 00:44:47.600 --> 00:44:56.050 As we've said before, statistical analysis always concerns associations among two or more variables. NOTE Accuracy: 86% (HIGH) 00:44:56.050 --> 00:45:04.300 Indeed, all quantitative research questions concern associations among two or more variables. NOTE Accuracy: 91% (HIGH) 00:45:04.900 --> 00:45:12.700 There are several different words that essentially refer to the same idea, the associations between 00:45:12.700 --> 00:45:20.500 variables: relationship, effect, prediction, and shared variance. NOTE Accuracy: 87% (HIGH) 00:45:26.600 --> 00:45:34.500 And all of these can be used in both directions. So associations between variables are 00:45:34.500 --> 00:45:42.100 statistically bidirectional. Statistics doesn't tell you which variable affects the other. It 00:45:42.100 --> 00:45:50.000 tells you how confident you can be that the two variables are associated, that is, whether the two 00:45:50.000 --> 00:45:56.800 variables change values in somewhat systematic or consistent ways.
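NOTE The point that associations are statistically bidirectional can be checked directly. Below is a minimal Python sketch, using invented vocabulary and comprehension scores (not the lecture's data) and a hypothetical helper `association_summary`. The correlation is identical whichever variable is listed first, so by itself it says nothing about which variable affects the other. The two least-squares prediction slopes (comprehension from vocabulary, and vocabulary from comprehension) differ, but their product equals r squared.
```python
import math
def association_summary(xs, ys):
    """Return (r, slope predicting y from x, slope predicting x from y)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r, sxy / sxx, sxy / syy
# Invented scores: pictures correctly chosen (vocabulary) and correct answers (comprehension).
vocabulary = [52, 60, 58, 71, 66, 80, 75, 90]
comprehension = [6, 9, 5, 10, 12, 11, 15, 14]
r_vc, slope_c_from_v, slope_v_from_c = association_summary(vocabulary, comprehension)
r_cv, _, _ = association_summary(comprehension, vocabulary)
print(f"r(vocabulary, comprehension) = {r_vc:.3f}")
print(f"r(comprehension, vocabulary) = {r_cv:.3f}")
print(f"slopes: {slope_c_from_v:.3f} and {slope_v_from_c:.3f}; "
      f"product = {slope_c_from_v * slope_v_from_c:.3f}, r^2 = {r_vc ** 2:.3f}")
```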
NOTE Accuracy: 84% (HIGH) 00:45:59.200 --> 00:46:08.700 These four different words referring to statistical associations also express the idea that 00:46:08.700 --> 00:46:16.600 relationships can vary in strength, or effects can vary in size, which is the same thing, and this corresponds 00:46:16.600 --> 00:46:24.200 to different proportions of shared variance or different magnitudes of prediction error. NOTE Accuracy: 91% (HIGH) 00:46:26.800 --> 00:46:35.200 Using graphs of our variables, we can detect potential problems with the data. We can also 00:46:35.200 --> 00:46:42.600 detect the extent to which there may be relationships between our variables, and we can get a first 00:46:42.600 --> 00:46:49.900 impression of the type, or shape, of a relationship and of its strength. However, we 00:46:49.900 --> 00:46:53.050 can never draw a conclusion based on that alone. NOTE Accuracy: 91% (HIGH) 00:46:53.050 --> 00:47:01.800 We always need to conduct a formal statistical analysis in order to quantify the effect size, that is, to find 00:47:01.800 --> 00:47:10.550 out exactly how strong the relationship is, and also to calculate the confidence: how confident we can be 00:47:10.550 --> 00:47:15.900 that there is a statistical association between these two variables.
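NOTE One common way to turn a correlation into an effect size plus a confidence statement is sketched below in Python. It assumes Pearson's r, roughly bivariate-normal data, and Fisher's z-transformation for an approximate 95% confidence interval; the sample sizes 30 and 300 are hypothetical, since the lecture does not give them, and the helper names are illustrative. The squared correlation r² is the proportion of shared variance. With r = 0.15, the interval includes zero for 30 children but not for 300, which echoes the point that weak relationships need many participants to be statistically discernible.
```python
import math
from statistics import NormalDist
def shared_variance(r):
    """Proportion of variance the two variables share (r squared)."""
    return r ** 2
def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for Pearson's r via Fisher's z-transformation."""
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need -1 < r < 1 and more than 3 observations")
    z = math.atanh(r)                              # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)                      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return math.tanh(z - crit * se), math.tanh(z + crit * se)
for r in (0.15, -0.71):
    for n in (30, 300):  # hypothetical sample sizes
        low, high = correlation_ci(r, n)
        print(f"r = {r:+.2f}, n = {n:3d}: shared variance = {shared_variance(r):.3f}, "
              f"95% CI = ({low:+.3f}, {high:+.3f})")
```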