WEBVTT Kind: captions; language: en-us

NOTE Accuracy: 90% (HIGH)

00:00:00.000 --> 00:00:07.700
In this video we will see how to derive descriptive statistics in jamovi.

NOTE Accuracy: 91% (HIGH)

00:00:07.700 --> 00:00:14.000
For this and many of the following examples we will be using a data set which you can download from

00:00:14.000 --> 00:00:15.700
Canvas,

NOTE Accuracy: 80% (HIGH)

00:00:15.899 --> 00:00:27.049
which is called stats data. So to open this for the first time you have to go to Open in jamovi

00:00:27.049 --> 00:00:36.050
and then Browse, and go to that place on your computer where you have stored this file, stats data

00:00:36.050 --> 00:00:42.800
sped 4010, 2020, and open it.

NOTE Accuracy: 80% (HIGH)

00:00:44.100 --> 00:00:54.700
After you open this file once, it will appear down here on the left side and you will be able

00:00:54.700 --> 00:00:58.600
to open it directly by clicking here.

NOTE Accuracy: 91% (HIGH)

00:00:58.600 --> 00:01:08.100
To go back to our file: every time you first load a file you should check your variables, what they

00:01:08.100 --> 00:01:12.100
are and how they are defined.

NOTE Accuracy: 90% (HIGH)

00:01:12.300 --> 00:01:19.700
We will go into greater detail about this data set later; for now we will just talk about the basic

00:01:19.700 --> 00:01:27.700
properties of the variables. The first variable is called ID and it's just a designation for each

00:01:27.700 --> 00:01:29.150
participant,

NOTE Accuracy: 90% (HIGH)

00:01:29.150 --> 00:01:34.050
so 11, 12, 13, these are all different children,

NOTE Accuracy: 91% (HIGH)

00:01:34.050 --> 00:01:37.600
and there are 47 of them.

NOTE Accuracy: 84% (HIGH)

00:01:37.600 --> 00:01:46.250
The second variable is sex and it includes males and females. The third variable is called home lang

00:01:46.250 --> 00:01:52.850
and it refers to the home language, whether it is a majority language or a minority language.


NOTE Accuracy: 91% (HIGH)

00:01:52.850 --> 00:02:00.500
The fourth variable is called condition and has to do with an intervention provided to some children.

NOTE Accuracy: 85% (HIGH)

00:02:01.100 --> 00:02:10.400
The next set of variables was measured in kindergarten for these children, so we have a

00:02:10.400 --> 00:02:17.600
measure called matrices, which is a general cognitive ability test, we have a measure of letter

00:02:17.600 --> 00:02:22.399
knowledge, how many letters of the alphabet the children know,

NOTE Accuracy: 91% (HIGH)

00:02:22.399 --> 00:02:31.200
and a measure of receptive vocabulary, how many words they know and can show by pointing to the

00:02:31.200 --> 00:02:38.400
right picture when they hear the corresponding word. In kindergarten we also have a measure of word

00:02:38.400 --> 00:02:45.800
reading fluency and, as you may be able to see here, most of the values are 0, which means that kids in

00:02:45.800 --> 00:02:50.200
kindergarten don't know how to read any words yet.

NOTE Accuracy: 80% (HIGH)

00:02:50.600 --> 00:02:59.300
There are also measures taken in grade 1, G1, and grade 2, G2, so we have vocabulary in grade 1

00:02:59.300 --> 00:03:07.500
and 2 and word fluency in grade 1 and 2. And there is also a measurement of word fluency taken after

00:03:07.500 --> 00:03:13.800
some reading intervention. Some of these data come from a real study, others are made up for the

00:03:13.800 --> 00:03:17.700
purpose of demonstrations in this course.

NOTE Accuracy: 91% (HIGH)

00:03:17.700 --> 00:03:24.400
But otherwise they are very realistic data. So the first thing to check is the definition of the

00:03:24.400 --> 00:03:33.500
variables.
The first four variables here are obviously not numbers, they are labels, so it's very easy

00:03:33.500 --> 00:03:40.600
to understand that they are actually measured on a nominal scale. There is no ordering in any of

00:03:40.600 --> 00:03:42.149
these, no rank;

NOTE Accuracy: 77% (HIGH)

00:03:42.149 --> 00:03:50.550
there is a bunch of children here identified by the IDs, there are two categories in sex, two labels,

00:03:50.550 --> 00:03:58.200
there are two labels in home language, and two labels in condition. And to check the definition or to

00:03:58.200 --> 00:04:04.650
correct the definition of a variable, you double-click on the variable name and this panel opens up

00:04:04.650 --> 00:04:10.500
and you can see here what the measuring scale is. This is at the nominal level, which is correct,

00:04:10.500 --> 00:04:12.050
so we're not going to change it.

NOTE Accuracy: 91% (HIGH)

00:04:12.050 --> 00:04:17.250
We could set it to ordinal or continuous if it were a different kind of variable.

NOTE Accuracy: 91% (HIGH)

00:04:17.250 --> 00:04:23.600
Also, jamovi has a specific type that's called ID, but we don't need to use it here, we'll just

00:04:23.600 --> 00:04:30.500
leave it at the nominal scale. And here you can see all the different levels, which in this

00:04:30.500 --> 00:04:37.550
case are the children. You can go to the next variable by clicking here or by clicking here.

NOTE Accuracy: 81% (HIGH)

00:04:37.550 --> 00:04:46.300
The sex variable is also measured at the nominal level, we're not changing it; it is a text label and the

00:04:46.300 --> 00:04:49.000
possible labels are f and m.

NOTE Accuracy: 91% (HIGH)

00:04:49.000 --> 00:04:52.150
The next variable

NOTE Accuracy: 91% (HIGH)

00:04:52.150 --> 00:05:02.100
is again a nominal measurement of a text type, and includes two levels: majority and minority language.


00:05:02.100 --> 00:05:09.799
And the next variable is condition, which has control and intervention values measured at the nominal

00:05:09.799 --> 00:05:20.600
scale level. The next variable is matrices 0K, which is in fact the number of correct answers in

00:05:20.600 --> 00:05:22.300
this test.

NOTE Accuracy: 91% (HIGH)

00:05:22.300 --> 00:05:26.150
It is actually measured at the ratio level,

NOTE Accuracy: 91% (HIGH)

00:05:26.150 --> 00:05:33.000
because it is a real count. So although the data are integers (you cannot have half a question

00:05:33.000 --> 00:05:42.700
answered correctly), you still have a natural zero, which means zero correct answers. And so this

00:05:42.700 --> 00:05:49.400
variable is measured at the ratio level and here is marked as continuous, because that's what jamovi

00:05:49.400 --> 00:05:52.500
calls the number variables.

NOTE Accuracy: 90% (HIGH)

00:05:52.700 --> 00:06:01.100
The next variable is also continuous: it's letter knowledge at kindergarten, and this is the number of

00:06:01.100 --> 00:06:09.200
letters known by the child, so again it's at the ratio level. Vocabulary at kindergarten is again

00:06:09.200 --> 00:06:14.900
continuous because it's the number of correct responses in the vocabulary test, so this is correctly

00:06:14.900 --> 00:06:21.850
assigned to be continuous. And then we have vocabulary at grade 1, again correct, vocabulary

NOTE Accuracy: 69% (MEDIUM)

00:06:21.850 --> 00:06:24.850
at grade 2, correct.

NOTE Accuracy: 91% (HIGH)

00:06:24.850 --> 00:06:35.800
Word fluency: this is measured as words per minute, so it is again a ratio level variable, which is

00:06:35.800 --> 00:06:38.600
marked as continuous in jamovi.

NOTE Accuracy: 84% (HIGH)

00:06:40.400 --> 00:06:46.600
Word fluency in grade 1, word fluency in grade 2.


NOTE Accuracy: 91% (HIGH)

00:06:47.500 --> 00:06:56.100
So clicking on this will remove this variable definition panel, and now that we have checked all our

00:06:56.100 --> 00:07:05.200
variables and ensured that they are correctly defined, we can go on and look at some descriptives. To

00:07:05.200 --> 00:07:11.300
perform a descriptive analysis we have to be on the analysis panel here, not the data panel, and click

00:07:11.300 --> 00:07:15.100
on Exploration, Descriptives.

NOTE Accuracy: 83% (HIGH)

00:07:17.100 --> 00:07:26.300
And we'll go through this twice, because what we can do with nominal level scales and ratio level

00:07:26.300 --> 00:07:36.300
scales are very different things. So first we should choose our nominal scales. We don't need to do

00:07:36.300 --> 00:07:46.600
any statistics on the IDs, unless we wanted to check for mistakes of this sort: for example, if there

00:07:46.600 --> 00:07:47.900
are any children that

NOTE Accuracy: 73% (MEDIUM)

00:07:47.900 --> 00:07:56.000
appear twice. So if we want to check that, we could add ID here and use a values table to

00:07:56.000 --> 00:08:03.200
see if any ID appears twice, because that would be a mistake. So I click here on frequency tables, as the

00:08:03.200 --> 00:08:11.000
first thing to look at are the frequencies of all the values. Frequency here means how many times a value

00:08:11.000 --> 00:08:17.750
is observed, so we look at the frequencies. So let's start from the bottom, and you see

NOTE Accuracy: 82% (HIGH)

00:08:17.750 --> 00:08:27.700
that each ID appears once, all these are once, so there are no mistakes in the IDs, none

00:08:27.700 --> 00:08:34.500
of the IDs appears twice. We're not really interested in this, so I'm sending it back.

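
NOTE
Editor's sketch (not part of the recording): the duplicate check described above amounts to counting how often each ID occurs and flagging any count above one. The following is a minimal Python illustration using only the standard library. The ID values and the function names frequency_table and duplicated_values are hypothetical, and jamovi itself is operated through its menus rather than through code like this.
```python
from collections import Counter
#
# Hypothetical participant IDs; the real ones live in the course data file.
ids = [11, 12, 13, 14, 12]
#
def frequency_table(values):
    """Map each distinct value to the number of times it is observed."""
    return dict(Counter(values))
#
def duplicated_values(values):
    """Return, in sorted order, the values that are observed more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)
#
print(frequency_table(ids))
print(duplicated_values(ids))
```
An empty list from duplicated_values corresponds to the situation in the recording, where every ID appears exactly once.
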

NOTE Accuracy: 91% (HIGH)

00:08:35.600 --> 00:08:45.100
What kind of descriptive statistics can we do with nominal level variables? Well, not very much: we

00:08:45.100 --> 00:08:54.600
cannot calculate means or medians, so I unclick these, and there is no minimum or maximum. We can count how

00:08:54.600 --> 00:09:03.800
many values we have, and how many missing values we have. You can see here that for our three nominal

NOTE Accuracy: 91% (HIGH)

00:09:03.800 --> 00:09:15.500
variables we have no missing values, so there are 47 values of each. None of the other options apply to

00:09:15.500 --> 00:09:23.600
qualitative, or categorical, or nominal scale variables, so we don't check anything else, and that's it

00:09:23.600 --> 00:09:25.350
for our

NOTE Accuracy: 91% (HIGH)

00:09:25.350 --> 00:09:28.150
categorical variables.

NOTE Accuracy: 82% (HIGH)

00:09:28.150 --> 00:09:40.100
Now we will do another run for the number variables, the quantitative variables. And to go back, I click here

00:09:40.100 --> 00:09:49.900
to close that analysis window; if I click here it comes back. And if I add or remove anything from

00:09:49.900 --> 00:09:54.750
here, if I check anything, this result is affected,

NOTE Accuracy: 91% (HIGH)

00:09:54.750 --> 00:10:02.200
but I don't want to affect this now, I want to start a new set of descriptives concerning the

00:10:02.200 --> 00:10:08.600
quantitative, the numeric variables, those that are marked as continuous in jamovi. So I want this to be

00:10:08.600 --> 00:10:09.700
done.

NOTE Accuracy: 91% (HIGH)

00:10:09.700 --> 00:10:19.750
I could alternatively close it like that and start a new one. So again I go to Exploration, Descriptives,

00:10:19.750 --> 00:10:28.800
and here we have the beginning of a new set of descriptive analyses, and these will concern the

00:10:28.800 --> 00:10:34.849
numeric ones.
So I clicked on the first variable,

NOTE Accuracy: 83% (HIGH)

00:10:34.849 --> 00:10:39.700
I could double-click, it makes no difference,

NOTE Accuracy: 85% (HIGH)

00:10:39.700 --> 00:10:44.200
I could send it in here like that

NOTE Accuracy: 91% (HIGH)

00:10:44.300 --> 00:10:54.900
and back. And to select the whole bunch of variables together I can shift-click: I'm pressing

00:10:54.900 --> 00:11:01.800
shift and clicking, and so all of these are selected. And I can send all of these here. I left this one

00:11:01.800 --> 00:11:07.550
out on purpose because it's the intervention variable and we don't care about it right now.

NOTE Accuracy: 86% (HIGH)

00:11:07.550 --> 00:11:16.600
jamovi will not produce a table of values for quantitative, for numeric variables, because there

00:11:16.600 --> 00:11:23.500
could be very many different values; indeed each value might even appear only once, so this is not a

00:11:23.500 --> 00:11:31.100
very informative thing to do with a numeric variable. Instead we can have the indices of central

00:11:31.100 --> 00:11:36.849
tendency and dispersion that we have already seen. So first we have to check

NOTE Accuracy: 90% (HIGH)

00:11:36.849 --> 00:11:43.600
how many values we have and if there are any missing ones; this is always important to check. In terms

00:11:43.600 --> 00:11:49.400
of central tendency we can get the median and the mean, and we can also ask for the mode when it

00:11:49.400 --> 00:11:57.400
exists. We're usually not interested in it; if we are, we click here and it appears in the list as well.

NOTE Accuracy: 91% (HIGH)

00:11:57.800 --> 00:12:02.250
We can look at the individual quartiles,

NOTE Accuracy: 91% (HIGH)

00:12:02.250 --> 00:12:12.800
we can look at the minimum and maximum values, the range of values, and the standard deviation.


NOTE Accuracy: 91% (HIGH)

00:12:14.500 --> 00:12:22.200
And we can look at some other things that we will talk about later in the course. So these are the

00:12:22.200 --> 00:12:30.200
basic descriptive statistics that we are usually interested in: the mean and the median, the range, the

00:12:30.200 --> 00:12:37.400
standard deviation, the minimum and the maximum. And we're usually more interested in the minimum and

00:12:37.400 --> 00:12:44.000
maximum, which are indicative of potential problems, than in their difference, which is the range. So

00:12:44.000 --> 00:12:45.200
this is a

NOTE Accuracy: 91% (HIGH)

00:12:45.200 --> 00:12:53.100
reasonable table of descriptives for a set of numeric variables. And now that we have completed this

00:12:53.100 --> 00:13:01.500
analysis we can hide the analysis panel, and we can save our results, or copy and paste the tables into

00:13:01.500 --> 00:13:11.400
our report. We have one section for the categorical variables and one section for the numeric

00:13:11.400 --> 00:13:13.000
variables.
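
NOTE
Editor's sketch (not part of the recording): the descriptives listed above for a numeric variable can be reproduced by hand as a sanity check. This is a minimal Python illustration using only the standard library. The scores, the use of None for a missing value, and the function name describe are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption about what the table reports.
```python
import statistics
#
def describe(values):
    """Summarise one numeric variable; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "sd": statistics.stdev(present),  # sample SD, n - 1 denominator
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
    }
#
# Hypothetical letter-knowledge scores, not taken from the course data set.
scores = [3, 7, None, 12, 8]
print(describe(scores))
```
Checking the count of missing values and the minimum and maximum first mirrors the order suggested in the recording, since implausible extremes often point to data entry problems.
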