WEBVTT Kind: captions; language: en-us NOTE Treffsikkerhet: 85% (H?Y) 00:00:00.100 --> 00:00:08.100 in this video I will show you how to do calculations on contingency tables using Excel. Let us start 00:00:08.100 --> 00:00:16.400 with some counts, the same as in the previous example, so let's say we have the variable sex, NOTE Treffsikkerhet: 91% (H?Y) 00:00:18.800 --> 00:00:22.100 and home language, NOTE Treffsikkerhet: 86% (H?Y) 00:00:25.900 --> 00:00:30.900 and this can be majority or minority. NOTE Treffsikkerhet: 74% (MEDIUM) 00:00:33.200 --> 00:00:37.900 And this can be male or female. NOTE Treffsikkerhet: 75% (MEDIUM) 00:00:38.000 --> 00:00:46.000 and the numbers are 24, 1, 15 and 7. NOTE Treffsikkerhet: 91% (H?Y) 00:00:46.500 --> 00:00:53.400 So these are our basic counts, what are our marginal sums? NOTE Treffsikkerhet: 91% (H?Y) 00:00:58.700 --> 00:01:07.900 This is for the boys, and this is for the girls, and here are the marginal sums on columns. NOTE Treffsikkerhet: 89% (H?Y) 00:01:10.000 --> 00:01:20.700 And you see that what I'm doing is I am using ranges, and I'm then pulling to recreate the same sum. 00:01:20.700 --> 00:01:30.100 So if I enter B 3: before it means from B3 to be 4. NOTE Treffsikkerhet: 79% (H?Y) 00:01:33.000 --> 00:01:42.000 Now in order to calculate the expected values, which is the next thing I'm going to need, I have to 00:01:42.000 --> 00:01:52.300 multiply the marginal sums and divide by the grand total, so if I want the expected value for this one I need 00:01:52.300 --> 00:01:54.949 a formula. I need this NOTE Treffsikkerhet: 91% (H?Y) 00:01:54.949 --> 00:02:07.100 times this, divided by this, and what I want to do is I want to fix these so that they only change in 00:02:07.100 --> 00:02:13.700 the desired Direction. This one should be able to go down, but it shouldn't be able to go to the right, 00:02:13.700 --> 00:02:24.200 so I want D here to be fixed, so I had a dollar. This cell I want to be able to go right, but not down. 00:02:24.200 --> 00:02:24.600 So NOTE Treffsikkerhet: 76% (H?Y) 00:02:24.600 --> 00:02:29.900 I fix this to 5 and the grand total should be fixed for everything. NOTE Treffsikkerhet: 90% (H?Y) 00:02:29.900 --> 00:02:39.750 In this way when I pull to the right this one is using the correct marginal sums, NOTE Treffsikkerhet: 91% (H?Y) 00:02:39.750 --> 00:02:52.900 if I pull both of them down, then this is using the correct marginal sums. I can reduce the number of 00:02:52.900 --> 00:03:01.500 decimal places that I see because that's too many, and I can copy and paste the sums which paste the 00:03:01.500 --> 00:03:07.750 formulas, so that you can confirm that these are conserved. NOTE Treffsikkerhet: 91% (H?Y) 00:03:07.750 --> 00:03:11.650 So these are the expected values NOTE Treffsikkerhet: 91% (H?Y) 00:03:11.650 --> 00:03:21.100 if you want to confirm whether these indeed preserve the same proportions you can always check. So 00:03:21.100 --> 00:03:24.250 this divided by this, NOTE Treffsikkerhet: 78% (H?Y) 00:03:24.250 --> 00:03:31.300 is 4.875 and then the same is true for these. NOTE Treffsikkerhet: 91% (H?Y) 00:03:31.300 --> 00:03:37.700 And you can try the same thing on the columns, we're not going to do it right now. NOTE Treffsikkerhet: 91% (H?Y) 00:03:39.600 --> 00:03:50.900 So in order to create our table observed, expected, observed minus expected NOTE Treffsikkerhet: 91% (H?Y) 00:03:51.300 --> 00:03:54.250 and then squared NOTE Treffsikkerhet: 91% (H?Y) 00:03:54.250 --> 00:03:56.900 and then NOTE Treffsikkerhet: 79% (H?Y) 00:03:57.700 --> 00:04:09.250 minus expected, and then the sum. These are the lines that I need to calculate, so here I want this NOTE Treffsikkerhet: 91% (H?Y) 00:04:09.250 --> 00:04:13.400 here, I want this NOTE Treffsikkerhet: 91% (H?Y) 00:04:13.400 --> 00:04:20.149 here, I want this and here I want this. NOTE Treffsikkerhet: 91% (H?Y) 00:04:20.149 --> 00:04:23.000 So this is NOTE Treffsikkerhet: 88% (H?Y) 00:04:23.100 --> 00:04:31.800 I'm just going to copy again the names of the combinations of the two variables so that it's clear 00:04:31.800 --> 00:04:33.800 what we're looking at. NOTE Treffsikkerhet: 85% (H?Y) 00:04:39.900 --> 00:04:44.000 And correspondingly here I want this NOTE Treffsikkerhet: 75% (MEDIUM) 00:04:44.000 --> 00:04:46.100 and here NOTE Treffsikkerhet: 91% (H?Y) 00:04:46.100 --> 00:05:00.300 I want this here, I want this and here I want this. So this is the expected. And now I want observed 00:05:00.300 --> 00:05:03.050 minus expected NOTE Treffsikkerhet: 91% (H?Y) 00:05:03.050 --> 00:05:13.900 and I pull this to the right so I get all of them. And here I want this times itself, I click again, NOTE Treffsikkerhet: 85% (H?Y) 00:05:13.900 --> 00:05:17.100 and I pull to the right, NOTE Treffsikkerhet: 86% (H?Y) 00:05:17.100 --> 00:05:20.299 and I don't need that many NOTE Treffsikkerhet: 91% (H?Y) 00:05:20.299 --> 00:05:23.200 visible decimals. NOTE Treffsikkerhet: 81% (H?Y) 00:05:23.300 --> 00:05:33.850 And here I want this divided by the expected, pull is to the right. NOTE Treffsikkerhet: 91% (H?Y) 00:05:33.850 --> 00:05:41.500 I don't need to see all of these, actually I could have done this, let me just delete and do it again 00:05:41.500 --> 00:05:49.700 so that they're all pulled together, so it can only do it once this is observed minus NOTE Treffsikkerhet: 91% (H?Y) 00:05:50.300 --> 00:05:52.950 expected. NOTE Treffsikkerhet: 91% (H?Y) 00:05:52.950 --> 00:05:55.600 This is NOTE Treffsikkerhet: 91% (H?Y) 00:05:55.700 --> 00:06:02.200 the difference by itself, and this is NOTE Treffsikkerhet: 91% (H?Y) 00:06:02.200 --> 00:06:11.600 the squared deviation divided by the expectation. sSo I can collect all these by clicking and dragging, 00:06:11.600 --> 00:06:14.250 and just drag the whole thing NOTE Treffsikkerhet: 90% (H?Y) 00:06:14.250 --> 00:06:18.900 resulting in this same table. NOTE Treffsikkerhet: 82% (H?Y) 00:06:18.900 --> 00:06:22.700 And then I just need to add all these up, NOTE Treffsikkerhet: 81% (H?Y) 00:06:23.000 --> 00:06:34.200 sum and I click and drag the whole range so you see I sum everything from G7 to J7 and this is my 00:06:34.200 --> 00:06:35.799 chi Square. NOTE Treffsikkerhet: 91% (H?Y) 00:06:35.799 --> 00:06:45.500 Now I can look this up in the p-value calculator, the online app, with one degree of Freedom or if you 00:06:45.500 --> 00:06:52.550 have a sufficiently recent version of excel you can use the Excel function to get you NOTE Treffsikkerhet: 83% (H?Y) 00:06:52.550 --> 00:07:00.400 the correct probability from the chi-square distribution. And this is your chi-square value NOTE Treffsikkerhet: 89% (H?Y) 00:07:01.100 --> 00:07:08.550 with one degree of freedom, and we want acumulative probability so for all the area, NOTE Treffsikkerhet: 79% (H?Y) 00:07:08.550 --> 00:07:19.700 but actually we want not up to that value, but from that value onwards to the tail. So we want one minus 00:07:19.700 --> 00:07:28.150 this so 1 minus the cumulative probability for the chi-square distribution up to that value, which is 00:07:28.150 --> 00:07:29.750 this number. NOTE Treffsikkerhet: 77% (H?Y) 00:07:29.750 --> 00:07:33.000 Well that's alot of decimals, NOTE Treffsikkerhet: 75% (MEDIUM) 00:07:33.000 --> 00:07:37.500 and this should now be a familiar number. NOTE Treffsikkerhet: 90% (H?Y) 00:07:37.500 --> 00:07:44.700 So that's how you calculate the contingency table and the chi-square test of Association for this 00:07:44.700 --> 00:07:48.500 contingency table using Excel. NOTE Treffsikkerhet: 88% (H?Y) 00:07:48.500 --> 00:07:55.500 The advantage of using Excel to do these kinds of things is that you can very easily change your 00:07:55.500 --> 00:07:59.600 data and everything is updated automatically. NOTE Treffsikkerhet: 91% (H?Y) 00:07:59.700 --> 00:08:05.800 So this is the only place where we should intervene. NOTE Treffsikkerhet: 86% (H?Y) 00:08:11.300 --> 00:08:19.850 Let's try to duplicate the other example with the language screening where the relevant numbers were 00:08:19.850 --> 00:08:25.900 12,32, 85 and 90. NOTE Treffsikkerhet: 86% (H?Y) 00:08:25.900 --> 00:08:31.100 And now these values are teacher ratings NOTE Treffsikkerhet: 91% (H?Y) 00:08:34.600 --> 00:08:40.500 and these are the clinical classifications. NOTE Treffsikkerhet: 91% (H?Y) 00:08:42.000 --> 00:08:49.200 As you can see the whole analysis is immediately replicated based on the new accounts, and so now we 00:08:49.200 --> 00:08:57.200 can see how we can calculate our diagnostic indices starting with the useless accuracy, NOTE Treffsikkerhet: 91% (H?Y) 00:09:03.700 --> 00:09:09.650 positive predictive value, negative predictive value NOTE Treffsikkerhet: 91% (H?Y) 00:09:09.650 --> 00:09:25.800 and I have already aligned these to the right. So accuracy of this screening is the sum of these two, NOTE Treffsikkerhet: 84% (H?Y) 00:09:26.500 --> 00:09:33.900 the parentheses are very important because I want the whole sum to be divided by the grand total and 00:09:33.900 --> 00:09:38.550 not just the second number. So this is my accuracy, NOTE Treffsikkerhet: 86% (H?Y) 00:09:38.550 --> 00:09:50.900 and I can set all these numbers to be shown with two decimal places, my sensitivity is this divided 00:09:50.900 --> 00:09:52.600 by this, NOTE Treffsikkerhet: 85% (H?Y) 00:09:52.600 --> 00:09:59.150 my specificity is this divided by this, NOTE Treffsikkerhet: 74% (MEDIUM) 00:09:59.150 --> 00:10:12.950 my positive predictive value is this divided by this, and my negative predictive value is this divided by this. NOTE Treffsikkerhet: 90% (H?Y) 00:10:12.950 --> 00:10:22.800 So here are all my diagnostic indices in a very simple Excel sheet that will immediately update as 00:10:22.800 --> 00:10:25.900 soon as I enter new counts.