0
00:00:00,000 --> 00:00:01,417
PETER REDDIEN: Hypothesis testing.
1
00:00:01,417 --> 00:00:08,130
2
00:00:08,130 --> 00:00:16,730
We have a null hypothesis, our model,
3
00:00:16,730 --> 00:00:18,145
which is our hypothesis 3.
4
00:00:18,145 --> 00:00:21,570
5
00:00:21,570 --> 00:00:26,040
And what we want to know is: can we reject it?
6
00:00:26,040 --> 00:00:35,200
7
00:00:35,200 --> 00:00:38,140
Or, are our data inconsistent with hypothesis 3?
8
00:00:38,140 --> 00:00:42,160
9
00:00:42,160 --> 00:00:44,170
And the statistical test we'll use here
10
00:00:44,170 --> 00:00:49,960
is a chi-square test, where what we'll
11
00:00:49,960 --> 00:00:55,300
do is, for all of our classes,
12
00:00:55,300 --> 00:01:01,390
sum up the observed data
13
00:01:01,390 --> 00:01:04,989
minus the expected, squared,
14
00:01:04,989 --> 00:01:07,630
15
00:01:07,630 --> 00:01:08,410
over the expected.
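Written as a formula, the statistic just described sums over every class. The symbols here are our own notation, not the lecture's: k is the number of classes, O_i the observed count in class i, and E_i the count expected in class i under the hypothesis.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```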
16
00:01:08,410 --> 00:01:20,050
17
00:01:20,050 --> 00:01:22,450
So what did we observe and expect?
18
00:01:22,450 --> 00:01:37,880
Well, say we have paralyzed flies and not-paralyzed flies,
19
00:01:37,880 --> 00:01:40,760
and then we have some observations,
20
00:01:40,760 --> 00:01:44,420
and some expectations.
21
00:01:44,420 --> 00:01:50,570
So what we observed was 4 paralyzed and 12 not
22
00:01:50,570 --> 00:01:55,160
paralyzed, and we expected 7 and 9, respectively.
23
00:01:55,160 --> 00:01:58,700
24
00:01:58,700 --> 00:02:04,200
So we have two classes and their observed and expected counts,
25
00:02:04,200 --> 00:02:08,039
so now we can calculate this chi-square value.
26
00:02:08,039 --> 00:02:20,885
So it will be 7 minus 4, squared, over 7.
27
00:02:20,885 --> 00:02:25,960
28
00:02:25,960 --> 00:02:28,680
Well, I'll do it the other way.
29
00:02:28,680 --> 00:02:29,680
It doesn't matter, since the difference gets squared, but--
30
00:02:29,680 --> 00:02:35,450
31
00:02:35,450 --> 00:02:41,690
and then 12 minus 9, squared, over 9.
32
00:02:41,690 --> 00:02:46,920
33
00:02:46,920 --> 00:02:53,040
We'll get some value for this, which is 2.28.
34
00:02:53,040 --> 00:02:56,040
And that is our chi-square value.
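As a quick check on the arithmetic above, here is a minimal sketch that computes the statistic from the fly counts. The function name `chi_square_statistic` is hypothetical, chosen for illustration. The exact value is 9/7 + 9/9 = 16/7, about 2.2857, which the lecture quotes as 2.28.

```python
# Observed and expected counts from the fly example
# (class order: paralyzed, not paralyzed).
observed = [4, 12]
expected = [7, 9]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.4f}")  # prints chi-square = 2.2857 (that is, 16/7)
```

Because each difference is squared, writing 7 minus 4 or 4 minus 7 gives the same term, which is why the order in the lecture "doesn't matter".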
35
00:02:56,040 --> 00:03:03,300
And now, with this chi-square value, we can ask
36
00:03:03,300 --> 00:03:07,230
how likely it would be to get a chi-square value of 2.28 or more
37
00:03:07,230 --> 00:03:09,220
if our hypothesis were right.
38
00:03:09,220 --> 00:03:13,620
And we can look at a lookup table, where
39
00:03:13,620 --> 00:03:18,990
we can see the probability of getting a chi-square value
40
00:03:18,990 --> 00:03:23,820
of 2.28 or more under the hypothesis.
41
00:03:23,820 --> 00:03:26,370
So let's just look at this first row for now.
42
00:03:26,370 --> 00:03:31,710
We see that 2.28 falls somewhere between the 0.05 and 0.2 columns,
43
00:03:31,710 --> 00:03:34,920
but we do have to consider this y-axis here,
44
00:03:34,920 --> 00:03:38,640
these other rows, which are the degrees of freedom.
45
00:03:38,640 --> 00:03:42,700
46
00:03:42,700 --> 00:03:54,960
So the degrees of freedom, or df,
47
00:03:54,960 --> 00:03:58,140
is going to be equal to the number of classes minus 1.
48
00:03:58,140 --> 00:04:05,900
49
00:04:05,900 --> 00:04:07,980
It refers to the number of independent,
50
00:04:07,980 --> 00:04:11,470
normally distributed variables.
51
00:04:11,470 --> 00:04:14,820
And because, if we know the count in one class and the total,
52
00:04:14,820 --> 00:04:17,160
we can calculate the other class,
53
00:04:17,160 --> 00:04:20,769
we have only one independent, normally distributed variable.
54
00:04:20,769 --> 00:04:22,170
So that's why you subtract 1.
55
00:04:22,170 --> 00:04:25,530
So here we have two classes.
56
00:04:25,530 --> 00:04:28,320
If we had three classes, we'd have two degrees of freedom.
57
00:04:28,320 --> 00:04:31,687
If we have two classes, we have one degree of freedom.
58
00:04:31,687 --> 00:04:33,270
So we have one degree of freedom here.
59
00:04:33,270 --> 00:04:36,310
60
00:04:36,310 --> 00:04:40,700
So our degrees of freedom is 1.
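The counting rule above can be sketched in a few lines. The helper `degrees_of_freedom` is a hypothetical name, and the rule "classes minus 1" assumes, as in this example, that the expected counts are fixed in advance by the hypothesis rather than estimated from the data. The second part shows why one class is not free: once the total of 16 flies is fixed, the paralyzed count determines the other class.

```python
def degrees_of_freedom(n_classes):
    """Goodness-of-fit df when expected counts come from the hypothesis: classes - 1."""
    return n_classes - 1


# With the total fixed, one class count determines the other.
total = 16                          # 4 paralyzed + 12 not paralyzed
paralyzed = 4
not_paralyzed = total - paralyzed   # not free to vary: must be 12

print(degrees_of_freedom(2))  # two classes -> 1
print(degrees_of_freedom(3))  # three classes -> 2
print(not_paralyzed)          # 12
```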
61
00:04:40,700 --> 00:04:44,100
62
00:04:44,100 --> 00:04:46,510
So now, looking at our chi-square table,
63
00:04:46,510 --> 00:04:51,090
we see that our probability of getting a chi-square value this large or larger
64
00:04:51,090 --> 00:04:57,100
is greater than 0.05 and less than 0.2.
65
00:04:57,100 --> 00:05:05,620
So our p-value is greater than 0.05 and less than 0.2.
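The table lookup for df = 1 can also be checked numerically. This is a sketch under stated assumptions, not the lecture's method: it relies on the standard fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x / 2)). The helper name `chi2_sf_df1` is hypothetical, and the two critical values (1.642 for 0.20 and 3.841 for 0.05) are standard df = 1 table entries.

```python
import math

# Chi-square statistic from the fly example: 9/7 + 9/9 = 16/7.
statistic = (4 - 7) ** 2 / 7 + (12 - 9) ** 2 / 9


def chi2_sf_df1(x):
    """P(X >= x) for a chi-square variable with 1 degree of freedom.

    With 1 df, X = Z**2 for a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# Two columns of a standard chi-square table, row df = 1
# (critical value -> upper-tail probability).
table_row_df1 = {1.642: 0.20, 3.841: 0.05}

p_value = chi2_sf_df1(statistic)
print(f"p = {p_value:.3f}")
for critical, prob in sorted(table_row_df1.items()):
    print(f"exceeds {critical} (p = {prob})? {statistic > critical}")
```

The statistic exceeds the 0.20 critical value but not the 0.05 one, which is exactly the "between 0.05 and 0.2" bracket; the computed p-value is about 0.13. If SciPy is installed, `scipy.stats.chi2.sf(statistic, 1)` should give the same tail probability and also handles other degrees of freedom.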
66
00:05:05,620 --> 00:05:08,340
So we have a greater than 1 in 20 chance,
67
00:05:08,340 --> 00:05:11,040
if we do this experiment, of getting
68
00:05:11,040 --> 00:05:16,900
data that deviates this far or more from what we expected.
69
00:05:16,900 --> 00:05:19,320
And it's a less than 1 in 5 chance.
70
00:05:19,320 --> 00:05:21,690
So that, in some sense, is our answer
71
00:05:21,690 --> 00:05:24,330
to what the probability is of getting data like this
72
00:05:24,330 --> 00:05:26,100
under hypothesis 3.
73
00:05:26,100 --> 00:05:27,540
Now, can we reject it?
74
00:05:27,540 --> 00:05:37,090
75
00:05:37,090 --> 00:05:41,315
By convention, you need a p-value of less than 0.05
76
00:05:41,315 --> 00:05:41,815
to reject.
77
00:05:41,815 --> 00:05:47,860
78
00:05:47,860 --> 00:05:50,920
Now, that's just a convention. It doesn't necessarily mean
79
00:05:50,920 --> 00:05:52,925
that the hypothesis is wrong. It just
80
00:05:52,925 --> 00:05:54,550
means that you would expect to get data
81
00:05:54,550 --> 00:05:58,610
like this with a probability of less than 1 in 20 if the hypothesis were true.
82
00:05:58,610 --> 00:06:02,410
By convention, that is a threshold for rejection.
83
00:06:02,410 --> 00:06:05,590
You can set different thresholds, but anyway--
84
00:06:05,590 --> 00:06:09,400
so what do we conclude from this, from our data?
85
00:06:09,400 --> 00:06:10,750
Can we reject the hypothesis?
86
00:06:10,750 --> 00:06:12,550
We cannot.
87
00:06:12,550 --> 00:06:25,620
So, based on our data, we cannot reject hypothesis 3.
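The decision rule can be written as a comparison of the p-value with a chosen threshold. In this sketch the function name `chi_square_decision` is hypothetical, and it only covers the two-class (df = 1) case used here. The second call uses a looser, non-conventional threshold of 0.20 purely to show that the conclusion depends on the threshold you choose.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def chi_square_decision(observed, expected, alpha):
    """Two-class chi-square test: returns (reject?, p-value)."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = chi2_sf_df1(statistic)
    return p_value < alpha, p_value


# Conventional threshold: p must be below 0.05 to reject.
reject_05, p = chi_square_decision([4, 12], [7, 9], alpha=0.05)
# A looser, non-conventional threshold, to show the choice matters.
reject_20, _ = chi_square_decision([4, 12], [7, 9], alpha=0.20)

print(f"p = {p:.3f}, reject at 0.05: {reject_05}, reject at 0.20: {reject_20}")
```

At the conventional 0.05 level the result is "do not reject", matching the conclusion in the lecture.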
88
00:06:25,620 --> 00:06:27,900
You could go through some practice
89
00:06:27,900 --> 00:06:29,850
yourself by making the sample sizes bigger
90
00:06:29,850 --> 00:06:32,350
and figuring out what would happen.
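One way to try the suggested exercise is sketched below. It assumes a hypothetical scenario in which both the observed and expected counts are multiplied by the same factor, so the proportions stay those of the fly data; the helper `scaled_test` is our own name. Because every term of the statistic scales linearly with the counts, the chi-square value grows in proportion to the sample size.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def scaled_test(scale, alpha=0.05):
    # Hypothetical exercise: multiply both observed and expected counts
    # by `scale`, keeping the same proportions as the fly data.
    observed = [4 * scale, 12 * scale]
    expected = [7 * scale, 9 * scale]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = chi2_sf_df1(statistic)
    return statistic, p_value, p_value < alpha


for scale in (1, 2, 4):
    statistic, p_value, reject = scaled_test(scale)
    print(f"n = {16 * scale:3d}  chi-square = {statistic:.2f}  p = {p_value:.4f}  reject: {reject}")
```

With 16 flies the deviation is not significant, but the same proportional deviation with 32 flies gives a statistic of 32/7, about 4.57, which exceeds 3.841 and so would be rejected at the 0.05 level.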
91
00:06:32,350 --> 00:06:34,920
And in general, with experiments, one wants to think about--
92
00:06:34,920 --> 00:06:36,782
and we won't do a lot of this here--
93
00:06:36,782 --> 00:06:39,240
the sample size you have. If you have two hypotheses you're
94
00:06:39,240 --> 00:06:41,010
trying to decide between, you want
95
00:06:41,010 --> 00:06:44,010
to do something called power analysis to figure out
96
00:06:44,010 --> 00:06:47,500
what sample size you would need to distinguish between them.
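The lecture doesn't work through a power analysis, so the following is only one common approach, a Monte Carlo simulation, and not the method the lecture has in mind. It assumes a hypothetical alternative hypothesis in which a quarter of the flies are truly paralyzed (`P_ALT`, an illustrative value), tests each simulated experiment against hypothesis 3's expected fraction of 7/16, and reports the fraction of experiments that reject at the 0.05 level. That fraction estimates the power at each sample size. The function names are hypothetical; analytic power calculations based on the noncentral chi-square distribution are an alternative.

```python
import math
import random


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def estimated_power(n, p_null, p_alt, alpha=0.05, trials=2000, seed=0):
    """Monte Carlo estimate of power for a two-class chi-square test.

    Simulates `trials` experiments of n flies in which the true fraction
    paralyzed is p_alt, tests each against the null fraction p_null,
    and returns the fraction of experiments in which the null is rejected.
    """
    rng = random.Random(seed)
    expected = [n * p_null, n * (1 - p_null)]
    rejections = 0
    for _ in range(trials):
        paralyzed = sum(rng.random() < p_alt for _ in range(n))
        observed = [paralyzed, n - paralyzed]
        statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        if chi2_sf_df1(statistic) < alpha:
            rejections += 1
    return rejections / trials


P_NULL = 7 / 16  # hypothesis 3: 7 of 16 flies expected to be paralyzed
P_ALT = 1 / 4    # hypothetical alternative hypothesis, for illustration only

for n in (16, 50, 100):
    print(n, estimated_power(n, P_NULL, P_ALT))
```

A typical use is to increase n until the estimated power reaches a target you choose in advance, such as 0.8.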
97
00:06:47,500 --> 00:06:48,000