1
00:00:00,930 --> 00:00:01,980
All right.
2
00:00:01,980 --> 00:00:04,200
Now that we've covered the necessary theory,
3
00:00:04,200 --> 00:00:05,823
it is time for some testing.
4
00:00:06,990 --> 00:00:09,570
We're going to explore two types of tests:
5
00:00:09,570 --> 00:00:11,310
tests on a sample drawn from a single population
6
00:00:11,310 --> 00:00:13,413
and tests on samples drawn from multiple populations.
7
00:00:14,430 --> 00:00:16,470
This is very similar to confidence intervals
8
00:00:16,470 --> 00:00:18,000
for a single population,
9
00:00:18,000 --> 00:00:20,250
and confidence intervals for two populations
10
00:00:20,250 --> 00:00:21,603
that we covered previously.
11
00:00:22,860 --> 00:00:26,340
In the next few videos, we will run tests for a single mean
12
00:00:26,340 --> 00:00:29,223
with both known variance and unknown variance.
13
00:00:30,810 --> 00:00:31,830
Let's start with a test
14
00:00:31,830 --> 00:00:34,083
in which the variance is known, shall we?
15
00:00:35,580 --> 00:00:36,420
For this test,
16
00:00:36,420 --> 00:00:39,663
we will use our good old data scientist salary example.
17
00:00:40,710 --> 00:00:42,663
Here's the data set one more time.
18
00:00:43,680 --> 00:00:46,683
By now, I hope you are able to calculate the sample mean.
19
00:00:48,210 --> 00:00:51,097
It is $100,200.
20
00:00:52,110 --> 00:00:54,150
The population variance is known
21
00:00:54,150 --> 00:00:57,933
and its standard deviation is equal to $15,000.
22
00:00:58,950 --> 00:01:01,803
Moreover, the sample size is 30.
23
00:01:03,300 --> 00:01:06,480
However, you saw that according to Glassdoor,
24
00:01:06,480 --> 00:01:08,820
the popular salary information website,
25
00:01:08,820 --> 00:01:12,693
the mean data scientist salary is $113,000.
26
00:01:13,620 --> 00:01:15,660
The sample that is available on Glassdoor
27
00:01:15,660 --> 00:01:17,760
is based on self-reported numbers,
28
00:01:17,760 --> 00:01:20,910
and you would like to see if its value is correct.
29
00:01:20,910 --> 00:01:22,860
We need a two-sided test,
30
00:01:22,860 --> 00:01:24,090
as we are interested in knowing
31
00:01:24,090 --> 00:01:27,090
whether the salary is significantly less than that
32
00:01:27,090 --> 00:01:29,013
or significantly more than that.
33
00:01:30,840 --> 00:01:34,380
The null hypothesis is that the population mean salary
34
00:01:34,380 --> 00:01:36,603
is $113,000.
35
00:01:38,190 --> 00:01:43,190
We denote it as mu zero equals $113,000.
36
00:01:44,520 --> 00:01:46,260
The alternative hypothesis
37
00:01:46,260 --> 00:01:48,180
is that the population mean salary
38
00:01:48,180 --> 00:01:50,763
is different from $113,000.
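Written out, the two hypotheses just stated look like this. The labels H_0 and H_1 are an assumed notation; the lesson itself only names mu zero.

```latex
H_0:\ \mu_0 = 113{,}000
\qquad
H_1:\ \mu_0 \neq 113{,}000
```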
39
00:01:53,250 --> 00:01:54,090
All right.
40
00:01:54,090 --> 00:01:56,193
Formula time, almost.
41
00:01:57,060 --> 00:02:00,120
Testing is done by standardizing the variable at hand
42
00:02:00,120 --> 00:02:02,580
and comparing it to the z,
43
00:02:02,580 --> 00:02:05,013
which follows a standard normal distribution.
44
00:02:06,120 --> 00:02:08,070
Remember standardization?
45
00:02:08,070 --> 00:02:10,350
We learned about it in the previous section.
46
00:02:10,350 --> 00:02:12,960
Back then, I told you it was very important
47
00:02:12,960 --> 00:02:14,613
and you will now see why.
48
00:02:15,720 --> 00:02:17,160
For those who don't remember,
49
00:02:17,160 --> 00:02:18,480
I suggest watching the video
50
00:02:18,480 --> 00:02:20,700
on standardization once again.
51
00:02:20,700 --> 00:02:22,983
For the others, I will quickly go through it.
52
00:02:24,180 --> 00:02:27,030
We standardize a variable by subtracting the mean
53
00:02:27,030 --> 00:02:29,373
and dividing by the standard deviation.
54
00:02:30,420 --> 00:02:33,303
Since it is a sample, we use the standard error.
55
00:02:34,230 --> 00:02:37,420
Thus, the formula for standardization becomes:
56
00:02:38,740 --> 00:02:41,850
Z is equal to the sample mean
57
00:02:41,850 --> 00:02:45,660
minus the value of interest from the null hypothesis
58
00:02:45,660 --> 00:02:47,823
divided by the standard error.
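As a sketch, the formula just described can be written as follows. The symbols x-bar, sigma and n are assumed notation, and the standard error is taken to be sigma over the square root of n, which fits the known-variance case discussed here.

```latex
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```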
59
00:02:50,070 --> 00:02:52,380
In this way, we obtain a distribution
60
00:02:52,380 --> 00:02:55,623
with a mean of 0 and a standard deviation of 1.
61
00:02:56,910 --> 00:03:00,753
This capital Z should not be confused with the lowercase z.
62
00:03:02,460 --> 00:03:05,100
The Z is the standardized variable
63
00:03:05,100 --> 00:03:07,050
associated with the test
64
00:03:07,050 --> 00:03:09,393
and will be called the Z-score from now on.
65
00:03:11,070 --> 00:03:13,080
The z is the one from the table
66
00:03:13,080 --> 00:03:14,790
that we've talked about before,
67
00:03:14,790 --> 00:03:18,213
and henceforth will be referred to as the critical value.
68
00:03:20,190 --> 00:03:22,710
All right, how does testing work?
69
00:03:22,710 --> 00:03:23,703
Think about this.
70
00:03:24,660 --> 00:03:26,730
The z is normally distributed
71
00:03:26,730 --> 00:03:29,340
with a mean of 0 and a standard deviation of 1.
72
00:03:29,340 --> 00:03:31,320
The Z is normally distributed
73
00:03:31,320 --> 00:03:34,110
with a mean of mu minus mu zero over the standard error
74
00:03:34,110 --> 00:03:35,930
and a standard deviation of 1.
75
00:03:38,490 --> 00:03:41,250
Standardization lets us compare the means.
76
00:03:41,250 --> 00:03:45,210
The closer the difference of X bar and mu zero is to zero,
77
00:03:45,210 --> 00:03:47,583
the closer the Z-score itself is to zero.
78
00:03:48,870 --> 00:03:52,383
This implies a higher chance of not rejecting the null hypothesis.
79
00:03:53,670 --> 00:03:55,383
Let's go back to the example.
80
00:03:56,340 --> 00:04:00,120
So what is the value of our standardized variable?
81
00:04:00,120 --> 00:04:01,710
We plug in the numbers that we have
82
00:04:01,710 --> 00:04:03,360
from the beginning of the lesson.
83
00:04:04,260 --> 00:04:07,863
What we get is a Z-score of minus 4.67.
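A minimal sketch of the calculation, using only the figures quoted in the lesson. The variable names are made up for illustration, and the standard error is assumed to be the population standard deviation divided by the square root of the sample size.

```python
import math

# Figures quoted in the lesson
sample_mean = 100_200   # sample mean salary, in dollars
mu_0 = 113_000          # hypothesized mean from the null hypothesis
sigma = 15_000          # known population standard deviation
n = 30                  # sample size

# Standard error of the mean when the population variance is known
standard_error = sigma / math.sqrt(n)

# Standardize: (sample mean - hypothesized mean) / standard error
z_score = (sample_mean - mu_0) / standard_error

print(f"standard error = {standard_error:.2f}")
print(f"Z-score = {z_score:.2f}")
```

The printed Z-score rounds to the minus 4.67 quoted in the lesson.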
84
00:04:08,970 --> 00:04:11,160
Now, we will compare the absolute value
85
00:04:11,160 --> 00:04:16,079
of minus 4.67 with a z of alpha divided by 2,
86
00:04:16,079 --> 00:04:18,183
where alpha is the significance level.
87
00:04:19,110 --> 00:04:21,000
Note that we use the absolute value
88
00:04:21,000 --> 00:04:24,330
as it is much easier to always compare positive Z's
89
00:04:24,330 --> 00:04:26,073
with positive z's.
90
00:04:27,270 --> 00:04:31,290
Moreover, some z tables don't include negative values.
91
00:04:31,290 --> 00:04:33,360
You should be aware that the statement
92
00:04:33,360 --> 00:04:37,620
minus 4.67 is lower than the negative critical value,
93
00:04:37,620 --> 00:04:40,830
is the same as 4.67 is higher
94
00:04:40,830 --> 00:04:42,633
than the positive critical value.
95
00:04:43,620 --> 00:04:45,960
Thus, our decision rule becomes: reject the null hypothesis if the
96
00:04:45,960 --> 00:04:47,610
absolute value of the Z-score
97
00:04:47,610 --> 00:04:49,530
is higher than the absolute value
98
00:04:49,530 --> 00:04:50,630
of the critical value.
99
00:04:52,140 --> 00:04:53,790
Using 5% significance,
100
00:04:53,790 --> 00:04:56,550
our alpha is 0.05.
101
00:04:56,550 --> 00:04:58,350
Since it is a two-sided test,
102
00:04:58,350 --> 00:05:01,743
we check the table for z of 0.025.
103
00:05:02,880 --> 00:05:05,643
The corresponding value is 1.96.
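Instead of reading the table, the same critical value can be looked up in code. This is a sketch that uses `statistics.NormalDist` from the Python standard library; the lesson itself works from a printed z table.

```python
from statistics import NormalDist

alpha = 0.05  # 5% significance level

# Two-sided test: alpha / 2 goes in each tail of the standard normal,
# so the critical value is the quantile at 1 - alpha / 2.
critical_value = NormalDist().inv_cdf(1 - alpha / 2)

print(f"critical value = {critical_value:.2f}")
```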
104
00:05:07,200 --> 00:05:09,000
The last thing we need to do is compare
105
00:05:09,000 --> 00:05:11,553
our standardized variable to the critical value.
106
00:05:12,450 --> 00:05:15,180
If the absolute value of the Z-score is higher than 1.96,
107
00:05:15,180 --> 00:05:17,940
we would reject the null hypothesis.
108
00:05:17,940 --> 00:05:20,073
If it is lower, we will not reject it.
109
00:05:21,630 --> 00:05:25,020
4.67 is higher than 1.96,
110
00:05:25,020 --> 00:05:27,333
therefore, we reject the null hypothesis.
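The whole decision rule fits in a few lines. This is only a sketch: the function name `reject_null_two_sided` and its parameters are hypothetical, and it assumes a two-sided test with known population standard deviation, as in this example.

```python
import math
from statistics import NormalDist


def reject_null_two_sided(sample_mean, mu_0, sigma, n, alpha):
    """Return True when |Z| exceeds the two-sided critical value."""
    z_score = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_score) > critical_value


# Salary example from the lesson at 5% significance
decision = reject_null_two_sided(100_200, 113_000, 15_000, 30, alpha=0.05)
print("reject the null hypothesis" if decision else "do not reject the null hypothesis")
```

Comparing absolute values keeps a single rule for both tails, which matches the reasoning given earlier about positive and negative critical values.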
111
00:05:28,500 --> 00:05:31,590
The answer is that at the 5% significance level,
112
00:05:31,590 --> 00:05:34,290
we have rejected the null hypothesis,
113
00:05:34,290 --> 00:05:36,240
or at 5% significance,
114
00:05:36,240 --> 00:05:37,680
there is statistical evidence
115
00:05:37,680 --> 00:05:40,623
that the mean salary is not $113,000.
116
00:05:42,600 --> 00:05:44,640
There are many other ways to express this
117
00:05:44,640 --> 00:05:46,500
and you'll probably hear more about this
118
00:05:46,500 --> 00:05:47,793
later on in the course.
119
00:05:50,070 --> 00:05:52,770
What if we had a different significance level?
120
00:05:52,770 --> 00:05:54,270
Using 1% significance,
121
00:05:54,270 --> 00:05:57,240
we have an alpha of 0.01.
122
00:05:57,240 --> 00:06:01,173
So z of alpha divided by 2 is 2.58.
123
00:06:02,100 --> 00:06:07,100
Once again, the absolute value of our Z-score, 4.67, is higher than 2.58,
124
00:06:07,710 --> 00:06:09,930
so we would reject the null hypothesis
125
00:06:09,930 --> 00:06:11,853
even at 1% significance.
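To see both significance levels side by side, here is a short sketch that repeats the comparison for alpha of 0.05 and 0.01. The `results` dictionary is just an illustrative container, not part of the lesson.

```python
import math
from statistics import NormalDist

# Z-score for the salary example
z_score = (100_200 - 113_000) / (15_000 / math.sqrt(30))

results = {}
for alpha in (0.05, 0.01):
    # Two-sided critical value for this significance level
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z_score) > critical_value
    results[alpha] = (round(critical_value, 2), reject)
    print(f"alpha = {alpha}: critical value = {critical_value:.2f}, reject = {reject}")
```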
126
00:06:13,650 --> 00:06:15,030
But how much further can we go
127
00:06:15,030 --> 00:06:18,298
before we can no longer reject the null hypothesis?
128
00:06:18,298 --> 00:06:19,131
0.5%?
129
00:06:20,073 --> 00:06:20,906
0.1%?
130
00:06:21,900 --> 00:06:23,820
There is a special technique that allows us
131
00:06:23,820 --> 00:06:26,100
to see what the significance level is,
132
00:06:26,100 --> 00:06:29,493
after which we will be unable to reject the null hypothesis.
133
00:06:30,330 --> 00:06:33,240
We will see it in our next video.
134
00:06:33,240 --> 00:06:34,073
Stay tuned.