All language subtitles for 003 Rejection Region and Significance Level_en

Instructor: Hi again. So you know what a hypothesis is, and you have an idea of how to form the null and alternative hypotheses. By the end of this lesson, we will understand the reason why hypothesis testing works.

First, we must define the term significance level. Normally, we aim to reject the null if it is false, right? However, as with any test, there is a small chance that we could get it wrong and reject a null hypothesis that is true. The significance level is denoted by alpha and is the probability of rejecting the null hypothesis if it is true. So it is the probability of making this error.

Typical values for alpha are 0.01, 0.05, and 0.1. It is a value that you select based on the certainty you need. In most cases, the choice of alpha is determined by the context you are operating in, but 0.05 is the most commonly used value.

Let's explore an example.
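Before that example, a quick aside on the definition of alpha: if the null hypothesis is true and we test it many times, we should wrongly reject it in roughly a fraction alpha of the tests. The sketch below illustrates this by simulation. It is not part of the lesson. It borrows the two-sided Z test that the lesson introduces later, and the function name, the sample size of 30, the mean of 70, and the standard deviation of 10 are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean

def false_rejection_rate(alpha, trials=2000, n=30, mu0=70.0, sigma=10.0, seed=0):
    """Estimate how often a two-sided Z test rejects a null that is actually true.

    All defaults are hypothetical illustration values, not figures from the lesson.
    """
    rng = random.Random(seed)
    # Cutoff that leaves alpha / 2 in each tail of the standard normal.
    z_cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # The samples are drawn with true mean mu0, so the null is true here.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / standard_error
        if abs(z) > z_cutoff:
            rejections += 1
    return rejections / trials

for alpha in (0.01, 0.05, 0.1):
    print(alpha, false_rejection_rate(alpha))
```

Each printed rate should land near its alpha, which is the sense in which alpha is the probability of rejecting a true null.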
Say you need to test if a machine is working properly. You would expect the test to make few or no mistakes. As you wanna be very precise, you should pick a low significance level, such as 0.01. The famous Coca-Cola glass bottle is 12 ounces. If the machine pours 12.1 ounces, some of the liquid will be spilled and the label will be damaged as well. So in certain situations, we need to be as accurate as possible.

However, if we are analyzing humans or companies, we would expect more random or at least uncertain behavior, and hence a higher degree of error. For instance, if we wanna predict how much Coca-Cola its consumers drink, on average, the difference between 12 ounces and 12.1 ounces will not be that crucial. So we can choose a higher significance level, like 0.05 or 0.1.

Okay, now that we have an idea about the significance level, let's get to the mechanics of hypothesis testing. Imagine you were consulting a university and wanna carry out an analysis on how students are performing on average. The university dean believes that on average students have a GPA of 70%.
Being the data-driven researcher that you are, you can't simply agree with his opinion, so you start testing. The null hypothesis is: the population mean grade is 70%. This is a hypothesized value, and we denote it with mu zero. The alternative hypothesis is: the population mean grade is not 70%, so mu zero differs from 70%.

All right, assuming that the population of grades is normally distributed, all grades received by students should look this way. That is the true population mean.

Now, a test we would normally perform is the Z test. The formula is: Z equals the sample mean minus the hypothesized mean, divided by the standard error. The idea is the following: we are standardizing or scaling the sample mean we got. If the sample mean is close enough to the hypothesized mean, then Z will be close to zero. Otherwise, it will be far away from it. Naturally, if the sample mean is exactly equal to the hypothesized mean, Z will be zero. When Z is zero or close to zero, we would accept the null hypothesis.

Okay, the question here is the following: how big should Z be for us to reject the null hypothesis? Well, there is a cutoff line.
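The Z formula just described can be written as a short function. This is a minimal sketch, assuming the population standard deviation is known so that the standard error is the standard deviation divided by the square root of the sample size. The function name and the numbers (a sample mean of 72.5 from 64 students, a standard deviation of 10) are hypothetical and do not come from the lesson.

```python
from math import sqrt

def z_statistic(sample_mean, hypothesized_mean, population_std, n):
    """Z = (sample mean - hypothesized mean) / standard error."""
    # Assumption: known population standard deviation, so SE = sigma / sqrt(n).
    standard_error = population_std / sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical data for the GPA example: mu zero is 70.
z = z_statistic(sample_mean=72.5, hypothesized_mean=70.0, population_std=10.0, n=64)
print(z)  # 2.0

# A sample mean exactly equal to the hypothesized mean gives Z = 0.
print(z_statistic(70.0, 70.0, 10.0, 64))  # 0.0
```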
Since we are conducting a two-sided or two-tailed test, there are two cutoff lines, one on each side. When we calculate Z, we will get a value. If this value falls into the middle part, then we cannot reject the null. If it falls outside, in the shaded region, then we reject the null hypothesis. That is why the shaded part is called the rejection region.

All right, the area that is cut off actually depends on the significance level. Say the level of significance, alpha, is 0.05. Then we have alpha divided by 2, or 0.025, on the left side and 0.025 on the right side. Now, these are values we can check from the Z table. When alpha divided by 2 is 0.025, Z is 1.96, so 1.96 on the right side and -1.96 on the left side. Therefore, if the value we get for Z from the test is lower than -1.96 or higher than 1.96, we will reject the null hypothesis. Otherwise, we will accept it.

That's more or less how hypothesis testing works. We scale the sample mean with respect to the hypothesized value. If Z is close to zero, then we cannot reject the null. If it is far away from zero, then we reject the null hypothesis.
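The two-sided rule above can be sketched in a few lines. This is only an illustration: it uses Python's `statistics.NormalDist` in place of a printed Z table, and the function names and the test values 2.5 and 0.8 are hypothetical.

```python
from statistics import NormalDist

def two_sided_cutoffs(alpha):
    """Return the (lower, upper) cutoff lines that leave alpha / 2 in each tail."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper

def reject_two_sided(z, alpha=0.05):
    """True if z falls in the rejection region of a two-sided test."""
    lower, upper = two_sided_cutoffs(alpha)
    return z < lower or z > upper

lower, upper = two_sided_cutoffs(0.05)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96

print(reject_two_sided(2.5))  # True: beyond the right cutoff
print(reject_two_sided(0.8))  # False: in the middle part
```

Lowering alpha to 0.01 moves both cutoffs further from zero, so the same Z value is less likely to fall in the rejection region.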
Okay, what about one-sided tests? We have those too. Let's take the example from last lecture. Paul says, "Data scientists earn more than $125,000." So H zero is: mu zero is bigger than or equal to $125,000. The alternative is that mu zero is lower than $125,000.

Using the same level of significance, this time the whole rejection region is on the left, so the rejection region has an area of alpha. Looking at the Z table, that corresponds to a Z score of 1.645, and since it is on the left, it is with a minus sign. Now, when calculating our test statistic Z, if we get a value lower than -1.645, we would reject the null hypothesis, as we have statistical evidence that the data scientist salary is less than $125,000. Otherwise, we would accept it.

All right, to exhaust all possibilities, let's explore another one-tailed test. Say the university dean told you that the average GPA students get is lower than 70%. In that case, the null hypothesis is: mu zero is lower than or equal to 70%, while the alternative is: mu zero is bigger than 70%.
In this situation, the rejection region is on the right side, so if the test statistic is bigger than the cutoff Z score, we would reject the null. Otherwise, we wouldn't.

Cool. That's all for now. In a lesson or two, we'll start testing. Just hold on a bit and thanks for watching.
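The two one-sided cases from this lesson can be summarized in one sketch. As before, this is an illustration under assumptions: `statistics.NormalDist` stands in for the Z table, and the function names, the `tail` argument, and the example Z values are hypothetical.

```python
from statistics import NormalDist

def one_sided_cutoff(alpha, tail):
    """Cutoff Z score when the whole rejection region (area alpha) is in one tail."""
    if tail == "left":
        return NormalDist().inv_cdf(alpha)
    if tail == "right":
        return NormalDist().inv_cdf(1 - alpha)
    raise ValueError("tail must be 'left' or 'right'")

def reject_one_sided(z, alpha, tail):
    """True if z falls in the rejection region of a one-sided test."""
    cutoff = one_sided_cutoff(alpha, tail)
    return z < cutoff if tail == "left" else z > cutoff

# Salary example: rejection region on the left.
print(round(one_sided_cutoff(0.05, "left"), 3))   # -1.645
print(reject_one_sided(-2.0, 0.05, "left"))       # True

# GPA example with the "lower than or equal to 70%" null: rejection region on the right.
print(round(one_sided_cutoff(0.05, "right"), 3))  # 1.645
print(reject_one_sided(1.2, 0.05, "right"))       # False
```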
