003 Rejection Region and Significance Level

Instructor: Hi again. So you know what a hypothesis is, and you have an idea of how to form the null and alternative hypotheses. By the end of this lesson, we will understand the reason why hypothesis testing works.

First, we must define the term significance level. Normally, we aim to reject the null if it is false, right? However, as with any test, there is a small chance that we could get it wrong and reject a null hypothesis that is true. The significance level is denoted by alpha and is the probability of rejecting the null hypothesis if it is true, so it is the probability of making this error.

Typical values for alpha are 0.01, 0.05, and 0.1. It is a value that you select based on the certainty you need. In most cases, the choice of alpha is determined by the context you are operating in, but 0.05 is the most commonly used value.

Let's explore an example.
Say you need to test if a machine is working properly. You would expect the test to make few or no mistakes. As you want to be very precise, you should pick a low significance level, such as 0.01. The famous Coca-Cola glass bottle is 12 ounces. If the machine pours 12.1 ounces, some of the liquid will be spilled and the label will be damaged as well. So in certain situations, we need to be as accurate as possible.

However, if we are analyzing humans or companies, we would expect more random, or at least uncertain, behavior and hence a higher degree of error. For instance, if we want to predict how much Coca-Cola its consumers drink on average, the difference between 12 ounces and 12.1 ounces will not be that crucial. So we can choose a higher significance level, like 0.05 or 0.1.

Okay, now that we have an idea about the significance level, let's get to the mechanics of hypothesis testing.

Imagine you are consulting a university and want to carry out an analysis on how students are performing on average. The university dean believes that on average students have a GPA of 70%.
Being the data-driven researcher that you are, you can't simply agree with his opinion, so you start testing.

The null hypothesis is that the population mean grade is 70%. This is a hypothesized value, and we denote it with mu zero. The alternative hypothesis is that the population mean grade is not 70%, so the population mean differs from 70%.

All right, assuming that the population of grades is normally distributed, all grades received by students should look this way. That is the true population mean.

Now, a test we would normally perform is the Z test. The formula is: Z equals the sample mean minus the hypothesized mean, divided by the standard error. The idea is the following: we are standardizing, or scaling, the sample mean we got. If the sample mean is close enough to the hypothesized mean, then Z will be close to zero. Otherwise, it will be far away from it. Naturally, if the sample mean is exactly equal to the hypothesized mean, Z will be zero. In all these cases, we would not reject the null hypothesis.

Okay, the question here is the following: how big should Z be for us to reject the null hypothesis? Well, there is a cutoff line.
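
Before turning to the cutoff, the Z formula described above can be sketched in code. This is a minimal illustration, not part of the lesson: it assumes the standard error is the population standard deviation divided by the square root of the sample size, and the function name `z_statistic` and all the numbers (a sample mean of 72%, a standard deviation of 10, a sample of 100 students) are hypothetical.

```python
import math


def z_statistic(sample_mean, hypothesized_mean, population_std, sample_size):
    """Standardize the sample mean relative to the hypothesized mean (mu zero)."""
    # Assumption: standard error = population standard deviation / sqrt(n).
    standard_error = population_std / math.sqrt(sample_size)
    return (sample_mean - hypothesized_mean) / standard_error


# Hypothetical numbers, chosen only for illustration.
z_equal = z_statistic(70.0, 70.0, 10.0, 100)   # sample mean equals mu zero
z_higher = z_statistic(72.0, 70.0, 10.0, 100)  # sample mean above mu zero

print(z_equal)   # 0.0
print(z_higher)  # 2.0
```

When the sample mean equals the hypothesized mean, Z is zero; the further the sample mean moves from 70%, the further Z moves from zero.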
Since we are conducting a two-sided, or two-tailed, test, there are two cutoff lines, one on each side. When we calculate Z, we will get a value. If this value falls into the middle part, then we cannot reject the null. If it falls outside, in the shaded region, then we reject the null hypothesis. That is why the shaded part is called the rejection region.

All right, the area that is cut off actually depends on the significance level. Say the level of significance, alpha, is 0.05. Then we have alpha divided by 2, or 0.025, on the left side and 0.025 on the right side. Now, these are values we can check from the Z table. When alpha divided by 2 is 0.025, Z is 1.96, so 1.96 on the right side and -1.96 on the left side. Therefore, if the value we get for Z from the test is lower than -1.96 or higher than 1.96, we will reject the null hypothesis. Otherwise, we will not reject it.

That's more or less how hypothesis testing works. We scale the sample mean with respect to the hypothesized value. If Z is close to zero, then we cannot reject the null. If it is far away from zero, then we reject the null hypothesis.
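
The two-sided decision rule above can be sketched as follows. This is an illustrative example rather than the lesson's own material: instead of a printed Z table it uses `NormalDist.inv_cdf` from Python's standard `statistics` module to look up the cutoff, and the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha):
    """Cutoff that leaves alpha / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_two_sided(z, alpha=0.05):
    """True if z falls in either tail of the rejection region."""
    return abs(z) > two_sided_critical_value(alpha)


cutoff = two_sided_critical_value(0.05)
print(round(cutoff, 2))        # 1.96
print(reject_two_sided(2.3))   # True: beyond 1.96, in the rejection region
print(reject_two_sided(-0.8))  # False: in the middle part, cannot reject
```

A smaller alpha, such as 0.01, pushes both cutoffs further from zero, so a more extreme Z is needed to reject the null.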

Okay, what about one-sided tests? We have those too. Let's take the example from last lecture. Paul says, "Data scientists earn more than $125,000." So H zero is that the population mean is greater than or equal to $125,000. The alternative is that the population mean is lower than $125,000.

Using the same level of significance, this time the whole rejection region is on the left, so the rejection region has an area of alpha. Looking at the Z table, that corresponds to a Z score of 1.645, and since it is on the left, it is with a minus sign. Now, when calculating our test statistic Z, if we get a value lower than -1.645, we would reject the null hypothesis, as we have statistical evidence that the data scientist salary is less than $125,000. Otherwise, we would not reject it.

All right, to exhaust all possibilities, let's explore another one-tailed test. Say the university dean told you that the average GPA students get is lower than 70%. In that case, the null hypothesis is that the population mean is lower than or equal to 70%, while the alternative is that the population mean is greater than 70%.
In this situation, the rejection region is on the right side, so if the test statistic is bigger than the cutoff Z score, we would reject the null. Otherwise, we wouldn't.

Cool. That's all for now. In a lesson or two, we'll start testing. Just hold on a bit, and thanks for watching.
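
As a recap of the two one-sided cases, here is a minimal sketch. It is an illustration under assumptions, not the instructor's code: the cutoffs come from `NormalDist.inv_cdf` in Python's standard `statistics` module rather than a Z table, and the function names and test values are hypothetical.

```python
from statistics import NormalDist


def reject_left_tailed(z, alpha=0.05):
    """Rejection region on the left: reject if z is below the lower cutoff."""
    return z < NormalDist().inv_cdf(alpha)


def reject_right_tailed(z, alpha=0.05):
    """Rejection region on the right: reject if z is above the upper cutoff."""
    return z > NormalDist().inv_cdf(1 - alpha)


# The whole area alpha = 0.05 sits in one tail, so the cutoff is about 1.645.
print(round(NormalDist().inv_cdf(0.05), 3))  # -1.645
print(reject_left_tailed(-2.0))   # True: below -1.645
print(reject_left_tailed(-1.0))   # False: not far enough to the left
print(reject_right_tailed(1.8))   # True: above 1.645
```

Because the full alpha is placed in a single tail, the one-sided cutoff of 1.645 is closer to zero than the two-sided cutoff of 1.96 at the same significance level.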