Subtitle transcript: 001 Fundamentals of Probability Distributions_en

Instructor: Welcome back! This lecture serves as an overview of what a probability distribution is and what its main characteristics are. Simply put, a distribution shows the possible values a variable can take and how frequently they occur.

Before we start, let us introduce some important notation we will use for the remainder of the course. Assume that uppercase Y represents the actual outcome of an event and lowercase y represents one of the possible outcomes. One way to denote the likelihood of reaching a particular outcome y is P(Y = y). We can also express it as p(y). For example, uppercase Y could represent the number of red marbles we draw out of a bag, and lowercase y would be a specific number like three or five. Then we express the probability of getting exactly five red marbles as P(Y = 5), or p(5). Since p(y) expresses the probability of each distinct outcome, we call it the probability function.

Good job, folks! So probability distributions, or simply probabilities, measure the likelihood of an outcome depending on how often it is featured in the sample space. Recall that we constructed the probability frequency distribution of an event in the introductory section of the course: we recorded the frequency of each unique value and divided it by the total number of elements in the sample space. Usually, that is how we construct these probabilities when there is a finite number of possible outcomes.
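To make the frequency construction concrete, here is a minimal Python sketch (the marble counts below are made up purely for illustration): it tallies each recorded outcome and divides by the total number of trials to obtain the probability function p(y).

```python
from collections import Counter

# Number of red marbles drawn in each of ten hypothetical trials.
outcomes = [3, 5, 3, 4, 5, 5, 2, 3, 4, 5]

counts = Counter(outcomes)   # frequency of each unique value
total = len(outcomes)        # total number of recorded trials

# Probability function p(y): frequency of y divided by the total.
p = {y: count / total for y, count in counts.items()}

print(p)     # {3: 0.3, 5: 0.4, 4: 0.2, 2: 0.1}
print(p[5])  # P(Y = 5) = 0.4 for this data
```

With a finite list of outcomes like this, the probabilities always sum to one; the approach only breaks down when the set of possible values is infinite, as discussed next.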
If we had an infinite number of possibilities, recording the frequency of each one would be impossible, because there are infinitely many of them. For instance, imagine you are a data scientist and want to analyze the time it takes for your code to run. Any single compilation could take anywhere from a few milliseconds to several days. Often, the result will be between a few milliseconds and a few minutes. If we record time in seconds, we lose precision, which is something to be avoided. To avoid it, we would need the smallest possible measurement of time, but no such thing exists, since every millisecond, microsecond, or even nanosecond could be split in half for greater accuracy. In less than an hour from now, we will talk in more detail about continuous distributions and how to deal with them.

Now is the time to introduce some key definitions. Regardless of whether we have a finite or an infinite number of possibilities, we describe distributions using two main characteristics: mean and variance. Simply put, the mean of a distribution is its average value. Variance, on the other hand, is essentially how spread out the data is. We measure this spread by how far away from the mean the values are: the more dispersed the data is, the higher its variance will be. We denote the mean of a distribution with the Greek letter mu (μ) and its variance with sigma-squared (σ²).

Okay, when analyzing distributions it is important to understand what kind of data we are dealing with: population data or sample data. Population data is the formal way of referring to all the data, while sample data is just a part of it. For example, if an employer surveys an entire department about how they travel to work, the data would represent the population of the department. However, this same data would also be just a sample of the employees in the whole company. Something to remember when using sample data is that we adopt different notations for the mean and variance: we denote the sample mean as x-bar and the sample variance as s-squared.

One flaw of variance is that it is measured in squared units. For example, if you are measuring time in seconds, the variance would be measured in seconds-squared. Usually, there is no direct interpretation of that value.
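The following NumPy sketch illustrates the two notations on a hypothetical array of run times in seconds (the numbers are invented). NumPy's ddof argument switches between the population variance, which divides by n, and the commonly used sample variance, which divides by n - 1; either way, the result comes out in seconds-squared.

```python
import numpy as np

# Hypothetical run times in seconds (made-up values, for illustration only).
times = np.array([0.8, 1.2, 0.9, 3.4, 2.1, 1.7])

# Population notation: mu and sigma-squared (variance divides by n).
mu = times.mean()
sigma_sq = times.var(ddof=0)

# Sample notation: x-bar and s-squared (variance divides by n - 1).
x_bar = times.mean()
s_sq = times.var(ddof=1)

print(f"mu    = {mu:.3f} s   sigma^2 = {sigma_sq:.3f} s^2")
print(f"x_bar = {x_bar:.3f} s   s^2     = {s_sq:.3f} s^2")
```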
To make further sense of variance, we introduce a third characteristic of the distribution, called standard deviation. Standard deviation is simply the positive square root of the variance. As you may suspect, we denote it as sigma when dealing with a population and as s when dealing with a sample. Unlike variance, standard deviation is measured in the same units as the mean. Thus, we can interpret it directly, and it is often preferable.

One idea, which we will use a lot, is that any value between mu minus sigma and mu plus sigma falls within one standard deviation of the mean. The more congested the middle of the distribution is, the more data falls within that interval. Similarly, the less data falls within the interval, the more dispersed the data is.

Fantastic! It is important to know that a general relationship exists between the mean and the variance of any distribution. By definition, the variance equals the expected value of the squared difference from the mean: σ² = E[(Y − μ)²]. After some simplification (expanding the square and using the linearity of the expected value), this is equal to E[Y²] − μ². As you will see in the coming lectures, if we are dealing with a specific distribution, we can find a much more precise formula.

Okay, when we are getting acquainted with a data set we want to analyze or make predictions with, we are most interested in the mean, the variance, and the type of the distribution. In our next video, we will introduce several distributions and the characteristics they possess.

Thanks for watching!
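Following up on the one-standard-deviation interval and the variance identity above, here is a short Python check on the same hypothetical run-time data: it computes sigma, measures the share of values falling inside [μ − σ, μ + σ], and confirms numerically that E[(Y − μ)²] equals E[Y²] − μ².

```python
import numpy as np

# Same hypothetical run times as before (made-up values, for illustration only).
times = np.array([0.8, 1.2, 0.9, 3.4, 2.1, 1.7])

mu = times.mean()
sigma = times.std(ddof=0)  # population standard deviation, same units as the mean

# Share of values within one standard deviation of the mean.
within = np.mean((times >= mu - sigma) & (times <= mu + sigma))
print(f"{within:.0%} of the values fall within [mu - sigma, mu + sigma]")

# Variance identity: E[(Y - mu)^2] == E[Y^2] - mu^2 (up to floating-point error).
lhs = np.mean((times - mu) ** 2)
rhs = np.mean(times ** 2) - mu ** 2
print(np.isclose(lhs, rhs))  # True
```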
