003 Standardization (English subtitles)

Instructor: The most common problem when working with numerical data is the difference in magnitudes, as we mentioned in the first lesson. An easy fix for this issue is standardization. Other names by which you may have heard this term are feature scaling and normalization. However, normalization could refer to a few additional concepts, even within machine learning, which is why we'll stick with the terms standardization and feature scaling.

Standardization, or feature scaling, is the process of transforming the data we are working with into a standard scale. A very common way to approach this problem is by subtracting the mean and dividing by the standard deviation. In this way, regardless of the data set, we will always obtain a distribution with a mean of zero and a standard deviation of one, which can easily be proven.

Let's show that with an FX example. Say our algorithm has two input variables: the euro-dollar exchange rate and the daily trading volume. We have three days' worth of observations.
First day, 1.3 and 110,000; second day, 1.34 and 98,700; and the third day, 1.25 and 135,000. The first value shows the euro-dollar exchange rate, while the second one shows the daily trading volume.

Let's standardize these figures. We standardize the euro-dollar exchange rates relative to the other euro-dollar exchange rates. So, we look at 1.3, 1.34 and 1.25. The mean is about 1.3, while the standard deviation is 0.045. Going through the above-mentioned transformation, these values become 0.07, 0.96 and -1.03, respectively. Standardizing trading volumes, we obtain -0.25, -0.85 and 1.1.

In this way, we have forced figures of very different scales to appear similar. That's why another name for standardization is feature scaling. This will ensure our linear combinations treat the two variables equally. Also, it is much easier to make sense of the data. The transformation allowed us to convert the trading volumes from 110,000, 98,700 and 135,000 to -0.25, -0.85 and 1.1.
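
The figures in this example can be checked with a minimal Python sketch. It is not part of the lesson. The helper name `standardize` is made up for illustration, and the sketch assumes the sample standard deviation (dividing by n - 1), which is the choice that appears to reproduce the 0.045 quoted above. A population standard deviation would give slightly different values.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - mu) / sigma for v in values]


# The three daily observations from the FX example.
rates = [1.30, 1.34, 1.25]
volumes = [110_000, 98_700, 135_000]

z_rates = standardize(rates)
z_volumes = standardize(volumes)

print([round(z, 2) for z in z_rates])    # [0.07, 0.96, -1.03]
print([round(z, 2) for z in z_volumes])  # [-0.25, -0.85, 1.1]
```

Each feature is standardized on its own, using only its own mean and standard deviation, so the exchange rates and the volumes end up on the same scale.
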
In standardized form, the third value is considerably higher than the average, while the first one is around the average. We can confidently say that 135,000 trades per day is a high figure, while 98,700 is low. Please disregard the simplification of having just three observations. That's just an example.

Besides standardization, there are other popular methods, too. We will briefly introduce them without going into too much detail.

Initially, we said that normalization refers to several concepts. One of them, which comes up often in machine learning, consists of converting each sample into a unit-length vector using the L1 or L2 norm.

Another pre-processing method is PCA, which stands for principal components analysis. It is a dimension reduction technique often used when working with several variables referring to the same bigger concept, or latent variable.
For instance, if we have data about one's religion, voting history, participation in different associations, and upbringing, we can combine these four to reflect his or her attitude towards immigration. This new variable will normally be standardized, with a mean of zero and a standard deviation of one.

Whitening is another technique frequently used for pre-processing. It is often performed after PCA and removes most of the underlying correlations between data points. Whitening can be useful when, conceptually, the data should be uncorrelated, but that's not reflected in the observations.

We can't cover all the strategies, as each strategy is problem-specific. However, standardization is the most common one and is the one we will employ in the practical examples we will face in this course. In the next lesson, we will see how to deal with categorical data. Thanks for watching.
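
The lesson mentions normalizing each sample to a unit-length vector using the L1 or L2 norm without showing how. Here is a rough Python sketch of that idea. It is not part of the lesson: the function name `normalize` and its `norm` parameter are made up for illustration, and leaving an all-zero sample unchanged is an assumption of this sketch.

```python
import math


def normalize(sample, norm="l2"):
    """Scale one sample (a list of numbers) to unit length under the L1 or L2 norm."""
    if norm == "l1":
        length = sum(abs(x) for x in sample)            # sum of absolute values
    elif norm == "l2":
        length = math.sqrt(sum(x * x for x in sample))  # Euclidean length
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    if length == 0:
        # Assumption: an all-zero sample has no direction, so return it unchanged.
        return list(sample)
    return [x / length for x in sample]


print(normalize([3.0, 4.0], "l2"))  # [0.6, 0.8]
print(normalize([3.0, 4.0], "l1"))  # the two entries sum to 1
```

Unlike standardization, which rescales each feature across all observations, this operation rescales each observation across its features.
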
