001 Preprocessing Introduction (English subtitles)

1
00:00:01,140 --> 00:00:04,380
Instructor: Hey, this is the last theoretical section

2
00:00:04,380 --> 00:00:05,430
of the course.

3
00:00:05,430 --> 00:00:07,740
It is about the first activity you want to do

4
00:00:07,740 --> 00:00:10,740
when you start creating a machine learning algorithm.

5
00:00:10,740 --> 00:00:12,450
Preprocessing.

6
00:00:12,450 --> 00:00:14,910
Preprocessing refers to any manipulation

7
00:00:14,910 --> 00:00:16,230
we apply to the data set

8
00:00:16,230 --> 00:00:18,270
before running it through the model.

9
00:00:18,270 --> 00:00:20,400
Everything we saw so far was conditioned

10
00:00:20,400 --> 00:00:22,830
on the fact that we had already preprocessed our data

11
00:00:22,830 --> 00:00:25,230
in a way suitable for training.

12
00:00:25,230 --> 00:00:27,690
You've already seen some preprocessing.

13
00:00:27,690 --> 00:00:31,680
In the TensorFlow intro, we created an npz file.

14
00:00:31,680 --> 00:00:34,050
All the training we did came from there,

15
00:00:34,050 --> 00:00:37,320
so if you must work with data in an Excel file,

16
00:00:37,320 --> 00:00:38,910
CSV, or whatever,

17
00:00:38,910 --> 00:00:40,980
saving it into an npz file

18
00:00:40,980 --> 00:00:43,023
would be a type of preprocessing.

19
00:00:44,190 --> 00:00:45,450
In this section, though,

20
00:00:45,450 --> 00:00:48,270
we will mainly focus on data transformations

21
00:00:48,270 --> 00:00:50,433
rather than reordering as before.

22
00:00:52,350 --> 00:00:55,200
What is the motivation for preprocessing?

23
00:00:55,200 --> 00:00:57,363
There are several important points.

24
00:00:58,350 --> 00:01:00,390
The first one is about compatibility

25
00:01:00,390 --> 00:01:02,220
with the libraries we use.

26
00:01:02,220 --> 00:01:03,600
As we saw earlier,

27
00:01:03,600 --> 00:01:05,550
TensorFlow works with tensors

28
00:01:05,550 --> 00:01:07,500
and not Excel spreadsheets.

29
00:01:07,500 --> 00:01:08,730
In data science,

30
00:01:08,730 --> 00:01:11,460
you will often be given data in whatever format

31
00:01:11,460 --> 00:01:14,373
and you must make it compatible with the tools you use.

32
00:01:16,170 --> 00:01:20,580
Second, we may need to adjust inputs of different magnitude.

33
00:01:20,580 --> 00:01:22,950
Let's say we are forex traders.

34
00:01:22,950 --> 00:01:24,810
If one input we are working with

35
00:01:24,810 --> 00:01:27,450
is the end-of-day Euro/Dollar exchange rate,

36
00:01:27,450 --> 00:01:29,493
it would be a value around 1.

37
00:01:30,540 --> 00:01:34,110
However, if another input is the daily trading volume,

38
00:01:34,110 --> 00:01:37,380
we would have values like 100,000 and higher.

39
00:01:37,380 --> 00:01:41,340
Obviously, the orders of magnitude are quite different.

40
00:01:41,340 --> 00:01:43,260
A linear combination of numbers

41
00:01:43,260 --> 00:01:46,200
based on such different scales is problematic.

42
00:01:46,200 --> 00:01:48,210
In purely mathematical terms,

43
00:01:48,210 --> 00:01:53,130
a value of 1 is negligible compared to a value of 100,000.

44
00:01:53,130 --> 00:01:56,100
As all the inputs are on an equal footing in a vector

45
00:01:56,100 --> 00:01:57,150
or a matrix,

46
00:01:57,150 --> 00:02:00,573
the algorithm is likely to ignore all values around 1.

47
00:02:01,650 --> 00:02:03,570
These values essentially represent

48
00:02:03,570 --> 00:02:05,790
the Euro/Dollar exchange rate itself,

49
00:02:05,790 --> 00:02:09,509
so they are often more important than the volume of trading.

50
00:02:09,509 --> 00:02:13,203
Obviously, something needs to be done to solve this issue.

51
00:02:15,720 --> 00:02:18,360
A third reason is generalization.

52
00:02:18,360 --> 00:02:19,980
Problems that seem different

53
00:02:19,980 --> 00:02:22,800
can often be solved by similar models.

54
00:02:22,800 --> 00:02:25,170
Standardizing inputs of different problems

55
00:02:25,170 --> 00:02:28,200
allows us to reuse the exact same models.

56
00:02:28,200 --> 00:02:29,850
Sometimes there are cases

57
00:02:29,850 --> 00:02:32,820
when we can even reuse already trained networks.

58
00:02:32,820 --> 00:02:34,230
Imagine that!

59
00:02:34,230 --> 00:02:36,510
You have trained a model previously,

60
00:02:36,510 --> 00:02:39,000
you face a new problem, you test your model,

61
00:02:39,000 --> 00:02:40,620
and it works like a charm.

62
00:02:40,620 --> 00:02:42,903
That's not unusual in machine learning.

63
00:02:44,190 --> 00:02:45,450
In the next few lessons,

64
00:02:45,450 --> 00:02:47,190
we will focus on these concepts

65
00:02:47,190 --> 00:02:50,160
and introduce several preprocessing techniques.

66
00:02:50,160 --> 00:02:51,393
Thanks for watching.
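The scale problem from the lecture (an exchange rate near 1 sitting next to a trading volume near 100,000) and the idea of saving prepared data to an npz file can be sketched with NumPy. This is a minimal illustration under stated assumptions: the numbers are made-up toy values, standardization (z-score scaling) is just one common fix and not necessarily the technique the following lessons use, and the file name `preprocessed_data.npz` is hypothetical.

```python
import numpy as np

# Hypothetical toy data, for illustration only:
# column 0 = end-of-day EUR/USD exchange rate (values around 1),
# column 1 = daily trading volume (values around 100,000 and higher).
raw = np.array([
    [1.08, 120_000.0],
    [1.10, 95_000.0],
    [1.12, 150_000.0],
    [1.09, 110_000.0],
])

# With equal weights, the linear combination is dominated by the volume
# column; the exchange-rate contribution is negligible by comparison.
weights = np.array([1.0, 1.0])
print(raw @ weights)


def standardize(x):
    """Rescale each column to zero mean and unit standard deviation."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / std, mean, std


scaled, mean, std = standardize(raw)
print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column

# Save the preprocessed inputs, plus the statistics needed to transform
# new data the same way, into a single .npz file (hypothetical name).
np.savez("preprocessed_data.npz", inputs=scaled, mean=mean, std=std)

# Load the arrays back; the context manager closes the underlying file.
with np.load("preprocessed_data.npz") as loaded:
    restored = loaded["inputs"]
print(restored.shape)  # (4, 2)
```

After scaling, both columns live on the same scale, so neither one drowns out the other in a weighted sum. Storing `mean` and `std` alongside the inputs is a design choice: new data must be transformed with the same statistics as the training data.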