1
00:00:01,140 --> 00:00:04,380
Instructor: Hey, this is the last theoretical section
2
00:00:04,380 --> 00:00:05,430
of the course.
3
00:00:05,430 --> 00:00:07,740
It is about the first activity you want to do
4
00:00:07,740 --> 00:00:10,740
when you start creating a machine learning algorithm.
5
00:00:10,740 --> 00:00:12,450
Preprocessing.
6
00:00:12,450 --> 00:00:14,910
Preprocessing refers to any manipulation
7
00:00:14,910 --> 00:00:16,230
we apply to the data set
8
00:00:16,230 --> 00:00:18,270
before running it through the model.
9
00:00:18,270 --> 00:00:20,400
Everything we saw so far was conditioned
10
00:00:20,400 --> 00:00:22,830
on the fact that we had already pre-processed our data
11
00:00:22,830 --> 00:00:25,230
in a way suitable for training.
12
00:00:25,230 --> 00:00:27,690
You've already seen some preprocessing.
13
00:00:27,690 --> 00:00:31,680
In the TensorFlow intro, we created an npz file.
14
00:00:31,680 --> 00:00:34,050
All the training we did came from there,
15
00:00:34,050 --> 00:00:37,320
so, if you must work with data in an Excel file,
16
00:00:37,320 --> 00:00:38,910
CSV, or whatever,
17
00:00:38,910 --> 00:00:40,980
saving it into an npz file
18
00:00:40,980 --> 00:00:43,023
would be a type of preprocessing.
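A minimal sketch of that kind of preprocessing, assuming a hypothetical numeric CSV file named data.csv whose last column holds the targets; the file name and column layout are illustrative, not taken from the course:

```python
import numpy as np

# Load a purely numeric CSV file (hypothetical name and layout).
raw = np.loadtxt('data.csv', delimiter=',')

inputs = raw[:, :-1]   # every column except the last one
targets = raw[:, -1]   # the last column

# Store both arrays in a single .npz file for the training code to load.
np.savez('data_preprocessed.npz', inputs=inputs, targets=targets)

# Later, the training code can reload the arrays like this:
# data = np.load('data_preprocessed.npz')
# train_inputs, train_targets = data['inputs'], data['targets']
```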
19
00:00:44,190 --> 00:00:45,450
In this section though,
20
00:00:45,450 --> 00:00:48,270
we will mainly focus on data transformations
21
00:00:48,270 --> 00:00:50,433
rather than reordering as before.
22
00:00:52,350 --> 00:00:55,200
What is the motivation for preprocessing?
23
00:00:55,200 --> 00:00:57,363
There are several important points.
24
00:00:58,350 --> 00:01:00,390
The first one is about compatibility
25
00:01:00,390 --> 00:01:02,220
with the libraries we use.
26
00:01:02,220 --> 00:01:03,600
As we saw earlier,
27
00:01:03,600 --> 00:01:05,550
TensorFlow works with tensors
28
00:01:05,550 --> 00:01:07,500
and not Excel spreadsheets.
29
00:01:07,500 --> 00:01:08,730
In data science,
30
00:01:08,730 --> 00:01:11,460
you will often be given data in whatever format
31
00:01:11,460 --> 00:01:14,373
and you must make it compatible with the tools you use.
32
00:01:16,170 --> 00:01:20,580
Second, we may need to adjust inputs of different magnitude.
33
00:01:20,580 --> 00:01:22,950
Let's say we are FX traders.
34
00:01:22,950 --> 00:01:24,810
If one input we are working with
35
00:01:24,810 --> 00:01:27,450
is the end of the day Euro/Dollar exchange rate,
36
00:01:27,450 --> 00:01:29,493
it would be a value around 1.
37
00:01:30,540 --> 00:01:34,110
However, if another input is the daily trading volume,
38
00:01:34,110 --> 00:01:37,380
we would have values like 100,000 and higher.
39
00:01:37,380 --> 00:01:41,340
Obviously, the orders of magnitude are quite different.
40
00:01:41,340 --> 00:01:43,260
A linear combination of numbers
41
00:01:43,260 --> 00:01:46,200
based on such different scales is problematic.
42
00:01:46,200 --> 00:01:48,210
In purely mathematical terms,
43
00:01:48,210 --> 00:01:53,130
a value of 1 is negligible compared to a value of 100,000.
44
00:01:53,130 --> 00:01:56,100
As all the inputs are on an equal footing in a vector
45
00:01:56,100 --> 00:01:57,150
or a matrix,
46
00:01:57,150 --> 00:02:00,573
the algorithm is likely to ignore all values around 1.
47
00:02:01,650 --> 00:02:03,570
These values essentially represent
48
00:02:03,570 --> 00:02:05,790
the Euro/Dollar exchange rate itself,
49
00:02:05,790 --> 00:02:09,509
so they are often more important than the trading volume.
50
00:02:09,509 --> 00:02:13,203
Obviously, something needs to be done to solve this issue.
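One common way to address this, sketched here with made-up numbers rather than real market data, is to standardize each input column so that an exchange rate around 1 and a volume around 100,000 end up on comparable scales:

```python
import numpy as np

# Dummy inputs: column 0 is an exchange rate, column 1 is a trading volume.
inputs = np.array([
    [1.08, 120_000.0],
    [1.10,  95_000.0],
    [1.07, 150_000.0],
])

# Subtract each column's mean and divide by its standard deviation,
# so every column ends up with mean 0 and standard deviation 1.
standardized = (inputs - inputs.mean(axis=0)) / inputs.std(axis=0)
print(standardized)
```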
51
00:02:15,720 --> 00:02:18,360
A third reason is generalization.
52
00:02:18,360 --> 00:02:19,980
Problems that seem different
53
00:02:19,980 --> 00:02:22,800
can often be solved by similar models.
54
00:02:22,800 --> 00:02:25,170
Standardizing inputs of different problems
55
00:02:25,170 --> 00:02:28,200
allows us to reuse the exact same models.
56
00:02:28,200 --> 00:02:29,850
Sometimes there are cases
57
00:02:29,850 --> 00:02:32,820
when we can even reuse already trained networks.
58
00:02:32,820 --> 00:02:34,230
Imagine that!
59
00:02:34,230 --> 00:02:36,510
You have trained a model previously,
60
00:02:36,510 --> 00:02:39,000
you face a new problem, you test your model,
61
00:02:39,000 --> 00:02:40,620
and it works like a charm.
62
00:02:40,620 --> 00:02:42,903
That's not unusual in machine learning.
63
00:02:44,190 --> 00:02:45,450
In the next few lessons,
64
00:02:45,450 --> 00:02:47,190
we will focus on these concepts
65
00:02:47,190 --> 00:02:50,160
and introduce several pre-processing techniques.
66
00:02:50,160 --> 00:02:51,393
Thanks for watching.