Subtitles for 004 Preprocessing Categorical Data_en

1
00:00:00,990 --> 00:00:03,210
Speaker: So far, most of what we've seen

2
00:00:03,210 --> 00:00:05,580
were examples of numerical variables:

3
00:00:05,580 --> 00:00:09,633
exchange rates, trading volume, security prices, and so on.

4
00:00:11,040 --> 00:00:14,610
Often, though, we must deal with categorical data.

5
00:00:14,610 --> 00:00:18,510
In short, categorical data refers to groups or categories,

6
00:00:18,510 --> 00:00:21,330
such as our cat and dog examples,

7
00:00:21,330 --> 00:00:23,220
but the machine learning algorithm

8
00:00:23,220 --> 00:00:26,280
takes only numbers as input, doesn't it?

9
00:00:26,280 --> 00:00:29,520
Therefore, the question when working with categorical data

10
00:00:29,520 --> 00:00:33,150
is how to convert a category, such as "cat", into a number

11
00:00:33,150 --> 00:00:36,633
so we can feed it into a model or produce it as the final output.

12
00:00:38,100 --> 00:00:41,130
Obviously, a different number should be associated

13
00:00:41,130 --> 00:00:42,990
with each category, right?

14
00:00:42,990 --> 00:00:46,563
Or, better still, a tensor. We are getting closer.

15
00:00:48,150 --> 00:00:50,520
Imagine our shop has three products:

16
00:00:50,520 --> 00:00:53,070
bread, yogurt, and muffins.

17
00:00:53,070 --> 00:00:56,520
Now, how do we convert these categories to numbers?

18
00:00:56,520 --> 00:01:00,090
One possible solution is to enumerate them like this:

19
00:01:00,090 --> 00:01:04,712
bread equals one, yogurt equals two, muffins equals three.

20
00:01:06,150 --> 00:01:09,510
Unfortunately, this implies there is some order.

21
00:01:09,510 --> 00:01:12,330
It's like saying that a muffin is more than a yogurt,

22
00:01:12,330 --> 00:01:13,833
which is more than bread.

23
00:01:15,390 --> 00:01:17,160
Think about prices.

24
00:01:17,160 --> 00:01:21,603
If we instead had three prices, $1, $2, and $3,

25
00:01:22,530 --> 00:01:24,933
three times $1 would equal $3.

26
00:01:25,830 --> 00:01:28,920
Using the same logic, does it make any sense to you

27
00:01:28,920 --> 00:01:31,503
that three times bread equals one muffin?

28
00:01:32,400 --> 00:01:34,860
There is another level of ambiguity.

29
00:01:34,860 --> 00:01:36,540
To get from bread to muffins,

30
00:01:36,540 --> 00:01:38,253
we always go through yogurt.

31
00:01:39,780 --> 00:01:42,780
Ultimately, what we have done is assume the data

32
00:01:42,780 --> 00:01:44,790
has some order when it doesn't.

33
00:01:44,790 --> 00:01:46,500
Typically, that's an issue

34
00:01:46,500 --> 00:01:49,410
whenever our data is divided into categories.

35
00:01:49,410 --> 00:01:51,210
Think about the products in a shop,

36
00:01:51,210 --> 00:01:53,733
different car brands, or people.

37
00:01:55,440 --> 00:01:59,490
So our question becomes how to encode such categories

38
00:01:59,490 --> 00:02:01,110
in a way that will be useful

39
00:02:01,110 --> 00:02:03,540
for a machine learning algorithm.

40
00:02:03,540 --> 00:02:05,760
Two main approaches are used.

41
00:02:05,760 --> 00:02:08,250
The first is called one-hot encoding,

42
00:02:08,250 --> 00:02:10,650
and the other binary encoding.

43
00:02:10,650 --> 00:02:13,680
We will see how to perform them in the next lesson.

44
00:02:13,680 --> 00:02:14,823
Thanks for watching.
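The enumeration in cues 18 to 30 (bread equals one, yogurt equals two, muffins equals three) can be sketched in a few lines of Python. This is a minimal illustration assuming a plain dictionary lookup; the names `label_codes` and `encode` are hypothetical and not from the lesson. It shows that once the products become integers, the ordering and arithmetic the lecture warns about come along automatically, even though they mean nothing for products.

```python
# Hypothetical integer ("label") encoding of the shop's three products,
# following the enumeration in the lesson: bread = 1, yogurt = 2, muffins = 3.
label_codes = {"bread": 1, "yogurt": 2, "muffins": 3}

def encode(products):
    """Map each product name to its integer code."""
    return [label_codes[p] for p in products]

bread, yogurt, muffins = encode(["bread", "yogurt", "muffins"])

# The integers impose an order that the products themselves do not have.
implied_order = sorted(label_codes, key=label_codes.get)
print(implied_order)  # ['bread', 'yogurt', 'muffins']

# They also allow arithmetic that is meaningless for products:
# "three times bread equals one muffin".
print(3 * bread == muffins)  # True

# "To get from bread to muffins, we always go through yogurt":
# yogurt's code sits exactly halfway between the other two.
print((bread + muffins) / 2 == yogurt)  # True
```

A model that receives these codes as ordinary numbers has no way of knowing that these relationships are artifacts of the numbering rather than facts about the data.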

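The lesson names one-hot encoding and binary encoding but leaves their mechanics to the next lesson. As a rough preview, here is a minimal pure-Python sketch of the general ideas, not the course's own implementation. It assumes categories are indexed from 0 in the order listed; binary-encoding conventions vary (for example, whether indices start at 0 or 1), so the `binary` helper below is one assumed variant, and both function names are hypothetical.

```python
products = ["bread", "yogurt", "muffins"]

def one_hot(category, categories):
    """One-hot encoding: a vector of zeros with a single 1
    at the category's position."""
    vec = [0] * len(categories)
    vec[categories.index(category)] = 1
    return vec

def binary(category, categories):
    """Binary encoding (assumed convention): the category's 0-based index
    written in base 2, using just enough bits to cover all categories."""
    n_bits = max(1, (len(categories) - 1).bit_length())
    index = categories.index(category)
    return [int(b) for b in format(index, f"0{n_bits}b")]

for p in products:
    print(p, one_hot(p, products), binary(p, products))
# bread [1, 0, 0] [0, 0]
# yogurt [0, 1, 0] [0, 1]
# muffins [0, 0, 1] [1, 0]
```

With one-hot vectors, every pair of distinct products is the same distance apart (a Euclidean distance of √2), and three times the bread vector, `[3, 0, 0]`, is not the muffin vector, so the spurious order and arithmetic disappear. The trade-off is width: one-hot needs one column per category, while binary encoding needs only about log₂ of that, at the cost of bit patterns that partially overlap between categories.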