004 Clustering Categorical Data

Instructor: Hey, let's continue the problem from the last lecture. As you can see, we had one other piece of information that we did not use: language. In order to make use of it, we must first encode it in some way. The simplest way to do that is by using numbers.

I'll create a new variable called data_mapped, equal to data.copy(). Next, I'll map the languages using the usual method: the language column of data_mapped equals that same column with .map() applied. And I'll set English to zero, French to one, and German to two. Note that this is not the optimal way to encode them, but it will work for now. Here's the result. Cool.

Next, let's choose the features that we want to use for clustering. Did you know that we can use a single feature? Well, we certainly can. Let x be equal to data_mapped.iloc[:, 3:4]. I am basically slicing all rows, but only the last column. What we are left with is this.

Now we can perform clustering. I have the same code ready, so I'll just use it.
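The encoding and slicing steps described above can be sketched roughly as follows. The lecture's actual file is not shown in the transcript, so the small DataFrame here is a hypothetical stand-in: the column names (`Country`, `Latitude`, `Longitude`, `Language`) and the rough coordinates are assumptions made for illustration, chosen so that language is the last column, as the instructor describes.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's dataset (assumed column names,
# approximate coordinates used only for illustration).
data = pd.DataFrame({
    "Country": ["USA", "Canada", "France", "UK", "Germany", "Australia"],
    "Latitude": [44.97, 62.40, 46.75, 54.01, 51.15, -25.45],
    "Longitude": [-103.77, -96.80, 2.40, -2.53, 10.40, 133.11],
    "Language": ["English", "English", "French", "English", "German", "English"],
})

# Work on a copy so the original data stays untouched.
data_mapped = data.copy()

# Encode each language as an integer, as in the lecture:
# English -> 0, French -> 1, German -> 2.
data_mapped["Language"] = data_mapped["Language"].map(
    {"English": 0, "French": 1, "German": 2}
)

# All rows, but only the column at position 3 (the last one).
# Slicing with 3:4 instead of 3 keeps the result a DataFrame.
x = data_mapped.iloc[:, 3:4]
print(x)
```

Using `3:4` rather than a plain `3` is what keeps `x` two-dimensional, which is the shape a clustering estimator generally expects for its input features.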
We are running k-means clustering with three clusters. Run, run, run, run, and we are done. The plot is unequivocal. The three clusters are USA, Canada, UK and Australia in the first one, France in the second, and Germany in the third. That's precisely what we expected, right? English, French and German. Great.

By the way, we are still using the longitude and latitude as the axes of the plot. Unlike regression, when doing clustering you can plot the data as you wish. The cluster information is contained in the cluster column in the data frame and is shown as the color of the points on the plot.

Can we use both numerical and categorical data in clustering? Sure. Let's go back to our input data, x, and take the last three series instead of just one. Run, run, run, run. Okay, this time the three clusters turned out to be based simply on geographical location instead of language and location.

Hmm, what if we use two clusters? We've seen that solution too, haven't we? We will have to work on figuring out what's going on in the following lesson.
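A minimal sketch of the two clustering runs described above, assuming scikit-learn's `KMeans` is the estimator behind "the same code" (the transcript does not show it). The DataFrame is again a hypothetical stand-in with assumed column names and approximate coordinates, and the helper `cluster` as well as the `n_init` and `random_state` settings are illustrative choices rather than the instructor's.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in data with language already encoded
# (English = 0, French = 1, German = 2); coordinates are approximate.
data_mapped = pd.DataFrame({
    "Country": ["USA", "Canada", "France", "UK", "Germany", "Australia"],
    "Latitude": [44.97, 62.40, 46.75, 54.01, 51.15, -25.45],
    "Longitude": [-103.77, -96.80, 2.40, -2.53, 10.40, 133.11],
    "Language": [0, 0, 1, 0, 2, 0],
})


def cluster(features, k):
    """Fit k-means with k clusters and return one label per row."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    return kmeans.fit_predict(features)


# Run 1: a single categorical feature (the encoded language).
lang_only = cluster(data_mapped.iloc[:, 3:4], 3)

# Run 2: the last three columns (latitude, longitude, language).
mixed = cluster(data_mapped.iloc[:, 1:4], 3)

# Store the labels in a cluster column, as the lecture describes.
data_with_clusters = data_mapped.copy()
data_with_clusters["Cluster"] = lang_only
print(data_with_clusters)
print(mixed)
```

The label numbers themselves are arbitrary; what matters is which rows share a label. To reproduce the lecture's plot, the `Cluster` column could be passed as the color argument of a longitude-versus-latitude scatter plot.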
Thanks for watching.
