All language subtitles for 012 Market Segmentation with Cluster Analysis (Part 2)_en

Instructor: Hey, welcome back to this market segmentation problem. We were just investigating an elbow. There isn't a clear tip of the elbow. I can see three or four tips which are worth trying, at two, three, four, and five clusters. Here's the limitation of the elbow method. We can see the change in WCSS with the increase in the number of clusters, but we don't really know which solution is the best one.

All right, let's try two clusters. We already discussed qualitatively that this is probably a suboptimal solution, but it is worth inspecting the difference with standardized variables. We'll declare a variable called kmeans_new equal to KMeans of two. Next, we will fit the x_scaled data. Finally, we will create a new data frame called clusters_new, containing the values from x. Then the column cluster_pred will contain the predicted clusters from this new clustering solution with the scaled x.

Here is a crucial moment.
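The steps just described can be sketched as follows. This is a minimal sketch, assuming the lesson relies on scikit-learn's `KMeans` and `sklearn.preprocessing.scale`; the column names `Satisfaction` and `Loyalty`, the random data, and the `n_init` and `random_state` settings are illustrative assumptions and do not come from the lesson.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale

# Hypothetical stand-in for the lesson's data (values are made up):
# satisfaction on an assumed 1-10 scale, loyalty roughly centred on zero.
rng = np.random.default_rng(42)
x = pd.DataFrame({
    "Satisfaction": rng.integers(1, 11, size=30),
    "Loyalty": rng.normal(0.0, 1.0, size=30).round(2),
})

# Standardize each column to mean 0 and unit variance.
x_scaled = scale(x)

# Fit the two-cluster solution on the standardized data.
kmeans_new = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_new.fit(x_scaled)

# Keep the ORIGINAL values, but attach the labels that were
# learned from the standardized data.
clusters_new = x.copy()
clusters_new["cluster_pred"] = kmeans_new.labels_

print(clusters_new.head())
```

Plotting `clusters_new["Satisfaction"]` against `clusters_new["Loyalty"]`, coloured by `cluster_pred`, would then show the standardized solution on the original axes.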
The data frame contains the original values, but the predicted clusters are based on the solution using the standardized data. This is very important. We will plot the data without standardizing it, but the solution itself is the standardized one.

Let me show you by plotting the data. By keeping the original x-axis, we keep the intuition of how satisfied the customers were. If we plotted the standardized values, we would be deceived. The middle parts of the two graphs are different. This one is 5.5, and on the standardized graph, the midpoint zero actually corresponds to the mean of the variable, or 6.4.

Let that sink in for a second. If you wish, you can rewind to be sure you got that right.

We will now continue with the solution. We can see two clusters, as we specified the number to be two. No surprise here. What's different, though, is the clusters themselves. Comparing this result with the previous one, we can clearly see that both dimensions were taken into account. Moreover, these two clusters coincide with our initial speculations, that those two would be the result of k equals two.
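The point about the two midpoints can be checked with a little arithmetic. The sketch below uses made-up satisfaction scores on an assumed 1-10 scale, chosen only so that their mean equals the 6.4 quoted in the lesson; the helper names `standardize` and `unstandardize` are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical satisfaction scores (made up for illustration; picked so
# that their mean is 6.4, the figure quoted in the lesson).
satisfaction = [4, 6, 5, 7, 10, 8, 8, 8, 6, 2]

mu = mean(satisfaction)        # mean of the original values
sigma = pstdev(satisfaction)   # population standard deviation


def standardize(value):
    """Convert an original score to its standardized (z) value."""
    return (value - mu) / sigma


def unstandardize(z):
    """Convert a standardized value back to the original scale."""
    return z * sigma + mu


# Middle of the original axis, assuming a 1-10 scale.
axis_midpoint = (1 + 10) / 2

print("original-axis midpoint:", axis_midpoint)
print("z = 0 on the original scale:", unstandardize(0))
print("axis midpoint as a z value:", round(standardize(axis_midpoint), 3))
```

Zero on the standardized axis maps back to the mean, not to the middle of the original axis, so the axis midpoint ends up with a negative standardized value here.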
Okay, great, we are now much more confident that standardization is generally a good thing. However, the problem is not solved yet. This two-cluster solution does not make a whole lot of sense, as we discussed before, but it's a good start. Let's name the two clusters. One contains people with low loyalty and low satisfaction, so we can call these people alienated.

By the way, naming your clusters is very important. In unsupervised learning, clustering included, the algorithm will do the magic, but then we step in to interpret the result. My feeling here is to call them the alienated cluster, as they are dissatisfied and not loyal. No wonder, it's unlikely they'll be back to our shop.

As for the other cluster, it is so heterogeneous that I'd call it the "everything else" cluster.

All right, let's get back to the elbow. Noteworthy tips of the elbow are also three, four, and five. I'll try them one after the other. With our well-parameterized code, we can just change the number of clusters in the first line, and rerunning the code would do the trick. Let's try with three clusters. That's the result.
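Trying one candidate number of clusters after another can be sketched as a small parameterized helper. This is an illustration under assumptions, not the lesson's notebook: it assumes scikit-learn's `KMeans`, uses made-up data, and the name `cluster_solution` is hypothetical. The WCSS is read from the fitted model's `inertia_` attribute.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale

# Hypothetical data standing in for the satisfaction/loyalty table.
rng = np.random.default_rng(0)
x = np.column_stack([
    rng.integers(1, 11, size=30),    # satisfaction, assumed 1-10 scale
    rng.normal(0.0, 1.0, size=30),   # loyalty, assumed roughly centred
])
x_scaled = scale(x)


def cluster_solution(n_clusters):
    """Fit k-means on the standardized data; return (labels, WCSS)."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(x_scaled)
    return labels, model.inertia_


# Only this tuple needs editing to try other candidate values of k.
candidate_ks = (2, 3, 4, 5)
wcss = {k: cluster_solution(k)[1] for k in candidate_ks}

for k, value in wcss.items():
    print(f"k={k}: WCSS={value:.2f}")
```

The WCSS values show how much tighter the clusters get as k grows, but, as noted earlier in the lesson, they do not by themselves say which solution is best; that still requires inspecting and interpreting each result.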
We have the alienated cluster once more. That's a good sign. It shows us that we were right in concluding that it is a cluster of its own, while the "everything else" cluster is now split into two.

I'd call this group the supporters. They are not particularly happy with the shopping experience, but they like the brand and wanna keep coming back. Note that there are not that many of them. It is a small cluster.

Finally, the third cluster is called, well, the "all that's left" cluster, I guess. We can't really name it as it is still very much mixed.

What happens next? Let's check out a four-cluster solution.

We have the alienated and the supporters clusters, and now these two new ones can also be named, finally.

The upper right one consists of clients that are satisfied and loyal. These are our fans, the core customers. Eventually, we hope that all the points on this graph turn into fans, but we will elaborate on this later.

Let's name the last cluster. We have people who are predominantly satisfied but not loyal, and some of them are actually disloyal.
A term I've seen somewhere to describe such customers is roamers. They like your brand, but they are not very loyal to it. We have all been there for some brand.

Okay, this solution is definitely the best one we've seen so far.

Here's where it stood on the elbow graph, but how about we try with five clusters?

The alienated, the supporters, and the fans remain unchanged. These people here look like the roamers from before. Finally, these clients are almost in the middle of our standardized graph. They are almost neutral on the loyalty feature but are generally satisfied. They are also roamers. This solution actually split the roamers into two subclusters, those that are extremely satisfied and those that are just satisfied, so there isn't much value added to our segmentation.

We can carry on with as many clusters as we want, but from now on, we would just further segment the four core clusters. Let's finish off with nine clusters.

Similar to what we had a second ago, many of the clusters were further segmented.
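In the lesson, the observation that a larger k only subdivides earlier clusters is made by eye on the plots. One possible way to check it in code, offered here as an assumption rather than the instructor's method, is to cross-tabulate two labelings. The helper names and the label lists below are hypothetical.

```python
from collections import Counter


def contingency(coarse_labels, fine_labels):
    """Count how many points fall in each (coarse, fine) cluster pair."""
    return Counter(zip(coarse_labels, fine_labels))


def is_refinement(coarse_labels, fine_labels):
    """True if every fine cluster lies entirely inside one coarse cluster."""
    parents = {}
    for coarse, fine in zip(coarse_labels, fine_labels):
        # Remember the first coarse cluster seen for this fine cluster;
        # any later disagreement means the fine cluster straddles two.
        if parents.setdefault(fine, coarse) != coarse:
            return False
    return True


# Hypothetical labels for eight customers (made up for illustration).
four_clusters = [0, 0, 1, 1, 2, 2, 3, 3]
five_clusters = [0, 0, 1, 1, 2, 2, 3, 4]  # coarse cluster 3 split in two

table = contingency(four_clusters, five_clusters)
for (coarse, fine), count in sorted(table.items()):
    print(f"coarse {coarse} -> fine {fine}: {count} point(s)")

print("pure subdivision:", is_refinement(four_clusters, five_clusters))
```

With real k-means output the subdivision is rarely perfectly clean, so the contingency counts are usually more informative than the strict true/false check.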
It is extremely hard to name all of them, and even if we do, we will probably need to use a lot of adjectives. For instance, the alienated cluster is split into two, the very alienated cluster and the moderately alienated cluster. As you can imagine, there is not much to gain by using such a fragmented solution.

In my mind, the four and five-cluster solutions were the best ones. Which one you want to use depends on the problem at hand.

Okay, in the next lesson, we will see what we can do with this new information. Thanks for watching.
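The four segment names used in this lesson follow from where a cluster sits on the satisfaction and loyalty axes. The sketch below turns that reading into a rule of thumb. It is an assumption for illustration only: the instructor names clusters by inspecting the plot, whereas this hypothetical `name_cluster` helper simply uses the mean (zero on the standardized scale) as the cut-off on both axes, and the centroid values are made up.

```python
def name_cluster(satisfaction_z, loyalty_z):
    """Map a standardized centroid to one of the lesson's four names.

    Assumption: 0 (the mean) is the cut-off on both axes.
    """
    satisfied = satisfaction_z > 0
    loyal = loyalty_z > 0
    if satisfied and loyal:
        return "fans"          # satisfied and loyal
    if satisfied:
        return "roamers"       # satisfied but not loyal
    if loyal:
        return "supporters"    # loyal but not particularly satisfied
    return "alienated"         # neither satisfied nor loyal


# Hypothetical standardized centroids as (satisfaction, loyalty) pairs.
centroids = [(-1.1, -0.9), (-0.8, 1.2), (0.9, 1.0), (0.7, -0.6)]
names = [name_cluster(s, l) for s, l in centroids]
print(names)
```

A rule like this only covers the four core segments; with five or nine clusters, several centroids would receive the same name, which mirrors the instructor's remark that finer solutions mostly subdivide these four.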
