Market Segmentation with Cluster Analysis (Part 2)

Instructor: Hey, welcome back to this market segmentation problem. We were just investigating an elbow. There isn't a clear tip of the elbow. I can see three or four tips which are worth trying, at two, three, four, and five clusters. Here's the limitation of the elbow method. We can see the change in WCSS with the increase in the number of clusters, but we don't really know which solution is the best one.

All right, let's try two clusters. We already discussed qualitatively that this is probably a suboptimal solution, but it is worth inspecting the difference with standardized variables. We'll declare a variable called kmeans_new equal to KMeans of two. Next, we will fit the x_scaled data. Finally, we will create a new data frame called clusters_new, containing the values from x. Then the column cluster_pred will contain the predicted clusters from this new clustering solution with the scaled x.

Here is a crucial moment.
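The steps just described can be sketched roughly as follows. This is a minimal sketch, not the lesson's actual notebook: the data values are made up for illustration, the use of `preprocessing.scale` to obtain `x_scaled` is an assumption, and `n_init` and `random_state` are added only to make the run repeatable. The names `kmeans_new`, `x_scaled`, `clusters_new`, and `cluster_pred` come from the lesson.

```python
import pandas as pd
from sklearn import preprocessing
from sklearn.cluster import KMeans

# Hypothetical stand-in for the course data (values invented for illustration):
# satisfaction on a 1-10 scale, loyalty on a roughly standardized scale.
x = pd.DataFrame({
    "Satisfaction": [4, 6, 5, 7, 4, 1, 10, 8, 9, 10],
    "Loyalty": [-1.33, -0.28, -0.99, -0.29, 1.06, -1.66, -0.97, -0.32, 1.02, 1.38],
})

# Assumption: x_scaled is the column-wise standardized version of x.
x_scaled = preprocessing.scale(x)

# Two clusters, fitted on the standardized data.
# n_init and random_state are additions for reproducibility.
kmeans_new = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_new.fit(x_scaled)

# The new data frame keeps the ORIGINAL values from x ...
clusters_new = x.copy()
# ... while the cluster labels come from the standardized solution.
clusters_new["cluster_pred"] = kmeans_new.labels_

print(clusters_new)
```

Copying `x` before adding the column keeps the original data frame untouched, so it can be reused for other values of k.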
The data frame contains the original values, but the predicted clusters are based on the solution using the standardized data. This is very important. We will plot the data without standardizing it, but the solution itself is the standardized one.

Let me show you by plotting the data. By keeping the original x-axis, we get the intuition, how satisfied were the customers. If we plot the standardized values, we would be deceived. The middle parts of the two graphs are different. This one is 5.5, and on the standardized graph, the midpoint zero actually corresponds to the mean of the variable, or 6.4. Let that sink in for a second. If you wish, you can rewind to be sure you got that right.

We will now continue with the solution. We can see two clusters, as we specified the number to be two. No surprise here. What's different, though, is the clusters themselves. Comparing this result with the previous one, we can clearly see that both dimensions were taken into account. Moreover, these two clusters coincide with our initial speculations, that those two would be the result of k equals two.
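The earlier remark about the two midpoints can be checked with a few lines of plain Python. This is only a sketch: the satisfaction values are invented (chosen so that their mean happens to be 6.4, as in the lesson), and the helper names `standardize` and `unstandardize` are hypothetical.

```python
from statistics import mean, pstdev

# Made-up satisfaction scores on a 1-10 scale (not the course data).
satisfaction = [4, 6, 5, 7, 4, 1, 10, 8, 9, 10]

mu = mean(satisfaction)       # mean of the variable
sigma = pstdev(satisfaction)  # population standard deviation


def standardize(value):
    """Map an original score to its standardized (z-score) value."""
    return (value - mu) / sigma


def unstandardize(z):
    """Map a standardized value back to the original scale."""
    return z * sigma + mu


# Midpoint of the original 1-10 axis.
scale_midpoint = (1 + 10) / 2

# Zero on the standardized axis corresponds to the mean, not to 5.5.
print("original midpoint:", scale_midpoint)
print("standardized zero on the original scale:", unstandardize(0.0))
print("original midpoint on the standardized scale:", standardize(scale_midpoint))
```

Because the mean sits above 5.5 here, the middle of the original axis lands at a negative standardized value, which is why reading satisfaction off the standardized plot would be misleading.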
Okay, great, we are now much more confident that standardization is generally a good thing. However, the problem is not solved yet. This two-cluster solution does not make a whole lot of sense, as we discussed before, but it's a good start. Let's name the two clusters. One contains people with low loyalty and low satisfaction, so we can call these people alienated.

By the way, naming your clusters is very important. In unsupervised learning, clustering included, the algorithm will do the magic, but then we step in to interpret the result. My feeling here is to call them the alienated cluster, as they are dissatisfied and not loyal. No wonder, it's unlikely they'll be back to our shop. As for the other cluster, it is so heterogeneous that I'd call it the everything else cluster.

All right, let's get back to the elbow. Noteworthy tips of the elbow are also three, four, and five. I'll try them one after the other. With our well-parameterized code, we can just change the number of clusters in the first line, and rerunning the code would do the trick. Let's try with three clusters. That's the result.
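One way to picture "changing the number of clusters in the first line" is to wrap the steps in a function that takes k as a parameter. This is a sketch under assumptions, not the lesson's code: the function name `cluster_with_k`, the data values, the use of `preprocessing.scale`, and the `n_init` and `random_state` settings are all illustrative choices.

```python
import pandas as pd
from sklearn import preprocessing
from sklearn.cluster import KMeans


def cluster_with_k(x, k):
    """Cluster the standardized data into k groups.

    Returns a copy of the original (unscaled) data with a cluster_pred
    column, plus the WCSS (scikit-learn calls it inertia_) of the solution.
    """
    x_scaled = preprocessing.scale(x)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    result = x.copy()
    result["cluster_pred"] = kmeans.fit_predict(x_scaled)
    return result, kmeans.inertia_


# Hypothetical data, invented for illustration only.
x = pd.DataFrame({
    "Satisfaction": [4, 6, 5, 7, 4, 1, 10, 8, 9, 10],
    "Loyalty": [-1.33, -0.28, -0.99, -0.29, 1.06, -1.66, -0.97, -0.32, 1.02, 1.38],
})

# Try the candidate elbow tips one after the other.
results = {}
for k in (2, 3, 4, 5):
    clusters, wcss = cluster_with_k(x, k)
    results[k] = (clusters, wcss)
    sizes = clusters["cluster_pred"].value_counts().sort_index().tolist()
    print(f"k={k}: WCSS={wcss:.3f}, cluster sizes={sizes}")
```

Printing the cluster sizes next to the WCSS makes small clusters easy to spot, though naming the clusters still requires looking at the plot.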
We have the alienated cluster once more. That's a good sign. It shows us that we were right in concluding that it is a cluster of its own, while the everything else cluster is now split into two.

I'd call this group the supporters. They are not particularly happy with the shopping experience, but they like the brand and wanna keep coming back. Note that there are not that many of them. It is a small cluster.

Finally, the third cluster is called, well, the all that's left cluster, I guess. We can't really name it as it is still very much mixed.

What happens next? Let's check out a four-cluster solution.

We have the alienated and the supporters clusters, and now these two new ones can also be named, finally.

The upper right one consists of clients that are satisfied and loyal. These are our fans, the core customers. Eventually, we hope that all the points on this graph turn into fans, but we will elaborate on this later.

Let's name the last cluster. We have people who are predominantly satisfied but not loyal, and some of them are actually disloyal.
A term I've seen somewhere to describe such customers is roamers. They like your brand, but they are not very loyal to it. We have all been there for some brand.

Okay, this solution is definitely the best one we've seen so far.

Here's where it stood on the elbow graph, but how about we try with five clusters?

The alienated, the supporters, and the fans remain unchanged. These people here look like the roamers from before. Finally, these clients are almost in the middle of our standardized graph. They are almost neutral on the loyalty feature but are generally satisfied. They are also roamers. This solution actually split the roamers into two subclusters, those that are extremely satisfied and those that are just satisfied, so there isn't much value added to our segmentation.

We can carry on with as many clusters as we want, but from now on, we would just further segment the four core clusters. Let's finish off with nine clusters.

Similar to what we had a second ago, many of the clusters were further segmented.
It is extremely hard to name all of them, and even if we do, we will probably need to use a lot of adjectives. For instance, the alienated cluster is split into two, the very alienated cluster and the moderately alienated cluster. As you can imagine, there is not much to gain by using such a fragmented solution.

In my mind, the four and five-cluster solutions were the best ones. Which one you want to use depends on the problem at hand.

Okay, in the next lesson, we will see what we can do with this new information. Thanks for watching.