015 A Practical Example of Probability Distributions

Instructor: Welcome back. In this practical example, we will explore several scenarios where understanding how a data set is distributed is truly beneficial. We will examine different data samples which follow a normal, a Student's t, and a Poisson distribution. Furthermore, we will analyze instances of exponential and binomial data to help us appreciate the elegant statistics these distributions possess.

Let's begin. Imagine you are working as a head project manager for one of the most renowned companies in the world of video games, EA Games. Your various responsibilities include supervising the development and release of the 2018 edition of the soccer game titled "FIFA 19".

Above all else, you need to ensure the game is well rounded and provides a genuinely enjoyable experience for all the customers. The game has a professional competitive scene, so it needs to be balanced.
By balanced, we mean that no team or individual player should invariably be a preferred option regardless of the opposition. Therefore, we expect to have an equal number of good players and poor players in the game. Let's see if that's the case.

We provided you with access to a data set containing the stats for each individual player in "FIFA 19". So let's have a closer look. You can use Microsoft Excel to open the FIFA 19 file accompanying this lecture.

To begin with, examine the overall column. It represents the quality of a player in their natural position on a scale from 1 to 100. This value is a sort of weighted average of the many individual stats each player has.

As you probably know, the importance of attributes varies for different positions on the field. For instance, acceleration and top speed are more important for a winger than tackling. However, the inverse is true for center backs. Thus, we alter the weight for each stat based on the position of the player. Therefore, we do not have a single formula which calculates the overall evaluation.
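The position-dependent weighting described above can be illustrated with a minimal Python sketch. The weights, stat names, and the `overall_rating` helper are all hypothetical, chosen only to show the idea; the lecture does not give the game's real weights.

```python
# Hypothetical weights per position; the real game's weights are not given in the lecture.
POSITION_WEIGHTS = {
    "winger": {"acceleration": 0.4, "top_speed": 0.4, "tackling": 0.2},
    "center_back": {"acceleration": 0.15, "top_speed": 0.15, "tackling": 0.7},
}


def overall_rating(stats, position):
    """Return the rounded weighted average of a player's stats for one position."""
    weights = POSITION_WEIGHTS[position]
    total_weight = sum(weights.values())
    weighted_sum = sum(stats[name] * weight for name, weight in weights.items())
    return round(weighted_sum / total_weight)


# An invented player who is fast but tackles poorly
fast_player = {"acceleration": 90, "top_speed": 88, "tackling": 40}

print(overall_rating(fast_player, "winger"))       # rated as a winger
print(overall_rating(fast_player, "center_back"))  # same stats, rated as a center back
```

The same set of stats produces a higher rating at the position whose weights favor the player's strengths, which is why no single formula covers every player.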
To get an idea of how well distributed the overall values are, we can construct a histogram and set the bin size to one. We do so by selecting the overall column, clicking on insert, then insert statistics chart, and selecting histogram. To adjust the size of the bins, right-click on the x-axis of the graph and press format axis before setting the bin width to one.

The graph is bell-shaped and resembles a normal distribution. But wait, aren't we dealing with discrete values? How can this be a normal distribution? Although that may be true, continuous variables can take discrete values, but not vice versa. Furthermore, since we are dealing with rounded averages, we are inclined to believe that the overall value is not entirely discrete but rather an approximation.

Let's take a closer look at the graph. Now we can notice its thin tails, which suggest a smaller number of outliers. This reflects real life quite accurately, since very few professional players are exceptionally good or bad at every single aspect of the sport.
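The histogram steps above can also be sketched outside Excel. The following Python example is only an illustration: it uses synthetic ratings (the mean of 66 and spread of 7 are assumed values, not taken from the FIFA 19 file), counts them with a bin width of one, and computes the sample skewness as a rough check of symmetry.

```python
import random
import statistics
from collections import Counter

random.seed(0)

# Synthetic stand-in for the overall column; mean and spread are assumed values.
overall = [min(100, max(1, round(random.gauss(66, 7)))) for _ in range(5000)]

# Bin width of one: every integer rating gets its own bar.
histogram = Counter(overall)


def skewness(values):
    """Sample skewness: close to zero for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Crude text histogram, showing every fifth rating to keep the output short.
for rating in range(min(histogram), max(histogram) + 1, 5):
    print(f"{rating:3d} {'#' * (histogram[rating] // 10)}")

print(f"skewness: {skewness(overall):.3f}")
```

A skewness near zero is consistent with the symmetric, bell-shaped picture, although it does not by itself prove the data is normally distributed.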
Besides, even the least skilled professional soccer players are far superior to the average person. That explains why the lowest overall values start from around 50 rather than zero.

The stats should reflect the performance of players in the real world. As the normal distribution is the most frequently observed in nature, it is only logical that the data resembles this distribution. Moreover, the bell-shaped graph with thin tails further supports this idea.

Since one of the main characteristics of a normal distribution is symmetry, the overall values are symmetrically distributed. Thus, we can safely consider the game balanced and acceptable for competitive play.

It is also worth noting that players within a single team or division share similar stats. This skews the data a certain way and explains why we cannot expect the values within one team or division to follow a normal distribution.

Now, if we wish to further test the balance of the overall stats, we can examine a small sample of random players. For instance, we can construct a histogram of the first 30 players in the data set based on their ID number.
Since our data is limited, we need to adjust the size of the bins; otherwise it is possible for each value to occur only once or twice. That would result in many bins of one or two and make the histogram uninformative.

If we adjust the bin size to three, we will see that the graph slightly resembles a normal distribution. However, we will also notice the fatter tails. Since the number of observations is limited, we can safely consider that this sample follows a Student's t-distribution.

Recall that the Student's t-distribution is also symmetric. So we are confident that even the small sample we are examining confirms our goal of a balanced game.

Before we move on to other aspects of the development of the game, let's explore how a single stat is distributed among the players in the game. Take the shot power column, for example. If we construct a histogram and set the bin size to one, we will see a distribution with two peaks. It resembles two graphs placed side by side.
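For the 30-player sample discussed above, the effect of the bin width can be sketched in Python. This is a minimal illustration on synthetic ratings (the mean and spread are assumptions), and `binned_counts` is a hypothetical helper, not an Excel feature.

```python
import random
from collections import Counter


def binned_counts(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    lowest = min(values)
    counts = Counter(
        lowest + ((v - lowest) // bin_width) * bin_width for v in values
    )
    return dict(sorted(counts.items()))


random.seed(1)

# Synthetic stand-in for the first 30 players; mean and spread are assumed values.
sample = [round(random.gauss(66, 7)) for _ in range(30)]

print(binned_counts(sample, 1))  # bin width 1: many bars holding one or two players
print(binned_counts(sample, 3))  # bin width 3: fewer, fuller bars
```

With only 30 observations, the wider bins pool neighboring ratings, so the overall shape becomes easier to read.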
One way to interpret the two peaks is as two distinct groups of players, one with a mean of around 21 and another with a mean of around 65. The reason behind this is the presence of goalkeepers in the game. The stats important for them are completely different from the stats essential for outfield players. Thus, it only makes sense that they will have distinctly lower values for many of the non-goalkeeper-specific stats.

If we examine a goalkeeping trait like GK diving, we will be able to see the division into types more clearly. We have two completely different clusters. The low value represents how outfield players would perform in goal, and the higher one represents the actual goalkeepers' performance. If we only examine the goalies, we will see the values are normally distributed once again, so the game is indeed balanced.

Great job. Another aspect which makes the game more enjoyable is creating a sense of realism. For instance, the young professional soccer players outnumber the veterans. Here are the reasons why.
First, a significant number of promising young players suffer bad injuries, which significantly slow down their progress or even halt it altogether. Second, some are forced to retire, while others simply decide to quit after spending too much time off the field. Last but not least, young players who are not given the opportunity to play often decide to go to university instead of pursuing a career in soccer.

All of these factors lead to attrition, which results in having fewer players above the age of 35 than players below the age of 20. To make sure the game captures this aspect of the sport, check out the age column.

Once again, we can construct a histogram and set the bin size to one. We already demonstrated how to do this for the overall column, so just follow the same steps. By setting the bin width to one, every age gets represented by a separate bar on the graph.

Age is a discrete variable representing the age of each player. In addition, age has a minimum value of 16, since the game only consists of first team players who have signed a professional contract.
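The age check described above (one bar per age, then comparing the youngest and oldest groups) can be sketched as follows. The ages in this Python example are made up purely for illustration; the real values are in the age column of the lecture's file.

```python
from collections import Counter

# Made-up ages for illustration only; the real values live in the age column.
ages = [16, 17, 18, 18, 19, 19, 19, 20, 21, 21, 22, 22, 23, 24, 24,
        25, 26, 26, 27, 28, 29, 30, 31, 33, 34, 36, 38]

# Bin width of one: each age is its own bar.
age_counts = Counter(ages)

under_20 = sum(count for age, count in age_counts.items() if age < 20)
over_35 = sum(count for age, count in age_counts.items() if age > 35)

print(f"youngest age: {min(age_counts)}")
print(f"players under 20: {under_20}, players over 35: {over_35}")
```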
Because of this minimum, you can consider 16 as the starting point for any player who can sign a professional contract. You may view it as a sort of origin for a Poisson distribution. Then each bar in the graph would showcase the likelihood of a certain player within the data being a specific age. Since a Poisson distribution is skewed, the younger players outnumber the older ones. As we mentioned before, that is also true in real life. Therefore, this adds an additional layer of realism to the game and should make it more enjoyable for the customers.

Do you remember that as a head project manager, apart from the development of the game, you also need to supervise the official release? One of its most important aspects is social media marketing.

Now, imagine your main competitor is trying to expand their customer base by uploading free video previews of their new games each Monday prior to their launch. A month ago, you assigned one of the interns to keep track of the progress of their views. You can find the recorded viewership values in the Daily Views Excel file accompanying this lecture.
Before we proceed with the analysis, I recommend that you download and open the file.

Okay, the Excel file contains a single sheet titled Views, which comprises two columns. The first one indicates the number of days post-release when the value was recorded. The second one shows the number of views since the last check.

To get a better understanding of the data, you would want to see how viewership changes over time. In order to do so, you decide to graph the data set. The easiest way to do this is by marking columns A and B and clicking on insert. The next step is going to charts and selecting a scatter plot.

Since most of the views occur within the first few days, the graph starts off at a very high point and drops down rather quickly. We can see that daily views start around 100,000 but fall to about 20,000 within a week. Once the new video is released and promoted, viewership drops to around 10,000 per day and steadily decreases as it loses relevancy. By the time a second video has been released, around the 14th day, the video gets barely a few thousand views per day.
This kind of behavior resembles an exponential distribution. To check how accurate our assumption is, we can click the chart elements button and select a trend line. If we do not specify the type of relationship we expect, Excel is going to assume a linear one and create a straight trend line. Since this distribution resembles an exponential one, we pick an exponential trend line instead.

The curve of the trend line fits the data points accurately. If we assume that the views in fact follow such a distribution, then the trend line would represent the PDF for a view occurring on a specific day.

To test whether views really follow an exponential distribution, we should look at the CDF graph as well. We can graph the relationship between the first and third columns. Since total views represents the cumulative number of views up to a given point in time, it shows the aggregated number of views the video got. Let's create another scatter plot following the same steps as last time. We can notice that the curve goes up at a decreasing rate before eventually plateauing.
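The two checks above, an exponential trend line and a cumulative curve, can be sketched in Python. This is a minimal illustration under stated assumptions: the daily views are synthetic (starting near 100,000 and falling to about 20,000 within a week, as described in the lecture), and `fit_exponential` is a hypothetical helper that uses one common fitting approach, a least-squares straight line through the logarithm of the values.

```python
import math
from itertools import accumulate


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by a least-squares line through ln(y)."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log_y = sum(log_ys) / n
    slope = (
        sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_log_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic daily views (assumed): about 100,000 on day 0, about 20,000 after a week.
days = list(range(15))
daily_views = [100_000 * math.exp(-0.23 * d) for d in days]

a, b = fit_exponential(days, daily_views)

# Running total of views, the analogue of the cumulative (CDF-like) curve.
total_views = list(accumulate(daily_views))

print(f"fitted curve: y = {a:.0f} * exp({b:.3f} * x)")
print(f"total after day 7: {total_views[7]:.0f}, after day 14: {total_views[14]:.0f}")
```

Because each day adds fewer views than the day before, the running total rises at a decreasing rate, which is the flattening shape expected from the cumulative curve.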
The plateauing curve also matches our expectation of the CDF of an exponential distribution.

Now that we know the viewership fluctuates each day, we can state that each video loses relevancy rather quickly. This means that such a campaign is only beneficial in the short term. Therefore, you advise your marketing team to release similar videos only during the last month before launching the game. That way, all the videos will generate enough attention to make the game feel immense without losing customer interest in the process.

Fantastic. In addition to competitor analysis, you need to conduct some customer analysis as well. You certainly care which of your clients can afford to spend more on in-game purchases, so you send out a survey. One of the survey questions is whether the customer is a premium member of the official fan club of any team in the game. Since these fans are more devoted and financially capable, you want to find out if there is any other feature you could use to target this group.
You decide to examine a small sample of the data which contains the age of the customer according to their EA Sports account and whether they are a premium member. This data is stored in the Customer's Membership Excel file accompanying this lecture.

After opening the file, we see two columns, one with numeric values and the other one with ones and zeros. The first column represents the age of the customer. The second one shows whether they are a member or not. If the customer is also a member of the fan club, we put a one in the second column. Alternatively, if they are not, we write down a zero.

Now, if we construct the scatter plot, we are going to see that most people under the age of 34 don't have a membership, whilst most people over the age of 34 do. Of course, there are exceptions to this rule, which is normal when we are dealing with real-world data.

That being said, the data looks like it follows a logistic distribution, since the likelihood of having a membership sharply rises after nearing a specific value. In this case, we can think about 34 as the location of the distribution.
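To make the idea of a location concrete, here is a small Python sketch of the logistic CDF centered at 34. The location comes from the lecture; the scale of 3 is a hypothetical value chosen only for illustration, since the lecture does not state one.

```python
import math


def logistic_cdf(x, location, scale):
    """CDF of the logistic distribution: 1 / (1 + exp(-(x - location) / scale))."""
    return 1.0 / (1.0 + math.exp(-(x - location) / scale))


LOCATION = 34  # age taken from the lecture
SCALE = 3.0    # hypothetical spread, chosen only for illustration

# Probability of membership implied by the curve at a few ages
for age in (20, 30, 34, 38, 50):
    print(age, round(logistic_cdf(age, LOCATION, SCALE), 3))
```

At the location itself the curve passes through one half, and it rises steeply on either side of that age, which mirrors the sharp change in membership seen in the scatter plot.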
Treating 34 as the location leads us to believe that it is the approximate age at which customers have already reached financial stability and can afford higher membership fees. This insight suggests we should target customers above the age of 34, since they're more likely to spend more. One way to use this information is to release more expensive legend FIFA Ultimate Team cards for players who have retired in the past 20 years.

Fantastic work. In this lecture, you were able to see numerous examples where knowing how to deal with distributions is truly beneficial. You developed an understanding of the practical aspect of probability and discovered why knowing how the data is distributed can help us make correct business decisions.

In the next section of the course, we will further talk about how probability ties into other important fields, like finance and data science. See you all there, and thanks for watching.
