All language subtitles for 02_feature-scaling-part-2.en

Let's look at how you can implement feature scaling, to take features that take on very different ranges of values and scale them to have comparable ranges of values to each other. How do you actually scale features? Well, if x_1 ranges from 300 to 2,000, one way to get a scaled version of x_1 is to take each original x_1 value and divide by 2,000, the maximum of the range. The scaled x_1 will range from 0.15 up to 1. Similarly, since x_2 ranges from 0 to 5, you can calculate a scaled version of x_2 by taking each original x_2 and dividing by 5, which is again the maximum. So the scaled x_2 will now range from 0 to 1.

If you plot the scaled x_1 and x_2 on a graph, it might look like this.

In addition to dividing by the maximum, you can also do what's called mean normalization. What this looks like is, you start with the original features and then you re-scale them so that both of them are centered around zero.
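Before the details of mean normalization, here is a minimal sketch of the divide-by-the-maximum scaling described above. The function name `scale_by_max` and the sample values are assumptions made for illustration. Only the endpoints 300, 2,000, 0, and 5 come from the lecture. The sketch also assumes the maximum of each feature is positive.

```python
def scale_by_max(values):
    """Divide every value by the largest value in the list.

    Assumes the largest value is positive, as in the lecture's examples.
    """
    largest = max(values)
    return [v / largest for v in values]


# Hypothetical sample values spanning the ranges mentioned in the lecture.
x1 = [300, 600, 1200, 2000]  # first feature, ranging from 300 to 2,000
x2 = [0, 1, 3, 5]            # second feature, ranging from 0 to 5

x1_scaled = scale_by_max(x1)
x2_scaled = scale_by_max(x2)

print(min(x1_scaled), max(x1_scaled))  # 0.15 1.0
print(min(x2_scaled), max(x2_scaled))  # 0.0 1.0
```

After scaling, both features top out at 1, so they occupy comparable ranges even though the originals differed by a factor of several hundred.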
Whereas before they only had values greater than zero, now they have both negative and positive values, usually between negative one and plus one. To calculate the mean normalization of x_1, first find the average, also called the mean, of x_1 on your training set, and let's call this mean Mu_1, with this being the Greek letter Mu. For example, you may find that the average of feature 1, Mu_1, is 600 square feet. Let's take each x_1, subtract the mean Mu_1, and then divide by the difference 2,000 minus 300, where 2,000 is the maximum and 300 the minimum. If you do this, you get the normalized x_1 to range from negative 0.18 to 0.82.

Similarly, to mean normalize x_2, you can calculate the average of feature 2. For instance, Mu_2 may be 2.3. Then you can take each x_2, subtract Mu_2, and divide by 5 minus 0. Again, that's the max, 5, minus the min, which is 0. The mean normalized x_2 now ranges from negative 0.46 to 0.54.

If you plot the training data using the mean normalized x_1 and x_2, it might look like this.
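The mean normalization steps above can be sketched as follows. This is only an illustration: the function name `mean_normalize` and the sample lists are assumptions. The means 600 and 2.3 are the lecture's example values, so they are passed in explicitly rather than computed from the made-up samples.

```python
def mean_normalize(values, mean=None):
    """Return (x - mean) / (max - min) for each x in values.

    If no mean is given, the average of `values` is used.
    Assumes the feature is not constant (max differs from min).
    """
    if mean is None:
        mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]


# Hypothetical samples; only the endpoints come from the lecture.
x1 = [300, 600, 1200, 2000]
x2 = [0, 1, 3, 5]

# Means taken from the lecture's example (Mu_1 = 600, Mu_2 = 2.3).
x1_norm = mean_normalize(x1, mean=600)
x2_norm = mean_normalize(x2, mean=2.3)

print(round(min(x1_norm), 2), round(max(x1_norm), 2))  # -0.18 0.82
print(round(min(x2_norm), 2), round(max(x2_norm), 2))  # -0.46 0.54
```

On a real training set you would normally leave `mean` unset so that it is computed from the data itself.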
There's one last common re-scaling method called Z-score normalization. To implement Z-score normalization, you need to calculate something called the standard deviation of each feature. If you don't know what the standard deviation is, don't worry about it, you won't need to know it for this course. Or if you've heard of the normal distribution or the bell-shaped curve, sometimes also called the Gaussian distribution, this is what the standard deviation for the normal distribution looks like. But if you haven't heard of this, you don't need to worry about that either. But if you do know what the standard deviation is, then to implement Z-score normalization, you first calculate the mean Mu, as well as the standard deviation, which is often denoted by the lowercase Greek letter Sigma, of each feature. For instance, maybe feature 1 has a standard deviation of 450 and mean 600. Then to Z-score normalize x_1, take each x_1, subtract Mu_1, and then divide by the standard deviation, which I'm going to denote as Sigma_1.
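A minimal sketch of Z-score normalization, under a few assumptions: the function name `zscore_normalize` and the sample list are hypothetical, and when no standard deviation is supplied the sketch uses the population standard deviation (`statistics.pstdev`). The lecture does not say whether the population or the sample version is intended.

```python
import statistics


def zscore_normalize(values, mean=None, std=None):
    """Return (x - mean) / std for each x in values.

    Missing statistics are computed from `values`. The population
    standard deviation is an assumption, not something the lecture states.
    """
    if mean is None:
        mean = statistics.fmean(values)
    if std is None:
        std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical sample; only the endpoints 300 and 2,000 come from the lecture.
x1 = [300, 600, 1200, 2000]

# Using the lecture's example statistics (Mu_1 = 600, Sigma_1 = 450).
x1_z = zscore_normalize(x1, mean=600, std=450)
print(round(min(x1_z), 2), round(max(x1_z), 2))  # -0.67 3.11

# Letting the function compute the statistics from the data instead.
x1_auto = zscore_normalize(x1)
```

When the statistics are computed from the data itself, the normalized feature has mean zero and standard deviation one by construction.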
What you may find is that the Z-score normalized x_1 now ranges from negative 0.67 to 3.1.

Similarly, if you calculate the second feature's standard deviation to be 1.4 and mean to be 2.3, then you can compute x_2 minus Mu_2, divided by Sigma_2, and in this case, the Z-score normalized x_2 might now range from negative 1.6 to 1.9. If you plot the training data on the normalized x_1 and x_2 on a graph, it might look like this.

As a rule of thumb, when performing feature scaling, you might want to aim for getting the features to range from maybe anywhere around negative one to somewhere around plus one for each feature x. But these values, negative one and plus one, can be a little bit loose. If the features range from negative 3 to plus 3, or negative 0.3 to plus 0.3, all of these are completely okay. If you have a feature x_1 that winds up being between zero and three, that's not a problem. You can re-scale it if you want, but if you don't re-scale it, it should work okay too.
Or if you have a different feature, x_2, whose values are between negative 2 and plus 0.5, again, that's okay. No harm re-scaling it, but it might be okay if you leave it alone as well. But if another feature, like x_3 here, ranges from negative 100 to plus 100, then this takes on a very different range of values than something from around negative one to plus one. You're probably better off re-scaling this feature x_3 so that it ranges from something closer to negative one to plus one. Similarly, if you have a feature x_4 that takes on really small values, say between negative 0.001 and plus 0.001, then these values are so small that you may want to re-scale it as well.

Finally, what if your feature x_5, such as measurements of a hospital patient's body temperature, ranges from 98.6 to 105 degrees Fahrenheit? In this case, these values are around 100, which is actually pretty large compared to other scaled features, and this will actually cause gradient descent to run more slowly. In this case, feature re-scaling will likely help.
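The rule of thumb above is deliberately loose, but it can be turned into a rough check. The sketch below is an assumption-laden illustration, not something the lecture prescribes: the function name `needs_rescaling` and the cutoffs 0.1 and 10 are hypothetical, chosen only so that the lecture's own examples come out the way they are described.

```python
def needs_rescaling(values, low=0.1, high=10.0):
    """Flag a feature whose largest absolute value is far from 1.

    The cutoffs `low` and `high` are hypothetical; the lecture only says
    the target range of about -1 to +1 can be treated loosely.
    """
    largest = max(abs(v) for v in values)
    return largest < low or largest > high


# Endpoints of the example ranges from the lecture.
features = {
    "x_1": [0, 3],
    "x_2": [-2, 0.5],
    "x_3": [-100, 100],
    "x_4": [-0.001, 0.001],
    "x_5": [98.6, 105],
}

for name, bounds in features.items():
    print(name, "rescale" if needs_rescaling(bounds) else "ok")
# x_1 ok
# x_2 ok
# x_3 rescale
# x_4 rescale
# x_5 rescale
```

A check like this looks only at magnitude. For a feature such as x_5, a method that subtracts the mean (mean normalization or Z-score normalization) also removes the large offset around 100.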
There's almost never any harm to carrying out feature re-scaling. When in doubt, I encourage you to just carry it out. That's it for feature scaling. With this little technique, you'll often be able to get gradient descent to run much faster.

With or without feature scaling, when you run gradient descent, how can you check if gradient descent is really working, if it is finding you the global minimum or something close to it? In the next video, let's take a look at how to recognize if gradient descent is converging, and then in the video after that, this will lead to a discussion of how to choose a good learning rate for gradient descent.
