English transcript for 04_choosing-the-learning-rate

Your learning algorithm will run much better with an appropriate choice of learning rate. If it's too small, it will run very slowly, and if it's too large, it may not even converge. Let's take a look at how you can choose a good learning rate for your model.

Concretely, if you plot the cost for a number of iterations and notice that the cost sometimes goes up and sometimes goes down, you should take that as a clear sign that gradient descent is not working properly. This could mean that there's a bug in the code, or sometimes it could mean that your learning rate is too large. Here's an illustration of what might be happening. The vertical axis is the cost function J, and the horizontal axis represents a parameter, like maybe w_1. If the learning rate is too big, then if you start off here, your update step may overshoot the minimum and end up here, and in the next update step you're again overshooting, so you end up here, and so on. That's why the cost can sometimes go up instead of decreasing. To fix this, you can use a smaller learning rate. Then your updates may start here, go down a little bit, and down a bit more, and hopefully the cost will consistently decrease until it reaches the global minimum.

Sometimes you may see that the cost consistently increases after each iteration, like this curve here. This is also likely due to a learning rate that is too large, and it could be addressed by choosing a smaller learning rate. But a curve like this could also be a sign of broken code. For example, if I wrote my code so that w_1 gets updated as w_1 plus Alpha times this derivative term, this could result in the cost consistently increasing at each iteration. This is because adding the derivative term moves your cost J further from the global minimum instead of closer.
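To make this concrete, here is a minimal NumPy sketch of gradient descent for single-variable linear regression that records the cost at every iteration. The function and variable names (compute_cost, compute_gradient, run_gradient_descent, the toy x and y data, and the alpha values) are my own illustrative choices, not code from the course; the sketch only shows how the learning curve behaves with the correct update, with the "plus" sign bug described above, and with a too-large learning rate.

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) for the model f(x) = w*x + b."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)

def compute_gradient(x, y, w, b):
    """Partial derivatives of J with respect to w and b."""
    m = x.shape[0]
    err = (w * x + b) - y
    return (err @ x) / m, np.sum(err) / m

def run_gradient_descent(x, y, alpha, num_iters, sign=-1):
    """Run gradient descent from w = b = 0 and record J at every iteration.

    sign=-1 is the correct update w := w - alpha * dJ/dw;
    sign=+1 reproduces the 'plus' bug described in the video.
    """
    w, b = 0.0, 0.0
    history = []
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w = w + sign * alpha * dj_dw
        b = b + sign * alpha * dj_db
        history.append(compute_cost(x, y, w, b))
    return history

# Tiny made-up dataset, just for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.5, 4.5, 6.5, 8.5])

print(run_gradient_descent(x, y, alpha=0.05, num_iters=5))            # J decreases every iteration
print(run_gradient_descent(x, y, alpha=0.05, num_iters=5, sign=+1))   # sign bug: J increases every iteration
print(run_gradient_descent(x, y, alpha=0.5, num_iters=5))             # alpha too large: J blows up
```

Plotting the returned history against the iteration number gives exactly the learning curve described above: with the correct sign and a small enough Alpha, that curve should go down on every single iteration.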
So remember, you want to use a minus sign: the code should update w_1 to be w_1 minus Alpha times the derivative term.

One debugging tip for a correct implementation of gradient descent is that with a small enough learning rate, the cost function should decrease on every single iteration. So if gradient descent isn't working, one thing I often do, and I hope you find this tip useful too, is to set Alpha to a very small number and see if that causes the cost to decrease on every iteration. If, even with Alpha set to a very small number, J doesn't decrease on every single iteration but instead sometimes increases, then that usually means there's a bug somewhere in the code. Note that setting Alpha to be really small is meant here as a debugging step; a very small value of Alpha is not going to be the most efficient choice for actually training your learning algorithm. One important trade-off is that if your learning rate is too small, then gradient descent can take a lot of iterations to converge.

So when I'm running gradient descent, I will usually try a range of values for the learning rate Alpha. I may start by trying a learning rate of 0.001, and I may also try learning rates ten times as large, say 0.01 and 0.1, and so on. For each choice of Alpha, you might run gradient descent for just a handful of iterations and plot the cost function J as a function of the number of iterations. After trying a few different values, you might then pick the value of Alpha that seems to decrease the cost rapidly, but also consistently. In fact, what I actually do is try a range of values like this: after trying 0.001, I'll then increase the learning rate threefold to 0.003. After that, I'll try 0.01, which is again about three times as large as 0.003.
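Continuing the sketch above, here is one way that sweep could look in code. It reuses run_gradient_descent, x, and y from the previous block; the roughly threefold grid of Alpha values follows the video, while the choice of 30 iterations per trial and the log-scaled plot are illustrative assumptions on my part.

```python
import matplotlib.pyplot as plt

# Roughly threefold steps, as described above: each alpha is about 3x the previous one.
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

plt.figure()
for alpha in alphas:
    # A handful of iterations per trial is enough to see the trend in J.
    history = run_gradient_descent(x, y, alpha, num_iters=30)
    plt.plot(history, label=f"alpha = {alpha}")

plt.xlabel("iteration")
plt.ylabel("cost J")
plt.yscale("log")   # keeps the diverging curves from hiding the well-behaved ones
plt.legend()
plt.show()

# From the plot, pick the largest alpha whose curve still decreases rapidly and
# consistently (or something slightly smaller), and use that for the full training run.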
These trials run gradient descent with each value of Alpha roughly three times bigger than the previous value. What I'll do is try a range of values until I've found a value that's too small, and then also make sure I've found a value that's too large. I'll then slowly try to pick the largest possible learning rate, or just something slightly smaller than the largest reasonable value that I found. When I do that, it usually gives me a good learning rate for my model. I hope this technique will be useful for you too when choosing a good learning rate for your implementation of gradient descent.

In the upcoming optional lab, you can also take a look at how feature scaling is done in code, and see how different choices of the learning rate Alpha can lead to either better or worse training of your model. I hope you have fun playing with the value of Alpha and seeing the outcomes of different choices. Please take a look and run the code in the optional lab to gain a deeper intuition about feature scaling, as well as the learning rate Alpha.

Choosing learning rates is an important part of training many learning algorithms, and I hope that this video gives you intuition about different choices and how to pick a good value for Alpha. Now, there are a couple more ideas that you can use to make multiple linear regression much more powerful: choosing custom features, which will also allow you to fit curves, not just a straight line, to your data. Let's take a look at that in the next video.
