English transcript for 02_implementing-gradient-descent

Let's take a look at how you can actually implement the gradient descent algorithm. Let me write down the gradient descent algorithm. Here it is. On each step, w, the parameter, is updated to the old value of w minus Alpha times this term: d/dw of the cost function J of w, b. What this expression is saying is: update your parameter w by taking the current value of w and adjusting it a small amount, which is this expression on the right, minus Alpha times this term over here. If you feel like there's a lot going on in this equation, it's okay, don't worry about it. We'll unpack it together.

First, this equals notation here. Since I said we're assigning w a value using this equal sign, in this context the equal sign is the assignment operator. Specifically, in this context, if you write code that says a equals c, it means take the value c and store it in your computer, in the variable a. Or if you write a equals a plus 1, it means set the value of a to be equal to a plus 1, or increment the value of a by one. The assignment operator in coding is different from truth assertions in mathematics, where if I write a equals c, I'm asserting, that is, claiming, that the values of a and c are equal to each other. Hopefully, I will never write a truth assertion a equals a plus 1, because that just can't possibly be true. In Python and in other programming languages, truth assertions are sometimes written as equals equals, so you may see code that says a equals equals c if you're testing whether a is equal to c. But in math notation, as we conventionally use it, like in these videos, the equal sign can be used for either assignments or truth assertions. I'll try to make sure it's clear, when I write an equal sign, whether we're assigning a value to a variable or asserting the truth of the equality of two values.
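To make the distinction concrete, here is a minimal Python sketch of the difference between assignment and an equality test, using the same placeholder variables a and c from the discussion above:

c = 2          # assignment: store the value 2 in the variable c
a = c          # assignment: take the value of c and store it in the variable a
a = a + 1      # assignment: increment a by one; a is now 3
print(a == c)  # truth test: prints False, since a is 3 and c is 2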
Now, let's dive more deeply into what the symbols in this equation mean. The symbol here is the Greek letter Alpha. In this equation, Alpha is also called the learning rate. The learning rate is usually a small positive number between 0 and 1, and it might be, say, 0.01. What Alpha does is basically control how big of a step you take downhill. If Alpha is very large, that corresponds to a very aggressive gradient descent procedure where you're trying to take huge steps downhill. If Alpha is very small, then you'd be taking small baby steps downhill. We'll come back later to dive more deeply into how to choose a good learning rate Alpha.

Finally, this term here is the derivative term of the cost function J. Let's not worry about the details of this derivative right now; later on, you'll get to see more about the derivative term. For now, you can think of this derivative term that I drew a magenta box around as telling you in which direction you want to take your baby step. In combination with the learning rate Alpha, it also determines the size of the steps you want to take downhill. Now, I do want to mention that derivatives come from calculus. Even if you aren't familiar with calculus, don't worry about it. Even without knowing any calculus, you'll be able to figure out all you need to know about this derivative term in this video and the next.

One more thing. Remember, your model has two parameters, not just w but also b. You also have an assignment operation to update the parameter b that looks very similar: b is assigned the old value of b minus the learning rate Alpha times this slightly different derivative term, d/db of J of w, b.
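For reference, here are the two update rules written out as equations. This is just the notation described above, with := marking assignment rather than a truth assertion:

    w := w - \alpha \frac{\partial}{\partial w} J(w, b)
    b := b - \alpha \frac{\partial}{\partial b} J(w, b)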
Remember the graph of the surface plot where you're taking baby steps until you get to the bottom of the valley? Well, for the gradient descent algorithm, you're going to repeat these two update steps until the algorithm converges. By converges, I mean that you reach a point at a local minimum where the parameters w and b no longer change much with each additional step that you take.

Now, there's one more subtle detail about how to correctly implement gradient descent. You're going to update two parameters, w and b, and this update takes place for both of them. One important detail is that for gradient descent, you want to simultaneously update w and b, meaning you want to update both parameters at the same time. What I mean by that is that in this expression, you're going to update w from the old w to a new w, and you're also updating b from its old value to a new value. The way to implement this is to compute the right-hand sides for both w and b, and then, simultaneously, at the same time, update w and b to the new values.

Let's take a look at what this means. Here's the correct way to implement gradient descent, which does a simultaneous update. This sets a variable temp_w equal to that expression, which is w minus that term here. It also sets another variable temp_b to that, which is b minus that term. You compute both right-hand sides, both updates, and store them into the variables temp_w and temp_b. Then you copy the value of temp_w into w, and you also copy the value of temp_b into b. One thing you may notice is that this value of w is from before w gets updated: the pre-update w is what goes into the derivative term over here.
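As a rough sketch, one gradient descent step with a simultaneous update might look like this in Python. The functions dj_dw and dj_db are hypothetical placeholders standing in for the two derivative terms of the cost J(w, b); this is only an illustration of the ordering described above, not the course's reference implementation.

def gradient_descent_step(w, b, alpha, dj_dw, dj_db):
    # Compute both right-hand sides first, using the *old* values of w and b.
    temp_w = w - alpha * dj_dw(w, b)
    temp_b = b - alpha * dj_db(w, b)
    # Then update both parameters at the same time.
    return temp_w, temp_b

# Illustrative usage (dj_dw and dj_db would be defined elsewhere):
# w, b = gradient_descent_step(w, b, 0.01, dj_dw, dj_db)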
In contrast, here is an incorrect implementation of gradient descent that does not do a simultaneous update. In this incorrect implementation, we compute temp_w, same as before, and so far that's okay. But here's where things start to differ: we then update w with the value in temp_w before calculating the new value for the other parameter, b. Next, we calculate temp_b as b minus that term here, and finally, we update b with the value in temp_b.

The difference between the right-hand and left-hand implementations is that, if you look over here, this w has already been updated to its new value, and this updated w is what actually goes into the cost function J of w, b. That means this term here on the right is not the same as this term over here that you see on the left. It also means the temp_b term on the right is not quite the same as the temp_b term on the left, and thus the updated value for b on the right is not the same as the updated value for b on the left.

Given the way gradient descent is implemented in code, it actually turns out to be more natural to implement it the correct way, with simultaneous updates. When you hear someone talk about gradient descent, they always mean gradient descent where you perform a simultaneous update of the parameters. If, however, you were to implement a non-simultaneous update, it turns out it would probably work more or less anyway. But doing it that way isn't really the correct way to implement it; it's actually some other algorithm with different properties. I would advise you to just stick to the correct simultaneous update and not use the incorrect version, which is sketched below only for contrast.
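Here is a sketch of that non-simultaneous version, again with the same hypothetical derivative placeholders. The only change from the correct version is the order of operations, but that small change means temp_b is computed with the already-updated w:

def non_simultaneous_step(w, b, alpha, dj_dw, dj_db):
    temp_w = w - alpha * dj_dw(w, b)
    w = temp_w                         # w is updated too early here...
    temp_b = b - alpha * dj_db(w, b)   # ...so this derivative sees the new w, not the old one
    b = temp_b
    return w, b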
That's gradient descent. In the next video, we'll go into the details of the derivative term, which you saw in this video but which we didn't really talk about in detail. Derivatives are part of calculus, and again, if you're not familiar with calculus, don't worry about it. You don't need to know calculus at all in order to complete this course or this specialization, and you'll have all the information you need in order to implement gradient descent. Coming up in the next video, we'll go over derivatives together, and you'll come away with the intuition and knowledge you need to be able to implement and apply gradient descent yourself. I think that'll be an exciting thing for you to know how to implement. Let's go on to the next video to see how to do that.
