Transcript for 05_gradient-descent-for-linear-regression.en

Previously, you took a look at the linear regression model, then the cost function, and then the gradient descent algorithm. In this video, we're going to pull it all together and use the squared error cost function for the linear regression model with gradient descent. This will allow us to train the linear regression model to fit a straight line to the training data. Let's get to it.

Here's the linear regression model. To the right is the squared error cost function. Below is the gradient descent algorithm. It turns out if you calculate these derivatives, these are the terms you would get. The derivative with respect to w is this: 1 over m times the sum of i equals 1 through m of the error term, that is, the difference between the predicted and the actual values, times the input feature x^(i). The derivative with respect to b is this formula over here, which looks the same as the equation above, except that it doesn't have that x^(i) term at the end. If you use these formulas to compute these two derivatives and implement gradient descent this way, it will work.
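For reference, here is one way to write out the two derivative terms just described, following the definitions used in this video, with f_{w,b}(x^(i)) = w x^(i) + b:

$$\frac{\partial J(w,b)}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$$

$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$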
Now, you may be wondering, where did I get these formulas from? They're derived using calculus. If you want to see the full derivation, I'll quickly run through it on the next slide. But if you don't remember or aren't interested in the calculus, don't worry about it. You can skip the material on the next slide entirely and still be able to implement gradient descent, finish this class, and everything will work just fine.

In this slide, which is one of the most mathematical slides of the entire specialization, and again is completely optional, we'll show you how to calculate the derivative terms. Let's start with the first term, the derivative of the cost function J with respect to w. We'll start by plugging in the definition of the cost function J. J of w,b is this: 1 over 2m times this sum of the squared error terms. Now remember also that f of w,b of x^(i) is equal to this term over here, which is w times x^(i) plus b. What we would like to do is compute the derivative, also called the partial derivative with respect to w, of this equation right here on the right. If you've taken a calculus class before, and again it's totally fine if you haven't, you may know that by the rules of calculus, the derivative is equal to this term over here, which is why the 2 here and the 2 here cancel out, leaving us with this equation that you saw on the previous slide. This is why we defined the cost function with the 1/2 earlier this week: it makes the partial derivative neater. It cancels out the 2 that appears from computing the derivative.

For the other derivative, with respect to b, this is quite similar. I can write it out like this, and once again, plugging in the definition of f of x^(i) gives this equation. By the rules of calculus, this is equal to this, where there's no x^(i) anymore at the end. The twos cancel out as before, and you end up with this expression for the derivative with respect to b. Now you have these two expressions for the derivatives, and you can plug them into the gradient descent algorithm.
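For readers who want the derivation written out, this is the calculation the slide walks through, applying the chain rule to each squared term:

$$\frac{\partial J}{\partial w} = \frac{\partial}{\partial w}\,\frac{1}{2m}\sum_{i=1}^{m}\left(wx^{(i)} + b - y^{(i)}\right)^2 = \frac{1}{2m}\sum_{i=1}^{m}2\left(wx^{(i)} + b - y^{(i)}\right)x^{(i)} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)x^{(i)}$$

$$\frac{\partial J}{\partial b} = \frac{\partial}{\partial b}\,\frac{1}{2m}\sum_{i=1}^{m}\left(wx^{(i)} + b - y^{(i)}\right)^2 = \frac{1}{2m}\sum_{i=1}^{m}2\left(wx^{(i)} + b - y^{(i)}\right) = \frac{1}{m}\sum_{i=1}^{m}\left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$

In both lines, the 2 produced by differentiating the square cancels the 1/2 in the cost function, which is exactly the cancellation mentioned above.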
Here's the gradient descent algorithm for linear regression. You repeatedly carry out these updates to w and b until convergence. Remember that this f of x is a linear regression model, so it's equal to w times x plus b. This expression here is the derivative of the cost function with respect to w, and this expression is the derivative of the cost function with respect to b. Just as a reminder, you want to update w and b simultaneously on each step.
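As an illustration only (this is not the course's own lab code, and the function and variable names here are my own), a minimal NumPy sketch of these updates might look like this:

    import numpy as np

    def compute_gradients(x, y, w, b):
        # Partial derivatives of the squared error cost J(w,b),
        # with the model f(x) = w*x + b.
        m = x.shape[0]
        err = (w * x + b) - y            # f_wb(x^(i)) - y^(i) for each example
        dj_dw = (err * x).sum() / m      # (1/m) * sum of err * x
        dj_db = err.sum() / m            # (1/m) * sum of err
        return dj_dw, dj_db

    def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, num_iters=10000):
        # Both gradients are computed before either parameter changes,
        # which is the simultaneous update the lecture asks for.
        for _ in range(num_iters):
            dj_dw, dj_db = compute_gradients(x, y, w, b)
            w = w - alpha * dj_dw
            b = b - alpha * dj_db
        return w, b

    # Tiny made-up example: data generated from y = 2x + 0.5.
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.5, 4.5, 6.5, 8.5])
    w, b = gradient_descent(x, y)
    print(w, b)   # converges to roughly w = 2, b = 0.5

Note that the updates use temporary gradient values computed from the old w and b; updating w first and then reusing it to compute the gradient for b would not be the simultaneous update.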
Now, let's get familiar with how gradient descent works. One issue we saw with gradient descent is that it can lead to a local minimum instead of a global minimum, where the global minimum means the point that has the lowest possible value of the cost function J of all possible points. You may recall this surface plot that looks like an outdoor park with a few hills, with the grass and the birds, a relaxing scene. This function has more than one local minimum. Remember, depending on where you initialize the parameters w and b, you can end up at different local minima. You can end up here, or you can end up here.

But it turns out, when you're using a squared error cost function with linear regression, the cost function does not and will never have multiple local minima. It has a single global minimum because of this bowl shape. The technical term for this is that this cost function is a convex function. Informally, a convex function is a bowl-shaped function, and it cannot have any local minima other than the single global minimum. When you implement gradient descent on a convex function, one nice property is that so long as your learning rate is chosen appropriately, it will always converge to the global minimum.
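The lecture keeps this informal, but for reference, the standard definition is that a function J is convex if, for any two points p1 and p2 and any lambda between 0 and 1,

$$J\left(\lambda p_1 + (1-\lambda)p_2\right) \le \lambda J(p_1) + (1-\lambda)J(p_2)$$

that is, the function never rises above the straight line segment between any two points on its graph. The squared error cost for linear regression satisfies this, which is why it has the single-bowl shape.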
Congratulations, you now know how to implement gradient descent for linear regression. We have just one last video for this week. In that video, we'll see this algorithm in action. Let's go to that last video.