Subtitle transcript: 03_checking-gradient-descent-for-convergence.en

When running gradient descent, how can you tell if it is converging? That is, whether it's helping you to find parameters close to the global minimum of the cost function. By learning to recognize what a well-running implementation of gradient descent looks like, we will also, in a later video, be better able to choose a good learning rate Alpha. Let's take a look.

As a reminder, here's the gradient descent rule. One of the key choices is the choice of the learning rate Alpha. Here's something that I often do to make sure that gradient descent is working well. Recall that the job of gradient descent is to find parameters w and b that hopefully minimize the cost function J. What I'll often do is plot the cost function J, which is calculated on the training set, at each iteration of gradient descent. Remember that each iteration means after each simultaneous update of the parameters w and b. In this plot, the horizontal axis is the number of iterations of gradient descent that you've run so far. You may get a curve that looks like this. Notice that the horizontal axis is the number of iterations of gradient descent, and not a parameter like w or b. This differs from previous graphs you've seen, where the vertical axis was the cost J and the horizontal axis was a single parameter like w or b.

This curve is also called a learning curve. Note that there are a few different types of learning curves used in machine learning, and you will see some of the other types later in this course as well. Concretely, if you look at this point on the curve, it means that after you've run gradient descent for 100 iterations, meaning 100 simultaneous updates of the parameters, you have some learned values for w and b.
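For reference, the update rule from the earlier videos is w := w - alpha * dJ/dw and b := b - alpha * dJ/db, applied simultaneously. Below is a minimal sketch of how you might record the cost J after every update and plot the learning curve described here; the toy dataset and the compute_cost and compute_gradients helpers are illustrative assumptions, not code from the course.

    import numpy as np
    import matplotlib.pyplot as plt

    # Toy data for univariate linear regression: y is roughly 2x + 1.
    X = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])

    def compute_cost(X, y, w, b):
        # Mean squared error cost: J(w, b) = (1 / 2m) * sum((w*x + b - y)^2).
        m = X.shape[0]
        return np.sum((w * X + b - y) ** 2) / (2 * m)

    def compute_gradients(X, y, w, b):
        # Partial derivatives of J with respect to w and b.
        m = X.shape[0]
        err = w * X + b - y
        return np.sum(err * X) / m, np.sum(err) / m

    w, b, alpha = 0.0, 0.0, 0.01
    J_history = []
    for _ in range(400):
        dj_dw, dj_db = compute_gradients(X, y, w, b)
        w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update
        J_history.append(compute_cost(X, y, w, b))

    # Learning curve: cost J on the vertical axis, iteration number on the horizontal.
    plt.plot(J_history)
    plt.xlabel("iterations of gradient descent")
    plt.ylabel("cost J")
    plt.show()

On this toy problem the curve drops steeply at first and then flattens out, which is the healthy shape described in this video.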
If you compute the cost J(w, b) for those values of w and b, the ones you got after 100 iterations, you get this value for the cost J. That is this point on the vertical axis. This point here corresponds to the value of J for the parameters that you got after 200 iterations of gradient descent. Looking at this graph helps you to see how your cost J changes after each iteration of gradient descent. If gradient descent is working properly, then the cost J should decrease after every single iteration. If J ever increases after one iteration, that means either Alpha is chosen poorly, and it usually means Alpha is too large, or there could be a bug in the code.

Another useful thing this plot can tell you is that, if you look at this curve, by the time you reach maybe 300 iterations or so, the cost J is leveling off and is no longer decreasing much. By 400 iterations, it looks like the curve has flattened out. This means that gradient descent has more or less converged, because the curve is no longer decreasing. Looking at this learning curve, you can try to spot whether or not gradient descent is converging.

By the way, the number of iterations that gradient descent takes to converge can vary a lot between different applications. In one application, it may converge after just 30 iterations. For a different application, it could take 1,000 or 100,000 iterations. It turns out to be very difficult to tell in advance how many iterations gradient descent needs to converge, which is why you can create a graph like this, a learning curve, to try to find out when you can stop training your particular model.

Another way to decide when your model is done training is with an automatic convergence test. Here is the Greek letter epsilon. Let's let epsilon be a variable representing a small number, such as 0.001, or 10^-3.
If the cost J decreases by less than this number epsilon on one iteration, then you're likely on the flattened part of the curve that you see on the left, and you can declare convergence. Remember, convergence hopefully means that you've found parameters w and b close to the minimum possible value of J. I usually find that choosing the right threshold epsilon is pretty difficult, so I actually tend to look at graphs like the one on the left, rather than rely on automatic convergence tests. Looking at the graph can also give you some advance warning if gradient descent is not working correctly.

You've now seen what the learning curve should look like when gradient descent is running well. Let's take these insights and, in the next video, take a look at how to choose an appropriate learning rate.
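As a rough sketch of the automatic convergence test just described, the check below compares the last two recorded costs against epsilon. The function name has_converged and the example cost trace are hypothetical, chosen only for illustration.

    def has_converged(J_history, epsilon=1e-3):
        """Declare convergence if the cost J decreased by less than epsilon
        on the most recent iteration. An increase in J (a negative decrease)
        suggests Alpha is too large or there is a bug, not convergence."""
        if len(J_history) < 2:
            return False
        decrease = J_history[-2] - J_history[-1]
        return 0 <= decrease < epsilon

    # Demo on a made-up cost trace that is leveling off.
    trace = [10.0, 4.0, 2.0, 1.5, 1.4995]
    print(has_converged(trace))  # True: the last decrease was 0.0005 < 0.001

Inside a training loop, you would call this check after appending each new cost to J_history, and break out of the loop once it returns True.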
