English transcript for 02_addressing-overfitting.en

Later in this specialization, we'll talk about debugging and diagnosing things that can go wrong with learning algorithms. You'll also learn about specific tools to recognize when overfitting and underfitting may be occurring. But for now, when you think overfitting has occurred, let's talk about what you can do to address it. Let's say you fit a model and it has high variance; it's overfit. Here's our overfit house price prediction model.

One way to address this problem is to collect more training data; that's one option. If you're able to get more data, that is, more training examples on sizes and prices of houses, then with the larger training set the learning algorithm will learn to fit a function that is less wiggly. You can continue to fit a high-order polynomial or some other function with a lot of features, and if you have enough training examples, it will still do okay. To summarize, the number one tool you can use against overfitting is to get more training data. Now, getting more data isn't always an option. Maybe only so many houses have been sold in this location, so maybe there just isn't more data to be added. But when the data is available, this can work really well.

A second option for addressing overfitting is to see if you can use fewer features. In the previous video, our model's features included the size x, as well as the size squared (that is, x squared), x cubed, x^4, and so on. Those were a lot of polynomial features. In that case, one way to reduce overfitting is to just not use so many of these polynomial features, as the sketch below illustrates.
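To make these first two ideas concrete, here is a minimal sketch. It is not taken from the course or its labs; the synthetic data, sizes, and numbers are all made up for illustration. A degree-4 polynomial fit to just a handful of (size, price) points comes out wiggly, while the same model fit to many more examples, or a lower-degree fit that uses fewer polynomial features, behaves far more reasonably.

import numpy as np

# Hypothetical, synthetic house sizes and prices only.
rng = np.random.default_rng(0)

def make_data(m):
    """Generate m synthetic (size, price) pairs: a smooth underlying trend plus noise."""
    x = rng.uniform(0.5, 3.0, size=m)             # size in 1000s of square feet
    y = 300 * np.sqrt(x) + rng.normal(0, 20, m)   # price in $1000s
    return x, y

x_small, y_small = make_data(5)      # very few training examples
x_large, y_large = make_data(200)    # many training examples

w_overfit = np.polyfit(x_small, y_small, deg=4)        # degree 4 on 5 points: wiggly
w_more_data = np.polyfit(x_large, y_large, deg=4)      # same model, more data: less wiggly
w_fewer_feats = np.polyfit(x_small, y_small, deg=1)    # fewer polynomial features

# Comparing predictions on a grid of sizes shows how differently the three fits behave.
x_grid = np.linspace(0.5, 3.0, 6)
for name, w in [("overfit", w_overfit), ("more data", w_more_data), ("fewer features", w_fewer_feats)]:
    print(f"{name:15s}", np.round(np.polyval(w, x_grid), 1))

The exact numbers depend on the random seed; the point is only that the degree-4 fit on five points tends to swing between its training points, while the other two follow the overall trend.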
But now let's look at a different example. Maybe you have a lot of different features of a house with which to try to predict its price, ranging from the size, the number of bedrooms, the number of floors, the age, and the average income of the neighborhood, and so on and so forth, down to the distance to the nearest coffee shop. It turns out that if you have a lot of features like these but don't have enough training data, then your learning algorithm may also overfit to your training set.

Now, instead of using all 100 features, suppose we were to pick just a subset of the most useful ones: maybe the size, the bedrooms, and the age of the house. If you think those are the most relevant features, then using just that smaller subset of features, you may find that your model no longer overfits as badly. Choosing the most appropriate set of features to use is sometimes also called feature selection. One way you could do so is to use your intuition to choose what you think is the best set of features, whatever is most relevant for predicting the price.

Now, one disadvantage of feature selection is that by using only a subset of the features, the algorithm is throwing away some of the information that you have about the houses. For example, maybe all of these features, all 100 of them, are actually useful for predicting the price of a house, and maybe you don't want to throw away some of that information by throwing away some of the features. Later, in Course 2, you'll also see some algorithms for automatically choosing the most appropriate set of features to use for a prediction task.
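As a rough, hypothetical sketch of that effect (this is not the course's own code; the synthetic data, the 100 made-up columns, and the use of scikit-learn are all assumptions for illustration): with many features and few examples, a plain linear model can fit the training set almost perfectly yet do poorly on held-out data, while hand-picking a few genuinely relevant columns narrows that gap.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
m, n = 60, 100                        # 60 houses, 100 candidate features (all synthetic)
X = rng.normal(size=(m, n))
# Pretend only the first three columns (say size, bedrooms, age) actually drive the price.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(0, 0.5, m)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

selected = [0, 1, 2]                  # the subset chosen "by intuition"

all_features = LinearRegression().fit(X_tr, y_tr)
few_features = LinearRegression().fit(X_tr[:, selected], y_tr)

print("all 100 features:  train R^2 =", round(all_features.score(X_tr, y_tr), 3),
      "| validation R^2 =", round(all_features.score(X_val, y_val), 3))
print("3 chosen features: train R^2 =", round(few_features.score(X_tr[:, selected], y_tr), 3),
      "| validation R^2 =", round(few_features.score(X_val[:, selected], y_val), 3))

The exact scores will vary with the random seed; what matters is the gap between training and held-out performance in the two cases.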
Now, this takes us to the third option for reducing overfitting. This technique, which we'll look at in even greater depth in the next video, is called regularization. If you look at an overfit model, here's a model using polynomial features: x, x squared, x cubed, and so on. You'll find that the parameters are often relatively large. Now, if you were to eliminate some of these features, say the feature x^4, that corresponds to setting this parameter to 0. So setting a parameter to 0 is equivalent to eliminating a feature, which is what we saw on the previous slide.

It turns out that regularization is a way to more gently reduce the impact of some of the features without doing something as harsh as eliminating them outright. What regularization does is encourage the learning algorithm to shrink the values of the parameters without necessarily demanding that a parameter be set to exactly 0. It turns out that even if you fit a higher-order polynomial like this, so long as you can get the algorithm to use smaller parameter values w1, w2, w3, w4, you end up with a curve that ends up fitting the training data much better. So what regularization does is let you keep all of your features while preventing any of them from having an overly large effect, which is what sometimes causes overfitting.

By the way, by convention we normally just reduce the size of the wj parameters, that is, w1 through wn. It doesn't make a huge difference whether you regularize the parameter b as well; you could do so if you want, or not if you don't. I usually don't, and it's just fine to regularize w1, w2, all the way up to wn but not really encourage b to become smaller. In practice, it should make very little difference whether or not you also regularize b.
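One common way to write this down is to add a penalty to the squared-error cost that grows with the sizes of w1 through wn while leaving b out of the penalty. Here is a minimal NumPy sketch of such a cost; the next video gives the precise formulation, so treat the exact form and the tiny made-up numbers here as assumptions for illustration.

import numpy as np

def regularized_cost(X, y, w, b, lambda_):
    """Squared-error cost plus a penalty that shrinks w_1..w_n; b is not penalized."""
    m = X.shape[0]
    predictions = X @ w + b                                  # model output for every example
    squared_error = np.sum((predictions - y) ** 2) / (2 * m)
    penalty = (lambda_ / (2 * m)) * np.sum(w ** 2)           # larger parameters cost more
    return squared_error + penalty

# Tiny made-up example: the larger set of weights pays a larger penalty on the same data.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([2.0, 2.5, 4.0])
print(regularized_cost(X, y, w=np.array([0.5, 0.2]), b=1.0, lambda_=1.0))
print(regularized_cost(X, y, w=np.array([5.0, 2.0]), b=1.0, lambda_=1.0))

Minimizing a cost like this is what "encouraging the algorithm to shrink the parameters" means in practice: fitting the data well and keeping the w values small are traded off against each other through lambda.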
To recap, these are the three ways you saw in this video for addressing overfitting. One, collect more data. If you can get more data, this can really help reduce overfitting. Sometimes that's not possible, in which case some of the options are: two, try selecting and using only a subset of the features. You'll learn more about feature selection in Course 2. Three, reduce the size of the parameters using regularization. This will be the subject of the next video as well. Just for myself, I use regularization all the time, so this is a very useful technique for training learning algorithms, including neural networks specifically, which you'll see later in this specialization as well.

I hope you'll also check out the optional lab on overfitting. In the lab, you'll be able to see different examples of overfitting and adjust those examples by clicking on options in the plots. You'll also be able to add your own data points by clicking on the plot and see how that changes the curve that is fit. You can also try examples for both regression and classification, and you can change the degree of the polynomial to be x, x squared, x cubed, and so on. The lab also lets you play with two different options for addressing overfitting: you can add additional training data to reduce overfitting, and you can also select which features to include or exclude as another way to try to reduce overfitting. Please take a look at the lab, which I hope will help you build your intuition about overfitting as well as some methods for addressing it.

In this video, you also saw the idea of regularization at a relatively high level. I realize that all of these details on regularization may not fully make sense to you yet. But in the next video, we'll start to formulate exactly how to apply regularization and exactly what regularization means. Then we'll start to figure out how to make this work with our learning algorithms, so that linear regression and logistic regression, and in the future other algorithms as well, can avoid overfitting. Let's take a look at that in the next video.
