1
00:00:03,020 --> 00:00:05,850
Let's look at some
more visualizations of
2
00:00:05,850 --> 00:00:08,905
w and b. Here's one example.
3
00:00:08,905 --> 00:00:14,400
Over here, you have a particular
point on the graph j.
4
00:00:14,400 --> 00:00:17,730
For this point, w
equals about negative
5
00:00:17,730 --> 00:00:22,470
0.15 and b equals about 800.
6
00:00:22,470 --> 00:00:26,160
This point corresponds to
one pair of values for
7
00:00:26,160 --> 00:00:30,090
w and b that yields a
particular cost j.
8
00:00:30,090 --> 00:00:33,450
In fact, this particular
pair of values for w and
9
00:00:33,450 --> 00:00:37,145
b corresponds to this
function f of x,
10
00:00:37,145 --> 00:00:40,495
which is this line you
can see on the left.
11
00:00:40,495 --> 00:00:45,560
This line intersects the
vertical axis at 800 because
12
00:00:45,560 --> 00:00:50,720
b equals 800 and the slope of
the line is negative 0.15,
13
00:00:50,720 --> 00:00:53,770
because w equals negative 0.15.
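To make the link between the parameters and the line concrete, here is a minimal Python sketch. The values w = -0.15 and b = 800 come from this example; the input sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

# Parameters from the example above
w = -0.15  # slope of the line
b = 800    # intercept on the vertical axis

def f(x):
    """Model prediction: a straight line with slope w and intercept b."""
    return w * x + b

# Hypothetical input values, just for illustration
x = np.array([0.0, 1000.0, 2000.0])
print(f(x))  # [800. 650. 500.] -> f(0) = b = 800, and each step of 1000 drops by 150
```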
14
00:00:53,770 --> 00:00:56,930
Now, if you look at the data
points in the training set,
15
00:00:56,930 --> 00:00:58,910
you may notice that this line
16
00:00:58,910 --> 00:01:01,180
is not a good fit to the data.
17
00:01:01,180 --> 00:01:03,905
For this function f of x,
18
00:01:03,905 --> 00:01:07,055
with these values of w and b,
19
00:01:07,055 --> 00:01:11,135
many of the predictions for
the value of y are quite far
20
00:01:11,135 --> 00:01:13,130
from the actual target value of
21
00:01:13,130 --> 00:01:15,785
y that is in the training data.
22
00:01:15,785 --> 00:01:18,390
Because this line
is not a good fit,
23
00:01:18,390 --> 00:01:20,810
if you look at the graph of j,
24
00:01:20,810 --> 00:01:24,680
the cost of this
line is out here,
25
00:01:24,680 --> 00:01:27,370
which is pretty far
from the minimum.
26
00:01:27,370 --> 00:01:30,350
There's a pretty high cost
because this choice of
27
00:01:30,350 --> 00:01:34,260
w and b is just not that good
a fit to the training set.
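As a rough sketch of how that high cost would be computed, here is the squared-error cost J(w, b) = (1/2m) * sum of (f(x_i) - y_i)^2 from earlier in the course, evaluated at this choice of parameters. The training set below is made up for illustration; the lab uses its own data.

```python
import numpy as np

# Hypothetical training set (house size in square feet, price in $1000s)
x_train = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y_train = np.array([200.0, 300.0, 400.0, 500.0])

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1 / (2m)) * sum((w*x_i + b - y_i)^2)."""
    m = len(x)
    predictions = w * x + b
    return np.sum((predictions - y) ** 2) / (2 * m)

# The poorly fitting choice from the example above
print(compute_cost(x_train, y_train, w=-0.15, b=800))  # large value, far from the minimum
```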
28
00:01:34,310 --> 00:01:36,500
Now, let's look at
29
00:01:36,500 --> 00:01:41,180
another example with a
different choice of w and b.
30
00:01:41,180 --> 00:01:43,760
Now, here's another
function that
31
00:01:43,760 --> 00:01:46,415
is still not a great
fit for the data,
32
00:01:46,415 --> 00:01:48,985
but maybe slightly less bad.
33
00:01:48,985 --> 00:01:51,410
This point here represents
34
00:01:51,410 --> 00:01:52,955
the cost for this particular pair
35
00:01:52,955 --> 00:01:56,755
of w and b that
creates that line.
36
00:01:56,755 --> 00:01:59,840
The value of w is equal to 0 and
37
00:01:59,840 --> 00:02:03,640
the value b is about 360.
38
00:02:03,640 --> 00:02:07,070
This pair of parameters
corresponds to this function,
39
00:02:07,070 --> 00:02:08,645
which is a flat line,
40
00:02:08,645 --> 00:02:13,655
because f of x equals
0 times x plus 360.
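A tiny sketch of that flat line in Python (the input sizes are arbitrary):

```python
w, b = 0, 360

def f(x):
    # f(x) = 0 * x + 360: the input has no effect, so every prediction is 360
    return w * x + b

print(f(500), f(1000), f(2000))  # 360 360 360
```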
41
00:02:13,655 --> 00:02:15,520
I hope that makes sense.
42
00:02:15,520 --> 00:02:18,635
Let's look at yet
another example.
43
00:02:18,635 --> 00:02:21,350
Here's one more
choice for w and b,
44
00:02:21,350 --> 00:02:23,000
and with these values,
45
00:02:23,000 --> 00:02:25,550
you end up with
this line f of x.
46
00:02:25,550 --> 00:02:27,750
Again, not a great
fit to the data,
47
00:02:27,750 --> 00:02:29,720
and it's actually further
away from the minimum
48
00:02:29,720 --> 00:02:32,620
compared to the
previous example.
49
00:02:32,620 --> 00:02:34,890
Remember that the minimum is at
50
00:02:34,890 --> 00:02:38,250
the center of that
smallest ellipse.
51
00:02:38,250 --> 00:02:43,520
Last example, if you look
at f of x on the left,
52
00:02:43,520 --> 00:02:46,670
this looks like a pretty good
fit to the training set.
53
00:02:46,670 --> 00:02:49,160
You can see on the right,
54
00:02:49,160 --> 00:02:52,580
this point representing
the cost is very
55
00:02:52,580 --> 00:02:56,570
close to the center of
the smallest ellipse.
56
00:02:56,570 --> 00:02:58,445
It's not quite
exactly the minimum,
57
00:02:58,445 --> 00:02:59,795
but it's pretty close.
58
00:02:59,795 --> 00:03:02,495
For this value of w and b,
59
00:03:02,495 --> 00:03:06,340
you get to this line, f of x.
60
00:03:06,340 --> 00:03:08,510
You can see that if you measure
61
00:03:08,510 --> 00:03:10,250
the vertical distances between
62
00:03:10,250 --> 00:03:11,390
the data points and
63
00:03:11,390 --> 00:03:14,315
the predicted values
on the straight line,
64
00:03:14,315 --> 00:03:18,280
you'd get the error
for each data point.
65
00:03:18,280 --> 00:03:21,020
The sum of squared
errors for all of
66
00:03:21,020 --> 00:03:24,050
these data points
is pretty close to
67
00:03:24,050 --> 00:03:25,970
the minimum possible sum of
68
00:03:25,970 --> 00:03:30,370
squared errors among all
possible straight line fits.
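One way to check that claim numerically: compare the cost of a hand-picked line against the least-squares line, which by construction minimizes the sum of squared errors over all straight lines. This sketch uses np.polyfit to get that optimal line; the data is again hypothetical.

```python
import numpy as np

x = np.array([1000.0, 1500.0, 2000.0, 2500.0])  # hypothetical sizes
y = np.array([210.0, 290.0, 410.0, 490.0])      # hypothetical prices

def cost(w, b):
    return np.sum((w * x + b - y) ** 2) / (2 * len(x))

# Least-squares straight line: the minimum possible squared-error cost
w_best, b_best = np.polyfit(x, y, 1)

print(cost(-0.15, 800))      # bad fit from the first example: large cost
print(cost(0.0, 360))        # flat line: smaller, but still far off
print(cost(w_best, b_best))  # minimum among all straight-line fits
```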
69
00:03:30,370 --> 00:03:33,155
I hope that by looking
at these figures,
70
00:03:33,155 --> 00:03:35,960
you can get a better sense
of how different choices
71
00:03:35,960 --> 00:03:38,750
of the parameters
affect the line f
72
00:03:38,750 --> 00:03:40,610
of x and how this
73
00:03:40,610 --> 00:03:44,875
corresponds to different
values for the cost j,
74
00:03:44,875 --> 00:03:48,140
and hopefully you can see how
75
00:03:48,140 --> 00:03:52,160
the better fit lines correspond
to points on the graph of
76
00:03:52,160 --> 00:03:55,865
j that are closer to the
minimum possible cost
77
00:03:55,865 --> 00:04:00,935
for this cost function
j of w and b.
78
00:04:00,935 --> 00:04:04,625
In the optional lab that
follows this video,
79
00:04:04,625 --> 00:04:05,810
you'll get to run
80
00:04:05,810 --> 00:04:09,050
some code, and remember,
all the code is given,
81
00:04:09,050 --> 00:04:10,340
so you just need to hit
82
00:04:10,340 --> 00:04:13,060
Shift Enter to run it
and take a look at it
83
00:04:13,060 --> 00:04:15,200
and the lab will show you how
84
00:04:15,200 --> 00:04:18,400
the cost function is
implemented in code.
85
00:04:18,400 --> 00:04:20,570
Given a small training set
86
00:04:20,570 --> 00:04:23,060
and different choices
for the parameters,
87
00:04:23,060 --> 00:04:25,760
you'll be able to see
how the cost varies
88
00:04:25,760 --> 00:04:29,255
depending on how well
the model fits the data.
89
00:04:29,255 --> 00:04:30,830
In the optional lab,
90
00:04:30,830 --> 00:04:32,425
you can also play with an
91
00:04:32,425 --> 00:04:35,070
interactive contour
plot. Check this out.
92
00:04:35,070 --> 00:04:37,220
You can use your
mouse cursor to click
93
00:04:37,220 --> 00:04:39,800
anywhere on the contour
plot and you will
94
00:04:39,800 --> 00:04:41,960
see the straight line defined by
95
00:04:41,960 --> 00:04:45,105
the values you chose for
the parameters w and b.
96
00:04:45,105 --> 00:04:48,230
You'll see a dot up here also on
97
00:04:48,230 --> 00:04:51,425
the 3D surface plot
showing the cost.
98
00:04:51,425 --> 00:04:54,440
Finally, the optional
lab also has
99
00:04:54,440 --> 00:04:57,440
a 3D surface plot
that you can manually
100
00:04:57,440 --> 00:04:59,630
rotate and spin around using
101
00:04:59,630 --> 00:05:01,310
your mouse cursor to take
102
00:05:01,310 --> 00:05:04,210
a better look at what the
cost function looks like.
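A rotatable surface like the one described can be sketched with matplotlib's 3D toolkit. Again, this is an illustrative approximation, not the lab's code.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1000.0, 1500.0, 2000.0, 2500.0])  # hypothetical training set
y = np.array([210.0, 290.0, 410.0, 490.0])

ws = np.linspace(-0.5, 0.9, 60)
bs = np.linspace(-400, 1200, 60)
W, B = np.meshgrid(ws, bs)
J = ((W[..., None] * x + B[..., None] - y) ** 2).sum(axis=-1) / (2 * len(x))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # interactive: drag with the mouse to rotate
ax.plot_surface(W, B, J, cmap="viridis")
ax.set_xlabel("w"); ax.set_ylabel("b"); ax.set_zlabel("J(w, b)")
plt.show()
```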
103
00:05:04,210 --> 00:05:07,310
I hope you'll enjoy playing
with the optional lab.
104
00:05:07,310 --> 00:05:09,754
Now in linear regression,
105
00:05:09,754 --> 00:05:12,230
rather than having to
manually read
106
00:05:12,230 --> 00:05:15,350
a contour plot to find the
best values of w and b,
107
00:05:15,350 --> 00:05:18,140
which isn't really a good
procedure and also won't work
108
00:05:18,140 --> 00:05:21,265
once we get to more complex
machine learning models,
109
00:05:21,265 --> 00:05:22,850
what you really want is
110
00:05:22,850 --> 00:05:26,060
an efficient algorithm that
you can write in code for
111
00:05:26,060 --> 00:05:28,880
automatically finding the
values of parameters w
112
00:05:28,880 --> 00:05:31,895
and b that give you
the best-fit line,
113
00:05:31,895 --> 00:05:34,655
the one that minimizes the
cost function j.
114
00:05:34,655 --> 00:05:36,290
There is an algorithm for doing
115
00:05:36,290 --> 00:05:38,530
this called gradient descent.
116
00:05:38,530 --> 00:05:40,070
This algorithm is one of
117
00:05:40,070 --> 00:05:42,830
the most important algorithms
in machine learning.
118
00:05:42,830 --> 00:05:45,290
Gradient descent and variations
119
00:05:45,290 --> 00:05:47,420
on gradient descent
are used to train,
120
00:05:47,420 --> 00:05:49,025
not just linear regression,
121
00:05:49,025 --> 00:05:50,660
but some of the biggest and most
122
00:05:50,660 --> 00:05:53,365
complex models in all of AI.
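As a quick preview before that video: the heart of gradient descent is a pair of simultaneous updates that move w and b downhill on J. The sketch below is a standard version with hypothetical data and an arbitrary learning rate, not the course's exact code.

```python
import numpy as np

x = np.array([1.0, 1.5, 2.0, 2.5])  # hypothetical features (scaled)
y = np.array([210.0, 290.0, 410.0, 490.0])

w, b = 0.0, 0.0
alpha = 0.1  # learning rate (arbitrary choice)
m = len(x)

for _ in range(1000):
    err = w * x + b - y
    # Partial derivatives of J(w, b) = (1/2m) * sum(err^2)
    dj_dw = (err * x).sum() / m
    dj_db = err.sum() / m
    # Simultaneous update of both parameters
    w, b = w - alpha * dj_dw, b - alpha * dj_db

print(w, b)  # should approach the least-squares line for this data
```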
123
00:05:53,365 --> 00:05:56,270
Let's go to the next
video to dive into
124
00:05:56,270 --> 00:06:00,540
this really important algorithm
called gradient descent.