1
00:00:03,020 --> 00:00:05,850
Let's look at some
more visualizations of
2
00:00:05,850 --> 00:00:08,905
w and b. Here's one example.
3
00:00:08,905 --> 00:00:14,400
Over here, you have a particular
point on the graph j.
4
00:00:14,400 --> 00:00:17,730
For this point, w
equals about negative
5
00:00:17,730 --> 00:00:22,470
0.15 and b equals about 800.
6
00:00:22,470 --> 00:00:26,160
This point corresponds to
one pair of values for
7
00:00:26,160 --> 00:00:30,090
w and b that yields a
particular cost j.
8
00:00:30,090 --> 00:00:33,450
In fact, this particular
pair of values for w and
9
00:00:33,450 --> 00:00:37,145
b corresponds to this
function f of x,
10
00:00:37,145 --> 00:00:40,495
which is this line you
can see on the left.
11
00:00:40,495 --> 00:00:45,560
This line intersects the
vertical axis at 800 because
12
00:00:45,560 --> 00:00:50,720
b equals 800 and the slope of
the line is negative 0.15,
13
00:00:50,720 --> 00:00:53,770
because w equals negative 0.15.
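To make the link between the parameters and the line concrete, here is a minimal Python sketch. The values w = -0.15 and b = 800 come from this example; the input sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

# Parameters from the example above
w = -0.15  # slope of the line
b = 800    # intercept on the vertical axis

def f(x):
    """Model prediction: a straight line with slope w and intercept b."""
    return w * x + b

# Hypothetical input values, just for illustration
x = np.array([0.0, 1000.0, 2000.0])
print(f(x))  # [800. 650. 500.] -> f(0) = b = 800, and each step of 1000 drops by 150
```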
14
00:00:53,770 --> 00:00:56,930
Now, if you look at the data
points in the training set,
15
00:00:56,930 --> 00:00:58,910
you may notice that this line
16
00:00:58,910 --> 00:01:01,180
is not a good fit to the data.
17
00:01:01,180 --> 00:01:03,905
For this function f of x,
18
00:01:03,905 --> 00:01:07,055
with these values of w and b,
19
00:01:07,055 --> 00:01:11,135
many of the predictions for
the value of y are quite far
20
00:01:11,135 --> 00:01:13,130
from the actual target value of
21
00:01:13,130 --> 00:01:15,785
y that is in the training data.
22
00:01:15,785 --> 00:01:18,390
Because this line
is not a good fit,
23
00:01:18,390 --> 00:01:20,810
if you look at the graph of j,
24
00:01:20,810 --> 00:01:24,680
the cost of this
line is out here,
25
00:01:24,680 --> 00:01:27,370
which is pretty far
from the minimum.
26
00:01:27,370 --> 00:01:30,350
There's a pretty high cost
because this choice of
27
00:01:30,350 --> 00:01:34,260
w and b is just not that good
a fit to the training set.
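As a rough sketch of how that high cost would be computed, here is the squared-error cost J(w, b) = (1/2m) * sum of (f(x_i) - y_i)^2 from earlier in the course, evaluated at this choice of parameters. The training set below is made up for illustration; the lab uses its own data.

```python
import numpy as np

# Hypothetical training set (house size in square feet, price in $1000s)
x_train = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y_train = np.array([200.0, 300.0, 400.0, 500.0])

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1 / (2m)) * sum((w*x_i + b - y_i)^2)."""
    m = len(x)
    predictions = w * x + b
    return np.sum((predictions - y) ** 2) / (2 * m)

# The poorly fitting choice from the example above
print(compute_cost(x_train, y_train, w=-0.15, b=800))  # large value, far from the minimum
```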
28
00:01:34,310 --> 00:01:36,500
Now, let's look at
29
00:01:36,500 --> 00:01:41,180
another example with a
different choice of w and b.
30
00:01:41,180 --> 00:01:43,760
Now, here's another
function that
31
00:01:43,760 --> 00:01:46,415
is still not a great
fit for the data,
32
00:01:46,415 --> 00:01:48,985
but maybe slightly less bad.
33
00:01:48,985 --> 00:01:51,410
This point here represents
34
00:01:51,410 --> 00:01:52,955
the cost for this particular pair
35
00:01:52,955 --> 00:01:56,755
of w and b that
creates that line.
36
00:01:56,755 --> 00:01:59,840
The value of w is equal to 0 and
37
00:01:59,840 --> 00:02:03,640
the value b is about 360.
38
00:02:03,640 --> 00:02:07,070
This pair of parameters
corresponds to this function,
39
00:02:07,070 --> 00:02:08,645
which is a flat line,
40
00:02:08,645 --> 00:02:13,655
because f of x equals
0 times x plus 360.
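A tiny sketch of that flat line in Python (the input sizes are arbitrary):

```python
w, b = 0, 360

def f(x):
    # f(x) = 0 * x + 360: the input has no effect, so every prediction is 360
    return w * x + b

print(f(500), f(1000), f(2000))  # 360 360 360
```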
41
00:02:13,655 --> 00:02:15,520
I hope that makes sense.
42
00:02:15,520 --> 00:02:18,635
Let's look at yet
another example.
43
00:02:18,635 --> 00:02:21,350
Here's one more
choice for w and b,
44
00:02:21,350 --> 00:02:23,000
and with these values,
45
00:02:23,000 --> 00:02:25,550
you end up with
this line f of x.
46
00:02:25,550 --> 00:02:27,750
Again, not a great
fit to the data,
47
00:02:27,750 --> 00:02:29,720
and it's actually further
away from the minimum
48
00:02:29,720 --> 00:02:32,620
compared to the
previous example.
49
00:02:32,620 --> 00:02:34,890
Remember that the minimum is at
50
00:02:34,890 --> 00:02:38,250
the center of that
smallest ellipse.
51
00:02:38,250 --> 00:02:43,520
Last example, if you look
at f of x on the left,
52
00:02:43,520 --> 00:02:46,670
this looks like a pretty good
fit to the training set.
53
00:02:46,670 --> 00:02:49,160
You can see on the right,
54
00:02:49,160 --> 00:02:52,580
this point representing
the cost is very
55
00:02:52,580 --> 00:02:56,570
close to the center of
the smallest ellipse.
56
00:02:56,570 --> 00:02:58,445
It's not quite
exactly the minimum,
57
00:02:58,445 --> 00:02:59,795
but it's pretty close.
58
00:02:59,795 --> 00:03:02,495
For this value of w and b,
59
00:03:02,495 --> 00:03:06,340
you get to this line, f of x.
60
00:03:06,340 --> 00:03:08,510
You can see that if you measure
61
00:03:08,510 --> 00:03:10,250
the vertical distances between
62
00:03:10,250 --> 00:03:11,390
the data points and
63
00:03:11,390 --> 00:03:14,315
the predicted values
on the straight line,
64
00:03:14,315 --> 00:03:18,280
you'd get the error
for each data point.
65
00:03:18,280 --> 00:03:21,020
The sum of squared
errors for all of
66
00:03:21,020 --> 00:03:24,050
these data points
is pretty close to
67
00:03:24,050 --> 00:03:25,970
the minimum possible sum of
68
00:03:25,970 --> 00:03:30,370
squared errors among all
possible straight line fits.
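One way to check that claim numerically: compare the cost of a hand-picked line against the least-squares line, which by construction minimizes the sum of squared errors over all straight lines. This sketch uses np.polyfit to get that optimal line; the data is again hypothetical.

```python
import numpy as np

x = np.array([1000.0, 1500.0, 2000.0, 2500.0])  # hypothetical sizes
y = np.array([210.0, 290.0, 410.0, 490.0])      # hypothetical prices

def cost(w, b):
    return np.sum((w * x + b - y) ** 2) / (2 * len(x))

# Least-squares straight line: the minimum possible squared-error cost
w_best, b_best = np.polyfit(x, y, 1)

print(cost(-0.15, 800))      # bad fit from the first example: large cost
print(cost(0.0, 360))        # flat line: smaller, but still far off
print(cost(w_best, b_best))  # minimum among all straight-line fits
```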
69
00:03:30,370 --> 00:03:33,155
I hope that by looking
at these figures,
70
00:03:33,155 --> 00:03:35,960
you can get a better sense
of how different choices
71
00:03:35,960 --> 00:03:38,750
of the parameters
affect the line f
72
00:03:38,750 --> 00:03:40,610
of x and how this
73
00:03:40,610 --> 00:03:44,875
corresponds to different
values for the cost j,
74
00:03:44,875 --> 00:03:48,140
and hopefully you can see how
75
00:03:48,140 --> 00:03:52,160
the better fit lines correspond
to points on the graph of
76
00:03:52,160 --> 00:03:55,865
j that are closer to the
minimum possible cost
77
00:03:55,865 --> 00:04:00,935
for this cost function
j of w and b.
78
00:04:00,935 --> 00:04:04,625
In the optional lab that
follows this video,
79
00:04:04,625 --> 00:04:05,810
you'll get to run
80
00:04:05,810 --> 00:04:09,050
some code, and remember,
all the code is given,
81
00:04:09,050 --> 00:04:10,340
so you just need to hit
82
00:04:10,340 --> 00:04:13,060
Shift Enter to run it
and take a look at it
83
00:04:13,060 --> 00:04:15,200
and the lab will show you how
84
00:04:15,200 --> 00:04:18,400
the cost function is
implemented in code.
85
00:04:18,400 --> 00:04:20,570
Given a small training set
86
00:04:20,570 --> 00:04:23,060
and different choices
for the parameters,
87
00:04:23,060 --> 00:04:25,760
you'll be able to see
how the cost varies
88
00:04:25,760 --> 00:04:29,255
depending on how well
the model fits the data.
89
00:04:29,255 --> 00:04:30,830
In the optional lab,
90
00:04:30,830 --> 00:04:32,425
you can also play with an
91
00:04:32,425 --> 00:04:35,070
interactive contour
plot. Check this out.
92
00:04:35,070 --> 00:04:37,220
You can use your
mouse cursor to click
93
00:04:37,220 --> 00:04:39,800
anywhere on the contour
plot and you will
94
00:04:39,800 --> 00:04:41,960
see the straight line defined by
95
00:04:41,960 --> 00:04:45,105
the values you chose for
the parameters w and b.
96
00:04:45,105 --> 00:04:48,230
You'll see a dot up here also on
97
00:04:48,230 --> 00:04:51,425
the 3D surface plot
showing the cost.
98
00:04:51,425 --> 00:04:54,440
Finally, the optional
lab also has
99
00:04:54,440 --> 00:04:57,440
a 3D surface plot
that you can manually
100
00:04:57,440 --> 00:04:59,630
rotate and spin around using
101
00:04:59,630 --> 00:05:01,310
your mouse cursor to take
102
00:05:01,310 --> 00:05:04,210
a better look at what the
cost function looks like.
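A rotatable surface like the one described can be sketched with matplotlib's 3D toolkit. Again, this is an illustrative approximation, not the lab's code.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1000.0, 1500.0, 2000.0, 2500.0])  # hypothetical training set
y = np.array([210.0, 290.0, 410.0, 490.0])

ws = np.linspace(-0.5, 0.9, 60)
bs = np.linspace(-400, 1200, 60)
W, B = np.meshgrid(ws, bs)
J = ((W[..., None] * x + B[..., None] - y) ** 2).sum(axis=-1) / (2 * len(x))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # interactive: drag with the mouse to rotate
ax.plot_surface(W, B, J, cmap="viridis")
ax.set_xlabel("w"); ax.set_ylabel("b"); ax.set_zlabel("J(w, b)")
plt.show()
```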
103
00:05:04,210 --> 00:05:07,310
I hope you'll enjoy playing
with the optional lab.
104
00:05:07,310 --> 00:05:09,754
Now in linear regression,
105
00:05:09,754 --> 00:05:12,230
rather than having to
manually read
106
00:05:12,230 --> 00:05:15,350
a contour plot to find the
best values of w and b,
107
00:05:15,350 --> 00:05:18,140
which isn't really a good
procedure and also won't work
108
00:05:18,140 --> 00:05:21,265
once we get to more complex
machine learning models,
109
00:05:21,265 --> 00:05:22,850
what you really want is
110
00:05:22,850 --> 00:05:26,060
an efficient algorithm that
you can write in code for
111
00:05:26,060 --> 00:05:28,880
automatically finding the
values of parameters w
112
00:05:28,880 --> 00:05:31,895
and b that give you
the best-fit line,
113
00:05:31,895 --> 00:05:34,655
the one that minimizes the
cost function j.
114
00:05:34,655 --> 00:05:36,290
There is an algorithm for doing
115
00:05:36,290 --> 00:05:38,530
this called gradient descent.
116
00:05:38,530 --> 00:05:40,070
This algorithm is one of
117
00:05:40,070 --> 00:05:42,830
the most important algorithms
in machine learning.
118
00:05:42,830 --> 00:05:45,290
Gradient descent and variations
119
00:05:45,290 --> 00:05:47,420
on gradient descent
are used to train,
120
00:05:47,420 --> 00:05:49,025
not just linear regression,
121
00:05:49,025 --> 00:05:50,660
but some of the biggest and most
122
00:05:50,660 --> 00:05:53,365
complex models in all of AI.
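As a quick preview before that video: the heart of gradient descent is a pair of simultaneous updates that move w and b downhill on J. The sketch below is a standard version with hypothetical data and an arbitrary learning rate, not the course's exact code.

```python
import numpy as np

x = np.array([1.0, 1.5, 2.0, 2.5])  # hypothetical features (scaled)
y = np.array([210.0, 290.0, 410.0, 490.0])

w, b = 0.0, 0.0
alpha = 0.1  # learning rate (arbitrary choice)
m = len(x)

for _ in range(1000):
    err = w * x + b - y
    # Partial derivatives of J(w, b) = (1/2m) * sum(err^2)
    dj_dw = (err * x).sum() / m
    dj_db = err.sum() / m
    # Simultaneous update of both parameters
    w, b = w - alpha * dj_dw, b - alpha * dj_db

print(w, b)  # should approach the least-squares line for this data
```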
123
00:05:53,365 --> 00:05:56,270
Let's go to the next
video to dive into
124
00:05:56,270 --> 00:06:00,540
this really important algorithm
called gradient descent.