1
00:00:00,770 --> 00:00:03,270
Let's look in this video at
2
00:00:03,270 --> 00:00:06,390
the process of how
supervised learning works.
3
00:00:06,390 --> 00:00:09,750
A supervised learning algorithm
will input a dataset and
4
00:00:09,750 --> 00:00:12,870
then what exactly does it
do and what does it output?
5
00:00:12,870 --> 00:00:14,820
Let's find out in this video.
6
00:00:14,820 --> 00:00:17,430
Recall that a training set in
7
00:00:17,430 --> 00:00:20,535
supervised learning includes
both the input features,
8
00:00:20,535 --> 00:00:21,900
such as the size
of the house and
9
00:00:21,900 --> 00:00:24,030
also the output targets,
10
00:00:24,030 --> 00:00:25,980
such as the price of the house.
11
00:00:25,980 --> 00:00:27,240
The output targets are
12
00:00:27,240 --> 00:00:30,360
the right answers that
the model will learn from.
13
00:00:30,360 --> 00:00:32,105
To train the model,
14
00:00:32,105 --> 00:00:33,650
you feed the training set,
15
00:00:33,650 --> 00:00:35,330
both the input features and
16
00:00:35,330 --> 00:00:39,290
the output targets to
your learning algorithm.
17
00:00:39,290 --> 00:00:42,499
Then your supervised
learning algorithm
18
00:00:42,499 --> 00:00:44,645
will produce some function.
19
00:00:44,645 --> 00:00:47,450
We'll write this
function as lowercase f,
20
00:00:47,450 --> 00:00:49,940
where f stands for function.
21
00:00:49,940 --> 00:00:52,100
Historically, this
function used to
22
00:00:52,100 --> 00:00:54,215
be called a hypothesis,
23
00:00:54,215 --> 00:00:58,105
but I'm just going to call it
a function f in this class.
24
00:00:58,105 --> 00:01:02,630
The job of f is
to take a new input
25
00:01:02,630 --> 00:01:08,405
x and output an estimate
or a prediction,
26
00:01:08,405 --> 00:01:11,270
which I'm going to call y-hat,
27
00:01:11,270 --> 00:01:13,250
and it's written like
28
00:01:13,250 --> 00:01:17,620
the variable y with this
little hat symbol on top.
29
00:01:17,620 --> 00:01:21,335
In machine learning,
the convention is that
30
00:01:21,335 --> 00:01:26,965
y-hat is the estimate or
the prediction for y.
31
00:01:26,965 --> 00:01:31,225
The function f is
called the model.
32
00:01:31,225 --> 00:01:35,120
X is called the input
or the input feature,
33
00:01:35,120 --> 00:01:40,405
and the output of the model
is the prediction, y-hat.
34
00:01:40,405 --> 00:01:45,110
The model's prediction is
the estimated value of y.
35
00:01:45,110 --> 00:01:48,844
When the symbol is
just the letter y,
36
00:01:48,844 --> 00:01:51,485
then that refers to the target,
37
00:01:51,485 --> 00:01:55,320
which is the actual true
value in the training set.
38
00:01:55,320 --> 00:01:58,395
In contrast, y-hat
is an estimate.
39
00:01:58,395 --> 00:02:01,430
It may or may not be
the actual true value.
40
00:02:01,430 --> 00:02:03,620
Well, if you're
helping your client
41
00:02:03,620 --> 00:02:05,180
to sell the house,
42
00:02:05,180 --> 00:02:06,650
the true price of the house
43
00:02:06,650 --> 00:02:08,815
is unknown until they sell it.
44
00:02:08,815 --> 00:02:11,779
Your model f, given the size,
45
00:02:11,779 --> 00:02:14,720
outputs the price, which
is the estimate,
46
00:02:14,720 --> 00:02:18,665
that is the prediction of
what the true price will be.
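To pin down this notation in code, here is a minimal Python sketch; the specific numbers are placeholders invented for illustration, not values from the lecture:

# y is the target: the actual, true price of a house in the training set.
y = 250.0
# y_hat is the model's estimate (prediction) of that price; it may or may not match y.
y_hat = 248.3
error = y_hat - y  # nonzero in general, since y_hat is only an estimate
print(y, y_hat, error)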
47
00:02:18,665 --> 00:02:22,640
Now, when we design a
learning algorithm,
48
00:02:22,640 --> 00:02:24,725
a key question is,
49
00:02:24,725 --> 00:02:27,950
how are we going to
represent the function f?
50
00:02:27,950 --> 00:02:29,540
Or in other words,
51
00:02:29,540 --> 00:02:34,720
what is the math formula we're
going to use to compute f?
52
00:02:34,720 --> 00:02:39,430
For now, let's stick with
f being a straight line.
53
00:02:39,430 --> 00:02:43,470
Your function can
be written as f_w,
54
00:02:43,570 --> 00:02:47,285
b of x equals,
55
00:02:47,285 --> 00:02:50,750
I'm going to use w times x plus
56
00:02:50,750 --> 00:02:54,765
b. I'll define w and b soon.
57
00:02:54,765 --> 00:02:58,395
But for now, just know
that w and b are numbers,
58
00:02:58,395 --> 00:03:02,780
and the values chosen for
w and b will determine
59
00:03:02,780 --> 00:03:08,610
the prediction y-hat based
on the input feature x.
60
00:03:08,610 --> 00:03:11,000
This f_w, b of x
61
00:03:11,000 --> 00:03:15,095
means f is a function
that takes x as input,
62
00:03:15,095 --> 00:03:18,140
and depending on the
values of w and b,
63
00:03:18,140 --> 00:03:23,395
f will output some value
of a prediction y-hat.
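As a concrete illustration, the function described here could be written in Python as below; the values w = 200 and b = 100 (and the units) are assumptions made for this sketch, not numbers given in the video:

def f_wb(x, w, b):
    # The model f_{w,b}(x) = w * x + b: a straight line with slope w and intercept b.
    return w * x + b

w = 200   # placeholder slope
b = 100   # placeholder intercept
x = 1.2   # input feature, e.g. house size (assumed units: 1000s of square feet)
y_hat = f_wb(x, w, b)   # the prediction y-hat for this input
print(y_hat)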
64
00:03:23,395 --> 00:03:26,860
As an alternative
to writing this,
65
00:03:26,860 --> 00:03:29,350
f_w, b of x,
66
00:03:29,350 --> 00:03:32,330
I'll sometimes just
write f of x without
67
00:03:32,330 --> 00:03:35,435
explicitly including w
and b in the subscript.
68
00:03:35,435 --> 00:03:38,150
It's just a simpler
notation that means
69
00:03:38,150 --> 00:03:42,235
exactly the same
thing as f_w, b of x.
70
00:03:42,235 --> 00:03:44,540
Let's plot the training set on
71
00:03:44,540 --> 00:03:47,330
the graph where the
input feature x is on
72
00:03:47,330 --> 00:03:49,280
the horizontal axis and
73
00:03:49,280 --> 00:03:53,470
the output target y is
on the vertical axis.
74
00:03:53,470 --> 00:03:57,170
Remember, the algorithm
learns from this data and
75
00:03:57,170 --> 00:04:01,910
generates the best-fit line
like maybe this one here.
76
00:04:01,910 --> 00:04:05,735
This straight line is
the linear function
77
00:04:05,735 --> 00:04:11,420
f_w, b of x equals
w times x plus b.
78
00:04:11,420 --> 00:04:16,130
Or more simply, we can
drop w and b and just
79
00:04:16,130 --> 00:04:20,390
write f of x equals wx plus b.
80
00:04:20,390 --> 00:04:22,390
Here's what this
function is doing,
81
00:04:22,390 --> 00:04:24,860
it's making predictions
for the value of
82
00:04:24,860 --> 00:04:28,545
y using a straight-line
function of x.
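For readers following along in code, here is one way to draw that picture with matplotlib; the (x, y) training points are invented placeholders, and w and b are simply hand-picked rather than learned:

import matplotlib.pyplot as plt

# Invented training set: sizes (x) and prices (y), placeholder values only.
x_train = [1.0, 1.2, 1.5, 2.0, 2.4]
y_train = [300, 340, 400, 500, 560]

# Hand-picked parameters for the line f(x) = w*x + b (not fitted to the data).
w, b = 200, 100
x_line = [0.8, 2.6]
y_line = [w * x + b for x in x_line]

plt.scatter(x_train, y_train, marker='x', label='training data (x, y)')
plt.plot(x_line, y_line, label='model f(x) = wx + b')
plt.xlabel('input feature x (size)')
plt.ylabel('output target y (price)')
plt.legend()
plt.show()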
83
00:04:28,545 --> 00:04:32,300
You may ask, why are we
choosing a linear function,
84
00:04:32,300 --> 00:04:35,000
where a linear function is
just a fancy term for
85
00:04:35,000 --> 00:04:36,830
a straight line instead of
86
00:04:36,830 --> 00:04:40,560
some non-linear function
like a curve or a parabola?
87
00:04:40,560 --> 00:04:42,650
Well, sometimes you want to fit
88
00:04:42,650 --> 00:04:45,470
more complex non-linear
functions as well,
89
00:04:45,470 --> 00:04:47,295
like a curve like this.
90
00:04:47,295 --> 00:04:49,280
But since this
linear function is
91
00:04:49,280 --> 00:04:51,560
relatively simple and
easy to work with,
92
00:04:51,560 --> 00:04:53,150
let's use a line as
93
00:04:53,150 --> 00:04:55,340
a foundation that
will eventually help
94
00:04:55,340 --> 00:04:59,720
you to get to more complex
models that are non-linear.
95
00:04:59,720 --> 00:05:02,340
This particular
model has a name,
96
00:05:02,340 --> 00:05:04,315
it's called linear regression.
97
00:05:04,315 --> 00:05:06,020
More specifically, this is
98
00:05:06,020 --> 00:05:08,945
linear regression
with one variable,
99
00:05:08,945 --> 00:05:11,240
where the phrase one
variable means that there's
100
00:05:11,240 --> 00:05:14,075
a single input
variable or feature x,
101
00:05:14,075 --> 00:05:16,645
namely the size of the house.
102
00:05:16,645 --> 00:05:19,805
Another name for a
linear model with
103
00:05:19,805 --> 00:05:23,720
one input variable is
univariate linear regression,
104
00:05:23,720 --> 00:05:26,465
where uni means one in Latin,
105
00:05:26,465 --> 00:05:29,470
and where variate
means variable.
106
00:05:29,470 --> 00:05:34,275
Univariate is just a fancy
way of saying one variable.
107
00:05:34,275 --> 00:05:35,650
In a later video,
108
00:05:35,650 --> 00:05:39,250
you'll also see a variation
of regression where you'll
109
00:05:39,250 --> 00:05:40,990
want to make a
prediction based not
110
00:05:40,990 --> 00:05:42,850
just on the size of a house,
111
00:05:42,850 --> 00:05:45,520
but on a bunch of other
things that you may know
112
00:05:45,520 --> 00:05:46,810
about the house,
such as the number of
113
00:05:46,810 --> 00:05:48,845
bedrooms and other features.
114
00:05:48,845 --> 00:05:51,325
By the way, when you're
done with this video,
115
00:05:51,325 --> 00:05:53,485
there is another optional lab.
116
00:05:53,485 --> 00:05:55,435
You don't need to
write any code.
117
00:05:55,435 --> 00:05:58,535
Just review it, run the
code and see what it does.
118
00:05:58,535 --> 00:06:00,550
That will show you
how to define in
119
00:06:00,550 --> 00:06:03,175
Python a straight-line function.
120
00:06:03,175 --> 00:06:06,310
The lab will let you
choose the values of
121
00:06:06,310 --> 00:06:09,890
w and b to try to fit
the training data.
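The optional lab itself isn't reproduced here, so the snippet below is only a rough sketch of its spirit: hand-pick values of w and b, compute the line's predictions, and compare them with the training targets (all numbers are made-up placeholders):

# Placeholder training set.
x_train = [1.0, 1.5, 2.0]
y_train = [300, 400, 500]

def f_wb(x, w, b):
    return w * x + b

# Try a few hand-chosen (w, b) pairs and see how close each line gets to the targets.
for w, b in [(100, 100), (200, 100), (250, 50)]:
    y_hat = [f_wb(x, w, b) for x in x_train]
    print(f"w={w}, b={b}: predictions {y_hat} vs targets {y_train}")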
122
00:06:09,890 --> 00:06:12,880
You don't have to do the
lab if you don't want to,
123
00:06:12,880 --> 00:06:14,500
but I hope you play
with it when you're
124
00:06:14,500 --> 00:06:16,940
done watching this video.
125
00:06:16,940 --> 00:06:18,965
That's linear regression.
126
00:06:18,965 --> 00:06:20,855
In order for you
to make this work,
127
00:06:20,855 --> 00:06:22,850
one of the most important
things you have to do
128
00:06:22,850 --> 00:06:24,995
is construct a cost function.
129
00:06:24,995 --> 00:06:26,990
The idea of a cost
function is one of
130
00:06:26,990 --> 00:06:29,720
the most universal
and important ideas
131
00:06:29,720 --> 00:06:31,160
in machine learning,
132
00:06:31,160 --> 00:06:34,020
and is used in both
linear regression and in
133
00:06:34,020 --> 00:06:35,330
training many of the most
134
00:06:35,330 --> 00:06:37,510
advanced AI models in the world.
135
00:06:37,510 --> 00:06:40,190
Let's go on to the next
video and take a look
136
00:06:40,190 --> 00:06:43,920
at how you can construct
a cost function.
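The cost function itself is only constructed in the next video, so the sketch below is an assumption about where this is headed: a squared-error cost is the usual choice for linear regression, shown here with placeholder data:

def squared_error_cost(x_train, y_train, w, b):
    # Average squared gap between predictions f(x) = w*x + b and targets y.
    # (Whether to divide by 2*m or by m is a convention; treat this exact form as an assumption.)
    m = len(x_train)
    total = 0.0
    for x, y in zip(x_train, y_train):
        y_hat = w * x + b
        total += (y_hat - y) ** 2
    return total / (2 * m)

print(squared_error_cost([1.0, 1.5, 2.0], [300, 400, 500], 200, 100))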