1
00:00:01,440 --> 00:00:04,240
Welcome back. This week,
2
00:00:04,240 --> 00:00:05,845
we'll learn to make
linear regression
3
00:00:05,845 --> 00:00:08,320
much faster and
much more powerful,
4
00:00:08,320 --> 00:00:10,090
and by the end of this week,
5
00:00:10,090 --> 00:00:11,665
you'll be two thirds of the way
6
00:00:11,665 --> 00:00:13,645
to finishing this first course.
7
00:00:13,645 --> 00:00:16,060
Let's start by looking
at the version of
8
00:00:16,060 --> 00:00:19,300
linear regression that looks
at not just one feature,
9
00:00:19,300 --> 00:00:22,615
but a lot of different
features. Let's take a look.
10
00:00:22,615 --> 00:00:26,845
In the original version
of linear regression,
11
00:00:26,845 --> 00:00:29,785
you had a single feature x,
12
00:00:29,785 --> 00:00:33,825
the size of the house and
you're able to predict y,
13
00:00:33,825 --> 00:00:35,820
the price of the house.
14
00:00:35,820 --> 00:00:43,065
The model was f_w,b of
x equals wx plus b.
15
00:00:43,065 --> 00:00:46,450
But now, what if you did
not only have the size of
16
00:00:46,450 --> 00:00:47,800
the house as a feature with
17
00:00:47,800 --> 00:00:49,775
which to try to
predict the price,
18
00:00:49,775 --> 00:00:53,244
but if you also knew
the number of bedrooms,
19
00:00:53,244 --> 00:00:56,650
the number of floors and the
age of the home in years.
20
00:00:56,650 --> 00:00:58,390
It seems like this
would give you
21
00:00:58,390 --> 00:01:01,720
a lot more information with
which to predict the price.
22
00:01:01,720 --> 00:01:04,580
To introduce a little
bit of new notation,
23
00:01:04,580 --> 00:01:07,580
we're going to use
the variables X_1,
24
00:01:07,580 --> 00:01:12,000
X_2, X_3 and X_4,
25
00:01:12,000 --> 00:01:14,195
to denote the four features.
26
00:01:14,195 --> 00:01:16,540
For simplicity, let's introduce
27
00:01:16,540 --> 00:01:18,190
a little bit more notation.
28
00:01:18,190 --> 00:01:20,525
We'll write X subscript j
29
00:01:20,525 --> 00:01:23,005
or sometimes I'll
just say for short,
30
00:01:23,005 --> 00:01:26,980
X sub j, to represent
the list of features.
31
00:01:26,980 --> 00:01:29,575
Here, j will go
from one to four,
32
00:01:29,575 --> 00:01:31,465
because we have four features.
33
00:01:31,465 --> 00:01:33,910
I'm going to use lowercase
34
00:01:33,910 --> 00:01:37,090
n to denote the total
number of features,
35
00:01:37,090 --> 00:01:38,485
so in this example,
36
00:01:38,485 --> 00:01:40,250
n is equal to 4.
37
00:01:40,250 --> 00:01:43,395
As before, we'll
use X superscript
38
00:01:43,395 --> 00:01:48,165
i to denote the ith
training example.
39
00:01:48,165 --> 00:01:51,760
Here X superscript i is actually
40
00:01:51,760 --> 00:01:55,615
going to be a list
of four numbers,
41
00:01:55,615 --> 00:01:59,360
or sometimes we'll call
this a vector that
42
00:01:59,360 --> 00:02:04,405
includes all the features of
the ith training example.
43
00:02:04,405 --> 00:02:07,079
As a concrete example,
44
00:02:07,079 --> 00:02:10,190
X superscript in parentheses 2,
45
00:02:10,190 --> 00:02:11,450
will be a vector of
46
00:02:11,450 --> 00:02:14,810
the features for the
second training example,
47
00:02:14,810 --> 00:02:18,770
so it will be equal
to this: 1416, 3,
48
00:02:18,770 --> 00:02:21,670
2 and 40 and technically,
49
00:02:21,670 --> 00:02:23,810
I'm writing these
numbers in a row,
50
00:02:23,810 --> 00:02:25,460
so sometimes this is called
51
00:02:25,460 --> 00:02:28,250
a row vector rather
than a column vector.
52
00:02:28,250 --> 00:02:30,050
But if you don't know
what the difference is,
53
00:02:30,050 --> 00:02:31,910
don't worry about it, it's
54
00:02:31,910 --> 00:02:34,405
not that important
for this purpose.
55
00:02:34,405 --> 00:02:38,035
To refer to a specific feature
56
00:02:38,035 --> 00:02:40,555
in the ith training example,
57
00:02:40,555 --> 00:02:45,425
I will write X superscript
i, subscript j,
58
00:02:45,425 --> 00:02:49,230
so for example, X
superscript 2 subscript
59
00:02:49,230 --> 00:02:53,240
3 will be the value
of the third feature,
60
00:02:53,240 --> 00:02:55,160
that is the number of floors in
61
00:02:55,160 --> 00:02:57,140
the second training example and
62
00:02:57,140 --> 00:03:00,325
so that's going
to be equal to 2.
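The indexing notation above can be sketched in a few lines of Python. This is only an illustration: the names `x_2` and `feature` are hypothetical, and the one row of data is the second training example as read out in the lecture. Note that the lecture counts features from 1, while Python lists count from 0.

```python
# Features of the second training example, as read out in the lecture:
# size (square feet), number of bedrooms, number of floors, age (years).
x_2 = [1416, 3, 2, 40]

n = len(x_2)  # n is the total number of features


def feature(x, j):
    """Return x_j using the lecture's 1-based feature index j."""
    return x[j - 1]  # Python lists are 0-based, so shift by one


floors = feature(x_2, 3)  # third feature of the second training example
print(n, floors)
```

Here `feature(x_2, 3)` plays the role of X superscript 2 subscript 3, the number of floors in the second training example.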
63
00:03:00,325 --> 00:03:04,740
Sometimes in order to
emphasize that this X^(2)
64
00:03:04,740 --> 00:03:06,540
is not a number but is
65
00:03:06,540 --> 00:03:08,970
actually a list of
numbers that is a vector,
66
00:03:08,970 --> 00:03:12,510
we'll draw an arrow on
top of that just to
67
00:03:12,510 --> 00:03:17,985
visually show that it is a
vector and over here as well,
68
00:03:17,985 --> 00:03:21,835
but you don't have to draw
this arrow in your notation.
69
00:03:21,835 --> 00:03:25,880
You can think of the arrow
as an optional signifier.
70
00:03:25,880 --> 00:03:27,680
It's sometimes used just to
71
00:03:27,680 --> 00:03:30,575
emphasize that this is a
vector and not a number.
72
00:03:30,575 --> 00:03:32,900
Now that we have
multiple features,
73
00:03:32,900 --> 00:03:36,175
let's take a look at what
a model would look like.
74
00:03:36,175 --> 00:03:39,325
Previously, this is how
we defined the model,
75
00:03:39,325 --> 00:03:41,480
where X was a single feature,
76
00:03:41,480 --> 00:03:42,875
so a single number.
77
00:03:42,875 --> 00:03:45,245
But now with multiple features,
78
00:03:45,245 --> 00:03:47,810
we're going to define
it differently.
79
00:03:47,810 --> 00:03:51,085
Instead, the model will be,
80
00:03:51,085 --> 00:03:55,965
f_w,b of X equals W_1X_1
81
00:03:55,965 --> 00:04:01,890
plus W_2X_2 plus W_3X_3
plus W_4X_4 plus b.
82
00:04:01,890 --> 00:04:05,855
Concretely for housing
price prediction,
83
00:04:05,855 --> 00:04:10,370
one possible model may be
that we estimate the price
84
00:04:10,370 --> 00:04:14,960
of the house as 0.1 times
X_1, the size of the house,
85
00:04:14,960 --> 00:04:18,315
plus four times X_2,
86
00:04:18,315 --> 00:04:19,545
the number of bedrooms,
87
00:04:19,545 --> 00:04:22,530
plus ten times X_3,
the number of floors,
88
00:04:22,530 --> 00:04:24,600
minus 2 times X_4,
89
00:04:24,600 --> 00:04:27,250
the age of the house
in years plus 80.
90
00:04:27,250 --> 00:04:29,420
Let's think a bit about how you
91
00:04:29,420 --> 00:04:31,535
might interpret
these parameters.
92
00:04:31,535 --> 00:04:33,620
If the model is
trying to predict
93
00:04:33,620 --> 00:04:36,095
the price of the house
in thousands of dollars,
94
00:04:36,095 --> 00:04:40,520
you can think of
this b equals 80 as
95
00:04:40,520 --> 00:04:42,800
saying that the base
price of a house
96
00:04:42,800 --> 00:04:45,320
starts off at maybe $80,000,
97
00:04:45,320 --> 00:04:46,790
assuming it has no size,
98
00:04:46,790 --> 00:04:49,615
no bedrooms, no
floors and no age.
99
00:04:49,615 --> 00:04:52,660
You can think of
this 0.1 as saying
100
00:04:52,660 --> 00:04:55,600
that maybe for every
additional square foot,
101
00:04:55,600 --> 00:05:01,835
the price will increase
by 0.1 times $1,000, or by $100,
102
00:05:01,835 --> 00:05:04,945
because we're saying that
for each square foot,
103
00:05:04,945 --> 00:05:07,900
the price increases by 0.1,
104
00:05:07,900 --> 00:05:11,975
times $1,000, which is $100.
105
00:05:11,975 --> 00:05:15,150
Maybe for each
additional bedroom,
106
00:05:15,150 --> 00:05:16,680
the price increases by
107
00:05:16,680 --> 00:05:20,055
$4,000 and for each
additional floor
108
00:05:20,055 --> 00:05:21,300
the price may increase by
109
00:05:21,300 --> 00:05:25,515
$10,000 and for each additional
year of the house's age,
110
00:05:25,515 --> 00:05:28,650
the price may
decrease by $2,000,
111
00:05:28,650 --> 00:05:31,455
because the parameter
is negative 2.
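As a rough worked example of this model, the sketch below plugs in the parameters quoted above (0.1, 4, 10, minus 2, and b equal to 80). Applying them to the house with features 1416, 3, 2 and 40 is an assumption made here for illustration, not something the lecture computes, and the function name `predict` is hypothetical.

```python
# Parameters of the example model; prices are in thousands of dollars.
w = [0.1, 4, 10, -2]  # per square foot, per bedroom, per floor, per year of age
b = 80                # base price

# Features of one house: size, bedrooms, floors, age in years.
x = [1416, 3, 2, 40]


def predict(x, w, b):
    """Compute w_1*x_1 + w_2*x_2 + ... + w_n*x_n + b term by term."""
    total = b
    for w_j, x_j in zip(w, x):
        total += w_j * x_j
    return total


price = predict(x, w, b)
# 141.6 + 12 + 20 - 80 + 80, so roughly 173.6 thousand dollars.
print(f"predicted price: about ${price:.1f} thousand")
```

Each term matches the interpretation above: size contributes about 141.6, bedrooms 12, floors 20, and age takes away 80, on top of the base price of 80.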
112
00:05:31,455 --> 00:05:35,760
In general, if you
have n features,
113
00:05:35,760 --> 00:05:38,980
then the model will
look like this.
114
00:05:39,440 --> 00:05:42,300
Here again is the definition
115
00:05:42,300 --> 00:05:44,930
of the model with n features.
116
00:05:44,930 --> 00:05:46,840
What we're going to do next is
117
00:05:46,840 --> 00:05:48,520
introduce a little
bit of notation
118
00:05:48,520 --> 00:05:50,170
to rewrite this expression in
119
00:05:50,170 --> 00:05:52,210
a simpler but equivalent way.
120
00:05:52,210 --> 00:05:54,760
Let's define W as a list of
121
00:05:54,760 --> 00:05:58,000
numbers that lists
the parameters W_1,
122
00:05:58,000 --> 00:05:59,590
W_2, W_3,
123
00:05:59,590 --> 00:06:01,925
all the way through W_n.
124
00:06:01,925 --> 00:06:04,960
In mathematics, this
is called a vector
125
00:06:04,960 --> 00:06:08,200
and sometimes to designate
that this is a vector,
126
00:06:08,200 --> 00:06:10,060
which just means a
list of numbers,
127
00:06:10,060 --> 00:06:12,970
I'm going to draw a
little arrow on top.
128
00:06:12,970 --> 00:06:15,895
You don't always have
to draw this arrow
129
00:06:15,895 --> 00:06:19,330
and you can do so or not
in your own notation,
130
00:06:19,330 --> 00:06:21,400
so you can think of
this little arrow as
131
00:06:21,400 --> 00:06:23,425
just an optional signifier
132
00:06:23,425 --> 00:06:26,350
to remind us that
this is a vector.
133
00:06:26,350 --> 00:06:29,260
If you've taken a linear
algebra class before,
134
00:06:29,260 --> 00:06:31,000
you might recognize that this is
135
00:06:31,000 --> 00:06:34,210
a row vector as opposed
to a column vector.
136
00:06:34,210 --> 00:06:36,460
But if you don't know
what those terms mean,
137
00:06:36,460 --> 00:06:38,030
you don't need to
worry about it.
138
00:06:38,030 --> 00:06:39,795
Next, same as before,
139
00:06:39,795 --> 00:06:45,100
b is a single number and not
a vector and so this vector
140
00:06:45,100 --> 00:06:47,710
W together with this number b
141
00:06:47,710 --> 00:06:51,360
are the parameters of the model.
142
00:06:51,360 --> 00:06:57,365
Let me also write X as
a list or a vector,
143
00:06:57,365 --> 00:06:59,420
again a row vector that
144
00:06:59,420 --> 00:07:02,030
lists all of the
features X_1, X_2,
145
00:07:02,030 --> 00:07:04,795
X_3 up to X_n,
146
00:07:04,795 --> 00:07:07,530
this is again a vector,
147
00:07:07,530 --> 00:07:13,440
so I'm going to add a little
arrow up on top to signify.
148
00:07:13,760 --> 00:07:16,410
In the notation up on top,
149
00:07:16,410 --> 00:07:20,600
we can also add little
arrows here and here
150
00:07:20,600 --> 00:07:23,660
to signify that
that W and that X
151
00:07:23,660 --> 00:07:26,355
are actually these
lists of numbers,
152
00:07:26,355 --> 00:07:29,835
that they're actually
these vectors.
153
00:07:29,835 --> 00:07:34,750
With this notation, the model
can now be rewritten more
154
00:07:34,750 --> 00:07:39,310
succinctly as f of x equals,
155
00:07:39,310 --> 00:07:43,710
the vector w dot and
this dot refers to
156
00:07:43,710 --> 00:07:48,525
a dot product from linear
algebra of X the vector,
157
00:07:48,525 --> 00:07:50,730
plus the number b.
158
00:07:50,730 --> 00:07:53,190
What is this dot product thing?
159
00:07:53,190 --> 00:07:55,315
Well, the dot product of
160
00:07:55,315 --> 00:07:59,450
two vectors of two lists
of numbers W and X,
161
00:07:59,450 --> 00:08:02,490
is computed by checking
162
00:08:02,490 --> 00:08:05,475
the corresponding
pairs of numbers,
163
00:08:05,475 --> 00:08:09,030
W_1 and X_1 multiplying that,
164
00:08:09,030 --> 00:08:12,540
W_2 X_2 multiplying that,
165
00:08:12,540 --> 00:08:15,525
W_3 X_3 multiplying that,
166
00:08:15,525 --> 00:08:19,080
all the way up to W_n
and X_n multiplying
167
00:08:19,080 --> 00:08:23,340
that and then summing up
all of these products.
168
00:08:23,340 --> 00:08:25,335
Writing that out,
169
00:08:25,335 --> 00:08:32,480
this means that the dot product
is equal to W_1X_1 plus
170
00:08:32,480 --> 00:08:41,910
W_2X_2 plus W_3X_3 plus
all the way up to W_nX_n.
171
00:08:41,910 --> 00:08:46,810
Then finally we add
back in the b on top.
172
00:08:46,840 --> 00:08:49,410
You notice that this gives us
173
00:08:49,410 --> 00:08:53,080
exactly the same expression
as we had on top.
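The equivalence between the dot product form and the written-out sum can be checked with a short Python sketch. The helper names `dot` and `f` are hypothetical, and the numbers are the example parameters and features quoted earlier in the lecture, reused here only as test values.

```python
def dot(w, x):
    """Dot product: multiply corresponding pairs and sum the products."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same number of entries")
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def f(x, w, b):
    """Multiple linear regression model: f(x) = w . x + b."""
    return dot(w, x) + b


w = [0.1, 4, 10, -2]
b = 80
x = [1416, 3, 2, 40]

compact = f(x, w, b)
# The same model written out term by term for n = 4 features.
expanded = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w[3] * x[3] + b

# The two forms should agree up to floating-point rounding.
print(abs(compact - expanded) < 1e-9)
```

The compact form works for any number of features n without changing the code, which is the main appeal of the vector notation.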
174
00:08:53,390 --> 00:08:57,490
The dot product notation
lets you write the model in
175
00:08:57,490 --> 00:09:01,240
a more compact form
with fewer characters.
176
00:09:01,240 --> 00:09:04,510
The name for this type of
linear regression model with
177
00:09:04,510 --> 00:09:08,230
multiple input features is
multiple linear regression.
178
00:09:08,230 --> 00:09:11,200
This is in contrast to
univariate regression,
179
00:09:11,200 --> 00:09:13,415
which has just one feature.
180
00:09:13,415 --> 00:09:15,510
By the way, you might think
181
00:09:15,510 --> 00:09:18,400
this algorithm is called
multivariate regression,
182
00:09:18,400 --> 00:09:20,590
but that term actually
refers to something
183
00:09:20,590 --> 00:09:23,170
else that we won't
be using here.
184
00:09:23,170 --> 00:09:24,970
I'm going to refer to this model
185
00:09:24,970 --> 00:09:27,755
as multiple linear regression.
186
00:09:27,755 --> 00:09:31,705
That's it for linear regression
with multiple features,
187
00:09:31,705 --> 00:09:34,555
which is also called
multiple linear regression.
188
00:09:34,555 --> 00:09:36,495
In order to implement this,
189
00:09:36,495 --> 00:09:39,710
there's a really neat trick
called vectorization,
190
00:09:39,710 --> 00:09:42,040
which will make it much
simpler to implement
191
00:09:42,040 --> 00:09:44,965
this and many other
learning algorithms.
192
00:09:44,965 --> 00:09:47,260
Let's go on to the
next video to take
193
00:09:47,260 --> 00:09:50,770
a look at what vectorization is.