1
00:00:00,740 --> 00:00:03,120
The choice of features can have
2
00:00:03,120 --> 00:00:06,090
a huge impact on your learning
algorithm's performance.
3
00:00:06,090 --> 00:00:08,700
In fact, for many
practical applications,
4
00:00:08,700 --> 00:00:11,130
choosing or engineering
the right features is
5
00:00:11,130 --> 00:00:14,085
a critical step to making
the algorithm work well.
6
00:00:14,085 --> 00:00:17,340
In this video, let's take a
look at how you can choose or
7
00:00:17,340 --> 00:00:19,365
engineer the most
appropriate features
8
00:00:19,365 --> 00:00:21,150
for your learning algorithm.
9
00:00:21,150 --> 00:00:24,300
Let's take a look at feature
engineering by revisiting
10
00:00:24,300 --> 00:00:27,840
the example of predicting
the price of a house.
11
00:00:27,840 --> 00:00:31,140
Say you have two
features for each house.
12
00:00:31,140 --> 00:00:33,975
x_1 is the width of the lot size
13
00:00:33,975 --> 00:00:37,200
of the plot of land that
the house is built on.
14
00:00:37,200 --> 00:00:39,345
This in real estate is also
15
00:00:39,345 --> 00:00:42,000
called the frontage of the lot,
16
00:00:42,000 --> 00:00:44,070
and the second feature,
17
00:00:44,070 --> 00:00:47,465
x_2, is the depth
of the lot size of,
18
00:00:47,465 --> 00:00:49,340
let's assume the rectangular plot
19
00:00:49,340 --> 00:00:51,590
of land that the
house was built on.
20
00:00:51,590 --> 00:00:54,430
Given these two
features, x_1 and x_2,
21
00:00:54,430 --> 00:00:57,755
you might build a model
like this where f of x
22
00:00:57,755 --> 00:01:02,110
is w_1x_1 plus w_2x_2 plus b,
23
00:01:02,110 --> 00:01:04,875
where x_1 is the
frontage or width,
24
00:01:04,875 --> 00:01:07,680
and x_2 is the depth.
25
00:01:07,680 --> 00:01:10,045
This model might work okay.
26
00:01:10,045 --> 00:01:12,020
But here's another
option for how
27
00:01:12,020 --> 00:01:13,700
you might choose a
different way to
28
00:01:13,700 --> 00:01:15,335
use these features in the model
29
00:01:15,335 --> 00:01:17,160
that could be even
more effective.
30
00:01:17,160 --> 00:01:18,830
You might notice
that the area of
31
00:01:18,830 --> 00:01:20,540
the land can be calculated
32
00:01:20,540 --> 00:01:24,375
as the frontage or
width times the depth.
33
00:01:24,375 --> 00:01:27,020
You may have an intuition that
34
00:01:27,020 --> 00:01:30,200
the area of the land is more
predictive of the price
35
00:01:30,200 --> 00:01:33,940
than the frontage and depth
as separate features.
36
00:01:33,940 --> 00:01:36,570
You might define a new feature,
37
00:01:36,570 --> 00:01:39,660
x_3, as x_1 times x_2.
38
00:01:39,660 --> 00:01:41,795
This new feature x_3 is
39
00:01:41,795 --> 00:01:44,875
equal to the area of
the plot of land.
40
00:01:44,875 --> 00:01:48,785
With this feature, you can
then have a model f_w,
41
00:01:48,785 --> 00:01:53,255
b of x equals w_1x_1 plus w_2x_2
42
00:01:53,255 --> 00:01:56,110
plus w_3x_3 plus b
43
00:01:56,110 --> 00:01:59,330
so that the model can now
choose parameters w_1,
44
00:01:59,330 --> 00:02:01,205
w_2, and w_3,
45
00:02:01,205 --> 00:02:03,440
depending on whether
the data shows that
46
00:02:03,440 --> 00:02:06,335
the frontage or the
depth or the area
47
00:02:06,335 --> 00:02:08,750
x_3 of the lot turns out to be
48
00:02:08,750 --> 00:02:10,250
the most important thing for
49
00:02:10,250 --> 00:02:12,430
predicting the
price of the house.
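
As a rough illustration of the model just described, here is a minimal Python sketch. The function names, the sample lot dimensions, and the parameter values are all assumptions made up for this example; in practice w_1, w_2, w_3 and b would be learned from data rather than set by hand.

```python
def add_area_feature(houses):
    """Append x_3 = x_1 * x_2 (lot area) to each [frontage, depth] row."""
    return [[x1, x2, x1 * x2] for x1, x2 in houses]


def predict(x, w, b):
    """Linear model f_w,b(x) = w_1*x_1 + w_2*x_2 + w_3*x_3 + b."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


# Hypothetical lots given as [frontage, depth]; the numbers are illustrative only.
houses = [[40.0, 100.0], [50.0, 80.0], [25.0, 160.0]]
engineered = add_area_feature(houses)

# Placeholder parameters chosen by hand, not learned from data.
# Here all the weight sits on the engineered area feature x_3.
w = [0.0, 0.0, 0.5]
b = 50.0

predictions = [predict(x, w, b) for x in engineered]
print(engineered)
print(predictions)
```

The three hypothetical lots have different frontage and depth but the same area, so with all the weight on x_3 this sketch gives them the same predicted price. A model without x_3 could not express that directly.
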
50
00:02:12,430 --> 00:02:15,469
What we just did,
creating a new feature,
51
00:02:15,469 --> 00:02:19,180
is an example of what's
called feature engineering,
52
00:02:19,180 --> 00:02:21,380
in which you might
use your knowledge or
53
00:02:21,380 --> 00:02:24,460
intuition about the problem
to design new features
54
00:02:24,460 --> 00:02:26,270
usually by transforming or
55
00:02:26,270 --> 00:02:28,010
combining the
original features of
56
00:02:28,010 --> 00:02:29,810
the problem in order to make it
57
00:02:29,810 --> 00:02:31,670
easier for the
learning algorithm
58
00:02:31,670 --> 00:02:33,545
to make accurate predictions.
59
00:02:33,545 --> 00:02:35,420
Depending on what insights you
60
00:02:35,420 --> 00:02:37,204
may have into the application,
61
00:02:37,204 --> 00:02:38,600
rather than just taking
62
00:02:38,600 --> 00:02:40,190
the features that
you happen to have
63
00:02:40,190 --> 00:02:43,745
started off with, sometimes
by defining new features,
64
00:02:43,745 --> 00:02:47,500
you might be able to get
a much better model.
65
00:02:47,500 --> 00:02:50,625
That's feature engineering.
66
00:02:50,625 --> 00:02:54,260
It turns out that there's one
flavor of feature engineering
67
00:02:54,260 --> 00:02:56,780
that allows you to fit
not just straight lines,
68
00:02:56,780 --> 00:03:00,095
but curves, non-linear
functions to your data.
69
00:03:00,095 --> 00:03:02,000
Let's take a look
in the next video
70
00:03:02,000 --> 00:03:04,170
at how you can do that.