1
00:00:00,806 --> 00:00:03,300
Instructor: Hi, and welcome back.
2
00:00:03,300 --> 00:00:06,990
Let's begin this lesson by introducing binary encoding.
3
00:00:06,990 --> 00:00:08,130
We will start from the
4
00:00:08,130 --> 00:00:10,290
ordinal numbers we assigned earlier.
5
00:00:10,290 --> 00:00:12,630
Bread is represented by the number one,
6
00:00:12,630 --> 00:00:14,190
yogurt by the number two,
7
00:00:14,190 --> 00:00:16,203
and muffin is designated with three.
8
00:00:17,370 --> 00:00:19,560
Binary encoding implies we should turn
9
00:00:19,560 --> 00:00:21,630
these numbers into binary.
10
00:00:21,630 --> 00:00:24,000
One in binary is zero one,
11
00:00:24,000 --> 00:00:26,253
so bread would be zero one.
12
00:00:27,540 --> 00:00:29,700
Two in binary is one zero.
13
00:00:29,700 --> 00:00:32,043
So, yogurt would be one zero.
14
00:00:33,150 --> 00:00:35,190
Three in binary is one one.
15
00:00:35,190 --> 00:00:38,010
So, muffins would be one one.
16
00:00:38,010 --> 00:00:40,320
The next step of the process is to divide
17
00:00:40,320 --> 00:00:41,670
these into different columns,
18
00:00:41,670 --> 00:00:44,280
as if we were creating two new variables.
19
00:00:44,280 --> 00:00:46,281
For the first one, bread is zero,
20
00:00:46,281 --> 00:00:49,170
yogurt is one and muffins are one.
21
00:00:49,170 --> 00:00:51,600
For the second variable, bread is one,
22
00:00:51,600 --> 00:00:54,210
yogurt is zero, and muffins are one.
23
00:00:54,210 --> 00:00:56,790
We have differentiated between the three categories
24
00:00:56,790 --> 00:00:58,530
and have removed the order.
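The binary encoding steps described so far can be sketched in a few lines of Python. This is only an illustration: the names `ORDINALS` and `binary_encode` are hypothetical and do not come from the lesson, and the two-column width is simply derived from the largest ordinal code.

```python
# Ordinal codes from the lesson: bread = 1, yogurt = 2, muffin = 3.
ORDINALS = {"bread": 1, "yogurt": 2, "muffin": 3}


def binary_encode(ordinals):
    """Map each category to a list of bits, most significant bit first."""
    # Number of columns needed to write the largest ordinal code in binary.
    width = max(ordinals.values()).bit_length()
    return {
        name: [int(bit) for bit in format(code, f"0{width}b")]
        for name, code in ordinals.items()
    }


encoded = binary_encode(ORDINALS)
for name, bits in encoded.items():
    print(name, bits)
# bread [0, 1]
# yogurt [1, 0]
# muffin [1, 1]
```

Each position in the bit list corresponds to one of the two new variables: the first column holds 0, 1, 1 and the second holds 1, 0, 1 for bread, yogurt, and muffin.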
25
00:00:58,530 --> 00:01:00,330
However, there are still some
26
00:01:00,330 --> 00:01:02,223
implied correlations between them.
27
00:01:03,600 --> 00:01:05,850
For instance, bread and yogurt
28
00:01:05,850 --> 00:01:08,250
seem exactly the opposite of each other.
29
00:01:08,250 --> 00:01:09,570
It's like we are saying,
30
00:01:09,570 --> 00:01:11,700
whatever is bread is not yogurt
31
00:01:11,700 --> 00:01:13,110
and vice versa.
32
00:01:13,110 --> 00:01:14,790
Even if this makes sense,
33
00:01:14,790 --> 00:01:16,800
if we encode them in a different way,
34
00:01:16,800 --> 00:01:18,390
this opposite correlation would be
35
00:01:18,390 --> 00:01:19,980
true for muffins and yogurt,
36
00:01:19,980 --> 00:01:21,780
but no longer for bread.
37
00:01:21,780 --> 00:01:25,080
Therefore, binary encoding proves problematic,
38
00:01:25,080 --> 00:01:26,880
but is a great improvement over
39
00:01:26,880 --> 00:01:28,443
the initial ordinal method.
40
00:01:30,750 --> 00:01:31,830
All right.
41
00:01:31,830 --> 00:01:35,100
Finally, we have the so-called one-hot encoding.
42
00:01:35,100 --> 00:01:38,610
One-hot is very simple and widely adopted.
43
00:01:38,610 --> 00:01:40,740
It consists of creating as many columns
44
00:01:40,740 --> 00:01:42,630
as there are possible values.
45
00:01:42,630 --> 00:01:44,640
Here, we have three products.
46
00:01:44,640 --> 00:01:48,120
Thus, we need three columns or three variables.
47
00:01:48,120 --> 00:01:50,703
Let's call them bread, yogurt, and muffins.
48
00:01:51,780 --> 00:01:54,780
Imagine these variables as asking the question:
49
00:01:54,780 --> 00:01:56,370
Is this product bread?
50
00:01:56,370 --> 00:01:57,720
Is this product yogurt?
51
00:01:57,720 --> 00:01:59,463
And is this product muffins?
52
00:02:00,690 --> 00:02:03,330
One means yes, zero means no.
53
00:02:03,330 --> 00:02:05,610
So for a product that is bread,
54
00:02:05,610 --> 00:02:08,223
we will have one zero zero.
55
00:02:08,223 --> 00:02:11,733
For a product that is yogurt, zero one zero,
56
00:02:12,570 --> 00:02:15,990
and for a product that is muffin, zero zero one.
57
00:02:15,990 --> 00:02:17,610
This is very intuitive
58
00:02:17,610 --> 00:02:19,080
as a product can only be of
59
00:02:19,080 --> 00:02:21,000
one type at the same time.
60
00:02:21,000 --> 00:02:23,610
Thus, there will be only one value one
61
00:02:23,610 --> 00:02:25,590
and everything else will be zero.
62
00:02:25,590 --> 00:02:26,970
This means the products are
63
00:02:26,970 --> 00:02:28,950
uncorrelated and unequivocal,
64
00:02:28,950 --> 00:02:31,503
which is useful and usually works like a charm.
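The one-hot scheme just described can be sketched as follows. This is a minimal illustration in plain Python, not the lesson's own code: `CATEGORIES` and `one_hot` are hypothetical names, and the column order (bread, yogurt, muffin) follows the order used in the lesson.

```python
# One column per possible value, in the order used in the lesson.
CATEGORIES = ["bread", "yogurt", "muffin"]


def one_hot(value, categories):
    """Return a vector with a single 1 in the column matching `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # Each column answers "is this product <category>?" with 1 (yes) or 0 (no).
    return [1 if value == category else 0 for category in categories]


vectors = {name: one_hot(name, CATEGORIES) for name in CATEGORIES}
for name, vector in vectors.items():
    print(name, vector)
# bread [1, 0, 0]
# yogurt [0, 1, 0]
# muffin [0, 0, 1]
```

Every vector contains exactly one 1, which reflects the point that a product can be of only one type at a time.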
65
00:02:32,850 --> 00:02:35,400
Many lessons ago we were talking about cats,
66
00:02:35,400 --> 00:02:37,740
dogs, and horses classification.
67
00:02:37,740 --> 00:02:40,470
The target vectors there were one-hot encoded,
68
00:02:40,470 --> 00:02:42,963
so we had the same type of vectors.
69
00:02:44,370 --> 00:02:46,320
There is one big problem with
70
00:02:46,320 --> 00:02:47,880
one-hot encoding, though.
71
00:02:47,880 --> 00:02:51,300
One-hot encoding requires a lot of new variables.
72
00:02:51,300 --> 00:02:55,590
For example, Ikea offers around 12,000 products.
73
00:02:55,590 --> 00:02:57,450
Do we want to include 12,000
74
00:02:57,450 --> 00:02:59,250
columns in our inputs?
75
00:02:59,250 --> 00:03:00,603
Definitely not.
76
00:03:01,860 --> 00:03:04,380
If we used binary, the 12,000 products
77
00:03:04,380 --> 00:03:06,870
would be represented by 16 columns only
78
00:03:06,870 --> 00:03:09,210
since the 12,000th product would be written
79
00:03:09,210 --> 00:03:10,653
like this in binary.
80
00:03:11,760 --> 00:03:13,400
This is exponentially lower than the
81
00:03:13,400 --> 00:03:15,210
12,000 columns we would need
82
00:03:15,210 --> 00:03:16,740
for one-hot encoding.
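The column counts can be checked with a short calculation. The sketch below assumes the products are numbered from 1 to 12,000, as with the ordinal codes earlier; `binary_columns` and `one_hot_columns` are hypothetical helper names. Note that the arithmetic gives 14 bits as the minimum for 12,000, so the 16 columns quoted in the lesson may reflect padding to a two-byte width, which is a guess rather than something the lesson states.

```python
def one_hot_columns(n_categories):
    """One-hot encoding needs one column per category."""
    return n_categories


def binary_columns(n_categories, start=1):
    """Minimum number of bits needed to write the largest ordinal code.

    Assumes categories are numbered start, start + 1, ..., consecutively.
    """
    largest_code = start + n_categories - 1
    return max(1, largest_code.bit_length())


n_products = 12_000
print(format(n_products, "b"))      # 10111011100000
print(one_hot_columns(n_products))  # 12000
print(binary_columns(n_products))   # 14
```

Because the binary column count grows with the logarithm of the number of categories, doubling the catalogue adds only one more column, whereas one-hot encoding would double the number of columns.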
83
00:03:16,740 --> 00:03:19,470
In such cases, we must use binary,
84
00:03:19,470 --> 00:03:20,940
even though that would introduce
85
00:03:20,940 --> 00:03:22,560
some unjustified correlations
86
00:03:22,560 --> 00:03:23,793
between the products.
87
00:03:25,170 --> 00:03:26,850
Clearly there is a trade-off
88
00:03:26,850 --> 00:03:29,370
between binary and one-hot encoding.
89
00:03:29,370 --> 00:03:30,870
We would prefer one-hot when
90
00:03:30,870 --> 00:03:32,370
we have a few categories
91
00:03:32,370 --> 00:03:35,550
and binary when dealing with many categories.
92
00:03:35,550 --> 00:03:37,219
All right, that was all.
93
00:03:37,219 --> 00:03:38,823
Thanks for watching.