Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:08,700 --> 00:00:12,070
High in this video we're going to talk about Huffman coding.
2
00:00:12,540 --> 00:00:22,020
Including is a great way to compress data and to be able to take a large amount of data and to shrink
3
00:00:22,020 --> 00:00:30,120
it down to bite size chunks and so to give me a good example of how this works right here on the right
4
00:00:30,120 --> 00:00:30,560
hand side.
5
00:00:30,570 --> 00:00:35,290
We have the letters A B C D and D.
6
00:00:35,430 --> 00:00:44,880
If we had a document that only had a range of these five letters and we want it to compress the data
7
00:00:45,810 --> 00:00:48,030
Huffman coding would be a great way of doing that.
8
00:00:48,030 --> 00:00:57,150
And the reason why is because if we want to store this data in the regular text format that usually
9
00:00:57,330 --> 00:01:00,930
would store data you would have to do something like this.
10
00:01:00,930 --> 00:01:14,100
You'd need to set up your system and each one would have to be represented by a byte and a byte is a
11
00:01:14,100 --> 00:01:20,730
total of 8 bits.
12
00:01:20,930 --> 00:01:31,140
And so you're looking at one two three four five six seven and eight.
13
00:01:31,220 --> 00:01:38,570
And so you're looking at having to take up 8 bits for every letter but there's actually a lot better
14
00:01:38,570 --> 00:01:39,730
way of doing this.
15
00:01:39,770 --> 00:01:44,220
And that's what huffman encoding does and this only works with specific types of data.
16
00:01:44,240 --> 00:01:50,850
So if you needed the full set and the full range of data this isn't something that would work for you.
17
00:01:50,900 --> 00:01:54,170
But what we can do is actually take it.
18
00:01:54,170 --> 00:01:59,140
So we have a range set of rules similar to something like this.
19
00:01:59,240 --> 00:02:10,190
We can actually take it and instead of requiring a total of eight to how we can do this just by using
20
00:02:11,710 --> 00:02:19,120
three bets and we're going to walk through how to do it in a tree form first and then I'm going to show
21
00:02:19,120 --> 00:02:22,840
you how the actual code itself works.
22
00:02:22,840 --> 00:02:31,310
So the very first thing we're going to do is on the right hand side the letter A.
23
00:02:31,440 --> 00:02:38,350
Shows that we have 24 counts of the letter and B has 12 C 10 etc. etc..
24
00:02:38,470 --> 00:02:43,270
And so if you're wondering when this could actually be important.
25
00:02:43,630 --> 00:02:50,770
There are certain studies in computer science dealing with things like bioinformatics that this is absolutely
26
00:02:50,770 --> 00:02:56,470
critical and because you're dealing with gigantic amounts of data and you need to be able to compress
27
00:02:56,740 --> 00:02:59,660
and Huffman coding is one of the best ways of doing that.
28
00:03:00,070 --> 00:03:05,350
So the type of tree that we're going to generate is going to be a little bit different than some of
29
00:03:05,350 --> 00:03:07,860
the others we've seen specifically.
30
00:03:07,900 --> 00:03:12,720
It's going to be different and like binary trees because it's going to have values associated with them.
31
00:03:12,850 --> 00:03:17,420
And so we're going to actually start off with the two lowest values.
32
00:03:17,560 --> 00:03:27,070
So we're going to give a B and then we're going to say that there is a count of it and we'll draw a
33
00:03:27,070 --> 00:03:31,430
box around this and we know that D is the same thing.
34
00:03:31,480 --> 00:03:42,340
So we will say there's eight of those and then we're going to push that up to the top and say that we
35
00:03:42,340 --> 00:03:48,070
know that 8 plus 8 equals 16 to have a total of 16.
36
00:03:48,070 --> 00:03:56,310
Now we know by looking up at our little chart up here that B and C both have 12 and 10 respectively.
37
00:03:56,410 --> 00:04:01,030
And so we're going to use those values as well starting with the lower value first.
38
00:04:01,030 --> 00:04:03,580
So we're going to start off with C
39
00:04:14,900 --> 00:04:20,770
we're going to start off with C being equal to its value 10
40
00:04:23,720 --> 00:04:30,940
and then B 12.
41
00:04:31,400 --> 00:04:33,640
And you notice that we are leaving out a.
42
00:04:33,650 --> 00:04:37,800
And that's an important part and I'll show you what that means here in a second.
43
00:04:37,970 --> 00:04:42,220
So we have a B and C and that you total those up those equal 22
44
00:04:45,940 --> 00:04:50,360
K and now all we have to do is connect these.
45
00:04:51,050 --> 00:04:55,760
So we have 16 plus 22 which gives the total of 38.
46
00:04:56,720 --> 00:04:59,920
And now all we have to do is connect.
47
00:05:00,110 --> 00:05:01,250
So we have a.
48
00:05:01,460 --> 00:05:02,960
Which equals 24
49
00:05:08,800 --> 00:05:10,740
and then we're going to total that up.
50
00:05:10,740 --> 00:05:20,530
So the way Huffman codes the root node doesn't actually containing data just contains a sum of the total
51
00:05:20,530 --> 00:05:21,840
data we're looking for.
52
00:05:21,910 --> 00:05:24,160
And so we know our total.
53
00:05:24,190 --> 00:05:28,960
If you add up all of these right here the total is 62.
54
00:05:29,080 --> 00:05:34,850
So that's how you know your root node is always going to represent the total amount.
55
00:05:35,080 --> 00:05:44,650
So if you're wondering how this is better than just having a just regular text file or something like
56
00:05:44,650 --> 00:05:52,900
that which usually to use like ASCII kind of format ASCII like we talked about takes bits or one byte
57
00:05:53,080 --> 00:06:00,430
for each one of these items and so that can actually take up quite a bit of space where as what we can
58
00:06:00,430 --> 00:06:04,210
do here is we can actually assign values.
59
00:06:04,300 --> 00:06:13,540
And so we're going to give this a value of zero and we're going to give this a value of 1 and then we're
60
00:06:13,540 --> 00:06:16,840
going to use this just like the regular binary tree.
61
00:06:16,890 --> 00:06:22,180
Right is 1 and 0 is on the left hand side.
62
00:06:22,510 --> 00:06:25,890
We'll do that with each one type.
63
00:06:25,900 --> 00:06:36,970
Now the way this will work is a is just going to be connected from the root so the root A is going to
64
00:06:37,780 --> 00:06:40,390
have the value of 0.
65
00:06:40,710 --> 00:06:51,590
B If you look at the way this works and you just follow binary tree you'll have the room going here.
66
00:06:51,700 --> 00:06:54,350
Here here.
67
00:06:54,360 --> 00:07:08,400
So you're going to be looking at B being represented by the binary value of 1 0 0 and an see if you
68
00:07:08,400 --> 00:07:13,580
follow it down in the tree where it going down 1 0 1
69
00:07:17,010 --> 00:07:31,200
D is going to be 1 1 0 and e is going to be 1 1 1 and all we're doing is we're assigning the edges on
70
00:07:31,200 --> 00:07:37,140
the path we're saying it a 1 or a 0 depending on if it's on the right hand or the left hand side and
71
00:07:37,140 --> 00:07:45,920
then that is what we're going to store in our in our compressed files.
72
00:07:45,920 --> 00:07:58,170
So instead of having to give a total of eight values here we are given a 1 instead of 8 values here
73
00:07:58,180 --> 00:07:59,140
it's three.
74
00:07:59,280 --> 00:08:04,830
And the reason it's 3 it's just one two three here instead of eight.
75
00:08:05,070 --> 00:08:09,330
It's three three instead of a three instead of eight.
76
00:08:09,330 --> 00:08:15,840
So all we're really doing is we're shrinking down the number of bits that are required.
77
00:08:15,990 --> 00:08:18,930
And as you can see right here tab.
78
00:08:18,960 --> 00:08:23,270
ABC D and E it with a normal system.
79
00:08:23,370 --> 00:08:27,520
This would be eight times eight times five.
80
00:08:27,630 --> 00:08:35,310
So we would need a total of 40 as compared to the new system that uses the Huffman coding which has
81
00:08:35,400 --> 00:08:40,550
a total of one plus four sets of three.
82
00:08:40,560 --> 00:08:42,770
So we're looking at 13.
83
00:08:42,840 --> 00:08:52,650
So by being able to Huffman coding we're saving well over half of the amount of space and that's one
84
00:08:52,650 --> 00:08:58,740
of the big reasons why Huffman coding is so powerful because by simply doing something by shrinking
85
00:08:58,740 --> 00:09:08,190
down and number of bits that are necessary on a per energy or on a per letter basis we're able to have
86
00:09:08,250 --> 00:09:09,780
a much smaller file.
87
00:09:09,780 --> 00:09:16,920
And whether you're dealing with DNA sequences or something where you can work with the data set like
88
00:09:16,920 --> 00:09:17,360
this.
89
00:09:17,390 --> 00:09:19,350
This can be a great way of doing it.
90
00:09:19,350 --> 00:09:23,610
So please let me know if you have any questions whatsoever and I'll see you in the next video.
9652
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.