Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
1
00:00:00,160 --> 00:00:05,160
(lively music)
(keyboard clacking)
2
2
00:00:05,230 --> 00:00:06,960
In this video, we're going to take a look
3
3
00:00:06,960 --> 00:00:10,480
at the Radix Sort and this is another sort algorithm
4
4
00:00:10,480 --> 00:00:13,520
that makes assumptions about the data it's sorting
5
5
00:00:13,520 --> 00:00:15,980
and in this case, the assumptions that it makes
6
6
00:00:15,980 --> 00:00:19,860
is that the data has the same radix and width.
7
7
00:00:19,860 --> 00:00:22,240
Now, what do I mean by that?
8
8
00:00:22,240 --> 00:00:25,360
The radix is the number of unique digits
9
9
00:00:25,360 --> 00:00:28,490
or values in the case of characters
10
10
00:00:28,490 --> 00:00:31,510
that a numbering system or an alphabet has.
11
11
00:00:31,510 --> 00:00:35,300
For example, the radix for the decimal system is 10
12
12
00:00:35,300 --> 00:00:37,180
because there are 10 possible digits
13
13
00:00:37,180 --> 00:00:38,660
in the decimal system.
14
14
00:00:38,660 --> 00:00:39,900
Zero to nine.
15
15
00:00:39,900 --> 00:00:42,580
For binary numbers, the radix is two
16
16
00:00:42,580 --> 00:00:47,230
because we use two digits in the binary system, zero and one
17
17
00:00:47,230 --> 00:00:50,450
and for the English alphabet, the radix is 26
18
18
00:00:50,450 --> 00:00:53,400
because there are 26 letters in the alphabet.
19
19
00:00:53,400 --> 00:00:56,240
So, that's what we mean by radix.
20
20
00:00:56,240 --> 00:00:59,110
Now, by width I mean the number of digits
21
21
00:00:59,110 --> 00:01:01,897
or letters, for example, the number 1235
22
22
00:01:03,660 --> 00:01:05,540
has a width of four.
23
23
00:01:05,540 --> 00:01:08,610
The string hello has a width of five.
24
24
00:01:08,610 --> 00:01:10,930
The number 10 has a width of two.
25
25
00:01:10,930 --> 00:01:15,840
So, 1,235 or 1235 has four digits,
26
26
00:01:15,840 --> 00:01:17,323
so it has a width of four,
27
27
00:01:17,323 --> 00:01:21,750
the string hello has five letters, so it has a width of five
28
28
00:01:21,750 --> 00:01:24,270
and the number 10 has two digits
29
29
00:01:24,270 --> 00:01:26,200
and so, it has a width of two.
30
30
00:01:26,200 --> 00:01:28,310
And so, the radix sort assumes
31
31
00:01:28,310 --> 00:01:32,740
that all the values have the same radix and the same width
32
32
00:01:32,740 --> 00:01:36,690
and so, that means that we can use the radix sort
33
33
00:01:36,690 --> 00:01:39,420
to sort integers and strings.
34
34
00:01:39,420 --> 00:01:41,220
The decimal point isn't a digit
35
35
00:01:41,220 --> 00:01:43,620
and so, floating point numbers can't be sorted
36
36
00:01:43,620 --> 00:01:45,100
using radix sort.
37
37
00:01:45,100 --> 00:01:46,170
So, how does it work?
38
38
00:01:46,170 --> 00:01:49,190
Well, we sort based on each individual digit
39
39
00:01:49,190 --> 00:01:51,530
or letter position and you'll see what I mean
40
40
00:01:51,530 --> 00:01:55,500
when we go through the algorithm on the slides.
41
41
00:01:55,500 --> 00:01:58,430
So, we start at the rightmost position
42
42
00:01:58,430 --> 00:02:01,210
and we sort based on that position,
43
43
00:02:01,210 --> 00:02:03,610
the digit or the letter at that position
44
44
00:02:03,610 --> 00:02:07,810
and then we move to the rightmost minus one digit
45
45
00:02:07,810 --> 00:02:09,780
or letter and we sort based on that
46
46
00:02:09,780 --> 00:02:10,890
and we keep doing that
47
47
00:02:10,890 --> 00:02:13,700
until we've done all of the digits or letters
48
48
00:02:13,700 --> 00:02:14,533
in the values.
49
49
00:02:14,533 --> 00:02:16,350
Now, the important point here
50
50
00:02:16,350 --> 00:02:19,531
and this is key, this is an absolute important point,
51
51
00:02:19,531 --> 00:02:22,820
is we have to use a stable sort algorithm
52
52
00:02:22,820 --> 00:02:27,070
at each stage, so when we sort the values based
53
53
00:02:27,070 --> 00:02:28,670
on the rightmost position,
54
54
00:02:28,670 --> 00:02:31,100
we have to use a stable sort algorithm.
55
55
00:02:31,100 --> 00:02:33,190
And then when we sort the values based
56
56
00:02:33,190 --> 00:02:35,710
on the rightmost minus one position,
57
57
00:02:35,710 --> 00:02:37,630
we have to use a stable sort algorithm
58
58
00:02:37,630 --> 00:02:39,307
and once again, you'll understand why
59
59
00:02:39,307 --> 00:02:42,430
when we take a look at going through the algorithm.
60
60
00:02:42,430 --> 00:02:43,870
So, let's do that now.
61
61
00:02:43,870 --> 00:02:46,238
So, we have a different array here now
62
62
00:02:46,238 --> 00:02:47,890
at the top of the screen.
63
63
00:02:47,890 --> 00:02:50,070
It's got six integers in it
64
64
00:02:50,070 --> 00:02:52,930
and you'll notice that they're all decimal numbers
65
65
00:02:52,930 --> 00:02:55,540
and they're all of width four.
66
66
00:02:55,540 --> 00:02:57,620
They all have four digits
67
67
00:02:57,620 --> 00:03:00,883
and so, we can sort this array using the radix sot.
68
68
00:03:02,250 --> 00:03:03,770
First we're going to sort this array
69
69
00:03:03,770 --> 00:03:05,770
based on the one's position
70
70
00:03:05,770 --> 00:03:08,020
and so, let's take a look at the first value,
71
71
00:03:08,020 --> 00:03:13,020
4725, the number five is in the one's position
72
72
00:03:14,180 --> 00:03:15,970
and for the second value,
73
73
00:03:15,970 --> 00:03:20,970
4586, the number six is in the one's position
74
74
00:03:21,410 --> 00:03:22,690
and so, what we're doing here
75
75
00:03:22,690 --> 00:03:25,440
is we're sorting by the digit
76
76
00:03:25,440 --> 00:03:27,370
that has the least weight
77
77
00:03:27,370 --> 00:03:30,210
because the one's position has the least weight
78
78
00:03:30,210 --> 00:03:32,180
in a decimal number
79
79
00:03:32,180 --> 00:03:34,310
and so, once we've done that,
80
80
00:03:34,310 --> 00:03:36,800
the bottom array will be the result.
81
81
00:03:36,800 --> 00:03:38,780
1330 will come first
82
82
00:03:38,780 --> 00:03:41,960
because it has zero in the one's position,
83
83
00:03:41,960 --> 00:03:44,370
that'll be followed by 8792
84
84
00:03:44,370 --> 00:03:47,270
because there's a two in the one's position.
85
85
00:03:47,270 --> 00:03:49,287
Then we have 1594,
86
86
00:03:49,287 --> 00:03:54,287
4725, 4586 and 5729
87
87
00:03:54,507 --> 00:03:56,570
and if you look at their one's position
88
88
00:03:56,570 --> 00:03:58,470
or the rightmost digit,
89
89
00:03:58,470 --> 00:04:01,170
you'll see that the values have been sorted
90
90
00:04:01,170 --> 00:04:03,520
according to what's in that position
91
91
00:04:03,520 --> 00:04:07,280
and now we move to the second position
92
92
00:04:07,280 --> 00:04:10,980
from the right or the second least significant digit
93
93
00:04:10,980 --> 00:04:13,170
and we sort based on that.
94
94
00:04:13,170 --> 00:04:15,440
And so, at the top is the array we just had
95
95
00:04:15,440 --> 00:04:16,840
on the other slide
96
96
00:04:16,840 --> 00:04:18,450
and now we're gonna sort based
97
97
00:04:18,450 --> 00:04:22,310
on the second to last rightmost position.
98
98
00:04:22,310 --> 00:04:25,460
So, for 1330 that will be three,
99
99
00:04:25,460 --> 00:04:28,590
for 8792 it will be nine,
100
100
00:04:28,590 --> 00:04:31,500
for 1594 it will be nine,
101
101
00:04:31,500 --> 00:04:34,630
for 4725 it will be two et cetera
102
102
00:04:34,630 --> 00:04:36,880
and so, down at the bottom,
103
103
00:04:36,880 --> 00:04:39,150
we've now sorted based on that digit,
104
104
00:04:39,150 --> 00:04:42,540
so we have 4725 'cause it has a two,
105
105
00:04:42,540 --> 00:04:45,330
5729 which also have a two,
106
106
00:04:45,330 --> 00:04:47,378
1330 'cause it has a three,
107
107
00:04:47,378 --> 00:04:50,800
4586 because it has an eight
108
108
00:04:50,800 --> 00:04:54,970
and then 8792 and 1594 both have nines.
109
109
00:04:54,970 --> 00:04:56,810
Now, as I mentioned,
110
110
00:04:56,810 --> 00:05:00,800
critically important, this is a stable sort
111
111
00:05:00,800 --> 00:05:01,930
and so, you'll notice
112
112
00:05:01,930 --> 00:05:04,890
that we have a couple of pairs of duplicates here,
113
113
00:05:04,890 --> 00:05:08,580
4725 and 5729
114
114
00:05:08,580 --> 00:05:12,370
both have twos in their rightmost digit minus one position
115
115
00:05:12,370 --> 00:05:13,800
and you'll notice at the top
116
116
00:05:13,800 --> 00:05:18,301
that 4725 came before 5729
117
117
00:05:18,301 --> 00:05:23,301
and in the bottom, 4725 still comes before 5729
118
118
00:05:24,447 --> 00:05:26,360
and that is critical
119
119
00:05:26,360 --> 00:05:29,800
and that's because we have to use the stable sort algorithm
120
120
00:05:29,800 --> 00:05:32,310
and the same thing goes for 8792
121
121
00:05:32,310 --> 00:05:35,000
and 1594, they both have nines
122
122
00:05:35,000 --> 00:05:37,590
in the second to right position
123
123
00:05:37,590 --> 00:05:40,590
and in the top right, 8792
124
124
00:05:40,590 --> 00:05:43,210
came before 1594
125
125
00:05:43,210 --> 00:05:44,590
and it still does
126
126
00:05:44,590 --> 00:05:46,310
and so, that's critical.
127
127
00:05:46,310 --> 00:05:48,930
We have to use a stable sort algorithm.
128
128
00:05:48,930 --> 00:05:52,630
So, now we're gonna sort based on the hundreds position.
129
129
00:05:52,630 --> 00:05:55,317
So, we sorted here based on the tens position,
130
130
00:05:55,317 --> 00:05:58,320
and now we're gonna sort based on the hundreds position
131
131
00:05:58,320 --> 00:06:00,520
and so, when we do that,
132
132
00:06:00,520 --> 00:06:04,810
1330 comes first 'cause it's got three in that position,
133
133
00:06:04,810 --> 00:06:07,400
followed by 4586,
134
134
00:06:07,400 --> 00:06:10,020
1594, 4725, 5729
135
135
00:06:12,510 --> 00:06:15,140
and 8792 and once again,
136
136
00:06:15,140 --> 00:06:17,771
we have to use a stable sort algorithm
137
137
00:06:17,771 --> 00:06:19,560
and so, you'll notice here
138
138
00:06:19,560 --> 00:06:23,440
that we have two fives in the hundreds position
139
139
00:06:23,440 --> 00:06:27,460
and 4586 still comes before 1594
140
140
00:06:27,460 --> 00:06:30,150
as it did in the top array
141
141
00:06:30,150 --> 00:06:33,820
and we have three values with seven in the hundreds position
142
142
00:06:33,820 --> 00:06:37,330
and the relative positions of those three values
143
143
00:06:37,330 --> 00:06:38,920
have been preserved.
144
144
00:06:38,920 --> 00:06:43,570
So, 4725 still comes before 5729
145
145
00:06:43,570 --> 00:06:46,480
and that still comes before 8792
146
146
00:06:46,480 --> 00:06:48,960
and then the final step is we sort based
147
147
00:06:48,960 --> 00:06:50,670
on the thousands position.
148
148
00:06:50,670 --> 00:06:55,670
And so, 1330 has a one, 1594 has a one,
149
149
00:06:56,090 --> 00:07:00,140
4586 and 4725 both have fours
150
150
00:07:00,140 --> 00:07:03,590
and then there's 5729 and 8792
151
151
00:07:03,590 --> 00:07:06,400
and we have sorted our array.
152
152
00:07:06,400 --> 00:07:08,850
Now once again, this had to be a stable sort
153
153
00:07:08,850 --> 00:07:13,096
and so, 1330 had to come before 1594,
154
154
00:07:13,096 --> 00:07:16,640
4586 had to come before 4725.
155
155
00:07:16,640 --> 00:07:18,940
Now maybe you can understand at this point
156
156
00:07:18,940 --> 00:07:21,060
why the sort has to be stable.
157
157
00:07:21,060 --> 00:07:24,540
If it wasn't stable, this radix sort wouldn't work
158
158
00:07:24,540 --> 00:07:27,320
because at each stage, for example,
159
159
00:07:27,320 --> 00:07:31,840
in this stage, if 1594 could cross 1330,
160
160
00:07:32,700 --> 00:07:34,860
we wouldn't get the correct sorted order.
161
161
00:07:34,860 --> 00:07:39,180
If 4725 could end up before 4586,
162
162
00:07:39,180 --> 00:07:41,530
we would not get the correct sorted order
163
163
00:07:41,530 --> 00:07:42,790
and it's kind of interesting
164
164
00:07:42,790 --> 00:07:45,310
because when we first start doing a sort,
165
165
00:07:45,310 --> 00:07:47,580
it just means, you start to wonder,
166
166
00:07:47,580 --> 00:07:48,930
how is this ever going to work?
167
167
00:07:48,930 --> 00:07:51,010
Because when we sort on the one's position,
168
168
00:07:51,010 --> 00:07:53,400
we got, we go back there,
169
169
00:07:53,400 --> 00:07:54,530
we got a certain array
170
170
00:07:54,530 --> 00:07:56,600
but then when we sorted on the 10 position,
171
171
00:07:56,600 --> 00:07:58,970
all the elements are changing positions again
172
172
00:07:58,970 --> 00:08:01,090
and you're wondering how is this ever going to end up
173
173
00:08:01,090 --> 00:08:02,310
in sorted order?
174
174
00:08:02,310 --> 00:08:04,610
But the key is we're sorting
175
175
00:08:04,610 --> 00:08:06,660
from the least significant digit,
176
176
00:08:06,660 --> 00:08:08,010
the one with the least weight
177
177
00:08:08,010 --> 00:08:09,460
when it comes to the value
178
178
00:08:09,460 --> 00:08:11,080
to the most significant digit
179
179
00:08:11,080 --> 00:08:14,500
and at every phase, it's a stable sort.
180
180
00:08:14,500 --> 00:08:15,951
And so, because of that,
181
181
00:08:15,951 --> 00:08:19,120
as we move from right to left,
182
182
00:08:19,120 --> 00:08:22,410
as we go from the least significant digit
183
183
00:08:22,410 --> 00:08:24,290
to the most significant digit,
184
184
00:08:24,290 --> 00:08:27,970
you'll see that a sorted order is starting to emerge
185
185
00:08:27,970 --> 00:08:29,580
and then on the final step,
186
186
00:08:29,580 --> 00:08:31,610
because it's a stable sort,
187
187
00:08:31,610 --> 00:08:35,030
all of the previous sorts that we've done still count,
188
188
00:08:35,030 --> 00:08:37,870
they don't get undone at each stage,
189
189
00:08:37,870 --> 00:08:40,680
it kind of looks like we're not sorting at each stage
190
190
00:08:40,680 --> 00:08:43,840
but we are, the data is gradually becoming more
191
191
00:08:43,840 --> 00:08:45,230
and more sorted
192
192
00:08:45,230 --> 00:08:47,810
and the fact that we're using a stable sort algorithm means
193
193
00:08:47,810 --> 00:08:49,640
that on that final sort,
194
194
00:08:49,640 --> 00:08:52,470
when we're sorting on the most significant digit,
195
195
00:08:52,470 --> 00:08:56,520
values that have the same most significant digit
196
196
00:08:56,520 --> 00:08:58,930
will not switch relative positions
197
197
00:08:58,930 --> 00:09:02,075
and so, the values that have a higher hundreds,
198
198
00:09:02,075 --> 00:09:04,930
a higher digit in the hundreds will remain
199
199
00:09:04,930 --> 00:09:08,130
after values that have a lower digit in the hundreds
200
200
00:09:08,130 --> 00:09:11,600
and the same will go when we do the sort with the tens.
201
201
00:09:11,600 --> 00:09:13,900
When we do the sort based on the tens,
202
202
00:09:13,900 --> 00:09:16,120
values that have a duplicate tens value,
203
203
00:09:16,120 --> 00:09:18,900
well, the value with the higher digit in the one's position
204
204
00:09:18,900 --> 00:09:21,250
will come after the value with the lower digit
205
205
00:09:21,250 --> 00:09:22,290
in the one's position
206
206
00:09:22,290 --> 00:09:23,220
and so, you can see
207
207
00:09:23,220 --> 00:09:25,560
that because we're using a stable sort algorithm
208
208
00:09:25,560 --> 00:09:26,880
at each phase
209
209
00:09:26,880 --> 00:09:29,800
and we're going from the least significant digit
210
210
00:09:29,800 --> 00:09:31,170
to the most significant digit,
211
211
00:09:31,170 --> 00:09:33,530
or letter depending on what we're sorting,
212
212
00:09:33,530 --> 00:09:37,840
the results of previous sorting phases are preserved
213
213
00:09:37,840 --> 00:09:40,820
and that's why radix sort works.
214
214
00:09:40,820 --> 00:09:42,990
So, counting sort is often used
215
215
00:09:42,990 --> 00:09:44,990
as the sort algorithm for radix sort.
216
216
00:09:44,990 --> 00:09:46,500
In the last video I said
217
217
00:09:46,500 --> 00:09:49,600
that the implementation I showed you was not stable,
218
218
00:09:49,600 --> 00:09:51,620
so when we implement radix sort,
219
219
00:09:51,620 --> 00:09:54,380
you'll have a chance to see the stable version
220
220
00:09:54,380 --> 00:09:55,610
of counting sort.
221
221
00:09:55,610 --> 00:09:58,440
Now, once again, because radix sort makes assumptions
222
222
00:09:58,440 --> 00:10:01,200
about the data, we can achieve linear time.
223
223
00:10:01,200 --> 00:10:04,160
Even so, this often runs slower
224
224
00:10:04,160 --> 00:10:06,360
than O to the nlog of n
225
225
00:10:06,360 --> 00:10:08,490
because of the overhead involved
226
226
00:10:08,490 --> 00:10:09,590
and by overhead,
227
227
00:10:09,590 --> 00:10:14,571
I mean that you have to isolate each individual digit
228
228
00:10:14,571 --> 00:10:17,550
or letter at each phase
229
229
00:10:17,550 --> 00:10:19,020
and so, there's overhead involved
230
230
00:10:19,020 --> 00:10:21,470
just to figure out what value you're supposed to be sorting
231
231
00:10:21,470 --> 00:10:23,010
at each phase and because of that,
232
232
00:10:23,010 --> 00:10:24,780
even though it can achieve linear time,
233
233
00:10:24,780 --> 00:10:27,010
it often runs a little bit slower than that.
234
234
00:10:27,010 --> 00:10:28,820
Now whether or not it's in place
235
235
00:10:28,820 --> 00:10:30,980
will depend on which sort algorithm you use
236
236
00:10:30,980 --> 00:10:33,830
because as we've seen, some sort algorithms are in place
237
237
00:10:33,830 --> 00:10:35,100
and some aren't.
238
238
00:10:35,100 --> 00:10:37,580
So, radix sort can be in place
239
239
00:10:37,580 --> 00:10:40,210
or it might not be place.
240
240
00:10:40,210 --> 00:10:43,520
It's gonna depend on what sort algorithm you choose
241
241
00:10:43,520 --> 00:10:45,340
to do the sorting at each phase.
242
242
00:10:45,340 --> 00:10:47,900
It's a stable algorithm as we've seen
243
243
00:10:47,900 --> 00:10:50,250
because the sort algorithm we use at each phase
244
244
00:10:50,250 --> 00:10:53,640
has to be stable, radix sort turns out to be stable
245
245
00:10:53,640 --> 00:10:57,000
and in fact, that's key to making it work
246
246
00:10:57,000 --> 00:10:59,300
is that the sorts in earlier phases
247
247
00:10:59,300 --> 00:11:02,610
aren't being undone as we sort based
248
248
00:11:02,610 --> 00:11:05,790
on the more significant positions.
249
249
00:11:05,790 --> 00:11:07,230
So, that's it for the theory.
250
250
00:11:07,230 --> 00:11:09,320
Let's move onto implementation.
251
251
00:11:09,320 --> 00:11:10,870
I'll see you in the next video.
21085
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.