Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
0
00:00:00,000 --> 00:00:02,994
>> [MUSIC PLAYING]
1
00:00:02,994 --> 00:00:05,426
2
00:00:05,426 --> 00:00:08,550
DOUG LLOYD: So we've been inching closer and closer that holy grail of data
3
00:00:08,550 --> 00:00:13,050
structures, one that we can insert into, delete from, and look up
4
00:00:13,050 --> 00:00:15,440
in constant time.
5
00:00:15,440 --> 00:00:16,270
Right.
6
00:00:16,270 --> 00:00:17,280
That's sort of the goal.
7
00:00:17,280 --> 00:00:19,720
We want to be able to do things very, very quickly.
8
00:00:19,720 --> 00:00:22,580
>> Have we found it here when we're talking about tries?
9
00:00:22,580 --> 00:00:23,670
Well, let's take a look.
10
00:00:23,670 --> 00:00:25,628
So we've seen several different data structures
11
00:00:25,628 --> 00:00:28,680
that handle the mapping of so-called key-value pairs,
12
00:00:28,680 --> 00:00:32,080
mapping some piece of data to some other piece of data
13
00:00:32,080 --> 00:00:36,020
so we know where to find information in the structure.
14
00:00:36,020 --> 00:00:40,060
>> So for array, for example, the key is the element index or array
15
00:00:40,060 --> 00:00:42,600
location 0 or array 1 and so on.
16
00:00:42,600 --> 00:00:46,140
And the value is the data that exists at that location.
17
00:00:46,140 --> 00:00:48,550
So what is stored in array 0?
18
00:00:48,550 --> 00:00:54,290
What is stored in array 1 versus just 0 and 1, which would be the keys.
19
00:00:54,290 --> 00:00:56,360
>> With a hash table it's sort of the same idea.
20
00:00:56,360 --> 00:01:00,690
With a hash table, we have this hash function that generates hash codes.
21
00:01:00,690 --> 00:01:03,670
So the key is the hash code of the data.
22
00:01:03,670 --> 00:01:06,530
And the value, particularly we talked about chaining
23
00:01:06,530 --> 00:01:10,590
in the video on hash tables, is that linked list of data
24
00:01:10,590 --> 00:01:12,550
that hashes to that hashcode.
25
00:01:12,550 --> 00:01:14,050
Right.
26
00:01:14,050 --> 00:01:16,050
What about another approach this method, though?
27
00:01:16,050 --> 00:01:21,000
What about a method where the key is guaranteed to be unique,
28
00:01:21,000 --> 00:01:25,410
unlike a hash table, where we could end up with two pieces of data
29
00:01:25,410 --> 00:01:27,200
having the same hashcode.
30
00:01:27,200 --> 00:01:30,020
And then we have to deal with that by either probing or more
31
00:01:30,020 --> 00:01:33,340
preferably chaining to fix that problem.
32
00:01:33,340 --> 00:01:37,520
>> So now we can guarantee that our key will be unique.
33
00:01:37,520 --> 00:01:39,690
And what if our value was just something as easy
34
00:01:39,690 --> 00:01:44,080
as true and false that tells us whether or not that piece of information
35
00:01:44,080 --> 00:01:45,610
exists in the structure?
36
00:01:45,610 --> 00:01:48,180
A Boolean could be as simple as a bit.
37
00:01:48,180 --> 00:01:52,660
Realistically it's probably a byte more likely than a bit.
38
00:01:52,660 --> 00:01:55,410
But that's a lot smaller than storing maybe a 50-character string,
39
00:01:55,410 --> 00:01:57,360
for example.
40
00:01:57,360 --> 00:02:02,210
>> So tries, similar to hash tables, which combine arrays and linked list,
41
00:02:02,210 --> 00:02:05,790
tries combine arrays, structures, and pointers
42
00:02:05,790 --> 00:02:08,509
together to store data in an interesting way that's
43
00:02:08,509 --> 00:02:11,550
pretty different from anything we've seen so far.
44
00:02:11,550 --> 00:02:16,750
Now we use the data as a roadmap to navigate this data structure.
45
00:02:16,750 --> 00:02:18,710
And if we can follow the roadmap, if we can
46
00:02:18,710 --> 00:02:22,390
follow the data from beginning to end, we'll
47
00:02:22,390 --> 00:02:24,945
know whether that data exist in the trie.
48
00:02:24,945 --> 00:02:28,310
>> And if we can't follow the map from meaning to end at all,
49
00:02:28,310 --> 00:02:30,600
the data can't exist.
50
00:02:30,600 --> 00:02:32,890
Again, the keys here are guaranteed to be unique.
51
00:02:32,890 --> 00:02:36,020
And so unlike a hash table, we'll never have to deal with collisions here.
52
00:02:36,020 --> 00:02:39,090
And no two pieces of data have exactly the same roadmap
53
00:02:39,090 --> 00:02:40,530
unless that data is identical.
54
00:02:40,530 --> 00:02:44,580
>> If we insert John, then we search for John.
55
00:02:44,580 --> 00:02:47,430
That's two identical pieces of data, right, we're looking through.
56
00:02:47,430 --> 00:02:49,880
But otherwise, any two pieces of data are
57
00:02:49,880 --> 00:02:52,750
guaranteed to have unique roadmaps through this data structure.
58
00:02:52,750 --> 00:02:56,210
And we're going to take a look at a visual of this in just a moment.
59
00:02:56,210 --> 00:02:58,810
>> We'll do this by trying to create a new data structure,
60
00:02:58,810 --> 00:03:00,564
mapping the following key value pairs.
61
00:03:00,564 --> 00:03:03,480
In this case, we're not going to use something as simple as a Boolean.
62
00:03:03,480 --> 00:03:06,200
We actually will store the string.
63
00:03:06,200 --> 00:03:08,690
And that string is going to be the name of a university.
64
00:03:08,690 --> 00:03:12,140
>> And the key is going to be the year when that university was founded.
65
00:03:12,140 --> 00:03:15,380
All years for universities are going to be four digits.
66
00:03:15,380 --> 00:03:19,840
And so we'll use those four digits to navigate through this data structure.
67
00:03:19,840 --> 00:03:22,270
And we'll see, again, how we do that in just a second.
68
00:03:22,270 --> 00:03:25,110
>> At the end of the path, we'll see the name
69
00:03:25,110 --> 00:03:30,250
of the university that corresponds to that key, those four digits.
70
00:03:30,250 --> 00:03:34,390
The basic idea behind a trie is we have a central route.
71
00:03:34,390 --> 00:03:35,640
So think about it like a tree.
72
00:03:35,640 --> 00:03:39,211
And this is similar in spelling and in concept to a tree.
73
00:03:39,211 --> 00:03:41,460
Generally when we think about trees in the real world,
74
00:03:41,460 --> 00:03:44,090
they have a root that's in the ground and they grow upward
75
00:03:44,090 --> 00:03:46,830
and they have branches and they have leaves.
76
00:03:46,830 --> 00:03:49,450
And basically the idea of a trie is exactly the same,
77
00:03:49,450 --> 00:03:51,755
except that root is anchored somewhere in the sky.
78
00:03:51,755 --> 00:03:53,130
And the leaves are at the bottom.
79
00:03:53,130 --> 00:03:55,750
>> So it's kind of like taking a tree and just flipping it upside down.
80
00:03:55,750 --> 00:03:56,880
But there are still branches.
81
00:03:56,880 --> 00:03:59,463
And those would be our pathways, those will be our connections
82
00:03:59,463 --> 00:04:02,220
from the root to the leaves.
83
00:04:02,220 --> 00:04:04,200
In this case, those paths, those branches
84
00:04:04,200 --> 00:04:08,490
are labeled with digits that tell us which way to go from where we are.
85
00:04:08,490 --> 00:04:11,800
>> If we see a 0, we go down this branch, if we see a 1, we go down this branch,
86
00:04:11,800 --> 00:04:12,900
and so and so on.
87
00:04:12,900 --> 00:04:14,060
Well, what does this mean?
88
00:04:14,060 --> 00:04:16,519
Well, that means that at every junction point
89
00:04:16,519 --> 00:04:19,260
and every node in the middle and every branch,
90
00:04:19,260 --> 00:04:23,020
there are 10 possible places that we can go.
91
00:04:23,020 --> 00:04:27,690
So there are 10 pointers from every location.
92
00:04:27,690 --> 00:04:30,610
>> And this is where tries can get a little bit intimidating for somebody
93
00:04:30,610 --> 00:04:34,460
who's doesn't have a lot of experience in computer science before.
94
00:04:34,460 --> 00:04:35,960
But tries are really pretty awesome.
95
00:04:35,960 --> 00:04:37,793
And if you have the chance to work with them
96
00:04:37,793 --> 00:04:40,420
and you're willing to dig-in and experiment with them,
97
00:04:40,420 --> 00:04:44,234
they're really quite interesting data structures to work with.
98
00:04:44,234 --> 00:04:46,900
If we want to insert an element into the trie, all we need to do
99
00:04:46,900 --> 00:04:51,360
is build the correct path from the root to the leaf.
100
00:04:51,360 --> 00:04:55,390
Here's what every step along the way might look like.
101
00:04:55,390 --> 00:04:59,660
We're going to define a new data structure for a new node called a trie.
102
00:04:59,660 --> 00:05:02,560
>> And inside of that data structure there are two pieces.
103
00:05:02,560 --> 00:05:05,460
We're going to store the name of a university.
104
00:05:05,460 --> 00:05:09,410
And we're going to store an array of pointers
105
00:05:09,410 --> 00:05:12,190
to other nodes of the same type.
106
00:05:12,190 --> 00:05:14,780
So, again, this is that sort of concept of everywhere
107
00:05:14,780 --> 00:05:18,567
we are, we at 10 possible places we can go.
108
00:05:18,567 --> 00:05:20,150
If we see a 0, we go down this branch.
109
00:05:20,150 --> 00:05:22,690
If we see a 1, this branch, and so on and so on and so on.
110
00:05:22,690 --> 00:05:25,160
If we say 9, we go down this branch.
111
00:05:25,160 --> 00:05:28,220
So at every junction point, we can go 10 possible places.
112
00:05:28,220 --> 00:05:35,740
So every node has to contain 10 pointers to other nodes, to 10 other nodes.
113
00:05:35,740 --> 00:05:39,810
>> And the data we're storing is just the name of the university.
114
00:05:39,810 --> 00:05:41,060
So let's build a trie.
115
00:05:41,060 --> 00:05:44,860
Let's insert a couple of items into our trie.
116
00:05:44,860 --> 00:05:46,740
So at the very top, this is our root node.
117
00:05:46,740 --> 00:05:49,740
This is probably going to be something you're going to globally declare.
118
00:05:49,740 --> 00:05:53,450
And you're going to globally maintain a pointer to this node always.
119
00:05:53,450 --> 00:05:55,360
>> You're going to say, root equals, and you're
120
00:05:55,360 --> 00:05:57,580
going to malloc yourself trie node.
121
00:05:57,580 --> 00:05:59,850
And you're never going to touch root again.
122
00:05:59,850 --> 00:06:02,300
Every time you want to start navigating through,
123
00:06:02,300 --> 00:06:05,802
you set another pointer equal to root, such as trav,
124
00:06:05,802 --> 00:06:07,760
which is the example I use in many of my videos
125
00:06:07,760 --> 00:06:11,090
here on stacks and queues and link lists and so on.
126
00:06:11,090 --> 00:06:13,320
>> You set another pointer called trav for traversal.
127
00:06:13,320 --> 00:06:15,890
And you use trav to navigate through the data structure.
128
00:06:15,890 --> 00:06:17,500
So let's see how this might look.
129
00:06:17,500 --> 00:06:19,880
So right now, what does a node look like?
130
00:06:19,880 --> 00:06:22,920
Well, just as our data structure declaration indicated,
131
00:06:22,920 --> 00:06:26,906
we have a string, which in this case is empty.
132
00:06:26,906 --> 00:06:27,780
There's nothing here.
133
00:06:27,780 --> 00:06:29,550
>> And an array of 10 pointers.
134
00:06:29,550 --> 00:06:31,790
And right now, we only have 1 node in this trie.
135
00:06:31,790 --> 00:06:33,110
There's nothing else in it.
136
00:06:33,110 --> 00:06:36,020
So all 10 of those pointers point to null.
137
00:06:36,020 --> 00:06:38,090
That's what the red indicates.
138
00:06:38,090 --> 00:06:39,500
>> Let's insert the string Harvard.
139
00:06:39,500 --> 00:06:41,999
Let's insert the University of Harvard into this trie, which
140
00:06:41,999 --> 00:06:43,940
was founded in the year 1636.
141
00:06:43,940 --> 00:06:48,220
We want to use the key, 1636, to tell us where we're
142
00:06:48,220 --> 00:06:50,140
going to store Harvard in the trie.
143
00:06:50,140 --> 00:06:51,470
Now, how might we do that?
144
00:06:51,470 --> 00:06:52,886
>> It might look something like this.
145
00:06:52,886 --> 00:06:54,160
We start at the root.
146
00:06:54,160 --> 00:06:56,920
And we have these 10 places we can go.
147
00:06:56,920 --> 00:06:59,900
The root is just like any other node of the trie.
148
00:06:59,900 --> 00:07:02,850
There are 10 places we can go from here.
149
00:07:02,850 --> 00:07:07,215
>> Where do we probably want to go if the key is 1636?
150
00:07:07,215 --> 00:07:08,340
There's really two options.
151
00:07:08,340 --> 00:07:08,450
Right.
152
00:07:08,450 --> 00:07:10,825
We can build the key from right to left and start with 6.
153
00:07:10,825 --> 00:07:14,000
Or we could build the key from left to right and start with 1.
154
00:07:14,000 --> 00:07:16,140
>> It's probably more intuitive as a human being
155
00:07:16,140 --> 00:07:18,110
to understand we'll just go left to right.
156
00:07:18,110 --> 00:07:21,140
And so if I want to insert Harvard into this trie,
157
00:07:21,140 --> 00:07:23,560
I probably want to start by starting at the root,
158
00:07:23,560 --> 00:07:25,720
looking at my 10 options in front of me, and saying
159
00:07:25,720 --> 00:07:28,700
I want to go down the 1 path.
160
00:07:28,700 --> 00:07:29,700
OK.
161
00:07:29,700 --> 00:07:31,810
>> Now, 1 path is currently null.
162
00:07:31,810 --> 00:07:35,920
So if I want to proceed down that path to insert this element into the trie,
163
00:07:35,920 --> 00:07:42,040
I have to malloc a new node, have 1 point there, and then I'm good to go.
164
00:07:42,040 --> 00:07:46,460
>> So I basically am at a point where I'm standing
165
00:07:46,460 --> 00:07:50,270
at the root of the tree or the trie and there are 10 branches.
166
00:07:50,270 --> 00:07:52,260
But each branch has a gate in front of it.
167
00:07:52,260 --> 00:07:53,060
Right.
168
00:07:53,060 --> 00:07:54,850
Because there's nothing else there's.
169
00:07:54,850 --> 00:07:56,522
No safe passage.
170
00:07:56,522 --> 00:07:58,980
That means that there's nothing down any of those branches.
171
00:07:58,980 --> 00:08:02,532
If I want to start building something, I want to remove the gate.
172
00:08:02,532 --> 00:08:04,490
I want to remove the gate in front of number 1.
173
00:08:04,490 --> 00:08:05,698
And I want to walk down that.
174
00:08:05,698 --> 00:08:08,060
And I want to build another place for me to go.
175
00:08:08,060 --> 00:08:09,470
>> And that's what I've done here.
176
00:08:09,470 --> 00:08:11,430
So 1 no longer points to null.
177
00:08:11,430 --> 00:08:13,830
I've said it's safe to go down here now.
178
00:08:13,830 --> 00:08:15,789
I built another node.
179
00:08:15,789 --> 00:08:18,330
And when I get to that node, I have another decision to make.
180
00:08:18,330 --> 00:08:20,890
Where am I going to go from here?
181
00:08:20,890 --> 00:08:22,700
>> Well, I've already gone down the 1.
182
00:08:22,700 --> 00:08:24,470
So now I probably want to go down the 6.
183
00:08:24,470 --> 00:08:24,970
Right.
184
00:08:24,970 --> 00:08:27,100
Again, I have 10 locations I can choose.
185
00:08:27,100 --> 00:08:30,060
So let's now go down number 6.
186
00:08:30,060 --> 00:08:32,280
So I clear the gate in front of number 6.
187
00:08:32,280 --> 00:08:33,250
And I walk down there.
188
00:08:33,250 --> 00:08:34,580
And I build another node.
189
00:08:34,580 --> 00:08:37,630
And I've reached another junction point.
190
00:08:37,630 --> 00:08:40,289
>> Again, I have 10 choices for where I can go.
191
00:08:40,289 --> 00:08:42,799
I've moved from 1 to 6.
192
00:08:42,799 --> 00:08:44,215
So now I probably want to go to 3.
193
00:08:44,215 --> 00:08:45,381
3, there's nowhere I can go.
194
00:08:45,381 --> 00:08:48,980
So I have to clear the way and build myself a new space.
195
00:08:48,980 --> 00:08:50,870
And then from 3, where do I want to go?
196
00:08:50,870 --> 00:08:52,450
I want to go down 6.
197
00:08:52,450 --> 00:08:54,770
>> And, again, I had to clear the way to do it.
198
00:08:54,770 --> 00:08:59,179
So now I've used my key to insert create nodes and start to build this trie.
199
00:08:59,179 --> 00:09:00,220
I've started at the root.
200
00:09:00,220 --> 00:09:03,666
I've gone down 1636.
201
00:09:03,666 --> 00:09:05,540
And now I'm at the bottom there on that node.
202
00:09:05,540 --> 00:09:06,610
And you might be able to see it on your screen.
203
00:09:06,610 --> 00:09:07,735
>> It's highlighted in yellow.
204
00:09:07,735 --> 00:09:10,020
That's where I currently am.
205
00:09:10,020 --> 00:09:11,300
My key is done.
206
00:09:11,300 --> 00:09:13,030
I've exhausted every position in my key.
207
00:09:13,030 --> 00:09:15,040
So I can't go any further.
208
00:09:15,040 --> 00:09:17,720
So at this point, all I really need to do is say, OK.
209
00:09:17,720 --> 00:09:18,990
It's kind of like looking down at the ground,
210
00:09:18,990 --> 00:09:21,115
if you're envisioning yourself as this sort of path
211
00:09:21,115 --> 00:09:22,350
with different connections.
212
00:09:22,350 --> 00:09:25,800
Sort of looking down and sort of spray painting Harvard on the ground.
213
00:09:25,800 --> 00:09:26,800
That's the name of this.
214
00:09:26,800 --> 00:09:28,300
Know that's what's at this location.
215
00:09:28,300 --> 00:09:31,870
If we start at the root and we go down 1 and then 6 and then 3 and then 6,
216
00:09:31,870 --> 00:09:32,780
where are we?
217
00:09:32,780 --> 00:09:35,640
Well if we look down and we see Harvard, then
218
00:09:35,640 --> 00:09:38,960
we know that Harvard was founded in 1636 based on the way
219
00:09:38,960 --> 00:09:41,400
we're implementing this data structure.
220
00:09:41,400 --> 00:09:43,177
So that was hopefully straightforward.
221
00:09:43,177 --> 00:09:44,760
We're going to do two more insertions.
222
00:09:44,760 --> 00:09:50,060
And hopefully it'll help to see this done twice more.
223
00:09:50,060 --> 00:09:52,210
>> Now, let's insert another university.
224
00:09:52,210 --> 00:09:54,630
Let's insert Yale into this trie.
225
00:09:54,630 --> 00:09:57,037
Yale was founded in 1701.
226
00:09:57,037 --> 00:09:58,870
So we'll start at the root, as we always do.
227
00:09:58,870 --> 00:09:59,890
And we set a traversal pointer.
228
00:09:59,890 --> 00:10:01,624
We're going to use that to move through.
229
00:10:01,624 --> 00:10:03,790
The first thing we want to do is go down the 1 path.
230
00:10:03,790 --> 00:10:05,830
That's the first digit of our key.
231
00:10:05,830 --> 00:10:08,420
Fortunately, though, we don't have to do any work this time.
232
00:10:08,420 --> 00:10:09,919
The 1 path has already been cleared.
233
00:10:09,919 --> 00:10:13,520
I cleared it previously when I was inserting Harvard at 1636.
234
00:10:13,520 --> 00:10:18,090
So I can safely move down 1 and just go there.
235
00:10:18,090 --> 00:10:20,150
If can move down the 1.
236
00:10:20,150 --> 00:10:22,930
>> Now, though, I want to go to 7.
237
00:10:22,930 --> 00:10:24,280
I cleared the way at 6.
238
00:10:24,280 --> 00:10:27,050
I know I can safely proceed down the 6 path.
239
00:10:27,050 --> 00:10:29,220
But I need to proceed on the 7 path.
240
00:10:29,220 --> 00:10:30,580
So what do I need to do?
241
00:10:30,580 --> 00:10:35,070
Well, just like before, I just need to clear the gate, get out of the way,
242
00:10:35,070 --> 00:10:38,740
and build a new node from the 7 path.
243
00:10:38,740 --> 00:10:40,250
Just like this.
244
00:10:40,250 --> 00:10:42,930
>> So now I've moved 1 and then 7.
245
00:10:42,930 --> 00:10:45,550
And now notice, I'm sort of on this new subbranch.
246
00:10:45,550 --> 00:10:46,050
Right.
247
00:10:46,050 --> 00:10:49,260
Everything else from 16 on, I don't care about.
248
00:10:49,260 --> 00:10:50,720
I'm not doing 16 anything.
249
00:10:50,720 --> 00:10:51,750
I'm doing 17 stuff.
250
00:10:51,750 --> 00:10:58,380
>> So now from 17 on, I have to sort of blaze new trails here.
251
00:10:58,380 --> 00:11:00,462
The next digit my key is 0.
252
00:11:00,462 --> 00:11:01,670
I clearly can't get anywhere.
253
00:11:01,670 --> 00:11:02,628
I just built this node.
254
00:11:02,628 --> 00:11:04,550
So I know there's no paths forward from here.
255
00:11:04,550 --> 00:11:06,370
So I have to make one myself.
256
00:11:06,370 --> 00:11:09,360
>> So I malloc a new node and have 0 point there.
257
00:11:09,360 --> 00:11:12,770
And then one more time, I malloc a new node and have one point there.
258
00:11:12,770 --> 00:11:15,870
Again, I've exhausted my key, 1701.
259
00:11:15,870 --> 00:11:18,472
So I look down and I spray paint Yale.
260
00:11:18,472 --> 00:11:19,680
That's the name of this node.
261
00:11:19,680 --> 00:11:24,660
>> And so now if I ever need to see if Yale is in this trie, I start at the root,
262
00:11:24,660 --> 00:11:27,060
I go down 1701, and look down.
263
00:11:27,060 --> 00:11:30,030
And if I see Yale spray painted onto the ground, then
264
00:11:30,030 --> 00:11:32,200
I know Yale exists in this trie.
265
00:11:32,200 --> 00:11:32,950
Let's do one more.
266
00:11:32,950 --> 00:11:36,430
Let's insert Dartmouth into this trie, which was founded in 1769.
267
00:11:36,430 --> 00:11:37,750
>> Start at the root again.
268
00:11:37,750 --> 00:11:39,445
My first digit of my key is 1.
269
00:11:39,445 --> 00:11:40,820
I can safely move down that path.
270
00:11:40,820 --> 00:11:42,400
That already exists.
271
00:11:42,400 --> 00:11:44,040
The next digit of my key is 7.
272
00:11:44,040 --> 00:11:45,890
I can safely move down that path.
273
00:11:45,890 --> 00:11:47,540
It exists as well.
274
00:11:47,540 --> 00:11:49,000
>> My next is 6.
275
00:11:49,000 --> 00:11:52,860
From here, from where I currently am in yellow there in that middle node,
276
00:11:52,860 --> 00:11:56,060
6 is currently locked off.
277
00:11:56,060 --> 00:11:58,830
If I want to go down that path, I have to build it myself.
278
00:11:58,830 --> 00:12:02,250
So I'll malloc a new node and have 6 point there.
279
00:12:02,250 --> 00:12:04,250
And then, again, I'm blazing new trails here.
280
00:12:04,250 --> 00:12:10,750
>> So I malloc a new node so that from that node-- path number 9-- and then now
281
00:12:10,750 --> 00:12:13,584
if I travel 1769, and I look down.
282
00:12:13,584 --> 00:12:15,500
There's nothing currently spray painted there.
283
00:12:15,500 --> 00:12:16,930
I can write Dartmouth.
284
00:12:16,930 --> 00:12:20,710
And I've inserted Dartmouth into the trie.
285
00:12:20,710 --> 00:12:23,450
>> So that's inserting things into the trie.
286
00:12:23,450 --> 00:12:25,384
Now we want to search for things.
287
00:12:25,384 --> 00:12:27,050
How do we search for things in the trie?
288
00:12:27,050 --> 00:12:29,170
Well, it's pretty much the same idea.
289
00:12:29,170 --> 00:12:33,620
Now we just use the digits of the key to see if we can navigate from the root
290
00:12:33,620 --> 00:12:37,170
to where we want to go in the trie.
291
00:12:37,170 --> 00:12:41,620
>> If we hit a dead end at any point, then we know that that element can't exists
292
00:12:41,620 --> 00:12:44,500
or else that path would have already been cleared.
293
00:12:44,500 --> 00:12:45,930
If we make it all the way to the end, all we need to do
294
00:12:45,930 --> 00:12:48,471
is look down and see if that's the element we're looking for.
295
00:12:48,471 --> 00:12:49,335
If it is, success.
296
00:12:49,335 --> 00:12:52,610
If it's not, fail.
297
00:12:52,610 --> 00:12:54,940
>> So let's search for Harvard in this trie.
298
00:12:54,940 --> 00:12:56,020
We start at the root.
299
00:12:56,020 --> 00:12:58,228
And, again, we're going to create a traversal pointer
300
00:12:58,228 --> 00:12:59,390
to do our moves for us.
301
00:12:59,390 --> 00:13:02,080
From the root we know that the first place we need to go is 1,
302
00:13:02,080 --> 00:13:03,390
can we do that?
303
00:13:03,390 --> 00:13:03,982
Yes, we can.
304
00:13:03,982 --> 00:13:04,690
If safely exists.
305
00:13:04,690 --> 00:13:06,660
We can go there.
306
00:13:06,660 --> 00:13:08,440
>> Now, the next place we need to go is 6.
307
00:13:08,440 --> 00:13:10,557
Does the 6 path exist from here?
308
00:13:10,557 --> 00:13:11,140
Yeah, it does.
309
00:13:11,140 --> 00:13:12,690
We can go down the 6 path.
310
00:13:12,690 --> 00:13:13,905
And we end up here.
311
00:13:13,905 --> 00:13:16,130
>> Can we go down the 3 path from here?
312
00:13:16,130 --> 00:13:18,450
Well, as it turns out, yes, that exists too.
313
00:13:18,450 --> 00:13:20,790
And can we get on the 6 path from here?
314
00:13:20,790 --> 00:13:21,982
Yes, we can.
315
00:13:21,982 --> 00:13:24,002
>> We haven't quite answered the question yet.
316
00:13:24,002 --> 00:13:25,710
There's still one more step, which is now
317
00:13:25,710 --> 00:13:28,520
we need to look down and see if that's actually--
318
00:13:28,520 --> 00:13:32,660
if we're looking for Harvard, is that what we find after we exhaust the key?
319
00:13:32,660 --> 00:13:35,430
In the example we're using here, the years are always four digits.
320
00:13:35,430 --> 00:13:40,280
But you might be using the example where you are storing a dictionary of words.
321
00:13:40,280 --> 00:13:44,060
>> And so instead of having 10 pointers for my location, you might have 26.
322
00:13:44,060 --> 00:13:46,040
One for each letter of the alphabet.
323
00:13:46,040 --> 00:13:50,350
And there are some words like bat, which is a subset of batch, for example.
324
00:13:50,350 --> 00:13:53,511
And so even if you get to the end of the key and you look down,
325
00:13:53,511 --> 00:13:55,260
you might not see what you're looking for.
326
00:13:55,260 --> 00:13:58,500
>> So you always have to traverse the entire path and then
327
00:13:58,500 --> 00:14:01,540
if you were successfully able to traverse the entire path,
328
00:14:01,540 --> 00:14:03,440
look down and do one final confirmation.
329
00:14:03,440 --> 00:14:05,120
Is that what I'm looking for?
330
00:14:05,120 --> 00:14:07,740
Well, I look down after starting at the top and going 1636.
331
00:14:07,740 --> 00:14:08,240
I look down.
332
00:14:08,240 --> 00:14:09,400
I see Harvard.
333
00:14:09,400 --> 00:14:11,689
So, yes, I succeeded.
334
00:14:11,689 --> 00:14:13,980
What if what I'm looking for isn't in the trie, though.
335
00:14:13,980 --> 00:14:17,200
What if I'm looking for Princeton, which was founded in 1746.
336
00:14:17,200 --> 00:14:20,875
And so 1746 becomes my key to navigate through the trie.
337
00:14:20,875 --> 00:14:22,040
Well, I start at the root.
338
00:14:22,040 --> 00:14:24,760
And the first place I want to goes down the 1 path.
339
00:14:24,760 --> 00:14:25,590
Can I do it?
340
00:14:25,590 --> 00:14:26,490
Yes, I can.
341
00:14:26,490 --> 00:14:28,730
>> Can I go down the 7 path from there?
342
00:14:28,730 --> 00:14:29,230
Yeah, I can.
343
00:14:29,230 --> 00:14:30,750
That exists too.
344
00:14:30,750 --> 00:14:32,460
But can I go down the 4 path from here?
345
00:14:32,460 --> 00:14:35,550
That's like asking the question, can I proceed down that little square
346
00:14:35,550 --> 00:14:37,114
that I've highlighted in yellow?
347
00:14:37,114 --> 00:14:38,030
There's nothing there.
348
00:14:38,030 --> 00:14:38,610
Right.
349
00:14:38,610 --> 00:14:41,310
>> There's no way forward down the 4 path.
350
00:14:41,310 --> 00:14:46,480
If Princeton was in this trie, that 4 would have been cleared for us already.
351
00:14:46,480 --> 00:14:49,130
And so at this point we've reached a dead end.
352
00:14:49,130 --> 00:14:50,250
We can't go any further.
353
00:14:50,250 --> 00:14:53,440
And so we can say, definitively, no.
354
00:14:53,440 --> 00:14:56,760
Princeton does not exist in this trie.
355
00:14:56,760 --> 00:14:58,860
>> So what does this all mean?
356
00:14:58,860 --> 00:14:59,360
Right.
357
00:14:59,360 --> 00:15:01,000
There's a lot going on here.
358
00:15:01,000 --> 00:15:02,500
There's pointers all over the place.
359
00:15:02,500 --> 00:15:04,249
And, as you can see just from the diagram,
360
00:15:04,249 --> 00:15:07,010
there's a lot of nodes that are kind of flying around.
361
00:15:07,010 --> 00:15:13,480
But notice every time we wanted to check whether something was in the trie,
362
00:15:13,480 --> 00:15:15,000
we only had to make 4 moves.
363
00:15:15,000 --> 00:15:17,208
>> Every time we wanted to insert something in the trie,
364
00:15:17,208 --> 00:15:20,440
we have to make 4 moves, possibly mallocing some stuff along the way.
365
00:15:20,440 --> 00:15:23,482
But as we saw when we inserted Dartmouth into the trie,
366
00:15:23,482 --> 00:15:25,940
sometimes some of the path might already be cleared for us.
367
00:15:25,940 --> 00:15:30,520
And so as our trie gets bigger and bigger, we have do less work every time
368
00:15:30,520 --> 00:15:32,270
to insert new things because we've already
369
00:15:32,270 --> 00:15:35,746
built a lot of the intermediate branches along the way.
370
00:15:35,746 --> 00:15:38,370
If we only ever have to look at 4 things, 4 is just a constant.
371
00:15:38,370 --> 00:15:41,750
We really are kind of approaching constant time insertion
372
00:15:41,750 --> 00:15:44,501
and constant time lookup.
373
00:15:44,501 --> 00:15:47,500
The tradeoff, of course, being that this trie, as you can probably tell,
374
00:15:47,500 --> 00:15:49,030
is huge.
375
00:15:49,030 --> 00:15:51,040
Each one of these nodes takes up a lot of space.
376
00:15:51,040 --> 00:15:52,090
>> But that's the tradeoff.
377
00:15:52,090 --> 00:15:55,260
If we want really quick insertion, really quick deletion,
378
00:15:55,260 --> 00:15:59,630
and really quick lookup, we have to have a lot of data flying around.
379
00:15:59,630 --> 00:16:03,590
We have to set aside a lot of space and memory for that data structure
380
00:16:03,590 --> 00:16:04,290
to exist.
381
00:16:04,290 --> 00:16:05,415
>> And so that's the tradeoff.
382
00:16:05,415 --> 00:16:07,310
But it looks like we might have found it.
383
00:16:07,310 --> 00:16:09,560
We might have found that holy grail of data structures
384
00:16:09,560 --> 00:16:12,264
with quick insertion, deletion, and lookup.
385
00:16:12,264 --> 00:16:14,430
And maybe this will be an appropriate data structure
386
00:16:14,430 --> 00:16:18,890
to use for whatever information we're trying to store.
387
00:16:18,890 --> 00:16:21,860
I'm Doug Lloyd, this is cs50.
388
00:16:21,860 --> 00:16:23,433
30602
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.