0
00:00:00,000 --> 00:01:18,380
[MUSIC PLAYING]
1
00:01:18,380 --> 00:01:20,000
SPEAKER 1: All right.
2
00:01:20,000 --> 00:01:21,830
This is CS50.
3
00:01:21,830 --> 00:01:24,740
And this is already week 5, which means this is actually
4
00:01:24,740 --> 00:01:27,240
our last week in C together.
5
00:01:27,240 --> 00:01:31,070
In fact, in just a few days' time, what has looked like this
6
00:01:31,070 --> 00:01:33,490
and much more cryptic than this perhaps, is
7
00:01:33,490 --> 00:01:35,990
going to be distilled into something much simpler next week,
8
00:01:35,990 --> 00:01:38,150
when we transition to a language called Python.
9
00:01:38,150 --> 00:01:42,470
And with Python, we'll still have our conditionals, and loops, and functions,
10
00:01:42,470 --> 00:01:43,173
and so forth.
11
00:01:43,173 --> 00:01:46,340
But a lot of the low-level plumbing that you might have been wrestling with,
12
00:01:46,340 --> 00:01:49,020
struggling with, frustrated by, over the past couple of weeks,
13
00:01:49,020 --> 00:01:51,320
especially, now that we've introduced pointers.
14
00:01:51,320 --> 00:01:54,200
And it feels like you probably have to do everything yourself.
15
00:01:54,200 --> 00:01:57,060
In Python, and in a lot of higher level languages
16
00:01:57,060 --> 00:01:59,450
so to speak-- more modern, more recent languages,
17
00:01:59,450 --> 00:02:02,540
you'll be able to do so much more with just single lines of code.
18
00:02:02,540 --> 00:02:05,540
And indeed, we're going to start leveraging libraries, all the more code
19
00:02:05,540 --> 00:02:06,980
that other people wrote.
20
00:02:06,980 --> 00:02:10,160
Frameworks, which are collections of libraries that other people wrote.
21
00:02:10,160 --> 00:02:13,610
And on top of all that, you will be able to make even better, grander, more
22
00:02:13,610 --> 00:02:17,210
impressive projects, that actually solve problems of particular interest to you.
23
00:02:17,210 --> 00:02:20,100
Particularly, by way of your own final project.
24
00:02:20,100 --> 00:02:23,600
So last week though, in week 4, recall that we focused on memory.
25
00:02:23,600 --> 00:02:26,210
And we've been treating this memory inside of your computer
26
00:02:26,210 --> 00:02:27,560
as a canvas, right?
27
00:02:27,560 --> 00:02:30,770
At the end of the day, it's just zeros and ones, or bytes, really.
28
00:02:30,770 --> 00:02:33,900
And it's really up to you what you do with those bytes.
29
00:02:33,900 --> 00:02:37,400
And how you interconnect them, how you represent information on them.
30
00:02:37,400 --> 00:02:39,478
And arrays were like one of the simplest ways
31
00:02:39,478 --> 00:02:41,270
we started playing around with that memory.
32
00:02:41,270 --> 00:02:43,160
Just contiguous chunks of memory.
33
00:02:43,160 --> 00:02:44,300
Back to back to back.
34
00:02:44,300 --> 00:02:47,030
But let's consider, for a moment, some of the problems that
35
00:02:47,030 --> 00:02:48,620
pretty quickly arise with arrays.
36
00:02:48,620 --> 00:02:52,190
And then, today focus on what more generally are called data structures.
37
00:02:52,190 --> 00:02:57,110
Using your computer's memory as a much more versatile canvas,
38
00:02:57,110 --> 00:02:59,380
to create even two-dimensional structures.
39
00:02:59,380 --> 00:03:01,130
To represent information, and, ultimately,
40
00:03:01,130 --> 00:03:03,210
to solve more interesting problems.
41
00:03:03,210 --> 00:03:04,790
So here's an array of size 3.
42
00:03:04,790 --> 00:03:06,590
Maybe, the size of 3 integers.
43
00:03:06,590 --> 00:03:08,838
And suppose that this is inside of a program.
44
00:03:08,838 --> 00:03:11,630
And at this point in the story, you've got 3 numbers in it already.
45
00:03:11,630 --> 00:03:13,040
1, 2 and 3.
46
00:03:13,040 --> 00:03:17,077
And suppose, whatever the context, you need to now add a fourth number
47
00:03:17,077 --> 00:03:17,660
to this array.
48
00:03:17,660 --> 00:03:18,950
Like, the number 4.
49
00:03:18,950 --> 00:03:21,967
Well, instinctively, where should the number 4 go?
50
00:03:21,967 --> 00:03:24,050
If this is your computer's memory and we currently
51
00:03:24,050 --> 00:03:25,759
have this array 1, 2, 3, from, what,
52
00:03:25,759 --> 00:03:27,110
left to right,
53
00:03:27,110 --> 00:03:30,340
where should the number 4 just, perhaps, naively go?
54
00:03:30,340 --> 00:03:31,340
Yeah, what do you think?
55
00:03:31,340 --> 00:03:32,420
AUDIENCE: Replace number 1.
56
00:03:32,420 --> 00:03:32,930
SPEAKER 1: Sorry?
57
00:03:32,930 --> 00:03:33,830
AUDIENCE: Replace number 1.
58
00:03:33,830 --> 00:03:34,580
SPEAKER 1: Oh, OK.
59
00:03:34,580 --> 00:03:36,020
So you could replace number 1.
60
00:03:36,020 --> 00:03:37,895
I don't really like that, though, because I'd
61
00:03:37,895 --> 00:03:39,290
like to keep number 1 around.
62
00:03:39,290 --> 00:03:40,580
But that's an option.
63
00:03:40,580 --> 00:03:42,330
But I'm losing, of course, information.
64
00:03:42,330 --> 00:03:44,790
So what else could I do if I want to add the number 4?
65
00:03:44,790 --> 00:03:45,290
Over there?
66
00:03:45,290 --> 00:03:46,665
AUDIENCE: On the right side of 3.
67
00:03:46,665 --> 00:03:47,332
SPEAKER 1: Yeah.
68
00:03:47,332 --> 00:03:49,472
So, I mean, it feels like if there's some ordering
69
00:03:49,472 --> 00:03:51,680
to these, which seems kind of a reasonable inference,
70
00:03:51,680 --> 00:03:53,780
that it probably belongs somewhere over here.
71
00:03:53,780 --> 00:03:57,260
But recall last week, as we started poking around a computer's memory,
72
00:03:57,260 --> 00:03:59,130
there's other stuff potentially going on.
73
00:03:59,130 --> 00:04:02,750
And if I fill that in, ideally, we'd want to just plop the number 4 here.
74
00:04:02,750 --> 00:04:04,580
If we're maintaining this kind of order.
75
00:04:04,580 --> 00:04:06,980
But recall in the context of your computer's memory,
76
00:04:06,980 --> 00:04:08,420
there might be other stuff there.
77
00:04:08,420 --> 00:04:10,932
Some of these garbage values that might be usable,
78
00:04:10,932 --> 00:04:12,890
but we don't really know or care what they are.
79
00:04:12,890 --> 00:04:14,480
As represented by Oscar here.
80
00:04:14,480 --> 00:04:17,510
But there might actually be useful data in use.
81
00:04:17,510 --> 00:04:20,900
Like, if your program has not just a few integers in this array,
82
00:04:20,900 --> 00:04:23,030
but also a string that says like, "Hello, world."
83
00:04:23,030 --> 00:04:29,090
It could be that your computer has plopped the H-E-L-L-O W-O-R-L-D right
84
00:04:29,090 --> 00:04:30,210
after this array.
85
00:04:30,210 --> 00:04:30,710
Why?
86
00:04:30,710 --> 00:04:32,960
Well, maybe, you created the array in one line of code
87
00:04:32,960 --> 00:04:34,610
and filled it with 1, 2, 3.
88
00:04:34,610 --> 00:04:37,010
Maybe the next line of code used get_string.
89
00:04:37,010 --> 00:04:40,230
Or maybe just hard coded a string in your code for "Hello, world."
90
00:04:40,230 --> 00:04:42,977
And so you painted yourself into a corner, so to speak.
91
00:04:42,977 --> 00:04:45,560
Now I think you might claim, well, let's just overwrite the H.
92
00:04:45,560 --> 00:04:47,510
But that's problematic for the same reasons.
93
00:04:47,510 --> 00:04:49,230
We don't want to do that.
94
00:04:49,230 --> 00:04:52,130
So where else could the 4 go?
95
00:04:52,130 --> 00:04:55,370
Or how do we solve this problem if we want to add a number,
96
00:04:55,370 --> 00:04:57,080
and there's clearly memory available.
97
00:04:57,080 --> 00:05:00,470
Because those garbage values are junk that we don't care about anymore.
98
00:05:00,470 --> 00:05:02,600
So we could certainly reuse those.
99
00:05:02,600 --> 00:05:06,240
Where could the 4, and perhaps this whole array, go?
100
00:05:06,240 --> 00:05:06,740
OK.
101
00:05:06,740 --> 00:05:08,570
So I'm hearing we could move it somewhere.
102
00:05:08,570 --> 00:05:10,403
Maybe, replace some of those garbage values.
103
00:05:10,403 --> 00:05:12,420
And honestly, we have a lot of options.
104
00:05:12,420 --> 00:05:14,660
We could use any of these garbage values up here.
105
00:05:14,660 --> 00:05:17,400
We could use any of these down here, or even further down.
106
00:05:17,400 --> 00:05:20,960
The point is there is plenty of memory available as
107
00:05:20,960 --> 00:05:24,410
indicated by these Oscars, where we could put 4, maybe even, 5,
108
00:05:24,410 --> 00:05:25,790
6 or more integers.
109
00:05:25,790 --> 00:05:28,970
The catch is that we chose poorly early on.
110
00:05:28,970 --> 00:05:30,050
Or we just got unlucky.
111
00:05:30,050 --> 00:05:33,686
And 1, 2, 3 ended up back-to-back with some other data that we care about.
112
00:05:33,686 --> 00:05:34,769
All right, so that's fine.
113
00:05:34,769 --> 00:05:37,579
Let's go ahead and assume that we'll abstract away everything else.
114
00:05:37,579 --> 00:05:40,745
And we'll plop the new array in this location here.
115
00:05:40,745 --> 00:05:42,620
So I'm going to go ahead and copy the 1 over.
116
00:05:42,620 --> 00:05:43,520
The 2 over.
117
00:05:43,520 --> 00:05:44,420
The 3 over.
118
00:05:44,420 --> 00:05:47,152
And then, ultimately, once I'm ready to fill the 4,
119
00:05:47,152 --> 00:05:49,610
I can throw away, essentially, the old array at this point.
120
00:05:49,610 --> 00:05:51,620
Because I have it now entirely in duplicate.
121
00:05:51,620 --> 00:05:53,760
And I can populate it with the number 4.
122
00:05:53,760 --> 00:05:54,260
All right.
123
00:05:54,260 --> 00:05:55,130
So problem solved.
124
00:05:55,130 --> 00:05:58,100
That is a correct potential solution to this problem.
125
00:05:58,100 --> 00:05:59,183
But what's the trade-off?
126
00:05:59,183 --> 00:06:02,142
And this is something we're going to start thinking about all the more.
127
00:06:02,142 --> 00:06:04,820
What's the downside of having solved this problem in this way?
128
00:06:04,820 --> 00:06:06,415
Yeah.
129
00:06:06,415 --> 00:06:07,790
I'm adding a lot of running time.
130
00:06:07,790 --> 00:06:10,580
It took me a lot of effort to copy those additional numbers.
131
00:06:10,580 --> 00:06:12,020
Now, granted, it's a small array.
132
00:06:12,020 --> 00:06:13,020
3 numbers, who cares.
133
00:06:13,020 --> 00:06:14,895
It's going to be over in the blink of an eye.
134
00:06:14,895 --> 00:06:17,580
But if we start talking about interesting data sets,
135
00:06:17,580 --> 00:06:20,190
web application data sets, mobile app data sets.
136
00:06:20,190 --> 00:06:23,670
Where you have not just a few, but maybe a few hundred, few thousand,
137
00:06:23,670 --> 00:06:25,630
a few million pieces of data.
138
00:06:25,630 --> 00:06:28,770
This is probably a suboptimal solution to just, oh,
139
00:06:28,770 --> 00:06:30,752
move all your data from one place to another.
140
00:06:30,752 --> 00:06:32,460
Because who's to say that we're not going
141
00:06:32,460 --> 00:06:34,050
to paint ourselves into a new corner.
142
00:06:34,050 --> 00:06:37,260
And it would feel like you're wasting all of this time moving stuff around.
143
00:06:37,260 --> 00:06:41,110
And, ultimately, just costing yourself a huge amount of time.
144
00:06:41,110 --> 00:06:44,130
In fact, if we put this now into the context of our Big O notation
145
00:06:44,130 --> 00:06:49,050
from a few weeks back, what might the running time now of Search
146
00:06:49,050 --> 00:06:50,160
be for an array?
147
00:06:50,160 --> 00:06:51,270
Let's start simple.
148
00:06:51,270 --> 00:06:53,430
A throwback a couple of weeks ago.
149
00:06:53,430 --> 00:06:56,580
If you're using an array, to recap, what was the running time
150
00:06:56,580 --> 00:06:59,590
of a Search algorithm in Big O notation?
151
00:06:59,590 --> 00:07:01,770
So, maybe, in the worst case.
152
00:07:01,770 --> 00:07:05,550
If you've got n numbers, 3 in this case or 4, but n more generally.
153
00:07:05,550 --> 00:07:08,320
Big O of what for Search?
154
00:07:08,320 --> 00:07:08,820
Yeah.
155
00:07:08,820 --> 00:07:09,420
What do you think?
156
00:07:09,420 --> 00:07:10,050
AUDIENCE: Big O of n.
157
00:07:10,050 --> 00:07:11,100
SPEAKER 1: Big O of n.
158
00:07:11,100 --> 00:07:12,720
And what's your intuition for that?
159
00:07:12,720 --> 00:07:14,145
AUDIENCE: [INAUDIBLE].
160
00:07:18,487 --> 00:07:19,070
SPEAKER 1: OK.
161
00:07:19,070 --> 00:07:19,310
Yeah.
162
00:07:19,310 --> 00:07:22,102
So if we go through each element, for instance, from left to right,
163
00:07:22,102 --> 00:07:25,490
then Search is going to take this Big O of n running time.
164
00:07:25,490 --> 00:07:28,520
If, though, we're talking about these numbers, specifically.
165
00:07:28,520 --> 00:07:31,490
And now I'll explicitly stipulate that, yeah, they're sorted.
166
00:07:31,490 --> 00:07:32,660
Does that buy us anything?
167
00:07:32,660 --> 00:07:36,950
What would the Big O notation be for Searching an array in this case,
168
00:07:36,950 --> 00:07:39,440
be it of size 3, or 4, or n, more generally.
169
00:07:39,440 --> 00:07:40,490
AUDIENCE: Big O of n.
170
00:07:40,490 --> 00:07:42,290
SPEAKER 1: Big O of, not n, but rather?
171
00:07:42,290 --> 00:07:42,680
AUDIENCE: Log n.
172
00:07:42,680 --> 00:07:43,700
SPEAKER 1: Log n, right.
173
00:07:43,700 --> 00:07:47,708
Because we could use, per week 0, binary search on an array like this.
174
00:07:47,708 --> 00:07:49,250
We'd have to deal with some rounding,
175
00:07:49,250 --> 00:07:51,440
because there's not a perfect number of elements at the moment.
176
00:07:51,440 --> 00:07:52,850
But you could use binary search.
177
00:07:52,850 --> 00:07:54,170
Go to the middle roughly.
178
00:07:54,170 --> 00:07:55,910
And then go left or right, left or right,
179
00:07:55,910 --> 00:07:57,660
until you find the element you care about.
180
00:07:57,660 --> 00:08:01,820
So Search remains in Big O of log n when using arrays.
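As a rough illustration of the binary search just described (go to the middle, then go left or right until the element is found), here is a minimal sketch in C. The function name binary_search and its parameters are assumptions made for this example and are not code from the lecture; the sketch also assumes the array is sorted in ascending order.

```c
#include <stdbool.h>

// Hypothetical helper (not from the lecture): returns true if `target`
// appears in `values`, which must hold `n` ints sorted in ascending order.
bool binary_search(const int values[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        // Integer division rounds down, which takes care of the rounding
        // when the remaining range has an even number of elements.
        int middle = low + (high - low) / 2;
        if (values[middle] == target)
        {
            return true;
        }
        else if (values[middle] < target)
        {
            low = middle + 1;   // target can only be to the right
        }
        else
        {
            high = middle - 1;  // target can only be to the left
        }
    }
    return false;
}
```

Each pass through the loop discards half of the remaining range, which is where the log n comes from.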
181
00:08:01,820 --> 00:08:03,650
But what about insertion, now?
182
00:08:03,650 --> 00:08:05,690
If we start to think about other operations.
183
00:08:05,690 --> 00:08:09,380
Like, adding a number to this array, or adding a friend to your contacts
184
00:08:09,380 --> 00:08:12,050
app, or Google finding another page on the internet.
185
00:08:12,050 --> 00:08:14,510
So insertion happens all the time.
186
00:08:14,510 --> 00:08:17,330
What's the running time of Insert?
187
00:08:17,330 --> 00:08:20,630
When it comes to inserting into an existing array of size n.
188
00:08:20,630 --> 00:08:23,300
How many steps might that take?
189
00:08:23,300 --> 00:08:24,170
Big O of n.
190
00:08:24,170 --> 00:08:25,220
It would be, indeed, n.
191
00:08:25,220 --> 00:08:25,720
Why?
192
00:08:25,720 --> 00:08:28,580
Because in the worst case, where you're out of space,
193
00:08:28,580 --> 00:08:31,148
you have to allocate, it would seem, a new array.
194
00:08:31,148 --> 00:08:33,440
Maybe, taking over some of the previous garbage values.
195
00:08:33,440 --> 00:08:35,180
But the catch is, even though you're only
196
00:08:35,180 --> 00:08:37,550
inserting one new number, like the number 4,
197
00:08:37,550 --> 00:08:41,070
you have to copy over all the darn existing numbers into the new one.
198
00:08:41,070 --> 00:08:44,060
So if your original array is of size n, the copying of that
199
00:08:44,060 --> 00:08:45,930
is going to take Big O of n plus 1.
200
00:08:45,930 --> 00:08:48,930
But we can throw away the plus 1 because of the math we did in the past.
201
00:08:48,930 --> 00:08:51,860
So Insert now becomes Big O of n.
202
00:08:51,860 --> 00:08:53,720
And that might not be ideal.
203
00:08:53,720 --> 00:08:56,510
Because if you're in the habit of inserting things frequently,
204
00:08:56,510 --> 00:08:58,880
that could start to add up, and add up, and add up.
205
00:08:58,880 --> 00:09:01,820
And this is why computer programs, and websites, and mobile apps
206
00:09:01,820 --> 00:09:02,990
could be slow.
207
00:09:02,990 --> 00:09:06,000
If you're not being mindful of these trade-offs.
208
00:09:06,000 --> 00:09:10,010
So what about, just for good measure, Omega notation.
209
00:09:10,010 --> 00:09:11,270
And maybe, the best case.
210
00:09:11,270 --> 00:09:13,760
Well just to recap here, we could get lucky
211
00:09:13,760 --> 00:09:16,052
and Search could just take one step.
212
00:09:16,052 --> 00:09:18,260
Because you might just get lucky, and boom the number
213
00:09:18,260 --> 00:09:20,810
you're looking for is right there in the middle, if using binary search.
214
00:09:20,810 --> 00:09:22,670
Or even linear search, for that matter.
215
00:09:22,670 --> 00:09:23,720
And Insert, too.
216
00:09:23,720 --> 00:09:27,710
If there's enough room, and we didn't have to move all of those numbers--
217
00:09:27,710 --> 00:09:29,247
1, 2, and 3, to a new location.
218
00:09:29,247 --> 00:09:30,080
You could get lucky.
219
00:09:30,080 --> 00:09:32,240
And we could have, as someone suggested, just
220
00:09:32,240 --> 00:09:34,038
put the number 4 right there at the end.
221
00:09:34,038 --> 00:09:36,080
And if we don't get lucky, it might take n steps.
222
00:09:36,080 --> 00:09:39,960
If we do get lucky, it might just take the one, or constant number, of steps.
223
00:09:39,960 --> 00:09:41,670
In fact, let me go ahead and do this.
224
00:09:41,670 --> 00:09:43,320
How about we do something like this?
225
00:09:43,320 --> 00:09:45,020
Let me switch over to some code here.
226
00:09:45,020 --> 00:09:48,110
Let me start to make a program called list.c.
227
00:09:48,110 --> 00:09:50,789
And in list.c, let's start with the old way.
228
00:09:50,789 --> 00:09:54,030
So we follow the breadcrumbs we've laid for ourselves as follows.
229
00:09:54,030 --> 00:09:57,470
So in this list.c, I'm going to include stdio.h.
230
00:09:57,470 --> 00:09:59,450
int main(void), as usual.
231
00:09:59,450 --> 00:10:02,780
Then inside of my code here, I'm going to go ahead and give myself
232
00:10:02,780 --> 00:10:04,590
the first version of memory.
233
00:10:04,590 --> 00:10:09,330
So int list[3] is now implemented, at the moment, as an array.
234
00:10:09,330 --> 00:10:11,687
So we're rewinding for now to week 2 style code.
235
00:10:11,687 --> 00:10:13,520
And then, let me just initialize this thing.
236
00:10:13,520 --> 00:10:15,200
At the first location will be 1.
237
00:10:15,200 --> 00:10:17,240
At the next location will be 2.
238
00:10:17,240 --> 00:10:19,910
And at the last location will be 3.
239
00:10:19,910 --> 00:10:22,240
So the array is zero indexed always.
240
00:10:22,240 --> 00:10:23,990
I, for just the sake of discussion though,
241
00:10:23,990 --> 00:10:27,420
am putting in the numbers 1, 2, 3, like a normal person might.
242
00:10:27,420 --> 00:10:27,920
All right.
243
00:10:27,920 --> 00:10:29,337
So now let's just print these out.
244
00:10:29,337 --> 00:10:30,800
for int i gets 0,
245
00:10:30,800 --> 00:10:32,840
i less than 3, i++.
246
00:10:32,840 --> 00:10:35,750
Let's go ahead now and print out using printf.
247
00:10:35,750 --> 00:10:38,660
%i\n, list[i].
248
00:10:38,660 --> 00:10:42,290
So very simple program, inspired by what we did in week 2.
249
00:10:42,290 --> 00:10:46,200
Just to create and then print out the contents of an array.
250
00:10:46,200 --> 00:10:48,380
So let's make list.
251
00:10:48,380 --> 00:10:52,460
So far, so good. ./list. And voila, we see 1, 2, 3.
252
00:10:52,460 --> 00:10:57,470
Now let's start to practice some of what we're preaching with this new syntax.
253
00:10:57,470 --> 00:11:02,060
So let me go in now and get rid of the array version.
254
00:11:02,060 --> 00:11:04,910
And let me zoom out a little bit to give ourselves some more space.
255
00:11:04,910 --> 00:11:08,450
And now let's begin to create a list of size 3.
256
00:11:08,450 --> 00:11:11,630
So if I'm going to do this now, dynamically,
257
00:11:11,630 --> 00:11:15,780
so that I'm allocating these things again and again,
258
00:11:15,780 --> 00:11:17,430
let me go ahead and do this.
259
00:11:17,430 --> 00:11:24,470
Let me give myself a list that's of type int*, equal to the return value of malloc
260
00:11:24,470 --> 00:11:31,490
of 3 times the size of an int. So what this is going to do for me is give me
261
00:11:31,490 --> 00:11:34,490
enough memory for that very first picture we drew on the board.
262
00:11:34,490 --> 00:11:37,160
Which was the array containing 1, 2, and 3.
263
00:11:37,160 --> 00:11:39,990
But laying the foundation to be able to resize it,
264
00:11:39,990 --> 00:11:41,580
which was ultimately the goal.
265
00:11:41,580 --> 00:11:43,650
So my syntax is a little different here.
266
00:11:43,650 --> 00:11:47,090
I'm going to use malloc and get memory from the so-called "heap", as we
267
00:11:47,090 --> 00:11:48,000
called it last week.
268
00:11:48,000 --> 00:11:51,890
Instead of using the stack by just doing the previous version where I said,
269
00:11:51,890 --> 00:11:54,680
int list[3].
270
00:11:54,680 --> 00:11:59,090
That is to say this line of code from the first version is in some sense
271
00:11:59,090 --> 00:12:02,630
identical to this line of code in the second version.
272
00:12:02,630 --> 00:12:04,730
But the first line of code puts the memory
273
00:12:04,730 --> 00:12:06,890
on the stack, automatically, for me.
274
00:12:06,890 --> 00:12:09,800
The second line of code, that I've left here now,
275
00:12:09,800 --> 00:12:13,280
is creating an array of size 3, but it's putting it on the heap.
276
00:12:13,280 --> 00:12:16,900
And that's important because it was only on the heap and via this new function
277
00:12:16,900 --> 00:12:17,830
last week, malloc,
278
00:12:17,830 --> 00:12:20,860
that you can actually ask for more memory, and even give it back.
279
00:12:20,860 --> 00:12:24,760
When you just use the first notation, int list[3],
280
00:12:24,760 --> 00:12:28,150
you have permanently given yourself an array of size 3.
281
00:12:28,150 --> 00:12:31,130
You cannot add to that in code.
282
00:12:31,130 --> 00:12:33,010
So let me go ahead and do this.
283
00:12:33,010 --> 00:12:36,143
If list == NULL, something went wrong.
284
00:12:36,143 --> 00:12:37,310
The computer's out of memory.
285
00:12:37,310 --> 00:12:39,503
So let's just return 1 and quit out of this program.
286
00:12:39,503 --> 00:12:40,670
There's nothing to see here.
287
00:12:40,670 --> 00:12:42,520
So just a good error check there.
288
00:12:42,520 --> 00:12:44,770
Now let me go ahead and initialize this list.
289
00:12:44,770 --> 00:12:46,720
So list[0] will be 1 again.
290
00:12:46,720 --> 00:12:48,070
list[1] will be 2.
291
00:12:48,070 --> 00:12:50,440
And list[2] will be 3.
292
00:12:50,440 --> 00:12:52,810
So that's the same kind of syntax as before.
293
00:12:52,810 --> 00:12:55,930
And notice this equivalence.
294
00:12:55,930 --> 00:13:00,730
Recall that there's this relationship between chunks of memory and arrays.
295
00:13:00,730 --> 00:13:03,550
And arrays are really just doing pointer arithmetic for you,
296
00:13:03,550 --> 00:13:05,260
where the square bracket notation is.
297
00:13:05,260 --> 00:13:10,030
So if I've asked myself here, in line 5, for enough memory for 3 integers,
298
00:13:10,030 --> 00:13:15,250
it is perfectly OK to treat it now like an array using square bracket notation.
299
00:13:15,250 --> 00:13:17,740
Because the computer will do the arithmetic for me
300
00:13:17,740 --> 00:13:20,440
and find the first location, the second, and the third.
301
00:13:20,440 --> 00:13:24,550
If you really want to be cool and hacker-like, well,
302
00:13:24,550 --> 00:13:31,300
you could say *list = 1, *(list + 1) = 2, *(list + 2) = 3.
303
00:13:33,880 --> 00:13:36,220
That's the same thing using very explicit,
304
00:13:36,220 --> 00:13:38,830
pointer arithmetic, which we looked at briefly last week.
305
00:13:38,830 --> 00:13:41,170
But this is atrocious to look at for most people.
306
00:13:41,170 --> 00:13:42,860
It's just not very user friendly.
307
00:13:42,860 --> 00:13:45,790
It's longer to type, so most people, even when
308
00:13:45,790 --> 00:13:48,670
allocating memory dynamically as I did a second ago,
309
00:13:48,670 --> 00:13:52,630
would just use the more familiar notation of an array.
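To make the equivalence between square brackets and pointer arithmetic concrete, here is a small sketch in C. The helper name make_list is an assumption for this example and is not from the lecture. Note that storing a value through a pointer needs the dereference operator *, so each assignment is written as *(list + i) = value.

```c
#include <stdlib.h>

// Hypothetical helper: allocates 3 ints on the heap and fills them with
// 1, 2, 3 using explicit pointer arithmetic. Returns NULL if malloc fails.
// The caller is responsible for calling free on the result.
int *make_list(void)
{
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return NULL;
    }
    *list = 1;        // same as list[0] = 1
    *(list + 1) = 2;  // same as list[1] = 2
    *(list + 2) = 3;  // same as list[2] = 3
    return list;
}
```

A caller can read the values back with ordinary array syntax, such as list[0], because the compiler performs the same address arithmetic either way.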
310
00:13:52,630 --> 00:13:53,240
All right.
311
00:13:53,240 --> 00:13:54,310
So let's go on.
312
00:13:54,310 --> 00:13:58,840
Now suppose time passes and I realize, oh shoot,
313
00:13:58,840 --> 00:14:03,820
I really wanted this array to be of size 4 instead of size 3.
314
00:14:03,820 --> 00:14:06,362
Now, obviously, I could just rewind and like fix the program.
315
00:14:06,362 --> 00:14:08,320
But suppose that this is a much larger program.
316
00:14:08,320 --> 00:14:10,690
And I've realized, at this point, that I need
317
00:14:10,690 --> 00:14:14,080
to be able to dynamically add more things to this array for whatever
318
00:14:14,080 --> 00:14:14,740
reason.
319
00:14:14,740 --> 00:14:16,280
Well let me go ahead and do this.
320
00:14:16,280 --> 00:14:18,670
Let me just say, all right, list should actually
321
00:14:18,670 --> 00:14:24,700
be the result of asking for 4 chunks of memory from malloc.
322
00:14:24,700 --> 00:14:28,735
And then, I could do something like this, list[3] = 4.
323
00:14:31,690 --> 00:14:34,700
Now this is buggy, potentially, in a couple of ways.
324
00:14:34,700 --> 00:14:41,530
But let me ask first, what's really wrong, first, with this code?
325
00:14:41,530 --> 00:14:45,850
The goal at hand is to start with the array of size 3 with the 1, 2, 3.
326
00:14:45,850 --> 00:14:47,660
And I want to add a number 4 to it.
327
00:14:47,660 --> 00:14:53,380
So at the moment, in line 17, I've asked the computer for a chunk of 4 integers.
328
00:14:53,380 --> 00:14:54,940
Just like the picture.
329
00:14:54,940 --> 00:14:57,130
And then I'm adding the number 4 to it.
330
00:14:57,130 --> 00:15:00,610
But I have skipped a few steps and broken this somehow.
331
00:15:00,610 --> 00:15:01,894
Yeah.
332
00:15:01,894 --> 00:15:04,023
AUDIENCE: You don't know exactly [INAUDIBLE].
333
00:15:04,023 --> 00:15:04,690
SPEAKER 1: Yeah.
334
00:15:04,690 --> 00:15:07,060
I don't necessarily know where this is going to end up in memory.
335
00:15:07,060 --> 00:15:08,560
It's probably not going to be immediately
336
00:15:08,560 --> 00:15:09,910
adjacent to the previous chunk.
337
00:15:09,910 --> 00:15:12,740
And so, yes, even though I'm putting the number 4 there,
338
00:15:12,740 --> 00:15:16,700
I haven't copied the 1, the 2, or the 3 over to this chunk of memory.
339
00:15:16,700 --> 00:15:18,400
So well let me fix--
340
00:15:18,400 --> 00:15:22,630
well, that's actually, indeed, really the essence of the problem.
341
00:15:22,630 --> 00:15:26,080
I am orphaning the original chunk of memory.
342
00:15:26,080 --> 00:15:29,260
If you think of the picture that I drew earlier, the line of code
343
00:15:29,260 --> 00:15:35,500
up here on line 5 that allocates space for the initial 3 integers.
344
00:15:35,500 --> 00:15:36,820
This code is fine.
345
00:15:36,820 --> 00:15:38,270
This code is fine.
346
00:15:38,270 --> 00:15:41,650
But as soon as I do this, I'm clobbering the value of list.
347
00:15:41,650 --> 00:15:43,960
And saying no, don't point at this chunk of memory.
348
00:15:43,960 --> 00:15:47,900
Point at this chunk of memory, at which point I've forgotten if you will,
349
00:15:47,900 --> 00:15:50,230
where the original chunk of memory is.
350
00:15:50,230 --> 00:15:54,820
So the right way to do something like this, would be a little more involved.
351
00:15:54,820 --> 00:15:57,398
Let me go ahead and give myself a temporary variable.
352
00:15:57,398 --> 00:15:58,690
And I'll literally call it tmp.
353
00:15:58,690 --> 00:16:00,820
T-M-P, like I did last week.
354
00:16:00,820 --> 00:16:04,120
So that I can now ask the computer for a completely different chunk of memory
355
00:16:04,120 --> 00:16:05,290
of size 4.
356
00:16:05,290 --> 00:16:08,230
I'm going to again say if tmp equals NULL,
357
00:16:08,230 --> 00:16:10,370
I'm going to say bad things happened here.
358
00:16:10,370 --> 00:16:11,560
So let me just return 1.
359
00:16:11,560 --> 00:16:13,840
And you know what, just to be tidy, let me
360
00:16:13,840 --> 00:16:16,542
free the original list before I quit.
361
00:16:16,542 --> 00:16:18,250
Because remember from last week, any time
362
00:16:18,250 --> 00:16:20,650
you use malloc you eventually have to use free.
363
00:16:20,650 --> 00:16:24,040
But this chunk of code here is just a safety check.
364
00:16:24,040 --> 00:16:26,440
If there's no more memory, there's nothing to see here.
365
00:16:26,440 --> 00:16:29,500
I'm just going to clean up my state and quit.
366
00:16:29,500 --> 00:16:32,840
But now, if I have asked for this chunk of memory,
367
00:16:32,840 --> 00:16:38,200
now I can do this: for int i gets 0,
368
00:16:38,200 --> 00:16:40,600
i is less than 3, i++.
369
00:16:40,600 --> 00:16:42,520
What if I do something like this?
370
00:16:42,520 --> 00:16:46,540
tmp[i] equals list[i].
371
00:16:46,540 --> 00:16:50,980
That would seem to have the effect of copying all of the memory from one
372
00:16:50,980 --> 00:16:51,800
to the other.
373
00:16:51,800 --> 00:16:55,510
And then, I think I need to do one last thing: tmp[3]
374
00:16:55,510 --> 00:16:57,460
gets the number 4, for instance.
375
00:16:57,460 --> 00:17:01,480
Again, I'm hard coding the numbers for the sake of discussion.
376
00:17:01,480 --> 00:17:06,460
After I've done this, what could I now do?
377
00:17:06,460 --> 00:17:10,990
I could now set list equal to tmp.
378
00:17:10,990 --> 00:17:14,048
And now, I have updated my list properly.
379
00:17:14,048 --> 00:17:15,340
So let me go ahead and do this.
380
00:17:15,340 --> 00:17:17,080
for int i gets 0,
381
00:17:17,080 --> 00:17:19,480
i is less than 4, i++.
382
00:17:19,480 --> 00:17:24,820
Let me go ahead and print each of these elements out with %i using list[i].
383
00:17:24,820 --> 00:17:27,890
And then, I'm going to return 0 just to signify that all is successful.
384
00:17:27,890 --> 00:17:31,990
Now so to recap, we initialize the original array
385
00:17:31,990 --> 00:17:35,140
of size 3 and plug in the values 1, 2, 3.
386
00:17:35,140 --> 00:17:35,960
Time passes.
387
00:17:35,960 --> 00:17:38,210
And then, I realize, wait a minute, I need more space.
388
00:17:38,210 --> 00:17:40,585
And so I asked the computer for a second chunk of memory.
389
00:17:40,585 --> 00:17:41,800
This one of size 4.
390
00:17:41,800 --> 00:17:44,467
Just as a safety check, I make sure that tmp doesn't equal NULL.
391
00:17:44,467 --> 00:17:46,008
Because if it does I'm out of memory.
392
00:17:46,008 --> 00:17:47,590
So I should just quit altogether.
393
00:17:47,590 --> 00:17:50,110
But once I'm sure that it's not null, I'm
394
00:17:50,110 --> 00:17:55,450
going to copy all the values from the old list into the new list.
395
00:17:55,450 --> 00:17:58,910
And then, I'm going to add my new number at the end of that list.
396
00:17:58,910 --> 00:18:02,410
And then, now that I'm done playing around with this temporary variable,
397
00:18:02,410 --> 00:18:05,860
I'm going to remember in my list variable what
398
00:18:05,860 --> 00:18:07,900
the address is of this new chunk of memory.
399
00:18:07,900 --> 00:18:10,570
And then, I'm going to print all of those values out.
400
00:18:10,570 --> 00:18:14,350
So at least, aesthetically, when I make this new version of my list,
401
00:18:14,350 --> 00:18:16,660
except for my missing semicolon.
402
00:18:16,660 --> 00:18:17,590
Let me try this again.
403
00:18:17,590 --> 00:18:19,480
When I make list, oh, OK.
404
00:18:19,480 --> 00:18:20,620
What did I do this time?
405
00:18:20,620 --> 00:18:23,290
Implicitly declaring a library function malloc.
406
00:18:23,290 --> 00:18:27,749
What's my mistake any time you see that kind of error?
407
00:18:27,749 --> 00:18:28,510
AUDIENCE: Library.
408
00:18:28,510 --> 00:18:28,800
SPEAKER 1: Yeah.
409
00:18:28,800 --> 00:18:29,380
A library.
410
00:18:29,380 --> 00:18:34,700
So up here, I forgot to do include stdlib.h, which is where malloc lives.
411
00:18:34,700 --> 00:18:36,490
Let me go ahead and, again, do make list.
412
00:18:36,490 --> 00:18:37,250
There we go.
413
00:18:37,250 --> 00:18:38,950
So I fixed that. ./list.
414
00:18:38,950 --> 00:18:41,829
And I should see 1, 2, 3, 4.
415
00:18:41,829 --> 00:18:45,640
But there's still a bug here.
416
00:18:45,640 --> 00:18:48,310
Does anyone see the-- bug or question?
417
00:18:48,310 --> 00:18:50,100
AUDIENCE: You forgot to free them.
418
00:18:50,100 --> 00:18:50,790
SPEAKER 1: I'm sorry, say again.
419
00:18:50,790 --> 00:18:52,470
AUDIENCE: You forgot to free them.
420
00:18:52,470 --> 00:18:54,570
SPEAKER 1: I forgot to free the original list.
421
00:18:54,570 --> 00:18:58,170
And we could see this, even if not just with our own eyes or intuition.
422
00:18:58,170 --> 00:19:00,847
If I do something like valgrind of ./list,
423
00:19:00,847 --> 00:19:02,430
remember our tool from this past week.
424
00:19:02,430 --> 00:19:05,310
Let me increase the size of my terminal window, temporarily.
425
00:19:05,310 --> 00:19:07,540
The output is crazy cryptic at first.
426
00:19:07,540 --> 00:19:12,780
But, notice that I have definitely lost some number of bytes here.
427
00:19:12,780 --> 00:19:15,150
And indeed, it's even pointing at the line number
428
00:19:15,150 --> 00:19:16,930
in which some of those bytes were lost.
429
00:19:16,930 --> 00:19:18,930
So let me go ahead and back to my code.
430
00:19:18,930 --> 00:19:23,610
And indeed, I think what I need to do is, before I clobber the value of list
431
00:19:23,610 --> 00:19:27,150
pointing it at this new chunk of memory instead of the old,
432
00:19:27,150 --> 00:19:29,910
I think I now need to first, proactively,
433
00:19:29,910 --> 00:19:32,460
say free the old list of memory.
434
00:19:32,460 --> 00:19:34,480
And then, change its value.
435
00:19:34,480 --> 00:19:39,250
So if I now do make list and do ./list, the output is still the same.
436
00:19:39,250 --> 00:19:42,450
And, if I cross my fingers and run Valgrind again
437
00:19:42,450 --> 00:19:46,440
after increasing my window size, hopefully here.
438
00:19:46,440 --> 00:19:48,160
Oh, still a bug.
439
00:19:48,160 --> 00:19:49,080
So better.
440
00:19:49,080 --> 00:19:52,020
It seems like less memory is lost.
441
00:19:52,020 --> 00:19:54,450
What have I now forgotten to do?
442
00:19:54,450 --> 00:19:56,430
AUDIENCE: You forgot to free the end.
443
00:19:56,430 --> 00:19:58,740
SPEAKER 1: I forgot to free it at the very end, too.
444
00:19:58,740 --> 00:20:01,560
Because I still have a chunk of memory that I got from malloc.
445
00:20:01,560 --> 00:20:04,200
So let me go to the very bottom of the program now.
446
00:20:04,200 --> 00:20:09,330
And after I'm done senselessly just printing this thing out,
447
00:20:09,330 --> 00:20:12,450
let me free the new list.
448
00:20:12,450 --> 00:20:15,780
And now let me do make list, ./list.
449
00:20:15,780 --> 00:20:17,670
It still works, visually.
450
00:20:17,670 --> 00:20:22,200
Now let's do valgrind of ./list, Enter.
451
00:20:22,200 --> 00:20:25,530
And now, hopefully, all heap blocks were freed.
452
00:20:25,530 --> 00:20:27,018
No leaks are possible.
453
00:20:27,018 --> 00:20:30,060
So this is perhaps the best output you can see from a tool like Valgrind.
454
00:20:30,060 --> 00:20:32,950
I used the heap, but I freed all the memory as well.
455
00:20:32,950 --> 00:20:34,630
So there were 2 fixes needed there.
456
00:20:34,630 --> 00:20:35,130
All right.
457
00:20:35,130 --> 00:20:38,910
Any questions then on this array-based approach, the first of which
458
00:20:38,910 --> 00:20:41,530
is statically allocating an array, so to speak.
459
00:20:41,530 --> 00:20:43,230
By just hard coding the number 3.
460
00:20:43,230 --> 00:20:47,190
The second version now is dynamically allocating the array,
461
00:20:47,190 --> 00:20:49,380
using not the stack but the heap.
462
00:20:49,380 --> 00:20:52,800
But, it too, suffers from the slowness we described earlier,
463
00:20:52,800 --> 00:20:55,290
of having to copy all those values from one to the other.
464
00:20:55,290 --> 00:20:55,790
OK.
465
00:20:55,790 --> 00:20:57,183
A hand was over here.
466
00:20:57,183 --> 00:20:59,858
AUDIENCE: Why do you not have to free the TMP?
467
00:20:59,858 --> 00:21:00,900
SPEAKER 1: Good question.
468
00:21:00,900 --> 00:21:02,820
Why did I not have to free the TMP?
469
00:21:02,820 --> 00:21:05,130
I essentially did eventually.
470
00:21:05,130 --> 00:21:10,360
Because TMP was pointing at the chunk of 4 integers.
471
00:21:10,360 --> 00:21:15,810
But on line 33 here, I assigned list to be
472
00:21:15,810 --> 00:21:18,580
identical to what TMP was pointing at.
473
00:21:18,580 --> 00:21:23,173
And so, when I finally freed the list, that was the same thing as freeing TMP.
474
00:21:23,173 --> 00:21:26,340
In fact, if I wanted to, I could say free TMP here and it would be the same.
475
00:21:26,340 --> 00:21:28,080
But conceptually, it's wrong.
476
00:21:28,080 --> 00:21:32,130
Because at this point in the story, I should be freeing the actual list, not
477
00:21:32,130 --> 00:21:33,240
that temporary variable.
478
00:21:33,240 --> 00:21:35,340
But they were the same at that point in the story.
479
00:21:35,340 --> 00:21:35,840
Yeah.
480
00:21:35,840 --> 00:21:37,878
AUDIENCE: Is [? the line ?] part of it?
481
00:21:37,878 --> 00:21:38,920
SPEAKER 1: Good question.
482
00:21:38,920 --> 00:21:41,350
And long story short, everything we're doing thus far
483
00:21:41,350 --> 00:21:42,820
is still in the world of arrays.
484
00:21:42,820 --> 00:21:44,710
The only distinction we're making is that
485
00:21:44,710 --> 00:21:51,220
in version 1, when I said int list[3], that was an array of fixed size.
486
00:21:51,220 --> 00:21:55,150
So-called statically allocated on the stack, as per last week.
487
00:21:55,150 --> 00:21:58,900
This version now is still dealing with arrays, but I'm flexing my muscles
488
00:21:58,900 --> 00:22:00,980
and using dynamic memory allocation.
489
00:22:00,980 --> 00:22:03,498
So that I can still use an array per the first pictures
490
00:22:03,498 --> 00:22:04,540
we started talking about.
491
00:22:04,540 --> 00:22:07,070
But I can at least grow the array if I want.
492
00:22:07,070 --> 00:22:10,990
So we haven't even yet solved this, even better in a sense, with linked lists.
493
00:22:10,990 --> 00:22:12,080
That's going to come next.
494
00:22:12,080 --> 00:22:12,580
Yeah.
495
00:22:12,580 --> 00:22:16,930
AUDIENCE: How are you able to free list and then still make list?
496
00:22:16,930 --> 00:22:19,720
SPEAKER 1: How am I able to free list?
497
00:22:19,720 --> 00:22:24,310
I freed the original address of list.
498
00:22:24,310 --> 00:22:27,220
I, then, changed what list is storing.
499
00:22:27,220 --> 00:22:30,070
I'm moving its arrow to a new chunk of memory.
500
00:22:30,070 --> 00:22:33,550
And that is perfectly reasonable for me to now manipulate
501
00:22:33,550 --> 00:22:37,180
because now list is pointing at the same value as TMP.
502
00:22:37,180 --> 00:22:42,610
And TMP is what was given the return value of malloc, the second time.
503
00:22:42,610 --> 00:22:44,780
So that chunk of memory is valid.
504
00:22:44,780 --> 00:22:48,220
So these are just squares on the board, right.
505
00:22:48,220 --> 00:22:49,970
There's just pointers inside of them.
506
00:22:49,970 --> 00:22:51,887
So what I'm technically saying is, I'm not
507
00:22:51,887 --> 00:22:54,040
freeing list per se, I am
508
00:22:54,040 --> 00:22:58,660
freeing the chunk of memory that begins at the address currently in list.
509
00:22:58,660 --> 00:23:04,060
Therefore, if a few lines later, I change what the address is in list.
510
00:23:04,060 --> 00:23:08,080
Totally reasonable to then touch that memory, and eventually free it later.
511
00:23:08,080 --> 00:23:10,390
Because you're not freeing the variable per se,
512
00:23:10,390 --> 00:23:12,790
you're freeing the address in the variable.
513
00:23:12,790 --> 00:23:13,630
Good distinction.
514
00:23:13,630 --> 00:23:14,140
All right.
515
00:23:14,140 --> 00:23:19,750
So let me back up here and now make one final edit.
516
00:23:19,750 --> 00:23:24,190
So let's finish this with one final improvement here.
517
00:23:24,190 --> 00:23:27,160
Because it turns out, there's a somewhat better way
518
00:23:27,160 --> 00:23:30,610
to actually resize an array as we've been doing here.
519
00:23:30,610 --> 00:23:35,028
And there's another function in stdlib that's called realloc, for re-allocate.
520
00:23:35,028 --> 00:23:37,570
And I'm just going to go in and make a little bit of a change
521
00:23:37,570 --> 00:23:40,578
here so that I can do the following.
522
00:23:40,578 --> 00:23:42,370
Let me go ahead and first comment this now,
523
00:23:42,370 --> 00:23:45,320
just so we can keep track of what's been going on this whole time.
524
00:23:45,320 --> 00:23:51,970
So dynamically allocate an array of size 3.
525
00:23:51,970 --> 00:23:56,650
Assign 3 numbers to that array.
526
00:23:56,650 --> 00:23:58,330
Time passes.
527
00:23:58,330 --> 00:24:03,640
Allocate new array of size 4.
528
00:24:03,640 --> 00:24:09,460
Copy numbers from old array into new array.
529
00:24:09,460 --> 00:24:14,170
And add fourth number to new array.
530
00:24:14,170 --> 00:24:15,895
Free old array.
531
00:24:18,850 --> 00:24:24,460
Remember, if you will, new array using my same list variable.
532
00:24:24,460 --> 00:24:28,960
And now, print new array.
533
00:24:28,960 --> 00:24:31,270
Free new array.
534
00:24:31,270 --> 00:24:32,260
Hopefully, that helps.
535
00:24:32,260 --> 00:24:35,530
And we'll post this code online after, too, which tells a more explicit story.
536
00:24:35,530 --> 00:24:39,220
So it turns out that we can reduce some of the labor involved with this.
537
00:24:39,220 --> 00:24:41,980
Not so much with the printing here, but with this copying.
538
00:24:41,980 --> 00:24:44,260
Turns out C does have a function called realloc,
539
00:24:44,260 --> 00:24:49,580
that can actually handle the resizing of an array for you, as follows.
540
00:24:49,580 --> 00:24:51,700
I'm going to scroll up to where I previously
541
00:24:51,700 --> 00:24:54,820
allocated a new array of size 4.
542
00:24:54,820 --> 00:25:02,020
And I'm instead going to say this, resize old array to be of size 4.
543
00:25:02,020 --> 00:25:04,477
Now, previously this wasn't necessarily possible.
544
00:25:04,477 --> 00:25:06,310
Because recall that we had painted ourselves
545
00:25:06,310 --> 00:25:08,143
into a corner with the example on the screen
546
00:25:08,143 --> 00:25:10,990
where "Hello, world" happened to be right after the original array.
547
00:25:10,990 --> 00:25:12,410
But let me do this.
548
00:25:12,410 --> 00:25:15,340
Let me use realloc, for re-allocate.
549
00:25:15,340 --> 00:25:18,640
And pass in not just the size of memory we want this time,
550
00:25:18,640 --> 00:25:22,330
but also the address that we want to resize.
551
00:25:22,330 --> 00:25:25,940
Which, again, is this array called list.
552
00:25:25,940 --> 00:25:26,440
All right.
553
00:25:26,440 --> 00:25:29,330
The code thereafter is pretty much the same.
554
00:25:29,330 --> 00:25:33,200
But what I don't need to do is this.
555
00:25:33,200 --> 00:25:36,520
So realloc is a pretty handy function that will do the following.
556
00:25:36,520 --> 00:25:39,670
If at the very beginning of class, when we had 1, 2, 3 on the board.
557
00:25:39,670 --> 00:25:43,010
And someone's instinct was to just plop the 4 right at the end of the list.
558
00:25:43,010 --> 00:25:45,760
If there's available memory, realloc will just do that.
559
00:25:45,760 --> 00:25:50,200
And boom, it will just grow the array for you in the computer's memory.
560
00:25:50,200 --> 00:25:54,160
If, though, it realizes, sorry, there's already a string like "Hello, world"
561
00:25:54,160 --> 00:25:57,040
or something else there, realloc will handle
562
00:25:57,040 --> 00:26:00,730
the trouble of moving that whole array from 1 chunk of memory,
563
00:26:00,730 --> 00:26:03,010
originally, to a new chunk of memory.
564
00:26:03,010 --> 00:26:09,400
And then realloc will return to you, the address of that new chunk of memory.
565
00:26:09,400 --> 00:26:13,550
And it will handle the process of freeing the old chunk for you.
566
00:26:13,550 --> 00:26:15,800
So you do not need to do this yourself.
567
00:26:15,800 --> 00:26:19,130
So in fact, let me go ahead and get rid of this as well.
568
00:26:19,130 --> 00:26:24,100
So realloc just condenses, a lot of what we just did, into a single function.
569
00:26:24,100 --> 00:26:28,110
Whereby, realloc handles it for you.
570
00:26:28,110 --> 00:26:28,610
All right.
571
00:26:28,610 --> 00:26:31,670
So that's the final improvement on this array-based approach.
572
00:26:31,670 --> 00:26:34,450
So what now, knowing what your memory is,
573
00:26:34,450 --> 00:26:37,400
what can we now do with it that solves that kind of problem?
574
00:26:37,400 --> 00:26:39,320
Because the world is going to get really slow.
575
00:26:39,320 --> 00:26:42,320
And our apps, and our phones, and our computers are getting really slow,
576
00:26:42,320 --> 00:26:46,550
if we're just constantly wasting time moving things around in memory.
577
00:26:46,550 --> 00:26:48,410
What could we perhaps do instead?
578
00:26:48,410 --> 00:26:50,480
Well there's one new piece of syntax today
579
00:26:50,480 --> 00:26:53,840
that builds on these 3 pieces of syntax from the past.
580
00:26:53,840 --> 00:26:55,700
Recall, that we've looked at struct, which
581
00:26:55,700 --> 00:26:58,820
is a keyword in C, that just lets you invent your own structure.
582
00:26:58,820 --> 00:27:02,060
Your own variable, if you will, in conjunction with typedef.
583
00:27:02,060 --> 00:27:06,200
Which lets you say a person has a name and a number, or something like that.
584
00:27:06,200 --> 00:27:08,660
Or a candidate has a name and some number of votes.
585
00:27:08,660 --> 00:27:13,040
You can encapsulate multiple pieces of data inside of just one using struct.
586
00:27:13,040 --> 00:27:17,160
What did we use the Dot Notation for now, a couple of times?
587
00:27:17,160 --> 00:27:20,468
What does the Dot operator do in C?
588
00:27:20,468 --> 00:27:21,760
AUDIENCE: Access the structure.
589
00:27:21,760 --> 00:27:22,150
SPEAKER 1: Perfect.
590
00:27:22,150 --> 00:27:24,200
To access the field inside of a structure.
591
00:27:24,200 --> 00:27:26,325
So if you've got a person with a name and a number,
592
00:27:26,325 --> 00:27:29,350
you could say something like person.name or person.number,
593
00:27:29,350 --> 00:27:31,510
if person is the name of one such variable.
594
00:27:31,510 --> 00:27:33,850
Star, of course, we've seen now in a few ways.
595
00:27:33,850 --> 00:27:37,540
Like way back in week 1, we saw it as like, multiplication.
596
00:27:37,540 --> 00:27:40,750
Last week, we began to see it in the context of pointers,
597
00:27:40,750 --> 00:27:42,970
whereby, you use it to declare a pointer.
598
00:27:42,970 --> 00:27:45,560
Like, int* p, or something like that.
599
00:27:45,560 --> 00:27:48,040
But we also saw it in one other context, which
600
00:27:48,040 --> 00:27:51,380
was like the opposite, which was the dereference operator.
601
00:27:51,380 --> 00:27:53,272
Which says if this is an address, that is
602
00:27:53,272 --> 00:27:56,230
if this is a variable like a pointer, and you put a star in front of it
603
00:27:56,230 --> 00:27:59,980
then with no int or no char, no data type in front of it.
604
00:27:59,980 --> 00:28:01,870
That means go to that address.
605
00:28:01,870 --> 00:28:05,300
And it dereferences the pointer and goes to that location.
606
00:28:05,300 --> 00:28:07,720
So it turns out that using these 3 building blocks,
607
00:28:07,720 --> 00:28:10,760
you can actually start to now use your computer's memory almost any way
608
00:28:10,760 --> 00:28:11,260
you want.
609
00:28:11,260 --> 00:28:13,720
And even next week, when we transition to Python,
610
00:28:13,720 --> 00:28:16,360
and you start to get a lot of features for free.
611
00:28:16,360 --> 00:28:18,550
Like a single line of code will just do so much
612
00:28:18,550 --> 00:28:23,170
more in Python than it does in C. It boils down to those basic primitives.
613
00:28:23,170 --> 00:28:25,060
And just so you've seen it already.
614
00:28:25,060 --> 00:28:29,770
It turns out that it's so common in C to use this operator
615
00:28:29,770 --> 00:28:33,790
to go inside of a structure and this operator to go to an address,
616
00:28:33,790 --> 00:28:36,250
that there's shorthand notation for it, a.k.a.
617
00:28:36,250 --> 00:28:37,450
syntactic sugar.
618
00:28:37,450 --> 00:28:39,095
That literally looks like an arrow.
619
00:28:39,095 --> 00:28:41,470
So recall last week, I was in the habit of pointing, even
620
00:28:41,470 --> 00:28:42,670
with the big foam finger.
621
00:28:42,670 --> 00:28:47,020
This arrow notation, a hyphen and an angled bracket,
622
00:28:47,020 --> 00:28:53,950
denotes going to an address and looking at a field inside of it.
623
00:28:53,950 --> 00:28:56,240
But we'll see this in practice in just a bit.
624
00:28:56,240 --> 00:28:59,110
So what might be the solution, now, to this problem
625
00:28:59,110 --> 00:29:02,620
we saw a moment ago whereby, we had painted ourselves into a corner.
626
00:29:02,620 --> 00:29:05,900
And our memory, a few moments ago, looked like this.
627
00:29:05,900 --> 00:29:10,720
We could just copy the whole existing array to a new location, add the 4,
628
00:29:10,720 --> 00:29:12,010
and go about our business.
629
00:29:12,010 --> 00:29:15,850
What would another, perhaps better solution longer term
630
00:29:15,850 --> 00:29:21,145
be, that doesn't require constantly moving stuff around?
631
00:29:21,145 --> 00:29:23,020
Maybe hang in there for your instincts if you
632
00:29:23,020 --> 00:29:27,200
know the buzz phrase we're looking for from past experience, hang in there.
633
00:29:27,200 --> 00:29:29,800
But if we want to avoid moving the 1, 2, and the 3,
634
00:29:29,800 --> 00:29:32,500
but we still want to be able to add endless amounts of data.
635
00:29:32,500 --> 00:29:33,980
What could we do?
636
00:29:33,980 --> 00:29:34,480
Yeah.
637
00:29:34,480 --> 00:29:37,390
So maybe create some kind of list using pointers that
638
00:29:37,390 --> 00:29:39,370
just point at a new location, right.
639
00:29:39,370 --> 00:29:42,490
In an ideal world, even though this piece of memory
640
00:29:42,490 --> 00:29:45,430
is being used by this h in the string "Hello, world",
641
00:29:45,430 --> 00:29:47,980
maybe we could somehow use a pointer from last week.
642
00:29:47,980 --> 00:29:52,330
Like an arrow, that says after the 3, oh I don't know, go down over here
643
00:29:52,330 --> 00:29:54,040
to this location in memory.
644
00:29:54,040 --> 00:29:58,310
And you just stitch together these integers in memory
645
00:29:58,310 --> 00:30:00,340
so that each one leads to the next.
646
00:30:00,340 --> 00:30:03,700
It's not necessarily the case that it's literally back-to-back.
647
00:30:03,700 --> 00:30:05,950
That would have the downside, it would seem,
648
00:30:05,950 --> 00:30:07,510
of costing us a little bit of space.
649
00:30:07,510 --> 00:30:10,120
Like a pointer, which recall, takes up some amount of space.
650
00:30:10,120 --> 00:30:12,400
Typically 8 bytes or 64 bits.
651
00:30:12,400 --> 00:30:16,000
But I don't have to copy potentially a huge amount of data just
652
00:30:16,000 --> 00:30:17,440
to add one more number.
653
00:30:17,440 --> 00:30:19,278
And so these things do have a name.
654
00:30:19,278 --> 00:30:21,070
And indeed, these things are what generally
655
00:30:21,070 --> 00:30:24,820
would be called a linked list.
656
00:30:24,820 --> 00:30:27,340
A linked list captures exactly that intuition
657
00:30:27,340 --> 00:30:29,060
of linking together things in memory.
658
00:30:29,060 --> 00:30:30,530
So let's take a look at an example.
659
00:30:30,530 --> 00:30:32,322
Here's a computer's memory in the abstract.
660
00:30:32,322 --> 00:30:35,140
Suppose that I'm trying to create an array.
661
00:30:35,140 --> 00:30:38,200
Let's generalize it as a list, now, of numbers.
662
00:30:38,200 --> 00:30:39,880
An array has a very specific meaning.
663
00:30:39,880 --> 00:30:42,610
It's memory that's contiguous, back, to back, to back.
664
00:30:42,610 --> 00:30:46,240
At the end of the day, I as the programmer, just care about the data--
665
00:30:46,240 --> 00:30:48,340
1, 2, 3, 4, and so forth.
666
00:30:48,340 --> 00:30:52,300
I don't really care how it's stored.
667
00:30:52,300 --> 00:30:54,610
I don't care how it's stored when I'm writing the code,
668
00:30:54,610 --> 00:30:56,443
I just wanted to work at the end of the day.
669
00:30:56,443 --> 00:30:58,570
So suppose that I first insert my number 1.
670
00:30:58,570 --> 00:31:02,110
And, who knows, it ends up, up there at location 0x123,
671
00:31:02,110 --> 00:31:03,320
for the sake of discussion.
672
00:31:03,320 --> 00:31:03,820
All right.
673
00:31:03,820 --> 00:31:06,070
Maybe there's something already here.
674
00:31:06,070 --> 00:31:08,110
And heck, maybe there's something already here,
675
00:31:08,110 --> 00:31:11,095
but there's plenty of other options for where this thing can go.
676
00:31:11,095 --> 00:31:12,970
And suppose that, for the sake of discussion,
677
00:31:12,970 --> 00:31:14,803
the first available spot for the next number
678
00:31:14,803 --> 00:31:20,612
happens to be over here at location 0x456, for the sake of discussion.
679
00:31:20,612 --> 00:31:22,570
So that's where I'm going to plop the number 2.
680
00:31:22,570 --> 00:31:24,070
And where might the number 3 end up?
681
00:31:24,070 --> 00:31:26,860
Oh I don't know, maybe down over there at 0x789.
682
00:31:26,860 --> 00:31:31,030
The point being, I don't know what is, or really care about,
683
00:31:31,030 --> 00:31:33,190
everything else that's in the computer's memory.
684
00:31:33,190 --> 00:31:37,240
I just care that there are at least 3 locations available where
685
00:31:37,240 --> 00:31:40,300
I can put my 1, my 2, and my 3.
686
00:31:40,300 --> 00:31:44,020
But the catch is, now that we're not using an array,
687
00:31:44,020 --> 00:31:48,370
we can't just naively assume that you just add 1 to an index and boom,
688
00:31:48,370 --> 00:31:49,510
you're at the next number.
689
00:31:49,510 --> 00:31:52,960
Add 2 to an index, and boom you're at the next, next number.
690
00:31:52,960 --> 00:31:57,370
Now you have to leave these little breadcrumbs, or use the arrow notation,
691
00:31:57,370 --> 00:31:59,680
to lead from one to the other.
692
00:31:59,680 --> 00:32:01,870
And sometimes, it might be close, a few bytes away.
693
00:32:01,870 --> 00:32:05,810
Maybe, it's a whole gigabyte away in an even bigger computer's memory.
694
00:32:05,810 --> 00:32:07,540
So how might I do this?
695
00:32:07,540 --> 00:32:12,770
Like where do these pointers go, as you proposed?
696
00:32:12,770 --> 00:32:13,270
All right.
697
00:32:13,270 --> 00:32:15,340
All I have access to here are bytes.
698
00:32:15,340 --> 00:32:17,410
I've already stored the 1, the 2, and the 3.
699
00:32:17,410 --> 00:32:19,780
So what more should I do?
700
00:32:19,780 --> 00:32:20,480
OK, yeah.
701
00:32:20,480 --> 00:32:23,370
So, you'd put the pointers right next to these numbers.
702
00:32:23,370 --> 00:32:27,410
So let me at least plan ahead, so that when I ask the computer like malloc,
703
00:32:27,410 --> 00:32:30,470
recall from last week, for some memory, I don't just ask it now
704
00:32:30,470 --> 00:32:32,375
for space for just the number.
705
00:32:32,375 --> 00:32:34,250
Let me start getting into the habit of asking
706
00:32:34,250 --> 00:32:39,350
malloc for enough space for the number and a pointer to another such number.
707
00:32:39,350 --> 00:32:42,060
So it's a little more aggressive of me to ask for more memory.
708
00:32:42,060 --> 00:32:43,340
But I'm planning ahead.
709
00:32:43,340 --> 00:32:45,140
And here is an example of a trade-off.
710
00:32:45,140 --> 00:32:48,920
Almost any time in CS, when you start using more space, you can save time.
711
00:32:48,920 --> 00:32:53,180
Or if you try to conserve space, you might have to lose time.
712
00:32:53,180 --> 00:32:54,680
There's always that trade-off there.
713
00:32:54,680 --> 00:32:56,910
So how might I solve this?
714
00:32:56,910 --> 00:32:58,460
Well let me abstract this away.
715
00:32:58,460 --> 00:33:01,575
And either next to or below, I'm just drawing it vertically, just
716
00:33:01,575 --> 00:33:02,700
for the sake of discussion.
717
00:33:02,700 --> 00:33:04,670
So the arrows are a bit prettier.
718
00:33:04,670 --> 00:33:07,580
I've asked malloc for now twice as much space,
719
00:33:07,580 --> 00:33:09,590
it would seem, than I previously needed.
720
00:33:09,590 --> 00:33:13,535
But I'm going to use this second chunk of memory to refer to the next number.
721
00:33:13,535 --> 00:33:16,160
And I'm going to use this chunk of memory to refer to the next,
722
00:33:16,160 --> 00:33:17,970
essentially, stitching this thing together.
723
00:33:17,970 --> 00:33:20,030
So what should go in this first box?
724
00:33:20,030 --> 00:33:23,600
Well, I claim the number, 0x456.
725
00:33:23,600 --> 00:33:26,300
And it's written in hex because it represents a memory address.
726
00:33:26,300 --> 00:33:30,320
But this is the equivalent of drawing an arrow from one to the other.
727
00:33:30,320 --> 00:33:34,070
As a little check here, what should go in this second box
728
00:33:34,070 --> 00:33:37,940
if the goal is to stitch these together in order 1, 2, 3?
729
00:33:37,940 --> 00:33:40,112
Feel free to just shout this out.
730
00:33:40,112 --> 00:33:41,570
AUDIENCE: 0x789.
731
00:33:41,570 --> 00:33:42,990
SPEAKER 1: OK, that worked well.
732
00:33:42,990 --> 00:33:43,915
So 0x789, indeed.
733
00:33:43,915 --> 00:33:46,790
And you can't do that with the hands because I can't count that fast.
734
00:33:46,790 --> 00:33:51,030
So 0x789 should go here because that's like a little breadcrumb to the next.
735
00:33:51,030 --> 00:33:54,290
And then, we don't really have terribly many possibilities here.
736
00:33:54,290 --> 00:33:56,960
This has to have a value, right.
737
00:33:56,960 --> 00:34:01,830
Because at the end of the day, it's got to use its 64 bits in some way.
738
00:34:01,830 --> 00:34:05,170
So what value should go here, if this is the end of this list?
739
00:34:05,170 --> 00:34:06,170
AUDIENCE: 0x123.
740
00:34:06,170 --> 00:34:08,270
SPEAKER 1: So it could be 0x123.
741
00:34:08,270 --> 00:34:12,050
The implication being that it would be a cyclical list.
742
00:34:12,050 --> 00:34:14,570
Which is OK, but potentially problematic.
743
00:34:14,570 --> 00:34:18,620
If any of you have accidentally lost control over your code space
744
00:34:18,620 --> 00:34:21,680
because you had an infinite loop, this would seem a very easy way
745
00:34:21,680 --> 00:34:26,330
to give yourself the accidental probability of an infinite loop.
746
00:34:26,330 --> 00:34:28,916
What might be simpler than that and ward that off?
747
00:34:28,916 --> 00:34:29,590
AUDIENCE: Null.
748
00:34:29,590 --> 00:34:30,505
SPEAKER 1: Say again?
749
00:34:30,505 --> 00:34:31,130
AUDIENCE: Null.
750
00:34:31,130 --> 00:34:32,840
SPEAKER 1: So just the null character.
751
00:34:32,840 --> 00:34:35,540
Not N-U-L, confusingly, which is at the end of strings.
752
00:34:35,540 --> 00:34:38,550
But N-U-L-L, as we introduced it last week.
753
00:34:38,550 --> 00:34:40,580
Which is the same as 0x0.
754
00:34:40,580 --> 00:34:43,400
So this is just a special value that programmers decades ago
755
00:34:43,400 --> 00:34:47,510
decided that if you store the address 0, that's not a valid address.
756
00:34:47,510 --> 00:34:50,420
There's never going to be anything useful at 0x0.
757
00:34:50,420 --> 00:34:53,600
Therefore, it's a sentinel value, just a special value,
758
00:34:53,600 --> 00:34:54,800
that indicates that's it.
759
00:34:54,800 --> 00:34:56,870
There's nowhere further to go.
760
00:34:56,870 --> 00:35:00,470
It's OK to come back to your suggestion of making a cyclical list.
761
00:35:00,470 --> 00:35:02,390
But we'd better be smart enough to, maybe,
762
00:35:02,390 --> 00:35:06,380
remember where did the list start so that you can detect cycles.
763
00:35:06,380 --> 00:35:08,940
If you start looping around in this structure, otherwise.
764
00:35:08,940 --> 00:35:09,440
All right.
765
00:35:09,440 --> 00:35:11,640
But these addresses, who really cares at the end of the day
766
00:35:11,640 --> 00:35:12,920
if we abstract this away.
767
00:35:12,920 --> 00:35:14,820
It really just now looks like this.
768
00:35:14,820 --> 00:35:17,778
And indeed, this is how most anyone would draw this on a whiteboard
769
00:35:17,778 --> 00:35:19,070
if having a discussion at work.
770
00:35:19,070 --> 00:35:20,862
Talking about what data structure we should
771
00:35:20,862 --> 00:35:22,790
use to solve some problem in the real world.
772
00:35:22,790 --> 00:35:25,040
We don't care generally about the addresses.
773
00:35:25,040 --> 00:35:27,630
We care that in code we can access them.
774
00:35:27,630 --> 00:35:30,590
But in terms of the concept alone this would be, perhaps,
775
00:35:30,590 --> 00:35:32,239
the right way to think about this.
776
00:35:32,239 --> 00:35:34,197
All right, let me pause here and see if there's
777
00:35:34,197 --> 00:35:38,420
any questions on this idea of creating a linked list in memory by just storing,
778
00:35:38,420 --> 00:35:42,540
not just the numbers like 1, 2, 3, but twice as much data.
779
00:35:42,540 --> 00:35:45,110
So that you have little breadcrumbs in the form of pointers
780
00:35:45,110 --> 00:35:48,510
that can lead you from one to the next.
781
00:35:48,510 --> 00:35:50,674
Any questions on these linked lists?
782
00:35:54,130 --> 00:35:54,730
Any questions?
783
00:35:54,730 --> 00:35:55,230
No?
784
00:35:55,230 --> 00:35:55,940
All right.
785
00:35:55,940 --> 00:35:56,440
Oh, yeah.
786
00:35:56,440 --> 00:35:57,431
Over here.
787
00:35:57,431 --> 00:36:02,025
AUDIENCE: So does this take more memory than an array?
788
00:36:02,025 --> 00:36:04,150
SPEAKER 1: This does take more memory than an array
789
00:36:04,150 --> 00:36:06,699
because I now need space for these pointers.
790
00:36:06,699 --> 00:36:10,670
And to be clear, I technically didn't really draw this to scale.
791
00:36:10,670 --> 00:36:13,600
Thus far, in the class, we've generally thought about integers
792
00:36:13,600 --> 00:36:16,510
like, 1, 2 and 3, as being 4 bytes, or 32 bits.
793
00:36:16,510 --> 00:36:19,540
I made the claim last week that on modern computers, pointers
794
00:36:19,540 --> 00:36:22,570
tend to be 8 bytes or 64 bits.
795
00:36:22,570 --> 00:36:25,280
So, technically, this box should actually be a little bigger.
796
00:36:25,280 --> 00:36:26,980
It was just going to look a little stupid in the picture.
797
00:36:26,980 --> 00:36:28,330
So I abstracted it away.
798
00:36:28,330 --> 00:36:31,330
But, indeed, you're using more space as a result.
799
00:36:31,330 --> 00:36:32,787
AUDIENCE: [INAUDIBLE].
800
00:36:32,787 --> 00:36:34,120
SPEAKER 1: Oh, how does-- sorry.
801
00:36:34,120 --> 00:36:37,970
How does the computer identify useful data from used data?
802
00:36:37,970 --> 00:36:40,780
So, for instance, garbage values or non-garbage values.
803
00:36:40,780 --> 00:36:43,420
For now, think of that as the job of malloc.
804
00:36:43,420 --> 00:36:46,810
So when you ask malloc for memory, as we started to last week,
805
00:36:46,810 --> 00:36:49,990
malloc keeps track of the addresses of the memory
806
00:36:49,990 --> 00:36:52,960
it has handed to you as valid values.
807
00:36:52,960 --> 00:36:55,450
The other type of memory you use, not just from the heap.
808
00:36:55,450 --> 00:36:58,390
Because recall we briefly discussed that malloc uses space
809
00:36:58,390 --> 00:37:01,390
from the heap, which was drawn at the top of the picture, pointing down.
810
00:37:01,390 --> 00:37:05,220
There's also stack memory, which is where all of your local variables go.
811
00:37:05,220 --> 00:37:07,720
And where all of the memory used by individual functions goes.
812
00:37:07,720 --> 00:37:10,053
And that was drawn in the picture as working its way up.
813
00:37:10,053 --> 00:37:12,820
That's just an artist's rendition of direction.
814
00:37:12,820 --> 00:37:16,180
The compiler, essentially, will also help
815
00:37:16,180 --> 00:37:19,868
keep track of which values are valid or not inside of the stack.
816
00:37:19,868 --> 00:37:21,910
Or really the underlying code that you've written
817
00:37:21,910 --> 00:37:23,243
will keep track of that for you.
818
00:37:23,243 --> 00:37:26,210
So it's managed for you at that point.
819
00:37:26,210 --> 00:37:26,710
All right.
820
00:37:26,710 --> 00:37:27,310
Good question.
821
00:37:27,310 --> 00:37:29,040
Sorry it took me a bit to catch on.
822
00:37:29,040 --> 00:37:31,210
So let's now translate this to actual code.
823
00:37:31,210 --> 00:37:34,780
How could we implement this idea of, let's call these things nodes.
824
00:37:34,780 --> 00:37:36,160
And that's a term of art in CS.
825
00:37:36,160 --> 00:37:40,210
Whenever you have some data structure that encapsulates information, node,
826
00:37:40,210 --> 00:37:42,947
N-O-D-E, is the generic term for that.
827
00:37:42,947 --> 00:37:44,780
So each of these might be said to be a node.
828
00:37:44,780 --> 00:37:45,830
Well, how can we do this?
829
00:37:45,830 --> 00:37:48,622
Well a couple of weeks ago, we saw how we could represent something
830
00:37:48,622 --> 00:37:50,260
like a student or a candidate.
831
00:37:50,260 --> 00:37:54,940
And a student, or rather a person, we said has a name and a number.
832
00:37:54,940 --> 00:37:56,680
And we used a few pieces of syntax here.
833
00:37:56,680 --> 00:37:59,890
One, we use the struct keyword, which gives us a data structure.
834
00:37:59,890 --> 00:38:04,420
We use typedef, which defines the name person to be our new data
835
00:38:04,420 --> 00:38:06,850
type representing that whole structure.
836
00:38:06,850 --> 00:38:08,950
So we probably have the right ingredients here
837
00:38:08,950 --> 00:38:11,500
to build up this thing called a node.
838
00:38:11,500 --> 00:38:14,620
And just to be clear, what should go inside of one of these nodes,
839
00:38:14,620 --> 00:38:15,435
do we think?
840
00:38:15,435 --> 00:38:17,560
It's not going to be a name or a number, obviously.
841
00:38:17,560 --> 00:38:22,250
But what should a node have in terms of those fields, perhaps?
842
00:38:22,250 --> 00:38:22,750
Yeah?
843
00:38:22,750 --> 00:38:23,625
AUDIENCE: [? Data. ?]
844
00:38:23,625 --> 00:38:26,600
SPEAKER 1: So a number like a number and a pointer in some form.
845
00:38:26,600 --> 00:38:28,850
So let's translate this to actual code.
846
00:38:28,850 --> 00:38:33,610
So let's rename person to node to capture this notion here.
847
00:38:33,610 --> 00:38:34,865
And the number is easy.
848
00:38:34,865 --> 00:38:36,740
If it's just going to be an int, that's fine.
849
00:38:36,740 --> 00:38:38,980
We can just say int number, or int n, or whatever
850
00:38:38,980 --> 00:38:41,380
you want to call that particular field.
851
00:38:41,380 --> 00:38:43,072
The next one is a little non-obvious.
852
00:38:43,072 --> 00:38:45,280
And this is where things get a little weird at first,
853
00:38:45,280 --> 00:38:47,830
but, in retrospect, it should all fit together.
854
00:38:47,830 --> 00:38:53,630
Let me propose that, ideally, we would say something like node* next.
855
00:38:53,630 --> 00:38:55,930
And I could call the word next anything I want.
856
00:38:55,930 --> 00:39:00,110
Next just means what comes after me; that's the notion I'm using it for.
857
00:39:00,110 --> 00:39:02,500
So a lot of CS people would just use next to represent
858
00:39:02,500 --> 00:39:03,880
the name of this pointer.
859
00:39:03,880 --> 00:39:05,260
But there's a catch here.
860
00:39:05,260 --> 00:39:08,440
C and C compilers are pretty naive, recall.
861
00:39:08,440 --> 00:39:11,660
They only look at code top to bottom, left to right.
862
00:39:11,660 --> 00:39:13,840
And any time they encounter a word they have never
863
00:39:13,840 --> 00:39:15,513
seen before, bad things happen.
864
00:39:15,513 --> 00:39:16,930
Like, you can't compile your code.
865
00:39:16,930 --> 00:39:18,920
You get some cryptic error message or the like.
866
00:39:18,920 --> 00:39:21,910
And that seems to be about to happen here.
867
00:39:21,910 --> 00:39:24,970
Because if the compiler is reading this code from top to bottom,
868
00:39:24,970 --> 00:39:27,340
it's going to say, oh, inside of this struct
869
00:39:27,340 --> 00:39:29,140
should be a variable called next.
870
00:39:29,140 --> 00:39:31,000
Which is of type node*.
871
00:39:31,000 --> 00:39:32,200
What the heck is a node?
872
00:39:32,200 --> 00:39:35,470
Because it literally does not find out until 2 lines
873
00:39:35,470 --> 00:39:37,720
later, after that semicolon.
874
00:39:37,720 --> 00:39:40,330
So the way to avoid this, which we haven't quite seen before,
875
00:39:40,330 --> 00:39:45,220
is that you can temporarily name this whole thing up here, struct node.
876
00:39:45,220 --> 00:39:50,560
And then, down here inside of the data structure, you say struct node*.
877
00:39:50,560 --> 00:39:52,210
And then, you leave the rest alone.
878
00:39:52,210 --> 00:39:56,620
This is a workaround. This is possible because now you're
879
00:39:56,620 --> 00:39:59,740
teaching the compiler, from the first line, that here comes
880
00:39:59,740 --> 00:40:01,960
a data structure called struct node.
881
00:40:01,960 --> 00:40:05,420
Down here, you're shortening the name of this whole thing to just node.
882
00:40:05,420 --> 00:40:05,920
Why?
883
00:40:05,920 --> 00:40:09,003
It's just a little more convenient than having to write struct everywhere.
884
00:40:09,003 --> 00:40:12,760
But you do have to write struct node* inside of the data structure.
885
00:40:12,760 --> 00:40:15,730
But that's OK because it's already come into existence
886
00:40:15,730 --> 00:40:17,892
now, as of that first line of code.
887
00:40:17,892 --> 00:40:19,600
So that's the only fundamental difference
888
00:40:19,600 --> 00:40:22,900
between what we did last week with a person or a candidate.
889
00:40:22,900 --> 00:40:27,890
We just now have to use this struct workaround, syntactically.
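Written out, the definition being described would look roughly like the sketch below. The field names number and next follow the lecture; nothing else is assumed.

```c
#include <stddef.h>

// Naming the struct "struct node" on the first line lets the compiler
// recognize "struct node *" inside the braces, before the shorter
// name "node" comes into existence on the last line.
typedef struct node
{
    int number;         // the data stored in this node
    struct node *next;  // address of the node that comes after this one
} node;
```

After this definition, node and struct node refer to the same type, so code outside the braces can say node * instead of struct node *.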
890
00:40:27,890 --> 00:40:28,390
All right.
891
00:40:28,390 --> 00:40:29,170
Yeah, question.
892
00:40:29,170 --> 00:40:33,010
AUDIENCE: So [INAUDIBLE] have like right next to the [INAUDIBLE] point
893
00:40:33,010 --> 00:40:33,970
to another [INAUDIBLE].
894
00:40:33,970 --> 00:40:39,070
SPEAKER 1: Why is the next variable a struct node* pointer and not an int*
895
00:40:39,070 --> 00:40:41,150
pointer, for instance?
896
00:40:41,150 --> 00:40:43,870
So think about the picture we are trying to draw.
897
00:40:43,870 --> 00:40:47,740
Technically, yes, each of these arrows I deliberately drew
898
00:40:47,740 --> 00:40:49,240
is pointing at the number.
899
00:40:49,240 --> 00:40:50,500
But that's not alone.
900
00:40:50,500 --> 00:40:53,320
They need to point at the whole data structure in memory.
901
00:40:53,320 --> 00:40:55,600
Because the computer, ultimately, and the compiler,
902
00:40:55,600 --> 00:40:59,470
in turn, needs to know that this chunk of memory is not just an int.
903
00:40:59,470 --> 00:41:01,040
It is a whole node.
904
00:41:01,040 --> 00:41:04,370
Inside of a node is a number and also another pointer.
905
00:41:04,370 --> 00:41:06,770
So when you draw these arrows, it would be
906
00:41:06,770 --> 00:41:09,380
incorrect to point at just the number.
907
00:41:09,380 --> 00:41:11,757
Because that throws away information that
908
00:41:11,757 --> 00:41:14,090
would leave the compiler wondering, OK, I'm at a number.
909
00:41:14,090 --> 00:41:15,200
Where the heck is the pointer?
910
00:41:15,200 --> 00:41:17,450
You have to tell it that it's pointing at a whole node
911
00:41:17,450 --> 00:41:20,857
so it knows a few bytes away is that corresponding pointer.
912
00:41:20,857 --> 00:41:21,440
Good question.
913
00:41:21,440 --> 00:41:23,183
Yeah.
914
00:41:23,183 --> 00:41:24,630
AUDIENCE: How do you [INAUDIBLE].
915
00:41:24,630 --> 00:41:25,963
SPEAKER 1: Really good question.
916
00:41:25,963 --> 00:41:29,250
It would seem that just as copying the array earlier
917
00:41:29,250 --> 00:41:32,460
required twice as much memory, because we copied from old to new.
918
00:41:32,460 --> 00:41:35,130
So, technically, twice as much plus 1 for the new number.
919
00:41:35,130 --> 00:41:38,520
Here, too, it looks like we're using twice as much memory, also.
920
00:41:38,520 --> 00:41:41,400
And to my comment earlier, it's even more than twice as much memory
921
00:41:41,400 --> 00:41:45,270
because these pointers are 8 bytes, and not just 4 bytes like a typical integer
922
00:41:45,270 --> 00:41:45,870
is.
923
00:41:45,870 --> 00:41:47,280
The differences are these.
924
00:41:47,280 --> 00:41:50,910
In the context of the array, you were using that memory temporarily.
925
00:41:50,910 --> 00:41:52,750
So, yes, you needed twice as much memory.
926
00:41:52,750 --> 00:41:55,600
But then you were quickly freeing the original array.
927
00:41:55,600 --> 00:41:58,890
So you weren't consuming long-term, more memory than you might need.
928
00:41:58,890 --> 00:42:02,290
The difference here, too, is that, as we'll see in a moment,
929
00:42:02,290 --> 00:42:05,670
it turns out it's going to be relatively quick for me, potentially,
930
00:42:05,670 --> 00:42:07,620
to insert new numbers in here.
931
00:42:07,620 --> 00:42:10,620
Because I'm not going to have to do a huge amount of copying.
932
00:42:10,620 --> 00:42:13,800
And even though I might still have to follow all of these arrows, which
933
00:42:13,800 --> 00:42:16,080
is going to take some amount of time, I'm
934
00:42:16,080 --> 00:42:19,470
not going to have to be asking for more memory, freeing more memory.
935
00:42:19,470 --> 00:42:23,190
And certain operations in the computer, anything involving asking for or giving
936
00:42:23,190 --> 00:42:25,000
back memory, tends to be slower.
937
00:42:25,000 --> 00:42:26,858
So we get to avoid that situation as well.
938
00:42:26,858 --> 00:42:28,650
There's going to be some downsides, though.
939
00:42:28,650 --> 00:42:29,700
This is not all upside.
940
00:42:29,700 --> 00:42:33,760
But we'll see in a bit just what some of those trade offs actually are.
941
00:42:33,760 --> 00:42:34,260
All right.
942
00:42:34,260 --> 00:42:38,740
So from here, if we go back to the structure in code as we left it,
943
00:42:38,740 --> 00:42:41,820
let's start to now build up a linked list with some actual code.
944
00:42:41,820 --> 00:42:46,200
How do you go about, in C, representing a linked list in code?
945
00:42:46,200 --> 00:42:48,780
Well, at the moment, it would actually be as simple as this.
946
00:42:48,780 --> 00:42:51,930
You declare a variable, called list, for instance.
947
00:42:51,930 --> 00:42:54,970
That itself stores the address of a node.
948
00:42:54,970 --> 00:42:56,010
That's what node* means.
949
00:42:56,010 --> 00:42:57,220
The address of a node.
950
00:42:57,220 --> 00:42:59,880
So if you want to store a linked list in memory,
951
00:42:59,880 --> 00:43:02,397
you just create a variable called list, or whatever else.
952
00:43:02,397 --> 00:43:04,230
And you just say that this variable is going
953
00:43:04,230 --> 00:43:08,430
to be pointing at the first node in a list, wherever it happens to end up.
954
00:43:08,430 --> 00:43:12,270
Because malloc is ultimately going to be the tool that we use just to go
955
00:43:12,270 --> 00:43:16,270
get at any one particular node in memory.
956
00:43:16,270 --> 00:43:16,770
All right.
957
00:43:16,770 --> 00:43:18,690
So let's actually do this in pictorial form.
958
00:43:18,690 --> 00:43:21,690
When you write a line of code, like I just did here--
959
00:43:21,690 --> 00:43:25,680
and I do not initialize it to anything with the assignment operator,
960
00:43:25,680 --> 00:43:26,730
an equal sign.
961
00:43:26,730 --> 00:43:30,720
It does exist in memory as a box, as I'll draw it here, called list.
962
00:43:30,720 --> 00:43:33,430
But I've deliberately drawn Oscar inside of it.
963
00:43:33,430 --> 00:43:33,930
Why?
964
00:43:33,930 --> 00:43:35,630
To connote what exactly?
965
00:43:35,630 --> 00:43:36,630
AUDIENCE: Garbage value.
966
00:43:36,630 --> 00:43:37,963
SPEAKER 1: It's a garbage value.
967
00:43:37,963 --> 00:43:42,400
I have been allocated the variable in memory, called list.
968
00:43:42,400 --> 00:43:46,470
Which is going to give me 64 bits or 8 bytes somewhere drawn here
969
00:43:46,470 --> 00:43:47,470
with this box.
970
00:43:47,470 --> 00:43:50,220
But if I myself have not used the assignment operator,
971
00:43:50,220 --> 00:43:53,830
it's not going to get magically initialized to any particular address
972
00:43:53,830 --> 00:43:54,330
for me.
973
00:43:54,330 --> 00:43:56,470
It's not going to even give me a node.
974
00:43:56,470 --> 00:44:01,150
This is literally just going to be an address of a future node that exists.
975
00:44:01,150 --> 00:44:02,760
So what would be a solution here?
976
00:44:02,760 --> 00:44:05,760
Suppose that I'm beginning to create my linked list,
977
00:44:05,760 --> 00:44:07,290
but I don't have any nodes yet.
978
00:44:07,290 --> 00:44:11,302
What would be a sensible thing to initialize the list to, perhaps?
979
00:44:11,302 --> 00:44:12,122
AUDIENCE: Null.
980
00:44:12,122 --> 00:44:13,080
SPEAKER 1: Yeah, again.
981
00:44:13,080 --> 00:44:13,838
AUDIENCE: To null.
982
00:44:13,838 --> 00:44:15,130
SPEAKER 1: So just null, right.
983
00:44:15,130 --> 00:44:16,860
When in doubt with pointers, generally it's
984
00:44:16,860 --> 00:44:18,610
a good thing to initialize things to null,
985
00:44:18,610 --> 00:44:20,160
so at least it's not a garbage value.
986
00:44:20,160 --> 00:44:21,420
It's a known value.
987
00:44:21,420 --> 00:44:22,418
Invalid, yes.
988
00:44:22,418 --> 00:44:24,210
But it's a special value you can then check
989
00:44:24,210 --> 00:44:26,140
for with a conditional, or the like.
990
00:44:26,140 --> 00:44:30,120
So this might be a better way to create a linked list,
991
00:44:30,120 --> 00:44:34,120
even before you've inserted any numbers into the thing itself.
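As a small sketch of why null is a useful starting value: an empty list can be detected with a plain comparison. The helper name list_is_empty is hypothetical, not something from the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: a list with no nodes is represented by a NULL pointer,
// so checking for emptiness is a single comparison.
bool list_is_empty(const node *list)
{
    return list == NULL;
}
```

A caller would declare node *list = NULL; and could then check list_is_empty(list) before following any arrows. With a garbage value instead of NULL, no such check would be reliable.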
992
00:44:34,120 --> 00:44:34,620
All right.
993
00:44:34,620 --> 00:44:37,835
So after that, how can we go about adding something to this linked list?
994
00:44:37,835 --> 00:44:39,210
So now the story looks like this.
995
00:44:39,210 --> 00:44:42,150
Oscar is gone because inside of this box is all zero bits.
996
00:44:42,150 --> 00:44:46,050
Just because it's nice and clean, and this represents an empty linked list.
997
00:44:46,050 --> 00:44:50,590
Well, if I want to add the number 1 to this linked list, what could I do?
998
00:44:50,590 --> 00:44:52,590
Well, perhaps I could start with code like this.
999
00:44:52,590 --> 00:44:54,300
Borrowing inspiration from last week.
1000
00:44:54,300 --> 00:44:58,920
Let's ask malloc for enough space for the size of a node.
1001
00:44:58,920 --> 00:45:03,060
And this gets to your question earlier, like, what is it I'm manipulating here?
1002
00:45:03,060 --> 00:45:06,360
I don't just need space for an int and I don't just need space for a pointer.
1003
00:45:06,360 --> 00:45:07,440
I need space for both.
1004
00:45:07,440 --> 00:45:10,150
And I gave that thing a name, node.
1005
00:45:10,150 --> 00:45:12,930
So size of node figures out and does the arithmetic for me.
1006
00:45:12,930 --> 00:45:15,390
And gives me back the right number of bytes.
1007
00:45:15,390 --> 00:45:18,930
This, then, stores the address of that chunk of memory
1008
00:45:18,930 --> 00:45:20,880
in what I'll temporarily call n.
1009
00:45:20,880 --> 00:45:23,160
Just to represent a generic new node.
1010
00:45:23,160 --> 00:45:24,870
And it's of type node*.
1011
00:45:24,870 --> 00:45:28,080
Because just like last week when I asked malloc for enough space for an int
1012
00:45:28,080 --> 00:45:30,360
and I stored it in an int* pointer.
1013
00:45:30,360 --> 00:45:32,760
This week, if I'm asking for memory for a node,
1014
00:45:32,760 --> 00:45:35,340
I'm storing it in a node* pointer.
1015
00:45:35,340 --> 00:45:38,520
So technically, nothing new there except for this new term
1016
00:45:38,520 --> 00:45:41,020
of art and data structure called node.
1017
00:45:41,020 --> 00:45:41,520
All right.
1018
00:45:41,520 --> 00:45:42,870
So what does that do for me?
1019
00:45:42,870 --> 00:45:45,660
It essentially draws a picture like this in memory.
1020
00:45:45,660 --> 00:45:49,690
I still have my list variable from my previous line of code initialized
1021
00:45:49,690 --> 00:45:50,190
to null.
1022
00:45:50,190 --> 00:45:51,648
And that's why I've drawn it blank.
1023
00:45:51,648 --> 00:45:54,060
I also now have a temporary variable called
1024
00:45:54,060 --> 00:45:57,570
n, which I initialize to the return value of malloc.
1025
00:45:57,570 --> 00:45:59,650
Which gave me one of these nodes in memory.
1026
00:45:59,650 --> 00:46:02,130
But I've drawn it having garbage values, too,
1027
00:46:02,130 --> 00:46:03,850
because I don't know what int is there.
1028
00:46:03,850 --> 00:46:05,308
I don't know what pointer is there.
1029
00:46:05,308 --> 00:46:09,600
It's garbage values because malloc does not magically initialize memory for me.
1030
00:46:09,600 --> 00:46:11,250
There is another function for that.
1031
00:46:11,250 --> 00:46:14,100
But malloc alone just says, sure, use this chunk of memory.
1032
00:46:14,100 --> 00:46:15,910
Deal with whatever is there.
1033
00:46:15,910 --> 00:46:18,900
So how can I go about initializing this to known values?
1034
00:46:18,900 --> 00:46:23,440
Well, suppose I want to insert the number 1 and then, leave it at that.
1035
00:46:23,440 --> 00:46:27,212
A list of size 1, I could do something like this.
1036
00:46:27,212 --> 00:46:29,920
And this is where you have to think back to some of these basics.
1037
00:46:29,920 --> 00:46:34,060
My conditional here is asking the question if n does not equal null.
1038
00:46:34,060 --> 00:46:37,210
So that is, if malloc gave me valid memory,
1039
00:46:37,210 --> 00:46:40,690
and I don't have to quit altogether because my computer's out of memory.
1040
00:46:40,690 --> 00:46:44,590
If n does not equal null, but is equal to a valid address,
1041
00:46:44,590 --> 00:46:46,070
I'm going to go ahead and do this.
1042
00:46:46,070 --> 00:46:48,820
And this is cryptic looking syntax now.
1043
00:46:48,820 --> 00:46:52,150
But does someone want to take a stab at translating this inside line of code
1044
00:46:52,150 --> 00:46:56,380
to English, in some sense?
1045
00:46:56,380 --> 00:47:00,520
How might you explain what that inner line of code is doing?
1046
00:47:00,520 --> 00:47:03,130
(*n).number equals 1.
1047
00:47:03,130 --> 00:47:05,355
Let me go further back.
1048
00:47:05,355 --> 00:47:06,477
Nope?
1049
00:47:06,477 --> 00:47:07,060
OK, over here.
1050
00:47:07,060 --> 00:47:07,772
Yeah.
1051
00:47:07,772 --> 00:47:09,010
AUDIENCE: [INAUDIBLE].
1052
00:47:09,010 --> 00:47:09,802
SPEAKER 1: Perfect.
1053
00:47:09,802 --> 00:47:12,160
The place that n is pointing to, set it equal to 1.
1054
00:47:12,160 --> 00:47:16,060
Or using the vernacular of going there, go to the address in n
1055
00:47:16,060 --> 00:47:18,480
and set its number field to 1.
1056
00:47:18,480 --> 00:47:20,480
However you want to think about it, that's fine.
1057
00:47:20,480 --> 00:47:22,930
But the * again is the dereference operator here.
1058
00:47:22,930 --> 00:47:24,730
And we're doing the parentheses, which we
1059
00:47:24,730 --> 00:47:28,240
haven't needed to do before because we haven't dealt with pointers and data
1060
00:47:28,240 --> 00:47:30,010
structures together until today.
1061
00:47:30,010 --> 00:47:32,380
This just means go there first.
1062
00:47:32,380 --> 00:47:34,720
And then once you're there, go access number.
1063
00:47:34,720 --> 00:47:36,830
You don't want to do one thing before the other.
1064
00:47:36,830 --> 00:47:38,890
So this is just enforcing order of operations.
1065
00:47:38,890 --> 00:47:41,300
The parentheses just like in grade school math.
1066
00:47:41,300 --> 00:47:41,800
All right.
1067
00:47:41,800 --> 00:47:43,210
So this line of code is cryptic.
1068
00:47:43,210 --> 00:47:43,982
It's ugly.
1069
00:47:43,982 --> 00:47:45,940
It's not something most people easily remember.
1070
00:47:45,940 --> 00:47:49,750
Thankfully, there's that syntactic sugar that simplifies this line of code
1071
00:47:49,750 --> 00:47:50,857
to just this.
1072
00:47:50,857 --> 00:47:52,690
And this, even though it's new to you today,
1073
00:47:52,690 --> 00:47:54,820
should eventually feel a little more familiar.
1074
00:47:54,820 --> 00:47:58,210
Because this now is shorthand notation for saying, start at n.
1075
00:47:58,210 --> 00:48:00,410
Go there as by following the arrow.
1076
00:48:00,410 --> 00:48:02,530
And when you get there, change the number field.
1077
00:48:02,530 --> 00:48:04,720
In this case, to 1.
1078
00:48:04,720 --> 00:48:07,240
So most people would not write code like this.
1079
00:48:07,240 --> 00:48:08,030
It's just ugly.
1080
00:48:08,030 --> 00:48:09,430
It's a couple extra keystrokes.
1081
00:48:09,430 --> 00:48:13,300
This just looks more like the artist's renditions we've been talking about.
1082
00:48:13,300 --> 00:48:17,530
And how most CS people would think about pointers as really just being arrows
1083
00:48:17,530 --> 00:48:18,710
in some form.
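The two spellings can be put side by side in a short sketch. The function names set_number_long and set_number_arrow are hypothetical, chosen only to contrast the two forms; both do the same thing.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Long form: the parentheses force "go there first" (dereference n),
// and only then is the number field accessed.
void set_number_long(node *n, int value)
{
    if (n != NULL)
    {
        (*n).number = value;
    }
}

// Short form: the arrow performs the same dereference and field access.
void set_number_arrow(node *n, int value)
{
    if (n != NULL)
    {
        n->number = value;
    }
}
```

Without the parentheses, *n.number would be read as *(n.number), which does not compile here because n is a pointer and has no fields of its own.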
1084
00:48:18,710 --> 00:48:19,210
All right.
1085
00:48:19,210 --> 00:48:20,293
So what have we just done?
1086
00:48:20,293 --> 00:48:24,650
The picture now, after setting number to 1, looks a little something like this.
1087
00:48:24,650 --> 00:48:26,440
So there's still one step missing.
1088
00:48:26,440 --> 00:48:28,720
And that's, of course, to initialize, it would seem,
1089
00:48:28,720 --> 00:48:33,080
the pointer in this new node to something known like null.
1090
00:48:33,080 --> 00:48:34,735
So I bet we could do this like this.
1091
00:48:34,735 --> 00:48:36,610
With a different line of code, I'm just going
1092
00:48:36,610 --> 00:48:42,880
to say if n does not equal null, then set n's next field to null.
1093
00:48:42,880 --> 00:48:46,540
Or more pedantically, go to n, follow the arrow,
1094
00:48:46,540 --> 00:48:50,440
and then update the next field that you find there to equal null.
1095
00:48:50,440 --> 00:48:52,690
And again, this is just doing some nice bookkeeping.
1096
00:48:52,690 --> 00:48:55,870
Technically speaking, we might not need to set
1097
00:48:55,870 --> 00:48:58,910
this to null if we're going to keep adding more and more numbers to it.
1098
00:48:58,910 --> 00:49:02,110
But I'm doing it step-by-step so that I have a very clean picture.
1099
00:49:02,110 --> 00:49:05,800
And there's no bugs in my code at this point.
1100
00:49:05,800 --> 00:49:07,270
But I'm still not done.
1101
00:49:07,270 --> 00:49:09,730
There's one last thing I'm going to have to do here.
1102
00:49:09,730 --> 00:49:14,950
If the goal, ultimately, was to insert the number 1 into my linked list,
1103
00:49:14,950 --> 00:49:18,860
what's the last step I should, perhaps, do here?
1104
00:49:18,860 --> 00:49:20,050
Just in English is fine.
1105
00:49:20,050 --> 00:49:20,550
Yeah.
1106
00:49:20,550 --> 00:49:23,260
AUDIENCE: Set the pointer value to null.
1107
00:49:23,260 --> 00:49:24,010
SPEAKER 1: Yes.
1108
00:49:24,010 --> 00:49:27,970
I now need to update the actual variable, that represents my linked
1109
00:49:27,970 --> 00:49:31,030
list, to point at this brand new node.
1110
00:49:31,030 --> 00:49:35,317
That is now perfectly initialized as having an integer and a null pointer.
1111
00:49:35,317 --> 00:49:37,400
Yeah, technically, this is already pointing there.
1112
00:49:37,400 --> 00:49:40,090
But I described this deliberately earlier as being temporary.
1113
00:49:40,090 --> 00:49:44,620
I just needed this to get it back from malloc and clean things up, initially.
1114
00:49:44,620 --> 00:49:47,230
This is the long term variable I care about.
1115
00:49:47,230 --> 00:49:49,480
So I'm going to want to do something simple like this.
1116
00:49:49,480 --> 00:49:51,520
List equals n.
1117
00:49:51,520 --> 00:49:53,863
And this seems a little weird that list equals n.
1118
00:49:53,863 --> 00:49:55,780
But again, think about what's inside this box.
1119
00:49:55,780 --> 00:49:57,988
At the moment this is null because there is no linked
1120
00:49:57,988 --> 00:49:59,530
list at the beginning of our story.
1121
00:49:59,530 --> 00:50:03,910
N is the address of the beginning, and it turns out, end of our linked list.
1122
00:50:03,910 --> 00:50:07,300
So it stands to reason that if you set list equal to n,
1123
00:50:07,300 --> 00:50:10,180
that has the effect of copying this address up here.
1124
00:50:10,180 --> 00:50:13,283
Or really just copying the arrow into that same location
1125
00:50:13,283 --> 00:50:14,950
so that now the picture looks like this.
1126
00:50:14,950 --> 00:50:18,340
And heck, if this was a temporary variable, it will eventually go away.
1127
00:50:18,340 --> 00:50:19,870
And now, this is the picture.
1128
00:50:19,870 --> 00:50:22,030
So an annoying number of steps, certainly,
1129
00:50:22,030 --> 00:50:24,520
to walk through verbally like this.
1130
00:50:24,520 --> 00:50:26,680
But it's just malloc to give yourself a node,
1131
00:50:26,680 --> 00:50:31,930
initialize the 2 fields inside of it, update the linked list, and boom,
1132
00:50:31,930 --> 00:50:32,770
you're on your way.
1133
00:50:32,770 --> 00:50:34,910
I didn't have to copy anything.
1134
00:50:34,910 --> 00:50:38,132
I just had to insert something in this case.
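The steps just walked through (malloc a node, initialize its two fields, update the list) can be sketched as follows. Wrapping the first two steps in a function called make_node is an assumption made here for illustration; the lecture writes the same lines directly in main.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: allocate one node and initialize both of its fields.
// Returns NULL if malloc could not provide the memory.
node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        n->number = number; // replace the garbage int with a known value
        n->next = NULL;     // replace the garbage pointer with NULL
    }
    return n;
}
```

Starting from node *list = NULL;, inserting the number 1 is then node *n = make_node(1); followed by list = n; once n is known not to be NULL. The caller is responsible for eventually calling free on the node.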
1135
00:50:38,132 --> 00:50:40,840
Let me pause here to see if there's any questions on those steps.
1136
00:50:40,840 --> 00:50:44,790
And we'll see it all in context before long with some larger code.
1137
00:50:44,790 --> 00:50:48,965
AUDIENCE: So if the statements [INAUDIBLE].
1138
00:50:48,965 --> 00:50:49,590
SPEAKER 1: Yes.
1139
00:50:49,590 --> 00:50:53,010
I drew them separately just for the sake of the voiceover
1140
00:50:53,010 --> 00:50:55,020
of doing each thing very methodically.
1141
00:50:55,020 --> 00:50:57,090
In real code, as we'll transition to now,
1142
00:50:57,090 --> 00:50:59,220
I could have and should have just done it
1143
00:50:59,220 --> 00:51:03,000
all inside of one conditional after checking if n is not equal to null.
1144
00:51:03,000 --> 00:51:05,310
I could set number to a value like 1.
1145
00:51:05,310 --> 00:51:08,415
And I could set the pointer itself to something like null.
1146
00:51:08,415 --> 00:51:09,030
All right.
1147
00:51:09,030 --> 00:51:12,600
Well let's translate, then, this into some similar code
1148
00:51:12,600 --> 00:51:17,340
that allows us to build up a linked list now using code similar in spirit
1149
00:51:17,340 --> 00:51:18,150
to before.
1150
00:51:18,150 --> 00:51:19,900
But now, using this new primitive.
1151
00:51:19,900 --> 00:51:22,140
So I'm going to go back into VS Code here.
1152
00:51:22,140 --> 00:51:25,470
I'm going to go ahead now and delete the entirety of this old version that
1153
00:51:25,470 --> 00:51:27,270
was entirely array-based.
1154
00:51:27,270 --> 00:51:32,470
And now, inside of my main function, I'm going to go ahead and first do this.
1155
00:51:32,470 --> 00:51:36,180
I'm going to first give myself a list of size 0.
1156
00:51:36,180 --> 00:51:38,610
And I'm going to call that node* list.
1157
00:51:38,610 --> 00:51:41,610
And I'm going to initialize that to null, as we proposed earlier.
1158
00:51:41,610 --> 00:51:44,760
But I'm also now going to have to take the additional step of defining
1159
00:51:44,760 --> 00:51:45,970
what this node is.
1160
00:51:45,970 --> 00:51:49,500
So recall that I might do something like typedef, struct node.
1161
00:51:49,500 --> 00:51:52,320
Inside of this struct node, I'm going to have a number, which
1162
00:51:52,320 --> 00:51:54,010
I'll call number of type int.
1163
00:51:54,010 --> 00:51:56,160
And I'm going to have a structure called node
1164
00:51:56,160 --> 00:51:59,470
with a * that says the next pointer is called next.
1165
00:51:59,470 --> 00:52:03,150
And I'm going to call this whole thing, more succinctly, node,
1166
00:52:03,150 --> 00:52:04,830
instead of struct node.
1167
00:52:04,830 --> 00:52:07,920
Now as an aside, for those of you wondering what the difference really
1168
00:52:07,920 --> 00:52:09,600
is between struct and node.
1169
00:52:09,600 --> 00:52:12,450
Technically, I could do something like this.
1170
00:52:12,450 --> 00:52:15,960
Not use typedef and not use the word node alone.
1171
00:52:15,960 --> 00:52:19,680
This syntax here would actually create for me a new data
1172
00:52:19,680 --> 00:52:22,830
type called, verbosely, struct node.
1173
00:52:22,830 --> 00:52:25,440
And I could use this throughout my code saying struct node.
1174
00:52:25,440 --> 00:52:26,460
Struct node.
1175
00:52:26,460 --> 00:52:27,840
That just gets a little tedious.
1176
00:52:27,840 --> 00:52:30,715
And it would be nicer just to refer to this thing more simplistically
1177
00:52:30,715 --> 00:52:31,750
as a node.
1178
00:52:31,750 --> 00:52:34,230
So what typedef has been doing for us is it,
1179
00:52:34,230 --> 00:52:37,770
again, lets us invent our own word that's even more succinct.
1180
00:52:37,770 --> 00:52:41,040
And this just has the effect now of calling this whole thing
1181
00:52:41,040 --> 00:52:44,760
node without the need, subsequently, to keep saying struct all over the place.
1182
00:52:44,760 --> 00:52:46,170
Just FYI.
1183
00:52:46,170 --> 00:52:46,680
All right.
1184
00:52:46,680 --> 00:52:50,050
So now that this thing exists in main, let's go ahead and do this.
1185
00:52:50,050 --> 00:52:52,770
Let's add a number to list.
1186
00:52:52,770 --> 00:52:55,440
And to do this, I'm going to give myself a temporary variable.
1187
00:52:55,440 --> 00:52:57,340
I'll call it n for consistency.
1188
00:52:57,340 --> 00:53:00,540
I'm going to use malloc to give myself the size of a node,
1189
00:53:00,540 --> 00:53:02,080
just like in our slides.
1190
00:53:02,080 --> 00:53:03,540
And then, I'm going to do a little safety check.
1191
00:53:03,540 --> 00:53:06,470
If n equals equals null, I'm going to do the opposite of the slides.
1192
00:53:06,470 --> 00:53:08,220
I'm just going to quit out of this program
1193
00:53:08,220 --> 00:53:10,960
because there's nothing useful to be done at this point.
1194
00:53:10,960 --> 00:53:13,570
But most likely my computer is not going to run out of memory.
1195
00:53:13,570 --> 00:53:16,750
So I'm going to assume we can keep going with some of the logic here.
1196
00:53:16,750 --> 00:53:21,390
If n does not equal null, and that is it's a valid memory address,
1197
00:53:21,390 --> 00:53:23,370
I'm going to say n arrow--
1198
00:53:23,370 --> 00:53:24,930
I'm going to build this up backwards.
1199
00:53:24,930 --> 00:53:26,707
Well let's do.
1200
00:53:26,707 --> 00:53:28,290
That's OK, let's go ahead and do this.
1201
00:53:28,290 --> 00:53:30,600
n->number equals 1.
1202
00:53:30,600 --> 00:53:35,490
And then n->next equals null.
1203
00:53:35,490 --> 00:53:42,420
And now, update list to point to new node, list equals n.
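The steps just narrated can be sketched roughly as follows. This is an illustration rather than the exact file on screen: the function name make_first_node is hypothetical, and it returns NULL on failure instead of quitting the program from main.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: builds a list holding just the number 1.
// Returns the address of that node, or NULL if malloc fails.
node *make_first_node(void)
{
    node *list = NULL;               // the list starts out empty

    node *n = malloc(sizeof(node));  // room for exactly one node
    if (n == NULL)
    {
        return NULL;                 // out of memory, nothing to clean up yet
    }

    n->number = 1;                   // fill in the number field
    n->next = NULL;                  // no second node yet
    list = n;                        // update list to point to the new node
    return list;
}
```

The caller is responsible for eventually calling free on the address that comes back.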
1204
00:53:42,420 --> 00:53:44,580
So at this point in the story, we've essentially
1205
00:53:44,580 --> 00:53:49,330
constructed what was that first picture, which looks like this.
1206
00:53:49,330 --> 00:53:53,880
This is the corresponding code via which we built up this node in memory.
1207
00:53:53,880 --> 00:53:56,860
Suppose now, we want to add the number 2 to the list.
1208
00:53:56,860 --> 00:53:58,080
So let's do this again.
1209
00:53:58,080 --> 00:54:02,550
Add a number to list.
1210
00:54:02,550 --> 00:54:03,910
How might I do this?
1211
00:54:03,910 --> 00:54:06,330
Well, I don't need to redeclare n because I can use
1212
00:54:06,330 --> 00:54:08,110
the same temporary variable as before.
1213
00:54:08,110 --> 00:54:13,310
So this time, I'm just going to say n equals malloc and the size of a node.
1214
00:54:13,310 --> 00:54:15,060
I'm, again, going to have my safety check.
1215
00:54:15,060 --> 00:54:19,290
So if n equals equals null, then let's just quit out of this altogether.
1216
00:54:19,290 --> 00:54:23,820
But, I have to be a little more careful now.
1217
00:54:23,820 --> 00:54:26,160
Technically speaking, what do I still need
1218
00:54:26,160 --> 00:54:30,540
to do before I quit out of my program to be really proper?
1219
00:54:30,540 --> 00:54:33,880
Free the memory that did succeed a little higher up.
1220
00:54:33,880 --> 00:54:39,280
So I think it suffices to free what is now called list, way at the top.
1221
00:54:39,280 --> 00:54:39,780
All right.
1222
00:54:39,780 --> 00:54:46,260
Now, if all was well, though, let's go ahead and say n arrow number equals 2.
1223
00:54:46,260 --> 00:54:51,840
And now, n arrow next equals null.
1224
00:54:51,840 --> 00:54:54,900
And now, let's go ahead and add it to the list.
1225
00:54:54,900 --> 00:55:02,910
If I go ahead and do list arrow next equals n,
1226
00:55:02,910 --> 00:55:06,660
I think what we've just done is build up the equivalent, now,
1227
00:55:06,660 --> 00:55:09,660
of this in the computer's memory.
1228
00:55:09,660 --> 00:55:12,180
By going to the list field's next field, which
1229
00:55:12,180 --> 00:55:16,080
is synonymous with the 1 node's bottom-most box.
1230
00:55:16,080 --> 00:55:19,540
And store the address of what was n, which a moment ago looked like this.
1231
00:55:19,540 --> 00:55:22,390
And I'm just throwing away, in the picture, the temporary variable.
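A rough sketch of adding that second node, including the cleanup on failure, might look like this. The function name add_second is an assumption for illustration; in the lecture all of this sits directly in main, where returning 1 ends the program.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: appends the number 2 to a one-node list.
// Returns 0 on success. On failure it frees the existing node and
// returns 1, so the caller must not use list after a nonzero return.
int add_second(node *list)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        free(list);      // give back the node that did succeed earlier
        return 1;
    }

    n->number = 2;
    n->next = NULL;
    list->next = n;      // the 1 node's next field now holds n's address
    return 0;
}
```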
1232
00:55:22,390 --> 00:55:22,890
All right.
1233
00:55:22,890 --> 00:55:24,880
One last thing to do.
1234
00:55:24,880 --> 00:55:30,087
Let me go down here and say, add a number to list, n equals malloc.
1235
00:55:30,087 --> 00:55:31,170
Let's do it one more time.
1236
00:55:31,170 --> 00:55:32,340
Size of node.
1237
00:55:32,340 --> 00:55:35,280
And clearly, in a real program, we might want to start using a loop.
1238
00:55:35,280 --> 00:55:39,060
And do this dynamically or a function because it's a lot of repetition now.
1239
00:55:39,060 --> 00:55:42,120
But just to go through the syntax here, this is fine.
1240
00:55:42,120 --> 00:55:45,700
If n equals equals null, out of memory for some reason.
1241
00:55:45,700 --> 00:55:51,650
Let's return 1, but we should free the list itself
1242
00:55:51,650 --> 00:55:55,450
and even the second node, list arrow next.
1243
00:55:55,450 --> 00:55:58,730
But I've deliberately done this poorly.
1244
00:55:58,730 --> 00:55:59,230
All right.
1245
00:55:59,230 --> 00:56:01,240
This is a little more subtle now.
1246
00:56:01,240 --> 00:56:04,570
And let me get rid of the highlighting just so it's a little more visible.
1247
00:56:04,570 --> 00:56:08,890
If n happens to equal equal null, and something really just
1248
00:56:08,890 --> 00:56:15,040
went wrong, we're out of memory, why am I freeing 2 addresses now?
1249
00:56:15,040 --> 00:56:17,770
And again, it's not that I'm freeing those variables per se.
1250
00:56:17,770 --> 00:56:21,620
I'm freeing the addresses in those variables.
1251
00:56:21,620 --> 00:56:23,890
But there's also a bug with my code here.
1252
00:56:23,890 --> 00:56:26,290
And it's subtle.
1253
00:56:26,290 --> 00:56:27,580
Let me ask more pointedly.
1254
00:56:27,580 --> 00:56:31,683
This line here, 43, what is that freeing specifically?
1255
00:56:31,683 --> 00:56:32,350
Can I go to you?
1256
00:56:32,350 --> 00:56:34,900
AUDIENCE: You're freeing list 2 times.
1257
00:56:34,900 --> 00:56:36,640
SPEAKER 1: I'm freeing, not so.
1258
00:56:36,640 --> 00:56:37,150
That's OK.
1259
00:56:37,150 --> 00:56:38,740
I'm not freeing list 2 times.
1260
00:56:38,740 --> 00:56:41,530
Technically, I'm freeing list once and list next once.
1261
00:56:41,530 --> 00:56:43,600
But let me just ask the more explicit question.
1262
00:56:43,600 --> 00:56:46,420
What am I freeing with line 43 at the moment?
1263
00:56:46,420 --> 00:56:49,420
Which node?
1264
00:56:49,420 --> 00:56:50,930
I think node number 1.
1265
00:56:50,930 --> 00:56:51,430
Why?
1266
00:56:51,430 --> 00:56:53,440
Because if 1 is at the beginning of the list,
1267
00:56:53,440 --> 00:56:56,530
list contains the address of that number 1 node.
1268
00:56:56,530 --> 00:56:58,280
And so this frees that node.
1269
00:56:58,280 --> 00:57:01,250
This line of code, you might think now intuitively, OK,
1270
00:57:01,250 --> 00:57:03,610
it's probably freeing the node number 2.
1271
00:57:03,610 --> 00:57:04,540
But this is bad.
1272
00:57:04,540 --> 00:57:05,410
And this is subtle.
1273
00:57:05,410 --> 00:57:07,120
Valgrind might help you catch this.
1274
00:57:07,120 --> 00:57:09,520
But by eyeing it, it's not necessarily obvious.
1275
00:57:09,520 --> 00:57:13,990
You should never touch memory that you have already freed.
1276
00:57:13,990 --> 00:57:16,930
And so, the fact that I did it in this order, very bad.
1277
00:57:16,930 --> 00:57:19,630
Because I'm telling the operating system, I don't know.
1278
00:57:19,630 --> 00:57:22,150
I don't need the list address anymore.
1279
00:57:22,150 --> 00:57:23,410
Do with it what you want.
1280
00:57:23,410 --> 00:57:25,660
And then, literally one line later, you're saying, wait a minute.
1281
00:57:25,660 --> 00:57:27,730
Let me actually go to that address for a moment
1282
00:57:27,730 --> 00:57:30,400
and look at the next field of that first node.
1283
00:57:30,400 --> 00:57:31,220
It's too late.
1284
00:57:31,220 --> 00:57:33,710
You've already given up control over the node.
1285
00:57:33,710 --> 00:57:36,730
So it's an easy fix in this case, logically.
1286
00:57:36,730 --> 00:57:39,370
But we should be freeing the second node first
1287
00:57:39,370 --> 00:57:43,060
and then the first one so that we're doing it
1288
00:57:43,060 --> 00:57:45,040
in, essentially, reverse order.
1289
00:57:45,040 --> 00:57:46,957
And again, Valgrind would help you catch that.
1290
00:57:46,957 --> 00:57:49,582
But that's the kind of thing one needs to be careful about when
1291
00:57:49,582 --> 00:57:50,600
touching memory at all.
1292
00:57:50,600 --> 00:57:53,110
You cannot touch memory after you freed it.
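To make the ordering concrete, here is a small sketch of the fix, assuming a list of exactly two nodes. The name free_two is hypothetical; the buggy order is left in a comment only.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: frees a list of exactly two nodes.
void free_two(node *list)
{
    // Buggy order, shown only as a comment:
    //     free(list);
    //     free(list->next);   // reads list->next after list was freed
    free(list->next);          // second node first, while list is still valid
    free(list);                // then the first node
}
```

Running a program with the buggy order under Valgrind would typically report an invalid read on the second call.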
1293
00:57:53,110 --> 00:57:54,970
But here is my last step.
1294
00:57:54,970 --> 00:58:00,490
Let me go ahead and update the number field of n to be 3.
1295
00:58:00,490 --> 00:58:03,500
The next node of n to be null.
1296
00:58:03,500 --> 00:58:05,290
And then, just like in the slide earlier,
1297
00:58:05,290 --> 00:58:11,020
I think I can do list next, next equals n.
1298
00:58:11,020 --> 00:58:14,890
And that has the effect now of building up in the computer's memory,
1299
00:58:14,890 --> 00:58:16,990
essentially, this data structure.
1300
00:58:16,990 --> 00:58:17,890
Very manually.
1301
00:58:17,890 --> 00:58:18,820
Very pedantically.
1302
00:58:18,820 --> 00:58:20,860
Like, in a better world, we'd have a loop and some functions
1303
00:58:20,860 --> 00:58:22,420
that are automating this process.
1304
00:58:22,420 --> 00:58:26,680
But, for now, we're doing it just to play around with the syntax.
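As one possible shape for that "better world", here is a sketch of a function that automates the repetition. The name append and its pointer-to-pointer parameter are assumptions for illustration, not the course's own solution.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: adds number to the end of the list.
// Takes the address of the list variable so that an empty list
// (NULL) can be updated too. Returns 0 on success, or 1 if malloc
// fails, in which case the list is left untouched.
int append(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return 1;
    }
    n->number = number;
    n->next = NULL;

    if (*list == NULL)
    {
        *list = n;               // first node: the list now points at it
        return 0;
    }

    node *tail = *list;
    while (tail->next != NULL)   // follow the arrows to the last node
    {
        tail = tail->next;
    }
    tail->next = n;
    return 0;
}
```

With this in place, a loop such as for (int i = 1; i <= 3; i++) calling append(&list, i) builds the same 1, 2, 3 list without writing list->next->next by hand.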
1305
00:58:26,680 --> 00:58:31,420
So at this point, unfortunately, suppose I want to print the numbers.
1306
00:58:31,420 --> 00:58:36,190
It's no longer as easy as int i equals 0, i less than 3, i++.
1307
00:58:36,190 --> 00:58:43,420
Because you cannot just do something like this.
1308
00:58:43,420 --> 00:58:48,520
Because pointer arithmetic no longer comes into play
1309
00:58:48,520 --> 00:58:52,750
when it's you, who are stitching together the data structure in memory.
1310
00:58:52,750 --> 00:58:55,450
In all of our past examples with arrays, you've
1311
00:58:55,450 --> 00:58:58,820
been trusting that all of the bytes in the array are back, to back, to back.
1312
00:58:58,820 --> 00:59:01,533
So it's perfectly reasonable for the compiler and the computer
1313
00:59:01,533 --> 00:59:04,450
to just figure out, oh, well if you want [0], that's at the beginning.
1314
00:59:04,450 --> 00:59:06,130
[1], it's one location over.
1315
00:59:06,130 --> 00:59:08,110
[2], it's one location over.
1316
00:59:08,110 --> 00:59:11,030
This is way less obvious now.
1317
00:59:11,030 --> 00:59:14,650
Because even though you might want to go to the first element in the linked
1318
00:59:14,650 --> 00:59:19,270
list, or the second, or the third, you can't just jump to those arithmetically
1319
00:59:19,270 --> 00:59:20,590
by doing a bit of math.
1320
00:59:20,590 --> 00:59:24,040
Instead, you have to follow all of those arrows.
1321
00:59:24,040 --> 00:59:27,340
So with linked lists, you can't use this square bracket notation anymore
1322
00:59:27,340 --> 00:59:30,310
because one node might be here, over here, over here, over here.
1323
00:59:30,310 --> 00:59:33,550
You can't just use some simple offset.
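To contrast the two, here is a sketch with hypothetical helpers array_get and list_get. The array version is a single offset calculation, while the list version has to follow one arrow per step.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Arrays are contiguous, so element i is simple arithmetic:
// numbers[i] means the same thing as *(numbers + i).
int array_get(const int *numbers, int i)
{
    return numbers[i];
}

// Hypothetical helper: a linked list has no such shortcut.
// Stores element i in *out and returns 0, or returns 1 if i is
// negative or the list has fewer than i + 1 nodes.
int list_get(const node *list, int i, int *out)
{
    if (i < 0)
    {
        return 1;
    }
    const node *tmp = list;
    for (int step = 0; step < i && tmp != NULL; step++)
    {
        tmp = tmp->next;     // one arrow per step
    }
    if (tmp == NULL)
    {
        return 1;
    }
    *out = tmp->number;
    return 0;
}
```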
1324
00:59:33,550 --> 00:59:36,340
So I think our code is going to have to be a little fancier.
1325
00:59:36,340 --> 00:59:39,820
And this might look scary at first, but it's just an application
1326
00:59:39,820 --> 00:59:42,160
of some of the basic definitions here.
1327
00:59:42,160 --> 00:59:49,480
Let me do a for-loop that actually uses a node* variable initialized
1328
00:59:49,480 --> 00:59:51,130
to the list itself.
1329
00:59:51,130 --> 00:59:55,780
I'm going to keep doing this, so long as TMP does not equal null.
1330
00:59:55,780 --> 00:59:58,360
And on each iteration of this loop, I'm going
1331
00:59:58,360 --> 01:00:03,100
to update TMP to be whatever TMP arrow next is.
1332
01:00:03,100 --> 01:00:05,710
And I'll remind you in a moment and explain in more detail.
1333
01:00:05,710 --> 01:00:09,730
But when I print something here with printf, I can still use %i.
1334
01:00:09,730 --> 01:00:12,040
Because it's still a number at the end of the day.
1335
01:00:12,040 --> 01:00:16,640
But what I want to print out is the number in this temporary variable.
1336
01:00:16,640 --> 01:00:19,032
So maybe the ugliest for-loop we've ever seen.
1337
01:00:19,032 --> 01:00:21,490
Because it's mixing, not just the idea of a for-loop, which
1338
01:00:21,490 --> 01:00:23,500
itself was a bit cryptic weeks ago.
1339
01:00:23,500 --> 01:00:26,025
But now, I'm using pointers instead of integers.
1340
01:00:26,025 --> 01:00:28,150
But I'm not violating the definition of a for-loop.
1341
01:00:28,150 --> 01:00:30,940
Recall that a for-loop has 3 main things in parentheses.
1342
01:00:30,940 --> 01:00:32,800
What do you want to initialize first?
1343
01:00:32,800 --> 01:00:35,740
What condition do you want to keep checking again and again?
1344
01:00:35,740 --> 01:00:39,440
And what update do you want to make on every iteration of the loop?
1345
01:00:39,440 --> 01:00:41,860
So with that basic definition in mind, this
1346
01:00:41,860 --> 01:00:44,350
is giving me a temporary variable called TMP
1347
01:00:44,350 --> 01:00:46,520
that is initialized to the beginning of the list.
1348
01:00:46,520 --> 01:00:50,110
So it's like pointing my finger at the number 1 node.
1349
01:00:50,110 --> 01:00:53,530
Then, I'm asking the question, does TMP not equal null?
1350
01:00:53,530 --> 01:00:56,170
Well, hopefully, not because I'm pointing at a valid node
1351
01:00:56,170 --> 01:00:57,710
that is the number 1 node.
1352
01:00:57,710 --> 01:00:59,530
So, of course, it doesn't equal null yet.
1353
01:00:59,530 --> 01:01:02,030
Null won't be until we get to the end of the list.
1354
01:01:02,030 --> 01:01:03,530
So what do I do?
1355
01:01:03,530 --> 01:01:05,260
I start at this TMP variable.
1356
01:01:05,260 --> 01:01:10,270
I follow the arrow and go to the number field therein.
1357
01:01:10,270 --> 01:01:11,350
What do I then do?
1358
01:01:11,350 --> 01:01:15,010
The for-loop says, change TMP to be whatever
1359
01:01:15,010 --> 01:01:19,090
is at TMP, by following the arrow and grabbing the next field.
1360
01:01:19,090 --> 01:01:22,260
That, then, has the result of being checked against this conditional.
1361
01:01:22,260 --> 01:01:24,760
No, of course, it doesn't equal null because the second node
1362
01:01:24,760 --> 01:01:26,050
is the number 2 node.
1363
01:01:26,050 --> 01:01:27,920
Null is still at the very end.
1364
01:01:27,920 --> 01:01:29,710
So I print out the number 2.
1365
01:01:29,710 --> 01:01:33,670
Next step, I update TMP one more time to be whatever is next.
1366
01:01:33,670 --> 01:01:36,230
That, then, does not yet equal null.
1367
01:01:36,230 --> 01:01:38,470
So I go ahead and print out the number 3 node.
1368
01:01:38,470 --> 01:01:44,120
Then one last time, I update TMP to be whatever TMP is in the next field.
1369
01:01:44,120 --> 01:01:47,980
But after 1, 2, 3, that last next field is null.
1370
01:01:47,980 --> 01:01:51,790
And so, I break out of this for-loop altogether.
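The loop just traced can be sketched as follows. Wrapping it in a function named print_list, marking the pointer const, and returning a count are assumptions added for illustration; the three parts of the for-loop are the ones described above.

```c
#include <stdio.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Prints every number in the list, one per line.
// Returns how many nodes were visited (an addition for this sketch).
int print_list(const node *list)
{
    int count = 0;
    // initialize: tmp starts at the first node
    // condition:  keep going while tmp is not NULL
    // update:     move tmp to whatever tmp->next is
    for (const node *tmp = list; tmp != NULL; tmp = tmp->next)
    {
        printf("%i\n", tmp->number);
        count++;
    }
    return count;
}
```

An empty list (NULL) fails the condition immediately, so nothing is printed.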
1371
01:01:51,790 --> 01:01:54,730
So if I do this in pictorial form, all we're
1372
01:01:54,730 --> 01:01:58,300
doing, if I now use my finger to represent the TMP variable.
1373
01:01:58,300 --> 01:02:02,080
I initialize TMP to be whatever list is, so it points here.
1374
01:02:02,080 --> 01:02:04,780
That's obviously not null so I print out whatever
1375
01:02:04,780 --> 01:02:09,100
is at TMP, follow the arrow to number, and I print that out.
1376
01:02:09,100 --> 01:02:11,290
Then I update TMP to point here.
1377
01:02:11,290 --> 01:02:13,077
Then I update TMP to point here.
1378
01:02:13,077 --> 01:02:14,410
Then I update TMP to point here.
1379
01:02:14,410 --> 01:02:15,160
Wait, that's null.
1380
01:02:15,160 --> 01:02:17,480
The for-loop ends.
1381
01:02:17,480 --> 01:02:21,670
So, again, admittedly much more cryptic than our familiar int i equals 0,
1382
01:02:21,670 --> 01:02:22,610
and so forth.
1383
01:02:22,610 --> 01:02:28,855
But it's just a different utilization of the for-loop syntax.
1384
01:02:28,855 --> 01:02:29,355
Yes.
1385
01:02:29,355 --> 01:02:33,140
AUDIENCE: How does it happen that you're always printing out the numbers.
1386
01:02:33,140 --> 01:02:35,018
Because it seems to me that addresses-
1387
01:02:35,018 --> 01:02:36,060
SPEAKER 1: Good question.
1388
01:02:36,060 --> 01:02:39,060
How is it that I'm actually printing numbers and not printing out
1389
01:02:39,060 --> 01:02:40,440
addresses instead.
1390
01:02:40,440 --> 01:02:42,120
The compiler is helping me here.
1391
01:02:42,120 --> 01:02:44,730
Because I taught it, in the very beginning of my program,
1392
01:02:44,730 --> 01:02:45,360
what a node is.
1393
01:02:45,360 --> 01:02:47,730
Which looks like this here.
1394
01:02:47,730 --> 01:02:51,510
The compiler knows that a node has a number field and a next field
1395
01:02:51,510 --> 01:02:53,430
down here, in the for-loop.
1396
01:02:53,430 --> 01:02:59,410
Because I'm iterating using a node* pointer, and not an int* pointer,
1397
01:02:59,410 --> 01:03:02,160
the compiler knows that any time I'm pointing at something,
1398
01:03:02,160 --> 01:03:03,940
I'm pointing at the whole node.
1399
01:03:03,940 --> 01:03:07,020
Doesn't matter where specifically in the rectangle I'm pointing per se.
1400
01:03:07,020 --> 01:03:09,210
It's, ultimately, pointing at the whole node itself.
1401
01:03:09,210 --> 01:03:13,320
And the fact that I, then, use TMP arrow number means, OK,
1402
01:03:13,320 --> 01:03:14,490
adjust your finger slightly.
1403
01:03:14,490 --> 01:03:18,510
So you're literally pointing at the number field and not the next field.
1404
01:03:18,510 --> 01:03:22,920
So that's sufficient information for the computer to distinguish the 2.
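As a small illustration of what the compiler works out from the node definition, here is a sketch with hypothetical helper names. The arrow is shorthand for dereferencing and then selecting a field, and offsetof from stddef.h reports how far into the node each field begins.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// tmp->number: go to the node and pick out its number field.
int number_via_arrow(const node *tmp)
{
    return tmp->number;
}

// (*tmp).number: the longhand spelling of the same thing.
int number_via_star(const node *tmp)
{
    return (*tmp).number;
}

// How many bytes into a node the next field starts. The exact value
// depends on the machine, but it always comes after number.
size_t next_offset(void)
{
    return offsetof(node, next);
}
```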
1405
01:03:22,920 --> 01:03:23,560
Good question.
1406
01:03:23,560 --> 01:03:26,730
Other questions then on this approach here.
1407
01:03:26,730 --> 01:03:28,042
Yeah, in the back.
1408
01:03:28,042 --> 01:03:29,280
AUDIENCE: How would you--
1409
01:03:29,280 --> 01:03:33,840
SPEAKER 1: How would I use a for-loop to add elements to a linked list?
1410
01:03:33,840 --> 01:03:38,640
You will do something like this, if I may, in problem set 5.
1411
01:03:38,640 --> 01:03:41,730
We will give you some of the scaffolding for doing this.
1412
01:03:41,730 --> 01:03:44,700
But in this coming week's materials, we will guide you to that.
1413
01:03:44,700 --> 01:03:47,293
But let me not spoil it just yet.
1414
01:03:47,293 --> 01:03:48,210
Fair question, though.
1415
01:03:48,210 --> 01:03:48,710
Yeah.
1416
01:03:48,710 --> 01:03:51,077
AUDIENCE: So I had a question about line 49.
1417
01:03:51,077 --> 01:03:51,660
SPEAKER 1: OK.
1418
01:03:51,660 --> 01:03:53,678
AUDIENCE: Is line 49 possible in line 43?
1419
01:03:53,678 --> 01:03:54,720
SPEAKER 1: Good question.
1420
01:03:54,720 --> 01:03:57,900
Is line 49 acceptable, even if we freed it earlier.
1421
01:03:57,900 --> 01:04:00,600
We didn't free it in line 43, in this case, right.
1422
01:04:00,600 --> 01:04:04,800
You can only reach line 49, if n does not equal null.
1423
01:04:04,800 --> 01:04:06,990
And you do not return on line 45.
1424
01:04:06,990 --> 01:04:07,860
So that's safe.
1425
01:04:07,860 --> 01:04:12,180
I was only doing that freeing if I knew on line 45 that I'm out of here
1426
01:04:12,180 --> 01:04:13,620
anyway, at that point.
1427
01:04:13,620 --> 01:04:14,400
Good question.
1428
01:04:14,400 --> 01:04:15,030
And, yeah.
1429
01:04:15,030 --> 01:04:16,405
AUDIENCE: I had a quick question.
1430
01:04:16,405 --> 01:04:19,380
Is TMP [INAUDIBLE].
1431
01:04:19,380 --> 01:04:22,650
SPEAKER 1: Correct. You're asking about TMP, because it's in a for-loop,
1432
01:04:22,650 --> 01:04:24,358
does that mean you don't have to free it?
1433
01:04:24,358 --> 01:04:26,760
You never have to free pointers, per se.
1434
01:04:26,760 --> 01:04:31,560
You should only free addresses that were returned to you by malloc.
1435
01:04:31,560 --> 01:04:33,930
So I haven't finished the program, to be fair.
1436
01:04:33,930 --> 01:04:35,880
But you're not freeing variables.
1437
01:04:35,880 --> 01:04:37,740
You're not freeing like, fields.
1438
01:04:37,740 --> 01:04:40,870
You are freeing specific addresses, whatever they may be.
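A short sketch of that distinction, using a hypothetical function named free_once: two variables hold the same address, but there is only one chunk of memory from malloc, so there is only one call to free.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical demo: returns the number read through the second
// pointer, or -1 if malloc fails.
int free_once(void)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return -1;
    }
    n->number = 1;
    n->next = NULL;

    node *tmp = n;              // copies the address, allocates nothing
    int number = tmp->number;   // read while the memory is still ours

    free(n);                    // frees the one chunk malloc returned
    // free(tmp) here would be a double free: same address, already freed.
    return number;
}
```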
1439
01:04:40,870 --> 01:04:43,770
So the last thing, and I was stalling on showing this
1440
01:04:43,770 --> 01:04:45,450
because it too is a little cryptic.
1441
01:04:45,450 --> 01:04:48,570
Here is how you can free, now, a whole linked list.
1442
01:04:48,570 --> 01:04:51,242
In the world of arrays, recall, it was so easy.
1443
01:04:51,242 --> 01:04:52,200
You just say free list.
1444
01:04:52,200 --> 01:04:53,920
You return 0 and you're done.
1445
01:04:53,920 --> 01:04:55,140
Not with a linked list.
1446
01:04:55,140 --> 01:04:57,000
Because, again, the computer doesn't know
1447
01:04:57,000 --> 01:04:59,700
what you have stitched together using all of these pointers
1448
01:04:59,700 --> 01:05:01,140
all over the computer's memory.
1449
01:05:01,140 --> 01:05:03,180
You need to follow those arrows.
1450
01:05:03,180 --> 01:05:05,920
So one way to do this would be as follows.
1451
01:05:05,920 --> 01:05:10,920
While the list itself is not null, so while there's a list to be freed.
1452
01:05:10,920 --> 01:05:12,240
What do I want to do?
1453
01:05:12,240 --> 01:05:14,972
I'm going to give myself a temporary variable called TMP again.
1454
01:05:14,972 --> 01:05:17,430
And it's a different TMP because it's in a different scope.
1455
01:05:17,430 --> 01:05:21,210
It's inside of the while loop instead of the for-loop, a few lines earlier.
1456
01:05:21,210 --> 01:05:26,640
I am going to initialize TMP to be the address of the next node.
1457
01:05:26,640 --> 01:05:29,160
Just so I can get one step ahead of things.
1458
01:05:29,160 --> 01:05:30,450
Why am I doing this?
1459
01:05:30,450 --> 01:05:34,330
Because now, I can boldly free the list itself,
1460
01:05:34,330 --> 01:05:35,970
which does not mean the whole list.
1461
01:05:35,970 --> 01:05:38,670
Again, I'm freeing the address in list, which
1462
01:05:38,670 --> 01:05:41,410
is the address of the number 1 node.
1463
01:05:41,410 --> 01:05:42,390
That's what list is.
1464
01:05:42,390 --> 01:05:44,980
It's just the address of the number 1 node.
1465
01:05:44,980 --> 01:05:47,880
So if I first use TMP to point at the number
1466
01:05:47,880 --> 01:05:53,310
2 slightly in the middle of the picture, then it is safe for me on line 61,
1467
01:05:53,310 --> 01:05:55,290
at the moment, to free list.
1468
01:05:55,290 --> 01:05:57,870
That is the address of the first node.
1469
01:05:57,870 --> 01:06:02,160
Now I'm going to say, all right, once I freed the first node in the list,
1470
01:06:02,160 --> 01:06:07,080
I can update the list itself to be literally TMP.
1471
01:06:07,080 --> 01:06:09,120
And now, the loop repeats.
1472
01:06:09,120 --> 01:06:10,450
So what's happening here?
1473
01:06:10,450 --> 01:06:16,140
If you think about this picture, TMP is initially pointing at not the list,
1474
01:06:16,140 --> 01:06:17,550
but list arrow next.
1475
01:06:17,550 --> 01:06:20,940
So TMP, represented by my right hand here, is pointing at the number 2.
1476
01:06:20,940 --> 01:06:25,530
Totally safe and reasonable to free now the list itself a.k.a.
1477
01:06:25,530 --> 01:06:27,150
the address of the number 1 node.
1478
01:06:27,150 --> 01:06:29,880
That has the effect of just throwing away the number 1 node,
1479
01:06:29,880 --> 01:06:32,670
telling the computer it can reuse that memory.
1480
01:06:32,670 --> 01:06:36,150
The last line of code I wrote updated list to point at the number
1481
01:06:36,150 --> 01:06:40,560
2, at which point my loop proceeded to do the exact same thing again.
1482
01:06:40,560 --> 01:06:43,590
And only once my finger is literally pointing at nowhere,
1483
01:06:43,590 --> 01:06:46,350
the null symbol, will the loop, by nature of a while
1484
01:06:46,350 --> 01:06:48,990
loop as I'll toggle back to, break out.
1485
01:06:48,990 --> 01:06:51,630
And there's nothing more to be freed.
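The freeing loop described here can be sketched like this. Putting it in a function named free_list and returning a count of freed nodes are assumptions for illustration; the three lines in the loop body are the ones narrated above.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Frees every node in the list, one at a time.
// Returns how many nodes were freed (an addition for this sketch).
int free_list(node *list)
{
    int freed = 0;
    while (list != NULL)
    {
        node *tmp = list->next;  // remember the next node before letting go
        free(list);              // frees only the node list points at
        list = tmp;              // step forward to the remembered node
        freed++;
    }
    return freed;
}
```

Because tmp is saved before the call to free, the loop never reads a node after it has been freed.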
1486
01:06:51,630 --> 01:06:54,690
So again, what you'll see, ultimately, in problem set 5,
1487
01:06:54,690 --> 01:06:58,690
more on that later, is an opportunity to play around with just this syntax.
1488
01:06:58,690 --> 01:06:59,730
But also these ideas.
1489
01:06:59,730 --> 01:07:02,580
But again, even though the syntax is admittedly pretty cryptic,
1490
01:07:02,580 --> 01:07:06,300
we're still using basics like these for-loops or while loops.
1491
01:07:06,300 --> 01:07:09,960
We're just starting to now follow explicit addresses rather
1492
01:07:09,960 --> 01:07:13,740
than letting the computer do all of the arithmetic for us,
1493
01:07:13,740 --> 01:07:15,635
as we previously benefited from.
1494
01:07:15,635 --> 01:07:18,760
At the very end of this thing, I'm going to return 0 as though all is well.
1495
01:07:18,760 --> 01:07:22,240
And I think, then, we're good to go.
1496
01:07:22,240 --> 01:07:22,740
All right.
1497
01:07:22,740 --> 01:07:25,960
Questions on this linked list code now?
1498
01:07:25,960 --> 01:07:28,710
And again, we'll walk through this again in the coming week's spec.
1499
01:07:28,710 --> 01:07:29,210
Yeah.
1500
01:07:29,210 --> 01:07:33,613
AUDIENCE: Can you explain the while loop [INAUDIBLE] starts in other ways?
1501
01:07:33,613 --> 01:07:34,280
SPEAKER 1: Sure.
1502
01:07:34,280 --> 01:07:37,950
Can we explain this while loop here for freeing the list.
1503
01:07:37,950 --> 01:07:40,580
So notice that, first, I'm just asking the obvious question.
1504
01:07:40,580 --> 01:07:41,420
Is the list null?
1505
01:07:41,420 --> 01:07:45,390
Because if it is, there's no work to be done.
1506
01:07:45,390 --> 01:07:49,460
However, while the list is not null, according to line 58,
1507
01:07:49,460 --> 01:07:50,540
what do we want to do?
1508
01:07:50,540 --> 01:07:54,920
I want to create a temporary variable that points at the same thing
1509
01:07:54,920 --> 01:07:57,540
that list arrow next is pointing at.
1510
01:07:57,540 --> 01:07:58,760
So what does that mean?
1511
01:07:58,760 --> 01:08:00,260
Here is list.
1512
01:08:00,260 --> 01:08:03,690
List arrow next is whatever this thing is here.
1513
01:08:03,690 --> 01:08:06,470
So if my right hand represents the temporary variable,
1514
01:08:06,470 --> 01:08:10,470
I'm literally pointing at the same thing as list arrow next is itself.
1515
01:08:10,470 --> 01:08:13,640
The next line of code, recall, was free the list.
1516
01:08:13,640 --> 01:08:16,400
And unlike, in our world of arrays, like half an hour
1517
01:08:16,400 --> 01:08:19,100
ago where that just meant free the whole darn list,
1518
01:08:19,100 --> 01:08:23,690
you now have taken over control over the computer's memory with a linked list,
1519
01:08:23,690 --> 01:08:25,550
in ways that you didn't with the array.
1520
01:08:25,550 --> 01:08:28,850
The computer knew how to free the whole array because you
1521
01:08:28,850 --> 01:08:30,680
malloc'd the whole thing at once.
1522
01:08:30,680 --> 01:08:34,580
You are now mallocing the linked list one node at a time.
1523
01:08:34,580 --> 01:08:37,430
And the operating system does not keep track of for you
1524
01:08:37,430 --> 01:08:38,810
where all these nodes are.
1525
01:08:38,810 --> 01:08:42,470
So when you free list, you are literally freeing
1526
01:08:42,470 --> 01:08:46,430
the value of the list variable, which is just this first node here.
1527
01:08:46,430 --> 01:08:49,820
Then my last line of code, which I'll flip back to in a second, updates
1528
01:08:49,820 --> 01:08:54,500
list to now ignore the free memory and point at 2.
1529
01:08:54,500 --> 01:08:57,080
And the story then repeats.
1530
01:08:57,080 --> 01:09:00,500
So, again, it's just a very pedantic way of using
1531
01:09:00,500 --> 01:09:04,460
this new syntax of star notation, and the arrow notation, and the like,
1532
01:09:04,460 --> 01:09:08,420
to do the equivalent of walking down all of these arrows.
1533
01:09:08,420 --> 01:09:10,640
Following all of these breadcrumbs.
1534
01:09:10,640 --> 01:09:13,940
But it does take admittedly some getting used to.
1535
01:09:13,940 --> 01:09:16,445
Syntax, you only have to do one week.
1536
01:09:16,445 --> 01:09:18,320
But, again, next week in Python we will begin
1537
01:09:18,320 --> 01:09:20,150
to abstract a lot of this complexity away.
1538
01:09:20,150 --> 01:09:22,020
But none of this complexity is going away.
1539
01:09:22,020 --> 01:09:24,770
It's just that someone else, the authors of Python for instance,
1540
01:09:24,770 --> 01:09:26,908
will have automated this stuff for us.
1541
01:09:26,908 --> 01:09:28,700
The goal this week is to understand what it
1542
01:09:28,700 --> 01:09:31,980
is we're going to get for free, so to speak, next week.
1543
01:09:31,980 --> 01:09:32,480
All right.
1544
01:09:32,480 --> 01:09:36,810
Questions on these linked lists.
1545
01:09:36,810 --> 01:09:37,310
All right.
1546
01:09:37,310 --> 01:09:38,450
Just, yeah, in the back.
1547
01:09:38,450 --> 01:09:41,264
AUDIENCE: So are the while loops strictly necessary
1548
01:09:41,264 --> 01:09:42,728
for the freeing [INAUDIBLE].
1549
01:09:42,728 --> 01:09:43,770
SPEAKER 1: Fair question.
1550
01:09:43,770 --> 01:09:46,353
Let me summarize as, could we have freed this with a for-loop?
1551
01:09:46,353 --> 01:09:47,279
Absolutely.
1552
01:09:47,279 --> 01:09:48,630
It just is a matter of style.
1553
01:09:48,630 --> 01:09:51,670
It's a little more elegant to do it in a while loop, according to me.
1554
01:09:51,670 --> 01:09:53,672
But other people will reasonably disagree.
1555
01:09:53,672 --> 01:09:56,380
Anything you can do with a while loop you can do with a for-loop,
1556
01:09:56,380 --> 01:09:57,390
and vice versa.
1557
01:09:57,390 --> 01:09:59,729
Do while loops, recall, are a little different.
1558
01:09:59,729 --> 01:10:02,372
But they will always do at least one thing.
1559
01:10:02,372 --> 01:10:04,830
But for-loops and while loops behave the same in this case.
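For comparison, here is one way the same freeing logic might be written with a for-loop. The name free_list_for and the returned count are assumptions for illustration; which version reads better is a matter of style.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Frees every node in the list using a for-loop.
// Returns how many nodes were freed (an addition for this sketch).
int free_list_for(node *list)
{
    int freed = 0;
    // initialize: declare tmp (it is set inside the body)
    // condition:  keep going while there is a node left
    // update:     step list forward to the saved next node
    for (node *tmp; list != NULL; list = tmp)
    {
        tmp = list->next;  // save the next address before freeing
        free(list);
        freed++;
    }
    return freed;
}
```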
1560
01:10:04,830 --> 01:10:05,953
AUDIENCE: Thank you.
1561
01:10:05,953 --> 01:10:06,620
SPEAKER 1: Sure.
1562
01:10:06,620 --> 01:10:08,000
Other questions?
1563
01:10:08,000 --> 01:10:10,399
All right, well let's just vary things a little bit here.
1564
01:10:10,399 --> 01:10:12,482
Just to see what some of the pitfalls might now be
1565
01:10:12,482 --> 01:10:14,240
without getting into the weeds of code.
1566
01:10:14,240 --> 01:10:18,229
Indeed, we'll try to save some of that for problem set 5's exploration.
1567
01:10:18,229 --> 01:10:22,520
But instead, let's imagine that we want to create a list here of our own.
1568
01:10:22,520 --> 01:10:25,700
I can offer, in exchange for a few volunteers, some foam fingers
1569
01:10:25,700 --> 01:10:27,617
to bring to the next game, perhaps.
1570
01:10:27,617 --> 01:10:29,450
Could we get maybe just one volunteer first?
1571
01:10:29,450 --> 01:10:30,109
Come on up.
1572
01:10:30,109 --> 01:10:33,109
You will be our linked list from the get go.
1573
01:10:33,109 --> 01:10:33,913
What's your name?
1574
01:10:33,913 --> 01:10:34,580
AUDIENCE: Pedro.
1575
01:10:34,580 --> 01:10:36,840
SPEAKER 1: Pedro, come on up.
1576
01:10:36,840 --> 01:10:38,090
All right, thank you to Pedro.
1577
01:10:38,090 --> 01:10:41,180
[AUDIENCE CLAPPING]
1578
01:10:41,180 --> 01:10:43,180
And if you want to just stand roughly over here.
1579
01:10:43,180 --> 01:10:45,729
But you are a null pointer so just point sort of at the ground,
1580
01:10:45,729 --> 01:10:46,930
as though you're pointing at 0.
1581
01:10:46,930 --> 01:10:47,430
All right.
1582
01:10:47,430 --> 01:10:50,027
So Pedro is our linked list of size 0, which pictorially
1583
01:10:50,027 --> 01:10:53,319
might look a little something like this for consistency with our past pictures.
1584
01:10:53,319 --> 01:10:58,000
Now suppose that we want to go ahead and malloc, oh, how about the number 2.
1585
01:10:58,000 --> 01:11:00,200
Can we get a volunteer to be on camera here?
1586
01:11:00,200 --> 01:11:00,700
OK.
1587
01:11:00,700 --> 01:11:01,867
You jumped out of your seat.
1588
01:11:01,867 --> 01:11:04,408
Do you want to come up?
1589
01:11:04,408 --> 01:11:06,200
OK, you really want the foam finger, I see.
1590
01:11:06,200 --> 01:11:06,370
All right.
1591
01:11:06,370 --> 01:11:07,450
Round of applause, sure.
1592
01:11:07,450 --> 01:11:12,690
[AUDIENCE CLAPPING]
1593
01:11:12,690 --> 01:11:13,235
OK.
1594
01:11:13,235 --> 01:11:14,110
And what's your name?
1595
01:11:14,110 --> 01:11:14,970
AUDIENCE: Caleb.
1596
01:11:14,970 --> 01:11:15,430
SPEAKER 1: Say again?
1597
01:11:15,430 --> 01:11:15,760
AUDIENCE: Caleb.
1598
01:11:15,760 --> 01:11:16,030
SPEAKER 1: Halen?
1599
01:11:16,030 --> 01:11:16,762
AUDIENCE: Caleb.
1600
01:11:16,762 --> 01:11:17,470
SPEAKER 1: Caleb.
1601
01:11:17,470 --> 01:11:18,770
Caleb, sorry.
1602
01:11:18,770 --> 01:11:19,270
All right.
1603
01:11:19,270 --> 01:11:21,790
So here is your number 2 for your number field.
1604
01:11:21,790 --> 01:11:23,020
And here is your pointer.
1605
01:11:23,020 --> 01:11:26,115
And come on, let's say that there was room for Caleb like, right there.
1606
01:11:26,115 --> 01:11:26,740
That's perfect.
1607
01:11:26,740 --> 01:11:29,480
So Caleb got malloced, if you will, over here.
1608
01:11:29,480 --> 01:11:33,805
So now if we want to insert Caleb and the number 2 into this linked list,
1609
01:11:33,805 --> 01:11:34,930
well what do we need to do?
1610
01:11:34,930 --> 01:11:36,340
I already initialized you to 2.
1611
01:11:36,340 --> 01:11:38,320
And pointing as you are to the ground means
1612
01:11:38,320 --> 01:11:40,630
you're initialized to null for your next field.
1613
01:11:40,630 --> 01:11:42,400
Pedro, what should you-- perfect.
1614
01:11:42,400 --> 01:11:43,720
What should Pedro do?
1615
01:11:43,720 --> 01:11:44,620
That's fine, too.
1616
01:11:44,620 --> 01:11:46,195
So Pedro is now pointing at the list.
1617
01:11:46,195 --> 01:11:48,320
So now our list looks a little something like this.
1618
01:11:48,320 --> 01:11:49,540
So far, so good.
1619
01:11:49,540 --> 01:11:50,170
All is well.
1620
01:11:50,170 --> 01:11:52,670
So the first couple of these will be pretty straightforward.
1621
01:11:52,670 --> 01:11:56,180
Let's insert one more, if anyone really wants another foam finger.
1622
01:11:56,180 --> 01:11:57,680
Here, how about right in the middle.
1623
01:11:57,680 --> 01:11:58,870
Come on down.
1624
01:11:58,870 --> 01:12:01,678
And just in anticipation, how about let's malloc someone else.
1625
01:12:01,678 --> 01:12:03,220
OK, your friends are pointing at you.
1626
01:12:03,220 --> 01:12:05,350
Do you want to come down too, preemptively?
1627
01:12:05,350 --> 01:12:07,852
This is a pool of memory, if you will.
1628
01:12:07,852 --> 01:12:08,560
What's your name?
1629
01:12:08,560 --> 01:12:09,130
AUDIENCE: Hannah.
1630
01:12:09,130 --> 01:12:09,880
SPEAKER 1: Hannah.
1631
01:12:09,880 --> 01:12:10,600
All right, Hannah.
1632
01:12:10,600 --> 01:12:11,440
You are number 4.
1633
01:12:11,440 --> 01:12:13,180
[AUDIENCE CLAPPING]
1634
01:12:13,180 --> 01:12:14,810
And hang there for just a moment.
1635
01:12:14,810 --> 01:12:15,310
All right.
1636
01:12:15,310 --> 01:12:16,870
So we've just malloced Hannah.
1637
01:12:16,870 --> 01:12:20,140
And Hannah, how about Hannah, suppose you ended up over there
1638
01:12:20,140 --> 01:12:21,800
in just some random location.
1639
01:12:21,800 --> 01:12:22,300
All right.
1640
01:12:22,300 --> 01:12:25,960
So what should we now do, if the goal is to keep these things sorted?
1641
01:12:25,960 --> 01:12:26,560
How about?
1642
01:12:26,560 --> 01:12:28,538
So Pedro, do you have to update yourself?
1643
01:12:28,538 --> 01:12:29,080
AUDIENCE: No.
1644
01:12:29,080 --> 01:12:29,410
SPEAKER 1: No.
1645
01:12:29,410 --> 01:12:29,910
All right.
1646
01:12:29,910 --> 01:12:31,300
Caleb, what do you have to do?
1647
01:12:31,300 --> 01:12:31,800
OK.
1648
01:12:31,800 --> 01:12:34,692
And Hannah what should you be doing?
1649
01:12:34,692 --> 01:12:37,900
I would, it's just for you for now, so point at the ground representing null.
1650
01:12:37,900 --> 01:12:38,400
OK.
1651
01:12:38,400 --> 01:12:41,290
So, again demonstrating the fact that, unlike in past weeks where
1652
01:12:41,290 --> 01:12:43,810
we had our nice, clean array back, to back, to back,
1653
01:12:43,810 --> 01:12:46,380
contiguously, these guys are deliberately all over the stage.
1654
01:12:46,380 --> 01:12:47,380
So let's malloc another.
1655
01:12:47,380 --> 01:12:49,012
How about number 5.
1656
01:12:49,012 --> 01:12:49,720
What's your name?
1657
01:12:49,720 --> 01:12:50,440
AUDIENCE: Jonathan.
1658
01:12:50,440 --> 01:12:50,920
SPEAKER 1: Jonathan.
1659
01:12:50,920 --> 01:12:51,753
All right, Jonathan.
1660
01:12:51,753 --> 01:12:53,440
You are our number 5.
1661
01:12:53,440 --> 01:12:55,255
And pick your favorite place in memory.
1662
01:12:55,255 --> 01:12:56,200
[AUDIENCE CLAPPING]
1663
01:12:56,200 --> 01:12:56,700
OK.
1664
01:12:58,820 --> 01:12:59,320
All right.
1665
01:12:59,320 --> 01:13:01,548
So Jonathan's now over there.
1666
01:13:01,548 --> 01:13:02,590
And Hannah is over there.
1667
01:13:02,590 --> 01:13:04,447
So 5, we want to point Hannah at number 5.
1668
01:13:04,447 --> 01:13:06,280
So you, of course, are going to point there.
1669
01:13:06,280 --> 01:13:07,655
And where should you be pointing?
1670
01:13:07,655 --> 01:13:09,500
Down to represent null, as well.
1671
01:13:09,500 --> 01:13:10,000
OK.
1672
01:13:10,000 --> 01:13:11,553
So pretty straightforward.
1673
01:13:11,553 --> 01:13:13,220
But now things get a little interesting.
1674
01:13:13,220 --> 01:13:16,000
And here, we'll use a chance to, without the weeds of code,
1675
01:13:16,000 --> 01:13:19,090
point out how order of operations is really going to matter.
1676
01:13:19,090 --> 01:13:23,320
Suppose that I next want to allocate say, the number 1.
1677
01:13:23,320 --> 01:13:25,510
And I want to insert the number 1 into this list.
1678
01:13:25,510 --> 01:13:26,010
Yes.
1679
01:13:26,010 --> 01:13:27,620
This is what the code would look like.
1680
01:13:27,620 --> 01:13:31,180
But if we act this out-- could we get one more volunteer?
1681
01:13:31,180 --> 01:13:32,990
How about on the end there in the sweater.
1682
01:13:32,990 --> 01:13:33,490
Yeah.
1683
01:13:33,490 --> 01:13:34,780
Come on down.
1684
01:13:34,780 --> 01:13:35,950
We have, what's your name?
1685
01:13:35,950 --> 01:13:36,850
AUDIENCE: Lauren.
1686
01:13:36,850 --> 01:13:37,300
SPEAKER 1: Lauren.
1687
01:13:37,300 --> 01:13:37,540
OK.
1688
01:13:37,540 --> 01:13:38,650
Lauren, come on down.
1689
01:13:38,650 --> 01:13:43,975
[AUDIENCE CLAPPING]
1690
01:13:43,975 --> 01:13:45,850
And how about, Lauren, why don't you go right
1691
01:13:45,850 --> 01:13:47,470
in here in front, if you don't mind.
1692
01:13:47,470 --> 01:13:48,670
Here is your number.
1693
01:13:48,670 --> 01:13:49,780
Here is your pointer.
1694
01:13:49,780 --> 01:13:51,850
So I've initialized Lauren to the number 1.
1695
01:13:51,850 --> 01:13:54,460
And your pointer will be null, pointing at the ground.
1696
01:13:54,460 --> 01:13:57,003
Where do you belong if we're maintaining sorted order?
1697
01:13:57,003 --> 01:13:58,420
Looks like right at the beginning.
1698
01:13:58,420 --> 01:14:00,920
What should happen here?
1699
01:14:00,920 --> 01:14:01,420
OK.
1700
01:14:01,420 --> 01:14:06,100
So Pedro has presumed to point now at Lauren.
1701
01:14:06,100 --> 01:14:10,330
But how do you know where to point?
1702
01:14:10,330 --> 01:14:11,500
AUDIENCE: He's number 2.
1703
01:14:11,500 --> 01:14:13,400
SPEAKER 1: Pedro's undoing what he did a moment ago.
1704
01:14:13,400 --> 01:14:14,380
So this was deliberate.
1705
01:14:14,380 --> 01:14:17,750
And that was perfect that Pedro presumed to point immediately at Lauren.
1706
01:14:17,750 --> 01:14:18,250
Why?
1707
01:14:18,250 --> 01:14:21,950
You literally just orphaned all of these folks, all of these chunks of memory.
1708
01:14:21,950 --> 01:14:22,450
Why?
1709
01:14:22,450 --> 01:14:26,800
Because if Pedro was our only variable pointing at that chunk of memory,
1710
01:14:26,800 --> 01:14:29,800
this is the danger of using pointers, and dynamic memory allocation,
1711
01:14:29,800 --> 01:14:31,180
and building your own data structures.
1712
01:14:31,180 --> 01:14:33,138
The moment you point temporarily, if you could,
1713
01:14:33,138 --> 01:14:36,490
to Lauren, I have no idea where he's pointing to.
1714
01:14:36,490 --> 01:14:41,260
I have no idea how to get back to Caleb, or Hannah, or anyone else on stage.
1715
01:14:41,260 --> 01:14:42,040
So that was bad.
1716
01:14:42,040 --> 01:14:43,310
So you did undo it.
1717
01:14:43,310 --> 01:14:44,290
So that's good.
1718
01:14:44,290 --> 01:14:46,300
I think we need Lauren to make a decision first.
1719
01:14:46,300 --> 01:14:47,410
Who should you point at?
1720
01:14:47,410 --> 01:14:47,650
AUDIENCE: Caleb.
1721
01:14:47,650 --> 01:14:48,820
SPEAKER 1: So pointing at Caleb.
1722
01:14:48,820 --> 01:14:49,120
Why?
1723
01:14:49,120 --> 01:14:51,703
Because you're pointing at literally who Pedro is pointing at.
1724
01:14:51,703 --> 01:14:53,490
Pedro, now what are you safe to do?
1725
01:14:53,490 --> 01:14:53,990
Good.
1726
01:14:53,990 --> 01:14:55,730
So order of operations there matters.
1727
01:14:55,730 --> 01:14:59,830
And if we had just done this line of code in red here, list equals n.
1728
01:14:59,830 --> 01:15:02,740
That was like Pedro's first instinct, bad things happen.
1729
01:15:02,740 --> 01:15:04,700
And we orphaned the rest of the list.
1730
01:15:04,700 --> 01:15:08,350
But if we think through it logically and do this, as Lauren did for us, instead,
1731
01:15:08,350 --> 01:15:11,840
we've now updated the list to look a little something more like this.
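A minimal sketch of what Lauren and Pedro just acted out, assuming a node has the `number` and `next` fields from the demonstration. The function name `prepend` is a hypothetical helper, not something shown on the slide. The order of the two pointer assignments is the whole point: the new node copies the list pointer first, and only then does the list pointer move.

```c
#include <stdbool.h>
#include <stdlib.h>

// A node as acted out on stage: a number plus a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: insert a number at the beginning of the list.
// Returns false if malloc fails, in which case the list is untouched.
bool prepend(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;

    // First, the new node points at whatever the list currently points at.
    n->next = *list;

    // Only now is it safe to update the list itself.
    // Doing this line first would orphan every existing node.
    *list = n;
    return true;
}
```

Because this never walks the list, the number of steps is the same no matter how long the list gets.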
1732
01:15:11,840 --> 01:15:12,910
Let's do one last one.
1733
01:15:12,910 --> 01:15:15,485
We got one more foam finger here for the number 3.
1734
01:15:15,485 --> 01:15:16,360
How about on the end?
1735
01:15:16,360 --> 01:15:16,860
Yeah.
1736
01:15:16,860 --> 01:15:18,190
You want to come down.
1737
01:15:18,190 --> 01:15:18,850
All right.
1738
01:15:18,850 --> 01:15:19,900
One final volunteer.
1739
01:15:19,900 --> 01:15:26,010
[AUDIENCE CLAPPING]
1740
01:15:26,010 --> 01:15:26,510
All right.
1741
01:15:26,510 --> 01:15:27,385
And what's your name?
1742
01:15:27,385 --> 01:15:28,230
AUDIENCE: Miriam.
1743
01:15:28,230 --> 01:15:28,430
SPEAKER 1: I'm sorry?
1744
01:15:28,430 --> 01:15:28,940
AUDIENCE: Miriam.
1745
01:15:28,940 --> 01:15:29,480
SPEAKER 1: Miriam.
1746
01:15:29,480 --> 01:15:29,750
All right.
1747
01:15:29,750 --> 01:15:30,860
So here is your number 3.
1748
01:15:30,860 --> 01:15:31,735
Here is your pointer.
1749
01:15:31,735 --> 01:15:35,370
If you want to go maybe in the middle of the stage in a random memory location.
1750
01:15:35,370 --> 01:15:39,270
So here, too, the goal is to maintain sorted order.
1751
01:15:39,270 --> 01:15:44,400
So let's ask the audience, who or what number should point at whom first here?
1752
01:15:44,400 --> 01:15:46,910
So we don't screw up and orphan some of the memory.
1753
01:15:46,910 --> 01:15:50,240
And if we do orphan memory, this is what's called, again per last week,
1754
01:15:50,240 --> 01:15:51,110
a memory leak.
1755
01:15:51,110 --> 01:15:53,420
Your Mac, your PC, your phone can start to slow down
1756
01:15:53,420 --> 01:15:56,610
if you keep asking for memory but never give it back or lose track of it.
1757
01:15:56,610 --> 01:15:58,430
So we want to get this right.
1758
01:15:58,430 --> 01:16:00,140
Who should point at whom?
1759
01:16:00,140 --> 01:16:01,370
Or what number?
1760
01:16:01,370 --> 01:16:02,312
Say again.
1761
01:16:02,312 --> 01:16:03,020
AUDIENCE: 3 to 4.
1762
01:16:03,020 --> 01:16:04,700
SPEAKER 1: 3 should point at 4.
1763
01:16:04,700 --> 01:16:08,090
So 3, do you want to point at 4.
1764
01:16:08,090 --> 01:16:09,800
And not, so, OK, good.
1765
01:16:09,800 --> 01:16:14,960
And how did you know, Miriam, whom to point at?
1766
01:16:14,960 --> 01:16:15,998
AUDIENCE: Copying Caleb.
1767
01:16:15,998 --> 01:16:16,790
SPEAKER 1: Perfect.
1768
01:16:16,790 --> 01:16:18,150
OK, so copying Caleb.
1769
01:16:18,150 --> 01:16:18,650
Why?
1770
01:16:18,650 --> 01:16:22,220
Because if you look at where this list is currently constructed,
1771
01:16:22,220 --> 01:16:25,070
and you can cheat on the board here, 2 is pointing to 4.
1772
01:16:25,070 --> 01:16:28,640
If you point at whoever Caleb, number 2, is pointing at,
1773
01:16:28,640 --> 01:16:31,460
that, indeed, leads you to Hannah for number 4.
1774
01:16:31,460 --> 01:16:35,600
So now what's the next step to stitch this together?
1775
01:16:35,600 --> 01:16:37,220
Our voice in the crowd.
1776
01:16:37,220 --> 01:16:38,150
AUDIENCE: 2 to 3.
1777
01:16:38,150 --> 01:16:39,260
SPEAKER 1: 2 to 3.
1778
01:16:39,260 --> 01:16:40,310
So, 2 to 3.
1779
01:16:40,310 --> 01:16:42,903
So Caleb, I think it's now safe for you to decouple.
1780
01:16:42,903 --> 01:16:44,820
Because someone is already pointing at Hannah.
1781
01:16:44,820 --> 01:16:45,945
We haven't orphaned anyone.
1782
01:16:45,945 --> 01:16:47,840
So now, if we follow the breadcrumbs, we've
1783
01:16:47,840 --> 01:16:52,870
got Pedro leading to 1, to 2, to 3, to 4, to 5.
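Here is one way the whole demonstration could be sketched in code, assuming the same `number` and `next` fields. The name `insert_sorted` is hypothetical. It handles the Lauren case (new smallest value) and the Miriam case (a value in the middle) with the same rule: the new node copies a pointer before anyone else lets go of it.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: insert a number while keeping the list sorted.
// Returns false if malloc fails, in which case the list is untouched.
bool insert_sorted(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    n->next = NULL;

    // Empty list, or the new number belongs at the very beginning.
    if (*list == NULL || number < (*list)->number)
    {
        n->next = *list;
        *list = n;
        return true;
    }

    // Otherwise, walk until the next node is bigger or there is no next node.
    node *prev = *list;
    while (prev->next != NULL && prev->next->number < number)
    {
        prev = prev->next;
    }

    // The new node copies its predecessor's pointer first (3 points at 4),
    // then the predecessor points at the new node (2 points at 3).
    n->next = prev->next;
    prev->next = n;
    return true;
}
```

Inserting 2, 4, 5, 1, and 3 in that order, as on stage, should leave the list reading 1 through 5.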
1784
01:16:52,870 --> 01:16:55,370
We need the numbers back, but you can keep the foam fingers.
1785
01:16:55,370 --> 01:16:57,537
Thank you to our volunteers here.
1786
01:16:57,537 --> 01:16:58,370
AUDIENCE: Thank you.
1787
01:16:58,370 --> 01:16:58,870
Thank you.
1788
01:16:58,870 --> 01:17:00,260
[AUDIENCE CLAPPING]
1789
01:17:00,260 --> 01:17:03,257
SPEAKER 1: You can just put the numbers here.
1790
01:17:03,257 --> 01:17:04,090
AUDIENCE: Thank you.
1791
01:17:04,090 --> 01:17:05,257
SPEAKER 1: Thank you to all.
1792
01:17:05,257 --> 01:17:09,200
So this is only to say that when you start looking at the code this week
1793
01:17:09,200 --> 01:17:11,763
and in the problem set, it's going to be very easy to lose
1794
01:17:11,763 --> 01:17:13,180
sight of the forest for the trees.
1795
01:17:13,180 --> 01:17:15,220
Because the code does get really dense.
1796
01:17:15,220 --> 01:17:20,240
But the ideas, again, really do bubble up to these higher-level descriptions.
1797
01:17:20,240 --> 01:17:23,300
And if you think about data structures at this level.
1798
01:17:23,300 --> 01:17:25,417
If you go off and program after a class like CS50
1799
01:17:25,417 --> 01:17:28,000
and you're whiteboarding something with a friend or a colleague,
1800
01:17:28,000 --> 01:17:31,030
most people think at and talk at this level.
1801
01:17:31,030 --> 01:17:33,550
And they just assume that, yeah, if we went back and looked
1802
01:17:33,550 --> 01:17:36,890
at our textbooks or class notes, we could figure out how to implement this.
1803
01:17:36,890 --> 01:17:38,740
But the important stuff is the conversation.
1804
01:17:38,740 --> 01:17:40,120
And the ideas up here.
1805
01:17:40,120 --> 01:17:45,080
Even though, via this week, we will get some practice with the actual code.
1806
01:17:45,080 --> 01:17:49,090
So when it comes to analyzing an algorithm like this,
1807
01:17:49,090 --> 01:17:51,160
let's consider the following.
1808
01:17:51,160 --> 01:17:58,480
What might be now the running time of operations like searching and inserting
1809
01:17:58,480 --> 01:18:00,100
into a linked list?
1810
01:18:00,100 --> 01:18:01,810
We talked about arrays earlier.
1811
01:18:01,810 --> 01:18:04,810
And we had some binary search possibilities still, as soon
1812
01:18:04,810 --> 01:18:05,650
as it's an array.
1813
01:18:05,650 --> 01:18:08,830
But as soon as we have a linked list, these arrows, like our volunteers,
1814
01:18:08,830 --> 01:18:10,180
could be anywhere on stage.
1815
01:18:10,180 --> 01:18:11,888
And so you can't just assume that you can
1816
01:18:11,888 --> 01:18:14,680
jump arithmetically to the middle element, to the middle element,
1817
01:18:14,680 --> 01:18:15,500
to the middle one.
1818
01:18:15,500 --> 01:18:19,090
You pretty much have to follow all of these breadcrumbs again and again.
1819
01:18:19,090 --> 01:18:21,880
So how might that inform what we see?
1820
01:18:21,880 --> 01:18:23,595
Well, consider this too.
1821
01:18:23,595 --> 01:18:26,470
Even though I keep drawing all these pictures with all of the numbers
1822
01:18:26,470 --> 01:18:26,980
exposed.
1823
01:18:26,980 --> 01:18:28,772
And all of us humans in the room can easily
1824
01:18:28,772 --> 01:18:32,360
spot where the 1 is, where the 2 is, where the 3 is, the computer, again,
1825
01:18:32,360 --> 01:18:36,610
just like with our lockers and arrays, can only see one location at a time.
1826
01:18:36,610 --> 01:18:40,510
And the key thing with a linked list is that the only address
1827
01:18:40,510 --> 01:18:44,410
we've fundamentally been remembering is what Pedro represented a moment ago.
1828
01:18:44,410 --> 01:18:47,990
He was the link to all of the other nodes.
1829
01:18:47,990 --> 01:18:49,990
And, in turn, each person led to the next.
1830
01:18:49,990 --> 01:18:54,650
But without Pedro, we would have lost some of, or all of, the linked list.
1831
01:18:54,650 --> 01:18:56,950
So when you start with a linked list, if you
1832
01:18:56,950 --> 01:19:00,730
want to find an element as via search, you have to do it linearly.
1833
01:19:00,730 --> 01:19:02,200
Following all of the arrows.
1834
01:19:02,200 --> 01:19:04,210
Following all of the pointers on the stage
1835
01:19:04,210 --> 01:19:06,340
in order to get to the node in question.
1836
01:19:06,340 --> 01:19:09,700
And only once you hit null can you conclude, yep, it was there.
1837
01:19:09,700 --> 01:19:11,500
Or no, it was not.
1838
01:19:11,500 --> 01:19:14,440
So given that if a computer, essentially,
1839
01:19:14,440 --> 01:19:18,970
can only see the number 1, or the number 2, or the number 3, or the number 4,
1840
01:19:18,970 --> 01:19:22,270
or the number 5, one at a time, how might we
1841
01:19:22,270 --> 01:19:25,690
think about the running time of search?
1842
01:19:25,690 --> 01:19:27,610
And it is indeed Big O of n.
1843
01:19:27,610 --> 01:19:28,410
But why is that?
1844
01:19:28,410 --> 01:19:30,910
Well, in the worst case, the number you might be looking for
1845
01:19:30,910 --> 01:19:32,480
is all the way at the end.
1846
01:19:32,480 --> 01:19:35,710
And so, obviously, you're going to have to search all of the n elements.
1847
01:19:35,710 --> 01:19:37,943
And I drew these things with boxes on top of them.
1848
01:19:37,943 --> 01:19:40,360
Because, again, even though you and I can immediately see,
1849
01:19:40,360 --> 01:19:42,610
where the 5 is for instance, the computer
1850
01:19:42,610 --> 01:19:46,480
can only figure that out by starting at the beginning and going there.
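A small sketch of that linear search, assuming the same `number` and `next` fields. The function name `search` is an illustrative choice. The loop can only follow one pointer at a time, which is why the worst case takes n steps.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Follow the pointers from the start of the list until the number is found
// or the end of the list (NULL) is reached.
bool search(const node *list, int number)
{
    for (const node *p = list; p != NULL; p = p->next)
    {
        if (p->number == number)
        {
            return true;
        }
    }
    return false;
}
```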
1851
01:19:46,480 --> 01:19:48,400
So there, too, is another trade off.
1852
01:19:48,400 --> 01:19:52,030
It would seem that, overnight, we have lost the ability
1853
01:19:52,030 --> 01:19:57,190
to do a very powerful algorithm from week 0 known as binary search, right.
1854
01:19:57,190 --> 01:19:57,820
It's gone.
1855
01:19:57,820 --> 01:20:01,810
Because there's no way in this picture to jump mathematically
1856
01:20:01,810 --> 01:20:04,375
to the middle node, unless you remember where it is.
1857
01:20:04,375 --> 01:20:06,250
And then, remember where every other node is.
1858
01:20:06,250 --> 01:20:08,042
And at that point, you're back to an array.
1859
01:20:08,042 --> 01:20:12,380
Linked lists, by design, only remember the next node in the list.
1860
01:20:12,380 --> 01:20:12,880
All right.
1861
01:20:12,880 --> 01:20:15,370
How about something like insert?
1862
01:20:15,370 --> 01:20:18,190
In the worst case, perhaps, how many steps
1863
01:20:18,190 --> 01:20:21,340
might it take to insert something into a linked list?
1864
01:20:21,340 --> 01:20:22,998
Someone else.
1865
01:20:22,998 --> 01:20:23,540
Someone else.
1866
01:20:23,540 --> 01:20:24,040
Yeah.
1867
01:20:24,040 --> 01:20:25,060
AUDIENCE: N squared.
1868
01:20:25,060 --> 01:20:25,480
SPEAKER 1: Say again?
1869
01:20:25,480 --> 01:20:26,320
AUDIENCE: N squared.
1870
01:20:26,320 --> 01:20:26,890
SPEAKER 1: N squared.
1871
01:20:26,890 --> 01:20:28,232
Fortunately, it's not that bad.
1872
01:20:28,232 --> 01:20:29,440
It's not as bad as n squared.
1873
01:20:29,440 --> 01:20:31,720
That typically means doing n things, n times.
1874
01:20:31,720 --> 01:20:36,260
And I think we can stay under that, but not a bad thought.
1875
01:20:36,260 --> 01:20:36,760
Yeah.
1876
01:20:36,760 --> 01:20:37,832
AUDIENCE: Is it n?
1877
01:20:37,832 --> 01:20:39,040
SPEAKER 1: Why would it be n?
1878
01:20:39,040 --> 01:20:42,787
AUDIENCE: Because the [INAUDIBLE].
1879
01:20:42,787 --> 01:20:43,370
SPEAKER 1: OK.
1880
01:20:43,370 --> 01:20:45,650
So to summarize, you're proposing n.
1881
01:20:45,650 --> 01:20:47,513
Because to find where the thing goes, you
1882
01:20:47,513 --> 01:20:49,430
have to traverse, potentially, the whole list.
1883
01:20:49,430 --> 01:20:52,220
Because if I'm inserting the number 6 or the number 99,
1884
01:20:52,220 --> 01:20:54,770
that numerically belongs at the very end,
1885
01:20:54,770 --> 01:20:57,830
I can only find its location by looking for all of them.
1886
01:20:57,830 --> 01:20:59,368
At this point, though, in the term.
1887
01:20:59,368 --> 01:21:01,160
And really, at this point in the story, you
1888
01:21:01,160 --> 01:21:04,590
should start to question these very simplistic questions, to be honest.
1889
01:21:04,590 --> 01:21:08,360
Because the answer is almost always going to depend, right.
1890
01:21:08,360 --> 01:21:10,980
If I've just got a linked list that looks like this,
1891
01:21:10,980 --> 01:21:14,240
the first question back to someone asking this question
1892
01:21:14,240 --> 01:21:17,300
would be, well does the list need to be sorted, right?
1893
01:21:17,300 --> 01:21:19,692
I've drawn it as sorted and it might imply as much.
1894
01:21:19,692 --> 01:21:21,650
So that's a reasonable assumption to have made.
1895
01:21:21,650 --> 01:21:24,320
But if I don't care about maintaining sorted order,
1896
01:21:24,320 --> 01:21:28,190
I could actually insert into a linked list in constant time.
1897
01:21:28,190 --> 01:21:28,730
Why?
1898
01:21:28,730 --> 01:21:31,628
I could just keep inserting into the beginning, into the beginning,
1899
01:21:31,628 --> 01:21:32,420
into the beginning.
1900
01:21:32,420 --> 01:21:34,310
And even though the list is getting longer,
1901
01:21:34,310 --> 01:21:38,270
the number of steps required to insert something before the first element
1902
01:21:38,270 --> 01:21:40,220
is not growing at all.
1903
01:21:40,220 --> 01:21:42,740
You just keep inserting.
1904
01:21:42,740 --> 01:21:44,900
If you want to keep it sorted though, yes, it's
1905
01:21:44,900 --> 01:21:46,310
going to be, indeed, Big O of n.
1906
01:21:46,310 --> 01:21:47,840
But again, these kinds of, now, assumptions
1907
01:21:47,840 --> 01:21:49,048
are going to start to matter.
1908
01:21:49,048 --> 01:21:51,740
So let's for the sake of discussion say it's Big O of n,
1909
01:21:51,740 --> 01:21:53,660
if we do want to maintain sorted order.
1910
01:21:53,660 --> 01:21:56,810
But what about in the case of not caring.
1911
01:21:56,810 --> 01:21:58,628
It might indeed be a Big O of 1.
1912
01:21:58,628 --> 01:22:01,670
And now these are the kinds of decisions that we'll start to leave to you.
1913
01:22:01,670 --> 01:22:03,200
What about in the best case here?
1914
01:22:03,200 --> 01:22:05,240
If we're thinking about Big Omega notation,
1915
01:22:05,240 --> 01:22:07,632
then, frankly, we could just get lucky in the best case.
1916
01:22:07,632 --> 01:22:10,340
And the element we're looking for happens to be at the beginning.
1917
01:22:10,340 --> 01:22:14,570
Or heck, we just blindly insert to the beginning irrespective of the order
1918
01:22:14,570 --> 01:22:16,500
that we want to keep things in.
1919
01:22:16,500 --> 01:22:17,000
All right.
1920
01:22:17,000 --> 01:22:22,418
So besides then, how can we improve further on this design?
1921
01:22:22,418 --> 01:22:23,960
We don't need to stop at linked lists.
1922
01:22:23,960 --> 01:22:26,090
Because, honestly, it's not been a clear win.
1923
01:22:26,090 --> 01:22:28,940
Like, linked lists allow us to use more of our memory
1924
01:22:28,940 --> 01:22:32,430
because we don't need massive growing chunks of contiguous memory.
1925
01:22:32,430 --> 01:22:33,300
So that's a win.
1926
01:22:33,300 --> 01:22:37,310
But they still require Big O of n time to find the end of it,
1927
01:22:37,310 --> 01:22:38,630
if we care about order.
1928
01:22:38,630 --> 01:22:41,870
We're using at least twice as much memory for the darn pointer.
1929
01:22:41,870 --> 01:22:44,120
So that seems like a sidestep.
1930
01:22:44,120 --> 01:22:46,100
It's not really a step forward.
1931
01:22:46,100 --> 01:22:47,840
So can we do better?
1932
01:22:47,840 --> 01:22:52,157
Here's where we can now accelerate the story by just stipulating that, hey,
1933
01:22:52,157 --> 01:22:53,990
even if you haven't used this technique yet,
1934
01:22:53,990 --> 01:22:58,130
we would seem to have an ability to stitch together pieces of memory just
1935
01:22:58,130 --> 01:22:59,120
using pointers.
1936
01:22:59,120 --> 01:23:01,520
And anything you could imagine drawing with arrows,
1937
01:23:01,520 --> 01:23:04,140
you can implement, it would seem, in code.
1938
01:23:04,140 --> 01:23:06,620
So what if we leverage a second dimension.
1939
01:23:06,620 --> 01:23:09,137
Instead of just stringing together things laterally,
1940
01:23:09,137 --> 01:23:10,970
left to right, essentially, even though they
1941
01:23:10,970 --> 01:23:12,620
were bouncing around on the screen.
1942
01:23:12,620 --> 01:23:15,770
What if we start to leverage a second dimension here, so to speak.
1943
01:23:15,770 --> 01:23:19,400
And build more interesting structures in the computer's memory.
1944
01:23:19,400 --> 01:23:22,190
Well it turns out that in a computer's memory,
1945
01:23:22,190 --> 01:23:25,130
we could create a tree, similar to a family tree.
1946
01:23:25,130 --> 01:23:28,880
If you've ever seen or drawn a family tree with grandparents, and parents,
1947
01:23:28,880 --> 01:23:30,170
and siblings, and so forth.
1948
01:23:32,960 --> 01:23:36,170
So inverted branch of a tree that grows, typically
1949
01:23:36,170 --> 01:23:39,050
when it's drawn, downward instead of upward like a typical tree.
1950
01:23:39,050 --> 01:23:41,540
But that's something we could translate into code as well.
1951
01:23:41,540 --> 01:23:45,240
Specifically, let's do something called a binary search tree.
1952
01:23:45,240 --> 01:23:47,120
Which is a type of tree.
1953
01:23:47,120 --> 01:23:49,670
And what I mean by this is the following.
1954
01:23:49,670 --> 01:23:50,480
Notice this.
1955
01:23:50,480 --> 01:23:53,360
This is an example of an array from like week 2,
1956
01:23:53,360 --> 01:23:54,750
when we first talked about those.
1957
01:23:54,750 --> 01:23:56,450
And we had the lockers on stage.
1958
01:23:56,450 --> 01:24:02,480
And recall that what was nice about an array, if 1, it's sorted.
1959
01:24:02,480 --> 01:24:05,540
And 2, all of its numbers are indeed contiguous,
1960
01:24:05,540 --> 01:24:07,530
which is by definition an array.
1961
01:24:07,530 --> 01:24:09,270
We can just do some simple math.
1962
01:24:09,270 --> 01:24:13,980
For instance, if there are 7 elements in this array, and we do 7 divided by 2,
1963
01:24:13,980 --> 01:24:14,480
that's what?
1964
01:24:14,480 --> 01:24:17,330
3 and 1/2, round down through truncation, that's 3.
1965
01:24:17,330 --> 01:24:18,680
0, 1, 2, 3.
1966
01:24:18,680 --> 01:24:21,933
That gives me the middle element, arithmetically, in this thing.
1967
01:24:21,933 --> 01:24:24,350
And even though I have to be careful about rounding, using
1968
01:24:24,350 --> 01:24:28,430
simple arithmetic, I can very quickly, with a single line of code or math,
1969
01:24:28,430 --> 01:24:30,890
find for you the middle of the left half, of the left half,
1970
01:24:30,890 --> 01:24:32,182
of the right half, or whatever.
1971
01:24:32,182 --> 01:24:33,480
That's the power of arrays.
1972
01:24:33,480 --> 01:24:35,420
And that's what gave us binary search.
1973
01:24:35,420 --> 01:24:36,940
And how did binary search work?
1974
01:24:36,940 --> 01:24:38,190
Well, we looked at the middle.
1975
01:24:38,190 --> 01:24:39,830
And then, we went left or right.
1976
01:24:39,830 --> 01:24:45,080
And then, we went left or right again, implied by this color scheme here.
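For reference, that array version might be sketched like this. The function name `binary_search` and its parameters are illustrative. The key line is the midpoint calculation, which only works because an array's elements are contiguous and can be indexed arithmetically. With 7 elements, the first midpoint is index 3, just as in the 7 divided by 2 example.

```c
#include <stdbool.h>

// Binary search over a sorted array of n integers.
bool binary_search(const int values[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        // Integer division truncates, so 0 + (6 - 0) / 2 is 3 for 7 elements.
        int middle = low + (high - low) / 2;
        if (values[middle] == target)
        {
            return true;
        }
        else if (values[middle] < target)
        {
            // Target can only be in the right half.
            low = middle + 1;
        }
        else
        {
            // Target can only be in the left half.
            high = middle - 1;
        }
    }
    return false;
}
```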
1977
01:24:45,080 --> 01:24:50,210
Wouldn't it be nice if we somehow preserved the new upsides
1978
01:24:50,210 --> 01:24:53,038
today of dynamic memory allocation, giving ourselves
1979
01:24:53,038 --> 01:24:55,580
the ability to just add another element, add another element,
1980
01:24:55,580 --> 01:24:56,750
add another element.
1981
01:24:56,750 --> 01:24:59,300
But retain the power of binary search.
1982
01:24:59,300 --> 01:25:04,100
Because log of n was much better than n, certainly for large data sets, right.
1983
01:25:04,100 --> 01:25:06,980
Even the phone book demonstrated as much weeks ago.
1984
01:25:06,980 --> 01:25:11,010
So what if I draw this same picture in 2 dimensions.
1985
01:25:11,010 --> 01:25:14,960
And I preserve the color scheme, just so it's obvious what came where.
1986
01:25:14,960 --> 01:25:18,500
What do these things look like now?
1987
01:25:18,500 --> 01:25:21,050
Maybe, like, things we might now call nodes, right.
1988
01:25:21,050 --> 01:25:25,030
A node is just a generic term for like, storing some data.
1989
01:25:25,030 --> 01:25:28,200
What if the data these nodes are storing are numbers.
1990
01:25:28,200 --> 01:25:29,730
So still integers.
1991
01:25:29,730 --> 01:25:33,860
But what if we connected these cleverly, like an old family tree.
1992
01:25:33,860 --> 01:25:39,230
Whereby, every node has not one pointer now, but as many as 2.
1993
01:25:39,230 --> 01:25:42,330
Maybe 0, like the leaves at the bottom here in green.
1994
01:25:42,330 --> 01:25:45,450
But other nodes on the interior might have as many as 2.
1995
01:25:45,450 --> 01:25:47,250
Like having 2 children, so to speak.
1996
01:25:47,250 --> 01:25:49,420
And indeed, the vernacular here is exactly that.
1997
01:25:49,420 --> 01:25:51,330
This would be called the root of the tree.
1998
01:25:51,330 --> 01:25:54,270
Or this would be a parent, with respect to these children.
1999
01:25:54,270 --> 01:25:56,910
The green ones would be grandchildren, with respect to these.
2000
01:25:56,910 --> 01:26:01,530
The green ones would be siblings with respect to each other.
2001
01:26:01,530 --> 01:26:02,370
And over there, too.
2002
01:26:02,370 --> 01:26:04,662
So all the same jargon you might use in the real world,
2003
01:26:04,662 --> 01:26:07,920
applies in the world of data structures and CS trees.
2004
01:26:07,920 --> 01:26:12,810
But this is interesting because I think we could build this now, this data
2005
01:26:12,810 --> 01:26:15,300
structure in the computer's memory.
2006
01:26:15,300 --> 01:26:15,840
How?
2007
01:26:15,840 --> 01:26:20,040
Well, suppose that we defined a node to be no longer just
2008
01:26:20,040 --> 01:26:22,110
this, a number and a next field.
2009
01:26:22,110 --> 01:26:24,870
What if we give ourselves a bit more room here?
2010
01:26:24,870 --> 01:26:29,730
And give ourselves a pointer called left and another one called right.
2011
01:26:29,730 --> 01:26:32,080
Each of which is a pointer to a struct node.
2012
01:26:32,080 --> 01:26:36,030
So same idea as before, but now we just make sure we think of these things
2013
01:26:36,030 --> 01:26:39,210
as pointing this way and this way, not just this way.
2014
01:26:39,210 --> 01:26:41,280
Not just a single direction, but 2.
2015
01:26:41,280 --> 01:26:45,180
So you could imagine, in code, building something up like this with a node.
2016
01:26:45,180 --> 01:26:48,570
That creates, in essence, this diagram here.
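As a rough sketch, the node described here could be declared as follows, with `left` and `right` as the two pointers. The helper `make_node` is hypothetical and exists only to keep the example short. For brevity it does not clean up already-allocated children if a later malloc fails, which real code would need to handle.

```c
#include <stdlib.h>

// A tree node: a number plus pointers to a left and a right child.
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

// Hypothetical helper: allocate one node with the given children.
// Returns NULL if malloc fails.
node *make_node(int number, node *left, node *right)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->left = left;
    n->right = right;
    return n;
}
```

The seven-node picture could then be built from the leaves up, for example `make_node(4, make_node(2, make_node(1, NULL, NULL), make_node(3, NULL, NULL)), make_node(6, make_node(5, NULL, NULL), make_node(7, NULL, NULL)))`, with a single variable holding the root.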
2017
01:26:48,570 --> 01:26:50,250
But why is this compelling?
2018
01:26:50,250 --> 01:26:52,290
Suppose I want to find the number 3.
2019
01:26:52,290 --> 01:26:54,840
I want to search for the number 3 in this tree.
2020
01:26:54,840 --> 01:26:58,200
It would seem, just like Pedro was the beginning of our linked list,
2021
01:26:58,200 --> 01:27:01,090
in the world of trees, the root, so to speak,
2022
01:27:01,090 --> 01:27:03,090
is the beginning of your data structure.
2023
01:27:03,090 --> 01:27:08,730
You can retain and remember this entire tree just by pointing at the root node,
2024
01:27:08,730 --> 01:27:09,270
ultimately.
2025
01:27:09,270 --> 01:27:12,330
One variable can hang on to this whole tree.
2026
01:27:12,330 --> 01:27:14,520
So how can I find the number 3?
2027
01:27:14,520 --> 01:27:18,660
Well, if I look at the root node and the number I'm looking for is less than.
2028
01:27:18,660 --> 01:27:20,250
Notice, I can go this way.
2029
01:27:20,250 --> 01:27:22,570
Or if it's greater than, I can go this way.
2030
01:27:22,570 --> 01:27:24,750
So I preserve that property of the phone book,
2031
01:27:24,750 --> 01:27:27,000
or just a sorted array in general.
2032
01:27:27,000 --> 01:27:28,320
What's true over here?
2033
01:27:28,320 --> 01:27:31,328
If I'm looking for 3, I can go to the right of the 2
2034
01:27:31,328 --> 01:27:33,120
because that number is going to be greater.
2035
01:27:33,120 --> 01:27:35,680
If I go left, it's going to be smaller instead.
2036
01:27:35,680 --> 01:27:38,430
And here's an example of actually recursion.
2037
01:27:38,430 --> 01:27:42,090
Recursion in a physical sense, much like Mario's pyramid.
2038
01:27:42,090 --> 01:27:44,250
Which was recursively defined.
2039
01:27:44,250 --> 01:27:45,300
Notice this.
2040
01:27:45,300 --> 01:27:47,250
I claim this whole thing is a tree.
2041
01:27:47,250 --> 01:27:50,790
Specifically, a binary search tree, which means every node
2042
01:27:50,790 --> 01:27:53,880
has 2, or maybe 1, or maybe 0 children.
2043
01:27:53,880 --> 01:27:55,110
But no more than 2.
2044
01:27:55,110 --> 01:27:56,730
Hence the bi in binary.
2045
01:27:56,730 --> 01:28:02,160
And it's the case that every left child is smaller than the root.
2046
01:28:02,160 --> 01:28:05,130
And every right child is larger than the root.
2047
01:28:05,130 --> 01:28:08,100
That definition certainly works for 2, 4, and 6.
2048
01:28:08,100 --> 01:28:12,930
But it also works recursively for every sub tree, or branch of this tree.
2049
01:28:12,930 --> 01:28:14,910
Notice, if you think of this as the root,
2050
01:28:14,910 --> 01:28:16,980
it is indeed bigger than this left child.
2051
01:28:16,980 --> 01:28:19,080
And it's smaller than this right child.
2052
01:28:19,080 --> 01:28:21,600
And if you look even at the leaves, so to speak.
2053
01:28:21,600 --> 01:28:23,010
The grandchildren here.
2054
01:28:23,010 --> 01:28:26,687
This root node is bigger than its left child, if it existed.
2055
01:28:26,687 --> 01:28:28,020
So it's a meaningless statement.
2056
01:28:28,020 --> 01:28:30,210
And it's less than its right child.
2057
01:28:30,210 --> 01:28:33,000
Or it's not greater than, certainly, so that's meaningless too.
2058
01:28:33,000 --> 01:28:36,760
So we haven't violated the definition even for these leaves, as well.
2059
01:28:36,760 --> 01:28:40,230
And so, now, how many steps does it take to find in the worst case
2060
01:28:40,230 --> 01:28:44,580
any number in a binary search tree, it would seem?
2061
01:28:44,580 --> 01:28:46,530
So it seems 2, literally.
2062
01:28:46,530 --> 01:28:48,400
And the height of this thing is actually 3.
2063
01:28:48,400 --> 01:28:51,150
And so long story short, especially, if you're a little less comfy
2064
01:28:51,150 --> 01:28:53,310
with your logarithms from yesteryear.
2065
01:28:53,310 --> 01:28:57,120
Log base 2 is the number of times you can divide something in half, and half,
2066
01:28:57,120 --> 01:28:58,860
and half, until you get down to 1.
2067
01:28:58,860 --> 01:29:01,828
This is like a logarithm in the reverse direction.
2068
01:29:01,828 --> 01:29:03,120
Here's a whole lot of elements.
2069
01:29:03,120 --> 01:29:05,490
And we're halving, we're halving until we get down to 1.
2070
01:29:05,490 --> 01:29:09,643
So the height of this tree, that is to say, is log base 2 of n.
2071
01:29:09,643 --> 01:29:12,810
Which means that even in the worst case, the number you're looking for maybe
2072
01:29:12,810 --> 01:29:14,685
it's all the way at the bottom in the leaves.
2073
01:29:14,685 --> 01:29:15,330
Doesn't matter.
2074
01:29:15,330 --> 01:29:20,220
It's going to take log base 2 of n steps, or log of n steps,
2075
01:29:20,220 --> 01:29:23,830
to find, maximally, any one of those numbers.
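To make the "divide in half until you get down to 1" idea concrete, here is a small sketch that counts the halvings. The function name halvings is hypothetical, and integer division means the result is log base 2 of n rounded down.

```c
// Count how many times n can be halved before it reaches 1.
// With integer division this is log base 2 of n, rounded down.
int halvings(int n)
{
    int steps = 0;
    while (n > 1)
    {
        n /= 2;
        steps++;
    }
    return steps;
}
```

For a balanced tree of 7 numbers this gives 2 halvings, so the tree has 3 levels, which matches the height of 3 mentioned above.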
2076
01:29:23,830 --> 01:29:28,620
So, again, binary search is back.
2077
01:29:28,620 --> 01:29:30,635
But we've paid a price, right.
2078
01:29:30,635 --> 01:29:32,010
This isn't a linked list anymore.
2079
01:29:32,010 --> 01:29:33,192
It's a tree.
2080
01:29:33,192 --> 01:29:36,150
But we've gained back binary search, which is pretty compelling, right.
2081
01:29:36,150 --> 01:29:38,775
That's where the whole class began, on making that distinction.
2082
01:29:38,775 --> 01:29:44,020
But what price have we paid to retain binary search in this new world.
2083
01:29:44,020 --> 01:29:44,520
Yeah.
2084
01:29:47,070 --> 01:29:49,050
It's no longer sorted left to right, but this
2085
01:29:49,050 --> 01:29:52,020
is, I claim, sorted according to the binary search tree definition.
2086
01:29:52,020 --> 01:29:56,010
Where, again, left child is smaller than root.
2087
01:29:56,010 --> 01:29:58,440
And right child is greater than root.
2088
01:29:58,440 --> 01:30:01,860
So it is sorted, but it's sorted in a 2-dimensional sense, if you will.
2089
01:30:01,860 --> 01:30:02,910
Not just 1.
2090
01:30:02,910 --> 01:30:05,260
But another price paid?
2091
01:30:05,260 --> 01:30:06,670
AUDIENCE: [INAUDIBLE] nodes now.
2092
01:30:06,670 --> 01:30:07,462
SPEAKER 1: Exactly.
2093
01:30:07,462 --> 01:30:11,830
Every node now needs not one number, but 2, 3 pieces of data.
2094
01:30:11,830 --> 01:30:13,630
A number and now 2 pointers.
2095
01:30:13,630 --> 01:30:15,385
So, again, there's that trade off again.
2096
01:30:15,385 --> 01:30:17,260
Where, well, if you want to save time, you've
2097
01:30:17,260 --> 01:30:20,080
got to give something. If you start giving space,
2098
01:30:20,080 --> 01:30:22,547
and you start using more space, you can speed up time.
2099
01:30:22,547 --> 01:30:23,380
Like, you've got it.
2100
01:30:23,380 --> 01:30:24,640
There's always a price paid.
2101
01:30:24,640 --> 01:30:30,400
And it's very often in space, or time, or complexity, or developer time,
2102
01:30:30,400 --> 01:30:32,030
the number of bugs you have to solve.
2103
01:30:32,030 --> 01:30:34,060
I mean, all of these are finite resources
2104
01:30:34,060 --> 01:30:35,833
that you have to juggle among.
2105
01:30:35,833 --> 01:30:38,500
So if we consider now the code with which we can implement this,
2106
01:30:38,500 --> 01:30:40,120
here might be the node.
2107
01:30:40,120 --> 01:30:43,070
And how might we actually use something like this?
2108
01:30:43,070 --> 01:30:45,520
Well, let's take a look at, maybe, one final program.
2109
01:30:45,520 --> 01:30:49,640
And see here, before we transition to higher level concepts, ultimately.
2110
01:30:49,640 --> 01:30:54,070
Let me go ahead here and let me just open a program I wrote here in advance.
2111
01:30:54,070 --> 01:30:58,210
So let me, in a moment, copy over file called tree.c.
2112
01:30:58,210 --> 01:31:01,068
Which we'll have on the course's website.
2113
01:31:01,068 --> 01:31:02,860
And I'll walk you through some of the logic
2114
01:31:02,860 --> 01:31:07,790
here that I've written for tree.c.
2115
01:31:07,790 --> 01:31:08,290
All right.
2116
01:31:08,290 --> 01:31:09,800
So what do we have here first?
2117
01:31:09,800 --> 01:31:14,440
So here is an implementation of a binary search tree for numbers.
2118
01:31:14,440 --> 01:31:18,860
And as before, I've played around and I've inserted the numbers manually.
2119
01:31:18,860 --> 01:31:20,290
So what's going on first?
2120
01:31:20,290 --> 01:31:24,130
Here is my definition of a node for a binary search tree, copied and pasted
2121
01:31:24,130 --> 01:31:27,010
from what I proposed on the board a moment ago.
2122
01:31:27,010 --> 01:31:29,710
Here are 2 prototypes for 2 functions, that I'll
2123
01:31:29,710 --> 01:31:31,780
show you in a moment, that allow me to free
2124
01:31:31,780 --> 01:31:35,170
an entire tree, one node at a time.
2125
01:31:35,170 --> 01:31:37,900
And then, also allow me to print the tree in order.
2126
01:31:37,900 --> 01:31:40,300
So even though they're not sorted left to right,
2127
01:31:40,300 --> 01:31:43,450
I bet if I'm clever about what child I print first,
2128
01:31:43,450 --> 01:31:46,670
I can reconstruct the idea of printing this tree properly.
2129
01:31:46,670 --> 01:31:49,150
So how might I implement a binary search tree?
2130
01:31:49,150 --> 01:31:50,440
Here's my main function.
2131
01:31:50,440 --> 01:31:53,020
Here is how I might represent a tree of size 0.
2132
01:31:53,020 --> 01:31:55,960
It's just a null pointer called tree.
2133
01:31:55,960 --> 01:31:58,060
Here's how I might add a number to that list.
2134
01:31:58,060 --> 01:32:02,080
So here, for instance, is me mallocing space for a node.
2135
01:32:02,080 --> 01:32:04,210
Storing it in a temporary variable called n.
2136
01:32:04,210 --> 01:32:06,070
Here is me just doing a safety check.
2137
01:32:06,070 --> 01:32:07,780
Make sure n does not equal null.
2138
01:32:07,780 --> 01:32:12,130
And then, here is me initializing this node to contain the number 2, first.
2139
01:32:12,130 --> 01:32:14,860
Then, initializing the left child of that node to be null.
2140
01:32:14,860 --> 01:32:17,510
And the right child of that node to be null.
2141
01:32:17,510 --> 01:32:22,670
And then, initializing the tree itself to be equal to that particular node.
2142
01:32:22,670 --> 01:32:25,840
So at this point in the story, there's just one rectangle on the screen
2143
01:32:25,840 --> 01:32:28,740
containing the number 2 with no children.
2144
01:32:28,740 --> 01:32:29,240
All right.
2145
01:32:29,240 --> 01:32:31,630
Let's just add manually to this a little further.
2146
01:32:31,630 --> 01:32:34,780
Let's add another number to the list, by mallocing another node.
2147
01:32:34,780 --> 01:32:38,140
I don't need to declare n as a node* because it already exists at this
2148
01:32:38,140 --> 01:32:38,780
point.
2149
01:32:38,780 --> 01:32:40,720
Here's a little safety check.
2150
01:32:40,720 --> 01:32:45,280
I'm going to not bother with my, let me do this, free memory here.
2151
01:32:45,280 --> 01:32:47,240
Just to be safe.
2152
01:32:47,240 --> 01:32:49,803
Do I want to do this?
2153
01:32:49,803 --> 01:32:51,970
We want to free memory too, which I've not done here,
2154
01:32:51,970 --> 01:32:53,650
but I'll save that for another time.
2155
01:32:53,650 --> 01:32:55,990
Here, I'm going to initialize the number to 1.
2156
01:32:55,990 --> 01:33:00,100
I'm going to initialize the children of this node to null and null.
2157
01:33:00,100 --> 01:33:01,810
And now, I'm going to do this.
2158
01:33:01,810 --> 01:33:06,280
Initialize the tree's left child to be n.
2159
01:33:06,280 --> 01:33:09,222
So what that's essentially doing here is if this
2160
01:33:09,222 --> 01:33:12,430
is my root node, the single rectangle I described a moment ago that currently
2161
01:33:12,430 --> 01:33:14,530
has no children, neither left nor right.
2162
01:33:14,530 --> 01:33:16,480
Here's my new node with the number 1.
2163
01:33:16,480 --> 01:33:18,620
I want it to become the new left child.
2164
01:33:18,620 --> 01:33:22,150
So that line of code on the screen there, tree left equals n,
2165
01:33:22,150 --> 01:33:26,720
is like stitching these 2 together with a pointer from 2 to the 1.
2166
01:33:26,720 --> 01:33:27,220
All right.
2167
01:33:27,220 --> 01:33:30,100
The next lines of code, you can probably guess,
2168
01:33:30,100 --> 01:33:32,560
are me adding another number to the list.
2169
01:33:32,560 --> 01:33:33,730
Just the number 3.
2170
01:33:33,730 --> 01:33:39,200
So this is a simpler tree with 2, 1, and 3, respectively.
2171
01:33:39,200 --> 01:33:41,710
And this code, let me wave my hands, is almost the same.
2172
01:33:41,710 --> 01:33:45,010
Except for the fact that I'm updating the tree's right child
2173
01:33:45,010 --> 01:33:46,990
to be this new and third node.
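The manual steps walked through above can be sketched roughly as follows. This is not the actual tree.c from the lecture: the helpers make_node and build_tree are hypothetical names used here to avoid repeating the malloc and null check three times, and unlike the walkthrough, this version does free memory if an allocation fails.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

// Hypothetical helper: allocate one node with no children.
// Returns NULL if malloc fails (the safety check).
node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->left = NULL;
    n->right = NULL;
    return n;
}

// Hypothetical helper: build the tree with 2 at the root,
// 1 as its left child, and 3 as its right child.
node *build_tree(void)
{
    node *tree = make_node(2);
    if (tree == NULL)
    {
        return NULL;
    }
    tree->left = make_node(1);   // stitch 2 to 1
    tree->right = make_node(3);  // stitch 2 to 3
    if (tree->left == NULL || tree->right == NULL)
    {
        // Clean up whatever was allocated; free(NULL) does nothing
        free(tree->left);
        free(tree->right);
        free(tree);
        return NULL;
    }
    return tree;
}
```

The caller keeps only the returned root pointer, since one variable can hang on to the whole tree.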
2174
01:33:46,990 --> 01:33:50,380
Let's now run the code before looking at those 2 functions.
2175
01:33:50,380 --> 01:33:54,280
Let me do make tree, ./tree.
2176
01:33:54,280 --> 01:33:55,510
And voila, 1, 2, 3.
2177
01:33:55,510 --> 01:33:58,930
So it sounds like the data structure is sorted, to your concern earlier.
2178
01:33:58,930 --> 01:34:00,700
But how did I actually print this?
2179
01:34:00,700 --> 01:34:02,590
And then, eventually, free the whole thing?
2180
01:34:02,590 --> 01:34:05,980
Well, let's look first at the definition of print tree.
2181
01:34:05,980 --> 01:34:08,950
And this is where things get interesting.
2182
01:34:08,950 --> 01:34:12,790
Print tree returns nothing so it's a void function.
2183
01:34:12,790 --> 01:34:18,520
But it takes a pointer to a root element as its sole argument, node* root.
2184
01:34:18,520 --> 01:34:19,690
Here's my safety check.
2185
01:34:19,690 --> 01:34:21,790
If root equals equals null, there's obviously
2186
01:34:21,790 --> 01:34:23,110
nothing to print, just return.
2187
01:34:23,110 --> 01:34:24,970
That goes without saying.
2188
01:34:24,970 --> 01:34:27,010
But here's where things get a little magical.
2189
01:34:27,010 --> 01:34:30,280
Otherwise, print your left child.
2190
01:34:30,280 --> 01:34:33,010
Then print your own number.
2191
01:34:33,010 --> 01:34:36,430
Then, print your right child.
2192
01:34:36,430 --> 01:34:41,700
What is this an example of, even though it's not mentioned by name here?
2193
01:34:41,700 --> 01:34:43,320
What programming technique here?
2194
01:34:43,320 --> 01:34:44,250
AUDIENCE: Recursion.
2195
01:34:44,250 --> 01:34:44,917
SPEAKER 1: Yeah.
2196
01:34:44,917 --> 01:34:48,372
So this is actually perhaps the most compelling use of recursion, yet.
2197
01:34:48,372 --> 01:34:50,580
It wasn't really that compelling with the Mario thing
2198
01:34:50,580 --> 01:34:52,710
because we had such an easy implementation with a for loop
2199
01:34:52,710 --> 01:34:53,550
weeks ago.
2200
01:34:53,550 --> 01:34:58,170
But here is a perfect application of recursion, where your data structure
2201
01:34:58,170 --> 01:34:59,910
itself is recursive, right.
2202
01:34:59,910 --> 01:35:02,220
If you take any snip of any branch, it all
2203
01:35:02,220 --> 01:35:04,590
still looks like a tree, just a smaller one.
2204
01:35:04,590 --> 01:35:06,430
That lends itself to recursion.
2205
01:35:06,430 --> 01:35:11,010
So here is this leap of faith where I say, print my left tree, or my left sub
2206
01:35:11,010 --> 01:35:13,830
tree, if you will, via my child at the left.
2207
01:35:13,830 --> 01:35:17,130
Then, I'll print my own root node here in the middle.
2208
01:35:17,130 --> 01:35:19,740
Then, go ahead and print my right sub tree.
2209
01:35:19,740 --> 01:35:24,180
And because we have this base case that makes sure that if the root is null,
2210
01:35:24,180 --> 01:35:26,967
there's nothing to do, you're not going to recurse infinitely.
2211
01:35:26,967 --> 01:35:29,550
You're not going to call yourself again, and again, and again,
2212
01:35:29,550 --> 01:35:31,210
infinitely many times.
2213
01:35:31,210 --> 01:35:35,400
So it works out and prints the 1, the 2, and the 3.
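The print function described above might look roughly like this. It is a sketch based on the spoken description, assuming the function is named print_tree and the node type is the one with number, left, and right fields.

```c
#include <stdio.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

// Print the tree in sorted order: left subtree, then this node, then right subtree.
void print_tree(node *root)
{
    // Base case: an empty tree has nothing to print
    if (root == NULL)
    {
        return;
    }
    print_tree(root->left);
    printf("%i\n", root->number);
    print_tree(root->right);
}
```

Swapping the two recursive calls, right before left, would print the numbers from largest to smallest instead.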
2214
01:35:35,400 --> 01:35:36,840
And notice what we could do, too.
2215
01:35:36,840 --> 01:35:40,260
If you wanted to print the tree in reverse order, you could do that.
2216
01:35:40,260 --> 01:35:43,050
Print your right tree first, the greater element.
2217
01:35:43,050 --> 01:35:43,950
Then, yourself.
2218
01:35:43,950 --> 01:35:45,330
Then, your smaller sub tree.
2219
01:35:45,330 --> 01:35:47,970
And if I do make tree here and ./tree, well now,
2220
01:35:47,970 --> 01:35:50,100
I've reversed the order of the list.
2221
01:35:50,100 --> 01:35:51,190
And that's pretty cool.
2222
01:35:51,190 --> 01:35:52,940
You can do it with a for-loop in an array.
2223
01:35:52,940 --> 01:35:56,370
But you can also do it, even with this 2-dimensional structure.
2224
01:35:56,370 --> 01:36:00,180
Let's lastly look at this free tree function.
2225
01:36:00,180 --> 01:36:02,160
And this one's almost the same.
2226
01:36:02,160 --> 01:36:05,400
Order doesn't matter in quite the same way, but it does still matter.
2227
01:36:05,400 --> 01:36:07,020
Here's what I did with free tree.
2228
01:36:07,020 --> 01:36:09,978
Well, if the root of the tree is null, there's obviously nothing to do.
2229
01:36:09,978 --> 01:36:10,560
Just return.
2230
01:36:10,560 --> 01:36:15,100
Otherwise, go ahead and free your left child and all of its descendants.
2231
01:36:15,100 --> 01:36:18,090
Then free your right child and all of its descendants.
2232
01:36:18,090 --> 01:36:19,900
And then, free yourself.
2233
01:36:19,900 --> 01:36:25,690
And again, free literally just frees the address in that variable.
2234
01:36:25,690 --> 01:36:27,570
It doesn't free the whole darn thing.
2235
01:36:27,570 --> 01:36:29,850
It just frees literally what's at that address.
2236
01:36:29,850 --> 01:36:33,900
Why was it important that I did line 72 last, though?
2237
01:36:33,900 --> 01:36:36,450
Why did I free the left child and the right child
2238
01:36:36,450 --> 01:36:39,973
before I freed myself, so to speak?
2239
01:36:39,973 --> 01:36:40,890
AUDIENCE: [INAUDIBLE].
2240
01:36:40,890 --> 01:36:41,682
SPEAKER 1: Exactly.
2241
01:36:41,682 --> 01:36:46,140
If you free yourself first, if I had done incorrectly this line higher up,
2242
01:36:46,140 --> 01:36:50,820
you're not allowed to touch the left child tree or the right child tree.
2243
01:36:50,820 --> 01:36:53,350
Because the memory address is no longer valid at that point.
2244
01:36:53,350 --> 01:36:55,290
You would get some memory error, perhaps.
2245
01:36:55,290 --> 01:36:56,310
The program would crash.
2246
01:36:56,310 --> 01:36:57,990
Valgrind definitely wouldn't like it.
2247
01:36:57,990 --> 01:37:00,060
Bad things would otherwise happen.
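A sketch of the freeing logic just described, assuming the function is named free_tree and takes the same node type as before. The key point is the order: both children first, then the node itself.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

// Free every node in the tree, children before parents.
void free_tree(node *root)
{
    // Base case: nothing to free
    if (root == NULL)
    {
        return;
    }
    free_tree(root->left);
    free_tree(root->right);
    // Only now is it safe to free this node, since root->left and
    // root->right must not be read after root itself is freed
    free(root);
}
```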
2248
01:37:00,060 --> 01:37:01,890
But here, then, is an example of recursion.
2249
01:37:01,890 --> 01:37:06,360
And again, just a recursive use of an actual data structure.
2250
01:37:06,360 --> 01:37:09,120
And what's even cooler here is, relatively speaking,
2251
01:37:09,120 --> 01:37:11,640
suppose we wanted to search something like this.
2252
01:37:11,640 --> 01:37:15,720
Binary search actually gets pretty straightforward to implement too.
2253
01:37:15,720 --> 01:37:16,410
For instance,
2254
01:37:16,410 --> 01:37:20,940
here might be the prototype for a search function for a binary search tree.
2255
01:37:20,940 --> 01:37:25,920
You give me the root of a tree, and you give me a number I'm looking for,
2256
01:37:25,920 --> 01:37:29,880
and I can pretty easily now return true if it's in there or false if it's not.
2257
01:37:29,880 --> 01:37:30,450
How?
2258
01:37:30,450 --> 01:37:32,430
Well, let's first ask a question.
2259
01:37:32,430 --> 01:37:35,395
If tree equals equals null, then you just return false.
2260
01:37:35,395 --> 01:37:38,520
Because if there's no tree, there's no number, so it's obviously not there.
2261
01:37:38,520 --> 01:37:39,860
Return false.
2262
01:37:39,860 --> 01:37:46,560
Else if, the number you're looking for is less than the tree's own number,
2263
01:37:46,560 --> 01:37:48,570
which direction should we go?
2264
01:37:48,570 --> 01:37:49,247
AUDIENCE: Left.
2265
01:37:49,247 --> 01:37:50,080
SPEAKER 1: OK, left.
2266
01:37:50,080 --> 01:37:51,190
How do we express that?
2267
01:37:51,190 --> 01:37:54,300
Well, let's just return the answer to this question.
2268
01:37:54,300 --> 01:37:58,440
Search the left sub tree, by way of my left child,
2269
01:37:58,440 --> 01:37:59,970
looking for the same number.
2270
01:37:59,970 --> 01:38:02,250
And you just assume through the beauty of recursion
2271
01:38:02,250 --> 01:38:05,400
that you're kicking the can and let yourself figure it out
2272
01:38:05,400 --> 01:38:06,600
with a smaller problem.
2273
01:38:06,600 --> 01:38:09,060
Just that snipped left tree instead.
2274
01:38:09,060 --> 01:38:13,320
Else if, the number you're looking for is greater than the tree's own number,
2275
01:38:13,320 --> 01:38:15,160
go to the right, as you might infer.
2276
01:38:15,160 --> 01:38:18,060
So I can just return the answer to this question.
2277
01:38:18,060 --> 01:38:21,150
Search my right sub tree for that same number.
2278
01:38:21,150 --> 01:38:23,020
And there's a fourth and final condition.
2279
01:38:23,020 --> 01:38:26,250
What's the fourth scenario we have to consider, explicitly?
2280
01:38:26,250 --> 01:38:26,760
Yeah.
2281
01:38:26,760 --> 01:38:27,780
AUDIENCE: The number.
2282
01:38:27,780 --> 01:38:29,822
SPEAKER 1: If the number, itself, is right there.
2283
01:38:29,822 --> 01:38:33,480
So else if, the number I'm looking for equals the tree's own number,
2284
01:38:33,480 --> 01:38:36,250
then and only then, should you return true.
2285
01:38:36,250 --> 01:38:38,490
And if you're thinking quickly here, there's
2286
01:38:38,490 --> 01:38:42,150
an optimization possible, better design opportunity.
2287
01:38:42,150 --> 01:38:43,650
Think back to even our Scratch days.
2288
01:38:43,650 --> 01:38:45,770
What could we do a little better here?
2289
01:38:45,770 --> 01:38:46,710
You're pointing at it.
2290
01:38:46,710 --> 01:38:47,508
AUDIENCE: Else.
2291
01:38:47,508 --> 01:38:48,300
SPEAKER 1: Exactly.
2292
01:38:48,300 --> 01:38:49,140
An else suffices.
2293
01:38:49,140 --> 01:38:51,682
Because if there's logically only 4 things that could happen,
2294
01:38:51,682 --> 01:38:54,540
you're wasting your time by asking a fourth gratuitous question.
2295
01:38:54,540 --> 01:38:55,860
An else here suffices.
2296
01:38:55,860 --> 01:38:59,500
So here too, more so than the Mario example a few weeks ago,
2297
01:38:59,500 --> 01:39:02,100
there's just this elegance arguably to recursion.
2298
01:39:02,100 --> 01:39:02,850
And that's it.
2299
01:39:02,850 --> 01:39:03,960
This is not pseudocode.
2300
01:39:03,960 --> 01:39:07,950
This is the code for binary search on a binary search tree.
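Put together, the search function walked through above might be written roughly as follows. The name search and the parameter names tree and number are taken from the spoken description; treat the exact signature as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

// Return true if number is somewhere in the binary search tree, else false.
bool search(node *tree, int number)
{
    if (tree == NULL)
    {
        // No tree, so no number
        return false;
    }
    else if (number < tree->number)
    {
        // Smaller numbers live in the left subtree
        return search(tree->left, number);
    }
    else if (number > tree->number)
    {
        // Larger numbers live in the right subtree
        return search(tree->right, number);
    }
    else
    {
        // Not less, not greater: it must be equal
        return true;
    }
}
```

Each recursive call works on a strictly smaller subtree, which is why the recursion eventually reaches either the number or a null pointer.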
2301
01:39:07,950 --> 01:39:10,020
And so, recursion tends to work in lockstep
2302
01:39:10,020 --> 01:39:14,700
with these kinds of data structures that have this structure to them
2303
01:39:14,700 --> 01:39:16,180
as we're seeing here.
2304
01:39:16,180 --> 01:39:16,680
All right.
2305
01:39:16,680 --> 01:39:22,360
Any questions, then, on binary search as implemented here with a tree?
2306
01:39:22,360 --> 01:39:23,227
Yeah.
2307
01:39:23,227 --> 01:39:25,175
AUDIENCE: About like third years.
2308
01:39:25,175 --> 01:39:26,149
[INAUDIBLE]
2309
01:39:29,688 --> 01:39:30,730
SPEAKER 1: Good question.
2310
01:39:30,730 --> 01:39:36,690
So when returning a Boolean value, true and false are values that are defined
2311
01:39:36,690 --> 01:39:40,350
in a library called Standard Bool, S-T-D-B-O-O-L dot H.
2312
01:39:40,350 --> 01:39:42,480
With a header file that you can use.
2313
01:39:42,480 --> 01:39:49,258
It is the case that true is, it's not well defined what they are.
2314
01:39:49,258 --> 01:39:50,550
But they would map indeed, yes.
2315
01:39:50,550 --> 01:39:51,960
To 0 and 1, essentially.
2316
01:39:51,960 --> 01:39:54,390
But you should not compare them explicitly to 0 and 1.
2317
01:39:54,390 --> 01:39:57,390
When you're using true and false, you should compare them to each other.
2318
01:39:57,390 --> 01:40:01,375
AUDIENCE: I meant if it's in a code return.
2319
01:40:01,375 --> 01:40:02,250
SPEAKER 1: Oh, sorry.
2320
01:40:02,250 --> 01:40:05,850
So if I am in my own code from earlier, a void function,
2321
01:40:05,850 --> 01:40:08,280
it is totally fine to return.
2322
01:40:08,280 --> 01:40:10,950
You just can't return something explicitly.
2323
01:40:10,950 --> 01:40:12,720
So return just means that's it.
2324
01:40:12,720 --> 01:40:14,280
Quit out of this function.
2325
01:40:14,280 --> 01:40:16,150
You're not actually handing back a value.
2326
01:40:16,150 --> 01:40:19,770
So it's a way of short circuiting the execution.
2327
01:40:19,770 --> 01:40:22,050
If you don't like that, and some people do frown
2328
01:40:22,050 --> 01:40:26,760
upon having code return from functions prematurely, you could invert the logic
2329
01:40:26,760 --> 01:40:28,050
and do something like this.
2330
01:40:28,050 --> 01:40:31,740
If the root does not equal null, do all of these things.
2331
01:40:31,740 --> 01:40:34,020
And then, indent all three of these lines underneath.
2332
01:40:34,020 --> 01:40:35,490
That's perfectly fine too.
2333
01:40:35,490 --> 01:40:37,290
I happen to write it the other way just so
2334
01:40:37,290 --> 01:40:40,990
that there was explicitly a base case that I could point to on the screen.
2335
01:40:40,990 --> 01:40:43,920
Whereas, now, it's implicitly there for us only.
2336
01:40:43,920 --> 01:40:45,790
But a good observation too.
2337
01:40:45,790 --> 01:40:46,290
All right.
2338
01:40:46,290 --> 01:40:49,960
So let's ask the question as before about running time of this.
2339
01:40:49,960 --> 01:40:51,930
It would look like binary search is back.
2340
01:40:51,930 --> 01:40:57,600
And we can now do things in logarithmic time, but we should be careful.
2341
01:40:57,600 --> 01:40:59,940
Is this a binary search tree?
2342
01:40:59,940 --> 01:41:01,660
Just to be clear.
2343
01:41:01,660 --> 01:41:04,380
And again, a binary search tree is a tree
2344
01:41:04,380 --> 01:41:11,118
where the root is greater than its left child and smaller than its right child.
2345
01:41:11,118 --> 01:41:11,910
That's the essence.
2346
01:41:11,910 --> 01:41:13,380
So you're nodding your head.
2347
01:41:13,380 --> 01:41:15,280
You agree?
2348
01:41:15,280 --> 01:41:16,020
I agree.
2349
01:41:16,020 --> 01:41:18,030
So this is a binary search tree.
2350
01:41:18,030 --> 01:41:20,390
Is this a binary search tree?
2351
01:41:20,390 --> 01:41:21,330
[INTERPOSING VOICES]
2352
01:41:21,330 --> 01:41:21,830
OK.
2353
01:41:21,830 --> 01:41:22,860
I'm hearing yeses.
2354
01:41:22,860 --> 01:41:25,710
Or I'm hearing just my delay changing the vote it would seem.
2355
01:41:25,710 --> 01:41:28,080
So this is one of those trick questions.
2356
01:41:28,080 --> 01:41:30,480
This is a binary search tree because I've not
2357
01:41:30,480 --> 01:41:33,390
violated the definition of what I gave you, right.
2358
01:41:33,390 --> 01:41:39,480
Is there any example of a left child that is greater than its parent?
2359
01:41:39,480 --> 01:41:42,480
Or is there any example of a right child that's smaller than its parent?
2360
01:41:42,480 --> 01:41:44,897
That's just the opposite way of describing the same thing.
2361
01:41:44,897 --> 01:41:47,070
No, this is a binary search tree.
2362
01:41:47,070 --> 01:41:50,210
Unfortunately, it also looks like, albeit at a different axis, what?
2363
01:41:50,210 --> 01:41:51,210
AUDIENCE: A linked list.
2364
01:41:51,210 --> 01:41:51,900
SPEAKER 1: A linked list.
2365
01:41:51,900 --> 01:41:53,970
But you could imagine this happening, right.
2366
01:41:53,970 --> 01:41:56,640
Suppose that I hadn't been as thoughtful as I was earlier
2367
01:41:56,640 --> 01:41:59,970
by inserting 2, and then 1, and then 3.
2368
01:41:59,970 --> 01:42:02,160
Which nicely balanced everything out.
2369
01:42:02,160 --> 01:42:04,860
Suppose that instead, because of what the user is typing in
2370
01:42:04,860 --> 01:42:07,980
or whatever you contrive in your own code, suppose you insert a 1,
2371
01:42:07,980 --> 01:42:10,260
and then a 2, and then a 3.
2372
01:42:10,260 --> 01:42:12,850
Like, you've created a problem for yourself.
2373
01:42:12,850 --> 01:42:16,290
Because if we follow the same logic as before, going left or going right,
2374
01:42:16,290 --> 01:42:21,030
this is how you might implement a binary search tree accidentally
2375
01:42:21,030 --> 01:42:24,750
if you just blindly keep following that definition.
2376
01:42:24,750 --> 01:42:27,030
I mean, this would be better designed as what?
2377
01:42:27,030 --> 01:42:29,490
If we rotated the whole thing around.
2378
01:42:29,490 --> 01:42:30,870
And that's totally fine.
2379
01:42:30,870 --> 01:42:33,060
And those kinds of trees actually have names.
2380
01:42:33,060 --> 01:42:35,400
There's trees called AVL trees in computer science.
2381
01:42:35,400 --> 01:42:37,050
There are red-black trees in computer science.
2382
01:42:37,050 --> 01:42:39,300
There are other types of trees that, additionally,
2383
01:42:39,300 --> 01:42:42,510
add some logic that tells you when you've got to pivot the thing,
2384
01:42:42,510 --> 01:42:46,238
and rotate it, and snip off the root, and fix things in this way.
2385
01:42:46,238 --> 01:42:48,030
But a binary search tree, in and of itself,
2386
01:42:48,030 --> 01:42:51,670
does not guarantee that it will be balanced, so to speak.
2387
01:42:51,670 --> 01:42:54,240
And so, if you consider the worst case scenario
2388
01:42:54,240 --> 01:42:55,860
of even using a binary search tree.
2389
01:42:55,860 --> 01:42:57,960
If you're not smart about the code you're writing
2390
01:42:57,960 --> 01:43:00,180
and you just blindly follow this definition,
2391
01:43:00,180 --> 01:43:04,290
you might accidentally create a crazy, long and stringy binary search
2392
01:43:04,290 --> 01:43:07,050
tree that essentially looks like a linked list.
2393
01:43:07,050 --> 01:43:09,510
Because you're not even using any of the left children.
2394
01:43:09,510 --> 01:43:12,750
So unfortunately, the literal answer to the question
2395
01:43:12,750 --> 01:43:15,480
here is what's the running time of search?
2396
01:43:15,480 --> 01:43:17,400
Well, hopefully, log n.
2397
01:43:17,400 --> 01:43:19,980
But not if you don't maintain the balance of the tree.
2398
01:43:19,980 --> 01:43:25,290
Both insert and search could actually devolve into, instead of big O of log n,
2399
01:43:25,290 --> 01:43:26,952
literally, big O of n.
2400
01:43:26,952 --> 01:43:29,160
If you don't somehow take into account, and we're not
2401
01:43:29,160 --> 01:43:30,720
going to do the code for that here.
2402
01:43:30,720 --> 01:43:34,140
It's a higher level thing you might explore down the road.
2403
01:43:34,140 --> 01:43:37,930
It can devolve into something that you might not have intended.
2404
01:43:37,930 --> 01:43:40,022
And so, now that we're talking about 2 dimensions,
2405
01:43:40,022 --> 01:43:41,730
it's really the onus is on the programmer
2406
01:43:41,730 --> 01:43:44,490
to consider what kinds of perverse situations might happen.
2407
01:43:44,490 --> 01:43:46,860
Where the thing devolves into a structure
2408
01:43:46,860 --> 01:43:50,350
that you don't actually want it to devolve into.
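To see the problem in code, here is a sketch of the kind of insert that blindly follows the definition, with no rebalancing. Neither insert nor height was shown in the lecture; both names and their signatures are assumptions for illustration only.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
}
node;

// Hypothetical naive insert: go left if smaller, right if larger, never rebalance.
// Returns the root of the (sub)tree; duplicates are ignored, and if malloc
// fails the tree is left unchanged.
node *insert(node *tree, int number)
{
    if (tree == NULL)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            return NULL;
        }
        n->number = number;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (number < tree->number)
    {
        tree->left = insert(tree->left, number);
    }
    else if (number > tree->number)
    {
        tree->right = insert(tree->right, number);
    }
    return tree;
}

// Hypothetical helper: number of levels in the tree (0 for an empty tree).
int height(node *tree)
{
    if (tree == NULL)
    {
        return 0;
    }
    int l = height(tree->left);
    int r = height(tree->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 2, then 1, then 3 yields a tree of height 2, while inserting 1, then 2, then 3 yields height 3: every node hangs off the right, so searching it takes as many steps as there are nodes.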
2409
01:43:50,350 --> 01:43:50,850
All right.
2410
01:43:50,850 --> 01:43:52,360
We've got just a few structures to go.
2411
01:43:52,360 --> 01:43:53,940
Let's go ahead and take one more 5 minute break here.
2412
01:43:53,940 --> 01:43:55,410
When we come back, we'll talk at this level
2413
01:43:55,410 --> 01:43:57,030
about some final applications of this.
2414
01:43:57,030 --> 01:43:58,510
See you in 5.
2415
01:43:58,510 --> 01:44:00,270
All right.
2416
01:44:00,270 --> 01:44:01,860
So we are back.
2417
01:44:01,860 --> 01:44:05,250
And as promised, we'll operate now at this higher level.
2418
01:44:05,250 --> 01:44:08,520
Where if we take for granted that, even though you haven't had an opportunity
2419
01:44:08,520 --> 01:44:11,312
to play with these techniques yet, you have the ability now in code
2420
01:44:11,312 --> 01:44:12,780
to stitch things together.
2421
01:44:12,780 --> 01:44:15,630
Both in a one dimension and even 2 dimensions,
2422
01:44:15,630 --> 01:44:17,970
to build things like lists and trees.
2423
01:44:17,970 --> 01:44:19,980
So if we have these building blocks.
2424
01:44:19,980 --> 01:44:22,680
Things like now arrays, and lists, and trees,
2425
01:44:22,680 --> 01:44:26,790
what if we start to amalgamate them such that we build things out
2426
01:44:26,790 --> 01:44:28,900
of multiple data structures?
2427
01:44:28,900 --> 01:44:32,360
Can we start to get some of the best of both worlds by way of, for instance,
2428
01:44:32,360 --> 01:44:33,710
something called a hash table.
2429
01:44:33,710 --> 01:44:37,540
So a hash table is a Swiss army knife of data structures
2430
01:44:37,540 --> 01:44:39,310
in that it's so commonly used.
2431
01:44:39,310 --> 01:44:44,000
Because it allows you to associate keys with values, so to speak.
2432
01:44:44,000 --> 01:44:49,060
So, for instance, it allows you to associate a username with a password.
2433
01:44:49,060 --> 01:44:51,070
Or a name with a number.
2434
01:44:51,070 --> 01:44:53,920
Or anything where you have to take something as input,
2435
01:44:53,920 --> 01:44:56,300
and get as output a corresponding piece of information.
2436
01:44:56,300 --> 01:44:59,210
A hash table is often a data structure of choice.
2437
01:44:59,210 --> 01:45:00,460
And here's what it looks like.
2438
01:45:00,460 --> 01:45:02,800
It actually looks like an array, at first glance.
2439
01:45:02,800 --> 01:45:05,990
But for discussion's sake, I've drawn this array vertically,
2440
01:45:05,990 --> 01:45:06,920
which is totally fine.
2441
01:45:06,920 --> 01:45:08,660
It's still just an array.
2442
01:45:08,660 --> 01:45:13,720
But it allows you, a hash table, to jump to any of these locations randomly.
2443
01:45:13,720 --> 01:45:14,740
That is instantly.
2444
01:45:14,740 --> 01:45:18,130
So, for instance, there's actually 26 locations in this array.
2445
01:45:18,130 --> 01:45:21,100
Because I want to, for instance, store initially
2446
01:45:21,100 --> 01:45:23,980
names of people, for instance.
2447
01:45:23,980 --> 01:45:26,653
And wouldn't it be nice if the person's name starts with A,
2448
01:45:26,653 --> 01:45:27,820
I have a go to place for it.
2449
01:45:27,820 --> 01:45:28,780
Maybe the first box.
2450
01:45:28,780 --> 01:45:30,863
And if it starts with Z, I put them at the bottom.
2451
01:45:30,863 --> 01:45:33,070
So that I can jump instantly, arithmetically,
2452
01:45:33,070 --> 01:45:35,470
using a little bit of ASCII or Unicode fanciness,
2453
01:45:35,470 --> 01:45:38,540
exactly to the location that they need to go.
2454
01:45:38,540 --> 01:45:40,690
So, for instance, here's our array, 0-indexed.
2455
01:45:40,690 --> 01:45:42,130
0 through 25.
2456
01:45:42,130 --> 01:45:44,500
If I think of this, though, as A through Z,
2457
01:45:44,500 --> 01:45:46,370
I'm going to think of these 26 locations,
2458
01:45:46,370 --> 01:45:49,630
now in the context of a hash table, as what we'll generally call buckets.
2459
01:45:49,630 --> 01:45:52,010
So buckets into which you can put values.
2460
01:45:52,010 --> 01:45:56,380
So, for instance, suppose that we want to insert a value, one name
2461
01:45:56,380 --> 01:45:57,590
into this data structure.
2462
01:45:57,590 --> 01:45:59,260
And that name is say, Albus.
2463
01:45:59,260 --> 01:46:03,980
So Albus starting with A. Albus might go at the very beginning of this list.
2464
01:46:03,980 --> 01:46:04,480
All right.
2465
01:46:04,480 --> 01:46:06,188
And then, we want to insert another name.
2466
01:46:06,188 --> 01:46:07,630
This one happens to be Zacharias.
2467
01:46:07,630 --> 01:46:10,690
Starting with Z, so it goes all the way at the end of this data
2468
01:46:10,690 --> 01:46:12,490
structure in location 25 a.k.a.
2469
01:46:12,490 --> 01:46:13,390
Z.
2470
01:46:13,390 --> 01:46:17,260
And then, maybe a third name like Hermione, and that goes at location H
2471
01:46:17,260 --> 01:46:19,310
according to that position in the alphabet.
2472
01:46:19,310 --> 01:46:22,060
So this is great because in constant time,
2473
01:46:22,060 --> 01:46:26,020
I can insert and conversely search for any of these names,
2474
01:46:26,020 --> 01:46:27,700
based on the first letter of their name.
2475
01:46:27,700 --> 01:46:30,098
A, or Z, or H, in this case.
2476
01:46:30,098 --> 01:46:32,890
Let's fast forward and assume we put a whole bunch of other names--
2477
01:46:32,890 --> 01:46:34,900
might look familiar, into this hash table.
2478
01:46:34,900 --> 01:46:39,110
It's great because every name has its own location.
2479
01:46:39,110 --> 01:46:43,480
But if you're thinking of names you don't yet see on the screen,
2480
01:46:43,480 --> 01:46:45,710
we eventually encounter a problem with this, right.
2481
01:46:45,710 --> 01:46:49,480
When could something go wrong using a hash table like this
2482
01:46:49,480 --> 01:46:52,090
if we wanted to insert even more names?
2483
01:46:52,090 --> 01:46:54,290
What's going to eventually happen?
2484
01:46:54,290 --> 01:46:54,790
Yeah.
2485
01:46:54,790 --> 01:46:56,998
There's already someone with the first letter, right.
2486
01:46:56,998 --> 01:46:59,860
Like I haven't even mentioned Harry, for instance, or Hagrid.
2487
01:46:59,860 --> 01:47:01,750
And yet, Hermione's already using that spot.
2488
01:47:01,750 --> 01:47:04,030
So that invites the question, well, what happens?
2489
01:47:04,030 --> 01:47:07,600
Maybe, if we want to insert Harry next, do we maybe cheat and put him
2490
01:47:07,600 --> 01:47:08,710
at location I?
2491
01:47:08,710 --> 01:47:11,323
But then if someone belongs at location I, where do we put them?
2492
01:47:11,323 --> 01:47:13,990
And it just feels like the situation could very quickly devolve.
2493
01:47:13,990 --> 01:47:16,930
But I've deliberately drawn this data structure,
2494
01:47:16,930 --> 01:47:19,990
that I claim as a hash table, in 2 directions.
2495
01:47:19,990 --> 01:47:22,120
An array vertically, here.
2496
01:47:22,120 --> 01:47:25,300
But what might this be hinting I'm using horizontally,
2497
01:47:25,300 --> 01:47:28,300
even though I'm drawing the rectangles a little differently from before?
2498
01:47:28,300 --> 01:47:29,092
AUDIENCE: An array.
2499
01:47:29,092 --> 01:47:29,758
SPEAKER 1: Yeah.
2500
01:47:29,758 --> 01:47:31,091
Maybe another array, to be fair.
2501
01:47:31,091 --> 01:47:34,258
But, honestly, arrays are such a pain with the allocating, and reallocating,
2502
01:47:34,258 --> 01:47:34,810
and so forth.
2503
01:47:34,810 --> 01:47:38,600
These look like the beginnings of a linked list, if you will.
2504
01:47:38,600 --> 01:47:42,190
Where the name is where the number used to be, even though I'm drawing it
2505
01:47:42,190 --> 01:47:44,200
horizontally now just for discussion's sake.
2506
01:47:44,200 --> 01:47:47,800
And this seems to be a pointer that isn't pointing anywhere yet.
2507
01:47:47,800 --> 01:47:53,080
But it looks like the array is 26 pointers, some of which are null,
2508
01:47:53,080 --> 01:47:53,920
that is empty.
2509
01:47:53,920 --> 01:47:56,675
Some of which are pointing at the first node in a linked list.
2510
01:47:56,675 --> 01:47:59,050
So that's really what a hash table might be in your mind.
2511
01:47:59,050 --> 01:48:03,828
An amalgam of an array, whose elements are linked lists.
2512
01:48:03,828 --> 01:48:06,370
And in theory, this gives you the best of both worlds, right.
2513
01:48:06,370 --> 01:48:09,430
You get random access with high probability, right.
2514
01:48:09,430 --> 01:48:12,620
You get to jump immediately to the location you want to put someone.
2515
01:48:12,620 --> 01:48:15,430
But, if you run into this perverse situation where there's someone
2516
01:48:15,430 --> 01:48:16,870
already there, OK, fine.
2517
01:48:16,870 --> 01:48:20,350
It starts to devolve into a linked list, but it's at least 26
2518
01:48:20,350 --> 01:48:21,580
smaller linked lists.
2519
01:48:21,580 --> 01:48:24,670
Not one massive linked list, which would be Big O of n.
2520
01:48:24,670 --> 01:48:26,480
And quite slow to solve.
2521
01:48:26,480 --> 01:48:28,630
So if Harry gets inserted, and Hagrid,
2522
01:48:28,630 --> 01:48:32,780
Yeah, you have to chain them together, so to speak, in this way.
2523
01:48:32,780 --> 01:48:35,645
But, at least you've not painted yourself into a corner.
2524
01:48:35,645 --> 01:48:38,770
And in fact, if we fast forward and put a whole bunch of familiar names in,
2525
01:48:38,770 --> 01:48:41,120
the data structure starts to look like this.
2526
01:48:41,120 --> 01:48:43,460
So the chains are not terribly long.
2527
01:48:43,460 --> 01:48:46,270
And some of them are actually of size 0 because there's just
2528
01:48:46,270 --> 01:48:49,150
some unpopular letters of the alphabet among these names.
2529
01:48:49,150 --> 01:48:51,100
But it seems better than just putting everyone
2530
01:48:51,100 --> 01:48:53,860
in one big array, or one big linked list.
2531
01:48:53,860 --> 01:48:58,190
We're trying to balance these trade offs a little bit in the middle here.
2532
01:48:58,190 --> 01:49:00,410
Well, how might we represent something like this?
2533
01:49:00,410 --> 01:49:02,140
Here's how we could describe this thing.
2534
01:49:02,140 --> 01:49:05,320
A node in the context of a linked list could be this.
2535
01:49:05,320 --> 01:49:08,860
I have an array called word of type char.
2536
01:49:08,860 --> 01:49:13,060
And it's big enough to fit the longest word in the dictionary plus 1.
2537
01:49:13,060 --> 01:49:14,890
And the plus 1 why, probably?
2538
01:49:14,890 --> 01:49:15,760
AUDIENCE: The null.
2539
01:49:15,760 --> 01:49:16,730
SPEAKER 1: The null character.
2540
01:49:16,730 --> 01:49:19,840
So I'm assuming that longest word is like a constant defined elsewhere
2541
01:49:19,840 --> 01:49:20,470
in the story.
2542
01:49:20,470 --> 01:49:22,735
And it's something big like 40, 100, whatever.
2543
01:49:22,735 --> 01:49:25,810
Whatever the longest word in the Harry Potter universe
2544
01:49:25,810 --> 01:49:28,440
is or the English dictionary is.
2545
01:49:28,440 --> 01:49:34,050
Longest word plus 1 should be sufficient to store any name in the story here.
2546
01:49:34,050 --> 01:49:36,360
And then, what else does each of these nodes have?
2547
01:49:36,360 --> 01:49:40,060
Well it has a pointer to another node.
2548
01:49:40,060 --> 01:49:42,390
So here's how we might implement the notion of a node
2549
01:49:42,390 --> 01:49:46,710
in the context of storing not integers, but names.
2550
01:49:46,710 --> 01:49:48,360
Instead, like this.
2551
01:49:48,360 --> 01:49:51,360
But how do we decide what the hash table itself is?
2552
01:49:51,360 --> 01:49:55,140
Well, if we now have a definition of a node, we could have a variable in main,
2553
01:49:55,140 --> 01:49:57,510
or even globally, called hash table.
2554
01:49:57,510 --> 01:50:02,910
That itself is an array of node* pointers.
2555
01:50:02,910 --> 01:50:05,310
That is an array of pointers to nodes.
2556
01:50:05,310 --> 01:50:07,290
The beginnings of linked lists.
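A rough sketch of those two declarations in C, not necessarily the exact code on the slide, could look like this. The constant names `LONGEST_WORD` and `NUMBER_OF_BUCKETS` and the value 45 are assumptions chosen for illustration.

```c
#include <stddef.h>

#define LONGEST_WORD 45       // assumed; the lecture only says "something big like 40, 100"
#define NUMBER_OF_BUCKETS 26  // one bucket per letter, as proposed verbally

// One node in a chain: a name plus a pointer to the next node.
typedef struct node
{
    char word[LONGEST_WORD + 1];  // +1 for the terminating '\0'
    struct node *next;
} node;

// The hash table itself: an array of pointers to nodes.
// Declared globally, so every element starts out as NULL (an empty bucket).
node *hash_table[NUMBER_OF_BUCKETS];
```

Each bucket is either NULL or points at the first node of a linked list, which matches the picture of a vertical array with horizontal chains.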
2557
01:50:07,290 --> 01:50:08,950
Number of buckets is up to me.
2558
01:50:08,950 --> 01:50:11,083
I proposed, verbally, that it be 26.
2559
01:50:11,083 --> 01:50:13,500
But honestly, if you get a lot of collisions, so to speak.
2560
01:50:13,500 --> 01:50:15,623
A lot of H names trying to go to the same place.
2561
01:50:15,623 --> 01:50:17,790
Well, maybe, we need to be smarter and not just look
2562
01:50:17,790 --> 01:50:19,207
at the first letter of their name.
2563
01:50:19,207 --> 01:50:20,800
But, maybe, the first and the second.
2564
01:50:20,800 --> 01:50:24,900
So it's H-A and H-E. But wait, no, then Harry and Hagrid still collide.
2565
01:50:24,900 --> 01:50:27,840
But we start to at least make the problem a little less
2566
01:50:27,840 --> 01:50:31,500
impactful by tinkering with something like the number of buckets
2567
01:50:31,500 --> 01:50:32,880
in a hash table like this.
2568
01:50:32,880 --> 01:50:37,560
But how do we decide where someone goes in a hash table in this way?
2569
01:50:37,560 --> 01:50:39,900
Well, it's an old school problem of input and output.
2570
01:50:39,900 --> 01:50:43,260
The input to the problem is going to be something like the name.
2571
01:50:43,260 --> 01:50:45,300
And the algorithm in the middle, as of today,
2572
01:50:45,300 --> 01:50:47,730
is going to be something called a hash function.
2573
01:50:47,730 --> 01:50:49,620
A hash function is generally something that
2574
01:50:49,620 --> 01:50:53,370
takes as input, a string, a number, whatever, and produces
2575
01:50:53,370 --> 01:50:55,860
as output a location in our context.
2576
01:50:55,860 --> 01:50:57,750
Like a number 0 through 25.
2577
01:50:57,750 --> 01:50:59,490
Or 0 through 16,000.
2578
01:50:59,490 --> 01:51:02,190
Or whatever the number of buckets you want is,
2579
01:51:02,190 --> 01:51:06,370
it's going to just tell you where to put that input at a specific location.
2580
01:51:06,370 --> 01:51:10,200
So, for instance, Albus, according to the story thus far, gave me back 0
2581
01:51:10,200 --> 01:51:10,710
as output.
2582
01:51:10,710 --> 01:51:12,570
Zacharias gave me 25.
2583
01:51:12,570 --> 01:51:15,300
So the hash function, in the middle of that black box,
2584
01:51:15,300 --> 01:51:17,760
is pretty simplistic in this story.
2585
01:51:17,760 --> 01:51:21,360
It's just looking at the ASCII value, it seems, of the first letter
2586
01:51:21,360 --> 01:51:22,110
in their name.
2587
01:51:22,110 --> 01:51:25,150
And then, subtracting off what capital A is, 65.
2588
01:51:25,150 --> 01:51:29,470
So like doing some math to get back a number between 0 and 25.
2589
01:51:29,470 --> 01:51:32,610
So that's how we got to this point in the story.
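A minimal sketch of that first-letter hash function, together with the chaining idea from earlier, might look like the following in C. The function names `hash`, `insert`, and `search`, the constants, and the choice to send non-letters to bucket 0 are all assumptions made for illustration; the lecture only describes subtracting 65 (capital A) from the first letter.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LONGEST_WORD 45       // assumed; the lecture says "something big like 40, 100"
#define NUMBER_OF_BUCKETS 26  // one bucket per letter, A through Z

typedef struct node
{
    char word[LONGEST_WORD + 1];  // +1 for the terminating '\0'
    struct node *next;
} node;

// Global array of chains; every bucket starts out NULL (empty).
node *hash_table[NUMBER_OF_BUCKETS];

// Map a name to a bucket 0..25 using only its first letter.
// Assumption: anything that is not A-Z or a-z lands in bucket 0.
unsigned int hash(const char *name)
{
    int c = toupper((unsigned char) name[0]);
    if (c < 'A' || c > 'Z')
    {
        return 0;
    }
    return (unsigned int) (c - 'A');  // 'A' is 65 in ASCII
}

// Prepend a name to the chain in its bucket.
// Returns false if the name is too long or memory runs out.
bool insert(const char *name)
{
    if (strlen(name) > LONGEST_WORD)
    {
        return false;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, name);
    unsigned int i = hash(name);
    n->next = hash_table[i];  // new node points at the old head of the chain
    hash_table[i] = n;        // bucket now points at the new node
    return true;
}

// Walk only the one chain the name could be in.
bool search(const char *name)
{
    for (node *n = hash_table[hash(name)]; n != NULL; n = n->next)
    {
        if (strcmp(n->word, name) == 0)
        {
            return true;
        }
    }
    return false;
}
```

With this sketch, inserting Harry, Hagrid, and Hermione puts all three in bucket 7 (H), chained together, and a search for any H name walks only that one chain.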
2590
01:51:32,610 --> 01:51:37,440
And how might we, then, resolve the problem further and use
2591
01:51:37,440 --> 01:51:39,060
this notion of hashing more generally?
2592
01:51:39,060 --> 01:51:40,935
Well, just for demonstration's sake, here's
2593
01:51:40,935 --> 01:51:43,290
actually some buckets, literally.
2594
01:51:43,290 --> 01:51:46,380
And we've labeled, in advance, these buckets with the suits
2595
01:51:46,380 --> 01:51:47,800
from a deck of cards.
2596
01:51:47,800 --> 01:51:49,770
So we've got some spades.
2597
01:51:49,770 --> 01:51:54,600
And we've got diamonds here.
2598
01:51:54,600 --> 01:51:58,110
And we've got, what else here?
2599
01:51:58,110 --> 01:52:01,890
Clubs and hearts.
2600
01:52:01,890 --> 01:52:04,592
So we have a deck of cards here, for instance, right.
2601
01:52:04,592 --> 01:52:07,050
And this is something you, yourself, might do instinctively
2602
01:52:07,050 --> 01:52:09,420
if you're getting ready to start playing a game of cards.
2603
01:52:09,420 --> 01:52:11,587
You're just cleaning up or you want things in order.
2604
01:52:11,587 --> 01:52:13,963
Like, here is literally a jumbo deck of cards.
2605
01:52:13,963 --> 01:52:16,380
What would be the easiest way for me to sort these things?
2606
01:52:16,380 --> 01:52:19,088
Well we've got a whole bunch of sorting algorithms from the past.
2607
01:52:19,088 --> 01:52:21,630
So I could go through like, here's the 3 of diamonds.
2608
01:52:21,630 --> 01:52:23,880
And I could, here let me throw this up on the screen.
2609
01:52:23,880 --> 01:52:25,570
Just so, if you're far in back.
2610
01:52:25,570 --> 01:52:27,900
So here's diamonds.
2611
01:52:27,900 --> 01:52:28,890
I could put this here.
2612
01:52:28,890 --> 01:52:30,510
3, 4.
2613
01:52:30,510 --> 01:52:32,130
I could do this in order here.
2614
01:52:32,130 --> 01:52:34,540
But a lot of us, honestly, if given a deck of cards.
2615
01:52:34,540 --> 01:52:37,290
And you just want to clean it up and sort it in order,
2616
01:52:37,290 --> 01:52:38,620
you might do things like this.
2617
01:52:38,620 --> 01:52:42,030
Well here's my input, 3 of diamonds, let's put it in this bucket.
2618
01:52:42,030 --> 01:52:43,770
4 of diamonds, this bucket.
2619
01:52:43,770 --> 01:52:45,640
5 of diamonds, this bucket.
2620
01:52:45,640 --> 01:52:49,500
And if you keep going through the cards, here's seven of hearts, hearts bucket.
2621
01:52:49,500 --> 01:52:51,210
8's bucket.
2622
01:52:51,210 --> 01:52:53,070
Queen of spades over here.
2623
01:52:53,070 --> 01:52:55,020
And it's still going to take you 52 steps.
2624
01:52:55,020 --> 01:52:58,020
But at the end of it, you have hashed all of the cards
2625
01:52:58,020 --> 01:52:59,610
into 4 distinct buckets.
2626
01:52:59,610 --> 01:53:02,490
And now you have problems of size 13, which
2627
01:53:02,490 --> 01:53:06,030
is a little more tenable than doing one massive 52 card problem.
2628
01:53:06,030 --> 01:53:08,070
You can now do 4 size-13 problems.
2629
01:53:08,070 --> 01:53:11,790
And so hashing is something that even you and I might do instinctively.
2630
01:53:11,790 --> 01:53:16,680
Taking as input some card, some name, and producing as output some location.
2631
01:53:16,680 --> 01:53:21,960
A temporary pile in which you want to stage things, so to speak.
2632
01:53:21,960 --> 01:53:24,442
But these collisions are inevitable.
2633
01:53:24,442 --> 01:53:27,150
And honestly, if we kept going through the Harry Potter universe,
2634
01:53:27,150 --> 01:53:29,950
some of these chains would get longer, and longer and longer.
2635
01:53:29,950 --> 01:53:33,330
Which means that instead of getting someone's name quickly,
2636
01:53:33,330 --> 01:53:36,178
by searching for them or inserting them, it might
2637
01:53:36,178 --> 01:53:37,720
start taking a decent amount of time.
2638
01:53:37,720 --> 01:53:40,770
So what could we do instead to resolve situations like this?
2639
01:53:40,770 --> 01:53:44,370
If the problem, fundamentally, is that the first letter is just too darn
2640
01:53:44,370 --> 01:53:47,387
popular, H, we need to take in more input.
2641
01:53:47,387 --> 01:53:49,720
Not just the first letter but maybe the first 2 letters.
2642
01:53:49,720 --> 01:53:52,770
So if we do that, we can go from A through Z
2643
01:53:52,770 --> 01:53:59,200
to something more extreme like maybe H-A, H-B, H-C, H-D, H-E, H-F, and so forth.
2644
01:53:59,200 --> 01:54:02,670
So that now Harry and Hermione end up at different locations.
2645
01:54:02,670 --> 01:54:05,590
But, darn it, Hagrid still collides with Harry.
2646
01:54:05,590 --> 01:54:07,380
So it's better than before.
2647
01:54:07,380 --> 01:54:09,550
The chains aren't quite as long.
2648
01:54:09,550 --> 01:54:11,410
But the problem isn't fundamentally gone.
2649
01:54:11,410 --> 01:54:14,640
And in this case here, anyone know how many buckets we just
2650
01:54:14,640 --> 01:54:22,830
increased to, if we now look at not just A through Z but AA through ZZ, roughly?
2651
01:54:22,830 --> 01:54:24,183
AUDIENCE: 26 squared.
2652
01:54:24,183 --> 01:54:24,850
SPEAKER 1: Yeah.
2653
01:54:24,850 --> 01:54:25,440
OK, good.
2654
01:54:25,440 --> 01:54:28,980
So the easy answer is 26 squared, or 676.
2655
01:54:28,980 --> 01:54:30,570
So that's a lot more buckets.
2656
01:54:30,570 --> 01:54:33,040
And this is why I only showed a few of them on the screen.
2657
01:54:33,040 --> 01:54:33,930
So that's a lot more.
2658
01:54:33,930 --> 01:54:37,050
And it spreads things out in particular.
2659
01:54:37,050 --> 01:54:38,640
What if we take this one step further?
2660
01:54:38,640 --> 01:54:44,130
Instead of H-A, we do like H-A-A, H-A-B, H-A-C, H-Z-Z, and so forth.
2661
01:54:44,130 --> 01:54:46,080
Well now, we have an even better situation.
2662
01:54:46,080 --> 01:54:48,480
Because Hermione has her one spot.
2663
01:54:48,480 --> 01:54:49,770
Harry has his one spot.
2664
01:54:49,770 --> 01:54:51,840
Hagrid has his one spot.
2665
01:54:51,840 --> 01:54:53,880
But there's a trade off here.
2666
01:54:53,880 --> 01:54:57,240
The upside is now, arithmetically, we can find their locations
2667
01:54:57,240 --> 01:54:58,620
in constant time.
2668
01:54:58,620 --> 01:55:00,030
Maybe, technically 3 steps.
2669
01:55:00,030 --> 01:55:03,940
But 3 is constant, no matter how many other names are in here, it would seem.
2670
01:55:03,940 --> 01:55:07,152
But what's the downside here?
2671
01:55:07,152 --> 01:55:07,860
Sorry, say again.
2672
01:55:07,860 --> 01:55:08,490
AUDIENCE: Memory.
2673
01:55:08,490 --> 01:55:09,240
SPEAKER 1: Memory.
2674
01:55:09,240 --> 01:55:10,290
So significantly more.
2675
01:55:10,290 --> 01:55:15,840
We're now up to 17,576 buckets, which itself isn't that big a deal, right.
2676
01:55:15,840 --> 01:55:17,740
Computers have a lot of memory these days.
2677
01:55:17,740 --> 01:55:21,450
But as you can infer, I can't really think
2678
01:55:21,450 --> 01:55:26,160
of someone whose name started with H-E-Q, for instance, in the Harry
2679
01:55:26,160 --> 01:55:26,832
Potter universe.
2680
01:55:26,832 --> 01:55:29,040
And if we keep going, definitely don't know of anyone
2681
01:55:29,040 --> 01:55:32,040
whose name started with Z-Z-Z or A-A-A. There's
2682
01:55:32,040 --> 01:55:37,390
a lot of not useful combinations that have to be there mathematically,
2683
01:55:37,390 --> 01:55:41,040
so that you can do a bit of math and jump randomly, so to speak, to
2684
01:55:41,040 --> 01:55:42,292
the precise location.
2685
01:55:42,292 --> 01:55:43,750
But they're just going to be empty.
2686
01:55:43,750 --> 01:55:47,380
So it's a very sparsely populated array, so to speak.
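To make the bucket counts concrete, here is a sketch of a hash function that looks at the first k letters of a name and treats them like digits in base 26. The names `hash_prefix` and `letter_index` are hypothetical, and treating a missing or non-alphabetic character as if it were A is just one assumption about how to handle short names.

```c
#include <ctype.h>

#define LETTERS 26

// Hypothetical helper: convert a letter to 0..25.
// Assumption: anything else (including the '\0' ending a short name) counts as 0.
static unsigned int letter_index(char c)
{
    int u = toupper((unsigned char) c);
    return (u >= 'A' && u <= 'Z') ? (unsigned int) (u - 'A') : 0;
}

// Hash on the first k letters of a name, base 26.
// k = 1 gives 26 buckets, k = 2 gives 676, k = 3 gives 17,576.
unsigned int hash_prefix(const char *name, int k)
{
    unsigned int index = 0;
    int ended = 0;  // set once we reach the end of a short name
    for (int i = 0; i < k; i++)
    {
        char c = ended ? '\0' : name[i];
        if (c == '\0')
        {
            ended = 1;
        }
        index = index * LETTERS + letter_index(c);
    }
    return index;
}
```

With k = 2, Harry and Hagrid both land in bucket 182 (H-A) while Hermione lands in 186 (H-E). With k = 3, all three get different buckets, but the table needs 17,576 slots, most of which would stay empty.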
2687
01:55:47,380 --> 01:55:50,640
So what does that really mean for performance, ultimately?
2688
01:55:50,640 --> 01:55:53,400
Well let's consider, again, in the context of our Big O notation.
2689
01:55:53,400 --> 01:55:56,790
It turns out that a hash table, technically speaking,
2690
01:55:56,790 --> 01:56:00,870
is still just going to give us Big O of n in the worst case.
2691
01:56:00,870 --> 01:56:01,470
Why?
2692
01:56:01,470 --> 01:56:04,440
If you have some crazy perverse case where everyone in the universe
2693
01:56:04,440 --> 01:56:07,950
has a name that starts with A, or starts with H, or starts with Z,
2694
01:56:07,950 --> 01:56:09,240
you just get really unlucky.
2695
01:56:09,240 --> 01:56:11,117
And your chain is massively long.
2696
01:56:11,117 --> 01:56:13,200
Well then, at that point, it's just a linked list.
2697
01:56:13,200 --> 01:56:14,117
It's not a hash table.
2698
01:56:14,117 --> 01:56:16,380
It's like the perverse situation with the tree, where
2699
01:56:16,380 --> 01:56:22,200
if you insert into it without any mind for keeping it balanced, it just devolves.
2700
01:56:22,200 --> 01:56:26,400
But there's a difference here between a theoretical performance
2701
01:56:26,400 --> 01:56:28,020
and an actual performance.
2702
01:56:28,020 --> 01:56:31,290
If you look back at the hash table here,
2703
01:56:31,290 --> 01:56:37,890
this is absolutely, in practice, going to be faster than a single linked list.
2704
01:56:37,890 --> 01:56:40,860
Mathematically, asymptotically, big O notation, sure.
2705
01:56:40,860 --> 01:56:41,700
It's all the same.
2706
01:56:41,700 --> 01:56:42,630
Big O of n.
2707
01:56:42,630 --> 01:56:46,500
But if what we're really caring about is real humans using our software,
2708
01:56:46,500 --> 01:56:48,990
there's something to be said for crafting a data structure.
2709
01:56:48,990 --> 01:56:51,570
That technically, if this data were uniformly distributed,
2710
01:56:51,570 --> 01:56:55,450
is 26 times faster than a linked list alone.
2711
01:56:55,450 --> 01:57:00,720
And so, there's this tension too between systems types of CS
2712
01:57:00,720 --> 01:57:01,847
and theoretical CS.
2713
01:57:01,847 --> 01:57:03,930
Where yeah, theoretically, these are all the same.
2714
01:57:03,930 --> 01:57:06,660
But in practice, for making real-world software,
2715
01:57:06,660 --> 01:57:12,390
improving this speed by a factor of 26 in this case, let alone 676 or more,
2716
01:57:12,390 --> 01:57:14,170
might actually make a big difference.
2717
01:57:14,170 --> 01:57:15,670
But there's going to be a trade off.
2718
01:57:15,670 --> 01:57:19,540
And that's typically some other resource like giving up more space.
2719
01:57:19,540 --> 01:57:20,040
All right.
2720
01:57:20,040 --> 01:57:23,100
How about another data structure we could build.
2721
01:57:23,100 --> 01:57:26,010
Let me fast forward to something here called a trie.
2722
01:57:26,010 --> 01:57:28,920
So a trie, a weird name in pronunciation.
2723
01:57:28,920 --> 01:57:31,950
Short for retrieval, pronounced trie typically.
2724
01:57:31,950 --> 01:57:37,680
A trie is a tree that actually gives us constant time lookup,
2725
01:57:37,680 --> 01:57:41,040
even for massive data sets.
2726
01:57:41,040 --> 01:57:42,090
What do I mean by this?
2727
01:57:42,090 --> 01:57:47,230
In the world of a trie, you create a tree out of arrays.
2728
01:57:47,230 --> 01:57:49,560
So we're really getting into the Frankenstein territory
2729
01:57:49,560 --> 01:57:52,320
of just building things up with spare parts of data structures
2730
01:57:52,320 --> 01:57:53,500
that we have here.
2731
01:57:53,500 --> 01:57:56,460
But the root of a trie is, itself, an array.
2732
01:57:56,460 --> 01:57:58,530
For instance, of size 26.
2733
01:57:58,530 --> 01:58:04,800
Where each element in that array points to another node,
2734
01:58:04,800 --> 01:58:06,510
which is to say another array.
2735
01:58:06,510 --> 01:58:09,480
And each of those locations in the array represents a letter
2736
01:58:09,480 --> 01:58:10,920
of the alphabet like A through Z.
2737
01:58:10,920 --> 01:58:14,970
So for instance, if you wanted to store the names of the Harry Potter universe,
2738
01:58:14,970 --> 01:58:19,050
not in a hash table, not in a linked list, not in a tree, but in a trie.
2739
01:58:19,050 --> 01:58:23,820
What you would do is hash on every letter in the person's name one
2740
01:58:23,820 --> 01:58:24,640
at a time.
2741
01:58:24,640 --> 01:58:28,050
So a trie is like a multi-tier hash table, in a sense.
2742
01:58:28,050 --> 01:58:29,770
Where you first look at the first letter,
2743
01:58:29,770 --> 01:58:32,478
then the second letter, then the third, and you do the following.
2744
01:58:32,478 --> 01:58:35,940
For instance, each of these locations represents a letter A
2745
01:58:35,940 --> 01:58:39,450
through Z. Suppose I wanted to insert someone's name into this
2746
01:58:39,450 --> 01:58:43,530
that starts with the letter H, like Hagrid for instance.
2747
01:58:43,530 --> 01:58:46,360
Well, I go to the location H. I see it's null,
2748
01:58:46,360 --> 01:58:49,440
which means I need to malloc myself another node or another array.
2749
01:58:49,440 --> 01:58:50,970
And that's depicted here.
2750
01:58:50,970 --> 01:58:54,810
Then, suppose I want to store the second letter in Hagrid's name,
2751
01:58:54,810 --> 01:58:57,432
an A. So I go to that location in the second node.
2752
01:58:57,432 --> 01:58:58,890
And I see, OK, it's currently null.
2753
01:58:58,890 --> 01:58:59,932
There's nothing below it.
2754
01:58:59,932 --> 01:59:02,440
So I allocate another node using malloc or the like.
2755
01:59:02,440 --> 01:59:06,690
And now I have H-A-G. And I continue this with R-I-D.
2756
01:59:06,690 --> 01:59:10,240
And then, when I get to the bottom of this person's name,
2757
01:59:10,240 --> 01:59:12,840
I just have to indicate here in color, but probably
2758
01:59:12,840 --> 01:59:14,280
with a Boolean value or something.
2759
01:59:14,280 --> 01:59:18,190
Like a true value that says, a name stops here.
2760
01:59:18,190 --> 01:59:23,740
So that it's clear that the person's name is not H-A, or H-A-G, or H-A-G-R,
2761
01:59:23,740 --> 01:59:28,270
or H-A-G-R-I. It's H-A-G-R-I-D. And the D is green,
2762
01:59:28,270 --> 01:59:31,600
just to indicate there's like some other Boolean value that just says, yes.
2763
01:59:31,600 --> 01:59:35,300
This is the node in which the name stops.
2764
01:59:35,300 --> 01:59:40,240
And if I continue this logic, here's how I might insert someone like Harry.
2765
01:59:40,240 --> 01:59:43,420
And here's how I might insert someone like Hermione.
2766
01:59:43,420 --> 01:59:48,010
And what's interesting about the design here is that some of these names
2767
01:59:48,010 --> 01:59:49,930
share a common prefix.
2768
01:59:49,930 --> 01:59:52,990
Which starts to get compelling because you're reusing space.
2769
01:59:52,990 --> 01:59:57,910
You're using the same nodes for names like H-A-G and H-A-R
2770
01:59:57,910 --> 02:00:00,370
because they share H and an A in common.
2771
02:00:00,370 --> 02:00:02,630
And they all share an H in common.
2772
02:00:02,630 --> 02:00:06,340
So you have this data structure now that, itself, is a tree.
2773
02:00:06,340 --> 02:00:10,090
Each node in the tree is, itself, an array.
2774
02:00:10,090 --> 02:00:13,690
And we, therefore, might implement this thing using code like this.
2775
02:00:13,690 --> 02:00:19,195
Every node is containing, I'll do it in reverse order, an array.
2776
02:00:19,195 --> 02:00:21,820
I'll call it children because that's what it really represents.
2777
02:00:21,820 --> 02:00:24,130
Up to 26 children for each of these nodes.
2778
02:00:24,130 --> 02:00:25,430
Size of the alphabet.
2779
02:00:25,430 --> 02:00:28,360
So I might have used just a constant for number 26,
2780
02:00:28,360 --> 02:00:30,400
to give myself 26 letters of the alphabet.
2781
02:00:30,400 --> 02:00:34,630
And each of those arrays stores that many node stars.
2782
02:00:34,630 --> 02:00:36,550
That many pointers to another node.
2783
02:00:36,550 --> 02:00:38,020
And here's an example of the Bool.
2784
02:00:38,020 --> 02:00:40,750
This is what I represented in green on the slide a moment ago.
2785
02:00:40,750 --> 02:00:42,580
I also need another piece of data.
2786
02:00:42,580 --> 02:00:45,520
Just a 0 or 1, a true or false, that says yes.
2787
02:00:45,520 --> 02:00:50,810
A name stops in this node or it's just a path to the rest of the person's name.
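As a hedged sketch of how that node and the letter-by-letter insertion and lookup could fit together in C: the names `SIZE_OF_ALPHABET`, `is_word`, `create_node`, `insert`, and `search` are assumptions for illustration, and this version simply rejects names containing anything other than A-Z or a-z.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIZE_OF_ALPHABET 26

typedef struct node
{
    bool is_word;                             // true if a name stops in this node
    struct node *children[SIZE_OF_ALPHABET];  // one pointer per letter A through Z
} node;

// Allocate a node with is_word false and every child NULL.
node *create_node(void)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->is_word = false;
    for (int i = 0; i < SIZE_OF_ALPHABET; i++)
    {
        n->children[i] = NULL;
    }
    return n;
}

// Convert a letter to 0..25, or -1 if it is not A-Z/a-z.
static int letter_index(char c)
{
    int u = toupper((unsigned char) c);
    return (u >= 'A' && u <= 'Z') ? u - 'A' : -1;
}

// Follow one pointer per letter, allocating nodes where the path is still NULL.
bool insert(node *root, const char *name)
{
    node *cursor = root;
    for (int i = 0; name[i] != '\0'; i++)
    {
        int index = letter_index(name[i]);
        if (index < 0)
        {
            return false;
        }
        if (cursor->children[index] == NULL)
        {
            cursor->children[index] = create_node();
            if (cursor->children[index] == NULL)
            {
                return false;
            }
        }
        cursor = cursor->children[index];
    }
    cursor->is_word = true;  // the "green" marker: a name stops here
    return true;
}

// Takes one step per letter of the name, regardless of how many names are stored.
bool search(const node *root, const char *name)
{
    const node *cursor = root;
    for (int i = 0; name[i] != '\0'; i++)
    {
        int index = letter_index(name[i]);
        if (index < 0 || cursor->children[index] == NULL)
        {
            return false;
        }
        cursor = cursor->children[index];
    }
    return cursor->is_word;
}
```

In this sketch, Hagrid and Harry share the nodes for H and A, and searching for "Hag" fails even though that path exists, because no name stops in that node. Every node carries 26 pointers, most of them NULL, so the sketch uses a lot of memory for the names it stores.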
2788
02:00:50,810 --> 02:00:55,090
But the upside of this is that the height of this tree
2789
02:00:55,090 --> 02:00:58,090
is only as tall as the longest person's name.
2790
02:00:58,090 --> 02:01:04,930
H-A-G-R-I-D or H-E-R-M-I-O-N-E. And notice that no matter how many other
2791
02:01:04,930 --> 02:01:08,740
people are in this data structure, there's 3 at the moment,
2792
02:01:08,740 --> 02:01:13,150
if there were 3 million, it would still take me how many steps to search
2793
02:01:13,150 --> 02:01:14,500
for Hermione?
2794
02:01:14,500 --> 02:01:19,750
H-E-R-M-I-O-N-E. So, 8 steps total.
2795
02:01:19,750 --> 02:01:24,580
No matter if there's 2 other people, 2 million, 10 million other people.
2796
02:01:24,580 --> 02:01:28,660
Because the path to her name is always on the same path.
2797
02:01:28,660 --> 02:01:33,550
And if you assume that there's a maximum limit on the length of names
2798
02:01:33,550 --> 02:01:34,420
in the human world.
2799
02:01:34,420 --> 02:01:36,510
Maybe it's 40, 100, whatever.
2800
02:01:36,510 --> 02:01:38,260
Whatever the longest name in the world is.
2801
02:01:38,260 --> 02:01:39,160
That's constant.
2802
02:01:39,160 --> 02:01:41,630
Maybe it's 40, 100, but that's constant.
2803
02:01:41,630 --> 02:01:44,840
Which is to say that with a trie, technically speaking,
2804
02:01:44,840 --> 02:01:49,480
it is the case that your lookup time, in big O notation,
2805
02:01:49,480 --> 02:01:51,520
would be big O of 1.
2806
02:01:51,520 --> 02:01:54,580
It's constant time, because unlike every other data structure
2807
02:01:54,580 --> 02:01:59,440
we've looked at, with a trie, the amount of time it takes you to find one person
2808
02:01:59,440 --> 02:02:02,920
or insert one person is completely independent of how
2809
02:02:02,920 --> 02:02:07,210
many other pieces of data are already in the data structure.
2810
02:02:07,210 --> 02:02:09,970
And this holds true even if one name is a prefix of another.
2811
02:02:09,970 --> 02:02:13,373
I don't think there was a Daniel or Danielle in the Harry Potter universe
2812
02:02:13,373 --> 02:02:14,290
that I could think of.
2813
02:02:14,290 --> 02:02:18,400
But, D-A-N-I-E-L could be one name.
2814
02:02:18,400 --> 02:02:20,988
And, therefore, we have a true there in green.
2815
02:02:20,988 --> 02:02:22,780
And if there's a longer name like Danielle.
2816
02:02:22,780 --> 02:02:24,760
Then, you keep going until you get to the E.
2817
02:02:24,760 --> 02:02:27,550
So you can still have with a trie, one name that's
2818
02:02:27,550 --> 02:02:29,660
a substring of another name.
2819
02:02:29,660 --> 02:02:32,380
So it's not as though we've created a problem there.
2820
02:02:32,380 --> 02:02:34,052
That, too, is still possible.
2821
02:02:34,052 --> 02:02:36,760
But at the end of the day, it only takes a finite number of steps
2822
02:02:36,760 --> 02:02:38,410
to find any of these people.
2823
02:02:38,410 --> 02:02:41,320
And again, that's what's particularly compelling.
2824
02:02:41,320 --> 02:02:43,398
That you effectively have constant time lookup.
2825
02:02:43,398 --> 02:02:44,440
So that's amazing, right.
2826
02:02:44,440 --> 02:02:48,153
We've gone through this whole story for weeks now of like, linear time.
2827
02:02:48,153 --> 02:02:49,570
And then, it went up to n squared.
2828
02:02:49,570 --> 02:02:50,350
And then, log n.
2829
02:02:50,350 --> 02:02:55,430
And now constant time, what's the price paid for a data structure like this?
2830
02:02:55,430 --> 02:02:58,630
This so-called trie?
2831
02:02:58,630 --> 02:02:59,810
What's the downside here?
2832
02:02:59,810 --> 02:03:01,540
There's got to be a catch.
2833
02:03:01,540 --> 02:03:03,970
And in fact, tries are not actually used that often,
2834
02:03:03,970 --> 02:03:07,500
amazing as they might sound on some CS level here.
2835
02:03:07,500 --> 02:03:08,260
AUDIENCE: Memory.
2836
02:03:08,260 --> 02:03:09,520
SPEAKER 1: Memory.
2837
02:03:09,520 --> 02:03:10,735
In what sense?
2838
02:03:10,735 --> 02:03:12,898
AUDIENCE: Much like a [INAUDIBLE].
2839
02:03:12,898 --> 02:03:13,690
SPEAKER 1: Exactly.
2840
02:03:13,690 --> 02:03:15,610
If you're storing all of these darn arrays
2841
02:03:15,610 --> 02:03:18,870
it's, again, a sparsely populated data structure.
2842
02:03:18,870 --> 02:03:19,870
And you can see it here.
2843
02:03:19,870 --> 02:03:23,800
Granted there's only 3 names, but most of those boxes, most of those pointers,
2844
02:03:23,800 --> 02:03:25,490
are going to remain null.
2845
02:03:25,490 --> 02:03:28,540
So this is an incredibly wide data structure, if you will.
2846
02:03:28,540 --> 02:03:31,040
It uses a huge amount of memory to store the names.
2847
02:03:31,040 --> 02:03:32,860
But again, you've got to pick a lane.
2848
02:03:32,860 --> 02:03:35,980
Either you're going to minimize space or you're going to minimize time.
2849
02:03:35,980 --> 02:03:39,240
It's not really possible to get truly the best of both worlds.
2850
02:03:39,240 --> 02:03:41,290
You have to decide where the inflection point is
2851
02:03:41,290 --> 02:03:44,110
for the device you're writing software for, how much memory it has,
2852
02:03:44,110 --> 02:03:45,460
how expensive it is.
2853
02:03:45,460 --> 02:03:48,980
And again, taking all of these things into account.
2854
02:03:48,980 --> 02:03:51,400
So lastly, let's do one further abstraction.
2855
02:03:51,400 --> 02:03:54,910
So even higher level, to discuss things that are generally
2856
02:03:54,910 --> 02:03:56,962
known as abstract data structures.
2857
02:03:56,962 --> 02:03:58,670
It turns out we could spend like all day,
2858
02:03:58,670 --> 02:04:00,250
all week, talking about different things we
2859
02:04:00,250 --> 02:04:01,700
could build with these data structures.
2860
02:04:01,700 --> 02:04:03,658
But for the most part, now that we have arrays.
2861
02:04:03,658 --> 02:04:06,430
Now that we have linked lists or their cousins, trees, which
2862
02:04:06,430 --> 02:04:07,428
are 2-dimensional.
2863
02:04:07,428 --> 02:04:09,220
And beyond that, there's even graphs, where
2864
02:04:09,220 --> 02:04:12,407
the arrows can go in multiple directions, not just down, so to speak.
2865
02:04:12,407 --> 02:04:14,740
Now that we have this ability to stitch things together,
2866
02:04:14,740 --> 02:04:16,790
we can solve all different types of problems.
2867
02:04:16,790 --> 02:04:20,740
So, for instance, a very common type of data structure
2868
02:04:20,740 --> 02:04:24,730
to use in a program, or even our human world, are things called queues.
2869
02:04:24,730 --> 02:04:28,780
A queue being a data structure like a line outside of a store.
2870
02:04:28,780 --> 02:04:30,850
Where it has what's called a FIFO property.
2871
02:04:30,850 --> 02:04:32,240
First In, First Out.
2872
02:04:32,240 --> 02:04:34,660
Which is great for fairness, at least in the human world.
2873
02:04:34,660 --> 02:04:38,800
And if you've ever waited outside of Tasty Burger, or Salsa Fresca,
2874
02:04:38,800 --> 02:04:40,990
or some other restaurant nearby, presumably,
2875
02:04:40,990 --> 02:04:43,780
if you're queuing up at the counter, you want
2876
02:04:43,780 --> 02:04:46,270
the store to maintain a FIFO system.
2877
02:04:46,270 --> 02:04:47,530
First in and first out.
2878
02:04:47,530 --> 02:04:51,160
So that whoever's first in line gets their food first and gets out first.
2879
02:04:51,160 --> 02:04:54,710
So a queue is actually a computer science term, too.
2880
02:04:54,710 --> 02:04:57,460
And even if you're still in the habit of printing things on paper,
2881
02:04:57,460 --> 02:04:59,710
there are things you might have heard called printer
2882
02:04:59,710 --> 02:05:02,050
queues, which also do things in order.
2883
02:05:02,050 --> 02:05:04,467
The first person to send their essay to the printer
2884
02:05:04,467 --> 02:05:06,550
should, ideally, be printed before the last person
2885
02:05:06,550 --> 02:05:08,920
to send their essay to the printer.
2886
02:05:08,920 --> 02:05:10,720
Again, in the interest of fairness.
2887
02:05:10,720 --> 02:05:12,370
But how can you implement a queue?
2888
02:05:12,370 --> 02:05:15,250
Well, you typically have to implement 2 fundamental operations,
2889
02:05:15,250 --> 02:05:16,810
enqueue and dequeue.
2890
02:05:16,810 --> 02:05:19,910
So adding something to it and removing something from it.
2891
02:05:19,910 --> 02:05:23,650
And the interesting thing here is that how do you implement a queue?
2892
02:05:23,650 --> 02:05:26,650
Well in the human world, you would just have literally physical space
2893
02:05:26,650 --> 02:05:29,290
for humans to line up from left to right, or right to left.
2894
02:05:29,290 --> 02:05:30,333
Same in a computer.
2895
02:05:30,333 --> 02:05:33,250
Like a printer queue, if you send a whole bunch of jobs to be printed,
2896
02:05:33,250 --> 02:05:35,350
a whole bunch of essays or documents, well, you
2897
02:05:35,350 --> 02:05:37,430
need a chunk of memory like an array.
2898
02:05:37,430 --> 02:05:37,930
All right.
2899
02:05:37,930 --> 02:05:40,150
Well, if you use an array, what's a problem
2900
02:05:40,150 --> 02:05:43,760
that could happen in the world of printing, for instance?
2901
02:05:43,760 --> 02:05:47,020
If you use an array to store all of the documents that need to be printed.
2902
02:05:47,020 --> 02:05:48,178
AUDIENCE: It can be filled.
2903
02:05:48,178 --> 02:05:49,720
SPEAKER 1: It could be filled, right.
2904
02:05:49,720 --> 02:05:53,020
So if the programmer decided, HP or whoever makes the printer decides,
2905
02:05:53,020 --> 02:05:56,680
oh, you can send like a megabyte worth of documents to this printer at once.
2906
02:05:56,680 --> 02:05:58,730
At some point you might get an error message,
2907
02:05:58,730 --> 02:06:00,100
which says, sorry out of memory.
2908
02:06:00,100 --> 02:06:00,995
Wait a few minutes.
2909
02:06:00,995 --> 02:06:03,370
Which is maybe a reasonable solution, but a little annoying.
2910
02:06:03,370 --> 02:06:07,000
Or HP could write code that maybe dynamically resizes the array
2911
02:06:07,000 --> 02:06:07,670
or so forth.
2912
02:06:07,670 --> 02:06:10,240
But at that point, maybe they should just use a linked list.
2913
02:06:10,240 --> 02:06:11,170
And they could.
2914
02:06:11,170 --> 02:06:14,890
So there, too, you could implement the notion of a queue
2915
02:06:14,890 --> 02:06:16,238
using a linked list instead.
2916
02:06:16,238 --> 02:06:18,280
You're going to spend more memory, but you're not
2917
02:06:18,280 --> 02:06:20,650
going to run out of space in your array.
2918
02:06:20,650 --> 02:06:22,493
Which might be more compelling.
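The linked list version might look something like the sketch below. Again the job IDs and function names are hypothetical. Each job costs an extra pointer, but `enqueue` only fails if the whole machine runs out of memory, not because some fixed-size array filled up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int job;           // hypothetical print-job ID
    struct node *next; // the job that arrived right after this one
} node;

typedef struct
{
    node *head; // oldest job, next to be printed
    node *tail; // newest job, where new arrivals attach
} queue;

// Prepares an empty queue.
void queue_init(queue *q)
{
    assert(q != NULL);
    q->head = NULL;
    q->tail = NULL;
}

// Appends a job at the tail; returns false only if malloc fails.
bool enqueue(queue *q, int job)
{
    assert(q != NULL);
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->job = job;
    n->next = NULL;
    if (q->tail == NULL)
    {
        q->head = n; // first job in an empty queue
    }
    else
    {
        q->tail->next = n;
    }
    q->tail = n;
    return true;
}

// Removes the job at the head (first in, first out); false if empty.
bool dequeue(queue *q, int *job)
{
    assert(q != NULL && job != NULL);
    if (q->head == NULL)
    {
        return false;
    }
    node *oldest = q->head;
    *job = oldest->job;
    q->head = oldest->next;
    if (q->head == NULL)
    {
        q->tail = NULL; // the queue is empty again
    }
    free(oldest);
    return true;
}
```

Keeping a `tail` pointer is a design choice: it lets both operations run in constant time instead of walking the list on every `enqueue`.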
2919
02:06:22,493 --> 02:06:24,160
This happens even in the physical world.
2920
02:06:24,160 --> 02:06:27,640
You go to the store and you start having to line up outside and down the road.
2921
02:06:27,640 --> 02:06:31,927
And like, for a really busy store, they run out of space so they make do.
2922
02:06:31,927 --> 02:06:34,510
But in that case, it tends to be more of an array just because
2923
02:06:34,510 --> 02:06:36,965
of the physical notion of humans lining up.
2924
02:06:36,965 --> 02:06:38,590
But there's other data structures, too.
2925
02:06:38,590 --> 02:06:41,715
If you've ever gone to the dining hall and picked up like a Harvard or Yale
2926
02:06:41,715 --> 02:06:46,870
tray, you're typically picking up the last tray that was just cleaned,
2927
02:06:46,870 --> 02:06:48,730
not the first tray that was cleaned.
2928
02:06:48,730 --> 02:06:49,240
Why?
2929
02:06:49,240 --> 02:06:53,170
Because these cafeteria trays stack up on top of each other.
2930
02:06:53,170 --> 02:06:56,410
And indeed a stack is another type of abstract data structure.
2931
02:06:56,410 --> 02:06:58,870
In the physical world, it's literally something physical
2932
02:06:58,870 --> 02:07:01,030
like a stack of trays.
2933
02:07:01,030 --> 02:07:03,940
Which have what we would call a LIFO property.
2934
02:07:03,940 --> 02:07:05,460
Last In, First Out.
2935
02:07:05,460 --> 02:07:07,210
So as these things come out of the washer,
2936
02:07:07,210 --> 02:07:09,520
they're putting the most recent ones on the top.
2937
02:07:09,520 --> 02:07:13,240
And then you, the human, are probably taking the most recently cleaned one.
2938
02:07:13,240 --> 02:07:15,700
Which means in the extreme, no one on campus
2939
02:07:15,700 --> 02:07:19,135
might ever use that very first tray.
2940
02:07:19,135 --> 02:07:21,010
Which is probably fine in the world of trays,
2941
02:07:21,010 --> 02:07:24,970
but would really be bad in the world of Tasty Burger lining up for food if LIFO
2942
02:07:24,970 --> 02:07:26,770
were the property being implemented.
2943
02:07:26,770 --> 02:07:28,840
But here, too, it could be an array.
2944
02:07:28,840 --> 02:07:29,950
It could be a linked list.
2945
02:07:29,950 --> 02:07:31,533
And you see this, honestly, every day.
2946
02:07:31,533 --> 02:07:33,760
If you're using Gmail and your Gmail inbox.
2947
02:07:33,760 --> 02:07:36,280
That is actually a stack, at least by default,
2948
02:07:36,280 --> 02:07:39,678
where your newest messages, last in, are the first ones
2949
02:07:39,678 --> 02:07:40,720
at the top of the screen.
2950
02:07:40,720 --> 02:07:42,580
That's a LIFO data structure.
2951
02:07:42,580 --> 02:07:44,710
And it means that you see your most recent emails.
2952
02:07:44,710 --> 02:07:47,168
But if you have a busy day, you're getting a lot of emails,
2953
02:07:47,168 --> 02:07:48,430
it might not be a good thing.
2954
02:07:48,430 --> 02:07:50,830
Because now you're ignoring the people who wrote you
2955
02:07:50,830 --> 02:07:53,140
way earlier in the day or the week.
2956
02:07:53,140 --> 02:07:55,600
So LIFO and FIFO are just properties that you
2957
02:07:55,600 --> 02:07:58,360
can achieve with these very specific types of data structures.
2958
02:07:58,360 --> 02:08:00,110
And the parlance in the world of stacks
2959
02:08:00,110 --> 02:08:03,970
is to push something onto a stack or pop something out.
2960
02:08:03,970 --> 02:08:06,160
These are here, for instance, as an example of why
2961
02:08:06,160 --> 02:08:07,450
you might always wear the same color.
2962
02:08:07,450 --> 02:08:09,710
Well, if you're storing all of your clothes in a stack,
2963
02:08:09,710 --> 02:08:11,530
you might not ever get to the different colored
2964
02:08:11,530 --> 02:08:12,970
clothes at the bottom of the list.
2965
02:08:12,970 --> 02:08:17,890
And in fact, to paint this picture, we have a couple-minute video here.
2966
02:08:17,890 --> 02:08:20,890
Just to paint this here, made by a faculty member elsewhere.
2967
02:08:20,890 --> 02:08:23,830
Let's go ahead and dim the lights for just a minute or 2 here.
2968
02:08:23,830 --> 02:08:27,985
So that we can take a look at Jack learning some facts.
2969
02:08:27,985 --> 02:08:28,610
[VIDEO PLAYING]
2970
02:08:28,610 --> 02:08:31,360
SPEAKER 2: Once upon a time, there was a guy named Jack.
2971
02:08:31,360 --> 02:08:34,750
When it came to making friends Jack did not have the knack.
2972
02:08:34,750 --> 02:08:37,720
So Jack went to talk to the most popular guy he knew.
2973
02:08:37,720 --> 02:08:40,390
He went up to Lou and asked, what do I do?
2974
02:08:40,390 --> 02:08:42,850
Lou saw that his friend was really distressed.
2975
02:08:42,850 --> 02:08:45,560
Well, Lou began, just look how you're dressed.
2976
02:08:45,560 --> 02:08:48,130
Don't you have any clothes with a different look?
2977
02:08:48,130 --> 02:08:49,210
Yes, said Jack.
2978
02:08:49,210 --> 02:08:50,530
I sure do.
2979
02:08:50,530 --> 02:08:52,720
Come to my house and I'll show them to you.
2980
02:08:52,720 --> 02:08:54,010
So they went off to Jack's.
2981
02:08:54,010 --> 02:08:57,700
And Jack showed Lou the box, where he kept all his shirts, and his pants,
2982
02:08:57,700 --> 02:08:58,750
and his socks.
2983
02:08:58,750 --> 02:09:01,720
Lou said, I see you have all your clothes in a pile.
2984
02:09:01,720 --> 02:09:04,300
Why don't you wear some others once in a while?
2985
02:09:04,300 --> 02:09:07,450
Jack said, well, when I remove clothes and socks,
2986
02:09:07,450 --> 02:09:10,180
I wash them and put them away in the box.
2987
02:09:10,180 --> 02:09:12,670
Then comes the next morning and up I hop.
2988
02:09:12,670 --> 02:09:15,910
I go to the box and get my clothes off the top.
2989
02:09:15,910 --> 02:09:18,520
Lou quickly realized the problem with Jack.
2990
02:09:18,520 --> 02:09:21,490
He kept clothes, CDs, and books in a stack.
2991
02:09:21,490 --> 02:09:23,920
When he'd reached for something to read or to wear,
2992
02:09:23,920 --> 02:09:26,530
he chose the top book or underwear.
2993
02:09:26,530 --> 02:09:28,920
Then when he was done he would put it right back.
2994
02:09:28,920 --> 02:09:31,500
Back it would go on top of the stack.
2995
02:09:31,500 --> 02:09:33,870
I know the solution, said a triumphant Lou.
2996
02:09:33,870 --> 02:09:36,510
You need to learn to start using a queue.
2997
02:09:36,510 --> 02:09:39,300
Lou took Jack's clothes and hung them in a closet.
2998
02:09:39,300 --> 02:09:42,120
And when he had emptied the box, he just tossed it.
2999
02:09:42,120 --> 02:09:45,990
Then he said, now Jack, at the end of the day, put your clothes on the left
3000
02:09:45,990 --> 02:09:47,470
when you put them away.
3001
02:09:47,470 --> 02:09:50,190
Then tomorrow morning when you see the sunshine, get
3002
02:09:50,190 --> 02:09:52,920
your clothes from the right, from the end of the line.
3003
02:09:52,920 --> 02:09:55,800
Don't you see, said Lou, it will be so nice.
3004
02:09:55,800 --> 02:09:59,130
You'll wear everything once before you wear something twice.
3005
02:09:59,130 --> 02:10:02,070
And with everything in queues in his closet and shelf,
3006
02:10:02,070 --> 02:10:04,680
Jack started to feel quite sure of himself.
3007
02:10:04,680 --> 02:10:07,155
All thanks to Lou and his wonderful queue.
3008
02:10:09,220 --> 02:10:12,220
SPEAKER 1: So just to help you realize that these things are everywhere.
3009
02:10:12,220 --> 02:10:14,830
[AUDIENCE CLAPPING]
3010
02:10:14,830 --> 02:10:16,380
Even in our human world.
3011
02:10:16,380 --> 02:10:18,060
If you've ever lined up at this place.
3012
02:10:18,060 --> 02:10:19,980
Anyone recognize this?
3013
02:10:19,980 --> 02:10:22,800
OK, so sweetgreen, little salad place in the square.
3014
02:10:22,800 --> 02:10:24,690
This is if you order online or in advance,
3015
02:10:24,690 --> 02:10:27,232
your food ends up according to the first letter in your name.
3016
02:10:27,232 --> 02:10:29,482
Which actually sounds awfully reminiscent of something
3017
02:10:29,482 --> 02:10:30,300
like a hash table.
3018
02:10:30,300 --> 02:10:33,360
And in fact, no matter whether you implement a hash table like we
3019
02:10:33,360 --> 02:10:35,130
did, with an array and linked list.
3020
02:10:35,130 --> 02:10:37,335
Or with 3 shelves like this.
3021
02:10:37,335 --> 02:10:40,320
This is actually an abstract data type called a dictionary.
3022
02:10:40,320 --> 02:10:43,680
And a dictionary, just like in our human world, has keys and values.
3023
02:10:43,680 --> 02:10:45,390
Words and their definitions.
3024
02:10:45,390 --> 02:10:49,890
This just has letters of the alphabet and salads as their value.
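A dictionary along those lines could be sketched in C as a hash table with 26 "shelves", one per first letter, each holding a linked list of orders. The type and function names are hypothetical, and the sketch assumes the strings passed in outlive the dictionary, since it stores pointers rather than copies.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SHELVES 26 // one shelf per first letter, A-Z

typedef struct entry
{
    const char *name;   // key: the customer's name
    const char *salad;  // value: what they ordered
    struct entry *next; // next order on the same shelf
} entry;

typedef struct
{
    entry *shelves[SHELVES];
} dictionary;

// Prepares an empty dictionary.
void dictionary_init(dictionary *d)
{
    assert(d != NULL);
    for (int i = 0; i < SHELVES; i++)
    {
        d->shelves[i] = NULL;
    }
}

// Picks a shelf from the first letter; anything else goes to shelf 0.
static int hash(const char *name)
{
    int index = toupper((unsigned char) name[0]) - 'A';
    return (index >= 0 && index < SHELVES) ? index : 0;
}

// Stores or updates the salad for name; false only if malloc fails.
bool dictionary_put(dictionary *d, const char *name, const char *salad)
{
    assert(d != NULL && name != NULL);
    int shelf = hash(name);
    for (entry *e = d->shelves[shelf]; e != NULL; e = e->next)
    {
        if (strcmp(e->name, name) == 0)
        {
            e->salad = salad; // same key: replace the value
            return true;
        }
    }
    entry *e = malloc(sizeof(entry));
    if (e == NULL)
    {
        return false;
    }
    e->name = name;
    e->salad = salad;
    e->next = d->shelves[shelf]; // prepend to this shelf's list
    d->shelves[shelf] = e;
    return true;
}

// Returns the salad stored for name, or NULL if there is none.
const char *dictionary_get(const dictionary *d, const char *name)
{
    assert(d != NULL && name != NULL);
    for (const entry *e = d->shelves[hash(name)]; e != NULL; e = e->next)
    {
        if (strcmp(e->name, name) == 0)
        {
            return e->salad;
        }
    }
    return NULL;
}
```

Because each shelf is a linked list, a crowded letter just means a longer list to search on that shelf rather than orders spilling onto a neighboring shelf.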
3025
02:10:49,890 --> 02:10:52,260
But here, too, there's a real world constraint.
3026
02:10:52,260 --> 02:10:55,740
In what kind of scenario does this system at sweetgreen
3027
02:10:55,740 --> 02:10:58,410
devolve into a problem, for instance?
3028
02:10:58,410 --> 02:11:02,100
Because they, too, are using only finite space, finite storage.
3029
02:11:02,100 --> 02:11:03,090
What could go wrong?
3030
02:11:03,090 --> 02:11:03,360
Yeah.
3031
02:11:03,360 --> 02:11:04,290
AUDIENCE: Run out of space.
3032
02:11:04,290 --> 02:11:04,530
SPEAKER 1: Yeah.
3033
02:11:04,530 --> 02:11:05,910
If they run out of space on the shelf and there's
3034
02:11:05,910 --> 02:11:08,380
a lot of people whose names start with D, or E, or whatever.
3035
02:11:08,380 --> 02:11:09,300
And so, they just pile up.
3036
02:11:09,300 --> 02:11:11,880
And then, maybe, they kind of overflow into the E's or the F's.
3037
02:11:11,880 --> 02:11:13,800
And they probably don't really care because any human
3038
02:11:13,800 --> 02:11:16,290
is going to come by, and just eyeball it, and figure it out anyway.
3039
02:11:16,290 --> 02:11:18,780
But in the world of a computer, you're the one coding
3040
02:11:18,780 --> 02:11:20,670
and have to be ever so precise.
3041
02:11:20,670 --> 02:11:24,240
We thought we would lastly do one final thing here.
3042
02:11:24,240 --> 02:11:28,045
In advance, we prepared a linked list of sorts in the audience.
3043
02:11:28,045 --> 02:11:29,670
Since this has become a bit of a thing.
3044
02:11:29,670 --> 02:11:32,530
I am starting to represent the beginning of this linked list.
3045
02:11:32,530 --> 02:11:37,110
Insofar as I have a pointer here with seat location G9.
3046
02:11:37,110 --> 02:11:40,500
Whoever is in G9, would you mind standing up?
3047
02:11:40,500 --> 02:11:43,170
And what letter is on your sheet there?
3048
02:11:43,170 --> 02:11:44,100
AUDIENCE: F15.
3049
02:11:44,100 --> 02:11:46,650
SPEAKER 1: OK, so you have S15 and your letter--
3050
02:11:46,650 --> 02:11:47,305
AUDIENCE: F15.
3051
02:11:47,305 --> 02:11:48,180
SPEAKER 1: Say again?
3052
02:11:48,180 --> 02:11:48,870
AUDIENCE: F.
3053
02:11:48,870 --> 02:11:49,680
SPEAKER 1: F15.
3054
02:11:49,680 --> 02:11:51,990
So I see you're holding a C in your node.
3055
02:11:51,990 --> 02:11:55,500
You are pointing to, if you could physically, F15.
3056
02:11:55,500 --> 02:11:56,880
F15, what do you hold?
3057
02:11:56,880 --> 02:11:57,780
AUDIENCE: S.
3058
02:11:57,780 --> 02:12:00,390
SPEAKER 1: You have an S. And who should you be pointing at?
3059
02:12:00,390 --> 02:12:01,170
AUDIENCE: F5.
3060
02:12:01,170 --> 02:12:01,930
SPEAKER 1: F5.
3061
02:12:01,930 --> 02:12:03,240
Could you stand up, F5.
3062
02:12:03,240 --> 02:12:04,950
You're holding a 5, I see.
3063
02:12:04,950 --> 02:12:06,030
What address?
3064
02:12:06,030 --> 02:12:07,020
AUDIENCE: F12.
3065
02:12:07,020 --> 02:12:08,040
SPEAKER 1: F12.
3066
02:12:08,040 --> 02:12:08,820
Big finale.
3067
02:12:08,820 --> 02:12:13,020
F12, if you'd like to stand up holding a 0 and null, which means that was CS50.
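The walk through the audience corresponds to traversing a linked list. A minimal sketch in C, with a hypothetical `spell` function that follows `next` pointers until it reaches NULL and collects each node's character along the way:

```c
#include <assert.h>
#include <stddef.h>

typedef struct node
{
    char letter;       // what this audience member is holding
    struct node *next; // the seat they point to, or NULL at the end
} node;

// Walks the list from head, copying each letter into out and adding a
// terminating '\0'. Stops early if out (of the given size) would
// overflow. Returns the number of letters copied.
size_t spell(const node *head, char *out, size_t size)
{
    assert(out != NULL && size > 0);
    size_t count = 0;
    for (const node *cursor = head; cursor != NULL && count + 1 < size;
         cursor = cursor->next)
    {
        out[count] = cursor->letter;
        count++;
    }
    out[count] = '\0';
    return count;
}
```

With four nodes standing in for seats G9, F15, F5, and F12, holding 'C', 'S', '5', and '0' and linked in that order, `spell` starting from the G9 node fills the buffer with "CS50".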
3068
02:12:13,020 --> 02:12:16,540
[AUDIENCE CLAPPING]
3069
02:12:16,540 --> 02:12:17,040
All right.
3070
02:12:17,040 --> 02:12:19,340
We'll see you next time.
3071
02:12:19,340 --> 02:12:54,000
[MUSIC PLAYING]