Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
0
00:00:00,000 --> 00:00:02,982
[FILM REEL]
1
00:00:02,982 --> 00:00:06,461
[MUSIC PLAYING]
2
00:00:06,461 --> 00:01:12,660
3
00:01:12,660 --> 00:01:15,420
DAVID MALAN: All right, this is CS50.
4
00:01:15,420 --> 00:01:19,200
And this is Week 3 already, wherein we'll take a look back actually
5
00:01:19,200 --> 00:01:21,400
at Week 0 where we first began.
6
00:01:21,400 --> 00:01:24,930
And in Week 0, recall that everything was very intuitive, in a sense.
7
00:01:24,930 --> 00:01:28,000
We talked not just about representation of information, but algorithms.
8
00:01:28,000 --> 00:01:30,450
And we talked about tearing a phone book again and again.
9
00:01:30,450 --> 00:01:32,860
And that somehow got us to a better solution.
10
00:01:32,860 --> 00:01:35,888
But today, we'll try to start formalizing some of those ideas
11
00:01:35,888 --> 00:01:38,430
and capturing some of those same ideas not in pseudocode just
12
00:01:38,430 --> 00:01:41,650
yet, but in actual code as well.
13
00:01:41,650 --> 00:01:45,210
But we'll also consider the efficiency of those algorithms,
14
00:01:45,210 --> 00:01:48,372
like just how good, how well-designed our algorithms actually are.
15
00:01:48,372 --> 00:01:50,580
And if you recall, when we did the phone book example
16
00:01:50,580 --> 00:01:53,940
wherein I first had an algorithm searching one page at a time,
17
00:01:53,940 --> 00:01:56,460
and then second one two pages at a time, and then third,
18
00:01:56,460 --> 00:01:58,590
started tearing the thing in half, recall
19
00:01:58,590 --> 00:02:01,900
that we, with a wave of the hand, kind of analyzed it as follows.
20
00:02:01,900 --> 00:02:05,070
We proposed that if the x-axis here is the size of the problem,
21
00:02:05,070 --> 00:02:09,120
like number of pages in a phone book, and the y-axis is the time required
22
00:02:09,120 --> 00:02:11,430
to solve the problem in seconds, minutes,
23
00:02:11,430 --> 00:02:13,363
page tears, whatever your unit of measure is,
24
00:02:13,363 --> 00:02:16,530
recall that the first algorithm, which is the straight line such that if you
25
00:02:16,530 --> 00:02:19,620
had n pages in the phone book, it might have this slope of n--
26
00:02:19,620 --> 00:02:23,280
and there's this one-to-one relationship between pages and tears.
27
00:02:23,280 --> 00:02:27,000
Two pages at a time, of course, was twice as fast, but still really
28
00:02:27,000 --> 00:02:30,000
the same shape, the yellow line here indicating that yeah,
29
00:02:30,000 --> 00:02:33,570
it's n over 2, maybe plus 1 if you have to double back, as we discussed.
30
00:02:33,570 --> 00:02:36,840
But it's really still fundamentally the same algorithm one
31
00:02:36,840 --> 00:02:38,370
or two pages at a time.
32
00:02:38,370 --> 00:02:41,760
But the third algorithm, recall, was this one here in green,
33
00:02:41,760 --> 00:02:45,660
where we called it logarithmic in terms of how fast or how slow it was.
34
00:02:45,660 --> 00:02:48,235
And indeed, the implication of this algorithm
35
00:02:48,235 --> 00:02:50,610
was that we could even double the size of the phone book,
36
00:02:50,610 --> 00:02:53,280
and no big deal-- one additional page tear,
37
00:02:53,280 --> 00:02:55,990
and we take yet another 1,000 page bite out of the phone book.
38
00:02:55,990 --> 00:02:58,890
So today, we'll revisit some of these ideas, formalize them a bit,
39
00:02:58,890 --> 00:03:01,530
but also translate some of them, ultimately, to code.
40
00:03:01,530 --> 00:03:03,690
And all of that now is possible because we
41
00:03:03,690 --> 00:03:06,060
have this lower-level understanding, perhaps, of what's
42
00:03:06,060 --> 00:03:07,600
actually inside of your computer.
43
00:03:07,600 --> 00:03:10,170
This, of course, is your computer's RAM or memory.
44
00:03:10,170 --> 00:03:12,720
And recall that if we start to abstract this away,
45
00:03:12,720 --> 00:03:15,420
your computer's memory is really just a grid of bytes.
46
00:03:15,420 --> 00:03:17,730
In fact, we don't have to look at the hardware anymore.
47
00:03:17,730 --> 00:03:21,270
And we looked at a grid of bytes like this, whereby each of these bytes
48
00:03:21,270 --> 00:03:26,700
could be used to store a char, an int, a long, or even an entire string,
49
00:03:26,700 --> 00:03:27,460
at that.
50
00:03:27,460 --> 00:03:30,117
But let's focus perhaps just on a subset of this
51
00:03:30,117 --> 00:03:31,950
because last week, of course, we emphasized,
52
00:03:31,950 --> 00:03:34,830
really, arrays, storing things in arrays.
53
00:03:34,830 --> 00:03:38,140
And that allowed us to start storing entire strings, sequences
54
00:03:38,140 --> 00:03:40,470
of characters, and even arrays of integers
55
00:03:40,470 --> 00:03:44,770
if we wanted to have multiple ones and not just multiple variables as well.
56
00:03:44,770 --> 00:03:48,850
But the catch is that if you look inside of an array in the computer's memory--
57
00:03:48,850 --> 00:03:51,450
and for instance, suppose these integers here are stored--
58
00:03:51,450 --> 00:03:53,850
it's pretty easy for us humans to glance at this
59
00:03:53,850 --> 00:03:55,698
and immediately find the number 50.
60
00:03:55,698 --> 00:03:57,990
You sort of have this bird's eye view from where you're
61
00:03:57,990 --> 00:03:59,700
seated of everything on the screen.
62
00:03:59,700 --> 00:04:02,500
And so it's pretty obvious how you get to the number 50.
63
00:04:02,500 --> 00:04:04,710
But in the world of computers, of course,
64
00:04:04,710 --> 00:04:06,720
it turns out that this is hardware.
65
00:04:06,720 --> 00:04:10,440
And computers, for today's purposes, can only do one thing at a time.
66
00:04:10,440 --> 00:04:14,820
They can't just take it all in and find instantly some number like 50.
67
00:04:14,820 --> 00:04:17,130
So perhaps a decent metaphor is to consider
68
00:04:17,130 --> 00:04:20,010
the array of memory inside of your computer
69
00:04:20,010 --> 00:04:22,500
really is a sequence of closed doors.
70
00:04:22,500 --> 00:04:25,860
And if the computer wants to find some value in an array,
71
00:04:25,860 --> 00:04:29,970
it has to do the digital equivalent of opening each of these doors one
72
00:04:29,970 --> 00:04:30,730
at a time.
73
00:04:30,730 --> 00:04:32,410
Now how can code do that?
74
00:04:32,410 --> 00:04:35,610
Well, of course, we introduced indices or indexes last week,
75
00:04:35,610 --> 00:04:39,900
whereby we, by convention, call the first element of an array location 0,
76
00:04:39,900 --> 00:04:44,490
the second location 1, the third location 2, and so forth-- so-called 0
77
00:04:44,490 --> 00:04:45,150
indexed.
78
00:04:45,150 --> 00:04:48,300
And this allowed us to now bridge this conceptual world of what's
79
00:04:48,300 --> 00:04:51,510
going on in memory with actual code, because now we had this square bracket
80
00:04:51,510 --> 00:04:55,710
syntax via which we could go searching for something if we so choose.
81
00:04:55,710 --> 00:04:59,760
And it turns out, if I now paint these red instead of yellow,
82
00:04:59,760 --> 00:05:02,970
it would seem that we actually have a pretty good physical metaphor here
83
00:05:02,970 --> 00:05:07,170
standing in place for what would be a computer's array of memory
84
00:05:07,170 --> 00:05:10,390
if, for instance, you're storing some seven numbers like that.
85
00:05:10,390 --> 00:05:13,710
And so today we begin with a look of a specific type of algorithm.
86
00:05:13,710 --> 00:05:14,910
That is for searching.
87
00:05:14,910 --> 00:05:16,270
Searching is all over the place.
88
00:05:16,270 --> 00:05:19,900
All of us have probably gone to google.com or some equivalent already
89
00:05:19,900 --> 00:05:21,090
multiple times per day.
90
00:05:21,090 --> 00:05:24,120
And getting back answers fast is what companies like Google
91
00:05:24,120 --> 00:05:25,110
are really good at.
92
00:05:25,110 --> 00:05:26,700
So how are they doing that?
93
00:05:26,700 --> 00:05:29,882
How are they storing information in computers' memory?
94
00:05:29,882 --> 00:05:31,590
Well, let's consider what this really is.
95
00:05:31,590 --> 00:05:34,500
It's really just a problem as it was back in Week 0.
96
00:05:34,500 --> 00:05:36,420
The input, though, to the problem, for now,
97
00:05:36,420 --> 00:05:38,253
might be this array of seven lockers.
98
00:05:38,253 --> 00:05:40,920
So that's the input to the problem, inside of which is a number.
99
00:05:40,920 --> 00:05:45,060
And maybe for simplicity now, we just want a yes/no, a true/false answer--
100
00:05:45,060 --> 00:05:46,860
a bool, that is to say--
101
00:05:46,860 --> 00:05:51,250
of whether or not some number like 50 is in that array.
102
00:05:51,250 --> 00:05:52,680
It's not quite as fancy as Google.
103
00:05:52,680 --> 00:05:55,500
That doesn't just tell you yes, we have search results.
104
00:05:55,500 --> 00:05:57,300
It actually gives you the search results.
105
00:05:57,300 --> 00:05:59,520
But for, now we'll keep it simple, and just output
106
00:05:59,520 --> 00:06:02,070
as part of this problem yes or no, true or false,
107
00:06:02,070 --> 00:06:06,090
we have found the number we're looking for given an input like that array.
108
00:06:06,090 --> 00:06:09,930
But it turns out inside of this black box that we keep coming back to,
109
00:06:09,930 --> 00:06:12,385
there's all sorts of possible algorithms.
110
00:06:12,385 --> 00:06:15,010
And we talked about this at a high level conceptually in Week 0
111
00:06:15,010 --> 00:06:15,860
with the phone book.
112
00:06:15,860 --> 00:06:19,360
But today, let's consider it a little more concretely
113
00:06:19,360 --> 00:06:22,648
by way of a game that some of you might have grown up with, namely Monopoly.
114
00:06:22,648 --> 00:06:24,940
And so behind these doors, it turns out, will be hidden
115
00:06:24,940 --> 00:06:26,800
some denominations of monopoly money.
116
00:06:26,800 --> 00:06:28,690
But for this, we now have two volunteers.
117
00:06:28,690 --> 00:06:30,483
If you'd like to greet the world?
118
00:06:30,483 --> 00:06:31,525
JACKSON: Hi, I'm Jackson.
119
00:06:31,525 --> 00:06:35,440
120
00:06:35,440 --> 00:06:37,220
STEPHANIE: Hi, my name is Stephanie.
121
00:06:37,220 --> 00:06:38,410
DAVID MALAN: And do you want to say a little something
122
00:06:38,410 --> 00:06:40,570
about yourselves-- years, house, dorm?
123
00:06:40,570 --> 00:06:43,030
STEPHANIE: I'm a first year living in Matthews.
124
00:06:43,030 --> 00:06:43,780
DAVID MALAN: Nice.
125
00:06:43,780 --> 00:06:45,580
JACKSON: And I'm a first year in Canaday.
126
00:06:45,580 --> 00:06:46,330
DAVID MALAN: Nice.
127
00:06:46,330 --> 00:06:48,670
Well, welcome to our two volunteers.
128
00:06:48,670 --> 00:06:50,710
So why don't we do this?
129
00:06:50,710 --> 00:06:54,460
Would one of you like to volunteer the other to go first?
130
00:06:54,460 --> 00:06:55,510
STEPHANIE: I'll go.
131
00:06:55,510 --> 00:06:56,560
DAVID MALAN: OK.
132
00:06:56,560 --> 00:06:58,300
All right, so Stephanie's up first.
133
00:06:58,300 --> 00:07:02,560
And behind one of these doors here, we've hidden the monopoly money 50.
134
00:07:02,560 --> 00:07:04,180
And so we'd like you to find the 50.
135
00:07:04,180 --> 00:07:06,490
We'll tell you nothing more about the lockers.
136
00:07:06,490 --> 00:07:08,900
But we would like you to execute a certain algorithm.
137
00:07:08,900 --> 00:07:11,020
And in fact, I'm going to give you some pseudocode for this.
138
00:07:11,020 --> 00:07:12,770
And I'm going to give you the name for it.
139
00:07:12,770 --> 00:07:13,900
It's called linear search.
140
00:07:13,900 --> 00:07:15,855
And as the name implies, you're pretty much
141
00:07:15,855 --> 00:07:17,980
going to end up walking in sort of a straight line.
142
00:07:17,980 --> 00:07:19,220
But how are you going to do this?
143
00:07:19,220 --> 00:07:21,520
Well, let me propose that in a moment, your first step
144
00:07:21,520 --> 00:07:23,170
will be to think kind of like a loop.
145
00:07:23,170 --> 00:07:27,280
For each door from left to right, what do we want you to do on each iteration?
146
00:07:27,280 --> 00:07:31,528
Well, if 50 is behind that door, then we want
147
00:07:31,528 --> 00:07:33,070
to go ahead and have you return true.
148
00:07:33,070 --> 00:07:35,860
And hold up the 50 proudly, if you will, for the group.
149
00:07:35,860 --> 00:07:38,500
Otherwise, if you get through that whole loop
150
00:07:38,500 --> 00:07:40,690
and you haven't found the number 50, you can just
151
00:07:40,690 --> 00:07:42,640
throw up your hands in disappointment.
152
00:07:42,640 --> 00:07:45,190
False-- you've not found the number 50.
153
00:07:45,190 --> 00:07:50,293
So to be clear, step one is going to be for each door from left to right.
154
00:07:50,293 --> 00:07:51,460
How would you like to begin?
155
00:07:51,460 --> 00:07:55,610
156
00:07:55,610 --> 00:07:56,495
Yep.
157
00:07:56,495 --> 00:07:57,840
Oh, and then-- yep.
158
00:07:57,840 --> 00:07:58,340
There we go.
159
00:07:58,340 --> 00:08:00,365
Yep.
160
00:08:00,365 --> 00:08:02,060
And if you'd like to at least tell--
161
00:08:02,060 --> 00:08:04,040
good, good acting here.
162
00:08:04,040 --> 00:08:06,140
What have you found instead?
163
00:08:06,140 --> 00:08:07,970
STEPHANIE: It's not 50, but 20.
164
00:08:07,970 --> 00:08:08,900
DAVID MALAN: Oh, OK.
165
00:08:08,900 --> 00:08:10,280
So step one was a fail.
166
00:08:10,280 --> 00:08:11,870
So let's move on to step two.
167
00:08:11,870 --> 00:08:14,453
Inside of this loop, what are you going to do next?
168
00:08:14,453 --> 00:08:16,370
STEPHANIE: I'm going to move to the next door.
169
00:08:16,370 --> 00:08:17,037
DAVID MALAN: OK.
170
00:08:17,037 --> 00:08:20,790
171
00:08:20,790 --> 00:08:22,100
STEPHANIE: Almost.
172
00:08:22,100 --> 00:08:23,100
DAVID MALAN: OK, almost.
173
00:08:23,100 --> 00:08:23,790
Sort of.
174
00:08:23,790 --> 00:08:25,110
A 500 instead.
175
00:08:25,110 --> 00:08:26,280
Next locker?
176
00:08:26,280 --> 00:08:30,200
STEPHANIE: I would rather take that.
177
00:08:30,200 --> 00:08:30,700
No.
178
00:08:30,700 --> 00:08:33,840
179
00:08:33,840 --> 00:08:36,558
DAVID MALAN: OK, we're not telling the audience?
180
00:08:36,558 --> 00:08:38,260
STEPHANIE: It was a 10.
181
00:08:38,260 --> 00:08:39,789
DAVID MALAN: OK, so keep going.
182
00:08:39,789 --> 00:08:40,974
This is step three now.
183
00:08:40,974 --> 00:08:45,470
184
00:08:45,470 --> 00:08:46,310
STEPHANIE: Oh, man.
185
00:08:46,310 --> 00:08:49,850
186
00:08:49,850 --> 00:08:51,260
DAVID MALAN: Five, OK.
187
00:08:51,260 --> 00:08:52,670
A few more lockers to check.
188
00:08:52,670 --> 00:08:57,296
189
00:08:57,296 --> 00:08:58,790
STEPHANIE: A little sad, guys.
190
00:08:58,790 --> 00:09:02,527
191
00:09:02,527 --> 00:09:04,360
DAVID MALAN: All right, second-to-last step.
192
00:09:04,360 --> 00:09:07,710
193
00:09:07,710 --> 00:09:09,070
STEPHANIE: It's 1.
194
00:09:09,070 --> 00:09:10,022
Kind of close.
195
00:09:10,022 --> 00:09:10,980
DAVID MALAN: All right.
196
00:09:10,980 --> 00:09:12,780
And finally, the last step.
197
00:09:12,780 --> 00:09:15,780
Clearly you've been, perhaps, set up here.
198
00:09:15,780 --> 00:09:17,340
STEPHANIE: Let's go!
199
00:09:17,340 --> 00:09:19,920
DAVID MALAN: All right, so the number 50.
200
00:09:19,920 --> 00:09:23,500
201
00:09:23,500 --> 00:09:25,870
And Stephanie, if I may, let me ask you a question here.
202
00:09:25,870 --> 00:09:28,890
So on the screen, this is the pseudocode you just executed.
203
00:09:28,890 --> 00:09:31,800
Suppose, though, I had done what many of us
204
00:09:31,800 --> 00:09:34,770
have gotten to the habit of doing when you have an if condition.
205
00:09:34,770 --> 00:09:36,730
You often have an else branch as well.
206
00:09:36,730 --> 00:09:38,760
Suppose that I had done this now.
207
00:09:38,760 --> 00:09:41,680
And I'm marking it in red to be clear this is wrong.
208
00:09:41,680 --> 00:09:45,750
But what would have been bad about this code using an if and an else,
209
00:09:45,750 --> 00:09:47,040
might you say?
210
00:09:47,040 --> 00:09:48,360
Any instincts?
211
00:09:48,360 --> 00:09:55,620
212
00:09:55,620 --> 00:09:58,740
STEPHANIE: Then you would end up canceling
213
00:09:58,740 --> 00:10:00,210
the code before you found the 50.
214
00:10:00,210 --> 00:10:01,020
DAVID MALAN: Yeah, exactly.
215
00:10:01,020 --> 00:10:02,070
STEPHANIE: I mean, you'd just be eternally sad.
216
00:10:02,070 --> 00:10:02,460
DAVID MALAN: Indeed.
217
00:10:02,460 --> 00:10:05,127
When Stephanie had opened the first locker, she'd have found 20.
218
00:10:05,127 --> 00:10:06,630
20, of course, is not 50.
219
00:10:06,630 --> 00:10:07,838
She would have decreed false.
220
00:10:07,838 --> 00:10:10,547
But of course, she hadn't checked all of the rest of the lockers.
221
00:10:10,547 --> 00:10:13,440
So that would seem to be a key detail that, with this implementation
222
00:10:13,440 --> 00:10:16,440
of the pseudocode, we actually do go through-- as we did--
223
00:10:16,440 --> 00:10:18,810
and only return false not even with an else,
224
00:10:18,810 --> 00:10:23,070
but just at the end of the loop such that we only reach that line if we
225
00:10:23,070 --> 00:10:25,245
don't return true earlier than that.
226
00:10:25,245 --> 00:10:26,620
Well, let's go ahead and do this.
227
00:10:26,620 --> 00:10:27,360
Let me take the mic from you.
228
00:10:27,360 --> 00:10:28,930
If you'd like to take a seat next to Jackson?
229
00:10:28,930 --> 00:10:31,180
And Jackson, in just a moment, we'll have you come up.
230
00:10:31,180 --> 00:10:34,170
Carter, if you don't mind reorganizing the lockers for us.
231
00:10:34,170 --> 00:10:36,630
But in the meantime, let me point out how we might now
232
00:10:36,630 --> 00:10:38,010
translate that same idea to code.
233
00:10:38,010 --> 00:10:41,050
Pretty high level, pretty English-oriented with that pseudocode--
234
00:10:41,050 --> 00:10:44,910
but really now, as of last week, we have syntax via which Stephanie
235
00:10:44,910 --> 00:10:48,810
and, soon, Jackson could treat this locker, this set of lockers, as really
236
00:10:48,810 --> 00:10:51,250
indeed an array using bracket notation.
237
00:10:51,250 --> 00:10:54,480
So we can now get a little closer in our pseudocode to actual code.
238
00:10:54,480 --> 00:10:56,670
And the way a computer scientist, for instance,
239
00:10:56,670 --> 00:10:59,850
would translate fairly high level English pseudocode
240
00:10:59,850 --> 00:11:03,720
like this to something that's a little closer to C or any language
241
00:11:03,720 --> 00:11:06,600
that supports arrays would be a little more cryptically like this.
242
00:11:06,600 --> 00:11:09,060
But you'll see more of this syntax in the coming days.
243
00:11:09,060 --> 00:11:13,260
For i from 0 to n minus 1-- this is still pseudocode.
244
00:11:13,260 --> 00:11:15,840
But that's the English-like way of expressing what
245
00:11:15,840 --> 00:11:17,730
we've known come to know as a for loop.
246
00:11:17,730 --> 00:11:22,260
If 50 is behind doors bracket i-- so I'm assuming for the sake of discussion
247
00:11:22,260 --> 00:11:26,503
that doors, now, is the name of my variable, this array of seven doors.
248
00:11:26,503 --> 00:11:28,920
But then the rest of the logic, the rest of the pseudocode
249
00:11:28,920 --> 00:11:30,370
really is the same way.
250
00:11:30,370 --> 00:11:33,270
And so you'll find in time that programmers, computer scientists more
251
00:11:33,270 --> 00:11:36,900
generally, when you start expressing ideas, algorithms to someone else,
252
00:11:36,900 --> 00:11:40,170
instead of maybe operating at this level here,
253
00:11:40,170 --> 00:11:43,020
you now have a new vocabulary, really a new syntax
254
00:11:43,020 --> 00:11:44,910
that you can be a little more specific, not
255
00:11:44,910 --> 00:11:47,380
getting so into the weeds of writing actual C code,
256
00:11:47,380 --> 00:11:50,910
but at least now doing something that's a little closer to manipulating
257
00:11:50,910 --> 00:11:51,810
an array like this.
258
00:11:51,810 --> 00:11:55,140
So Jackson, would you like to stand on up?
259
00:11:55,140 --> 00:11:56,760
All right.
260
00:11:56,760 --> 00:11:57,360
Yes, yes.
261
00:11:57,360 --> 00:11:59,010
Support for Jackson here, too.
262
00:11:59,010 --> 00:12:00,780
Nice.
263
00:12:00,780 --> 00:12:04,470
And here now, I'm going to allow you an assumption that Stephanie did not have.
264
00:12:04,470 --> 00:12:06,660
Stephanie clearly was really doing her best
265
00:12:06,660 --> 00:12:10,050
searching from left to right using linear search, as we'll now call it.
266
00:12:10,050 --> 00:12:12,180
But they were pretty much in a random order, right?
267
00:12:12,180 --> 00:12:15,030
There was a 20 over there, there was 1 over there, and then a 50.
268
00:12:15,030 --> 00:12:19,110
So we deliberately jumbled things up and did not sort the numbers for her.
269
00:12:19,110 --> 00:12:22,530
But Carter kindly has just come up to give you a leg up, Jackson,
270
00:12:22,530 --> 00:12:24,510
by sorting the numbers in advance.
271
00:12:24,510 --> 00:12:27,630
And we'd like you this time, much like in week 0,
272
00:12:27,630 --> 00:12:30,060
to do something again and again, but this time
273
00:12:30,060 --> 00:12:32,250
using what we'll now call binary search.
274
00:12:32,250 --> 00:12:35,580
It's exactly the same algorithm conceptually as we did in Week 0.
275
00:12:35,580 --> 00:12:38,310
But if we translate it to the context of this array,
276
00:12:38,310 --> 00:12:40,450
we now might say something like this.
277
00:12:40,450 --> 00:12:42,960
The first step for Jackson might be to ask the question--
278
00:12:42,960 --> 00:12:46,110
if 50 is behind the middle door, where presumably he's
279
00:12:46,110 --> 00:12:48,570
done some mental math to figure out what the middle is,
280
00:12:48,570 --> 00:12:50,610
then he's going to just return true.
281
00:12:50,610 --> 00:12:53,070
And hopefully we'll get lucky and 50 will be right there.
282
00:12:53,070 --> 00:12:56,400
Of course, there's two other possibilities at least,
283
00:12:56,400 --> 00:12:58,290
which would be what?
284
00:12:58,290 --> 00:13:01,290
50 is, with respect to these doors?
285
00:13:01,290 --> 00:13:03,930
Yeah, so to the left or to the right, alternatively.
286
00:13:03,930 --> 00:13:07,722
So if 50 is less than the middle door, then presumably,
287
00:13:07,722 --> 00:13:09,180
Jackson's going to want to go left.
288
00:13:09,180 --> 00:13:11,310
Else, if 50 is greater than the middle door,
289
00:13:11,310 --> 00:13:13,380
he's going to want to go right, much like I
290
00:13:13,380 --> 00:13:16,170
did physically last week with the phone book,
291
00:13:16,170 --> 00:13:17,910
dividing and conquering left to right.
292
00:13:17,910 --> 00:13:20,020
But there's actually a fourth case.
293
00:13:20,020 --> 00:13:21,540
Let's put it on the board first.
294
00:13:21,540 --> 00:13:25,530
What else might happen here that Jackson should consider?
295
00:13:25,530 --> 00:13:26,170
Yeah.
296
00:13:26,170 --> 00:13:28,590
Oh, it's not there.
297
00:13:28,590 --> 00:13:32,930
So let me actually go back and amend my pseudocode here and just say Jackson,
298
00:13:32,930 --> 00:13:36,200
if we don't hand you any doors at all, or eventually,
299
00:13:36,200 --> 00:13:39,080
as he's dividing and conquering, if he's left with no more doors,
300
00:13:39,080 --> 00:13:42,380
we have to handle that situation so that the behavior is defined.
301
00:13:42,380 --> 00:13:44,210
All right, so with that said, Jackson, do
302
00:13:44,210 --> 00:13:46,127
you want to go ahead and find us the number 50
303
00:13:46,127 --> 00:13:48,650
and walk us through verbally what you're doing and finding?
304
00:13:48,650 --> 00:13:52,860
JACKSON: All right, so it looks like this one is the middle door.
305
00:13:52,860 --> 00:13:55,290
So I'm going to open it.
306
00:13:55,290 --> 00:13:57,030
But it's 20, not 50.
307
00:13:57,030 --> 00:13:59,622
DAVID MALAN: Aw.
308
00:13:59,622 --> 00:14:01,080
What's going through your head now?
309
00:14:01,080 --> 00:14:02,670
JACKSON: So now I'm looking--
310
00:14:02,670 --> 00:14:06,490
because 50 is higher than 20, I want to look to the right.
311
00:14:06,490 --> 00:14:07,440
DAVID MALAN: Good.
312
00:14:07,440 --> 00:14:10,270
JACKSON: And look for the new middle door, which would be here.
313
00:14:10,270 --> 00:14:11,700
DAVID MALAN: Nice.
314
00:14:11,700 --> 00:14:12,900
JACKSON: And it's 100--
315
00:14:12,900 --> 00:14:13,740
bad.
316
00:14:13,740 --> 00:14:16,560
But 50 is less than 100.
317
00:14:16,560 --> 00:14:20,520
So now we to look left, which would be here.
318
00:14:20,520 --> 00:14:21,240
And ta-da.
319
00:14:21,240 --> 00:14:21,990
DAVID MALAN: Nice.
320
00:14:21,990 --> 00:14:25,680
Very well done this time around, too.
321
00:14:25,680 --> 00:14:29,680
So thank you, first, to our volunteers here.
322
00:14:29,680 --> 00:14:32,970
And in fact, since you're a fan of Monopoly, as we're so informed,
323
00:14:32,970 --> 00:14:35,220
we have the Cambridge edition of Monopoly
324
00:14:35,220 --> 00:14:36,743
with all your Harvard favorites.
325
00:14:36,743 --> 00:14:37,410
JACKSON: No way.
326
00:14:37,410 --> 00:14:38,460
DAVID MALAN: Here you go.
327
00:14:38,460 --> 00:14:38,970
STEPHANIE: Thank you.
328
00:14:38,970 --> 00:14:40,095
JACKSON: Thank you so much.
329
00:14:40,095 --> 00:14:42,570
DAVID MALAN: Thank you to our volunteers for finding us 50.
330
00:14:42,570 --> 00:14:46,940
So-- that was more popular than we expected.
331
00:14:46,940 --> 00:14:50,470
So here, we can translate this one more time into something
332
00:14:50,470 --> 00:14:52,060
a little closer to code.
333
00:14:52,060 --> 00:14:54,730
And again, still pseudocode-- but here, now,
334
00:14:54,730 --> 00:14:57,460
might be another formulation of exactly what Jackson just did,
335
00:14:57,460 --> 00:14:59,380
just using the nomenclature now of arrays,
336
00:14:59,380 --> 00:15:01,570
where you can be a little more precise with your instructions
337
00:15:01,570 --> 00:15:04,640
and still leave it to someone else to translate this, finally, to code.
338
00:15:04,640 --> 00:15:06,640
But here we have same question at the beginning.
339
00:15:06,640 --> 00:15:08,650
If no doors left, return false.
340
00:15:08,650 --> 00:15:11,500
If 50 is behind doors bracket middle--
341
00:15:11,500 --> 00:15:13,810
so I'm assuming here, because this is pseudocode--
342
00:15:13,810 --> 00:15:16,810
that somewhere I've done the mental math or the actual math
343
00:15:16,810 --> 00:15:19,480
to figure out what the index of middle is.
344
00:15:19,480 --> 00:15:22,420
For instance, if these are seven doors in an array,
345
00:15:22,420 --> 00:15:27,310
this would be location 0, 1, 2, 3, 4, 5, 6.
346
00:15:27,310 --> 00:15:31,120
So somehow I've taken the total number of doors, 7,
347
00:15:31,120 --> 00:15:33,520
divided by 2 to find the middle.
348
00:15:33,520 --> 00:15:34,430
That's 3 and 1/2.
349
00:15:34,430 --> 00:15:35,680
We have to deal with rounding.
350
00:15:35,680 --> 00:15:38,920
But suffice it to say there's a well-defined formula for finding
351
00:15:38,920 --> 00:15:39,910
the middle index.
352
00:15:39,910 --> 00:15:43,280
Given the total number of lockers, divide by 2 and then round accordingly.
353
00:15:43,280 --> 00:15:45,550
So that's presumably what Jackson did just by counting
354
00:15:45,550 --> 00:15:48,790
or in his head to find us door number 3.
355
00:15:48,790 --> 00:15:52,210
Not the third door, the 4th door, but door bracket 3.
356
00:15:52,210 --> 00:15:56,200
So this is just saying if 50 is behind doors bracket middle, return true.
357
00:15:56,200 --> 00:15:57,200
That was not the case.
358
00:15:57,200 --> 00:15:58,960
He found a $20 bill instead.
359
00:15:58,960 --> 00:16:03,670
Else, if 50 is less than the doors bracket middle, go ahead--
360
00:16:03,670 --> 00:16:10,660
and now it gets interesting-- search doors 0 through doors middle minus 1.
361
00:16:10,660 --> 00:16:12,730
So it's getting a little more into the weeds now.
362
00:16:12,730 --> 00:16:16,900
But if middle is 3, this one here, what we want to now
363
00:16:16,900 --> 00:16:19,300
have Jackson search if 50 had been--
364
00:16:19,300 --> 00:16:22,360
if the number had been less, we want to start at bracket 0
365
00:16:22,360 --> 00:16:24,340
and go up through this one.
366
00:16:24,340 --> 00:16:26,380
And we deliberately subtract 1 because what's
367
00:16:26,380 --> 00:16:28,297
the point of looking in the same locker again?
368
00:16:28,297 --> 00:16:31,270
We might as well do 0 through middle minus 1.
369
00:16:31,270 --> 00:16:36,040
Else if 50 is greater than doors bracket middle, which it was,
370
00:16:36,040 --> 00:16:37,120
what did we then do?
371
00:16:37,120 --> 00:16:40,870
So Jackson intuitively searched for doors middle plus 1
372
00:16:40,870 --> 00:16:43,295
through doors n minus 1.
373
00:16:43,295 --> 00:16:46,420
And honestly, it gets a little annoying having the pluses and minuses here.
374
00:16:46,420 --> 00:16:47,780
But just think of what it means.
375
00:16:47,780 --> 00:16:49,090
This is the middle door.
376
00:16:49,090 --> 00:16:53,942
And Jackson then did proceed to search through doors middle plus 1
377
00:16:53,942 --> 00:16:56,150
because there's no point in searching this one again.
378
00:16:56,150 --> 00:17:00,130
And then the last element in any array of size n
379
00:17:00,130 --> 00:17:05,352
where n is just our go-to number for the size is always going to be n minus 1.
380
00:17:05,352 --> 00:17:06,310
It's not going to be n.
381
00:17:06,310 --> 00:17:10,839
It's going to be n minus 1 because we always start counting arrays at 0.
382
00:17:10,839 --> 00:17:14,200
So here then we have a translation into pseudocode that's a little closer
383
00:17:14,200 --> 00:17:16,430
to C of this exact same idea.
384
00:17:16,430 --> 00:17:18,490
And here, we come full circle to Week 0.
385
00:17:18,490 --> 00:17:22,240
In Week 0, it was pretty intuitive to imagine dividing and conquering
386
00:17:22,240 --> 00:17:23,420
a problem like this.
387
00:17:23,420 --> 00:17:26,770
But if you now think back to actual your iPhone, your Android phone,
388
00:17:26,770 --> 00:17:29,920
or the like, when you're doing autocomplete and searching the list,
389
00:17:29,920 --> 00:17:33,430
it's possible, if you don't have many friends or family or colleagues
390
00:17:33,430 --> 00:17:34,840
in the phone, you know what?
391
00:17:34,840 --> 00:17:39,070
Linear search, just checking every name for the person you're searching for,
392
00:17:39,070 --> 00:17:40,720
might be perfectly fine.
393
00:17:40,720 --> 00:17:43,810
But odds are your phone's being smarter than that, especially if you
394
00:17:43,810 --> 00:17:47,140
start to have dozens, hundreds, thousands of people in your contacts
395
00:17:47,140 --> 00:17:47,770
over the years.
396
00:17:47,770 --> 00:17:49,540
What would be better than linear search?
397
00:17:49,540 --> 00:17:51,340
Well, perhaps binary search.
398
00:17:51,340 --> 00:17:55,180
But, but, but-- there's an assumption, a requirement, which is what?
399
00:17:55,180 --> 00:18:00,340
Why was Jackson ultimately able to find the 50 in just three steps
400
00:18:00,340 --> 00:18:04,830
instead of a full seven, like Stephanie?
401
00:18:04,830 --> 00:18:06,570
Because the array was sorted.
402
00:18:06,570 --> 00:18:09,990
And so this is sort of a teaser for what we'll have to come back to later today.
403
00:18:09,990 --> 00:18:12,780
Well, how much effort did it take someone like Carter?
404
00:18:12,780 --> 00:18:15,240
How much effort does it take your phone to sort all
405
00:18:15,240 --> 00:18:17,040
of those names and numbers in advance?
406
00:18:17,040 --> 00:18:19,650
Because maybe it's not actually worth the amount of time.
407
00:18:19,650 --> 00:18:24,210
Now someone like Google probably somehow keeps the database of web pages sorted.
408
00:18:24,210 --> 00:18:28,110
You can imagine it being super slow if, when you type in cats or something else
409
00:18:28,110 --> 00:18:32,280
into google.com, if they searched linearly over their entire data set.
410
00:18:32,280 --> 00:18:35,430
Ideally, they're doing something a little smarter than that.
411
00:18:35,430 --> 00:18:38,820
So we'll formalize, now, exactly this kind of analysis.
412
00:18:38,820 --> 00:18:42,180
And it's not going to be so much mathy as it still will be intuitive.
413
00:18:42,180 --> 00:18:45,480
But we'll introduce you to some jargon, some terminology
414
00:18:45,480 --> 00:18:48,180
that most any programmer or computer scientists might use
415
00:18:48,180 --> 00:18:50,550
when analyzing their own algorithms.
416
00:18:50,550 --> 00:18:53,670
Let's formalize now what this kind of analysis is.
417
00:18:53,670 --> 00:18:56,910
So right now, I claim binary search better than linear search.
418
00:18:56,910 --> 00:18:59,100
But how much better and why, exactly?
419
00:18:59,100 --> 00:19:01,120
Well, it all comes back to this kind of graph.
420
00:19:01,120 --> 00:19:04,830
So this, recall, is how we analyzed the phone book back in Week 0.
421
00:19:04,830 --> 00:19:08,880
And recall that, indeed, we had these formulas, rough formulas that
422
00:19:08,880 --> 00:19:11,370
described the running time of those three algorithms--
423
00:19:11,370 --> 00:19:13,740
one page at a time, two pages at a time, and then
424
00:19:13,740 --> 00:19:15,720
tearing the thing again and again in half.
425
00:19:15,720 --> 00:19:19,830
And precisely, if you counted up the number of pages I was touching
426
00:19:19,830 --> 00:19:22,020
or the number of pages I was tearing, it's
427
00:19:22,020 --> 00:19:24,600
fair to say that the first algorithm, in the worst case,
428
00:19:24,600 --> 00:19:26,700
might have taken n total pages.
429
00:19:26,700 --> 00:19:29,010
It didn't because I was searching for John Harvard
430
00:19:29,010 --> 00:19:31,260
at the time, which is somewhat early in the alphabet.
431
00:19:31,260 --> 00:19:34,340
But if I were searching for someone with the last name of Z,
432
00:19:34,340 --> 00:19:36,840
I would have had to keep going and going, in the worst case,
433
00:19:36,840 --> 00:19:38,430
through all n pages.
434
00:19:38,430 --> 00:19:41,940
Not as bad for the second algorithm, and that's why we do n divided by 2.
435
00:19:41,940 --> 00:19:43,920
And even that's a bit of a white lie.
436
00:19:43,920 --> 00:19:48,270
It's probably n divided by 2 plus 1 in case I have to double back.
437
00:19:48,270 --> 00:19:50,405
But again, I'm sort of doing this more generally
438
00:19:50,405 --> 00:19:52,030
to capture the essence of these things.
439
00:19:52,030 --> 00:19:54,363
And then we really got into the weeds with like log base
440
00:19:54,363 --> 00:19:56,940
2 event for that third and final algorithm.
441
00:19:56,940 --> 00:20:00,690
And at the time, we claimed any time you're dividing something in half,
442
00:20:00,690 --> 00:20:04,200
in half, in half, odds are there's going to be some kind of logarithm involved.
443
00:20:04,200 --> 00:20:05,340
And we'll see that today.
444
00:20:05,340 --> 00:20:09,010
But today, we're going to actually start using computer science terminology.
445
00:20:09,010 --> 00:20:13,590
And we're going to formalize this imprecision, if you will.
446
00:20:13,590 --> 00:20:17,670
We are not going to care, generally, about exactly how many steps
447
00:20:17,670 --> 00:20:20,820
some algorithm takes because that's not going to be that enlightening,
448
00:20:20,820 --> 00:20:24,630
especially if maybe you have a faster computer tomorrow than you did today.
449
00:20:24,630 --> 00:20:27,510
It wouldn't really be fair to compare numbers too precisely.
450
00:20:27,510 --> 00:20:31,590
We really want to, with the wave of a hand, just get a sense of roughly
451
00:20:31,590 --> 00:20:33,930
how slow or how fast an algorithm is.
452
00:20:33,930 --> 00:20:36,000
So the notation here is deliberate.
453
00:20:36,000 --> 00:20:40,620
That is literally a capital O, often italicized, refer to as big O.
454
00:20:40,620 --> 00:20:43,920
And so the first algorithm is in big O of n.
455
00:20:43,920 --> 00:20:47,760
The second algorithm is in big O of n divided by 2.
456
00:20:47,760 --> 00:20:51,480
The third algorithm is in big O of log base 2 of n.
457
00:20:51,480 --> 00:20:54,690
But even that is kind of unnecessary detail.
458
00:20:54,690 --> 00:20:58,830
When using big O notation, you really don't care about, we'll see,
459
00:20:58,830 --> 00:21:01,230
the smaller order terms.
460
00:21:01,230 --> 00:21:04,500
We're not going to care about the divided by 2 because you know what?
461
00:21:04,500 --> 00:21:07,720
The shape of these algorithms are is almost the same.
462
00:21:07,720 --> 00:21:10,800
And really, the idea-- the algorithm itself is sort of fundamentally
463
00:21:10,800 --> 00:21:11,340
the same.
464
00:21:11,340 --> 00:21:13,620
Instead of one page at a time, I'm doing two.
465
00:21:13,620 --> 00:21:17,280
But if you throw millions of pages, billions of pages at me,
466
00:21:17,280 --> 00:21:20,460
those algorithms are really going to kind of perform the same as n gets
467
00:21:20,460 --> 00:21:22,085
really large, goes off toward infinity.
468
00:21:22,085 --> 00:21:23,585
And the same is true for logarithms.
469
00:21:23,585 --> 00:21:25,560
Even if you're a little rusty, it turns out
470
00:21:25,560 --> 00:21:29,560
that whether you do the math with log base 2, log base 3, log base 10,
471
00:21:29,560 --> 00:21:33,040
you can just multiply one by the other to really get the same formula.
472
00:21:33,040 --> 00:21:35,700
So this is only to say a computer scientist would generally
473
00:21:35,700 --> 00:21:39,270
say that the first two algorithms are on the order of n steps.
474
00:21:39,270 --> 00:21:42,690
The third algorithm is on the order of log n steps.
475
00:21:42,690 --> 00:21:46,350
And we don't really care precisely what we mean beyond that.
476
00:21:46,350 --> 00:21:49,770
And this big O notation, as we'll see-- and actually, let me zoom out.
477
00:21:49,770 --> 00:21:53,500
If you can imagine suddenly making the x-axis much longer--
478
00:21:53,500 --> 00:21:55,770
so more pages on the screen at once--
479
00:21:55,770 --> 00:21:58,320
it is indeed going to be the shapes of these curves
480
00:21:58,320 --> 00:22:01,480
that matter, because imagine in your mind's eye as you zoom out,
481
00:22:01,480 --> 00:22:04,950
zoom out, zoom out, zoom out, and as n gets much, much, much bigger
482
00:22:04,950 --> 00:22:07,470
on the x-axis, the red and the yellow line
483
00:22:07,470 --> 00:22:11,400
are essentially going to look the same once n is sufficiently large.
484
00:22:11,400 --> 00:22:14,378
But the green line is never going to look the same.
485
00:22:14,378 --> 00:22:16,420
It's going to be a fundamentally different shape.
486
00:22:16,420 --> 00:22:18,570
And so that's the intuition of big O, to get
487
00:22:18,570 --> 00:22:23,200
a sense of these rates of performance like this.
488
00:22:23,200 --> 00:22:25,410
So here, then, is big O. Here is, perhaps,
489
00:22:25,410 --> 00:22:29,160
a cheat sheet of the common formulas that a computer scientist, certainly
490
00:22:29,160 --> 00:22:32,490
in an introductory context, might use when analyzing algorithms.
491
00:22:32,490 --> 00:22:36,060
And let's consider for a moment which of our first two algorithms--
492
00:22:36,060 --> 00:22:39,040
linear search and binary search-- fall into these categories.
493
00:22:39,040 --> 00:22:44,318
So I've ordered them from slowest to fastest, so order of n squared.
494
00:22:44,318 --> 00:22:46,110
It's not something we've actually seen yet,
495
00:22:46,110 --> 00:22:48,392
but it tends to be slow because it's quadratic.
496
00:22:48,392 --> 00:22:49,350
You're doing n times n.
497
00:22:49,350 --> 00:22:51,090
That's got to add up to a lot of steps.
498
00:22:51,090 --> 00:22:53,190
Better today is going to be n log n.
499
00:22:53,190 --> 00:22:54,630
Even better is going to be n.
500
00:22:54,630 --> 00:22:56,190
Even better than that is log n.
501
00:22:56,190 --> 00:23:00,150
And best is so-called order of 1, like one step
502
00:23:00,150 --> 00:23:02,520
or maybe two steps, maybe even 1,000 steps,
503
00:23:02,520 --> 00:23:08,200
but a fixed, finite number of steps that never changes no matter how big n is.
504
00:23:08,200 --> 00:23:11,920
So given this chart, just to be clear, linear search--
505
00:23:11,920 --> 00:23:13,570
let's consider the worst case.
506
00:23:13,570 --> 00:23:18,040
In the worst case, how many steps did it take someone like Stephanie
507
00:23:18,040 --> 00:23:23,500
to find the solution to the problem, assuming not seven doors but n doors?
508
00:23:23,500 --> 00:23:25,160
Yeah?
509
00:23:25,160 --> 00:23:26,540
So on the order of n.
510
00:23:26,540 --> 00:23:28,280
And in this case, it's exactly n.
511
00:23:28,280 --> 00:23:32,660
But you know what, maybe it's arguably 2n because it took Stephanie
512
00:23:32,660 --> 00:23:33,530
a couple of steps.
513
00:23:33,530 --> 00:23:34,460
She had to lift the latch.
514
00:23:34,460 --> 00:23:35,360
She had to open the door.
515
00:23:35,360 --> 00:23:36,318
Maybe it's three steps.
516
00:23:36,318 --> 00:23:37,530
She had to show the money.
517
00:23:37,530 --> 00:23:39,170
So now it's 3n, 2n.
518
00:23:39,170 --> 00:23:41,990
But we don't really care about that level of precision.
519
00:23:41,990 --> 00:23:45,660
We really just care about the fundamental number of operations.
520
00:23:45,660 --> 00:23:47,540
So we'll say yes, on the order of n.
521
00:23:47,540 --> 00:23:51,320
So that might be an upper bound, we'll call this, for linear search.
522
00:23:51,320 --> 00:23:53,030
And how about binary search?
523
00:23:53,030 --> 00:23:57,140
In Jackson's case, or in general, me in Week 0, if there's n doors,
524
00:23:57,140 --> 00:24:02,910
how many steps did it take Jackson or me using binary search?
525
00:24:02,910 --> 00:24:04,860
In this case, it was literally three.
526
00:24:04,860 --> 00:24:07,200
But that's not a formula.
527
00:24:07,200 --> 00:24:09,690
Yeah so it's on the order of log n.
528
00:24:09,690 --> 00:24:11,970
And indeed, if there are seven doors, well, that's
529
00:24:11,970 --> 00:24:14,250
almost eight, if you just do a little bit of rounding.
530
00:24:14,250 --> 00:24:18,480
And indeed, if you take log base 2 of 8, that does actually give us 3.
531
00:24:18,480 --> 00:24:19,813
So the math actually checks out.
532
00:24:19,813 --> 00:24:22,272
And if you're not comfortable with logarithms, no big deal.
533
00:24:22,272 --> 00:24:23,670
Just think about it intuitively.
534
00:24:23,670 --> 00:24:27,010
Logarithm of base 2 is just dividing something again and again.
535
00:24:27,010 --> 00:24:31,110
So on this chart, when we consider big O, which to be clear,
536
00:24:31,110 --> 00:24:35,100
allows you to describe the order of an algorithm's running time--
537
00:24:35,100 --> 00:24:38,610
like the magnitude of it-- but it also describes, more specifically,
538
00:24:38,610 --> 00:24:40,090
an upper bound.
539
00:24:40,090 --> 00:24:42,990
So in the worst case, for instance, these
540
00:24:42,990 --> 00:24:45,480
are pretty good measures of how good--
541
00:24:45,480 --> 00:24:47,310
or rather, of how bad--
542
00:24:47,310 --> 00:24:49,270
linear search and binary search might be.
543
00:24:49,270 --> 00:24:49,770
Why?
544
00:24:49,770 --> 00:24:52,122
Well, suppose you're searching a 1,000-page phonebook
545
00:24:52,122 --> 00:24:55,080
and the person's name starts with Z. The algorithm is still going to be
546
00:24:55,080 --> 00:24:56,320
on the order of n steps.
547
00:24:56,320 --> 00:24:56,820
Why?
548
00:24:56,820 --> 00:25:01,080
Because it might take you as many as all n steps to find it.
549
00:25:01,080 --> 00:25:05,250
Now that's not necessarily going to be the case in practice.
550
00:25:05,250 --> 00:25:08,370
If I use big O as an upper bound, well, it
551
00:25:08,370 --> 00:25:11,160
would be nice if there's a corresponding lower bound,
552
00:25:11,160 --> 00:25:16,180
especially if you want to consider not just worst cases, but maybe best cases.
553
00:25:16,180 --> 00:25:18,040
So what might we use here?
554
00:25:18,040 --> 00:25:20,200
Well, this is a capital Greek omega symbol.
555
00:25:20,200 --> 00:25:23,550
So omega is the symbol that a computer scientist uses generally
556
00:25:23,550 --> 00:25:25,920
to describe a lower bound on an algorithm,
557
00:25:25,920 --> 00:25:28,710
often in the context of best case, though not necessarily.
558
00:25:28,710 --> 00:25:32,490
So a lower bound means how few steps might an algorithm take?
559
00:25:32,490 --> 00:25:33,990
And here, too, same formulas.
560
00:25:33,990 --> 00:25:36,270
And we'll fill in these blanks over time.
561
00:25:36,270 --> 00:25:40,140
Some algorithms might always take a minimum of n squared steps,
562
00:25:40,140 --> 00:25:41,370
or on the order of n steps.
563
00:25:41,370 --> 00:25:45,660
Some might only take n log n, or n, or log n, or 1.
564
00:25:45,660 --> 00:25:49,170
So something like a linear search--
565
00:25:49,170 --> 00:25:51,240
when Stephanie started with linear search,
566
00:25:51,240 --> 00:25:52,980
she didn't get lucky this time on stage.
567
00:25:52,980 --> 00:25:57,720
But what if she had, and the first door she opened were 50?
568
00:25:57,720 --> 00:26:01,260
How might you then describe the lower bound on linear
569
00:26:01,260 --> 00:26:08,290
search in this so-called best case, using this list of possible answers?
570
00:26:08,290 --> 00:26:09,530
Yeah?
571
00:26:09,530 --> 00:26:11,060
Yeah, so omega of 1.
572
00:26:11,060 --> 00:26:14,900
So in the best case, the lower bound on how many steps
573
00:26:14,900 --> 00:26:18,990
it might take linear search to find something might just be one step.
574
00:26:18,990 --> 00:26:19,490
Why?
575
00:26:19,490 --> 00:26:21,500
Because maybe if Stephanie had gotten lucky
576
00:26:21,500 --> 00:26:25,960
and we had prefilled these lockers with the numbers in some other order
577
00:26:25,960 --> 00:26:28,460
such that she might have opened the first locker, and voila,
578
00:26:28,460 --> 00:26:31,280
the number 50 could have been there, so a lower bound arguably
579
00:26:31,280 --> 00:26:34,610
could indeed be omega of 1 for linear search.
580
00:26:34,610 --> 00:26:35,990
And how about now for Jackson?
581
00:26:35,990 --> 00:26:37,440
He used binary search.
582
00:26:37,440 --> 00:26:40,940
So he dived right into the middle of the problem.
583
00:26:40,940 --> 00:26:45,020
But what would be a lower bound on binary search using this logic?
584
00:26:45,020 --> 00:26:45,980
Yeah?
585
00:26:45,980 --> 00:26:47,460
Yeah, so again, omega of 1.
586
00:26:47,460 --> 00:26:47,960
Why?
587
00:26:47,960 --> 00:26:49,580
Because maybe he just gets lucky.
588
00:26:49,580 --> 00:26:53,300
And indeed, right in the middle of the lockers could have been the number 50.
589
00:26:53,300 --> 00:26:54,060
It wasn't.
590
00:26:54,060 --> 00:26:57,770
And so more germane in Jackson's actual practice
591
00:26:57,770 --> 00:27:00,050
would have been the big O discussion.
592
00:27:00,050 --> 00:27:03,350
But big O and omega, upper bound and lower bound,
593
00:27:03,350 --> 00:27:05,450
just allow a computer scientist to kind of wrestle
594
00:27:05,450 --> 00:27:06,980
with what could happen maybe in the worst
595
00:27:06,980 --> 00:27:08,670
case, what can happen in the best case?
596
00:27:08,670 --> 00:27:12,267
And you can even get even more precise like the average case or the like.
597
00:27:12,267 --> 00:27:15,350
And this is, indeed, what engineers might do at a whiteboard in a company,
598
00:27:15,350 --> 00:27:17,600
in a university when designing an algorithm
599
00:27:17,600 --> 00:27:21,380
and trying to make arguments as to why their algorithm is better than someone
600
00:27:21,380 --> 00:27:24,080
else's, by way of these kinds of analyses.
601
00:27:24,080 --> 00:27:29,180
And just so you've seen it, it turns out that if some algorithm happens
602
00:27:29,180 --> 00:27:32,990
to have an identical upper bound and lower bound,
603
00:27:32,990 --> 00:27:35,880
you can actually use a capital Greek theta as well.
604
00:27:35,880 --> 00:27:38,210
And this is the last of the Greek symbols today.
605
00:27:38,210 --> 00:27:41,900
But a Greek theta indicates a coincidence of both upper
606
00:27:41,900 --> 00:27:43,130
bound and lower bound.
607
00:27:43,130 --> 00:27:44,702
That is, they are one and the same.
608
00:27:44,702 --> 00:27:47,660
That was not the case for our discussion a second ago of linear search,
609
00:27:47,660 --> 00:27:49,220
not the case for binary search.
610
00:27:49,220 --> 00:27:52,970
But you could use the same kinds of formulas
611
00:27:52,970 --> 00:27:56,970
if it turns out that your upper bound and lower bound are the same.
612
00:27:56,970 --> 00:28:00,440
So for instance, if I were to count everyone literally in this room-- one,
613
00:28:00,440 --> 00:28:03,870
two, three, four, five, six and so forth--
614
00:28:03,870 --> 00:28:09,080
you could actually say that counting in that way is in theta of n
615
00:28:09,080 --> 00:28:11,150
because in the best case, it's going to take
616
00:28:11,150 --> 00:28:13,468
me n points, people in the audience.
617
00:28:13,468 --> 00:28:15,260
In the worst case, it's going to take me n.
618
00:28:15,260 --> 00:28:18,380
It's always going to take me n steps if I want to count everyone in the room.
619
00:28:18,380 --> 00:28:20,930
You can't really do better than that unless you skip people.
620
00:28:20,930 --> 00:28:23,510
So that would be an example of the cuff of something
621
00:28:23,510 --> 00:28:26,150
where theta is instead germane.
622
00:28:26,150 --> 00:28:31,530
Are any questions now on big O, on omega, or theta,
623
00:28:31,530 --> 00:28:34,040
which are now just more formal tools in the toolkit
624
00:28:34,040 --> 00:28:38,730
for talking about the design of our algorithms?
625
00:28:38,730 --> 00:28:42,050
Any questions?
626
00:28:42,050 --> 00:28:42,860
No?
627
00:28:42,860 --> 00:28:44,720
Seeing none.
628
00:28:44,720 --> 00:28:45,560
Oh, is this-- yes?
629
00:28:45,560 --> 00:28:46,840
No?
630
00:28:46,840 --> 00:28:48,250
OK, so we're good.
631
00:28:48,250 --> 00:28:52,000
So let's go ahead and translate this, perhaps, to some actual code.
632
00:28:52,000 --> 00:28:53,900
Let me go over to VS Code here.
633
00:28:53,900 --> 00:28:57,100
And let's see if we can't now translate some of these ideas
634
00:28:57,100 --> 00:29:00,280
to some actual code, not so much using new syntax yet.
635
00:29:00,280 --> 00:29:03,320
We're going to still operate in this world of arrays like last week.
636
00:29:03,320 --> 00:29:05,237
So let me go ahead and create a program called
637
00:29:05,237 --> 00:29:09,280
search.c by executing code space search.c in my terminal.
638
00:29:09,280 --> 00:29:11,800
And then up here, let's go ahead and include our usual,
639
00:29:11,800 --> 00:29:14,740
so include cs50.h so I can get some input.
640
00:29:14,740 --> 00:29:18,370
Include standard io.h so I can print some output.
641
00:29:18,370 --> 00:29:21,910
We'll do int main void, the meaning of which we did start
642
00:29:21,910 --> 00:29:23,140
to tease apart last week.
643
00:29:23,140 --> 00:29:26,650
The fact that it's void again today just means no command line arguments.
644
00:29:26,650 --> 00:29:28,580
And let me go ahead and do this.
645
00:29:28,580 --> 00:29:33,130
Let me go ahead and declare, just for discussion's sake, a static array,
646
00:29:33,130 --> 00:29:34,840
like an array that never changes.
647
00:29:34,840 --> 00:29:39,280
And the syntax for this is going to be give me an array called numbers
648
00:29:39,280 --> 00:29:41,290
using the square bracket notation.
649
00:29:41,290 --> 00:29:43,390
And I'm going to immediately initialize it
650
00:29:43,390 --> 00:29:49,300
to 20, 500, 10, 5, 100, 1, and 50, reminiscent of those same denominations
651
00:29:49,300 --> 00:29:50,060
as before.
652
00:29:50,060 --> 00:29:54,080
So this is a slightly new syntax that we've perhaps not seen.
653
00:29:54,080 --> 00:29:57,130
And the curly braces here, which are different from for loops
654
00:29:57,130 --> 00:30:00,820
and while loops and functions, just tell the compiler please give me
655
00:30:00,820 --> 00:30:05,380
an array of whatever size this is containing those numbers left to right.
656
00:30:05,380 --> 00:30:10,220
I could alternatively use last week's syntax of saying something like this.
657
00:30:10,220 --> 00:30:13,090
Let's see, 1, 2, 3, 4, 5, 6, 7 denominations.
658
00:30:13,090 --> 00:30:15,250
I could alternatively do this.
659
00:30:15,250 --> 00:30:21,910
And then I could say numbers bracket 0 equals
660
00:30:21,910 --> 00:30:25,570
20, numbers bracket 1 equals 500.
661
00:30:25,570 --> 00:30:27,572
And I could do this five more times.
662
00:30:27,572 --> 00:30:28,780
That's just a little tedious.
663
00:30:28,780 --> 00:30:30,580
If you know the numbers in advance, you don't
664
00:30:30,580 --> 00:30:32,530
have to tell the compiler how many there are.
665
00:30:32,530 --> 00:30:37,420
You can just let it figure it out that your numbers will be 10, 500, 10, 5,
666
00:30:37,420 --> 00:30:39,430
100, 1, and 50.
667
00:30:39,430 --> 00:30:42,550
So this is how you statically define an array.
668
00:30:42,550 --> 00:30:45,380
All right, let me just go ahead and ask the user now for a number.
669
00:30:45,380 --> 00:30:48,920
We'll call it n by using get_int and prompting them for a number--
670
00:30:48,920 --> 00:30:50,020
so nothing new there.
671
00:30:50,020 --> 00:30:53,680
And now let me go ahead and implement linear search.
672
00:30:53,680 --> 00:30:57,520
And the pseudocode we had for this before used some array-like notation.
673
00:30:57,520 --> 00:30:59,620
Let me go ahead, then, and start similarly.
674
00:30:59,620 --> 00:31:04,270
For int i-- and you almost always start counting at i by convention.
675
00:31:04,270 --> 00:31:06,490
So that's perhaps a good starting point.
676
00:31:06,490 --> 00:31:09,790
I'm going to do this so long as i is less than 7.
677
00:31:09,790 --> 00:31:12,138
Not the best design to hard code the 7, but this is just
678
00:31:12,138 --> 00:31:13,930
for demonstration's sake for now, because I
679
00:31:13,930 --> 00:31:15,560
know how many numbers I put in there.
680
00:31:15,560 --> 00:31:16,992
And then I'm going to i++.
681
00:31:16,992 --> 00:31:19,450
So now I have the beginnings of a loop that will just allow
682
00:31:19,450 --> 00:31:21,550
me to iterate over the entire array.
683
00:31:21,550 --> 00:31:22,760
And let me ask this.
684
00:31:22,760 --> 00:31:27,370
If the current number at location i equals
685
00:31:27,370 --> 00:31:30,650
equals n, which is the number the human typed in,
686
00:31:30,650 --> 00:31:33,400
then let's go ahead and do something simple like printf,
687
00:31:33,400 --> 00:31:36,190
quote unquote, found, backslash n.
688
00:31:36,190 --> 00:31:40,240
And then per our discussion last week, to indicate that this is successful,
689
00:31:40,240 --> 00:31:42,610
I'm going to return 0 if I found it.
690
00:31:42,610 --> 00:31:45,940
And if I don't find it, I'm just going to go down here and, by default,
691
00:31:45,940 --> 00:31:48,640
say not found, backslash n.
692
00:31:48,640 --> 00:31:52,610
And just for convention-- whoops, just for good measure, per convention,
693
00:31:52,610 --> 00:31:55,630
I'll return 1 or, really, any value other than 0.
694
00:31:55,630 --> 00:31:57,130
0, recall, means success.
695
00:31:57,130 --> 00:32:00,670
And any other integer tends to mean error of some sort,
696
00:32:00,670 --> 00:32:02,690
irrespective of the number I'm looking for.
697
00:32:02,690 --> 00:32:06,670
So just to revisit, the only thing that's new here is the syntax.
698
00:32:06,670 --> 00:32:09,980
We're creating an array of seven numbers, these numbers.
699
00:32:09,980 --> 00:32:13,540
And then after that we have really highlighted here
700
00:32:13,540 --> 00:32:16,090
an implementation of linear search.
701
00:32:16,090 --> 00:32:19,390
I mean this is the C version, I daresay, of what Stephanie did on the board,
702
00:32:19,390 --> 00:32:22,540
whereas now the array is called numbers instead of doors.
703
00:32:22,540 --> 00:32:25,460
But I think it's pretty much the same.
704
00:32:25,460 --> 00:32:30,380
Let me go ahead and open my terminal window and run make search.
705
00:32:30,380 --> 00:32:32,518
Seems to compile, ./search.
706
00:32:32,518 --> 00:32:34,310
And let's go ahead and search for a number.
707
00:32:34,310 --> 00:32:36,230
We'll start with what we did before, 50.
708
00:32:36,230 --> 00:32:37,340
And it's found.
709
00:32:37,340 --> 00:32:39,770
Let's go ahead and run it again, ./search.
710
00:32:39,770 --> 00:32:42,500
Let's search for maybe 20 at the beginning.
711
00:32:42,500 --> 00:32:43,670
That one, too, is found.
712
00:32:43,670 --> 00:32:47,150
Let's run it one more time searching for like 1,000,
713
00:32:47,150 --> 00:32:50,720
which is not among the denominations.
714
00:32:50,720 --> 00:32:52,980
And that one, indeed, is not found.
715
00:32:52,980 --> 00:32:56,210
So we've taken an idea from Week 0, now formalized in Week 3,
716
00:32:56,210 --> 00:32:59,300
and just translated it now to code.
717
00:32:59,300 --> 00:33:05,500
Questions on this implementation of linear search?
718
00:33:05,500 --> 00:33:07,570
Linear search.
719
00:33:07,570 --> 00:33:08,680
Nothing.
720
00:33:08,680 --> 00:33:11,810
Oh, so successful so far today.
721
00:33:11,810 --> 00:33:15,820
So let's see if we can't maybe make this a little more interesting
722
00:33:15,820 --> 00:33:19,270
and see if we can't trip over a detail that's going to be important in C.
723
00:33:19,270 --> 00:33:23,330
And instead of doing numbers, let me go ahead and do this.
724
00:33:23,330 --> 00:33:25,030
We'll stay on theme with Monopoly.
725
00:33:25,030 --> 00:33:26,830
And I went down the rabbit hole of reading the Wikipedia
726
00:33:26,830 --> 00:33:27,730
article on Monopoly.
727
00:33:27,730 --> 00:33:32,170
And the original pieces or tokens that came with Monopoly-- and it turns out
728
00:33:32,170 --> 00:33:33,740
we can represent those with strings.
729
00:33:33,740 --> 00:33:37,690
So I'm going to create an array called strings, plural, of whatever
730
00:33:37,690 --> 00:33:39,170
size I defined here.
731
00:33:39,170 --> 00:33:42,340
And the very first monopoly pieces back in the day
732
00:33:42,340 --> 00:33:44,920
were a battleship that you could play with,
733
00:33:44,920 --> 00:33:53,320
a boot, a cannon, an iron, a thimble, and a top hat, some of which
734
00:33:53,320 --> 00:33:54,700
you might from the game nowadays.
735
00:33:54,700 --> 00:33:56,410
Turns out they've been changing these--
736
00:33:56,410 --> 00:33:57,890
had no idea-- over the years.
737
00:33:57,890 --> 00:34:00,170
So here is, now, an array of strings.
738
00:34:00,170 --> 00:34:03,940
Let me go ahead and prompt the user now not for an integer anymore.
739
00:34:03,940 --> 00:34:07,970
I want to now search for one of these strings still using linear search.
740
00:34:07,970 --> 00:34:11,020
So let me create a string s, set it equal to get_string,
741
00:34:11,020 --> 00:34:13,840
prompt the user for a string to search for.
742
00:34:13,840 --> 00:34:19,540
And then I think my code here is almost the same, except for one detail.
743
00:34:19,540 --> 00:34:21,850
I now have an array called strings.
744
00:34:21,850 --> 00:34:24,040
I now have a variable called s.
745
00:34:24,040 --> 00:34:27,790
But it turns out, for reasons we'll explore in more detail next week,
746
00:34:27,790 --> 00:34:31,030
this line of code is not going to work.
747
00:34:31,030 --> 00:34:34,900
And it turns out the reason has to do with what we discussed
748
00:34:34,900 --> 00:34:36,880
last week of what a string really is.
749
00:34:36,880 --> 00:34:39,355
And what is a string, again?
750
00:34:39,355 --> 00:34:41,000
A string is an array.
751
00:34:41,000 --> 00:34:44,929
And it turns out, though, that equals equals is not
752
00:34:44,929 --> 00:34:49,760
going to generously compare all of the characters in an array for you
753
00:34:49,760 --> 00:34:51,949
just because you use equal equals.
754
00:34:51,949 --> 00:34:54,650
It turns out it's not going to compare every letter.
755
00:34:54,650 --> 00:35:00,260
And so thankfully, there is, in the string library
756
00:35:00,260 --> 00:35:03,058
that we introduced last week, a solution to this problem.
757
00:35:03,058 --> 00:35:05,850
The reason for the problem, we'll explore in more detail next week.
758
00:35:05,850 --> 00:35:09,860
But for now, just know that when you want to compare strings in C--
759
00:35:09,860 --> 00:35:13,220
especially if you've come into the class knowing a bit of Java or Python or some
760
00:35:13,220 --> 00:35:15,680
other language-- you cannot use equals equals.
761
00:35:15,680 --> 00:35:18,500
Even though you could in Scratch, you cannot in C.
762
00:35:18,500 --> 00:35:21,620
So what I have to actually do here is this.
763
00:35:21,620 --> 00:35:26,330
I have to ask the question, does the return value of a function called str
764
00:35:26,330 --> 00:35:33,160
compare, or strcomp, equal 0 when passed in the current string
765
00:35:33,160 --> 00:35:36,050
and that's user input?
766
00:35:36,050 --> 00:35:39,790
So if you read the documentation for this function called str compare,
767
00:35:39,790 --> 00:35:44,500
you'll see that it takes two strings as input, first one and second one.
768
00:35:44,500 --> 00:35:47,140
It then-- someone decades ago wrote the code
769
00:35:47,140 --> 00:35:49,060
that probably uses a for loop or a while loop
770
00:35:49,060 --> 00:35:51,910
to compare every character in each of those strings.
771
00:35:51,910 --> 00:35:56,290
And it turns out it returns 0 if they are, in fact, equal.
772
00:35:56,290 --> 00:36:00,640
Turns out, too, it will return a positive number or a negative number
773
00:36:00,640 --> 00:36:02,440
in other situations.
774
00:36:02,440 --> 00:36:04,990
Any intuition for why it might actually be
775
00:36:04,990 --> 00:36:10,810
useful to have a function that allows you to check if two strings are equal?
776
00:36:10,810 --> 00:36:13,480
If they're not equal, what else might be interesting to know
777
00:36:13,480 --> 00:36:14,830
when comparing two strings?
778
00:36:14,830 --> 00:36:18,474
779
00:36:18,474 --> 00:36:19,391
If certain values are?
780
00:36:19,391 --> 00:36:23,347
STUDENT: [INAUDIBLE]
781
00:36:23,347 --> 00:36:24,430
DAVID MALAN: OK, possibly.
782
00:36:24,430 --> 00:36:26,950
Maybe you want to just how similar they are.
783
00:36:26,950 --> 00:36:28,810
And that's indeed an algorithm unto itself.
784
00:36:28,810 --> 00:36:31,410
But str compare is a little simpler than that.
785
00:36:31,410 --> 00:36:33,040
STUDENT: [INAUDIBLE]
786
00:36:33,040 --> 00:36:35,850
787
00:36:35,850 --> 00:36:39,100
DAVID MALAN: Exactly, if you're trying to alphabetize a whole list of strings,
788
00:36:39,100 --> 00:36:41,950
just like your phone probably is for your contacts or address book.
789
00:36:41,950 --> 00:36:44,320
It turns out that str compare will actually
790
00:36:44,320 --> 00:36:48,220
return a positive number or a negative number or a 0
791
00:36:48,220 --> 00:36:52,120
based on whether, maybe it comes alphabetically first or later,
792
00:36:52,120 --> 00:36:53,800
or in fact, equal.
793
00:36:53,800 --> 00:36:55,130
So that can be a useful thing.
794
00:36:55,130 --> 00:36:57,838
And that's just a teaser for a lower level explanation
795
00:36:57,838 --> 00:36:58,880
that we'll see next week.
796
00:36:58,880 --> 00:37:01,750
So now, let me cross my fingers and see if I got this right.
797
00:37:01,750 --> 00:37:05,410
Let me go ahead and do make search.
798
00:37:05,410 --> 00:37:08,590
Did compile, albeit slowly.
799
00:37:08,590 --> 00:37:11,920
Dot slash search, and let's search for something like the thimble.
800
00:37:11,920 --> 00:37:14,048
And we see that that's, indeed, found.
801
00:37:14,048 --> 00:37:16,090
Otherwise, let's search for something that I know
802
00:37:16,090 --> 00:37:19,060
isn't there, like a race car, which was there when I grew up.
803
00:37:19,060 --> 00:37:23,227
But huh, segmentation fault, core dumped.
804
00:37:23,227 --> 00:37:25,810
And actually, some of you have tripped over this error before.
805
00:37:25,810 --> 00:37:27,220
Anyone want to admit seeing this?
806
00:37:27,220 --> 00:37:29,350
So yeah, not something we've talked about,
807
00:37:29,350 --> 00:37:32,170
and honestly, not something I intended just now.
808
00:37:32,170 --> 00:37:34,450
But that too, we'll see next week.
809
00:37:34,450 --> 00:37:39,920
Any intuition for why my program just broke.
810
00:37:39,920 --> 00:37:41,900
I didn't really change the logic.
811
00:37:41,900 --> 00:37:43,550
It's still linear search.
812
00:37:43,550 --> 00:37:46,280
Let me hide the terminal so you can see all of the code at once.
813
00:37:46,280 --> 00:37:49,850
The only thing I did was switched from integers to strings.
814
00:37:49,850 --> 00:37:52,310
And I switched to str compare here.
815
00:37:52,310 --> 00:37:54,205
But segmentation fault happened.
816
00:37:54,205 --> 00:37:57,080
And the teaser is that that somehow relates to the computer's memory.
817
00:37:57,080 --> 00:37:57,996
Yeah.
818
00:37:57,996 --> 00:38:00,690
STUDENT: [INAUDIBLE]
819
00:38:00,690 --> 00:38:01,470
820
00:38:01,470 --> 00:38:03,670
DAVID MALAN: Yeah, and this is subtle, but spot on.
821
00:38:03,670 --> 00:38:06,510
So one, two, three, four, five, six elements
822
00:38:06,510 --> 00:38:11,880
total in this array, versus the seven numbers of monopoly denominations
823
00:38:11,880 --> 00:38:12,810
that we had earlier.
824
00:38:12,810 --> 00:38:13,888
And this is where, see?
825
00:38:13,888 --> 00:38:15,930
Sort of case in point, this came back to bite me.
826
00:38:15,930 --> 00:38:19,860
The fact that I hardcoded this value as opposed to maybe separating it out
827
00:38:19,860 --> 00:38:22,260
as a constant or declaring it higher up, kind of
828
00:38:22,260 --> 00:38:26,610
bit me here, because now, I'm iterating over an array of size 6.
829
00:38:26,610 --> 00:38:29,820
But clearly, I'm going one step too far, because I'm literally
830
00:38:29,820 --> 00:38:32,250
going to iterate seven times, not six.
831
00:38:32,250 --> 00:38:35,580
So it's as though I'm looking at memory that's over here.
832
00:38:35,580 --> 00:38:37,530
And indeed, next week, we'll focus on memory.
833
00:38:37,530 --> 00:38:38,860
And that's just a bad thing.
834
00:38:38,860 --> 00:38:42,270
So odds are, not even seeing your code from this past week, if any of you
835
00:38:42,270 --> 00:38:46,020
have had segmentation faults, odds are, you touched memory
836
00:38:46,020 --> 00:38:47,280
that you shouldn't have.
837
00:38:47,280 --> 00:38:49,290
You maybe looped too many times.
838
00:38:49,290 --> 00:38:52,770
You might have used a negative number to get into your array.
839
00:38:52,770 --> 00:38:55,220
In general, you touched memory that you shouldn't have.
840
00:38:55,220 --> 00:38:57,720
And you touched a segment of memory that you shouldn't have.
841
00:38:57,720 --> 00:39:00,060
The fix, though, at least in my case, is simple.
842
00:39:00,060 --> 00:39:01,300
Just don't do that.
843
00:39:01,300 --> 00:39:03,210
So let me go ahead and recompile this.
844
00:39:03,210 --> 00:39:06,870
Make search dot slash search.
845
00:39:06,870 --> 00:39:10,320
And I'll search again for race car, Enter.
846
00:39:10,320 --> 00:39:11,850
And now it does not crash.
847
00:39:11,850 --> 00:39:13,630
But it does tell me it's not found.
848
00:39:13,630 --> 00:39:17,040
So subtle, but something you might yourself have tripped over already.
849
00:39:17,040 --> 00:39:23,190
Questions then, on what I just did, intentionally or otherwise.
850
00:39:23,190 --> 00:39:24,423
Yeah, in front.
851
00:39:24,423 --> 00:39:29,060
STUDENT: One thing is the program still works if you do return--
852
00:39:29,060 --> 00:39:31,275
if you don't do return 0, return 1.
853
00:39:31,275 --> 00:39:33,220
So what is the purpose of doing [INAUDIBLE]??
854
00:39:33,220 --> 00:39:34,720
DAVID MALAN: A really good question.
855
00:39:34,720 --> 00:39:38,920
So the program will still work even if I don't return 0 or return 1.
856
00:39:38,920 --> 00:39:43,120
In fact, let me go ahead and do that and just hide my terminal window
857
00:39:43,120 --> 00:39:43,930
for a second.
858
00:39:43,930 --> 00:39:46,210
Let's get rid of the return here.
859
00:39:46,210 --> 00:39:48,040
Let's get rid of the return here.
860
00:39:48,040 --> 00:39:50,810
However, watch what happens here.
861
00:39:50,810 --> 00:39:53,710
Let me go ahead and recompile this, make search.
862
00:39:53,710 --> 00:39:55,610
Let me scroll up in my code here.
863
00:39:55,610 --> 00:39:57,560
Let me go ahead and do dot slash search.
864
00:39:57,560 --> 00:40:00,280
And let me go ahead and search for the first thing in the list,
865
00:40:00,280 --> 00:40:02,800
battle ship, so I know that this should be found.
866
00:40:02,800 --> 00:40:04,690
I hit Enter.
867
00:40:04,690 --> 00:40:05,858
Huh, interesting.
868
00:40:05,858 --> 00:40:07,150
So it's saying found not found.
869
00:40:07,150 --> 00:40:11,496
But do you see why, logically, in this case?
870
00:40:11,496 --> 00:40:12,980
STUDENT: Is the loop still running?
871
00:40:12,980 --> 00:40:13,910
DAVID MALAN: Exactly.
872
00:40:13,910 --> 00:40:15,302
So the loop is still running.
873
00:40:15,302 --> 00:40:17,010
So there's a couple of solutions to this.
874
00:40:17,010 --> 00:40:21,080
I could, for instance, somehow break out of the code here.
875
00:40:21,080 --> 00:40:24,200
But that's going to still result in line 18 executing.
876
00:40:24,200 --> 00:40:26,600
I could then instead just return here.
877
00:40:26,600 --> 00:40:29,390
I don't strictly need to return 1 down at the bottom.
878
00:40:29,390 --> 00:40:31,580
But I made this claim last week that it tends
879
00:40:31,580 --> 00:40:35,600
to be helpful as your programs get more sophisticated, to at least signify,
880
00:40:35,600 --> 00:40:39,300
just like a real world programmer, error codes when something goes wrong.
881
00:40:39,300 --> 00:40:44,090
So returning 0 in main is the easiest way to signify my code is done.
882
00:40:44,090 --> 00:40:46,340
I'm ready to exit successfully, that's it.
883
00:40:46,340 --> 00:40:48,988
But down here, I could absolutely still return 0,
884
00:40:48,988 --> 00:40:50,280
because that's not a huge deal.
885
00:40:50,280 --> 00:40:53,870
It's not really an error that deserves annoying the user with some kind of pop
886
00:40:53,870 --> 00:40:55,200
up that something went wrong.
887
00:40:55,200 --> 00:40:57,830
But return 1 is just a lower level way of signaling,
888
00:40:57,830 --> 00:41:00,330
eh, it didn't really find what I was looking for.
889
00:41:00,330 --> 00:41:03,510
And remember from last week, you can see this as follows.
890
00:41:03,510 --> 00:41:08,060
If I recompile this again, now that I've reverted those changes, so make search.
891
00:41:08,060 --> 00:41:13,100
And if I do a dot slash search and search for battle ship,
892
00:41:13,100 --> 00:41:16,700
which is indeed found, recall I can execute this magical command,
893
00:41:16,700 --> 00:41:19,790
echo dollar sign question mark, which you're not going to often execute.
894
00:41:19,790 --> 00:41:22,790
But it shows you what main returned.
895
00:41:22,790 --> 00:41:27,770
If I run search again and search for race car, which is not found,
896
00:41:27,770 --> 00:41:30,530
I see not found, but I can also run this command again
897
00:41:30,530 --> 00:41:32,150
and see that, oh, it returned 1.
898
00:41:32,150 --> 00:41:35,300
So now if you fast forward a few months, a few years, when you're actually
899
00:41:35,300 --> 00:41:37,850
writing code in a company or for larger projects,
900
00:41:37,850 --> 00:41:40,100
you might want to be automating software.
901
00:41:40,100 --> 00:41:43,100
You might not want the human to necessarily be running it manually.
902
00:41:43,100 --> 00:41:47,240
You might want code to be automated by some nightly process
903
00:41:47,240 --> 00:41:48,360
or something like that.
904
00:41:48,360 --> 00:41:53,030
Using these exit codes, can a program determine yes or no
905
00:41:53,030 --> 00:41:55,910
that other code succeeded or failed.
906
00:41:55,910 --> 00:42:01,850
Other questions on linear search in this way.
907
00:42:01,850 --> 00:42:02,350
No?
908
00:42:02,350 --> 00:42:07,570
All right, well, let's translate this to one other feature of C
909
00:42:07,570 --> 00:42:11,590
here by incorporating these two ideas now into one other program.
910
00:42:11,590 --> 00:42:16,605
So I'm going to create a phone book in C by doing code space phonebook dot C.
911
00:42:16,605 --> 00:42:18,730
And let's combine some of these ideas and implement
912
00:42:18,730 --> 00:42:21,700
this notion of searching a phonebook for an actual name
913
00:42:21,700 --> 00:42:23,030
and getting back a number.
914
00:42:23,030 --> 00:42:26,500
So I'm going to go ahead and quickly include some of the same things, cs50.h
915
00:42:26,500 --> 00:42:28,060
so we can get input.
916
00:42:28,060 --> 00:42:30,860
standard io dot h so we can print output.
917
00:42:30,860 --> 00:42:33,310
And I'm going to preemptively include string.h
918
00:42:33,310 --> 00:42:35,320
in case we need that one as well.
919
00:42:35,320 --> 00:42:39,010
int main void, no need for command line arguments today.
920
00:42:39,010 --> 00:42:42,650
And let me give myself, now, an array of names for this phone book.
921
00:42:42,650 --> 00:42:45,040
So string names equals.
922
00:42:45,040 --> 00:42:47,620
And then in curly braces, how about Carter
923
00:42:47,620 --> 00:42:50,840
will be one person in the phone book, and David, myself, will be the other.
924
00:42:50,840 --> 00:42:53,465
So we'll keep it short so we don't have to type too many names.
925
00:42:53,465 --> 00:42:55,840
But this is a phone book with two people thus far.
926
00:42:55,840 --> 00:42:59,628
Suppose, now, we want to also store Carter's phone number in mind.
927
00:42:59,628 --> 00:43:01,420
So it's not just saying found or not found.
928
00:43:01,420 --> 00:43:05,320
It's literally looking up our phone numbers like a proper phone book.
929
00:43:05,320 --> 00:43:09,440
Well, at the moment, there's really no way to do this.
930
00:43:09,440 --> 00:43:15,460
I could do something hackish like I could put a number like 617-495-1000
931
00:43:15,460 --> 00:43:16,510
after Carter.
932
00:43:16,510 --> 00:43:22,460
I could maybe do something like 949-468-2750 after me.
933
00:43:22,460 --> 00:43:25,300
But now you're kind of doing the whole apples and oranges thing.
934
00:43:25,300 --> 00:43:26,470
Now, it's not strings.
935
00:43:26,470 --> 00:43:28,420
It's a string int, string int.
936
00:43:28,420 --> 00:43:31,240
All right, so maybe I could just make all of these strings.
937
00:43:31,240 --> 00:43:34,600
But now it's just a conceptual mixing of apples and oranges.
938
00:43:34,600 --> 00:43:36,425
Like yes, that's an array of four strings.
939
00:43:36,425 --> 00:43:39,550
But now you're on the honor system to know that the first string is a name,
940
00:43:39,550 --> 00:43:42,160
the second string is a number, the third string is--
941
00:43:42,160 --> 00:43:43,100
you can do it.
942
00:43:43,100 --> 00:43:45,110
But it's a bit of a hack, so to speak.
943
00:43:45,110 --> 00:43:47,300
So what might be cleaner than this?
944
00:43:47,300 --> 00:43:51,070
Instead of combining our phone numbers into the same array as our names,
945
00:43:51,070 --> 00:43:55,480
what else might we do that's perhaps a little better?
946
00:43:55,480 --> 00:43:56,440
Say it little louder.
947
00:43:56,440 --> 00:43:58,960
948
00:43:58,960 --> 00:44:01,197
A 2D array, possibly something we could do.
949
00:44:01,197 --> 00:44:04,030
I'm going to keep it even simpler now, because we haven't used those
950
00:44:04,030 --> 00:44:07,823
by name, even though that is, we saw last week, technically what argv is.
951
00:44:07,823 --> 00:44:10,240
What else could I do if I want to store names and numbers?
952
00:44:10,240 --> 00:44:11,147
Yeah.
953
00:44:11,147 --> 00:44:12,220
STUDENT: [INAUDIBLE]
954
00:44:12,220 --> 00:44:13,690
DAVID MALAN: Yeah, let me go with this suggestion.
955
00:44:13,690 --> 00:44:14,607
It's a little simpler.
956
00:44:14,607 --> 00:44:17,440
Rather than complicate things in literally different dimensions,
957
00:44:17,440 --> 00:44:18,970
let me go ahead and do string.
958
00:44:18,970 --> 00:44:21,730
Well, I could do int numbers.
959
00:44:21,730 --> 00:44:22,690
But you know what?
960
00:44:22,690 --> 00:44:27,700
So that we can support punctuation like dashes or even parentheses or country
961
00:44:27,700 --> 00:44:29,200
codes, I'm going to do this instead.
962
00:44:29,200 --> 00:44:33,760
I'm going to do string numbers so that I can represent Carter's number as quote
963
00:44:33,760 --> 00:44:39,040
unquote plus 1 for the US, 617-495-1000, complete with hyphens,
964
00:44:39,040 --> 00:44:40,390
as is US convention.
965
00:44:40,390 --> 00:44:47,930
And then for mine I'll go ahead and do +1-949-468-2750 semicolon.
966
00:44:47,930 --> 00:44:51,610
And now down below, let's actually enable the user to search this phone
967
00:44:51,610 --> 00:44:53,860
book, just like in week 0 we did.
968
00:44:53,860 --> 00:44:55,960
String name equals get string.
969
00:44:55,960 --> 00:44:58,960
And let's ask the user for a name, presumably David or Carter
970
00:44:58,960 --> 00:44:59,990
or someone else.
971
00:44:59,990 --> 00:45:01,850
And now let's re-implement linear search.
972
00:45:01,850 --> 00:45:05,920
So 4, int i get 0. i is less than 2.
973
00:45:05,920 --> 00:45:07,510
And do as I say, not as I do.
974
00:45:07,510 --> 00:45:09,700
I think we should beware this hard coding,
975
00:45:09,700 --> 00:45:12,010
but we'll keep it simple for now.
976
00:45:12,010 --> 00:45:13,220
i++.
977
00:45:13,220 --> 00:45:16,390
And then in this for loop, I think we have all of the ingredients
978
00:45:16,390 --> 00:45:17,150
to solve this.
979
00:45:17,150 --> 00:45:23,440
So if the return value of str compare of all of the names bracket
980
00:45:23,440 --> 00:45:28,810
i comparing against the name that the human typed in, if all of that equals
981
00:45:28,810 --> 00:45:33,380
equals 0, that is, all of the characters in those two strings are equal,
982
00:45:33,380 --> 00:45:36,770
then I think we can go ahead and say found, just like last time.
983
00:45:36,770 --> 00:45:37,520
But you know what?
984
00:45:37,520 --> 00:45:40,130
Let's actually print Carter's or my phone number.
985
00:45:40,130 --> 00:45:44,770
So found percent s, and we'll plug in numbers, bracket i.
986
00:45:44,770 --> 00:45:47,800
And then just for consistency, I'll return 0 here.
987
00:45:47,800 --> 00:45:52,210
And down here, how about I'll say something like printf not
988
00:45:52,210 --> 00:45:53,600
found, just to be clear.
989
00:45:53,600 --> 00:45:56,240
And then I'll return 1 as well.
990
00:45:56,240 --> 00:45:58,120
So just to recap, here's all of the code.
991
00:45:58,120 --> 00:46:01,610
It's almost the same as before, except now it's useful.
992
00:46:01,610 --> 00:46:03,460
I'm not just saying found or not found.
993
00:46:03,460 --> 00:46:07,180
I found a number in monopoly, or I found a piece in monopoly.
994
00:46:07,180 --> 00:46:09,880
I'm looking up in one array, one of the strings.
995
00:46:09,880 --> 00:46:12,730
And then I'm printing from the other array, the answer.
996
00:46:12,730 --> 00:46:19,480
So let me go ahead here and run the compiler, make phone book, Enter.
997
00:46:19,480 --> 00:46:21,070
OK, that's promising, no errors.
998
00:46:21,070 --> 00:46:22,720
Dot slash phonebook now.
999
00:46:22,720 --> 00:46:26,350
And let's search, for instance, Carter Enter.
1000
00:46:26,350 --> 00:46:28,060
All right, so we found Carter's number.
1001
00:46:28,060 --> 00:46:29,393
All right, let me do that again.
1002
00:46:29,393 --> 00:46:30,960
Phone book, let's search for David.
1003
00:46:30,960 --> 00:46:32,960
All right, we seem to have found David's number.
1004
00:46:32,960 --> 00:46:34,502
All right, let's do it one last time.
1005
00:46:34,502 --> 00:46:35,410
Phone book, Enter.
1006
00:46:35,410 --> 00:46:37,360
And now we'll search for John Harvard.
1007
00:46:37,360 --> 00:46:40,060
Enter, not found.
1008
00:46:40,060 --> 00:46:45,520
All right, so I daresay, albeit with minimal testing, this code is correct.
1009
00:46:45,520 --> 00:46:48,190
Would anyone now like to critique the design?
1010
00:46:48,190 --> 00:46:51,910
Does something rub you the wrong way, perhaps, about this approach here?
1011
00:46:51,910 --> 00:46:55,120
1012
00:46:55,120 --> 00:46:58,180
And as always, think about how, if the program maybe
1013
00:46:58,180 --> 00:47:01,510
gets longer, more complicated, how decisions like this might unfold.
1014
00:47:01,510 --> 00:47:02,448
Yeah.
1015
00:47:02,448 --> 00:47:04,400
STUDENT: If i is less than 2.
1016
00:47:04,400 --> 00:47:07,820
DAVID MALAN: OK, so if i is less than 2, so technically,
1017
00:47:07,820 --> 00:47:10,080
if I change the number of people in this phone book,
1018
00:47:10,080 --> 00:47:11,330
I'm going to have to update i.
1019
00:47:11,330 --> 00:47:13,290
And we've already seen that I get myself into trouble.
1020
00:47:13,290 --> 00:47:14,165
So that's bad design.
1021
00:47:14,165 --> 00:47:15,005
Good.
1022
00:47:15,005 --> 00:47:17,795
STUDENT: Say you add someone's name to the phonebook,
1023
00:47:17,795 --> 00:47:20,710
but you don't have the corresponding number.
1024
00:47:20,710 --> 00:47:23,380
So then when you go to pull their number,
1025
00:47:23,380 --> 00:47:24,730
it [INAUDIBLE] someone's number.
1026
00:47:24,730 --> 00:47:25,480
DAVID MALAN: Yeah.
1027
00:47:25,480 --> 00:47:28,180
So again, I'm sort of trusting myself not to screw up.
1028
00:47:28,180 --> 00:47:30,850
If I add John or anyone else to the first array
1029
00:47:30,850 --> 00:47:34,180
but I forget to add their number to the second array,
1030
00:47:34,180 --> 00:47:36,640
eventually things are going to drift and be inconsistent.
1031
00:47:36,640 --> 00:47:39,010
And then code will be incorrect at that point.
1032
00:47:39,010 --> 00:47:43,420
So sort of a poor design setting me up for future failure, if you will.
1033
00:47:43,420 --> 00:47:44,860
Other thoughts?
1034
00:47:44,860 --> 00:47:45,460
Yeah.
1035
00:47:45,460 --> 00:47:49,816
STUDENT: [INAUDIBLE] so if you were to switch the order of the numbers
1036
00:47:49,816 --> 00:47:52,848
but not the main [INAUDIBLE]
1037
00:47:52,848 --> 00:47:54,140
DAVID MALAN: Yeah, really good.
1038
00:47:54,140 --> 00:47:55,550
We're assuming the same order.
1039
00:47:55,550 --> 00:47:59,452
From left to right, the names go, and from left to right, the numbers go.
1040
00:47:59,452 --> 00:48:01,160
But that's kind of just the honor system.
1041
00:48:01,160 --> 00:48:03,440
Like, there's literally nothing in code preventing me
1042
00:48:03,440 --> 00:48:07,047
from reversing the order for whatever reason, or maybe sorting the names.
1043
00:48:07,047 --> 00:48:10,130
Like, they're sorted now, and maybe that's deliberate, but maybe it's not.
1044
00:48:10,130 --> 00:48:12,920
So this honor system here, too, is just not good.
1045
00:48:12,920 --> 00:48:17,180
I could put a comment in here to remind myself, note to self,
1046
00:48:17,180 --> 00:48:19,490
always update arrays the same way.
1047
00:48:19,490 --> 00:48:21,600
But like, something's going to happen eventually,
1048
00:48:21,600 --> 00:48:26,090
especially when we have not two, but three, but 30, 300 names and numbers.
1049
00:48:26,090 --> 00:48:29,670
It would be nice to keep all of the related data together.
1050
00:48:29,670 --> 00:48:32,810
And so in fact, the one new feature of C we'll introduce today
1051
00:48:32,810 --> 00:48:37,970
is one that actually allows us to implement our very own data structures.
1052
00:48:37,970 --> 00:48:41,870
You can think of arrays as a very lightweight data
1053
00:48:41,870 --> 00:48:44,870
structure, in that it allows you to cluster related data back
1054
00:48:44,870 --> 00:48:45,930
to back to back to back.
1055
00:48:45,930 --> 00:48:48,170
And this is how strings are implemented.
1056
00:48:48,170 --> 00:48:51,560
They are a data structure effectively implemented with an array.
1057
00:48:51,560 --> 00:48:54,290
But with C and with other languages, it turns out
1058
00:48:54,290 --> 00:48:56,600
you can invent your own data types, whether they're
1059
00:48:56,600 --> 00:48:59,870
one dimensional, two dimensional even, or beyond.
1060
00:48:59,870 --> 00:49:05,960
And with C, can you specifically create your own types
1061
00:49:05,960 --> 00:49:07,200
that have their own names?
1062
00:49:07,200 --> 00:49:10,670
So for instance, wouldn't it have been nice if C came with,
1063
00:49:10,670 --> 00:49:16,380
not just char and int and floats and long and others.
1064
00:49:16,380 --> 00:49:19,970
Wouldn't it be nice if C came with a data type called person?
1065
00:49:19,970 --> 00:49:22,790
And ideally, a person would have a name and a number.
1066
00:49:22,790 --> 00:49:24,860
Now, that's a little naive and unrealistic.
1067
00:49:24,860 --> 00:49:28,460
Like, why would they define a person to have just those two fields.
1068
00:49:28,460 --> 00:49:30,950
Certainly, people could have disagreed what a person is.
1069
00:49:30,950 --> 00:49:32,300
So they leave it to us.
1070
00:49:32,300 --> 00:49:35,900
The authors of C gave us all of these primitives, ints and floats and strings
1071
00:49:35,900 --> 00:49:36,810
and so forth.
1072
00:49:36,810 --> 00:49:39,810
But it's up to us now to use those in a more interesting way
1073
00:49:39,810 --> 00:49:44,940
so that we can create an array of person variables, if you will,
1074
00:49:44,940 --> 00:49:48,150
inside of an array called people, just to pluralize it here.
1075
00:49:48,150 --> 00:49:49,740
So how are we going to do this?
1076
00:49:49,740 --> 00:49:54,140
Well, for now, let's just stipulate that a person in the world
1077
00:49:54,140 --> 00:49:57,260
will have a name and a number that we could argue all day long what else
1078
00:49:57,260 --> 00:49:58,010
a person should have.
1079
00:49:58,010 --> 00:49:58,677
And that's fine.
1080
00:49:58,677 --> 00:50:01,790
You can invent your own person eventually.
1081
00:50:01,790 --> 00:50:04,610
At the moment, I'm using just two variables
1082
00:50:04,610 --> 00:50:06,500
to define a person's name and number.
1083
00:50:06,500 --> 00:50:10,490
But wouldn't it be nice to encapsulate, that is, combine these two data
1084
00:50:10,490 --> 00:50:14,660
types, into a new and improved data type called person.
1085
00:50:14,660 --> 00:50:17,360
And the syntax for that is going to be this.
1086
00:50:17,360 --> 00:50:18,800
So it's a bit of a mouthful.
1087
00:50:18,800 --> 00:50:21,960
But you can, perhaps, infer what some of this is doing here.
1088
00:50:21,960 --> 00:50:24,500
So it turns out C has a keyword called typedef.
1089
00:50:24,500 --> 00:50:28,310
As the name kind of suggests, this allows you to define your own type.
1090
00:50:28,310 --> 00:50:31,550
Struct is an indication that it's a structure.
1091
00:50:31,550 --> 00:50:35,060
It's like a structure that has multiple values inside of it
1092
00:50:35,060 --> 00:50:36,710
that you are trying to define.
1093
00:50:36,710 --> 00:50:39,440
And then at the very bottom here outside of the curly braces,
1094
00:50:39,440 --> 00:50:42,270
is the name of the type that you want to create.
1095
00:50:42,270 --> 00:50:45,790
So you don't have discretion over using typedef or struct
1096
00:50:45,790 --> 00:50:46,790
in this particular case.
1097
00:50:46,790 --> 00:50:48,665
But you can name the thing whatever you want.
1098
00:50:48,665 --> 00:50:52,590
And you can put anything in the structure that you want as well.
1099
00:50:52,590 --> 00:50:56,510
And as soon as the semicolon is executed at the bottom of the code,
1100
00:50:56,510 --> 00:50:59,180
every line thereafter can now have access
1101
00:50:59,180 --> 00:51:05,760
to a person data type, whether as a single variable, or as an entire array.
1102
00:51:05,760 --> 00:51:10,260
So if I want to build on this then, let me go ahead and do this.
1103
00:51:10,260 --> 00:51:12,230
Let me go back to my C code here.
1104
00:51:12,230 --> 00:51:17,610
And I'm going to go ahead and change just a couple of things.
1105
00:51:17,610 --> 00:51:19,110
Let's go ahead and do this.
1106
00:51:19,110 --> 00:51:23,240
I'm going to go ahead and, first, get rid of those two hardcoded arrays.
1107
00:51:23,240 --> 00:51:27,180
And let me go ahead and, at the top of my file,
1108
00:51:27,180 --> 00:51:30,180
invent this type, so typedef struct.
1109
00:51:30,180 --> 00:51:34,470
Inside of it will be a string name and then a string number.
1110
00:51:34,470 --> 00:51:36,780
And then the name of the structure will be person.
1111
00:51:36,780 --> 00:51:40,025
And best practice would have me define it at the very top of my file
1112
00:51:40,025 --> 00:51:42,150
so that any of my functions, in fact, could use it,
1113
00:51:42,150 --> 00:51:44,530
even though I just have main in this case.
1114
00:51:44,530 --> 00:51:47,100
Now, if I wanted, I could do this.
1115
00:51:47,100 --> 00:51:50,370
Person P1 and person P2.
1116
00:51:50,370 --> 00:51:53,040
But we know from last week, that already is bad design.
1117
00:51:53,040 --> 00:51:57,400
If you want to have multiple instances of the same type of variable,
1118
00:51:57,400 --> 00:52:00,044
we should probably use what instead?
1119
00:52:00,044 --> 00:52:01,046
STUDENT: [INAUDIBLE]
1120
00:52:01,046 --> 00:52:01,796
DAVID MALAN: And--
1121
00:52:01,796 --> 00:52:02,470
STUDENT: An array.
1122
00:52:02,470 --> 00:52:03,637
DAVID MALAN: Yeah, an array.
1123
00:52:03,637 --> 00:52:05,230
So let me not even go down that road.
1124
00:52:05,230 --> 00:52:06,700
Let me instead just do this.
1125
00:52:06,700 --> 00:52:09,727
Person will be the type of the array.
1126
00:52:09,727 --> 00:52:10,810
But I'm going to call it--
1127
00:52:10,810 --> 00:52:11,980
I could call it persons.
1128
00:52:11,980 --> 00:52:13,720
But in English, we typically say people.
1129
00:52:13,720 --> 00:52:15,190
So I'll call the array people.
1130
00:52:15,190 --> 00:52:18,110
And I want two people to exist in this array,
1131
00:52:18,110 --> 00:52:20,920
though I could certainly change that number to be anything I want.
1132
00:52:20,920 --> 00:52:24,820
How, now, do you put a name inside of a person
1133
00:52:24,820 --> 00:52:27,190
and then put the number inside of that same person?
1134
00:52:27,190 --> 00:52:28,990
Well, slightly new syntax today.
1135
00:52:28,990 --> 00:52:30,520
I'm going to go ahead and say this.
1136
00:52:30,520 --> 00:52:34,420
People bracket 0 just gives me the first person in the array.
1137
00:52:34,420 --> 00:52:35,570
That's not new.
1138
00:52:35,570 --> 00:52:40,840
But if you want to go inside of that person in memory, you use a dot.
1139
00:52:40,840 --> 00:52:44,870
And then you just specify the name of the attribute therein.
1140
00:52:44,870 --> 00:52:47,410
So if I want to set the first person's name to Carter,
1141
00:52:47,410 --> 00:52:49,480
I just use that so-called dot notation.
1142
00:52:49,480 --> 00:52:52,780
And then if I want to set Carter's number using dot notation,
1143
00:52:52,780 --> 00:52:56,680
I would do this, +1-617-495-1000.
1144
00:52:56,680 --> 00:52:58,880
And then if I want to do the same for myself,
1145
00:52:58,880 --> 00:53:03,730
I would now do people bracket 1 dot name equals quote unquote David.
1146
00:53:03,730 --> 00:53:08,440
And then people bracket 1 still dot number equals quote unquote
1147
00:53:08,440 --> 00:53:13,030
+1-949-468-2750.
1148
00:53:13,030 --> 00:53:15,970
And now, at the bottom of my file, I think
1149
00:53:15,970 --> 00:53:18,610
my logic can pretty much stay the same.
1150
00:53:18,610 --> 00:53:22,180
I can still, on this line here, prompt the user
1151
00:53:22,180 --> 00:53:24,370
for the name of the person they want to look up.
1152
00:53:24,370 --> 00:53:26,620
For now, even though I admit it's not the best design,
1153
00:53:26,620 --> 00:53:28,495
I'm just doing this for demonstration's sake,
1154
00:53:28,495 --> 00:53:31,360
I'm going to leave the two there, because I know I have two people.
1155
00:53:31,360 --> 00:53:34,100
But down here, this is going to have to change.
1156
00:53:34,100 --> 00:53:37,000
I don't want to compare names bracket i anymore.
1157
00:53:37,000 --> 00:53:42,190
What do I want to type here as the first argument to str compare?
1158
00:53:42,190 --> 00:53:43,900
What do I want to do here?
1159
00:53:43,900 --> 00:53:44,960
Yeah.
1160
00:53:44,960 --> 00:53:46,800
STUDENT: People i dot name.
1161
00:53:46,800 --> 00:53:49,140
DAVID MALAN: So people i dot name, yeah.
1162
00:53:49,140 --> 00:53:52,938
So I want to go into the people array at the ith location,
1163
00:53:52,938 --> 00:53:54,480
because that's what my loop is doing.
1164
00:53:54,480 --> 00:53:55,890
It's updating i again and again.
1165
00:53:55,890 --> 00:53:58,087
And then look at name, and that's good.
1166
00:53:58,087 --> 00:53:59,670
I think now I need to change this too.
1167
00:53:59,670 --> 00:54:01,890
What do I want to print if the person is found?
1168
00:54:01,890 --> 00:54:02,445
Someone else?
1169
00:54:02,445 --> 00:54:05,070
1170
00:54:05,070 --> 00:54:08,850
What do I want to print here, if I found the person's name?
1171
00:54:08,850 --> 00:54:09,360
Yeah.
1172
00:54:09,360 --> 00:54:10,890
STUDENT: [INAUDIBLE]
1173
00:54:10,890 --> 00:54:12,390
DAVID MALAN: Say it a little louder.
1174
00:54:12,390 --> 00:54:13,795
STUDENT: People i dot number.
1175
00:54:13,795 --> 00:54:14,670
DAVID MALAN: Perfect.
1176
00:54:14,670 --> 00:54:17,460
So people bracket i dot number, if indeed I
1177
00:54:17,460 --> 00:54:20,310
want to print the corresponding number to this person.
1178
00:54:20,310 --> 00:54:22,930
And then I think the rest of my code can stay the same.
1179
00:54:22,930 --> 00:54:27,150
So let me go ahead and rerun make phone book to recompile this version.
1180
00:54:27,150 --> 00:54:28,170
So far so good.
1181
00:54:28,170 --> 00:54:29,400
Dot slash phone book.
1182
00:54:29,400 --> 00:54:31,598
Let's go ahead and type in Carter's name, found.
1183
00:54:31,598 --> 00:54:33,390
All right, let's go ahead and run it again.
1184
00:54:33,390 --> 00:54:35,273
David's name, found.
1185
00:54:35,273 --> 00:54:36,940
Let's go ahead and run it one more time.
1186
00:54:36,940 --> 00:54:40,260
Type in John Harvard, for instance, not found, in this case.
1187
00:54:40,260 --> 00:54:43,710
So fundamentally, the code isn't all that different.
1188
00:54:43,710 --> 00:54:46,090
Linear search is still behaving the same way.
1189
00:54:46,090 --> 00:54:48,690
And I admit, this is kind of ugly looking.
1190
00:54:48,690 --> 00:54:52,350
We've kind of made a two line solution five lines of code now.
1191
00:54:52,350 --> 00:54:56,640
But if we fast forward a week or two when we start saving information
1192
00:54:56,640 --> 00:55:01,500
to files, we'll introduce you to files like csv files,
1193
00:55:01,500 --> 00:55:03,892
comma separated values, or spreadsheet files, which
1194
00:55:03,892 --> 00:55:06,600
you've surely opened on your Mac or PC at some point in the past.
1195
00:55:06,600 --> 00:55:09,840
Suffice it to say we'll soon learn techniques for storing information,
1196
00:55:09,840 --> 00:55:11,790
like names and numbers, in files.
1197
00:55:11,790 --> 00:55:13,680
And at that point, we're not going to do any
1198
00:55:13,680 --> 00:55:15,990
of this hackish sort of hard coding of the number 2
1199
00:55:15,990 --> 00:55:19,080
and manually typing my name and Carter's name and number into our program.
1200
00:55:19,080 --> 00:55:21,750
We'll read the information dynamically from a file.
1201
00:55:21,750 --> 00:55:25,180
And in a few weeks, we'll read it dynamically from a database instead.
1202
00:55:25,180 --> 00:55:30,240
But this is, for now, just syntactically how we can create an array of size 2
1203
00:55:30,240 --> 00:55:32,190
containing one person each.
1204
00:55:32,190 --> 00:55:34,643
We can update the name and number of the first person,
1205
00:55:34,643 --> 00:55:36,810
update the name and the number of the second person,
1206
00:55:36,810 --> 00:55:40,500
and then later search across those names and print out
1207
00:55:40,500 --> 00:55:41,610
the corresponding numbers.
1208
00:55:41,610 --> 00:55:44,220
And in this sense, this is a better design.
1209
00:55:44,220 --> 00:55:44,730
Why?
1210
00:55:44,730 --> 00:55:48,690
Because my person data type encapsulates,
1211
00:55:48,690 --> 00:55:53,400
now, everything that it means to be a person, at least in this narrow world.
1212
00:55:53,400 --> 00:55:57,580
And if I want to add something to the notion of a person, for instance,
1213
00:55:57,580 --> 00:55:59,640
I could go up to my type def, and tomorrow,
1214
00:55:59,640 --> 00:56:03,743
add an address to every person and start reading that in as well.
1215
00:56:03,743 --> 00:56:05,160
And now it's not the honor system.
1216
00:56:05,160 --> 00:56:10,260
It's not a names array, a numbers array, an addresses array, and everything else
1217
00:56:10,260 --> 00:56:12,210
you might imagine related to a person.
1218
00:56:12,210 --> 00:56:17,223
It's all encapsulated, which is a term of art inside of the same type.
1219
00:56:17,223 --> 00:56:19,890
Reminiscent, if some of you have programmed before, of something
1220
00:56:19,890 --> 00:56:21,660
called object-oriented programming.
1221
00:56:21,660 --> 00:56:23,190
But we're not there yet.
1222
00:56:23,190 --> 00:56:24,690
C is not that.
1223
00:56:24,690 --> 00:56:29,280
Questions on this use of struct or this new syntax,
1224
00:56:29,280 --> 00:56:35,037
the dot operator being really the juicy part here.
1225
00:56:35,037 --> 00:56:35,620
Any questions?
1226
00:56:35,620 --> 00:56:36,522
Yeah.
1227
00:56:36,522 --> 00:56:39,414
STUDENT: [INAUDIBLE]
1228
00:56:39,414 --> 00:56:42,800
1229
00:56:42,800 --> 00:56:44,420
DAVID MALAN: On what line number?
1230
00:56:44,420 --> 00:56:46,063
STUDENT: 16.
1231
00:56:46,063 --> 00:56:46,730
DAVID MALAN: 16?
1232
00:56:46,730 --> 00:56:51,230
So yes, so syntactically, we introduced the square brackets last week.
1233
00:56:51,230 --> 00:56:55,310
So doing people bracket 0 just means go to the first person in the array.
1234
00:56:55,310 --> 00:56:58,400
That was like when Stephanie literally opened this door.
1235
00:56:58,400 --> 00:56:59,990
That's doors bracket 0.
1236
00:56:59,990 --> 00:57:02,330
But this is, of course, people bracket 0 instead.
1237
00:57:02,330 --> 00:57:04,580
Today, the dot is a new piece of syntax.
1238
00:57:04,580 --> 00:57:11,000
It means go inside of that person in memory and look at the name they're in
1239
00:57:11,000 --> 00:57:13,297
and set it equal to Carter and do the same for number.
1240
00:57:13,297 --> 00:57:13,880
So that's all.
1241
00:57:13,880 --> 00:57:16,340
It's like, open the locker door, go inside of it,
1242
00:57:16,340 --> 00:57:18,410
and check or set the name and the number.
1243
00:57:18,410 --> 00:57:19,040
Yeah.
1244
00:57:19,040 --> 00:57:29,280
STUDENT: [INAUDIBLE] can you set default values for each of the [INAUDIBLE]??
1245
00:57:29,280 --> 00:57:30,840
DAVID MALAN: Attributes is fine.
1246
00:57:30,840 --> 00:57:31,530
Good question.
1247
00:57:31,530 --> 00:57:34,050
In the struct, can you set default values?
1248
00:57:34,050 --> 00:57:35,100
Short answer, no.
1249
00:57:35,100 --> 00:57:39,000
And this is where C becomes less feature-able than more modern languages
1250
00:57:39,000 --> 00:57:42,580
like Python and Java and others, where you can, in fact, do that.
1251
00:57:42,580 --> 00:57:44,890
So when we transition to Python in a few weeks' time,
1252
00:57:44,890 --> 00:57:47,140
we'll see how we can start solving problems like that.
1253
00:57:47,140 --> 00:57:50,100
But for now, it's up to you to initialize name and number
1254
00:57:50,100 --> 00:57:51,450
to something.
1255
00:57:51,450 --> 00:57:52,832
Yeah.
1256
00:57:52,832 --> 00:57:55,540
STUDENT: [INAUDIBLE]
1257
00:57:55,540 --> 00:58:04,123
1258
00:58:04,123 --> 00:58:05,540
DAVID MALAN: Really good question.
1259
00:58:05,540 --> 00:58:08,470
How can we adjust or critique the design of what I'm doing?
1260
00:58:08,470 --> 00:58:12,190
This is one of the few situations where I would say, hypocritically,
1261
00:58:12,190 --> 00:58:13,780
do as I say, not as I do.
1262
00:58:13,780 --> 00:58:17,710
I am using pretty ugly lines like this, just to introduce the syntax.
1263
00:58:17,710 --> 00:58:20,428
But my claim, pedagogically today, is that eventually,
1264
00:58:20,428 --> 00:58:22,720
when we start storing names and numbers or other things
1265
00:58:22,720 --> 00:58:26,230
in files or in databases, you won't have this redundancy.
1266
00:58:26,230 --> 00:58:29,080
You'll have one line of code or two lines of code that
1267
00:58:29,080 --> 00:58:31,660
read the information from the file or database
1268
00:58:31,660 --> 00:58:34,630
and then fill the entire array with that data.
1269
00:58:34,630 --> 00:58:37,330
For now, I'm just doing it manually so as to keep our focus only
1270
00:58:37,330 --> 00:58:39,400
on the new syntax, but that's it.
1271
00:58:39,400 --> 00:58:42,640
So forgive the bad design by design today.
1272
00:58:42,640 --> 00:58:45,740
Other questions on this?
1273
00:58:45,740 --> 00:58:47,595
All right, that's been a lot already.
1274
00:58:47,595 --> 00:58:50,470
Why don't we go ahead and take our 10 minute break with snacks first.
1275
00:58:50,470 --> 00:58:53,020
We have some delightful brownies in the lobby.
1276
00:58:53,020 --> 00:58:55,900
All right, we are back.
1277
00:58:55,900 --> 00:58:59,890
And up until now, it clearly seems to be a good thing if your data is sorted,
1278
00:58:59,890 --> 00:59:02,350
because you can use binary search.
1279
00:59:02,350 --> 00:59:05,540
You know a little something more about the data.
1280
00:59:05,540 --> 00:59:07,570
But it turns out that sorting in and of itself
1281
00:59:07,570 --> 00:59:10,420
is kind of a problem to solve too.
1282
00:59:10,420 --> 00:59:14,830
And you might think, well, if sorting is going
1283
00:59:14,830 --> 00:59:18,070
to be pretty fast, we absolutely should do it before we start searching,
1284
00:59:18,070 --> 00:59:20,237
because that will just speed up all of our searches.
1285
00:59:20,237 --> 00:59:22,960
But if sorting is slow, that kind of invites the question, well,
1286
00:59:22,960 --> 00:59:25,420
should we bother sorting our data if we're only
1287
00:59:25,420 --> 00:59:28,090
going to search the data maybe once, maybe twice?
1288
00:59:28,090 --> 00:59:30,550
And so here is going to be, potentially, a trade off.
1289
00:59:30,550 --> 00:59:33,250
So let's consider what it means really to sort data.
1290
00:59:33,250 --> 00:59:35,950
In our case, it's just going to be simple and use numbers.
1291
00:59:35,950 --> 00:59:38,230
But it might, in the case of the Googles of the world,
1292
00:59:38,230 --> 00:59:40,880
be actual web pages or persons or the like.
1293
00:59:40,880 --> 00:59:46,090
So here is our typical picture for sorting, for solving any problem.
1294
00:59:46,090 --> 00:59:48,190
Input at left and output at right.
1295
00:59:48,190 --> 00:59:54,340
The input to our sort problem is going to be some unsorted set of values.
1296
00:59:54,340 --> 00:59:57,940
And the output, ideally, will be the same set of values sorted.
1297
00:59:57,940 --> 01:00:00,550
And if we do this concretely, let's suppose
1298
01:00:00,550 --> 01:00:04,786
that we want to go about sorting this list of numbers, 7, 2, 5, 4, 1, 6, 0,
1299
01:00:04,786 --> 01:00:05,860
3.
1300
01:00:05,860 --> 01:00:07,810
So it's all of the numbers from 0 to 7.
1301
01:00:07,810 --> 01:00:09,757
But they're somehow jumbled up randomly.
1302
01:00:09,757 --> 01:00:11,590
That's going to be the input to the problem.
1303
01:00:11,590 --> 01:00:13,840
And the goal is now to sort those so that you, indeed,
1304
01:00:13,840 --> 01:00:17,990
get out 0, 1, 2, 3, 4, 5, 6, 7 instead.
1305
01:00:17,990 --> 01:00:20,440
So it turns out there's lots of different ways
1306
01:00:20,440 --> 01:00:23,900
we can actually sort numbers like these here.
1307
01:00:23,900 --> 01:00:27,430
And in fact, just to complement our search example earlier,
1308
01:00:27,430 --> 01:00:29,952
could we perhaps quickly get some eight volunteers
1309
01:00:29,952 --> 01:00:32,410
to come up if you're comfortable appearing on the internet?
1310
01:00:32,410 --> 01:00:39,100
If you want to do 1, 2, 3, 4, 5, 6, 7, 8, how about?
1311
01:00:39,100 --> 01:00:40,255
All right, come on down.
1312
01:00:40,255 --> 01:00:45,040
1313
01:00:45,040 --> 01:00:47,970
All right.
1314
01:00:47,970 --> 01:00:50,560
Come on over here, and I'll give you each a number.
1315
01:00:50,560 --> 01:00:54,240
And if you want to start to organize yourselves in the same order
1316
01:00:54,240 --> 01:00:58,390
you see the numbers on the board.
1317
01:00:58,390 --> 01:01:01,530
So look up on the overhead and organize yourselves from left
1318
01:01:01,530 --> 01:01:04,460
to right in that same order.
1319
01:01:04,460 --> 01:01:06,210
And let's have the first of you-- perfect.
1320
01:01:06,210 --> 01:01:10,420
If you want to come right over here, how about right in line with this?
1321
01:01:10,420 --> 01:01:13,990
All right, and a few more numbers.
1322
01:01:13,990 --> 01:01:14,980
All right.
1323
01:01:14,980 --> 01:01:19,810
Number 2, 6, and perfect.
1324
01:01:19,810 --> 01:01:21,625
Just the right number, all right.
1325
01:01:21,625 --> 01:01:22,858
Uh oh.
1326
01:01:22,858 --> 01:01:24,400
All right, there we go, number three.
1327
01:01:24,400 --> 01:01:24,968
All right.
1328
01:01:24,968 --> 01:01:26,260
So let's just do a quick check.
1329
01:01:26,260 --> 01:01:30,867
We have 7, 2, 5, 4, 1, 6, 0, 3, very good so far.
1330
01:01:30,867 --> 01:01:32,950
Do you want to just scootch a little this way just
1331
01:01:32,950 --> 01:01:34,510
to make a little more room?
1332
01:01:34,510 --> 01:01:38,090
All right, and let's consider now who we have here on stage.
1333
01:01:38,090 --> 01:01:40,780
You want to each say a quick hello to the audience?
1334
01:01:40,780 --> 01:01:42,070
RYAN: Hi, my name is Ryan.
1335
01:01:42,070 --> 01:01:45,597
I'm a first year from Pennypacker.
1336
01:01:45,597 --> 01:01:46,930
ITSELLE: Hi, my name is Itselle.
1337
01:01:46,930 --> 01:01:49,177
I'm a first year at Strauss.
1338
01:01:49,177 --> 01:01:50,260
LUCY: Hi, my name is Lucy.
1339
01:01:50,260 --> 01:01:52,400
And I'm a first year from Greenough.
1340
01:01:52,400 --> 01:01:53,650
SHILOH: Hi, my name is Shiloh.
1341
01:01:53,650 --> 01:01:55,927
I'm a first year in Wigglesworth.
1342
01:01:55,927 --> 01:01:57,010
JACK: Hi, my name is Jack.
1343
01:01:57,010 --> 01:01:59,877
And I'm a first year in Strauss.
1344
01:01:59,877 --> 01:02:01,210
KATHRYN: Hi, my name is Kathryn.
1345
01:02:01,210 --> 01:02:02,787
I'm a first year at Strauss.
1346
01:02:02,787 --> 01:02:04,120
MICHAEL: Hi, my name is Michael.
1347
01:02:04,120 --> 01:02:06,063
I'm a first year at Pennypacker.
1348
01:02:06,063 --> 01:02:07,480
MUHAMMAD: Hi, my name is Muhammad.
1349
01:02:07,480 --> 01:02:09,047
I'm a first year in Matthews.
1350
01:02:09,047 --> 01:02:10,630
DAVID MALAN: Hi, nice, welcome aboard.
1351
01:02:10,630 --> 01:02:11,240
All right.
1352
01:02:11,240 --> 01:02:16,510
So let's consider, now, how we might go about sorting our kind volunteers here,
1353
01:02:16,510 --> 01:02:21,160
the goal being to get them into order from smallest to largest so that,
1354
01:02:21,160 --> 01:02:24,160
presumably then, we can use something smarter than just linear search.
1355
01:02:24,160 --> 01:02:26,350
We can actually use binary search, assuming
1356
01:02:26,350 --> 01:02:27,878
that they are already then sorted.
1357
01:02:27,878 --> 01:02:30,670
So let me propose that we first consider an algorithm that actually
1358
01:02:30,670 --> 01:02:32,600
has a name called selection sort.
1359
01:02:32,600 --> 01:02:36,220
And selection sort is going to be one that literally has me,
1360
01:02:36,220 --> 01:02:40,060
or really you, as the programmer, selecting the smallest element again
1361
01:02:40,060 --> 01:02:43,610
and again, and then putting them into the appropriate place.
1362
01:02:43,610 --> 01:02:47,115
So let me go ahead and start this here, starting with the number 7.
1363
01:02:47,115 --> 01:02:49,240
At the moment, 7 is the smallest number I've found.
1364
01:02:49,240 --> 01:02:52,610
So I'm going to make mental note of that with a mental variable, if you will.
1365
01:02:52,610 --> 01:02:53,710
I'm going to move on now.
1366
01:02:53,710 --> 01:02:55,460
Number 2 is obviously smaller, so I'm just
1367
01:02:55,460 --> 01:02:58,360
going to update my mental reminder that 2 is now the smallest,
1368
01:02:58,360 --> 01:03:01,555
effectively forgetting, for now, number 7.
1369
01:03:01,555 --> 01:03:02,440
5, not smaller.
1370
01:03:02,440 --> 01:03:03,370
4, not smaller.
1371
01:03:03,370 --> 01:03:04,170
1, smaller.
1372
01:03:04,170 --> 01:03:05,920
And I'm going to make mental note of that.
1373
01:03:05,920 --> 01:03:07,030
6, not smaller.
1374
01:03:07,030 --> 01:03:08,200
0, even smaller.
1375
01:03:08,200 --> 01:03:11,140
I'll make mental note of that, having forgotten now everything else.
1376
01:03:11,140 --> 01:03:13,180
And now number 3 is not smaller.
1377
01:03:13,180 --> 01:03:14,290
So what's your name again?
1378
01:03:14,290 --> 01:03:14,630
MICHAEL: Michael.
1379
01:03:14,630 --> 01:03:16,240
DAVID MALAN: So Michael is number 0.
1380
01:03:16,240 --> 01:03:18,310
He belongs, of course, way down there.
1381
01:03:18,310 --> 01:03:20,740
But unfortunately-- you are--
1382
01:03:20,740 --> 01:03:21,550
RYAN: Ryan.
1383
01:03:21,550 --> 01:03:23,360
DAVID MALAN: Ryan is in the way.
1384
01:03:23,360 --> 01:03:24,580
So what should we do?
1385
01:03:24,580 --> 01:03:27,570
How should we start to sort this list?
1386
01:03:27,570 --> 01:03:30,510
Where should number 0 go?
1387
01:03:30,510 --> 01:03:31,012
Yeah.
1388
01:03:31,012 --> 01:03:32,220
Do you want to say it louder?
1389
01:03:32,220 --> 01:03:34,545
STUDENT: I will swap, I think.
1390
01:03:34,545 --> 01:03:36,670
DAVID MALAN: Yeah, so let's just go ahead and swap.
1391
01:03:36,670 --> 01:03:39,190
So if you want to go ahead and 0, go on where 7 is.
1392
01:03:39,190 --> 01:03:41,170
We need to make room for number 7.
1393
01:03:41,170 --> 01:03:44,350
It would kind of be cheating if maybe everyone kind of politely
1394
01:03:44,350 --> 01:03:45,530
stepped over to the side.
1395
01:03:45,530 --> 01:03:46,030
Why?
1396
01:03:46,030 --> 01:03:49,000
Because if we imagine all of our volunteers here to be an array, like,
1397
01:03:49,000 --> 01:03:52,390
that's a crazy amount of work to have every element in the array
1398
01:03:52,390 --> 01:03:54,190
shift to the left just to make room.
1399
01:03:54,190 --> 01:03:57,340
So we're going to keep it simple and just evict whoever's there now.
1400
01:03:57,340 --> 01:04:00,880
Now, maybe we get lucky, and number 7 is actually closer to its destination.
1401
01:04:00,880 --> 01:04:03,250
Maybe we get unlucky, and it goes farther away.
1402
01:04:03,250 --> 01:04:05,260
But we've at least solved one problem.
1403
01:04:05,260 --> 01:04:08,140
If we had n problems at first, now we have n minus 1,
1404
01:04:08,140 --> 01:04:10,280
because number 0 is indeed in the right place.
1405
01:04:10,280 --> 01:04:14,588
So if I continue to act this out, let me go ahead and say 2, currently
1406
01:04:14,588 --> 01:04:15,130
the smallest.
1407
01:04:15,130 --> 01:04:18,040
5, no, 4, no, 1 currently the smallest.
1408
01:04:18,040 --> 01:04:19,000
I'll make mental note.
1409
01:04:19,000 --> 01:04:22,690
6, 7, 3, and now let me pause.
1410
01:04:22,690 --> 01:04:26,000
1 is obviously the now smallest element.
1411
01:04:26,000 --> 01:04:27,760
So did I need to keep going?
1412
01:04:27,760 --> 01:04:30,550
Well, turns out, at least as I've defined selection sort,
1413
01:04:30,550 --> 01:04:33,130
I do need to keep going, because I only claim
1414
01:04:33,130 --> 01:04:36,550
that I'm using one variable in my mind to remember the then smallest element.
1415
01:04:36,550 --> 01:04:39,970
I'm not smart enough like us humans to remember, wait a minute,
1416
01:04:39,970 --> 01:04:41,590
1 is definitely the smallest now.
1417
01:04:41,590 --> 01:04:43,190
I don't have that whole recollection.
1418
01:04:43,190 --> 01:04:45,590
So I just am keeping track of the now smallest.
1419
01:04:45,590 --> 01:04:46,910
So number 1, your name was?
1420
01:04:46,910 --> 01:04:47,410
JACK: Jack.
1421
01:04:47,410 --> 01:04:49,300
DAVID MALAN: Jack, where should Jack go?
1422
01:04:49,300 --> 01:04:50,380
Probably there.
1423
01:04:50,380 --> 01:04:51,670
And what's your name?
1424
01:04:51,670 --> 01:04:51,880
ITSELLE: Itselle.
1425
01:04:51,880 --> 01:04:54,588
DAVID MALAN: OK, so Jack and Itselle, if you want to swap places,
1426
01:04:54,588 --> 01:04:57,430
we've now solved two of the n total problems.
1427
01:04:57,430 --> 01:04:58,990
And now we'll do it a little faster.
1428
01:04:58,990 --> 01:05:03,880
If each of you want to start to swap as I find the right person, so 5 smallest,
1429
01:05:03,880 --> 01:05:06,340
4 smaller, 2 is smaller.
1430
01:05:06,340 --> 01:05:07,750
Got to keep checking.
1431
01:05:07,750 --> 01:05:09,572
OK, 2 was smaller.
1432
01:05:09,572 --> 01:05:11,780
All right, now I'm going to go back to the beginning.
1433
01:05:11,780 --> 01:05:13,090
All right, 4 is small.
1434
01:05:13,090 --> 01:05:14,050
5 is not.
1435
01:05:14,050 --> 01:05:14,740
6 is not.
1436
01:05:14,740 --> 01:05:16,120
7-- oh, 3 is small.
1437
01:05:16,120 --> 01:05:17,770
Where do you want to go?
1438
01:05:17,770 --> 01:05:18,670
OK, good.
1439
01:05:18,670 --> 01:05:19,810
I'm going to go back here.
1440
01:05:19,810 --> 01:05:21,060
And I can be a little smart.
1441
01:05:21,060 --> 01:05:22,810
I don't have to go all the way to the end,
1442
01:05:22,810 --> 01:05:24,950
because I know these folks are already sorted.
1443
01:05:24,950 --> 01:05:26,630
So I can at least optimize slightly.
1444
01:05:26,630 --> 01:05:27,970
So now 5 is small.
1445
01:05:27,970 --> 01:05:28,720
6 is small.
1446
01:05:28,720 --> 01:05:30,160
7 is 4, 4 is smaller.
1447
01:05:30,160 --> 01:05:33,080
If you want to go in place there.
1448
01:05:33,080 --> 01:05:34,810
And now, here things get interesting.
1449
01:05:34,810 --> 01:05:37,180
I can optimize by not looking at these folks
1450
01:05:37,180 --> 01:05:39,340
anymore, because they're obviously problem solved.
1451
01:05:39,340 --> 01:05:42,970
But now 5 is small, 6 is not, 7 is not.
1452
01:05:42,970 --> 01:05:45,010
OK, 5, you can stay where you are.
1453
01:05:45,010 --> 01:05:46,870
Now, a human in the room is obviously going
1454
01:05:46,870 --> 01:05:49,420
to question why I'm wasting any more time.
1455
01:05:49,420 --> 01:05:52,090
But with selection sort, as I've defined it thus far,
1456
01:05:52,090 --> 01:05:55,840
I still have to, now, check 6 is smallest, not 7.
1457
01:05:55,840 --> 01:05:58,520
And now my final step, OK, they're all in place.
1458
01:05:58,520 --> 01:06:00,760
So here, too, is this dichotomy between what we all
1459
01:06:00,760 --> 01:06:03,730
have is this bird's eye view of the whole problem, where it's obvious
1460
01:06:03,730 --> 01:06:05,060
where everyone needs to go.
1461
01:06:05,060 --> 01:06:09,137
But a computer implementing this with an array really has to be more methodical.
1462
01:06:09,137 --> 01:06:10,720
And we're actually saving a step here.
1463
01:06:10,720 --> 01:06:13,780
If we were really doing this, none of these numbers would be visible.
1464
01:06:13,780 --> 01:06:16,840
All eight of our volunteers would be inside of a locked door.
1465
01:06:16,840 --> 01:06:19,220
And only then could we see them one at a time.
1466
01:06:19,220 --> 01:06:21,670
But we're focusing now just on the sorting aspect.
1467
01:06:21,670 --> 01:06:24,640
So let me just, before we do one other demonstration here,
1468
01:06:24,640 --> 01:06:29,620
propose that what I really just did here in pseudocode was something like this.
1469
01:06:29,620 --> 01:06:33,430
For i from 0 to n minus 1, keeping in mind
1470
01:06:33,430 --> 01:06:35,590
that 0 is always the left of the array.
1471
01:06:35,590 --> 01:06:38,110
n minus 1 is always the right end of the array.
1472
01:06:38,110 --> 01:06:43,000
For i from 0 to n minus 1, I found the smallest number between numbers bracket
1473
01:06:43,000 --> 01:06:45,730
i and numbers bracket n minus 1.
1474
01:06:45,730 --> 01:06:48,610
And that's the very geeky way of expressing this optimization.
1475
01:06:48,610 --> 01:06:51,490
I'm always starting from numbers bracket i wherever I am.
1476
01:06:51,490 --> 01:06:53,200
And then everything else to the right.
1477
01:06:53,200 --> 01:06:56,890
And that's what was allowing me to ignore the already sorted volunteers.
1478
01:06:56,890 --> 01:07:01,220
If, though, my last line says swap smallest number with numbers i,
1479
01:07:01,220 --> 01:07:03,220
think that implements what our humans were doing
1480
01:07:03,220 --> 01:07:05,470
by physically walking to another spot.
1481
01:07:05,470 --> 01:07:09,220
All right, so that, then, would be what we'll call selection sort.
1482
01:07:09,220 --> 01:07:11,620
Let's go ahead and take a second approach here
1483
01:07:11,620 --> 01:07:13,360
using an algorithm that I'm going to call bubble sort.
1484
01:07:13,360 --> 01:07:16,090
But to do this, we need you all to reset to your original locations.
1485
01:07:16,090 --> 01:07:17,320
We have a little cheat sheet on the board
1486
01:07:17,320 --> 01:07:19,750
if you'd like to go back to this position here.
1487
01:07:19,750 --> 01:07:22,600
And let me take a fundamentally different approach, because I'm not
1488
01:07:22,600 --> 01:07:24,850
really liking selection sort as is, because it's kind
1489
01:07:24,850 --> 01:07:26,780
of a lot of walking back and forth.
1490
01:07:26,780 --> 01:07:30,620
And the lot of walking suggests a lot of, lot of steps again and again.
1491
01:07:30,620 --> 01:07:32,090
So what might I do instead?
1492
01:07:32,090 --> 01:07:34,840
Well, bubble sort is going to have me focus a little more
1493
01:07:34,840 --> 01:07:36,730
intuitively on just smaller problems.
1494
01:07:36,730 --> 01:07:38,605
And let's see if this gets me somewhere else.
1495
01:07:38,605 --> 01:07:42,430
So if I just look at this list without looking at everyone else, 7 and 2,
1496
01:07:42,430 --> 01:07:43,670
this is obviously a problem.
1497
01:07:43,670 --> 01:07:44,170
Why?
1498
01:07:44,170 --> 01:07:45,500
Because you're out of order.
1499
01:07:45,500 --> 01:07:47,810
So let's just solve one tiny problem first.
1500
01:07:47,810 --> 01:07:49,570
So 7 and 2, why don't you swap?
1501
01:07:49,570 --> 01:07:54,160
I know 2 is in a better place now, because she's definitely less than 7.
1502
01:07:54,160 --> 01:07:55,540
So I think I can now move on.
1503
01:07:55,540 --> 01:07:57,350
7 and 5, problem.
1504
01:07:57,350 --> 01:07:58,390
So let's solve that.
1505
01:07:58,390 --> 01:07:59,830
7 and 4, problem.
1506
01:07:59,830 --> 01:08:02,380
Let's solve that, 7 and 1, let's solve that.
1507
01:08:02,380 --> 01:08:03,970
7 and 6, let's solve that.
1508
01:08:03,970 --> 01:08:05,080
7 and 0, solve that.
1509
01:08:05,080 --> 01:08:06,550
7 and 3, solve that.
1510
01:08:06,550 --> 01:08:07,330
OK, done.
1511
01:08:07,330 --> 01:08:09,130
Sorted, right?
1512
01:08:09,130 --> 01:08:11,780
Or obviously not, if you just glance at these numbers here.
1513
01:08:11,780 --> 01:08:14,530
But we have, fundamentally, taken a bite out of the problem.
1514
01:08:14,530 --> 01:08:17,020
7 is indeed in the right place.
1515
01:08:17,020 --> 01:08:21,170
So we maximally have n minus 1 other problems to solve.
1516
01:08:21,170 --> 01:08:23,660
So how do I do this?
1517
01:08:23,660 --> 01:08:25,700
I think I can just repeat the same logic.
1518
01:08:25,700 --> 01:08:26,770
Let me go over here.
1519
01:08:26,770 --> 01:08:28,210
2 and 5, good.
1520
01:08:28,210 --> 01:08:29,800
5 and 4, no.
1521
01:08:29,800 --> 01:08:31,330
5 and 1, no.
1522
01:08:31,330 --> 01:08:32,590
5 and 6, yes.
1523
01:08:32,590 --> 01:08:34,660
6 and 0, no.
1524
01:08:34,660 --> 01:08:36,760
6 and 3, no.
1525
01:08:36,760 --> 01:08:39,191
So now we've solved two of the problems.
1526
01:08:39,191 --> 01:08:41,649
And what's nice about bubble sort, at least as this glance,
1527
01:08:41,649 --> 01:08:42,707
it's nice and simple.
1528
01:08:42,707 --> 01:08:43,540
It's nice and local.
1529
01:08:43,540 --> 01:08:46,510
And you just keep incrementally solving more and more problems.
1530
01:08:46,510 --> 01:08:48,010
So let's go ahead and do this again.
1531
01:08:48,010 --> 01:08:50,080
And I'll do it-- we can do it faster.
1532
01:08:50,080 --> 01:08:51,760
2 and 4, we know are good.
1533
01:08:51,760 --> 01:08:59,200
4 and 1, 4 and 5, 5 and 0, 5 and 3, 5 and 6, 6 and 7, good.
1534
01:08:59,200 --> 01:09:01,390
So we go back, 2 and 1.
1535
01:09:01,390 --> 01:09:03,340
Ah, now another problem solve.
1536
01:09:03,340 --> 01:09:09,939
2, and 4, 4 and 0, 4 and 3, 4 and 5, 5 and 6, 6 and 7.
1537
01:09:09,939 --> 01:09:13,270
And so notice 2, as per its name, the largest elements
1538
01:09:13,270 --> 01:09:14,895
have bubbled their way up to the top.
1539
01:09:14,895 --> 01:09:17,020
And that's what seems to be happening just as we're
1540
01:09:17,020 --> 01:09:18,340
fixing some remaining problems.
1541
01:09:18,340 --> 01:09:19,120
So almost done.
1542
01:09:19,120 --> 01:09:27,550
1 and 2, 2 and 0, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, almost done.
1543
01:09:27,550 --> 01:09:29,830
Obviously, to us humans, it looks done.
1544
01:09:29,830 --> 01:09:32,529
How do I know as the computer for sure?
1545
01:09:32,529 --> 01:09:36,370
What would be the most surefire way for me to now go, it's not done, sorry.
1546
01:09:36,370 --> 01:09:38,080
That's a bug.
1547
01:09:38,080 --> 01:09:43,390
OK, 1 and 0, 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7.
1548
01:09:43,390 --> 01:09:47,899
OK, so now it's obviously sorted to the rest of us on stage.
1549
01:09:47,899 --> 01:09:50,290
How could I confirm as much as code?
1550
01:09:50,290 --> 01:09:52,670
You're doing it with your mind, just glancing at this.
1551
01:09:52,670 --> 01:09:56,080
How would the computer, the code, know for sure that this list is now sorted?
1552
01:09:56,080 --> 01:09:57,000
Yeah.
1553
01:09:57,000 --> 01:09:58,500
STUDENT: [INAUDIBLE] one more time.
1554
01:09:58,500 --> 01:10:00,000
DAVID MALAN: Let's do one more time.
1555
01:10:00,000 --> 01:10:03,512
And look, draw what conclusion?
1556
01:10:03,512 --> 01:10:05,490
STUDENT: That nothing has to switch at all.
1557
01:10:05,490 --> 01:10:07,365
DAVID MALAN: Yeah, let's do it one more time,
1558
01:10:07,365 --> 01:10:08,860
even though it's a little wasteful.
1559
01:10:08,860 --> 01:10:13,320
But logically, if I go through the whole list comparing pairs again, again,
1560
01:10:13,320 --> 01:10:15,810
and again, and I don't do any work that time,
1561
01:10:15,810 --> 01:10:19,133
now it's obviously logically safe to just stop, because otherwise, I'm
1562
01:10:19,133 --> 01:10:21,300
wasting my time doing the same thing again and again
1563
01:10:21,300 --> 01:10:22,933
if no one's actually moving.
1564
01:10:22,933 --> 01:10:25,350
So I'm afraid we don't have monopoly games for all of you.
1565
01:10:25,350 --> 01:10:26,767
But we do have eight stress balls.
1566
01:10:26,767 --> 01:10:30,090
And round of applause, if we could, for our volunteers.
1567
01:10:30,090 --> 01:10:33,910
If you want to put your numbers on the shelf there.
1568
01:10:33,910 --> 01:10:36,220
So if we consider for a moment--
1569
01:10:36,220 --> 01:10:36,720
thank you.
1570
01:10:36,720 --> 01:10:39,340
Thank you so much.
1571
01:10:39,340 --> 01:10:42,150
Sure.
1572
01:10:42,150 --> 01:10:43,170
Thank you.
1573
01:10:43,170 --> 01:10:44,230
Thanks.
1574
01:10:44,230 --> 01:10:44,730
Sure.
1575
01:10:44,730 --> 01:10:48,870
So if we consider now these two algorithms, which one is better?
1576
01:10:48,870 --> 01:10:51,990
Any intuition for whether selection sort the first
1577
01:10:51,990 --> 01:10:55,950
is better or worse than bubble sort the second?
1578
01:10:55,950 --> 01:10:58,020
Any thoughts?
1579
01:10:58,020 --> 01:10:58,860
Yeah.
1580
01:10:58,860 --> 01:11:03,620
STUDENT: Bubble sort's even better because it's less work [INAUDIBLE]..
1581
01:11:03,620 --> 01:11:06,110
DAVID MALAN: So bubble sort seems like less work,
1582
01:11:06,110 --> 01:11:08,930
especially since I was focusing on those localized problems.
1583
01:11:08,930 --> 01:11:11,460
Other intuition?
1584
01:11:11,460 --> 01:11:14,580
Selection sort versus bubble sort.
1585
01:11:14,580 --> 01:11:18,000
Well, let me propose that we try to quantize this so we can actually
1586
01:11:18,000 --> 01:11:19,420
analyze it in some way.
1587
01:11:19,420 --> 01:11:22,590
And this is not an exercise we'll do constantly for lots of algorithms.
1588
01:11:22,590 --> 01:11:24,940
But these are pretty representative of algorithms.
1589
01:11:24,940 --> 01:11:27,690
So we can wrap our minds around, indeed, the performance
1590
01:11:27,690 --> 01:11:28,960
or the design of these things.
1591
01:11:28,960 --> 01:11:34,350
So here is my pseudocode for selection sort, whereby as per its name,
1592
01:11:34,350 --> 01:11:38,500
I just iteratively select the next smallest element again and again.
1593
01:11:38,500 --> 01:11:41,890
So how can we go about analyzing something like this?
1594
01:11:41,890 --> 01:11:43,920
Well, we could just do it on paper pencil
1595
01:11:43,920 --> 01:11:46,260
and count up the number of steps that seem
1596
01:11:46,260 --> 01:11:48,030
to be implied logically by the code.
1597
01:11:48,030 --> 01:11:52,260
We could literally count the number of steps I was taking again and again,
1598
01:11:52,260 --> 01:11:52,890
left to right.
1599
01:11:52,890 --> 01:11:55,830
We could also just count the number of comparisons
1600
01:11:55,830 --> 01:11:58,302
I was making with each of the persons involved.
1601
01:11:58,302 --> 01:12:00,510
And I was doing it kind of quickly in selection sort.
1602
01:12:00,510 --> 01:12:03,033
But every time I was looking at a person trying to decide,
1603
01:12:03,033 --> 01:12:04,950
do I want to remember that number is smallest?
1604
01:12:04,950 --> 01:12:08,580
That number, I was comparing two values with an equals equals or less
1605
01:12:08,580 --> 01:12:11,700
than or greater than sign, at least if we had done this in code.
1606
01:12:11,700 --> 01:12:13,110
So that tends to be the norm.
1607
01:12:13,110 --> 01:12:16,680
When analyzing algorithms like these, counting the number of comparisons,
1608
01:12:16,680 --> 01:12:21,090
because it's kind of a global unit of measure
1609
01:12:21,090 --> 01:12:23,490
we can use to compare different algorithms entirely.
1610
01:12:23,490 --> 01:12:27,780
So think, too, that in the general case, when
1611
01:12:27,780 --> 01:12:30,390
we have more than eight volunteers, more than seven doors,
1612
01:12:30,390 --> 01:12:33,510
we can generalize our array in general, as this
1613
01:12:33,510 --> 01:12:35,220
is the first element at bracket 0.
1614
01:12:35,220 --> 01:12:37,770
And the end of it is always n minus 1.
1615
01:12:37,770 --> 01:12:41,550
So arrays or doors, in this case, or volunteers,
1616
01:12:41,550 --> 01:12:45,720
are always numerically indexed from 0 on up to n minus 1,
1617
01:12:45,720 --> 01:12:47,200
if there's n of them in total.
1618
01:12:47,200 --> 01:12:50,940
So how do we analyze the code of selection sort?
1619
01:12:50,940 --> 01:12:56,370
Well, how many steps did it take me to find the first smallest element?
1620
01:12:56,370 --> 01:12:58,835
Or more precisely, how many comparisons did I
1621
01:12:58,835 --> 01:13:00,960
need to make when I walked to left to right to find
1622
01:13:00,960 --> 01:13:06,100
our first-smallest person, which ended up being 0?
1623
01:13:06,100 --> 01:13:09,310
How many comparisons did I do when walking left to right?
1624
01:13:09,310 --> 01:13:15,850
If there were eight people on stage, how many total comparisons that I do?
1625
01:13:15,850 --> 01:13:18,280
Like if there's eight people, I compared these folks.
1626
01:13:18,280 --> 01:13:22,210
Then this person, this person, yeah.
1627
01:13:22,210 --> 01:13:23,412
Yeah, so seven total, right?
1628
01:13:23,412 --> 01:13:25,120
Because if there's eight people on stage,
1629
01:13:25,120 --> 01:13:28,420
you can only do seven comparisons total, because otherwise you'd
1630
01:13:28,420 --> 01:13:29,960
be comparing one number to itself.
1631
01:13:29,960 --> 01:13:31,960
So it seems like, in the general case, if you've
1632
01:13:31,960 --> 01:13:34,600
got n numbers that you're trying to sort,
1633
01:13:34,600 --> 01:13:38,560
finding the smallest element first takes n minus 1 comparisons.
1634
01:13:38,560 --> 01:13:41,275
Maybe n total steps left to right.
1635
01:13:41,275 --> 01:13:43,150
But the number of comparisons, which I claim,
1636
01:13:43,150 --> 01:13:46,030
is just a useful unit of measure, is n minus 1.
1637
01:13:46,030 --> 01:13:48,490
How about finding the next smallest person?
1638
01:13:48,490 --> 01:13:51,970
How many steps did it take me to find the next smallest number, which
1639
01:13:51,970 --> 01:13:53,200
ended up being the number 1?
1640
01:13:53,200 --> 01:13:55,790
1641
01:13:55,790 --> 01:13:56,855
Yeah.
1642
01:13:56,855 --> 01:13:58,340
STUDENT: [INAUDIBLE] n minus 2.
1643
01:13:58,340 --> 01:13:59,600
DAVID MALAN: Yeah, so just n minus 2.
1644
01:13:59,600 --> 01:13:59,870
Why?
1645
01:13:59,870 --> 01:14:01,610
Because I'd already solved one problem.
1646
01:14:01,610 --> 01:14:03,210
Someone was already in the right position.
1647
01:14:03,210 --> 01:14:05,490
It would be silly to keep counting them again and again.
1648
01:14:05,490 --> 01:14:08,157
So I can whittle down my number of comparisons for the next pass
1649
01:14:08,157 --> 01:14:09,200
to n minus 2.
1650
01:14:09,200 --> 01:14:12,350
The third pass to find the third smallest number would be n minus 3.
1651
01:14:12,350 --> 01:14:15,770
And then dot, dot, dot, presumably this story, this formula,
1652
01:14:15,770 --> 01:14:19,620
ends when you have just one final pair, the people at the end, to compare.
1653
01:14:19,620 --> 01:14:22,820
So if this is looking a little reminiscent of some kind of recurrence
1654
01:14:22,820 --> 01:14:25,520
from high school or high school math or physics or the like,
1655
01:14:25,520 --> 01:14:28,220
let me just stipulate that if you actually do out this math
1656
01:14:28,220 --> 01:14:32,840
and generalize it, that is the same thing as n times n minus 1
1657
01:14:32,840 --> 01:14:33,392
divided by 2.
1658
01:14:33,392 --> 01:14:35,100
And if you're rusty on that, no big deal.
1659
01:14:35,100 --> 01:14:37,070
Just kind of commit to memory that any time
1660
01:14:37,070 --> 01:14:40,010
you add up this kind of series, something plus something slightly
1661
01:14:40,010 --> 01:14:42,302
smaller, plus something slightly smaller, each of which
1662
01:14:42,302 --> 01:14:46,520
differs by 1, you're going to get this formula. n times n minus 1 over 2.
1663
01:14:46,520 --> 01:14:50,630
If we, of course, multiply that out, that's really n squared minus n,
1664
01:14:50,630 --> 01:14:51,680
all divided by 2.
1665
01:14:51,680 --> 01:14:56,540
If we keep multiplying it out, that's n squared divided by 2 minus n over 2.
1666
01:14:56,540 --> 01:14:59,660
And now, we have kind of a vocabulary with which we
1667
01:14:59,660 --> 01:15:03,140
can talk about the efficiency, the design of this algorithm.
1668
01:15:03,140 --> 01:15:06,380
But honestly, I don't really care about this level of precision,
1669
01:15:06,380 --> 01:15:09,560
like n squared divided by 2 minus n divided by 2.
1670
01:15:09,560 --> 01:15:14,240
As n gets really large, which of these symbols, which of these terms
1671
01:15:14,240 --> 01:15:17,480
is really going to dominate, become the biggest influencer
1672
01:15:17,480 --> 01:15:20,190
on the total value of steps?
1673
01:15:20,190 --> 01:15:20,690
Right?
1674
01:15:20,690 --> 01:15:21,890
It's the square, right?
1675
01:15:21,890 --> 01:15:23,382
It's definitely not n divided by 2.
1676
01:15:23,382 --> 01:15:24,590
That's shaving some time off.
1677
01:15:24,590 --> 01:15:27,800
But n squared, as n gets big, is going to get really big.
1678
01:15:27,800 --> 01:15:29,990
If n is 100, then n squared is bigger.
1679
01:15:29,990 --> 01:15:32,570
If n is a million, n squared is really bigger.
1680
01:15:32,570 --> 01:15:35,390
And so at the end of the day, when we're really just talking
1681
01:15:35,390 --> 01:15:39,140
about a wave of the hand analysis and upper bound, if you will,
1682
01:15:39,140 --> 01:15:43,130
let's just say that selection sort, as analyzed here,
1683
01:15:43,130 --> 01:15:45,860
it's on the order of n squared steps.
1684
01:15:45,860 --> 01:15:47,690
It's not precisely n squared steps.
1685
01:15:47,690 --> 01:15:52,830
But you know what? n squared divided by 2, the intuition here might be that,
1686
01:15:52,830 --> 01:15:55,250
well, it's half of that.
1687
01:15:55,250 --> 01:15:58,570
n squared is what really matters as n gets really, really large.
1688
01:15:58,570 --> 01:16:01,070
And that's when you start thinking about and trying to solve
1689
01:16:01,070 --> 01:16:02,445
the Google problems of the world.
1690
01:16:02,445 --> 01:16:04,670
When n gets large, that's when you have to be smarter
1691
01:16:04,670 --> 01:16:07,490
than just sort of naive implementations of any algorithm.
1692
01:16:07,490 --> 01:16:12,480
So where, then, does this algorithm fall into this categorization here?
1693
01:16:12,480 --> 01:16:14,870
Well, n squared, it turns out, is on the order
1694
01:16:14,870 --> 01:16:19,610
of n squared steps, in the worst case, whether it's sorted or not.
1695
01:16:19,610 --> 01:16:23,990
It turns out, though, lower bound, if we consider this same code,
1696
01:16:23,990 --> 01:16:28,670
suppose the best case scenario, like our eight volunteers came up on stage.
1697
01:16:28,670 --> 01:16:32,240
And just because they already sorted themselves, so 0 through 7.
1698
01:16:32,240 --> 01:16:34,490
Suppose they just happened to be in that state.
1699
01:16:34,490 --> 01:16:37,550
How many steps would selection store take
1700
01:16:37,550 --> 01:16:42,670
to sort an already-sorted list of volunteers?
1701
01:16:42,670 --> 01:16:43,420
Any intuition?
1702
01:16:43,420 --> 01:16:44,318
Yeah.
1703
01:16:44,318 --> 01:16:47,186
STUDENT: Would it still be [INAUDIBLE]?
1704
01:16:47,186 --> 01:16:49,717
DAVID MALAN: Would it still be n--
1705
01:16:49,717 --> 01:16:51,180
STUDENT: Still be 7 [INAUDIBLE].
1706
01:16:51,180 --> 01:16:53,263
DAVID MALAN: So for the first pass, it would still
1707
01:16:53,263 --> 01:16:55,710
be 7 for the first pass across the humans.
1708
01:16:55,710 --> 01:16:58,530
Because even though, yeah, I'm claiming 0 is here,
1709
01:16:58,530 --> 01:17:01,920
I don't know that 0 is the smallest until I make my way all the way
1710
01:17:01,920 --> 01:17:03,990
over there doing all seven comparisons.
1711
01:17:03,990 --> 01:17:08,220
OK, fine, first pass took seven or more generally n minus 1 steps.
1712
01:17:08,220 --> 01:17:11,940
What if I look for the next smallest element, and the humans in this story
1713
01:17:11,940 --> 01:17:14,370
are already sorted 0 through 7?
1714
01:17:14,370 --> 01:17:17,580
Well, yes, the number 1 is here, and I see them first.
1715
01:17:17,580 --> 01:17:21,405
But I don't know they're the smallest until I compare against everyone else
1716
01:17:21,405 --> 01:17:22,530
get to the end of the list.
1717
01:17:22,530 --> 01:17:24,238
And we're like, oh, well that was stupid.
1718
01:17:24,238 --> 01:17:26,550
I already had the smallest person in hand then.
1719
01:17:26,550 --> 01:17:29,820
And so this pseudocode, this implementation of selection sort,
1720
01:17:29,820 --> 01:17:31,650
is sort of fixed like this.
1721
01:17:31,650 --> 01:17:35,490
There's no special case that says, if already sorted, quit early.
1722
01:17:35,490 --> 01:17:37,860
It's always going to take n squared steps.
1723
01:17:37,860 --> 01:17:42,090
And so in this case, if we borrow our jargon from earlier
1724
01:17:42,090 --> 01:17:46,080
using omega notation, just to be clear, selection sort
1725
01:17:46,080 --> 01:17:50,790
is also going to be in this incarnation in omega of n squared,
1726
01:17:50,790 --> 01:17:53,190
because even in the best case, where the list is already
1727
01:17:53,190 --> 01:17:56,100
sorted, you're going to waste a huge amount of time
1728
01:17:56,100 --> 01:17:59,040
essentially verifying as much or discovering as much,
1729
01:17:59,040 --> 01:18:01,750
even though we humans of course could see it right away.
1730
01:18:01,750 --> 01:18:06,270
So selection sort would seem to take both n squared steps in the worst
1731
01:18:06,270 --> 01:18:08,425
case, n squared steps in the best case.
1732
01:18:08,425 --> 01:18:09,300
And so you know what?
1733
01:18:09,300 --> 01:18:11,280
We can use our theta terminology for that.
1734
01:18:11,280 --> 01:18:13,770
Here would be an algorithm, just like counting earlier,
1735
01:18:13,770 --> 01:18:17,880
that always takes n squared steps, no matter whether the array is sorted
1736
01:18:17,880 --> 01:18:19,652
or not from the get go.
1737
01:18:19,652 --> 01:18:21,360
All right, so hopefully we can do better.
1738
01:18:21,360 --> 01:18:23,550
And someone proposed earlier that bubble sort
1739
01:18:23,550 --> 01:18:25,618
felt like it was using fewer steps.
1740
01:18:25,618 --> 01:18:26,910
Well, let's consider that next.
1741
01:18:26,910 --> 01:18:30,630
With bubble sort, we had this pseudocode, I claim.
1742
01:18:30,630 --> 01:18:33,780
Whereby, let's focus on the inside of the code first.
1743
01:18:33,780 --> 01:18:36,120
Down here, what was I doing?
1744
01:18:36,120 --> 01:18:39,960
For i from 0 to n minus 2.
1745
01:18:39,960 --> 01:18:40,740
That's curious.
1746
01:18:40,740 --> 01:18:42,360
We've never seen n minus 2 before.
1747
01:18:42,360 --> 01:18:44,040
But I asked this question.
1748
01:18:44,040 --> 01:18:50,160
If numbers bracket i and numbers bracket i plus 1 are out of order, swap them.
1749
01:18:50,160 --> 01:18:53,610
So that was when I was pointing at our first two volunteers here.
1750
01:18:53,610 --> 01:18:57,090
I saw that they were out of order, so I swapped them.
1751
01:18:57,090 --> 01:19:01,830
How come I'm doing that again and again up to n minus 2, though,
1752
01:19:01,830 --> 01:19:05,610
instead of n minus 1, which we've always used up
1753
01:19:05,610 --> 01:19:09,670
until now as our rightmost boundary?
1754
01:19:09,670 --> 01:19:14,170
Any intuition for why I'm doing this from 0 to n minus 2?
1755
01:19:14,170 --> 01:19:14,700
Yeah.
1756
01:19:14,700 --> 01:19:18,540
STUDENT: [INAUDIBLE] number, you can't get rid of the ith number.
1757
01:19:18,540 --> 01:19:21,005
There's no benign character you can swap with.
1758
01:19:21,005 --> 01:19:21,880
DAVID MALAN: Exactly.
1759
01:19:21,880 --> 01:19:26,050
Because I'm looking at the ith person per this pseudocode here
1760
01:19:26,050 --> 01:19:29,740
and the ith plus 1 person, I better make sure I don't go step
1761
01:19:29,740 --> 01:19:31,550
beyond the boundaries of my array.
1762
01:19:31,550 --> 01:19:33,010
So if you think of my left hand.
1763
01:19:33,010 --> 01:19:36,250
When my back was to you here, pointing at the current person
1764
01:19:36,250 --> 01:19:39,225
at the first position, my right hand for this if conditioner
1765
01:19:39,225 --> 01:19:41,350
is essentially pointing at the person next to them.
1766
01:19:41,350 --> 01:19:44,740
And you want to iterate with your left hand all through these people.
1767
01:19:44,740 --> 01:19:47,620
But you don't want your left hand to point at the last person.
1768
01:19:47,620 --> 01:19:50,000
You want it to point at the second to last person.
1769
01:19:50,000 --> 01:19:54,220
But we know that the last person is always at n minus 1.
1770
01:19:54,220 --> 01:19:57,820
So the second to last person, just mathematically, is at n minus 2.
1771
01:19:57,820 --> 01:19:58,780
So it's a subtlety.
1772
01:19:58,780 --> 01:20:00,880
But this is a seg fault waiting to happen.
1773
01:20:00,880 --> 01:20:03,760
If you implemented bubble sort using n minus 1,
1774
01:20:03,760 --> 01:20:07,340
you will, my right hand would go beyond the boundaries of the array,
1775
01:20:07,340 --> 01:20:08,170
so just bad.
1776
01:20:08,170 --> 01:20:10,490
All right, so why am I saying this n times?
1777
01:20:10,490 --> 01:20:13,070
Well, we did it very organically with humans.
1778
01:20:13,070 --> 01:20:15,940
But each time someone--
1779
01:20:15,940 --> 01:20:18,370
each pass I did through the array, someone
1780
01:20:18,370 --> 01:20:19,840
bubbled their way up to the end.
1781
01:20:19,840 --> 01:20:22,870
Number 7, then number 6, then number 5.
1782
01:20:22,870 --> 01:20:26,600
So if on each pass through the array of volunteers,
1783
01:20:26,600 --> 01:20:31,360
I was solving at least one problem, it seems like bubble sort can just
1784
01:20:31,360 --> 01:20:34,315
run n times total to solve all n problems,
1785
01:20:34,315 --> 01:20:36,940
because the first pass will get at least one number into place.
1786
01:20:36,940 --> 01:20:38,470
Second pass, second number into place.
1787
01:20:38,470 --> 01:20:39,970
You might get lucky, and it would do more.
1788
01:20:39,970 --> 01:20:41,740
But worst case, this feels like enough.
1789
01:20:41,740 --> 01:20:46,240
Just do this blindly n times, and they'll all line up together.
1790
01:20:46,240 --> 01:20:49,780
Well, technically-- all right, now we're getting into the weeds.
1791
01:20:49,780 --> 01:20:52,060
Technically, you can just repeat it in minus 1 times,
1792
01:20:52,060 --> 01:20:54,520
because if you solve all n minus 1 other problems,
1793
01:20:54,520 --> 01:20:58,120
and you're left with 1, literally that person's where they need to be,
1794
01:20:58,120 --> 01:20:58,900
just logically.
1795
01:20:58,900 --> 01:21:01,390
If you've already sorted everything else and you've got just the 1 left,
1796
01:21:01,390 --> 01:21:02,540
it's already bubbled up.
1797
01:21:02,540 --> 01:21:03,980
So how do we analyze this?
1798
01:21:03,980 --> 01:21:06,670
Well in bubble sort, we might do something like this.
1799
01:21:06,670 --> 01:21:11,015
I'm essentially doing n minus 1 things n minus 1 times.
1800
01:21:11,015 --> 01:21:13,390
Now, let me back up to the pseudocode, because this one's
1801
01:21:13,390 --> 01:21:14,980
a little less obvious.
1802
01:21:14,980 --> 01:21:19,270
This is where you can actually mathematically infer from your loop
1803
01:21:19,270 --> 01:21:21,110
how many steps you're taking.
1804
01:21:21,110 --> 01:21:24,585
So this first line literally says, repeat the following n minus 1 times.
1805
01:21:24,585 --> 01:21:26,710
So that's going to translate very straightforwardly
1806
01:21:26,710 --> 01:21:28,240
to our mathematical formula.
1807
01:21:28,240 --> 01:21:30,190
Do something n minus 1 times.
1808
01:21:30,190 --> 01:21:33,970
This loop, just because I'm using for loop terminology,
1809
01:21:33,970 --> 01:21:35,840
it's framed a little differently.
1810
01:21:35,840 --> 01:21:39,910
But if you're iterating from 0 to n minus 2,
1811
01:21:39,910 --> 01:21:43,258
you're iterating a total of n minus 1 times.
1812
01:21:43,258 --> 01:21:45,550
And again, the arithmetic is getting a little annoying.
1813
01:21:45,550 --> 01:21:48,470
But this just means do the following n minus 1 times.
1814
01:21:48,470 --> 01:21:51,670
So do n minus 1 things n minus 1 times.
1815
01:21:51,670 --> 01:21:54,440
We can now run out the math as follows.
1816
01:21:54,440 --> 01:21:57,940
We have the formula n minus 1 times n minus 1.
1817
01:21:57,940 --> 01:22:01,210
We do our little FOIL method here, n squared minus 1 times n,
1818
01:22:01,210 --> 01:22:03,100
minus 1 times n, plus 1.
1819
01:22:03,100 --> 01:22:06,550
We can combine like terms. n squared minus 2n plus 1.
1820
01:22:06,550 --> 01:22:09,025
But at this point, when n gets really large,
1821
01:22:09,025 --> 01:22:10,900
which term are we really going to care about?
1822
01:22:10,900 --> 01:22:13,390
This is on the order of?
1823
01:22:13,390 --> 01:22:14,870
Yeah, n squared.
1824
01:22:14,870 --> 01:22:16,780
So at least asymptotically.
1825
01:22:16,780 --> 01:22:20,830
Asymptotically means, as n approaches infinity, gets really large.
1826
01:22:20,830 --> 01:22:24,070
Turns out that the upper bound on selection sort and bubble sort
1827
01:22:24,070 --> 01:22:25,430
are essentially the same.
1828
01:22:25,430 --> 01:22:28,540
Now, if we really nitpicked and compared the total number of comparisons,
1829
01:22:28,540 --> 01:22:29,680
they might differ slightly.
1830
01:22:29,680 --> 01:22:31,870
But as n gets large, honestly, you're barely
1831
01:22:31,870 --> 01:22:36,350
going to notice the difference, it would seem, between these two algorithms.
1832
01:22:36,350 --> 01:22:39,550
But what about the lower bound?
1833
01:22:39,550 --> 01:22:44,620
If the upper bound on bubble sort is also big O of n, what about the lower
1834
01:22:44,620 --> 01:22:45,470
bound here?
1835
01:22:45,470 --> 01:22:50,170
Well, with this pseudocode, what would the lower bound be on bubble sort?
1836
01:22:50,170 --> 01:22:53,890
Even in the best case when all of the volunteers are sorted.
1837
01:22:53,890 --> 01:22:56,830
Any intuition?
1838
01:22:56,830 --> 01:22:57,670
In this pseudo code.
1839
01:22:57,670 --> 01:22:58,538
Yeah, in the middle.
1840
01:22:58,538 --> 01:22:59,830
STUDENT: Sorry, quick question.
1841
01:22:59,830 --> 01:23:02,360
Isn't bubble sort structured such that you
1842
01:23:02,360 --> 01:23:05,955
wouldn't need to compare numbers that have already bubbled up?
1843
01:23:05,955 --> 01:23:07,080
DAVID MALAN: Good question.
1844
01:23:07,080 --> 01:23:09,122
Isn't bubble sort designed such that you wouldn't
1845
01:23:09,122 --> 01:23:12,860
need to compare numbers that have already bubbled up?
1846
01:23:12,860 --> 01:23:17,000
That's what's happening here in the middle, implicitly.
1847
01:23:17,000 --> 01:23:19,220
I'm always going from left to right.
1848
01:23:19,220 --> 01:23:21,650
But remember that even when I screwed up at the end
1849
01:23:21,650 --> 01:23:23,900
and the last two people were out of order, I do always
1850
01:23:23,900 --> 01:23:27,140
need to restart at the beginning, because the big numbers are
1851
01:23:27,140 --> 01:23:29,691
going that way, and the small numbers are coming this way.
1852
01:23:29,691 --> 01:23:32,892
STUDENT: [INAUDIBLE]
1853
01:23:32,892 --> 01:23:34,100
DAVID MALAN: So that is true.
1854
01:23:34,100 --> 01:23:37,460
There are some slight optimizations that I'm kind of glossing over here.
1855
01:23:37,460 --> 01:23:40,700
Let me stipulate that it would still end up being on the order of n squared.
1856
01:23:40,700 --> 01:23:43,910
But that would definitely shave off some actual running time here.
1857
01:23:43,910 --> 01:23:46,340
But what if the list is already sorted?
1858
01:23:46,340 --> 01:23:48,410
Our pseudocode, at the moment, has no allowance
1859
01:23:48,410 --> 01:23:51,020
for if list is already sorted, quit early.
1860
01:23:51,020 --> 01:23:53,840
So we're going to blindly do n minus 1 things
1861
01:23:53,840 --> 01:23:58,850
and minus 1 times unless we modify our pseudocode, as I did verbally earlier,
1862
01:23:58,850 --> 01:23:59,960
I proposed this.
1863
01:23:59,960 --> 01:24:04,520
Inside of that outer loop, if you make a pass across all of the volunteers,
1864
01:24:04,520 --> 01:24:06,907
and your mental counter has made no swaps,
1865
01:24:06,907 --> 01:24:08,990
you have to keep track with some kind of variable,
1866
01:24:08,990 --> 01:24:10,518
well then, you might as well stop.
1867
01:24:10,518 --> 01:24:12,560
Because if you do a whole pass and make no swaps,
1868
01:24:12,560 --> 01:24:17,550
why would you waste time doing it again expecting different behavior?
1869
01:24:17,550 --> 01:24:23,480
So to help visualize these, whereby now bubble sort can be advantageous
1870
01:24:23,480 --> 01:24:26,640
if the data is already sorted or mostly sorted.
1871
01:24:26,640 --> 01:24:27,140
Why?
1872
01:24:27,140 --> 01:24:29,510
Because it does have this short circuit detail.
1873
01:24:29,510 --> 01:24:31,790
At least if we implement it like that, how
1874
01:24:31,790 --> 01:24:36,263
can we go about visualizing these things a little more clearly?
1875
01:24:36,263 --> 01:24:37,680
Well, let me go ahead and do this.
1876
01:24:37,680 --> 01:24:41,870
Let me pull up, here, a visualization of exactly these algorithms,
1877
01:24:41,870 --> 01:24:45,320
thanks to a third party tool here that's going to help us visualize
1878
01:24:45,320 --> 01:24:46,850
these sorting algorithms as follows.
1879
01:24:46,850 --> 01:24:48,740
Small bars represent small numbers.
1880
01:24:48,740 --> 01:24:50,480
Big bars represent big numbers.
1881
01:24:50,480 --> 01:24:53,720
And so the idea, now, is when I hit a button here
1882
01:24:53,720 --> 01:24:56,843
to get all of the small bars this way, all of the big bars this way.
1883
01:24:56,843 --> 01:24:58,010
So just like our volunteers.
1884
01:24:58,010 --> 01:25:02,370
But instead of holding lighted numbers, it's bars representing their magnitude.
1885
01:25:02,370 --> 01:25:07,190
So let's go ahead and start with, for instance, selection sort.
1886
01:25:07,190 --> 01:25:09,920
And you'll see in pink, is being highlighted
1887
01:25:09,920 --> 01:25:12,980
the current number that is being selected
1888
01:25:12,980 --> 01:25:14,820
and then pulled all the way to the left.
1889
01:25:14,820 --> 01:25:16,220
So this is selection sort.
1890
01:25:16,220 --> 01:25:20,420
And again, it's selecting the next smallest element.
1891
01:25:20,420 --> 01:25:25,850
But you can see here, all the more visibly, that just like my human feet,
1892
01:25:25,850 --> 01:25:27,450
we're taking a lot of steps.
1893
01:25:27,450 --> 01:25:32,430
So is this algorithm touching these elements, again and again and again.
1894
01:25:32,430 --> 01:25:34,970
And this is why the n squared is really a thing.
1895
01:25:34,970 --> 01:25:37,322
There's got to be some inherent redundancy here.
1896
01:25:37,322 --> 01:25:40,280
Like, why do we keep looking at the same darn elements again and again?
1897
01:25:40,280 --> 01:25:43,070
We do, in terms of our pseudocode need to do so.
1898
01:25:43,070 --> 01:25:46,670
But it's this redundant comparisons that kind of explains
1899
01:25:46,670 --> 01:25:48,782
why n squared is indeed the case.
1900
01:25:48,782 --> 01:25:49,490
So now it's done.
1901
01:25:49,490 --> 01:25:50,977
Small bars here, big bars there.
1902
01:25:50,977 --> 01:25:53,060
And I had to just keep talking there to kill time,
1903
01:25:53,060 --> 01:25:54,650
because it's relatively slow.
1904
01:25:54,650 --> 01:25:57,182
Well, let me re-randomize the array, just
1905
01:25:57,182 --> 01:25:58,640
so we start with a different order.
1906
01:25:58,640 --> 01:26:00,380
And now let me click on bubble sort.
1907
01:26:00,380 --> 01:26:03,240
And you'll see similar idea, but different algorithm.
1908
01:26:03,240 --> 01:26:06,620
So now, the two bars in pink are the two that
1909
01:26:06,620 --> 01:26:09,995
are being compared and fixed, potentially, if they're out of order.
1910
01:26:09,995 --> 01:26:11,870
And you can see already that the biggest bars
1911
01:26:11,870 --> 01:26:14,420
are bubbling their way up to the top.
1912
01:26:14,420 --> 01:26:17,510
But now, you can also see this redundancy,
1913
01:26:17,510 --> 01:26:20,450
like we keep swooping through the list again and again, just
1914
01:26:20,450 --> 01:26:22,740
like I kept walking back and forth.
1915
01:26:22,740 --> 01:26:23,795
And this is n squared.
1916
01:26:23,795 --> 01:26:24,920
This is not that many bars.
1917
01:26:24,920 --> 01:26:25,420
What?
1918
01:26:25,420 --> 01:26:27,830
10, 20, there's like 40 or something bars, I'm guessing.
1919
01:26:27,830 --> 01:26:31,560
That's pretty slow already just to sort 40 numbers.
1920
01:26:31,560 --> 01:26:34,310
And I think it's going to get tedious if I keep talking over this.
1921
01:26:34,310 --> 01:26:37,590
So let's just assume that this too is relatively slow.
1922
01:26:37,590 --> 01:26:41,390
Had I gotten lucky and the list were almost sorted already,
1923
01:26:41,390 --> 01:26:43,310
bubble sort would have been pretty fast.
1924
01:26:43,310 --> 01:26:46,040
But this was a truly random array, so we did not get lucky.
1925
01:26:46,040 --> 01:26:50,010
So indeed, the worst case might be what's kicking in here.
1926
01:26:50,010 --> 01:26:54,470
So I feel like it'll be anticlimactic, like holding in a sneeze, if I
1927
01:26:54,470 --> 01:26:55,980
don't let you see the end of this.
1928
01:26:55,980 --> 01:26:57,890
So here we go.
1929
01:26:57,890 --> 01:27:00,110
Nothing interesting is about to happen.
1930
01:27:00,110 --> 01:27:02,330
Almost done.
1931
01:27:02,330 --> 01:27:03,080
OK, done.
1932
01:27:03,080 --> 01:27:05,890
All right, so thank you.
1933
01:27:05,890 --> 01:27:06,710
[APPLAUSE]
1934
01:27:06,710 --> 01:27:09,110
Thank you.
1935
01:27:09,110 --> 01:27:12,500
So still somewhat slow though.
1936
01:27:12,500 --> 01:27:15,800
How though can we, perhaps, do a little better fundamentally?
1937
01:27:15,800 --> 01:27:19,070
So we can do so if we introduce yet another technique.
1938
01:27:19,070 --> 01:27:22,130
And this one isn't so much a function of code as it is concept.
1939
01:27:22,130 --> 01:27:25,290
And it's something that you might have seen in the real world,
1940
01:27:25,290 --> 01:27:27,500
but perhaps not so obviously so.
1941
01:27:27,500 --> 01:27:31,490
So it turns out, in programming, recursion
1942
01:27:31,490 --> 01:27:34,970
refers to the ability of a function to call itself.
1943
01:27:34,970 --> 01:27:37,460
In the world of mathematics, if you have a function f,
1944
01:27:37,460 --> 01:27:41,450
if f appears on both the left side and the right side of a formula,
1945
01:27:41,450 --> 01:27:43,850
that would be a recursive function in the math world too.
1946
01:27:43,850 --> 01:27:46,490
Whenever f is defined in terms of itself, or in our case,
1947
01:27:46,490 --> 01:27:51,800
in compute-- in programming, any time a function calls itself,
1948
01:27:51,800 --> 01:27:53,660
that function is said to be recursive.
1949
01:27:53,660 --> 01:27:55,790
And this is actually something we've seen already in class,
1950
01:27:55,790 --> 01:27:57,373
even though we didn't call it as much.
1951
01:27:57,373 --> 01:28:00,350
So for instance, consider this pseudocode
1952
01:28:00,350 --> 01:28:03,590
from earlier, whereby this was the pseudocode
1953
01:28:03,590 --> 01:28:07,760
for searching via binary search, a whole bunch of doors.
1954
01:28:07,760 --> 01:28:10,040
If no doors are left returned false, that
1955
01:28:10,040 --> 01:28:12,600
was the additional conditional we added.
1956
01:28:12,600 --> 01:28:14,930
But then if number behind middle door returned true,
1957
01:28:14,930 --> 01:28:18,980
and here's the interesting part, if number is less than middle door,
1958
01:28:18,980 --> 01:28:20,780
search the left half.
1959
01:28:20,780 --> 01:28:24,020
Else if number is greater than middle door, search the right half.
1960
01:28:24,020 --> 01:28:27,800
This pseudocode earlier was, itself, recursive.
1961
01:28:27,800 --> 01:28:28,340
Why?
1962
01:28:28,340 --> 01:28:30,590
Because here is an algorithm for searching.
1963
01:28:30,590 --> 01:28:32,650
But what's the algorithm telling us?
1964
01:28:32,650 --> 01:28:37,280
Well, on this line and this line, it's telling us to search something else.
1965
01:28:37,280 --> 01:28:40,720
So even though it's not explicitly defined in code as having a name,
1966
01:28:40,720 --> 01:28:44,410
if this is a search algorithm, and yet the search algorithm is using a search
1967
01:28:44,410 --> 01:28:47,650
algorithm, this pseudocode is recursive.
1968
01:28:47,650 --> 01:28:50,380
Now, that could quickly get you into trouble if a function just
1969
01:28:50,380 --> 01:28:53,410
calls itself again and again and again.
1970
01:28:53,410 --> 01:28:57,130
But why, intuitively, is it not problematic
1971
01:28:57,130 --> 01:29:01,840
that this code, this pseudocode, calls itself?
1972
01:29:01,840 --> 01:29:03,460
Why will the algorithm still stop?
1973
01:29:03,460 --> 01:29:03,970
Yeah.
1974
01:29:03,970 --> 01:29:07,525
STUDENT: It has an exit condition, as in if there is no doors left, [INAUDIBLE]..
1975
01:29:07,525 --> 01:29:08,400
DAVID MALAN: Exactly.
1976
01:29:08,400 --> 01:29:10,860
It has some exit condition, like if no doors left.
1977
01:29:10,860 --> 01:29:14,700
And more importantly, any time you search the left half,
1978
01:29:14,700 --> 01:29:17,120
you're searching a smaller version of the problem.
1979
01:29:17,120 --> 01:29:18,870
Any time you search the right half, you're
1980
01:29:18,870 --> 01:29:22,330
searching a smaller version of the problem, literally half the size.
1981
01:29:22,330 --> 01:29:24,270
So this is why, in the phone book, obviously
1982
01:29:24,270 --> 01:29:27,210
I couldn't tear the phone book in half infinitely many times,
1983
01:29:27,210 --> 01:29:29,560
because it was literally getting smaller each time.
1984
01:29:29,560 --> 01:29:33,580
So recursion is this ability to call yourself, if you will.
1985
01:29:33,580 --> 01:29:36,850
But what's important is that you do it on a smaller, smaller problem,
1986
01:29:36,850 --> 01:29:39,750
so that eventually, you have no more problems to solve
1987
01:29:39,750 --> 01:29:42,010
or no more data, no more doors at all.
1988
01:29:42,010 --> 01:29:46,210
So these two lines here would be the recursive elements here.
1989
01:29:46,210 --> 01:29:49,690
But if we go back to week 0, we could have used recursion in some other way.
1990
01:29:49,690 --> 01:29:53,040
So this was our pseudocode for the phone book back in week 0.
1991
01:29:53,040 --> 01:29:55,350
And recall that we described these yellow lines
1992
01:29:55,350 --> 01:29:59,050
as really representing a loop, some kind of cycle again and again.
1993
01:29:59,050 --> 01:30:01,080
But there was a missed opportunity here.
1994
01:30:01,080 --> 01:30:05,670
What if I had re-implemented this code to do this?
1995
01:30:05,670 --> 01:30:09,120
Instead of saying open to middle of left half of book
1996
01:30:09,120 --> 01:30:12,450
and then go back to line 3, like literally inducing a loop,
1997
01:30:12,450 --> 01:30:14,610
or open to middle of right half a book and go back
1998
01:30:14,610 --> 01:30:17,400
to line 3 inducing another loop, why don't I
1999
01:30:17,400 --> 01:30:19,590
just recognize that what I'm staring at now
2000
01:30:19,590 --> 01:30:23,730
is a algorithm for searching a phone book?
2001
01:30:23,730 --> 01:30:27,480
And if you want to search a smaller phone book, like A through M or N
2002
01:30:27,480 --> 01:30:30,750
through Z, we'll just use this same algorithm.
2003
01:30:30,750 --> 01:30:35,100
So I can replace these yellow lines with just this, casually speaking.
2004
01:30:35,100 --> 01:30:37,282
Search left half of book, search right half of book.
2005
01:30:37,282 --> 01:30:39,240
This would be implicitly, and now I can shorten
2006
01:30:39,240 --> 01:30:42,360
the whole thing, a recursive implementation of the phone book
2007
01:30:42,360 --> 01:30:43,633
pseudocode from week 0.
2008
01:30:43,633 --> 01:30:46,050
And it's recursive, because if this is a search algorithm,
2009
01:30:46,050 --> 01:30:48,900
and you're saying go search something else, that's fine.
2010
01:30:48,900 --> 01:30:49,890
That's recursive.
2011
01:30:49,890 --> 01:30:52,440
But because you're searching half of the phone book,
2012
01:30:52,440 --> 01:30:55,710
it's indeed going to get smaller and smaller.
2013
01:30:55,710 --> 01:30:59,400
Even in the real world or the real real virtual world,
2014
01:30:59,400 --> 01:31:01,903
you can see recursive data structures in the wild,
2015
01:31:01,903 --> 01:31:03,820
or at least in Super Mario Brothers like this.
2016
01:31:03,820 --> 01:31:05,612
Let me get rid of all the distractions here
2017
01:31:05,612 --> 01:31:08,790
and focus on this pyramid, where you have one block,
2018
01:31:08,790 --> 01:31:10,860
then two, then three, then four.
2019
01:31:10,860 --> 01:31:14,710
Well, this itself, is technically recursively-defined in the sense that,
2020
01:31:14,710 --> 01:31:16,590
well, what is a pyramid of height for?
2021
01:31:16,590 --> 01:31:18,420
Well, it's really, what?
2022
01:31:18,420 --> 01:31:21,090
How would you describe a pyramid of height 4
2023
01:31:21,090 --> 01:31:25,743
is actually the same thing as a pyramid of--
2024
01:31:25,743 --> 01:31:28,200
STUDENT: Height 3.
2025
01:31:28,200 --> 01:31:30,750
DAVID MALAN: --of height 3, plus 1 additional layer.
2026
01:31:30,750 --> 01:31:32,370
Well, what's a pyramid of height 3?
2027
01:31:32,370 --> 01:31:36,250
Well, it's technically a pyramid of height 2 plus 1 additional layer.
2028
01:31:36,250 --> 01:31:38,670
And so even physical structures can be recursive
2029
01:31:38,670 --> 01:31:40,630
if you can define them in terms of itself.
2030
01:31:40,630 --> 01:31:44,280
Now, at some point, you have to say that if the pyramid is of height 1,
2031
01:31:44,280 --> 01:31:46,090
there's just one block.
2032
01:31:46,090 --> 01:31:49,020
You can't forever say it's defined in terms of a height negative 1,
2033
01:31:49,020 --> 01:31:50,440
negative 2, you would never stop.
2034
01:31:50,440 --> 01:31:52,752
So you have to kind of have a special case there.
2035
01:31:52,752 --> 01:31:55,710
But let's go ahead and translate something like this, in fact, to code.
2036
01:31:55,710 --> 01:32:00,390
Let me go back to VS code here, and let me implement a program called iteration
2037
01:32:00,390 --> 01:32:03,090
that refers to a loop iterating.
2038
01:32:03,090 --> 01:32:05,620
And let me implement a very simple pyramid like that.
2039
01:32:05,620 --> 01:32:08,370
So let me go ahead and include the CS50 library.
2040
01:32:08,370 --> 01:32:14,918
I'll include our standard io.h int main void, no command line arguments today.
2041
01:32:14,918 --> 01:32:16,210
And let's go ahead and do this.
2042
01:32:16,210 --> 01:32:18,900
Let's declare a variable called height, ask the human
2043
01:32:18,900 --> 01:32:21,150
for the height of this pyramid.
2044
01:32:21,150 --> 01:32:25,300
And then let's go ahead and draw a pyramid of that height.
2045
01:32:25,300 --> 01:32:27,580
Now, of course, draw does not yet exist.
2046
01:32:27,580 --> 01:32:30,090
So I'm going to need to invent the draw function.
2047
01:32:30,090 --> 01:32:33,180
Let me go ahead and define a function that doesn't have a return value.
2048
01:32:33,180 --> 01:32:34,722
It's just going to have side effects.
2049
01:32:34,722 --> 01:32:37,230
It's just going to print bricks on the screen, called draw.
2050
01:32:37,230 --> 01:32:40,240
And it takes in an integer, n, as its input.
2051
01:32:40,240 --> 01:32:41,950
And how am I going to implement this?
2052
01:32:41,950 --> 01:32:46,530
Well again, I want to print one block, then two, then three, then four.
2053
01:32:46,530 --> 01:32:49,680
That's pretty straightforward, at least once you're comfortable with loops.
2054
01:32:49,680 --> 01:32:51,370
Let me go back to the code here.
2055
01:32:51,370 --> 01:32:55,170
Let me go ahead and say 4, int i, get 0.
2056
01:32:55,170 --> 01:32:56,880
i is less than n.
2057
01:32:56,880 --> 01:32:58,260
i plus plus.
2058
01:32:58,260 --> 01:33:01,170
And that's going to iterate, essentially row by row.
2059
01:33:01,170 --> 01:33:05,370
And on each row, I want to print out one, then two, then three, then
2060
01:33:05,370 --> 01:33:06,060
four bricks.
2061
01:33:06,060 --> 01:33:08,815
But I'm iterating from 0 to 1 to 2 to 3.
2062
01:33:08,815 --> 01:33:09,690
So I think that's OK.
2063
01:33:09,690 --> 01:33:13,020
I can just say something like 4 int j get 0.
2064
01:33:13,020 --> 01:33:17,160
j, let's be clever about this, is less than i.
2065
01:33:17,160 --> 01:33:19,380
j++.
2066
01:33:19,380 --> 01:33:22,560
And now, let me go ahead and, inside of this loop,
2067
01:33:22,560 --> 01:33:27,130
I think I can get away with just printing out a single hash sign.
2068
01:33:27,130 --> 01:33:30,270
But then outside of that loop, similar to last week,
2069
01:33:30,270 --> 01:33:32,920
I'm going to print my new line separately.
2070
01:33:32,920 --> 01:33:34,470
So a little non-obvious at first.
2071
01:33:34,470 --> 01:33:38,790
But this outer loop iterates row by row, line by line, if you will.
2072
01:33:38,790 --> 01:33:46,890
And then the inner loop just makes sure that when i equals zero, let's see.
2073
01:33:46,890 --> 01:33:48,960
Oh nope, there's a bug.
2074
01:33:48,960 --> 01:33:52,170
I need to make sure that it's j is less than i plus 1.
2075
01:33:52,170 --> 01:33:55,500
So when i is 0 on my first line of output,
2076
01:33:55,500 --> 01:33:57,600
I'm going to print out one brick.
2077
01:33:57,600 --> 01:34:02,350
When i is 1, I'm going to print out two bricks and so forth.
2078
01:34:02,350 --> 01:34:05,460
So let me go ahead and run make iteration.
2079
01:34:05,460 --> 01:34:09,090
All right, and now, seems to compile.
2080
01:34:09,090 --> 01:34:10,770
Uh oh, huh.
2081
01:34:10,770 --> 01:34:12,900
Implicit declaration of function draw.
2082
01:34:12,900 --> 01:34:16,100
So I'm making week one mistakes again.
2083
01:34:16,100 --> 01:34:16,660
What?
2084
01:34:16,660 --> 01:34:17,570
Say again.
2085
01:34:17,570 --> 01:34:18,450
STUDENT: [INAUDIBLE]
2086
01:34:18,450 --> 01:34:19,200
DAVID MALAN: Yeah.
2087
01:34:19,200 --> 01:34:20,320
The prototype is missing.
2088
01:34:20,320 --> 01:34:21,300
I didn't declare it at the top.
2089
01:34:21,300 --> 01:34:23,550
That's an easy fix, and the only time, really, it's
2090
01:34:23,550 --> 01:34:25,530
OK and necessary to copy paste.
2091
01:34:25,530 --> 01:34:29,050
Let me copy the functions declaration there and it with a semicolon.
2092
01:34:29,050 --> 01:34:32,370
So that clang now knows that draw will exist.
2093
01:34:32,370 --> 01:34:33,240
Make iteration.
2094
01:34:33,240 --> 01:34:33,930
Now it works.
2095
01:34:33,930 --> 01:34:36,090
Thank you. dot slash iteration.
2096
01:34:36,090 --> 01:34:37,830
We'll type in something like 4.
2097
01:34:37,830 --> 01:34:40,860
And there we have it, our pyramid of height one, two, three, four,
2098
01:34:40,860 --> 01:34:43,340
that looks pretty similar to this, albeit using hashes.
2099
01:34:43,340 --> 01:34:46,590
So that's how we would have implemented this, like, two weeks ago in week one,
2100
01:34:46,590 --> 01:34:49,110
maybe last week, but just using arrays.
2101
01:34:49,110 --> 01:34:53,640
But let me propose that we could do something recursively instead.
2102
01:34:53,640 --> 01:34:55,480
Let me close this version of the code.
2103
01:34:55,480 --> 01:34:59,730
And let me go back to VS Code and open up recursion.c,
2104
01:34:59,730 --> 01:35:01,800
just to demonstrate something recursively.
2105
01:35:01,800 --> 01:35:04,420
And I'll do it incorrectly deliberately the first time.
2106
01:35:04,420 --> 01:35:06,630
So let me include cs50.h.
2107
01:35:06,630 --> 01:35:08,850
Let me include standard io.h.
2108
01:35:08,850 --> 01:35:12,000
Let me do int main void.
2109
01:35:12,000 --> 01:35:17,910
And let me just blindly draw a pyramid initially of height 1.
2110
01:35:17,910 --> 01:35:21,910
But now in my draw function, let me re-implement it a little differently.
2111
01:35:21,910 --> 01:35:24,840
So my draw function this time is still going to take a number n.
2112
01:35:24,840 --> 01:35:26,860
But that's how many hashes it's going to print.
2113
01:35:26,860 --> 01:35:30,030
So let's do 4, int i get 0.
2114
01:35:30,030 --> 01:35:32,220
i is less than n.
2115
01:35:32,220 --> 01:35:34,050
i++.
2116
01:35:34,050 --> 01:35:38,440
Then let's go ahead and print out a single hash mark here.
2117
01:35:38,440 --> 01:35:44,290
And then after that, let's print out the end of the line, just as before.
2118
01:35:44,290 --> 01:35:49,770
But now this, of course, is only going to draw a single row.
2119
01:35:49,770 --> 01:35:53,790
It's going to print out one hash or two hashes or three hashes, but only
2120
01:35:53,790 --> 01:35:54,750
on one line.
2121
01:35:54,750 --> 01:35:58,560
Let me now, incorrectly, but just kind of curiously say, all right.
2122
01:35:58,560 --> 01:36:01,230
Well, if this draws a pyramid of height 1,
2123
01:36:01,230 --> 01:36:04,860
let's just use ourselves to draw a pyramid of height n plus 1.
2124
01:36:04,860 --> 01:36:08,370
So the first time I call draw, it will print out one hash.
2125
01:36:08,370 --> 01:36:12,870
Then the second time I call draw, it will print out two hashes, then three,
2126
01:36:12,870 --> 01:36:13,770
then four.
2127
01:36:13,770 --> 01:36:18,000
So we're kind of laying these bricks down from top to bottom.
2128
01:36:18,000 --> 01:36:20,670
Make recursion.
2129
01:36:20,670 --> 01:36:22,420
Whoops, I screwed up again.
2130
01:36:22,420 --> 01:36:24,630
So let's copy the prototype here.
2131
01:36:24,630 --> 01:36:27,260
Let's put this down over here, semicolon.
2132
01:36:27,260 --> 01:36:28,600
Let's do this again.
2133
01:36:28,600 --> 01:36:30,010
Make recursion.
2134
01:36:30,010 --> 01:36:32,410
All right, all good, dot slash recursion.
2135
01:36:32,410 --> 01:36:34,930
And now let me increase the size of my terminal window,
2136
01:36:34,930 --> 01:36:37,310
just so you can see more of the output.
2137
01:36:37,310 --> 01:36:39,490
And here we have.
2138
01:36:39,490 --> 01:36:41,480
OK, bad, but thank you.
2139
01:36:41,480 --> 01:36:43,525
So we have an infinitely tall pyramid.
2140
01:36:43,525 --> 01:36:45,400
And it's just flying across the screen, which
2141
01:36:45,400 --> 01:36:47,020
is why it looks kind of like a mess.
2142
01:36:47,020 --> 01:36:51,670
But I printed out a pyramid of height 1, and then 2, and then 3, and then 4.
2143
01:36:51,670 --> 01:36:55,210
And unfortunately, what am I lacking any sort of quick condition,
2144
01:36:55,210 --> 01:36:58,270
any kind of condition that says, wait a minute, when it's too tall,
2145
01:36:58,270 --> 01:36:59,353
stop altogether.
2146
01:36:59,353 --> 01:37:00,520
So this is an infinite loop.
2147
01:37:00,520 --> 01:37:01,570
But it's not a loop.
2148
01:37:01,570 --> 01:37:03,250
It's a recursive call.
2149
01:37:03,250 --> 01:37:05,780
And actually, doing this in general, is very bad.
2150
01:37:05,780 --> 01:37:08,410
We'll see next week that if you call a function too many times,
2151
01:37:08,410 --> 01:37:11,967
you can actually trigger yet another of those segmentation faults,
2152
01:37:11,967 --> 01:37:14,050
because you're using too much memory, essentially.
2153
01:37:14,050 --> 01:37:16,300
But for now, I haven't triggered that yet.
2154
01:37:16,300 --> 01:37:17,927
Control C is your friend to cancel.
2155
01:37:17,927 --> 01:37:21,010
And as an aside, if you're playing along at home or playing with this code
2156
01:37:21,010 --> 01:37:22,750
later, I actually cheated here.
2157
01:37:22,750 --> 01:37:25,360
We have a special clang configuration feature
2158
01:37:25,360 --> 01:37:28,392
that prevents you from calling a function like that
2159
01:37:28,392 --> 01:37:29,350
and creating a problem.
2160
01:37:29,350 --> 01:37:31,598
I overrode it just for demonstration sake.
2161
01:37:31,598 --> 01:37:34,640
But odds are at home, you wouldn't be able to compile this code yourself.
2162
01:37:34,640 --> 01:37:39,050
But let me do a proper version recursively of this code as follows.
2163
01:37:39,050 --> 01:37:41,870
Let me go back into the code here.
2164
01:37:41,870 --> 01:37:45,130
Let me go ahead and, not just blindly start drawing one, then two,
2165
01:37:45,130 --> 01:37:46,540
then three layers of bricks.
2166
01:37:46,540 --> 01:37:50,380
Let me prompt the human as before for the height of the pyramid
2167
01:37:50,380 --> 01:37:53,350
they want using our get int function.
2168
01:37:53,350 --> 01:37:55,670
And now let me call draw of height again.
2169
01:37:55,670 --> 01:37:58,330
So now I'm going back to the loop-like version.
2170
01:37:58,330 --> 01:38:03,250
But instead of using a loop now, this is where recursion gets rather elegant,
2171
01:38:03,250 --> 01:38:04,120
if you will.
2172
01:38:04,120 --> 01:38:10,690
Let me go ahead and execute and code the draw function as follows.
2173
01:38:10,690 --> 01:38:14,050
Per your definition, if a pyramid of height 4
2174
01:38:14,050 --> 01:38:17,437
is really just a pyramid of height 3 plus another row, well,
2175
01:38:17,437 --> 01:38:18,520
let's take that literally.
2176
01:38:18,520 --> 01:38:19,990
Let me go back to my code.
2177
01:38:19,990 --> 01:38:24,340
And if you want to draw a pyramid of height 4, well go right ahead
2178
01:38:24,340 --> 01:38:29,380
and draw a pyramid of height 3 first, or more generally, n minus 1.
2179
01:38:29,380 --> 01:38:30,640
But what's the second step?
2180
01:38:30,640 --> 01:38:34,510
Well, once you've drawn a pyramid of height 3, draw an extra row.
2181
01:38:34,510 --> 01:38:37,190
So I at least have to bite off that part of the problem myself.
2182
01:38:37,190 --> 01:38:39,310
So let me just do for int i get 0.
2183
01:38:39,310 --> 01:38:41,530
i is less than n i++.
2184
01:38:41,530 --> 01:38:46,010
And let me, the programmer of this function, print out my hashes.
2185
01:38:46,010 --> 01:38:48,340
And then at the very bottom, print out a new line
2186
01:38:48,340 --> 01:38:50,350
so the cursor moves to the next line.
2187
01:38:50,350 --> 01:38:55,660
But this is kind of elegant now, I dare say, in that draw is recursive,
2188
01:38:55,660 --> 01:38:58,570
because I'm literally translating from English to C code,
2189
01:38:58,570 --> 01:39:02,050
this idea that a pyramid of height 4 is really just a pyramid of height 3.
2190
01:39:02,050 --> 01:39:03,640
So I do that first.
2191
01:39:03,640 --> 01:39:06,560
And I'm sort of trusting that this will work.
2192
01:39:06,560 --> 01:39:09,800
Then I just have to lay one more layer of bricks, four of them.
2193
01:39:09,800 --> 01:39:13,030
So if n is 4, this is just a simple for loop, a la week 1,
2194
01:39:13,030 --> 01:39:15,520
that will print out an additional layer.
2195
01:39:15,520 --> 01:39:18,610
But this, of course, is going to be problematic eventually.
2196
01:39:18,610 --> 01:39:20,030
Why?
2197
01:39:20,030 --> 01:39:22,670
It's not done yet, this program.
2198
01:39:22,670 --> 01:39:27,644
How many times will draw call itself in this model?
2199
01:39:27,644 --> 01:39:28,640
STUDENT: It's infinite.
2200
01:39:28,640 --> 01:39:30,098
DAVID MALAN: Infinitely many times.
2201
01:39:30,098 --> 01:39:30,814
Why?
2202
01:39:30,814 --> 01:39:34,170
STUDENT: Because there's no quit function.
2203
01:39:34,170 --> 01:39:36,450
DAVID MALAN: Yeah, there's no equivalent of quit.
2204
01:39:36,450 --> 01:39:39,810
Like, if you've printed enough already, then quit, well,
2205
01:39:39,810 --> 01:39:41,050
how do we capture that?
2206
01:39:41,050 --> 01:39:43,320
Well, I don't think we want this to go negative.
2207
01:39:43,320 --> 01:39:46,570
It would make no sense to draw a negative height pyramid.
2208
01:39:46,570 --> 01:39:51,270
So I think we can just pluck off, as the programmer, an easy case,
2209
01:39:51,270 --> 01:39:53,650
an easy answer, a so-called base case.
2210
01:39:53,650 --> 01:39:54,900
And I'm just going to do this.
2211
01:39:54,900 --> 01:40:00,030
At the top of my draw function, let me just say, if n is less than
2212
01:40:00,030 --> 01:40:02,830
or, heck, less than or equal to 0, that's it.
2213
01:40:02,830 --> 01:40:04,530
Go ahead and just return.
2214
01:40:04,530 --> 01:40:06,030
There's nothing more to do.
2215
01:40:06,030 --> 01:40:10,680
And that simple condition, technically known as a base case,
2216
01:40:10,680 --> 01:40:13,290
will ensure that the code doesn't run forever.
2217
01:40:13,290 --> 01:40:13,860
Why?
2218
01:40:13,860 --> 01:40:17,730
Well, suppose that draw is called with an argument of 4.
2219
01:40:17,730 --> 01:40:20,580
4 is, of course, not less than 0, so we don't return.
2220
01:40:20,580 --> 01:40:22,590
But we do draw a pyramid of height 3.
2221
01:40:22,590 --> 01:40:24,870
And here's where things get a little mentally tricky.
2222
01:40:24,870 --> 01:40:28,320
You don't move on to line 20 until draw has been called.
2223
01:40:28,320 --> 01:40:31,080
So when draw is called with an argument of 3,
2224
01:40:31,080 --> 01:40:34,230
it's as though you're executing from the top of this function again.
2225
01:40:34,230 --> 01:40:35,520
3 is not less than 0.
2226
01:40:35,520 --> 01:40:36,330
So what do you do?
2227
01:40:36,330 --> 01:40:38,490
You draw 2.
2228
01:40:38,490 --> 01:40:39,540
How do you draw 2?
2229
01:40:39,540 --> 01:40:41,950
Well, 2 is not less than 0, so you don't return.
2230
01:40:41,950 --> 01:40:43,050
So you draw 1.
2231
01:40:43,050 --> 01:40:44,370
Got to be careful here.
2232
01:40:44,370 --> 01:40:45,240
Draw 1.
2233
01:40:45,240 --> 01:40:47,340
And now, we go ahead back to the beginning.
2234
01:40:47,340 --> 01:40:48,090
How do you draw 1?
2235
01:40:48,090 --> 01:40:50,430
Well, 1 is not less than 0, so you don't return.
2236
01:40:50,430 --> 01:40:53,400
You draw height 0.
2237
01:40:53,400 --> 01:40:54,510
How do you draw height 0?
2238
01:40:54,510 --> 01:40:55,110
Wait a minute.
2239
01:40:55,110 --> 01:40:57,660
0 is less than or equal to 0.
2240
01:40:57,660 --> 01:40:58,980
And you return.
2241
01:40:58,980 --> 01:41:02,100
And so it's kind of like this mental stack, this to do list.
2242
01:41:02,100 --> 01:41:05,190
You keep postponing, executing these lower lines of code,
2243
01:41:05,190 --> 01:41:08,700
because you keep restarting, restarting, restarting the draw function
2244
01:41:08,700 --> 01:41:12,840
until, finally, one of those function calls says there's nothing to do,
2245
01:41:12,840 --> 01:41:13,530
return.
2246
01:41:13,530 --> 01:41:16,530
And now the whole thing starts to unravel, if you will.
2247
01:41:16,530 --> 01:41:18,330
And you pick back up where you left off.
2248
01:41:18,330 --> 01:41:20,300
And this is, perhaps, the best scenario.
2249
01:41:20,300 --> 01:41:21,300
We won't do it in class.
2250
01:41:21,300 --> 01:41:23,508
But if you'd like to wrestle through this on your own
2251
01:41:23,508 --> 01:41:27,780
using debug50 to keep stepping into, step into, step into, each
2252
01:41:27,780 --> 01:41:31,480
of those lines, logically, you'll see exactly what's actually happening.
2253
01:41:31,480 --> 01:41:34,330
So let me go to my terminal and do make recursion,
2254
01:41:34,330 --> 01:41:37,740
which is now this correct version of the code, dot slash recursion.
2255
01:41:37,740 --> 01:41:39,240
Let's type in a height of 4.
2256
01:41:39,240 --> 01:41:44,730
And voila, now we have that same pyramid, not using iteration per se,
2257
01:41:44,730 --> 01:41:47,910
though admittedly, we're using iteration to print the additional layer.
2258
01:41:47,910 --> 01:41:53,340
We're now using draw recursively to print all of the smaller pyramids
2259
01:41:53,340 --> 01:41:55,120
that need come before it.
2260
01:41:55,120 --> 01:41:57,370
STUDENT: Can you only use recursion for void function?
2261
01:41:57,370 --> 01:41:58,123
[INAUDIBLE]
2262
01:41:58,123 --> 01:41:58,790
DAVID MALAN: No.
2263
01:41:58,790 --> 01:42:01,070
Question is, can you only use recursion with a void function?
2264
01:42:01,070 --> 01:42:01,920
No, not at all.
2265
01:42:01,920 --> 01:42:05,120
In fact, it's very common to have a return value like an integer
2266
01:42:05,120 --> 01:42:09,350
or something else so that you can actually do something constructively
2267
01:42:09,350 --> 01:42:11,360
with that actual value.
2268
01:42:11,360 --> 01:42:13,190
Other questions on this.
2269
01:42:13,190 --> 01:42:15,290
STUDENT: When is line 21 getting executed?
2270
01:42:15,290 --> 01:42:16,790
DAVID MALAN: Say it a little louder.
2271
01:42:16,790 --> 01:42:18,770
STUDENT: When is line 21 getting executed?
2272
01:42:18,770 --> 01:42:20,850
DAVID MALAN: When is line 21 getting executed?
2273
01:42:20,850 --> 01:42:23,720
So if you continue to--
2274
01:42:23,720 --> 01:42:26,600
let me scroll down a bit more so you can see the top of the code.
2275
01:42:26,600 --> 01:42:35,310
So line 21 will be executed once line 19 is done executing itself.
2276
01:42:35,310 --> 01:42:40,790
Now, in the story I told, we kept calling draw again, again, again.
2277
01:42:40,790 --> 01:42:43,400
But as soon as one of those function calls
2278
01:42:43,400 --> 01:42:46,490
where n equals 0 returns immediately, then we
2279
01:42:46,490 --> 01:42:48,510
don't keep drawing again and again.
2280
01:42:48,510 --> 01:42:51,110
So now if you kind of think of the process as reversing,
2281
01:42:51,110 --> 01:42:57,530
then you continue to line 21, then line 21 again, then line 21 again,
2282
01:42:57,530 --> 01:42:59,303
and as the sort of logic unravels.
2283
01:42:59,303 --> 01:43:01,970
And next week, we'll actually paint a picture of what's actually
2284
01:43:01,970 --> 01:43:03,530
happening in the computer's memory.
2285
01:43:03,530 --> 01:43:07,450
But for now, it's just, it's very similar to the pseudocode for the phone
2286
01:43:07,450 --> 01:43:07,950
book.
2287
01:43:07,950 --> 01:43:09,680
You're just searching again and again.
2288
01:43:09,680 --> 01:43:14,408
But you're waiting until the very end to get back the final result.
2289
01:43:14,408 --> 01:43:16,700
Google now, who I keep mentioning by coincidence today,
2290
01:43:16,700 --> 01:43:18,830
is full of programmers of course.
2291
01:43:18,830 --> 01:43:20,600
Here's a fun exercise.
2292
01:43:20,600 --> 01:43:23,432
Let me go back to a browser.
2293
01:43:23,432 --> 01:43:26,390
I'm going to go ahead and search for recursion, because I want to learn
2294
01:43:26,390 --> 01:43:27,980
a little something about recursion.
2295
01:43:27,980 --> 01:43:30,230
Here is kind of an internet meme or joke.
2296
01:43:30,230 --> 01:43:35,360
If I zoom in here, the engineers at Google are kind of funny.
2297
01:43:35,360 --> 01:43:37,902
See why?
2298
01:43:37,902 --> 01:43:38,798
STUDENT: Ah.
2299
01:43:38,798 --> 01:43:40,540
DAVID MALAN: Ah, there you go.
2300
01:43:40,540 --> 01:43:41,740
Yes.
2301
01:43:41,740 --> 01:43:43,030
Yes, this is recursion.
2302
01:43:43,030 --> 01:43:45,822
And there's going to be so many memes you'll come across now, where
2303
01:43:45,822 --> 01:43:48,490
recursion, like if you've ever pointed a camera at the TV that's
2304
01:43:48,490 --> 01:43:51,190
showing the camera, and you sort of see yourself or the image again and again,
2305
01:43:51,190 --> 01:43:52,660
that's really recursion.
2306
01:43:52,660 --> 01:43:56,320
And in that case, it only stops once you hit the base case of a single pixel.
2307
01:43:56,320 --> 01:43:59,350
But this is a very funny joke in some circles
2308
01:43:59,350 --> 01:44:01,880
when it comes to recursion and Google.
2309
01:44:01,880 --> 01:44:04,570
So how can we actually use Google, or rather,
2310
01:44:04,570 --> 01:44:08,050
how can we actually use recursion constructively?
2311
01:44:08,050 --> 01:44:11,320
Well, let me propose that we actually introduced
2312
01:44:11,320 --> 01:44:14,950
a third and final algorithm for sorting that hopefully does better
2313
01:44:14,950 --> 01:44:16,870
than the two sorts thus far.
2314
01:44:16,870 --> 01:44:19,480
We've done selection sort and bubble sort.
2315
01:44:19,480 --> 01:44:22,845
Bubble sort, we liked a little better, at least insofar as in the best case
2316
01:44:22,845 --> 01:44:24,220
where the list is already sorted.
2317
01:44:24,220 --> 01:44:26,387
Bubble sort's at least smarter, and it will actually
2318
01:44:26,387 --> 01:44:28,840
terminate early, giving us a better lower bound,
2319
01:44:28,840 --> 01:44:30,490
in terms of our omega notation.
2320
01:44:30,490 --> 01:44:33,100
But it turns out that recursion, and this is not
2321
01:44:33,100 --> 01:44:36,250
necessarily a feature of recursion, but something we can now leverage.
2322
01:44:36,250 --> 01:44:39,460
It turns out, using recursion, we can take a fundamentally different approach
2323
01:44:39,460 --> 01:44:43,090
to sorting a whole bunch of numbers in such a way
2324
01:44:43,090 --> 01:44:47,560
that we can do far fewer comparisons and, ideally, speed up
2325
01:44:47,560 --> 01:44:49,010
our final results.
2326
01:44:49,010 --> 01:44:51,970
So here is the pseudocode for what we're about to see
2327
01:44:51,970 --> 01:44:54,010
for something called merge sort.
2328
01:44:54,010 --> 01:44:56,230
And it really is this terse.
2329
01:44:56,230 --> 01:44:58,330
Sort the left half of numbers.
2330
01:44:58,330 --> 01:45:00,550
Sort the right half of numbers.
2331
01:45:00,550 --> 01:45:02,950
Merge the sorted halves.
2332
01:45:02,950 --> 01:45:06,310
This is almost sort of nonsensical, because if you're
2333
01:45:06,310 --> 01:45:09,880
asked for an algorithm to sort, and you respond with, well, sort the left half,
2334
01:45:09,880 --> 01:45:10,960
sort the right half.
2335
01:45:10,960 --> 01:45:14,230
That's being difficult, because well, I'm asking for a sorting algorithm.
2336
01:45:14,230 --> 01:45:16,897
You're just telling me to sort the left half and the right half.
2337
01:45:16,897 --> 01:45:21,130
But implicit in that last line, merging is a pretty powerful feature
2338
01:45:21,130 --> 01:45:21,760
of this sort.
2339
01:45:21,760 --> 01:45:23,908
Now, we do need another base case at the top.
2340
01:45:23,908 --> 01:45:24,700
So let me add this.
2341
01:45:24,700 --> 01:45:27,850
If we find ourselves with a list, an array, of size 1,
2342
01:45:27,850 --> 01:45:29,807
well, that array is obviously sorted.
2343
01:45:29,807 --> 01:45:32,390
If there's only one element in it, there's no work to be done.
2344
01:45:32,390 --> 01:45:33,890
So that's going to be our base case.
2345
01:45:33,890 --> 01:45:38,770
But allowing us now, in just these, what, four, six lines of pseudocode,
2346
01:45:38,770 --> 01:45:40,900
to actually sort some elements.
2347
01:45:40,900 --> 01:45:43,652
But let's focus first on just a subset of this.
2348
01:45:43,652 --> 01:45:46,360
Let's consider for a moment what it means to merge sorted halves.
2349
01:45:46,360 --> 01:45:48,760
So Carter has wonderfully come up to volunteer here just
2350
01:45:48,760 --> 01:45:50,170
to help us reset these numbers.
2351
01:45:50,170 --> 01:45:54,080
Suppose that in the middle of the story we're about to tell,
2352
01:45:54,080 --> 01:45:56,020
we have two sorted halves.
2353
01:45:56,020 --> 01:45:58,300
I've already sorted the left half of these numbers,
2354
01:45:58,300 --> 01:46:01,630
and indeed, 2, 4, 5, 7 is sorted from smallest to largest.
2355
01:46:01,630 --> 01:46:06,100
And the right half appears to be already sorted, 0, 1, 3, 6, already sorted.
2356
01:46:06,100 --> 01:46:09,370
So in my pseudocode, we're already done sorting the left half
2357
01:46:09,370 --> 01:46:10,630
and the right half somehow.
2358
01:46:10,630 --> 01:46:12,160
But we'll see how in a moment.
2359
01:46:12,160 --> 01:46:14,980
Well, how do I go about merging these two halves?
2360
01:46:14,980 --> 01:46:18,490
Well, because they're sorted already, and you want to merge them in order,
2361
01:46:18,490 --> 01:46:20,110
I think we can flip down.
2362
01:46:20,110 --> 01:46:25,030
We can hide all but the first numbers in each of these sublists.
2363
01:46:25,030 --> 01:46:28,125
So here, we have a half that starts with 2.
2364
01:46:28,125 --> 01:46:30,250
And I don't really care what the other numbers are,
2365
01:46:30,250 --> 01:46:32,060
because they're clearly larger than 2.
2366
01:46:32,060 --> 01:46:35,043
I can focus only on 2, and 0 too, 0 also.
2367
01:46:35,043 --> 01:46:37,960
We know that 0 is the smallest there, so let's just ignore the numbers
2368
01:46:37,960 --> 01:46:39,293
that Carter kindly flipped down.
2369
01:46:39,293 --> 01:46:44,360
So how do I merge these two lists into a new sorted larger list?
2370
01:46:44,360 --> 01:46:47,590
Well, I compare the two on my left with the 0
2371
01:46:47,590 --> 01:46:50,470
on my right, obviously, which comes first, the 0.
2372
01:46:50,470 --> 01:46:51,963
So let me put this down here.
2373
01:46:51,963 --> 01:46:54,130
And Carter, if you want to give us the next element.
2374
01:46:54,130 --> 01:46:55,960
Now I have two sorted halves.
2375
01:46:55,960 --> 01:46:57,650
But I've already plucked one off.
2376
01:46:57,650 --> 01:47:00,010
So now I compare the two against the 1.
2377
01:47:00,010 --> 01:47:01,580
1 obviously comes next.
2378
01:47:01,580 --> 01:47:04,843
So I'm going to take out the 1 and put it in place here.
2379
01:47:04,843 --> 01:47:06,760
Now I'm going to compare the two halves again.
2380
01:47:06,760 --> 01:47:08,830
2 and 3, which do I merge first?
2381
01:47:08,830 --> 01:47:10,660
Obviously the 2 comes next.
2382
01:47:10,660 --> 01:47:14,170
And now, notice, each time I do this, my hands are theoretically
2383
01:47:14,170 --> 01:47:15,220
making forward progress.
2384
01:47:15,220 --> 01:47:18,580
I'm not doubling back like I kept doing with selection sort or bubble
2385
01:47:18,580 --> 01:47:20,290
sort, back and forth, back and forth.
2386
01:47:20,290 --> 01:47:22,810
My fingers are constantly advancing forward,
2387
01:47:22,810 --> 01:47:24,310
and that's going to be a key detail.
2388
01:47:24,310 --> 01:47:27,340
So I compare 4 and 3, 3 obviously.
2389
01:47:27,340 --> 01:47:32,560
I compare 4 and 6, 4 obviously.
2390
01:47:32,560 --> 01:47:36,520
I compare 5 and 6, 5 obviously.
2391
01:47:36,520 --> 01:47:40,810
And then I compare 7 and 6, 6 of course.
2392
01:47:40,810 --> 01:47:43,000
And then lastly, we have just one element left.
2393
01:47:43,000 --> 01:47:45,460
And even though I'm kind of moving awkwardly as a human,
2394
01:47:45,460 --> 01:47:48,160
my hands technically were only moving to the right.
2395
01:47:48,160 --> 01:47:51,130
I was never looping back doing something again and again.
2396
01:47:51,130 --> 01:47:54,430
And that's, perhaps, the intuition, and just enough room for the 7.
2397
01:47:54,430 --> 01:47:58,210
So that, then, is how you would merge two sorted halves.
2398
01:47:58,210 --> 01:48:00,610
We started with left half sorted, right half sorted.
2399
01:48:00,610 --> 01:48:02,860
And merging is just like what you would do as a human.
2400
01:48:02,860 --> 01:48:04,568
And Carter just flipped the numbers down,
2401
01:48:04,568 --> 01:48:08,620
so our focus was only on the smallest elements in each.
2402
01:48:08,620 --> 01:48:13,030
Any questions before we forge ahead with what it means, then,
2403
01:48:13,030 --> 01:48:17,120
to be merged in this way?
2404
01:48:17,120 --> 01:48:18,777
So now, here is an original list.
2405
01:48:18,777 --> 01:48:20,860
We deliberately put it at the top, because there's
2406
01:48:20,860 --> 01:48:22,480
one detail of merge sort that's key.
2407
01:48:22,480 --> 01:48:25,490
Merge sort is technically going to use a little more space.
2408
01:48:25,490 --> 01:48:28,270
And so whereas, previously, we just kept moving our humans around
2409
01:48:28,270 --> 01:48:30,790
and swapping people and making sure they stayed ultimately
2410
01:48:30,790 --> 01:48:32,110
in the original positions.
2411
01:48:32,110 --> 01:48:36,700
With merge sort, pretends that here's our original array of memory.
2412
01:48:36,700 --> 01:48:38,970
I'm going to need at least one other ray of memory.
2413
01:48:38,970 --> 01:48:41,160
And I'm going to cheat, and I'm going to use even more memory.
2414
01:48:41,160 --> 01:48:43,440
But technically, I could actually go back and forth
2415
01:48:43,440 --> 01:48:45,540
between 1 array and a secondary array.
2416
01:48:45,540 --> 01:48:48,370
But it is going to take me more space.
2417
01:48:48,370 --> 01:48:53,130
So how do I go about implementing merge sort on this code?
2418
01:48:53,130 --> 01:48:54,930
Well, let's consider this.
2419
01:48:54,930 --> 01:48:57,060
Here is a array of size 8.
2420
01:48:57,060 --> 01:48:59,590
If only one number quit, obviously not applicable.
2421
01:48:59,590 --> 01:49:01,230
So let's focus on the juicy part there.
2422
01:49:01,230 --> 01:49:02,880
Sort the left half of the numbers.
2423
01:49:02,880 --> 01:49:05,130
All right, how do I sort the left half of the numbers?
2424
01:49:05,130 --> 01:49:09,240
I'm going to just nudge them over just to be clear, which is the left half.
2425
01:49:09,240 --> 01:49:11,850
Here is now a sublist of size 4.
2426
01:49:11,850 --> 01:49:14,860
How do I sort the left half?
2427
01:49:14,860 --> 01:49:17,380
Well, do I have an algorithm for sorting?
2428
01:49:17,380 --> 01:49:18,430
Yeah, what do I do?
2429
01:49:18,430 --> 01:49:19,492
Here's a list of size 4.
2430
01:49:19,492 --> 01:49:20,200
How do I sort it?
2431
01:49:20,200 --> 01:49:22,000
What's step one?
2432
01:49:22,000 --> 01:49:23,330
Sort the left half.
2433
01:49:23,330 --> 01:49:28,060
So I now sort of, conceptually in my mind, take this sublist of size 4.
2434
01:49:28,060 --> 01:49:32,740
And I sort it by first sorting the left half, focusing now on the 7 and 2.
2435
01:49:32,740 --> 01:49:34,330
All right, here's a list of size 2.
2436
01:49:34,330 --> 01:49:37,060
How do I sort a list of size 2?
2437
01:49:37,060 --> 01:49:38,740
STUDENT: [INAUDIBLE]
2438
01:49:38,740 --> 01:49:40,170
DAVID MALAN: Sorry?
2439
01:49:40,170 --> 01:49:42,360
I think we just keep following our instructions.
2440
01:49:42,360 --> 01:49:43,650
Sort the left half.
2441
01:49:43,650 --> 01:49:45,630
All right, here is a list of size 1.
2442
01:49:45,630 --> 01:49:48,417
How do I sort a list of size 1?
2443
01:49:48,417 --> 01:49:49,803
STUDENT: [INAUDIBLE]
2444
01:49:49,803 --> 01:49:50,720
DAVID MALAN: I'm done.
2445
01:49:50,720 --> 01:49:51,360
It's done.
2446
01:49:51,360 --> 01:49:52,740
So I leave this alone.
2447
01:49:52,740 --> 01:49:54,740
What was the next step in the story?
2448
01:49:54,740 --> 01:49:58,160
I've just sorted the left half of the left half of the left half.
2449
01:49:58,160 --> 01:49:59,580
What comes next?
2450
01:49:59,580 --> 01:50:03,260
I sort the right half of the left half of the left half,
2451
01:50:03,260 --> 01:50:06,260
and I'm done, because it's just a list of size 1.
2452
01:50:06,260 --> 01:50:09,280
What comes after this?
2453
01:50:09,280 --> 01:50:09,972
Merge.
2454
01:50:09,972 --> 01:50:11,680
So this is where it gets a little trippy,
2455
01:50:11,680 --> 01:50:14,770
because you have to remember where we're pausing the story to do things
2456
01:50:14,770 --> 01:50:16,187
recursively again and again.
2457
01:50:16,187 --> 01:50:19,270
But if I've just sorted the left half and I've just sorted the right half,
2458
01:50:19,270 --> 01:50:20,890
now I merge them together.
2459
01:50:20,890 --> 01:50:25,040
This is a super short list, so we don't need Carter's help here as before.
2460
01:50:25,040 --> 01:50:27,640
But I think the first number I take here is the 2.
2461
01:50:27,640 --> 01:50:31,660
And then the second number I take, because it's the only option, is the 7.
2462
01:50:31,660 --> 01:50:35,260
But what's nice now is that, notice, the left half of the left half
2463
01:50:35,260 --> 01:50:39,228
is indeed sorted, because I trivially sorted the left half of it
2464
01:50:39,228 --> 01:50:40,270
and the right half of it.
2465
01:50:40,270 --> 01:50:42,760
But then merging is really where the magic happens.
2466
01:50:42,760 --> 01:50:45,670
All right, again, if you rewind now in your mind,
2467
01:50:45,670 --> 01:50:51,300
if I've just sorted the left half of the left half, what happens next?
2468
01:50:51,300 --> 01:50:55,000
Sort the right half of the left half.
2469
01:50:55,000 --> 01:50:56,980
So again, you kind of rewind in time.
2470
01:50:56,980 --> 01:50:58,290
So how do I do this?
2471
01:50:58,290 --> 01:50:59,520
I've got a list of size 2.
2472
01:50:59,520 --> 01:51:01,920
I sort the left half, just the 5, done.
2473
01:51:01,920 --> 01:51:04,200
Sort the right half, 4, done.
2474
01:51:04,200 --> 01:51:08,910
Now the interesting part, I merge the left half and the right half
2475
01:51:08,910 --> 01:51:11,380
of the right half of the left half.
2476
01:51:11,380 --> 01:51:12,450
So what do I do?
2477
01:51:12,450 --> 01:51:14,280
4 comes down here.
2478
01:51:14,280 --> 01:51:16,260
5 comes down here.
2479
01:51:16,260 --> 01:51:19,860
And now, notice what I have.
2480
01:51:19,860 --> 01:51:21,600
Left half is sorted.
2481
01:51:21,600 --> 01:51:23,130
Right half is sorted.
2482
01:51:23,130 --> 01:51:26,610
If you rewind in time, where is my next step, 3?
2483
01:51:26,610 --> 01:51:27,742
Merge the two halves.
2484
01:51:27,742 --> 01:51:29,700
And so this is what Carter helped me do before.
2485
01:51:29,700 --> 01:51:31,290
Let's focus only on the smallest elements,
2486
01:51:31,290 --> 01:51:32,665
just so there's less distraction.
2487
01:51:32,665 --> 01:51:34,020
I compare the 2 and the 4.
2488
01:51:34,020 --> 01:51:36,520
2 comes first, so let's obviously put that here.
2489
01:51:36,520 --> 01:51:39,570
Now, I compare the new beginning of this list
2490
01:51:39,570 --> 01:51:41,280
and the old beginning of this list.
2491
01:51:41,280 --> 01:51:43,050
4 obviously comes next.
2492
01:51:43,050 --> 01:51:45,940
And now, I compare the 7 against the 5.
2493
01:51:45,940 --> 01:51:47,430
5 obviously comes next.
2494
01:51:47,430 --> 01:51:49,240
And now, lastly, I'm left with one number.
2495
01:51:49,240 --> 01:51:50,970
So now I'm down to the 7.
2496
01:51:50,970 --> 01:51:53,490
So even if you've kind of lost track of some of the nuances
2497
01:51:53,490 --> 01:51:55,510
here, if you just kind of take a step back,
2498
01:51:55,510 --> 01:51:58,260
we have the original right half here still untouched.
2499
01:51:58,260 --> 01:52:03,210
But the left half of the original input is now, indeed, sorted,
2500
01:52:03,210 --> 01:52:07,140
all by way of doing sorting left half, right half, left half, right half,
2501
01:52:07,140 --> 01:52:08,940
but with those merges in between.
2502
01:52:08,940 --> 01:52:11,965
All right, so if we've just sorted the left half,
2503
01:52:11,965 --> 01:52:13,590
we rewind all the way to the beginning.
2504
01:52:13,590 --> 01:52:15,590
What do I now do?
2505
01:52:15,590 --> 01:52:17,120
All right, so sort the right half.
2506
01:52:17,120 --> 01:52:18,410
So sort the right half.
2507
01:52:18,410 --> 01:52:20,180
How do I sort a list of size 4?
2508
01:52:20,180 --> 01:52:22,550
Well, I first sort the left half, the 1 and the 6.
2509
01:52:22,550 --> 01:52:24,560
How do I sort a list of size 2?
2510
01:52:24,560 --> 01:52:26,757
You sort the left half, just the number 1.
2511
01:52:26,757 --> 01:52:28,340
Obviously, there's no work to be done.
2512
01:52:28,340 --> 01:52:30,620
Done, sorting the left half.
2513
01:52:30,620 --> 01:52:33,080
6, done, sorting the right half.
2514
01:52:33,080 --> 01:52:34,280
Now, what do I do?
2515
01:52:34,280 --> 01:52:40,610
I merge the left half here with the right half here.
2516
01:52:40,610 --> 01:52:42,240
And that one's pretty straightforward.
2517
01:52:42,240 --> 01:52:43,050
Now, what do I do?
2518
01:52:43,050 --> 01:52:43,910
I've just merged.
2519
01:52:43,910 --> 01:52:45,048
So now I sort it.
2520
01:52:45,048 --> 01:52:47,090
I've just sorted the left half of the right half.
2521
01:52:47,090 --> 01:52:49,550
So now I sort the right half of the right half.
2522
01:52:49,550 --> 01:52:51,590
So I consider the 0, done.
2523
01:52:51,590 --> 01:52:53,270
I consider the 3, done.
2524
01:52:53,270 --> 01:52:55,040
I now merge these two together.
2525
01:52:55,040 --> 01:52:56,640
0, of course, comes first.
2526
01:52:56,640 --> 01:52:58,100
Then comes the 3.
2527
01:52:58,100 --> 01:53:00,680
And now I'm at the point of the story where
2528
01:53:00,680 --> 01:53:02,930
I've sorted the left half of the right half
2529
01:53:02,930 --> 01:53:04,910
and the right half of the right half.
2530
01:53:04,910 --> 01:53:07,535
So step 3 is merge.
2531
01:53:07,535 --> 01:53:09,410
And I'll do it again like we did with Carter.
2532
01:53:09,410 --> 01:53:12,320
All right, 1 and 0, obviously the 0 comes first.
2533
01:53:12,320 --> 01:53:14,390
Now, compare the 1 and the 3.
2534
01:53:14,390 --> 01:53:16,130
Obviously, the 1 comes first.
2535
01:53:16,130 --> 01:53:18,590
Compare the 6 and the 3, obviously the 3.
2536
01:53:18,590 --> 01:53:20,300
And then lastly, the 6.
2537
01:53:20,300 --> 01:53:21,890
So now, where are we?
2538
01:53:21,890 --> 01:53:26,840
We've taken the left half of the whole thing and sorted it.
2539
01:53:26,840 --> 01:53:29,990
We then took the right half of the whole thing and sorted it.
2540
01:53:29,990 --> 01:53:33,560
So now we're at, lastly, step 3 for the last time.
2541
01:53:33,560 --> 01:53:35,120
What do we do?
2542
01:53:35,120 --> 01:53:35,780
Merge.
2543
01:53:35,780 --> 01:53:40,220
And so just to be consistent, let me push these down, and let's compare.
2544
01:53:40,220 --> 01:53:43,310
Left hand or right hand, noticing that they only make forward progress,
2545
01:53:43,310 --> 01:53:45,230
none of this back and forth comparisons.
2546
01:53:45,230 --> 01:53:47,270
2 and 0, of course, the 0.
2547
01:53:47,270 --> 01:53:48,860
So we'll put that in place.
2548
01:53:48,860 --> 01:53:51,140
2 and 1, of course, the 1.
2549
01:53:51,140 --> 01:53:52,880
So we put that in place.
2550
01:53:52,880 --> 01:53:56,930
2 and 3, we merge in, of course, the 2 in this case.
2551
01:53:56,930 --> 01:54:00,770
4 and 3, we now merge in the 3 in this case.
2552
01:54:00,770 --> 01:54:05,630
4 and 6, we now merge, of course, the 4 in place.
2553
01:54:05,630 --> 01:54:07,760
And now, we compare 5 and 6.
2554
01:54:07,760 --> 01:54:08,615
We keep the 5.
2555
01:54:08,615 --> 01:54:12,590
2556
01:54:12,590 --> 01:54:15,226
Bug.
2557
01:54:15,226 --> 01:54:17,295
OK, well pretend that the 5 is on.
2558
01:54:17,295 --> 01:54:20,040
2559
01:54:20,040 --> 01:54:21,450
Oh, this is why.
2560
01:54:21,450 --> 01:54:24,240
All right, so now we compare the 7 and the 6.
2561
01:54:24,240 --> 01:54:26,430
6th is gone.
2562
01:54:26,430 --> 01:54:29,520
And lastly, 7 is the last one in place.
2563
01:54:29,520 --> 01:54:32,140
And even though I grant that of all the algorithms,
2564
01:54:32,140 --> 01:54:34,320
this is probably the hardest one to stay on top of,
2565
01:54:34,320 --> 01:54:36,210
especially when I'm doing it as a voice over.
2566
01:54:36,210 --> 01:54:41,010
Realize that what we've just done is only those three steps, recursively.
2567
01:54:41,010 --> 01:54:42,510
We started with a list of size 8.
2568
01:54:42,510 --> 01:54:43,650
We sorted the left half.
2569
01:54:43,650 --> 01:54:44,790
We sorted the right half.
2570
01:54:44,790 --> 01:54:46,450
And then we merge the two together.
2571
01:54:46,450 --> 01:54:48,960
But if you go down each of those rabbit holes, so to speak,
2572
01:54:48,960 --> 01:54:52,410
sorting the left half involves sorting the left half of the left half
2573
01:54:52,410 --> 01:54:54,970
and the right half of the left half, and so forth.
2574
01:54:54,970 --> 01:54:59,250
But this germ of an idea of really dividing and conquering the problem,
2575
01:54:59,250 --> 01:55:02,520
not such that you're having the problem and only dealing with one half.
2576
01:55:02,520 --> 01:55:05,700
Clearly, we're sorting one half and the other half
2577
01:55:05,700 --> 01:55:07,780
and merging them together, ultimately.
2578
01:55:07,780 --> 01:55:10,810
It does still lead us to the same solution.
2579
01:55:10,810 --> 01:55:14,550
And if we visualize the remnants of this now, if I depict this as follows,
2580
01:55:14,550 --> 01:55:18,390
where on the screen here, you see where the numbers originally started
2581
01:55:18,390 --> 01:55:20,310
in the top row from left to right.
2582
01:55:20,310 --> 01:55:23,470
Essentially, even though this is in a different order,
2583
01:55:23,470 --> 01:55:28,708
I divided that list of size 8, ultimately, into eight lists of size 1.
2584
01:55:28,708 --> 01:55:31,000
And that's where the base case kicked in and just said,
2585
01:55:31,000 --> 01:55:32,580
OK, we're done sorting that.
2586
01:55:32,580 --> 01:55:37,680
And after that, logically, I then merged two lists of size 1
2587
01:55:37,680 --> 01:55:41,580
into many lists of size 2 and those lists of size 2 into lists of size 4.
2588
01:55:41,580 --> 01:55:47,250
And then finally, the list of size 4 into one big list sorted of size 8.
2589
01:55:47,250 --> 01:55:50,610
And so I put forth this picture with the little line indicators
2590
01:55:50,610 --> 01:55:55,620
here, because how many times did I divide, divide, divide in half?
2591
01:55:55,620 --> 01:55:57,360
Or really double, double, double.
2592
01:55:57,360 --> 01:55:59,280
So exponent is the opposite--
2593
01:55:59,280 --> 01:56:00,600
spoiler.
2594
01:56:00,600 --> 01:56:02,610
How many times did I divide?
2595
01:56:02,610 --> 01:56:04,320
So three, concretely.
2596
01:56:04,320 --> 01:56:08,070
But if there's eight elements total, and there's n more generally,
2597
01:56:08,070 --> 01:56:12,300
it really is a matter of dividing and conquering log n times.
2598
01:56:12,300 --> 01:56:15,360
You start this, and you can divide one, two, three times, log n times.
2599
01:56:15,360 --> 01:56:18,930
Or conversely, you can start here and exponentially double, double,
2600
01:56:18,930 --> 01:56:21,060
double three times, which is log n.
2601
01:56:21,060 --> 01:56:24,510
But on every row, every shelf, literally, I
2602
01:56:24,510 --> 01:56:28,350
made a fuss about pointing my hands only from the left to the right,
2603
01:56:28,350 --> 01:56:31,380
constantly advancing them, such that every time I did those merges,
2604
01:56:31,380 --> 01:56:34,500
I touched every element once and only once.
2605
01:56:34,500 --> 01:56:37,470
There was none of this back and forth, back and forth on stage.
2606
01:56:37,470 --> 01:56:42,750
So if I'm doing something log n times, or if I'm doing,
2607
01:56:42,750 --> 01:56:49,410
rather, n things log n times, what would be our big O formula, perhaps?
2608
01:56:49,410 --> 01:56:51,057
n things log n times?
2609
01:56:51,057 --> 01:56:52,140
STUDENT: Oh, it's n log n.
2610
01:56:52,140 --> 01:56:53,490
DAVID MALAN: Yeah, so n log n.
2611
01:56:53,490 --> 01:56:56,430
The order of n log n is, indeed, how we would describe
2612
01:56:56,430 --> 01:56:58,560
the running time of merge sort.
2613
01:56:58,560 --> 01:57:03,270
And so of all of the sorts thus far, we've seen that merge sort here,
2614
01:57:03,270 --> 01:57:07,120
actually, is n log n, which is strictly better than n squared,
2615
01:57:07,120 --> 01:57:10,650
which is where both selection sort and bubble sort landed.
2616
01:57:10,650 --> 01:57:14,015
But it's also slower than linear search, for instance.
2617
01:57:14,015 --> 01:57:15,390
But you would rather expect that.
2618
01:57:15,390 --> 01:57:17,880
If you have to do a lot of work up front sorting
2619
01:57:17,880 --> 01:57:19,878
some elements versus just searching them,
2620
01:57:19,878 --> 01:57:21,670
you're going to have to put in more effort.
2621
01:57:21,670 --> 01:57:24,150
And so the question of whether or not you should just search something
2622
01:57:24,150 --> 01:57:26,670
blindly with linear search and not bother sorting it,
2623
01:57:26,670 --> 01:57:30,327
really boils down to, can you afford to spend this amount of time?
2624
01:57:30,327 --> 01:57:32,160
And if you're the Googles of the world, odds
2625
01:57:32,160 --> 01:57:35,170
are you don't want to be searching their database linearly every time.
2626
01:57:35,170 --> 01:57:35,670
Why?
2627
01:57:35,670 --> 01:57:40,380
Because you can sort it once and then benefit millions, billions of people,
2628
01:57:40,380 --> 01:57:43,560
subsequently using something like binary search or, frankly in practice,
2629
01:57:43,560 --> 01:57:46,443
something even fancier and faster than binary search.
2630
01:57:46,443 --> 01:57:48,360
But there's always going to be this trade off.
2631
01:57:48,360 --> 01:57:52,140
You can achieve binary search only if the elements are sorted.
2632
01:57:52,140 --> 01:57:53,940
How much does it cost you to sort them?
2633
01:57:53,940 --> 01:57:56,940
Well, maybe n squared, if you used some of the earlier algorithms.
2634
01:57:56,940 --> 01:58:00,850
But it turns out, n log in is pretty fast as well.
2635
01:58:00,850 --> 01:58:06,180
So at the end of the day, these running times involve trade offs.
2636
01:58:06,180 --> 01:58:09,600
And indeed, in merge sort 2, I should note that the lower bound on merge
2637
01:58:09,600 --> 01:58:12,090
sort is also going to be omega of n log n.
2638
01:58:12,090 --> 01:58:14,640
As such, we can describe it in terms of our theta notation,
2639
01:58:14,640 --> 01:58:18,030
saying that merge sort is, indeed, in theta of n log n.
2640
01:58:18,030 --> 01:58:21,780
So generally speaking, probably better to use something like merge sort
2641
01:58:21,780 --> 01:58:24,030
or some other algorithm that's n log n.
2642
01:58:24,030 --> 01:58:27,660
In practice, most programmers are not implementing these sorting algorithms
2643
01:58:27,660 --> 01:58:28,200
themselves.
2644
01:58:28,200 --> 01:58:30,390
Odds are, they're using a library off the shelf
2645
01:58:30,390 --> 01:58:34,200
that themselves have made the decision as to which of these algorithms to do.
2646
01:58:34,200 --> 01:58:37,140
But generally speaking, and we're seeing now this for the first time,
2647
01:58:37,140 --> 01:58:41,430
if you want to improve time, like use less time, write faster code,
2648
01:58:41,430 --> 01:58:42,750
you've got to pay a price.
2649
01:58:42,750 --> 01:58:45,120
And that might be your human time, just takes you
2650
01:58:45,120 --> 01:58:47,400
more time to code up something more sophisticated,
2651
01:58:47,400 --> 01:58:49,080
more difficult to implement.
2652
01:58:49,080 --> 01:58:51,840
Or you need to spend something like space.
2653
01:58:51,840 --> 01:58:54,060
And as these shelves suggest, that too is
2654
01:58:54,060 --> 01:58:55,740
one of the key details of merge sort.
2655
01:58:55,740 --> 01:58:58,500
You can't just have the elements swapping in place.
2656
01:58:58,500 --> 01:59:02,520
You need at least an auxiliary array, so that when you do the merging,
2657
01:59:02,520 --> 01:59:03,960
you have a place to put them.
2658
01:59:03,960 --> 01:59:05,988
And this is excessive, this amount of memory.
2659
01:59:05,988 --> 01:59:09,030
I could have just gone back and forth between top shelf and bottom shelf.
2660
01:59:09,030 --> 01:59:11,113
But it's a little more interesting to go top down.
2661
01:59:11,113 --> 01:59:12,870
But you do need more space.
2662
01:59:12,870 --> 01:59:15,605
Back in the day, decades ago, space was really expensive.
2663
01:59:15,605 --> 01:59:16,480
And so you know what?
2664
01:59:16,480 --> 01:59:19,450
It might have been better to not use merge sort, use bubble sort
2665
01:59:19,450 --> 01:59:23,230
or selection sort even, or some other algorithm altogether.
2666
01:59:23,230 --> 01:59:25,300
Nowadays, space is relatively cheap.
2667
01:59:25,300 --> 01:59:27,160
And so these are more acceptable trade offs.
2668
01:59:27,160 --> 01:59:29,650
But it totally depends on the application.
2669
01:59:29,650 --> 01:59:32,950
The very last thing we thought we'd do is show you an actual comparison
2670
01:59:32,950 --> 01:59:34,450
of some of these sorting algorithms.
2671
01:59:34,450 --> 01:59:35,800
It's about 60 seconds long.
2672
01:59:35,800 --> 01:59:39,790
And it will compare for you, selection sort, bubble sort,
2673
01:59:39,790 --> 01:59:44,350
and merge sort in parallel simultaneously with some fun sorting
2674
01:59:44,350 --> 01:59:46,570
music, showing you ultimately what it really
2675
01:59:46,570 --> 01:59:52,750
means to be an O of n squared, or better yet, big O of n log n.
2676
01:59:52,750 --> 01:59:54,850
Selection on the top.
2677
01:59:54,850 --> 01:59:58,050
Bubble on the bottom.
2678
01:59:58,050 --> 01:59:59,400
Merge in the middle.
2679
01:59:59,400 --> 02:00:01,885
[MUSIC PLAYING]
2680
02:00:01,885 --> 02:00:53,660
2681
02:00:53,660 --> 02:00:55,700
All right, that's it for CS50.
2682
02:00:55,700 --> 02:00:57,910
We'll see you next time.
2683
02:00:57,910 --> 02:01:35,000
217603
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.