All language subtitles for 2022_lecture3_720p_sdr-en

af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic Download
hy Armenian
az Azerbaijani
eu Basque
be Belarusian
bem Bemba
bn Bengali
bh Bihari
bs Bosnian
br Breton
bg Bulgarian
km Cambodian
ca Catalan
ceb Cebuano
chr Cherokee
ny Chichewa
zh-CN Chinese (Simplified)
zh-TW Chinese (Traditional)
co Corsican
hr Croatian
cs Czech
da Danish
nl Dutch
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
tl Filipino
fi Finnish
fr French
fy Frisian
gaa Ga
gl Galician
ka Georgian
de German
el Greek
gn Guarani
gu Gujarati
ht Haitian Creole
ha Hausa
haw Hawaiian
iw Hebrew
hi Hindi
hmn Hmong
hu Hungarian
is Icelandic
ig Igbo
id Indonesian
ia Interlingua
ga Irish
it Italian
ja Japanese
jw Javanese
kn Kannada
kk Kazakh
rw Kinyarwanda
rn Kirundi
kg Kongo
ko Korean
kri Krio (Sierra Leone)
ku Kurdish
ckb Kurdish (Soranî)
ky Kyrgyz
lo Laothian
la Latin
lv Latvian
ln Lingala
lt Lithuanian
loz Lozi
lg Luganda
ach Luo
lb Luxembourgish
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
mi Maori
mr Marathi
mfe Mauritian Creole
mo Moldavian
mn Mongolian
my Myanmar (Burmese)
sr-ME Montenegrin
ne Nepali
pcm Nigerian Pidgin
nso Northern Sotho
no Norwegian
nn Norwegian (Nynorsk)
oc Occitan
or Oriya
om Oromo
ps Pashto
fa Persian
pl Polish
pt-BR Portuguese (Brazil)
pt Portuguese (Portugal)
pa Punjabi
qu Quechua
ro Romanian
rm Romansh
nyn Runyakitara
ru Russian
sm Samoan
gd Scots Gaelic
sr Serbian
sh Serbo-Croatian
st Sesotho
tn Setswana
crs Seychellois Creole
sn Shona
sd Sindhi
si Sinhalese
sk Slovak
sl Slovenian
so Somali
es Spanish
es-419 Spanish (Latin American)
su Sundanese
sw Swahili
sv Swedish
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
ti Tigrinya
to Tonga
lua Tshiluba
tum Tumbuka
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
zu Zulu
Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 0 00:00:00,000 --> 00:00:02,982 [FILM REEL] 1 00:00:02,982 --> 00:00:06,461 [MUSIC PLAYING] 2 00:00:06,461 --> 00:01:12,660 3 00:01:12,660 --> 00:01:15,420 DAVID MALAN: All right, this is CS50. 4 00:01:15,420 --> 00:01:19,200 And this is Week 3 already, wherein we'll take a look back actually 5 00:01:19,200 --> 00:01:21,400 at Week 0 where we first began. 6 00:01:21,400 --> 00:01:24,930 And in Week 0, recall that everything was very intuitive, in a sense. 7 00:01:24,930 --> 00:01:28,000 We talked not just about representation of information, but algorithms. 8 00:01:28,000 --> 00:01:30,450 And we talked about tearing a phone book again and again. 9 00:01:30,450 --> 00:01:32,860 And that somehow got us to a better solution. 10 00:01:32,860 --> 00:01:35,888 But today, we'll try to start formalizing some of those ideas 11 00:01:35,888 --> 00:01:38,430 and capturing some of those same ideas not in pseudocode just 12 00:01:38,430 --> 00:01:41,650 yet, but in actual code as well. 13 00:01:41,650 --> 00:01:45,210 But we'll also consider the efficiency of those algorithms, 14 00:01:45,210 --> 00:01:48,372 like just how good, how well-designed our algorithms actually are. 15 00:01:48,372 --> 00:01:50,580 And if you recall, when we did the phone book example 16 00:01:50,580 --> 00:01:53,940 wherein I first had an algorithm searching one page at a time, 17 00:01:53,940 --> 00:01:56,460 and then second one two pages at a time, and then third, 18 00:01:56,460 --> 00:01:58,590 started tearing the thing in half, recall 19 00:01:58,590 --> 00:02:01,900 that we, with a wave of the hand, kind of analyzed it as follows. 20 00:02:01,900 --> 00:02:05,070 We proposed that if the x-axis here is the size of the problem, 21 00:02:05,070 --> 00:02:09,120 like number of pages in a phone book, and the y-axis is the time required 22 00:02:09,120 --> 00:02:11,430 to solve the problem in seconds, minutes, 23 00:02:11,430 --> 00:02:13,363 page tears, whatever your unit of measure is, 24 00:02:13,363 --> 00:02:16,530 recall that the first algorithm, which is the straight line such that if you 25 00:02:16,530 --> 00:02:19,620 had n pages in the phone book, it might have this slope of n-- 26 00:02:19,620 --> 00:02:23,280 and there's this one-to-one relationship between pages and tears. 27 00:02:23,280 --> 00:02:27,000 Two pages at a time, of course, was twice as fast, but still really 28 00:02:27,000 --> 00:02:30,000 the same shape, the yellow line here indicating that yeah, 29 00:02:30,000 --> 00:02:33,570 it's n over 2, maybe plus 1 if you have to double back, as we discussed. 30 00:02:33,570 --> 00:02:36,840 But it's really still fundamentally the same algorithm one 31 00:02:36,840 --> 00:02:38,370 or two pages at a time. 32 00:02:38,370 --> 00:02:41,760 But the third algorithm, recall, was this one here in green, 33 00:02:41,760 --> 00:02:45,660 where we called it logarithmic in terms of how fast or how slow it was. 34 00:02:45,660 --> 00:02:48,235 And indeed, the implication of this algorithm 35 00:02:48,235 --> 00:02:50,610 was that we could even double the size of the phone book, 36 00:02:50,610 --> 00:02:53,280 and no big deal-- one additional page tear, 37 00:02:53,280 --> 00:02:55,990 and we take yet another 1,000 page bite out of the phone book. 38 00:02:55,990 --> 00:02:58,890 So today, we'll revisit some of these ideas, formalize them a bit, 39 00:02:58,890 --> 00:03:01,530 but also translate some of them, ultimately, to code. 40 00:03:01,530 --> 00:03:03,690 And all of that now is possible because we 41 00:03:03,690 --> 00:03:06,060 have this lower-level understanding, perhaps, of what's 42 00:03:06,060 --> 00:03:07,600 actually inside of your computer. 43 00:03:07,600 --> 00:03:10,170 This, of course, is your computer's RAM or memory. 44 00:03:10,170 --> 00:03:12,720 And recall that if we start to abstract this away, 45 00:03:12,720 --> 00:03:15,420 your computer's memory is really just a grid of bytes. 46 00:03:15,420 --> 00:03:17,730 In fact, we don't have to look at the hardware anymore. 47 00:03:17,730 --> 00:03:21,270 And we looked at a grid of bytes like this, whereby each of these bytes 48 00:03:21,270 --> 00:03:26,700 could be used to store a char, an int, a long, or even an entire string, 49 00:03:26,700 --> 00:03:27,460 at that. 50 00:03:27,460 --> 00:03:30,117 But let's focus perhaps just on a subset of this 51 00:03:30,117 --> 00:03:31,950 because last week, of course, we emphasized, 52 00:03:31,950 --> 00:03:34,830 really, arrays, storing things in arrays. 53 00:03:34,830 --> 00:03:38,140 And that allowed us to start storing entire strings, sequences 54 00:03:38,140 --> 00:03:40,470 of characters, and even arrays of integers 55 00:03:40,470 --> 00:03:44,770 if we wanted to have multiple ones and not just multiple variables as well. 56 00:03:44,770 --> 00:03:48,850 But the catch is that if you look inside of an array in the computer's memory-- 57 00:03:48,850 --> 00:03:51,450 and for instance, suppose these integers here are stored-- 58 00:03:51,450 --> 00:03:53,850 it's pretty easy for us humans to glance at this 59 00:03:53,850 --> 00:03:55,698 and immediately find the number 50. 60 00:03:55,698 --> 00:03:57,990 You sort of have this bird's eye view from where you're 61 00:03:57,990 --> 00:03:59,700 seated of everything on the screen. 62 00:03:59,700 --> 00:04:02,500 And so it's pretty obvious how you get to the number 50. 63 00:04:02,500 --> 00:04:04,710 But in the world of computers, of course, 64 00:04:04,710 --> 00:04:06,720 it turns out that this is hardware. 65 00:04:06,720 --> 00:04:10,440 And computers, for today's purposes, can only do one thing at a time. 66 00:04:10,440 --> 00:04:14,820 They can't just take it all in and find instantly some number like 50. 67 00:04:14,820 --> 00:04:17,130 So perhaps a decent metaphor is to consider 68 00:04:17,130 --> 00:04:20,010 the array of memory inside of your computer 69 00:04:20,010 --> 00:04:22,500 really is a sequence of closed doors. 70 00:04:22,500 --> 00:04:25,860 And if the computer wants to find some value in an array, 71 00:04:25,860 --> 00:04:29,970 it has to do the digital equivalent of opening each of these doors one 72 00:04:29,970 --> 00:04:30,730 at a time. 73 00:04:30,730 --> 00:04:32,410 Now how can code do that? 74 00:04:32,410 --> 00:04:35,610 Well, of course, we introduced indices or indexes last week, 75 00:04:35,610 --> 00:04:39,900 whereby we, by convention, call the first element of an array location 0, 76 00:04:39,900 --> 00:04:44,490 the second location 1, the third location 2, and so forth-- so-called 0 77 00:04:44,490 --> 00:04:45,150 indexed. 78 00:04:45,150 --> 00:04:48,300 And this allowed us to now bridge this conceptual world of what's 79 00:04:48,300 --> 00:04:51,510 going on in memory with actual code, because now we had this square bracket 80 00:04:51,510 --> 00:04:55,710 syntax via which we could go searching for something if we so choose. 81 00:04:55,710 --> 00:04:59,760 And it turns out, if I now paint these red instead of yellow, 82 00:04:59,760 --> 00:05:02,970 it would seem that we actually have a pretty good physical metaphor here 83 00:05:02,970 --> 00:05:07,170 standing in place for what would be a computer's array of memory 84 00:05:07,170 --> 00:05:10,390 if, for instance, you're storing some seven numbers like that. 85 00:05:10,390 --> 00:05:13,710 And so today we begin with a look of a specific type of algorithm. 86 00:05:13,710 --> 00:05:14,910 That is for searching. 87 00:05:14,910 --> 00:05:16,270 Searching is all over the place. 88 00:05:16,270 --> 00:05:19,900 All of us have probably gone to google.com or some equivalent already 89 00:05:19,900 --> 00:05:21,090 multiple times per day. 90 00:05:21,090 --> 00:05:24,120 And getting back answers fast is what companies like Google 91 00:05:24,120 --> 00:05:25,110 are really good at. 92 00:05:25,110 --> 00:05:26,700 So how are they doing that? 93 00:05:26,700 --> 00:05:29,882 How are they storing information in computers' memory? 94 00:05:29,882 --> 00:05:31,590 Well, let's consider what this really is. 95 00:05:31,590 --> 00:05:34,500 It's really just a problem as it was back in Week 0. 96 00:05:34,500 --> 00:05:36,420 The input, though, to the problem, for now, 97 00:05:36,420 --> 00:05:38,253 might be this array of seven lockers. 98 00:05:38,253 --> 00:05:40,920 So that's the input to the problem, inside of which is a number. 99 00:05:40,920 --> 00:05:45,060 And maybe for simplicity now, we just want a yes/no, a true/false answer-- 100 00:05:45,060 --> 00:05:46,860 a bool, that is to say-- 101 00:05:46,860 --> 00:05:51,250 of whether or not some number like 50 is in that array. 102 00:05:51,250 --> 00:05:52,680 It's not quite as fancy as Google. 103 00:05:52,680 --> 00:05:55,500 That doesn't just tell you yes, we have search results. 104 00:05:55,500 --> 00:05:57,300 It actually gives you the search results. 105 00:05:57,300 --> 00:05:59,520 But for, now we'll keep it simple, and just output 106 00:05:59,520 --> 00:06:02,070 as part of this problem yes or no, true or false, 107 00:06:02,070 --> 00:06:06,090 we have found the number we're looking for given an input like that array. 108 00:06:06,090 --> 00:06:09,930 But it turns out inside of this black box that we keep coming back to, 109 00:06:09,930 --> 00:06:12,385 there's all sorts of possible algorithms. 110 00:06:12,385 --> 00:06:15,010 And we talked about this at a high level conceptually in Week 0 111 00:06:15,010 --> 00:06:15,860 with the phone book. 112 00:06:15,860 --> 00:06:19,360 But today, let's consider it a little more concretely 113 00:06:19,360 --> 00:06:22,648 by way of a game that some of you might have grown up with, namely Monopoly. 114 00:06:22,648 --> 00:06:24,940 And so behind these doors, it turns out, will be hidden 115 00:06:24,940 --> 00:06:26,800 some denominations of monopoly money. 116 00:06:26,800 --> 00:06:28,690 But for this, we now have two volunteers. 117 00:06:28,690 --> 00:06:30,483 If you'd like to greet the world? 118 00:06:30,483 --> 00:06:31,525 JACKSON: Hi, I'm Jackson. 119 00:06:31,525 --> 00:06:35,440 120 00:06:35,440 --> 00:06:37,220 STEPHANIE: Hi, my name is Stephanie. 121 00:06:37,220 --> 00:06:38,410 DAVID MALAN: And do you want to say a little something 122 00:06:38,410 --> 00:06:40,570 about yourselves-- years, house, dorm? 123 00:06:40,570 --> 00:06:43,030 STEPHANIE: I'm a first year living in Matthews. 124 00:06:43,030 --> 00:06:43,780 DAVID MALAN: Nice. 125 00:06:43,780 --> 00:06:45,580 JACKSON: And I'm a first year in Canaday. 126 00:06:45,580 --> 00:06:46,330 DAVID MALAN: Nice. 127 00:06:46,330 --> 00:06:48,670 Well, welcome to our two volunteers. 128 00:06:48,670 --> 00:06:50,710 So why don't we do this? 129 00:06:50,710 --> 00:06:54,460 Would one of you like to volunteer the other to go first? 130 00:06:54,460 --> 00:06:55,510 STEPHANIE: I'll go. 131 00:06:55,510 --> 00:06:56,560 DAVID MALAN: OK. 132 00:06:56,560 --> 00:06:58,300 All right, so Stephanie's up first. 133 00:06:58,300 --> 00:07:02,560 And behind one of these doors here, we've hidden the monopoly money 50. 134 00:07:02,560 --> 00:07:04,180 And so we'd like you to find the 50. 135 00:07:04,180 --> 00:07:06,490 We'll tell you nothing more about the lockers. 136 00:07:06,490 --> 00:07:08,900 But we would like you to execute a certain algorithm. 137 00:07:08,900 --> 00:07:11,020 And in fact, I'm going to give you some pseudocode for this. 138 00:07:11,020 --> 00:07:12,770 And I'm going to give you the name for it. 139 00:07:12,770 --> 00:07:13,900 It's called linear search. 140 00:07:13,900 --> 00:07:15,855 And as the name implies, you're pretty much 141 00:07:15,855 --> 00:07:17,980 going to end up walking in sort of a straight line. 142 00:07:17,980 --> 00:07:19,220 But how are you going to do this? 143 00:07:19,220 --> 00:07:21,520 Well, let me propose that in a moment, your first step 144 00:07:21,520 --> 00:07:23,170 will be to think kind of like a loop. 145 00:07:23,170 --> 00:07:27,280 For each door from left to right, what do we want you to do on each iteration? 146 00:07:27,280 --> 00:07:31,528 Well, if 50 is behind that door, then we want 147 00:07:31,528 --> 00:07:33,070 to go ahead and have you return true. 148 00:07:33,070 --> 00:07:35,860 And hold up the 50 proudly, if you will, for the group. 149 00:07:35,860 --> 00:07:38,500 Otherwise, if you get through that whole loop 150 00:07:38,500 --> 00:07:40,690 and you haven't found the number 50, you can just 151 00:07:40,690 --> 00:07:42,640 throw up your hands in disappointment. 152 00:07:42,640 --> 00:07:45,190 False-- you've not found the number 50. 153 00:07:45,190 --> 00:07:50,293 So to be clear, step one is going to be for each door from left to right. 154 00:07:50,293 --> 00:07:51,460 How would you like to begin? 155 00:07:51,460 --> 00:07:55,610 156 00:07:55,610 --> 00:07:56,495 Yep. 157 00:07:56,495 --> 00:07:57,840 Oh, and then-- yep. 158 00:07:57,840 --> 00:07:58,340 There we go. 159 00:07:58,340 --> 00:08:00,365 Yep. 160 00:08:00,365 --> 00:08:02,060 And if you'd like to at least tell-- 161 00:08:02,060 --> 00:08:04,040 good, good acting here. 162 00:08:04,040 --> 00:08:06,140 What have you found instead? 163 00:08:06,140 --> 00:08:07,970 STEPHANIE: It's not 50, but 20. 164 00:08:07,970 --> 00:08:08,900 DAVID MALAN: Oh, OK. 165 00:08:08,900 --> 00:08:10,280 So step one was a fail. 166 00:08:10,280 --> 00:08:11,870 So let's move on to step two. 167 00:08:11,870 --> 00:08:14,453 Inside of this loop, what are you going to do next? 168 00:08:14,453 --> 00:08:16,370 STEPHANIE: I'm going to move to the next door. 169 00:08:16,370 --> 00:08:17,037 DAVID MALAN: OK. 170 00:08:17,037 --> 00:08:20,790 171 00:08:20,790 --> 00:08:22,100 STEPHANIE: Almost. 172 00:08:22,100 --> 00:08:23,100 DAVID MALAN: OK, almost. 173 00:08:23,100 --> 00:08:23,790 Sort of. 174 00:08:23,790 --> 00:08:25,110 A 500 instead. 175 00:08:25,110 --> 00:08:26,280 Next locker? 176 00:08:26,280 --> 00:08:30,200 STEPHANIE: I would rather take that. 177 00:08:30,200 --> 00:08:30,700 No. 178 00:08:30,700 --> 00:08:33,840 179 00:08:33,840 --> 00:08:36,558 DAVID MALAN: OK, we're not telling the audience? 180 00:08:36,558 --> 00:08:38,260 STEPHANIE: It was a 10. 181 00:08:38,260 --> 00:08:39,789 DAVID MALAN: OK, so keep going. 182 00:08:39,789 --> 00:08:40,974 This is step three now. 183 00:08:40,974 --> 00:08:45,470 184 00:08:45,470 --> 00:08:46,310 STEPHANIE: Oh, man. 185 00:08:46,310 --> 00:08:49,850 186 00:08:49,850 --> 00:08:51,260 DAVID MALAN: Five, OK. 187 00:08:51,260 --> 00:08:52,670 A few more lockers to check. 188 00:08:52,670 --> 00:08:57,296 189 00:08:57,296 --> 00:08:58,790 STEPHANIE: A little sad, guys. 190 00:08:58,790 --> 00:09:02,527 191 00:09:02,527 --> 00:09:04,360 DAVID MALAN: All right, second-to-last step. 192 00:09:04,360 --> 00:09:07,710 193 00:09:07,710 --> 00:09:09,070 STEPHANIE: It's 1. 194 00:09:09,070 --> 00:09:10,022 Kind of close. 195 00:09:10,022 --> 00:09:10,980 DAVID MALAN: All right. 196 00:09:10,980 --> 00:09:12,780 And finally, the last step. 197 00:09:12,780 --> 00:09:15,780 Clearly you've been, perhaps, set up here. 198 00:09:15,780 --> 00:09:17,340 STEPHANIE: Let's go! 199 00:09:17,340 --> 00:09:19,920 DAVID MALAN: All right, so the number 50. 200 00:09:19,920 --> 00:09:23,500 201 00:09:23,500 --> 00:09:25,870 And Stephanie, if I may, let me ask you a question here. 202 00:09:25,870 --> 00:09:28,890 So on the screen, this is the pseudocode you just executed. 203 00:09:28,890 --> 00:09:31,800 Suppose, though, I had done what many of us 204 00:09:31,800 --> 00:09:34,770 have gotten to the habit of doing when you have an if condition. 205 00:09:34,770 --> 00:09:36,730 You often have an else branch as well. 206 00:09:36,730 --> 00:09:38,760 Suppose that I had done this now. 207 00:09:38,760 --> 00:09:41,680 And I'm marking it in red to be clear this is wrong. 208 00:09:41,680 --> 00:09:45,750 But what would have been bad about this code using an if and an else, 209 00:09:45,750 --> 00:09:47,040 might you say? 210 00:09:47,040 --> 00:09:48,360 Any instincts? 211 00:09:48,360 --> 00:09:55,620 212 00:09:55,620 --> 00:09:58,740 STEPHANIE: Then you would end up canceling 213 00:09:58,740 --> 00:10:00,210 the code before you found the 50. 214 00:10:00,210 --> 00:10:01,020 DAVID MALAN: Yeah, exactly. 215 00:10:01,020 --> 00:10:02,070 STEPHANIE: I mean, you'd just be eternally sad. 216 00:10:02,070 --> 00:10:02,460 DAVID MALAN: Indeed. 217 00:10:02,460 --> 00:10:05,127 When Stephanie had opened the first locker, she'd have found 20. 218 00:10:05,127 --> 00:10:06,630 20, of course, is not 50. 219 00:10:06,630 --> 00:10:07,838 She would have decreed false. 220 00:10:07,838 --> 00:10:10,547 But of course, she hadn't checked all of the rest of the lockers. 221 00:10:10,547 --> 00:10:13,440 So that would seem to be a key detail that, with this implementation 222 00:10:13,440 --> 00:10:16,440 of the pseudocode, we actually do go through-- as we did-- 223 00:10:16,440 --> 00:10:18,810 and only return false not even with an else, 224 00:10:18,810 --> 00:10:23,070 but just at the end of the loop such that we only reach that line if we 225 00:10:23,070 --> 00:10:25,245 don't return true earlier than that. 226 00:10:25,245 --> 00:10:26,620 Well, let's go ahead and do this. 227 00:10:26,620 --> 00:10:27,360 Let me take the mic from you. 228 00:10:27,360 --> 00:10:28,930 If you'd like to take a seat next to Jackson? 229 00:10:28,930 --> 00:10:31,180 And Jackson, in just a moment, we'll have you come up. 230 00:10:31,180 --> 00:10:34,170 Carter, if you don't mind reorganizing the lockers for us. 231 00:10:34,170 --> 00:10:36,630 But in the meantime, let me point out how we might now 232 00:10:36,630 --> 00:10:38,010 translate that same idea to code. 233 00:10:38,010 --> 00:10:41,050 Pretty high level, pretty English-oriented with that pseudocode-- 234 00:10:41,050 --> 00:10:44,910 but really now, as of last week, we have syntax via which Stephanie 235 00:10:44,910 --> 00:10:48,810 and, soon, Jackson could treat this locker, this set of lockers, as really 236 00:10:48,810 --> 00:10:51,250 indeed an array using bracket notation. 237 00:10:51,250 --> 00:10:54,480 So we can now get a little closer in our pseudocode to actual code. 238 00:10:54,480 --> 00:10:56,670 And the way a computer scientist, for instance, 239 00:10:56,670 --> 00:10:59,850 would translate fairly high level English pseudocode 240 00:10:59,850 --> 00:11:03,720 like this to something that's a little closer to C or any language 241 00:11:03,720 --> 00:11:06,600 that supports arrays would be a little more cryptically like this. 242 00:11:06,600 --> 00:11:09,060 But you'll see more of this syntax in the coming days. 243 00:11:09,060 --> 00:11:13,260 For i from 0 to n minus 1-- this is still pseudocode. 244 00:11:13,260 --> 00:11:15,840 But that's the English-like way of expressing what 245 00:11:15,840 --> 00:11:17,730 we've known come to know as a for loop. 246 00:11:17,730 --> 00:11:22,260 If 50 is behind doors bracket i-- so I'm assuming for the sake of discussion 247 00:11:22,260 --> 00:11:26,503 that doors, now, is the name of my variable, this array of seven doors. 248 00:11:26,503 --> 00:11:28,920 But then the rest of the logic, the rest of the pseudocode 249 00:11:28,920 --> 00:11:30,370 really is the same way. 250 00:11:30,370 --> 00:11:33,270 And so you'll find in time that programmers, computer scientists more 251 00:11:33,270 --> 00:11:36,900 generally, when you start expressing ideas, algorithms to someone else, 252 00:11:36,900 --> 00:11:40,170 instead of maybe operating at this level here, 253 00:11:40,170 --> 00:11:43,020 you now have a new vocabulary, really a new syntax 254 00:11:43,020 --> 00:11:44,910 that you can be a little more specific, not 255 00:11:44,910 --> 00:11:47,380 getting so into the weeds of writing actual C code, 256 00:11:47,380 --> 00:11:50,910 but at least now doing something that's a little closer to manipulating 257 00:11:50,910 --> 00:11:51,810 an array like this. 258 00:11:51,810 --> 00:11:55,140 So Jackson, would you like to stand on up? 259 00:11:55,140 --> 00:11:56,760 All right. 260 00:11:56,760 --> 00:11:57,360 Yes, yes. 261 00:11:57,360 --> 00:11:59,010 Support for Jackson here, too. 262 00:11:59,010 --> 00:12:00,780 Nice. 263 00:12:00,780 --> 00:12:04,470 And here now, I'm going to allow you an assumption that Stephanie did not have. 264 00:12:04,470 --> 00:12:06,660 Stephanie clearly was really doing her best 265 00:12:06,660 --> 00:12:10,050 searching from left to right using linear search, as we'll now call it. 266 00:12:10,050 --> 00:12:12,180 But they were pretty much in a random order, right? 267 00:12:12,180 --> 00:12:15,030 There was a 20 over there, there was 1 over there, and then a 50. 268 00:12:15,030 --> 00:12:19,110 So we deliberately jumbled things up and did not sort the numbers for her. 269 00:12:19,110 --> 00:12:22,530 But Carter kindly has just come up to give you a leg up, Jackson, 270 00:12:22,530 --> 00:12:24,510 by sorting the numbers in advance. 271 00:12:24,510 --> 00:12:27,630 And we'd like you this time, much like in week 0, 272 00:12:27,630 --> 00:12:30,060 to do something again and again, but this time 273 00:12:30,060 --> 00:12:32,250 using what we'll now call binary search. 274 00:12:32,250 --> 00:12:35,580 It's exactly the same algorithm conceptually as we did in Week 0. 275 00:12:35,580 --> 00:12:38,310 But if we translate it to the context of this array, 276 00:12:38,310 --> 00:12:40,450 we now might say something like this. 277 00:12:40,450 --> 00:12:42,960 The first step for Jackson might be to ask the question-- 278 00:12:42,960 --> 00:12:46,110 if 50 is behind the middle door, where presumably he's 279 00:12:46,110 --> 00:12:48,570 done some mental math to figure out what the middle is, 280 00:12:48,570 --> 00:12:50,610 then he's going to just return true. 281 00:12:50,610 --> 00:12:53,070 And hopefully we'll get lucky and 50 will be right there. 282 00:12:53,070 --> 00:12:56,400 Of course, there's two other possibilities at least, 283 00:12:56,400 --> 00:12:58,290 which would be what? 284 00:12:58,290 --> 00:13:01,290 50 is, with respect to these doors? 285 00:13:01,290 --> 00:13:03,930 Yeah, so to the left or to the right, alternatively. 286 00:13:03,930 --> 00:13:07,722 So if 50 is less than the middle door, then presumably, 287 00:13:07,722 --> 00:13:09,180 Jackson's going to want to go left. 288 00:13:09,180 --> 00:13:11,310 Else, if 50 is greater than the middle door, 289 00:13:11,310 --> 00:13:13,380 he's going to want to go right, much like I 290 00:13:13,380 --> 00:13:16,170 did physically last week with the phone book, 291 00:13:16,170 --> 00:13:17,910 dividing and conquering left to right. 292 00:13:17,910 --> 00:13:20,020 But there's actually a fourth case. 293 00:13:20,020 --> 00:13:21,540 Let's put it on the board first. 294 00:13:21,540 --> 00:13:25,530 What else might happen here that Jackson should consider? 295 00:13:25,530 --> 00:13:26,170 Yeah. 296 00:13:26,170 --> 00:13:28,590 Oh, it's not there. 297 00:13:28,590 --> 00:13:32,930 So let me actually go back and amend my pseudocode here and just say Jackson, 298 00:13:32,930 --> 00:13:36,200 if we don't hand you any doors at all, or eventually, 299 00:13:36,200 --> 00:13:39,080 as he's dividing and conquering, if he's left with no more doors, 300 00:13:39,080 --> 00:13:42,380 we have to handle that situation so that the behavior is defined. 301 00:13:42,380 --> 00:13:44,210 All right, so with that said, Jackson, do 302 00:13:44,210 --> 00:13:46,127 you want to go ahead and find us the number 50 303 00:13:46,127 --> 00:13:48,650 and walk us through verbally what you're doing and finding? 304 00:13:48,650 --> 00:13:52,860 JACKSON: All right, so it looks like this one is the middle door. 305 00:13:52,860 --> 00:13:55,290 So I'm going to open it. 306 00:13:55,290 --> 00:13:57,030 But it's 20, not 50. 307 00:13:57,030 --> 00:13:59,622 DAVID MALAN: Aw. 308 00:13:59,622 --> 00:14:01,080 What's going through your head now? 309 00:14:01,080 --> 00:14:02,670 JACKSON: So now I'm looking-- 310 00:14:02,670 --> 00:14:06,490 because 50 is higher than 20, I want to look to the right. 311 00:14:06,490 --> 00:14:07,440 DAVID MALAN: Good. 312 00:14:07,440 --> 00:14:10,270 JACKSON: And look for the new middle door, which would be here. 313 00:14:10,270 --> 00:14:11,700 DAVID MALAN: Nice. 314 00:14:11,700 --> 00:14:12,900 JACKSON: And it's 100-- 315 00:14:12,900 --> 00:14:13,740 bad. 316 00:14:13,740 --> 00:14:16,560 But 50 is less than 100. 317 00:14:16,560 --> 00:14:20,520 So now we to look left, which would be here. 318 00:14:20,520 --> 00:14:21,240 And ta-da. 319 00:14:21,240 --> 00:14:21,990 DAVID MALAN: Nice. 320 00:14:21,990 --> 00:14:25,680 Very well done this time around, too. 321 00:14:25,680 --> 00:14:29,680 So thank you, first, to our volunteers here. 322 00:14:29,680 --> 00:14:32,970 And in fact, since you're a fan of Monopoly, as we're so informed, 323 00:14:32,970 --> 00:14:35,220 we have the Cambridge edition of Monopoly 324 00:14:35,220 --> 00:14:36,743 with all your Harvard favorites. 325 00:14:36,743 --> 00:14:37,410 JACKSON: No way. 326 00:14:37,410 --> 00:14:38,460 DAVID MALAN: Here you go. 327 00:14:38,460 --> 00:14:38,970 STEPHANIE: Thank you. 328 00:14:38,970 --> 00:14:40,095 JACKSON: Thank you so much. 329 00:14:40,095 --> 00:14:42,570 DAVID MALAN: Thank you to our volunteers for finding us 50. 330 00:14:42,570 --> 00:14:46,940 So-- that was more popular than we expected. 331 00:14:46,940 --> 00:14:50,470 So here, we can translate this one more time into something 332 00:14:50,470 --> 00:14:52,060 a little closer to code. 333 00:14:52,060 --> 00:14:54,730 And again, still pseudocode-- but here, now, 334 00:14:54,730 --> 00:14:57,460 might be another formulation of exactly what Jackson just did, 335 00:14:57,460 --> 00:14:59,380 just using the nomenclature now of arrays, 336 00:14:59,380 --> 00:15:01,570 where you can be a little more precise with your instructions 337 00:15:01,570 --> 00:15:04,640 and still leave it to someone else to translate this, finally, to code. 338 00:15:04,640 --> 00:15:06,640 But here we have same question at the beginning. 339 00:15:06,640 --> 00:15:08,650 If no doors left, return false. 340 00:15:08,650 --> 00:15:11,500 If 50 is behind doors bracket middle-- 341 00:15:11,500 --> 00:15:13,810 so I'm assuming here, because this is pseudocode-- 342 00:15:13,810 --> 00:15:16,810 that somewhere I've done the mental math or the actual math 343 00:15:16,810 --> 00:15:19,480 to figure out what the index of middle is. 344 00:15:19,480 --> 00:15:22,420 For instance, if these are seven doors in an array, 345 00:15:22,420 --> 00:15:27,310 this would be location 0, 1, 2, 3, 4, 5, 6. 346 00:15:27,310 --> 00:15:31,120 So somehow I've taken the total number of doors, 7, 347 00:15:31,120 --> 00:15:33,520 divided by 2 to find the middle. 348 00:15:33,520 --> 00:15:34,430 That's 3 and 1/2. 349 00:15:34,430 --> 00:15:35,680 We have to deal with rounding. 350 00:15:35,680 --> 00:15:38,920 But suffice it to say there's a well-defined formula for finding 351 00:15:38,920 --> 00:15:39,910 the middle index. 352 00:15:39,910 --> 00:15:43,280 Given the total number of lockers, divide by 2 and then round accordingly. 353 00:15:43,280 --> 00:15:45,550 So that's presumably what Jackson did just by counting 354 00:15:45,550 --> 00:15:48,790 or in his head to find us door number 3. 355 00:15:48,790 --> 00:15:52,210 Not the third door, the 4th door, but door bracket 3. 356 00:15:52,210 --> 00:15:56,200 So this is just saying if 50 is behind doors bracket middle, return true. 357 00:15:56,200 --> 00:15:57,200 That was not the case. 358 00:15:57,200 --> 00:15:58,960 He found a $20 bill instead. 359 00:15:58,960 --> 00:16:03,670 Else, if 50 is less than the doors bracket middle, go ahead-- 360 00:16:03,670 --> 00:16:10,660 and now it gets interesting-- search doors 0 through doors middle minus 1. 361 00:16:10,660 --> 00:16:12,730 So it's getting a little more into the weeds now. 362 00:16:12,730 --> 00:16:16,900 But if middle is 3, this one here, what we want to now 363 00:16:16,900 --> 00:16:19,300 have Jackson search if 50 had been-- 364 00:16:19,300 --> 00:16:22,360 if the number had been less, we want to start at bracket 0 365 00:16:22,360 --> 00:16:24,340 and go up through this one. 366 00:16:24,340 --> 00:16:26,380 And we deliberately subtract 1 because what's 367 00:16:26,380 --> 00:16:28,297 the point of looking in the same locker again? 368 00:16:28,297 --> 00:16:31,270 We might as well do 0 through middle minus 1. 369 00:16:31,270 --> 00:16:36,040 Else if 50 is greater than doors bracket middle, which it was, 370 00:16:36,040 --> 00:16:37,120 what did we then do? 371 00:16:37,120 --> 00:16:40,870 So Jackson intuitively searched for doors middle plus 1 372 00:16:40,870 --> 00:16:43,295 through doors n minus 1. 373 00:16:43,295 --> 00:16:46,420 And honestly, it gets a little annoying having the pluses and minuses here. 374 00:16:46,420 --> 00:16:47,780 But just think of what it means. 375 00:16:47,780 --> 00:16:49,090 This is the middle door. 376 00:16:49,090 --> 00:16:53,942 And Jackson then did proceed to search through doors middle plus 1 377 00:16:53,942 --> 00:16:56,150 because there's no point in searching this one again. 378 00:16:56,150 --> 00:17:00,130 And then the last element in any array of size n 379 00:17:00,130 --> 00:17:05,352 where n is just our go-to number for the size is always going to be n minus 1. 380 00:17:05,352 --> 00:17:06,310 It's not going to be n. 381 00:17:06,310 --> 00:17:10,839 It's going to be n minus 1 because we always start counting arrays at 0. 382 00:17:10,839 --> 00:17:14,200 So here then we have a translation into pseudocode that's a little closer 383 00:17:14,200 --> 00:17:16,430 to C of this exact same idea. 384 00:17:16,430 --> 00:17:18,490 And here, we come full circle to Week 0. 385 00:17:18,490 --> 00:17:22,240 In Week 0, it was pretty intuitive to imagine dividing and conquering 386 00:17:22,240 --> 00:17:23,420 a problem like this. 387 00:17:23,420 --> 00:17:26,770 But if you now think back to actual your iPhone, your Android phone, 388 00:17:26,770 --> 00:17:29,920 or the like, when you're doing autocomplete and searching the list, 389 00:17:29,920 --> 00:17:33,430 it's possible, if you don't have many friends or family or colleagues 390 00:17:33,430 --> 00:17:34,840 in the phone, you know what? 391 00:17:34,840 --> 00:17:39,070 Linear search, just checking every name for the person you're searching for, 392 00:17:39,070 --> 00:17:40,720 might be perfectly fine. 393 00:17:40,720 --> 00:17:43,810 But odds are your phone's being smarter than that, especially if you 394 00:17:43,810 --> 00:17:47,140 start to have dozens, hundreds, thousands of people in your contacts 395 00:17:47,140 --> 00:17:47,770 over the years. 396 00:17:47,770 --> 00:17:49,540 What would be better than linear search? 397 00:17:49,540 --> 00:17:51,340 Well, perhaps binary search. 398 00:17:51,340 --> 00:17:55,180 But, but, but-- there's an assumption, a requirement, which is what? 399 00:17:55,180 --> 00:18:00,340 Why was Jackson ultimately able to find the 50 in just three steps 400 00:18:00,340 --> 00:18:04,830 instead of a full seven, like Stephanie? 401 00:18:04,830 --> 00:18:06,570 Because the array was sorted. 402 00:18:06,570 --> 00:18:09,990 And so this is sort of a teaser for what we'll have to come back to later today. 403 00:18:09,990 --> 00:18:12,780 Well, how much effort did it take someone like Carter? 404 00:18:12,780 --> 00:18:15,240 How much effort does it take your phone to sort all 405 00:18:15,240 --> 00:18:17,040 of those names and numbers in advance? 406 00:18:17,040 --> 00:18:19,650 Because maybe it's not actually worth the amount of time. 407 00:18:19,650 --> 00:18:24,210 Now someone like Google probably somehow keeps the database of web pages sorted. 408 00:18:24,210 --> 00:18:28,110 You can imagine it being super slow if, when you type in cats or something else 409 00:18:28,110 --> 00:18:32,280 into google.com, if they searched linearly over their entire data set. 410 00:18:32,280 --> 00:18:35,430 Ideally, they're doing something a little smarter than that. 411 00:18:35,430 --> 00:18:38,820 So we'll formalize, now, exactly this kind of analysis. 412 00:18:38,820 --> 00:18:42,180 And it's not going to be so much mathy as it still will be intuitive. 413 00:18:42,180 --> 00:18:45,480 But we'll introduce you to some jargon, some terminology 414 00:18:45,480 --> 00:18:48,180 that most any programmer or computer scientists might use 415 00:18:48,180 --> 00:18:50,550 when analyzing their own algorithms. 416 00:18:50,550 --> 00:18:53,670 Let's formalize now what this kind of analysis is. 417 00:18:53,670 --> 00:18:56,910 So right now, I claim binary search better than linear search. 418 00:18:56,910 --> 00:18:59,100 But how much better and why, exactly? 419 00:18:59,100 --> 00:19:01,120 Well, it all comes back to this kind of graph. 420 00:19:01,120 --> 00:19:04,830 So this, recall, is how we analyzed the phone book back in Week 0. 421 00:19:04,830 --> 00:19:08,880 And recall that, indeed, we had these formulas, rough formulas that 422 00:19:08,880 --> 00:19:11,370 described the running time of those three algorithms-- 423 00:19:11,370 --> 00:19:13,740 one page at a time, two pages at a time, and then 424 00:19:13,740 --> 00:19:15,720 tearing the thing again and again in half. 425 00:19:15,720 --> 00:19:19,830 And precisely, if you counted up the number of pages I was touching 426 00:19:19,830 --> 00:19:22,020 or the number of pages I was tearing, it's 427 00:19:22,020 --> 00:19:24,600 fair to say that the first algorithm, in the worst case, 428 00:19:24,600 --> 00:19:26,700 might have taken n total pages. 429 00:19:26,700 --> 00:19:29,010 It didn't because I was searching for John Harvard 430 00:19:29,010 --> 00:19:31,260 at the time, which is somewhat early in the alphabet. 431 00:19:31,260 --> 00:19:34,340 But if I were searching for someone with the last name of Z, 432 00:19:34,340 --> 00:19:36,840 I would have had to keep going and going, in the worst case, 433 00:19:36,840 --> 00:19:38,430 through all n pages. 434 00:19:38,430 --> 00:19:41,940 Not as bad for the second algorithm, and that's why we do n divided by 2. 435 00:19:41,940 --> 00:19:43,920 And even that's a bit of a white lie. 436 00:19:43,920 --> 00:19:48,270 It's probably n divided by 2 plus 1 in case I have to double back. 437 00:19:48,270 --> 00:19:50,405 But again, I'm sort of doing this more generally 438 00:19:50,405 --> 00:19:52,030 to capture the essence of these things. 439 00:19:52,030 --> 00:19:54,363 And then we really got into the weeds with like log base 440 00:19:54,363 --> 00:19:56,940 2 event for that third and final algorithm. 441 00:19:56,940 --> 00:20:00,690 And at the time, we claimed any time you're dividing something in half, 442 00:20:00,690 --> 00:20:04,200 in half, in half, odds are there's going to be some kind of logarithm involved. 443 00:20:04,200 --> 00:20:05,340 And we'll see that today. 444 00:20:05,340 --> 00:20:09,010 But today, we're going to actually start using computer science terminology. 445 00:20:09,010 --> 00:20:13,590 And we're going to formalize this imprecision, if you will. 446 00:20:13,590 --> 00:20:17,670 We are not going to care, generally, about exactly how many steps 447 00:20:17,670 --> 00:20:20,820 some algorithm takes because that's not going to be that enlightening, 448 00:20:20,820 --> 00:20:24,630 especially if maybe you have a faster computer tomorrow than you did today. 449 00:20:24,630 --> 00:20:27,510 It wouldn't really be fair to compare numbers too precisely. 450 00:20:27,510 --> 00:20:31,590 We really want to, with the wave of a hand, just get a sense of roughly 451 00:20:31,590 --> 00:20:33,930 how slow or how fast an algorithm is. 452 00:20:33,930 --> 00:20:36,000 So the notation here is deliberate. 453 00:20:36,000 --> 00:20:40,620 That is literally a capital O, often italicized, refer to as big O. 454 00:20:40,620 --> 00:20:43,920 And so the first algorithm is in big O of n. 455 00:20:43,920 --> 00:20:47,760 The second algorithm is in big O of n divided by 2. 456 00:20:47,760 --> 00:20:51,480 The third algorithm is in big O of log base 2 of n. 457 00:20:51,480 --> 00:20:54,690 But even that is kind of unnecessary detail. 458 00:20:54,690 --> 00:20:58,830 When using big O notation, you really don't care about, we'll see, 459 00:20:58,830 --> 00:21:01,230 the smaller order terms. 460 00:21:01,230 --> 00:21:04,500 We're not going to care about the divided by 2 because you know what? 461 00:21:04,500 --> 00:21:07,720 The shape of these algorithms are is almost the same. 462 00:21:07,720 --> 00:21:10,800 And really, the idea-- the algorithm itself is sort of fundamentally 463 00:21:10,800 --> 00:21:11,340 the same. 464 00:21:11,340 --> 00:21:13,620 Instead of one page at a time, I'm doing two. 465 00:21:13,620 --> 00:21:17,280 But if you throw millions of pages, billions of pages at me, 466 00:21:17,280 --> 00:21:20,460 those algorithms are really going to kind of perform the same as n gets 467 00:21:20,460 --> 00:21:22,085 really large, goes off toward infinity. 468 00:21:22,085 --> 00:21:23,585 And the same is true for logarithms. 469 00:21:23,585 --> 00:21:25,560 Even if you're a little rusty, it turns out 470 00:21:25,560 --> 00:21:29,560 that whether you do the math with log base 2, log base 3, log base 10, 471 00:21:29,560 --> 00:21:33,040 you can just multiply one by the other to really get the same formula. 472 00:21:33,040 --> 00:21:35,700 So this is only to say a computer scientist would generally 473 00:21:35,700 --> 00:21:39,270 say that the first two algorithms are on the order of n steps. 474 00:21:39,270 --> 00:21:42,690 The third algorithm is on the order of log n steps. 475 00:21:42,690 --> 00:21:46,350 And we don't really care precisely what we mean beyond that. 476 00:21:46,350 --> 00:21:49,770 And this big O notation, as we'll see-- and actually, let me zoom out. 477 00:21:49,770 --> 00:21:53,500 If you can imagine suddenly making the x-axis much longer-- 478 00:21:53,500 --> 00:21:55,770 so more pages on the screen at once-- 479 00:21:55,770 --> 00:21:58,320 it is indeed going to be the shapes of these curves 480 00:21:58,320 --> 00:22:01,480 that matter, because imagine in your mind's eye as you zoom out, 481 00:22:01,480 --> 00:22:04,950 zoom out, zoom out, zoom out, and as n gets much, much, much bigger 482 00:22:04,950 --> 00:22:07,470 on the x-axis, the red and the yellow line 483 00:22:07,470 --> 00:22:11,400 are essentially going to look the same once n is sufficiently large. 484 00:22:11,400 --> 00:22:14,378 But the green line is never going to look the same. 485 00:22:14,378 --> 00:22:16,420 It's going to be a fundamentally different shape. 486 00:22:16,420 --> 00:22:18,570 And so that's the intuition of big O, to get 487 00:22:18,570 --> 00:22:23,200 a sense of these rates of performance like this. 488 00:22:23,200 --> 00:22:25,410 So here, then, is big O. Here is, perhaps, 489 00:22:25,410 --> 00:22:29,160 a cheat sheet of the common formulas that a computer scientist, certainly 490 00:22:29,160 --> 00:22:32,490 in an introductory context, might use when analyzing algorithms. 491 00:22:32,490 --> 00:22:36,060 And let's consider for a moment which of our first two algorithms-- 492 00:22:36,060 --> 00:22:39,040 linear search and binary search-- fall into these categories. 493 00:22:39,040 --> 00:22:44,318 So I've ordered them from slowest to fastest, so order of n squared. 494 00:22:44,318 --> 00:22:46,110 It's not something we've actually seen yet, 495 00:22:46,110 --> 00:22:48,392 but it tends to be slow because it's quadratic. 496 00:22:48,392 --> 00:22:49,350 You're doing n times n. 497 00:22:49,350 --> 00:22:51,090 That's got to add up to a lot of steps. 498 00:22:51,090 --> 00:22:53,190 Better today is going to be n log n. 499 00:22:53,190 --> 00:22:54,630 Even better is going to be n. 500 00:22:54,630 --> 00:22:56,190 Even better than that is log n. 501 00:22:56,190 --> 00:23:00,150 And best is so-called order of 1, like one step 502 00:23:00,150 --> 00:23:02,520 or maybe two steps, maybe even 1,000 steps, 503 00:23:02,520 --> 00:23:08,200 but a fixed, finite number of steps that never changes no matter how big n is. 504 00:23:08,200 --> 00:23:11,920 So given this chart, just to be clear, linear search-- 505 00:23:11,920 --> 00:23:13,570 let's consider the worst case. 506 00:23:13,570 --> 00:23:18,040 In the worst case, how many steps did it take someone like Stephanie 507 00:23:18,040 --> 00:23:23,500 to find the solution to the problem, assuming not seven doors but n doors? 508 00:23:23,500 --> 00:23:25,160 Yeah? 509 00:23:25,160 --> 00:23:26,540 So on the order of n. 510 00:23:26,540 --> 00:23:28,280 And in this case, it's exactly n. 511 00:23:28,280 --> 00:23:32,660 But you know what, maybe it's arguably 2n because it took Stephanie 512 00:23:32,660 --> 00:23:33,530 a couple of steps. 513 00:23:33,530 --> 00:23:34,460 She had to lift the latch. 514 00:23:34,460 --> 00:23:35,360 She had to open the door. 515 00:23:35,360 --> 00:23:36,318 Maybe it's three steps. 516 00:23:36,318 --> 00:23:37,530 She had to show the money. 517 00:23:37,530 --> 00:23:39,170 So now it's 3n, 2n. 518 00:23:39,170 --> 00:23:41,990 But we don't really care about that level of precision. 519 00:23:41,990 --> 00:23:45,660 We really just care about the fundamental number of operations. 520 00:23:45,660 --> 00:23:47,540 So we'll say yes, on the order of n. 521 00:23:47,540 --> 00:23:51,320 So that might be an upper bound, we'll call this, for linear search. 522 00:23:51,320 --> 00:23:53,030 And how about binary search? 523 00:23:53,030 --> 00:23:57,140 In Jackson's case, or in general, me in Week 0, if there's n doors, 524 00:23:57,140 --> 00:24:02,910 how many steps did it take Jackson or me using binary search? 525 00:24:02,910 --> 00:24:04,860 In this case, it was literally three. 526 00:24:04,860 --> 00:24:07,200 But that's not a formula. 527 00:24:07,200 --> 00:24:09,690 Yeah so it's on the order of log n. 528 00:24:09,690 --> 00:24:11,970 And indeed, if there are seven doors, well, that's 529 00:24:11,970 --> 00:24:14,250 almost eight, if you just do a little bit of rounding. 530 00:24:14,250 --> 00:24:18,480 And indeed, if you take log base 2 of 8, that does actually give us 3. 531 00:24:18,480 --> 00:24:19,813 So the math actually checks out. 532 00:24:19,813 --> 00:24:22,272 And if you're not comfortable with logarithms, no big deal. 533 00:24:22,272 --> 00:24:23,670 Just think about it intuitively. 534 00:24:23,670 --> 00:24:27,010 Logarithm of base 2 is just dividing something again and again. 535 00:24:27,010 --> 00:24:31,110 So on this chart, when we consider big O, which to be clear, 536 00:24:31,110 --> 00:24:35,100 allows you to describe the order of an algorithm's running time-- 537 00:24:35,100 --> 00:24:38,610 like the magnitude of it-- but it also describes, more specifically, 538 00:24:38,610 --> 00:24:40,090 an upper bound. 539 00:24:40,090 --> 00:24:42,990 So in the worst case, for instance, these 540 00:24:42,990 --> 00:24:45,480 are pretty good measures of how good-- 541 00:24:45,480 --> 00:24:47,310 or rather, of how bad-- 542 00:24:47,310 --> 00:24:49,270 linear search and binary search might be. 543 00:24:49,270 --> 00:24:49,770 Why? 544 00:24:49,770 --> 00:24:52,122 Well, suppose you're searching a 1,000-page phonebook 545 00:24:52,122 --> 00:24:55,080 and the person's name starts with Z. The algorithm is still going to be 546 00:24:55,080 --> 00:24:56,320 on the order of n steps. 547 00:24:56,320 --> 00:24:56,820 Why? 548 00:24:56,820 --> 00:25:01,080 Because it might take you as many as all n steps to find it. 549 00:25:01,080 --> 00:25:05,250 Now that's not necessarily going to be the case in practice. 550 00:25:05,250 --> 00:25:08,370 If I use big O as an upper bound, well, it 551 00:25:08,370 --> 00:25:11,160 would be nice if there's a corresponding lower bound, 552 00:25:11,160 --> 00:25:16,180 especially if you want to consider not just worst cases, but maybe best cases. 553 00:25:16,180 --> 00:25:18,040 So what might we use here? 554 00:25:18,040 --> 00:25:20,200 Well, this is a capital Greek omega symbol. 555 00:25:20,200 --> 00:25:23,550 So omega is the symbol that a computer scientist uses generally 556 00:25:23,550 --> 00:25:25,920 to describe a lower bound on an algorithm, 557 00:25:25,920 --> 00:25:28,710 often in the context of best case, though not necessarily. 558 00:25:28,710 --> 00:25:32,490 So a lower bound means how few steps might an algorithm take? 559 00:25:32,490 --> 00:25:33,990 And here, too, same formulas. 560 00:25:33,990 --> 00:25:36,270 And we'll fill in these blanks over time. 561 00:25:36,270 --> 00:25:40,140 Some algorithms might always take a minimum of n squared steps, 562 00:25:40,140 --> 00:25:41,370 or on the order of n steps. 563 00:25:41,370 --> 00:25:45,660 Some might only take n log n, or n, or log n, or 1. 564 00:25:45,660 --> 00:25:49,170 So something like a linear search-- 565 00:25:49,170 --> 00:25:51,240 when Stephanie started with linear search, 566 00:25:51,240 --> 00:25:52,980 she didn't get lucky this time on stage. 567 00:25:52,980 --> 00:25:57,720 But what if she had, and the first door she opened were 50? 568 00:25:57,720 --> 00:26:01,260 How might you then describe the lower bound on linear 569 00:26:01,260 --> 00:26:08,290 search in this so-called best case, using this list of possible answers? 570 00:26:08,290 --> 00:26:09,530 Yeah? 571 00:26:09,530 --> 00:26:11,060 Yeah, so omega of 1. 572 00:26:11,060 --> 00:26:14,900 So in the best case, the lower bound on how many steps 573 00:26:14,900 --> 00:26:18,990 it might take linear search to find something might just be one step. 574 00:26:18,990 --> 00:26:19,490 Why? 575 00:26:19,490 --> 00:26:21,500 Because maybe if Stephanie had gotten lucky 576 00:26:21,500 --> 00:26:25,960 and we had prefilled these lockers with the numbers in some other order 577 00:26:25,960 --> 00:26:28,460 such that she might have opened the first locker, and voila, 578 00:26:28,460 --> 00:26:31,280 the number 50 could have been there, so a lower bound arguably 579 00:26:31,280 --> 00:26:34,610 could indeed be omega of 1 for linear search. 580 00:26:34,610 --> 00:26:35,990 And how about now for Jackson? 581 00:26:35,990 --> 00:26:37,440 He used binary search. 582 00:26:37,440 --> 00:26:40,940 So he dived right into the middle of the problem. 583 00:26:40,940 --> 00:26:45,020 But what would be a lower bound on binary search using this logic? 584 00:26:45,020 --> 00:26:45,980 Yeah? 585 00:26:45,980 --> 00:26:47,460 Yeah, so again, omega of 1. 586 00:26:47,460 --> 00:26:47,960 Why? 587 00:26:47,960 --> 00:26:49,580 Because maybe he just gets lucky. 588 00:26:49,580 --> 00:26:53,300 And indeed, right in the middle of the lockers could have been the number 50. 589 00:26:53,300 --> 00:26:54,060 It wasn't. 590 00:26:54,060 --> 00:26:57,770 And so more germane in Jackson's actual practice 591 00:26:57,770 --> 00:27:00,050 would have been the big O discussion. 592 00:27:00,050 --> 00:27:03,350 But big O and omega, upper bound and lower bound, 593 00:27:03,350 --> 00:27:05,450 just allow a computer scientist to kind of wrestle 594 00:27:05,450 --> 00:27:06,980 with what could happen maybe in the worst 595 00:27:06,980 --> 00:27:08,670 case, what can happen in the best case? 596 00:27:08,670 --> 00:27:12,267 And you can even get even more precise like the average case or the like. 597 00:27:12,267 --> 00:27:15,350 And this is, indeed, what engineers might do at a whiteboard in a company, 598 00:27:15,350 --> 00:27:17,600 in a university when designing an algorithm 599 00:27:17,600 --> 00:27:21,380 and trying to make arguments as to why their algorithm is better than someone 600 00:27:21,380 --> 00:27:24,080 else's, by way of these kinds of analyses. 601 00:27:24,080 --> 00:27:29,180 And just so you've seen it, it turns out that if some algorithm happens 602 00:27:29,180 --> 00:27:32,990 to have an identical upper bound and lower bound, 603 00:27:32,990 --> 00:27:35,880 you can actually use a capital Greek theta as well. 604 00:27:35,880 --> 00:27:38,210 And this is the last of the Greek symbols today. 605 00:27:38,210 --> 00:27:41,900 But a Greek theta indicates a coincidence of both upper 606 00:27:41,900 --> 00:27:43,130 bound and lower bound. 607 00:27:43,130 --> 00:27:44,702 That is, they are one and the same. 608 00:27:44,702 --> 00:27:47,660 That was not the case for our discussion a second ago of linear search, 609 00:27:47,660 --> 00:27:49,220 not the case for binary search. 610 00:27:49,220 --> 00:27:52,970 But you could use the same kinds of formulas 611 00:27:52,970 --> 00:27:56,970 if it turns out that your upper bound and lower bound are the same. 612 00:27:56,970 --> 00:28:00,440 So for instance, if I were to count everyone literally in this room-- one, 613 00:28:00,440 --> 00:28:03,870 two, three, four, five, six and so forth-- 614 00:28:03,870 --> 00:28:09,080 you could actually say that counting in that way is in theta of n 615 00:28:09,080 --> 00:28:11,150 because in the best case, it's going to take 616 00:28:11,150 --> 00:28:13,468 me n points, people in the audience. 617 00:28:13,468 --> 00:28:15,260 In the worst case, it's going to take me n. 618 00:28:15,260 --> 00:28:18,380 It's always going to take me n steps if I want to count everyone in the room. 619 00:28:18,380 --> 00:28:20,930 You can't really do better than that unless you skip people. 620 00:28:20,930 --> 00:28:23,510 So that would be an example of the cuff of something 621 00:28:23,510 --> 00:28:26,150 where theta is instead germane. 622 00:28:26,150 --> 00:28:31,530 Are any questions now on big O, on omega, or theta, 623 00:28:31,530 --> 00:28:34,040 which are now just more formal tools in the toolkit 624 00:28:34,040 --> 00:28:38,730 for talking about the design of our algorithms? 625 00:28:38,730 --> 00:28:42,050 Any questions? 626 00:28:42,050 --> 00:28:42,860 No? 627 00:28:42,860 --> 00:28:44,720 Seeing none. 628 00:28:44,720 --> 00:28:45,560 Oh, is this-- yes? 629 00:28:45,560 --> 00:28:46,840 No? 630 00:28:46,840 --> 00:28:48,250 OK, so we're good. 631 00:28:48,250 --> 00:28:52,000 So let's go ahead and translate this, perhaps, to some actual code. 632 00:28:52,000 --> 00:28:53,900 Let me go over to VS Code here. 633 00:28:53,900 --> 00:28:57,100 And let's see if we can't now translate some of these ideas 634 00:28:57,100 --> 00:29:00,280 to some actual code, not so much using new syntax yet. 635 00:29:00,280 --> 00:29:03,320 We're going to still operate in this world of arrays like last week. 636 00:29:03,320 --> 00:29:05,237 So let me go ahead and create a program called 637 00:29:05,237 --> 00:29:09,280 search.c by executing code space search.c in my terminal. 638 00:29:09,280 --> 00:29:11,800 And then up here, let's go ahead and include our usual, 639 00:29:11,800 --> 00:29:14,740 so include cs50.h so I can get some input. 640 00:29:14,740 --> 00:29:18,370 Include standard io.h so I can print some output. 641 00:29:18,370 --> 00:29:21,910 We'll do int main void, the meaning of which we did start 642 00:29:21,910 --> 00:29:23,140 to tease apart last week. 643 00:29:23,140 --> 00:29:26,650 The fact that it's void again today just means no command line arguments. 644 00:29:26,650 --> 00:29:28,580 And let me go ahead and do this. 645 00:29:28,580 --> 00:29:33,130 Let me go ahead and declare, just for discussion's sake, a static array, 646 00:29:33,130 --> 00:29:34,840 like an array that never changes. 647 00:29:34,840 --> 00:29:39,280 And the syntax for this is going to be give me an array called numbers 648 00:29:39,280 --> 00:29:41,290 using the square bracket notation. 649 00:29:41,290 --> 00:29:43,390 And I'm going to immediately initialize it 650 00:29:43,390 --> 00:29:49,300 to 20, 500, 10, 5, 100, 1, and 50, reminiscent of those same denominations 651 00:29:49,300 --> 00:29:50,060 as before. 652 00:29:50,060 --> 00:29:54,080 So this is a slightly new syntax that we've perhaps not seen. 653 00:29:54,080 --> 00:29:57,130 And the curly braces here, which are different from for loops 654 00:29:57,130 --> 00:30:00,820 and while loops and functions, just tell the compiler please give me 655 00:30:00,820 --> 00:30:05,380 an array of whatever size this is containing those numbers left to right. 656 00:30:05,380 --> 00:30:10,220 I could alternatively use last week's syntax of saying something like this. 657 00:30:10,220 --> 00:30:13,090 Let's see, 1, 2, 3, 4, 5, 6, 7 denominations. 658 00:30:13,090 --> 00:30:15,250 I could alternatively do this. 659 00:30:15,250 --> 00:30:21,910 And then I could say numbers bracket 0 equals 660 00:30:21,910 --> 00:30:25,570 20, numbers bracket 1 equals 500. 661 00:30:25,570 --> 00:30:27,572 And I could do this five more times. 662 00:30:27,572 --> 00:30:28,780 That's just a little tedious. 663 00:30:28,780 --> 00:30:30,580 If you know the numbers in advance, you don't 664 00:30:30,580 --> 00:30:32,530 have to tell the compiler how many there are. 665 00:30:32,530 --> 00:30:37,420 You can just let it figure it out that your numbers will be 10, 500, 10, 5, 666 00:30:37,420 --> 00:30:39,430 100, 1, and 50. 667 00:30:39,430 --> 00:30:42,550 So this is how you statically define an array. 668 00:30:42,550 --> 00:30:45,380 All right, let me just go ahead and ask the user now for a number. 669 00:30:45,380 --> 00:30:48,920 We'll call it n by using get_int and prompting them for a number-- 670 00:30:48,920 --> 00:30:50,020 so nothing new there. 671 00:30:50,020 --> 00:30:53,680 And now let me go ahead and implement linear search. 672 00:30:53,680 --> 00:30:57,520 And the pseudocode we had for this before used some array-like notation. 673 00:30:57,520 --> 00:30:59,620 Let me go ahead, then, and start similarly. 674 00:30:59,620 --> 00:31:04,270 For int i-- and you almost always start counting at i by convention. 675 00:31:04,270 --> 00:31:06,490 So that's perhaps a good starting point. 676 00:31:06,490 --> 00:31:09,790 I'm going to do this so long as i is less than 7. 677 00:31:09,790 --> 00:31:12,138 Not the best design to hard code the 7, but this is just 678 00:31:12,138 --> 00:31:13,930 for demonstration's sake for now, because I 679 00:31:13,930 --> 00:31:15,560 know how many numbers I put in there. 680 00:31:15,560 --> 00:31:16,992 And then I'm going to i++. 681 00:31:16,992 --> 00:31:19,450 So now I have the beginnings of a loop that will just allow 682 00:31:19,450 --> 00:31:21,550 me to iterate over the entire array. 683 00:31:21,550 --> 00:31:22,760 And let me ask this. 684 00:31:22,760 --> 00:31:27,370 If the current number at location i equals 685 00:31:27,370 --> 00:31:30,650 equals n, which is the number the human typed in, 686 00:31:30,650 --> 00:31:33,400 then let's go ahead and do something simple like printf, 687 00:31:33,400 --> 00:31:36,190 quote unquote, found, backslash n. 688 00:31:36,190 --> 00:31:40,240 And then per our discussion last week, to indicate that this is successful, 689 00:31:40,240 --> 00:31:42,610 I'm going to return 0 if I found it. 690 00:31:42,610 --> 00:31:45,940 And if I don't find it, I'm just going to go down here and, by default, 691 00:31:45,940 --> 00:31:48,640 say not found, backslash n. 692 00:31:48,640 --> 00:31:52,610 And just for convention-- whoops, just for good measure, per convention, 693 00:31:52,610 --> 00:31:55,630 I'll return 1 or, really, any value other than 0. 694 00:31:55,630 --> 00:31:57,130 0, recall, means success. 695 00:31:57,130 --> 00:32:00,670 And any other integer tends to mean error of some sort, 696 00:32:00,670 --> 00:32:02,690 irrespective of the number I'm looking for. 697 00:32:02,690 --> 00:32:06,670 So just to revisit, the only thing that's new here is the syntax. 698 00:32:06,670 --> 00:32:09,980 We're creating an array of seven numbers, these numbers. 699 00:32:09,980 --> 00:32:13,540 And then after that we have really highlighted here 700 00:32:13,540 --> 00:32:16,090 an implementation of linear search. 701 00:32:16,090 --> 00:32:19,390 I mean this is the C version, I daresay, of what Stephanie did on the board, 702 00:32:19,390 --> 00:32:22,540 whereas now the array is called numbers instead of doors. 703 00:32:22,540 --> 00:32:25,460 But I think it's pretty much the same. 704 00:32:25,460 --> 00:32:30,380 Let me go ahead and open my terminal window and run make search. 705 00:32:30,380 --> 00:32:32,518 Seems to compile, ./search. 706 00:32:32,518 --> 00:32:34,310 And let's go ahead and search for a number. 707 00:32:34,310 --> 00:32:36,230 We'll start with what we did before, 50. 708 00:32:36,230 --> 00:32:37,340 And it's found. 709 00:32:37,340 --> 00:32:39,770 Let's go ahead and run it again, ./search. 710 00:32:39,770 --> 00:32:42,500 Let's search for maybe 20 at the beginning. 711 00:32:42,500 --> 00:32:43,670 That one, too, is found. 712 00:32:43,670 --> 00:32:47,150 Let's run it one more time searching for like 1,000, 713 00:32:47,150 --> 00:32:50,720 which is not among the denominations. 714 00:32:50,720 --> 00:32:52,980 And that one, indeed, is not found. 715 00:32:52,980 --> 00:32:56,210 So we've taken an idea from Week 0, now formalized in Week 3, 716 00:32:56,210 --> 00:32:59,300 and just translated it now to code. 717 00:32:59,300 --> 00:33:05,500 Questions on this implementation of linear search? 718 00:33:05,500 --> 00:33:07,570 Linear search. 719 00:33:07,570 --> 00:33:08,680 Nothing. 720 00:33:08,680 --> 00:33:11,810 Oh, so successful so far today. 721 00:33:11,810 --> 00:33:15,820 So let's see if we can't maybe make this a little more interesting 722 00:33:15,820 --> 00:33:19,270 and see if we can't trip over a detail that's going to be important in C. 723 00:33:19,270 --> 00:33:23,330 And instead of doing numbers, let me go ahead and do this. 724 00:33:23,330 --> 00:33:25,030 We'll stay on theme with Monopoly. 725 00:33:25,030 --> 00:33:26,830 And I went down the rabbit hole of reading the Wikipedia 726 00:33:26,830 --> 00:33:27,730 article on Monopoly. 727 00:33:27,730 --> 00:33:32,170 And the original pieces or tokens that came with Monopoly-- and it turns out 728 00:33:32,170 --> 00:33:33,740 we can represent those with strings. 729 00:33:33,740 --> 00:33:37,690 So I'm going to create an array called strings, plural, of whatever 730 00:33:37,690 --> 00:33:39,170 size I defined here. 731 00:33:39,170 --> 00:33:42,340 And the very first monopoly pieces back in the day 732 00:33:42,340 --> 00:33:44,920 were a battleship that you could play with, 733 00:33:44,920 --> 00:33:53,320 a boot, a cannon, an iron, a thimble, and a top hat, some of which 734 00:33:53,320 --> 00:33:54,700 you might from the game nowadays. 735 00:33:54,700 --> 00:33:56,410 Turns out they've been changing these-- 736 00:33:56,410 --> 00:33:57,890 had no idea-- over the years. 737 00:33:57,890 --> 00:34:00,170 So here is, now, an array of strings. 738 00:34:00,170 --> 00:34:03,940 Let me go ahead and prompt the user now not for an integer anymore. 739 00:34:03,940 --> 00:34:07,970 I want to now search for one of these strings still using linear search. 740 00:34:07,970 --> 00:34:11,020 So let me create a string s, set it equal to get_string, 741 00:34:11,020 --> 00:34:13,840 prompt the user for a string to search for. 742 00:34:13,840 --> 00:34:19,540 And then I think my code here is almost the same, except for one detail. 743 00:34:19,540 --> 00:34:21,850 I now have an array called strings. 744 00:34:21,850 --> 00:34:24,040 I now have a variable called s. 745 00:34:24,040 --> 00:34:27,790 But it turns out, for reasons we'll explore in more detail next week, 746 00:34:27,790 --> 00:34:31,030 this line of code is not going to work. 747 00:34:31,030 --> 00:34:34,900 And it turns out the reason has to do with what we discussed 748 00:34:34,900 --> 00:34:36,880 last week of what a string really is. 749 00:34:36,880 --> 00:34:39,355 And what is a string, again? 750 00:34:39,355 --> 00:34:41,000 A string is an array. 751 00:34:41,000 --> 00:34:44,929 And it turns out, though, that equals equals is not 752 00:34:44,929 --> 00:34:49,760 going to generously compare all of the characters in an array for you 753 00:34:49,760 --> 00:34:51,949 just because you use equal equals. 754 00:34:51,949 --> 00:34:54,650 It turns out it's not going to compare every letter. 755 00:34:54,650 --> 00:35:00,260 And so thankfully, there is, in the string library 756 00:35:00,260 --> 00:35:03,058 that we introduced last week, a solution to this problem. 757 00:35:03,058 --> 00:35:05,850 The reason for the problem, we'll explore in more detail next week. 758 00:35:05,850 --> 00:35:09,860 But for now, just know that when you want to compare strings in C-- 759 00:35:09,860 --> 00:35:13,220 especially if you've come into the class knowing a bit of Java or Python or some 760 00:35:13,220 --> 00:35:15,680 other language-- you cannot use equals equals. 761 00:35:15,680 --> 00:35:18,500 Even though you could in Scratch, you cannot in C. 762 00:35:18,500 --> 00:35:21,620 So what I have to actually do here is this. 763 00:35:21,620 --> 00:35:26,330 I have to ask the question, does the return value of a function called str 764 00:35:26,330 --> 00:35:33,160 compare, or strcomp, equal 0 when passed in the current string 765 00:35:33,160 --> 00:35:36,050 and that's user input? 766 00:35:36,050 --> 00:35:39,790 So if you read the documentation for this function called str compare, 767 00:35:39,790 --> 00:35:44,500 you'll see that it takes two strings as input, first one and second one. 768 00:35:44,500 --> 00:35:47,140 It then-- someone decades ago wrote the code 769 00:35:47,140 --> 00:35:49,060 that probably uses a for loop or a while loop 770 00:35:49,060 --> 00:35:51,910 to compare every character in each of those strings. 771 00:35:51,910 --> 00:35:56,290 And it turns out it returns 0 if they are, in fact, equal. 772 00:35:56,290 --> 00:36:00,640 Turns out, too, it will return a positive number or a negative number 773 00:36:00,640 --> 00:36:02,440 in other situations. 774 00:36:02,440 --> 00:36:04,990 Any intuition for why it might actually be 775 00:36:04,990 --> 00:36:10,810 useful to have a function that allows you to check if two strings are equal? 776 00:36:10,810 --> 00:36:13,480 If they're not equal, what else might be interesting to know 777 00:36:13,480 --> 00:36:14,830 when comparing two strings? 778 00:36:14,830 --> 00:36:18,474 779 00:36:18,474 --> 00:36:19,391 If certain values are? 780 00:36:19,391 --> 00:36:23,347 STUDENT: [INAUDIBLE] 781 00:36:23,347 --> 00:36:24,430 DAVID MALAN: OK, possibly. 782 00:36:24,430 --> 00:36:26,950 Maybe you want to just how similar they are. 783 00:36:26,950 --> 00:36:28,810 And that's indeed an algorithm unto itself. 784 00:36:28,810 --> 00:36:31,410 But str compare is a little simpler than that. 785 00:36:31,410 --> 00:36:33,040 STUDENT: [INAUDIBLE] 786 00:36:33,040 --> 00:36:35,850 787 00:36:35,850 --> 00:36:39,100 DAVID MALAN: Exactly, if you're trying to alphabetize a whole list of strings, 788 00:36:39,100 --> 00:36:41,950 just like your phone probably is for your contacts or address book. 789 00:36:41,950 --> 00:36:44,320 It turns out that str compare will actually 790 00:36:44,320 --> 00:36:48,220 return a positive number or a negative number or a 0 791 00:36:48,220 --> 00:36:52,120 based on whether, maybe it comes alphabetically first or later, 792 00:36:52,120 --> 00:36:53,800 or in fact, equal. 793 00:36:53,800 --> 00:36:55,130 So that can be a useful thing. 794 00:36:55,130 --> 00:36:57,838 And that's just a teaser for a lower level explanation 795 00:36:57,838 --> 00:36:58,880 that we'll see next week. 796 00:36:58,880 --> 00:37:01,750 So now, let me cross my fingers and see if I got this right. 797 00:37:01,750 --> 00:37:05,410 Let me go ahead and do make search. 798 00:37:05,410 --> 00:37:08,590 Did compile, albeit slowly. 799 00:37:08,590 --> 00:37:11,920 Dot slash search, and let's search for something like the thimble. 800 00:37:11,920 --> 00:37:14,048 And we see that that's, indeed, found. 801 00:37:14,048 --> 00:37:16,090 Otherwise, let's search for something that I know 802 00:37:16,090 --> 00:37:19,060 isn't there, like a race car, which was there when I grew up. 803 00:37:19,060 --> 00:37:23,227 But huh, segmentation fault, core dumped. 804 00:37:23,227 --> 00:37:25,810 And actually, some of you have tripped over this error before. 805 00:37:25,810 --> 00:37:27,220 Anyone want to admit seeing this? 806 00:37:27,220 --> 00:37:29,350 So yeah, not something we've talked about, 807 00:37:29,350 --> 00:37:32,170 and honestly, not something I intended just now. 808 00:37:32,170 --> 00:37:34,450 But that too, we'll see next week. 809 00:37:34,450 --> 00:37:39,920 Any intuition for why my program just broke. 810 00:37:39,920 --> 00:37:41,900 I didn't really change the logic. 811 00:37:41,900 --> 00:37:43,550 It's still linear search. 812 00:37:43,550 --> 00:37:46,280 Let me hide the terminal so you can see all of the code at once. 813 00:37:46,280 --> 00:37:49,850 The only thing I did was switched from integers to strings. 814 00:37:49,850 --> 00:37:52,310 And I switched to str compare here. 815 00:37:52,310 --> 00:37:54,205 But segmentation fault happened. 816 00:37:54,205 --> 00:37:57,080 And the teaser is that that somehow relates to the computer's memory. 817 00:37:57,080 --> 00:37:57,996 Yeah. 818 00:37:57,996 --> 00:38:00,690 STUDENT: [INAUDIBLE] 819 00:38:00,690 --> 00:38:01,470 820 00:38:01,470 --> 00:38:03,670 DAVID MALAN: Yeah, and this is subtle, but spot on. 821 00:38:03,670 --> 00:38:06,510 So one, two, three, four, five, six elements 822 00:38:06,510 --> 00:38:11,880 total in this array, versus the seven numbers of monopoly denominations 823 00:38:11,880 --> 00:38:12,810 that we had earlier. 824 00:38:12,810 --> 00:38:13,888 And this is where, see? 825 00:38:13,888 --> 00:38:15,930 Sort of case in point, this came back to bite me. 826 00:38:15,930 --> 00:38:19,860 The fact that I hardcoded this value as opposed to maybe separating it out 827 00:38:19,860 --> 00:38:22,260 as a constant or declaring it higher up, kind of 828 00:38:22,260 --> 00:38:26,610 bit me here, because now, I'm iterating over an array of size 6. 829 00:38:26,610 --> 00:38:29,820 But clearly, I'm going one step too far, because I'm literally 830 00:38:29,820 --> 00:38:32,250 going to iterate seven times, not six. 831 00:38:32,250 --> 00:38:35,580 So it's as though I'm looking at memory that's over here. 832 00:38:35,580 --> 00:38:37,530 And indeed, next week, we'll focus on memory. 833 00:38:37,530 --> 00:38:38,860 And that's just a bad thing. 834 00:38:38,860 --> 00:38:42,270 So odds are, not even seeing your code from this past week, if any of you 835 00:38:42,270 --> 00:38:46,020 have had segmentation faults, odds are, you touched memory 836 00:38:46,020 --> 00:38:47,280 that you shouldn't have. 837 00:38:47,280 --> 00:38:49,290 You maybe looped too many times. 838 00:38:49,290 --> 00:38:52,770 You might have used a negative number to get into your array. 839 00:38:52,770 --> 00:38:55,220 In general, you touched memory that you shouldn't have. 840 00:38:55,220 --> 00:38:57,720 And you touched a segment of memory that you shouldn't have. 841 00:38:57,720 --> 00:39:00,060 The fix, though, at least in my case, is simple. 842 00:39:00,060 --> 00:39:01,300 Just don't do that. 843 00:39:01,300 --> 00:39:03,210 So let me go ahead and recompile this. 844 00:39:03,210 --> 00:39:06,870 Make search dot slash search. 845 00:39:06,870 --> 00:39:10,320 And I'll search again for race car, Enter. 846 00:39:10,320 --> 00:39:11,850 And now it does not crash. 847 00:39:11,850 --> 00:39:13,630 But it does tell me it's not found. 848 00:39:13,630 --> 00:39:17,040 So subtle, but something you might yourself have tripped over already. 849 00:39:17,040 --> 00:39:23,190 Questions then, on what I just did, intentionally or otherwise. 850 00:39:23,190 --> 00:39:24,423 Yeah, in front. 851 00:39:24,423 --> 00:39:29,060 STUDENT: One thing is the program still works if you do return-- 852 00:39:29,060 --> 00:39:31,275 if you don't do return 0, return 1. 853 00:39:31,275 --> 00:39:33,220 So what is the purpose of doing [INAUDIBLE]?? 854 00:39:33,220 --> 00:39:34,720 DAVID MALAN: A really good question. 855 00:39:34,720 --> 00:39:38,920 So the program will still work even if I don't return 0 or return 1. 856 00:39:38,920 --> 00:39:43,120 In fact, let me go ahead and do that and just hide my terminal window 857 00:39:43,120 --> 00:39:43,930 for a second. 858 00:39:43,930 --> 00:39:46,210 Let's get rid of the return here. 859 00:39:46,210 --> 00:39:48,040 Let's get rid of the return here. 860 00:39:48,040 --> 00:39:50,810 However, watch what happens here. 861 00:39:50,810 --> 00:39:53,710 Let me go ahead and recompile this, make search. 862 00:39:53,710 --> 00:39:55,610 Let me scroll up in my code here. 863 00:39:55,610 --> 00:39:57,560 Let me go ahead and do dot slash search. 864 00:39:57,560 --> 00:40:00,280 And let me go ahead and search for the first thing in the list, 865 00:40:00,280 --> 00:40:02,800 battle ship, so I know that this should be found. 866 00:40:02,800 --> 00:40:04,690 I hit Enter. 867 00:40:04,690 --> 00:40:05,858 Huh, interesting. 868 00:40:05,858 --> 00:40:07,150 So it's saying found not found. 869 00:40:07,150 --> 00:40:11,496 But do you see why, logically, in this case? 870 00:40:11,496 --> 00:40:12,980 STUDENT: Is the loop still running? 871 00:40:12,980 --> 00:40:13,910 DAVID MALAN: Exactly. 872 00:40:13,910 --> 00:40:15,302 So the loop is still running. 873 00:40:15,302 --> 00:40:17,010 So there's a couple of solutions to this. 874 00:40:17,010 --> 00:40:21,080 I could, for instance, somehow break out of the code here. 875 00:40:21,080 --> 00:40:24,200 But that's going to still result in line 18 executing. 876 00:40:24,200 --> 00:40:26,600 I could then instead just return here. 877 00:40:26,600 --> 00:40:29,390 I don't strictly need to return 1 down at the bottom. 878 00:40:29,390 --> 00:40:31,580 But I made this claim last week that it tends 879 00:40:31,580 --> 00:40:35,600 to be helpful as your programs get more sophisticated, to at least signify, 880 00:40:35,600 --> 00:40:39,300 just like a real world programmer, error codes when something goes wrong. 881 00:40:39,300 --> 00:40:44,090 So returning 0 in main is the easiest way to signify my code is done. 882 00:40:44,090 --> 00:40:46,340 I'm ready to exit successfully, that's it. 883 00:40:46,340 --> 00:40:48,988 But down here, I could absolutely still return 0, 884 00:40:48,988 --> 00:40:50,280 because that's not a huge deal. 885 00:40:50,280 --> 00:40:53,870 It's not really an error that deserves annoying the user with some kind of pop 886 00:40:53,870 --> 00:40:55,200 up that something went wrong. 887 00:40:55,200 --> 00:40:57,830 But return 1 is just a lower level way of signaling, 888 00:40:57,830 --> 00:41:00,330 eh, it didn't really find what I was looking for. 889 00:41:00,330 --> 00:41:03,510 And remember from last week, you can see this as follows. 890 00:41:03,510 --> 00:41:08,060 If I recompile this again, now that I've reverted those changes, so make search. 891 00:41:08,060 --> 00:41:13,100 And if I do a dot slash search and search for battle ship, 892 00:41:13,100 --> 00:41:16,700 which is indeed found, recall I can execute this magical command, 893 00:41:16,700 --> 00:41:19,790 echo dollar sign question mark, which you're not going to often execute. 894 00:41:19,790 --> 00:41:22,790 But it shows you what main returned. 895 00:41:22,790 --> 00:41:27,770 If I run search again and search for race car, which is not found, 896 00:41:27,770 --> 00:41:30,530 I see not found, but I can also run this command again 897 00:41:30,530 --> 00:41:32,150 and see that, oh, it returned 1. 898 00:41:32,150 --> 00:41:35,300 So now if you fast forward a few months, a few years, when you're actually 899 00:41:35,300 --> 00:41:37,850 writing code in a company or for larger projects, 900 00:41:37,850 --> 00:41:40,100 you might want to be automating software. 901 00:41:40,100 --> 00:41:43,100 You might not want the human to necessarily be running it manually. 902 00:41:43,100 --> 00:41:47,240 You might want code to be automated by some nightly process 903 00:41:47,240 --> 00:41:48,360 or something like that. 904 00:41:48,360 --> 00:41:53,030 Using these exit codes, can a program determine yes or no 905 00:41:53,030 --> 00:41:55,910 that other code succeeded or failed. 906 00:41:55,910 --> 00:42:01,850 Other questions on linear search in this way. 907 00:42:01,850 --> 00:42:02,350 No? 908 00:42:02,350 --> 00:42:07,570 All right, well, let's translate this to one other feature of C 909 00:42:07,570 --> 00:42:11,590 here by incorporating these two ideas now into one other program. 910 00:42:11,590 --> 00:42:16,605 So I'm going to create a phone book in C by doing code space phonebook dot C. 911 00:42:16,605 --> 00:42:18,730 And let's combine some of these ideas and implement 912 00:42:18,730 --> 00:42:21,700 this notion of searching a phonebook for an actual name 913 00:42:21,700 --> 00:42:23,030 and getting back a number. 914 00:42:23,030 --> 00:42:26,500 So I'm going to go ahead and quickly include some of the same things, cs50.h 915 00:42:26,500 --> 00:42:28,060 so we can get input. 916 00:42:28,060 --> 00:42:30,860 standard io dot h so we can print output. 917 00:42:30,860 --> 00:42:33,310 And I'm going to preemptively include string.h 918 00:42:33,310 --> 00:42:35,320 in case we need that one as well. 919 00:42:35,320 --> 00:42:39,010 int main void, no need for command line arguments today. 920 00:42:39,010 --> 00:42:42,650 And let me give myself, now, an array of names for this phone book. 921 00:42:42,650 --> 00:42:45,040 So string names equals. 922 00:42:45,040 --> 00:42:47,620 And then in curly braces, how about Carter 923 00:42:47,620 --> 00:42:50,840 will be one person in the phone book, and David, myself, will be the other. 924 00:42:50,840 --> 00:42:53,465 So we'll keep it short so we don't have to type too many names. 925 00:42:53,465 --> 00:42:55,840 But this is a phone book with two people thus far. 926 00:42:55,840 --> 00:42:59,628 Suppose, now, we want to also store Carter's phone number in mind. 927 00:42:59,628 --> 00:43:01,420 So it's not just saying found or not found. 928 00:43:01,420 --> 00:43:05,320 It's literally looking up our phone numbers like a proper phone book. 929 00:43:05,320 --> 00:43:09,440 Well, at the moment, there's really no way to do this. 930 00:43:09,440 --> 00:43:15,460 I could do something hackish like I could put a number like 617-495-1000 931 00:43:15,460 --> 00:43:16,510 after Carter. 932 00:43:16,510 --> 00:43:22,460 I could maybe do something like 949-468-2750 after me. 933 00:43:22,460 --> 00:43:25,300 But now you're kind of doing the whole apples and oranges thing. 934 00:43:25,300 --> 00:43:26,470 Now, it's not strings. 935 00:43:26,470 --> 00:43:28,420 It's a string int, string int. 936 00:43:28,420 --> 00:43:31,240 All right, so maybe I could just make all of these strings. 937 00:43:31,240 --> 00:43:34,600 But now it's just a conceptual mixing of apples and oranges. 938 00:43:34,600 --> 00:43:36,425 Like yes, that's an array of four strings. 939 00:43:36,425 --> 00:43:39,550 But now you're on the honor system to know that the first string is a name, 940 00:43:39,550 --> 00:43:42,160 the second string is a number, the third string is-- 941 00:43:42,160 --> 00:43:43,100 you can do it. 942 00:43:43,100 --> 00:43:45,110 But it's a bit of a hack, so to speak. 943 00:43:45,110 --> 00:43:47,300 So what might be cleaner than this? 944 00:43:47,300 --> 00:43:51,070 Instead of combining our phone numbers into the same array as our names, 945 00:43:51,070 --> 00:43:55,480 what else might we do that's perhaps a little better? 946 00:43:55,480 --> 00:43:56,440 Say it little louder. 947 00:43:56,440 --> 00:43:58,960 948 00:43:58,960 --> 00:44:01,197 A 2D array, possibly something we could do. 949 00:44:01,197 --> 00:44:04,030 I'm going to keep it even simpler now, because we haven't used those 950 00:44:04,030 --> 00:44:07,823 by name, even though that is, we saw last week, technically what argv is. 951 00:44:07,823 --> 00:44:10,240 What else could I do if I want to store names and numbers? 952 00:44:10,240 --> 00:44:11,147 Yeah. 953 00:44:11,147 --> 00:44:12,220 STUDENT: [INAUDIBLE] 954 00:44:12,220 --> 00:44:13,690 DAVID MALAN: Yeah, let me go with this suggestion. 955 00:44:13,690 --> 00:44:14,607 It's a little simpler. 956 00:44:14,607 --> 00:44:17,440 Rather than complicate things in literally different dimensions, 957 00:44:17,440 --> 00:44:18,970 let me go ahead and do string. 958 00:44:18,970 --> 00:44:21,730 Well, I could do int numbers. 959 00:44:21,730 --> 00:44:22,690 But you know what? 960 00:44:22,690 --> 00:44:27,700 So that we can support punctuation like dashes or even parentheses or country 961 00:44:27,700 --> 00:44:29,200 codes, I'm going to do this instead. 962 00:44:29,200 --> 00:44:33,760 I'm going to do string numbers so that I can represent Carter's number as quote 963 00:44:33,760 --> 00:44:39,040 unquote plus 1 for the US, 617-495-1000, complete with hyphens, 964 00:44:39,040 --> 00:44:40,390 as is US convention. 965 00:44:40,390 --> 00:44:47,930 And then for mine I'll go ahead and do +1-949-468-2750 semicolon. 966 00:44:47,930 --> 00:44:51,610 And now down below, let's actually enable the user to search this phone 967 00:44:51,610 --> 00:44:53,860 book, just like in week 0 we did. 968 00:44:53,860 --> 00:44:55,960 String name equals get string. 969 00:44:55,960 --> 00:44:58,960 And let's ask the user for a name, presumably David or Carter 970 00:44:58,960 --> 00:44:59,990 or someone else. 971 00:44:59,990 --> 00:45:01,850 And now let's re-implement linear search. 972 00:45:01,850 --> 00:45:05,920 So 4, int i get 0. i is less than 2. 973 00:45:05,920 --> 00:45:07,510 And do as I say, not as I do. 974 00:45:07,510 --> 00:45:09,700 I think we should beware this hard coding, 975 00:45:09,700 --> 00:45:12,010 but we'll keep it simple for now. 976 00:45:12,010 --> 00:45:13,220 i++. 977 00:45:13,220 --> 00:45:16,390 And then in this for loop, I think we have all of the ingredients 978 00:45:16,390 --> 00:45:17,150 to solve this. 979 00:45:17,150 --> 00:45:23,440 So if the return value of str compare of all of the names bracket 980 00:45:23,440 --> 00:45:28,810 i comparing against the name that the human typed in, if all of that equals 981 00:45:28,810 --> 00:45:33,380 equals 0, that is, all of the characters in those two strings are equal, 982 00:45:33,380 --> 00:45:36,770 then I think we can go ahead and say found, just like last time. 983 00:45:36,770 --> 00:45:37,520 But you know what? 984 00:45:37,520 --> 00:45:40,130 Let's actually print Carter's or my phone number. 985 00:45:40,130 --> 00:45:44,770 So found percent s, and we'll plug in numbers, bracket i. 986 00:45:44,770 --> 00:45:47,800 And then just for consistency, I'll return 0 here. 987 00:45:47,800 --> 00:45:52,210 And down here, how about I'll say something like printf not 988 00:45:52,210 --> 00:45:53,600 found, just to be clear. 989 00:45:53,600 --> 00:45:56,240 And then I'll return 1 as well. 990 00:45:56,240 --> 00:45:58,120 So just to recap, here's all of the code. 991 00:45:58,120 --> 00:46:01,610 It's almost the same as before, except now it's useful. 992 00:46:01,610 --> 00:46:03,460 I'm not just saying found or not found. 993 00:46:03,460 --> 00:46:07,180 I found a number in monopoly, or I found a piece in monopoly. 994 00:46:07,180 --> 00:46:09,880 I'm looking up in one array, one of the strings. 995 00:46:09,880 --> 00:46:12,730 And then I'm printing from the other array, the answer. 996 00:46:12,730 --> 00:46:19,480 So let me go ahead here and run the compiler, make phone book, Enter. 997 00:46:19,480 --> 00:46:21,070 OK, that's promising, no errors. 998 00:46:21,070 --> 00:46:22,720 Dot slash phonebook now. 999 00:46:22,720 --> 00:46:26,350 And let's search, for instance, Carter Enter. 1000 00:46:26,350 --> 00:46:28,060 All right, so we found Carter's number. 1001 00:46:28,060 --> 00:46:29,393 All right, let me do that again. 1002 00:46:29,393 --> 00:46:30,960 Phone book, let's search for David. 1003 00:46:30,960 --> 00:46:32,960 All right, we seem to have found David's number. 1004 00:46:32,960 --> 00:46:34,502 All right, let's do it one last time. 1005 00:46:34,502 --> 00:46:35,410 Phone book, Enter. 1006 00:46:35,410 --> 00:46:37,360 And now we'll search for John Harvard. 1007 00:46:37,360 --> 00:46:40,060 Enter, not found. 1008 00:46:40,060 --> 00:46:45,520 All right, so I daresay, albeit with minimal testing, this code is correct. 1009 00:46:45,520 --> 00:46:48,190 Would anyone now like to critique the design? 1010 00:46:48,190 --> 00:46:51,910 Does something rub you the wrong way, perhaps, about this approach here? 1011 00:46:51,910 --> 00:46:55,120 1012 00:46:55,120 --> 00:46:58,180 And as always, think about how, if the program maybe 1013 00:46:58,180 --> 00:47:01,510 gets longer, more complicated, how decisions like this might unfold. 1014 00:47:01,510 --> 00:47:02,448 Yeah. 1015 00:47:02,448 --> 00:47:04,400 STUDENT: If i is less than 2. 1016 00:47:04,400 --> 00:47:07,820 DAVID MALAN: OK, so if i is less than 2, so technically, 1017 00:47:07,820 --> 00:47:10,080 if I change the number of people in this phone book, 1018 00:47:10,080 --> 00:47:11,330 I'm going to have to update i. 1019 00:47:11,330 --> 00:47:13,290 And we've already seen that I get myself into trouble. 1020 00:47:13,290 --> 00:47:14,165 So that's bad design. 1021 00:47:14,165 --> 00:47:15,005 Good. 1022 00:47:15,005 --> 00:47:17,795 STUDENT: Say you add someone's name to the phonebook, 1023 00:47:17,795 --> 00:47:20,710 but you don't have the corresponding number. 1024 00:47:20,710 --> 00:47:23,380 So then when you go to pull their number, 1025 00:47:23,380 --> 00:47:24,730 it [INAUDIBLE] someone's number. 1026 00:47:24,730 --> 00:47:25,480 DAVID MALAN: Yeah. 1027 00:47:25,480 --> 00:47:28,180 So again, I'm sort of trusting myself not to screw up. 1028 00:47:28,180 --> 00:47:30,850 If I add John or anyone else to the first array 1029 00:47:30,850 --> 00:47:34,180 but I forget to add their number to the second array, 1030 00:47:34,180 --> 00:47:36,640 eventually things are going to drift and be inconsistent. 1031 00:47:36,640 --> 00:47:39,010 And then code will be incorrect at that point. 1032 00:47:39,010 --> 00:47:43,420 So sort of a poor design setting me up for future failure, if you will. 1033 00:47:43,420 --> 00:47:44,860 Other thoughts? 1034 00:47:44,860 --> 00:47:45,460 Yeah. 1035 00:47:45,460 --> 00:47:49,816 STUDENT: [INAUDIBLE] so if you were to switch the order of the numbers 1036 00:47:49,816 --> 00:47:52,848 but not the main [INAUDIBLE] 1037 00:47:52,848 --> 00:47:54,140 DAVID MALAN: Yeah, really good. 1038 00:47:54,140 --> 00:47:55,550 We're assuming the same order. 1039 00:47:55,550 --> 00:47:59,452 From left to right, the names go, and from left to right, the numbers go. 1040 00:47:59,452 --> 00:48:01,160 But that's kind of just the honor system. 1041 00:48:01,160 --> 00:48:03,440 Like, there's literally nothing in code preventing me 1042 00:48:03,440 --> 00:48:07,047 from reversing the order for whatever reason, or maybe sorting the names. 1043 00:48:07,047 --> 00:48:10,130 Like, they're sorted now, and maybe that's deliberate, but maybe it's not. 1044 00:48:10,130 --> 00:48:12,920 So this honor system here, too, is just not good. 1045 00:48:12,920 --> 00:48:17,180 I could put a comment in here to remind myself, note to self, 1046 00:48:17,180 --> 00:48:19,490 always update arrays the same way. 1047 00:48:19,490 --> 00:48:21,600 But like, something's going to happen eventually, 1048 00:48:21,600 --> 00:48:26,090 especially when we have not two, but three, but 30, 300 names and numbers. 1049 00:48:26,090 --> 00:48:29,670 It would be nice to keep all of the related data together. 1050 00:48:29,670 --> 00:48:32,810 And so in fact, the one new feature of C we'll introduce today 1051 00:48:32,810 --> 00:48:37,970 is one that actually allows us to implement our very own data structures. 1052 00:48:37,970 --> 00:48:41,870 You can think of arrays as a very lightweight data 1053 00:48:41,870 --> 00:48:44,870 structure, in that it allows you to cluster related data back 1054 00:48:44,870 --> 00:48:45,930 to back to back to back. 1055 00:48:45,930 --> 00:48:48,170 And this is how strings are implemented. 1056 00:48:48,170 --> 00:48:51,560 They are a data structure effectively implemented with an array. 1057 00:48:51,560 --> 00:48:54,290 But with C and with other languages, it turns out 1058 00:48:54,290 --> 00:48:56,600 you can invent your own data types, whether they're 1059 00:48:56,600 --> 00:48:59,870 one dimensional, two dimensional even, or beyond. 1060 00:48:59,870 --> 00:49:05,960 And with C, can you specifically create your own types 1061 00:49:05,960 --> 00:49:07,200 that have their own names? 1062 00:49:07,200 --> 00:49:10,670 So for instance, wouldn't it have been nice if C came with, 1063 00:49:10,670 --> 00:49:16,380 not just char and int and floats and long and others. 1064 00:49:16,380 --> 00:49:19,970 Wouldn't it be nice if C came with a data type called person? 1065 00:49:19,970 --> 00:49:22,790 And ideally, a person would have a name and a number. 1066 00:49:22,790 --> 00:49:24,860 Now, that's a little naive and unrealistic. 1067 00:49:24,860 --> 00:49:28,460 Like, why would they define a person to have just those two fields. 1068 00:49:28,460 --> 00:49:30,950 Certainly, people could have disagreed what a person is. 1069 00:49:30,950 --> 00:49:32,300 So they leave it to us. 1070 00:49:32,300 --> 00:49:35,900 The authors of C gave us all of these primitives, ints and floats and strings 1071 00:49:35,900 --> 00:49:36,810 and so forth. 1072 00:49:36,810 --> 00:49:39,810 But it's up to us now to use those in a more interesting way 1073 00:49:39,810 --> 00:49:44,940 so that we can create an array of person variables, if you will, 1074 00:49:44,940 --> 00:49:48,150 inside of an array called people, just to pluralize it here. 1075 00:49:48,150 --> 00:49:49,740 So how are we going to do this? 1076 00:49:49,740 --> 00:49:54,140 Well, for now, let's just stipulate that a person in the world 1077 00:49:54,140 --> 00:49:57,260 will have a name and a number that we could argue all day long what else 1078 00:49:57,260 --> 00:49:58,010 a person should have. 1079 00:49:58,010 --> 00:49:58,677 And that's fine. 1080 00:49:58,677 --> 00:50:01,790 You can invent your own person eventually. 1081 00:50:01,790 --> 00:50:04,610 At the moment, I'm using just two variables 1082 00:50:04,610 --> 00:50:06,500 to define a person's name and number. 1083 00:50:06,500 --> 00:50:10,490 But wouldn't it be nice to encapsulate, that is, combine these two data 1084 00:50:10,490 --> 00:50:14,660 types, into a new and improved data type called person. 1085 00:50:14,660 --> 00:50:17,360 And the syntax for that is going to be this. 1086 00:50:17,360 --> 00:50:18,800 So it's a bit of a mouthful. 1087 00:50:18,800 --> 00:50:21,960 But you can, perhaps, infer what some of this is doing here. 1088 00:50:21,960 --> 00:50:24,500 So it turns out C has a keyword called typedef. 1089 00:50:24,500 --> 00:50:28,310 As the name kind of suggests, this allows you to define your own type. 1090 00:50:28,310 --> 00:50:31,550 Struct is an indication that it's a structure. 1091 00:50:31,550 --> 00:50:35,060 It's like a structure that has multiple values inside of it 1092 00:50:35,060 --> 00:50:36,710 that you are trying to define. 1093 00:50:36,710 --> 00:50:39,440 And then at the very bottom here outside of the curly braces, 1094 00:50:39,440 --> 00:50:42,270 is the name of the type that you want to create. 1095 00:50:42,270 --> 00:50:45,790 So you don't have discretion over using typedef or struct 1096 00:50:45,790 --> 00:50:46,790 in this particular case. 1097 00:50:46,790 --> 00:50:48,665 But you can name the thing whatever you want. 1098 00:50:48,665 --> 00:50:52,590 And you can put anything in the structure that you want as well. 1099 00:50:52,590 --> 00:50:56,510 And as soon as the semicolon is executed at the bottom of the code, 1100 00:50:56,510 --> 00:50:59,180 every line thereafter can now have access 1101 00:50:59,180 --> 00:51:05,760 to a person data type, whether as a single variable, or as an entire array. 1102 00:51:05,760 --> 00:51:10,260 So if I want to build on this then, let me go ahead and do this. 1103 00:51:10,260 --> 00:51:12,230 Let me go back to my C code here. 1104 00:51:12,230 --> 00:51:17,610 And I'm going to go ahead and change just a couple of things. 1105 00:51:17,610 --> 00:51:19,110 Let's go ahead and do this. 1106 00:51:19,110 --> 00:51:23,240 I'm going to go ahead and, first, get rid of those two hardcoded arrays. 1107 00:51:23,240 --> 00:51:27,180 And let me go ahead and, at the top of my file, 1108 00:51:27,180 --> 00:51:30,180 invent this type, so typedef struct. 1109 00:51:30,180 --> 00:51:34,470 Inside of it will be a string name and then a string number. 1110 00:51:34,470 --> 00:51:36,780 And then the name of the structure will be person. 1111 00:51:36,780 --> 00:51:40,025 And best practice would have me define it at the very top of my file 1112 00:51:40,025 --> 00:51:42,150 so that any of my functions, in fact, could use it, 1113 00:51:42,150 --> 00:51:44,530 even though I just have main in this case. 1114 00:51:44,530 --> 00:51:47,100 Now, if I wanted, I could do this. 1115 00:51:47,100 --> 00:51:50,370 Person P1 and person P2. 1116 00:51:50,370 --> 00:51:53,040 But we know from last week, that already is bad design. 1117 00:51:53,040 --> 00:51:57,400 If you want to have multiple instances of the same type of variable, 1118 00:51:57,400 --> 00:52:00,044 we should probably use what instead? 1119 00:52:00,044 --> 00:52:01,046 STUDENT: [INAUDIBLE] 1120 00:52:01,046 --> 00:52:01,796 DAVID MALAN: And-- 1121 00:52:01,796 --> 00:52:02,470 STUDENT: An array. 1122 00:52:02,470 --> 00:52:03,637 DAVID MALAN: Yeah, an array. 1123 00:52:03,637 --> 00:52:05,230 So let me not even go down that road. 1124 00:52:05,230 --> 00:52:06,700 Let me instead just do this. 1125 00:52:06,700 --> 00:52:09,727 Person will be the type of the array. 1126 00:52:09,727 --> 00:52:10,810 But I'm going to call it-- 1127 00:52:10,810 --> 00:52:11,980 I could call it persons. 1128 00:52:11,980 --> 00:52:13,720 But in English, we typically say people. 1129 00:52:13,720 --> 00:52:15,190 So I'll call the array people. 1130 00:52:15,190 --> 00:52:18,110 And I want two people to exist in this array, 1131 00:52:18,110 --> 00:52:20,920 though I could certainly change that number to be anything I want. 1132 00:52:20,920 --> 00:52:24,820 How, now, do you put a name inside of a person 1133 00:52:24,820 --> 00:52:27,190 and then put the number inside of that same person? 1134 00:52:27,190 --> 00:52:28,990 Well, slightly new syntax today. 1135 00:52:28,990 --> 00:52:30,520 I'm going to go ahead and say this. 1136 00:52:30,520 --> 00:52:34,420 People bracket 0 just gives me the first person in the array. 1137 00:52:34,420 --> 00:52:35,570 That's not new. 1138 00:52:35,570 --> 00:52:40,840 But if you want to go inside of that person in memory, you use a dot. 1139 00:52:40,840 --> 00:52:44,870 And then you just specify the name of the attribute therein. 1140 00:52:44,870 --> 00:52:47,410 So if I want to set the first person's name to Carter, 1141 00:52:47,410 --> 00:52:49,480 I just use that so-called dot notation. 1142 00:52:49,480 --> 00:52:52,780 And then if I want to set Carter's number using dot notation, 1143 00:52:52,780 --> 00:52:56,680 I would do this, +1-617-495-1000. 1144 00:52:56,680 --> 00:52:58,880 And then if I want to do the same for myself, 1145 00:52:58,880 --> 00:53:03,730 I would now do people bracket 1 dot name equals quote unquote David. 1146 00:53:03,730 --> 00:53:08,440 And then people bracket 1 still dot number equals quote unquote 1147 00:53:08,440 --> 00:53:13,030 +1-949-468-2750. 1148 00:53:13,030 --> 00:53:15,970 And now, at the bottom of my file, I think 1149 00:53:15,970 --> 00:53:18,610 my logic can pretty much stay the same. 1150 00:53:18,610 --> 00:53:22,180 I can still, on this line here, prompt the user 1151 00:53:22,180 --> 00:53:24,370 for the name of the person they want to look up. 1152 00:53:24,370 --> 00:53:26,620 For now, even though I admit it's not the best design, 1153 00:53:26,620 --> 00:53:28,495 I'm just doing this for demonstration's sake, 1154 00:53:28,495 --> 00:53:31,360 I'm going to leave the two there, because I know I have two people. 1155 00:53:31,360 --> 00:53:34,100 But down here, this is going to have to change. 1156 00:53:34,100 --> 00:53:37,000 I don't want to compare names bracket i anymore. 1157 00:53:37,000 --> 00:53:42,190 What do I want to type here as the first argument to str compare? 1158 00:53:42,190 --> 00:53:43,900 What do I want to do here? 1159 00:53:43,900 --> 00:53:44,960 Yeah. 1160 00:53:44,960 --> 00:53:46,800 STUDENT: People i dot name. 1161 00:53:46,800 --> 00:53:49,140 DAVID MALAN: So people i dot name, yeah. 1162 00:53:49,140 --> 00:53:52,938 So I want to go into the people array at the ith location, 1163 00:53:52,938 --> 00:53:54,480 because that's what my loop is doing. 1164 00:53:54,480 --> 00:53:55,890 It's updating i again and again. 1165 00:53:55,890 --> 00:53:58,087 And then look at name, and that's good. 1166 00:53:58,087 --> 00:53:59,670 I think now I need to change this too. 1167 00:53:59,670 --> 00:54:01,890 What do I want to print if the person is found? 1168 00:54:01,890 --> 00:54:02,445 Someone else? 1169 00:54:02,445 --> 00:54:05,070 1170 00:54:05,070 --> 00:54:08,850 What do I want to print here, if I found the person's name? 1171 00:54:08,850 --> 00:54:09,360 Yeah. 1172 00:54:09,360 --> 00:54:10,890 STUDENT: [INAUDIBLE] 1173 00:54:10,890 --> 00:54:12,390 DAVID MALAN: Say it a little louder. 1174 00:54:12,390 --> 00:54:13,795 STUDENT: People i dot number. 1175 00:54:13,795 --> 00:54:14,670 DAVID MALAN: Perfect. 1176 00:54:14,670 --> 00:54:17,460 So people bracket i dot number, if indeed I 1177 00:54:17,460 --> 00:54:20,310 want to print the corresponding number to this person. 1178 00:54:20,310 --> 00:54:22,930 And then I think the rest of my code can stay the same. 1179 00:54:22,930 --> 00:54:27,150 So let me go ahead and rerun make phone book to recompile this version. 1180 00:54:27,150 --> 00:54:28,170 So far so good. 1181 00:54:28,170 --> 00:54:29,400 Dot slash phone book. 1182 00:54:29,400 --> 00:54:31,598 Let's go ahead and type in Carter's name, found. 1183 00:54:31,598 --> 00:54:33,390 All right, let's go ahead and run it again. 1184 00:54:33,390 --> 00:54:35,273 David's name, found. 1185 00:54:35,273 --> 00:54:36,940 Let's go ahead and run it one more time. 1186 00:54:36,940 --> 00:54:40,260 Type in John Harvard, for instance, not found, in this case. 1187 00:54:40,260 --> 00:54:43,710 So fundamentally, the code isn't all that different. 1188 00:54:43,710 --> 00:54:46,090 Linear search is still behaving the same way. 1189 00:54:46,090 --> 00:54:48,690 And I admit, this is kind of ugly looking. 1190 00:54:48,690 --> 00:54:52,350 We've kind of made a two line solution five lines of code now. 1191 00:54:52,350 --> 00:54:56,640 But if we fast forward a week or two when we start saving information 1192 00:54:56,640 --> 00:55:01,500 to files, we'll introduce you to files like csv files, 1193 00:55:01,500 --> 00:55:03,892 comma separated values, or spreadsheet files, which 1194 00:55:03,892 --> 00:55:06,600 you've surely opened on your Mac or PC at some point in the past. 1195 00:55:06,600 --> 00:55:09,840 Suffice it to say we'll soon learn techniques for storing information, 1196 00:55:09,840 --> 00:55:11,790 like names and numbers, in files. 1197 00:55:11,790 --> 00:55:13,680 And at that point, we're not going to do any 1198 00:55:13,680 --> 00:55:15,990 of this hackish sort of hard coding of the number 2 1199 00:55:15,990 --> 00:55:19,080 and manually typing my name and Carter's name and number into our program. 1200 00:55:19,080 --> 00:55:21,750 We'll read the information dynamically from a file. 1201 00:55:21,750 --> 00:55:25,180 And in a few weeks, we'll read it dynamically from a database instead. 1202 00:55:25,180 --> 00:55:30,240 But this is, for now, just syntactically how we can create an array of size 2 1203 00:55:30,240 --> 00:55:32,190 containing one person each. 1204 00:55:32,190 --> 00:55:34,643 We can update the name and number of the first person, 1205 00:55:34,643 --> 00:55:36,810 update the name and the number of the second person, 1206 00:55:36,810 --> 00:55:40,500 and then later search across those names and print out 1207 00:55:40,500 --> 00:55:41,610 the corresponding numbers. 1208 00:55:41,610 --> 00:55:44,220 And in this sense, this is a better design. 1209 00:55:44,220 --> 00:55:44,730 Why? 1210 00:55:44,730 --> 00:55:48,690 Because my person data type encapsulates, 1211 00:55:48,690 --> 00:55:53,400 now, everything that it means to be a person, at least in this narrow world. 1212 00:55:53,400 --> 00:55:57,580 And if I want to add something to the notion of a person, for instance, 1213 00:55:57,580 --> 00:55:59,640 I could go up to my type def, and tomorrow, 1214 00:55:59,640 --> 00:56:03,743 add an address to every person and start reading that in as well. 1215 00:56:03,743 --> 00:56:05,160 And now it's not the honor system. 1216 00:56:05,160 --> 00:56:10,260 It's not a names array, a numbers array, an addresses array, and everything else 1217 00:56:10,260 --> 00:56:12,210 you might imagine related to a person. 1218 00:56:12,210 --> 00:56:17,223 It's all encapsulated, which is a term of art inside of the same type. 1219 00:56:17,223 --> 00:56:19,890 Reminiscent, if some of you have programmed before, of something 1220 00:56:19,890 --> 00:56:21,660 called object-oriented programming. 1221 00:56:21,660 --> 00:56:23,190 But we're not there yet. 1222 00:56:23,190 --> 00:56:24,690 C is not that. 1223 00:56:24,690 --> 00:56:29,280 Questions on this use of struct or this new syntax, 1224 00:56:29,280 --> 00:56:35,037 the dot operator being really the juicy part here. 1225 00:56:35,037 --> 00:56:35,620 Any questions? 1226 00:56:35,620 --> 00:56:36,522 Yeah. 1227 00:56:36,522 --> 00:56:39,414 STUDENT: [INAUDIBLE] 1228 00:56:39,414 --> 00:56:42,800 1229 00:56:42,800 --> 00:56:44,420 DAVID MALAN: On what line number? 1230 00:56:44,420 --> 00:56:46,063 STUDENT: 16. 1231 00:56:46,063 --> 00:56:46,730 DAVID MALAN: 16? 1232 00:56:46,730 --> 00:56:51,230 So yes, so syntactically, we introduced the square brackets last week. 1233 00:56:51,230 --> 00:56:55,310 So doing people bracket 0 just means go to the first person in the array. 1234 00:56:55,310 --> 00:56:58,400 That was like when Stephanie literally opened this door. 1235 00:56:58,400 --> 00:56:59,990 That's doors bracket 0. 1236 00:56:59,990 --> 00:57:02,330 But this is, of course, people bracket 0 instead. 1237 00:57:02,330 --> 00:57:04,580 Today, the dot is a new piece of syntax. 1238 00:57:04,580 --> 00:57:11,000 It means go inside of that person in memory and look at the name they're in 1239 00:57:11,000 --> 00:57:13,297 and set it equal to Carter and do the same for number. 1240 00:57:13,297 --> 00:57:13,880 So that's all. 1241 00:57:13,880 --> 00:57:16,340 It's like, open the locker door, go inside of it, 1242 00:57:16,340 --> 00:57:18,410 and check or set the name and the number. 1243 00:57:18,410 --> 00:57:19,040 Yeah. 1244 00:57:19,040 --> 00:57:29,280 STUDENT: [INAUDIBLE] can you set default values for each of the [INAUDIBLE]?? 1245 00:57:29,280 --> 00:57:30,840 DAVID MALAN: Attributes is fine. 1246 00:57:30,840 --> 00:57:31,530 Good question. 1247 00:57:31,530 --> 00:57:34,050 In the struct, can you set default values? 1248 00:57:34,050 --> 00:57:35,100 Short answer, no. 1249 00:57:35,100 --> 00:57:39,000 And this is where C becomes less feature-able than more modern languages 1250 00:57:39,000 --> 00:57:42,580 like Python and Java and others, where you can, in fact, do that. 1251 00:57:42,580 --> 00:57:44,890 So when we transition to Python in a few weeks' time, 1252 00:57:44,890 --> 00:57:47,140 we'll see how we can start solving problems like that. 1253 00:57:47,140 --> 00:57:50,100 But for now, it's up to you to initialize name and number 1254 00:57:50,100 --> 00:57:51,450 to something. 1255 00:57:51,450 --> 00:57:52,832 Yeah. 1256 00:57:52,832 --> 00:57:55,540 STUDENT: [INAUDIBLE] 1257 00:57:55,540 --> 00:58:04,123 1258 00:58:04,123 --> 00:58:05,540 DAVID MALAN: Really good question. 1259 00:58:05,540 --> 00:58:08,470 How can we adjust or critique the design of what I'm doing? 1260 00:58:08,470 --> 00:58:12,190 This is one of the few situations where I would say, hypocritically, 1261 00:58:12,190 --> 00:58:13,780 do as I say, not as I do. 1262 00:58:13,780 --> 00:58:17,710 I am using pretty ugly lines like this, just to introduce the syntax. 1263 00:58:17,710 --> 00:58:20,428 But my claim, pedagogically today, is that eventually, 1264 00:58:20,428 --> 00:58:22,720 when we start storing names and numbers or other things 1265 00:58:22,720 --> 00:58:26,230 in files or in databases, you won't have this redundancy. 1266 00:58:26,230 --> 00:58:29,080 You'll have one line of code or two lines of code that 1267 00:58:29,080 --> 00:58:31,660 read the information from the file or database 1268 00:58:31,660 --> 00:58:34,630 and then fill the entire array with that data. 1269 00:58:34,630 --> 00:58:37,330 For now, I'm just doing it manually so as to keep our focus only 1270 00:58:37,330 --> 00:58:39,400 on the new syntax, but that's it. 1271 00:58:39,400 --> 00:58:42,640 So forgive the bad design by design today. 1272 00:58:42,640 --> 00:58:45,740 Other questions on this? 1273 00:58:45,740 --> 00:58:47,595 All right, that's been a lot already. 1274 00:58:47,595 --> 00:58:50,470 Why don't we go ahead and take our 10 minute break with snacks first. 1275 00:58:50,470 --> 00:58:53,020 We have some delightful brownies in the lobby. 1276 00:58:53,020 --> 00:58:55,900 All right, we are back. 1277 00:58:55,900 --> 00:58:59,890 And up until now, it clearly seems to be a good thing if your data is sorted, 1278 00:58:59,890 --> 00:59:02,350 because you can use binary search. 1279 00:59:02,350 --> 00:59:05,540 You know a little something more about the data. 1280 00:59:05,540 --> 00:59:07,570 But it turns out that sorting in and of itself 1281 00:59:07,570 --> 00:59:10,420 is kind of a problem to solve too. 1282 00:59:10,420 --> 00:59:14,830 And you might think, well, if sorting is going 1283 00:59:14,830 --> 00:59:18,070 to be pretty fast, we absolutely should do it before we start searching, 1284 00:59:18,070 --> 00:59:20,237 because that will just speed up all of our searches. 1285 00:59:20,237 --> 00:59:22,960 But if sorting is slow, that kind of invites the question, well, 1286 00:59:22,960 --> 00:59:25,420 should we bother sorting our data if we're only 1287 00:59:25,420 --> 00:59:28,090 going to search the data maybe once, maybe twice? 1288 00:59:28,090 --> 00:59:30,550 And so here is going to be, potentially, a trade off. 1289 00:59:30,550 --> 00:59:33,250 So let's consider what it means really to sort data. 1290 00:59:33,250 --> 00:59:35,950 In our case, it's just going to be simple and use numbers. 1291 00:59:35,950 --> 00:59:38,230 But it might, in the case of the Googles of the world, 1292 00:59:38,230 --> 00:59:40,880 be actual web pages or persons or the like. 1293 00:59:40,880 --> 00:59:46,090 So here is our typical picture for sorting, for solving any problem. 1294 00:59:46,090 --> 00:59:48,190 Input at left and output at right. 1295 00:59:48,190 --> 00:59:54,340 The input to our sort problem is going to be some unsorted set of values. 1296 00:59:54,340 --> 00:59:57,940 And the output, ideally, will be the same set of values sorted. 1297 00:59:57,940 --> 01:00:00,550 And if we do this concretely, let's suppose 1298 01:00:00,550 --> 01:00:04,786 that we want to go about sorting this list of numbers, 7, 2, 5, 4, 1, 6, 0, 1299 01:00:04,786 --> 01:00:05,860 3. 1300 01:00:05,860 --> 01:00:07,810 So it's all of the numbers from 0 to 7. 1301 01:00:07,810 --> 01:00:09,757 But they're somehow jumbled up randomly. 1302 01:00:09,757 --> 01:00:11,590 That's going to be the input to the problem. 1303 01:00:11,590 --> 01:00:13,840 And the goal is now to sort those so that you, indeed, 1304 01:00:13,840 --> 01:00:17,990 get out 0, 1, 2, 3, 4, 5, 6, 7 instead. 1305 01:00:17,990 --> 01:00:20,440 So it turns out there's lots of different ways 1306 01:00:20,440 --> 01:00:23,900 we can actually sort numbers like these here. 1307 01:00:23,900 --> 01:00:27,430 And in fact, just to complement our search example earlier, 1308 01:00:27,430 --> 01:00:29,952 could we perhaps quickly get some eight volunteers 1309 01:00:29,952 --> 01:00:32,410 to come up if you're comfortable appearing on the internet? 1310 01:00:32,410 --> 01:00:39,100 If you want to do 1, 2, 3, 4, 5, 6, 7, 8, how about? 1311 01:00:39,100 --> 01:00:40,255 All right, come on down. 1312 01:00:40,255 --> 01:00:45,040 1313 01:00:45,040 --> 01:00:47,970 All right. 1314 01:00:47,970 --> 01:00:50,560 Come on over here, and I'll give you each a number. 1315 01:00:50,560 --> 01:00:54,240 And if you want to start to organize yourselves in the same order 1316 01:00:54,240 --> 01:00:58,390 you see the numbers on the board. 1317 01:00:58,390 --> 01:01:01,530 So look up on the overhead and organize yourselves from left 1318 01:01:01,530 --> 01:01:04,460 to right in that same order. 1319 01:01:04,460 --> 01:01:06,210 And let's have the first of you-- perfect. 1320 01:01:06,210 --> 01:01:10,420 If you want to come right over here, how about right in line with this? 1321 01:01:10,420 --> 01:01:13,990 All right, and a few more numbers. 1322 01:01:13,990 --> 01:01:14,980 All right. 1323 01:01:14,980 --> 01:01:19,810 Number 2, 6, and perfect. 1324 01:01:19,810 --> 01:01:21,625 Just the right number, all right. 1325 01:01:21,625 --> 01:01:22,858 Uh oh. 1326 01:01:22,858 --> 01:01:24,400 All right, there we go, number three. 1327 01:01:24,400 --> 01:01:24,968 All right. 1328 01:01:24,968 --> 01:01:26,260 So let's just do a quick check. 1329 01:01:26,260 --> 01:01:30,867 We have 7, 2, 5, 4, 1, 6, 0, 3, very good so far. 1330 01:01:30,867 --> 01:01:32,950 Do you want to just scootch a little this way just 1331 01:01:32,950 --> 01:01:34,510 to make a little more room? 1332 01:01:34,510 --> 01:01:38,090 All right, and let's consider now who we have here on stage. 1333 01:01:38,090 --> 01:01:40,780 You want to each say a quick hello to the audience? 1334 01:01:40,780 --> 01:01:42,070 RYAN: Hi, my name is Ryan. 1335 01:01:42,070 --> 01:01:45,597 I'm a first year from Pennypacker. 1336 01:01:45,597 --> 01:01:46,930 ITSELLE: Hi, my name is Itselle. 1337 01:01:46,930 --> 01:01:49,177 I'm a first year at Strauss. 1338 01:01:49,177 --> 01:01:50,260 LUCY: Hi, my name is Lucy. 1339 01:01:50,260 --> 01:01:52,400 And I'm a first year from Greenough. 1340 01:01:52,400 --> 01:01:53,650 SHILOH: Hi, my name is Shiloh. 1341 01:01:53,650 --> 01:01:55,927 I'm a first year in Wigglesworth. 1342 01:01:55,927 --> 01:01:57,010 JACK: Hi, my name is Jack. 1343 01:01:57,010 --> 01:01:59,877 And I'm a first year in Strauss. 1344 01:01:59,877 --> 01:02:01,210 KATHRYN: Hi, my name is Kathryn. 1345 01:02:01,210 --> 01:02:02,787 I'm a first year at Strauss. 1346 01:02:02,787 --> 01:02:04,120 MICHAEL: Hi, my name is Michael. 1347 01:02:04,120 --> 01:02:06,063 I'm a first year at Pennypacker. 1348 01:02:06,063 --> 01:02:07,480 MUHAMMAD: Hi, my name is Muhammad. 1349 01:02:07,480 --> 01:02:09,047 I'm a first year in Matthews. 1350 01:02:09,047 --> 01:02:10,630 DAVID MALAN: Hi, nice, welcome aboard. 1351 01:02:10,630 --> 01:02:11,240 All right. 1352 01:02:11,240 --> 01:02:16,510 So let's consider, now, how we might go about sorting our kind volunteers here, 1353 01:02:16,510 --> 01:02:21,160 the goal being to get them into order from smallest to largest so that, 1354 01:02:21,160 --> 01:02:24,160 presumably then, we can use something smarter than just linear search. 1355 01:02:24,160 --> 01:02:26,350 We can actually use binary search, assuming 1356 01:02:26,350 --> 01:02:27,878 that they are already then sorted. 1357 01:02:27,878 --> 01:02:30,670 So let me propose that we first consider an algorithm that actually 1358 01:02:30,670 --> 01:02:32,600 has a name called selection sort. 1359 01:02:32,600 --> 01:02:36,220 And selection sort is going to be one that literally has me, 1360 01:02:36,220 --> 01:02:40,060 or really you, as the programmer, selecting the smallest element again 1361 01:02:40,060 --> 01:02:43,610 and again, and then putting them into the appropriate place. 1362 01:02:43,610 --> 01:02:47,115 So let me go ahead and start this here, starting with the number 7. 1363 01:02:47,115 --> 01:02:49,240 At the moment, 7 is the smallest number I've found. 1364 01:02:49,240 --> 01:02:52,610 So I'm going to make mental note of that with a mental variable, if you will. 1365 01:02:52,610 --> 01:02:53,710 I'm going to move on now. 1366 01:02:53,710 --> 01:02:55,460 Number 2 is obviously smaller, so I'm just 1367 01:02:55,460 --> 01:02:58,360 going to update my mental reminder that 2 is now the smallest, 1368 01:02:58,360 --> 01:03:01,555 effectively forgetting, for now, number 7. 1369 01:03:01,555 --> 01:03:02,440 5, not smaller. 1370 01:03:02,440 --> 01:03:03,370 4, not smaller. 1371 01:03:03,370 --> 01:03:04,170 1, smaller. 1372 01:03:04,170 --> 01:03:05,920 And I'm going to make mental note of that. 1373 01:03:05,920 --> 01:03:07,030 6, not smaller. 1374 01:03:07,030 --> 01:03:08,200 0, even smaller. 1375 01:03:08,200 --> 01:03:11,140 I'll make mental note of that, having forgotten now everything else. 1376 01:03:11,140 --> 01:03:13,180 And now number 3 is not smaller. 1377 01:03:13,180 --> 01:03:14,290 So what's your name again? 1378 01:03:14,290 --> 01:03:14,630 MICHAEL: Michael. 1379 01:03:14,630 --> 01:03:16,240 DAVID MALAN: So Michael is number 0. 1380 01:03:16,240 --> 01:03:18,310 He belongs, of course, way down there. 1381 01:03:18,310 --> 01:03:20,740 But unfortunately-- you are-- 1382 01:03:20,740 --> 01:03:21,550 RYAN: Ryan. 1383 01:03:21,550 --> 01:03:23,360 DAVID MALAN: Ryan is in the way. 1384 01:03:23,360 --> 01:03:24,580 So what should we do? 1385 01:03:24,580 --> 01:03:27,570 How should we start to sort this list? 1386 01:03:27,570 --> 01:03:30,510 Where should number 0 go? 1387 01:03:30,510 --> 01:03:31,012 Yeah. 1388 01:03:31,012 --> 01:03:32,220 Do you want to say it louder? 1389 01:03:32,220 --> 01:03:34,545 STUDENT: I will swap, I think. 1390 01:03:34,545 --> 01:03:36,670 DAVID MALAN: Yeah, so let's just go ahead and swap. 1391 01:03:36,670 --> 01:03:39,190 So if you want to go ahead and 0, go on where 7 is. 1392 01:03:39,190 --> 01:03:41,170 We need to make room for number 7. 1393 01:03:41,170 --> 01:03:44,350 It would kind of be cheating if maybe everyone kind of politely 1394 01:03:44,350 --> 01:03:45,530 stepped over to the side. 1395 01:03:45,530 --> 01:03:46,030 Why? 1396 01:03:46,030 --> 01:03:49,000 Because if we imagine all of our volunteers here to be an array, like, 1397 01:03:49,000 --> 01:03:52,390 that's a crazy amount of work to have every element in the array 1398 01:03:52,390 --> 01:03:54,190 shift to the left just to make room. 1399 01:03:54,190 --> 01:03:57,340 So we're going to keep it simple and just evict whoever's there now. 1400 01:03:57,340 --> 01:04:00,880 Now, maybe we get lucky, and number 7 is actually closer to its destination. 1401 01:04:00,880 --> 01:04:03,250 Maybe we get unlucky, and it goes farther away. 1402 01:04:03,250 --> 01:04:05,260 But we've at least solved one problem. 1403 01:04:05,260 --> 01:04:08,140 If we had n problems at first, now we have n minus 1, 1404 01:04:08,140 --> 01:04:10,280 because number 0 is indeed in the right place. 1405 01:04:10,280 --> 01:04:14,588 So if I continue to act this out, let me go ahead and say 2, currently 1406 01:04:14,588 --> 01:04:15,130 the smallest. 1407 01:04:15,130 --> 01:04:18,040 5, no, 4, no, 1 currently the smallest. 1408 01:04:18,040 --> 01:04:19,000 I'll make mental note. 1409 01:04:19,000 --> 01:04:22,690 6, 7, 3, and now let me pause. 1410 01:04:22,690 --> 01:04:26,000 1 is obviously the now smallest element. 1411 01:04:26,000 --> 01:04:27,760 So did I need to keep going? 1412 01:04:27,760 --> 01:04:30,550 Well, turns out, at least as I've defined selection sort, 1413 01:04:30,550 --> 01:04:33,130 I do need to keep going, because I only claim 1414 01:04:33,130 --> 01:04:36,550 that I'm using one variable in my mind to remember the then smallest element. 1415 01:04:36,550 --> 01:04:39,970 I'm not smart enough like us humans to remember, wait a minute, 1416 01:04:39,970 --> 01:04:41,590 1 is definitely the smallest now. 1417 01:04:41,590 --> 01:04:43,190 I don't have that whole recollection. 1418 01:04:43,190 --> 01:04:45,590 So I just am keeping track of the now smallest. 1419 01:04:45,590 --> 01:04:46,910 So number 1, your name was? 1420 01:04:46,910 --> 01:04:47,410 JACK: Jack. 1421 01:04:47,410 --> 01:04:49,300 DAVID MALAN: Jack, where should Jack go? 1422 01:04:49,300 --> 01:04:50,380 Probably there. 1423 01:04:50,380 --> 01:04:51,670 And what's your name? 1424 01:04:51,670 --> 01:04:51,880 ITSELLE: Itselle. 1425 01:04:51,880 --> 01:04:54,588 DAVID MALAN: OK, so Jack and Itselle, if you want to swap places, 1426 01:04:54,588 --> 01:04:57,430 we've now solved two of the n total problems. 1427 01:04:57,430 --> 01:04:58,990 And now we'll do it a little faster. 1428 01:04:58,990 --> 01:05:03,880 If each of you want to start to swap as I find the right person, so 5 smallest, 1429 01:05:03,880 --> 01:05:06,340 4 smaller, 2 is smaller. 1430 01:05:06,340 --> 01:05:07,750 Got to keep checking. 1431 01:05:07,750 --> 01:05:09,572 OK, 2 was smaller. 1432 01:05:09,572 --> 01:05:11,780 All right, now I'm going to go back to the beginning. 1433 01:05:11,780 --> 01:05:13,090 All right, 4 is small. 1434 01:05:13,090 --> 01:05:14,050 5 is not. 1435 01:05:14,050 --> 01:05:14,740 6 is not. 1436 01:05:14,740 --> 01:05:16,120 7-- oh, 3 is small. 1437 01:05:16,120 --> 01:05:17,770 Where do you want to go? 1438 01:05:17,770 --> 01:05:18,670 OK, good. 1439 01:05:18,670 --> 01:05:19,810 I'm going to go back here. 1440 01:05:19,810 --> 01:05:21,060 And I can be a little smart. 1441 01:05:21,060 --> 01:05:22,810 I don't have to go all the way to the end, 1442 01:05:22,810 --> 01:05:24,950 because I know these folks are already sorted. 1443 01:05:24,950 --> 01:05:26,630 So I can at least optimize slightly. 1444 01:05:26,630 --> 01:05:27,970 So now 5 is small. 1445 01:05:27,970 --> 01:05:28,720 6 is small. 1446 01:05:28,720 --> 01:05:30,160 7 is 4, 4 is smaller. 1447 01:05:30,160 --> 01:05:33,080 If you want to go in place there. 1448 01:05:33,080 --> 01:05:34,810 And now, here things get interesting. 1449 01:05:34,810 --> 01:05:37,180 I can optimize by not looking at these folks 1450 01:05:37,180 --> 01:05:39,340 anymore, because they're obviously problem solved. 1451 01:05:39,340 --> 01:05:42,970 But now 5 is small, 6 is not, 7 is not. 1452 01:05:42,970 --> 01:05:45,010 OK, 5, you can stay where you are. 1453 01:05:45,010 --> 01:05:46,870 Now, a human in the room is obviously going 1454 01:05:46,870 --> 01:05:49,420 to question why I'm wasting any more time. 1455 01:05:49,420 --> 01:05:52,090 But with selection sort, as I've defined it thus far, 1456 01:05:52,090 --> 01:05:55,840 I still have to, now, check 6 is smallest, not 7. 1457 01:05:55,840 --> 01:05:58,520 And now my final step, OK, they're all in place. 1458 01:05:58,520 --> 01:06:00,760 So here, too, is this dichotomy between what we all 1459 01:06:00,760 --> 01:06:03,730 have is this bird's eye view of the whole problem, where it's obvious 1460 01:06:03,730 --> 01:06:05,060 where everyone needs to go. 1461 01:06:05,060 --> 01:06:09,137 But a computer implementing this with an array really has to be more methodical. 1462 01:06:09,137 --> 01:06:10,720 And we're actually saving a step here. 1463 01:06:10,720 --> 01:06:13,780 If we were really doing this, none of these numbers would be visible. 1464 01:06:13,780 --> 01:06:16,840 All eight of our volunteers would be inside of a locked door. 1465 01:06:16,840 --> 01:06:19,220 And only then could we see them one at a time. 1466 01:06:19,220 --> 01:06:21,670 But we're focusing now just on the sorting aspect. 1467 01:06:21,670 --> 01:06:24,640 So let me just, before we do one other demonstration here, 1468 01:06:24,640 --> 01:06:29,620 propose that what I really just did here in pseudocode was something like this. 1469 01:06:29,620 --> 01:06:33,430 For i from 0 to n minus 1, keeping in mind 1470 01:06:33,430 --> 01:06:35,590 that 0 is always the left of the array. 1471 01:06:35,590 --> 01:06:38,110 n minus 1 is always the right end of the array. 1472 01:06:38,110 --> 01:06:43,000 For i from 0 to n minus 1, I found the smallest number between numbers bracket 1473 01:06:43,000 --> 01:06:45,730 i and numbers bracket n minus 1. 1474 01:06:45,730 --> 01:06:48,610 And that's the very geeky way of expressing this optimization. 1475 01:06:48,610 --> 01:06:51,490 I'm always starting from numbers bracket i wherever I am. 1476 01:06:51,490 --> 01:06:53,200 And then everything else to the right. 1477 01:06:53,200 --> 01:06:56,890 And that's what was allowing me to ignore the already sorted volunteers. 1478 01:06:56,890 --> 01:07:01,220 If, though, my last line says swap smallest number with numbers i, 1479 01:07:01,220 --> 01:07:03,220 think that implements what our humans were doing 1480 01:07:03,220 --> 01:07:05,470 by physically walking to another spot. 1481 01:07:05,470 --> 01:07:09,220 All right, so that, then, would be what we'll call selection sort. 1482 01:07:09,220 --> 01:07:11,620 Let's go ahead and take a second approach here 1483 01:07:11,620 --> 01:07:13,360 using an algorithm that I'm going to call bubble sort. 1484 01:07:13,360 --> 01:07:16,090 But to do this, we need you all to reset to your original locations. 1485 01:07:16,090 --> 01:07:17,320 We have a little cheat sheet on the board 1486 01:07:17,320 --> 01:07:19,750 if you'd like to go back to this position here. 1487 01:07:19,750 --> 01:07:22,600 And let me take a fundamentally different approach, because I'm not 1488 01:07:22,600 --> 01:07:24,850 really liking selection sort as is, because it's kind 1489 01:07:24,850 --> 01:07:26,780 of a lot of walking back and forth. 1490 01:07:26,780 --> 01:07:30,620 And the lot of walking suggests a lot of, lot of steps again and again. 1491 01:07:30,620 --> 01:07:32,090 So what might I do instead? 1492 01:07:32,090 --> 01:07:34,840 Well, bubble sort is going to have me focus a little more 1493 01:07:34,840 --> 01:07:36,730 intuitively on just smaller problems. 1494 01:07:36,730 --> 01:07:38,605 And let's see if this gets me somewhere else. 1495 01:07:38,605 --> 01:07:42,430 So if I just look at this list without looking at everyone else, 7 and 2, 1496 01:07:42,430 --> 01:07:43,670 this is obviously a problem. 1497 01:07:43,670 --> 01:07:44,170 Why? 1498 01:07:44,170 --> 01:07:45,500 Because you're out of order. 1499 01:07:45,500 --> 01:07:47,810 So let's just solve one tiny problem first. 1500 01:07:47,810 --> 01:07:49,570 So 7 and 2, why don't you swap? 1501 01:07:49,570 --> 01:07:54,160 I know 2 is in a better place now, because she's definitely less than 7. 1502 01:07:54,160 --> 01:07:55,540 So I think I can now move on. 1503 01:07:55,540 --> 01:07:57,350 7 and 5, problem. 1504 01:07:57,350 --> 01:07:58,390 So let's solve that. 1505 01:07:58,390 --> 01:07:59,830 7 and 4, problem. 1506 01:07:59,830 --> 01:08:02,380 Let's solve that, 7 and 1, let's solve that. 1507 01:08:02,380 --> 01:08:03,970 7 and 6, let's solve that. 1508 01:08:03,970 --> 01:08:05,080 7 and 0, solve that. 1509 01:08:05,080 --> 01:08:06,550 7 and 3, solve that. 1510 01:08:06,550 --> 01:08:07,330 OK, done. 1511 01:08:07,330 --> 01:08:09,130 Sorted, right? 1512 01:08:09,130 --> 01:08:11,780 Or obviously not, if you just glance at these numbers here. 1513 01:08:11,780 --> 01:08:14,530 But we have, fundamentally, taken a bite out of the problem. 1514 01:08:14,530 --> 01:08:17,020 7 is indeed in the right place. 1515 01:08:17,020 --> 01:08:21,170 So we maximally have n minus 1 other problems to solve. 1516 01:08:21,170 --> 01:08:23,660 So how do I do this? 1517 01:08:23,660 --> 01:08:25,700 I think I can just repeat the same logic. 1518 01:08:25,700 --> 01:08:26,770 Let me go over here. 1519 01:08:26,770 --> 01:08:28,210 2 and 5, good. 1520 01:08:28,210 --> 01:08:29,800 5 and 4, no. 1521 01:08:29,800 --> 01:08:31,330 5 and 1, no. 1522 01:08:31,330 --> 01:08:32,590 5 and 6, yes. 1523 01:08:32,590 --> 01:08:34,660 6 and 0, no. 1524 01:08:34,660 --> 01:08:36,760 6 and 3, no. 1525 01:08:36,760 --> 01:08:39,191 So now we've solved two of the problems. 1526 01:08:39,191 --> 01:08:41,649 And what's nice about bubble sort, at least as this glance, 1527 01:08:41,649 --> 01:08:42,707 it's nice and simple. 1528 01:08:42,707 --> 01:08:43,540 It's nice and local. 1529 01:08:43,540 --> 01:08:46,510 And you just keep incrementally solving more and more problems. 1530 01:08:46,510 --> 01:08:48,010 So let's go ahead and do this again. 1531 01:08:48,010 --> 01:08:50,080 And I'll do it-- we can do it faster. 1532 01:08:50,080 --> 01:08:51,760 2 and 4, we know are good. 1533 01:08:51,760 --> 01:08:59,200 4 and 1, 4 and 5, 5 and 0, 5 and 3, 5 and 6, 6 and 7, good. 1534 01:08:59,200 --> 01:09:01,390 So we go back, 2 and 1. 1535 01:09:01,390 --> 01:09:03,340 Ah, now another problem solve. 1536 01:09:03,340 --> 01:09:09,939 2, and 4, 4 and 0, 4 and 3, 4 and 5, 5 and 6, 6 and 7. 1537 01:09:09,939 --> 01:09:13,270 And so notice 2, as per its name, the largest elements 1538 01:09:13,270 --> 01:09:14,895 have bubbled their way up to the top. 1539 01:09:14,895 --> 01:09:17,020 And that's what seems to be happening just as we're 1540 01:09:17,020 --> 01:09:18,340 fixing some remaining problems. 1541 01:09:18,340 --> 01:09:19,120 So almost done. 1542 01:09:19,120 --> 01:09:27,550 1 and 2, 2 and 0, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7, almost done. 1543 01:09:27,550 --> 01:09:29,830 Obviously, to us humans, it looks done. 1544 01:09:29,830 --> 01:09:32,529 How do I know as the computer for sure? 1545 01:09:32,529 --> 01:09:36,370 What would be the most surefire way for me to now go, it's not done, sorry. 1546 01:09:36,370 --> 01:09:38,080 That's a bug. 1547 01:09:38,080 --> 01:09:43,390 OK, 1 and 0, 1 and 2, 2 and 3, 3 and 4, 4 and 5, 5 and 6, 6 and 7. 1548 01:09:43,390 --> 01:09:47,899 OK, so now it's obviously sorted to the rest of us on stage. 1549 01:09:47,899 --> 01:09:50,290 How could I confirm as much as code? 1550 01:09:50,290 --> 01:09:52,670 You're doing it with your mind, just glancing at this. 1551 01:09:52,670 --> 01:09:56,080 How would the computer, the code, know for sure that this list is now sorted? 1552 01:09:56,080 --> 01:09:57,000 Yeah. 1553 01:09:57,000 --> 01:09:58,500 STUDENT: [INAUDIBLE] one more time. 1554 01:09:58,500 --> 01:10:00,000 DAVID MALAN: Let's do one more time. 1555 01:10:00,000 --> 01:10:03,512 And look, draw what conclusion? 1556 01:10:03,512 --> 01:10:05,490 STUDENT: That nothing has to switch at all. 1557 01:10:05,490 --> 01:10:07,365 DAVID MALAN: Yeah, let's do it one more time, 1558 01:10:07,365 --> 01:10:08,860 even though it's a little wasteful. 1559 01:10:08,860 --> 01:10:13,320 But logically, if I go through the whole list comparing pairs again, again, 1560 01:10:13,320 --> 01:10:15,810 and again, and I don't do any work that time, 1561 01:10:15,810 --> 01:10:19,133 now it's obviously logically safe to just stop, because otherwise, I'm 1562 01:10:19,133 --> 01:10:21,300 wasting my time doing the same thing again and again 1563 01:10:21,300 --> 01:10:22,933 if no one's actually moving. 1564 01:10:22,933 --> 01:10:25,350 So I'm afraid we don't have monopoly games for all of you. 1565 01:10:25,350 --> 01:10:26,767 But we do have eight stress balls. 1566 01:10:26,767 --> 01:10:30,090 And round of applause, if we could, for our volunteers. 1567 01:10:30,090 --> 01:10:33,910 If you want to put your numbers on the shelf there. 1568 01:10:33,910 --> 01:10:36,220 So if we consider for a moment-- 1569 01:10:36,220 --> 01:10:36,720 thank you. 1570 01:10:36,720 --> 01:10:39,340 Thank you so much. 1571 01:10:39,340 --> 01:10:42,150 Sure. 1572 01:10:42,150 --> 01:10:43,170 Thank you. 1573 01:10:43,170 --> 01:10:44,230 Thanks. 1574 01:10:44,230 --> 01:10:44,730 Sure. 1575 01:10:44,730 --> 01:10:48,870 So if we consider now these two algorithms, which one is better? 1576 01:10:48,870 --> 01:10:51,990 Any intuition for whether selection sort the first 1577 01:10:51,990 --> 01:10:55,950 is better or worse than bubble sort the second? 1578 01:10:55,950 --> 01:10:58,020 Any thoughts? 1579 01:10:58,020 --> 01:10:58,860 Yeah. 1580 01:10:58,860 --> 01:11:03,620 STUDENT: Bubble sort's even better because it's less work [INAUDIBLE].. 1581 01:11:03,620 --> 01:11:06,110 DAVID MALAN: So bubble sort seems like less work, 1582 01:11:06,110 --> 01:11:08,930 especially since I was focusing on those localized problems. 1583 01:11:08,930 --> 01:11:11,460 Other intuition? 1584 01:11:11,460 --> 01:11:14,580 Selection sort versus bubble sort. 1585 01:11:14,580 --> 01:11:18,000 Well, let me propose that we try to quantize this so we can actually 1586 01:11:18,000 --> 01:11:19,420 analyze it in some way. 1587 01:11:19,420 --> 01:11:22,590 And this is not an exercise we'll do constantly for lots of algorithms. 1588 01:11:22,590 --> 01:11:24,940 But these are pretty representative of algorithms. 1589 01:11:24,940 --> 01:11:27,690 So we can wrap our minds around, indeed, the performance 1590 01:11:27,690 --> 01:11:28,960 or the design of these things. 1591 01:11:28,960 --> 01:11:34,350 So here is my pseudocode for selection sort, whereby as per its name, 1592 01:11:34,350 --> 01:11:38,500 I just iteratively select the next smallest element again and again. 1593 01:11:38,500 --> 01:11:41,890 So how can we go about analyzing something like this? 1594 01:11:41,890 --> 01:11:43,920 Well, we could just do it on paper pencil 1595 01:11:43,920 --> 01:11:46,260 and count up the number of steps that seem 1596 01:11:46,260 --> 01:11:48,030 to be implied logically by the code. 1597 01:11:48,030 --> 01:11:52,260 We could literally count the number of steps I was taking again and again, 1598 01:11:52,260 --> 01:11:52,890 left to right. 1599 01:11:52,890 --> 01:11:55,830 We could also just count the number of comparisons 1600 01:11:55,830 --> 01:11:58,302 I was making with each of the persons involved. 1601 01:11:58,302 --> 01:12:00,510 And I was doing it kind of quickly in selection sort. 1602 01:12:00,510 --> 01:12:03,033 But every time I was looking at a person trying to decide, 1603 01:12:03,033 --> 01:12:04,950 do I want to remember that number is smallest? 1604 01:12:04,950 --> 01:12:08,580 That number, I was comparing two values with an equals equals or less 1605 01:12:08,580 --> 01:12:11,700 than or greater than sign, at least if we had done this in code. 1606 01:12:11,700 --> 01:12:13,110 So that tends to be the norm. 1607 01:12:13,110 --> 01:12:16,680 When analyzing algorithms like these, counting the number of comparisons, 1608 01:12:16,680 --> 01:12:21,090 because it's kind of a global unit of measure 1609 01:12:21,090 --> 01:12:23,490 we can use to compare different algorithms entirely. 1610 01:12:23,490 --> 01:12:27,780 So think, too, that in the general case, when 1611 01:12:27,780 --> 01:12:30,390 we have more than eight volunteers, more than seven doors, 1612 01:12:30,390 --> 01:12:33,510 we can generalize our array in general, as this 1613 01:12:33,510 --> 01:12:35,220 is the first element at bracket 0. 1614 01:12:35,220 --> 01:12:37,770 And the end of it is always n minus 1. 1615 01:12:37,770 --> 01:12:41,550 So arrays or doors, in this case, or volunteers, 1616 01:12:41,550 --> 01:12:45,720 are always numerically indexed from 0 on up to n minus 1, 1617 01:12:45,720 --> 01:12:47,200 if there's n of them in total. 1618 01:12:47,200 --> 01:12:50,940 So how do we analyze the code of selection sort? 1619 01:12:50,940 --> 01:12:56,370 Well, how many steps did it take me to find the first smallest element? 1620 01:12:56,370 --> 01:12:58,835 Or more precisely, how many comparisons did I 1621 01:12:58,835 --> 01:13:00,960 need to make when I walked to left to right to find 1622 01:13:00,960 --> 01:13:06,100 our first-smallest person, which ended up being 0? 1623 01:13:06,100 --> 01:13:09,310 How many comparisons did I do when walking left to right? 1624 01:13:09,310 --> 01:13:15,850 If there were eight people on stage, how many total comparisons that I do? 1625 01:13:15,850 --> 01:13:18,280 Like if there's eight people, I compared these folks. 1626 01:13:18,280 --> 01:13:22,210 Then this person, this person, yeah. 1627 01:13:22,210 --> 01:13:23,412 Yeah, so seven total, right? 1628 01:13:23,412 --> 01:13:25,120 Because if there's eight people on stage, 1629 01:13:25,120 --> 01:13:28,420 you can only do seven comparisons total, because otherwise you'd 1630 01:13:28,420 --> 01:13:29,960 be comparing one number to itself. 1631 01:13:29,960 --> 01:13:31,960 So it seems like, in the general case, if you've 1632 01:13:31,960 --> 01:13:34,600 got n numbers that you're trying to sort, 1633 01:13:34,600 --> 01:13:38,560 finding the smallest element first takes n minus 1 comparisons. 1634 01:13:38,560 --> 01:13:41,275 Maybe n total steps left to right. 1635 01:13:41,275 --> 01:13:43,150 But the number of comparisons, which I claim, 1636 01:13:43,150 --> 01:13:46,030 is just a useful unit of measure, is n minus 1. 1637 01:13:46,030 --> 01:13:48,490 How about finding the next smallest person? 1638 01:13:48,490 --> 01:13:51,970 How many steps did it take me to find the next smallest number, which 1639 01:13:51,970 --> 01:13:53,200 ended up being the number 1? 1640 01:13:53,200 --> 01:13:55,790 1641 01:13:55,790 --> 01:13:56,855 Yeah. 1642 01:13:56,855 --> 01:13:58,340 STUDENT: [INAUDIBLE] n minus 2. 1643 01:13:58,340 --> 01:13:59,600 DAVID MALAN: Yeah, so just n minus 2. 1644 01:13:59,600 --> 01:13:59,870 Why? 1645 01:13:59,870 --> 01:14:01,610 Because I'd already solved one problem. 1646 01:14:01,610 --> 01:14:03,210 Someone was already in the right position. 1647 01:14:03,210 --> 01:14:05,490 It would be silly to keep counting them again and again. 1648 01:14:05,490 --> 01:14:08,157 So I can whittle down my number of comparisons for the next pass 1649 01:14:08,157 --> 01:14:09,200 to n minus 2. 1650 01:14:09,200 --> 01:14:12,350 The third pass to find the third smallest number would be n minus 3. 1651 01:14:12,350 --> 01:14:15,770 And then dot, dot, dot, presumably this story, this formula, 1652 01:14:15,770 --> 01:14:19,620 ends when you have just one final pair, the people at the end, to compare. 1653 01:14:19,620 --> 01:14:22,820 So if this is looking a little reminiscent of some kind of recurrence 1654 01:14:22,820 --> 01:14:25,520 from high school or high school math or physics or the like, 1655 01:14:25,520 --> 01:14:28,220 let me just stipulate that if you actually do out this math 1656 01:14:28,220 --> 01:14:32,840 and generalize it, that is the same thing as n times n minus 1 1657 01:14:32,840 --> 01:14:33,392 divided by 2. 1658 01:14:33,392 --> 01:14:35,100 And if you're rusty on that, no big deal. 1659 01:14:35,100 --> 01:14:37,070 Just kind of commit to memory that any time 1660 01:14:37,070 --> 01:14:40,010 you add up this kind of series, something plus something slightly 1661 01:14:40,010 --> 01:14:42,302 smaller, plus something slightly smaller, each of which 1662 01:14:42,302 --> 01:14:46,520 differs by 1, you're going to get this formula. n times n minus 1 over 2. 1663 01:14:46,520 --> 01:14:50,630 If we, of course, multiply that out, that's really n squared minus n, 1664 01:14:50,630 --> 01:14:51,680 all divided by 2. 1665 01:14:51,680 --> 01:14:56,540 If we keep multiplying it out, that's n squared divided by 2 minus n over 2. 1666 01:14:56,540 --> 01:14:59,660 And now, we have kind of a vocabulary with which we 1667 01:14:59,660 --> 01:15:03,140 can talk about the efficiency, the design of this algorithm. 1668 01:15:03,140 --> 01:15:06,380 But honestly, I don't really care about this level of precision, 1669 01:15:06,380 --> 01:15:09,560 like n squared divided by 2 minus n divided by 2. 1670 01:15:09,560 --> 01:15:14,240 As n gets really large, which of these symbols, which of these terms 1671 01:15:14,240 --> 01:15:17,480 is really going to dominate, become the biggest influencer 1672 01:15:17,480 --> 01:15:20,190 on the total value of steps? 1673 01:15:20,190 --> 01:15:20,690 Right? 1674 01:15:20,690 --> 01:15:21,890 It's the square, right? 1675 01:15:21,890 --> 01:15:23,382 It's definitely not n divided by 2. 1676 01:15:23,382 --> 01:15:24,590 That's shaving some time off. 1677 01:15:24,590 --> 01:15:27,800 But n squared, as n gets big, is going to get really big. 1678 01:15:27,800 --> 01:15:29,990 If n is 100, then n squared is bigger. 1679 01:15:29,990 --> 01:15:32,570 If n is a million, n squared is really bigger. 1680 01:15:32,570 --> 01:15:35,390 And so at the end of the day, when we're really just talking 1681 01:15:35,390 --> 01:15:39,140 about a wave of the hand analysis and upper bound, if you will, 1682 01:15:39,140 --> 01:15:43,130 let's just say that selection sort, as analyzed here, 1683 01:15:43,130 --> 01:15:45,860 it's on the order of n squared steps. 1684 01:15:45,860 --> 01:15:47,690 It's not precisely n squared steps. 1685 01:15:47,690 --> 01:15:52,830 But you know what? n squared divided by 2, the intuition here might be that, 1686 01:15:52,830 --> 01:15:55,250 well, it's half of that. 1687 01:15:55,250 --> 01:15:58,570 n squared is what really matters as n gets really, really large. 1688 01:15:58,570 --> 01:16:01,070 And that's when you start thinking about and trying to solve 1689 01:16:01,070 --> 01:16:02,445 the Google problems of the world. 1690 01:16:02,445 --> 01:16:04,670 When n gets large, that's when you have to be smarter 1691 01:16:04,670 --> 01:16:07,490 than just sort of naive implementations of any algorithm. 1692 01:16:07,490 --> 01:16:12,480 So where, then, does this algorithm fall into this categorization here? 1693 01:16:12,480 --> 01:16:14,870 Well, n squared, it turns out, is on the order 1694 01:16:14,870 --> 01:16:19,610 of n squared steps, in the worst case, whether it's sorted or not. 1695 01:16:19,610 --> 01:16:23,990 It turns out, though, lower bound, if we consider this same code, 1696 01:16:23,990 --> 01:16:28,670 suppose the best case scenario, like our eight volunteers came up on stage. 1697 01:16:28,670 --> 01:16:32,240 And just because they already sorted themselves, so 0 through 7. 1698 01:16:32,240 --> 01:16:34,490 Suppose they just happened to be in that state. 1699 01:16:34,490 --> 01:16:37,550 How many steps would selection store take 1700 01:16:37,550 --> 01:16:42,670 to sort an already-sorted list of volunteers? 1701 01:16:42,670 --> 01:16:43,420 Any intuition? 1702 01:16:43,420 --> 01:16:44,318 Yeah. 1703 01:16:44,318 --> 01:16:47,186 STUDENT: Would it still be [INAUDIBLE]? 1704 01:16:47,186 --> 01:16:49,717 DAVID MALAN: Would it still be n-- 1705 01:16:49,717 --> 01:16:51,180 STUDENT: Still be 7 [INAUDIBLE]. 1706 01:16:51,180 --> 01:16:53,263 DAVID MALAN: So for the first pass, it would still 1707 01:16:53,263 --> 01:16:55,710 be 7 for the first pass across the humans. 1708 01:16:55,710 --> 01:16:58,530 Because even though, yeah, I'm claiming 0 is here, 1709 01:16:58,530 --> 01:17:01,920 I don't know that 0 is the smallest until I make my way all the way 1710 01:17:01,920 --> 01:17:03,990 over there doing all seven comparisons. 1711 01:17:03,990 --> 01:17:08,220 OK, fine, first pass took seven or more generally n minus 1 steps. 1712 01:17:08,220 --> 01:17:11,940 What if I look for the next smallest element, and the humans in this story 1713 01:17:11,940 --> 01:17:14,370 are already sorted 0 through 7? 1714 01:17:14,370 --> 01:17:17,580 Well, yes, the number 1 is here, and I see them first. 1715 01:17:17,580 --> 01:17:21,405 But I don't know they're the smallest until I compare against everyone else 1716 01:17:21,405 --> 01:17:22,530 get to the end of the list. 1717 01:17:22,530 --> 01:17:24,238 And we're like, oh, well that was stupid. 1718 01:17:24,238 --> 01:17:26,550 I already had the smallest person in hand then. 1719 01:17:26,550 --> 01:17:29,820 And so this pseudocode, this implementation of selection sort, 1720 01:17:29,820 --> 01:17:31,650 is sort of fixed like this. 1721 01:17:31,650 --> 01:17:35,490 There's no special case that says, if already sorted, quit early. 1722 01:17:35,490 --> 01:17:37,860 It's always going to take n squared steps. 1723 01:17:37,860 --> 01:17:42,090 And so in this case, if we borrow our jargon from earlier 1724 01:17:42,090 --> 01:17:46,080 using omega notation, just to be clear, selection sort 1725 01:17:46,080 --> 01:17:50,790 is also going to be in this incarnation in omega of n squared, 1726 01:17:50,790 --> 01:17:53,190 because even in the best case, where the list is already 1727 01:17:53,190 --> 01:17:56,100 sorted, you're going to waste a huge amount of time 1728 01:17:56,100 --> 01:17:59,040 essentially verifying as much or discovering as much, 1729 01:17:59,040 --> 01:18:01,750 even though we humans of course could see it right away. 1730 01:18:01,750 --> 01:18:06,270 So selection sort would seem to take both n squared steps in the worst 1731 01:18:06,270 --> 01:18:08,425 case, n squared steps in the best case. 1732 01:18:08,425 --> 01:18:09,300 And so you know what? 1733 01:18:09,300 --> 01:18:11,280 We can use our theta terminology for that. 1734 01:18:11,280 --> 01:18:13,770 Here would be an algorithm, just like counting earlier, 1735 01:18:13,770 --> 01:18:17,880 that always takes n squared steps, no matter whether the array is sorted 1736 01:18:17,880 --> 01:18:19,652 or not from the get go. 1737 01:18:19,652 --> 01:18:21,360 All right, so hopefully we can do better. 1738 01:18:21,360 --> 01:18:23,550 And someone proposed earlier that bubble sort 1739 01:18:23,550 --> 01:18:25,618 felt like it was using fewer steps. 1740 01:18:25,618 --> 01:18:26,910 Well, let's consider that next. 1741 01:18:26,910 --> 01:18:30,630 With bubble sort, we had this pseudocode, I claim. 1742 01:18:30,630 --> 01:18:33,780 Whereby, let's focus on the inside of the code first. 1743 01:18:33,780 --> 01:18:36,120 Down here, what was I doing? 1744 01:18:36,120 --> 01:18:39,960 For i from 0 to n minus 2. 1745 01:18:39,960 --> 01:18:40,740 That's curious. 1746 01:18:40,740 --> 01:18:42,360 We've never seen n minus 2 before. 1747 01:18:42,360 --> 01:18:44,040 But I asked this question. 1748 01:18:44,040 --> 01:18:50,160 If numbers bracket i and numbers bracket i plus 1 are out of order, swap them. 1749 01:18:50,160 --> 01:18:53,610 So that was when I was pointing at our first two volunteers here. 1750 01:18:53,610 --> 01:18:57,090 I saw that they were out of order, so I swapped them. 1751 01:18:57,090 --> 01:19:01,830 How come I'm doing that again and again up to n minus 2, though, 1752 01:19:01,830 --> 01:19:05,610 instead of n minus 1, which we've always used up 1753 01:19:05,610 --> 01:19:09,670 until now as our rightmost boundary? 1754 01:19:09,670 --> 01:19:14,170 Any intuition for why I'm doing this from 0 to n minus 2? 1755 01:19:14,170 --> 01:19:14,700 Yeah. 1756 01:19:14,700 --> 01:19:18,540 STUDENT: [INAUDIBLE] number, you can't get rid of the ith number. 1757 01:19:18,540 --> 01:19:21,005 There's no benign character you can swap with. 1758 01:19:21,005 --> 01:19:21,880 DAVID MALAN: Exactly. 1759 01:19:21,880 --> 01:19:26,050 Because I'm looking at the ith person per this pseudocode here 1760 01:19:26,050 --> 01:19:29,740 and the ith plus 1 person, I better make sure I don't go step 1761 01:19:29,740 --> 01:19:31,550 beyond the boundaries of my array. 1762 01:19:31,550 --> 01:19:33,010 So if you think of my left hand. 1763 01:19:33,010 --> 01:19:36,250 When my back was to you here, pointing at the current person 1764 01:19:36,250 --> 01:19:39,225 at the first position, my right hand for this if conditioner 1765 01:19:39,225 --> 01:19:41,350 is essentially pointing at the person next to them. 1766 01:19:41,350 --> 01:19:44,740 And you want to iterate with your left hand all through these people. 1767 01:19:44,740 --> 01:19:47,620 But you don't want your left hand to point at the last person. 1768 01:19:47,620 --> 01:19:50,000 You want it to point at the second to last person. 1769 01:19:50,000 --> 01:19:54,220 But we know that the last person is always at n minus 1. 1770 01:19:54,220 --> 01:19:57,820 So the second to last person, just mathematically, is at n minus 2. 1771 01:19:57,820 --> 01:19:58,780 So it's a subtlety. 1772 01:19:58,780 --> 01:20:00,880 But this is a seg fault waiting to happen. 1773 01:20:00,880 --> 01:20:03,760 If you implemented bubble sort using n minus 1, 1774 01:20:03,760 --> 01:20:07,340 you will, my right hand would go beyond the boundaries of the array, 1775 01:20:07,340 --> 01:20:08,170 so just bad. 1776 01:20:08,170 --> 01:20:10,490 All right, so why am I saying this n times? 1777 01:20:10,490 --> 01:20:13,070 Well, we did it very organically with humans. 1778 01:20:13,070 --> 01:20:15,940 But each time someone-- 1779 01:20:15,940 --> 01:20:18,370 each pass I did through the array, someone 1780 01:20:18,370 --> 01:20:19,840 bubbled their way up to the end. 1781 01:20:19,840 --> 01:20:22,870 Number 7, then number 6, then number 5. 1782 01:20:22,870 --> 01:20:26,600 So if on each pass through the array of volunteers, 1783 01:20:26,600 --> 01:20:31,360 I was solving at least one problem, it seems like bubble sort can just 1784 01:20:31,360 --> 01:20:34,315 run n times total to solve all n problems, 1785 01:20:34,315 --> 01:20:36,940 because the first pass will get at least one number into place. 1786 01:20:36,940 --> 01:20:38,470 Second pass, second number into place. 1787 01:20:38,470 --> 01:20:39,970 You might get lucky, and it would do more. 1788 01:20:39,970 --> 01:20:41,740 But worst case, this feels like enough. 1789 01:20:41,740 --> 01:20:46,240 Just do this blindly n times, and they'll all line up together. 1790 01:20:46,240 --> 01:20:49,780 Well, technically-- all right, now we're getting into the weeds. 1791 01:20:49,780 --> 01:20:52,060 Technically, you can just repeat it in minus 1 times, 1792 01:20:52,060 --> 01:20:54,520 because if you solve all n minus 1 other problems, 1793 01:20:54,520 --> 01:20:58,120 and you're left with 1, literally that person's where they need to be, 1794 01:20:58,120 --> 01:20:58,900 just logically. 1795 01:20:58,900 --> 01:21:01,390 If you've already sorted everything else and you've got just the 1 left, 1796 01:21:01,390 --> 01:21:02,540 it's already bubbled up. 1797 01:21:02,540 --> 01:21:03,980 So how do we analyze this? 1798 01:21:03,980 --> 01:21:06,670 Well in bubble sort, we might do something like this. 1799 01:21:06,670 --> 01:21:11,015 I'm essentially doing n minus 1 things n minus 1 times. 1800 01:21:11,015 --> 01:21:13,390 Now, let me back up to the pseudocode, because this one's 1801 01:21:13,390 --> 01:21:14,980 a little less obvious. 1802 01:21:14,980 --> 01:21:19,270 This is where you can actually mathematically infer from your loop 1803 01:21:19,270 --> 01:21:21,110 how many steps you're taking. 1804 01:21:21,110 --> 01:21:24,585 So this first line literally says, repeat the following n minus 1 times. 1805 01:21:24,585 --> 01:21:26,710 So that's going to translate very straightforwardly 1806 01:21:26,710 --> 01:21:28,240 to our mathematical formula. 1807 01:21:28,240 --> 01:21:30,190 Do something n minus 1 times. 1808 01:21:30,190 --> 01:21:33,970 This loop, just because I'm using for loop terminology, 1809 01:21:33,970 --> 01:21:35,840 it's framed a little differently. 1810 01:21:35,840 --> 01:21:39,910 But if you're iterating from 0 to n minus 2, 1811 01:21:39,910 --> 01:21:43,258 you're iterating a total of n minus 1 times. 1812 01:21:43,258 --> 01:21:45,550 And again, the arithmetic is getting a little annoying. 1813 01:21:45,550 --> 01:21:48,470 But this just means do the following n minus 1 times. 1814 01:21:48,470 --> 01:21:51,670 So do n minus 1 things n minus 1 times. 1815 01:21:51,670 --> 01:21:54,440 We can now run out the math as follows. 1816 01:21:54,440 --> 01:21:57,940 We have the formula n minus 1 times n minus 1. 1817 01:21:57,940 --> 01:22:01,210 We do our little FOIL method here, n squared minus 1 times n, 1818 01:22:01,210 --> 01:22:03,100 minus 1 times n, plus 1. 1819 01:22:03,100 --> 01:22:06,550 We can combine like terms. n squared minus 2n plus 1. 1820 01:22:06,550 --> 01:22:09,025 But at this point, when n gets really large, 1821 01:22:09,025 --> 01:22:10,900 which term are we really going to care about? 1822 01:22:10,900 --> 01:22:13,390 This is on the order of? 1823 01:22:13,390 --> 01:22:14,870 Yeah, n squared. 1824 01:22:14,870 --> 01:22:16,780 So at least asymptotically. 1825 01:22:16,780 --> 01:22:20,830 Asymptotically means, as n approaches infinity, gets really large. 1826 01:22:20,830 --> 01:22:24,070 Turns out that the upper bound on selection sort and bubble sort 1827 01:22:24,070 --> 01:22:25,430 are essentially the same. 1828 01:22:25,430 --> 01:22:28,540 Now, if we really nitpicked and compared the total number of comparisons, 1829 01:22:28,540 --> 01:22:29,680 they might differ slightly. 1830 01:22:29,680 --> 01:22:31,870 But as n gets large, honestly, you're barely 1831 01:22:31,870 --> 01:22:36,350 going to notice the difference, it would seem, between these two algorithms. 1832 01:22:36,350 --> 01:22:39,550 But what about the lower bound? 1833 01:22:39,550 --> 01:22:44,620 If the upper bound on bubble sort is also big O of n, what about the lower 1834 01:22:44,620 --> 01:22:45,470 bound here? 1835 01:22:45,470 --> 01:22:50,170 Well, with this pseudocode, what would the lower bound be on bubble sort? 1836 01:22:50,170 --> 01:22:53,890 Even in the best case when all of the volunteers are sorted. 1837 01:22:53,890 --> 01:22:56,830 Any intuition? 1838 01:22:56,830 --> 01:22:57,670 In this pseudo code. 1839 01:22:57,670 --> 01:22:58,538 Yeah, in the middle. 1840 01:22:58,538 --> 01:22:59,830 STUDENT: Sorry, quick question. 1841 01:22:59,830 --> 01:23:02,360 Isn't bubble sort structured such that you 1842 01:23:02,360 --> 01:23:05,955 wouldn't need to compare numbers that have already bubbled up? 1843 01:23:05,955 --> 01:23:07,080 DAVID MALAN: Good question. 1844 01:23:07,080 --> 01:23:09,122 Isn't bubble sort designed such that you wouldn't 1845 01:23:09,122 --> 01:23:12,860 need to compare numbers that have already bubbled up? 1846 01:23:12,860 --> 01:23:17,000 That's what's happening here in the middle, implicitly. 1847 01:23:17,000 --> 01:23:19,220 I'm always going from left to right. 1848 01:23:19,220 --> 01:23:21,650 But remember that even when I screwed up at the end 1849 01:23:21,650 --> 01:23:23,900 and the last two people were out of order, I do always 1850 01:23:23,900 --> 01:23:27,140 need to restart at the beginning, because the big numbers are 1851 01:23:27,140 --> 01:23:29,691 going that way, and the small numbers are coming this way. 1852 01:23:29,691 --> 01:23:32,892 STUDENT: [INAUDIBLE] 1853 01:23:32,892 --> 01:23:34,100 DAVID MALAN: So that is true. 1854 01:23:34,100 --> 01:23:37,460 There are some slight optimizations that I'm kind of glossing over here. 1855 01:23:37,460 --> 01:23:40,700 Let me stipulate that it would still end up being on the order of n squared. 1856 01:23:40,700 --> 01:23:43,910 But that would definitely shave off some actual running time here. 1857 01:23:43,910 --> 01:23:46,340 But what if the list is already sorted? 1858 01:23:46,340 --> 01:23:48,410 Our pseudocode, at the moment, has no allowance 1859 01:23:48,410 --> 01:23:51,020 for if list is already sorted, quit early. 1860 01:23:51,020 --> 01:23:53,840 So we're going to blindly do n minus 1 things 1861 01:23:53,840 --> 01:23:58,850 and minus 1 times unless we modify our pseudocode, as I did verbally earlier, 1862 01:23:58,850 --> 01:23:59,960 I proposed this. 1863 01:23:59,960 --> 01:24:04,520 Inside of that outer loop, if you make a pass across all of the volunteers, 1864 01:24:04,520 --> 01:24:06,907 and your mental counter has made no swaps, 1865 01:24:06,907 --> 01:24:08,990 you have to keep track with some kind of variable, 1866 01:24:08,990 --> 01:24:10,518 well then, you might as well stop. 1867 01:24:10,518 --> 01:24:12,560 Because if you do a whole pass and make no swaps, 1868 01:24:12,560 --> 01:24:17,550 why would you waste time doing it again expecting different behavior? 1869 01:24:17,550 --> 01:24:23,480 So to help visualize these, whereby now bubble sort can be advantageous 1870 01:24:23,480 --> 01:24:26,640 if the data is already sorted or mostly sorted. 1871 01:24:26,640 --> 01:24:27,140 Why? 1872 01:24:27,140 --> 01:24:29,510 Because it does have this short circuit detail. 1873 01:24:29,510 --> 01:24:31,790 At least if we implement it like that, how 1874 01:24:31,790 --> 01:24:36,263 can we go about visualizing these things a little more clearly? 1875 01:24:36,263 --> 01:24:37,680 Well, let me go ahead and do this. 1876 01:24:37,680 --> 01:24:41,870 Let me pull up, here, a visualization of exactly these algorithms, 1877 01:24:41,870 --> 01:24:45,320 thanks to a third party tool here that's going to help us visualize 1878 01:24:45,320 --> 01:24:46,850 these sorting algorithms as follows. 1879 01:24:46,850 --> 01:24:48,740 Small bars represent small numbers. 1880 01:24:48,740 --> 01:24:50,480 Big bars represent big numbers. 1881 01:24:50,480 --> 01:24:53,720 And so the idea, now, is when I hit a button here 1882 01:24:53,720 --> 01:24:56,843 to get all of the small bars this way, all of the big bars this way. 1883 01:24:56,843 --> 01:24:58,010 So just like our volunteers. 1884 01:24:58,010 --> 01:25:02,370 But instead of holding lighted numbers, it's bars representing their magnitude. 1885 01:25:02,370 --> 01:25:07,190 So let's go ahead and start with, for instance, selection sort. 1886 01:25:07,190 --> 01:25:09,920 And you'll see in pink, is being highlighted 1887 01:25:09,920 --> 01:25:12,980 the current number that is being selected 1888 01:25:12,980 --> 01:25:14,820 and then pulled all the way to the left. 1889 01:25:14,820 --> 01:25:16,220 So this is selection sort. 1890 01:25:16,220 --> 01:25:20,420 And again, it's selecting the next smallest element. 1891 01:25:20,420 --> 01:25:25,850 But you can see here, all the more visibly, that just like my human feet, 1892 01:25:25,850 --> 01:25:27,450 we're taking a lot of steps. 1893 01:25:27,450 --> 01:25:32,430 So is this algorithm touching these elements, again and again and again. 1894 01:25:32,430 --> 01:25:34,970 And this is why the n squared is really a thing. 1895 01:25:34,970 --> 01:25:37,322 There's got to be some inherent redundancy here. 1896 01:25:37,322 --> 01:25:40,280 Like, why do we keep looking at the same darn elements again and again? 1897 01:25:40,280 --> 01:25:43,070 We do, in terms of our pseudocode need to do so. 1898 01:25:43,070 --> 01:25:46,670 But it's this redundant comparisons that kind of explains 1899 01:25:46,670 --> 01:25:48,782 why n squared is indeed the case. 1900 01:25:48,782 --> 01:25:49,490 So now it's done. 1901 01:25:49,490 --> 01:25:50,977 Small bars here, big bars there. 1902 01:25:50,977 --> 01:25:53,060 And I had to just keep talking there to kill time, 1903 01:25:53,060 --> 01:25:54,650 because it's relatively slow. 1904 01:25:54,650 --> 01:25:57,182 Well, let me re-randomize the array, just 1905 01:25:57,182 --> 01:25:58,640 so we start with a different order. 1906 01:25:58,640 --> 01:26:00,380 And now let me click on bubble sort. 1907 01:26:00,380 --> 01:26:03,240 And you'll see similar idea, but different algorithm. 1908 01:26:03,240 --> 01:26:06,620 So now, the two bars in pink are the two that 1909 01:26:06,620 --> 01:26:09,995 are being compared and fixed, potentially, if they're out of order. 1910 01:26:09,995 --> 01:26:11,870 And you can see already that the biggest bars 1911 01:26:11,870 --> 01:26:14,420 are bubbling their way up to the top. 1912 01:26:14,420 --> 01:26:17,510 But now, you can also see this redundancy, 1913 01:26:17,510 --> 01:26:20,450 like we keep swooping through the list again and again, just 1914 01:26:20,450 --> 01:26:22,740 like I kept walking back and forth. 1915 01:26:22,740 --> 01:26:23,795 And this is n squared. 1916 01:26:23,795 --> 01:26:24,920 This is not that many bars. 1917 01:26:24,920 --> 01:26:25,420 What? 1918 01:26:25,420 --> 01:26:27,830 10, 20, there's like 40 or something bars, I'm guessing. 1919 01:26:27,830 --> 01:26:31,560 That's pretty slow already just to sort 40 numbers. 1920 01:26:31,560 --> 01:26:34,310 And I think it's going to get tedious if I keep talking over this. 1921 01:26:34,310 --> 01:26:37,590 So let's just assume that this too is relatively slow. 1922 01:26:37,590 --> 01:26:41,390 Had I gotten lucky and the list were almost sorted already, 1923 01:26:41,390 --> 01:26:43,310 bubble sort would have been pretty fast. 1924 01:26:43,310 --> 01:26:46,040 But this was a truly random array, so we did not get lucky. 1925 01:26:46,040 --> 01:26:50,010 So indeed, the worst case might be what's kicking in here. 1926 01:26:50,010 --> 01:26:54,470 So I feel like it'll be anticlimactic, like holding in a sneeze, if I 1927 01:26:54,470 --> 01:26:55,980 don't let you see the end of this. 1928 01:26:55,980 --> 01:26:57,890 So here we go. 1929 01:26:57,890 --> 01:27:00,110 Nothing interesting is about to happen. 1930 01:27:00,110 --> 01:27:02,330 Almost done. 1931 01:27:02,330 --> 01:27:03,080 OK, done. 1932 01:27:03,080 --> 01:27:05,890 All right, so thank you. 1933 01:27:05,890 --> 01:27:06,710 [APPLAUSE] 1934 01:27:06,710 --> 01:27:09,110 Thank you. 1935 01:27:09,110 --> 01:27:12,500 So still somewhat slow though. 1936 01:27:12,500 --> 01:27:15,800 How though can we, perhaps, do a little better fundamentally? 1937 01:27:15,800 --> 01:27:19,070 So we can do so if we introduce yet another technique. 1938 01:27:19,070 --> 01:27:22,130 And this one isn't so much a function of code as it is concept. 1939 01:27:22,130 --> 01:27:25,290 And it's something that you might have seen in the real world, 1940 01:27:25,290 --> 01:27:27,500 but perhaps not so obviously so. 1941 01:27:27,500 --> 01:27:31,490 So it turns out, in programming, recursion 1942 01:27:31,490 --> 01:27:34,970 refers to the ability of a function to call itself. 1943 01:27:34,970 --> 01:27:37,460 In the world of mathematics, if you have a function f, 1944 01:27:37,460 --> 01:27:41,450 if f appears on both the left side and the right side of a formula, 1945 01:27:41,450 --> 01:27:43,850 that would be a recursive function in the math world too. 1946 01:27:43,850 --> 01:27:46,490 Whenever f is defined in terms of itself, or in our case, 1947 01:27:46,490 --> 01:27:51,800 in compute-- in programming, any time a function calls itself, 1948 01:27:51,800 --> 01:27:53,660 that function is said to be recursive. 1949 01:27:53,660 --> 01:27:55,790 And this is actually something we've seen already in class, 1950 01:27:55,790 --> 01:27:57,373 even though we didn't call it as much. 1951 01:27:57,373 --> 01:28:00,350 So for instance, consider this pseudocode 1952 01:28:00,350 --> 01:28:03,590 from earlier, whereby this was the pseudocode 1953 01:28:03,590 --> 01:28:07,760 for searching via binary search, a whole bunch of doors. 1954 01:28:07,760 --> 01:28:10,040 If no doors are left returned false, that 1955 01:28:10,040 --> 01:28:12,600 was the additional conditional we added. 1956 01:28:12,600 --> 01:28:14,930 But then if number behind middle door returned true, 1957 01:28:14,930 --> 01:28:18,980 and here's the interesting part, if number is less than middle door, 1958 01:28:18,980 --> 01:28:20,780 search the left half. 1959 01:28:20,780 --> 01:28:24,020 Else if number is greater than middle door, search the right half. 1960 01:28:24,020 --> 01:28:27,800 This pseudocode earlier was, itself, recursive. 1961 01:28:27,800 --> 01:28:28,340 Why? 1962 01:28:28,340 --> 01:28:30,590 Because here is an algorithm for searching. 1963 01:28:30,590 --> 01:28:32,650 But what's the algorithm telling us? 1964 01:28:32,650 --> 01:28:37,280 Well, on this line and this line, it's telling us to search something else. 1965 01:28:37,280 --> 01:28:40,720 So even though it's not explicitly defined in code as having a name, 1966 01:28:40,720 --> 01:28:44,410 if this is a search algorithm, and yet the search algorithm is using a search 1967 01:28:44,410 --> 01:28:47,650 algorithm, this pseudocode is recursive. 1968 01:28:47,650 --> 01:28:50,380 Now, that could quickly get you into trouble if a function just 1969 01:28:50,380 --> 01:28:53,410 calls itself again and again and again. 1970 01:28:53,410 --> 01:28:57,130 But why, intuitively, is it not problematic 1971 01:28:57,130 --> 01:29:01,840 that this code, this pseudocode, calls itself? 1972 01:29:01,840 --> 01:29:03,460 Why will the algorithm still stop? 1973 01:29:03,460 --> 01:29:03,970 Yeah. 1974 01:29:03,970 --> 01:29:07,525 STUDENT: It has an exit condition, as in if there is no doors left, [INAUDIBLE].. 1975 01:29:07,525 --> 01:29:08,400 DAVID MALAN: Exactly. 1976 01:29:08,400 --> 01:29:10,860 It has some exit condition, like if no doors left. 1977 01:29:10,860 --> 01:29:14,700 And more importantly, any time you search the left half, 1978 01:29:14,700 --> 01:29:17,120 you're searching a smaller version of the problem. 1979 01:29:17,120 --> 01:29:18,870 Any time you search the right half, you're 1980 01:29:18,870 --> 01:29:22,330 searching a smaller version of the problem, literally half the size. 1981 01:29:22,330 --> 01:29:24,270 So this is why, in the phone book, obviously 1982 01:29:24,270 --> 01:29:27,210 I couldn't tear the phone book in half infinitely many times, 1983 01:29:27,210 --> 01:29:29,560 because it was literally getting smaller each time. 1984 01:29:29,560 --> 01:29:33,580 So recursion is this ability to call yourself, if you will. 1985 01:29:33,580 --> 01:29:36,850 But what's important is that you do it on a smaller, smaller problem, 1986 01:29:36,850 --> 01:29:39,750 so that eventually, you have no more problems to solve 1987 01:29:39,750 --> 01:29:42,010 or no more data, no more doors at all. 1988 01:29:42,010 --> 01:29:46,210 So these two lines here would be the recursive elements here. 1989 01:29:46,210 --> 01:29:49,690 But if we go back to week 0, we could have used recursion in some other way. 1990 01:29:49,690 --> 01:29:53,040 So this was our pseudocode for the phone book back in week 0. 1991 01:29:53,040 --> 01:29:55,350 And recall that we described these yellow lines 1992 01:29:55,350 --> 01:29:59,050 as really representing a loop, some kind of cycle again and again. 1993 01:29:59,050 --> 01:30:01,080 But there was a missed opportunity here. 1994 01:30:01,080 --> 01:30:05,670 What if I had re-implemented this code to do this? 1995 01:30:05,670 --> 01:30:09,120 Instead of saying open to middle of left half of book 1996 01:30:09,120 --> 01:30:12,450 and then go back to line 3, like literally inducing a loop, 1997 01:30:12,450 --> 01:30:14,610 or open to middle of right half a book and go back 1998 01:30:14,610 --> 01:30:17,400 to line 3 inducing another loop, why don't I 1999 01:30:17,400 --> 01:30:19,590 just recognize that what I'm staring at now 2000 01:30:19,590 --> 01:30:23,730 is a algorithm for searching a phone book? 2001 01:30:23,730 --> 01:30:27,480 And if you want to search a smaller phone book, like A through M or N 2002 01:30:27,480 --> 01:30:30,750 through Z, we'll just use this same algorithm. 2003 01:30:30,750 --> 01:30:35,100 So I can replace these yellow lines with just this, casually speaking. 2004 01:30:35,100 --> 01:30:37,282 Search left half of book, search right half of book. 2005 01:30:37,282 --> 01:30:39,240 This would be implicitly, and now I can shorten 2006 01:30:39,240 --> 01:30:42,360 the whole thing, a recursive implementation of the phone book 2007 01:30:42,360 --> 01:30:43,633 pseudocode from week 0. 2008 01:30:43,633 --> 01:30:46,050 And it's recursive, because if this is a search algorithm, 2009 01:30:46,050 --> 01:30:48,900 and you're saying go search something else, that's fine. 2010 01:30:48,900 --> 01:30:49,890 That's recursive. 2011 01:30:49,890 --> 01:30:52,440 But because you're searching half of the phone book, 2012 01:30:52,440 --> 01:30:55,710 it's indeed going to get smaller and smaller. 2013 01:30:55,710 --> 01:30:59,400 Even in the real world or the real real virtual world, 2014 01:30:59,400 --> 01:31:01,903 you can see recursive data structures in the wild, 2015 01:31:01,903 --> 01:31:03,820 or at least in Super Mario Brothers like this. 2016 01:31:03,820 --> 01:31:05,612 Let me get rid of all the distractions here 2017 01:31:05,612 --> 01:31:08,790 and focus on this pyramid, where you have one block, 2018 01:31:08,790 --> 01:31:10,860 then two, then three, then four. 2019 01:31:10,860 --> 01:31:14,710 Well, this itself, is technically recursively-defined in the sense that, 2020 01:31:14,710 --> 01:31:16,590 well, what is a pyramid of height for? 2021 01:31:16,590 --> 01:31:18,420 Well, it's really, what? 2022 01:31:18,420 --> 01:31:21,090 How would you describe a pyramid of height 4 2023 01:31:21,090 --> 01:31:25,743 is actually the same thing as a pyramid of-- 2024 01:31:25,743 --> 01:31:28,200 STUDENT: Height 3. 2025 01:31:28,200 --> 01:31:30,750 DAVID MALAN: --of height 3, plus 1 additional layer. 2026 01:31:30,750 --> 01:31:32,370 Well, what's a pyramid of height 3? 2027 01:31:32,370 --> 01:31:36,250 Well, it's technically a pyramid of height 2 plus 1 additional layer. 2028 01:31:36,250 --> 01:31:38,670 And so even physical structures can be recursive 2029 01:31:38,670 --> 01:31:40,630 if you can define them in terms of itself. 2030 01:31:40,630 --> 01:31:44,280 Now, at some point, you have to say that if the pyramid is of height 1, 2031 01:31:44,280 --> 01:31:46,090 there's just one block. 2032 01:31:46,090 --> 01:31:49,020 You can't forever say it's defined in terms of a height negative 1, 2033 01:31:49,020 --> 01:31:50,440 negative 2, you would never stop. 2034 01:31:50,440 --> 01:31:52,752 So you have to kind of have a special case there. 2035 01:31:52,752 --> 01:31:55,710 But let's go ahead and translate something like this, in fact, to code. 2036 01:31:55,710 --> 01:32:00,390 Let me go back to VS code here, and let me implement a program called iteration 2037 01:32:00,390 --> 01:32:03,090 that refers to a loop iterating. 2038 01:32:03,090 --> 01:32:05,620 And let me implement a very simple pyramid like that. 2039 01:32:05,620 --> 01:32:08,370 So let me go ahead and include the CS50 library. 2040 01:32:08,370 --> 01:32:14,918 I'll include our standard io.h int main void, no command line arguments today. 2041 01:32:14,918 --> 01:32:16,210 And let's go ahead and do this. 2042 01:32:16,210 --> 01:32:18,900 Let's declare a variable called height, ask the human 2043 01:32:18,900 --> 01:32:21,150 for the height of this pyramid. 2044 01:32:21,150 --> 01:32:25,300 And then let's go ahead and draw a pyramid of that height. 2045 01:32:25,300 --> 01:32:27,580 Now, of course, draw does not yet exist. 2046 01:32:27,580 --> 01:32:30,090 So I'm going to need to invent the draw function. 2047 01:32:30,090 --> 01:32:33,180 Let me go ahead and define a function that doesn't have a return value. 2048 01:32:33,180 --> 01:32:34,722 It's just going to have side effects. 2049 01:32:34,722 --> 01:32:37,230 It's just going to print bricks on the screen, called draw. 2050 01:32:37,230 --> 01:32:40,240 And it takes in an integer, n, as its input. 2051 01:32:40,240 --> 01:32:41,950 And how am I going to implement this? 2052 01:32:41,950 --> 01:32:46,530 Well again, I want to print one block, then two, then three, then four. 2053 01:32:46,530 --> 01:32:49,680 That's pretty straightforward, at least once you're comfortable with loops. 2054 01:32:49,680 --> 01:32:51,370 Let me go back to the code here. 2055 01:32:51,370 --> 01:32:55,170 Let me go ahead and say 4, int i, get 0. 2056 01:32:55,170 --> 01:32:56,880 i is less than n. 2057 01:32:56,880 --> 01:32:58,260 i plus plus. 2058 01:32:58,260 --> 01:33:01,170 And that's going to iterate, essentially row by row. 2059 01:33:01,170 --> 01:33:05,370 And on each row, I want to print out one, then two, then three, then 2060 01:33:05,370 --> 01:33:06,060 four bricks. 2061 01:33:06,060 --> 01:33:08,815 But I'm iterating from 0 to 1 to 2 to 3. 2062 01:33:08,815 --> 01:33:09,690 So I think that's OK. 2063 01:33:09,690 --> 01:33:13,020 I can just say something like 4 int j get 0. 2064 01:33:13,020 --> 01:33:17,160 j, let's be clever about this, is less than i. 2065 01:33:17,160 --> 01:33:19,380 j++. 2066 01:33:19,380 --> 01:33:22,560 And now, let me go ahead and, inside of this loop, 2067 01:33:22,560 --> 01:33:27,130 I think I can get away with just printing out a single hash sign. 2068 01:33:27,130 --> 01:33:30,270 But then outside of that loop, similar to last week, 2069 01:33:30,270 --> 01:33:32,920 I'm going to print my new line separately. 2070 01:33:32,920 --> 01:33:34,470 So a little non-obvious at first. 2071 01:33:34,470 --> 01:33:38,790 But this outer loop iterates row by row, line by line, if you will. 2072 01:33:38,790 --> 01:33:46,890 And then the inner loop just makes sure that when i equals zero, let's see. 2073 01:33:46,890 --> 01:33:48,960 Oh nope, there's a bug. 2074 01:33:48,960 --> 01:33:52,170 I need to make sure that it's j is less than i plus 1. 2075 01:33:52,170 --> 01:33:55,500 So when i is 0 on my first line of output, 2076 01:33:55,500 --> 01:33:57,600 I'm going to print out one brick. 2077 01:33:57,600 --> 01:34:02,350 When i is 1, I'm going to print out two bricks and so forth. 2078 01:34:02,350 --> 01:34:05,460 So let me go ahead and run make iteration. 2079 01:34:05,460 --> 01:34:09,090 All right, and now, seems to compile. 2080 01:34:09,090 --> 01:34:10,770 Uh oh, huh. 2081 01:34:10,770 --> 01:34:12,900 Implicit declaration of function draw. 2082 01:34:12,900 --> 01:34:16,100 So I'm making week one mistakes again. 2083 01:34:16,100 --> 01:34:16,660 What? 2084 01:34:16,660 --> 01:34:17,570 Say again. 2085 01:34:17,570 --> 01:34:18,450 STUDENT: [INAUDIBLE] 2086 01:34:18,450 --> 01:34:19,200 DAVID MALAN: Yeah. 2087 01:34:19,200 --> 01:34:20,320 The prototype is missing. 2088 01:34:20,320 --> 01:34:21,300 I didn't declare it at the top. 2089 01:34:21,300 --> 01:34:23,550 That's an easy fix, and the only time, really, it's 2090 01:34:23,550 --> 01:34:25,530 OK and necessary to copy paste. 2091 01:34:25,530 --> 01:34:29,050 Let me copy the functions declaration there and it with a semicolon. 2092 01:34:29,050 --> 01:34:32,370 So that clang now knows that draw will exist. 2093 01:34:32,370 --> 01:34:33,240 Make iteration. 2094 01:34:33,240 --> 01:34:33,930 Now it works. 2095 01:34:33,930 --> 01:34:36,090 Thank you. dot slash iteration. 2096 01:34:36,090 --> 01:34:37,830 We'll type in something like 4. 2097 01:34:37,830 --> 01:34:40,860 And there we have it, our pyramid of height one, two, three, four, 2098 01:34:40,860 --> 01:34:43,340 that looks pretty similar to this, albeit using hashes. 2099 01:34:43,340 --> 01:34:46,590 So that's how we would have implemented this, like, two weeks ago in week one, 2100 01:34:46,590 --> 01:34:49,110 maybe last week, but just using arrays. 2101 01:34:49,110 --> 01:34:53,640 But let me propose that we could do something recursively instead. 2102 01:34:53,640 --> 01:34:55,480 Let me close this version of the code. 2103 01:34:55,480 --> 01:34:59,730 And let me go back to VS Code and open up recursion.c, 2104 01:34:59,730 --> 01:35:01,800 just to demonstrate something recursively. 2105 01:35:01,800 --> 01:35:04,420 And I'll do it incorrectly deliberately the first time. 2106 01:35:04,420 --> 01:35:06,630 So let me include cs50.h. 2107 01:35:06,630 --> 01:35:08,850 Let me include standard io.h. 2108 01:35:08,850 --> 01:35:12,000 Let me do int main void. 2109 01:35:12,000 --> 01:35:17,910 And let me just blindly draw a pyramid initially of height 1. 2110 01:35:17,910 --> 01:35:21,910 But now in my draw function, let me re-implement it a little differently. 2111 01:35:21,910 --> 01:35:24,840 So my draw function this time is still going to take a number n. 2112 01:35:24,840 --> 01:35:26,860 But that's how many hashes it's going to print. 2113 01:35:26,860 --> 01:35:30,030 So let's do 4, int i get 0. 2114 01:35:30,030 --> 01:35:32,220 i is less than n. 2115 01:35:32,220 --> 01:35:34,050 i++. 2116 01:35:34,050 --> 01:35:38,440 Then let's go ahead and print out a single hash mark here. 2117 01:35:38,440 --> 01:35:44,290 And then after that, let's print out the end of the line, just as before. 2118 01:35:44,290 --> 01:35:49,770 But now this, of course, is only going to draw a single row. 2119 01:35:49,770 --> 01:35:53,790 It's going to print out one hash or two hashes or three hashes, but only 2120 01:35:53,790 --> 01:35:54,750 on one line. 2121 01:35:54,750 --> 01:35:58,560 Let me now, incorrectly, but just kind of curiously say, all right. 2122 01:35:58,560 --> 01:36:01,230 Well, if this draws a pyramid of height 1, 2123 01:36:01,230 --> 01:36:04,860 let's just use ourselves to draw a pyramid of height n plus 1. 2124 01:36:04,860 --> 01:36:08,370 So the first time I call draw, it will print out one hash. 2125 01:36:08,370 --> 01:36:12,870 Then the second time I call draw, it will print out two hashes, then three, 2126 01:36:12,870 --> 01:36:13,770 then four. 2127 01:36:13,770 --> 01:36:18,000 So we're kind of laying these bricks down from top to bottom. 2128 01:36:18,000 --> 01:36:20,670 Make recursion. 2129 01:36:20,670 --> 01:36:22,420 Whoops, I screwed up again. 2130 01:36:22,420 --> 01:36:24,630 So let's copy the prototype here. 2131 01:36:24,630 --> 01:36:27,260 Let's put this down over here, semicolon. 2132 01:36:27,260 --> 01:36:28,600 Let's do this again. 2133 01:36:28,600 --> 01:36:30,010 Make recursion. 2134 01:36:30,010 --> 01:36:32,410 All right, all good, dot slash recursion. 2135 01:36:32,410 --> 01:36:34,930 And now let me increase the size of my terminal window, 2136 01:36:34,930 --> 01:36:37,310 just so you can see more of the output. 2137 01:36:37,310 --> 01:36:39,490 And here we have. 2138 01:36:39,490 --> 01:36:41,480 OK, bad, but thank you. 2139 01:36:41,480 --> 01:36:43,525 So we have an infinitely tall pyramid. 2140 01:36:43,525 --> 01:36:45,400 And it's just flying across the screen, which 2141 01:36:45,400 --> 01:36:47,020 is why it looks kind of like a mess. 2142 01:36:47,020 --> 01:36:51,670 But I printed out a pyramid of height 1, and then 2, and then 3, and then 4. 2143 01:36:51,670 --> 01:36:55,210 And unfortunately, what am I lacking any sort of quick condition, 2144 01:36:55,210 --> 01:36:58,270 any kind of condition that says, wait a minute, when it's too tall, 2145 01:36:58,270 --> 01:36:59,353 stop altogether. 2146 01:36:59,353 --> 01:37:00,520 So this is an infinite loop. 2147 01:37:00,520 --> 01:37:01,570 But it's not a loop. 2148 01:37:01,570 --> 01:37:03,250 It's a recursive call. 2149 01:37:03,250 --> 01:37:05,780 And actually, doing this in general, is very bad. 2150 01:37:05,780 --> 01:37:08,410 We'll see next week that if you call a function too many times, 2151 01:37:08,410 --> 01:37:11,967 you can actually trigger yet another of those segmentation faults, 2152 01:37:11,967 --> 01:37:14,050 because you're using too much memory, essentially. 2153 01:37:14,050 --> 01:37:16,300 But for now, I haven't triggered that yet. 2154 01:37:16,300 --> 01:37:17,927 Control C is your friend to cancel. 2155 01:37:17,927 --> 01:37:21,010 And as an aside, if you're playing along at home or playing with this code 2156 01:37:21,010 --> 01:37:22,750 later, I actually cheated here. 2157 01:37:22,750 --> 01:37:25,360 We have a special clang configuration feature 2158 01:37:25,360 --> 01:37:28,392 that prevents you from calling a function like that 2159 01:37:28,392 --> 01:37:29,350 and creating a problem. 2160 01:37:29,350 --> 01:37:31,598 I overrode it just for demonstration sake. 2161 01:37:31,598 --> 01:37:34,640 But odds are at home, you wouldn't be able to compile this code yourself. 2162 01:37:34,640 --> 01:37:39,050 But let me do a proper version recursively of this code as follows. 2163 01:37:39,050 --> 01:37:41,870 Let me go back into the code here. 2164 01:37:41,870 --> 01:37:45,130 Let me go ahead and, not just blindly start drawing one, then two, 2165 01:37:45,130 --> 01:37:46,540 then three layers of bricks. 2166 01:37:46,540 --> 01:37:50,380 Let me prompt the human as before for the height of the pyramid 2167 01:37:50,380 --> 01:37:53,350 they want using our get int function. 2168 01:37:53,350 --> 01:37:55,670 And now let me call draw of height again. 2169 01:37:55,670 --> 01:37:58,330 So now I'm going back to the loop-like version. 2170 01:37:58,330 --> 01:38:03,250 But instead of using a loop now, this is where recursion gets rather elegant, 2171 01:38:03,250 --> 01:38:04,120 if you will. 2172 01:38:04,120 --> 01:38:10,690 Let me go ahead and execute and code the draw function as follows. 2173 01:38:10,690 --> 01:38:14,050 Per your definition, if a pyramid of height 4 2174 01:38:14,050 --> 01:38:17,437 is really just a pyramid of height 3 plus another row, well, 2175 01:38:17,437 --> 01:38:18,520 let's take that literally. 2176 01:38:18,520 --> 01:38:19,990 Let me go back to my code. 2177 01:38:19,990 --> 01:38:24,340 And if you want to draw a pyramid of height 4, well go right ahead 2178 01:38:24,340 --> 01:38:29,380 and draw a pyramid of height 3 first, or more generally, n minus 1. 2179 01:38:29,380 --> 01:38:30,640 But what's the second step? 2180 01:38:30,640 --> 01:38:34,510 Well, once you've drawn a pyramid of height 3, draw an extra row. 2181 01:38:34,510 --> 01:38:37,190 So I at least have to bite off that part of the problem myself. 2182 01:38:37,190 --> 01:38:39,310 So let me just do for int i get 0. 2183 01:38:39,310 --> 01:38:41,530 i is less than n i++. 2184 01:38:41,530 --> 01:38:46,010 And let me, the programmer of this function, print out my hashes. 2185 01:38:46,010 --> 01:38:48,340 And then at the very bottom, print out a new line 2186 01:38:48,340 --> 01:38:50,350 so the cursor moves to the next line. 2187 01:38:50,350 --> 01:38:55,660 But this is kind of elegant now, I dare say, in that draw is recursive, 2188 01:38:55,660 --> 01:38:58,570 because I'm literally translating from English to C code, 2189 01:38:58,570 --> 01:39:02,050 this idea that a pyramid of height 4 is really just a pyramid of height 3. 2190 01:39:02,050 --> 01:39:03,640 So I do that first. 2191 01:39:03,640 --> 01:39:06,560 And I'm sort of trusting that this will work. 2192 01:39:06,560 --> 01:39:09,800 Then I just have to lay one more layer of bricks, four of them. 2193 01:39:09,800 --> 01:39:13,030 So if n is 4, this is just a simple for loop, a la week 1, 2194 01:39:13,030 --> 01:39:15,520 that will print out an additional layer. 2195 01:39:15,520 --> 01:39:18,610 But this, of course, is going to be problematic eventually. 2196 01:39:18,610 --> 01:39:20,030 Why? 2197 01:39:20,030 --> 01:39:22,670 It's not done yet, this program. 2198 01:39:22,670 --> 01:39:27,644 How many times will draw call itself in this model? 2199 01:39:27,644 --> 01:39:28,640 STUDENT: It's infinite. 2200 01:39:28,640 --> 01:39:30,098 DAVID MALAN: Infinitely many times. 2201 01:39:30,098 --> 01:39:30,814 Why? 2202 01:39:30,814 --> 01:39:34,170 STUDENT: Because there's no quit function. 2203 01:39:34,170 --> 01:39:36,450 DAVID MALAN: Yeah, there's no equivalent of quit. 2204 01:39:36,450 --> 01:39:39,810 Like, if you've printed enough already, then quit, well, 2205 01:39:39,810 --> 01:39:41,050 how do we capture that? 2206 01:39:41,050 --> 01:39:43,320 Well, I don't think we want this to go negative. 2207 01:39:43,320 --> 01:39:46,570 It would make no sense to draw a negative height pyramid. 2208 01:39:46,570 --> 01:39:51,270 So I think we can just pluck off, as the programmer, an easy case, 2209 01:39:51,270 --> 01:39:53,650 an easy answer, a so-called base case. 2210 01:39:53,650 --> 01:39:54,900 And I'm just going to do this. 2211 01:39:54,900 --> 01:40:00,030 At the top of my draw function, let me just say, if n is less than 2212 01:40:00,030 --> 01:40:02,830 or, heck, less than or equal to 0, that's it. 2213 01:40:02,830 --> 01:40:04,530 Go ahead and just return. 2214 01:40:04,530 --> 01:40:06,030 There's nothing more to do. 2215 01:40:06,030 --> 01:40:10,680 And that simple condition, technically known as a base case, 2216 01:40:10,680 --> 01:40:13,290 will ensure that the code doesn't run forever. 2217 01:40:13,290 --> 01:40:13,860 Why? 2218 01:40:13,860 --> 01:40:17,730 Well, suppose that draw is called with an argument of 4. 2219 01:40:17,730 --> 01:40:20,580 4 is, of course, not less than 0, so we don't return. 2220 01:40:20,580 --> 01:40:22,590 But we do draw a pyramid of height 3. 2221 01:40:22,590 --> 01:40:24,870 And here's where things get a little mentally tricky. 2222 01:40:24,870 --> 01:40:28,320 You don't move on to line 20 until draw has been called. 2223 01:40:28,320 --> 01:40:31,080 So when draw is called with an argument of 3, 2224 01:40:31,080 --> 01:40:34,230 it's as though you're executing from the top of this function again. 2225 01:40:34,230 --> 01:40:35,520 3 is not less than 0. 2226 01:40:35,520 --> 01:40:36,330 So what do you do? 2227 01:40:36,330 --> 01:40:38,490 You draw 2. 2228 01:40:38,490 --> 01:40:39,540 How do you draw 2? 2229 01:40:39,540 --> 01:40:41,950 Well, 2 is not less than 0, so you don't return. 2230 01:40:41,950 --> 01:40:43,050 So you draw 1. 2231 01:40:43,050 --> 01:40:44,370 Got to be careful here. 2232 01:40:44,370 --> 01:40:45,240 Draw 1. 2233 01:40:45,240 --> 01:40:47,340 And now, we go ahead back to the beginning. 2234 01:40:47,340 --> 01:40:48,090 How do you draw 1? 2235 01:40:48,090 --> 01:40:50,430 Well, 1 is not less than 0, so you don't return. 2236 01:40:50,430 --> 01:40:53,400 You draw height 0. 2237 01:40:53,400 --> 01:40:54,510 How do you draw height 0? 2238 01:40:54,510 --> 01:40:55,110 Wait a minute. 2239 01:40:55,110 --> 01:40:57,660 0 is less than or equal to 0. 2240 01:40:57,660 --> 01:40:58,980 And you return. 2241 01:40:58,980 --> 01:41:02,100 And so it's kind of like this mental stack, this to do list. 2242 01:41:02,100 --> 01:41:05,190 You keep postponing, executing these lower lines of code, 2243 01:41:05,190 --> 01:41:08,700 because you keep restarting, restarting, restarting the draw function 2244 01:41:08,700 --> 01:41:12,840 until, finally, one of those function calls says there's nothing to do, 2245 01:41:12,840 --> 01:41:13,530 return. 2246 01:41:13,530 --> 01:41:16,530 And now the whole thing starts to unravel, if you will. 2247 01:41:16,530 --> 01:41:18,330 And you pick back up where you left off. 2248 01:41:18,330 --> 01:41:20,300 And this is, perhaps, the best scenario. 2249 01:41:20,300 --> 01:41:21,300 We won't do it in class. 2250 01:41:21,300 --> 01:41:23,508 But if you'd like to wrestle through this on your own 2251 01:41:23,508 --> 01:41:27,780 using debug50 to keep stepping into, step into, step into, each 2252 01:41:27,780 --> 01:41:31,480 of those lines, logically, you'll see exactly what's actually happening. 2253 01:41:31,480 --> 01:41:34,330 So let me go to my terminal and do make recursion, 2254 01:41:34,330 --> 01:41:37,740 which is now this correct version of the code, dot slash recursion. 2255 01:41:37,740 --> 01:41:39,240 Let's type in a height of 4. 2256 01:41:39,240 --> 01:41:44,730 And voila, now we have that same pyramid, not using iteration per se, 2257 01:41:44,730 --> 01:41:47,910 though admittedly, we're using iteration to print the additional layer. 2258 01:41:47,910 --> 01:41:53,340 We're now using draw recursively to print all of the smaller pyramids 2259 01:41:53,340 --> 01:41:55,120 that need come before it. 2260 01:41:55,120 --> 01:41:57,370 STUDENT: Can you only use recursion for void function? 2261 01:41:57,370 --> 01:41:58,123 [INAUDIBLE] 2262 01:41:58,123 --> 01:41:58,790 DAVID MALAN: No. 2263 01:41:58,790 --> 01:42:01,070 Question is, can you only use recursion with a void function? 2264 01:42:01,070 --> 01:42:01,920 No, not at all. 2265 01:42:01,920 --> 01:42:05,120 In fact, it's very common to have a return value like an integer 2266 01:42:05,120 --> 01:42:09,350 or something else so that you can actually do something constructively 2267 01:42:09,350 --> 01:42:11,360 with that actual value. 2268 01:42:11,360 --> 01:42:13,190 Other questions on this. 2269 01:42:13,190 --> 01:42:15,290 STUDENT: When is line 21 getting executed? 2270 01:42:15,290 --> 01:42:16,790 DAVID MALAN: Say it a little louder. 2271 01:42:16,790 --> 01:42:18,770 STUDENT: When is line 21 getting executed? 2272 01:42:18,770 --> 01:42:20,850 DAVID MALAN: When is line 21 getting executed? 2273 01:42:20,850 --> 01:42:23,720 So if you continue to-- 2274 01:42:23,720 --> 01:42:26,600 let me scroll down a bit more so you can see the top of the code. 2275 01:42:26,600 --> 01:42:35,310 So line 21 will be executed once line 19 is done executing itself. 2276 01:42:35,310 --> 01:42:40,790 Now, in the story I told, we kept calling draw again, again, again. 2277 01:42:40,790 --> 01:42:43,400 But as soon as one of those function calls 2278 01:42:43,400 --> 01:42:46,490 where n equals 0 returns immediately, then we 2279 01:42:46,490 --> 01:42:48,510 don't keep drawing again and again. 2280 01:42:48,510 --> 01:42:51,110 So now if you kind of think of the process as reversing, 2281 01:42:51,110 --> 01:42:57,530 then you continue to line 21, then line 21 again, then line 21 again, 2282 01:42:57,530 --> 01:42:59,303 and as the sort of logic unravels. 2283 01:42:59,303 --> 01:43:01,970 And next week, we'll actually paint a picture of what's actually 2284 01:43:01,970 --> 01:43:03,530 happening in the computer's memory. 2285 01:43:03,530 --> 01:43:07,450 But for now, it's just, it's very similar to the pseudocode for the phone 2286 01:43:07,450 --> 01:43:07,950 book. 2287 01:43:07,950 --> 01:43:09,680 You're just searching again and again. 2288 01:43:09,680 --> 01:43:14,408 But you're waiting until the very end to get back the final result. 2289 01:43:14,408 --> 01:43:16,700 Google now, who I keep mentioning by coincidence today, 2290 01:43:16,700 --> 01:43:18,830 is full of programmers of course. 2291 01:43:18,830 --> 01:43:20,600 Here's a fun exercise. 2292 01:43:20,600 --> 01:43:23,432 Let me go back to a browser. 2293 01:43:23,432 --> 01:43:26,390 I'm going to go ahead and search for recursion, because I want to learn 2294 01:43:26,390 --> 01:43:27,980 a little something about recursion. 2295 01:43:27,980 --> 01:43:30,230 Here is kind of an internet meme or joke. 2296 01:43:30,230 --> 01:43:35,360 If I zoom in here, the engineers at Google are kind of funny. 2297 01:43:35,360 --> 01:43:37,902 See why? 2298 01:43:37,902 --> 01:43:38,798 STUDENT: Ah. 2299 01:43:38,798 --> 01:43:40,540 DAVID MALAN: Ah, there you go. 2300 01:43:40,540 --> 01:43:41,740 Yes. 2301 01:43:41,740 --> 01:43:43,030 Yes, this is recursion. 2302 01:43:43,030 --> 01:43:45,822 And there's going to be so many memes you'll come across now, where 2303 01:43:45,822 --> 01:43:48,490 recursion, like if you've ever pointed a camera at the TV that's 2304 01:43:48,490 --> 01:43:51,190 showing the camera, and you sort of see yourself or the image again and again, 2305 01:43:51,190 --> 01:43:52,660 that's really recursion. 2306 01:43:52,660 --> 01:43:56,320 And in that case, it only stops once you hit the base case of a single pixel. 2307 01:43:56,320 --> 01:43:59,350 But this is a very funny joke in some circles 2308 01:43:59,350 --> 01:44:01,880 when it comes to recursion and Google. 2309 01:44:01,880 --> 01:44:04,570 So how can we actually use Google, or rather, 2310 01:44:04,570 --> 01:44:08,050 how can we actually use recursion constructively? 2311 01:44:08,050 --> 01:44:11,320 Well, let me propose that we actually introduced 2312 01:44:11,320 --> 01:44:14,950 a third and final algorithm for sorting that hopefully does better 2313 01:44:14,950 --> 01:44:16,870 than the two sorts thus far. 2314 01:44:16,870 --> 01:44:19,480 We've done selection sort and bubble sort. 2315 01:44:19,480 --> 01:44:22,845 Bubble sort, we liked a little better, at least insofar as in the best case 2316 01:44:22,845 --> 01:44:24,220 where the list is already sorted. 2317 01:44:24,220 --> 01:44:26,387 Bubble sort's at least smarter, and it will actually 2318 01:44:26,387 --> 01:44:28,840 terminate early, giving us a better lower bound, 2319 01:44:28,840 --> 01:44:30,490 in terms of our omega notation. 2320 01:44:30,490 --> 01:44:33,100 But it turns out that recursion, and this is not 2321 01:44:33,100 --> 01:44:36,250 necessarily a feature of recursion, but something we can now leverage. 2322 01:44:36,250 --> 01:44:39,460 It turns out, using recursion, we can take a fundamentally different approach 2323 01:44:39,460 --> 01:44:43,090 to sorting a whole bunch of numbers in such a way 2324 01:44:43,090 --> 01:44:47,560 that we can do far fewer comparisons and, ideally, speed up 2325 01:44:47,560 --> 01:44:49,010 our final results. 2326 01:44:49,010 --> 01:44:51,970 So here is the pseudocode for what we're about to see 2327 01:44:51,970 --> 01:44:54,010 for something called merge sort. 2328 01:44:54,010 --> 01:44:56,230 And it really is this terse. 2329 01:44:56,230 --> 01:44:58,330 Sort the left half of numbers. 2330 01:44:58,330 --> 01:45:00,550 Sort the right half of numbers. 2331 01:45:00,550 --> 01:45:02,950 Merge the sorted halves. 2332 01:45:02,950 --> 01:45:06,310 This is almost sort of nonsensical, because if you're 2333 01:45:06,310 --> 01:45:09,880 asked for an algorithm to sort, and you respond with, well, sort the left half, 2334 01:45:09,880 --> 01:45:10,960 sort the right half. 2335 01:45:10,960 --> 01:45:14,230 That's being difficult, because well, I'm asking for a sorting algorithm. 2336 01:45:14,230 --> 01:45:16,897 You're just telling me to sort the left half and the right half. 2337 01:45:16,897 --> 01:45:21,130 But implicit in that last line, merging is a pretty powerful feature 2338 01:45:21,130 --> 01:45:21,760 of this sort. 2339 01:45:21,760 --> 01:45:23,908 Now, we do need another base case at the top. 2340 01:45:23,908 --> 01:45:24,700 So let me add this. 2341 01:45:24,700 --> 01:45:27,850 If we find ourselves with a list, an array, of size 1, 2342 01:45:27,850 --> 01:45:29,807 well, that array is obviously sorted. 2343 01:45:29,807 --> 01:45:32,390 If there's only one element in it, there's no work to be done. 2344 01:45:32,390 --> 01:45:33,890 So that's going to be our base case. 2345 01:45:33,890 --> 01:45:38,770 But allowing us now, in just these, what, four, six lines of pseudocode, 2346 01:45:38,770 --> 01:45:40,900 to actually sort some elements. 2347 01:45:40,900 --> 01:45:43,652 But let's focus first on just a subset of this. 2348 01:45:43,652 --> 01:45:46,360 Let's consider for a moment what it means to merge sorted halves. 2349 01:45:46,360 --> 01:45:48,760 So Carter has wonderfully come up to volunteer here just 2350 01:45:48,760 --> 01:45:50,170 to help us reset these numbers. 2351 01:45:50,170 --> 01:45:54,080 Suppose that in the middle of the story we're about to tell, 2352 01:45:54,080 --> 01:45:56,020 we have two sorted halves. 2353 01:45:56,020 --> 01:45:58,300 I've already sorted the left half of these numbers, 2354 01:45:58,300 --> 01:46:01,630 and indeed, 2, 4, 5, 7 is sorted from smallest to largest. 2355 01:46:01,630 --> 01:46:06,100 And the right half appears to be already sorted, 0, 1, 3, 6, already sorted. 2356 01:46:06,100 --> 01:46:09,370 So in my pseudocode, we're already done sorting the left half 2357 01:46:09,370 --> 01:46:10,630 and the right half somehow. 2358 01:46:10,630 --> 01:46:12,160 But we'll see how in a moment. 2359 01:46:12,160 --> 01:46:14,980 Well, how do I go about merging these two halves? 2360 01:46:14,980 --> 01:46:18,490 Well, because they're sorted already, and you want to merge them in order, 2361 01:46:18,490 --> 01:46:20,110 I think we can flip down. 2362 01:46:20,110 --> 01:46:25,030 We can hide all but the first numbers in each of these sublists. 2363 01:46:25,030 --> 01:46:28,125 So here, we have a half that starts with 2. 2364 01:46:28,125 --> 01:46:30,250 And I don't really care what the other numbers are, 2365 01:46:30,250 --> 01:46:32,060 because they're clearly larger than 2. 2366 01:46:32,060 --> 01:46:35,043 I can focus only on 2, and 0 too, 0 also. 2367 01:46:35,043 --> 01:46:37,960 We know that 0 is the smallest there, so let's just ignore the numbers 2368 01:46:37,960 --> 01:46:39,293 that Carter kindly flipped down. 2369 01:46:39,293 --> 01:46:44,360 So how do I merge these two lists into a new sorted larger list? 2370 01:46:44,360 --> 01:46:47,590 Well, I compare the two on my left with the 0 2371 01:46:47,590 --> 01:46:50,470 on my right, obviously, which comes first, the 0. 2372 01:46:50,470 --> 01:46:51,963 So let me put this down here. 2373 01:46:51,963 --> 01:46:54,130 And Carter, if you want to give us the next element. 2374 01:46:54,130 --> 01:46:55,960 Now I have two sorted halves. 2375 01:46:55,960 --> 01:46:57,650 But I've already plucked one off. 2376 01:46:57,650 --> 01:47:00,010 So now I compare the two against the 1. 2377 01:47:00,010 --> 01:47:01,580 1 obviously comes next. 2378 01:47:01,580 --> 01:47:04,843 So I'm going to take out the 1 and put it in place here. 2379 01:47:04,843 --> 01:47:06,760 Now I'm going to compare the two halves again. 2380 01:47:06,760 --> 01:47:08,830 2 and 3, which do I merge first? 2381 01:47:08,830 --> 01:47:10,660 Obviously the 2 comes next. 2382 01:47:10,660 --> 01:47:14,170 And now, notice, each time I do this, my hands are theoretically 2383 01:47:14,170 --> 01:47:15,220 making forward progress. 2384 01:47:15,220 --> 01:47:18,580 I'm not doubling back like I kept doing with selection sort or bubble 2385 01:47:18,580 --> 01:47:20,290 sort, back and forth, back and forth. 2386 01:47:20,290 --> 01:47:22,810 My fingers are constantly advancing forward, 2387 01:47:22,810 --> 01:47:24,310 and that's going to be a key detail. 2388 01:47:24,310 --> 01:47:27,340 So I compare 4 and 3, 3 obviously. 2389 01:47:27,340 --> 01:47:32,560 I compare 4 and 6, 4 obviously. 2390 01:47:32,560 --> 01:47:36,520 I compare 5 and 6, 5 obviously. 2391 01:47:36,520 --> 01:47:40,810 And then I compare 7 and 6, 6 of course. 2392 01:47:40,810 --> 01:47:43,000 And then lastly, we have just one element left. 2393 01:47:43,000 --> 01:47:45,460 And even though I'm kind of moving awkwardly as a human, 2394 01:47:45,460 --> 01:47:48,160 my hands technically were only moving to the right. 2395 01:47:48,160 --> 01:47:51,130 I was never looping back doing something again and again. 2396 01:47:51,130 --> 01:47:54,430 And that's, perhaps, the intuition, and just enough room for the 7. 2397 01:47:54,430 --> 01:47:58,210 So that, then, is how you would merge two sorted halves. 2398 01:47:58,210 --> 01:48:00,610 We started with left half sorted, right half sorted. 2399 01:48:00,610 --> 01:48:02,860 And merging is just like what you would do as a human. 2400 01:48:02,860 --> 01:48:04,568 And Carter just flipped the numbers down, 2401 01:48:04,568 --> 01:48:08,620 so our focus was only on the smallest elements in each. 2402 01:48:08,620 --> 01:48:13,030 Any questions before we forge ahead with what it means, then, 2403 01:48:13,030 --> 01:48:17,120 to be merged in this way? 2404 01:48:17,120 --> 01:48:18,777 So now, here is an original list. 2405 01:48:18,777 --> 01:48:20,860 We deliberately put it at the top, because there's 2406 01:48:20,860 --> 01:48:22,480 one detail of merge sort that's key. 2407 01:48:22,480 --> 01:48:25,490 Merge sort is technically going to use a little more space. 2408 01:48:25,490 --> 01:48:28,270 And so whereas, previously, we just kept moving our humans around 2409 01:48:28,270 --> 01:48:30,790 and swapping people and making sure they stayed ultimately 2410 01:48:30,790 --> 01:48:32,110 in the original positions. 2411 01:48:32,110 --> 01:48:36,700 With merge sort, pretends that here's our original array of memory. 2412 01:48:36,700 --> 01:48:38,970 I'm going to need at least one other ray of memory. 2413 01:48:38,970 --> 01:48:41,160 And I'm going to cheat, and I'm going to use even more memory. 2414 01:48:41,160 --> 01:48:43,440 But technically, I could actually go back and forth 2415 01:48:43,440 --> 01:48:45,540 between 1 array and a secondary array. 2416 01:48:45,540 --> 01:48:48,370 But it is going to take me more space. 2417 01:48:48,370 --> 01:48:53,130 So how do I go about implementing merge sort on this code? 2418 01:48:53,130 --> 01:48:54,930 Well, let's consider this. 2419 01:48:54,930 --> 01:48:57,060 Here is a array of size 8. 2420 01:48:57,060 --> 01:48:59,590 If only one number quit, obviously not applicable. 2421 01:48:59,590 --> 01:49:01,230 So let's focus on the juicy part there. 2422 01:49:01,230 --> 01:49:02,880 Sort the left half of the numbers. 2423 01:49:02,880 --> 01:49:05,130 All right, how do I sort the left half of the numbers? 2424 01:49:05,130 --> 01:49:09,240 I'm going to just nudge them over just to be clear, which is the left half. 2425 01:49:09,240 --> 01:49:11,850 Here is now a sublist of size 4. 2426 01:49:11,850 --> 01:49:14,860 How do I sort the left half? 2427 01:49:14,860 --> 01:49:17,380 Well, do I have an algorithm for sorting? 2428 01:49:17,380 --> 01:49:18,430 Yeah, what do I do? 2429 01:49:18,430 --> 01:49:19,492 Here's a list of size 4. 2430 01:49:19,492 --> 01:49:20,200 How do I sort it? 2431 01:49:20,200 --> 01:49:22,000 What's step one? 2432 01:49:22,000 --> 01:49:23,330 Sort the left half. 2433 01:49:23,330 --> 01:49:28,060 So I now sort of, conceptually in my mind, take this sublist of size 4. 2434 01:49:28,060 --> 01:49:32,740 And I sort it by first sorting the left half, focusing now on the 7 and 2. 2435 01:49:32,740 --> 01:49:34,330 All right, here's a list of size 2. 2436 01:49:34,330 --> 01:49:37,060 How do I sort a list of size 2? 2437 01:49:37,060 --> 01:49:38,740 STUDENT: [INAUDIBLE] 2438 01:49:38,740 --> 01:49:40,170 DAVID MALAN: Sorry? 2439 01:49:40,170 --> 01:49:42,360 I think we just keep following our instructions. 2440 01:49:42,360 --> 01:49:43,650 Sort the left half. 2441 01:49:43,650 --> 01:49:45,630 All right, here is a list of size 1. 2442 01:49:45,630 --> 01:49:48,417 How do I sort a list of size 1? 2443 01:49:48,417 --> 01:49:49,803 STUDENT: [INAUDIBLE] 2444 01:49:49,803 --> 01:49:50,720 DAVID MALAN: I'm done. 2445 01:49:50,720 --> 01:49:51,360 It's done. 2446 01:49:51,360 --> 01:49:52,740 So I leave this alone. 2447 01:49:52,740 --> 01:49:54,740 What was the next step in the story? 2448 01:49:54,740 --> 01:49:58,160 I've just sorted the left half of the left half of the left half. 2449 01:49:58,160 --> 01:49:59,580 What comes next? 2450 01:49:59,580 --> 01:50:03,260 I sort the right half of the left half of the left half, 2451 01:50:03,260 --> 01:50:06,260 and I'm done, because it's just a list of size 1. 2452 01:50:06,260 --> 01:50:09,280 What comes after this? 2453 01:50:09,280 --> 01:50:09,972 Merge. 2454 01:50:09,972 --> 01:50:11,680 So this is where it gets a little trippy, 2455 01:50:11,680 --> 01:50:14,770 because you have to remember where we're pausing the story to do things 2456 01:50:14,770 --> 01:50:16,187 recursively again and again. 2457 01:50:16,187 --> 01:50:19,270 But if I've just sorted the left half and I've just sorted the right half, 2458 01:50:19,270 --> 01:50:20,890 now I merge them together. 2459 01:50:20,890 --> 01:50:25,040 This is a super short list, so we don't need Carter's help here as before. 2460 01:50:25,040 --> 01:50:27,640 But I think the first number I take here is the 2. 2461 01:50:27,640 --> 01:50:31,660 And then the second number I take, because it's the only option, is the 7. 2462 01:50:31,660 --> 01:50:35,260 But what's nice now is that, notice, the left half of the left half 2463 01:50:35,260 --> 01:50:39,228 is indeed sorted, because I trivially sorted the left half of it 2464 01:50:39,228 --> 01:50:40,270 and the right half of it. 2465 01:50:40,270 --> 01:50:42,760 But then merging is really where the magic happens. 2466 01:50:42,760 --> 01:50:45,670 All right, again, if you rewind now in your mind, 2467 01:50:45,670 --> 01:50:51,300 if I've just sorted the left half of the left half, what happens next? 2468 01:50:51,300 --> 01:50:55,000 Sort the right half of the left half. 2469 01:50:55,000 --> 01:50:56,980 So again, you kind of rewind in time. 2470 01:50:56,980 --> 01:50:58,290 So how do I do this? 2471 01:50:58,290 --> 01:50:59,520 I've got a list of size 2. 2472 01:50:59,520 --> 01:51:01,920 I sort the left half, just the 5, done. 2473 01:51:01,920 --> 01:51:04,200 Sort the right half, 4, done. 2474 01:51:04,200 --> 01:51:08,910 Now the interesting part, I merge the left half and the right half 2475 01:51:08,910 --> 01:51:11,380 of the right half of the left half. 2476 01:51:11,380 --> 01:51:12,450 So what do I do? 2477 01:51:12,450 --> 01:51:14,280 4 comes down here. 2478 01:51:14,280 --> 01:51:16,260 5 comes down here. 2479 01:51:16,260 --> 01:51:19,860 And now, notice what I have. 2480 01:51:19,860 --> 01:51:21,600 Left half is sorted. 2481 01:51:21,600 --> 01:51:23,130 Right half is sorted. 2482 01:51:23,130 --> 01:51:26,610 If you rewind in time, where is my next step, 3? 2483 01:51:26,610 --> 01:51:27,742 Merge the two halves. 2484 01:51:27,742 --> 01:51:29,700 And so this is what Carter helped me do before. 2485 01:51:29,700 --> 01:51:31,290 Let's focus only on the smallest elements, 2486 01:51:31,290 --> 01:51:32,665 just so there's less distraction. 2487 01:51:32,665 --> 01:51:34,020 I compare the 2 and the 4. 2488 01:51:34,020 --> 01:51:36,520 2 comes first, so let's obviously put that here. 2489 01:51:36,520 --> 01:51:39,570 Now, I compare the new beginning of this list 2490 01:51:39,570 --> 01:51:41,280 and the old beginning of this list. 2491 01:51:41,280 --> 01:51:43,050 4 obviously comes next. 2492 01:51:43,050 --> 01:51:45,940 And now, I compare the 7 against the 5. 2493 01:51:45,940 --> 01:51:47,430 5 obviously comes next. 2494 01:51:47,430 --> 01:51:49,240 And now, lastly, I'm left with one number. 2495 01:51:49,240 --> 01:51:50,970 So now I'm down to the 7. 2496 01:51:50,970 --> 01:51:53,490 So even if you've kind of lost track of some of the nuances 2497 01:51:53,490 --> 01:51:55,510 here, if you just kind of take a step back, 2498 01:51:55,510 --> 01:51:58,260 we have the original right half here still untouched. 2499 01:51:58,260 --> 01:52:03,210 But the left half of the original input is now, indeed, sorted, 2500 01:52:03,210 --> 01:52:07,140 all by way of doing sorting left half, right half, left half, right half, 2501 01:52:07,140 --> 01:52:08,940 but with those merges in between. 2502 01:52:08,940 --> 01:52:11,965 All right, so if we've just sorted the left half, 2503 01:52:11,965 --> 01:52:13,590 we rewind all the way to the beginning. 2504 01:52:13,590 --> 01:52:15,590 What do I now do? 2505 01:52:15,590 --> 01:52:17,120 All right, so sort the right half. 2506 01:52:17,120 --> 01:52:18,410 So sort the right half. 2507 01:52:18,410 --> 01:52:20,180 How do I sort a list of size 4? 2508 01:52:20,180 --> 01:52:22,550 Well, I first sort the left half, the 1 and the 6. 2509 01:52:22,550 --> 01:52:24,560 How do I sort a list of size 2? 2510 01:52:24,560 --> 01:52:26,757 You sort the left half, just the number 1. 2511 01:52:26,757 --> 01:52:28,340 Obviously, there's no work to be done. 2512 01:52:28,340 --> 01:52:30,620 Done, sorting the left half. 2513 01:52:30,620 --> 01:52:33,080 6, done, sorting the right half. 2514 01:52:33,080 --> 01:52:34,280 Now, what do I do? 2515 01:52:34,280 --> 01:52:40,610 I merge the left half here with the right half here. 2516 01:52:40,610 --> 01:52:42,240 And that one's pretty straightforward. 2517 01:52:42,240 --> 01:52:43,050 Now, what do I do? 2518 01:52:43,050 --> 01:52:43,910 I've just merged. 2519 01:52:43,910 --> 01:52:45,048 So now I sort it. 2520 01:52:45,048 --> 01:52:47,090 I've just sorted the left half of the right half. 2521 01:52:47,090 --> 01:52:49,550 So now I sort the right half of the right half. 2522 01:52:49,550 --> 01:52:51,590 So I consider the 0, done. 2523 01:52:51,590 --> 01:52:53,270 I consider the 3, done. 2524 01:52:53,270 --> 01:52:55,040 I now merge these two together. 2525 01:52:55,040 --> 01:52:56,640 0, of course, comes first. 2526 01:52:56,640 --> 01:52:58,100 Then comes the 3. 2527 01:52:58,100 --> 01:53:00,680 And now I'm at the point of the story where 2528 01:53:00,680 --> 01:53:02,930 I've sorted the left half of the right half 2529 01:53:02,930 --> 01:53:04,910 and the right half of the right half. 2530 01:53:04,910 --> 01:53:07,535 So step 3 is merge. 2531 01:53:07,535 --> 01:53:09,410 And I'll do it again like we did with Carter. 2532 01:53:09,410 --> 01:53:12,320 All right, 1 and 0, obviously the 0 comes first. 2533 01:53:12,320 --> 01:53:14,390 Now, compare the 1 and the 3. 2534 01:53:14,390 --> 01:53:16,130 Obviously, the 1 comes first. 2535 01:53:16,130 --> 01:53:18,590 Compare the 6 and the 3, obviously the 3. 2536 01:53:18,590 --> 01:53:20,300 And then lastly, the 6. 2537 01:53:20,300 --> 01:53:21,890 So now, where are we? 2538 01:53:21,890 --> 01:53:26,840 We've taken the left half of the whole thing and sorted it. 2539 01:53:26,840 --> 01:53:29,990 We then took the right half of the whole thing and sorted it. 2540 01:53:29,990 --> 01:53:33,560 So now we're at, lastly, step 3 for the last time. 2541 01:53:33,560 --> 01:53:35,120 What do we do? 2542 01:53:35,120 --> 01:53:35,780 Merge. 2543 01:53:35,780 --> 01:53:40,220 And so just to be consistent, let me push these down, and let's compare. 2544 01:53:40,220 --> 01:53:43,310 Left hand or right hand, noticing that they only make forward progress, 2545 01:53:43,310 --> 01:53:45,230 none of this back and forth comparisons. 2546 01:53:45,230 --> 01:53:47,270 2 and 0, of course, the 0. 2547 01:53:47,270 --> 01:53:48,860 So we'll put that in place. 2548 01:53:48,860 --> 01:53:51,140 2 and 1, of course, the 1. 2549 01:53:51,140 --> 01:53:52,880 So we put that in place. 2550 01:53:52,880 --> 01:53:56,930 2 and 3, we merge in, of course, the 2 in this case. 2551 01:53:56,930 --> 01:54:00,770 4 and 3, we now merge in the 3 in this case. 2552 01:54:00,770 --> 01:54:05,630 4 and 6, we now merge, of course, the 4 in place. 2553 01:54:05,630 --> 01:54:07,760 And now, we compare 5 and 6. 2554 01:54:07,760 --> 01:54:08,615 We keep the 5. 2555 01:54:08,615 --> 01:54:12,590 2556 01:54:12,590 --> 01:54:15,226 Bug. 2557 01:54:15,226 --> 01:54:17,295 OK, well pretend that the 5 is on. 2558 01:54:17,295 --> 01:54:20,040 2559 01:54:20,040 --> 01:54:21,450 Oh, this is why. 2560 01:54:21,450 --> 01:54:24,240 All right, so now we compare the 7 and the 6. 2561 01:54:24,240 --> 01:54:26,430 6th is gone. 2562 01:54:26,430 --> 01:54:29,520 And lastly, 7 is the last one in place. 2563 01:54:29,520 --> 01:54:32,140 And even though I grant that of all the algorithms, 2564 01:54:32,140 --> 01:54:34,320 this is probably the hardest one to stay on top of, 2565 01:54:34,320 --> 01:54:36,210 especially when I'm doing it as a voice over. 2566 01:54:36,210 --> 01:54:41,010 Realize that what we've just done is only those three steps, recursively. 2567 01:54:41,010 --> 01:54:42,510 We started with a list of size 8. 2568 01:54:42,510 --> 01:54:43,650 We sorted the left half. 2569 01:54:43,650 --> 01:54:44,790 We sorted the right half. 2570 01:54:44,790 --> 01:54:46,450 And then we merge the two together. 2571 01:54:46,450 --> 01:54:48,960 But if you go down each of those rabbit holes, so to speak, 2572 01:54:48,960 --> 01:54:52,410 sorting the left half involves sorting the left half of the left half 2573 01:54:52,410 --> 01:54:54,970 and the right half of the left half, and so forth. 2574 01:54:54,970 --> 01:54:59,250 But this germ of an idea of really dividing and conquering the problem, 2575 01:54:59,250 --> 01:55:02,520 not such that you're having the problem and only dealing with one half. 2576 01:55:02,520 --> 01:55:05,700 Clearly, we're sorting one half and the other half 2577 01:55:05,700 --> 01:55:07,780 and merging them together, ultimately. 2578 01:55:07,780 --> 01:55:10,810 It does still lead us to the same solution. 2579 01:55:10,810 --> 01:55:14,550 And if we visualize the remnants of this now, if I depict this as follows, 2580 01:55:14,550 --> 01:55:18,390 where on the screen here, you see where the numbers originally started 2581 01:55:18,390 --> 01:55:20,310 in the top row from left to right. 2582 01:55:20,310 --> 01:55:23,470 Essentially, even though this is in a different order, 2583 01:55:23,470 --> 01:55:28,708 I divided that list of size 8, ultimately, into eight lists of size 1. 2584 01:55:28,708 --> 01:55:31,000 And that's where the base case kicked in and just said, 2585 01:55:31,000 --> 01:55:32,580 OK, we're done sorting that. 2586 01:55:32,580 --> 01:55:37,680 And after that, logically, I then merged two lists of size 1 2587 01:55:37,680 --> 01:55:41,580 into many lists of size 2 and those lists of size 2 into lists of size 4. 2588 01:55:41,580 --> 01:55:47,250 And then finally, the list of size 4 into one big list sorted of size 8. 2589 01:55:47,250 --> 01:55:50,610 And so I put forth this picture with the little line indicators 2590 01:55:50,610 --> 01:55:55,620 here, because how many times did I divide, divide, divide in half? 2591 01:55:55,620 --> 01:55:57,360 Or really double, double, double. 2592 01:55:57,360 --> 01:55:59,280 So exponent is the opposite-- 2593 01:55:59,280 --> 01:56:00,600 spoiler. 2594 01:56:00,600 --> 01:56:02,610 How many times did I divide? 2595 01:56:02,610 --> 01:56:04,320 So three, concretely. 2596 01:56:04,320 --> 01:56:08,070 But if there's eight elements total, and there's n more generally, 2597 01:56:08,070 --> 01:56:12,300 it really is a matter of dividing and conquering log n times. 2598 01:56:12,300 --> 01:56:15,360 You start this, and you can divide one, two, three times, log n times. 2599 01:56:15,360 --> 01:56:18,930 Or conversely, you can start here and exponentially double, double, 2600 01:56:18,930 --> 01:56:21,060 double three times, which is log n. 2601 01:56:21,060 --> 01:56:24,510 But on every row, every shelf, literally, I 2602 01:56:24,510 --> 01:56:28,350 made a fuss about pointing my hands only from the left to the right, 2603 01:56:28,350 --> 01:56:31,380 constantly advancing them, such that every time I did those merges, 2604 01:56:31,380 --> 01:56:34,500 I touched every element once and only once. 2605 01:56:34,500 --> 01:56:37,470 There was none of this back and forth, back and forth on stage. 2606 01:56:37,470 --> 01:56:42,750 So if I'm doing something log n times, or if I'm doing, 2607 01:56:42,750 --> 01:56:49,410 rather, n things log n times, what would be our big O formula, perhaps? 2608 01:56:49,410 --> 01:56:51,057 n things log n times? 2609 01:56:51,057 --> 01:56:52,140 STUDENT: Oh, it's n log n. 2610 01:56:52,140 --> 01:56:53,490 DAVID MALAN: Yeah, so n log n. 2611 01:56:53,490 --> 01:56:56,430 The order of n log n is, indeed, how we would describe 2612 01:56:56,430 --> 01:56:58,560 the running time of merge sort. 2613 01:56:58,560 --> 01:57:03,270 And so of all of the sorts thus far, we've seen that merge sort here, 2614 01:57:03,270 --> 01:57:07,120 actually, is n log n, which is strictly better than n squared, 2615 01:57:07,120 --> 01:57:10,650 which is where both selection sort and bubble sort landed. 2616 01:57:10,650 --> 01:57:14,015 But it's also slower than linear search, for instance. 2617 01:57:14,015 --> 01:57:15,390 But you would rather expect that. 2618 01:57:15,390 --> 01:57:17,880 If you have to do a lot of work up front sorting 2619 01:57:17,880 --> 01:57:19,878 some elements versus just searching them, 2620 01:57:19,878 --> 01:57:21,670 you're going to have to put in more effort. 2621 01:57:21,670 --> 01:57:24,150 And so the question of whether or not you should just search something 2622 01:57:24,150 --> 01:57:26,670 blindly with linear search and not bother sorting it, 2623 01:57:26,670 --> 01:57:30,327 really boils down to, can you afford to spend this amount of time? 2624 01:57:30,327 --> 01:57:32,160 And if you're the Googles of the world, odds 2625 01:57:32,160 --> 01:57:35,170 are you don't want to be searching their database linearly every time. 2626 01:57:35,170 --> 01:57:35,670 Why? 2627 01:57:35,670 --> 01:57:40,380 Because you can sort it once and then benefit millions, billions of people, 2628 01:57:40,380 --> 01:57:43,560 subsequently using something like binary search or, frankly in practice, 2629 01:57:43,560 --> 01:57:46,443 something even fancier and faster than binary search. 2630 01:57:46,443 --> 01:57:48,360 But there's always going to be this trade off. 2631 01:57:48,360 --> 01:57:52,140 You can achieve binary search only if the elements are sorted. 2632 01:57:52,140 --> 01:57:53,940 How much does it cost you to sort them? 2633 01:57:53,940 --> 01:57:56,940 Well, maybe n squared, if you used some of the earlier algorithms. 2634 01:57:56,940 --> 01:58:00,850 But it turns out, n log in is pretty fast as well. 2635 01:58:00,850 --> 01:58:06,180 So at the end of the day, these running times involve trade offs. 2636 01:58:06,180 --> 01:58:09,600 And indeed, in merge sort 2, I should note that the lower bound on merge 2637 01:58:09,600 --> 01:58:12,090 sort is also going to be omega of n log n. 2638 01:58:12,090 --> 01:58:14,640 As such, we can describe it in terms of our theta notation, 2639 01:58:14,640 --> 01:58:18,030 saying that merge sort is, indeed, in theta of n log n. 2640 01:58:18,030 --> 01:58:21,780 So generally speaking, probably better to use something like merge sort 2641 01:58:21,780 --> 01:58:24,030 or some other algorithm that's n log n. 2642 01:58:24,030 --> 01:58:27,660 In practice, most programmers are not implementing these sorting algorithms 2643 01:58:27,660 --> 01:58:28,200 themselves. 2644 01:58:28,200 --> 01:58:30,390 Odds are, they're using a library off the shelf 2645 01:58:30,390 --> 01:58:34,200 that themselves have made the decision as to which of these algorithms to do. 2646 01:58:34,200 --> 01:58:37,140 But generally speaking, and we're seeing now this for the first time, 2647 01:58:37,140 --> 01:58:41,430 if you want to improve time, like use less time, write faster code, 2648 01:58:41,430 --> 01:58:42,750 you've got to pay a price. 2649 01:58:42,750 --> 01:58:45,120 And that might be your human time, just takes you 2650 01:58:45,120 --> 01:58:47,400 more time to code up something more sophisticated, 2651 01:58:47,400 --> 01:58:49,080 more difficult to implement. 2652 01:58:49,080 --> 01:58:51,840 Or you need to spend something like space. 2653 01:58:51,840 --> 01:58:54,060 And as these shelves suggest, that too is 2654 01:58:54,060 --> 01:58:55,740 one of the key details of merge sort. 2655 01:58:55,740 --> 01:58:58,500 You can't just have the elements swapping in place. 2656 01:58:58,500 --> 01:59:02,520 You need at least an auxiliary array, so that when you do the merging, 2657 01:59:02,520 --> 01:59:03,960 you have a place to put them. 2658 01:59:03,960 --> 01:59:05,988 And this is excessive, this amount of memory. 2659 01:59:05,988 --> 01:59:09,030 I could have just gone back and forth between top shelf and bottom shelf. 2660 01:59:09,030 --> 01:59:11,113 But it's a little more interesting to go top down. 2661 01:59:11,113 --> 01:59:12,870 But you do need more space. 2662 01:59:12,870 --> 01:59:15,605 Back in the day, decades ago, space was really expensive. 2663 01:59:15,605 --> 01:59:16,480 And so you know what? 2664 01:59:16,480 --> 01:59:19,450 It might have been better to not use merge sort, use bubble sort 2665 01:59:19,450 --> 01:59:23,230 or selection sort even, or some other algorithm altogether. 2666 01:59:23,230 --> 01:59:25,300 Nowadays, space is relatively cheap. 2667 01:59:25,300 --> 01:59:27,160 And so these are more acceptable trade offs. 2668 01:59:27,160 --> 01:59:29,650 But it totally depends on the application. 2669 01:59:29,650 --> 01:59:32,950 The very last thing we thought we'd do is show you an actual comparison 2670 01:59:32,950 --> 01:59:34,450 of some of these sorting algorithms. 2671 01:59:34,450 --> 01:59:35,800 It's about 60 seconds long. 2672 01:59:35,800 --> 01:59:39,790 And it will compare for you, selection sort, bubble sort, 2673 01:59:39,790 --> 01:59:44,350 and merge sort in parallel simultaneously with some fun sorting 2674 01:59:44,350 --> 01:59:46,570 music, showing you ultimately what it really 2675 01:59:46,570 --> 01:59:52,750 means to be an O of n squared, or better yet, big O of n log n. 2676 01:59:52,750 --> 01:59:54,850 Selection on the top. 2677 01:59:54,850 --> 01:59:58,050 Bubble on the bottom. 2678 01:59:58,050 --> 01:59:59,400 Merge in the middle. 2679 01:59:59,400 --> 02:00:01,885 [MUSIC PLAYING] 2680 02:00:01,885 --> 02:00:53,660 2681 02:00:53,660 --> 02:00:55,700 All right, that's it for CS50. 2682 02:00:55,700 --> 02:00:57,910 We'll see you next time. 2683 02:00:57,910 --> 02:01:35,000 217603

Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.