All language subtitles for data_structures-720p-en

af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic
hy Armenian
az Azerbaijani
eu Basque
be Belarusian
bem Bemba
bn Bengali
bh Bihari
bs Bosnian
br Breton
bg Bulgarian
km Cambodian
ca Catalan
ceb Cebuano
chr Cherokee
ny Chichewa
zh-CN Chinese (Simplified)
zh-TW Chinese (Traditional)
co Corsican
hr Croatian
cs Czech
da Danish
nl Dutch
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
tl Filipino
fi Finnish
fr French
fy Frisian
gaa Ga
gl Galician
ka Georgian
de German
el Greek
gn Guarani
gu Gujarati
ht Haitian Creole
ha Hausa
haw Hawaiian
iw Hebrew
hi Hindi
hmn Hmong
hu Hungarian
is Icelandic
ig Igbo
id Indonesian
ia Interlingua
ga Irish
it Italian
ja Japanese
jw Javanese
kn Kannada
kk Kazakh
rw Kinyarwanda
rn Kirundi
kg Kongo
ko Korean
kri Krio (Sierra Leone)
ku Kurdish
ckb Kurdish (Soranî)
ky Kyrgyz
lo Laothian
la Latin
lv Latvian
ln Lingala
lt Lithuanian
loz Lozi
lg Luganda
ach Luo
lb Luxembourgish
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
mi Maori
mr Marathi
mfe Mauritian Creole
mo Moldavian
mn Mongolian
my Myanmar (Burmese)
sr-ME Montenegrin
ne Nepali
pcm Nigerian Pidgin
nso Northern Sotho
no Norwegian
nn Norwegian (Nynorsk)
oc Occitan
or Oriya
om Oromo
ps Pashto
fa Persian
pl Polish
pt-BR Portuguese (Brazil)
pt Portuguese (Portugal)
pa Punjabi
qu Quechua
ro Romanian
rm Romansh
nyn Runyakitara
ru Russian
sm Samoan
gd Scots Gaelic
sr Serbian
sh Serbo-Croatian
st Sesotho
tn Setswana
crs Seychellois Creole
sn Shona
sd Sindhi
si Sinhalese
sk Slovak
sl Slovenian
so Somali
es Spanish
es-419 Spanish (Latin American)
su Sundanese
sw Swahili
sv Swedish
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
ti Tigrinya
to Tonga
lua Tshiluba
tum Tumbuka
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian Download
ur Urdu
uz Uzbek
vi Vietnamese
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
zu Zulu
Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 0 00:00:00,000 --> 00:00:05,259 1 00:00:05,259 --> 00:00:08,300 DOUG LLOYD: So in CS50, we've covered a lot of different data structures, 2 00:00:08,300 --> 00:00:09,180 right? 3 00:00:09,180 --> 00:00:11,420 We've seen arrays, and linked lists, and hash tables, 4 00:00:11,420 --> 00:00:15,210 and tries, stacks and queues. 5 00:00:15,210 --> 00:00:17,579 We'll also learn a little about trees and heaps, 6 00:00:17,579 --> 00:00:20,120 but really these all just end up being variations on a theme. 7 00:00:20,120 --> 00:00:22,840 There really are these kind of four basic ideas 8 00:00:22,840 --> 00:00:25,190 that everything else can boil down to. 9 00:00:25,190 --> 00:00:28,150 Arrays, linked lists, hash tables, and tries. 10 00:00:28,150 --> 00:00:30,720 And like I said, there are variations on them, 11 00:00:30,720 --> 00:00:32,720 but this is pretty much going to summarize 12 00:00:32,720 --> 00:00:38,140 everything we're going to talk about in this class in terms of C. 13 00:00:38,140 --> 00:00:40,140 >> But how do these all measure up, right? 14 00:00:40,140 --> 00:00:44,265 We've talked about the pros and cons of each in separate videos on them, 15 00:00:44,265 --> 00:00:46,390 but there's a lot of numbers getting thrown around. 16 00:00:46,390 --> 00:00:48,723 There's a lot of general thoughts getting thrown around. 17 00:00:48,723 --> 00:00:51,950 Let's try and consolidate it into just one place. 18 00:00:51,950 --> 00:00:55,507 Let's weigh the pros against the cons, and consider 19 00:00:55,507 --> 00:00:57,340 which data structure might be the right data 20 00:00:57,340 --> 00:01:01,440 structure for your particular situation, whatever kind of data you're storing. 21 00:01:01,440 --> 00:01:06,625 You don't necessarily always need to use the super fast insertion, deletion, 22 00:01:06,625 --> 00:01:10,761 and lookup of a trie if you really don't care about inserting and deleting 23 00:01:10,761 --> 00:01:11,260 too much. 24 00:01:11,260 --> 00:01:13,968 If you need just quickly random access, maybe an array is better. 25 00:01:13,968 --> 00:01:15,340 So let's distill that. 26 00:01:15,340 --> 00:01:18,530 Let's talk about each of the four major kinds of data structures 27 00:01:18,530 --> 00:01:21,720 that we've talked about, and just see when they might be good, 28 00:01:21,720 --> 00:01:23,340 and when they might not be so good. 29 00:01:23,340 --> 00:01:24,610 So let's start with arrays. 30 00:01:24,610 --> 00:01:27,300 So insertion, that's kind of bad. 31 00:01:27,300 --> 00:01:31,350 >> Insertion at the end of an array is OK, if we're building an array as we go. 32 00:01:31,350 --> 00:01:33,570 But if we need to insert elements into the middle, 33 00:01:33,570 --> 00:01:35,550 think back to insertion sort, there's a lot 34 00:01:35,550 --> 00:01:37,510 of shifting to fit an element in there. 35 00:01:37,510 --> 00:01:41,170 And so if we're going to insert anywhere but the end of an array, 36 00:01:41,170 --> 00:01:43,590 that's probably not so great. 37 00:01:43,590 --> 00:01:46,710 >> Similarly, deletion, unless we're deleting from the end of an array, 38 00:01:46,710 --> 00:01:49,810 is probably also not so great if we don't want to leave empty gaps, 39 00:01:49,810 --> 00:01:50,790 which usually we don't. 40 00:01:50,790 --> 00:01:54,700 We want to remove an element, and then sort of make it snug again. 41 00:01:54,700 --> 00:01:57,670 And so deleting elements from an array, also not so great. 42 00:01:57,670 --> 00:01:58,820 >> Lookup, though, is great. 43 00:01:58,820 --> 00:02:00,920 We have random access, constant time lookup. 44 00:02:00,920 --> 00:02:03,800 We just say seven, and we go to array relocation seven. 45 00:02:03,800 --> 00:02:05,907 We say 20, with go to array relocation 20. 46 00:02:05,907 --> 00:02:07,240 We don't have to iterate across. 47 00:02:07,240 --> 00:02:08,630 That's pretty good. 48 00:02:08,630 --> 00:02:11,020 >> Arrays are also relatively easy to sort. 49 00:02:11,020 --> 00:02:14,040 Every time we talked about a sorting algorithm, such as selection sort, 50 00:02:14,040 --> 00:02:18,820 insertion sort, bubble sort, merge sort, we always used arrays to do it, 51 00:02:18,820 --> 00:02:21,860 because arrays are pretty easy to sort, relative to the data structures 52 00:02:21,860 --> 00:02:22,970 we've seen so far. 53 00:02:22,970 --> 00:02:24,320 >> They're also relatively small. 54 00:02:24,320 --> 00:02:25,695 There's not a lot of extra space. 55 00:02:25,695 --> 00:02:29,210 You just set aside exactly as much as you need to hold your data, 56 00:02:29,210 --> 00:02:30,320 and that's pretty much it. 57 00:02:30,320 --> 00:02:33,180 So they're pretty small and efficient in that way. 58 00:02:33,180 --> 00:02:36,000 But another downside, though, is that they are fixed in size. 59 00:02:36,000 --> 00:02:38,630 We have to declare exactly how big we want our array to be, 60 00:02:38,630 --> 00:02:39,940 and we only get one shot at it. 61 00:02:39,940 --> 00:02:41,280 We can't grow and shrink it. 62 00:02:41,280 --> 00:02:44,582 >> If we need to grow or shrink it, we need to declare an entirely new array, 63 00:02:44,582 --> 00:02:47,750 copy all of the elements of the first array into the second array. 64 00:02:47,750 --> 00:02:51,410 And if we miscalculated that time, we need to do it again. 65 00:02:51,410 --> 00:02:52,760 Not so great. 66 00:02:52,760 --> 00:02:58,750 So arrays don't give us the flexibility to have variable numbers of elements. 67 00:02:58,750 --> 00:03:01,320 >> With a linked list, insertion is pretty easy. 68 00:03:01,320 --> 00:03:03,290 We just tack onto the front. 69 00:03:03,290 --> 00:03:04,892 Deletion is also pretty easy. 70 00:03:04,892 --> 00:03:06,100 We have to find the elements. 71 00:03:06,100 --> 00:03:07,270 That involve some searching. 72 00:03:07,270 --> 00:03:10,270 >> But once you've found the element you're looking for, all you need to do 73 00:03:10,270 --> 00:03:12,830 is change a pointer, possibly two if you have 74 00:03:12,830 --> 00:03:15,151 a linked list-- a doubly linked list, rather-- 75 00:03:15,151 --> 00:03:16,650 and then you can just free the node. 76 00:03:16,650 --> 00:03:18,399 You don't have to shift everything around. 77 00:03:18,399 --> 00:03:22,090 You just change two pointers, so that's pretty quick. 78 00:03:22,090 --> 00:03:23,470 >> Lookup is bad though, right? 79 00:03:23,470 --> 00:03:26,280 In order for us to find an element in a linked list, 80 00:03:26,280 --> 00:03:29,154 whether singly or doubly linked, we have to linear search it. 81 00:03:29,154 --> 00:03:32,320 We have to start at the beginning and move the end, or start at the end move 82 00:03:32,320 --> 00:03:33,860 to the beginning. 83 00:03:33,860 --> 00:03:35,474 We don't have random access anymore. 84 00:03:35,474 --> 00:03:37,265 So if we're doing a lot of searching, maybe 85 00:03:37,265 --> 00:03:39,830 a linked list isn't quite so good for us. 86 00:03:39,830 --> 00:03:43,750 >> They're also really difficult to sort, right? 87 00:03:43,750 --> 00:03:45,666 The only way you can really sort a linked list 88 00:03:45,666 --> 00:03:47,870 is to sort it as you construct it. 89 00:03:47,870 --> 00:03:50,497 But if you sort it as you construct it, you're no longer 90 00:03:50,497 --> 00:03:51,830 making quick insertions anymore. 91 00:03:51,830 --> 00:03:53,746 You're not just tacking things onto the front. 92 00:03:53,746 --> 00:03:55,710 You have to find the right spot to put it, 93 00:03:55,710 --> 00:03:57,820 and then your insertion becomes just about as bad 94 00:03:57,820 --> 00:03:59,390 as inserting into an array. 95 00:03:59,390 --> 00:04:03,130 So linked lists are not so great for sorting data. 96 00:04:03,130 --> 00:04:05,830 >> They're also pretty small, size-wise. 97 00:04:05,830 --> 00:04:08,496 Doubly linked list slightly larger than singly linked lists, 98 00:04:08,496 --> 00:04:10,620 which are slightly larger than arrays, but it's not 99 00:04:10,620 --> 00:04:13,330 a huge amount of wasted space. 100 00:04:13,330 --> 00:04:18,730 So if space is at a premium, but not a really intense premium, 101 00:04:18,730 --> 00:04:22,180 this might be the right way to go. 102 00:04:22,180 --> 00:04:23,330 >> Hash tables. 103 00:04:23,330 --> 00:04:25,850 Insertion into a hash table is fairly straightforward. 104 00:04:25,850 --> 00:04:26,980 It's a two-step process. 105 00:04:26,980 --> 00:04:30,700 First we need to run our data through a hash function to get a hash code, 106 00:04:30,700 --> 00:04:37,550 and then we insert the element into the hash table at that hash code location. 107 00:04:37,550 --> 00:04:40,879 >> Deletion, similar to linked list, is easy once you find the element. 108 00:04:40,879 --> 00:04:43,170 You have to find it first, but then when you delete it, 109 00:04:43,170 --> 00:04:45,128 you just need to exchange a couple of pointers, 110 00:04:45,128 --> 00:04:47,250 if you're using separate chaining. 111 00:04:47,250 --> 00:04:49,942 If you're using probing, or if you're not 112 00:04:49,942 --> 00:04:51,650 using chaining at all in your hash table, 113 00:04:51,650 --> 00:04:53,040 deletion is actually really easy. 114 00:04:53,040 --> 00:04:57,134 All you need to do is hash the data, and then go to that location. 115 00:04:57,134 --> 00:04:58,925 And assuming you don't have any collisions, 116 00:04:58,925 --> 00:05:01,650 you'll be able to delete very quickly. 117 00:05:01,650 --> 00:05:04,930 >> Now, lookup is where things get a little more complicated. 118 00:05:04,930 --> 00:05:06,910 It's on average better than linked lists. 119 00:05:06,910 --> 00:05:09,560 If you're using chaining, you still have a linked list, 120 00:05:09,560 --> 00:05:13,170 which means you still have the search detriment a linked list. 121 00:05:13,170 --> 00:05:18,390 But because you're taking your linked list and splitting it over 100 or 1,000 122 00:05:18,390 --> 00:05:25,380 or n elements in your hash table, you're linked lists are all one nth the size. 123 00:05:25,380 --> 00:05:27,650 They're all substantially smaller. 124 00:05:27,650 --> 00:05:32,080 You have n linked lists instead of one linked list of size n. 125 00:05:32,080 --> 00:05:34,960 >> And so this real-world constant factor, which we generally 126 00:05:34,960 --> 00:05:39,730 don't talk about in time complexity, it does actually make a difference here. 127 00:05:39,730 --> 00:05:43,020 So lookup is still linear search if you're using chaining, 128 00:05:43,020 --> 00:05:46,780 but the length of the list you're searching through 129 00:05:46,780 --> 00:05:50,080 is very, very short by comparison. 130 00:05:50,080 --> 00:05:52,995 Again, if sorting is your goal here, hash table's 131 00:05:52,995 --> 00:05:54,370 probably not the right way to go. 132 00:05:54,370 --> 00:05:56,830 Just use an array if sorting is really important to you. 133 00:05:56,830 --> 00:05:58,590 >> And they can run the gamut of size. 134 00:05:58,590 --> 00:06:01,640 It's hard to say whether a hash table is small or big, 135 00:06:01,640 --> 00:06:04,110 because it really depends on how large your hash table is. 136 00:06:04,110 --> 00:06:07,340 If you're only going to be storing five elements in your hash table, 137 00:06:07,340 --> 00:06:10,620 and you have a hash table with 10,000 elements in it, 138 00:06:10,620 --> 00:06:12,614 you're probably wasting a lot of space. 139 00:06:12,614 --> 00:06:15,030 Contrast being you can also have very compact hash tables, 140 00:06:15,030 --> 00:06:18,720 but the smaller your hash table gets, the longer each of those linked lists 141 00:06:18,720 --> 00:06:19,220 gets. 142 00:06:19,220 --> 00:06:22,607 And so there's really no way to define exactly the size of a hash table, 143 00:06:22,607 --> 00:06:24,440 but it's probably safe to say it's generally 144 00:06:24,440 --> 00:06:27,990 going to be bigger than a linked list storing the same data, 145 00:06:27,990 --> 00:06:30,400 but smaller than a trie. 146 00:06:30,400 --> 00:06:32,720 >> And tries are the fourth of these structures 147 00:06:32,720 --> 00:06:34,070 that we've been talking about. 148 00:06:34,070 --> 00:06:36,450 Inserting into a trie is complex. 149 00:06:36,450 --> 00:06:38,400 There's a lot of dynamic memory allocation, 150 00:06:38,400 --> 00:06:40,780 especially at the beginning, as you're starting to build. 151 00:06:40,780 --> 00:06:43,700 But it's constant time. 152 00:06:43,700 --> 00:06:47,690 It's only the human element here that makes it tricky. 153 00:06:47,690 --> 00:06:53,250 Having to encounter null pointer, malloc space, go there, possibly malloc space 154 00:06:53,250 --> 00:06:54,490 from there again. 155 00:06:54,490 --> 00:06:58,880 The sort of intimidation factor of pointers in dynamic memory allocation 156 00:06:58,880 --> 00:07:00,130 is the hurdle to clear. 157 00:07:00,130 --> 00:07:04,550 But once you've cleared it, insertion actually comes quite simple, 158 00:07:04,550 --> 00:07:06,810 and it certainly is constant time. 159 00:07:06,810 --> 00:07:07,680 >> Deletion is easy. 160 00:07:07,680 --> 00:07:11,330 All you need to do is navigate down a couple of pointers and free the node, 161 00:07:11,330 --> 00:07:12,420 so that's pretty good. 162 00:07:12,420 --> 00:07:13,930 Lookup is also pretty fast. 163 00:07:13,930 --> 00:07:16,780 It's only based on the length of your data. 164 00:07:16,780 --> 00:07:19,924 So if all of your data is five character strings, 165 00:07:19,924 --> 00:07:22,590 for example, you're storing five character strings in your trie, 166 00:07:22,590 --> 00:07:25,439 it only takes five steps to find what you're looking for. 167 00:07:25,439 --> 00:07:28,480 Five is just a constant factor, so again, insertion, deletion, and lookup 168 00:07:28,480 --> 00:07:31,670 here are all constant time, effectively. 169 00:07:31,670 --> 00:07:34,880 >> Another thing is that your trie is actually kind of already sorted, right? 170 00:07:34,880 --> 00:07:36,800 By virtue of how we're inserting elements, 171 00:07:36,800 --> 00:07:40,060 by going letter by letter of the key, or digit by digit of the key, 172 00:07:40,060 --> 00:07:45,084 typically, your trie ends up being kind of sorted as you build it. 173 00:07:45,084 --> 00:07:47,250 It doesn't really makes sense to think about sorting 174 00:07:47,250 --> 00:07:49,874 in the same way we think about it with arrays, or linked lists, 175 00:07:49,874 --> 00:07:51,070 or hash tables. 176 00:07:51,070 --> 00:07:54,780 But in some sense, your trie is sorted as you go. 177 00:07:54,780 --> 00:07:58,630 >> The downside, of course, is that a trie rapidly becomes huge. 178 00:07:58,630 --> 00:08:02,970 From every junction point, you might have-- if your key consists of digits, 179 00:08:02,970 --> 00:08:04,880 you have 10 other places you can go, which 180 00:08:04,880 --> 00:08:07,490 means that every node contains information 181 00:08:07,490 --> 00:08:11,440 about the data you want to store at that node, plus 10 pointers. 182 00:08:11,440 --> 00:08:14,430 Which, on CS50 IDE, is 80 bytes. 183 00:08:14,430 --> 00:08:17,220 So it's at least 80 bytes for every node that you create, 184 00:08:17,220 --> 00:08:19,240 and that's not even counting data. 185 00:08:19,240 --> 00:08:24,950 And if your nodes are letters instead of digits, 186 00:08:24,950 --> 00:08:27,825 now you have 26 pointers from every location. 187 00:08:27,825 --> 00:08:32,007 And 26 times 8 is probably 200 bytes, or something like that. 188 00:08:32,007 --> 00:08:33,840 And you have capital and lowercase-- you can 189 00:08:33,840 --> 00:08:35,381 see where I'm going with this, right? 190 00:08:35,381 --> 00:08:37,500 Your nodes can get really big, and so the trie 191 00:08:37,500 --> 00:08:40,470 itself, overall, can get really big, too. 192 00:08:40,470 --> 00:08:42,630 So if space is at a high premium on your system, 193 00:08:42,630 --> 00:08:45,830 a trie might not be the right way to go, even though its other benefits 194 00:08:45,830 --> 00:08:47,780 come into play. 195 00:08:47,780 --> 00:08:48,710 I'm Doug Lloyd. 196 00:08:48,710 --> 00:08:50,740 This is CS50. 197 00:08:50,740 --> 00:08:52,316 16389

Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.