All language subtitles for 28 - How to Develop a Good Hash Function English

af Afrikaans
ak Akan
sq Albanian
am Amharic
hy Armenian
az Azerbaijani
eu Basque
be Belarusian
bem Bemba
bn Bengali
bh Bihari
bs Bosnian
br Breton
bg Bulgarian
km Cambodian
ca Catalan
ceb Cebuano
chr Cherokee
ny Chichewa
zh-CN Chinese (Simplified)
zh-TW Chinese (Traditional)
co Corsican
hr Croatian
cs Czech
da Danish
nl Dutch
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
tl Filipino
fi Finnish
fr French
fy Frisian
gaa Ga
gl Galician
ka Georgian
de German
el Greek
gn Guarani
gu Gujarati
ht Haitian Creole
ha Hausa
haw Hawaiian
iw Hebrew
hi Hindi
hmn Hmong
hu Hungarian
is Icelandic
ig Igbo
id Indonesian
ia Interlingua
ga Irish
it Italian
ja Japanese
jw Javanese
kn Kannada
kk Kazakh
rw Kinyarwanda
rn Kirundi
kg Kongo
ko Korean
kri Krio (Sierra Leone)
ku Kurdish
ckb Kurdish (SoranĂ®)
ky Kyrgyz
lo Laothian
la Latin
lv Latvian
ln Lingala
lt Lithuanian
loz Lozi
lg Luganda
ach Luo
lb Luxembourgish
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
mi Maori
mr Marathi
mfe Mauritian Creole
mo Moldavian
mn Mongolian
my Myanmar (Burmese)
sr-ME Montenegrin
ne Nepali
pcm Nigerian Pidgin
nso Northern Sotho
no Norwegian
nn Norwegian (Nynorsk)
oc Occitan
or Oriya
om Oromo
ps Pashto
fa Persian
pl Polish
pt-BR Portuguese (Brazil)
pt Portuguese (Portugal)
pa Punjabi
qu Quechua
ro Romanian
rm Romansh
nyn Runyakitara
ru Russian
sm Samoan
gd Scots Gaelic
sr Serbian
sh Serbo-Croatian
st Sesotho
tn Setswana
crs Seychellois Creole
sn Shona
sd Sindhi
si Sinhalese
sk Slovak
sl Slovenian
so Somali
es Spanish
es-419 Spanish (Latin American)
su Sundanese
sw Swahili
sv Swedish
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
ti Tigrinya
to Tonga
lua Tshiluba
tum Tumbuka
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
zu Zulu
Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 1 00:00:08,760 --> 00:00:13,440 ‫This video I'm going to walk through what it takes to create a good hash function. 2 00:00:13,680 --> 00:00:20,280 ‫So before we decide what it takes to create a good hash function that's probably important to understand 3 00:00:20,280 --> 00:00:21,760 ‫what a hash function is. 4 00:00:21,840 --> 00:00:32,040 ‫And so what a hash function is is a delimiter that separates certain values into a data structure and 5 00:00:32,550 --> 00:00:42,840 ‫show you how this works visually and think of that as being able to have nice buckets that you can put 6 00:00:42,840 --> 00:00:43,510 ‫things in. 7 00:00:43,620 --> 00:00:53,750 ‫So of bucket one bucket two bucket three and four. 8 00:00:54,030 --> 00:00:59,630 ‫So one two three and four. 9 00:00:59,880 --> 00:01:09,390 ‫So if we were to have a set of data so say we have an array and arrays a made up of just these four 10 00:01:09,390 --> 00:01:20,390 ‫items so we have 1 4 2 3 2 1 4 1. 11 00:01:20,730 --> 00:01:29,040 ‫So this is our array in a hash function the way it would work is you would go through each item and 12 00:01:29,040 --> 00:01:34,880 ‫you would designate it to go into a particular bucket. 13 00:01:34,890 --> 00:01:39,850 ‫So you know one on one just like that. 14 00:01:39,930 --> 00:01:46,440 ‫And for your fours these two fours just click here. 15 00:01:46,440 --> 00:01:53,790 ‫I'm not doing this in any order as you probably notice men or twos look like that. 16 00:01:54,010 --> 00:01:59,570 ‫And for LWN three there we go. 17 00:01:59,580 --> 00:02:04,620 ‫So that's a way that a very very basic hash function works. 18 00:02:04,620 --> 00:02:14,390 ‫Now the problem with hash function or how hash tables is that they can be incredibly fast in the last 19 00:02:14,490 --> 00:02:19,400 ‫you have to deal with a lot of collisions and what collisions are. 20 00:02:19,410 --> 00:02:23,190 ‫Are these things right here. 21 00:02:23,970 --> 00:02:27,040 ‫He knows the three doesn't have one because only has one item. 22 00:02:27,210 --> 00:02:37,500 ‫Ideally you want to have as few collisions as possible so your hash tables have a round the same amount 23 00:02:37,590 --> 00:02:39,890 ‫of values going inside of them. 24 00:02:39,930 --> 00:02:48,120 ‫In fact if you want to write down a property the best definition that I've been able to see for what 25 00:02:48,120 --> 00:02:54,960 ‫it takes to create a good hash function is that the function should provide a uniform distribution of 26 00:02:54,960 --> 00:02:56,290 ‫hash values. 27 00:02:56,430 --> 00:03:03,900 ‫And this is because a non uniform distribution increases the number of collisions and the cost of resolution. 28 00:03:03,960 --> 00:03:12,090 ‫That's just a lot of fancy talk for saying say that this data was not what it is right now. 29 00:03:12,090 --> 00:03:14,640 ‫So say that we had. 30 00:03:14,820 --> 00:03:24,140 ‫And I'm going to raise this and we're going to work on some different values right here. 31 00:03:24,540 --> 00:03:33,960 ‫So we're still going to have our one our two three and our four except we're going to deal with a different 32 00:03:33,960 --> 00:03:34,920 ‫data set. 33 00:03:34,920 --> 00:03:38,670 ‫So in this dataset we're just going to have one 34 00:03:44,430 --> 00:03:50,940 ‫say We have a data set like this where a reasonable way of 90 percent of the data is actually the same 35 00:03:50,940 --> 00:03:52,300 ‫exact value. 36 00:03:52,320 --> 00:03:59,970 ‫This wouldn't be good because all we would have are collisions. 37 00:03:59,980 --> 00:04:08,740 ‫And if you remember back to the definition we want a uniform distribution uniform distribution means 38 00:04:08,740 --> 00:04:15,760 ‫that we should essentially have the same number of elements in each one of these buckets. 39 00:04:15,820 --> 00:04:21,760 ‫So each one of these should have as close to the same number of items as possible. 40 00:04:21,760 --> 00:04:26,640 ‫And so a good way of doing this is looking at an example. 41 00:04:26,650 --> 00:04:33,550 ‫So the example I'm going to use will get rid of all of this right here because I'm going to do actual 42 00:04:33,550 --> 00:04:34,660 ‫real life example. 43 00:04:34,660 --> 00:04:42,490 ‫This is one of my professors at the University taught when he explained how hash functions work and 44 00:04:42,490 --> 00:04:43,730 ‫how to create a good one. 45 00:04:44,020 --> 00:04:52,210 ‫Say that you wanted to organize students at a university you have a number of different options in terms 46 00:04:52,210 --> 00:04:57,760 ‫of having some theme that there's some fast way of looking them up. 47 00:04:57,760 --> 00:04:59,770 ‫So if you have a student 48 00:05:03,390 --> 00:05:10,090 ‫as a student right here there is a few different ways that you may be able to categorize a student so 49 00:05:10,240 --> 00:05:26,680 ‫you could probably say he may have a social security number he may have a phone number he may have a 50 00:05:27,040 --> 00:05:37,250 ‫driver's license and he will have a student ID. 51 00:05:37,620 --> 00:05:43,020 ‫Now when we're looking at a good hash function remember we're looking for something that has a uniform 52 00:05:43,020 --> 00:05:55,470 ‫distribution and so phone numbers even if you could assume that every every student had a phone number 53 00:05:55,740 --> 00:06:01,800 ‫you wouldn't have a uniform distribution because the majority of them if they're local students are 54 00:06:01,800 --> 00:06:08,510 ‫all going to have what they're all going to have similar area codes. 55 00:06:08,690 --> 00:06:13,930 ‫So this is not going to be uniform distribution so phone number would not be a good thing. 56 00:06:13,940 --> 00:06:20,270 ‫Driver's license seems like it would be a good one except for the fact that you're dealing with students 57 00:06:20,360 --> 00:06:24,880 ‫from different states and differ and they're in different parts of the world. 58 00:06:24,950 --> 00:06:30,680 ‫One you have students that are not even going to have a driver's license so if you even have a single 59 00:06:30,680 --> 00:06:36,140 ‫student who does not have it then it's not going to work as a hash function because every element has 60 00:06:36,140 --> 00:06:38,160 ‫to have this ID. 61 00:06:38,300 --> 00:06:39,320 ‫So that's not going to work. 62 00:06:39,320 --> 00:06:45,440 ‫And the other problem is you typically want to work with items that have the same numbers. 63 00:06:45,470 --> 00:06:51,530 ‫There's the same count of numbers so in other words if you have a driver's license number from one state 64 00:06:51,890 --> 00:07:04,390 ‫that has letters and then numbers compared with a different state that has a different set of numbers. 65 00:07:04,460 --> 00:07:06,520 ‫This is not going to be a good hash functions. 66 00:07:06,540 --> 00:07:12,170 ‫A driver's license is already not good for a number different reasons Social Security number would be 67 00:07:12,170 --> 00:07:19,280 ‫one of the best options except for the fact that you have a lot of students at universities that are 68 00:07:19,280 --> 00:07:23,270 ‫international and they do not have a Social Security number. 69 00:07:23,330 --> 00:07:25,950 ‫So that's not going to be the best option. 70 00:07:26,180 --> 00:07:32,360 ‫Because remember every single student has to have this in order to be looked up. 71 00:07:32,450 --> 00:07:37,430 ‫So that leaves the student ID as being one of the best options. 72 00:07:37,430 --> 00:07:39,260 ‫It's something every student has. 73 00:07:39,350 --> 00:07:42,200 ‫And it also has a uniform distribution. 74 00:07:42,200 --> 00:07:51,830 ‫And what I mean by that is say we have three students here say we have John 75 00:07:54,160 --> 00:08:06,470 ‫celly and Sam they have a corresponding I.D. and that corresponding I.D. is all going to be the same. 76 00:08:06,470 --> 00:08:10,050 ‫So the idea may be a seven digit number. 77 00:08:10,060 --> 00:08:12,430 ‫So we can say a 7 6 78 00:08:15,900 --> 00:08:17,450 ‫we'll just go 6 digits. 79 00:08:17,580 --> 00:08:33,550 ‫Just keep it easy 2 1 in 4 2 3 and then we'll do 7 2 4 6 2 8. 80 00:08:33,740 --> 00:08:34,130 ‫OK. 81 00:08:34,170 --> 00:08:36,420 ‫So why this. 82 00:08:36,420 --> 00:08:40,440 ‫And this is all coming from the student 90. 83 00:08:40,440 --> 00:08:47,160 ‫Why this would be a good hash function is because each one of these items fits all the properties of 84 00:08:47,160 --> 00:08:55,040 ‫a good hash function they're uniform meaning that every student has one. 85 00:08:55,050 --> 00:09:03,000 ‫They have a uniform distribution which means that these numbers are generated randomly. 86 00:09:03,000 --> 00:09:08,900 ‫So they're all different meaning that they should by mathematical probability laws. 87 00:09:09,030 --> 00:09:15,390 ‫If you have enough students that you should actually have the same number of students in each virtual 88 00:09:15,390 --> 00:09:21,490 ‫bucket of your hash table which means that your number of collisions should be minimized. 89 00:09:21,660 --> 00:09:24,870 ‫And they also have the same count. 90 00:09:24,900 --> 00:09:28,160 ‫So here are six items there of six. 91 00:09:28,170 --> 00:09:33,070 ‫On the second one and six on the third one and so that matches there. 92 00:09:33,090 --> 00:09:41,580 ‫So one of the best house hash functions for for's university student as just an example this can be 93 00:09:41,730 --> 00:09:48,720 ‫used in a number of different operations and it is as hash tables are incredibly useful in a number 94 00:09:48,720 --> 00:09:51,050 ‫of different fields of computer science. 95 00:09:51,060 --> 00:09:53,280 ‫This is a great way of generating one. 96 00:09:53,280 --> 00:10:00,390 ‫These are the properties a big thing to remember is just making sure that the data and the values are 97 00:10:00,480 --> 00:10:02,270 ‫as random as possible. 98 00:10:02,350 --> 00:10:12,600 ‫They're uniformly distributed and they are available for every single item that needs to be searched 99 00:10:12,600 --> 00:10:14,090 ‫for and categorized. 100 00:10:14,090 --> 00:10:17,220 ‫So that's what it takes to make a good hash function. 101 00:10:17,220 --> 00:10:20,640 ‫Please let me know if you have any questions whatsoever and I'll see in the next video. 10998

Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.