All language subtitles for 02_vectorization-part-1.en

In this video, you'll see a very useful idea called vectorization. When you're implementing a learning algorithm, using vectorization will both make your code shorter and also make it run much more efficiently. Learning how to write vectorized code will also allow you to take advantage of modern numerical linear algebra libraries, as well as maybe even GPU hardware. GPU stands for graphics processing unit. This is hardware originally designed to speed up computer graphics in your computer, but it turns out it can also be used, when you write vectorized code, to help you execute your code much more quickly.

Let's look at a concrete example of what vectorization means. Here's an example with parameters w and b, where w is a vector with three numbers, and you also have a vector of features x, also with three numbers. Here, n is equal to 3. Notice that in linear algebra, the index, or the counting, starts from 1, and so the first values are subscripted w_1 and x_1.
In Python code, you can define these variables w, b, and x using arrays like this. Here, I'm actually using a numerical linear algebra library in Python called NumPy, which is by far the most widely used numerical linear algebra library in Python and in machine learning. Because in Python the counting in arrays starts from 0, you would access the first value of w using w square bracket 0, the second value using w square bracket 1, and the third using w square bracket 2. The indexing here goes 0, 1, 2 rather than 1, 2, 3. Similarly, to access individual features of x, you would use x[0], x[1], and x[2]. Many programming languages, including Python, start counting from 0 rather than 1.

Now, let's look at an implementation without vectorization for computing the model's prediction. In code, it looks like this: you take each parameter w and multiply it by its associated feature.
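The setup described above can be sketched like this; the specific numbers in w, b, and x are placeholder values for illustration, not ones given in the lecture:

```python
import numpy as np

# Parameters w, b and feature vector x, with n = 3.
# The numeric values here are hypothetical examples.
w = np.array([1.0, 2.5, -3.3])
b = 4.0
x = np.array([10.0, 20.0, 30.0])

# Unvectorized prediction, written out term by term.
# Python/NumPy indexing starts at 0, so the "first" entry is w[0].
f = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b
print(f)  # ≈ -35.0 with these example values
```

Writing every term out by hand like this only works when n is small.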
Now, you could write your code like this, but what if n isn't three? What if n is 100, or 100,000? Then it's both inefficient for you to write the code and inefficient for your computer to compute.

Here's another way, still without using vectorization, but using a for loop. In math, you can use a summation operator to add all the products of w_j and x_j for j equals 1 through n, and then outside the summation you add b at the end. The summation goes from j equals 1 up to and including n. For n equals 3, j therefore goes 1, 2, 3.

In code, you can initialize f to 0. Then, for j in range from 0 to n (this actually makes j go from 0 to n minus 1, that is, 0, 1, 2), you add to f the product of w_j times x_j. Finally, outside the for loop, you add b. Notice that in Python, range(0, n) means that j goes from 0 all the way to n minus 1 and does not include n itself. This is usually written range(n) in Python, but in this video I added the 0 here just to emphasize that the counting starts from 0.
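The for-loop version described above might look like this; the example values for w, b, and x are assumptions, not from the lecture:

```python
import numpy as np

# Hypothetical example values.
w = np.array([1.0, 2.5, -3.3])
b = 4.0
x = np.array([10.0, 20.0, 30.0])
n = w.shape[0]  # number of features, here 3

f = 0.0
for j in range(0, n):  # j runs 0, 1, ..., n-1
    f = f + w[j] * x[j]
f = f + b  # add b outside the loop
print(f)  # ≈ -35.0, same result as writing the terms out by hand
```

This scales to any n without extra typing, but each multiply-add still runs one at a time.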
While this implementation is a bit better than the first one, it still doesn't use vectorization, and it isn't that efficient.

Now, let's look at how you can do this using vectorization. This is the math expression of the function f, which is the dot product of w and x, plus b. You can now implement this with a single line of code by computing f equals np dot dot. I said "dot dot" because the first dot is the period and the second dot is the function, or method, called dot. That is, f equals np.dot(w, x), and this implements the mathematical dot product between the vectors w and x. Then finally, you can add b to it at the end. This NumPy dot function is a vectorized implementation of the dot product operation between two vectors, and especially when n is large, this will run much faster than the two previous code examples.

I want to emphasize that vectorization actually has two distinct benefits. First, it makes code shorter; it's now just one line of code. Isn't that cool?
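The one-line vectorized version looks like this, using the same hypothetical example values as before:

```python
import numpy as np

# Hypothetical example values.
w = np.array([1.0, 2.5, -3.3])
b = 4.0
x = np.array([10.0, 20.0, 30.0])

# np.dot computes the mathematical dot product of w and x;
# b is then added at the end.
f = np.dot(w, x) + b
print(f)  # ≈ -35.0, same result as the unvectorized versions
```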
Second, it also results in your code running much faster than either of the two previous implementations that did not use vectorization. The reason the vectorized implementation is much faster is that, behind the scenes, the NumPy dot function is able to use parallel hardware in your computer. This is true whether you're running on a normal computer, that is, on a normal computer's CPU, or whether you are using a GPU, a graphics processing unit, which is often used to accelerate machine learning jobs. The ability of the NumPy dot function to use parallel hardware makes it much more efficient than the for loop or the sequential calculation that we saw previously.

Now, the for-loop version is much more practical than the first one when n is large, because you are not typing w_0 times x_0 plus w_1 times x_1 plus lots of additional terms, like you would have for the first version. But while that saves a lot of typing, it is still not that computationally efficient, because it still doesn't use vectorization.
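The speed difference is easy to observe with a rough timing sketch; the exact numbers depend on your machine:

```python
import time
import numpy as np

# A large n makes the difference between the two versions visible.
n = 1_000_000
rng = np.random.default_rng(0)
w = rng.random(n)
x = rng.random(n)

# Sequential for-loop version.
start = time.time()
f_loop = 0.0
for j in range(n):
    f_loop += w[j] * x[j]
loop_time = time.time() - start

# Vectorized version.
start = time.time()
f_vec = np.dot(w, x)
vec_time = time.time() - start

# Both give the same answer (up to floating-point rounding),
# but np.dot is typically orders of magnitude faster.
print(f"loop: {loop_time:.4f}s  np.dot: {vec_time:.6f}s")
```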
To recap, vectorization makes your code shorter, so hopefully easier to write and easier for you or others to read, and it also makes it run much faster. But what is this magic behind vectorization that makes it run so much faster? Let's take a look at what your computer is actually doing behind the scenes to make vectorized code run so much faster.
