All language subtitles for 029 Object Detection - Step 3-en

af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic
hy Armenian
az Azerbaijani
eu Basque
be Belarusian
bem Bemba
bn Bengali
bh Bihari
bs Bosnian
br Breton
bg Bulgarian
km Cambodian
ca Catalan
ceb Cebuano
chr Cherokee
ny Chichewa
zh-CN Chinese (Simplified)
zh-TW Chinese (Traditional)
co Corsican
hr Croatian
cs Czech
da Danish
nl Dutch
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
tl Filipino
fi Finnish
fr French
fy Frisian
gaa Ga
gl Galician
ka Georgian
de German
el Greek
gn Guarani
gu Gujarati
ht Haitian Creole
ha Hausa
haw Hawaiian
iw Hebrew
hi Hindi
hmn Hmong
hu Hungarian
is Icelandic
ig Igbo
id Indonesian
ia Interlingua
ga Irish
it Italian
ja Japanese
jw Javanese
kn Kannada
kk Kazakh
rw Kinyarwanda
rn Kirundi
kg Kongo
ko Korean
kri Krio (Sierra Leone)
ku Kurdish
ckb Kurdish (Soranî)
ky Kyrgyz
lo Laothian
la Latin
lv Latvian
ln Lingala
lt Lithuanian
loz Lozi
lg Luganda
ach Luo
lb Luxembourgish
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
mi Maori
mr Marathi
mfe Mauritian Creole
mo Moldavian
mn Mongolian
my Myanmar (Burmese)
sr-ME Montenegrin
ne Nepali
pcm Nigerian Pidgin
nso Northern Sotho
no Norwegian
nn Norwegian (Nynorsk)
oc Occitan
or Oriya
om Oromo
ps Pashto
fa Persian
pl Polish
pt-BR Portuguese (Brazil)
pt Portuguese (Portugal)
pa Punjabi
qu Quechua
ro Romanian
rm Romansh
nyn Runyakitara
ru Russian
sm Samoan
gd Scots Gaelic
sr Serbian
sh Serbo-Croatian
st Sesotho
tn Setswana
crs Seychellois Creole
sn Shona
sd Sindhi
si Sinhalese
sk Slovak
sl Slovenian
so Somali
es Spanish
es-419 Spanish (Latin American)
su Sundanese
sw Swahili
sv Swedish
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
ti Tigrinya
to Tonga
lua Tshiluba
tum Tumbuka
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese Download
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
zu Zulu
Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 1 00:00:00,540 --> 00:00:07,080 Hello and welcome to this new tutorial today in this to all we are going to define the detect function 2 00:00:07,380 --> 00:00:12,660 that will do the detections exactly like what we did for open city only this time it is not going to 3 00:00:12,660 --> 00:00:14,260 be based on open city. 4 00:00:14,370 --> 00:00:20,340 It's going to be based on deep learning because we're going to do that detection through the SSD model 5 00:00:20,430 --> 00:00:23,350 single shot multi-book detection. 6 00:00:23,400 --> 00:00:28,310 So excited to start because this time we're really taking things at the next level. 7 00:00:28,320 --> 00:00:29,710 All right so let's do this. 8 00:00:29,760 --> 00:00:35,100 The first thing very important to understand is that exactly like before we are going to do a frame 9 00:00:35,100 --> 00:00:42,180 by frame detection that is that the detect function that we're about to implement will work on single 10 00:00:42,270 --> 00:00:49,170 images it will not do the detection on the video directly it will do the detection on each single image 11 00:00:49,380 --> 00:00:56,940 of the video and then using some tricks with actually image IO we will manage to extract all the frames 12 00:00:57,030 --> 00:01:02,880 of the video apply the detect function on the frames and then reassemble the whole thing to make the 13 00:01:02,880 --> 00:01:07,320 video with the rectangles detecting the dog and the humans. 14 00:01:07,350 --> 00:01:11,530 And Carol Carol is actually going to be detected on just one frame you'll see. 15 00:01:11,580 --> 00:01:15,150 But anyway that's frame by frame detection. 16 00:01:15,360 --> 00:01:17,490 That's the first thing important to understand. 17 00:01:17,500 --> 00:01:20,430 And now let's start to implement that function. 18 00:01:20,460 --> 00:01:27,530 So as usual we start with that to define a new function that we need to give a name to these functions 19 00:01:27,540 --> 00:01:29,920 we are going to call it detect. 20 00:01:30,180 --> 00:01:31,090 Just like that. 21 00:01:31,140 --> 00:01:36,560 And now we need to specify the arguments that this function is going to take. 22 00:01:36,660 --> 00:01:39,530 So this function detector is going to take three arguments. 23 00:01:39,720 --> 00:01:46,770 The first one is the image on which the detect function is going to be applied to detect the object. 24 00:01:46,770 --> 00:01:50,970 That's our first argument and we're going to call it Freyne. 25 00:01:51,200 --> 00:02:00,360 Now as opposed to before with open C we only need to input as arguments the frame and not the great 26 00:02:00,480 --> 00:02:01,130 image. 27 00:02:01,140 --> 00:02:06,710 We only need to put the original image here the frame the original frame in color and not the grayscale 28 00:02:06,720 --> 00:02:12,490 because remember we had to do this before because Open City only works on great images. 29 00:02:12,690 --> 00:02:17,370 But here we're doing something totally different and therefore we don't have to take the black and white 30 00:02:17,370 --> 00:02:18,920 version of the frame. 31 00:02:18,960 --> 00:02:26,350 All right so that's our first argument then the second argument will be net which will be the SS The 32 00:02:26,400 --> 00:02:33,780 neural network the single shots multi-book detection neural network and then the third and last argument 33 00:02:33,900 --> 00:02:40,320 is transform because there is going to be some transformations applied to the image but not to put them 34 00:02:40,440 --> 00:02:46,970 in black and white just to make sure that the images are compatible with the new one that work you know 35 00:02:47,130 --> 00:02:51,630 the images will be the input of the neural network and therefore they have to have a certain format 36 00:02:51,870 --> 00:02:59,310 and this transform argument that is the final argument of this detect function will transform the images 37 00:02:59,400 --> 00:03:02,950 so that they have the right format to get into the new network. 38 00:03:02,970 --> 00:03:03,720 All right. 39 00:03:03,870 --> 00:03:06,950 And then it's important to understand what this function will do. 40 00:03:07,020 --> 00:03:13,880 So as you might have guessed it will do the detections on the images the single images one by one. 41 00:03:14,130 --> 00:03:17,660 But what exactly is this function going to return. 42 00:03:17,850 --> 00:03:25,350 Well it will simply return this same frame but with the rectangle detecting the objects the dog and 43 00:03:25,350 --> 00:03:31,370 the humans and not only will there be direct Englebert Also there will be the label on the rectangle 44 00:03:31,380 --> 00:03:37,560 So we'll see some rectangles with the label humans because there are several persons on the video and 45 00:03:37,650 --> 00:03:39,810 one rectangle with the label Doug. 46 00:03:39,990 --> 00:03:44,850 So there will be everything that will be totally clear and amazing detection on the video. 47 00:03:44,850 --> 00:03:47,490 I can't wait to show you this. 48 00:03:47,550 --> 00:03:48,090 All right. 49 00:03:48,090 --> 00:03:54,520 So there are three arguments and now we're ready to go inside the function to define what we wanted 50 00:03:54,540 --> 00:03:56,250 to do. 51 00:03:56,250 --> 00:04:01,710 All right so the first thing we have to do is to get the height and the weight of the image and the 52 00:04:01,710 --> 00:04:06,210 frame the frame that is the argument here on which the function is applied. 53 00:04:06,240 --> 00:04:08,790 So we're going to introduce two new variables. 54 00:04:08,880 --> 00:04:19,310 Hide and with right and to get these height and width of the frame we're going to do that very efficiently. 55 00:04:19,540 --> 00:04:26,370 We're going to get it from our frame obviously because this is some information specific to the frame. 56 00:04:26,620 --> 00:04:29,530 And then this frame has some attributes. 57 00:04:29,530 --> 00:04:35,910 One of them is shape and shape is actually an attribute that returns a vector of three elements. 58 00:04:35,950 --> 00:04:38,820 The first one is the height of the frame. 59 00:04:38,830 --> 00:04:45,790 The second one therefore of index One is the width of the frame and is there one of index 2 is the number 60 00:04:45,790 --> 00:04:46,410 of channels. 61 00:04:46,510 --> 00:04:51,850 So the number of channels means that if you have a black and white image you will have one channel and 62 00:04:51,860 --> 00:04:56,010 if you have a color image you will have three channels for red blue and green. 63 00:04:56,290 --> 00:04:59,400 But we just want the height and width. 64 00:04:59,560 --> 00:05:05,710 So we're going to get the first index which is zero corresponding to the height and the second index 65 00:05:05,920 --> 00:05:08,270 one corresponding to the width. 66 00:05:08,470 --> 00:05:14,920 But the correct way to do this is actually to type a colon here and then to because that means we're 67 00:05:14,920 --> 00:05:19,090 taking the range from zero to two but with two excluded. 68 00:05:19,130 --> 00:05:23,900 So we're just taking zero and 1 and therefore we're taking the hide and do it. 69 00:05:24,340 --> 00:05:24,750 All right. 70 00:05:24,790 --> 00:05:26,240 So that's the first thing we had to do. 71 00:05:26,350 --> 00:05:32,590 And now we're going to do several transformations to go from the original image which is our friend 72 00:05:32,590 --> 00:05:39,000 right now to a torch variable that will be accepted into the as is the neural network. 73 00:05:39,070 --> 00:05:44,200 So there is a series of transformations to do before getting to this torche viable. 74 00:05:44,200 --> 00:05:51,580 The first one is to apply the transform transformation to make sure that the image has the right format. 75 00:05:51,580 --> 00:05:54,990 That is the right dimensions and the right color values. 76 00:05:55,000 --> 00:05:57,070 That's the first transformation we need to make. 77 00:05:57,220 --> 00:06:04,770 Once we have done this transformation then we will need to convert this transform frame from a number 78 00:06:04,850 --> 00:06:10,250 array because it will still be an entire array from a number of array to a torch tensor. 79 00:06:10,300 --> 00:06:11,380 That's not the same. 80 00:06:11,380 --> 00:06:18,520 That sensor is a more advanced matrix A more advanced array and therefore that's exactly what is our 81 00:06:18,520 --> 00:06:19,910 second transformation. 82 00:06:19,930 --> 00:06:26,890 We convert the transform frame from an umpire array to a torch tensor then that's not all we will need 83 00:06:26,890 --> 00:06:33,890 to do a third transformation which will be to add a fake dimension to the torch sensor and that fig 84 00:06:33,890 --> 00:06:35,840 damage and will respond to the batch. 85 00:06:35,950 --> 00:06:42,510 And then finally the fourth and final transformation to do before it is ready to go into the new one 86 00:06:42,510 --> 00:06:46,840 that work will be to convert it into a torch variable. 87 00:06:46,840 --> 00:06:53,990 Remember this variable closets we imported here is a class that converts a torch sensor into a torch 88 00:06:54,040 --> 00:06:58,620 variable that contains both the tensor and a gradient. 89 00:06:58,740 --> 00:07:04,420 And this torch variable will then be an element of the dynamic graph which will allow us later to do 90 00:07:04,420 --> 00:07:09,560 some very fast and efficient computation of the gradients during backward propagation. 91 00:07:09,850 --> 00:07:12,490 So there we go we have four transformations to make. 92 00:07:12,490 --> 00:07:15,820 We'll make them in the next tutorial starting with the first one. 93 00:07:16,000 --> 00:07:21,310 And then finally we'll be able to feed the neural network with this image. 94 00:07:21,550 --> 00:07:22,610 So let's match this. 95 00:07:22,660 --> 00:07:24,760 And until then enjoy computer vision. 10799

Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.