007 Objects Detection on Image with YOLO v3 and OpenCV (English transcript)

In this coding activity we are going to write the code for object detection on an image. We have an input image, a configuration file that describes the parameters for every layer, trained weights, and the names of the COCO classes. We will use all of these with the OpenCV deep learning library and get detected objects on the resulting image. It is expected that you have basic knowledge of how YOLO version 3 works, but in any case you can refresh your knowledge and find a brief description of the algorithm in the PDF attached to this lecture. There you will also find a short description of the parameters used in the configuration file, and other useful links.

Even if you don't know anything about YOLO, step by step, with a lot of practice in this course, you will get the principal idea. You can pause the video now, read the PDF for some time, and come back whenever you are ready. Okay, let's jump to the code now.

This is the file named yolo-3-image.py, and we will use it to detect objects on a given input image. The algorithm is as follows. Firstly, we read the input image and get from it a so-called blob. Then we load the YOLO version 3 network and run a forward pass with the blob. After that, we get bounding boxes and filter them with a technique called non-maximum suppression. Finally, we draw the found bounding boxes and labels on the original image. As a result, we will have an OpenCV window with the original image and labelled bounding boxes around the detected objects. Let's go through the code.

We import the numpy and OpenCV libraries. We will also use the time library to measure the time spent on the forward pass. In this block we read the input image with an OpenCV function that gives us a BGR image in the form of a numpy array. We prepare an OpenCV window, giving it a name and specifying that it is resizable with the WINDOW_NORMAL flag. To show the original image we use the function imshow(), to which we pass the name of the window and the image itself.

We also have checkpoints here; let's uncomment them. This checkpoint shows us the shape of the input image. Next we prepare variables for the height and width of the input image, which we are going to use later in the code, and below there is a checkpoint that prints this height and width.

Let's comment out everything else and have a look at our input image: just select all the lines of code below and press Ctrl+Slash. Let's run the code.
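If you want to follow along outside the lecture files, here is a minimal sketch of this first block. The image path 'images/input.jpg' and the window name are placeholder assumptions, not files or names taken from the lecture:

    # Minimal sketch of the first block (the image path is an assumed placeholder)
    import time
    import numpy as np
    import cv2

    image_BGR = cv2.imread('images/input.jpg')            # BGR numpy array
    cv2.namedWindow('Original Image', cv2.WINDOW_NORMAL)  # resizable window
    cv2.imshow('Original Image', image_BGR)
    cv2.waitKey(0)                                        # wait for a key press

    print(image_BGR.shape)      # checkpoint: (height, width, channels)
    h, w = image_BGR.shape[:2]  # height and width for later scaling
    print(h, w)                 # checkpoint: height and width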
Here is our input image, and we can see the shape of the numpy array with its height and width. Okay, let's continue and uncomment the next block.

In this block we prepare the so-called blob from the image, which gives us a preprocessed image that we will feed to the network. To get the blob we use the OpenCV function blobFromImage(). It takes the image itself in the form of a numpy array; a scale factor, with the help of which we normalize the image by dividing every element of the array by 255; the desired size of the output blob; and a flag specifying that we don't need the image to be cropped. There is one more argument for swapping the Red and Blue channels, and we set it to True, because we opened the input image with the OpenCV library, which gave us the image in BGR channel order. This function returns a 4-dimensional tensor, which is the blob, and we check its shape with this checkpoint. Let's uncomment it.

We will see that the blob's shape contains the number of images (which is one), the number of channels, and the size of the image. We can also show the blob in an OpenCV window with the next checkpoint, but first we need to slice the blob to get only the needed numpy array of the image and move the channels to the end with the transpose() method, where we specify the needed order of axes. Let's uncomment this checkpoint and have a look at the blob.

Let's run the code. Here is our input image, and here is our blob from the input image. Good, let's continue and uncomment the next block.

Now we load the YOLO version 3 network. We read the file with the names of the COCO classes and put them into a list that we are going to use when drawing bounding boxes around detected objects. With this checkpoint we can see all 80 classes in the list. Let's uncomment it.

Next we load our trained YOLO v3 network with the OpenCV deep learning library. We use the function readNetFromDarknet(), specifying the path to the configuration file and to the trained weights. After we load the network we need to get only the output layers' names (yolo_82, yolo_94 and yolo_106), because we are going to use them later to get the response from the forward pass. That's why we first get all the layers' names with the function getLayerNames(), and we can use a checkpoint to print all the layers' names inside YOLO version 3. Let's uncomment it. Then we get only the output layers' names with the function getUnconnectedOutLayers() and put them into a list.
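Continuing the sketch, the blob and network-loading blocks might look roughly like this. The file names yolov3.cfg, yolov3.weights and coco.names are assumed conventional Darknet names, and 416x416 is the usual YOLO v3 input size; adjust both to your own setup:

    # Blob: normalized, resized, channels swapped from BGR to RGB
    blob = cv2.dnn.blobFromImage(image_BGR, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    print(blob.shape)  # checkpoint: (1, 3, 416, 416)

    # Slice out the image and move channels to the end to display the blob
    blob_to_show = blob[0, :, :, :].transpose(1, 2, 0)

    # COCO class names, one per line ('coco.names' is an assumed path)
    with open('coco.names') as f:
        labels = [line.strip() for line in f]

    # Load the Darknet configuration and trained weights (assumed paths)
    network = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
    layers_all = network.getLayerNames()
    # Keep only 'yolo_82', 'yolo_94', 'yolo_106'; on older OpenCV versions
    # the indexes come nested, so you may need layers_all[i[0] - 1]
    layers_output = [layers_all[i - 1] for i in network.getUnconnectedOutLayers()]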
With this checkpoint we can see the resulting list; let's uncomment it.

Also in this block we set the minimum probability threshold in order to eliminate all weak predictions, and we set the threshold used to filter weak bounding boxes with the non-maximum suppression technique. Later, when you use this code for your own purposes, you can tweak these two parameters for better performance.

And finally for this block, we generate colours for the bounding boxes. We use the numpy function for generating random integers, specifying the low and high boundaries and a size of 80 classes by three numbers, for the Red, Green and Blue channels. With this checkpoint we can see the shape of the generated numpy array and have a look at the colour numbers for the first class. Let's uncomment this checkpoint and have a look.

Let's run the code. Here is our input image, and here is our blob from the input image. Here are our class names, here are all the layers' names, here are the output layers' names, and here are the generated colour numbers for the first class. Great, let's continue and uncomment the next block.

We have prepared everything for running the forward pass, so let's do it. We use the function setInput() to set our blob as the input, and we use the function forward() to get output results only for the specified layers that we prepared earlier. We also measure the time spent on the forward pass with the time library, recording a start point and an end point, and here we print the resulting time in seconds.

Let's have a look, but first let's comment back all the previous checkpoints. Okay, let's run the code. Here is our input image, and here is the time spent on the forward pass to detect objects. Your time will be different, since it depends on the machine you use. Great, let's continue and uncomment the next block.

After we get the response from the output layers, we can collect all detected objects and their corresponding bounding boxes. Firstly, we prepare a list for collecting bounding boxes, a list for the corresponding confidences, and a list for the class numbers. Then we go through the output layers and through all the detections in every layer: in the first for loop we iterate over the output layers, and in the second for loop we iterate over every detection.
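A sketch of the threshold, colour and forward-pass blocks, continuing from the snippets above. The 0.5 and 0.3 values are typical starting points rather than fixed requirements:

    # Thresholds to tweak for your own images (assumed typical values)
    probability_minimum = 0.5   # minimum confidence to keep a prediction
    threshold = 0.3             # non-maximum suppression threshold

    # One random RGB colour per class: shape (80, 3)
    colours = np.random.randint(0, 255, size=(len(labels), 3), dtype='uint8')
    print(colours.shape, colours[0])  # checkpoint: shape and first colour

    # Forward pass through the three output layers, with timing
    network.setInput(blob)
    start = time.time()
    output_from_network = network.forward(layers_output)
    end = time.time()
    print('Forward pass took {:.5f} seconds'.format(end - start))

    # Lists to fill while iterating over the detections
    bounding_boxes = []
    confidences = []
    class_numbers = []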
Every detected object is given in the form of a numpy array, where the first four numbers represent the coordinates of the bounding box, normalized to the real width and height of the original image, the fifth number is an objectness score, and the remaining 80 numbers represent the probabilities for every class that this bounding box might belong to. In the variable scores we collect all 80 probabilities for the current detected object. Then, in the variable class_current we store the index of the class with the highest probability, and in the variable confidence_current we store the value of that highest probability for the current detected object.

Next we eliminate weak predictions by checking whether the current confidence is higher than the minimum probability that we set earlier. If it is higher, we take the first four numbers of the detected object, where the first two are the centre point of the detected object and the last two are its width and height. These four numbers are normalized by the width and height of the original image; that's why we can simply multiply them elementwise in order to scale the bounding box up and fit it to the original image size. We multiply the sliced numpy array of four numbers elementwise by another numpy array, also with four numbers. As a result, we get the centre point of the detected object in the original, full-size image, together with its real width and height. In order to draw the bounding box with OpenCV, we need the top-left point; since we have the centre point, width and height, we calculate the top-left corner by subtracting half of the width and half of the height. Finally, we add the current detected object to the prepared lists: the bounding box as top-left corner, width and height; the maximum confidence; and the class number, which is one out of 80. Good, let's continue and uncomment the next block.

When we eliminate weak predictions by minimum probability, we get all bounding boxes with confidences above that number. But it can happen that some of the bounding boxes overlap each other: which one should we choose then? To answer this question we use the so-called non-maximum suppression technique, which filters out unneeded bounding boxes whose corresponding confidences are low or for which there is another bounding box covering the same region with a higher confidence. We use the function NMSBoxes() and pass as arguments the bounding boxes, their confidences, and the minimum probability and threshold that we defined earlier. As a result, in the results variable we get the indexes of the final bounding boxes. Great, let's continue and uncomment the final block.
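The detection loop and the non-maximum suppression call might look like this; the variable names follow the walkthrough's wording and are assumptions about the actual lecture file:

    # Iterate over the three output layers, then over every detection
    for result in output_from_network:
        for detected_objects in result:
            # Elements 0-3: box, element 4: objectness, elements 5-84: classes
            scores = detected_objects[5:]
            class_current = np.argmax(scores)
            confidence_current = scores[class_current]

            # Keep only predictions above the minimum probability
            if confidence_current > probability_minimum:
                # Coordinates are normalized, so scale them elementwise
                box_current = detected_objects[0:4] * np.array([w, h, w, h])
                x_center, y_center, box_width, box_height = box_current
                # Top-left corner for the OpenCV drawing functions
                x_min = int(x_center - (box_width / 2))
                y_min = int(y_center - (box_height / 2))

                bounding_boxes.append([x_min, y_min,
                                       int(box_width), int(box_height)])
                confidences.append(float(confidence_current))
                class_numbers.append(class_current)

    # Non-maximum suppression returns the indexes of the boxes to keep
    results = cv2.dnn.NMSBoxes(bounding_boxes, confidences,
                               probability_minimum, threshold)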
We've got the needed bounding boxes, and now we can draw and label them. We check whether any objects are left after filtering by non-maximum suppression, and then go through the indexes. We use these indexes to access the needed bounding boxes and the corresponding class numbers and confidences in the lists that we defined and filled earlier, before non-maximum suppression. We also define a counter for detected objects, and here we print the class number for every detected object.

Now we get the coordinates of the current bounding box together with its width and height. Then we prepare the colour for the current class, which we generated earlier in the form of a numpy array, and convert it into a list with the method tolist() in order to use it with the OpenCV drawing function. With this checkpoint we can check that it was converted from a numpy array to the list type.

Finally, we can draw the bounding box with the function rectangle(), to which we pass the original image, the coordinate of the top-left corner, the coordinate of the bottom-right corner (which we calculate by adding the width and height of the bounding box to the top-left corner), the colour for the current class, and the thickness of the line of the bounding box. Here we prepare the text to show above the bounding box, with the label and the confidence. We start drawing the text a little bit above the top-left corner of the bounding box with the function putText(), to which we also pass the original image, the prepared text, the starting point, the font style, the font size, the colour (the same as the colour of the current bounding box) and the thickness. And we also print the number of detected objects before and after non-maximum suppression.

Okay, we are ready to show the final results in a window with the original image and the labelled bounding boxes on it. Let's do it. Let's run the code. Here is our input image, and here is the resulting image. Here is the time spent on the forward pass to detect objects, the list of objects with their labels, and the comparison of the total number of objects detected before non-maximum suppression with the number left after filtering with this technique.

At the end of the file you can find a section with useful comments. Well done! We detected objects on this image. How much time did the forward pass take on your computer? Share your results on the Question and Answer board! Let's now move on to the activity, where you will detect objects on a given image.
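For reference, here is a sketch of that final drawing block, consistent with the snippets above. Note that on some OpenCV versions NMSBoxes() returns nested indexes, which the flatten() call smooths over:

    # Draw and label every box that survived non-maximum suppression
    counter = 1
    if len(results) > 0:
        for i in results.flatten():
            print('Object {0}: {1}'.format(counter,
                                           labels[int(class_numbers[i])]))
            counter += 1

            x_min, y_min, box_width, box_height = bounding_boxes[i]
            # Convert the numpy colour row to a plain Python list
            colour_box_current = colours[class_numbers[i]].tolist()

            cv2.rectangle(image_BGR, (x_min, y_min),
                          (x_min + box_width, y_min + box_height),
                          colour_box_current, 2)

            # Label with class name and confidence, slightly above the box
            text_box_current = '{}: {:.4f}'.format(
                labels[int(class_numbers[i])], confidences[i])
            cv2.putText(image_BGR, text_box_current, (x_min, y_min - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, colour_box_current, 2)

    print('Objects before non-maximum suppression: {}'.format(len(bounding_boxes)))
    print('Objects left after non-maximum suppression: {}'.format(counter - 1))

    cv2.namedWindow('Detections', cv2.WINDOW_NORMAL)
    cv2.imshow('Detections', image_BGR)
    cv2.waitKey(0)
    cv2.destroyAllWindows()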
