1
00:00:09,040 --> 00:00:15,160
In this coding activity we are going to write the code for object detection on an image.
2
00:00:15,380 --> 00:00:26,530
We have an input image, a configuration file that describes parameters for every layer, trained weights and names
3
00:00:26,590 --> 00:00:27,540
for the COCO
4
00:00:27,540 --> 00:00:36,580
classes. We will use all of these with the OpenCV deep learning library and get detected objects on the resulting
5
00:00:36,580 --> 00:00:43,480
image. It is expected that you have basic knowledge of how YOLO version 3 works.
6
00:00:43,880 --> 00:00:52,070
But anyway, you can refresh your knowledge and find a brief description of the algorithm in the PDF attached
7
00:00:52,130 --> 00:01:01,890
to this lecture. You will also find a short description of the parameters used in the configuration file, and other
8
00:01:01,920 --> 00:01:02,820
useful links.
9
00:01:03,790 --> 00:01:10,090
Even if you don't know anything about YOLO, step by step, with a lot of practice in this course, you will
10
00:01:10,090 --> 00:01:17,300
get the principal idea. You can pause the video now, read the PDF for some time and come back whenever you are
11
00:01:17,310 --> 00:01:21,260
ready. Okay, let's jump to the code now.
12
00:01:35,040 --> 00:01:41,180
This is the file named yolo-3-image.py, and we will use it to detect objects on the given input
13
00:01:41,190 --> 00:01:44,760
image. The algorithm is as follows:
14
00:01:45,050 --> 00:01:48,600
Firstly, we read the input image and get from it
15
00:01:48,620 --> 00:01:50,530
a so-called blob.
16
00:01:50,680 --> 00:01:59,450
Then we load the YOLO version 3 network and run a forward pass with the blob. After that, we get bounding
17
00:01:59,450 --> 00:02:07,800
boxes and filter them with a technique called non-maximum suppression.
18
00:02:07,800 --> 00:02:16,660
Finally, we draw the found bounding boxes and labels on the original image. As a result, we will have an OpenCV window
19
00:02:17,320 --> 00:02:23,470
with the original image and labelled bounding boxes around the detected objects.
20
00:02:23,510 --> 00:02:24,610
Let's go through the code.
21
00:02:31,970 --> 00:02:35,110
We import the numpy and OpenCV libraries.
22
00:02:35,180 --> 00:02:41,990
Also, we will use the time library to measure the time spent on the forward pass. In this block
23
00:02:42,050 --> 00:02:50,750
we read the input image with an OpenCV function that gives us a BGR image in the form of a numpy array. We prepare
24
00:02:50,780 --> 00:03:00,790
an OpenCV window, giving it a name and specifying that it is resizable with the argument WINDOW_NORMAL.
25
00:03:00,790 --> 00:03:02,230
To show the original image
26
00:03:02,230 --> 00:03:05,380
we use the function imshow(), to which we pass
27
00:03:05,530 --> 00:03:08,470
the name of the window and the image itself.
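A minimal sketch of this step might look like the following; the image path and window title are placeholders, not necessarily the ones used in the lecture's file:

    import numpy as np
    import cv2
    import time

    # cv2.imread() returns the image as a numpy array in BGR channel order
    image_BGR = cv2.imread('images/test-image.jpg')  # placeholder path

    # A resizable window for the original image
    cv2.namedWindow('Original Image', cv2.WINDOW_NORMAL)
    cv2.imshow('Original Image', image_BGR)
    cv2.waitKey(0)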
28
00:03:11,470 --> 00:03:16,330
Also, we have some checkpoints here; let's uncomment them.
29
00:03:16,330 --> 00:03:19,690
This checkpoint shows us the shape of the input
30
00:03:19,690 --> 00:03:20,110
image.
31
00:03:28,300 --> 00:03:34,750
We prepare variables for the height and width of the input image that we are going to use later in the code.
32
00:03:35,990 --> 00:03:40,130
And below there is a checkpoint that shows this height and width.
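A sketch of that step, continuing the snippet above (variable names are illustrative):

    # The first two elements of the shape tuple are height and width
    h, w = image_BGR.shape[:2]

    # Checkpoints: full shape, then height and width
    print(image_BGR.shape)
    print('Height =', h, 'Width =', w)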
33
00:03:50,700 --> 00:03:54,610
Let's comment everything else and have a look at our input image.
34
00:03:54,840 --> 00:03:58,430
Just select all the lines of code below and press Ctrl+Slash.
35
00:04:24,140 --> 00:04:29,690
Let's run the code.
36
00:04:29,800 --> 00:04:31,390
Here is our input image
37
00:04:34,320 --> 00:04:41,490
and we can see the shape of the numpy array, with the height and width. Okay,
38
00:04:41,500 --> 00:04:45,570
let's continue. Let’s uncomment
39
00:04:45,640 --> 00:04:46,470
the next block.
40
00:05:08,490 --> 00:05:14,790
In this block we prepare a so-called blob from the image, which gives us a preprocessed image that we will feed to
41
00:05:14,790 --> 00:05:17,540
the network. To get the blob
42
00:05:17,550 --> 00:05:26,100
we use the OpenCV function blobFromImage(), which takes the image itself in the form of a numpy array, a scale factor
43
00:05:26,310 --> 00:05:34,250
with the help of which we normalize the image by dividing every element of the array by 255, the desired size
44
00:05:34,280 --> 00:05:44,430
of the output blob, and we specify that we don't need the image to be cropped. And one more argument
45
00:05:44,520 --> 00:05:47,350
is about swapping the Red and Blue channels;
46
00:05:47,640 --> 00:05:55,770
we set it to True, because we opened the input image with the OpenCV library, which gave us the image in BGR order
47
00:05:55,830 --> 00:06:04,140
of channels. This function returns a 4-dimensional tensor, which is the blob, whose shape we check
48
00:06:04,260 --> 00:06:07,970
with this checkpoint. Let's uncomment it.
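A sketch of the blob preparation, continuing the snippet above; the 416x416 input size is the standard YOLO v3 setting and an assumption here:

    # Normalize by 1/255, resize to 416x416, swap R and B, no cropping
    blob = cv2.dnn.blobFromImage(image_BGR, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)

    # Checkpoint: (number of images, channels, height, width)
    print(blob.shape)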
49
00:06:18,230 --> 00:06:26,660
We will see that the blob's shape consists of the number of images, which is one, the number of channels, and the size of
50
00:06:26,660 --> 00:06:34,810
the image. We can also show the blob in an OpenCV window with the next checkpoint.
51
00:06:34,820 --> 00:06:42,620
But firstly, we need to slice the blob to get only the needed numpy array of the image, and move the channels to
52
00:06:42,620 --> 00:06:51,840
the end with the transpose() method, where we specify the needed order. Let's uncomment this checkpoint and have
53
00:06:51,840 --> 00:06:53,040
a look at the blob.
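A sketch of slicing and transposing the blob for display, under the same assumptions:

    # Take the single image out of the batch and move channels to the end
    blob_to_show = blob[0, :, :, :].transpose(1, 2, 0)

    cv2.namedWindow('Blob Image', cv2.WINDOW_NORMAL)
    cv2.imshow('Blob Image', blob_to_show)
    cv2.waitKey(0)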
54
00:07:04,890 --> 00:07:08,190
Let's run the code.
55
00:07:08,400 --> 00:07:15,350
Here is our input image, and here is the blob from the input image.
56
00:07:15,350 --> 00:07:15,850
Good,
57
00:07:15,950 --> 00:07:19,440
let's continue. Let’s uncomment
58
00:07:19,480 --> 00:07:20,320
the next block.
59
00:07:46,670 --> 00:07:50,100
Now we load the YOLO version 3 network.
60
00:07:50,420 --> 00:07:57,980
We read the file with the names of the COCO classes and put them into a list that we are going to use when drawing
61
00:07:57,980 --> 00:08:03,520
bounding boxes around the detected objects. With this checkpoint
62
00:08:03,570 --> 00:08:06,990
we can see all 80 classes in the list.
63
00:08:07,190 --> 00:08:21,740
Let's uncomment it.
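A sketch of loading the class names; the file path is a placeholder:

    # Read the 80 COCO class names, one per line
    with open('coco.names') as f:
        labels = [line.strip() for line in f]

    print(labels)  # checkpoint: the list of all 80 classes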
64
00:08:21,950 --> 00:08:30,030
Now we load our trained YOLO v3 network with the OpenCV deep learning library. We use the function
65
00:08:30,120 --> 00:08:43,220
readNetFromDarknet(), specifying the path to the configuration file and to the trained weights. After we load our YOLO
66
00:08:43,220 --> 00:08:54,680
version 3 network, we need to get only the output layers' names: yolo_82, yolo_94 and yolo_106, because we are
67
00:08:54,680 --> 00:09:00,340
going to use them later to get the response from the forward pass.
68
00:09:00,490 --> 00:09:07,990
That's why we firstly get all the layers' names with the function getLayerNames(), and we can use a checkpoint to
69
00:09:07,990 --> 00:09:13,060
print all the layers' names inside YOLO version 3. Let's uncomment
70
00:09:13,070 --> 00:09:13,350
it.
71
00:09:21,200 --> 00:09:31,980
And then we get only the output layers' names with the function getUnconnectedOutLayers() and put them into
72
00:09:31,980 --> 00:09:32,520
the list.
73
00:09:35,470 --> 00:09:40,030
With this checkpoint we can see the resulting list. Let's uncomment it.
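A sketch of loading the network and collecting the output layers' names; the file paths are placeholders:

    # Load the trained Darknet model
    network = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')

    # All layers' names
    layers_names_all = network.getLayerNames()

    # getUnconnectedOutLayers() returns 1-based indices of the output layers
    layers_names_output = [layers_names_all[i - 1]
                           for i in network.getUnconnectedOutLayers().flatten()]

    print(layers_names_output)  # checkpoint: ['yolo_82', 'yolo_94', 'yolo_106']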
74
00:09:46,570 --> 00:09:54,130
Also, in this block we set a minimum probability to threshold with, in order to eliminate all weak predictions.
75
00:09:55,220 --> 00:10:03,440
And we set a threshold for filtering weak bounding boxes with the non-maximum suppression technique. Later, when
76
00:10:03,440 --> 00:10:10,670
you use this code for your own purposes, you can tweak these two parameters to find better performance.
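A sketch of the two parameters; 0.5 and 0.3 are common starting values, assumed here rather than taken from the lecture:

    probability_minimum = 0.5  # confidence cut-off for weak predictions
    threshold = 0.3            # overlap threshold for non-maximum suppression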
77
00:10:17,860 --> 00:10:22,500
And finally for this block, we generate colours for the bounding boxes.
78
00:10:22,870 --> 00:10:32,150
We use the numpy function for generating random integers, specifying the low and high boundaries, and the size, which is 80
79
00:10:32,170 --> 00:10:39,610
classes, each with three numbers for the Red, Green and Blue channels.
80
00:10:39,790 --> 00:10:46,710
With this checkpoint we can see the shape of the generated numpy array and have a look at the colour numbers
81
00:10:46,810 --> 00:10:48,220
for the first class.
82
00:10:48,480 --> 00:10:51,030
Let's uncomment this checkpoint and have a look.
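A sketch of the colour generation, continuing the snippets above:

    # One random colour triple per class, values in [0, 255)
    colours = np.random.randint(0, 255, size=(len(labels), 3), dtype='uint8')

    print(colours.shape)  # checkpoint: (80, 3)
    print(colours[0])     # checkpoint: colour numbers for the first class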
83
00:10:56,970 --> 00:10:59,360
Let's run the code.
84
00:10:59,400 --> 00:11:02,220
Here is our input image,
85
00:11:02,230 --> 00:11:21,560
here is the blob from the input image.
86
00:11:21,570 --> 00:11:25,540
Here are our class names. Here
87
00:11:25,580 --> 00:11:38,500
are all of our layers' names. Here are our output layers' names. And here are the generated colour numbers for the first
88
00:11:38,500 --> 00:11:39,000
class.
89
00:11:41,230 --> 00:11:41,940
Great,
90
00:11:41,980 --> 00:11:42,670
let's continue.
91
00:11:45,860 --> 00:11:46,730
Let’s uncomment
92
00:11:46,730 --> 00:11:47,450
the next block.
93
00:11:57,150 --> 00:12:00,890
We have prepared everything for implementing the forward pass.
94
00:12:00,930 --> 00:12:02,490
Let's do it.
95
00:12:02,490 --> 00:12:06,390
We use the function setInput() to set as input
96
00:12:06,390 --> 00:12:16,180
our blob. And we use the function forward() to get output results only for the specified layers that we prepared
97
00:12:16,300 --> 00:12:16,690
earlier.
98
00:12:19,390 --> 00:12:29,980
Also, we measure the time spent on the forward pass with the time library, recording a start point and an end point.
99
00:12:30,200 --> 00:12:32,600
Here we print the resulting time in seconds.
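A sketch of the forward pass with timing, assuming the names from the earlier snippets:

    # Feed the blob in and run the network on the three output layers only
    network.setInput(blob)
    start = time.time()
    output_from_network = network.forward(layers_names_output)
    end = time.time()

    print('Forward pass took {:.5f} seconds'.format(end - start))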
100
00:12:35,560 --> 00:12:36,760
Let's have a look
101
00:12:36,910 --> 00:12:38,360
but first let's comment all the
102
00:12:38,380 --> 00:12:39,790
previous checkpoints back out.
103
00:13:57,670 --> 00:13:58,100
Okay,
104
00:13:58,160 --> 00:14:02,780
let's run the code.
105
00:14:02,920 --> 00:14:11,700
Here is our input image. And here is the time spent on the forward pass to detect objects. Your time will be
106
00:14:11,700 --> 00:14:15,680
different, as it depends on the machine you use.
107
00:14:15,680 --> 00:14:18,630
Great, let's continue. Let’s uncomment
108
00:14:18,650 --> 00:14:19,490
the next block.
109
00:14:41,010 --> 00:14:46,920
After we get the response from the output layers, we can collect all detected objects and the corresponding bounding
110
00:14:46,920 --> 00:14:51,900
boxes. Firstly, we prepare a list for collecting bounding boxes,
111
00:14:53,310 --> 00:15:04,650
a list for the corresponding confidences and a list for class numbers. Then we go through the output layers and through
112
00:15:04,710 --> 00:15:07,050
all detections in every layer.
113
00:15:09,340 --> 00:15:16,570
In the first for loop we iterate over the output layers, and in the second for loop we iterate over every detection.
114
00:15:19,100 --> 00:15:27,440
Every detected object is given in the form of a numpy array, where the first four numbers represent
115
00:15:27,440 --> 00:15:35,810
the coordinates of the bounding box, normalized to the real width and height of the original image, and the last 80
116
00:15:35,830 --> 00:15:43,710
numbers represent probabilities for every class that this bounding box might belong to. In the variable
117
00:15:43,710 --> 00:15:52,540
scores we collect all 80 probabilities for the current detected object. Then, in the variable class_current we
118
00:15:52,540 --> 00:16:01,270
write the index of the class with the highest probability, and in the variable confidence_current we write the value of the highest
119
00:16:01,270 --> 00:16:03,850
probability for the current detected object.
120
00:16:15,620 --> 00:16:24,560
Then we eliminate weak predictions by checking whether the current confidence is higher than the minimum probability
121
00:16:24,680 --> 00:16:26,120
that we set earlier.
122
00:16:27,180 --> 00:16:36,410
If it is higher, then we take the first four numbers from the detected object, where the first two are the centre point
123
00:16:36,500 --> 00:16:45,870
of the detected object and the last two are its width and height.
These four numbers are normalized by the width and height
124
00:16:45,960 --> 00:16:47,290
of the original image,
125
00:16:47,610 --> 00:16:55,260
that's why we can just multiply them elementwise in order to scale the bounding box up and fit it to the original
126
00:16:55,320 --> 00:17:04,110
image size. We elementwise multiply the sliced numpy array of four numbers by another numpy array,
127
00:17:04,380 --> 00:17:15,270
also with four numbers. As a result, we get the centre point of the detected object in the original big image and
128
00:17:15,330 --> 00:17:24,290
its real width and height. In order to draw the bounding box with OpenCV, we need to calculate the
129
00:17:24,320 --> 00:17:34,550
top left point. We have the centre point, width and height, and we calculate the top left corner by subtracting
130
00:17:34,910 --> 00:17:38,940
half of the width and half of the height.
131
00:17:41,460 --> 00:17:52,610
Finally, we add the current detected object to the prepared lists: the bounding box's top left corner, width and height,
132
00:17:56,800 --> 00:18:07,460
the maximum confidence, and the class number, which is one out of 80 (see the sketch below). Good, let's continue. Let's uncomment
133
00:18:07,480 --> 00:18:08,250
the next block.
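Here is a sketch of the whole collection loop we just walked through, assuming the variable names from the earlier snippets:

    # Lists for boxes, confidences and class numbers
    bounding_boxes = []
    confidences = []
    class_numbers = []

    for result in output_from_network:
        for detected_objects in result:
            # Each detection row holds 4 box numbers, an objectness score,
            # and 80 class probabilities; the class scores start at index 5
            scores = detected_objects[5:]
            class_current = np.argmax(scores)
            confidence_current = scores[class_current]

            # Eliminate weak predictions
            if confidence_current > probability_minimum:
                # Scale the normalized box up to the original image size
                box_current = detected_objects[0:4] * np.array([w, h, w, h])
                x_center, y_center, box_width, box_height = box_current

                # Top left corner: centre minus half of width and height
                x_min = int(x_center - (box_width / 2))
                y_min = int(y_center - (box_height / 2))

                bounding_boxes.append([x_min, y_min,
                                       int(box_width), int(box_height)])
                confidences.append(float(confidence_current))
                class_numbers.append(class_current)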
134
00:18:22,560 --> 00:18:28,750
When we eliminate weak predictions by minimum probability, we get all bounding boxes with confidences
135
00:18:28,750 --> 00:18:36,630
above this number. But it can happen that some of the bounding boxes overlap each other. Which one should we choose
136
00:18:36,630 --> 00:18:43,670
then? In order to answer this question, we use the so-called non-maximum suppression technique, which filters out
137
00:18:43,750 --> 00:18:51,030
unneeded bounding boxes if their corresponding confidences are low or there is another bounding box
138
00:18:51,180 --> 00:19:00,920
for the same region with higher confidence. We use the function NMSBoxes() and pass as arguments the bounding
139
00:19:00,920 --> 00:19:10,500
boxes, their confidences, and the minimum probability and threshold that we defined earlier.
140
00:19:11,690 --> 00:19:13,850
As a result, in the results variable
141
00:19:16,600 --> 00:19:20,800
we get the indices of the final bounding boxes.
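A sketch of this call, under the same assumptions:

    # Non-maximum suppression keeps the indices of the boxes that survive
    results = cv2.dnn.NMSBoxes(bounding_boxes, confidences,
                               probability_minimum, threshold)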
142
00:19:20,930 --> 00:19:21,410
Great,
143
00:19:21,410 --> 00:19:22,550
let's continue.
144
00:19:22,550 --> 00:19:23,290
Let’s uncomment
145
00:19:23,340 --> 00:19:24,080
the final block.
146
00:19:46,400 --> 00:19:48,380
We've got the needed bounding boxes
147
00:19:48,410 --> 00:19:51,880
and now we can draw them and label them.
148
00:19:51,920 --> 00:19:59,120
We check whether any objects are left after filtering by non-maximum suppression and go through
149
00:19:59,180 --> 00:20:03,140
the indexes. These indexes
150
00:20:03,170 --> 00:20:09,890
we use to access the needed bounding boxes, corresponding class numbers and confidences in the lists
151
00:20:10,010 --> 00:20:16,080
that we defined and filled earlier, before non-maximum suppression.
152
00:20:16,140 --> 00:20:19,550
Also, we define a counter for detected objects,
153
00:20:19,650 --> 00:20:26,800
and here we print the class number for the detected object.
154
00:20:26,810 --> 00:20:32,620
Now we get the coordinates of the current bounding box, and its width and height.
155
00:20:32,750 --> 00:20:39,240
Then we prepare the colour for the current class, which we generated earlier in the form of a numpy array, and convert
156
00:20:39,240 --> 00:20:45,890
it into a list with the method tolist() in order to use it with the OpenCV function that draws the bounding box.
157
00:20:47,270 --> 00:20:48,470
With this checkpoint
158
00:20:48,500 --> 00:21:03,080
we can check that it was converted from a numpy array to the list type.
159
00:21:03,130 --> 00:21:10,720
Finally, we can draw the bounding box with the function rectangle(), where we pass the original image, the coordinate of the
160
00:21:10,840 --> 00:21:19,810
left top corner, and the coordinate of the right bottom corner, which we calculate by adding the width and
161
00:21:19,810 --> 00:21:30,530
height of the bounding box to the left top corner. We also pass the colour for the current class and the thickness of the line for the bounding box.
162
00:21:30,760 --> 00:21:39,460
Here we prepare the text to show above the bounding box, with the label and confidence. We start drawing the text a little
163
00:21:39,460 --> 00:21:46,810
bit above the left top corner of the bounding box with the function putText(), where we also pass the original image,
164
00:21:47,470 --> 00:22:00,100
the prepared text, the starting point, the font style, the font size, and the colour, which is the same as the colour of the current bounding
165
00:22:00,110 --> 00:22:13,300
box, and the thickness. And we also print the number of detected objects before and after non-maximum suppression.
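A sketch of the drawing loop, assuming the names from the earlier snippets:

    # Draw every box that survived non-maximum suppression
    if len(results) > 0:
        counter = 1
        for i in results.flatten():
            print('Object {0}: {1}'.format(counter, labels[int(class_numbers[i])]))
            counter += 1

            x_min, y_min, box_width, box_height = bounding_boxes[i]

            # Colour for this class as a plain Python list
            colour_box_current = colours[class_numbers[i]].tolist()

            cv2.rectangle(image_BGR, (x_min, y_min),
                          (x_min + box_width, y_min + box_height),
                          colour_box_current, 2)

            # Label with class name and confidence, a little above the corner
            text_box_current = '{}: {:.4f}'.format(labels[int(class_numbers[i])],
                                                   confidences[i])
            cv2.putText(image_BGR, text_box_current, (x_min, y_min - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, colour_box_current, 2)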
166
00:22:16,330 --> 00:22:23,080
Okay, we are ready to show the final results in the window with the original image and labelled bounding boxes
167
00:22:23,170 --> 00:22:36,190
on it. Let's do it. Let's run the code. Here is our input image, and here is the resulting image. Here is the time
168
00:22:36,280 --> 00:22:47,080
spent on the forward pass to detect objects, the list of objects with their labels, and a comparison of the
169
00:22:47,080 --> 00:22:52,700
total number of objects detected before non-maximum suppression and left after filtering
170
00:22:52,750 --> 00:22:53,860
with this technique.
171
00:22:59,930 --> 00:23:08,440
At the end of the file you can find a section with useful comments. Well done! We detected objects on
172
00:23:08,440 --> 00:23:16,990
this image. How much time did the forward pass take on your computer? Share your results on the Question
173
00:23:17,050 --> 00:23:19,230
and Answer board!
174
00:23:19,280 --> 00:23:24,160
Let's now move to the activity where you will detect objects on a given image.