1
00:00:00,330 --> 00:00:02,950
Hello and welcome to this new tutorial.
2
00:00:02,950 --> 00:00:05,800
All right now things are going to get slightly more difficult.
3
00:00:05,850 --> 00:00:08,780
We're going to get into the heart of the SSD model.
4
00:00:08,790 --> 00:00:10,820
So make sure you understand this model first.
5
00:00:10,830 --> 00:00:14,550
I highly recommend to watch Carol's intuition lectures first.
6
00:00:14,550 --> 00:00:19,850
So if that's the case and you're ready to go and you're all fresh let's tackle this.
7
00:00:20,130 --> 00:00:28,070
So first of all let's recall the context: we got the output of the SSD neural network, which is y.
8
00:00:28,200 --> 00:00:35,250
And then from this output y we extracted the important information that we need, and we extracted it
9
00:00:35,460 --> 00:00:43,530
in this detections variable, which is a tensor, by taking the data of y, which corresponds to the tensor
10
00:00:43,530 --> 00:00:45,380
part of the torch variable.
11
00:00:45,480 --> 00:00:51,390
Remember, the torch Variable is composed of two elements: a torch tensor and a gradient.
12
00:00:51,450 --> 00:00:56,650
And by taking the data attribute of the output y, which is a Variable,
13
00:00:56,820 --> 00:01:01,820
well, we get the first element of this torch Variable, which is the torch tensor.
14
00:01:01,980 --> 00:01:05,900
And now we're going to see what this tensor contains exactly.
15
00:01:05,910 --> 00:01:07,950
So what is this information exactly.
16
00:01:08,010 --> 00:01:11,890
What is this detections tensor and what does it contain.
17
00:01:12,150 --> 00:01:19,830
So this detections tensor, I'm going to write a comment here to make sure everybody understands, this detections
18
00:01:19,920 --> 00:01:23,790
tensor right now contains four elements.
19
00:01:23,790 --> 00:01:25,940
So I'm going to put them into brackets.
20
00:01:26,130 --> 00:01:34,350
The first element is the batch, because we created this fake dimension of the batch with this unsqueeze
21
00:01:34,470 --> 00:01:35,680
function here.
22
00:01:35,700 --> 00:01:39,940
So the first element of this detections tensor is this batch.
23
00:01:39,960 --> 00:01:45,150
So I know I told you about a batch of inputs but that's the same for the output.
24
00:01:45,150 --> 00:01:49,600
We also have a batch of outputs associated with the same batch of inputs.
25
00:01:49,620 --> 00:01:52,470
That is the batch of inputs contains several inputs.
26
00:01:52,620 --> 00:01:57,900
From these inputs we get the outputs, and these outputs are all in a batch, and that's what this first
27
00:01:57,990 --> 00:02:04,490
element of detections is. I can write detections here: detections equals.
28
00:02:04,500 --> 00:02:04,800
All right.
29
00:02:04,800 --> 00:02:06,740
So first element: the batch.
30
00:02:06,780 --> 00:02:12,300
Now the second element is the number of classes.
31
00:02:12,310 --> 00:02:13,790
So what do I mean by classes.
32
00:02:13,900 --> 00:02:17,180
I simply mean the objects that can be detected.
33
00:02:17,350 --> 00:02:19,830
So for example one class will be the dog.
34
00:02:19,960 --> 00:02:21,520
Another class will be the plane.
35
00:02:21,520 --> 00:02:24,630
Another class will be the boat, another class will be the car.
36
00:02:24,820 --> 00:02:29,300
So each class corresponds to each object that can be detected.
37
00:02:29,500 --> 00:02:35,830
And the second element of this detections tensor is the number of classes, that is the number of objects
38
00:02:36,250 --> 00:02:39,000
that can be detected in the input image.
39
00:02:39,010 --> 00:02:45,630
So, number of classes. And now the third element is the number of occurrences of the class.
40
00:02:45,670 --> 00:02:50,970
I'm going to write it down: number of occurrences of the class.
41
00:02:51,010 --> 00:02:52,170
So what does it mean.
42
00:02:52,330 --> 00:02:58,270
Well, it means that, for example, for class number two, which let's say corresponds
43
00:02:58,270 --> 00:02:59,280
to the dog.
44
00:02:59,500 --> 00:03:04,990
Well, the number of occurrences will be the number of occurrences of the dog.
45
00:03:05,170 --> 00:03:10,820
So in the video of the funny dog that we watched in the first tutorial of this module there was one dog.
46
00:03:11,050 --> 00:03:14,250
But imagine there are several dogs to detect in the video.
47
00:03:14,260 --> 00:03:15,590
Let's say there are two dogs.
48
00:03:15,760 --> 00:03:19,420
Well, we would have two occurrences.
49
00:03:19,450 --> 00:03:24,040
The first occurrence of the dog, that is the first dog in the video, and the second occurrence of the dog,
50
00:03:24,170 --> 00:03:26,070
corresponding to the second dog in the video.
51
00:03:26,230 --> 00:03:28,970
So we would have occurrence zero and occurrence one.
52
00:03:29,090 --> 00:03:30,750
But in the video we only have one dog.
53
00:03:30,880 --> 00:03:35,080
So we will only have one occurrence.
54
00:03:35,650 --> 00:03:43,010
All right, so that's the third element of this detections tensor, and the fourth element will be a tuple.
55
00:03:43,080 --> 00:03:44,800
It's a tuple of five elements.
56
00:03:44,860 --> 00:03:48,800
So I'm going to write them now or we will get crazy.
57
00:03:48,850 --> 00:03:51,400
This is a tuple of the following five elements.
58
00:03:51,400 --> 00:03:53,210
First element is the score.
59
00:03:53,230 --> 00:04:00,350
Second element is x0, third element is y0, then fourth element is x1.
60
00:04:00,400 --> 00:04:05,250
And the last element is y1. And what is this tuple about?
61
00:04:05,410 --> 00:04:13,490
Well, for each occurrence of each class in the batch we will get a score for this occurrence.
62
00:04:13,570 --> 00:04:19,600
And of course the coordinates of the upper left corner of the rectangle detecting it and the lower
63
00:04:19,600 --> 00:04:20,830
right corner.
64
00:04:21,190 --> 00:04:22,670
And what are these scores about.
65
00:04:22,810 --> 00:04:26,460
Well these scores are going to go from low to high.
66
00:04:26,530 --> 00:04:34,750
We will have a score for each occurrence of each class such that if the score is lower than 0.6 then
67
00:04:34,750 --> 00:04:41,540
the occurrence of the class won't be found in the image; if it is higher than 0.6 then we will consider
68
00:04:41,540 --> 00:04:42,590
it to be found.
69
00:04:42,780 --> 00:04:44,260
So that's what this score is about.
70
00:04:44,260 --> 00:04:48,010
It's like a threshold. For each occurrence of each class
71
00:04:48,010 --> 00:04:52,460
we will get a score. If it is lower than 0.6 then the occurrence will not be found.
72
00:04:52,510 --> 00:04:57,910
And if this score is higher than 0.6 the occurrence will be found and we will get the coordinates of the upper
73
00:04:57,910 --> 00:05:02,090
left corner and the lower right corner of the detected object.
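As a rough illustration of the layout just described, here is a plain-Python sketch that uses nested lists as a stand-in for the torch tensor. All the numbers are invented; with a real tensor the same score would presumably be read as `detections[0, i, j, 0]`.

```python
# Stand-in for the detections tensor:
# detections[batch][class][occurrence] -> (score, x0, y0, x1, y1)
# Every value below is made up for illustration.
detections = [                                   # batch (a single image here)
    [                                            # classes
        [(0.05, 0.0, 0.0, 0.0, 0.0)],            # class 0: nothing confident
        [(0.92, 0.10, 0.20, 0.55, 0.80),         # class 1: one confident occurrence
         (0.30, 0.50, 0.50, 0.60, 0.60)],        # ... and one weak occurrence
    ]
]

THRESHOLD = 0.6  # the cut-off used in the tutorial


def is_found(batch, cls, occ):
    """Return True when the score of this occurrence is above the threshold."""
    score = detections[batch][cls][occ][0]       # index 0 of the tuple is the score
    return score > THRESHOLD


# Plays the role of detections.size(1) on the real tensor.
num_classes = len(detections[0])
```

Only the first occurrence of class 1 passes the 0.6 threshold in this toy data.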
74
00:05:02,110 --> 00:05:08,620
All right so now that we clearly understand what's inside the detections tensor we can move on to a
75
00:05:08,620 --> 00:05:09,450
for loop.
76
00:05:09,820 --> 00:05:15,700
Yes, there we go. We have to make a for loop because we have to iterate through all the classes and then
77
00:05:15,700 --> 00:05:20,200
through all the occurrences of the classes. We're going to look for a certain number of occurrences for
78
00:05:20,230 --> 00:05:21,070
each class.
79
00:05:21,070 --> 00:05:26,020
We're going to get the score for each of these occurrences and we'll make an if condition to say that
80
00:05:26,200 --> 00:05:32,910
if this score is higher than 0.6 we keep the occurrence, and if the score is lower than 0.6 we reject the occurrence.
81
00:05:33,190 --> 00:05:33,740
All right.
82
00:05:33,780 --> 00:05:34,800
Are you ready.
83
00:05:34,810 --> 00:05:35,360
Perfect.
84
00:05:35,380 --> 00:05:37,290
So let's start this for loop.
85
00:05:37,570 --> 00:05:42,910
So, for. Then, as I just said, we're going to iterate through all the classes and therefore I'm going to
86
00:05:42,910 --> 00:05:43,650
add here.
87
00:05:43,710 --> 00:05:52,620
i, for i in range(detections.size(1)).
88
00:05:52,910 --> 00:05:57,920
So detections.size(1) is exactly the number of classes.
89
00:05:58,140 --> 00:06:04,070
So this that I highlighted is the number of classes, so we're just making a for loop from 0 to this
90
00:06:04,070 --> 00:06:05,150
number of classes.
91
00:06:05,180 --> 00:06:07,410
That is, the number of detectable objects.
92
00:06:07,550 --> 00:06:13,520
And so for all these classes we're going to go inside the for loop and we're going to start by introducing
93
00:06:13,520 --> 00:06:19,660
the variable j, which will correspond to the occurrence, exactly the occurrence.
94
00:06:19,670 --> 00:06:23,790
So i is the class and j will be the occurrence of the class.
95
00:06:24,170 --> 00:06:30,890
And now we're going to start a second loop, not a for loop this time; it's going to be a while loop because
96
00:06:30,890 --> 00:06:35,980
you know we're going to put that condition into the loop which is actually more efficient.
97
00:06:36,200 --> 00:06:39,850
So it's like a loop combined with the if condition at the same time.
98
00:06:39,890 --> 00:06:44,760
And the trick to do that, you're going to see it's very intuitive, is to do this while loop.
99
00:06:45,170 --> 00:06:51,330
And since we only want to keep the occurrences for which the score is higher than 0.6,
100
00:06:51,530 --> 00:06:59,380
well, we simply need to take the detections of the occurrence j of the class i.
101
00:06:59,660 --> 00:07:06,500
But then let's not forget the batch, 0, and then we're going to get the score, which is the first
102
00:07:06,620 --> 00:07:08,250
element of this tuple here.
103
00:07:08,270 --> 00:07:16,840
So we simply need to add 0. Therefore, since 0 corresponds here to the index of the score and j
104
00:07:16,860 --> 00:07:20,300
corresponds to the index of the occurrence of the class i,
105
00:07:20,490 --> 00:07:29,500
well, this detections[0, i, j, 0] is exactly the score of the occurrence j of the class i.
106
00:07:29,930 --> 00:07:40,530
And therefore, while the score of the occurrence j of the class i is larger than 0.6, what are we going
107
00:07:40,530 --> 00:07:41,110
to do.
108
00:07:41,140 --> 00:07:45,720
We're going to keep this occurrence, and how are we going to keep it?
109
00:07:45,900 --> 00:07:51,680
Well, we are going to keep it in the variable that we're going to call pt, for point, because we're
110
00:07:51,690 --> 00:07:57,720
going to keep that occurrence by keeping the point and therefore the coordinates of the upper left corner
111
00:07:57,810 --> 00:08:02,020
and the lower right corner of the rectangle detecting occurrence j
112
00:08:02,190 --> 00:08:02,980
of the class i.
113
00:08:03,150 --> 00:08:09,200
So pt will be detections, and we're going to take the same: 0,
114
00:08:09,240 --> 00:08:16,440
that is the batch, then the class i, then the occurrence j, and then, be careful, try to guess what I'm going
115
00:08:16,440 --> 00:08:17,230
to type here.
116
00:08:18,380 --> 00:08:22,640
Well it's not going to be zero because we're no longer interested in the score.
117
00:08:22,760 --> 00:08:26,990
We are now interested in these four coordinates.
118
00:08:26,990 --> 00:08:28,030
x0, y0,
119
00:08:28,040 --> 00:08:33,920
that is the coordinates of the upper left corner of the rectangle, and x1, y1, that is the coordinates
120
00:08:33,920 --> 00:08:36,910
of the lower right corner of the rectangle.
121
00:08:37,100 --> 00:08:40,610
And therefore we want to take the last four elements of this tuple.
122
00:08:40,670 --> 00:08:47,450
And so we're going to take the range from one to the end and the trick to take the range from one to
123
00:08:47,450 --> 00:08:55,370
the end is to use this: the range 1, colon, and nothing, and nothing means to the end.
124
00:08:55,370 --> 00:08:56,360
Perfect.
125
00:08:56,360 --> 00:09:00,750
So here with this trick we're taking x0, y0, x1 and y1.
126
00:09:00,750 --> 00:09:02,320
So now we have our coordinates.
127
00:09:02,360 --> 00:09:03,410
That's perfect.
128
00:09:03,580 --> 00:09:11,450
But now remember that we created this scale tensor to do this normalization of the coordinates between
129
00:09:11,450 --> 00:09:12,380
0 and 1.
130
00:09:12,530 --> 00:09:18,470
And that's exactly where we're going to use the scale tensor and to use it we simply need to multiply
131
00:09:18,470 --> 00:09:26,000
all this by this scale tensor and that will apply the normalization, which will give us the coordinates
132
00:09:26,000 --> 00:09:28,550
of these points at the scale of the image.
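The slicing and scaling step can be sketched with plain Python lists standing in for the tensors. The frame size and the occurrence values are assumptions chosen for illustration; on real tensors the equivalent expression would be along the lines of `detections[0, i, j, 1:] * scale`.

```python
# Hypothetical frame size in pixels (width, height); not taken from the tutorial.
width, height = 640, 480

# Same order as the coordinates: x values scale with width, y values with height.
scale = [width, height, width, height]

# One occurrence as (score, x0, y0, x1, y1), with coordinates between 0 and 1.
occurrence = [0.92, 0.25, 0.5, 0.75, 1.0]

score = occurrence[0]      # index 0 is the score
coords = occurrence[1:]    # "1:" means from index 1 to the end

# Element-wise product, the list equivalent of multiplying two tensors.
pt = [c * s for c, s in zip(coords, scale)]
```

After this step `pt` holds pixel coordinates in the order x0, y0, x1, y1.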
133
00:09:29,410 --> 00:09:36,310
And finally we need to do one last thing: we need to transform this tensor of four coordinates, that is all
134
00:09:36,310 --> 00:09:39,690
this you know all this is a tensor right now it's a torch tensor.
135
00:09:39,880 --> 00:09:45,700
Well, since now we're going to use OpenCV to draw the rectangles thanks to the upper left corner and
136
00:09:45,700 --> 00:09:48,330
the lower right corner coordinates that we have.
137
00:09:48,550 --> 00:09:51,230
Well, we need to put that back into a NumPy array.
138
00:09:51,460 --> 00:09:53,890
Because OpenCV works with NumPy arrays.
139
00:09:53,890 --> 00:09:58,690
We're going to use the rectangle function you know to draw the rectangles exactly like what we did in
140
00:09:58,690 --> 00:10:05,570
the first module, but this rectangle function works only with NumPy arrays and not with torch tensors.
141
00:10:05,620 --> 00:10:09,730
So we just need to convert that back into a NumPy array.
142
00:10:09,940 --> 00:10:12,090
And to do this there is nothing more simple.
143
00:10:12,220 --> 00:10:16,100
We just use the numpy function like that.
144
00:10:16,300 --> 00:10:23,170
And here we go, we have our NumPy array containing the four normalized coordinates of the upper left corner
145
00:10:23,200 --> 00:10:27,140
and the lower right corner of the occurrence j of the class i.
146
00:10:27,470 --> 00:10:33,490
And now since we have these coordinates Well we can draw the rectangle we're going to do that still
147
00:10:33,490 --> 00:10:35,270
in the while loop, obviously.
148
00:10:35,370 --> 00:10:36,750
And you know how to do that.
149
00:10:36,850 --> 00:10:44,530
We take OpenCV, cv2, then the rectangle function, and remember the arguments: we first need to input
150
00:10:44,650 --> 00:10:46,870
frame the image.
151
00:10:46,870 --> 00:10:49,130
Then the second argument is x0.
152
00:10:49,240 --> 00:10:56,910
And that is pt of index 0, because pt contains x0, y0, x1 and y1.
153
00:10:57,040 --> 00:10:59,150
So pt[0] will be x0.
154
00:10:59,230 --> 00:11:05,740
Then the third argument will be y0, and that is pt[1].
155
00:11:06,250 --> 00:11:13,300
Then the next argument is x1, and that is pt[2], and the last one is y1.
156
00:11:13,360 --> 00:11:21,670
And that is pt[3], because pt[0], pt[1], pt[2] and pt[3] are exactly these four coordinates in the same
157
00:11:21,670 --> 00:11:24,660
order: x0, y0, x1, y1.
158
00:11:24,990 --> 00:11:26,840
And now we're just going to add a safety.
159
00:11:26,870 --> 00:11:31,780
We're going to convert these values of the coordinates into integers.
160
00:11:31,840 --> 00:11:40,260
It's always safer to do that and to do this we're going to use the int function to convert them int
161
00:11:43,360 --> 00:11:43,810
int
162
00:11:47,070 --> 00:11:50,940
int. And now be careful.
163
00:11:50,940 --> 00:11:57,720
The second argument of this rectangle function from OpenCV is actually the coordinates of the upper
164
00:11:57,720 --> 00:11:59,820
left corner of the rectangle.
165
00:11:59,820 --> 00:12:05,650
So these two coordinates here should go into the same second argument.
166
00:12:05,730 --> 00:12:13,080
So I'm going to put some parentheses around these two coordinates and same for these two coordinates
167
00:12:13,260 --> 00:12:15,710
that should go into one same argument.
168
00:12:15,750 --> 00:12:21,420
That's the third argument of the rectangle function and these correspond of course to the coordinates
169
00:12:21,510 --> 00:12:25,830
of the lower right corner of the rectangle that is detecting the object.
170
00:12:25,830 --> 00:12:29,880
So again I'm putting some parenthesis around them.
171
00:12:30,270 --> 00:12:35,700
All right so first argument frame second argument the coordinates of the upper left corner.
172
00:12:35,790 --> 00:12:41,780
And third argument, the coordinates of the lower right corner of the detection rectangle.
173
00:12:41,830 --> 00:12:48,970
Now, next argument. There are two more arguments to go; the next one is the color of the rectangle.
174
00:12:49,020 --> 00:12:58,230
And we're going to choose the red color, which is coded in RGB by (255, 0, 0).
175
00:12:58,270 --> 00:13:00,860
All right so that's our next argument.
176
00:13:00,870 --> 00:13:05,940
And now the final arguments of our rectangle function is.
177
00:13:06,000 --> 00:13:13,300
Remember, the thickness of the rectangle line to display, and as in module one we're going to choose 2.
178
00:13:13,530 --> 00:13:14,160
All good.
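To make the grouping of the arguments concrete, here is a small sketch that builds the two corner tuples from pt. The helper name `rectangle_corners` and the sample coordinates are hypothetical; the OpenCV call itself is left as a comment so the snippet runs without cv2 or a frame.

```python
def rectangle_corners(pt):
    """Turn [x0, y0, x1, y1] into the two integer corner tuples for cv2.rectangle."""
    top_left = (int(pt[0]), int(pt[1]))        # second argument: upper left corner
    bottom_right = (int(pt[2]), int(pt[3]))    # third argument: lower right corner
    return top_left, bottom_right


# Made-up pixel coordinates; int() truncates the fractional part.
top_left, bottom_right = rectangle_corners([160.4, 240.0, 480.9, 479.2])

# With OpenCV installed and `frame` holding the image, the call would then be:
# cv2.rectangle(frame, top_left, bottom_right, (255, 0, 0), 2)
```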
179
00:13:14,310 --> 00:13:21,240
And now let's move on to the next step which is to print the label onto the rectangle because we will
180
00:13:21,240 --> 00:13:23,440
have several objects to detect.
181
00:13:23,610 --> 00:13:29,010
So we want to print the label dog onto the rectangle that is detecting the dog, and then the label
182
00:13:29,010 --> 00:13:32,660
person onto the rectangle that is detecting the person.
183
00:13:32,710 --> 00:13:40,410
So it's indeed quite useful to print a label. So to print these labels we're going to use OpenCV again,
184
00:13:40,540 --> 00:13:43,620
cv2, and then we're going to use another function.
185
00:13:43,620 --> 00:13:50,710
You can see that OpenCV has many functions, but the function we want right now is called putText.
186
00:13:50,820 --> 00:13:51,200
There we go.
187
00:13:51,210 --> 00:13:55,350
This one, putText, and this function takes several arguments.
188
00:13:55,470 --> 00:13:59,550
Not exactly the same as a rectangle function but pretty close.
189
00:13:59,550 --> 00:14:02,130
The first one is of course our frame.
190
00:14:02,130 --> 00:14:09,930
The second one is the text to display, and to get this text we need to use the labelmap shortcut name
191
00:14:10,050 --> 00:14:17,910
that we gave to VOC_CLASSES. Remember, VOC_CLASSES is this dictionary that maps the names of the classes
192
00:14:18,060 --> 00:14:22,050
with numbers and we can get the label of the text we're interested in.
193
00:14:22,050 --> 00:14:25,870
That is the class we're dealing with right now, which is the class i.
194
00:14:26,130 --> 00:14:33,980
Well, we take the index i - 1. Why minus one? It's because indexes in Python start at 0.
195
00:14:34,020 --> 00:14:38,590
So the index i - 1 is actually the index of the i-th class.
196
00:14:38,610 --> 00:14:39,040
All right.
197
00:14:39,060 --> 00:14:45,470
And then a third argument will be the position of the text where we want to display the label.
198
00:14:45,630 --> 00:14:51,610
And we're going to display it at the upper left corner of the rectangle just above the upper left corner
199
00:14:52,200 --> 00:14:59,250
and therefore we need to take these coordinates because they correspond exactly to the coordinates of
200
00:14:59,430 --> 00:15:00,630
the upper left corner.
201
00:15:00,810 --> 00:15:05,180
So that's our next argument. Then we need to choose a font.
202
00:15:05,220 --> 00:15:06,640
That's our next argument.
203
00:15:06,750 --> 00:15:12,040
And we're going to pick the following font, which we get from our OpenCV library.
204
00:15:12,330 --> 00:15:19,530
And the name of the font is FONT_HERSHEY, and not COMPLEX but SIMPLEX.
205
00:15:19,530 --> 00:15:22,140
That's a nice one right.
206
00:15:22,200 --> 00:15:24,830
Then we need to choose a size of the text.
207
00:15:24,990 --> 00:15:29,320
We're going to choose size two then a color of the text.
208
00:15:29,320 --> 00:15:38,480
We're going to choose the following color: 255, 255 and 255. Then a thickness of the text.
209
00:15:38,520 --> 00:15:40,390
We're going to choose two again.
210
00:15:40,500 --> 00:15:46,920
And finally we want our text to be displayed continuously you know with continuous lines and not little
211
00:15:46,920 --> 00:15:47,670
dots.
212
00:15:47,850 --> 00:15:56,280
And to make sure we get this we need to add here the last argument, which is going to be cv2.LINE_AA.
213
00:15:56,700 --> 00:16:02,250
Right that will just give us some continuous lines to display the text and not little dots and that's
214
00:16:02,250 --> 00:16:02,850
it.
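The label lookup can be sketched as follows. The `labelmap` tuple here is a short hypothetical excerpt, not the list imported in the tutorial, and the helper name `label_for` is made up; the putText call is shown as a comment so the snippet runs without cv2.

```python
# Hypothetical excerpt of the class names; the real list is imported in the tutorial.
labelmap = ("aeroplane", "bicycle", "bird", "boat")


def label_for(class_index):
    """Name of class i, looked up at position i - 1 as described in the tutorial."""
    return labelmap[class_index - 1]


# With OpenCV installed, `frame` holding the image and `pt` the scaled coordinates,
# the label would be drawn at the upper left corner roughly like this:
# cv2.putText(frame, label_for(i), (int(pt[0]), int(pt[1])),
#             cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 2, cv2.LINE_AA)
```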
215
00:16:02,910 --> 00:16:11,100
We displayed a nice label onto our detection rectangles, and now finally we have one last thing to
216
00:16:11,100 --> 00:16:12,700
do inside this while loop.
217
00:16:12,840 --> 00:16:18,300
And then after that, one final thing to do overall. This one last thing to do inside this while
218
00:16:18,300 --> 00:16:25,090
loop is of course to increment j, because right now we're dealing with the occurrence 0, because j equals
219
00:16:25,230 --> 00:16:31,950
0, and we did not do a for loop, we did a while loop, and in a while loop we must not forget to increment
220
00:16:32,160 --> 00:16:34,480
the iterative variable that is J.
221
00:16:34,590 --> 00:16:39,420
So in other words we just need to deal with the next occurrence of the class
222
00:16:39,440 --> 00:16:46,250
i, and to do this we just need to increment j like that: j += 1.
223
00:16:46,560 --> 00:16:47,310
Perfect.
224
00:16:47,310 --> 00:16:53,310
So now we can get out of this while loop and also get out of this for loop because we did exactly what
225
00:16:53,310 --> 00:16:56,850
we had to do for all the objects that can be detected.
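Putting the two loops together, here is a list-based sketch of the control flow. The function name `keep_detections` and the sample data are hypothetical, and the bounds check on j is an addition needed for plain lists. Note that the while loop stops at the first occurrence below the threshold, so this pattern assumes the occurrences of each class are ordered from highest to lowest score.

```python
def keep_detections(detections, threshold=0.6):
    """Collect (class, occurrence, coords) for every score above the threshold."""
    kept = []
    for i in range(len(detections[0])):          # like range(detections.size(1))
        j = 0                                    # occurrence index for class i
        occurrences = detections[0][i]
        # The length check is an addition for this list-based sketch.
        while j < len(occurrences) and occurrences[j][0] > threshold:
            kept.append((i, j, occurrences[j][1:]))   # keep x0, y0, x1, y1
            j += 1                               # move on to the next occurrence
    return kept


# Made-up data: class 0 has nothing confident, class 1 has two confident occurrences.
sample = [[
    [(0.1, 0.0, 0.0, 0.0, 0.0)],
    [(0.9, 0.1, 0.2, 0.3, 0.4), (0.7, 0.5, 0.5, 0.6, 0.6), (0.2, 0.0, 0.0, 0.0, 0.0)],
]]
kept = keep_detections(sample)
```

In the tutorial the body of the while loop draws the rectangle and the label instead of collecting the coordinates in a list.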
226
00:16:57,280 --> 00:17:03,650
And now the final thing that we need to do is just to return the frame and that's it.
227
00:17:03,720 --> 00:17:11,660
We have the frame returned with the detection rectangles on any object that is part of the training of
228
00:17:11,790 --> 00:17:12,900
the SSD model.
229
00:17:12,900 --> 00:17:18,720
So you know that the SSD model was trained to detect between 30 and 40 objects. Here in this loop we look
230
00:17:18,720 --> 00:17:22,700
for all these objects and several possible occurrences of these objects.
231
00:17:22,710 --> 00:17:26,290
We attribute them a score, and if this matching score is high enough
232
00:17:26,400 --> 00:17:32,190
We keep it and therefore we have several objects detected on this return frame with the rectangles and
233
00:17:32,190 --> 00:17:33,150
the labels.
234
00:17:33,150 --> 00:17:34,470
So congratulations.
235
00:17:34,470 --> 00:17:35,720
That was the hardest part.
236
00:17:35,760 --> 00:17:37,130
I hope that's OK.
237
00:17:37,170 --> 00:17:43,820
Now the rest will be much easier and soon enough we should be able to see that output video with the
238
00:17:43,830 --> 00:17:45,360
several detected objects.
239
00:17:45,360 --> 00:17:46,820
I can't wait to show you this.
240
00:17:46,950 --> 00:17:48,990
And until then enjoy computer vision.