1
00:00:00,450 --> 00:00:02,520
Hello and welcome to this tutorial.
2
00:00:02,550 --> 00:00:06,760
Now we have everything: we have our frames that we're going to get from the video.
3
00:00:06,810 --> 00:00:13,110
We have our neural network, net, the SSD neural network, and we have our transformation, transform.
4
00:00:13,110 --> 00:00:17,570
So we are ready to do some object detection on a video.
5
00:00:17,790 --> 00:00:24,480
This video is going to be funny_dog.mp4. There is this video of this very cute dog bouncing
6
00:00:24,750 --> 00:00:27,810
on the field, on the grass. There is Kirill in the video.
7
00:00:27,810 --> 00:00:32,650
We're going to try to detect him as well, another human, and some other humans behind.
8
00:00:32,640 --> 00:00:38,300
Let's see if it's powerful enough to even detect the humans that are behind the yard.
9
00:00:38,610 --> 00:00:41,440
Well, we might figure it out in this tutorial.
10
00:00:41,490 --> 00:00:45,710
And if not in this tutorial, it's going to be in the next one, so let's do it.
11
00:00:45,720 --> 00:00:48,870
Let's start by opening the video.
12
00:00:48,900 --> 00:00:50,450
That's the first thing we need to do.
13
00:00:50,670 --> 00:00:54,260
Then we're going to get all the frames of the video one by one.
14
00:00:54,300 --> 00:01:00,550
We're going to apply the detect function on these frames with our SSD net and our transform transformation.
15
00:01:00,690 --> 00:01:06,090
Then we'll get the processed images with this rectangle and then we'll reassemble the whole thing to
16
00:01:06,210 --> 00:01:07,920
have the final video.
17
00:01:07,920 --> 00:01:10,880
All right, let's do this. Let's first open the video.
18
00:01:11,010 --> 00:01:18,640
So we're going to create a new object that we're going to call reader and this object is going to be
19
00:01:18,640 --> 00:01:21,800
created with imageio.
20
00:01:21,820 --> 00:01:25,020
imageio is a great library to process videos.
21
00:01:25,050 --> 00:01:31,010
There is another great library that could do the job, that is PIL, P-I-L in capital letters.
22
00:01:31,030 --> 00:01:36,210
We actually tried that, but it turned out to be much more efficient with imageio.
23
00:01:36,460 --> 00:01:44,480
So we're going to open the video with imageio, and to do this, well, first we get our imageio library
24
00:01:44,860 --> 00:01:50,150
and then we're going to use the get_reader function.
25
00:01:50,380 --> 00:01:52,560
And inside this function, what do we need to input?
26
00:01:52,690 --> 00:02:01,610
Well, of course, it's the video, in quotes, and the video is, well, the name of the video is funny_dog dot
27
00:02:01,760 --> 00:02:03,310
mp4.
28
00:02:03,360 --> 00:02:04,110
All right.
29
00:02:04,260 --> 00:02:11,010
So that opens the video, basically, funny_dog.mp4. We're going to watch the video again before
30
00:02:11,100 --> 00:02:17,870
we get the final output. Then the next step is to get the frequency of the frames.
31
00:02:18,000 --> 00:02:19,730
That is the fps frequency.
32
00:02:19,810 --> 00:02:27,290
fps means frames per second, and we just need to get this frequency because we're going to need it afterwards.
33
00:02:27,330 --> 00:02:32,810
So let's call this frequency fps, introducing a new variable, and to get it,
34
00:02:32,820 --> 00:02:42,540
well, we can get it from our reader object, from which we use get_meta_data,
35
00:02:43,000 --> 00:02:45,020
parentheses, nothing inside.
36
00:02:45,090 --> 00:02:50,220
But then, in square brackets here, you have to specify, in quotes,
37
00:02:50,410 --> 00:02:56,760
fps, and that will just get you the fps frequency, that is, the number of frames per second.
38
00:02:56,760 --> 00:02:57,250
All right.
39
00:02:57,270 --> 00:03:03,160
And now, next step, and you're going to understand now why we needed that frames per second frequency:
40
00:03:03,900 --> 00:03:10,180
the next step is to create an output video that is going to be the final output, with that same fps
41
00:03:10,200 --> 00:03:11,010
frequency.
42
00:03:11,250 --> 00:03:16,080
And we're going to create that output video with, again, imageio.
43
00:03:16,290 --> 00:03:23,080
So we have to give it a different name. We're going to call it writer, and again, we're going to call
44
00:03:23,080 --> 00:03:26,010
our imageio library.
45
00:03:26,260 --> 00:03:32,180
And this time, since we're not opening a video, we are creating a new video, we're not going to use
46
00:03:32,180 --> 00:03:33,580
the get_reader function.
47
00:03:33,580 --> 00:03:41,410
We're going to use the get_writer function, which basically creates something like an object that will
48
00:03:41,410 --> 00:03:43,090
contain a video.
49
00:03:43,240 --> 00:03:49,270
And to this something we'll add the sequence of frames; you know, we will append the processed frames, that is,
50
00:03:49,570 --> 00:03:52,540
the frames on which we applied the detect function.
51
00:03:52,540 --> 00:03:54,070
So there we go, get_writer.
52
00:03:54,330 --> 00:03:59,980
And then we need to input two arguments. The first one is the name we want to give to this output video,
53
00:04:00,310 --> 00:04:01,490
and we're going to call it,
54
00:04:01,660 --> 00:04:11,340
well, very simply, output.mp4. This way, output.mp4. And the second argument, which is
55
00:04:11,340 --> 00:04:15,180
actually the frequency: how many frames per second do we want?
56
00:04:15,180 --> 00:04:20,010
And so, well, the name of the argument is fps. You have to specify it.
57
00:04:20,220 --> 00:04:21,900
And this is equal to this
58
00:04:21,900 --> 00:04:29,440
fps variable here, that we got thanks to the get_meta_data function from our reader object.
59
00:04:29,640 --> 00:04:33,190
So fps equals fps. All right.
60
00:04:33,210 --> 00:04:36,300
So now we have everything. We have
61
00:04:36,300 --> 00:04:43,110
all we need to start this for loop and process each of the images of the funny dog video.
62
00:04:43,280 --> 00:04:48,510
And so of course you understand that in each step of the loop we're going to work on a specific frame
63
00:04:48,570 --> 00:04:54,660
of the video and on that frame we're going to apply the detect function to detect the objects in the
64
00:04:54,660 --> 00:04:57,410
frame and print the rectangles.
65
00:04:57,690 --> 00:04:58,380
All right.
66
00:04:58,440 --> 00:05:02,350
So let's start this for loop: for i.
67
00:05:02,520 --> 00:05:07,170
So i will just correspond to the number of the image that is processed.
68
00:05:07,200 --> 00:05:13,620
So, you know, i is going to go from 0 to... I told you there's going to be 68 frames, so i is going to go from
69
00:05:13,620 --> 00:05:16,500
zero to 68 or 67, something like that.
70
00:05:16,770 --> 00:05:22,980
So for i, and then frame. Of course, we're iterating over the frames of the video.
71
00:05:23,280 --> 00:05:28,290
That's why I'm taking frame that's just the name of the variable that will exactly correspond to each
72
00:05:28,290 --> 00:05:30,180
of the frames of the video.
73
00:05:30,200 --> 00:05:31,940
So i, frame in.
74
00:05:32,190 --> 00:05:40,470
And then we can use enumerate, parenthesis, reader, and that will just iterate through all the frames
75
00:05:40,530 --> 00:05:42,330
of the reader video.
76
00:05:42,530 --> 00:05:43,820
The funny dog video.
77
00:05:44,160 --> 00:05:47,630
So for i, frame in enumerate(reader).
78
00:05:48,030 --> 00:05:49,240
Well, what do we do?
79
00:05:49,410 --> 00:05:57,450
We simply need to apply the detect method on this frame right here to have some objects detected by the
80
00:05:57,450 --> 00:06:04,530
net, which is our SSD neural network that we created, associated with the right transformation, the transform
81
00:06:04,620 --> 00:06:10,910
object here to make sure that this frame can be accepted into this net.
82
00:06:10,920 --> 00:06:11,280
All right.
83
00:06:11,280 --> 00:06:13,950
So nothing more easy to do here.
84
00:06:13,980 --> 00:06:25,860
We apply the detect function to our frame with our neural network net and with our transformation transform.
85
00:06:25,870 --> 00:06:28,200
However, there is just a little trick here.
86
00:06:28,280 --> 00:06:31,030
net is actually an advanced structure.
87
00:06:31,030 --> 00:06:33,750
Remember, it's an object of the build_ssd class.
88
00:06:33,870 --> 00:06:40,680
And in order to get this neural network that is expected by the detect function, it's not only net that
89
00:06:40,680 --> 00:06:48,750
we need to input, it's actually net.eval(). That's just to align with the way the build_ssd function was
90
00:06:48,750 --> 00:06:58,680
made, but basically net.eval() represents our neural network net, from which we get the output y and therefore
91
00:06:58,830 --> 00:07:00,840
the detections on each frame.
92
00:07:01,260 --> 00:07:09,480
So perfect we detected the objects on our frame but remember that this detect function returns actually
93
00:07:09,480 --> 00:07:16,620
the frame the processed frame with the detected object and therefore I'm going to introduce here a new
94
00:07:16,620 --> 00:07:20,030
variable that will represent that processed frame.
95
00:07:20,190 --> 00:07:26,240
And there is actually no danger to call it again frame so I'm just overwriting the frame here.
96
00:07:26,490 --> 00:07:28,440
But that's totally OK. Here,
97
00:07:28,440 --> 00:07:33,930
the frame is the original frame with no detection made yet, and this frame is the new frame
98
00:07:33,930 --> 00:07:37,650
after we apply the detect function, with the detected rectangles.
99
00:07:37,850 --> 00:07:38,610
All right.
100
00:07:38,820 --> 00:07:41,990
So now, the loop is not over.
101
00:07:42,000 --> 00:07:43,250
What do we need to do?
102
00:07:43,500 --> 00:07:52,710
Well, each time we get a new processed frame with the objects detected, we need to append this frame to
103
00:07:52,920 --> 00:08:00,190
our writer output video, and that's exactly what we're going to do now. To append a frame to our writer
104
00:08:00,190 --> 00:08:01,000
output video,
105
00:08:01,020 --> 00:08:10,790
we simply need to take our writer object, then dot, and then we use the append_data function, in
106
00:08:10,800 --> 00:08:11,940
which we need to input,
107
00:08:11,940 --> 00:08:18,690
of course, what we want to append to the writer output video, and that is, of course, this new processed frame
108
00:08:18,900 --> 00:08:20,990
with the detected object.
109
00:08:21,000 --> 00:08:21,680
Perfect.
110
00:08:21,720 --> 00:08:24,060
So now the processed frame is appended.
111
00:08:24,240 --> 00:08:32,310
And now we can just add print(i), just to see during the detection which frame we reached.
112
00:08:32,310 --> 00:08:32,910
All right.
113
00:08:32,910 --> 00:08:39,750
That's just a practical thing to see: the number of the processed frame will be displayed during the detection.
114
00:08:39,750 --> 00:08:42,050
And finally last line of code.
115
00:08:42,240 --> 00:08:49,260
Well we just need to close the process that manages the creation of this video and to close it.
116
00:08:49,260 --> 00:08:56,430
we just need to take our writer and then add dot and then close, parentheses, the close function, that
117
00:08:56,430 --> 00:09:01,390
will close the process and we'll get the output video in that same repertory.
118
00:09:01,410 --> 00:09:04,120
That is our working directory folder.
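The loop narrated above (enumerate the frames, apply detection, append each processed frame, print the index, then close the writer) can be sketched roughly like this. It is a sketch under assumptions, not the tutorial's exact code: process_video and _ListWriter are hypothetical names, detect_fn stands in for the detect function built in the earlier tutorials, and a plain list of strings replaces the real video reader so the snippet runs on its own.

```python
def process_video(reader, writer, detect_fn):
    """Apply detect_fn to every frame of reader and append the result to writer.

    reader only needs to be iterable; writer needs append_data() and close(),
    which are the method names used in the narration.
    Returns the number of frames processed.
    """
    count = 0
    for i, frame in enumerate(reader):
        frame = detect_fn(frame)   # overwrite the frame with its processed version
        writer.append_data(frame)  # add the processed frame to the output video
        print(i)                   # progress: index of the frame just processed
        count += 1
    writer.close()                 # finalize the output once all frames are written
    return count


# Hypothetical stand-in writer that just collects frames in a list.
class _ListWriter:
    def __init__(self):
        self.frames = []
        self.closed = False

    def append_data(self, frame):
        self.frames.append(frame)

    def close(self):
        self.closed = True


fake_reader = ["frame0", "frame1", "frame2"]  # stands in for the video frames
fake_writer = _ListWriter()
n = process_video(fake_reader, fake_writer, lambda f: f.upper())
```

In the tutorial's setting, detect_fn would presumably wrap the call described in the narration, something like lambda frame: detect(frame, net.eval(), transform), where detect, net and transform come from the earlier tutorials and are not defined here.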
119
00:09:04,140 --> 00:09:06,210
All right so that's it.
120
00:09:06,210 --> 00:09:08,660
We're ready to watch the final output.
121
00:09:08,670 --> 00:09:10,990
So what do you think, do we do it in this tutorial?
122
00:09:11,220 --> 00:09:20,230
Well, yeah, let's do it, let's do it right now. So we simply need to select all the code and execute.
123
00:09:20,250 --> 00:09:21,370
There we go.
124
00:09:21,510 --> 00:09:23,850
No error just a warning that's OK.
125
00:09:23,850 --> 00:09:29,060
That's just a warning for the ffmpeg library, but it's OK.
126
00:09:29,490 --> 00:09:33,420
And here we can see the number of the frame that is processed.
127
00:09:33,420 --> 00:09:35,550
You can see that it's going actually pretty fast.
128
00:09:35,730 --> 00:09:37,350
So we'll get the final output video
129
00:09:37,350 --> 00:09:44,180
very quickly. I told you there's about sixty-eight frames to be processed. On each frame
130
00:09:44,190 --> 00:09:46,730
we're applying the detect function to detect the objects.
131
00:09:46,950 --> 00:09:51,800
Let's see what happens and we'll get quickly to the final result.
132
00:09:56,490 --> 00:10:00,080
All right so it's about to end very very soon.
133
00:10:00,630 --> 00:10:00,970
Yeah.
134
00:10:01,010 --> 00:10:01,260
OK.
135
00:10:01,260 --> 00:10:08,220
So it went from zero to 67, so there were indeed sixty-eight frames to process, that is, to detect some
136
00:10:08,220 --> 00:10:11,110
objects in the video, in a two-second video.
137
00:10:11,340 --> 00:10:12,150
OK.
138
00:10:12,270 --> 00:10:14,980
So let's watch the final output.
139
00:10:15,050 --> 00:10:22,380
But before that, I just want to show you again the original video, funny_dog.mp4.
140
00:10:22,470 --> 00:10:23,960
So let's watch this again.
141
00:10:24,310 --> 00:10:26,610
BOING BOING BOING BOING BOING BOING.
142
00:10:26,620 --> 00:10:29,090
All right, the dog bouncing on the field.
143
00:10:29,560 --> 00:10:33,370
And now let's see what our model was able to do.
144
00:10:33,430 --> 00:10:39,670
So there is this dog here, a human here, Kirill here, and some other humans here.
145
00:10:39,820 --> 00:10:42,790
Let's see what this model was able to detect.
146
00:10:42,790 --> 00:10:44,240
Going to close that video.
147
00:10:44,260 --> 00:10:50,540
I'm going to get my outputs and let's watch the result.
148
00:10:51,550 --> 00:10:52,110
Ready.
149
00:10:53,210 --> 00:11:02,720
And play. All right, amazing job. The dog was detected, the human was detected, and I didn't have time to see
150
00:11:03,170 --> 00:11:07,900
how well Kirill was detected, but I think I saw some detections on these humans.
151
00:11:07,910 --> 00:11:09,060
Let's watch this again.
152
00:11:09,110 --> 00:11:14,000
That's an amazing job. You can try to do that with OpenCV or some other models.
153
00:11:14,000 --> 00:11:16,480
I'm not sure you get such a great result.
154
00:11:16,490 --> 00:11:20,810
I actually tried it with OpenCV and I definitely didn't get the same results.
155
00:11:20,820 --> 00:11:22,370
There were rectangles everywhere.
156
00:11:22,490 --> 00:11:24,140
So that didn't work.
157
00:11:24,230 --> 00:11:31,940
But with this SSD model, the detection is amazing. The dog is perfectly well detected.
158
00:11:31,940 --> 00:11:35,580
So now let's see let's see for the other detection.
159
00:11:35,600 --> 00:11:40,590
So the human is also detected this human here but it's quite big in the video.
160
00:11:40,640 --> 00:11:42,900
So of course it's detected.
161
00:11:44,000 --> 00:11:45,990
The dog is well detected as well.
162
00:11:46,760 --> 00:11:49,010
Let's see some others. OK.
163
00:11:49,100 --> 00:11:55,070
So the humans behind are hidden by the arms, so of course they're not yet detected, but let's see what happens
164
00:11:55,070 --> 00:11:55,750
next.
165
00:11:59,170 --> 00:12:03,280
All right that's what I'm talking about here on this special frame.
166
00:12:03,310 --> 00:12:09,340
This special frame is very interesting because not only can we see the humans behind detected, very
167
00:12:09,340 --> 00:12:19,740
well detected, it actually detected this lady here, and also Kirill was detected, but we lost the
168
00:12:19,740 --> 00:12:20,870
detection on the dog.
169
00:12:20,950 --> 00:12:21,960
And why is that?
170
00:12:21,970 --> 00:12:29,080
It's because the dog merged with Kirill. You see, Kirill has the upper body of Kirill but the lower body
171
00:12:29,080 --> 00:12:33,060
of the dog, and therefore the model thinks that's one and the same person.
172
00:12:33,220 --> 00:12:35,590
And that's why it detected the person.
173
00:12:35,650 --> 00:12:37,030
So that's pretty funny.
174
00:12:37,180 --> 00:12:44,840
And then if I move onto the next frame Well the detection of the real person is gone.
175
00:12:44,950 --> 00:12:47,760
And we got back the detection of the dog.
176
00:12:47,890 --> 00:12:49,870
That's a pretty funny thing that happened here.
177
00:12:50,230 --> 00:12:50,710
OK.
178
00:12:50,710 --> 00:12:56,210
And then, all right, we had some more dog and more doggy.
179
00:12:56,290 --> 00:12:59,410
So that's pretty cool isn't it.
180
00:12:59,410 --> 00:13:04,960
The dog is well detected, the human is well detected, and sometimes we get some other detections on other
181
00:13:04,960 --> 00:13:08,200
humans that are much harder to detect.
182
00:13:09,580 --> 00:13:16,270
So I hope you are convinced by the power of this SSD model. You can actually try with the other ones;
183
00:13:16,270 --> 00:13:23,260
you'll see that it's a pretty great job that was done here by the SSD neural network. In the next
184
00:13:23,530 --> 00:13:24,060
tutorial,
185
00:13:24,070 --> 00:13:25,340
I'll give you a little homework.
186
00:13:25,390 --> 00:13:32,380
It will be to do some detection on some other video, some very cool and really, really beautiful video
187
00:13:32,680 --> 00:13:37,390
of some horses running on some field, filmed by a drone.
188
00:13:37,390 --> 00:13:42,590
I'd like you to try this because I'd like you to keep in mind that this model can not only detect common
189
00:13:42,610 --> 00:13:49,840
things like humans and dogs but also many other objects, like horses, boats, cars, planes, whatever.
190
00:13:49,840 --> 00:13:54,010
I think between 30 and 40 objects, so that will be a fun homework to do.
191
00:13:54,010 --> 00:13:54,900
Not difficult.
192
00:13:54,910 --> 00:14:02,600
But don't worry, we'll get back to difficult things in module three with deep convolutional GANs.
193
00:14:02,770 --> 00:14:05,890
So I'll see you in the homework and module 3.
194
00:14:05,950 --> 00:14:07,830
And until then, enjoy computer vision.