Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,560 --> 00:00:07,190
Hello and welcome to the practical applications of module to object detection I'm super excited to start
2
00:00:07,190 --> 00:00:08,830
this module for two reasons.
3
00:00:08,840 --> 00:00:12,260
First one is we are taking things at the next level now.
4
00:00:12,440 --> 00:00:18,560
As I told you open TV is not the most powerful model and the model we will implement in this module
5
00:00:18,710 --> 00:00:24,410
is much more powerful because it is based on deep learning and neural networks that computer vision
6
00:00:24,500 --> 00:00:26,050
where the computer will have a brain.
7
00:00:26,150 --> 00:00:27,580
That's exactly what it means.
8
00:00:27,590 --> 00:00:30,890
And the second reason is that we have an exciting challenge.
9
00:00:30,890 --> 00:00:34,830
I will show you a video of a very cute dog bouncing on the field.
10
00:00:35,000 --> 00:00:42,800
And our challenge will be to detect the dog will be to implement some program that will detect the dog
11
00:00:42,890 --> 00:00:43,600
in the video.
12
00:00:43,790 --> 00:00:49,400
So it's good that you see several ways of doing some computer vision in the first module you learn how
13
00:00:49,400 --> 00:00:53,240
to do some face detection through a webcam.
14
00:00:53,450 --> 00:00:58,160
And now you're going to learn how to do some object detection on a video directly.
15
00:00:58,160 --> 00:01:05,280
Now before we start I would like to say a big thank you to this developer here next to Groote.
16
00:01:05,360 --> 00:01:07,390
That's a picture of him in a horseshoe.
17
00:01:07,580 --> 00:01:13,850
He's the creator of the PI torch implementation of single shot multi-book detector that we're going
18
00:01:13,850 --> 00:01:15,420
to use in this module.
19
00:01:15,440 --> 00:01:19,850
So thank you very much for sharing this and make it open source.
20
00:01:19,850 --> 00:01:27,710
We actually tried several object detection models we tried the first are CNN the yellow open CD and
21
00:01:27,860 --> 00:01:33,320
the SSD and we obtain the best result with the single shot multi-book detection.
22
00:01:33,320 --> 00:01:38,750
Not only we obtain the best result with this moral but also if you look at the paper you will see that
23
00:01:38,810 --> 00:01:45,270
on the tested cases the single shot multiplexed detection model beats yolo and fester are CNN.
24
00:01:45,440 --> 00:01:53,370
So that's why our choice for the ultimate objective texture model of muchall 2 was single shot multi
25
00:01:53,380 --> 00:01:58,870
box detection and the best implementation we found was from this developer Max agreed.
26
00:01:58,990 --> 00:02:00,230
So thank you so much.
27
00:02:00,230 --> 00:02:06,020
Thank you for sharing this pre-trained moral it's actually a pre-trained moral that was trained to detect
28
00:02:06,110 --> 00:02:13,790
between 30 and 40 objects including cars dogs horses ships boats planes and more.
29
00:02:13,850 --> 00:02:18,410
So a very useful not all that you could use for your own business problems.
30
00:02:18,410 --> 00:02:24,020
We're going to go inside the SSD in this module and we're going to learn how to use it and how to detect
31
00:02:24,110 --> 00:02:26,610
any object on any video.
32
00:02:26,660 --> 00:02:28,650
So that's going to be a pretty exciting module.
33
00:02:28,670 --> 00:02:30,640
I can't wait to show you this video of this Doug.
34
00:02:30,650 --> 00:02:35,860
I really like this Doug it's actually Carol who filmed this little dog with a drone.
35
00:02:36,020 --> 00:02:41,240
So the first thing we're going to do now is we're going to open Anaconda because I want to make sure
36
00:02:41,540 --> 00:02:44,900
that you don't forget to connect to the virtual platform.
37
00:02:45,140 --> 00:02:46,520
So let's do it.
38
00:02:46,550 --> 00:02:49,130
I'm opening an icon the Navigator.
39
00:02:49,130 --> 00:02:51,430
You have to find an X on the Navigator.
40
00:02:51,470 --> 00:02:57,440
If you're on Windows you will find it in the list of programs and on Linux you can open it through either
41
00:02:57,460 --> 00:03:00,410
to terminal or in the programs.
42
00:03:00,410 --> 00:03:03,490
All right so now and I can the navigator is opened.
43
00:03:03,650 --> 00:03:08,510
And don't forget to do this applications on virtual platform.
44
00:03:08,510 --> 00:03:09,020
There we go.
45
00:03:09,020 --> 00:03:11,670
Now we're connected to the virtual platform environment.
46
00:03:11,780 --> 00:03:14,330
And so we were ready to launch spider.
47
00:03:14,540 --> 00:03:17,010
And we don't have to install anything.
48
00:03:17,030 --> 00:03:19,760
Everything is already installed on the virtual platform.
49
00:03:19,850 --> 00:03:23,900
So we are ready to execute the code and I'm super happy to start.
50
00:03:24,320 --> 00:03:30,530
But before we start implementing the code we have to be in the right folder because there are some external
51
00:03:30,530 --> 00:03:34,230
files that we'll be calling when executing the code in the end.
52
00:03:34,400 --> 00:03:38,880
So anyway we always have to be in the right folder where we implement the code.
53
00:03:38,930 --> 00:03:43,040
So that's the first thing I'm going to do now I'm going to go to my desktop.
54
00:03:43,130 --> 00:03:46,500
This is where my computer vision is at full that is.
55
00:03:46,670 --> 00:03:53,870
So let's double click on it and now congratulations you reached module to object detection.
56
00:03:53,870 --> 00:03:54,280
All right.
57
00:03:54,290 --> 00:03:55,330
That's the folder.
58
00:03:55,460 --> 00:03:58,190
Let's quickly describe what's inside this folder.
59
00:03:58,250 --> 00:04:04,880
So you have data is just a folder that contains the classes based transform that will do the required
60
00:04:04,880 --> 00:04:11,300
transformations so that the input images will be compatible with the neural network then.
61
00:04:11,540 --> 00:04:15,800
Funny Doug is of course this video of this very funny.
62
00:04:15,810 --> 00:04:17,590
We will be trying to detect.
63
00:04:17,660 --> 00:04:19,420
I will show you this video in a second.
64
00:04:19,670 --> 00:04:25,990
But then layer's is another folder that contained some other tools for the detection and the multi-book
65
00:04:26,020 --> 00:04:28,270
as part of the SSD.
66
00:04:28,280 --> 00:04:35,480
Then you have of course to code the commented version of the code object detection commented where you
67
00:04:35,480 --> 00:04:41,300
have the whole code that will implemented this module come into line by line so that can be useful either
68
00:04:41,300 --> 00:04:42,770
before or after.
69
00:04:42,770 --> 00:04:48,350
Actually I also recommend to have a look at this before so that you can expect what you need to understand
70
00:04:48,500 --> 00:04:55,100
and therefore when I explain it you might understand it more easily than this code object detection
71
00:04:55,520 --> 00:04:57,540
is actually going to open it.
72
00:04:57,650 --> 00:05:03,280
Is the code that we will implement in this module so I already imported the libraries.
73
00:05:03,280 --> 00:05:05,620
I'm going to describe what those libraries are.
74
00:05:05,620 --> 00:05:11,440
But anyway this is where I will implement this whole code and when I'm done implementing it with you
75
00:05:11,800 --> 00:05:14,510
I will rename it object detection.
76
00:05:14,530 --> 00:05:17,880
No comment that you can have the commented version of the code.
77
00:05:17,950 --> 00:05:22,220
And the non-committed version of the code you can practice to recoated.
78
00:05:22,390 --> 00:05:24,350
That's excellent practice.
79
00:05:24,400 --> 00:05:31,400
Then we have the SSD that you wife file which contains the architecture of the single shot multi-button
80
00:05:31,400 --> 00:05:32,420
action model.
81
00:05:32,440 --> 00:05:38,590
We won't implement this one because I want it to keep what's most important for you to understand in
82
00:05:38,890 --> 00:05:40,990
this object detection implementation.
83
00:05:41,170 --> 00:05:45,290
Because if we implement the whole model this will be overwhelming.
84
00:05:45,400 --> 00:05:48,700
And you might miss what's at the heart of the model.
85
00:05:48,760 --> 00:05:51,030
So I prefer to proceed this way.
86
00:05:51,100 --> 00:05:54,020
And this model is all the architecture.
87
00:05:54,070 --> 00:05:59,620
And in fact after you watched in tuition lectures you will be totally able to understand what's going
88
00:05:59,620 --> 00:06:00,010
on.
89
00:06:00,060 --> 00:06:04,880
Well it's mostly about the architecture with all the boxes how they're defined.
90
00:06:05,020 --> 00:06:10,580
But then the heart of the model will be in this implementation objective section.
91
00:06:11,050 --> 00:06:19,150
And then finally this file is the file we will be loading to get the pre-trained SS DeMaio and more
92
00:06:19,150 --> 00:06:26,690
precisely this is the file that contains the weight of the SSD neural network that was already pre-trained.
93
00:06:26,890 --> 00:06:33,340
So we will be loading this file with torch the torch library that load which is a function of torche
94
00:06:33,840 --> 00:06:39,610
this tortured load function will open a tensor a tensor that will contain the weight of this already
95
00:06:39,610 --> 00:06:46,240
pre-trained neural network and then through a mapping with a dictionary we will transfer these weights
96
00:06:46,510 --> 00:06:48,370
to the model we implement.
97
00:06:48,550 --> 00:06:53,890
So basically this just contains the weight of an already pre-trained model and we will transfer these
98
00:06:53,890 --> 00:06:56,880
weights to the model we will implement.
99
00:06:56,890 --> 00:06:58,820
I hope that's clear and that's it.
100
00:06:58,930 --> 00:07:00,620
So I guess we're ready to start.
101
00:07:00,640 --> 00:07:05,140
And therefore let's start with some funny video of this very cute Doug.
102
00:07:05,230 --> 00:07:08,350
So I'm going to double click on the video.
103
00:07:08,350 --> 00:07:08,920
There we go.
104
00:07:08,920 --> 00:07:11,220
That's the video you can recognize.
105
00:07:11,220 --> 00:07:14,870
Kiril going to put that at the beginning.
106
00:07:15,030 --> 00:07:15,980
So this is curial.
107
00:07:15,990 --> 00:07:17,120
This is the dog.
108
00:07:17,190 --> 00:07:20,580
This video last two seconds so that it doesn't take too much time.
109
00:07:20,670 --> 00:07:22,340
When you try to Marans video.
110
00:07:22,470 --> 00:07:25,980
But we will totally have time to see the dog bouncing.
111
00:07:25,980 --> 00:07:27,000
It's very funny.
112
00:07:27,000 --> 00:07:27,770
Check this out.
113
00:07:30,670 --> 00:07:35,450
It you Doug when I watched this Doug I absolutely want to play with him.
114
00:07:37,320 --> 00:07:40,570
And actually you can see Kyrle piloting the drone behind.
115
00:07:40,770 --> 00:07:42,280
So there we go.
116
00:07:42,300 --> 00:07:43,400
That's the video.
117
00:07:43,520 --> 00:07:50,700
And actually the model we will implement will not only detect the dog bouncing on the field but also
118
00:07:50,700 --> 00:07:51,930
this human here.
119
00:07:51,930 --> 00:07:57,180
And you will see that it will also manage to detect curial even if you're really far actually from the
120
00:07:57,180 --> 00:07:57,800
video.
121
00:07:58,050 --> 00:08:04,440
And I'd like to tell you now that actually you know for you it's very easy to detect the drug but the
122
00:08:04,440 --> 00:08:06,720
drug is actually pretty small in the video.
123
00:08:06,750 --> 00:08:08,540
You know it's a pretty small object.
124
00:08:08,730 --> 00:08:14,220
And when we tried to detect that with open city we had extremely bad results.
125
00:08:14,220 --> 00:08:19,160
It couldn't detect the drug it couldn't detect what it was and there were some rectangles everywhere
126
00:08:19,170 --> 00:08:20,830
you can actually try yourself.
127
00:08:21,030 --> 00:08:26,580
And that's why I wanted to highlight that open Svea is definitely not among the most powerful models
128
00:08:26,850 --> 00:08:32,970
but you'll see that the more we will implement in this module will do a perfect job at detecting this
129
00:08:32,970 --> 00:08:39,120
drug even if it is small and even if there is not a perfect contrast between the drug and environment
130
00:08:39,120 --> 00:08:42,400
you know it's not like we have a white environment with a black dog.
131
00:08:42,630 --> 00:08:44,920
The dog can be confused with something else.
132
00:08:45,120 --> 00:08:54,450
So you'll be convinced of the power of this model at the end and I can't wait to show you how this model
133
00:08:54,540 --> 00:08:57,180
is going to do all right.
134
00:08:57,230 --> 00:08:58,570
That's what I wanted to catch.
135
00:08:58,570 --> 00:09:02,880
You know sometimes it really doesn't look like a dyke but you'll see what happens.
136
00:09:02,890 --> 00:09:09,220
Let's implement the SSD single shot multi-book detection and let's do that from the next tutorial.
137
00:09:09,280 --> 00:09:11,320
Until then enjoy computer vision.
14347
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.