Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,500 --> 00:00:06,610
Hello and welcome to the last step of this module to object detection before we start the homework.
2
00:00:06,650 --> 00:00:12,140
But here we go with the last step and this last step is going to be about the training of the SSD.
3
00:00:12,290 --> 00:00:18,260
So we're not going to implement the whole code that we'll try and the SSD simply because it's a huge
4
00:00:18,260 --> 00:00:18,720
code.
5
00:00:18,740 --> 00:00:23,630
And besides you're going to see that you're going to need specific requirements to be able to run that
6
00:00:23,630 --> 00:00:24,320
training.
7
00:00:24,500 --> 00:00:29,630
So this tutorial is going to be about explaining how you could train the SSD.
8
00:00:29,630 --> 00:00:32,160
So I'm going to show you the file how to execute it.
9
00:00:32,300 --> 00:00:36,700
And mostly I'm going to show you the data set on which the SSD was trained.
10
00:00:36,890 --> 00:00:42,920
So in case you want to train the SSD with some other object you would understand the approach of how
11
00:00:42,920 --> 00:00:43,570
to do it.
12
00:00:43,730 --> 00:00:48,530
But remember you're going to see that you're going to need some pretty advanced system.
13
00:00:48,650 --> 00:00:55,370
So as you can see right now I'm on a web page and this is the web page that contains the data sets on
14
00:00:55,370 --> 00:00:57,080
which the SS was trained.
15
00:00:57,080 --> 00:00:59,320
This dataset is not image net.
16
00:00:59,330 --> 00:01:04,520
If you were hoping for that but the Pascal visual object classes dataset.
17
00:01:04,520 --> 00:01:11,030
So if you remember in the code we saw Virk the OSI you know for classes to have the mapping between
18
00:01:11,030 --> 00:01:12,840
the classes and integers.
19
00:01:12,980 --> 00:01:20,230
Well Varg stands for visual object classes and this is already a huge dataset not as huge as image net
20
00:01:20,300 --> 00:01:24,380
but a huge one that you will see contains lots and lots of images.
21
00:01:24,650 --> 00:01:29,840
So you can find this page at this address but don't worry the next it will be an article which you will
22
00:01:29,840 --> 00:01:37,580
find the homework folder that will contain all the tools to train the SSD Plus the address of this website
23
00:01:37,790 --> 00:01:40,730
so that you can have a look at this better.
24
00:01:40,730 --> 00:01:47,600
All right so let's go through this web page so as you can see the Pascals Virk project provides standardized
25
00:01:47,690 --> 00:01:49,930
image data sets for object recognition.
26
00:01:49,940 --> 00:01:56,010
So exactly what we're doing and besides provides a common set of tools for accessing the data set and
27
00:01:56,010 --> 00:01:57,010
annotations.
28
00:01:57,050 --> 00:02:03,230
So it's not only a data set it's also an advanced structure of data set that helps a lot for the training.
29
00:02:03,350 --> 00:02:07,370
And also you can see that there were some challenges but now finished.
30
00:02:07,430 --> 00:02:11,460
But anyway it is definitely useful data set to train the SSD.
31
00:02:11,460 --> 00:02:18,180
And so now if we have a closer look at what these days that are well we need to scroll down to that.
32
00:02:18,240 --> 00:02:25,610
What challenges from 2005 to 2012 and here are the data sets you can see there's several of them from
33
00:02:25,670 --> 00:02:33,440
2005 to 2012 and the ones we have the ones on which the SSD was trans and I'm going to show you the
34
00:02:33,440 --> 00:02:39,150
code and the data sets themselves and how you can execute the code to train the SSD on these two data
35
00:02:39,150 --> 00:02:39,780
set.
36
00:02:39,920 --> 00:02:45,670
These two data sets are the Vark 2007 and the Vok 2012.
37
00:02:45,980 --> 00:02:51,310
So let's have a closer look at where they are right now and we need to scroll down even more.
38
00:02:51,440 --> 00:02:54,390
Actually at the bottom of the page there we go.
39
00:02:54,650 --> 00:02:58,640
So here you have this table that's the most important part of this page.
40
00:02:58,640 --> 00:03:04,350
That's where you can see how the data sets are structured and made and what they contain exactly.
41
00:03:04,700 --> 00:03:05,990
So let's have a look at the first one.
42
00:03:05,990 --> 00:03:14,060
Remember we have the 2007 data set and 2012 sort of the 2007 data set contains 20 classes.
43
00:03:14,060 --> 00:03:20,560
Remember I told you that the pre-trained the model we have can detect between 30 to 40 object.
44
00:03:20,630 --> 00:03:27,530
Well among these objects we have some persons Well the S's can detect any person in a video or in some
45
00:03:27,530 --> 00:03:36,080
images then several animals some birds cats cows dogs horses sheeps some vehicles airplanes bicycles
46
00:03:36,080 --> 00:03:41,750
boats buses cars you can do some computer vision for some driving car if you have a self-driving car
47
00:03:41,750 --> 00:03:48,530
business I would highly recommend to integrate as is the model for the computer vision inside yourself
48
00:03:48,650 --> 00:03:52,450
in cars are also motorbikes and trains of course.
49
00:03:52,670 --> 00:04:00,770
And then we have some other objects like some bottles chairs dining table potted plants sofas and TV
50
00:04:00,890 --> 00:04:01,920
monitors.
51
00:04:01,940 --> 00:04:02,300
All right.
52
00:04:02,300 --> 00:04:04,850
And I think we have even more but there you go.
53
00:04:04,850 --> 00:04:07,220
You can see you have plenty of objects to detect.
54
00:04:07,220 --> 00:04:09,380
So really really useful.
55
00:04:09,380 --> 00:04:19,160
And now if we go to the 2012 dataset of 2012 we can see we also have 20 classes and the data contains
56
00:04:19,400 --> 00:04:27,560
eleven thousand five hundred and thirty images containing 27 450 animated objects so we're going to
57
00:04:27,560 --> 00:04:33,640
have a look at these objects specifically because as I told you I've prepared for you a folder to training
58
00:04:33,640 --> 00:04:37,990
as is the folder that contains not only the code but also these two data sets.
59
00:04:38,000 --> 00:04:40,470
We're going to have a clear look at images.
60
00:04:40,490 --> 00:04:42,100
So we're going to look at some of them.
61
00:04:42,200 --> 00:04:47,540
And so what I'm going to do right now is walk you through this folder that I've prepared.
62
00:04:47,540 --> 00:04:50,190
So let's open Anaconda up.
63
00:04:50,240 --> 00:04:50,810
There we go.
64
00:04:50,810 --> 00:04:56,660
We need to connect the virtual platform of course because the training is made with by torch again and
65
00:04:56,660 --> 00:05:00,840
some other tools that are preinstalled individual them.
66
00:05:00,850 --> 00:05:01,750
So there we go.
67
00:05:01,810 --> 00:05:06,670
Let's not forget to connect to the virtual platform applications on virtual platform.
68
00:05:06,670 --> 00:05:07,520
There we go.
69
00:05:07,600 --> 00:05:13,960
And now we launch spider so spiders launching and I'm going to show you this training as is the four
70
00:05:13,960 --> 00:05:15,400
that I've prepared for you.
71
00:05:15,400 --> 00:05:22,870
I put it in the module to object detection color which I'm going to go to right now so I'll explore
72
00:05:23,110 --> 00:05:28,570
it on my desktop that's the computer vision it is that folder and if we go to module 2 you'll find a
73
00:05:28,570 --> 00:05:29,590
new folder.
74
00:05:29,720 --> 00:05:35,590
This folder training as is d that contains all the tools and data sets for you to train the SSD.
75
00:05:35,830 --> 00:05:38,760
But I have to specify something very important now.
76
00:05:38,920 --> 00:05:45,290
Remember at the beginning of this oil I told you that you need an advanced system to train the SSD while
77
00:05:45,430 --> 00:05:49,140
this advance system is simply to have kuda on your machine.
78
00:05:49,180 --> 00:05:55,900
Kuda is like an accelerator that is connected to every geographic cards and that allows to speed up
79
00:05:55,960 --> 00:05:58,990
considerably the training computations.
80
00:05:58,990 --> 00:06:04,880
Therefore the training implementations that were made are only compatible with kuda.
81
00:06:04,890 --> 00:06:10,870
I'm going to tell you a fun fact now if there was an implementation of the training implemented without
82
00:06:10,870 --> 00:06:16,240
kuda I can tell you for sure the training would take several months or even years.
83
00:06:16,240 --> 00:06:17,440
This is not a joke.
84
00:06:17,500 --> 00:06:22,370
The training of the SSD is first of all made with thousands and thousands of images.
85
00:06:22,540 --> 00:06:27,390
But if you train that without kuda it would definitely take several months.
86
00:06:27,400 --> 00:06:33,460
You can not trained the SSD on your CPQ you have to train it with a GP you and the best you can find
87
00:06:33,520 --> 00:06:34,390
is kuda.
88
00:06:34,480 --> 00:06:40,990
So just to set things clear this implementation is only kuda compatible and therefore you will only
89
00:06:40,990 --> 00:06:47,320
be able to execute the code if you have an NDA Geographic card with kuda and labels on your machine.
90
00:06:47,320 --> 00:06:48,720
But even if that's the case.
91
00:06:48,820 --> 00:06:53,550
While it would take some time and you would not have your SSD trained in the next hour.
92
00:06:53,860 --> 00:06:58,960
But I think it's still interesting for you to see how you can run the training and therefore that's
93
00:06:58,960 --> 00:07:00,650
exactly what I'm going to show you right now.
94
00:07:00,790 --> 00:07:08,410
But first let me walk you through this training SSD folder starting with the data so you can see we
95
00:07:08,410 --> 00:07:11,170
have this data folder that contains several files.
96
00:07:11,200 --> 00:07:17,650
Here you can have the scripts which are shellscript and which allow to download the datasets Vogue 2007
97
00:07:17,670 --> 00:07:19,180
and Vok 2012.
98
00:07:19,360 --> 00:07:27,940
So to do that you simply need to open a terminal and then type S H space Virk 2007 dot S H then execute
99
00:07:28,270 --> 00:07:35,350
and this will download word 2007 and then you can type another command as a space walk 2012 that s h
100
00:07:35,740 --> 00:07:37,880
and then press enter to the queue to download.
101
00:07:37,910 --> 00:07:41,790
Also 2012 but I've already done that for you.
102
00:07:41,800 --> 00:07:49,180
I've downloaded this dataset and here they are there in this folder Vark dev kit and you have Virk 2007
103
00:07:49,430 --> 00:07:51,080
and Vark 2012.
104
00:07:51,460 --> 00:07:53,310
So let's have a look at Vocht 2007.
105
00:07:53,320 --> 00:07:58,390
First you have several subfolder but the one I want to show you is this one.
106
00:07:58,480 --> 00:08:02,620
GPL images which contains lots and lots of images.
107
00:08:02,830 --> 00:08:05,630
As you can see let's open randomly.
108
00:08:05,650 --> 00:08:09,070
Some of them so let's see this one for example number 8.
109
00:08:09,070 --> 00:08:09,980
It's a chair.
110
00:08:10,020 --> 00:08:13,220
All right so that's the ground truth for the chair.
111
00:08:13,240 --> 00:08:19,510
Remember Kyrios intuition lectures need to start with the ground truth to ground truth is the truth
112
00:08:19,600 --> 00:08:23,370
that there is in fact a chair right here on this image.
113
00:08:23,460 --> 00:08:28,450
So that's the ground truth that will be compared to the prediction of the SS doing the training just
114
00:08:28,450 --> 00:08:32,590
before a possible error is back propagated into the neural network.
115
00:08:32,590 --> 00:08:35,350
All right so let's close this let's open another one.
116
00:08:35,350 --> 00:08:36,380
Number 14.
117
00:08:36,430 --> 00:08:41,470
It's a car a yellow cab probably in New York I would say maybe.
118
00:08:41,560 --> 00:08:43,340
Or some other cities in the US.
119
00:08:43,450 --> 00:08:44,210
But there we go.
120
00:08:44,230 --> 00:08:49,720
That's to detect a car so it's a ground truth for a car number 31 then a train.
121
00:08:49,720 --> 00:08:50,170
There we go.
122
00:08:50,170 --> 00:08:52,820
You can even detect train.
123
00:08:52,940 --> 00:08:55,030
And let's open a few more.
124
00:08:55,030 --> 00:08:58,030
So that's a pigeon to detect birds.
125
00:08:58,060 --> 00:08:59,620
Ground Truth for the bird.
126
00:08:59,920 --> 00:09:03,260
And now 53 a cat cute little cat.
127
00:09:03,520 --> 00:09:05,920
And one last one hundred.
128
00:09:05,950 --> 00:09:07,730
Well let's go further.
129
00:09:07,900 --> 00:09:11,290
That's open the number one thousand five hundred eighty six.
130
00:09:11,290 --> 00:09:14,000
And that's a beautiful horse jumping.
131
00:09:14,140 --> 00:09:18,240
And speaking of horses well that's going to be the homework for this section.
132
00:09:18,280 --> 00:09:24,070
It's going to be it's going to be a fun homework for you to relax and enjoy playing some object detection
133
00:09:24,130 --> 00:09:26,470
and some really really nice videos.
134
00:09:26,500 --> 00:09:30,690
We're going to do some action on beautiful horses running on a field.
135
00:09:30,760 --> 00:09:31,600
So there we go.
136
00:09:31,600 --> 00:09:32,610
You have the object.
137
00:09:32,630 --> 00:09:38,380
And now let's have a look at the other data set because the training is made on the two data sets at
138
00:09:38,380 --> 00:09:39,320
the same time.
139
00:09:39,580 --> 00:09:42,990
So have to scroll back up.
140
00:09:43,210 --> 00:09:44,520
There we go.
141
00:09:44,620 --> 00:09:48,300
Varg 2017 and now let's have a look at 2012.
142
00:09:48,580 --> 00:09:50,010
Same subfolder is.
143
00:09:50,230 --> 00:09:53,950
But this time with different images.
144
00:09:53,970 --> 00:09:54,250
All right.
145
00:09:54,250 --> 00:09:55,160
Here they are.
146
00:09:55,330 --> 00:10:07,550
So the 2012 we actually have images of several 2007 2008 and probably 2009 there we go and up to 2012.
147
00:10:07,550 --> 00:10:11,510
So let's open any of them for example this one from 2010.
148
00:10:11,720 --> 00:10:15,020
Beautiful dog with a somewhat suspicious look.
149
00:10:15,260 --> 00:10:16,490
But I love dogs.
150
00:10:16,490 --> 00:10:21,160
I hope you liked the dog bouncing on the field as we detected during this much too.
151
00:10:21,170 --> 00:10:22,340
I really like this Doug.
152
00:10:22,560 --> 00:10:28,390
So let's have let's open another one let's open one image from 2011.
153
00:10:28,520 --> 00:10:29,210
There we go.
154
00:10:29,210 --> 00:10:30,010
This one.
155
00:10:30,230 --> 00:10:34,640
Well two nice persons probably a mother with her daughter.
156
00:10:34,740 --> 00:10:35,700
So very nice.
157
00:10:35,810 --> 00:10:41,160
Let's open one from 2012 now and that will be our last.
158
00:10:41,180 --> 00:10:44,100
So there we go the less ground truth we have.
159
00:10:44,270 --> 00:10:47,000
Oh another kid and we can see several persons.
160
00:10:47,000 --> 00:10:48,160
This one is actually interesting.
161
00:10:48,160 --> 00:10:50,640
There are several ground truth in this image.
162
00:10:50,720 --> 00:10:55,240
So maybe let's open the last one to look for a last object.
163
00:10:55,240 --> 00:11:02,810
So that's another person and another person so probably all the images from 2012 are persons.
164
00:11:02,820 --> 00:11:10,490
Now here we have a horse and a person so that's basically combining several different classes for the
165
00:11:10,760 --> 00:11:15,660
day to be able to recognize several objects in an image.
166
00:11:15,660 --> 00:11:21,300
All right so we're going to stop here and now now that we're done with the dataset I'm going to show
167
00:11:21,300 --> 00:11:24,080
you the code and I'm going to show you how to run the code.
168
00:11:24,780 --> 00:11:32,490
First let me just scroll back up to find my way back to the folder.
169
00:11:32,850 --> 00:11:33,740
There we go.
170
00:11:33,780 --> 00:11:40,370
Scripts done data done and going back to the main training as the folder.
171
00:11:40,530 --> 00:11:41,210
All right.
172
00:11:41,530 --> 00:11:46,930
So in this folder you also have besides the data you also have the layers which was the same folder
173
00:11:46,960 --> 00:11:49,030
as before when we made the detection.
174
00:11:49,030 --> 00:11:54,260
So that's because in order to train the SSD we need some of the functions and modules of the SSD.
175
00:11:54,370 --> 00:11:59,290
Then we have the SSD implementation that contains all the architecture of the SSD Anchorage to have
176
00:11:59,290 --> 00:12:00,100
a look at this.
177
00:12:00,370 --> 00:12:06,320
And we have our trained file which I'm going to open right now because data from this file that we're
178
00:12:06,340 --> 00:12:17,780
going to execute the training of the SSD on these two datasets Word 2007 and book 2012.
179
00:12:17,810 --> 00:12:18,870
All right.
180
00:12:19,200 --> 00:12:25,110
And then we have you Teal's which contains some tools for the training like augmentations so augmentations
181
00:12:25,110 --> 00:12:31,580
are some ways to increase the amount of images so that we can have even more material to train our SD
182
00:12:31,590 --> 00:12:34,730
on even if we have already lots and lots of images.
183
00:12:34,800 --> 00:12:40,020
You can actually check that in our deep learning course we have some tutorials explaining in more details
184
00:12:40,320 --> 00:12:45,900
how Image augmentations work and then finally we have this for that that contains some weights.
185
00:12:45,930 --> 00:12:51,030
So make sure to understand that these are not some weights of some pre-trained model because actually
186
00:12:51,030 --> 00:12:56,600
right now we're doing some training but these are just some initialized weights.
187
00:12:56,700 --> 00:13:02,460
You imagine that they need to be initialized in some way a way that should be compatible with the future
188
00:13:02,460 --> 00:13:04,510
of data the way weights during the training.
189
00:13:04,710 --> 00:13:10,620
So these are the weights and now we're going to go through this train file and mostly we're going to
190
00:13:10,620 --> 00:13:14,110
execute the code to show you how the training is done.
191
00:13:14,160 --> 00:13:18,390
It's actually going to be very easy we simply need to select the whole code and then press command or
192
00:13:18,390 --> 00:13:20,280
control press and to to execute.
193
00:13:20,280 --> 00:13:25,040
But before we do that I would just like to show you some of the arguments here.
194
00:13:25,170 --> 00:13:30,660
So basically these are some parcels containing all the arguments that you can change if you want to
195
00:13:30,660 --> 00:13:32,220
change the way the training will be done.
196
00:13:32,220 --> 00:13:38,160
So for example you can change the argument related to the learning rate here the default learning rate
197
00:13:38,160 --> 00:13:45,750
is 0.01 but you can change it to 0.05 for example if you want to try a different training then you can
198
00:13:45,750 --> 00:13:48,220
also change the parameter for the momentum.
199
00:13:48,330 --> 00:13:50,810
The way to the gamma parameter.
200
00:13:50,940 --> 00:13:56,400
Well these are all the parameters that you can change the hyper parameters to do some type of parameter
201
00:13:56,400 --> 00:13:57,860
tuning of the training.
202
00:13:57,930 --> 00:14:03,600
But again remember that training is taking lots and lots of time so it's mostly for the research and
203
00:14:03,600 --> 00:14:06,990
developers working on the state of the art the models.
204
00:14:06,990 --> 00:14:09,410
That's not for us to do right now anyway.
205
00:14:09,600 --> 00:14:12,920
But then you have some other parameters I mentioned.
206
00:14:12,990 --> 00:14:20,040
Image net which is another big huge dataset on which you can train your DeBerry morals and computer
207
00:14:20,040 --> 00:14:27,710
vision models and therefore if you want to train your SSD on a different data set then Virk 2007 invoked
208
00:14:27,730 --> 00:14:28,870
2012.
209
00:14:29,070 --> 00:14:37,390
Well here is the parameter related to that which contains the path to the dataset data if get that exactly
210
00:14:37,400 --> 00:14:45,250
the path data as this father and then there it is this folder containing Vok 2007 invoke 2012.
211
00:14:45,520 --> 00:14:48,280
All right so let me go back.
212
00:14:48,540 --> 00:14:50,160
So basically that's how it works.
213
00:14:50,160 --> 00:14:55,530
You know you have several hyper parameters you can change them to experiment different trainings and
214
00:14:55,530 --> 00:14:59,750
the rest is some implementation of other functions like.
215
00:14:59,760 --> 00:15:04,590
Exactly and wait and it function that initializers the way it's the proper way.
216
00:15:04,590 --> 00:15:09,990
So we're going to see that in more details in module 3 because in modules we will implement from scratch
217
00:15:10,200 --> 00:15:11,490
the training of the Ganns.
218
00:15:11,510 --> 00:15:17,100
Generally our research on that works because this will be possible to do without kuda and without thousands
219
00:15:17,100 --> 00:15:18,080
of lines of code.
220
00:15:18,240 --> 00:15:21,120
So you have several functions and mostly we have.
221
00:15:21,210 --> 00:15:28,470
There you go the train function that does the training of the SSD on all the images that are contained
222
00:15:28,560 --> 00:15:30,390
in the Vok folders.
223
00:15:30,390 --> 00:15:32,840
So there we go now to finish the tutorial.
224
00:15:32,850 --> 00:15:39,740
I'm going to show you how you can execute this file so very simply you just select the whole code again.
225
00:15:39,840 --> 00:15:45,660
Make sure that you have an empty Geographic card and kuda enabled and that's not the case you will get
226
00:15:45,780 --> 00:15:47,460
an error and be relieved.
227
00:15:47,490 --> 00:15:50,890
Even if we had a non a version of this training.
228
00:15:51,070 --> 00:15:53,150
Well it would take months or years.
229
00:15:53,190 --> 00:15:55,980
And therefore this would be completely useless.
230
00:15:56,190 --> 00:16:02,310
And that's exactly the reason why the developers have not implemented and none could have version of
231
00:16:02,320 --> 00:16:03,440
the training.
232
00:16:03,600 --> 00:16:05,910
That is because it would be completely useless.
233
00:16:06,030 --> 00:16:08,570
So I'm going to you now.
234
00:16:08,580 --> 00:16:09,500
There we go.
235
00:16:09,510 --> 00:16:16,190
The train is on its way and it's basically going to take several hours so we're just going to stop here.
236
00:16:16,200 --> 00:16:21,750
It was just to show you how the training works how the data sets were structured and to show you a little
237
00:16:21,750 --> 00:16:24,390
behind the scene how the SS is trained.
238
00:16:24,570 --> 00:16:31,160
And now we hope you understand why we had to use a pre-trained model to do our detection.
239
00:16:31,170 --> 00:16:33,800
All right so now we're going to move on to the homework.
240
00:16:33,960 --> 00:16:40,170
It's going to be a fun and exciting homework about detecting some beautiful horses running on a field.
241
00:16:40,230 --> 00:16:41,490
I can't wait to show you this.
242
00:16:41,490 --> 00:16:43,270
And until then enjoy computer vision.
25548
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.