Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,690 --> 00:00:06,930
Hello and welcome to this patented Hoyo today we're going to create the SSD neural network.
2
00:00:06,930 --> 00:00:08,890
So we already defined our detect function.
3
00:00:08,890 --> 00:00:15,330
So now the main thing that we have to do left is to indeed create this as is the neural network.
4
00:00:15,360 --> 00:00:21,540
But the good news is that we already have the weights of a pre-trained as is the neural network.
5
00:00:21,570 --> 00:00:26,550
So what will simply do is create an object that will represent the neural network itself.
6
00:00:26,700 --> 00:00:34,260
Thanks to the build SSD the function that we import from this SSD pipe and fallar recommend to have
7
00:00:34,260 --> 00:00:35,390
a look at this.
8
00:00:35,550 --> 00:00:41,520
And then after we create this neural network object we will get the weights by loading from an already
9
00:00:41,520 --> 00:00:43,680
pre-trained as is the neural network.
10
00:00:43,860 --> 00:00:45,840
The weights are contained in this file.
11
00:00:45,870 --> 00:00:50,880
We're going to get these weights by using torture load which is a function of torch and that will open
12
00:00:50,880 --> 00:00:57,650
a tensor that will contain these weights and then using another function to load state dict function.
13
00:00:57,660 --> 00:01:04,100
We will attribute these loaded weights to our instance object of our neural network.
14
00:01:04,140 --> 00:01:05,730
So that's exactly the process.
15
00:01:05,730 --> 00:01:06,930
It's not that hard.
16
00:01:06,930 --> 00:01:09,550
We will do it in three or four lines of code.
17
00:01:09,780 --> 00:01:10,780
So let's do this.
18
00:01:10,860 --> 00:01:16,030
The first thing that we need to do is to create our neural network object.
19
00:01:16,200 --> 00:01:21,990
So we're going to call this new one that work not just to align with this variable here but we could
20
00:01:21,990 --> 00:01:26,010
actually choose another name it's just less confusing this way.
21
00:01:26,190 --> 00:01:32,210
And now as we just said we're going to use to build as is the function that is a function from the SSD
22
00:01:32,210 --> 00:01:33,250
pattern file.
23
00:01:33,330 --> 00:01:37,280
So we call this function build underscore SSD.
24
00:01:37,500 --> 00:01:41,430
And in this function we actually have to input only one argument.
25
00:01:41,490 --> 00:01:45,470
This argument is the face you have two possible phases.
26
00:01:45,510 --> 00:01:47,590
The train phase and the test phase.
27
00:01:47,890 --> 00:01:54,240
But here since we're going to use an already pre-trained model bellowing its weight well we're not going
28
00:01:54,240 --> 00:01:55,520
to train anything.
29
00:01:55,560 --> 00:02:01,020
We're just going to test actually because we're going to test as is the model on our Video of the funny
30
00:02:01,020 --> 00:02:01,570
dog.
31
00:02:01,800 --> 00:02:08,820
So the face we have to choose here is test and we just put it this way in quotes test just like that
32
00:02:09,270 --> 00:02:10,230
and that's it.
33
00:02:10,440 --> 00:02:16,130
Our neural network our SSD neural network is created with this single line of code.
34
00:02:16,410 --> 00:02:17,630
Awesome.
35
00:02:17,640 --> 00:02:24,060
Then next step as we said the next step is to load the weights of an already pre-trained SSD neural
36
00:02:24,060 --> 00:02:30,990
network and the name of this already pre-trained neural network is exactly SSD 300 and this core map
37
00:02:31,230 --> 00:02:37,270
underscores seventy seven point forty three underscore V-2 that's the name of the neural network.
38
00:02:37,380 --> 00:02:38,530
That's a powerful one.
39
00:02:38,560 --> 00:02:44,220
It was pre-trained to detect between 30 to 40 objects and you're going to see that on the video that
40
00:02:44,220 --> 00:02:46,150
it's not only going to detect the drug.
41
00:02:46,350 --> 00:02:47,670
It's going to detect more.
42
00:02:47,820 --> 00:02:53,520
So actually this model and this will be part of the homework you will have to test this model on other
43
00:02:53,520 --> 00:02:55,650
videos containing other objects.
44
00:02:55,830 --> 00:02:58,670
So that's the best pre-trained model we could find.
45
00:02:58,770 --> 00:03:05,250
And believe me or believe the paper is actually more powerful than some other great object detection
46
00:03:05,250 --> 00:03:08,810
Morell's like faster or CNN or yellow.
47
00:03:09,120 --> 00:03:14,820
According to the paper that is according to the cases tested by the paper the SSD is the most powerful.
48
00:03:14,830 --> 00:03:19,120
A mom says the first of our CNN and yellow.
49
00:03:19,170 --> 00:03:19,740
All right.
50
00:03:19,740 --> 00:03:21,670
So let's slow this wait.
51
00:03:21,780 --> 00:03:26,790
And to load these weights we just need to take our new network object and then we're going to use the
52
00:03:26,790 --> 00:03:35,060
load underscore state underscored dict function and inside this function we exact input.
53
00:03:35,460 --> 00:03:49,260
Well in quotes our pre-trained model says the 300 this core and a P underscore 77 dot 43 underscore
54
00:03:49,350 --> 00:03:52,250
the two dot p t h.
55
00:03:52,620 --> 00:03:59,040
But these weights we're going to put them into a tensor and therefore inside the load state dict function
56
00:03:59,400 --> 00:04:06,520
I'm going to call another function which I already mentioned that is the torch that load function.
57
00:04:06,630 --> 00:04:14,100
So the load function from the torture library and this towards stop load function will open a tensor
58
00:04:14,340 --> 00:04:20,460
that will contain these weights and then the use of the load stated function is to attribute these weights
59
00:04:20,580 --> 00:04:23,670
to our esset the neural network object.
60
00:04:23,670 --> 00:04:29,910
All right so that's almost ready we just need to add two more arguments in our torch's upload function.
61
00:04:29,910 --> 00:04:40,320
The first one is map underscore location and that should be equal to lambda storage and a third argument
62
00:04:40,680 --> 00:04:44,610
luck for location storage.
63
00:04:44,610 --> 00:04:48,700
All right so that's just the way to open a center that will contain these weights.
64
00:04:48,840 --> 00:04:53,840
And so now not only do we have a sensor that contains these weights but also these weights or attribute
65
00:04:53,840 --> 00:04:57,200
it to our as is the net object.
66
00:04:57,210 --> 00:04:57,570
All right.
67
00:04:57,570 --> 00:05:02,720
Now the neural network SSD single shot multi-post detection is created.
68
00:05:02,880 --> 00:05:08,790
So that means that we have the frames coming from the video and we have our neural network nets but
69
00:05:08,790 --> 00:05:11,630
remember to apply the detect function on the frames.
70
00:05:11,850 --> 00:05:18,030
We not only need the frames and the net but we also need the transform transformation and that's exactly
71
00:05:18,030 --> 00:05:20,450
what we're going to create in the next tutorial.
72
00:05:20,510 --> 00:05:26,400
We're going to create this transformation that will make sure that the input frames coming from the
73
00:05:26,400 --> 00:05:33,200
video and that will be the input of the function will be compatible with our SSD neural network.
74
00:05:33,210 --> 00:05:35,920
So let's just create this last thing we need.
75
00:05:35,970 --> 00:05:38,260
That is a transformation in the next tutorial.
76
00:05:38,280 --> 00:05:40,110
And until then you can build a vision.
8247
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.