Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,420 --> 00:00:02,700
Hello and welcome to listen to Tauriel.
2
00:00:02,700 --> 00:00:08,220
All right so we have a series of transformations to make so that the input frame can be fed into the
3
00:00:08,220 --> 00:00:09,130
neural network.
4
00:00:09,270 --> 00:00:14,300
Let's do these four transformations in this tutorial and let's start immediately with the first one.
5
00:00:14,580 --> 00:00:20,910
So as we said in the previous Statoil the first one is to apply the transform transformation so that
6
00:00:21,120 --> 00:00:26,990
our input frame has a right format meaning the right time dominations and the right colors.
7
00:00:27,180 --> 00:00:34,710
So to apply this transformation we're going to introduce a new Voivode which will be frame T and that
8
00:00:34,710 --> 00:00:38,680
will respond to this future transform frame.
9
00:00:38,760 --> 00:00:44,190
But I'm calling it Frente in that frame because I want to keep the original frame because we will use
10
00:00:44,460 --> 00:00:47,700
the original frame and to put the rectangles inside.
11
00:00:47,960 --> 00:00:53,700
So Frente will be the transformed frame after applying the transform transformation and to apply this
12
00:00:53,700 --> 00:00:55,300
transform transformation.
13
00:00:55,500 --> 00:01:02,130
Well we simply need to take that transformation transform and as input as you might guess when put the
14
00:01:02,130 --> 00:01:11,340
frame frame then we must understand that this transform function returns two elements and we are only
15
00:01:11,340 --> 00:01:16,260
interested in the first element which is actually the transform frame the frame with the right format
16
00:01:16,710 --> 00:01:21,410
and therefore to get the first element returned by this function.
17
00:01:21,480 --> 00:01:27,960
We simply need to add some brackets here and get the index of the first element which is zero.
18
00:01:28,410 --> 00:01:32,140
All right so then we get this new transform frame.
19
00:01:32,160 --> 00:01:39,720
Now Frente has the right format that has the right dimensions and the right color values good first
20
00:01:39,720 --> 00:01:41,500
transformation done now.
21
00:01:41,520 --> 00:01:43,080
Second transformation.
22
00:01:43,170 --> 00:01:45,810
Remember Frente is an umpire.
23
00:01:45,930 --> 00:01:50,210
We applied the first transformation but it still returns a number by Array.
24
00:01:50,340 --> 00:01:56,690
And so the second transformation to make now is to convert this number by Array into the torch tensor.
25
00:01:56,740 --> 00:02:04,140
I remind the torch tensor is a much more advanced matrix of a single type than an array.
26
00:02:04,260 --> 00:02:08,310
It is very useful for neural networks but it will not be useful enough.
27
00:02:08,310 --> 00:02:12,040
You will see we will have still two transformations to make.
28
00:02:12,090 --> 00:02:17,240
So let's do this second transformation converting the non-bio right into a torch sensor.
29
00:02:17,280 --> 00:02:24,780
So since we're getting closer and closer to the final input that will be accepted by our as is the neural
30
00:02:24,780 --> 00:02:25,480
network.
31
00:02:25,650 --> 00:02:32,340
I'm going to start to call this input X because then I will just override the x variable with the same
32
00:02:32,340 --> 00:02:32,870
name.
33
00:02:33,060 --> 00:02:39,520
So X will be this tortured answer that will just be converted from the non-firing.
34
00:02:39,810 --> 00:02:45,150
So now it's very simple to convert a non binary into a tortured answer.
35
00:02:45,150 --> 00:02:53,070
We just need to take our torched library and then a simple function very intuitive to remember which
36
00:02:53,070 --> 00:02:55,840
is from none.
37
00:02:56,130 --> 00:03:02,970
All right from run by the function that converts numbers into torsion answers and therefore very obviously
38
00:03:03,090 --> 00:03:07,240
what we have to input here in this function is of course our number.
39
00:03:07,250 --> 00:03:07,650
Right.
40
00:03:07,650 --> 00:03:10,870
That is Frente Okay perfect.
41
00:03:10,950 --> 00:03:16,250
But now there is another slight thing to do we could call it a sub transformation.
42
00:03:16,250 --> 00:03:23,400
This wasn't a transformation that I mentioned because it's a small thing but anyway this small transformation
43
00:03:23,430 --> 00:03:31,340
is to convert the sequence red blue green into green red blue because right now the order of the indexes
44
00:03:31,340 --> 00:03:34,560
or the color for our image is red blue green.
45
00:03:34,680 --> 00:03:40,890
But the neural network says he was trained under the convention green red blue.
46
00:03:40,920 --> 00:03:46,260
So we just need to inverse the order but that's a specific thing to the neural network and therefore
47
00:03:46,530 --> 00:03:52,320
this is not the general transformations that we have to make each time remember that the transformations
48
00:03:52,320 --> 00:03:58,920
that we're making except the one we're about to make right now is the general process before finding
49
00:03:58,970 --> 00:04:04,710
and then put into a new network implemented with torche that's very important to remember but by doing
50
00:04:04,710 --> 00:04:07,330
it two or three times it will be very easy for you.
51
00:04:07,380 --> 00:04:12,360
So let's do this small and quick transformation that is a permanent nation of colors.
52
00:04:12,540 --> 00:04:18,510
So we're going to add a dot here and then we're going to use the permute function and we want to go
53
00:04:18,510 --> 00:04:21,870
from red blue green to green red blue.
54
00:04:21,930 --> 00:04:29,550
So since Green is the last index number two we always to hear then since Red is the first one we put
55
00:04:29,620 --> 00:04:36,000
a zero and since Blue is the second index that is one we put one right.
56
00:04:36,000 --> 00:04:39,220
We want to go from red blue green to green red blue.
57
00:04:39,220 --> 00:04:43,030
So we want to go from 0 1 to 2 2 0 1.
58
00:04:43,170 --> 00:04:44,040
Perfect.
59
00:04:44,040 --> 00:04:50,850
So now we have the right torch sensor format with the right order of colors.
60
00:04:50,850 --> 00:04:51,570
Perfect.
61
00:04:51,570 --> 00:04:56,220
Now next transformation this is the third transformation out of the four.
62
00:04:56,310 --> 00:05:01,450
But actually we're going to make the last two transformations in the same one line of code.
63
00:05:01,470 --> 00:05:07,810
So the third transformation is to add this fake dimension corresponding to the batch.
64
00:05:07,830 --> 00:05:15,300
And the reason for doing this is that the neural network cannot actually accept single inputs like a
65
00:05:15,300 --> 00:05:18,810
single input vector or a single input image.
66
00:05:18,900 --> 00:05:22,540
It only accepts them in to some batches.
67
00:05:22,560 --> 00:05:27,040
So basically the neural network only except batches of inputs.
68
00:05:27,180 --> 00:05:33,390
And that's why now we have to create a structure with the first time engine correspond to the batch
69
00:05:33,750 --> 00:05:37,480
and then the other dimensions corresponding to the input.
70
00:05:37,680 --> 00:05:40,440
That's very very important in PI torch.
71
00:05:40,470 --> 00:05:44,500
You will always see that we do it with the squeeze function.
72
00:05:44,580 --> 00:05:50,490
So each time you see the squeeze function it's definitely just before feeding the neural network with
73
00:05:50,490 --> 00:05:54,820
the input we use the N squeeze function to create that thing domination of the batch.
74
00:05:54,840 --> 00:06:00,940
So good thing to keep in mind if you're going to use torture in the future which I highly recommend.
75
00:06:01,110 --> 00:06:03,700
But let's do this third transformation.
76
00:06:03,830 --> 00:06:07,440
The Squeeze and the good news is that it's very simple.
77
00:06:07,440 --> 00:06:12,350
We take our input image which is now a torch sensor.
78
00:06:12,450 --> 00:06:20,340
Thanks to the previous transformation we are added and then we use the squeeze function.
79
00:06:20,910 --> 00:06:28,950
And now this squeeze function takes one argument which is exactly the index of the dimension of the
80
00:06:28,950 --> 00:06:33,290
batch and the batch should always be the first time engine.
81
00:06:33,480 --> 00:06:39,700
So it you always have the first index and therefore since indexes and bytes and start at zero.
82
00:06:39,870 --> 00:06:42,700
Well we put a zero that zero.
83
00:06:42,700 --> 00:06:48,780
Here is the Index of this first diamond and corresponding to the bet that we're adding to our structure
84
00:06:48,900 --> 00:06:50,300
of input images.
85
00:06:51,460 --> 00:06:53,810
All right so that's the third transformation that's done.
86
00:06:53,860 --> 00:06:54,760
Perfect.
87
00:06:54,850 --> 00:06:58,350
And now we're going to do the last transformation in that same line.
88
00:06:58,600 --> 00:07:00,980
And that last transformation.
89
00:07:01,030 --> 00:07:06,970
The final one and then we'll be ready to feed the new one that work with the input image is to convert
90
00:07:07,240 --> 00:07:16,180
this batch of torture and sort of input into a torch viable I remind a torch viable is a highly advanced
91
00:07:16,180 --> 00:07:20,080
variable that contains both a tensor and a gradient.
92
00:07:20,080 --> 00:07:25,810
This torch viable will become an element of the dynamic graph which will compute very efficiently the
93
00:07:25,810 --> 00:07:30,480
gradients of any composition functions during backward propagation.
94
00:07:30,490 --> 00:07:31,340
So there we go.
95
00:07:31,360 --> 00:07:33,150
That's the final transformation.
96
00:07:33,190 --> 00:07:34,880
And that is very easy to do.
97
00:07:34,930 --> 00:07:41,880
We simply need to take the variable class like that.
98
00:07:41,920 --> 00:07:48,630
So this is the variable class which will create an object which will be this towards variable.
99
00:07:48,760 --> 00:07:54,730
And therefore since we're creating a new object we need to override the previous variable X because
100
00:07:54,730 --> 00:08:01,720
I'm going to call it X again and I'm going to add here X equals variable x and zero.
101
00:08:02,080 --> 00:08:06,870
And that's exactly an object which is the torch viable.
102
00:08:06,890 --> 00:08:10,090
So now purrfect are four transformations are done.
103
00:08:10,150 --> 00:08:17,590
So congratulations you are ready to feed the is the neural network with the input images that are now
104
00:08:17,770 --> 00:08:18,940
towards variables.
105
00:08:19,180 --> 00:08:20,440
So that's wonderful.
106
00:08:20,440 --> 00:08:21,900
We're done with this material.
107
00:08:21,910 --> 00:08:24,730
We did very well our four transformations.
108
00:08:24,730 --> 00:08:31,600
So in the next time we will feed this torch very well to the neural network SSD which is already pre-trained
109
00:08:31,630 --> 00:08:37,870
because you know it's a neural network that was pre-trained to detect between 30 to 40 objects including
110
00:08:37,870 --> 00:08:42,500
planes horses dogs cars boats a lot more.
111
00:08:42,580 --> 00:08:44,820
So we're not going to do the whole training again.
112
00:08:44,830 --> 00:08:45,930
That's the beauty of it.
113
00:08:45,940 --> 00:08:47,860
We have a pre-trained model.
114
00:08:47,880 --> 00:08:51,730
We're going to feed the input image to this pre-trained model and we will get the output.
115
00:08:51,790 --> 00:08:54,470
That is the prediction detection.
116
00:08:54,970 --> 00:08:58,720
So let's do this in the next little while and until then enjoy computer vision.
12714
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.