1
00:00:00,510 --> 00:00:05,250
Hello and welcome to this new tutorial. In this tutorial we describe the challenge that
2
00:00:05,250 --> 00:00:05,890
we have.
3
00:00:05,970 --> 00:00:12,750
We have to detect a funny dog in a two-second video, and we will do it through computer vision based
4
00:00:12,840 --> 00:00:14,730
on deep learning neural networks.
5
00:00:14,790 --> 00:00:18,050
So we have already found the right folder.
6
00:00:18,070 --> 00:00:22,850
In this quick tutorial I'm going to explain the libraries that we're going to use.
7
00:00:22,890 --> 00:00:27,480
They're all already installed, and I already prepared the code that will import them.
8
00:00:27,480 --> 00:00:33,000
So there is nothing to do but I think it's important that you understand what we will be using them
9
00:00:33,000 --> 00:00:33,840
for.
10
00:00:33,840 --> 00:00:39,060
So let's start with the first one. As you can see, the first library we import is torch. That's of course
11
00:00:39,150 --> 00:00:47,220
the torch library that contains PyTorch, which is definitely by far our best weapon to build a neural
12
00:00:47,220 --> 00:00:52,650
network and do some computer vision, and that's for a specific reason: it's because PyTorch contains
13
00:00:52,650 --> 00:00:58,890
the dynamic graphs, thanks to which we are able to compute very efficiently the gradients of composed
14
00:00:58,890 --> 00:01:01,590
functions during backward propagation.
15
00:01:01,590 --> 00:01:06,870
You know, when we have to update the weights through stochastic gradient descent, well, we have to compute
16
00:01:06,960 --> 00:01:11,380
the gradients of some composed functions, because we have several layers.
17
00:01:11,400 --> 00:01:14,240
You know, it's a deep neural network, so we have several layers.
18
00:01:14,370 --> 00:01:19,740
And therefore it's like each layer is some function of the previous layer, which is some function of the previous
19
00:01:19,740 --> 00:01:23,160
previous layer, so that generates some composed functions.
20
00:01:23,160 --> 00:01:28,140
We have to compute the gradients of these composed functions to update the weights according to how
21
00:01:28,140 --> 00:01:32,980
much they are responsible for the error between the target and the predictions.
22
00:01:33,120 --> 00:01:34,860
So that's where it comes into play.
23
00:01:35,010 --> 00:01:41,400
And the dynamic graph is a highly advanced graph structure that allows us to have some very fast and efficient
24
00:01:41,610 --> 00:01:43,550
computation of the gradients.
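Just to give you an idea of what that buys us, here is a minimal sketch (it uses the modern requires_grad flag; nothing here is specific to our project):

import torch

# y = (3x)^2 is a composition of functions, like two stacked layers
x = torch.tensor([2.0], requires_grad=True)  # leaf node of the dynamic graph
g = x * 3                                    # first "layer"
y = (g ** 2).sum()                           # second "layer": y = 9x^2

# backward() walks the graph built on the fly and applies the chain rule
y.backward()
print(x.grad)  # dy/dx = 18x -> tensor([36.])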
25
00:01:43,560 --> 00:01:49,980
So that's why torch is our first choice. Then, from torch.autograd, which is the module responsible
26
00:01:49,980 --> 00:01:51,240
for gradient descent,
27
00:01:51,360 --> 00:01:58,710
we are importing the Variable class, which will be used to convert the tensors into some torch Variables
28
00:01:58,710 --> 00:02:02,030
that will contain both the tensor and a gradient.
29
00:02:02,160 --> 00:02:07,710
And then this torch Variable containing the tensor and the gradient will be one element of the graph.
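A minimal sketch of that wrapping (the Variable class lives in torch.autograd; in recent PyTorch versions tensors track gradients themselves, so treat this as the older-style API used in this course):

import torch
from torch.autograd import Variable

t = torch.Tensor([[1.0, 2.0], [3.0, 4.0]])  # a plain tensor
v = Variable(t)                             # wrap it into one node of the graph
print(v.data)  # the tensor it contains
print(v.grad)  # the gradient slot, still None before any backward pass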
30
00:02:07,710 --> 00:02:14,550
Then of course we import cv2, and that's even if we're not going to implement a model based on OpenCV Haar
31
00:02:14,580 --> 00:02:20,310
Cascades; we're just importing cv2 because we will be drawing some rectangles around the detections.
32
00:02:20,390 --> 00:02:24,860
But the detection of the dog will not be based on OpenCV Haar Cascades.
33
00:02:25,020 --> 00:02:33,160
It will be based on SSD, the neural network that does single shot multibox detection; OpenCV is used just to draw the rectangles.
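A minimal sketch of that drawing step, on a placeholder image (the coordinates and the label are made up for illustration):

import cv2
import numpy as np

frame = np.zeros((300, 300, 3), dtype=np.uint8)  # placeholder black image
# cv2.rectangle(image, top-left corner, bottom-right corner, BGR color, thickness)
cv2.rectangle(frame, (50, 60), (200, 220), (255, 0, 0), 2)
cv2.putText(frame, 'dog', (50, 55), cv2.FONT_HERSHEY_SIMPLEX,
            0.6, (255, 255, 255), 2, cv2.LINE_AA)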
34
00:02:33,180 --> 00:02:41,490
Then here, from data, we import BaseTransform and the classes as labelmap. data is just
35
00:02:41,490 --> 00:02:50,010
a folder that contains BaseTransform and the classes. BaseTransform is a class that will
36
00:02:50,010 --> 00:02:56,850
do the required transformations so that the input images will be compatible with the neural
37
00:02:56,850 --> 00:02:57,450
network.
38
00:02:57,510 --> 00:03:02,370
You know, when we feed the neural network with the input images, they have to have a certain format, and
39
00:03:02,490 --> 00:03:08,880
BaseTransform will be used to transform the images into this format so that they can be accepted into
40
00:03:08,880 --> 00:03:10,100
the network.
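A sketch of how it is typically used, assuming BaseTransform takes a target size plus a color mean and returns the transformed image as its first result (the signature is assumed from the course's data module, and the mean values are just an example):

import numpy as np
import torch
from data import BaseTransform

# 300 is the input size expected by SSD300; the triplet is an assumed color mean
transform = BaseTransform(300, (104.0, 117.0, 123.0))
frame = np.zeros((480, 640, 3), dtype=np.uint8)    # placeholder video frame
frame_t = transform(frame)[0]                      # resized, mean-subtracted array
x = torch.from_numpy(frame_t).permute(2, 0, 1)     # HxWxC -> CxHxW for the network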
41
00:03:10,380 --> 00:03:12,360
And then, what is labelmap?
42
00:03:12,630 --> 00:03:17,160
Well, labelmap is just a dictionary that will do the encoding of the classes.
43
00:03:17,280 --> 00:03:24,390
So for example, planes will be encoded as 1 and dogs will be encoded as 2. That's just an example; it's not
44
00:03:24,540 --> 00:03:26,710
exactly this mapping, but that's the idea.
45
00:03:26,730 --> 00:03:31,170
It will do a mapping, because of course we want to work with numbers and not text.
46
00:03:31,230 --> 00:03:36,480
So that's just a very simple dictionary doing the mapping between the text names of the classes and
47
00:03:36,540 --> 00:03:37,860
some integers.
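For example, something like this (the class names and numbers here are purely illustrative, not the real encoding):

# illustrative only: text class names mapped to integers
classes = ('plane', 'bicycle', 'bird', 'cat', 'dog')
encoding = {name: i + 1 for i, name in enumerate(classes)}  # 0 left for background
print(encoding['dog'])  # -> 5: we work with this number, not the string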
48
00:03:37,860 --> 00:03:42,400
All right. Then, from ssd, we import build_ssd.
49
00:03:42,720 --> 00:03:51,600
So first, ssd is the library of the single shot multibox detection model, and then build_ssd, which we import
50
00:03:51,600 --> 00:03:57,120
from the ssd library, will be the constructor of the SSD neural network.
51
00:03:57,120 --> 00:04:03,090
And so if you want to have a look, you can have a look at this ssd.py file to see how it
52
00:04:03,090 --> 00:04:03,680
works.
53
00:04:03,840 --> 00:04:09,300
But it is just a constructor that will build the architecture of this single shot multibox detection
54
00:04:09,300 --> 00:04:10,100
model.
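A sketch of how the constructor is typically called, assuming a pretrained weights file is available (the 'test' phase argument and the file name are assumptions based on common SSD implementations, not guaranteed to match this exact code):

import torch
from ssd import build_ssd

net = build_ssd('test')  # 'test' builds the network in inference mode
# load pretrained weights onto the CPU; the file name is just an example
net.load_state_dict(torch.load('ssd300_mAP_77.43_v2.pth',
                               map_location=lambda storage, loc: storage))
net.eval()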
55
00:04:10,230 --> 00:04:19,860
And finally, imageio is just the library that we'll use to process the images of the video, applying
56
00:04:19,980 --> 00:04:23,250
the detect function that we will implement on the images.
57
00:04:23,250 --> 00:04:30,960
So at first we wanted to import PIL, which is another library, but imageio actually turns out to
58
00:04:30,960 --> 00:04:34,090
be a much better choice in terms of lines of code.
59
00:04:34,170 --> 00:04:40,350
You'll see we will only have to type two or three lines of code to be able to process the images of
60
00:04:40,350 --> 00:04:41,100
the video.
61
00:04:41,220 --> 00:04:47,430
That is, the video of the funny dog, and to apply the detect function that we will implement to detect the dog and
62
00:04:47,430 --> 00:04:49,430
the humans in the video.
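Those two or three lines will look roughly like this (a sketch: the video file name is assumed, and detect(), net and transform come from the rest of the code, with detect() being defined in the next tutorial):

import imageio

reader = imageio.get_reader('funny_dog.mp4')       # open the input video
fps = reader.get_meta_data()['fps']                # keep the original frame rate
writer = imageio.get_writer('output.mp4', fps=fps)
for frame in reader:
    frame = detect(frame, net.eval(), transform)   # annotate each frame
    writer.append_data(frame)                      # write it to the output video
writer.close()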
63
00:04:50,010 --> 00:04:50,370
All right.
64
00:04:50,370 --> 00:04:56,490
So I hope you now have a clear understanding of the libraries that will be used.
65
00:04:56,490 --> 00:04:58,360
It's important to know how they work.
66
00:04:58,560 --> 00:05:05,820
And now what we're going to do is define the detect function that will do the detections.
67
00:05:05,840 --> 00:05:08,650
So let's take a fresh start in the next tutorial.
68
00:05:08,750 --> 00:05:10,790
And so until then enjoy computer vision.