Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,450 --> 00:00:03,220
Hello and welcome back to the course on computer vision.
2
00:00:03,420 --> 00:00:06,260
Today we're going to find out how Gannes work.
3
00:00:06,270 --> 00:00:08,830
So this is gonna be an interesting tutorial.
4
00:00:08,880 --> 00:00:11,620
Some exciting slides prepared ahead.
5
00:00:12,120 --> 00:00:14,760
It's going to be quite long so prepare yourself for that.
6
00:00:14,820 --> 00:00:17,590
And let's dive straight into it.
7
00:00:17,610 --> 00:00:17,970
All right.
8
00:00:17,970 --> 00:00:24,360
So as we discussed Gannes stands for generative adversarial networks and we're going to go through these
9
00:00:24,360 --> 00:00:30,420
letters one by one and then we'll get to the training part of the gang which is the exciting part.
10
00:00:30,420 --> 00:00:39,140
All right so G stands for generative and this is a model which takes as input a random noise a signal
11
00:00:39,420 --> 00:00:43,400
and then it can output an image.
12
00:00:44,160 --> 00:00:47,560
The adversarial part stands for the discriminator.
13
00:00:47,640 --> 00:00:53,950
It's another model which is going to be rivalling the generators going to be the opponent of the generator.
14
00:00:54,180 --> 00:01:02,350
You can think of the generator as like a thief or somebody who's trying to forge dollar bills and discriminator
15
00:01:02,390 --> 00:01:04,960
as as the police officer as the detective.
16
00:01:05,220 --> 00:01:11,820
Oh there you go Dee for Detective so is going to be the revival of the generator and diesel model which
17
00:01:11,820 --> 00:01:21,480
is capable of learning about objects animals or people or basically certain features or a cable or blurring
18
00:01:21,480 --> 00:01:21,810
features.
19
00:01:21,810 --> 00:01:27,700
For instance if you show with lots of dogs and then you show it lots of non dogs it will be able to
20
00:01:27,700 --> 00:01:31,410
offer that discriminate between dogs and not dogs.
21
00:01:31,410 --> 00:01:38,080
So in an ideal world if you show the discriminator of this image it'll say zero.
22
00:01:38,150 --> 00:01:41,040
I will give it a zero probability that it's a dog.
23
00:01:41,190 --> 00:01:45,540
And if you show it this image it will give it a warning and give it a 100 percent probability that it's
24
00:01:45,540 --> 00:01:46,210
a dog.
25
00:01:46,500 --> 00:01:49,530
So that is the ideal world.
26
00:01:49,530 --> 00:01:52,030
Now we're going to get oh now we have it.
27
00:01:52,050 --> 00:01:57,000
And so that's why this is where we have this little animation you see.
28
00:01:57,000 --> 00:02:00,030
There we go the network letters jumping around.
29
00:02:00,090 --> 00:02:05,310
And that stands for the fact that these two models are actually neural networks so we're going to replace
30
00:02:05,310 --> 00:02:09,120
them with neural networks because that's what they are.
31
00:02:09,120 --> 00:02:16,350
So we've got a neural network on the left a neural network on the right and why that's important is
32
00:02:16,350 --> 00:02:24,480
because as we discussed in annex to the course about neural networks hopefully you checked that out.
33
00:02:24,690 --> 00:02:30,480
They are capable of learning that's where you know the weights of those synopses that we see in blue.
34
00:02:30,480 --> 00:02:36,690
They are that's where information that's where the training is going to go that's where the those weights
35
00:02:36,690 --> 00:02:39,270
are going to be updated so that the neural networks.
36
00:02:39,630 --> 00:02:39,840
All right.
37
00:02:39,840 --> 00:02:41,220
So we discussed the letters.
38
00:02:41,220 --> 00:02:42,360
Let's go through the training.
39
00:02:42,360 --> 00:02:44,160
Exciting times.
40
00:02:44,160 --> 00:02:44,430
All right.
41
00:02:44,430 --> 00:02:45,490
Step one.
42
00:02:45,750 --> 00:02:47,160
Basically all of the steps.
43
00:02:47,160 --> 00:02:53,000
It's not like step one step two step three that you need to Conseco really not that different.
44
00:02:53,010 --> 00:02:55,530
There is always going to be repeating the same step.
45
00:02:55,530 --> 00:03:01,080
We're going to be doing it many times in fact here in this intuition tutorial we're going to do it three
46
00:03:01,080 --> 00:03:06,940
times in the practical tutorials are going to do with hundreds and thousands of times.
47
00:03:07,520 --> 00:03:09,930
But intuition is going to be enough.
48
00:03:10,050 --> 00:03:14,220
But basically you will see how the networks evolve over the iteration.
49
00:03:14,220 --> 00:03:17,180
So every step is an iteration.
50
00:03:17,460 --> 00:03:23,140
Don't confuse step with espagnol and pork is after you've done your steps.
51
00:03:23,190 --> 00:03:27,760
That's one epal and then you do another book that's you do all the steps again and again.
52
00:03:28,050 --> 00:03:32,570
So you'll see that in the practice but back here what we have is step one.
53
00:03:32,580 --> 00:03:41,320
We input the noise signal into a noise signal a random noise signal into our generator.
54
00:03:41,540 --> 00:03:45,210
It just basically kind of like gives it some random this to generate from it.
55
00:03:45,240 --> 00:03:50,240
It needs to start somewhere and then it generates some images.
56
00:03:50,250 --> 00:03:55,530
These images are as you can see completely useless completely random images.
57
00:03:56,610 --> 00:04:02,040
And at this stage what we want to train is we want to train the discriminator.
58
00:04:02,040 --> 00:04:07,290
So this is what we're going to be doing and you will see exactly the same steps in the practical tutorials
59
00:04:07,290 --> 00:04:12,300
and this is how the generative adversarial networks concept or
60
00:04:14,970 --> 00:04:18,330
algorithm was really like was designed in the first place.
61
00:04:18,330 --> 00:04:20,120
So first we train the discriminator.
62
00:04:20,340 --> 00:04:25,950
And as we discussed we want the descriptor to be able to distinguish between for instance we're going
63
00:04:25,950 --> 00:04:29,060
to stick with the example of dogs between dogs and non-dog.
64
00:04:29,070 --> 00:04:36,420
So we need to give it in addition to these random images we generated we need to give it some dog images.
65
00:04:36,420 --> 00:04:37,130
So there you go.
66
00:04:37,140 --> 00:04:43,740
We've got some dog images and so we're going to input these a batch of batch of non-dog images and a
67
00:04:43,740 --> 00:04:45,850
batch of dog images.
68
00:04:45,850 --> 00:04:51,720
We're going to put them into the discriminator and we'll see what probabilities this community will
69
00:04:51,720 --> 00:04:52,160
spit out.
70
00:04:52,170 --> 00:04:56,240
And remember the discriminator knows nothing at this stage hasn't been trained at all.
71
00:04:56,250 --> 00:04:57,490
This is the very first step.
72
00:04:57,540 --> 00:05:03,460
None of these two models have been trained so the discrimination might spit out numbers like this for
73
00:05:03,460 --> 00:05:08,710
instance zero point or point trees or open fire like quite high probabilities for the images at the
74
00:05:08,710 --> 00:05:09,920
top that they're dogs.
75
00:05:09,940 --> 00:05:14,500
Even though we can see they're not dogs discriminate has never seen a dog before so it might give them
76
00:05:14,830 --> 00:05:20,170
some you know just based on its initial configuration which is you know some probably doesn't really
77
00:05:20,170 --> 00:05:21,070
worry us at this stage.
78
00:05:21,070 --> 00:05:23,770
It's got to start somewhere again.
79
00:05:24,070 --> 00:05:32,950
But you're going to start somewhere and we will fix this as a goal as a train that will fix this and
80
00:05:32,950 --> 00:05:35,920
then it give some probabilities to the images below.
81
00:05:36,310 --> 00:05:39,290
Even though we know their dogs discriminate has never seen a dog before.
82
00:05:39,340 --> 00:05:44,860
It just gives them some believe that happened to be the outputs of this neural network at this point
83
00:05:44,860 --> 00:05:48,920
in time or at this stage.
84
00:05:49,210 --> 00:05:51,760
At the very beginning at the very first stage.
85
00:05:51,820 --> 00:05:59,240
So what happens next is we take these values and we look at what they should have been.
86
00:05:59,350 --> 00:06:08,230
So we as humans because we are training this neural network we know what each of outputs in the ideal
87
00:06:08,230 --> 00:06:12,850
scenario and ideal scenario the top values should have been zeros the bottom values should have been
88
00:06:12,850 --> 00:06:13,330
ones.
89
00:06:13,360 --> 00:06:20,560
So the top owlish top our pictures are not dogs so they should have caught on a 0 percent likelihood
90
00:06:20,560 --> 00:06:21,410
of being dogs.
91
00:06:21,470 --> 00:06:24,970
Bottom one should have grown 100 percent like a big dog so that's ideal world.
92
00:06:25,000 --> 00:06:30,120
Obviously our model isn't ideal but it can learn from this.
93
00:06:30,130 --> 00:06:37,210
So what's going to happen is the era is going to be calculated so basically in every single case the
94
00:06:37,240 --> 00:06:43,960
values will be subtracted more from each other so zero zero point AIDS or about 0.3 or or 0.5 and 1
95
00:06:44,140 --> 00:06:46,660
0.1 minus 1 and so on.
96
00:06:46,660 --> 00:06:52,780
So all those will be subtracted then the cumulative error will be calculated and will be back propagate
97
00:06:52,810 --> 00:06:55,280
through the network of the discriminator.
98
00:06:55,450 --> 00:07:02,800
So if you have and you're not familiar with the term back propagation then I highly recommend checking
99
00:07:02,800 --> 00:07:07,630
an annex on this course where you talk about artificial neural networks and can do it on your own networks
100
00:07:08,380 --> 00:07:15,130
and you will be up to after that you'll be up to speed with everything we discussed including back propagation.
101
00:07:15,310 --> 00:07:21,070
It's quite a lengthy consequence we're going to discuss it here but if you are we're back propagation
102
00:07:21,090 --> 00:07:22,950
and this is exactly what's happening here.
103
00:07:23,080 --> 00:07:28,840
These errors are back propagated through a network and the weights on the network are a bit dated.
104
00:07:28,870 --> 00:07:29,620
So there we go.
105
00:07:29,620 --> 00:07:34,240
Now the discriminator has learned that that was basically the learning process of this has learned from
106
00:07:34,240 --> 00:07:36,920
his mistakes and next time is going to do a better job.
107
00:07:37,770 --> 00:07:41,480
And now we want to train the generator.
108
00:07:41,500 --> 00:07:47,560
So this part of our Gamme and how we're going to trend generation is we're going to take the same image
109
00:07:47,650 --> 00:07:53,860
images that we've created this batch of images which are supposed to be dogs and we're going to use
110
00:07:53,860 --> 00:07:54,430
them again.
111
00:07:54,430 --> 00:07:59,520
So a quick note here is that you'll notice that this is exactly what we did in the practical Tauriel.
112
00:07:59,530 --> 00:08:05,680
We took the images that were indeed there we had generated for the training of the discriminator and
113
00:08:05,680 --> 00:08:09,270
we use the same images in his paper.
114
00:08:09,610 --> 00:08:15,360
Ian Goodfellow recommends taking image generating new images.
115
00:08:15,430 --> 00:08:21,160
So running the noise part and the generation part again and using those images for the train or generator
116
00:08:21,520 --> 00:08:24,170
we just use the same images and it worked fine.
117
00:08:24,430 --> 00:08:25,910
It's up to you how you want to go.
118
00:08:25,930 --> 00:08:30,520
You might want to experiment with this part in the practical if you like or just follow along with our
119
00:08:30,520 --> 00:08:35,230
tutorials and the networks worked really well.
120
00:08:35,230 --> 00:08:38,760
But just so that you're aware of what it is.
121
00:08:38,770 --> 00:08:43,160
Originally in the paper by in good form.
122
00:08:43,620 --> 00:08:48,700
OK so we're going to take those images and we're going to run them through the discriminator again but
123
00:08:48,700 --> 00:08:55,960
this time we don't need any dog images because the way that the generator learns is by trying to trick
124
00:08:55,990 --> 00:09:02,110
the discriminator and based on whether it succeeds or not it will update its way so it doesn't need
125
00:09:02,110 --> 00:09:08,680
to compare to any dogs or dogs it just needs to try and trick the discriminator as the other discriminator
126
00:09:08,890 --> 00:09:13,900
will provide an output as you can see the output is different to what it had before because it has already
127
00:09:14,470 --> 00:09:20,260
being trained itself and therefore knows what to what a dog can kind of more or less it knows what a
128
00:09:20,260 --> 00:09:24,080
dog looks like and it knows what a dog doesn't look like.
129
00:09:24,580 --> 00:09:31,950
And as you can see here the numbers are not all like 0 0 0 and they're kind of close to zero they're
130
00:09:31,960 --> 00:09:38,730
lower but they're not just older or 0 0 because it takes time for the discriminator to learn is one
131
00:09:38,740 --> 00:09:44,470
this illustrates the concept that why these two should be training at the same time because if you for
132
00:09:44,470 --> 00:09:50,920
instance the discriminator was very well trained like it was it had been trained for a very long time
133
00:09:50,940 --> 00:09:57,310
where I was trained separately and up until two to a very good level then and then and the generator
134
00:09:57,310 --> 00:10:04,150
was only being trained now than the discriminator would always like completely completely get leave
135
00:10:04,180 --> 00:10:06,490
no chances for the from the generator.
136
00:10:06,730 --> 00:10:11,920
And that's why the will be like You're too smart for the generator the it would have no chances of tricking
137
00:10:11,920 --> 00:10:13,050
those discriminator.
138
00:10:13,270 --> 00:10:14,920
But that would be not good for anybody.
139
00:10:14,920 --> 00:10:16,590
They have to they have to train together.
140
00:10:16,600 --> 00:10:22,400
They have to kind of get better and better and better with time together and so.
141
00:10:22,770 --> 00:10:27,130
So what we're going to do now is we're going to take these values and we're going to compare them to
142
00:10:27,130 --> 00:10:28,740
what it should be.
143
00:10:28,990 --> 00:10:34,920
And you know what these values should have been like what would you what would you compare them to.
144
00:10:34,990 --> 00:10:36,380
You compare them to zero.
145
00:10:36,470 --> 00:10:40,710
But this time we're going to compare them to one because now our objective is different prove you are
146
00:10:40,720 --> 00:10:46,620
training the discriminator now entering the generator and for the generator it wants to trick the discriminate.
147
00:10:46,620 --> 00:10:51,730
Once those values to be ones as it can see those values are much less than ones who calculate the error
148
00:10:51,870 --> 00:11:00,640
and calculate the difference between between each value on one and you'll take the aggregate era and
149
00:11:00,730 --> 00:11:09,780
it will buy and that era will be back propagated through the network of the discretion of the generator.
150
00:11:09,790 --> 00:11:11,030
This time around.
151
00:11:11,480 --> 00:11:24,280
And so that's going to allow the generator to now update its way its They go so as you can see it in
152
00:11:24,290 --> 00:11:25,710
a day that it's weights there.
153
00:11:25,920 --> 00:11:27,280
And now this time.
154
00:11:27,390 --> 00:11:32,760
Now next time round they will know it will do a better job will be to do a better job of generating
155
00:11:32,750 --> 00:11:37,210
the images because now it knows that this wasn't wasn't good enough.
156
00:11:37,350 --> 00:11:43,080
So it knows how to hold that awaits him up there to improve that result.
157
00:11:43,560 --> 00:11:52,200
And again this is this all boils down to neural networks and the process of back propagation and gradient
158
00:11:52,200 --> 00:11:55,850
descent and the casting gradient descent which happens in the neural network.
159
00:11:55,850 --> 00:12:00,960
So those are concepts that are required in order for if you like to understand in in detail how this
160
00:12:00,960 --> 00:12:03,890
is happening and those concepts we discussed in the annex.
161
00:12:04,250 --> 00:12:04,770
OK.
162
00:12:04,860 --> 00:12:07,320
So that was the first step.
163
00:12:07,320 --> 00:12:07,580
All right.
164
00:12:07,590 --> 00:12:12,140
So let's move on to Step 2.
18202
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.