
043 How Do GANs Work (Step 1)


Hello and welcome back to the course on computer vision. Today we're going to find out how GANs work. So this is going to be an interesting tutorial, with some exciting slides prepared ahead. It's going to be quite long, so prepare yourself for that, and let's dive straight into it.

All right. So, as we discussed, GAN stands for generative adversarial networks, and we're going to go through these letters one by one, and then we'll get to the training part of the GAN, which is the exciting part.

All right, so G stands for generative, and this is a model which takes as input a random noise signal and then outputs an image.

The adversarial part stands for the discriminator. It's another model, which is going to be rivalling the generator; it's going to be the opponent of the generator. You can think of the generator as a thief, or somebody who's trying to forge dollar bills, and the discriminator as the police officer, the detective. Oh, there you go: D for detective. So D is going to be the rival of the generator, and D is a model which is capable of learning about objects, animals or people, or basically certain features. It is capable of learning features.
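To make the two roles concrete, here is a minimal sketch, assuming a toy setup that is not from the lecture: the "image" is just a flat list of 9 pixel values, and both models are single layers with random, untrained weights. The names `generator`, `discriminator`, `NOISE_DIM` and `IMAGE_DIM` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def generator(noise, weights):
    """Map a random noise vector to a tiny 'image' (flat list of pixels in [0, 1])."""
    return [sigmoid(sum(w * z for w, z in zip(row, noise))) for row in weights]


def discriminator(image, weights, bias):
    """Return the estimated probability that `image` is a real dog picture."""
    return sigmoid(sum(w * p for w, p in zip(weights, image)) + bias)


random.seed(0)
NOISE_DIM, IMAGE_DIM = 4, 9  # hypothetical sizes: 4 noise values, a 3x3 image

# Untrained models: the weights are just random numbers at this point.
g_weights = [[random.gauss(0, 1) for _ in range(NOISE_DIM)] for _ in range(IMAGE_DIM)]
d_weights = [random.gauss(0, 1) for _ in range(IMAGE_DIM)]
d_bias = 0.0

noise = [random.gauss(0, 1) for _ in range(NOISE_DIM)]
fake_image = generator(noise, g_weights)                # noise in, image out
p_dog = discriminator(fake_image, d_weights, d_bias)    # image in, probability out
print(len(fake_image), round(p_dog, 3))
```

In a real GAN both functions would be deep (typically convolutional) neural networks, but the interfaces stay the same: noise to image, and image to probability.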
For instance, if you show it lots of dogs and then you show it lots of non-dogs, it will be able to discriminate between dogs and non-dogs. So in an ideal world, if you show the discriminator this image, it'll say zero: it will give it a zero probability that it's a dog. And if you show it this image, it will give it a one, a 100 percent probability that it's a dog. So that is the ideal world.

Now we're going to get... oh, now we have it. This is where we have this little animation, you see: there we go, the network letters jumping around. That stands for the fact that these two models are actually neural networks, so we're going to replace them with neural networks, because that's what they are.

So we've got a neural network on the left and a neural network on the right, and why that's important is because, as we discussed in the annex to the course about neural networks (hopefully you checked that out), they are capable of learning. You know the weights of those synapses that we see in blue: that's where the training is going to go, that's where those weights are going to be updated so that the neural networks learn.

All right. So we discussed the letters. Let's go through the training. Exciting times.
All right. Step one. Basically, all of the steps are alike. It's not like step one, step two, step three that you need to do consecutively; they're really not that different. It is always going to be the same step, repeated. We're going to be doing it many times: here in this intuition tutorial we're going to do it three times, and in the practical tutorials we're going to do it hundreds and thousands of times. For intuition, three is going to be enough, and you will see how the networks evolve over the iterations.

So every step is an iteration. Don't confuse a step with an epoch: an epoch is after you've done all your steps. That's one epoch, and then you do another epoch, that is, you do all the steps again. You'll see that in the practice, but back here what we have is step one.

We input a random noise signal into our generator. It basically gives it some random values to generate from; it needs to start somewhere. And then it generates some images. These images are, as you can see, completely useless, completely random images.

And at this stage what we want to train is the discriminator.
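The step versus epoch distinction can be sketched as two nested loops. This is only an illustration under assumed numbers (10 stand-in "images", batches of 4, 3 epochs); the helper `batches` is hypothetical.

```python
def batches(dataset, batch_size):
    """Yield consecutive slices of the dataset, one slice per training step."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


dataset = list(range(10))  # stand-in for 10 training images
BATCH_SIZE = 4
EPOCHS = 3

steps_taken = 0
for epoch in range(EPOCHS):                      # one epoch = one full pass over the data
    for batch in batches(dataset, BATCH_SIZE):   # one step = one iteration on one batch
        # In a real GAN, this is where the discriminator and then the
        # generator would each be updated once.
        steps_taken += 1

steps_per_epoch = steps_taken // EPOCHS
print(steps_per_epoch, steps_taken)
```

With these numbers each epoch contains three steps (batches of 4, 4 and 2 images).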
So this is what we're going to be doing, and you will see exactly the same steps in the practical tutorials. This is how the generative adversarial networks concept, or algorithm, was designed in the first place.

So first we train the discriminator. As we discussed, we want the discriminator to be able to distinguish between, for instance (we're going to stick with the example of dogs), dogs and non-dogs. So in addition to these random images we generated, we need to give it some dog images. There you go, we've got some dog images, and we're going to input a batch of non-dog images and a batch of dog images. We're going to put them into the discriminator and we'll see what probabilities the discriminator will spit out.

And remember, the discriminator knows nothing at this stage; it hasn't been trained at all. This is the very first step. Neither of these two models has been trained, so the discriminator might spit out numbers like this, for instance 0.8, 0.3 or 0.5: quite high probabilities for the images at the top that they're dogs.
Even though we can see they're not dogs, the discriminator has never seen a dog before, so it might give them some values just based on its initial configuration, which doesn't really worry us at this stage. It's got to start somewhere, and we will fix this as we go, as we train it.

And then it gives some probabilities to the images below. Even though we know they're dogs, the discriminator has never seen a dog before. It just gives them some values that happen to be the outputs of this neural network at this point in time, at the very beginning, at the very first stage.

So what happens next is we take these values and we look at what they should have been. We as humans, because we are training this neural network, know what each of the outputs should be in the ideal scenario. In the ideal scenario the top values should have been zeros and the bottom values should have been ones. The top pictures are not dogs, so they should have gotten a 0 percent likelihood of being dogs; the bottom ones should have gotten a 100 percent likelihood of being dogs. So that's the ideal world. Obviously our model isn't ideal, but it can learn from this.
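The comparison between actual outputs and ideal targets can be written down directly. A small sketch, with several assumptions: the generated-image outputs 0.8, 0.3 and 0.5 and the dog-image output 0.1 are the values as best they can be read from the lecture, the other two dog-image outputs are made up, and summing squared differences is just one simple way to turn the differences into a single error.

```python
# Discriminator outputs for the batch (partly hypothetical, see above).
fake_outputs = [0.8, 0.3, 0.5]   # generated (non-dog) images, top row
real_outputs = [0.1, 0.4, 0.6]   # real dog images, bottom row

outputs = fake_outputs + real_outputs
# Ideal world: 0 for every generated image, 1 for every real dog image.
targets = [0.0] * len(fake_outputs) + [1.0] * len(real_outputs)

# Per-image difference between what the output should have been and what it was.
differences = [t - o for o, t in zip(outputs, targets)]

# Assumption: combine the differences into one number by summing their squares.
total_error = sum(d * d for d in differences)
print(round(total_error, 2))
```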
So what's going to happen is the error is going to be calculated. Basically, in every single case the values will be subtracted from each other: 0 minus 0.8, 0 minus 0.3, 0 minus 0.5, and 0.1 minus 1, and so on. So all those will be subtracted, then the cumulative error will be calculated and will be backpropagated through the network of the discriminator.

If you're not familiar with the term backpropagation, then I highly recommend checking the annex of this course, where we talk about artificial neural networks and convolutional neural networks, and after that you'll be up to speed with everything we discuss, including backpropagation. It's quite a lengthy concept, so we're not going to discuss it here, but if you are familiar with backpropagation, this is exactly what's happening here: these errors are backpropagated through the network and the weights of the network are updated.

So there we go. Now the discriminator has learned; that was basically the learning process. It has learned from its mistakes and next time it's going to do a better job.

And now we want to train the generator, so this part of our GAN. How we're going to train the generator is we're going to take the same images that we've created, this batch of images which are supposed to be dogs, and we're going to use them again.
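Before going further with the generator, the discriminator update described above can be sketched in a few lines. This is a toy version under stated assumptions, not the course's implementation: the discriminator is a single sigmoid unit over hypothetical 2-pixel "images", the error is binary cross-entropy (the lecture only says the values are subtracted), and the learning rate of 1.0 is arbitrary.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, image):
    """Discriminator output: estimated probability that `image` is a dog."""
    return sigmoid(sum(w * p for w, p in zip(weights, image)) + bias)


def bce(outputs, targets):
    """Mean binary cross-entropy between outputs and 0/1 targets."""
    eps = 1e-12  # avoids log(0)
    return -sum(t * math.log(o + eps) + (1 - t) * math.log(1 - o + eps)
                for o, t in zip(outputs, targets)) / len(outputs)


# Hypothetical data: three generated images (target 0), three dog images (target 1).
fake_images = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2]]
real_images = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
images = fake_images + real_images
targets = [0.0] * 3 + [1.0] * 3

weights, bias, lr = [0.5, -0.5], 0.0, 1.0  # untrained starting point

outputs = [predict(weights, bias, img) for img in images]
loss_before = bce(outputs, targets)

# Backpropagation for one sigmoid unit with cross-entropy:
# the gradient with respect to the pre-activation is (output - target).
grad_w, grad_b = [0.0, 0.0], 0.0
for img, out, tgt in zip(images, outputs, targets):
    delta = (out - tgt) / len(images)
    for i, pixel in enumerate(img):
        grad_w[i] += delta * pixel
    grad_b += delta

# Gradient descent: only the discriminator's weights are updated in this step.
weights = [w - lr * g for w, g in zip(weights, grad_w)]
bias -= lr * grad_b

loss_after = bce([predict(weights, bias, img) for img in images], targets)
print(round(loss_before, 4), round(loss_after, 4))
```

After the single update the error on the same batch is lower, which is the "next time it's going to do a better job" part.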
A quick note here: you'll notice that this is exactly what we did in the practical tutorial. We took the images that we had generated for the training of the discriminator and we used the same images. In his paper, Ian Goodfellow recommends generating new images, so running the noise part and the generation part again and using those images for the training of the generator. We just used the same images and it worked fine. It's up to you how you want to go. You might want to experiment with this part in the practical if you like, or just follow along with our tutorials, where the networks worked really well. But just so that you're aware of how it is originally in the paper by Ian Goodfellow.
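The two options mentioned here differ only in where the generator's training batch comes from. A minimal sketch, assuming a stand-in `generator` that simply squashes noise into pixel values (hypothetical, not the course's network):

```python
import math
import random

rng = random.Random(0)


def sample_noise(batch_size, noise_dim):
    """Draw a batch of random noise vectors."""
    return [[rng.gauss(0, 1) for _ in range(noise_dim)] for _ in range(batch_size)]


def generator(noise_batch):
    """Hypothetical stand-in generator: squashes each noise value into a [0, 1] 'pixel'."""
    return [[1.0 / (1.0 + math.exp(-z)) for z in noise] for noise in noise_batch]


# Batch of generated images used for the discriminator's training step.
fake_for_discriminator = generator(sample_noise(3, 4))

# Option A (what the lecture says the course tutorials do): reuse the same batch.
fake_for_generator_a = fake_for_discriminator

# Option B (fresh images, as the lecture attributes to the original paper):
# draw new noise and run the generator again.
fake_for_generator_b = generator(sample_noise(3, 4))
```

Option A saves one generator forward pass per step; option B trains the generator on images the discriminator has not just been updated on.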
OK, so we're going to take those images and we're going to run them through the discriminator again, but this time we don't need any dog images, because the way that the generator learns is by trying to trick the discriminator, and based on whether it succeeds or not it will update its weights. So it doesn't need to compare to any dogs or non-dogs; it just needs to try and trick the discriminator.

The discriminator will provide an output, and as you can see the output is different to what it had before, because it has already been trained itself and therefore knows, more or less, what a dog looks like and what a dog doesn't look like.

And as you can see here, the numbers are not all 0, 0, 0. They're kind of close to zero, they're lower, but they're not all zeros, because it takes time for the discriminator to learn. This illustrates why these two should be training at the same time. If, for instance, the discriminator was very well trained, if it had been trained for a very long time, or was trained separately up to a very good level, and the generator was only being trained now, then the discriminator would leave no chances for the generator. It would be too smart for the generator, and the generator would have no chance of tricking the discriminator.
But that would not be good for anybody. They have to train together; they have to get better and better with time, together.

So what we're going to do now is we're going to take these values and we're going to compare them to what they should be. And you know what these values should have been; what would you compare them to? Previously you would compare them to zero. But this time we're going to compare them to one, because now our objective is different. Previously we were training the discriminator; now we're training the generator, and the generator wants to trick the discriminator. It wants those values to be ones. As you can see, those values are much less than one, so we calculate the error: we calculate the difference between each value and one, take the aggregate error, and that error will be backpropagated through the network of the generator this time around.

And so that's going to allow the generator to update its weights. There you go: as you can see, it updated its weights there. And next time round it will do a better job of generating the images, because now it knows that this wasn't good enough. So it knows how to update its weights in order to improve that result.
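The generator update can be sketched the same way, with the target flipped to one. Everything here is a toy assumption rather than the course's code: the "image" is a single number, the generator is `G(z) = a * z + b`, the discriminator is a frozen sigmoid unit with made-up weights, and the error is `-log D(G(z))`, which is zero only when the discriminator outputs one.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Frozen discriminator D(x) = sigmoid(d_w * x + d_c); hypothetical values.
d_w, d_c = 2.0, -1.0
# Generator G(z) = g_a * z + g_b producing a one-number "image"; hypothetical values.
g_a, g_b = 0.5, 0.0
lr = 0.5
noise_batch = [-1.0, 0.2, 0.7, 1.5]


def mean_score(a, b):
    """Average probability the discriminator assigns to the generated batch."""
    return sum(sigmoid(d_w * (a * z + b) + d_c) for z in noise_batch) / len(noise_batch)


score_before = mean_score(g_a, g_b)

grad_a = grad_b = 0.0
for z in noise_batch:
    x = g_a * z + g_b                 # generated "image"
    d_out = sigmoid(d_w * x + d_c)    # discriminator's verdict on it
    # Error against target 1 is -log(d_out); its gradient with respect to x
    # flows back through the discriminator: -(1 - d_out) * d_w.
    dloss_dx = -(1.0 - d_out) * d_w / len(noise_batch)
    grad_a += dloss_dx * z            # chain rule into the generator's weights
    grad_b += dloss_dx

# Only the generator's weights change; the discriminator is left untouched.
g_a -= lr * grad_a
g_b -= lr * grad_b

score_after = mean_score(g_a, g_b)
print(round(score_before, 3), round(score_after, 3))
```

The error passes through the discriminator on its way back, but only the generator's weights are updated, so after the step the same noise produces images that the discriminator rates closer to one.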
And again, this all boils down to neural networks and the process of backpropagation and gradient descent, and stochastic gradient descent, which happens in the neural network. So those are concepts that are required if you'd like to understand in detail how this is happening, and those concepts we discussed in the annex.

OK. So that was the first step. All right, let's move on to Step 2.