All language subtitles for 035 Object Detection - Step 9-en

Hello and welcome to this tutorial. Now we have everything: we have our frames that we're going to get from the video, we have our neural network net, the SSD neural network, and we have our transformation transform. So we are ready to do some object detection on a video.

This video is going to be funny_dog.mp4. It's this video of this very cute dog bouncing on the field, on the grass. There is Kirill in the video, whom we're going to try to detect as well, another human, and some other humans behind. Let's see if it's powerful enough to even detect the humans that are behind in the yard. Well, we might figure it out in this tutorial, and if not in this tutorial, it's going to be the next one. So let's do it.

Let's start by opening the video. That's the first thing we need to do. Then we're going to get all the frames of the video one by one. We're going to apply the detect function on these frames with our SSD net and our transformation transform. Then we'll get the processed images with the rectangles, and then we'll reassemble the whole thing to have the final video.

All right, let's do this. Let's first open the video. So we're going to create a new object that we're going to call reader, and this object is going to be created with imageio.
imageio is a great library to process videos. There is another great library that could do the job, that is PIL, in capital letters. We actually tried that, but it turned out to be much more efficient with imageio. So we're going to open the video with imageio, and to do this, well, first we take our imageio library, and then we're going to use the get_reader function. And inside this function, what do we need to input? Well, of course it's the video, in quotes, and the name of the video is funny_dog.mp4. All right. So that opens the video, basically, funny_dog.mp4. We're going to watch the video again before we get the final output.

Then the next step is to get the frequency of the frames, that is, the fps frequency. fps means frames per second, and we just need to get this frequency because we're going to need it afterwards. So let's call this frequency fps, introducing a new variable. And to get it, well, we can get it from our reader object, from which we use get_meta_data, parentheses with nothing inside, but then in square brackets you have to specify, in quotes, fps. And that will just get you the fps frequency, that is, the number of frames per second. All right.
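The two steps described so far, opening the video and reading its frame rate, can be sketched as follows. This is a minimal sketch rather than the course's exact file: the helper names open_reader and read_fps are hypothetical, and it assumes the imageio v2-style API (imageio.get_reader and reader.get_meta_data()) with an ffmpeg backend available for .mp4 files.

```python
def open_reader(path):
    """Open a video for frame-by-frame reading (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without imageio installed.
    import imageio
    return imageio.get_reader(path)


def read_fps(reader):
    """Return the frames-per-second value stored in the reader's metadata."""
    return reader.get_meta_data()['fps']


# Intended use, following the transcript:
#   reader = open_reader('funny_dog.mp4')
#   fps = read_fps(reader)
```

Keeping the fps lookup in its own small function makes it easy to check against any object that exposes get_meta_data().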
And now, next step, and you're going to understand now why we needed that frames-per-second frequency: the next step is to create an output video, that is going to be the final output, with that same fps frequency. And we're going to create that output video with, again, imageio. So we have to give it a different name; we're going to call it writer. And again, we're going to call our imageio library. And this time, since we're not opening a video, we are creating a new video, we're not going to use the get_reader function. We're going to use the get_writer function, which basically creates something like an object that will contain a video. And to this something we will add the sequence of frames, you know, we will append the processed frames, that is, the frames on which we applied the detect function.

So there we go, get_writer. And then we need to input two arguments. The first one is the name we want to give to this output video, and we're going to call it, well, very simply, output.mp4. And then the second argument, which is actually the frequency: how many frames per second do we want? And so the name of the argument is fps, you have to specify it, and this is equal to this fps variable here that we got thanks to the get_meta_data function from our reader object.
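Why the writer should reuse the reader's fps can be seen with a little arithmetic: the same frames written at a different rate play back for a different length of time. The sketch below is an illustration under assumptions: open_writer and playback_seconds are hypothetical helpers, the frame counts and rates in the comments are made-up numbers, and the fps keyword is the one accepted by imageio's ffmpeg-based writer.

```python
def open_writer(path, fps):
    """Create an output video that plays at the given frame rate."""
    # Imported lazily so this sketch can be loaded without imageio installed.
    import imageio
    return imageio.get_writer(path, fps=fps)


def playback_seconds(n_frames, fps):
    """Duration of a clip: number of frames divided by frames per second."""
    return n_frames / fps


# Illustrative numbers only: 60 frames at 30 fps last 2 seconds,
# but the same 60 frames written at 15 fps would last 4 seconds.
same_rate = playback_seconds(60, 30.0)
half_rate = playback_seconds(60, 15.0)
```

Passing the reader's own fps to get_writer keeps the output the same length as the input.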
So fps equals fps. All right. So now we have everything. We have all we need to start this for loop and process each of the images of the funny_dog video. And so of course you understand that in each step of the loop we're going to work on a specific frame of the video, and on that frame we're going to apply the detect function to detect the objects in the frame and print the rectangles. All right.

So let's start this for loop. for i, so i will just correspond to the number of the image that is processed. So, you know, i is going to go from 0 to, I told you there's going to be 68 frames, so i is going to go from 0 to 68 or 67, something like that. So for i, and then frame, of course, because we're iterating over the frames of the video. That's why I'm taking frame; that's just the name of the variable that will exactly correspond to each of the frames of the video. So i, frame in, and then we can use enumerate, parentheses, reader, and that will just iterate through all the frames of the reader video, the funny_dog video. So for i, frame in enumerate(reader). Well, what do we do?
We simply need to apply the detect method on this frame right here, to have some objects detected by the net, which is our SSD neural network that we created, associated with the right transformation, the transform object here, to make sure that this frame can be accepted into this net. All right. So nothing easier to do here. We apply the detect function to our frame, with our neural network net, and with our transformation transform.

However, there is just a little trick here. net is actually an advanced structure. Remember, it's an object of the build_ssd class. And in order to get this neural network that is expected by the detect function, it's not only net that we need to input, it's actually net.eval(). That's just to align with the way the build_ssd function was made, but basically net.eval() represents our neural network net, from which we get the output y and therefore the detections on each frame.

So, perfect, we detected the objects on our frame. But remember that this detect function actually returns the frame, the processed frame with the detected objects, and therefore I'm going to introduce here a new variable that will represent that processed frame. And there is actually no danger in calling it frame again, so I'm just overwriting the frame here. But that's totally OK.
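The net.eval() argument can look odd at first. In PyTorch, calling eval() on a module switches it to evaluation mode and returns the module itself, which is why the call can be passed directly where the network is expected. The sketch below imitates that behaviour with a stand-in class so that it runs without PyTorch; TinyNet and the stub detect are hypothetical and are not the course's SSD network or its detect function.

```python
class TinyNet:
    """Stand-in that mimics how a PyTorch module's eval() behaves."""

    def __init__(self):
        self.training = True   # modules start in training mode

    def eval(self):
        self.training = False  # switch to evaluation (inference) mode
        return self            # returning self allows inline use

    def __call__(self, x):
        return x               # placeholder forward pass


def detect(frame, net, transform):
    """Stub with the same argument order as in the transcript."""
    net(transform(frame))      # run the placeholder network on the frame
    return frame               # the real function returns the annotated frame


net = TinyNet()
frame = [[0, 0], [0, 0]]       # placeholder "image"
# Overwrite frame with the processed frame, as described above.
frame = detect(frame, net.eval(), lambda f: f)
```

Because eval() hands back the same object, detect receives the network itself, now in evaluation mode.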
Here, the frame is the original frame with no detection made yet, and this frame is the new frame after we applied the detect function, with the detected rectangles. All right.

So now, the loop is not over. What do we need to do? Well, each time we get a new processed frame with the objects detected, we need to append this frame to our writer output video, and that's exactly what we're going to do now. To append a frame to our writer video, we simply need to take our writer object, then dot, and then we use the append_data function, to which we need to input, of course, what we want to append to the writer output video, and that is of course this new processed frame with the detected objects. Perfect. So now the processed frame is appended.

And now we can just add print(i), just to see during the detection which frame we reached. All right. That's just a practical thing: the number of the processed frame will be displayed during the detection.

And finally, last line of code. Well, we just need to close the process that manages the creation of this video, and to close it we just need to take our writer, and then add a dot, and then close, parentheses, the close function, which will close the process, and we'll get the output video in that same directory.
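Put together, the loop walked through above can be sketched like this. It is a minimal sketch under assumptions: annotate_video and process_frame are hypothetical names, and the function only relies on the reader being iterable and the writer offering append_data and close, which is how the transcript uses the imageio objects.

```python
def annotate_video(reader, writer, process_frame):
    """Run process_frame on every frame of reader and append it to writer."""
    for i, frame in enumerate(reader):
        frame = process_frame(frame)  # overwrite with the processed frame
        writer.append_data(frame)     # add it to the output video
        print(i)                      # progress: index of the frame just written
    writer.close()                    # finalize the output file


# Intended use with imageio and the course's detect function (not defined here):
#   reader = imageio.get_reader('funny_dog.mp4')
#   writer = imageio.get_writer('output.mp4', fps=reader.get_meta_data()['fps'])
#   annotate_video(reader, writer, lambda f: detect(f, net.eval(), transform))
```

Taking the reader and writer as arguments keeps the loop independent of any particular video file.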
That is, our working directory folder. All right, so that's it. We're ready to watch the final output. So what do you think, do we do it in this tutorial? Well, yeah, let's do it, let's do it right now. So we simply need to select all the code and execute. There we go. No error, just a warning, that's OK. That's just a warning for the ffmpeg library, but it's OK. And here we can see the number of the frame that is processed. You can see that it's actually going pretty fast, so we'll get the final output video very quickly. I told you there's about sixty-eight frames to be processed, and on each frame we're applying the detect function to the objects. Let's see what happens, and we'll get quickly to the final result.

All right, so it's about to end very, very soon. Yeah. OK. So it went from 0 to 67, so there were indeed sixty-eight frames to process, that is, to detect some objects in the video, in a two-second video. OK. So let's watch the final output. But before that, I just want to show you again the original video, funny_dog.mp4. So let's watch this again. Boing, boing, boing, boing, boing, boing. All right, the dog bouncing on the field.
And now let's see what our model was able to do. So there is this dog here, a human here, Kirill here, and some other humans here. Let's see what this model was able to detect. I'm going to close that video, I'm going to get my output, and let's watch the result. Ready? And play. All right, amazing job. The dog was detected, the human was detected, and I didn't have time to see how well Kirill was detected, but I think I saw some detections on these humans. Let's watch this again.

That's an amazing job. You can try to do that with OpenCV or some other models; I'm not sure you'd get such a great result. I actually tried it with OpenCV and I definitely didn't get the same results. There were rectangles everywhere, so that didn't work. But with this SSD model the detection is amazing. The dog is perfectly well detected.

So now let's see, let's see for the other detections. So the human is also detected, this human here, but it's quite big in the video, so of course it's detected. The dog is well detected as well. Let's see some others. OK. So the humans behind are hidden by the arms, so of course they're not yet detected, but let's see what happens next. All right, that's what I'm talking about, here on this special frame.
This special frame is very interesting, because not only can we see the humans behind detected, very well detected, it actually detected this lady here, and also Kirill was detected, but we lost the detection on the dog. And why is that? It's because the dog merged with Kirill. You see, this shape has the upper body of Kirill but the lower body of the dog, and therefore the model thinks that's one same person. And that's why it detected the person. So that's pretty funny. And then if I move on to the next frame, well, the detection of the person is gone, and we got back the detection of the dog. That's a pretty funny thing that happened here. OK. And then, all right, we have some more dog, and more dog. So that's pretty cool, isn't it? The dog is well detected, the human is well detected, and sometimes we get some other detections on other humans that are much harder to detect.

So I hope you are convinced by the power of this SSD model. You can actually try with the other ones; you'll see that it's a pretty great job that was done here by the SSD neural network. In the next tutorial, I'll give you a little homework.
It will be to do some detection on some other video, some very cool and really, really beautiful video of some horses running on some field, filmed by a drone. I'd like you to try this because I'd like you to keep in mind that this model can not only detect common things like humans and dogs, but also many other objects like horses, boats, cars, planes, whatever, I think between 30 and 40 objects. So that will be a fun homework to do. Not difficult. But don't worry, we'll get back to difficult things in module 3 with deep convolutional GANs. So I'll see you in the homework and module 3. And until then, enjoy computer vision.