All language subtitles for 027 Object Detection - Step 1-en

Hello and welcome to the practical applications of Module 2: Object Detection. I'm super excited to start this module for two reasons. The first one is that we are taking things to the next level now. As I told you, OpenCV is not the most powerful model, and the model we will implement in this module is much more powerful because it is based on deep learning and neural networks. That's computer vision where the computer will have a brain. That's exactly what it means. And the second reason is that we have an exciting challenge. I will show you a video of a very cute dog bouncing on the field, and our challenge will be to detect the dog, that is, to implement some program that will detect the dog in the video. So it's good that you see several ways of doing computer vision: in the first module you learned how to do face detection through a webcam, and now you're going to learn how to do object detection on a video directly.

Now, before we start, I would like to say a big thank you to this developer here, Max deGroot. That's a picture of him in a horseshoe. He's the creator of the PyTorch implementation of the Single Shot MultiBox Detector that we're going to use in this module. So thank you very much for sharing this and making it open source.
We actually tried several object detection models. We tried Faster R-CNN, YOLO, OpenCV and SSD, and we obtained the best result with Single Shot MultiBox Detection. Not only did we obtain the best result with this model, but also, if you look at the paper, you will see that on the tested cases the Single Shot MultiBox Detection model beats YOLO and Faster R-CNN. So that's why our choice for the ultimate object detection model of Module 2 was Single Shot MultiBox Detection, and the best implementation we found was from this developer, Max deGroot. So thank you so much. Thank you for sharing this pre-trained model. It's actually a pre-trained model that was trained to detect between 30 and 40 objects, including cars, dogs, horses, ships, boats, planes and more. So it's a very useful model that you could use for your own business problems.

We're going to go inside the SSD in this module, and we're going to learn how to use it and how to detect any object on any video. So that's going to be a pretty exciting module. I can't wait to show you this video of this dog. I really like this dog. It's actually Kirill who filmed this little dog with a drone.

So the first thing we're going to do now is open Anaconda, because I want to make sure that you don't forget to connect to the virtual platform. So let's do it.
I'm opening Anaconda Navigator. You have to find Anaconda Navigator: if you're on Windows you will find it in the list of programs, and on Linux you can open it through either the terminal or the programs. All right, so now Anaconda Navigator is opened. And don't forget to do this: "Applications on", virtual platform. There we go. Now we're connected to the virtual platform environment, and so we are ready to launch Spyder. And we don't have to install anything. Everything is already installed on the virtual platform. So we are ready to execute the code and I'm super happy to start.

But before we start implementing the code, we have to be in the right folder, because there are some external files that we'll be calling when executing the code in the end. So anyway, we always have to be in the right folder where we implement the code. So that's the first thing I'm going to do now. I'm going to go to my desktop. This is where my Computer Vision A-Z folder is. So let's double click on it, and now, congratulations, you reached Module 2, Object Detection. All right. That's the folder. Let's quickly describe what's inside this folder.
So you have data, which is just a folder that contains the classes and BaseTransform, which will do the required transformations so that the input images will be compatible with the neural network. Then funny dog is of course the video of this very funny dog we will be trying to detect. I will show you this video in a second. Then layers is another folder that contains some other tools for the detection and the MultiBox part of the SSD.

Then you have of course two code files. There is the commented version of the code, object detection commented, where you have the whole code that we'll implement in this module, commented line by line, so that can be useful either before or after. Actually, I also recommend having a look at this before, so that you know what you need to understand, and therefore when I explain it you might understand it more easily. Then this code, object detection (I'm actually going to open it), is the code that we will implement in this module. So I already imported the libraries. I'm going to describe what those libraries are. But anyway, this is where I will implement this whole code, and when I'm done implementing it with you I will rename it object detection no comment, so that you can have the commented version of the code and the non-commented version of the code. You can practice recoding it. That's excellent practice.
Then we have the ssd.py file, which contains the architecture of the Single Shot MultiBox Detection model. We won't implement this one, because I wanted to keep what's most important for you to understand in this object detection implementation. If we implemented the whole model, this would be overwhelming, and you might miss what's at the heart of the model. So I prefer to proceed this way. This file is all the architecture, and in fact, after you have watched the intuition lectures, you will be totally able to understand what's going on. It's mostly about the architecture, with all the boxes and how they're defined. But then the heart of the model will be in this implementation, object detection.

And then finally, this file is the file we will be loading to get the pre-trained SSD model. More precisely, this is the file that contains the weights of the SSD neural network that was already pre-trained. So we will be loading this file with torch, the torch library, using torch.load, which is a function of torch. This torch.load function will open a tensor, a tensor that will contain the weights of this already pre-trained neural network, and then, through a mapping with a dictionary, we will transfer these weights to the model we implement.
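
The weight-loading step described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the code of the module: `TinyNet`, `load_pretrained` and the file name `tiny_weights.pth` are hypothetical names made up for illustration, with `TinyNet` standing in for the SSD architecture and the temporary file standing in for the pre-trained weights file. It assumes the weights were saved as a state dictionary, in which case `torch.load` returns a dictionary mapping parameter names to tensors, and `load_state_dict` transfers those tensors into the model. It also checks that the file is reachable first, since the code has to be run from the right folder.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the SSD architecture (not the real model)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.head = nn.Linear(4, 2)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = x.mean(dim=(2, 3))  # global average pooling -> shape (batch, 4)
        return self.head(x)


def load_pretrained(model, weights_path):
    """Load saved weights into `model` (hypothetical helper for this sketch)."""
    if not os.path.isfile(weights_path):
        # Typical cause: the script is not being run from the right folder.
        raise FileNotFoundError(
            f"{weights_path!r} not found (working directory: {os.getcwd()!r})"
        )
    # Assumption: the file holds a state dict, i.e. a dictionary that maps
    # parameter names to weight tensors. map_location="cpu" keeps the
    # tensors on the CPU even if they were saved from a GPU.
    state = torch.load(weights_path, map_location="cpu")
    # Transfer the weights, name by name, into the model's own parameters.
    model.load_state_dict(state)
    model.eval()  # inference mode, since we only want to detect, not train
    return model


# Simulate a "pre-trained" weights file so that the sketch is self-contained.
pretrained = TinyNet()
tmp_dir = tempfile.mkdtemp()
weights_path = os.path.join(tmp_dir, "tiny_weights.pth")
torch.save(pretrained.state_dict(), weights_path)

# Build a fresh model (random weights) and transfer the saved weights into it.
net = load_pretrained(TinyNet(), weights_path)
print(sorted(net.state_dict().keys()))
```

In this sketch the names in the saved dictionary have to match the parameter names of the model exactly, which is why the architecture has to be built first, before the weights can be transferred into it.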
So basically this just contains the weights of an already pre-trained model, and we will transfer these weights to the model we will implement. I hope that's clear, and that's it. So I guess we're ready to start.

And therefore let's start with the funny video of this very cute dog. So I'm going to double click on the video. There we go. That's the video. You can recognize Kirill. I'm going to put that at the beginning. So this is Kirill. This is the dog. This video lasts two seconds, so that it doesn't take too much time when you try it on the video. But we will totally have time to see the dog bouncing. It's very funny. Check this out.

Cute dog. When I watch this dog, I absolutely want to play with him. And actually you can see Kirill piloting the drone behind. So there we go. That's the video. And actually the model we will implement will not only detect the dog bouncing on the field but also this human here. And you will see that it will also manage to detect Kirill, even if he's really far, actually, in the video.
And I'd like to tell you now that, you know, for you it's very easy to detect the dog, but the dog is actually pretty small in the video. You know, it's a pretty small object. And when we tried to detect it with OpenCV, we had extremely bad results. It couldn't detect the dog, it couldn't detect what it was, and there were some rectangles everywhere. You can actually try it yourself. And that's why I wanted to highlight that OpenCV is definitely not among the most powerful models. But you'll see that the model we will implement in this module will do a perfect job at detecting this dog, even if it is small and even if there is not a perfect contrast between the dog and the environment. You know, it's not like we have a white environment with a black dog. The dog can be confused with something else. So you'll be convinced of the power of this model at the end, and I can't wait to show you how this model is going to do.

All right. That's what I wanted to catch. You know, sometimes it really doesn't look like a dog, but you'll see what happens. Let's implement the SSD, Single Shot MultiBox Detection, and let's do that from the next tutorial. Until then, enjoy computer vision.