Lecture 1, Segment 9: What can AI do? Perception

Okay, other areas. Another big sub-area of artificial intelligence is perceiving the world, and in large part this is vision, but there are other kinds of perception. So, things like object recognition, face recognition. You probably have a lot of this technology closer than you think. You probably have face recognition built into your cameras; that's actually face detection, not always face recognition, though some cameras do that too. Segmenting scenes into pieces, figuring out for a given image what it means, what's going on.

Here's an example on the left of an image, and overlaid on the image is the machine's reconstruction of kind of the underlying 3D outline and mesh, and you can see that reconstructed on the right. This is from an image, but actually it turns out that in addition to being able to do a bunch of cool things in vision with the image, one realization we've had in cases like autonomous driving and vision is we don't have to use the tools that humans use. We've spent a long time with vision just trying to use like a camera, or maybe two cameras slightly apart because that's what we have, we've got two cameras slightly apart. But then we realized, we can do other stuff. So what's this, anybody recognize this? It's a Kinect.
The Kinect's got sensors that you don't. Sorry, you didn't get a rangefinder, a depth detector, you just didn't. But, you know, we can build them, so why not. And so now we can do cool things like take an image and produce a depth map that isn't just about parallax, looking at the difference between the two eyes, or about kind of inferring from occlusion. You'll notice, people like to think that vision is all about having two images, but if you close one eye, you can still see depth. It's not like the world suddenly goes flat and you shriek. I mean, you close one eye, you still have a sense of depth. We want to be able to build machines that do that, but at the moment we do pretty well by using things like depth detectors, 'cause why not.

Let's take a look at science fiction again. Does anybody recognize this movie? Does anybody know what this is gonna be? Yeah, this is Terminator here, and let's take a look at what it's like to be a Terminator; it's relevant to vision. So here's what it's like to be a Terminator. It's actually a lot like being Governor of California, apparently.

Okay, so he looks around: okay, motorcycle, motorcycle, motorcycle, car, motorcycle, some place, target acquired. Okay, so looking around, outlines, detection. Identifying what the objects are, figuring out what the target is, that's in the movies.
Straight out of science fiction. Let's look at a vision recognition system; this is a cute demo from Al Rahimi's lab. So here we have a camera panning around, and it's kind of, we can do exactly the same thing but for real. So here we have the cat. Cat... Frog... Fox... Dalmatian... Bulldog... Terminate bulldog, right. Okay.

So, this is a case where I think it's amazing how close we can get to what people thought it might be like if this technology were possible. This is not robots from the future detecting the bulldog, this is today.
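
A note on the parallax cue mentioned in the lecture (two cameras slightly apart): the lecture does not give a formula, but under the standard textbook assumption of two identical, rectified pinhole cameras, the depth of a matched point is the focal length times the camera separation divided by the disparity, that is, how far the point shifts between the two images. The sketch below is only an illustration of that relationship. The function name and all the numbers are hypothetical and do not come from the lecture.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in metres of one matched point, assuming rectified pinhole stereo.

    focal_length_px: focal length expressed in pixels (assumed equal for both cameras)
    baseline_m:      distance between the two camera centres, in metres
    disparity_px:    horizontal shift of the point between left and right images
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point infinitely far away.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical camera: 700 px focal length, cameras 10 cm apart.
    focal_length_px = 700.0
    baseline_m = 0.1
    for disparity_px in (70.0, 35.0, 7.0):
        depth_m = depth_from_disparity(focal_length_px, baseline_m, disparity_px)
        print(f"disparity {disparity_px:5.1f} px -> depth {depth_m:.2f} m")
```

Smaller disparities map to larger depths, which is why parallax alone gets less reliable for distant objects and why a dedicated depth sensor, as in the Kinect example, can be a useful complement.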