subtitlecat.com

All language subtitles for 011 Policy vs Plan-subtitle-en

Afrikaans

Akan

Albanian

Amharic

Arabic

Armenian

Azerbaijani

Basque

Belarusian

Bemba

Bengali

Bihari

Bosnian

Breton

Bulgarian

Cambodian

Catalan

Cebuano

Cherokee

Chichewa

Chinese (Simplified)

Chinese (Traditional)

Corsican

Croatian

Czech

Danish

Dutch

English

Esperanto

Estonian

Ewe

Faroese

Filipino

Finnish

French

Frisian

Galician

Georgian

German

Greek

Guarani

Gujarati

Haitian Creole

Hausa

Hawaiian

Hebrew

Hindi

Hmong

Hungarian

Icelandic

Igbo

Indonesian

Interlingua

Irish

Italian

Japanese

Javanese

Kannada

Kazakh

Kinyarwanda

Kirundi

Kongo

Korean

Krio (Sierra Leone)

Kurdish

Kurdish (Soranî)

Kyrgyz

Laothian

Latin

Latvian

Lingala

Lithuanian

Lozi

Luganda

Luo

Luxembourgish

Macedonian

Malagasy

Malay

Malayalam

Maltese

Maori

Marathi

Mauritian Creole

Moldavian

Mongolian

Myanmar (Burmese)

Montenegrin

Nepali

Nigerian Pidgin

Northern Sotho

Norwegian

Norwegian (Nynorsk)

Occitan

Oriya

Oromo

Pashto

Persian

Polish

Portuguese (Brazil)

Portuguese (Portugal)

Punjabi

Quechua

Romanian

Romansh

Runyakitara

Russian

Samoan

Scots Gaelic

Serbian

Serbo-Croatian

Sesotho

Setswana

Seychellois Creole

Shona

Sindhi

Sinhalese

Slovak

Slovenian

Somali

Spanish

Spanish (Latin American)

Sundanese

Swahili

Swedish

Tajik

Tamil

Tatar

Telugu

Thai

Tigrinya

Tonga

Tshiluba

Tumbuka

Turkish

Turkmen

Twi

Uighur

Ukrainian

Urdu Download

Uzbek

Vietnamese

Welsh

Wolof

Xhosa

Yiddish

Yoruba

Zulu

Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 1 00:00:00,980 --> 00:00:04,960 Hello and welcome back to the course on artificial intelligence. 2 00:00:05,000 --> 00:00:12,140 Previously we had quite a strenuous and long tutorial on Margrove decision processes and hopefully you 3 00:00:12,200 --> 00:00:13,710 got along well with that. 4 00:00:13,760 --> 00:00:19,010 And hopefully I could explain things in a approachable and engaging way. 5 00:00:19,130 --> 00:00:22,750 And today we're going to talk about policies versus plans. 6 00:00:22,760 --> 00:00:27,910 There will be a quick and fun tutorial because now we're entering into a new world we're entering into 7 00:00:27,910 --> 00:00:34,310 a world of stochastic search nondeterministic search when you're just not getting through the maze but 8 00:00:34,310 --> 00:00:38,990 also accounting for random factors that might just hit you in the head when you're going through this 9 00:00:38,990 --> 00:00:41,080 maze and you need to be prepared for it. 10 00:00:41,080 --> 00:00:42,070 That's the world. 11 00:00:42,080 --> 00:00:48,640 Our agent is living in and it's more fun but it's also more dangerous it's more it's less predictable. 12 00:00:48,650 --> 00:00:50,880 So how is our agent going to behave. 13 00:00:50,960 --> 00:00:52,280 Let's have a look. 14 00:00:52,280 --> 00:00:58,190 There's our mark of decision process framework which is once again our favor Belman equation. 15 00:00:58,250 --> 00:01:02,010 However the the more advanced version of the Belman equation we are working with. 16 00:01:02,010 --> 00:01:04,760 So from now on we're just going to call this the Beldon equation. 17 00:01:04,760 --> 00:01:10,970 And here we've got our maximal and Crucell action so the value of a state any state as is the maximum 18 00:01:10,970 --> 00:01:14,020 across all actions an agent could possibly perform in that state. 19 00:01:14,120 --> 00:01:21,230 And the maxim was taken from the reward that the agent will get by perform action A instate as Plus 20 00:01:21,230 --> 00:01:26,590 a discount factor multiplied by the expected value of the new state it will be in. 21 00:01:26,830 --> 00:01:31,850 And I'd expect those taken here because they are doesn't know exactly what sadle end up in. 22 00:01:31,880 --> 00:01:40,390 They are some random effects that are present in the environment that might alter the state and not 23 00:01:40,800 --> 00:01:42,630 might not end up in the desired state. 24 00:01:42,640 --> 00:01:44,200 It might end up in a different state. 25 00:01:44,210 --> 00:01:47,760 That's why we're taking the expected value over here somewhere here. 26 00:01:47,990 --> 00:01:53,750 So let's have a look at this as our example our or in our example of the maze. 27 00:01:53,750 --> 00:02:00,220 So this is what we had previously so previously we're dealing with live deterministic search. 28 00:02:00,230 --> 00:02:01,960 So we knew that. 29 00:02:01,970 --> 00:02:05,550 All right so if I'm here I definitely need to go here if I'm here. 30 00:02:05,570 --> 00:02:09,030 I definitely need to go here if I'm here I definitely need to go here if I'm here I'm here. 31 00:02:09,140 --> 00:02:11,360 So it was all pretty straightforward. 32 00:02:11,480 --> 00:02:14,680 Once you have this map and remember called it we called it a plan. 33 00:02:14,690 --> 00:02:18,050 Once you have the plan it's pretty straightforward to do. 34 00:02:18,050 --> 00:02:18,990 There are. 35 00:02:18,990 --> 00:02:20,490 So that's the plan with arrows. 36 00:02:20,580 --> 00:02:25,000 And from here it was very straightforward we're this is these are the routes that they will take whenever 37 00:02:25,010 --> 00:02:26,210 you start on this blue line. 38 00:02:26,210 --> 00:02:28,210 That's that's exactly the way you'd go. 39 00:02:28,680 --> 00:02:31,120 However now we don't have a plan anymore. 40 00:02:31,120 --> 00:02:38,060 We can't have a plan because you know whatever we plan might not happen it's not under control or plan 41 00:02:38,060 --> 00:02:40,940 is when you know exactly what you need to do next. 42 00:02:40,940 --> 00:02:41,820 You know the steps. 43 00:02:41,840 --> 00:02:46,640 So you have a you have a starting point you have a goal and you know every single step so you can plan 44 00:02:46,640 --> 00:02:50,500 them out you like I'll do this one I'll do this one I'll do this like in life like a plan. 45 00:02:50,630 --> 00:02:54,870 But at the same time there's so much now randomness going on. 46 00:02:54,890 --> 00:03:00,080 You can have a plan because what if you get here and then you click to the right and actually takes 47 00:03:00,080 --> 00:03:00,560 you down. 48 00:03:00,680 --> 00:03:02,100 So that's not part of your plan. 49 00:03:02,390 --> 00:03:04,120 So that's why it's called the planning more. 50 00:03:04,220 --> 00:03:09,080 And here we're going to calculate the values are actually going to just look at the calculated values 51 00:03:09,410 --> 00:03:11,990 for this same problem. 52 00:03:12,080 --> 00:03:16,700 But based on that given that we have this randomness inside. 53 00:03:16,700 --> 00:03:18,380 So these are the new values. 54 00:03:18,800 --> 00:03:22,840 And so why are these values different so let's just compare to what we had previously. 55 00:03:22,850 --> 00:03:24,710 This is what we had previously. 56 00:03:24,710 --> 00:03:25,650 These are then you. 57 00:03:25,660 --> 00:03:29,750 So once again we had previously because he won 3.9 percent. 58 00:03:29,770 --> 00:03:31,590 He was really 366. 59 00:03:31,790 --> 00:03:36,750 And this is what we have now a a a less than once in force and 1 6 3. 60 00:03:36,800 --> 00:03:43,850 And by the way these are not exactly the current rallies off the top of my head but if we were to run 61 00:03:43,850 --> 00:03:49,220 an agent some values would be something similar to this and the values could change because depending 62 00:03:49,220 --> 00:03:54,650 on the gamble that it would choose 3.9 or other value but nevertheless for the for argument's sake these 63 00:03:54,650 --> 00:04:00,560 are the values that we're dealing with now and they're approximate they convey the whole notion in the 64 00:04:00,560 --> 00:04:02,270 correct way so let's have a look at them. 65 00:04:02,270 --> 00:04:03,240 Why have they changed. 66 00:04:03,410 --> 00:04:07,480 Well why is here with this one here the value was one. 67 00:04:07,490 --> 00:04:10,520 Why is it all of a sudden 0.26 Why is it less than one. 68 00:04:10,560 --> 00:04:11,730 Just go from here here. 69 00:04:11,930 --> 00:04:18,620 Well we actually called because from here if we go right which is our intention if we go right we could 70 00:04:18,640 --> 00:04:22,340 I would actually we have a 10 percent chance we'd end up here. 71 00:04:22,340 --> 00:04:25,130 So we'd hit the wall and would be back in this state. 72 00:04:25,130 --> 00:04:30,740 And remember we have a Gamla So the value would be discounted and or we're off or off at 10 and chance 73 00:04:30,740 --> 00:04:32,150 would end up here in this state. 74 00:04:32,150 --> 00:04:37,670 So it's not 100 percent probability I would get here so therefore disvalue can no longer be a one it's 75 00:04:37,670 --> 00:04:41,310 something less and it's 0.26. 76 00:04:41,570 --> 00:04:43,770 So that's an example of why it's like this. 77 00:04:43,770 --> 00:04:49,130 And you could get the exact value if you calculated the Belman equation the full but my question that 78 00:04:49,130 --> 00:04:49,850 we have now. 79 00:04:49,850 --> 00:04:53,540 The only problem is that there's going to be some recursion because you would need to know the value 80 00:04:53,540 --> 00:04:57,440 for this and then you need to know the value for this that is quite complex and that's why we're not 81 00:04:57,440 --> 00:04:59,180 doing the calculations manually here. 82 00:04:59,240 --> 00:05:06,000 That's why the I can do them as it's going through all this it's it's like it's like nothing too complex 83 00:05:06,000 --> 00:05:06,510 for a. 84 00:05:06,540 --> 00:05:08,520 You can't play these things. 85 00:05:08,520 --> 00:05:10,090 So that's our value here. 86 00:05:10,110 --> 00:05:11,520 But of this is a different one. 87 00:05:11,520 --> 00:05:16,830 So here it just to be 0.9 just because of discounting factor remember from here to here again now from 88 00:05:16,830 --> 00:05:23,070 here we colleges jump from here to here simply because even if we jump if we go like this we might end 89 00:05:23,070 --> 00:05:24,680 up back here back here. 90 00:05:24,700 --> 00:05:28,440 Right this 20 percent chance that will still stay in the square because we'll hit a wall. 91 00:05:28,710 --> 00:05:29,730 And again and so on. 92 00:05:29,730 --> 00:05:32,700 So the value of being here is zero point seventy one. 93 00:05:32,850 --> 00:05:35,370 Again this and the discounting factor. 94 00:05:35,370 --> 00:05:39,970 You know this might look odd to you that this is even with the discount in factor this is too high. 95 00:05:40,050 --> 00:05:44,440 Maybe the discounting factor in this example is not 0.9 maybe it's seven point ninety nine or something 96 00:05:44,500 --> 00:05:46,310 that so don't worry about that. 97 00:05:46,350 --> 00:05:48,480 Just kind of like focus on that. 98 00:05:48,480 --> 00:05:53,210 The values have indeed changed that the values are now less. 99 00:05:53,460 --> 00:05:58,700 Mostly because it's not it's a hundred percent probability to get to the state that you want to get 100 00:05:59,100 --> 00:06:00,180 and what you will find. 101 00:06:00,210 --> 00:06:06,660 An interesting one is here that here just to be 0.9 actually has dropped very much has dropped substantially. 102 00:06:06,660 --> 00:06:07,110 Why is that. 103 00:06:07,110 --> 00:06:12,120 Well because if you go from here up which is our intention there's a 10 percent chance of hitting a 104 00:06:12,120 --> 00:06:18,700 wall but there's a 10 percent chance of actually ending up in the firepit and losing minus one to reward 105 00:06:18,700 --> 00:06:22,820 and basically that means for the agent that's that's end of the game. 106 00:06:23,160 --> 00:06:25,640 And so this is a very bad state to be in. 107 00:06:25,680 --> 00:06:29,910 So all of a sudden remember we had zero point nine years apart and so they were equivalent. 108 00:06:29,910 --> 00:06:34,900 Doesn't matter you hear here they're pretty much equal in terms of value of being in each of these states. 109 00:06:34,980 --> 00:06:43,440 But now all of a sudden bam this date is like nearly twice as good as this one simply just because here 110 00:06:43,590 --> 00:06:46,980 if you go straight to it you go right where you want to go. 111 00:06:47,050 --> 00:06:51,270 The you know the consequences of the randomness occurring is you just stay here. 112 00:06:51,290 --> 00:06:55,070 Here one of the consequences a 10 percent chance is you end up in the pit. 113 00:06:55,110 --> 00:07:02,160 So as you can see this is no longer such a good state anymore simply because of something that fluctuation 114 00:07:02,160 --> 00:07:03,460 that could happen. 115 00:07:03,570 --> 00:07:09,150 As you can see this one is also very bad because it's as bad as this one in terms of you know it's only 116 00:07:09,150 --> 00:07:12,660 10 percent chance of ending up in the pit and 10 percent chance of ending up in the wall. 117 00:07:12,660 --> 00:07:18,480 But at the same time there's a discounting factor So first of all the discounting factor and also after 118 00:07:18,480 --> 00:07:20,390 this one you'd have to go here. 119 00:07:20,700 --> 00:07:23,900 And even if you hypothetically went here you could end up in the pit again. 120 00:07:23,910 --> 00:07:28,710 So that chance would also be taken into account because remember this values derive from this value 121 00:07:28,710 --> 00:07:31,760 and this value is derived from this value. 122 00:07:31,820 --> 00:07:32,350 Right. 123 00:07:32,400 --> 00:07:37,560 And therefore it's small but in reality actually what I said there was wrong. 124 00:07:37,560 --> 00:07:39,640 This value is not derived from the Fed. 125 00:07:39,810 --> 00:07:46,800 So if you just have a look now you will notice that this value over here is actually greater than this 126 00:07:46,800 --> 00:07:47,300 one. 127 00:07:47,610 --> 00:07:54,780 You will notice that for the agent it's better to go all this way than this way and it makes sense right. 128 00:07:54,780 --> 00:07:58,580 Because this way it doesn't lose it there's no chance of getting in the pit. 129 00:07:58,590 --> 00:08:03,450 Yes is a bit longer and therefore the discounting factor has a bigger effect. 130 00:08:03,510 --> 00:08:07,470 But at the same time simply because there's a chance of getting in the pit here if it goes straight 131 00:08:07,530 --> 00:08:09,140 it will there's a chance of jumping. 132 00:08:09,160 --> 00:08:15,120 So it will take a draw to take its time and just go around because that way there's a much lesser chance 133 00:08:15,120 --> 00:08:16,530 of it getting But there still is. 134 00:08:16,530 --> 00:08:19,590 So from here goes there it from here goes there. 135 00:08:19,590 --> 00:08:23,590 It could potentially get into the pit because it could end up there and that could end up in the bill. 136 00:08:23,730 --> 00:08:27,430 But nevertheless it's a lesser chance so it will just go on like that. 137 00:08:27,430 --> 00:08:32,430 So very interesting to see how they're all change remember previous you from here you'd go like that. 138 00:08:32,430 --> 00:08:34,790 From here you'd go like that and from here we go like that. 139 00:08:35,010 --> 00:08:36,870 And now all of a sudden you can see his change. 140 00:08:36,870 --> 00:08:41,000 Let's roll the arrows and see what it looks like now and voila. 141 00:08:41,010 --> 00:08:43,760 You see even a more random thing right. 142 00:08:43,770 --> 00:08:45,260 So yes this is true. 143 00:08:45,270 --> 00:08:46,500 But look at what happened here. 144 00:08:46,500 --> 00:08:47,610 Look at this one. 145 00:08:47,690 --> 00:08:48,970 Look at this one. 146 00:08:49,050 --> 00:08:50,490 Were you expecting that. 147 00:08:50,520 --> 00:08:54,570 That's something I definitely like when I saw this one first time I was was very impressed. 148 00:08:54,570 --> 00:08:59,800 I was not super I was not surprised and I was not expecting this at all. 149 00:08:59,970 --> 00:09:04,860 And this is an example of you know when I can outsmart a human. 150 00:09:05,120 --> 00:09:10,680 It sounds like something you caught even you could predict but the I through enforcement learning remember 151 00:09:10,680 --> 00:09:14,400 that example of dogs can actually sometimes work better than normal real life. 152 00:09:14,400 --> 00:09:21,330 Dogs are preprogrammed robot dogs can play soccer simply because they come up with these ideas that 153 00:09:21,390 --> 00:09:22,350 even we can't see. 154 00:09:22,440 --> 00:09:27,330 And as a great example you probably weren't expecting that as well that the Asians instead of going 155 00:09:27,330 --> 00:09:29,690 up is like why would I. 156 00:09:29,850 --> 00:09:33,120 Like if I go up then there's a 10 percent chance I'll jump into the pit. 157 00:09:33,120 --> 00:09:35,130 But what does it achieve by going into the war. 158 00:09:35,280 --> 00:09:38,330 Well 80 percent of the time will bump back and stay in the state. 159 00:09:38,490 --> 00:09:42,360 But 10 percent of the time will go here and 10 percent of the time I'll go here. 160 00:09:42,360 --> 00:09:49,130 So all of a sudden you can see that now it's actually in this new approach of jumping into the wall. 161 00:09:49,170 --> 00:09:53,350 There is a zero percent chance it will go into the fire but from this spot so. 162 00:09:53,370 --> 00:09:57,690 And it's like it really doesn't want to go into the fire pit so drugged bon bons into the wall a couple 163 00:09:57,690 --> 00:10:03,050 of times and then it will go the right or left at some point because that randomness is going to happen. 164 00:10:03,080 --> 00:10:09,680 And so it learned that through experimentation it learned that OK when I go forward the results are 165 00:10:09,680 --> 00:10:11,440 not as good as when I go to the wall. 166 00:10:11,510 --> 00:10:13,540 And if you think about it it's like this. 167 00:10:13,580 --> 00:10:18,350 This robot if you think about it this is a firepit is a very this is the this is like a square is like 168 00:10:18,350 --> 00:10:21,630 a very tiny ledge and then this is like a mountain like a cliff. 169 00:10:21,650 --> 00:10:27,830 And this robot is just hugging the cliff and just like trying to waiting until it like pushes right 170 00:10:27,830 --> 00:10:32,640 or left because well as a human you probably do the same you wouldn't be standing facing that way or 171 00:10:32,750 --> 00:10:34,970 you'd be hugging the cliff right. 172 00:10:35,000 --> 00:10:35,860 Or something like that. 173 00:10:35,940 --> 00:10:39,740 And hopefully you know we need to end up never end up in situations like that. 174 00:10:39,770 --> 00:10:43,670 But like visually just visually if you think about something here. 175 00:10:43,760 --> 00:10:46,450 And so that's pretty intense right. 176 00:10:46,460 --> 00:10:51,860 So that the AI came up with this idea and same here that is sort of going left and Riskin get in a fight 177 00:10:51,860 --> 00:10:56,270 but I'm just going to try balls off the wall like you know hug a wall try to jump into the wall and 178 00:10:56,300 --> 00:11:01,430 at some point I know that you know just there is a probability is a 10 percent chance every time I do 179 00:11:01,430 --> 00:11:04,910 that I'll go here and something will happen and I'll end up here and I'll be safe and then I'll just 180 00:11:04,910 --> 00:11:06,680 keep going like that. 181 00:11:06,830 --> 00:11:13,240 So very very interesting approach that they took here and you can see the the routes are like this so 182 00:11:13,250 --> 00:11:17,500 from here it might go right and then it'll go right to the exit or here or go left like that. 183 00:11:17,690 --> 00:11:22,230 And here we will at some point you will go left and go like that again. 184 00:11:22,310 --> 00:11:23,170 This is important. 185 00:11:23,180 --> 00:11:27,610 I'm not a policy so even when it jumps from here it will go here. 186 00:11:27,650 --> 00:11:30,400 Maybe And then from here it might actually rain straight. 187 00:11:30,410 --> 00:11:34,520 It might actually go back to the right and then from here and I going to let me get that right. 188 00:11:34,550 --> 00:11:38,260 So there's lots of different options for it guys who might not follow exactly this ironmonger go the 189 00:11:38,270 --> 00:11:38,730 other way. 190 00:11:38,960 --> 00:11:42,500 This is just the desired routes that it's designed for itself. 191 00:11:42,590 --> 00:11:44,690 But the way it'll work out is actually could be different. 192 00:11:44,690 --> 00:11:46,130 It depends on the real world. 193 00:11:46,340 --> 00:11:46,940 So there we go. 194 00:11:46,950 --> 00:11:50,090 That's the world of artificial intelligence. 195 00:11:50,090 --> 00:11:56,780 That's what a policy versus a plan is and hopefully you're getting slowly getting excited by what the 196 00:11:57,000 --> 00:12:01,220 AI can do especially given what we saw over here. 197 00:12:01,340 --> 00:12:07,430 These are some very virtuoso type of decisions that the AIs coming up with. 198 00:12:07,610 --> 00:12:12,500 And as you can see when you're playing AI even from this small example you can see that even when you 199 00:12:12,500 --> 00:12:18,950 play in a real world maybe you'll come up with ideas and decisions that even sometimes humans can come 200 00:12:18,950 --> 00:12:19,240 up with. 201 00:12:19,250 --> 00:12:25,460 And that's exactly kind of like what happened in those games where the Google Alpha goal was playing 202 00:12:25,520 --> 00:12:32,320 versus Lisa idole champion of goal in Korea back in the world champion of go. 203 00:12:32,390 --> 00:12:37,000 And they were playing in Korea back bakla in 2016 I think is March 2016. 204 00:12:37,000 --> 00:12:42,370 It came up with some moves that humans had never played in 3000 years or humans were not used to playing. 205 00:12:42,380 --> 00:12:45,510 And this is this is exactly an example of that. 206 00:12:45,740 --> 00:12:50,290 So once again hope you're getting excited and pumped about discourse and about what we can integrate. 207 00:12:50,330 --> 00:12:51,840 And I look for it. 208 00:12:51,840 --> 00:12:52,720 See you next time. 209 00:12:52,730 --> 00:12:54,410 Until then enjoy. 210 00:12:54,410 --> 00:12:54,640 I. 22616