019 Deep Q-Learning Intuition - Acting (English transcript)

Hello, and welcome back to the course on AI. In the previous part we started on the deep Q-learning intuition, and in fact we got all the way through the learning part. Now we're going to move on to the actual acting part.

So there are two distinct parts we have to remember. The learning part we've covered; that's beautiful. But now the agent actually has to take an action. It has to decide what it's going to do: action one, two, three, or four. So how does it do that?

It does it using those same Q-values. The values don't change: once we have them, we compare them with the targets, calculate the loss, backpropagate the error, and update the weights, but the Q-values themselves stay fixed through that whole process. We know what they are. So once the network has been updated, using those same values, we pass them through a softmax function. Softmax is described in Annex 2, and we'll talk more about it, or rather about the action selection policy, further down in this section, just a few tutorials from now. For now we'll simply say we pass the Q-values through a softmax function, which helps select the best possible action. There's a small caveat: it's not always just the best one possible, and we'll come back to that in the action selection policy tutorial. But for now, let's say it selects the best action. The network has predicted the Q-values, so the agent can look at them and say: the highest Q-value is this one, just as in the simple Q-learning algorithm, so I'm going to select that action.

And that's pretty much it. That's how the agent chooses which action to take. It takes the action, and then this whole process happens again for the next state it ends up in: in our case, the next square of the maze, but generally speaking, the next state.
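To make the acting step concrete, here is a minimal PyTorch sketch of the idea: the state vector goes through the Q-network once, the resulting Q-values are passed through a softmax, and an action is sampled from the resulting probabilities. The network shape, the temperature parameter, and the two-element state encoding are illustrative assumptions, not the course's actual code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical Q-network: a state vector in, one Q-value per action out.
    # The sizes are assumptions for illustration.
    q_network = nn.Sequential(
        nn.Linear(2, 32),   # state encoded as a 2-element vector (e.g. x, y)
        nn.ReLU(),
        nn.Linear(32, 4),   # four possible actions: Q1, Q2, Q3, Q4
    )

    def select_action(state, temperature=1.0):
        """Softmax action selection over the predicted Q-values."""
        with torch.no_grad():               # acting does not change the weights
            q_values = q_network(state)     # the Q-values are fixed at this point
        probs = F.softmax(q_values / temperature, dim=-1)
        # Sampling (rather than always taking the argmax) is the small caveat
        # mentioned above; the action selection policy tutorial covers it.
        return torch.multinomial(probs, num_samples=1).item()

    state = torch.tensor([1.0, 2.0])        # example state vector
    action = select_action(state)           # 0, 1, 2, or 3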
So there we go. That's how we feed a reinforcement learning problem into a neural network: through a vector describing the state we're in. And once we've fed it in, there are two parts to the process. Part one is the learning: the part where we compare each of the Q-values with its target and backpropagate the loss through the network to update the weights, so that our network keeps learning as we go through this maze, or through this environment. Part two is that we have to act: we have to select an action, and that is where we pass the Q-values through a softmax function, or more generally an action selection policy, which we'll talk about further down. We simply select the action we want to take, perform it, and the whole process starts again.

Then maybe the agent passes the game, maybe it doesn't; in any case the game eventually ends, and the whole process repeats: the agent plays the whole game again, and so on. Each complete game is an epoch: every time the game ends, whether favorably or unfavorably, that's the end of an epoch, and then the agent starts again, and again, and again.

And remember, this learn-then-act process happens every single time the agent is in a new state; the state is what gets encoded and fed in. So it's not just once per game: for every single state, the agent feeds the state in, learns, acts, moves to the next state, and so on. The learning happens and the acting happens at every single step.
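And here, in the same sketch style, is the learning half of that loop: compare the predicted Q-value for the action taken against the target r + gamma * max Q(s', a'), then backpropagate the loss and update the weights. It reuses the hypothetical q_network from the sketch above; the optimizer, learning rate, and discount factor are assumed values.

    optimizer = torch.optim.Adam(q_network.parameters(), lr=1e-3)  # assumed optimizer
    loss_fn = nn.MSELoss()
    gamma = 0.9  # discount factor, illustrative value

    def learn_step(state, action, reward, next_state, done):
        """One update: pull Q(s, a) toward r + gamma * max Q(s', a')."""
        q_pred = q_network(state)[action]        # predicted Q for the action taken
        with torch.no_grad():                    # the target is treated as fixed
            q_next = q_network(next_state).max()
            target = reward + gamma * q_next * (1.0 - float(done))
        loss = loss_fn(q_pred, target)           # compare prediction with target
        optimizer.zero_grad()
        loss.backward()                          # backpropagate the loss
        optimizer.step()                         # update the weights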
So that is deep Q-learning, or rather the intuition behind deep Q-learning. We've got lots more to cover, and then of course the practical part. In the meantime, if you'd like some additional information on deep Q-learning, we've got a recommended reading: we've already spoken about Arthur Juliani's series of blog posts, and if you look up Simple Reinforcement Learning with Tensorflow, Part 4, you'll find the part relevant to what we discussed today.

Note that there he talks about convolutions. We are not covering convolutions in this section; we'll be talking about them in the next section of the course, so just skip the convolutional part for now. The difference is that with convolutions the agent is looking at an image and therefore has to process an image, which is an additional complication; we're slowly, gradually building up to that. For now we're encoding our environment, or rather the state the agent is in, as a vector. In our case it was a very simple vector of values. Sometimes, as you'll see from the blog post, even in a simple maze people prefer the one-hot encoded version of the state: every single box of the maze gets its own element, so in our case you'd have a vector of 12 values (three by four), where each element is either 1 or 0 depending on which box you're in. (There's a small sketch of this idea below.) Whichever way you decide to encode your environment and its state, the key is that it's a vector, not an image, and there's no convolution involved. That part will come later; for us it starts here, and that just simplifies things so we can gradually build up our understanding.

And of course, don't forget that the blog post is written in Tensorflow, while we're using PyTorch in our tutorials. So hopefully you enjoyed this quick intro into deep (not yet convolutional) Q-learning. On that note, I look forward to seeing you next time, and until then, enjoy AI.
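Here is that one-hot encoding idea as a short sketch, assuming a 3-by-4 maze and row-major indexing (both assumptions for illustration):

    import torch

    N_ROWS, N_COLS = 3, 4               # the 3-by-4 maze from the lecture

    def one_hot_state(row, col):
        """Encode the agent's current square as a 12-element one-hot vector."""
        vec = torch.zeros(N_ROWS * N_COLS)
        vec[row * N_COLS + col] = 1.0   # exactly one element is 1, the rest are 0
        return vec

    print(one_hot_state(0, 2))
    # tensor([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])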
