Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,930 --> 00:00:03,970
Hello and welcome back to the course on artificial intelligence.
2
00:00:03,990 --> 00:00:08,480
So we've talked about the Belman equation and we've analyzed our little maze.
3
00:00:08,520 --> 00:00:11,100
Let's have a look at the plan.
4
00:00:11,100 --> 00:00:12,400
What is the plan.
5
00:00:12,750 --> 00:00:14,650
Well here is our main analysis.
6
00:00:14,670 --> 00:00:20,970
And we know that we can see actually the states the values of each state we can see what the value of
7
00:00:20,970 --> 00:00:23,310
being in every single state is.
8
00:00:23,400 --> 00:00:27,810
Therefore the AI can or the agent can navigate this maze.
9
00:00:27,840 --> 00:00:28,770
So what is the plan.
10
00:00:28,770 --> 00:00:35,640
Well the plan is simply like a treasure map for artificial intelligence instead of looking at these
11
00:00:35,730 --> 00:00:41,420
values that just replace them with arrows which indicate in which direction the agent should go.
12
00:00:41,490 --> 00:00:43,360
Because of those because it knows those values.
13
00:00:43,350 --> 00:00:47,230
So an ideal scenario after it's explored this environment.
14
00:00:47,250 --> 00:00:50,860
It knows the value of being in each state and therefore you can come up with this map.
15
00:00:50,870 --> 00:00:52,330
So let's have a look again.
16
00:00:52,380 --> 00:00:58,410
We know that your values one so if you are here out of the two the better one is this Once you go right
17
00:00:58,830 --> 00:01:02,010
from here out of the two this one is a better one this one is a better one.
18
00:01:02,010 --> 00:01:02,750
This one is a better one.
19
00:01:02,760 --> 00:01:04,740
Or actually from here you have two options right.
20
00:01:04,770 --> 00:01:11,130
So he was kind of like a tie so just pick one at random doesn't matter which one because the value in
21
00:01:11,130 --> 00:01:16,110
these in either case is the same and more so even if you look through it will take the same amount of
22
00:01:16,110 --> 00:01:18,390
steps same number of steps to get to the end.
23
00:01:18,690 --> 00:01:22,520
From here you've got three options but this one is the better value from here.
24
00:01:22,530 --> 00:01:24,360
This one is a better value from here.
25
00:01:24,360 --> 00:01:29,380
Obviously this was a better value because you're you know you just get it minus one reward right away.
26
00:01:29,590 --> 00:01:35,250
And from here you have like three actually but this one is the best one of the best value of the state.
27
00:01:35,400 --> 00:01:41,190
And so therefore if we replace them with arrows it makes sense that this is how the agent would go if
28
00:01:41,200 --> 00:01:44,570
it stars here or solve for some reason it ends up in this square.
29
00:01:44,580 --> 00:01:46,070
It knows how to get out of here.
30
00:01:46,280 --> 00:01:48,980
Stars and this square knows how to get on here and so on.
31
00:01:48,980 --> 00:01:51,440
So that is what a plan is.
32
00:01:51,440 --> 00:01:56,850
And don't confuse plan with policy because we're going to be talking about policies for Iran poses a
33
00:01:56,850 --> 00:02:01,660
very similar to plans but they have a little trick to them because the environment's going to be a bit
34
00:02:01,670 --> 00:02:02,380
different.
35
00:02:02,420 --> 00:02:07,560
It's going to be stochastic and that's what we're going to talk about in the next tutorial.
36
00:02:07,910 --> 00:02:10,000
So Conway to you on the next one.
37
00:02:10,020 --> 00:02:12,060
And until then enjoy.
3880
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.