1
00:00:00,000 --> 00:00:05,280
This comprehensive course will teach you the foundations of machine learning and deep learning
2
00:00:05,280 --> 00:00:11,360
using PyTorch. PyTorch is a machine learning framework written in Python. You'll learn machine
3
00:00:11,360 --> 00:00:17,280
learning by writing PyTorch code. So when in doubt, run the provided code and experiment.
4
00:00:17,280 --> 00:00:23,520
Your teacher for this course is Daniel Bourke. Daniel is a machine learning engineer and popular
5
00:00:23,520 --> 00:00:28,080
course creator. So enjoy the course and don't watch the whole thing in one sitting.
6
00:00:28,080 --> 00:00:34,240
Hello, welcome to the video. It's quite a big one. But if you've come here to learn machine
7
00:00:34,240 --> 00:00:40,800
learning and deep learning and PyTorch code, well, you're in the right place. Now, this video and
8
00:00:40,800 --> 00:00:45,520
tutorial is aimed at beginners who have about three to six months of Python coding experience.
9
00:00:46,400 --> 00:00:50,000
So we're going to cover a whole bunch of important machine learning concepts
10
00:00:50,000 --> 00:00:57,120
by writing PyTorch code. Now, if you get stuck, you can leave a comment below or post on the course
11
00:00:57,120 --> 00:01:02,560
GitHub discussions page. And on GitHub is where you'll be able to find all the materials that we cover,
12
00:01:02,560 --> 00:01:09,280
as well as on learnpytorch.io. There's an online readable book version of this course there.
13
00:01:10,160 --> 00:01:15,280
But if you finish this video and you find that, hey, I would still like to learn more PyTorch.
14
00:01:15,280 --> 00:01:19,600
I mean, you can't really cover all of PyTorch in a day; that video title is just a play on words about
15
00:01:19,600 --> 00:01:25,840
the length of the video. That's an aside. There are five more chapters available at learnpytorch.io,
16
00:01:25,840 --> 00:01:30,640
covering everything from transfer learning to model deployment to experiment tracking.
17
00:01:30,640 --> 00:01:36,400
And all the videos to go with those are available at zero to mastery.io. But that's enough for me.
18
00:01:37,280 --> 00:01:40,400
Happy machine learning and I'll see you inside.
19
00:01:45,120 --> 00:01:50,720
Hello, my name is Daniel and welcome to the deep learning with
20
00:01:50,720 --> 00:02:00,160
PyTorch course. Now, that was too good not to watch twice. Welcome to the deep learning with
21
00:02:01,920 --> 00:02:07,280
whoa, that fire is cool, PyTorch course. So this is very exciting. You're going to see that animation
22
00:02:07,280 --> 00:02:13,040
quite a bit because, I mean, it's fun and PyTorch's symbol is a flame because of torch.
23
00:02:13,040 --> 00:02:17,520
But let's get into it. So naturally, if you've come to this course, you might have already
24
00:02:17,520 --> 00:02:21,280
researched what is deep learning, but we're going to cover it quite briefly.
25
00:02:21,840 --> 00:02:26,720
And just in the sense of how much you need to know for this course, because we're going to be
26
00:02:26,720 --> 00:02:30,960
more focused on, rather than just definitions, we're going to be focused on getting practical
27
00:02:30,960 --> 00:02:36,080
and seeing things happen. So if we define what machine learning is, because as we'll see in a
28
00:02:36,080 --> 00:02:41,600
second, deep learning is a subset of machine learning. Machine learning is turning things
29
00:02:41,600 --> 00:02:49,280
data, which can be almost anything, images, text, tables of numbers, video, audio files,
30
00:02:49,280 --> 00:02:54,640
almost anything can be classified as data into numbers. So computers love numbers,
31
00:02:55,200 --> 00:03:01,840
and then finding patterns in those numbers. Now, how do we find those patterns? Well,
32
00:03:01,840 --> 00:03:05,680
the computer does this part, specifically a machine learning algorithm or a deep learning
33
00:03:05,680 --> 00:03:11,760
algorithm, the things that we're going to be building in this course. How? Code and math. Now,
34
00:03:11,760 --> 00:03:16,320
this course is code focused. I want to stress that before you get into it. We're focused on
35
00:03:16,320 --> 00:03:21,760
writing code. Now, behind the scenes, that code is going to trigger some math to find patterns in
36
00:03:21,760 --> 00:03:26,720
those numbers. If you would like to deep dive into the math behind the code, I'm going to be
37
00:03:26,720 --> 00:03:32,080
linking extra resources for that. However, we're going to be getting hands on and writing lots of
38
00:03:32,080 --> 00:03:37,280
code to do lots of this. And so if we keep going to break things down a little bit more,
39
00:03:37,920 --> 00:03:43,360
machine learning versus deep learning, if we have this giant bubble here of artificial
40
00:03:43,360 --> 00:03:48,320
intelligence, you might have seen something similar like this on the internet. I've just
41
00:03:48,320 --> 00:03:53,280
copied that and put it into pretty colors for this course. So you've got this overarching
42
00:03:54,160 --> 00:04:00,000
big bubble of the topic of artificial intelligence, which you could define as, again, almost anything
43
00:04:00,000 --> 00:04:04,960
you want. Then typically, there's a subset within artificial intelligence, which is known as machine
44
00:04:04,960 --> 00:04:10,960
learning, which is quite a broad topic. And then within machine learning, you have another topic
45
00:04:10,960 --> 00:04:16,000
called deep learning. And so that's what we're going to be focused on working with PyTorch,
46
00:04:16,720 --> 00:04:22,320
writing deep learning code. But again, you could use PyTorch for a lot of different machine
47
00:04:22,320 --> 00:04:29,920
learning things. And truth be told, I kind of use these two terms interchangeably. Yes, ML is the
48
00:04:29,920 --> 00:04:36,080
broader topic and deep learning is a bit more nuanced. But again, if you want to form your
49
00:04:36,080 --> 00:04:41,360
own definitions of these, I'd highly encourage you to do so. This course is more focused on,
50
00:04:41,360 --> 00:04:47,280
rather than defining what things are, is seeing how they work. So this is what we're focused on doing.
51
00:04:47,280 --> 00:04:52,320
Just to break things down, if you're familiar with the fundamentals of machine learning,
52
00:04:52,320 --> 00:04:58,320
you probably understand this paradigm, but we're going to just rehash on it anyway. So if we
53
00:04:58,320 --> 00:05:04,160
consider traditional programming, let's say you'd like to write a computer program that's enabled to,
54
00:05:04,160 --> 00:05:12,320
or has the ability to reproduce your grandmother's favorite or famous roast chicken dish. And so we
55
00:05:12,320 --> 00:05:17,440
might have some inputs here, which are some beautiful vegetables, a chicken that you've raised on the
56
00:05:17,440 --> 00:05:24,000
farm. You might write down some rules. This could be your program, cut the vegetables, season the
57
00:05:24,000 --> 00:05:28,320
chicken, preheat the oven, cook the chicken for 30 minutes and add vegetables. Now, it might not
58
00:05:28,320 --> 00:05:33,760
be this simple, or it might actually be because your Sicilian grandmother is a great cook. So she's
59
00:05:33,760 --> 00:05:39,440
put things into an art now and can just do it step by step. And then those inputs combined with
60
00:05:39,440 --> 00:05:46,800
those rules makes this beautiful roast chicken dish. So that's traditional programming. Now,
61
00:05:46,800 --> 00:05:52,480
a machine learning algorithm typically takes some inputs and some desired outputs and then
62
00:05:52,480 --> 00:05:59,280
figures out the rules. So the patterns between the inputs and the outputs. So where in traditional
63
00:05:59,280 --> 00:06:05,200
programming, we had to handwrite all of these rules, the ideal machine learning algorithm will figure
64
00:06:05,200 --> 00:06:12,320
out this bridge between our inputs and our idealized output. Now, in the machine learning sense, this
65
00:06:12,320 --> 00:06:18,400
is typically described as supervised learning, because you will have some kind of input with
66
00:06:18,400 --> 00:06:24,240
some kind of output; the inputs are also known as features, and the outputs are also known as labels. And the machine learning
67
00:06:24,240 --> 00:06:30,880
algorithm's job is to figure out the relationships between the inputs or the features and the outputs
68
00:06:30,880 --> 00:06:36,960
or the label. So if we wanted to write a machine learning algorithm to figure out our Sicilian
69
00:06:36,960 --> 00:06:42,880
grandmother's famous roast chicken dish, we would probably gather a bunch of inputs of ingredients
70
00:06:42,880 --> 00:06:47,840
such as these delicious vegetables and chicken, and then have a whole bunch of outputs of the
71
00:06:47,840 --> 00:06:53,920
finished product and see if our algorithm can figure out what we should do to go from these
72
00:06:53,920 --> 00:07:00,480
inputs to outputs. So that's almost enough coverage of the difference between traditional programming
73
00:07:00,480 --> 00:07:05,280
and machine learning as far as definitions go. We're going to get hands-on in coding these sorts
74
00:07:05,280 --> 00:07:12,720
of algorithms throughout the course. For now, let's go to the next video and ask the question,
75
00:07:12,720 --> 00:07:17,200
why use machine learning or deep learning? And actually, before we get there, I'd like you to
76
00:07:17,200 --> 00:07:22,400
think about that. So going back to what we just saw, the paradigm between traditional programming
77
00:07:22,400 --> 00:07:28,960
and machine learning, why would you want to use machine learning algorithms rather than
78
00:07:28,960 --> 00:07:33,920
traditional programming? So if you had to write all these rules, could that get cumbersome?
79
00:07:34,800 --> 00:07:37,840
So have a think about it and we'll cover it in the next video.
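To make the two paradigms concrete, here is a rough sketch in Python and PyTorch. The rule function, the random numbers, and the tiny model below are all made up for illustration; they are not part of the course materials.

```python
import torch
from torch import nn

# Traditional programming: hand-write the rules that map inputs to outputs.
def roast_chicken(vegetables, chicken):
    # hypothetical hand-written rules
    return ["cut the vegetables", "season the chicken", "preheat the oven",
            "cook the chicken for 30 minutes", "add vegetables"]

# Machine learning: supply inputs and desired outputs, and let the algorithm
# figure out the "rules" (patterns) that connect them.
inputs = torch.rand(10, 3)           # 10 made-up examples with 3 features each
desired_outputs = torch.rand(10, 1)  # the ideal outputs for those examples
model = nn.Linear(3, 1)              # a tiny model whose parameters are the learned "rules"
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for _ in range(100):                 # a bare-bones training loop
    predictions = model(inputs)
    loss = loss_fn(predictions, desired_outputs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```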
80
00:07:41,040 --> 00:07:45,520
Welcome back. So in the last video, we covered briefly the difference between
81
00:07:45,520 --> 00:07:49,920
traditional programming and machine learning. And again, I don't want to spend too much time
82
00:07:49,920 --> 00:07:54,800
on definitions. I'd rather you see this in practice. And I left you with the question,
83
00:07:54,800 --> 00:08:00,800
why would you want to use machine learning or deep learning? Well, let's think of a good reason.
84
00:08:01,360 --> 00:08:06,400
Why not? I mean, if we had to write all those handwritten rules to reproduce our Sicilian grandmother's
85
00:08:06,400 --> 00:08:13,280
roast chicken dish all the time, that would be quite cumbersome, right? Well, let's draw a line
86
00:08:13,280 --> 00:08:18,560
on that. Why not? What's a better reason? And kind of what we just said, right? For a complex
87
00:08:18,560 --> 00:08:24,160
problem, can you think of all the rules? So let's imagine we're trying to build a self-driving car.
88
00:08:24,800 --> 00:08:30,480
Now, if you've learned to drive, you've probably done so in maybe 20 hours, 100 hours. But now,
89
00:08:30,480 --> 00:08:34,960
I'll give you a task of writing down every single rule about driving. How do you back out of your
90
00:08:34,960 --> 00:08:40,000
driveway? How do you turn left and go down the street? How do you do a reverse park? How do
91
00:08:40,000 --> 00:08:45,920
you stop at an intersection? How do you know how fast to go somewhere? So we just listed half a
92
00:08:45,920 --> 00:08:50,400
dozen rules. But you could probably go a fair few more. You might get into the thousands.
93
00:08:50,400 --> 00:08:56,480
And so for a complex problem, such as driving, can you think of all the rules? Well, probably not.
94
00:08:56,480 --> 00:09:02,960
So that's where machine learning and deep learning come in to help. And so this is a beautiful comment
95
00:09:02,960 --> 00:09:08,960
I like to share with you on one of my YouTube videos is my 2020 machine learning roadmap.
96
00:09:08,960 --> 00:09:13,360
And this is from Yashawing. I'm probably going to mispronounce this if I even try to.
97
00:09:13,360 --> 00:09:18,480
But Yashawing says, I think you can use ML. So ML is machine learning. I'm going to use that
98
00:09:18,480 --> 00:09:22,880
a lot throughout the course, by the way. ML is machine learning, just so you know.
99
00:09:22,880 --> 00:09:27,920
For literally anything, as long as you can convert it into numbers, ah, that's what we said before,
100
00:09:27,920 --> 00:09:33,280
machine learning is turning something into computer readable numbers. And then programming it to find
101
00:09:33,280 --> 00:09:38,400
patterns, except with a machine learning algorithm, typically we write the algorithm and it finds
102
00:09:38,400 --> 00:09:44,640
the patterns, not us. And so literally it could be anything, any input or output from the universe.
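As a tiny, hedged illustration of "turning things into numbers", here is what that can look like with PyTorch tensors. The pixel values and the character encoding are made up; proper encodings are covered later in the course.

```python
import torch

# A made-up 2x2 pixel RGB image as numbers: [colour channels, height, width]
image_as_numbers = torch.tensor([[[255, 0], [0, 255]],
                                 [[0, 255], [255, 0]],
                                 [[0, 0], [255, 255]]])
print(image_as_numbers.shape)  # torch.Size([3, 2, 2])

# Text can become numbers too; here is a naive character-level encoding
sentence = "machine learning is fun"
text_as_numbers = torch.tensor([ord(character) for character in sentence])
print(text_as_numbers[:5])  # the first five character codes
```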
103
00:09:44,640 --> 00:09:51,360
That's pretty darn cool about machine learning, right? But should you always use it just because
104
00:09:51,360 --> 00:09:58,320
it could be used for anything? Well, I'd like to also introduce you to Google's number one rule
105
00:09:58,320 --> 00:10:05,200
of machine learning. Now, if you can build a simple rule-based system, such as the set of five
106
00:10:05,200 --> 00:10:10,560
rules that we had to map the ingredients to our Sicilian grandmother's roast chicken dish,
107
00:10:10,560 --> 00:10:15,680
if you can write just five steps to do that, that's going to work every time, well, you should
108
00:10:15,680 --> 00:10:20,160
probably do that. So if you can build a simple rule based system that doesn't require machine
109
00:10:20,160 --> 00:10:26,320
learning, do that. And of course, maybe it's not so very simple, but maybe you can just write some
110
00:10:26,320 --> 00:10:31,120
rules to solve the problem that you're working on. And this is from a wise software engineer,
111
00:10:31,120 --> 00:10:36,080
which is, I kind of hinted at it before, rule one of Google's machine learning handbook. Now,
112
00:10:36,080 --> 00:10:39,680
I'm going to highly recommend you read through that, but we're not going to go through that in
113
00:10:39,680 --> 00:10:44,720
this video. So check that out. You can Google that; otherwise, the link will be where you get all the other links.
114
00:10:45,440 --> 00:10:49,840
So just keep that in mind, although machine learning is very powerful and very fun and very
115
00:10:49,840 --> 00:10:54,880
exciting, it doesn't mean that you should always use it. I know this is quite the thing to be saying
116
00:10:54,880 --> 00:10:59,200
at the start of a deep learning machine learning course, but I just want you to keep in mind,
117
00:10:59,200 --> 00:11:04,640
simple rule based systems are still good. Machine learning isn't a solve all for everything.
118
00:11:05,440 --> 00:11:09,440
Now, let's have a look at what deep learning is good for, but I'm going to leave you on a
119
00:11:09,440 --> 00:11:13,040
cliffhanger, because we're going to check this out in the next video. See you soon.
120
00:11:14,960 --> 00:11:20,080
In the last video, we familiarized ourselves with Google's number one rule of machine learning,
121
00:11:20,080 --> 00:11:25,360
which is basically if you don't need it, don't use it. And with that in mind,
122
00:11:25,360 --> 00:11:29,520
what should we actually be looking to use machine learning or deep learning for?
123
00:11:30,720 --> 00:11:35,280
Well, problems with long lists of rules, so when the traditional approach fails.
124
00:11:35,280 --> 00:11:40,480
Remember, the traditional approach is you have some sort of data input, you write a list of rules for
125
00:11:40,480 --> 00:11:45,040
that data to be manipulated in some way, shape, or form, and then you have the outputs that you
126
00:11:45,040 --> 00:11:50,320
know. But if you have a long, long list of rules, like the rules of driving a car, which could be
127
00:11:50,320 --> 00:11:54,640
hundreds, could be thousands, could be millions, who knows, that's where machine learning and
128
00:11:54,640 --> 00:11:59,200
deep learning may help. And it kind of is at the moment in the world of self-driving cars,
129
00:11:59,200 --> 00:12:02,400
machine learning and deep learning are the state of the art approach.
130
00:12:03,680 --> 00:12:08,800
Continually changing environments. So one of the benefits of deep learning is that it can
131
00:12:08,800 --> 00:12:15,040
keep learning if it needs to. And so it can adapt and learn to new scenarios. So if you update the
132
00:12:15,040 --> 00:12:21,040
data that your model was trained on, it can adjust to new different kinds of data in the future.
133
00:12:21,040 --> 00:12:26,000
So similarly to if you are driving a car, you might know your own neighborhood very well.
134
00:12:26,000 --> 00:12:30,960
But then when you go to somewhere you haven't been before, sure you can draw on the foundations
135
00:12:30,960 --> 00:12:35,600
of what you know, but you're going to have to adapt. How fast should you go? Where should you
136
00:12:35,600 --> 00:12:40,240
stop? Where should you park? These kinds of things. So with problems with long lists of rules,
137
00:12:40,240 --> 00:12:48,640
or continually changing environments, or if you had a large, large data set. And so this is where
138
00:12:48,640 --> 00:12:55,200
deep learning is flourishing in the world of technology. So let's give an example. One of my
139
00:12:55,200 --> 00:13:00,880
favorites is the food 101 data set, which you can search for online, which is images of 101
140
00:13:00,880 --> 00:13:07,280
different kinds of foods. Now we briefly looked at what a rule list might look like for cooking
141
00:13:07,280 --> 00:13:14,080
your grandmother's famous Sicilian roast chicken dish. But can you imagine if you wanted to build
142
00:13:14,080 --> 00:13:19,600
an app that could take photos of different food, how long your list of rules would be to differentiate
143
00:13:19,600 --> 00:13:24,880
101 different foods? It'd be so long. You need rule sets for every single one. Let's just take
144
00:13:24,880 --> 00:13:32,000
one food, for example. How do you write a program to tell what a banana looks like? I mean you'd
145
00:13:32,000 --> 00:13:36,400
have to code what a banana looks like, but not only a banana, what everything that isn't a banana
146
00:13:36,400 --> 00:13:42,720
looks like. So keep this in mind. What deep learning is good for? Problems with long lists of rules,
147
00:13:42,720 --> 00:13:47,360
continually changing environments, or discovering insights within large collections of data.
148
00:13:48,480 --> 00:13:52,880
Now, what deep learning is not good for? And I'm going to write typically here because,
149
00:13:53,440 --> 00:13:57,280
again, this is problem specific. Deep learning is quite powerful these days and things might
150
00:13:57,280 --> 00:14:01,360
change in the future. So keep an open mind, if there's anything about this course, it's not for
151
00:14:01,360 --> 00:14:07,040
me to tell you exactly what's what. It's for me to spark a curiosity into you to figure out what's
152
00:14:07,040 --> 00:14:13,760
what, or even better yet, what's not what. So when you need explainability, as we'll see,
153
00:14:13,760 --> 00:14:19,200
the patterns learned by a deep learning model, which is lots of numbers, called weights and biases,
154
00:14:19,200 --> 00:14:24,320
we'll have a look at that later on, are typically uninterpretable by a human. Sometimes
155
00:14:24,320 --> 00:14:30,240
deep learning models can have a million, 10 million, 100 million, a billion, some models are getting
156
00:14:30,240 --> 00:14:36,080
into the trillions of parameters. When I say parameters, I mean numbers or patterns in data.
157
00:14:36,080 --> 00:14:40,800
Remember, machine learning is turning things into numbers and then writing a machine learning model
158
00:14:40,800 --> 00:14:46,160
to find patterns in those numbers. So sometimes those patterns themselves can be lists of numbers
159
00:14:46,160 --> 00:14:50,080
that are in the millions. And so can you imagine looking at a list of numbers that has a million
160
00:14:50,080 --> 00:14:54,400
different things going on? That's going to be quite hard. I find it hard to understand
161
00:14:54,400 --> 00:15:00,720
three or four numbers, let alone a million. And when the traditional approach is a better option,
162
00:15:00,720 --> 00:15:05,680
again, this is Google's rule number one of machine learning. If you can do what you need to do with
163
00:15:05,680 --> 00:15:12,000
a simple rule based system, well, maybe you don't need to use machine learning or deep learning.
164
00:15:12,000 --> 00:15:15,840
Again, I'm going to use the deep learning machine learning terms interchangeably.
165
00:15:15,840 --> 00:15:20,640
I'm not too concerned with definitions. You can form your own definitions, but just so you know,
166
00:15:20,640 --> 00:15:26,960
from my perspective, ML and deep learning are quite similar. When errors are unacceptable.
167
00:15:26,960 --> 00:15:31,200
So since the outputs of a deep learning model aren't always predictable, we'll see that deep
168
00:15:31,200 --> 00:15:35,760
learning models are probabilistic. That means, when they predict something, they're making a
169
00:15:35,760 --> 00:15:42,240
probabilistic bet on it. Whereas in a rule based system, you kind of know what the outputs are
170
00:15:42,240 --> 00:15:48,080
going to be every single time. So if you can't have errors based on probabilistic errors,
171
00:15:48,080 --> 00:15:53,200
well, then you probably shouldn't use deep learning and you'd like to go back to a simple rule based
172
00:15:53,200 --> 00:15:59,680
system. And then finally, when you don't have much data, so deep learning models usually require a
173
00:15:59,680 --> 00:16:04,560
fairly large amount of data to produce great results. However, there's a caveat here, you know,
174
00:16:04,560 --> 00:16:09,680
at the start, I said typically, we're going to see some techniques of how to get great results
175
00:16:09,680 --> 00:16:14,320
without huge amounts of data. And again, I wrote typically here because there are techniques,
176
00:16:14,320 --> 00:16:18,720
you can just research deep learning explainability. You're going to find a whole bunch of stuff.
177
00:16:18,720 --> 00:16:24,000
You can look up examples of when to use machine learning versus deep learning. And then, when errors are
178
00:16:24,000 --> 00:16:30,320
unacceptable, again, there are ways to make your model reproducible. So it predicts you know what's
179
00:16:30,320 --> 00:16:36,800
going to come out. So we do a lot of testing to verify this as well. And so what's next? Ah,
180
00:16:37,360 --> 00:16:40,560
we've got machine learning versus deep learning, and we're going to have a look at some different
181
00:16:40,560 --> 00:16:46,560
problem spaces in a second, and mainly breaking down in terms of what kind of data you have.
182
00:16:46,560 --> 00:16:50,560
Not going to do this now to prevent this video from getting too long. We'll cover all these
183
00:16:50,560 --> 00:16:57,760
colorful beautiful pictures in the next video. Welcome back. So in the last video, we covered a
184
00:16:57,760 --> 00:17:02,880
few things of what deep learning is good for and what deep learning is typically not good for.
185
00:17:02,880 --> 00:17:08,960
So let's dive in to a little more of a comparison of machine learning versus deep learning. Again,
186
00:17:08,960 --> 00:17:15,840
I'm going to be using these terms quite interchangeably. But there are some specific things that
187
00:17:15,840 --> 00:17:21,840
typically you want traditional style of machine learning techniques versus deep learning. However,
188
00:17:21,840 --> 00:17:28,400
this is constantly changing. So again, I'm not talking in absolutes here. I'm more just talking
189
00:17:28,400 --> 00:17:34,720
in general. And I'll leave it to you to use your own curiosity to research the specific
190
00:17:34,720 --> 00:17:40,400
differences between these two. But typically, for machine learning, like the traditional style of
191
00:17:40,400 --> 00:17:44,080
algorithms, although they are still machine learning algorithms, which is kind of a little
192
00:17:44,080 --> 00:17:49,760
bit confusing where deep learning and machine learning differ is you want to use traditional
193
00:17:49,760 --> 00:17:54,960
machine learning algorithms on structured data. So if you have tables of numbers, this is what I
194
00:17:54,960 --> 00:18:02,720
mean by structured rows and columns, structured data. And possibly one of the best algorithms
195
00:18:02,720 --> 00:18:08,560
for this type of data is a gradient boosted machine, such as XGBoost. This is an algorithm
196
00:18:08,560 --> 00:18:12,960
that you'll see in a lot of data science competitions, and also used in production settings. When I
197
00:18:12,960 --> 00:18:17,520
say production settings, I mean, applications that you may interact with on the internet,
198
00:18:17,520 --> 00:18:23,200
or use on a day to day. So that's production. XGBoost is typically the favorite algorithm for
199
00:18:23,200 --> 00:18:29,680
these kinds of situations. So again, if you have structured data, you might look into XGBoost
200
00:18:29,680 --> 00:18:35,120
rather than building a deep learning algorithm. But again, the rules aren't set in stone. That's
201
00:18:35,120 --> 00:18:40,240
where deep learning and machine learning is kind of an art kind of a science is that sometimes
202
00:18:40,240 --> 00:18:45,440
XGBoost is the best for structured data, but there might be exceptions to the rule. But for deep
203
00:18:45,440 --> 00:18:51,280
learning, it is typically better for unstructured data. And what I mean by that is data that's kind
204
00:18:51,280 --> 00:18:57,440
of all over the place. It's not in your nice, standardized rows and columns. So say you had
205
00:18:57,440 --> 00:19:02,640
natural language such as this tweet by this person, whose name is quite similar to mine,
206
00:19:02,640 --> 00:19:07,760
and has the same Twitter account as me. Oh, maybe I wrote that. How do I learn machine learning? What
207
00:19:07,760 --> 00:19:12,960
you need to hear: learn Python, learn math, stats, probability, software engineering, build.
208
00:19:12,960 --> 00:19:16,880
What you need to do: Google it, go down the rabbit hole, resurface in six to nine months,
209
00:19:16,880 --> 00:19:21,840
and reassess. I like that. Or if you had a whole bunch of text such as the definition for
210
00:19:21,840 --> 00:19:27,760
deep learning on Wikipedia, again, this is the reason why I'm not covering as many definitions
211
00:19:27,760 --> 00:19:31,440
in this course is because of how easily you can look these things up. Wikipedia is going to
212
00:19:31,440 --> 00:19:36,960
be able to define deep learning far better than what I can. I'm more focused on just getting involved
213
00:19:36,960 --> 00:19:41,520
in working hands on with this stuff than defining what it is. And then we have
214
00:19:42,720 --> 00:19:50,240
images. If we wanted to build a 'take a photo of a burger' app thing, you would work with image data,
215
00:19:50,240 --> 00:19:55,680
which doesn't really have much of a structure. Although we'll see that there are ways for deep
216
00:19:55,680 --> 00:20:01,120
learning that we can turn this kind of data to have some sort of structure through the beauty
217
00:20:01,120 --> 00:20:06,640
of a tensor. And then we might have audio files such as if you were talking to your voice assistant.
218
00:20:06,640 --> 00:20:11,680
I'm not going to say one because a whole bunch of my devices might go crazy if I say the name of
219
00:20:11,680 --> 00:20:18,240
my voice assistant, which rhymes with I'm not even going to say that out loud. And so typically,
220
00:20:18,240 --> 00:20:24,880
for unstructured data, you'll want to use a neural network of some kind. So structured data,
221
00:20:24,880 --> 00:20:30,800
gradient boosted machine, or a random forest, or a tree-based algorithm, such as XGBoost,
222
00:20:30,800 --> 00:20:38,480
and unstructured data, neural networks. So let's keep going. Let's have a look at some of the
223
00:20:38,480 --> 00:20:44,880
common algorithms that you might use for structured data, machine learning versus unstructured data,
224
00:20:44,880 --> 00:20:49,360
deep learning. So random forest is one of my favorites, gradient boosted models,
225
00:20:49,360 --> 00:20:56,080
naive Bayes, nearest neighbor, support vector machine, SVM, and then many more. But since
226
00:20:56,080 --> 00:21:01,280
the advent of deep learning, these are often referred to as shallow algorithms. So deep learning,
227
00:21:01,280 --> 00:21:06,880
why is it called deep learning? Well, as we'll see is that it can have many different layers
228
00:21:06,880 --> 00:21:11,600
of algorithm, you might have an input layer, 100 layers in the middle, and then an output layer.
229
00:21:11,600 --> 00:21:16,720
But we'll get hands on with this later on. And so common algorithms for deep learning and neural
230
00:21:16,720 --> 00:21:21,520
networks, fully connected neural network, convolutional neural network, recurrent neural network,
231
00:21:21,520 --> 00:21:27,440
transformers have taken over in the past couple of years, and of course, many more. And the beautiful
232
00:21:27,440 --> 00:21:33,680
thing about deep learning and neural networks is that there are almost as many different ways to construct them
233
00:21:33,680 --> 00:21:39,280
as there are problems that they can be applied to. So this is why I'm putting all these
234
00:21:39,280 --> 00:21:42,800
dot points on the page. And I can understand if you haven't had much experience of machine
235
00:21:42,800 --> 00:21:49,680
learning or deep learning, this can be a whole bunch of information overload. But good news is
236
00:21:49,680 --> 00:21:54,480
what we're going to be focused on building with PyTorch is neural networks, fully connected neural
237
00:21:54,480 --> 00:21:59,120
networks and convolutional neural networks, the foundation of deep learning. But the excellent
238
00:21:59,120 --> 00:22:03,280
thing is, the exciting thing is, is that if we learn these foundational building blocks,
239
00:22:03,280 --> 00:22:09,920
we can get into these other styles of things here. And again, part art, part science of machine
240
00:22:09,920 --> 00:22:14,720
learning and deep learning is depending on how you represent your problem, depending on what your
241
00:22:14,720 --> 00:22:21,920
problem is, many of the algorithms here and here can be used for both. So I know I've just kind of
242
00:22:21,920 --> 00:22:26,080
bedazzled you by saying that, oh, well, you kind of use these ones for deep learning, you kind of
243
00:22:26,080 --> 00:22:30,960
use these ones for machine learning. But depending on what your problem is, you can also use both.
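If it helps to see that rule of thumb as code, here is a rough, hypothetical sketch. The data is random, and XGBoost is a separate install (pip install xgboost); none of this is from the course materials.

```python
import numpy as np
import torch
from torch import nn
from xgboost import XGBClassifier  # gradient boosted machine, a strong default for structured data

# Structured data: rows and columns of numbers (made up here)
X_table = np.random.rand(100, 5)
y_table = np.random.randint(0, 2, size=100)
structured_model = XGBClassifier().fit(X_table, y_table)

# Unstructured data, e.g. images: a neural network (here a tiny CNN) is the typical choice
fake_images = torch.rand(8, 3, 64, 64)  # 8 made-up RGB images, 64x64 pixels
cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 64 * 64, 2),
)
predictions = cnn(fake_images)          # raw outputs for 2 classes
```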
244
00:22:30,960 --> 00:22:35,760
So that adds a little bit of confusion to machine learning. But that's a fun part about it too,
245
00:22:35,760 --> 00:22:41,440
is use your curiosity to figure out what's best for whatever you're working on. And with all this
246
00:22:41,440 --> 00:22:48,320
talk about neural networks, how about in the next video, we cover what are neural networks. Now,
247
00:22:48,320 --> 00:22:52,720
I'd like you to Google this before we watch the next video, because there are going to be hundreds of
248
00:22:52,720 --> 00:22:57,760
definitions of what they are. And I'd like you to start forming your own definition of what a
249
00:22:57,760 --> 00:23:05,120
neural network is. I'll see you in the next video. Welcome back. In the last video, I left you with
250
00:23:05,120 --> 00:23:09,920
the cliffhanger of a question. What are neural networks? And I gave you the challenge of
251
00:23:09,920 --> 00:23:13,440
Googling that, but you might have already done that by the time you've got here.
252
00:23:13,440 --> 00:23:19,360
Let's just do that together. If I type in what are neural networks, I've already done this.
253
00:23:20,240 --> 00:23:25,040
What are neural networks? Explain neural networks, neural network definition. There are hundreds
254
00:23:25,040 --> 00:23:30,240
of definitions of things like this online: neural networks in five minutes, 3Blue1Brown.
255
00:23:30,240 --> 00:23:35,280
I'd highly recommend that channel series on neural networks. That's going to be in the
256
00:23:35,280 --> 00:23:40,800
extracurricular. StatQuest is also amazing. So there's hundreds of different definitions on here,
257
00:23:40,800 --> 00:23:45,200
and you can read 10 of them, five of them, three of them, make your own definition.
258
00:23:45,200 --> 00:23:49,040
But for the sake of this course, here's how I'm going to define neural networks.
259
00:23:50,080 --> 00:23:55,440
So we have some data of whatever it is. We might have images of food. We might have
260
00:23:55,440 --> 00:24:00,240
tweets or natural language, and we might have speech. So these are some examples of inputs
261
00:24:00,240 --> 00:24:05,120
for unstructured data, because they're not rows and columns. So these are the input data that
262
00:24:05,120 --> 00:24:13,200
we have. And then how do we use them with a neural network? Well, before data can be used in a neural
263
00:24:13,200 --> 00:24:18,480
network, it needs to be turned into numbers, because humans, we like looking at images of ramen and
264
00:24:18,480 --> 00:24:23,120
spaghetti. We know that that's ramen. We know that that's spaghetti after we've seen it one or two
265
00:24:23,120 --> 00:24:30,000
times. And we like reading good tweets, and we like listening to amazing music or hearing our
266
00:24:30,000 --> 00:24:35,840
friend talk on the phone in audio file. However, before a computer understands what's going on
267
00:24:35,840 --> 00:24:41,760
in these inputs, it needs to turn them into numbers. So this is what I call a numerical
268
00:24:41,760 --> 00:24:47,600
encoding or a representation. And this numerical encoding, these square brackets indicate that
269
00:24:47,600 --> 00:24:52,560
it's part of a matrix or a tensor, which we're going to get very hands on with throughout this
270
00:24:52,560 --> 00:24:58,880
course. So we have our inputs, we've turned it into numbers, and then we pass it through a neural
271
00:24:58,880 --> 00:25:05,040
network. And now this is a graphic for a neural network. However, the graphics for neural networks,
272
00:25:05,040 --> 00:25:13,840
as we'll see, can get quite involved. But they all represent the same fundamentals. So if we go to
273
00:25:13,840 --> 00:25:18,640
this one, for example, we have an input layer, then we have multiple hidden layers. However,
274
00:25:18,640 --> 00:25:23,600
you define this, you can design these and how you want. Then we have an output layer. So our
275
00:25:23,600 --> 00:25:30,160
inputs will go in some kind of data. The hidden layers will perform mathematical operations on the
276
00:25:30,160 --> 00:25:35,840
input. So the numbers, and then we'll have an output. Oh, there's 3Blue1Brown's neural
277
00:25:35,840 --> 00:25:41,040
networks from the ground up. Great video. Highly recommend you check that out. But then if we come
278
00:25:41,040 --> 00:25:45,840
back to this, so we've got our inputs, we've turned it into numbers. And we've got our neural
279
00:25:45,840 --> 00:25:50,960
networks that we put the input in. This is typically the input layer, hidden layer. This can be as
280
00:25:50,960 --> 00:25:54,880
many different layers as you want, as many different, each of these little dots is called a node.
281
00:25:54,880 --> 00:25:58,160
There's a lot of information here, but we're going to get hands-on with seeing what this looks
282
00:25:58,160 --> 00:26:04,000
like. And then we have some kind of output. Now, which neural network should you use? Well,
283
00:26:04,000 --> 00:26:07,840
you can choose the appropriate neural network for your problem, which could involve you
284
00:26:07,840 --> 00:26:13,920
hand coding each one of these steps. Or you could find one that has worked on problems similar to
285
00:26:13,920 --> 00:26:19,760
your own, such as for images, you might use a CNN, which is a convolutional neural network.
286
00:26:19,760 --> 00:26:24,080
For natural language, you might use a transformer. For speech, you might also use a transformer.
287
00:26:24,080 --> 00:26:29,120
But fundamentally, they all follow the same principle of inputs, manipulation, outputs.
288
00:26:29,840 --> 00:26:35,440
And so the neural network will learn a representation on its own. We want to find what it learns.
289
00:26:35,440 --> 00:26:40,320
So it's going to manipulate these patterns in some way, shape, or form. And when I say
290
00:26:40,320 --> 00:26:44,800
learns representation, I'm going to also refer to it as learns patterns in the data.
291
00:26:44,800 --> 00:26:51,840
A lot of people refer to it as features. A feature may be the fact that the word 'do' comes after the word 'how',
292
00:26:51,840 --> 00:26:57,120
usually, across a whole bunch of different languages. A feature can be almost anything you
293
00:26:57,120 --> 00:27:02,960
want. And again, we don't define this. The neural network learns these representations,
294
00:27:02,960 --> 00:27:08,880
patterns, features, also called weights on its own. And then where do we go from there? Well,
295
00:27:08,880 --> 00:27:14,720
we've got some sort of numbers, numerical encoding turned our data into numbers. Our neural network
296
00:27:14,720 --> 00:27:19,920
has learned a representation that it thinks best represents the patterns in our data.
297
00:27:20,560 --> 00:27:26,160
And then it outputs those representation outputs, which we can use. And often you'll
298
00:27:26,160 --> 00:27:30,960
hear this referred to as features or weight matrix or weight tensor.
299
00:27:31,520 --> 00:27:36,240
Learned representation is also another common one. There's a lot of different terms for these
300
00:27:36,240 --> 00:27:44,880
things. And then it will output. We can convert these outputs into human understandable outputs.
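Here is a rough sketch of that last step; the raw output numbers and the class names are hypothetical.

```python
import torch

# Hypothetical raw outputs from a model for a 2-class ramen vs spaghetti problem
class_names = ["ramen", "spaghetti"]
raw_outputs = torch.tensor([2.1, 0.3])

# Convert the raw numbers into prediction probabilities, then into a readable label
prediction_probabilities = torch.softmax(raw_outputs, dim=0)
predicted_label = class_names[int(prediction_probabilities.argmax())]
print(prediction_probabilities)  # roughly tensor([0.8581, 0.1419])
print(predicted_label)           # ramen
```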
301
00:27:44,880 --> 00:27:49,600
So if we were to look at these, this could be, again, I said representations or patterns that
302
00:27:49,600 --> 00:27:54,640
our neural network learns can be millions of numbers. This is only nine. So imagine if these
303
00:27:54,640 --> 00:28:00,400
were millions of different numbers, I can barely understand the nine numbers that is going on here.
304
00:28:00,400 --> 00:28:05,760
So we need a way to convert these into human understandable terms. So for this example,
305
00:28:05,760 --> 00:28:10,640
we might have some input data, which are images of food. And then we want our neural network to
306
00:28:10,640 --> 00:28:15,120
learn the representations between an image of ramen and an image of spaghetti.
307
00:28:15,120 --> 00:28:19,520
And then eventually we'll take those patterns that it's learned and we'll convert them into
308
00:28:19,520 --> 00:28:25,440
whether it thinks that this is an image of ramen or spaghetti. Or in the case of this tweet,
309
00:28:25,440 --> 00:28:31,280
is this a tweet for a natural disaster or not a natural disaster? So our neural network has,
310
00:28:31,280 --> 00:28:36,320
well, we've written code to turn this into numbers. Pass it through our neural network. Our neural
311
00:28:36,320 --> 00:28:42,800
network has learned some kind of patterns. And then we ideally want it to represent this tweet
312
00:28:42,800 --> 00:28:48,320
as not a disaster. And then we can write code to do each of these steps here. And the same thing
313
00:28:48,320 --> 00:28:54,800
for these inputs coming in as speech, turning into something that you might say to your smart speaker,
314
00:28:54,800 --> 00:29:00,640
which I'm not going to say because a whole bunch of my devices might go off. And so let's cover
315
00:29:00,640 --> 00:29:05,920
the anatomy of neural networks. We've hinted at this a little bit already. But this is like
316
00:29:05,920 --> 00:29:11,760
neural network anatomy 101. Again, this is highly customizable what this thing actually is. We're
317
00:29:11,760 --> 00:29:18,400
going to see it in PyTorch code later on. But the data goes into the input layer. And in this case,
318
00:29:18,400 --> 00:29:26,720
the number of units slash neurons slash nodes is two. Hidden layers: you can have, I put an 's' here
319
00:29:26,720 --> 00:29:33,680
because you can have one hidden layer or more; the deep in deep learning comes from having lots of
320
00:29:33,680 --> 00:29:39,680
layers. So this is only showing four layers. You might have, well, this is three layers as well.
321
00:29:40,240 --> 00:29:50,160
It might be very deep neural networks such as ResNet 152. This is 152 different layers.
322
00:29:50,160 --> 00:29:59,920
So again, you can, or this is 34, because this is only ResNet 34. But ResNet 152 has 152 different
323
00:29:59,920 --> 00:30:04,800
layers. So that's a common computer vision or a popular computer vision algorithm, by the way.
324
00:30:05,520 --> 00:30:10,720
Lots of terms we're throwing out here. But with time, you'll start to become familiar with them.
325
00:30:10,720 --> 00:30:15,920
So hidden layers can be almost as many as you want. We've only got pictured one here. And in this
326
00:30:15,920 --> 00:30:21,280
case, there's three hidden units slash neurons. And then we have an output layer. So the outputs
327
00:30:21,280 --> 00:30:27,600
learned representation or prediction probabilities from here, depending on how we set it up, which
328
00:30:27,600 --> 00:30:35,680
again, we will see what these are later on. And in this case, it has one output unit. So two input,
329
00:30:35,680 --> 00:30:40,560
three, one output, you can customize the number of these, you can customize how many layers there
330
00:30:40,560 --> 00:30:47,120
are, you can customize what goes into here, you can customize what goes out of there. So now,
331
00:30:47,840 --> 00:30:56,080
if we talk about the overall architecture, which is describing all of the layers combined. So that's,
332
00:30:56,080 --> 00:31:01,200
when you hear neural network architecture, it talks about the input, the hidden layers,
333
00:31:01,200 --> 00:31:05,920
which may be more than one, and the output layer. So that's a terminology for overall architecture.
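Here is roughly what that anatomy can look like in PyTorch code: a minimal sketch assuming the two-input, three-hidden-unit, one-output picture described above, with a ReLU as one common choice of nonlinearity (nonlinear functions are touched on just below).

```python
import torch
from torch import nn

# input layer -> hidden layer -> output layer, matching the 2 -> 3 -> 1 picture
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=3),  # input (2 features) to hidden layer (3 units/neurons)
    nn.ReLU(),                                 # a nonlinear (non-straight) function
    nn.Linear(in_features=3, out_features=1),  # hidden layer to output layer (1 unit)
)

some_data = torch.rand(5, 2)   # 5 made-up samples with 2 features each
output = model(some_data)
print(output.shape)            # torch.Size([5, 1])
```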
334
00:31:05,920 --> 00:31:13,280
Now, I say patterns, but it's an arbitrary term. You might hear embedding (an embedding might come from hidden
335
00:31:13,280 --> 00:31:18,480
layers), weights, feature representation, feature vectors, all referring to similar things. So,
336
00:31:18,480 --> 00:31:25,200
again, how do we turn our data into some numerical form, build a neural network to figure out patterns
337
00:31:25,200 --> 00:31:31,680
to output some desired output that we want. And now to get more technical, each layer is usually a
338
00:31:31,680 --> 00:31:38,560
combination of linear, so straight lines, and nonlinear, non-straight functions. So what I mean by that
339
00:31:38,560 --> 00:31:42,480
is a linear function is a straight line, a nonlinear function is a non-straight line.
340
00:31:43,120 --> 00:31:49,520
If I asked you to draw whatever you want with unlimited straight lines and not straight lines,
341
00:31:49,520 --> 00:31:54,640
so you can use straight lines or curved lines, what kind of patterns could you draw?
342
00:31:55,920 --> 00:32:00,080
At a fundamental level, that is basically what a neural network is doing. It's using a combination
343
00:32:00,080 --> 00:32:05,440
of linear, straight lines, and not straight lines to draw patterns in our data. We'll see what
344
00:32:05,440 --> 00:32:12,160
this looks like later on. Now, from the next video, let's dive in briefly to different kinds of
345
00:32:12,160 --> 00:32:17,840
learning. So we've looked at what a neural network is, the overall algorithm, but there are also
346
00:32:17,840 --> 00:32:22,480
different paradigms of how a neural network learns. I'll see you in the next video.
347
00:32:22,480 --> 00:32:29,920
Welcome back. We've discussed a brief overview of an anatomy of what a neural network is,
348
00:32:30,640 --> 00:32:36,560
but let's now discuss some learning paradigms. So the first one is supervised learning,
349
00:32:37,120 --> 00:32:43,120
and then we have unsupervised and self-supervised learning, and transfer learning. Now supervised
350
00:32:43,120 --> 00:32:48,800
learning is when you have data and labels, such as in the example we gave at the start, which was
351
00:32:48,800 --> 00:32:55,200
how you would build a neural network or a machine learning algorithm to figure out the rules to
352
00:32:55,200 --> 00:33:00,160
cook your Sicilian grandmother's famous roast chicken dish. So in the case of supervised learning,
353
00:33:00,160 --> 00:33:06,400
you'd have a lot of data, so inputs, such as raw ingredients like vegetables and chicken,
354
00:33:06,400 --> 00:33:13,680
and a lot of examples of what the outputs for those inputs should ideally look like. Or in the case of discerning
355
00:33:13,680 --> 00:33:19,520
photos between a cat and a dog, you might have a thousand photos of a cat and a thousand photos
356
00:33:19,520 --> 00:33:25,440
of a dog that you know which photos are cat and which photos are dog, and you pass those photos
357
00:33:25,440 --> 00:33:31,600
to a machine learning algorithm to discern. So in that case, you have data, the photos, and the
358
00:33:31,600 --> 00:33:38,480
labels, aka cat and dog, for each of those photos. So that's supervised learning, data and labels.
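In code, that data-plus-labels setup might look something like this made-up sketch (real photos would be loaded from image files; these tensors are random).

```python
import torch

# 4 fake "photos": 3 colour channels, 64x64 pixels each
photos = torch.rand(4, 3, 64, 64)
# A label we already know for every photo: 0 = cat, 1 = dog
labels = torch.tensor([0, 1, 1, 0])
print(photos.shape, labels.shape)  # torch.Size([4, 3, 64, 64]) torch.Size([4])
```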
359
00:33:39,040 --> 00:33:43,360
Unsupervised and self-supervised learning is you just have the data itself.
360
00:33:43,360 --> 00:33:48,400
You don't have any labels. So in the case of cat and dog photos, you only have the photos.
361
00:33:48,400 --> 00:33:54,000
You don't have the labels of cat and dog. So in the case of self-supervised learning,
362
00:33:54,720 --> 00:33:59,920
you could get a machine learning algorithm to learn an inherent representation of what,
363
00:33:59,920 --> 00:34:04,640
and when I say representation, I mean patterns in numbers, I mean weights, I mean features,
364
00:34:04,640 --> 00:34:08,960
a whole bunch of different names describing the same thing. You could get a self-supervised
365
00:34:08,960 --> 00:34:15,280
learning algorithm to figure out the fundamental patterns between a dog and a cat image, but
366
00:34:16,240 --> 00:34:19,440
it wouldn't necessarily know the difference between the two.
367
00:34:19,440 --> 00:34:22,640
That's where you could come in later and go show me the patterns you've learned,
368
00:34:22,640 --> 00:34:26,080
and it might show you the patterns and you could go, okay, the patterns that look like this,
369
00:34:26,080 --> 00:34:31,120
a dog and the patterns that look like that, a cat. So self-supervised and unsupervised learning
370
00:34:31,120 --> 00:34:36,800
learn solely on the data itself. And then finally, transfer learning is a very, very
371
00:34:36,800 --> 00:34:41,920
important paradigm in deep learning. It's taking the patterns that one model has learned
372
00:34:42,800 --> 00:34:48,720
of a data set and transferring them to another model, such as in the case where we were trying to
373
00:34:48,720 --> 00:34:53,520
build a supervised learning algorithm for discerning between cat and dog photos.
374
00:34:53,520 --> 00:34:57,840
We might start with a model that has already learned patterns in images
375
00:34:57,840 --> 00:35:03,120
and transfer those foundational patterns to our own model so that our model gets a head start.
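A rough sketch of that idea using torchvision's pretrained models: torchvision is a separate package, the weights argument is the newer torchvision API (older versions use pretrained=True), and the two-class cat/dog head is hypothetical.

```python
import torch
from torch import nn
import torchvision

# Start from a model that has already learned patterns from a large image dataset
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)

# Freeze the existing patterns so the "head start" is kept as-is
for parameter in model.parameters():
    parameter.requires_grad = False

# Replace the final layer so the model outputs 2 classes: cat and dog
model.fc = nn.Linear(in_features=model.fc.in_features, out_features=2)

fake_photos = torch.rand(1, 3, 224, 224)  # a made-up batch of one image
print(model(fake_photos).shape)           # torch.Size([1, 2])
```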
376
00:35:03,120 --> 00:35:08,400
So transfer learning is a very, very powerful technique, but as for this course,
377
00:35:08,400 --> 00:35:12,400
we're going to be writing code to focus on these two supervised learning and transfer learning,
378
00:35:12,400 --> 00:35:16,560
which are two of the most common paradigms or common types of learning in machine learning
379
00:35:16,560 --> 00:35:21,840
and deep learning. However, this style of code can be adapted across different learning
380
00:35:21,840 --> 00:35:25,840
paradigms. Now, I just want to let you know there is one that I haven't mentioned here,
381
00:35:25,840 --> 00:35:33,040
which is kind of in its own bucket, and that is reinforcement learning. So I'll leave this
382
00:35:33,040 --> 00:35:36,720
as an extension if you wanted to look it up. But essentially, this is a good one.
383
00:35:37,440 --> 00:35:43,120
That's a good photo, actually. So shout out to Katie Nuggets. The whole idea of reinforcement
384
00:35:43,120 --> 00:35:48,560
learning is that you have some kind of environment and an agent that does actions in that environment,
385
00:35:48,560 --> 00:35:54,000
and you give rewards and observations back to that agent. So say, for example,
386
00:35:54,000 --> 00:36:01,440
you wanted to teach your dog to urinate outside. Well, you would reward its actions of urinating
387
00:36:01,440 --> 00:36:07,760
outside and possibly not reward its actions of urinating all over your couch. So reinforcement
388
00:36:07,760 --> 00:36:13,360
learning is again, it's kind of in its own paradigm. This picture has a good explanation
389
00:36:13,360 --> 00:36:17,440
of the difference between unsupervised learning and supervised learning as two separate things,
390
00:36:17,440 --> 00:36:22,800
and then reinforcement learning is kind of like that. But again, I will let you research the
391
00:36:22,800 --> 00:36:27,360
different learning paradigms a little bit more in your own time. As I said, we're going to be
392
00:36:27,360 --> 00:36:33,280
focused on writing code to do supervised learning and transfer learning, specifically pytorch code.
393
00:36:34,160 --> 00:36:41,680
Now, with that covered, let's get into a few examples of what deep learning is actually used for. And
394
00:36:41,680 --> 00:36:45,920
before we get into the next video, I'm going to issue you a challenge to search this question
395
00:36:45,920 --> 00:36:51,440
yourself and come up with some of your own ideas for what deep learning is currently used for.
396
00:36:51,440 --> 00:36:58,480
So give that a shot and I'll see you in the next video. How'd you go? Did you do some research?
397
00:36:58,480 --> 00:37:02,400
Did you find out what deep learning is actually used for? I bet you found a treasure trail of
398
00:37:02,400 --> 00:37:06,880
things. And hey, I mean, if you're reading this course, chances are that you probably already
399
00:37:06,880 --> 00:37:11,120
know some use cases for deep learning. You're like, Daniel, hurry up and get to the code. Well,
400
00:37:11,120 --> 00:37:16,080
we're going to get there, don't you worry? But let's have a look at some things that deep
401
00:37:16,080 --> 00:37:20,720
learning can be used for. But before, I just want to remind you of this comment. This is from
402
00:37:20,720 --> 00:37:26,400
Yasha Sway on the 2020 machine learning roadmap video. I think you can use ML and remember,
403
00:37:26,400 --> 00:37:31,760
ML is machine learning. And remember, deep learning is a part of ML for literally anything as long
404
00:37:31,760 --> 00:37:36,400
as you can convert it into numbers and program it to find patterns. Literally, it could be anything,
405
00:37:36,400 --> 00:37:42,960
any input or output from the universe. So that's a beautiful thing about machine learning is that
406
00:37:42,960 --> 00:37:48,320
if you can encode something into numbers, chances are you can build a machine learning
407
00:37:48,320 --> 00:37:53,680
algorithm to find patterns in those numbers. Will it work? Well, again, that's the reason machine
408
00:37:53,680 --> 00:37:57,920
learning and deep learning is part art, part science. A scientist would love to know that their
409
00:37:57,920 --> 00:38:02,240
experiments would work. But an artist is kind of excited about the fact that, I don't know,
410
00:38:02,240 --> 00:38:07,440
this might work, it might not. And so that's something to keep in mind. Along with the rule
411
00:38:07,440 --> 00:38:12,320
number one of machine learning is if you don't need it, you don't use it. But if you do use it,
412
00:38:12,320 --> 00:38:17,760
it can be used for almost anything. And let's get a little bit specific and find out some deep
413
00:38:17,760 --> 00:38:22,560
learning use cases. And I've put some up there for a reason because there are lots. These are just
414
00:38:22,560 --> 00:38:27,520
some that I interact with in my day to day life, such as recommendation, we've got a programming
415
00:38:27,520 --> 00:38:33,120
video, we've got a programming podcast, we got some jujitsu videos, we've got some RuneScape
416
00:38:33,120 --> 00:38:38,720
videos, a soundtrack from my favorite movie. Have you noticed, whenever you go to YouTube,
417
00:38:38,720 --> 00:38:44,320
you don't really search for things anymore. Well, sometimes you might, but the recommendation
418
00:38:44,320 --> 00:38:48,720
page is pretty darn good. That's all powered by deep learning. And in the last 10 years,
419
00:38:48,720 --> 00:38:54,000
have you noticed that translation has got pretty good too? Well, that's powered by deep learning
420
00:38:54,000 --> 00:39:00,080
as well. Now, I don't have much hands on experience with this. I did use it when I was in Japan.
421
00:39:00,080 --> 00:39:05,920
I speak a very little amount of Japanese and even smaller amount of Mandarin. But if I wanted to
422
00:39:05,920 --> 00:39:16,800
translate deep learning is epic to Spanish, it might come out as el aprendizaje profundo es épico.
423
00:39:16,800 --> 00:39:21,040
Now, all of the native Spanish speakers watching this video can laugh at me because that was a very
424
00:39:21,040 --> 00:39:27,120
Australian version of saying deep learning is epic in Spanish. But that's so cool. All of Google
425
00:39:27,120 --> 00:39:32,080
Translate is now powered by deep learning. And the beautiful thing, if I couldn't say it myself,
426
00:39:32,080 --> 00:39:37,200
I could click this speaker and it would say it for me. So that speech recognition that's powered
427
00:39:37,200 --> 00:39:41,520
by deep learning. So if you were to ask your voice assistant who's the biggest big dog of them all,
428
00:39:41,520 --> 00:39:45,920
of course, they're going to say you, which is what I've set up, my voice assistant to say.
429
00:39:45,920 --> 00:39:51,680
That's part of speech recognition. And in computer vision, oh, look at this. You see this? Where is
430
00:39:51,680 --> 00:39:57,440
this photo from? This photo is from this person driving this car. Did a hit and run on my car,
431
00:39:57,440 --> 00:40:02,160
at the front of my house, my apartment building, my car was parked on the street, this car, the
432
00:40:02,160 --> 00:40:07,440
trailer came off, ran into the back of my car, basically destroyed it, and then they drove off.
433
00:40:07,440 --> 00:40:14,240
However, my next door neighbors security camera picked up on this car. Now, I became a detective
434
00:40:14,240 --> 00:40:19,040
for a week, and I thought, hmm, if there was a computer vision algorithm built into that camera,
435
00:40:19,040 --> 00:40:24,080
it could have detected when the car hit. I mean, it took a lot of searching to find it,
436
00:40:24,080 --> 00:40:28,000
it turns out the car hit about 3.30am in the morning. So it's pitch black. And of course,
437
00:40:28,000 --> 00:40:32,080
we didn't get the license plate. So this person is out there somewhere in the world after doing
438
00:40:32,080 --> 00:40:37,760
a hit and run. So if you're watching this video, just remember computer vision might catch you one
439
00:40:37,760 --> 00:40:42,560
day. So this is called object detection, where you would place a box around the area where the
440
00:40:42,560 --> 00:40:46,960
pixels most represent the object that you're looking for. So for computer vision, we could
441
00:40:46,960 --> 00:40:52,400
train an object detector to capture cars that drive past a certain camera. And then if someone
442
00:40:52,400 --> 00:40:55,680
does a hit and run on you, you could capture it. And then fingers crossed, it's not too dark
443
00:40:55,680 --> 00:40:59,680
that you can read the license plate and go, hey, excuse me, please, this person has hit my car
444
00:40:59,680 --> 00:41:03,840
and wrecked it. So that's a very close to home story of where computer vision could be used.
445
00:41:03,840 --> 00:41:09,520
And then finally, natural language processing. Have you noticed as well, your spam detector on
446
00:41:09,520 --> 00:41:14,640
your email inbox is pretty darn good? Well, some are powered by deep learning, some not,
447
00:41:14,640 --> 00:41:19,360
it's hard to tell these days what is powered by deep learning, what isn't. But natural language
448
00:41:19,360 --> 00:41:25,520
processing is the process of looking at natural language text. So unstructured text. So whatever
449
00:41:25,520 --> 00:41:31,360
you'd write in an email, a story, or a Wikipedia document, and getting your algorithm
450
00:41:31,360 --> 00:41:36,560
to find patterns in that. So for this example, you would find that this email is not spam.
451
00:41:36,560 --> 00:41:40,400
This deep learning course is incredible. I can't wait to use what I've learned. Thank you so much.
452
00:41:40,400 --> 00:41:45,200
And by the way, that is my real email. So if you want to email me, you can. And then this is spam.
453
00:41:45,200 --> 00:41:52,080
Hey, Daniel, congratulations, you've won a lot of money. Wow, I'd really like that, a lot of money.
454
00:41:52,080 --> 00:41:56,400
But sadly, I don't think that this is real. So that would probably go to my spam inbox.
455
00:41:57,120 --> 00:42:05,040
Now, with that being said, if we wanted to put these problems in a little bit more of a
456
00:42:05,040 --> 00:42:09,520
classification, this is known as sequence to sequence because you put one sequence in
457
00:42:09,520 --> 00:42:15,280
and get one sequence out. Same as this, you have a sequence of audio waves and you get some
458
00:42:15,280 --> 00:42:22,880
text out. So sequence to sequence, seq to seq. This is classification slash regression. In this
459
00:42:22,880 --> 00:42:28,160
case, the regression is predicting a number. That's what a regression problem is. You would predict
460
00:42:28,160 --> 00:42:33,840
the coordinates of where these box corners should be. So say this should be at however many pixels
461
00:42:33,840 --> 00:42:38,560
in from the X axis and however many pixels down from the Y axis, that's that corner.
462
00:42:38,560 --> 00:42:43,280
And then you would draw in between the corners. And then the classification part would go,
463
00:42:43,280 --> 00:42:48,320
Hey, this is that car that did a hit and run on us. And in this case, this is classification.
464
00:42:48,320 --> 00:42:52,960
Classification is predicting whether something is one thing or another, or perhaps more than one
465
00:42:52,960 --> 00:42:58,160
thing or another in the case of multi-class classification. So this email is not spam. That's
466
00:42:58,160 --> 00:43:06,560
a class and this email is spam. So that's also a class. So I think we've only got one direction
467
00:43:06,560 --> 00:43:09,840
to go now that we've sort of laid the foundation for the course. And that is
468
00:43:12,720 --> 00:43:16,560
Well, let's start talking about PyTorch. I'll see you in the next video.
469
00:43:18,320 --> 00:43:21,120
Well, let's now cover some of the foundations of
470
00:43:24,080 --> 00:43:31,840
PyTorch. But first, you might be asking, what is PyTorch? Well, of course, we could just go to
471
00:43:31,840 --> 00:43:38,880
our friend, the internet, and look up PyTorch.org. This is the homepage for PyTorch.
472
00:43:38,880 --> 00:43:43,680
This course is not a replacement for everything on this homepage. This should be your ground truth
473
00:43:43,680 --> 00:43:49,840
for everything PyTorch. So you can get started. You've got a big ecosystem. You've got a way to
474
00:43:49,840 --> 00:43:55,360
set up on your local computer. You've got resources. You've got the PyTorch docs. You've got the GitHub.
475
00:43:55,360 --> 00:44:00,800
You've got search. You've got blog, everything here. This website should be the place you're
476
00:44:00,800 --> 00:44:06,160
visiting most throughout this course as we're writing PyTorch code. You're coming here.
477
00:44:06,160 --> 00:44:09,840
You're reading about it. You're checking things out. You're looking at examples.
478
00:44:10,800 --> 00:44:18,080
But for the sake of this course, let's break PyTorch down. Oh, there's a little flame animation
479
00:44:18,080 --> 00:44:28,000
I just forgot about. What is PyTorch? I didn't sync up the animations. That's all right. So
480
00:44:28,000 --> 00:44:36,320
PyTorch is the most popular research deep learning framework. I'll get to that in a second.
481
00:44:36,320 --> 00:44:42,400
It allows you to write fast deep learning code in Python. If you know Python, it's a very user-friendly
482
00:44:42,400 --> 00:44:47,920
programming language. PyTorch allows us to write state-of-the-art deep learning code
483
00:44:47,920 --> 00:44:55,200
accelerated by GPUs with Python. It gives you access to many pre-built deep learning models
484
00:44:55,200 --> 00:45:01,280
from Torch Hub, which is a website that has lots of, if you remember, I said transfer learning is
485
00:45:01,280 --> 00:45:07,280
a way that we can use other deep learning models to power our own. Torch Hub is a resource for that.
486
00:45:07,280 --> 00:45:10,800
Same as torchvision.models. We'll be looking at this throughout the course.
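For reference, here's a minimal sketch of what loading one of those pre-built models can look like (this exact snippet isn't shown in the video, and assumes an internet connection):

    import torch

    # Download a pre-trained ResNet-18 from Torch Hub (weights learned on ImageNet).
    model = torch.hub.load("pytorch/vision", "resnet18", pretrained=True)
    model.eval()  # put the model in inference mode before making predictions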
487
00:45:10,800 --> 00:45:16,000
It provides an ecosystem for the whole stack of machine learning. From pre-processing data,
488
00:45:16,000 --> 00:45:20,720
getting your data into tensors, what if you started with some images? How do you represent them as
489
00:45:20,720 --> 00:45:25,520
numbers? Then you can build models such as neural networks to model that data. Then you can even
490
00:45:25,520 --> 00:45:31,600
deploy your model in your application slash cloud, well, deploy your PyTorch model. Application slash
491
00:45:31,600 --> 00:45:37,360
cloud will be depending on what sort of application slash cloud that you're using, but generally it
492
00:45:37,360 --> 00:45:43,520
will run some kind of PyTorch model. And it was originally designed and used in-house by Facebook
493
00:45:43,520 --> 00:45:48,400
slash Meta. I'm pretty sure Facebook have renamed themselves Meta now, but it is now open source
494
00:45:48,400 --> 00:45:53,600
and used by companies such as Tesla, Microsoft and OpenAI. And when I say it is the most popular
495
00:45:53,600 --> 00:45:58,560
deep learning research framework, don't take my word for it. Let's have a look at papers with code
496
00:45:58,560 --> 00:46:03,840
dot com slash trends. If you're not sure what Papers With Code is, it is a website that tracks
497
00:46:03,840 --> 00:46:08,320
the latest and greatest machine learning papers and whether or not they have code. So we have some
498
00:46:08,320 --> 00:46:13,840
other languages here, other deep learning frameworks: PyTorch, TensorFlow, JAX is another one, MXNet,
499
00:46:13,840 --> 00:46:18,960
PaddlePaddle, the original Torch. So PyTorch is an evolution of Torch written in Python,
500
00:46:18,960 --> 00:46:25,760
Caffe2, MindSpore. But if we look at this, when is this? The last date is December 2021. We have,
501
00:46:26,960 --> 00:46:33,520
oh, this is going to move every time I move it. No. So I'll highlight PyTorch at 58% there.
502
00:46:33,520 --> 00:46:40,560
So by far and large, the most popular research machine learning framework used to write the code
503
00:46:40,560 --> 00:46:45,680
for state of the art machine learning algorithms. So this is browse state of the art papers with
504
00:46:45,680 --> 00:46:51,120
code.com, an amazing website. We have semantic segmentation, image classification, object detection, image
505
00:46:51,120 --> 00:46:56,000
generation, computer vision, natural language processing, medical, I'll let you explore this.
506
00:46:56,000 --> 00:47:00,960
It's one of my favorite resources for staying up to date on the field. But as you see, out of the
507
00:47:00,960 --> 00:47:08,560
65,000 papers with code that this website has tracked, 58% of them are implemented with PyTorch.
508
00:47:08,560 --> 00:47:14,000
How cool is that? And this is what we're learning. So let's jump into there. Why PyTorch? Well,
509
00:47:14,000 --> 00:47:18,800
other than the reasons that we just spoke about, it's a research favorite. This is highlighting.
510
00:47:20,080 --> 00:47:27,280
There we go. So there we go. I've highlighted it here. PyTorch, 58%, nearly 2,500 repos. If
511
00:47:27,280 --> 00:47:31,920
you're not sure what a repo is, a repo is a place where you store all of your code online.
512
00:47:31,920 --> 00:47:37,680
And generally, if a paper gets published in machine learning, if it's fantastic research,
513
00:47:37,680 --> 00:47:44,080
it will come with code, code that you can access and use for your own applications or your own
514
00:47:44,080 --> 00:47:51,840
research. Again, why PyTorch? Well, this is a tweet from François Chollet, who's the author of
515
00:47:51,840 --> 00:47:57,440
Keras, which is another popular deep learning framework. But with tools like Colab, we're going
516
00:47:57,440 --> 00:48:02,480
to see what Colab is in a second, Keras and TensorFlow. I've added in here and PyTorch.
517
00:48:02,480 --> 00:48:06,480
Virtually anyone can solve in a day with no initial investment problems that would have
518
00:48:06,480 --> 00:48:12,880
required an engineering team working for a quarter and $20,000 in hardware in 2014. So this is just
519
00:48:12,880 --> 00:48:18,800
to highlight how good the space of deep learning and machine learning tooling has become. Colab,
520
00:48:18,800 --> 00:48:24,880
Keras and TensorFlow are all fantastic. And now PyTorch is added to this list. If you want to
521
00:48:24,880 --> 00:48:31,600
check that out, there's François Chollet on Twitter. A very, very prominent voice in the machine learning
522
00:48:31,600 --> 00:48:37,440
field. Why PyTorch? If you want some more reasons, well, have a look at this. Look at all the
523
00:48:37,440 --> 00:48:42,560
places that are using PyTorch. It's just coming up everywhere. We've got Andrej Karpathy here,
524
00:48:42,560 --> 00:48:49,360
who's the director of AI at Tesla. So if we go, we could search this, PyTorch
525
00:48:51,200 --> 00:48:59,760
at Tesla. We've got a YouTube talk there, Andrej Karpathy, director of AI at Tesla.
526
00:48:59,760 --> 00:49:08,640
And so Tesla are using PyTorch for the computer vision models of autopilot. So if we go to videos
527
00:49:08,640 --> 00:49:16,160
or maybe images, does it come up there? Things like this, a car detecting what's going on in the scene.
528
00:49:16,960 --> 00:49:21,360
Of course, there'll be some other code for planning, but I'll let you research that.
529
00:49:22,080 --> 00:49:28,640
When we come back here, OpenAI, which is one of the biggest open artificial intelligence
530
00:49:28,640 --> 00:49:34,480
research firms, open in the sense that they publish a lot of their research methodologies,
531
00:49:34,480 --> 00:49:40,960
however, recently there's been some debate about that. But if you go to openai.com,
532
00:49:40,960 --> 00:49:45,280
let's just say that they're one of the biggest AI research entities in the world,
533
00:49:45,280 --> 00:49:50,800
and they've standardized on PyTorch. So they've got a great blog, they've got great research,
534
00:49:50,800 --> 00:49:56,320
and now they've got OpenAI API, which is, you can use their API to access some of the models
535
00:49:56,320 --> 00:50:02,080
that they've trained. Presumably with PyTorch, because this blog post from January 2020 says
536
00:50:02,080 --> 00:50:07,280
that OpenAI is now standardized across PyTorch. There's a repo called the incredible PyTorch,
537
00:50:07,280 --> 00:50:11,040
which collects a whole bunch of different projects that are built on top of PyTorch.
538
00:50:11,040 --> 00:50:15,040
That's the beauty of PyTorch is that you can build on top of it, you can build with it
539
00:50:15,040 --> 00:50:22,960
AI for AG, for agriculture. PyTorch has been used. Let's have a look. PyTorch in agriculture.
540
00:50:22,960 --> 00:50:29,520
There we go. Agricultural robots use PyTorch. This is a medium article.
541
00:50:31,920 --> 00:50:37,120
It's everywhere. So if we go down here, this is using object detection. Beautiful.
542
00:50:38,560 --> 00:50:43,280
Object detection to detect what kind of weeds should be sprayed with fertilizer. This is just
543
00:50:43,280 --> 00:50:49,200
one of many different things, so PyTorch on a big tractor like this. It can be used almost
544
00:50:49,200 --> 00:50:53,920
anywhere. If we come back, PyTorch builds the future of AI and machine learning at Facebook,
545
00:50:53,920 --> 00:50:58,480
so Facebook, which is also MetaAI, a little bit confusing, even though it says MetaAI,
546
00:50:58,480 --> 00:51:03,520
it's on AI.facebook.com. That may change by the time you watch this. They use PyTorch in-house
547
00:51:03,520 --> 00:51:09,200
for all of their machine learning applications. Microsoft is huge in the PyTorch game.
548
00:51:09,200 --> 00:51:14,560
It's absolutely everywhere. So if that's not enough reason to use PyTorch,
549
00:51:14,560 --> 00:51:19,920
well, then maybe you're in the wrong course. So you've seen enough reasons of why to use PyTorch.
550
00:51:19,920 --> 00:51:25,280
I'm going to give you one more. That is that it helps you run your code, your machine learning code
551
00:51:25,280 --> 00:51:30,720
accelerated on a GPU. We've covered this briefly, but what is a GPU slash a TPU,
552
00:51:30,720 --> 00:51:36,320
because this is more of a newer chip these days. A GPU is a graphics processing unit,
553
00:51:36,320 --> 00:51:41,040
which is essentially very fast at crunching numbers. Originally designed for video games,
554
00:51:41,040 --> 00:51:45,440
if you've ever designed or played a video game, you know that the graphics are quite intense,
555
00:51:45,440 --> 00:51:50,800
especially these days. And so to render those graphics, you need to do a lot of numerical calculations.
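To give a rough idea of the kind of number crunching involved (a hedged, illustrative snippet, not from the video), a large matrix multiplication is exactly the sort of operation a GPU accelerates:

    import torch

    # Two large random matrices; multiplying them is highly parallel numerical work,
    # which is what GPUs were designed to do quickly.
    a = torch.rand(1000, 1000)
    b = torch.rand(1000, 1000)
    c = a @ b  # matrix multiplication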
556
00:51:50,800 --> 00:51:56,720
And so the beautiful thing about PyTorch is that it enables you to leverage a GPU through an
557
00:51:56,720 --> 00:52:01,840
interface called CUDA, which is a lot of words I'm going to throw at you here, a lot of acronyms
558
00:52:01,840 --> 00:52:08,080
in the deep learning space, CUDA. Let's just search CUDA. CUDA toolkit. So CUDA is a parallel
559
00:52:08,080 --> 00:52:12,880
computing platform and application programming interface, which is an API that allows software
560
00:52:12,880 --> 00:52:18,400
to use certain types of graphics processing units for general purpose computing. That's what
561
00:52:18,400 --> 00:52:25,840
we want. So PyTorch leverages CUDA to enable you to run your machine learning code on NVIDIA
562
00:52:25,840 --> 00:52:32,960
GPUs. Now, there is also an ability to run your PyTorch code on TPUs, which is a tensor processing
563
00:52:32,960 --> 00:52:39,440
unit. However, GPUs are far more popular when running various types of PyTorch code. So we're
564
00:52:39,440 --> 00:52:45,440
going to focus on running our PyTorch code on the GPU. And to just give you a quick example,
565
00:52:45,440 --> 00:52:51,520
PyTorch on TPU, let's see that. Getting started with PyTorch on cloud TPUs, there's plenty of
566
00:52:51,520 --> 00:52:57,120
guides for that. But as I said, GPUs are going to be far more common in practice. So that's what
567
00:52:57,120 --> 00:53:04,320
we're going to focus on. And with that said, we've said tensor processing unit. Now, the reason
568
00:53:04,320 --> 00:53:08,640
why these are called tensor processing units is because machine learning and deep learning
569
00:53:08,640 --> 00:53:16,160
deals a lot with tensors. And so in the next video, let's answer the question, what is a tensor?
570
00:53:16,160 --> 00:53:21,440
But before I go through and answer that from my perspective, I'd like you to research this
571
00:53:21,440 --> 00:53:27,200
question. So open up Google or your favorite search engine and type in what is a tensor and
572
00:53:27,200 --> 00:53:34,400
see what you find. I'll see you in the next video. Welcome back. In the last video, I left you on
573
00:53:34,400 --> 00:53:40,880
the cliffhanger question of what is a tensor? And I also issued you the challenge to research
574
00:53:40,880 --> 00:53:45,760
what is a tensor. Because as I said, this course isn't all about telling you exactly what things
575
00:53:45,760 --> 00:53:50,960
are. It's more so sparking a curiosity in you so that you can stumble upon the answers to these
576
00:53:50,960 --> 00:53:56,160
things yourself. But let's have a look. What is a tensor? Now, if you remember this graphic,
577
00:53:56,160 --> 00:54:00,320
there's a lot going on here. But this is our neural network. We have some kind of input,
578
00:54:00,320 --> 00:54:05,360
some kind of numerical encoding. Now, we start with this data. In our case, it's unstructured data
579
00:54:05,360 --> 00:54:11,920
because we have some images here, some text here, and an audio file here. Now, these don't
580
00:54:11,920 --> 00:54:17,600
necessarily all go in at the same time. This image could just focus on a neural network specifically
581
00:54:17,600 --> 00:54:23,280
for images. This text could focus on a neural network specifically for text. And this sound bite
582
00:54:23,280 --> 00:54:29,680
or speech could focus on a neural network specifically for speech. However, the field is sort of also
583
00:54:29,680 --> 00:54:34,240
moving towards building neural networks that are capable of handling all three types of inputs.
584
00:54:34,960 --> 00:54:39,520
For now, we're going to start small and then build up the algorithms that we're going to focus on
585
00:54:39,520 --> 00:54:45,440
are neural networks that focus on one type of data. But the premise is still the same. You have
586
00:54:45,440 --> 00:54:50,480
some kind of input. You have to numerically encode it in some form, pass it to a neural network
587
00:54:50,480 --> 00:54:55,520
to learn representations or patterns within that numerical encoding, output some form of
588
00:54:55,520 --> 00:55:00,800
representation. And then we can convert that representation into things that humans understand.
589
00:55:01,760 --> 00:55:06,800
And you might have already seen these, and I might have already referenced the fact that
590
00:55:07,360 --> 00:55:13,200
these are tensors. So when the question comes up, what are tensors? A tensor could be almost
591
00:55:13,200 --> 00:55:18,400
anything. It could be almost any representation of numbers. We're going to get very hands on with
592
00:55:18,400 --> 00:55:23,840
tensors. And that's actually the fundamental building block of PyTorch aside from neural network
593
00:55:23,840 --> 00:55:31,040
components is the torch dot tensor. We're going to see that very shortly. But this is a very
594
00:55:31,040 --> 00:55:36,400
important takeaway is that you have some sort of input data. You're going to numerically encode
595
00:55:36,400 --> 00:55:41,840
that data, turn it into a tensor of some kind. Whatever that kind is will depend on the problem
596
00:55:41,840 --> 00:55:47,920
you're working with. Then you're going to pass it to a neural network, which will perform mathematical
597
00:55:47,920 --> 00:55:54,000
operations on that tensor. Now, a lot of those mathematical operations are taken care of by
598
00:55:54,800 --> 00:55:59,760
PyTorch behind the scenes. So we'll be writing code to execute some kind of mathematical
599
00:55:59,760 --> 00:56:06,400
operations on these tensors. And then the neural network that we create, or the one that's already
600
00:56:06,400 --> 00:56:12,400
been created, but we just use for our problem, we'll output another tensor, similar to the input,
601
00:56:12,400 --> 00:56:17,920
but has been manipulated in a certain way that we've sort of programmed it to. And then we can take
602
00:56:17,920 --> 00:56:25,600
this output tensor and change it into something that a human can understand. So to remove a lot
603
00:56:25,600 --> 00:56:30,720
of the text around it, make it a little bit more clearer. If we were focusing on building an image
604
00:56:30,720 --> 00:56:35,200
classification model, so we want to classify whether this was a photo of ramen or spaghetti,
605
00:56:35,200 --> 00:56:40,240
we would have images as input. We would turn those images into numbers, which are represented
606
00:56:40,240 --> 00:56:45,440
by a tensor. We would pass that tensor of numbers to a neural network, or there might be lots of
607
00:56:45,440 --> 00:56:50,240
tensors here. We might have 10,000 images. We might have a million images. Or in some cases,
608
00:56:50,240 --> 00:56:55,600
if you're Google or Facebook, you might be working with 300 million or a billion images at a time.
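As a hedged sketch of what that numerical encoding can look like (the shapes below are illustrative assumptions, not the course's actual data), a batch of colour images is commonly stored as a 4-dimensional tensor:

    import torch

    # [batch_size, colour_channels, height, width] - a common layout for image tensors.
    image_batch = torch.rand(32, 3, 224, 224)  # 32 random "images", 3 channels, 224x224 pixels
    image_batch.shape  # torch.Size([32, 3, 224, 224])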
609
00:56:56,800 --> 00:57:02,880
The principle still stands that you encode your data in some form of numerical representation,
610
00:57:02,880 --> 00:57:08,720
which is a tensor, pass that tensor, or lots of tensors to a neural network. The neural network
611
00:57:08,720 --> 00:57:14,080
performs mathematical operations on those tensors, outputs a tensor, we convert that tensor into
612
00:57:14,080 --> 00:57:20,000
something that we can understand as humans. And so with that being said, we've covered a lot of
613
00:57:20,000 --> 00:57:24,480
the fundamentals. What is machine learning? What is deep learning? What is a neural network? Well,
614
00:57:24,480 --> 00:57:28,720
we've touched the surface of these things. You can get as deep as you like. We've covered
615
00:57:28,720 --> 00:57:33,760
why use PyTorch. What is PyTorch? Now, the fundamental building block of deep learning
616
00:57:33,760 --> 00:57:37,840
is tensors. We've covered that. Let's get a bit more specific in the next video
617
00:57:37,840 --> 00:57:43,840
of what we're going to cover code-wise in this first module. I'm so excited we're going to start
618
00:57:43,840 --> 00:57:50,720
coding. I'll see you in the next video. Now it's time to get specific about what we're going to
619
00:57:50,720 --> 00:57:56,560
cover code-wise in this fundamentals module. But I just want to reiterate the fact that
620
00:57:56,560 --> 00:58:01,680
going back to the last video where I challenge you to look up what is a tensor, here's exactly
621
00:58:01,680 --> 00:58:06,720
what I would do. I would come to Google. I would type in the question, what is a tensor? There we go.
622
00:58:06,720 --> 00:58:11,680
What is a tensor in PyTorch? Google knows, using all that deep learning, that we want
623
00:58:11,680 --> 00:58:15,920
to know what a tensor is in PyTorch. But a tensor is a very general thing. It's not
624
00:58:15,920 --> 00:58:21,360
associated with just PyTorch. Now we've got tensor on Wikipedia. We've got tensor. This is probably
625
00:58:21,360 --> 00:58:26,960
my favorite video on what is a tensor, by Dan Fleisch. Fleisch, I'm probably saying that wrong,
626
00:58:26,960 --> 00:58:33,440
but good first name. This is going to be your extra curriculum for this video and the previous
627
00:58:33,440 --> 00:58:38,960
video is to watch this on what is a tensor. Now you might be saying, well, what gives? I've come to
628
00:58:38,960 --> 00:58:43,360
this course to learn PyTorch and all this guy's doing, all you're doing, Daniel, is just Googling
629
00:58:43,360 --> 00:58:49,120
things when a question comes up. Why don't you just tell me what it is? Well, if I was to tell you
630
00:58:49,120 --> 00:58:53,680
everything about deep learning and machine learning and PyTorch and what it is and what it's not,
631
00:58:53,680 --> 00:58:58,800
that course would be far too long. I'm doing this on purpose. I'm searching questions like this on
632
00:58:58,800 --> 00:59:04,240
purpose because that's exactly what I do day to day as a machine learning engineer. I write code
633
00:59:04,240 --> 00:59:09,440
like we're about to do. And then if I don't know something, I literally go to whatever search engine
634
00:59:09,440 --> 00:59:14,800
I'm using, Google most of the time, and type in whatever error I'm getting or PyTorch, what is
635
00:59:14,800 --> 00:59:20,800
a tensor, something like that. So I want to not only tell you that it's okay to search questions
636
00:59:20,800 --> 00:59:25,360
like that, but it's encouraged. So just keep that in mind as we go through the whole course,
637
00:59:25,360 --> 00:59:30,160
you're going to see me do it a lot. Let's get into what we're going to cover. Here we go.
638
00:59:31,200 --> 00:59:36,320
Now, this tweet is from Elon Musk. And so I've decided, you know what, let's base the whole
639
00:59:36,320 --> 00:59:41,920
course on this tweet. We have learning ML/DL from university, where you have a little bit of a small brain.
640
00:59:41,920 --> 00:59:47,040
Online courses, well, like this one, that brain's starting to explode and you get some little fireworks
641
00:59:47,040 --> 00:59:53,520
from YouTube. Oh, you're watching this on YouTube. Look at that shiny brain from articles. My goodness.
642
00:59:54,720 --> 01:00:03,040
Lucky that this course comes in article format. If you go to learn pytorch.io, all of the course
643
01:00:03,040 --> 01:00:08,800
materials are in online book format. So we're going to get into this fundamental section very
644
01:00:08,800 --> 01:00:13,360
shortly. But if you want a reference, the course materials are built off this book. And by the
645
01:00:13,360 --> 01:00:17,680
time you watch this, there's going to be more chapters here. So we're covering all the bases
646
01:00:17,680 --> 01:00:23,280
here. And then finally, from memes, you would ascend to some godlike creature. I think that's
647
01:00:23,280 --> 01:00:27,680
hovering underwater. So that is the best way to learn machine learning. So this is what we're
648
01:00:27,680 --> 01:00:33,680
going to start with: ML/DL from university, online courses, YouTube, from articles, from memes. No,
649
01:00:33,680 --> 01:00:41,520
no, no, no. But kind of here is what we're going to cover broadly. So now in this module,
650
01:00:42,480 --> 01:00:47,520
we are going to cover the pytorch basics and fundamentals, mainly dealing with tensors and
651
01:00:47,520 --> 01:00:52,880
tensor operations. Remember, a neural network is all about input tensors, performing operations on
652
01:00:52,880 --> 01:00:59,920
those tensors, creating output tensors. Later, we're going to be focused on pre-processing data,
653
01:00:59,920 --> 01:01:06,000
getting it into tensors, so turning data from raw form, images, whatever, into a numerical
654
01:01:06,000 --> 01:01:09,840
encoding, which is a tensor. Then we're going to look at building and using pre-trained deep
655
01:01:09,840 --> 01:01:14,240
learning models, specifically neural networks. We're going to fit a model to the data. So we're
656
01:01:14,240 --> 01:01:18,960
going to show our model or write code for our model to learn patterns in the data that we've
657
01:01:18,960 --> 01:01:22,400
pre-processed. We're going to see how we can make predictions with our model, because that's
658
01:01:22,400 --> 01:01:26,160
what deep learning and machine learning is all about, right, using patterns from the past to
659
01:01:26,160 --> 01:01:30,240
predict the future. And then we're going to evaluate our model's predictions. We're going to learn
660
01:01:30,240 --> 01:01:34,720
how to save and load our models. For example, if you wanted to export your model from where we're
661
01:01:34,720 --> 01:01:39,360
working to an application or something like that. And then finally, we're going to see how we can
662
01:01:39,360 --> 01:01:46,000
use a trained model to make predictions on our own data on custom data, which is very fun. And
663
01:01:46,000 --> 01:01:51,280
how? Well, you can see that the scientist has faded out a little bit, but that's not really that true.
664
01:01:51,280 --> 01:01:56,320
We're going to do it like cooks, not chemists. So chemists are quite precise. Everything has to be
665
01:01:56,320 --> 01:02:01,520
exactly how it is. But cooks are more like, oh, you know what, a little bit of salt, a little bit of
666
01:02:01,520 --> 01:02:06,080
butter. Does it taste good? Okay, well, then we're on. But machine learning is a little bit of both.
667
01:02:06,080 --> 01:02:11,280
It's a little bit of science, a little bit of art. That's how we're going to do it. But
668
01:02:11,280 --> 01:02:15,760
I like the idea of this being a machine learning cooking show. So welcome to cooking with machine
669
01:02:15,760 --> 01:02:25,120
learning, cooking with PyTorch with Daniel. And finally, we've got a workflow here, which we have
670
01:02:25,120 --> 01:02:29,840
a PyTorch workflow, which is one of many. We're going to kind of use this throughout the entire
671
01:02:29,840 --> 01:02:34,480
course is step one, we're going to get our data ready. Step two, we're going to build or
672
01:02:34,480 --> 01:02:38,720
pick a pre-trained model to suit whatever problem we're working on. Step two point one,
673
01:02:38,720 --> 01:02:42,320
pick a loss function and optimizer. Don't worry about what they are. We're going to cover them
674
01:02:42,320 --> 01:02:47,440
soon. Step two point two, build a training loop. Now this is kind of all part of the parcel of
675
01:02:47,440 --> 01:02:51,680
step two, hence why we've got two point one and two point two. You'll see what that means later on.
676
01:02:52,240 --> 01:02:55,520
Number three, we're going to fit the model to the data and make a prediction. So say we're working
677
01:02:55,520 --> 01:03:01,120
on image classification for ramen or spaghetti. How do we build a neural network or put our
678
01:03:01,120 --> 01:03:06,000
images through that neural network to get some sort of idea of what's in an image? We'll see
679
01:03:06,000 --> 01:03:11,360
how to do that. Number four, we'll evaluate our model to see if it's predicting BS or it's actually
680
01:03:11,360 --> 01:03:16,640
going all right. Number five, we're going to improve through experimentation. That's another
681
01:03:16,640 --> 01:03:20,720
big thing that you'll notice throughout machine learning throughout this course is that it's
682
01:03:20,720 --> 01:03:27,200
very experimental part art, part science. Number six, save and reload your trained model. Again,
683
01:03:27,200 --> 01:03:33,040
I put these in numerical order, but they can kind of be mixed and matched depending on where
684
01:03:33,040 --> 01:03:40,560
you are in the journey. But numerical order is just easy to understand for now. Now we've got
685
01:03:40,560 --> 01:03:45,840
one more video, maybe another one before we get into code. But in the next video, I'm going to
686
01:03:45,840 --> 01:03:52,320
cover some very, very important points on how you should approach this course. I'll see you there.
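Before moving on, here's a minimal sketch of that workflow in code (the dataset, model and hyperparameters below are placeholder assumptions, not the course's actual code; each step gets covered properly later):

    import torch
    from torch import nn

    # 1. Get data ready (placeholder: points along a straight line).
    X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
    y = 0.7 * X + 0.3

    # 2. Build or pick a model (placeholder: a single linear layer).
    model = nn.Linear(in_features=1, out_features=1)

    # 2.1 Pick a loss function and optimizer.
    loss_fn = nn.L1Loss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # 2.2 / 3. Build a training loop and fit the model to the data.
    for epoch in range(100):
        model.train()
        y_pred = model(X)          # forward pass
        loss = loss_fn(y_pred, y)  # 4. evaluate how wrong the predictions are
        optimizer.zero_grad()
        loss.backward()            # backpropagation
        optimizer.step()           # 5. improve the model (and experiment with settings)

    # 6. Save and reload the trained model.
    torch.save(model.state_dict(), "model.pth")
    model.load_state_dict(torch.load("model.pth"))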
687
01:03:54,400 --> 01:03:58,400
Now you might be asking, how should I approach this course? You might not be asking, but we're
688
01:03:58,400 --> 01:04:03,440
going to answer it anyway. How to approach this course? This is how I would recommend approaching
689
01:04:03,440 --> 01:04:10,000
this course. So I'm a machine learning engineer day to day and learning machine learning to
690
01:04:10,000 --> 01:04:14,560
coding machine learning, are kind of two different things. I remember when I first learned, it was
691
01:04:14,560 --> 01:04:18,960
kind of, you learned a lot of theory rather than writing code. So not to take away from the theory
692
01:04:18,960 --> 01:04:24,000
being important, this course is going to be focusing on writing machine learning code, specifically
693
01:04:24,000 --> 01:04:31,200
PyTorch code. So the number one step to approaching this course is to code along. Now because this
694
01:04:31,200 --> 01:04:37,840
course is focused on purely writing code, I will be linking extracurricular resources for you to
695
01:04:37,840 --> 01:04:43,360
learn more about what's going on behind the scenes of the code. My idea of teaching is that if we
696
01:04:43,360 --> 01:04:48,080
can code together, write some code, see how it's working, that's going to spark your curiosity to
697
01:04:48,080 --> 01:04:55,040
figure out what's going on behind the scenes. So motto number one is if in doubt, run the code,
698
01:04:55,040 --> 01:05:05,120
write it, run the code, see what happens. Number two, I love that. Explore and experiment.
699
01:05:05,120 --> 01:05:13,600
Approach this with the idea, the mind of a scientist and a chef or science and art. Experiment,
700
01:05:13,600 --> 01:05:18,880
experiment, experiment. Try things with rigor like a scientist would, and then just try things
701
01:05:18,880 --> 01:05:25,200
for the fun of it like a chef would. Number three, visualize what you don't understand. I can't
702
01:05:25,200 --> 01:05:29,200
emphasize this one enough. We have three mottos so far. If in doubt, run the code, you're going to
703
01:05:29,200 --> 01:05:34,640
hear me say this a lot. Experiment, experiment, experiment. And number three, visualize, visualize,
704
01:05:34,640 --> 01:05:39,600
visualize. Why is this? Well, because we've spoken about machine learning and deep learning
705
01:05:39,600 --> 01:05:45,200
deals with a lot of data, a lot of numbers. And so I find it that if I visualize some numbers in
706
01:05:45,760 --> 01:05:51,840
whatever form that isn't just numbers all over a page, I tend to understand it better.
707
01:05:51,840 --> 01:05:57,120
And there are some great extracurricular resources that I'm going to link that also turn what we're
708
01:05:57,120 --> 01:06:07,040
doing. So writing code into fantastic visualizations. Number four, ask questions, including the dumb
709
01:06:07,040 --> 01:06:11,120
questions. Really, there's no such thing as a dumb question. Everyone is just on a different
710
01:06:11,120 --> 01:06:15,520
part of their learning journey. And in fact, if you do have a quote unquote dumb question,
711
01:06:15,520 --> 01:06:20,080
it turns out that a lot of people probably have that one as well. So be sure to ask questions.
712
01:06:20,080 --> 01:06:24,000
I'm going to link a resource in a minute of where you can ask those questions, but
713
01:06:24,000 --> 01:06:29,120
please, please, please ask questions, not only to the community, but to Google to the internet
714
01:06:29,120 --> 01:06:33,440
to wherever you can, or just yourself. Ask questions of the code and write code to figure
715
01:06:33,440 --> 01:06:38,160
out the answer to those questions. Number five, do the exercises. There are some
716
01:06:38,720 --> 01:06:44,640
great exercises that I've created for each of the modules. If we go, have we got the book version
717
01:06:44,640 --> 01:06:51,280
of the course up here? We do. Within all of these chapters here, down the bottom is going to be
718
01:06:51,280 --> 01:06:56,960
exercises and extra curriculum. So we've got some exercises. I'm not going to jump into them,
719
01:06:56,960 --> 01:07:02,720
but I would highly recommend don't just follow along with the course and code after I code.
720
01:07:02,720 --> 01:07:08,160
Please, please, please give the exercises a go because that's going to stretch your knowledge.
721
01:07:09,520 --> 01:07:13,200
We're going to have a lot of practice writing code together, doing all of this stuff here.
722
01:07:13,200 --> 01:07:16,800
But then the exercises are going to give you a chance to practice what you've learned.
723
01:07:16,800 --> 01:07:22,160
And then of course, extra curriculum. Well, hey, if you want to learn more, there's plenty of
724
01:07:22,160 --> 01:07:30,160
opportunities to do so there. And then finally, number six, share your work. I can't emphasize
725
01:07:30,160 --> 01:07:36,320
enough how much writing about learning deep learning or sharing my work through GitHub or
726
01:07:36,320 --> 01:07:42,160
different code resources or with the community has helped with my learning. So if you learn
727
01:07:42,160 --> 01:07:47,920
something cool about PyTorch, I'd love to see it. Link it to me somehow in the Discord chat
728
01:07:47,920 --> 01:07:53,360
or on GitHub or whatever. There'll be links of where you can find me. I'd love to see it. Please
729
01:07:53,360 --> 01:07:58,560
do share your work. It's a great way to not only learn something because when you share it, when
730
01:07:58,560 --> 01:08:02,960
you write about it, it's like, how would someone else understand it? But it's also a great way to
731
01:08:02,960 --> 01:08:10,160
help others learn too. And so we said how to approach this course. Now, let's go how not to
732
01:08:10,160 --> 01:08:16,800
approach this course. I would love for you to avoid overthinking the process. And this is your brain,
733
01:08:16,800 --> 01:08:21,920
and this is your brain on fire. So avoid having your brain on fire. That's not a good place to be.
734
01:08:21,920 --> 01:08:27,280
We are working with PyTorch, so it's going to be quite hot. Just playing on words with the name
735
01:08:27,280 --> 01:08:32,400
torch. But avoid your brain catching on fire. And avoid saying, I can't learn,
736
01:08:33,920 --> 01:08:38,000
I've said this to myself lots of times, and then I've practiced it and it turns out I can
737
01:08:38,000 --> 01:08:42,240
actually learn those things. So let's just draw a red line on there. Oh, I think a red line.
738
01:08:42,240 --> 01:08:46,480
Yeah, there we go. Nice and thick red line. We'll get that out there. It doesn't really make sense
739
01:08:46,480 --> 01:08:52,400
now that this says avoid and crossed out. But don't say I can't learn and prevent your brain from
740
01:08:52,400 --> 01:08:58,480
catching on fire. Finally, we've got one more video that I'm going to cover before this one
741
01:08:58,480 --> 01:09:03,120
gets too long of the resources for the course before we get into coding. I'll see you there.
742
01:09:03,120 --> 01:09:10,000
Now, there are some fundamental resources that I would like you to be aware of before we
743
01:09:10,000 --> 01:09:14,640
go any further in this course. These are going to be paramount to what we're working with.
744
01:09:14,640 --> 01:09:21,600
So for this course, there are three things. There is the GitHub repo. So if we click this link,
745
01:09:22,480 --> 01:09:25,840
I've got a pinned on my browser. So you might want to do the same while you're going through
746
01:09:25,840 --> 01:09:32,080
the course. But this is mrdbourke, my GitHub, slash pytorch-deep-learning. It is still a work
747
01:09:32,080 --> 01:09:35,760
in progress at the time of recording this video. But by the time you go through it, it won't look
748
01:09:35,760 --> 01:09:40,080
too much different, but there just be more materials. You'll have materials outline,
749
01:09:40,080 --> 01:09:44,560
section, what does it cover? As you can see, some more are coming soon at the time of recording
750
01:09:44,560 --> 01:09:49,200
this. So these will probably be done by the time you watch this. Exercises and extra curriculum:
751
01:09:49,200 --> 01:09:54,240
There'll be links here. Basically, everything you need for the course will be in the GitHub repo.
752
01:09:54,240 --> 01:10:02,240
And then if we come back, also on the GitHub repo, the same repo. So mrdbourke slash pytorch
753
01:10:02,240 --> 01:10:08,240
deep learning. If you click on discussions, this is going to be the Q and A. This is just the same
754
01:10:08,240 --> 01:10:13,760
link here, the Q and A for the course. So if you have a question here, you can click new discussion,
755
01:10:13,760 --> 01:10:25,520
you can go Q and A, and then type in video, and then the title PyTorch Fundamentals, and then go
756
01:10:25,520 --> 01:10:35,200
in here. Or you could type in your error as well. What is ndim for a tensor? And then in here,
757
01:10:35,200 --> 01:10:42,720
you can type in some stuff here. Hello. I'm having trouble on video X, Y, Z. Put in the name of the
758
01:10:42,720 --> 01:10:48,640
video. So that way I can, or someone else can help you out. And then code, you can go three
759
01:10:48,640 --> 01:10:54,880
back ticks, write Python, and then you can go import torch, torch dot rand n, which is going to
760
01:10:54,880 --> 01:10:59,920
create a tensor. We're going to see this in a second. Yeah, yeah, yeah. And then if you post that
761
01:10:59,920 --> 01:11:05,840
question, the formatting of the code is very helpful that we can understand what's going on,
762
01:11:05,840 --> 01:11:11,040
and what's going on here. So this is basically the outline of how I would ask a question video.
763
01:11:11,040 --> 01:11:16,880
This is going on. What is such and such for whatever's going on? Hello. This is what I'm having
764
01:11:16,880 --> 01:11:21,600
trouble with. Here's the code, and here's what's happening. You could even include the error message,
765
01:11:21,600 --> 01:11:26,400
and then you can just click start discussion, and then someone, either myself or someone else from
766
01:11:26,400 --> 01:11:30,320
the course will be able to help out there. And the beautiful thing about this is that it's all in
767
01:11:30,320 --> 01:11:34,640
one place. You can start to search it. There's nothing here yet because the course isn't out yet,
768
01:11:34,640 --> 01:11:38,880
but as you go through it, there will probably be more and more stuff here. Then if you have any
769
01:11:38,880 --> 01:11:44,000
issues with the code that you think need fixing, you can also open a new issue there. I'll let you
770
01:11:44,000 --> 01:11:48,800
read more into what's going on. I've just got some issues here already about the fact that I
771
01:11:48,800 --> 01:11:52,480
need to record videos for the course. I need to create some stuff. But if you think there's
772
01:11:52,480 --> 01:11:56,560
something that could be improved, make an issue. If you have a question about the course,
773
01:11:57,200 --> 01:12:02,000
ask a discussion. And then if we come back to the keynote, we have one more resource. So that
774
01:12:02,000 --> 01:12:06,000
was the course materials all live in the GitHub. The course Q&A is on the course
775
01:12:06,000 --> 01:12:13,120
GitHub's discussions tab, and then the course online book. Now, this is a work of art.
776
01:12:13,680 --> 01:12:18,400
This is quite beautiful. It is some code to automatically turn all of the materials from the
777
01:12:18,400 --> 01:12:24,720
GitHub. So if we come into here code, if we click on notebook zero zero, this is going to sometimes
778
01:12:24,720 --> 01:12:29,280
if you've ever worked with Jupiter notebooks on GitHub, they can take a while to load.
779
01:12:29,920 --> 01:12:35,200
So all of the materials here automatically get converted into this book. So the beautiful
780
01:12:35,200 --> 01:12:40,000
thing about the book is that it's got different headings here. It's all readable. It's all online.
781
01:12:40,000 --> 01:12:43,600
It's going to have all the images there. And you can also search some stuff here,
782
01:12:43,600 --> 01:12:50,880
PyTorch training steps, creating a training loop in PyTorch. Beautiful. We're going to see
783
01:12:50,880 --> 01:12:56,800
this later on. So they're the three big materials that you need to be aware of, the three big resources
784
01:12:56,800 --> 01:13:03,200
for this specific course: materials on GitHub, course Q&A, course online book, which is
785
01:13:03,200 --> 01:13:08,160
learn pytorch.io, simple URL to remember, all the materials will be there. And then
786
01:13:09,120 --> 01:13:15,680
specifically for PyTorch or things PyTorch, the PyTorch website and the PyTorch forums.
787
01:13:15,680 --> 01:13:20,240
So if you have a question that's not course related, but more PyTorch related, I'd highly
788
01:13:20,240 --> 01:13:25,200
recommend you go to the PyTorch forums, which is available at discuss.pytorch.org. We've got a link
789
01:13:25,200 --> 01:13:30,560
there. Then the PyTorch website, PyTorch.org, this is going to be your home ground for everything
790
01:13:30,560 --> 01:13:36,080
PyTorch of course. We have the documentation here. And as I said, this course is not a replacement
791
01:13:36,080 --> 01:13:42,320
for getting familiar with the PyTorch documentation. This, the course actually is built off all of
792
01:13:42,320 --> 01:13:47,440
the PyTorch documentation. It's just organized in a slightly different way. So there's plenty of
793
01:13:47,440 --> 01:13:52,720
amazing resources here on everything to do with PyTorch. This is your home ground. And you're
794
01:13:52,720 --> 01:13:58,400
going to see me referring to this a lot throughout the course. So just keep these in mind, course
795
01:13:58,400 --> 01:14:05,760
materials on GitHub, course discussions, learnpytorch.io. This is all for the course. And all things
796
01:14:05,760 --> 01:14:11,440
PyTorch specific, so not necessarily this course, but just PyTorch in general, the PyTorch website
797
01:14:11,440 --> 01:14:18,240
and the PyTorch forums. With that all being said, we've come so far. We've covered a lot already,
798
01:14:18,240 --> 01:14:25,600
but guess what time it is? Let's write some code. I'll see you in the next video.
799
01:14:25,600 --> 01:14:32,240
We've covered enough of the fundamentals so far. Well, from a theory point of view,
800
01:14:32,240 --> 01:14:36,640
let's get into coding. So I'm going to go over to Google Chrome. I'm going to introduce you to
801
01:14:36,640 --> 01:14:41,360
the tool. One of the main tools we're going to be using for the entire course. And that is Google
802
01:14:41,360 --> 01:14:47,040
Colab. So the way I would suggest following along with this course is remember, one of the major
803
01:14:47,040 --> 01:14:52,880
ones is to code along. So we're going to go to colab.research.google. I've got a typo here.
804
01:14:52,880 --> 01:14:58,000
Classic. You're going to see me do lots of typos throughout this course. Colab.research.google.com.
805
01:14:58,000 --> 01:15:03,200
This is going to load up Google Colab. Now, you can follow along with what I'm going to do,
806
01:15:03,200 --> 01:15:08,800
but if you'd like to find out how to use Google Colab from a top-down perspective,
807
01:15:09,520 --> 01:15:13,040
you can go through some of these. I'd probably recommend going through overview of
808
01:15:13,040 --> 01:15:18,640
Collaboratory Features. But essentially, what Google Colab is going to enable us to do is
809
01:15:18,640 --> 01:15:23,840
create a new notebook. And this is how we're going to practice writing PyTorch code.
810
01:15:23,840 --> 01:15:30,640
So if you refer to the reference document of learnpytorch.io, these are actually
811
01:15:30,640 --> 01:15:37,760
Colab notebooks just in book format, so online book format. So these are the basis materials
812
01:15:37,760 --> 01:15:42,480
for what the course is going to be. There's going to be more here, but every new module,
813
01:15:42,480 --> 01:15:46,240
we're going to start a new notebook. And I'm going to just zoom in here.
814
01:15:46,240 --> 01:15:51,520
So this one, the first module is going to be zero zero, because counting in Python starts at
815
01:15:51,520 --> 01:15:57,520
zero. And we're going to call this PyTorch Fundamentals. I'm going to call mine video,
816
01:15:57,520 --> 01:16:02,640
just so we know that this is the notebook that I wrote through the video. And what this is going
817
01:16:02,640 --> 01:16:08,560
to do is if we click Connect, it's going to give us a space to write Python code. So here we can go
818
01:16:08,560 --> 01:16:16,960
print. Hello, I'm excited to learn PyTorch. And then if we hit shift and enter, it comes out like
819
01:16:16,960 --> 01:16:22,560
that. But another beautiful benefit of Google Colab are PS. I'm using the pro version, which
820
01:16:22,560 --> 01:16:26,880
costs about $10 a month or so. That price may be different depending on where you're from.
821
01:16:26,880 --> 01:16:31,200
The reason I'm doing that is because I use Colab all the time. However, you do not have to use
822
01:16:31,200 --> 01:16:36,080
the paid version for this course. Google Colab comes with a free version, which you'll be able
823
01:16:36,080 --> 01:16:41,680
to use to complete this course. If you see it worthwhile, I find the pro version is worthwhile.
824
01:16:42,240 --> 01:16:47,360
Another benefit of Google Colab is if we go here, we can go to runtime. Let me just show you that
825
01:16:47,360 --> 01:16:55,280
again. Runtime, change runtime type, hardware accelerator. And we can choose to run our code
826
01:16:55,280 --> 01:17:00,720
on an accelerator here. Now we've got GPU and TPU. We're going to be focused on using
827
01:17:00,720 --> 01:17:06,640
GPU. If you'd like to look into TPU, I'll leave that to you. But we can click GPU, click save.
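As a quick aside (not typed in the video at this point), one hedged way to confirm PyTorch can actually see that GPU is:

    import torch

    print(torch.cuda.is_available())  # True when the Colab GPU runtime is active
    device = "cuda" if torch.cuda.is_available() else "cpu"  # device-agnostic setup
    print(device)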
828
01:17:06,640 --> 01:17:13,280
And now our code, if we write it in such a way, will run on the GPU. Now we're going to see this
829
01:17:13,280 --> 01:17:20,000
later on code that runs on the GPU is a lot faster in terms of compute time, especially for deep
830
01:17:20,000 --> 01:17:27,440
learning. So if we write here !nvidia-smi, we now have access to a GPU. In my case, I have a
831
01:17:27,440 --> 01:17:34,320
Tesla P100. It's quite a good GPU. You tend to get the better GPUs. If you pay for Google Colab,
832
01:17:34,320 --> 01:17:39,040
if you don't pay for it, you get the free version, you get a free GPU. It just won't be as fast as
833
01:17:39,040 --> 01:17:44,080
the GPUs you typically get with the paid version. So just keep that in mind. A whole bunch of stuff
834
01:17:44,080 --> 01:17:50,320
that we can do here. I'm not going to go through it all because there's too much. But we've covered
835
01:17:50,320 --> 01:17:56,800
basically what we need to cover. So if we just come up here, I'm going to write a text cell. So
836
01:17:56,800 --> 01:18:07,200
00. PyTorch Fundamentals. And I'm going to link in here: resource notebook. Now you can come
837
01:18:07,200 --> 01:18:14,240
to learn pytorch.io and all the notebooks are going to be in sync. So 00, we can put this in here.
838
01:18:14,240 --> 01:18:20,160
Resource notebook is there. That's what this notebook is going to be based off. This one here.
839
01:18:20,160 --> 01:18:24,480
And then if you have a question about what's going on in this notebook,
840
01:18:24,480 --> 01:18:31,520
you can come to the course GitHub. And then we go back, back. This is where you can see what's
841
01:18:31,520 --> 01:18:35,840
going on. This is pytorch deep learning projects as you can see what's happening. At the moment,
842
01:18:35,840 --> 01:18:40,240
I've got pytorch course creation because I'm in the middle of creating it. But if you have a question,
843
01:18:40,240 --> 01:18:45,920
you can come to mrdbourke slash pytorch deep learning slash discussions, which is this tab here,
844
01:18:45,920 --> 01:18:51,440
and then ask a question by clicking new discussion. So any discussions related to this notebook,
845
01:18:51,440 --> 01:18:55,360
you can ask it there. And I'm going to turn this right now. This is a code cell.
846
01:18:56,000 --> 01:19:01,360
Colab is basically comprised of code and text cells. I'm going to turn this into a text cell
847
01:19:01,360 --> 01:19:07,120
by pressing Command + M M, then Shift + Enter. Now we have a text cell. And then if we wanted another
848
01:19:07,120 --> 01:19:13,120
code cell, we could go like that text code text code, yada, yada, yada. But I'm going to delete this.
849
01:19:14,160 --> 01:19:20,480
And to finish off this video, we're going to import pytorch. So we're going to import torch.
850
01:19:20,480 --> 01:19:27,680
And then we're going to print torch.__version__. So that's another beautiful thing about Google
851
01:19:27,680 --> 01:19:33,840
Colab is that it comes with pytorch pre installed and a lot of other common Python data science
852
01:19:33,840 --> 01:19:43,120
packages, such as, we could also go import pandas as pd, import numpy as np, and import
853
01:19:43,120 --> 01:19:53,840
matplotlib.pyplot as plt. Google Colab is by far the easiest way to get started with this
854
01:19:53,840 --> 01:20:00,240
course. You can run things locally. If you'd like to do that, I'd refer to you to pytorch deep
855
01:20:00,240 --> 01:20:06,400
learning. It's going to be setup.md, getting set up to code PyTorch. We've just gone through
856
01:20:06,400 --> 01:20:12,000
number one setting up with Google Colab. There is also another option for getting started locally.
857
01:20:12,000 --> 01:20:15,760
Right now, this document's a work in progress, but it'll be finished by the time you watch this
858
01:20:15,760 --> 01:20:20,880
video. This is not a replacement, though, for the pytorch documentation for getting set up
859
01:20:20,880 --> 01:20:25,920
locally. So if you'd like to run locally on your machine, rather than going on Google Colab,
860
01:20:25,920 --> 01:20:31,840
please refer to this documentation or set up dot MD here. But if you'd like to get started
861
01:20:31,840 --> 01:20:36,480
as soon as possible, I'd highly recommend you using Google Colab. In fact, the entire course
862
01:20:36,480 --> 01:20:40,960
is going to be able to be run through Google Colab. So let's finish off this video, make sure
863
01:20:40,960 --> 01:20:46,880
we've got pytorch ready to go. And of course, some fundamental data science packages here.
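For reference, the cell being written here looks roughly like this (a sketch of what's on screen; your exact version numbers will depend on the Colab image you get):

    import torch
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    print(torch.__version__)  # e.g. "1.10.0+cu111" at the time of recording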
864
01:20:47,520 --> 01:20:55,040
Wonderful. This means that we have pytorch 1.10.0. So if your version number is far greater than this,
865
01:20:55,040 --> 01:21:00,320
maybe you're watching this video a couple of years in the future, and pytorch is up to 2.11,
866
01:21:00,320 --> 01:21:05,600
maybe some of the code in this notebook won't work. But 1.10.0 should be more than enough for
867
01:21:05,600 --> 01:21:15,920
what we're going to do. And the plus CU111 stands for CUDA version 11.1, I believe. And what
868
01:21:15,920 --> 01:21:20,480
that would mean is if we came in here, and we wanted to install it on Linux, which is what
869
01:21:20,480 --> 01:21:27,280
Colab runs on, there's Mac and Windows as well. We've got CUDA. Yeah. So right now, as of recording
870
01:21:27,280 --> 01:21:33,680
this video, the latest pytorch build is 1.10.2. So you'll need at least pytorch 1.10 to complete
871
01:21:33,680 --> 01:21:43,280
this course and CUDA 11.3. So that's CUDA toolkit. If you remember, CUDA toolkit is NVIDIA's
872
01:21:44,400 --> 01:21:51,600
programming platform for their GPUs. There we go. NVIDIA developer. CUDA is what enables us to run our PyTorch code on
873
01:21:51,600 --> 01:22:00,320
NVIDIA GPUs, which we have access to in Google Colab. Beautiful. So we're set up ready to write code.
874
01:22:00,320 --> 01:22:05,840
Let's get started in the next video writing some pytorch code. This is so exciting. I'll see you
875
01:22:05,840 --> 01:22:13,120
there. So we've got set up. We've got access to pytorch. We've got a Google Colab instance running
876
01:22:13,120 --> 01:22:18,640
here. We've got a GPU because we've gone up to runtime, change runtime type, hardware accelerator.
877
01:22:18,640 --> 01:22:23,920
You won't necessarily need a GPU for this entire notebook, but I just wanted to show you how to
878
01:22:23,920 --> 01:22:30,320
get access to a GPU because we're going to be using them later on. So let's get rid of this.
879
01:22:30,320 --> 01:22:36,080
And one last thing, how I'd recommend going through this course is in a split window fashion.
880
01:22:36,080 --> 01:22:40,640
So for example, you might have the video where I'm talking right now and writing code on the
881
01:22:40,640 --> 01:22:46,720
left side, and then you might have another window over the other side with your own Colab
882
01:22:46,720 --> 01:22:55,040
window. And you can go new notebook, call it whatever you want, my notebook. You could call it very
883
01:22:55,040 --> 01:23:00,240
similar to what we're writing here. And then if I write code over on this side, on this video,
884
01:23:01,280 --> 01:23:05,840
you can't copy it, of course, but you'll write the same code here and then go on and go on and
885
01:23:05,840 --> 01:23:10,000
go on. And if you get stuck, of course, you have the reference notebook and you have an
886
01:23:10,000 --> 01:23:15,680
opportunity to ask a question here. So with that being said, let's get started. The first thing
887
01:23:15,680 --> 01:23:23,200
we're going to have a look at in PyTorch is an introduction to tensors. So tensors are the main
888
01:23:23,200 --> 01:23:28,960
building block of deep learning in general, or rather of representing data. And so you may have watched the video,
889
01:23:29,520 --> 01:23:37,840
what is a tensor? For the sake of this course, tensors are a way to represent data, especially
890
01:23:37,840 --> 01:23:43,520
multi dimensional data, numeric data that is, but that numeric data represents something else.
891
01:23:43,520 --> 01:23:49,920
So let's go in here, creating tensors. So the first kind of tensor we're going to create is
892
01:23:49,920 --> 01:23:53,760
actually called a scalar. I know I'm going to throw a lot of different names of things at you,
893
01:23:53,760 --> 01:23:58,640
but it's important that you're aware of such nomenclature. Even though in PyTorch, almost
894
01:23:58,640 --> 01:24:03,680
everything is referred to as a tensor, there are different kinds of tensors. And just to
895
01:24:03,680 --> 01:24:10,880
exemplify the fact that we're using a reference notebook, if we go up here, we can see we have
896
01:24:10,880 --> 01:24:16,080
importing PyTorch. We've done that. Now we're up to introduction to tensors. We've got creating
897
01:24:16,080 --> 01:24:22,320
tensors, and we've got scalar, etc., etc., etc. So this is what we're going to be working through.
898
01:24:22,320 --> 01:24:29,200
Let's do it together. So scalar, the way to, oops, what have I done there? The way to create a
899
01:24:29,200 --> 01:24:36,480
tensor in PyTorch, we're going to call this scalar equals torch dot tensor. And we're going to fill
900
01:24:36,480 --> 01:24:42,880
it with the number seven. And then if we press or retype in scalar, what do we get back? Seven,
901
01:24:42,880 --> 01:24:48,560
wonderful. And it's got the tensor data type here. So how would we find out about what torch dot
902
01:24:48,560 --> 01:24:55,200
tensor actually is? Well, let me show you how I would. We go to torch dot tensor. There we go.
903
01:24:55,200 --> 01:25:00,800
We've got the documentation. So this is possibly the most common class in PyTorch other than
904
01:25:00,800 --> 01:25:06,560
one we're going to see later on that you'll use, which is torch dot nn. Basically, everything in
905
01:25:06,560 --> 01:25:11,440
PyTorch works off torch dot tensor. And if you'd like to learn more, you can read through here.
906
01:25:11,440 --> 01:25:15,360
In fact, I would encourage you to read through this documentation for at least 10 minutes
907
01:25:15,360 --> 01:25:20,080
after you finish some videos here. So with that being said, I'm going to link that in here.
908
01:25:20,080 --> 01:25:31,040
So PyTorch tensors are created using torch dot tensor. And then we've got that link there.
909
01:25:32,800 --> 01:25:38,320
Oops, typos galore, Daniel. Come on. You're better than this. No, I'm kidding. There's going to be
910
01:25:38,320 --> 01:25:44,320
typos galore through the whole course. Okay. Now, what are some attributes of a scalar? So
911
01:25:44,320 --> 01:25:49,040
some details about scalars. Let's find out how many dimensions there are. Oh, and by the way,
912
01:25:49,040 --> 01:25:55,520
this warning, perfect timing. Google Colab will give you some warnings here, depending on whether
913
01:25:55,520 --> 01:26:01,600
you're using a GPU or not. Now, the reason being is because Google Colab provides GPUs to you and
914
01:26:01,600 --> 01:26:08,560
I for free. However, GPUs aren't free for Google to provide. So if we're not using a GPU, we can
915
01:26:08,560 --> 01:26:14,880
save some resources, allow someone else to use a GPU by going to none. And of course, we can
916
01:26:14,880 --> 01:26:19,680
always switch this back. So I'm going to turn my GPU off so that someone else out there,
917
01:26:20,320 --> 01:26:25,920
I'm not using the GPU at the moment, they can use it. So what you're also going to see is if
918
01:26:25,920 --> 01:26:33,600
your Google Colab instance ever restarts up here, we're going to have to rerun these cells. So if
919
01:26:33,600 --> 01:26:38,560
you stop coding for a while, go have a break and then come back and you start your notebook again,
920
01:26:38,560 --> 01:26:46,080
that's one downside of Google Colab is that it resets after a few hours. How many hours? I don't
921
01:26:46,080 --> 01:26:51,680
know exactly. The reset time is longer if you have the pro subscription, but because it's a free
922
01:26:51,680 --> 01:26:56,560
service and the way Google calculates usage and all that sort of stuff, I can't give conclusive
923
01:26:56,560 --> 01:27:02,640
evidence or a conclusive answer on how long until it resets. But just know, if you come back, you might
924
01:27:02,640 --> 01:27:07,600
have to rerun some of your cells and you can do that with shift and enter. So a scalar has no
925
01:27:07,600 --> 01:27:13,760
dimensions. All right, it's just a single number. But then we move on to the next thing. Or actually,
926
01:27:13,760 --> 01:27:19,120
if we wanted to get this number out of a tensor type, we can use scalar dot item, this is going
927
01:27:19,120 --> 01:27:26,480
to give it back as just a regular Python integer. Wonderful, there we go, the number seven back,
928
01:27:26,480 --> 01:27:37,360
get tensor back as Python int. Now, the next thing that we have is a vector. So let's write
929
01:27:37,360 --> 01:27:44,160
in here vector, which again is going to be created with torch dot tensor. But you will also hear
930
01:27:44,720 --> 01:27:52,240
the word vector used a lot too. Now, what is the deal? Oops, seven dot seven. Google Colab's auto
931
01:27:52,240 --> 01:27:57,040
complete is a bit funny. It doesn't always do the thing you want it to. So if we see a vector,
932
01:27:57,840 --> 01:28:02,000
we've got two numbers here. And then if we really wanted to find out what is a vector.
933
01:28:02,000 --> 01:28:10,800
So a vector usually has magnitude and direction. So what we're going to see later on is, there we
934
01:28:10,800 --> 01:28:15,760
go, magnitude, how far it's going and which way it's going. And then if we plotted it, we've got,
935
01:28:15,760 --> 01:28:20,720
yeah, a vector equals the magnitude would be the length here and the direction would be where it's
936
01:28:20,720 --> 01:28:26,640
pointing. And oh, here we go, scalar vector matrix tensor. This is what we're working on as well.
937
01:28:26,640 --> 01:28:33,920
So the thing about vectors, how they differ from scalars, the way I just remember them is
938
01:28:33,920 --> 01:28:38,000
rather than magnitude and direction, a vector typically has more than one number.
939
01:28:38,560 --> 01:28:42,800
So if we go vector and dim, how many dimensions does it have?
940
01:28:44,960 --> 01:28:50,240
It has one dimension, which is kind of confusing. But when we see tensors with more than one
941
01:28:50,240 --> 01:28:55,280
dimension, it'll make sense. And another way that I remember how many dimensions something
942
01:28:55,280 --> 01:29:02,640
has is by the number of square brackets. So let's check out something else. Maybe we go vector
943
01:29:03,440 --> 01:29:12,160
dot shape. Shape is two. So what's the difference from dimension? Dimension is like the number of square
944
01:29:12,160 --> 01:29:18,080
brackets. And when I say that, even though there are two brackets here, I mean the number of pairs of closing square
945
01:29:18,080 --> 01:29:24,640
brackets. So there's one pair of closing square brackets here. But the shape of the vector is two.
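For reference, a minimal sketch of the scalar and vector code written so far might look like this (values match the narration):

import torch

scalar = torch.tensor(7)
scalar.ndim    # 0 - a scalar has no dimensions
scalar.item()  # 7 - the value back as a plain Python int

vector = torch.tensor([7, 7])
vector.ndim    # 1 - one pair of square brackets
vector.shape   # torch.Size([2]) - two elements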
946
01:29:24,640 --> 01:29:31,680
So we have two elements along one dimension, so a total of two elements. Now if we wanted to step
947
01:29:31,680 --> 01:29:37,440
things up a notch, let's create a matrix. So this is another term you're going to hear.
948
01:29:37,440 --> 01:29:42,640
And you might be wondering why I'm capitalizing matrix. Well, I'll explain that in a second.
949
01:29:42,640 --> 01:29:50,560
matrix equals torch dot tensor. And we're going to put two square brackets here. You might be
950
01:29:50,560 --> 01:29:55,600
thinking, what could the two square brackets mean? Or actually, that's a little bit of a challenge.
951
01:29:55,600 --> 01:30:02,880
If one pair of square brackets had an ndim of one, what will the ndim, the number of dimensions, be
952
01:30:02,880 --> 01:30:12,960
of two square brackets? So let's create this matrix. Beautiful. So we've got another tensor here.
953
01:30:12,960 --> 01:30:18,400
Again, as I said, these things have different names, like the traditional name of scalar,
954
01:30:18,400 --> 01:30:24,000
vector matrix, but they're all still a torch dot tensor. That's a little bit confusing,
955
01:30:24,000 --> 01:30:29,840
but the thing you should remember in PyTorch is basically anytime you encode data into numbers,
956
01:30:29,840 --> 01:30:37,040
it's of a tensor data type. And so now, how many dimensions, ndim, do you think a matrix has?
957
01:30:38,160 --> 01:30:43,920
It has two. So there we go. We have two square brackets. So if we wanted to get matrix,
958
01:30:43,920 --> 01:30:50,640
let's index on the zeroth axis. Let's see what happens there. Ah, so we get seven and eight.
959
01:30:50,640 --> 01:30:58,000
And then if we index on the first dimension, ah, nine and 10. So this is where the square brackets,
960
01:30:58,000 --> 01:31:02,960
the pairings come into play. We've got two square bracket pairings on the outside here.
961
01:31:02,960 --> 01:31:08,800
So we have an endem of two. Now, if we get the shape of the matrix, what do you think the shape will be?
962
01:31:08,800 --> 01:31:21,280
Ah, two by two. So we've got two numbers here by two. So we have a total of four elements in there.
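A minimal sketch of the MATRIX example, assuming the same values as on screen:

import torch

MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
MATRIX.ndim   # 2 - two pairs of square brackets
MATRIX.shape  # torch.Size([2, 2]) - four elements in total
MATRIX[0]     # tensor([7, 8])
MATRIX[1]     # tensor([ 9, 10])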
963
01:31:22,320 --> 01:31:25,920
So we're covering a fair bit of ground here, nice and quick, but that's going to be the
964
01:31:25,920 --> 01:31:30,880
teaching style of this course is we're going to get quite hands on and writing a lot of code and
965
01:31:30,880 --> 01:31:36,080
just interacting with it rather than continually going back over and discussing what's going on
966
01:31:36,080 --> 01:31:42,000
here. The best way to find out what's happening within a matrix is to write more code that's similar
967
01:31:42,000 --> 01:31:48,960
to these matrices here. But let's not stop at matrix. Let's upgrade to a tensor now. So I might
968
01:31:48,960 --> 01:31:54,400
put this in capitals as well. And I haven't explained what the capitals mean yet, but we'll see that
969
01:31:54,400 --> 01:32:01,840
in a second. So let's go torch dot tensor. And what we're going to do is this time,
970
01:32:01,840 --> 01:32:07,360
we've done one square bracket pairing. We've done two square bracket pairings. Let's do three
971
01:32:07,360 --> 01:32:11,840
square bracket pairings and just get a little bit adventurous. All right. And so you might be thinking
972
01:32:11,840 --> 01:32:16,480
at the moment, this is quite tedious. I'm just going to write a bunch of random numbers here. One,
973
01:32:16,480 --> 01:32:23,920
two, three, three, six, nine, two, five, four. Now you might be thinking, Daniel, you've said
974
01:32:23,920 --> 01:32:28,480
tensors could have millions of numbers. If we had to write them all by hand, that would be
975
01:32:28,480 --> 01:32:35,520
quite tedious. And yes, you're completely right. The fact is, though, that most of the time,
976
01:32:35,520 --> 01:32:41,200
you won't be crafting tensors by hand. PyTorch will do a lot of that behind the scenes. However,
977
01:32:41,200 --> 01:32:45,600
it's important to know that these are the fundamental building blocks of the models
978
01:32:45,600 --> 01:32:51,680
and the deep learning neural networks that we're going to be building. So tensor capitals as well,
979
01:32:51,680 --> 01:32:57,920
we have three square brackets. So, or three square bracket pairings. I'm just going to refer to three
980
01:32:57,920 --> 01:33:04,400
square brackets at the very start because they're going to be paired down here. How many n dim or
981
01:33:04,400 --> 01:33:11,520
number of dimensions do you think our tensor will have? Three, wonderful. And what do you think the
982
01:33:11,520 --> 01:33:17,360
shape of our tensor is? We have three elements here. We have three elements here, three elements
983
01:33:17,360 --> 01:33:29,840
here. And we have one, two, three. So maybe our tensor has a shape of one by three by three.
984
01:33:29,840 --> 01:33:38,800
Hmm. What does that mean? Well, we've got three by one, two, three. That's the second square
985
01:33:38,800 --> 01:33:44,960
bracket there by one. Ah, so that's the first dimension there or the zeroth dimension because
986
01:33:44,960 --> 01:33:49,760
we remember PyTorch is zero indexed. We have, well, let's just instead of talking about it,
987
01:33:49,760 --> 01:33:54,800
let's just get on the zeroth axis and see what happens with the zeroth dimension. There we go.
988
01:33:54,800 --> 01:34:01,280
Okay. So there's, this is the far left one, zero, which is very confusing because we've got a one
989
01:34:01,280 --> 01:34:12,160
here, but so we've got, oops, don't mean that. What this is saying is we've got one three by three
990
01:34:12,160 --> 01:34:19,920
shape tensor. So very outer bracket matches up with this number one here. And then this three
991
01:34:20,480 --> 01:34:28,320
matches up with the next one here, which is one, two, three. And then this three matches up with
992
01:34:28,320 --> 01:34:36,400
this one, one, two, three. Now, if you'd like to see this with a pretty picture, we can see it here.
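A minimal sketch of the TENSOR example and how its shape lines up with the bracket pairings (the inner values follow the numbers read out here):

import torch

TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 5, 4]]])
TENSOR.ndim   # 3 - three pairs of square brackets
TENSOR.shape  # torch.Size([1, 3, 3])
TENSOR[0]     # the single 3x3 block inside the outer pair of brackets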
993
01:34:36,400 --> 01:34:44,960
So dim zero lines up. So the blue bracket, the very outer one, lines up with the one. Then dim
994
01:34:44,960 --> 01:34:52,400
equals one, this one here, the middle bracket, lines up with the middle dimension here. And then
995
01:34:52,400 --> 01:35:01,680
dim equals two, the very inner lines up with these three here. So again, this is going to take a lot
996
01:35:01,680 --> 01:35:06,720
of practice. It's taken me a lot of practice to understand the dimensions of tensors. But
997
01:35:07,600 --> 01:35:14,000
to practice, I would like you to write out your own tensor of, you can put however many square
998
01:35:14,000 --> 01:35:20,560
brackets you want. And then just interact with the end dim shape and indexing, just as I've done
999
01:35:20,560 --> 01:35:25,120
here, but you can put any combination of numbers inside this tensor. That's a little bit of practice
1000
01:35:25,120 --> 01:35:30,400
before the next video. So give that a shot and then we'll move on to the next topic.
1001
01:35:30,400 --> 01:35:39,840
I'll see you there. Welcome back. In the last video, we covered the basic building blocks of data
1002
01:35:39,840 --> 01:35:45,760
representation in deep learning, which is the tensor, or in PyTorch, specifically torch.tensor.
1003
01:35:45,760 --> 01:35:51,520
But within that, we had to look at what a scalar is. We had to look at what a vector is. We had to
1004
01:35:51,520 --> 01:35:57,360
look at a matrix. We had to look at what a tensor is. And I issued you the challenge to get as
1005
01:35:57,360 --> 01:36:01,840
creative as you like with creating your own tensor. So I hope you gave that a shot because as you'll
1006
01:36:01,840 --> 01:36:07,200
see throughout the course and your deep learning journey, a tensor can represent or can be of almost
1007
01:36:07,200 --> 01:36:13,120
any shape and size and have almost any combination of numbers within it. And so this is very important
1008
01:36:13,120 --> 01:36:18,320
to be able to interact with different tensors to be able to understand what the different names of
1009
01:36:18,320 --> 01:36:24,000
things are. So when you hear matrix, you go, oh, maybe that's a two dimensional tensor. When you
1010
01:36:24,000 --> 01:36:28,960
hear a vector, maybe that's a one dimensional tensor. When you hear a tensor, that could be any
1011
01:36:28,960 --> 01:36:33,120
amount of dimensions. And just for reference for that, if we come back to the course reference,
1012
01:36:33,120 --> 01:36:38,400
we've got a scalar. What is it? A single number, number of dimensions, zero. We've got a vector,
1013
01:36:38,400 --> 01:36:45,920
a number with direction, number of dimensions, one, a matrix, a tensor. And now here's another little
1014
01:36:45,920 --> 01:36:52,160
tidbit of the nomenclature of things, the naming of things. Typically, you'll see a variable name
1015
01:36:52,160 --> 01:36:58,880
for a scalar or a vector as a lowercase. So a vector, you might have a lowercase y storing that
1016
01:36:58,880 --> 01:37:06,480
data. But for a matrix or a tensor, you'll often see an uppercase letter or variable in Python in
1017
01:37:06,480 --> 01:37:12,080
our case, because we're writing code. And so I am not exactly sure why this is, but this is just
1018
01:37:12,080 --> 01:37:16,800
what you're going to see in machine learning and deep learning code and research papers
1019
01:37:16,800 --> 01:37:22,720
across the board. This is a typical nomenclature. Scalars and vectors, lowercase, matrix and tensors,
1020
01:37:22,720 --> 01:37:28,000
uppercase, that's where that naming comes from. And that's why I've given the tensor uppercase here.
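As a small illustration of that naming convention (the variable names here are just examples):

import torch

y = torch.tensor([7, 7])              # scalars and vectors: lowercase names
X = torch.tensor([[7, 8], [9, 10]])   # matrices and tensors: uppercase names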
1021
01:37:28,880 --> 01:37:34,240
Now, with that being said, let's jump in to another very important concept with tensors.
1022
01:37:34,240 --> 01:37:40,000
And that is random tensors. Why random tensors? I'm just writing this in a code cell now.
1023
01:37:40,000 --> 01:37:48,160
I could go here. This is a comment in Python, random tensors. But we'll get rid of that. We could
1024
01:37:48,160 --> 01:37:54,320
just start another text cell here. And then three hashes is going to give us a heading, random tensors
1025
01:37:54,320 --> 01:38:02,560
there. Or I could turn this again into a markdown cell with Command M M when I'm using Google Colab.
1026
01:38:02,560 --> 01:38:10,080
So random tensors. Let's write down here. Why random tensors? So we've done the tedious thing
1027
01:38:10,080 --> 01:38:15,120
of creating our own tensors with some numbers that we've defined, whatever these are. Again,
1028
01:38:15,120 --> 01:38:21,680
you could define these as almost anything. But random tensors is a big part in pytorch because
1029
01:38:21,680 --> 01:38:34,240
let's write this down. Random tensors are important because the way many neural networks learn is
1030
01:38:34,240 --> 01:38:42,800
that they start with tensors full of random numbers and then adjust those random numbers
1031
01:38:42,800 --> 01:38:52,720
to better represent the data. So seriously, this is one of the big concepts of neural networks.
1032
01:38:52,720 --> 01:38:58,480
I'm going to write in code here, which is this is what the tick is for. Start with random numbers.
1033
01:38:58,480 --> 01:39:19,200
Look at data, update random numbers. Look at data, update random numbers. That is the crux
1034
01:39:19,200 --> 01:39:25,280
of neural networks. So let's create a random tensor with pytorch. Remember how I said that
1035
01:39:25,280 --> 01:39:30,960
pytorch is going to create tensors for you behind the scenes? Well, this is one of the ways that
1036
01:39:30,960 --> 01:39:40,880
it does so. So we create a random tensor and we give it a size, a random tensor of size or shape.
1037
01:39:40,880 --> 01:39:47,600
PyTorch uses these interchangeably. So size, shape, they're different names for the same thing.
1038
01:39:47,600 --> 01:39:58,240
So random tensor equals torch dot rand. And we're going to type in here three, four. And the beautiful
1039
01:39:58,240 --> 01:40:02,960
thing about Google Colab as well is that if we wait long enough, it's going to pop up with the doc
1040
01:40:02,960 --> 01:40:07,680
string of what's going on. I personally find this a little hard to read in Google Colab,
1041
01:40:07,680 --> 01:40:13,360
because you see you can keep going down there. You might be able to read that. But what can we do?
1042
01:40:13,360 --> 01:40:19,920
Well, we can go to torch dot rand. Then we go to the documentation. Beautiful. Now there's a whole
1043
01:40:19,920 --> 01:40:24,240
bunch of stuff here that you're more than welcome to read. We're not going to go through all that.
1044
01:40:24,240 --> 01:40:30,400
We're just going to see what happens hands on. So we'll copy that in here. And write this in notes,
1045
01:40:31,120 --> 01:40:37,440
torch random tensors. Done. Just going to make some code cells down here. So I've got some space.
1046
01:40:37,440 --> 01:40:46,240
I can get this a bit up here. Let's see what our random tensor looks like. There we go. Beautiful
1047
01:40:46,240 --> 01:40:53,120
of size three, four. So we've got four elements across here. And then we've got three rows deep
1048
01:40:53,120 --> 01:40:59,040
here. So again, there's the two pairs. So what do you think the number of dimensions will be
1049
01:40:59,040 --> 01:41:09,600
for random tensor? And dim. Two beautiful. And so we have some random numbers here. Now the
1050
01:41:09,600 --> 01:41:14,640
beautiful thing about pie torch again is that it's going to do a lot of this behind the scenes. So
1051
01:41:14,640 --> 01:41:20,320
if we wanted to create a size of 10, 10, and in some cases we might want a one dimension here, and then
1052
01:41:20,320 --> 01:41:24,400
it's going to go 1, 10, 10. And then if we check the number of dimensions, how many do you think it
1053
01:41:24,400 --> 01:41:31,120
will be now? Three. Why is that? Because we've got 1, 10, 10. And then if we wanted to create 10, 10, 10.
1054
01:41:32,560 --> 01:41:35,680
What's the number of dimensions going to be? It's not going to change. Why is that?
1055
01:41:36,720 --> 01:41:39,200
We haven't run that cell yet, but we've got a lot of numbers here.
1056
01:41:42,240 --> 01:41:47,280
We can find out what 10 times 10 times 10 is. And I know we can do that in our heads, but
1057
01:41:47,280 --> 01:41:51,520
the beauty of collab is we've got a calculator right here. 10 times 10 times 10. We've got a
1058
01:41:51,520 --> 01:41:57,120
thousand elements in there. But sometimes tensors can be hundreds of thousands of elements or
1059
01:41:57,120 --> 01:42:01,360
millions of elements. But pie torch is going to take care of a lot of this behind the scenes. So
1060
01:42:01,360 --> 01:42:10,720
let's clean up a bit of space here. This is a random tensor, random numbers, beautiful. Now
1061
01:42:10,720 --> 01:42:15,200
it's got two dimensions because we've got three by four. And if we put another one in the front
1062
01:42:15,200 --> 01:42:20,480
there, we're going to have how many dimensions three dimensions there. But again, this number
1063
01:42:20,480 --> 01:42:26,560
of dimensions could be any number. And what's inside here could be any number. Let's get rid of that.
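A minimal sketch of the random tensor experiments just described:

import torch

random_tensor = torch.rand(3, 4)
random_tensor.ndim           # 2

torch.rand(10, 10).ndim      # still 2
torch.rand(1, 10, 10).ndim   # 3 - the leading 1 adds a dimension
torch.rand(10, 10, 10).ndim  # 3 - 10*10*10 = 1000 elements in total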
1064
01:42:26,560 --> 01:42:31,840
And let's get a bit specific because right now this is just a random tensor of whatever dimension.
1065
01:42:31,840 --> 01:42:44,160
How about we create a random tensor with similar shape to an image tensor. So a lot of the time
1066
01:42:44,160 --> 01:42:50,480
when we turn images, image size tensor, when we turn images into tensors, they're going to have,
1067
01:42:51,120 --> 01:42:58,800
let me just write it in code for you first, size equals a height, a width, and a number of color
1068
01:42:58,800 --> 01:43:06,640
channels. And so in this case, it's going to be height, width, color channels. And the color channels
1069
01:43:06,640 --> 01:43:15,520
are red, green, blue. And so let's create a random image tensor. Let's view the size of it or the
1070
01:43:15,520 --> 01:43:28,400
shape. And then random image size tensor will view the end dim. Beautiful. Okay, so we've got
1071
01:43:28,400 --> 01:43:36,080
torch size, the same size, 224, 224, 3, height, width, color channels. And we've got
1072
01:43:36,080 --> 01:43:42,080
three dimensions, one each for height, width, color channels. Let's go and see an example of this. This
1073
01:43:42,080 --> 01:43:50,640
is the PyTorch Fundamentals notebook. If we go up to here, so say we wanted to encode this image
1074
01:43:50,640 --> 01:43:56,240
of my dad eating pizza with thumbs up of a square image of 224 by 224.
1075
01:43:56,240 --> 01:44:02,560
This is an input. And if we wanted to encode this into tensor format, well, one of the ways of
1076
01:44:02,560 --> 01:44:07,360
representing an image tensor, a very common way, is to split it into color channels because with
1077
01:44:07,360 --> 01:44:12,960
red, green, and blue, you can create almost any color you want. And then we have a tensor
1078
01:44:12,960 --> 01:44:17,600
representation. So sometimes you're going to see color channels come first. We can switch this
1079
01:44:17,600 --> 01:44:24,000
around in our code quite easily by going color channels here. But you'll also see color channels
1080
01:44:24,000 --> 01:44:28,880
come at the end. I know I'm saying a lot that we kind of haven't covered yet. The main takeaway
1081
01:44:28,880 --> 01:44:35,760
from here is that almost any data can be represented as a tensor. And one of the common ways to represent
1082
01:44:35,760 --> 01:44:42,720
images is in the format color channels, height, width, and how these values are will depend on
1083
01:44:42,720 --> 01:44:48,960
what's in the image. But we've done this in a random way. So the takeaway from this video is
1084
01:44:48,960 --> 01:44:55,680
that PyTorch enables you to create tensors quite easily with the random method. However, it is
1085
01:44:55,680 --> 01:45:02,000
going to do a lot of this creating tensors for you behind the scenes. And why is a random tensor so
1086
01:45:02,000 --> 01:45:08,800
valuable? Because neural networks start with random numbers, look at data such as image tensors,
1087
01:45:08,800 --> 01:45:15,200
and then adjust those random numbers to better represent that data. And they repeat those steps
1088
01:45:15,200 --> 01:45:20,880
onwards and onwards and onwards. Let's finish this video here. I'm going to leave a challenge for you
1089
01:45:20,880 --> 01:45:26,320
just to create your own random tensor of whatever size and shape you want. So you could have 5, 10,
1090
01:45:26,320 --> 01:45:30,640
10 here and see what that looks like. And then we'll keep coding in the next video.
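A minimal sketch of the image-shaped random tensor from this video (224 by 224 with three colour channels, as in the narration):

import torch

random_image_size_tensor = torch.rand(size=(224, 224, 3))  # height, width, colour channels
random_image_size_tensor.shape  # torch.Size([224, 224, 3])
random_image_size_tensor.ndim   # 3

# you'll also see the channels-first layout: (colour channels, height, width)
channels_first = torch.rand(size=(3, 224, 224))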
1091
01:45:33,200 --> 01:45:37,920
I hope you took on the challenge of creating a random tensor of your own size. And just a little
1092
01:45:37,920 --> 01:45:42,720
tidbit here. You might have seen me in the previous video. I didn't use the size parameter. But in
1093
01:45:42,720 --> 01:45:49,280
this case, I did here, you can go either way. So if we go torch dot rand size equals, we put in a
1094
01:45:49,280 --> 01:45:55,680
tuple here of three three, we've got that tensor there three three. But then also if we don't put
1095
01:45:55,680 --> 01:46:01,360
the size in there, it's the default. So it's going to create a very similar tensor. So whether you
1096
01:46:01,360 --> 01:46:07,360
have this size or not, it's going to have quite a similar output depending on the shape that you
1097
01:46:07,360 --> 01:46:14,480
put in there. But now let's get started to another kind of tensor that you might see zeros and ones.
1098
01:46:16,240 --> 01:46:21,920
So say you wanted to create a tensor, but that wasn't just full of random numbers,
1099
01:46:21,920 --> 01:46:30,400
you wanted to create a tensor of all zeros. This is helpful for if you're creating some form of
1100
01:46:30,400 --> 01:46:39,520
mask. Now, we haven't covered what a mask is. But essentially, if we create a tensor of all zeros,
1101
01:46:41,280 --> 01:46:48,320
what happens when you multiply a number by zero? All zeros. So if we wanted to multiply
1102
01:46:48,320 --> 01:46:53,520
these two together, let's do zeros times random tensor.
1103
01:46:53,520 --> 01:47:04,160
There we go, all zeros. So maybe if you're working with this random tensor and you wanted to mask
1104
01:47:04,160 --> 01:47:10,160
out, say all of the numbers in this column for some reason, you could create a tensor of zeros in
1105
01:47:10,160 --> 01:47:15,680
that column, multiply it by your target tensor, and you would zero all those numbers. That's telling
1106
01:47:15,680 --> 01:47:20,480
your model, hey, ignore all of the numbers that are in here because I've zeroed them out. And then
1107
01:47:20,480 --> 01:47:28,960
if you wanted to create a tensor of all ones, create a tensor of all ones, we can go ones equals
1108
01:47:28,960 --> 01:47:37,440
torch dot ones, size equals three, four. And then if we have a look, there's another parameter I
1109
01:47:37,440 --> 01:47:43,920
haven't showed you yet, but this is another important one is the D type. So the default data type,
1110
01:47:43,920 --> 01:47:49,520
so that's what D type stands for, is torch dot float. We've actually been using torch dot float
1111
01:47:49,520 --> 01:47:54,240
the whole time, because that's whenever you create a tensor with pytorch, we're using a pytorch
1112
01:47:54,240 --> 01:47:58,720
method, unless you explicitly define what the data type is, we'll see that later on, defining
1113
01:47:58,720 --> 01:48:06,160
what the data type is, it starts off as torch float 32. So these are float numbers. So that
1114
01:48:06,160 --> 01:48:13,440
is how you create zeros and ones zeros is probably I've seen more common than ones in use, but just
1115
01:48:13,440 --> 01:48:17,760
keep these in mind, you might come across them. There are lots of different methods to creating
1116
01:48:17,760 --> 01:48:25,200
tensors. And truth be told, like random is probably one of the most common, but you might see zeros
1117
01:48:25,200 --> 01:48:31,200
and ones out in the field. So now we've covered that. Let's move on into the next video, where
1118
01:48:31,200 --> 01:48:36,080
we're going to create a range. So have a go at creating a tensor full of zeros and whatever size
1119
01:48:36,080 --> 01:48:40,240
you want, and a tensor full of ones and whatever size you want. And I'll see you in the next video.
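A minimal sketch of the zeros, ones and masking idea from this video:

import torch

zeros = torch.zeros(size=(3, 4))
ones = torch.ones(size=(3, 4))
ones.dtype  # torch.float32 - the default data type

# multiplying by zeros "masks out" every element of another tensor
random_tensor = torch.rand(3, 4)
zeros * random_tensor  # all zeros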
1120
01:48:42,480 --> 01:48:47,120
Welcome back. I hope you took on the challenge of creating a torch tensor of zeros of your
1121
01:48:47,120 --> 01:48:55,840
own size and ones of your own size. But now let's investigate how we might create a range of
1122
01:48:55,840 --> 01:49:04,400
tensors and tensors like. So these are two other very common methods of creating tensors.
1123
01:49:05,120 --> 01:49:12,480
So let's start by creating a range. So we'll first use torch dot range, because depending on
1124
01:49:12,480 --> 01:49:19,120
when you're watching this video, torch dot range may be still in play or it may be deprecated.
1125
01:49:19,680 --> 01:49:24,880
If we write in torch dot range right now with the PyTorch version that I'm using, which is
1126
01:49:25,440 --> 01:49:33,920
torch dot version, which is torch, or PyTorch, 1.10.0. Torch dot range is deprecated and
1127
01:49:33,920 --> 01:49:37,920
will be removed in a future release. So just keep that in mind. If you come across some code that's
1128
01:49:37,920 --> 01:49:44,240
using torch dot range, it may be out of whack. So the way to get around that, to fix that, is to use
1129
01:49:44,240 --> 01:49:52,240
arange instead. And if we just write in torch dot arange, we've got a tensor of zero to nine,
1130
01:49:52,240 --> 01:49:57,200
because it of course starts at zero index. If we wanted one to 10, we could go like this.
1131
01:49:57,200 --> 01:50:07,680
1, 2, 3, 4, 5, 6, 7, 8, 9, 10. And we can go, or we'll go, one to ten equals torch dot arange.
1132
01:50:10,960 --> 01:50:17,280
Wonderful. And we can also define the step. So let's let's type in some start and where can we
1133
01:50:17,280 --> 01:50:22,000
find the documentation on arange? Sometimes in Google Colab, you can press shift tab,
1134
01:50:22,000 --> 01:50:28,800
but I find that it doesn't always work for me. Yeah, you could hover over it, but we can also just
1135
01:50:28,800 --> 01:50:36,640
go torch dot arange and look for the documentation, torch dot arange. So we've got start, end and step. Let's
1136
01:50:36,640 --> 01:50:44,560
see what all of these three do. Maybe we start at zero, and maybe we want it to go to a thousand,
1137
01:50:44,560 --> 01:50:53,120
and then we want a step of what should our step be? What's a fun number? 77. So it's not one to 10
1138
01:50:53,120 --> 01:51:01,200
anymore, but here we go. We've got start at zero, 77 plus 77 plus 77, all the way up to it finishes
1139
01:51:01,200 --> 01:51:09,840
just below a thousand. So if we wanted to take it back to one to 10, we can go up here, 1, 10, and the default
1140
01:51:09,840 --> 01:51:16,560
step is going to be one. Oops, we need the end to be one higher, because it's going to finish at end minus one.
1141
01:51:17,200 --> 01:51:27,280
There we go. Beautiful. Now we can also create tensors like. So creating tensors like. So tensors
1142
01:51:27,280 --> 01:51:32,720
like is say you had a particular shape of a tensor you wanted to replicate somewhere else, but you
1143
01:51:32,720 --> 01:51:38,560
didn't want to explicitly define what that shape should be. So what's the shape of one to 10?
1144
01:51:42,160 --> 01:51:47,840
One to 10. Now if we wanted to create a tensor full of zeros that had the same shape as this,
1145
01:51:47,840 --> 01:51:57,120
we can use tensor like or zeros like. So ten zeros, ten_zeros equals, I'm not even sure if I'm
1146
01:51:57,120 --> 01:52:03,280
spelling zeros right then, zeros. Well, I might have a typo spelling zeros here, but you get what
1147
01:52:03,280 --> 01:52:10,800
I'm saying is torch zeros. Oh, torch spells it like that. That's why I'm spelling it like that.
1148
01:52:10,800 --> 01:52:19,040
Zeros like one to ten. And then the input is going to be one to ten. And we have a look at ten_zeros.
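A minimal sketch of the range and zeros-like code being written here (keyword arguments spelled out for clarity):

import torch

one_to_ten = torch.arange(start=1, end=11, step=1)  # tensor([1, 2, ..., 10])
torch.arange(start=0, end=1000, step=77)            # 0, 77, 154, ... up to 924

ten_zeros = torch.zeros_like(input=one_to_ten)      # zeros in the same shape as one_to_ten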
1149
01:52:19,040 --> 01:52:28,080
My goodness, this is taking quite the while to run. This is troubleshooting on the fly.
1150
01:52:28,080 --> 01:52:33,360
If something's happening like this, you can try to stop. If something was happening like that,
1151
01:52:33,360 --> 01:52:38,800
you can click run and then stop. Well, it's running so fast that I can't click stop. If you do also
1152
01:52:38,800 --> 01:52:43,760
run into trouble, you can go runtime, restart runtime. We might just do that now just to show you.
1153
01:52:44,320 --> 01:52:48,960
Restart and run all is going to restart the compute engine behind the Colab notebook.
1154
01:52:48,960 --> 01:52:53,920
And run all the cells to where we are. So let's just see that we restart and run runtime. If you're
1155
01:52:53,920 --> 01:53:00,640
getting errors, sometimes this helps. There is no set in stone way to troubleshoot errors. It's
1156
01:53:00,640 --> 01:53:06,880
guess and check with this. So there we go. We've created 10 zeros, which is torch zeros like
1157
01:53:07,760 --> 01:53:14,320
our one to 10 tensor. So we've got zeros in the same shape as one to 10. So if you'd like to create
1158
01:53:14,320 --> 01:53:26,400
tensors, and you use torch dot range and get a deprecated message, use torch dot arange instead for creating
1159
01:53:26,400 --> 01:53:31,680
a range of tensors with a start, an end and a step. And then if you wanted to create tensors
1160
01:53:31,680 --> 01:53:38,080
or a tensor like something else, you want to look for the like method. And then you put an input,
1161
01:53:38,080 --> 01:53:43,200
which is another tensor. And then it'll create a similar tensor with whatever this method here
1162
01:53:43,200 --> 01:53:49,760
is like in that fashion or in the same shape as your input. So with that being said,
1163
01:53:49,760 --> 01:53:54,880
give that a try, create a range of tensors, and then try to replicate that range shape that you've
1164
01:53:54,880 --> 01:54:04,400
made with zeros. I'll see you in the next video. Welcome back. Let's now get into a very important
1165
01:54:04,400 --> 01:54:12,160
topic of tensor data types. So we've briefly hinted at this before. So let's create
1166
01:54:12,160 --> 01:54:20,720
a tensor to begin with, a float 32 tensor. And we're going to go float 32 tensor equals torch
1167
01:54:20,720 --> 01:54:29,680
dot tensor. And let's just put in the numbers three, six, nine. If you've ever played need for
1168
01:54:29,680 --> 01:54:34,800
speed underground, you'll know where three, six, nine comes from. And then we're going to go
1169
01:54:34,800 --> 01:54:45,040
D type equals, let's just put none and see what happens, hey, float 32 tensor. Oh, what is the
1170
01:54:45,040 --> 01:54:54,640
data type? float 32, tensor dot D type. float 32, even though we put none, this is because
1171
01:54:54,640 --> 01:55:00,640
the default data type in pytorch, even if it's specified as none is going to come out as float 32.
1172
01:55:00,640 --> 01:55:06,480
What if we wanted to change that to something else? Well, let's type in here float 16.
1173
01:55:07,920 --> 01:55:14,480
And now we've got float 32 tensor. This variable name is a lie now because it's a float 16 tensor.
1174
01:55:14,480 --> 01:55:20,000
So we'll leave that as none. Let's go there. There's another parameter when creating tensors.
1175
01:55:20,000 --> 01:55:26,000
It's very important, which is device. So we'll see what that is later on. And then there's a
1176
01:55:26,000 --> 01:55:32,400
final one, which is also very important, which is requires grad equals false. Now this could be
1177
01:55:32,400 --> 01:55:38,080
true, of course, we're going to set this as false. So these are three of the most important parameters
1178
01:55:38,080 --> 01:55:43,920
when you're creating tensors. Now, again, you won't necessarily always have to enter these when
1179
01:55:43,920 --> 01:55:49,040
you're creating tensors, because pytorch does a lot of tensor creation behind the scenes for you.
1180
01:55:49,040 --> 01:55:58,080
So let's just write out what these are. Data type is what data type is the tensor, e.g. float 32,
1181
01:55:58,080 --> 01:56:04,400
or float 16. Now, if you'd like to look at what data types are available for pytorch tensors,
1182
01:56:04,400 --> 01:56:11,280
we can go torch tensor and right up the top, unless the documentation changes, we have data types.
1183
01:56:11,280 --> 01:56:17,360
It's so important that data types is the first thing that comes up when you're creating a tensor.
1184
01:56:17,360 --> 01:56:24,000
So we have 32-bit floating point, 64-bit floating point, two kinds of 16-bit floating point, 32-bit complex. Now,
1185
01:56:24,000 --> 01:56:30,640
the most common ones that you will likely interact with are 32-bit floating point and 16-bit floating
1186
01:56:30,640 --> 01:56:36,000
point. Now, what does this mean? What do these numbers actually mean? Well, they have to do with
1187
01:56:36,000 --> 01:56:44,000
precision in computing. So let's look up that. Precision in computing. Precision computer science.
1188
01:56:44,000 --> 01:56:50,000
So in computer science, the precision of a numerical quantity, we're dealing with numbers, right?
1189
01:56:50,000 --> 01:56:55,280
As a measure of the detail in which the quantity is expressed. This is usually measured in bits,
1190
01:56:55,280 --> 01:57:01,280
but sometimes in decimal digits. It is related to precision in mathematics, which describes the
1191
01:57:01,280 --> 01:57:08,320
number of digits that are used to express a value. So, for us, precision is the numerical quantity,
1192
01:57:08,320 --> 01:57:14,560
is a measure of the detail, how much detail in which the quantity is expressed. So, I'm not going
1193
01:57:14,560 --> 01:57:19,600
to dive into the background of computer science and how computers represent numbers. The important
1194
01:57:19,600 --> 01:57:25,280
takeaway for you from this will be that single precision floating point is usually called float
1195
01:57:25,280 --> 01:57:33,280
32, which means, yeah, a number contains 32 bits in computer memory. So if you imagine, if we have
1196
01:57:33,280 --> 01:57:39,680
a tensor that is using 32 bit floating point, the computer memory stores the number as 32 bits.
1197
01:57:40,240 --> 01:57:46,880
Or if it has 16 bit floating point, it stores it as 16 bits or 16 numbers representing or 16.
1198
01:57:46,880 --> 01:57:52,480
I'm not sure if a bit equates to a single number in computer memory. But what this means is that
1199
01:57:52,480 --> 01:57:59,680
a 32 bit tensor is single precision, and 16 bit is half precision. Now, the default is
1200
01:57:59,680 --> 01:58:05,520
32, float 32, torch dot float 32, as we've seen in code, which means it's going to take up
1201
01:58:05,520 --> 01:58:10,560
a certain amount of space in computer memory. Now, you might be thinking, why would I do anything
1202
01:58:10,560 --> 01:58:16,880
other than the default? Well, if you'd like to sacrifice some detail in how your number is
1203
01:58:16,880 --> 01:58:25,840
represented. So instead of 32 bits, it's represented by 16 bits, you can calculate faster on numbers
1204
01:58:25,840 --> 01:58:32,720
that take up less memory. So that is the main differentiator between 32 bit and 16 bit. But if
1205
01:58:32,720 --> 01:58:38,720
you need more precision, you might go up to 64 bit. So just keep that in mind as you go forward.
1206
01:58:38,720 --> 01:58:45,120
Single precision is 32. Half precision is 16. What do these numbers represent? They represent
1207
01:58:45,120 --> 01:58:53,360
how much detail a single number is stored in memory. That was a lot to take in. But we're talking
1208
01:58:53,360 --> 01:58:58,640
about tensor data types. I'm spending a lot of time here, because I'm going to put a note here,
1209
01:58:58,640 --> 01:59:11,360
note, tensor data types is one of the three big issues with pytorch and deep learning or
1210
01:59:11,360 --> 01:59:17,200
rather, not issues, they're going to be errors that you run into in deep learning. Three big
1211
01:59:17,200 --> 01:59:29,840
errors, you'll run into with pytorch and deep learning. So one is tensors, not right data type.
1212
01:59:29,840 --> 01:59:39,360
Two, tensors not right shape, we've seen a few shapes. And three, tensors not on the right
1213
01:59:39,360 --> 01:59:48,080
device. And so in this case, if we had a tensor that was float 16 and we were trying to do computations
1214
01:59:48,080 --> 01:59:53,040
with a tensor that was float 32, we might run into some errors. And so that's the tensors not
1215
01:59:53,040 --> 01:59:58,400
being in the right data type. So it's important to know about the D type parameter here. And then
1216
01:59:58,400 --> 02:00:03,600
tensors not being the right shape. Well, that's once we get onto matrix multiplication, we'll see
1217
02:00:03,600 --> 02:00:08,320
that if one tensor is a certain shape and another tensor is another shape and those shapes don't
1218
02:00:08,320 --> 02:00:13,360
line up, we're going to run into shape errors. And this is a perfect segue to the device.
1219
02:00:13,360 --> 02:00:19,840
Device equals none. By default, this is going to be CPU. This is why we are using Google Colab
1220
02:00:19,840 --> 02:00:25,440
because it enables us to have access to, oh, we don't want to restart, enables us to have access
1221
02:00:25,440 --> 02:00:32,480
to a GPU. As I've said before, a GPU enables us to do faster computations. So we could change this to CUDA. That would be,
1222
02:00:32,480 --> 02:00:39,760
we'll see how to write device agnostic code later on. But this device, if you try to do
1223
02:00:39,760 --> 02:00:46,240
operations between two tensors that are not on the same device. So for example, you have one tensor
1224
02:00:46,240 --> 02:00:51,040
that lives on a GPU for fast computing, and you have another tensor that lives on a CPU and you
1225
02:00:51,040 --> 02:00:56,400
try to do something with them, while pytorch is going to throw you an error. And then finally,
1226
02:00:56,400 --> 02:01:01,520
this last one, requires grad, is if you want PyTorch to track the gradients, we haven't covered
1227
02:01:01,520 --> 02:01:06,640
what that is of a tensor when it goes through certain numerical calculations. This is a bit of
1228
02:01:06,640 --> 02:01:12,960
a bombardment, but I thought I'd throw these in as important parameters to be aware of since
1229
02:01:12,960 --> 02:01:17,760
we're discussing data type. And really, it would be remiss of me to discuss data type without
1230
02:01:17,760 --> 02:01:24,960
discussing not the right shape or not the right device. So with that being said, let's write down
1231
02:01:24,960 --> 02:01:36,960
here what device is your tensor on, and whether or not to track gradients with this tensor's
1232
02:01:36,960 --> 02:01:44,880
operations. So we have a float 32 tensor. Now, how might we change the tensor data type of this?
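A minimal sketch of the three creation parameters just discussed (all shown with their defaults):

import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None,           # defaults to torch.float32
                               device=None,          # defaults to the CPU
                               requires_grad=False)  # whether to track gradients
float_32_tensor.dtype  # torch.float32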
1233
02:01:44,880 --> 02:01:52,000
Let's create float 16 tensor. And we saw that we could explicitly write in float 16 tensor.
1234
02:01:52,000 --> 02:02:00,560
Or we can just type in here, float 16 tensor equals float 32 tensor dot type. And we're going to type
1235
02:02:00,560 --> 02:02:07,920
in torch dot float 16, why float 16, because well, that's how we define float 16, or we could use
1236
02:02:07,920 --> 02:02:14,560
half. So the same thing, these things are the same, let's just do half, or float 16 is more
1237
02:02:14,560 --> 02:02:25,600
explicit for me. And then let's check out float 16 tensor. Beautiful, we've converted our float
1238
02:02:25,600 --> 02:02:31,600
32 tensor into float 16. So that is one of the ways that you'll be able to tackle the tensors
1239
02:02:31,600 --> 02:02:37,280
not in the right data type issue that you run into. And just a little note on the precision
1240
02:02:37,280 --> 02:02:43,520
and computing, if you'd like to read more on that, I'm going to link this in here. And this is all
1241
02:02:43,520 --> 02:02:52,640
about how computers store numbers. So precision in computing. There we go. I'll just get rid of that.
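A minimal sketch of the data type conversion shown above:

import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0])
float_16_tensor = float_32_tensor.type(torch.float16)  # torch.half is the same thing
float_16_tensor.dtype  # torch.float16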
1242
02:02:53,520 --> 02:02:59,520
Wonderful. So give that a try, create some tensors, research, or go to the documentation of torch
1243
02:02:59,520 --> 02:03:04,640
dot tensor and see if you can find out a little bit more about D type device and requires grad,
1244
02:03:04,640 --> 02:03:09,920
and create some tensors of different data types. Play around with whichever ones you want here,
1245
02:03:09,920 --> 02:03:14,960
and see if you can run into some errors, maybe try to multiply two tensors together. So if you go
1246
02:03:15,760 --> 02:03:23,680
float 16 tensor times float 32 tensor, give that a try and see what happens. I'll see you in the next
1247
02:03:23,680 --> 02:03:30,560
video. Welcome back. In the last video, we covered a little bit about tensor data types,
1248
02:03:30,560 --> 02:03:36,080
as well as some of the most common parameters you'll see past to the torch dot tensor method.
1249
02:03:36,080 --> 02:03:40,480
And I hope you did the challenge at the end of the last video to create some of your own tensors
1250
02:03:40,480 --> 02:03:45,200
of different data types, and then to see what happens when you multiply a float 16 tensor by a
1251
02:03:45,200 --> 02:03:52,800
float 32 tensor. Oh, it works. But you might be like, Daniel, you said that we're going to have tensors
1252
02:03:52,800 --> 02:03:58,880
not the right data type. Well, this is another kind of gotcha or caveat of PyTorch and deep
1253
02:03:58,880 --> 02:04:03,760
learning in general, is that sometimes you'll find that even if you think something may error
1254
02:04:03,760 --> 02:04:08,800
because these two tensors are different data types, it actually results in no error. But then
1255
02:04:08,800 --> 02:04:13,200
sometimes you'll have other operations that you do, especially training large neural networks,
1256
02:04:13,200 --> 02:04:18,400
where you'll get data type issues. The important thing is to just be aware of the fact that some
1257
02:04:18,400 --> 02:04:23,520
operations will run an error when your tensors are not in the right data type. So let's try another
1258
02:04:23,520 --> 02:04:32,960
type. Maybe we try a 32 bit integer. So torch dot int 32. And we try to multiply that by a float.
1259
02:04:32,960 --> 02:04:45,200
Wonder what will happen then? So let's go int 32, int 32 tensor equals torch dot tensor. And we'll
1260
02:04:45,200 --> 02:04:52,720
just make it three. Notice that there's no floats there or no dot points to make it a float.
1261
02:04:52,720 --> 02:05:04,320
Three, six, nine and D type can be torch dot int 32. And then int 32 tensor, what does this look like?
1262
02:05:04,320 --> 02:05:12,080
Typo, of course, one of many. Int 32 tensor. So now let's go float 32 tensor and see what happens.
1263
02:05:12,080 --> 02:05:17,120
Can we get PyTorch to throw an error? Times the int 32 tensor?
1264
02:05:17,120 --> 02:05:26,080
Huh, it worked as well. Or maybe we go int 64. What happens here?
1265
02:05:28,000 --> 02:05:34,640
Still works. Now, see, this is again one of the confusing parts of doing tensor operations.
1266
02:05:34,640 --> 02:05:40,160
What if we do a long tensor? Torch dot long. Is this going to still work?
1267
02:05:41,760 --> 02:05:45,520
Ah, torch has no attribute called long. That's not a data type issue.
1268
02:05:45,520 --> 02:05:57,520
I think it's long tensor. Long tensor. Does this work? D type must be torch D type.
1269
02:05:58,560 --> 02:06:04,000
Torch long tensor. I could have sworn that this was torch dot tensor.
1270
02:06:08,000 --> 02:06:12,960
Oh, there we go. Torch dot long tensor. That's another word for 64 bit.
1271
02:06:12,960 --> 02:06:19,200
So what is this saying? CPU tensor. Okay, let's see. This is some troubleshooting on the fly here.
1272
02:06:23,040 --> 02:06:29,360
Then we multiply it. This is a float 32 times a long. It works. Okay, so it's actually a bit
1273
02:06:29,360 --> 02:06:33,360
more robust than what I thought it was. But just keep this in mind when we're training models,
1274
02:06:33,360 --> 02:06:36,320
we're probably going to run into some errors at some point of our tensor's not being the
1275
02:06:36,320 --> 02:06:40,480
right data type. And if pie torch throws us an error saying your tensors are in the wrong data
1276
02:06:40,480 --> 02:06:46,960
type, well, at least we know now how to change that data type or how to set the data type if we
1277
02:06:46,960 --> 02:06:53,920
need to. And so with that being said, let's just formalize what we've been doing a fair bit already.
1278
02:06:53,920 --> 02:06:59,600
And that's getting information from tensors. So the three big things that we'll want to get
1279
02:06:59,600 --> 02:07:04,320
from our tensors in line with the three big errors that we're going to face in neural networks and
1280
02:07:04,320 --> 02:07:13,680
deep learning is, let's copy these down. Just going to get this, copy this down below. So if we want
1281
02:07:13,680 --> 02:07:18,400
to get some information from tensors, how do we check the shape? How do we check the data type?
1282
02:07:18,400 --> 02:07:24,320
How do we check the device? Let's write that down. So to get information from this, to get
1283
02:07:24,320 --> 02:07:39,680
D type or let's write data type from a tensor can use tensor dot D type. And let's go here to get
1284
02:07:39,680 --> 02:07:52,320
shape from a tensor can use tensor dot shape. And to get device from a tensor, which devices it on
1285
02:07:52,320 --> 02:08:02,720
CPU or GPU can use tensor dot device. Let's see these three in action. So if we run into one of
1286
02:08:02,720 --> 02:08:07,760
the three big problems in deep learning and neural networks in general, especially with PyTorch,
1287
02:08:07,760 --> 02:08:12,080
tensor's not the right data type, tensor's not the right shape or tensor's not on the right device.
1288
02:08:12,640 --> 02:08:19,440
Let's create a tensor and try these three out. We've got some tensor equals torch dot
1289
02:08:19,440 --> 02:08:23,440
rand and we'll create it a three four. Let's have a look at what it looks like.
1290
02:08:25,600 --> 02:08:31,440
There we go. Random numbers of shape three and four. Now let's find out some details about it.
1291
02:08:32,960 --> 02:08:39,840
Find out details about some tensor. So print or print some tensor.
1292
02:08:39,840 --> 02:08:49,360
And oops, didn't want that print. And let's format it or make an F string of shape of tensor.
1293
02:08:50,640 --> 02:08:52,720
Oh, let's do data type first. We'll follow that order.
1294
02:08:56,560 --> 02:09:02,000
Data type of tensor. And we're going to go, how do we do this? Some tensor dot what?
1295
02:09:02,000 --> 02:09:11,360
Dot d type. Beautiful. And then we're going to print tensors not in the right shape. So let's go
1296
02:09:13,120 --> 02:09:22,400
shape of tensor equals some tensor dot shape. Oh, I went a bit too fast, but we could also use
1297
02:09:22,400 --> 02:09:28,400
size. Let's just confirm that actually. We'll code that out together. From my experience,
1298
02:09:28,400 --> 02:09:40,640
some tensor dot size, and some tensor dot shape result in the same thing. Is that true? Oh, function.
1299
02:09:40,640 --> 02:09:46,320
Oh, that's what it is. Some tensor dot size is a function, not an attribute.
1300
02:09:49,040 --> 02:09:54,480
There we go. Which one should you use? For me, I'm probably more used to using shape. You may come
1301
02:09:54,480 --> 02:09:59,680
across dot size as well, but just realize that they do quite the same thing except one's a function
1302
02:09:59,680 --> 02:10:06,720
and one's an attribute. An attribute is written dot shape without the curly brackets. A function
1303
02:10:06,720 --> 02:10:12,560
or a method is with the brackets at the end. So that's the difference between these are attributes
1304
02:10:12,560 --> 02:10:18,400
here. D type size. We're going to change this to shape. Tensor attributes. This is what we're
1305
02:10:18,400 --> 02:10:27,200
getting. I should probably write that down. This is tensor attributes. That's the formal name for
1306
02:10:27,200 --> 02:10:31,760
these things. And then finally, what else do we want? Tensors, what device are we looking for?
1307
02:10:32,640 --> 02:10:41,200
Let's get rid of this, get rid of this. And then print f device tensor is on. By default,
1308
02:10:41,200 --> 02:10:52,560
our tensor is on the CPU. So some tensor dot device. There we go. So now we've got our tensor
1309
02:10:52,560 --> 02:10:57,760
here, some tensor. The data type is a torch float 32 because we didn't change it to anything else.
1310
02:10:57,760 --> 02:11:02,000
And torch float 32 is the default. The shape is three four, which makes a lot of sense because
1311
02:11:02,000 --> 02:11:07,360
we passed in three four here. And the device tensor is on is the CPU, which is, of course,
1312
02:11:07,360 --> 02:11:12,960
the default, unless we explicitly say to put it on another device, all of the tensors that we
1313
02:11:12,960 --> 02:11:18,720
create will default to being on the CPU, rather than the GPU. And we'll see later on how to put
1314
02:11:18,720 --> 02:11:25,040
tensors and other things in torch onto a GPU. But with that being said, give it a shot,
1315
02:11:25,040 --> 02:11:29,040
create your own tensor, get some information from that tensor, and see if you can change
1316
02:11:29,040 --> 02:11:36,640
these around. So see if you could create a random tensor, but instead of float 32, it's a float 16.
1317
02:11:36,640 --> 02:11:41,920
And then probably another extracurricular, we haven't covered this yet. But see how to change
1318
02:11:41,920 --> 02:11:47,920
the device a pytorch tensor is on. Give that a crack. And I'll see you in the next video.
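A minimal sketch of the attribute checks from this video:

import torch

some_tensor = torch.rand(3, 4)
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Shape of tensor: {some_tensor.shape}")        # some_tensor.size() gives the same result
print(f"Device tensor is on: {some_tensor.device}")   # cpu by default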
1319
02:11:49,840 --> 02:11:55,360
Welcome back. So in the last video, we had a look at a few tensor attributes, namely the data
1320
02:11:55,360 --> 02:12:01,120
type of a tensor, the shape of a tensor, and the device that a tensor lives on. And I alluded to
1321
02:12:01,120 --> 02:12:08,160
the fact that these will help resolve three of the most common issues in building neural networks,
1322
02:12:08,160 --> 02:12:13,360
deep learning models, specifically with pytorch. So tensor has not been the right data type,
1323
02:12:13,360 --> 02:12:19,360
tensor has not been the right shape, and tensor has not been on the right device. So now let's
1324
02:12:19,360 --> 02:12:26,000
get into manipulating tensors. And what I mean by that, so let's just write here the title,
1325
02:12:26,000 --> 02:12:32,560
manipulating tensors. And this is going to be tensor operations. So when we're building neural
1326
02:12:32,560 --> 02:12:39,120
networks, neural networks are comprised of lots of mathematical functions that pytorch code is going
1327
02:12:39,120 --> 02:12:51,280
to run behind the scenes for us. So let's go here, tensor operations include addition,
1328
02:12:51,280 --> 02:13:02,560
subtraction, and these are the regular addition, subtraction, multiplication. There's two types
1329
02:13:02,560 --> 02:13:08,960
of multiplication that you'll typically see referenced in deep learning and neural networks,
1330
02:13:09,600 --> 02:13:17,920
division, and matrix multiplication. And these, the ones here, so addition, subtraction,
1331
02:13:17,920 --> 02:13:25,600
multiplication, division, are your typical operations that you're probably familiar with.
1332
02:13:25,600 --> 02:13:29,680
The only different one here is matrix multiplication. We're going to have a look at that in a minute.
1333
02:13:30,320 --> 02:13:37,520
But to find patterns in numbers of a data set, a neural network will combine these functions
1334
02:13:37,520 --> 02:13:43,040
in some way, shape or form. So it takes a tensor full of random numbers, performs some kind of
1335
02:13:43,040 --> 02:13:48,400
combination of addition, subtraction, multiplication, division, matrix multiplication. It doesn't have
1336
02:13:48,400 --> 02:13:53,520
to be all of these. It could be any combination of these to manipulate these numbers in some way
1337
02:13:53,520 --> 02:13:59,040
to represent a data set. So that's how a neural network learns is it will just comprise these
1338
02:13:59,040 --> 02:14:05,920
functions, look at some data to adjust the numbers of a random tensor, and then go from there. But
1339
02:14:05,920 --> 02:14:11,040
with that being said, let's look at a few of these. So we'll begin with addition. First thing we need
1340
02:14:11,040 --> 02:14:22,720
to do is create a tensor. And to add something to a tensor, we'll just go torch tensor. Let's go one,
1341
02:14:22,720 --> 02:14:31,680
two, three, add something to a tensor is tensor plus, we can use plus as the addition operator,
1342
02:14:31,680 --> 02:14:42,000
just like in Python, tensor plus 10 is going to be tensor 11, 12, 13, tensor plus 100 is going to be
1343
02:14:42,000 --> 02:14:49,200
as you'd expect plus 100. Let's leave that as plus 10 and add 10 to it. And so you might be
1344
02:14:49,200 --> 02:14:59,920
able to guess how we would multiply it by 10. So let's go multiply tensor by 10. We can go tensor,
1345
02:14:59,920 --> 02:15:08,800
star, which on my keyboard is shift eight, then 10. We get 10, 20, 30. And because we didn't reassign it,
1346
02:15:10,800 --> 02:15:19,040
our tensor is still 1, 2, 3. So if we go, if we reassign it here, tensor equals tensor times 10,
1347
02:15:19,040 --> 02:15:26,080
and then check out tensor, we've now got 10, 20, 30. And the same thing here, we'll have 10, 20, 30. But
1348
02:15:26,080 --> 02:15:36,240
then if we go back from the top, if we delete this reassignment, oh, what do we get there, tensor
1349
02:15:36,240 --> 02:15:47,680
times 10. Oh, what's happened here? Oh, because we've got... yeah, okay, I see, tensor times 10, and tensor is
1350
02:15:47,680 --> 02:15:55,760
still 1, 2, 3. What should we try now? How about subtract? Subtract 10 equals tensor minus 10.
1351
02:15:58,880 --> 02:16:04,880
And you can also use... well, there we go: one minus 10, two minus 10, three minus 10.
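Here's a quick sketch of those operations in one place, including the torch.mul and torch.add built-ins we're about to talk about:

import torch

tensor = torch.tensor([1, 2, 3])
print(tensor + 10)            # tensor([11, 12, 13])
print(tensor * 10)            # tensor([10, 20, 30])
print(tensor - 10)            # tensor([-9, -8, -7])
print(torch.mul(tensor, 10))  # same as tensor * 10
print(torch.add(tensor, 10))  # same as tensor + 10
print(tensor)                 # still tensor([1, 2, 3]) because we never reassigned it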
1352
02:16:05,600 --> 02:16:12,640
You can also use torch's inbuilt functions. So try out the PyTorch
1353
02:16:12,640 --> 02:16:24,400
inbuilt functions. So torch dot mul is short for multiply. We can pass in our tensor here,
1354
02:16:24,400 --> 02:16:30,960
and we can add in 10. That's going to multiply each element of tensor by 10. So just taking
1355
02:16:30,960 --> 02:16:35,920
the original tensor that we created, which is 1, 2, 3. And it's performing the same thing as this.
1356
02:16:35,920 --> 02:16:42,800
I would recommend, where you can, to use the operators from Python. If for some reason you see torch
1357
02:16:42,800 --> 02:16:48,080
dot mul, maybe there's a reason for that. But generally, these are more understandable if you
1358
02:16:48,080 --> 02:16:53,200
just use the operators, if you need to do a straight up multiplication, straight up addition, or straight
1359
02:16:53,200 --> 02:17:00,560
up subtraction, because torch also has torch dot add, torch dot add, is it torch dot add? It might
1360
02:17:00,560 --> 02:17:07,280
be torch dot add. I'm not sure. Oh, there we go. Yeah, torch dot add. So as I alluded to before,
1361
02:17:07,280 --> 02:17:12,800
there's two different types of multiplication that you'll hear about element wise and matrix
1362
02:17:12,800 --> 02:17:18,000
multiplication. We're going to cover matrix multiplication in the next video. As a challenge,
1363
02:17:18,000 --> 02:17:26,320
though, I would like you to search what is matrix multiplication. And I think the first website that
1364
02:17:26,320 --> 02:17:32,400
comes up, matrix multiplication, Wikipedia, yeah, math is fun. It has a great guide. So before we
1365
02:17:32,400 --> 02:17:38,400
get into matrix multiplication, jump into math is fun to have a look at matrix multiplying,
1366
02:17:38,400 --> 02:17:43,520
and have a think about how we might be able to replicate that in PyTorch. Even if you're not
1367
02:17:43,520 --> 02:17:52,000
sure, just have a think about it. I'll see you in the next video. Welcome back. In the last video,
1368
02:17:52,000 --> 02:17:58,880
we discussed some basic tensor operations, such as addition, subtraction, multiplication,
1369
02:17:58,880 --> 02:18:04,560
element wise, division, and matrix multiplication. But we didn't actually go through what matrix
1370
02:18:04,560 --> 02:18:09,920
multiplication is. So now let's start on that more particularly discussing the difference between
1371
02:18:09,920 --> 02:18:15,680
element wise and matrix multiplication. So we'll come down here, let's write another heading,
1372
02:18:15,680 --> 02:18:23,440
matrix multiplication. So there's two ways, or two main ways. Yeah, let's write that two main
1373
02:18:23,440 --> 02:18:33,920
ways of performing multiplication in neural networks and deep learning. So one is the simple
1374
02:18:33,920 --> 02:18:42,640
version, which is what we've seen, which is element wise multiplication. And number two is matrix
1375
02:18:42,640 --> 02:18:49,600
multiplication. So matrix multiplication is actually possibly the most common tensor operation you
1376
02:18:49,600 --> 02:18:57,280
will find inside neural networks. And in the last video, I issued the extra curriculum of having a
1377
02:18:57,280 --> 02:19:04,480
look at the mathsisfun.com page for how to multiply matrices. So the first example they go
1378
02:19:04,480 --> 02:19:11,760
through is element wise multiplication, which just means multiplying each element by a specific
1379
02:19:11,760 --> 02:19:18,080
number. In this case, we have two times four equals eight, two times zero equals zero, two times one
1380
02:19:18,080 --> 02:19:23,840
equals two, two times negative nine equals negative 18. But then if we move on to matrix
1381
02:19:23,840 --> 02:19:30,160
multiplication, which is multiplying a matrix by another matrix, we need to do the dot product.
1382
02:19:30,160 --> 02:19:35,600
So that's something that you'll also hear matrix multiplication referred to as the dot product.
1383
02:19:35,600 --> 02:19:42,720
So these two are used interchangeably matrix multiplication or dot product. And if we just
1384
02:19:42,720 --> 02:19:50,960
look up the symbol for dot product, you'll find that it's just a dot. There we go, a heavy dot,
1385
02:19:50,960 --> 02:20:00,560
images. There we go, a dot B. So this is vector a dot product B. A few different options there,
1386
02:20:00,560 --> 02:20:05,440
but let's look at what it looks like in pytorch code. But first, there's a little bit of a
1387
02:20:05,440 --> 02:20:10,640
difference here. So how did we get from multiplying this matrix here of one, two, three, four, five,
1388
02:20:10,640 --> 02:20:17,680
six, times seven, eight, nine, 10, 11, 12? How did we get 58 there? Well, we start by going,
1389
02:20:17,680 --> 02:20:22,240
this is the difference between element wise and dot product, by the way, one times seven.
1390
02:20:23,120 --> 02:20:29,200
We'll record that down there. So that's seven. And then two times nine. So this is first row,
1391
02:20:29,200 --> 02:20:36,800
first column, two times nine is 18. And then three times 11 is 33. And if we add those up,
1392
02:20:36,800 --> 02:20:44,880
seven plus 18, plus 33, we get 58. And then if we were to do that for each other element that's
1393
02:20:44,880 --> 02:20:50,960
throughout these two matrices, we end up with something like this. So that's what I'd encourage
1394
02:20:50,960 --> 02:20:56,000
you to go through step by step and reproduce this a good challenge would be to reproduce this by
1395
02:20:56,000 --> 02:21:02,720
hand with PyTorch code. But now let's go back and write some PyTorch code to do both of these. So
1396
02:21:04,400 --> 02:21:12,720
I just want to link here as well, more information on multiplying matrices. So I'm going to turn
1397
02:21:12,720 --> 02:21:18,560
this into markdown. Let's first see element wise, element wise multiplication. We're going to start
1398
02:21:18,560 --> 02:21:27,440
with just a rudimentary example. So if we have our tensor, what is it at the moment? It's 1, 2, 3.
1399
02:21:27,440 --> 02:21:33,440
And then if we multiply that by itself, we get 1, 4, 9. But let's print something out so it looks a bit
1400
02:21:33,440 --> 02:21:42,640
prettier than that. So print, I'm going to turn this into a string. And then we do that. So if we
1401
02:21:42,640 --> 02:21:51,120
print tensor times tensor, element wise multiplication is going to give us print equals. And then
1402
02:21:51,120 --> 02:22:00,800
let's do in here tensor times tensor. We go like that. Wonderful. So we get one times one
1403
02:22:00,800 --> 02:22:08,560
equals one, two times two equals four, three times three equals nine. Now for matrix multiplication,
1404
02:22:08,560 --> 02:22:18,720
PyTorch stores matrix multiplication, similar to torch dot mul, in the torch dot matmul space,
1405
02:22:19,440 --> 02:22:25,360
which stands for matrix multiplication. So let's just test it out. Let's just try the exact
1406
02:22:25,360 --> 02:22:30,640
same thing that we did here, instead of element wise, we'll do matrix multiplication on our 1, 2, 3
1407
02:22:30,640 --> 02:22:41,200
tensor. What happens here? Oh my goodness, 14. Now why did we get 14 instead of 1, 4, 9? Can you guess
1408
02:22:41,200 --> 02:22:48,880
how we got to 14 or think about how we got to 14 from these numbers? So if we recall back,
1409
02:22:49,600 --> 02:22:58,160
we saw that before. We're only multiplying two smaller tensors, by the way, 1, 2, 3. This example is with
1410
02:22:58,160 --> 02:23:03,600
a larger one, but the same principle applies across different sizes of tensors or matrices.
1411
02:23:04,320 --> 02:23:09,280
And when I say matrix multiplication, you can also do matrix multiplication between tensors.
1412
02:23:10,000 --> 02:23:16,400
And in our case, we're using vectors just to add to the confusion. But what is the difference
1413
02:23:16,400 --> 02:23:22,960
here between element wise and dot product? Well, we've got one main difference. And that is addition.
1414
02:23:22,960 --> 02:23:32,560
So if we were to code this out by hand, matrix multiplication by hand, we'd have recall that
1415
02:23:32,560 --> 02:23:40,640
the elements of our tensor are 123. So if we wanted to matrix multiply that by itself,
1416
02:23:40,640 --> 02:23:49,600
we'd have one times one, which is the equivalent of doing one times seven in this visual example.
1417
02:23:49,600 --> 02:23:58,720
And then we'd have plus, it's going to be two times two, two times two. What does that give us?
1418
02:23:58,720 --> 02:24:08,080
Plus three times three. What does it give us? Three times three. That gives us 14. So that's how
1419
02:24:08,080 --> 02:24:14,720
we got to that number there. Now we could do this with a for loop. So let's have a gaze at when I
1420
02:24:14,720 --> 02:24:20,880
say gaze, it means have a look. That's an Australian colloquialism for having a look. But I want to
1421
02:24:20,880 --> 02:24:27,360
show you the time difference; it might not actually be that big a difference if we do it by hand
1422
02:24:27,360 --> 02:24:32,160
versus using something like matmul. And another thing to note is that if PyTorch has a
1423
02:24:32,160 --> 02:24:40,800
method already implemented, chances are it's a fast calculating version of that method. So I know
1424
02:24:40,800 --> 02:24:45,680
for basic operators, I said it's usually best to just use this straight up basic operator.
1425
02:24:45,680 --> 02:24:50,880
But for something like matrix multiplication or other advanced operators instead of the basic
1426
02:24:50,880 --> 02:24:55,520
operators, you probably want to use the torch version rather than writing a for loop, which is
1427
02:24:55,520 --> 02:25:02,480
what we're about to do. So let's go value equals zero. This is matrix multiplication by hand. So
1428
02:25:02,480 --> 02:25:11,760
for i in range, len tensor, so for each element in the length of our tensor, which is 1, 2, 3, we want to
1429
02:25:11,760 --> 02:25:19,680
update our value to be plus equal, which is doing this plus reassignment here. The ith element in
1430
02:25:19,680 --> 02:25:29,040
each tensor times the ith element. So times itself. And then how long is this going to take?
1431
02:25:29,040 --> 02:25:42,400
Let's now return the value. We should get 14, print 14. There we go. So 1.9 milliseconds on
1432
02:25:42,400 --> 02:25:49,440
whatever CPU that Google Colab is using behind the scenes. But now if we time it and use the torch
1433
02:25:49,440 --> 02:25:55,120
method torch dot matmul, with tensor and tensor. And again, we're using a very small tensor. So
1434
02:25:55,120 --> 02:26:02,640
okay, there we go. It actually showed how much quicker it is, even with such a small tensor.
1435
02:26:02,640 --> 02:26:13,440
So this is 1.9 milliseconds. This is 252 microseconds. So this is 10 times slower using a for loop,
1436
02:26:13,440 --> 02:26:18,480
than PyTorch's vectorized version. I'll let you look into that if you want to find out what
1437
02:26:18,480 --> 02:26:24,560
vectorization means. It's just a type of programming that rather than writing for loops, because as
1438
02:26:24,560 --> 02:26:30,640
you could imagine, if this tensor was, let's say, had a million elements instead of just three,
1439
02:26:30,640 --> 02:26:36,000
if you have to loop through each of those elements one by one, that's going to be quite cumbersome.
1440
02:26:36,000 --> 02:26:44,080
So a lot of PyTorch's functions behind the scenes implement optimized functions to perform
1441
02:26:45,120 --> 02:26:49,600
mathematical operations, such as matrix multiplication, like the one we did by hand,
1442
02:26:49,600 --> 02:26:56,080
in a far faster manner, as we can see here. And that's only with a tensor of three elements.
1443
02:26:56,080 --> 02:27:00,480
So you can imagine the speedups on something like a tensor with a million elements.
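If you want to reproduce that comparison yourself, here's a rough sketch (I'm using Python's time module here instead of the %%time cell magic, and your exact numbers will depend on whatever machine Colab gives you):

import time
import torch

tensor = torch.tensor([1, 2, 3])

# Matrix multiplication by hand with a Python for loop
start = time.perf_counter()
value = 0
for i in range(len(tensor)):
    value += tensor[i] * tensor[i]
print(value, time.perf_counter() - start)   # tensor(14), comparatively slow

# PyTorch's vectorized version
start = time.perf_counter()
print(torch.matmul(tensor, tensor), time.perf_counter() - start)   # tensor(14), much faster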
1444
02:27:01,120 --> 02:27:06,960
But with that being said, that is the crux of matrix multiplication. For a little bit more,
1445
02:27:06,960 --> 02:27:12,080
I encourage you to read through this documentation here by mathsisfun.com. Otherwise,
1446
02:27:12,080 --> 02:27:17,920
let's look at a couple of rules that we have to satisfy for larger versions of matrix multiplication.
1447
02:27:17,920 --> 02:27:22,640
Because right now, we've done it with a simple tensor, only 1, 2, 3. Let's step things up a notch
1448
02:27:22,640 --> 02:27:30,960
in the next video. Welcome back. In the last video, we were introduced to matrix multiplication,
1449
02:27:30,960 --> 02:27:38,640
which although we haven't seen it yet, is one of the most common operations in neural networks.
1450
02:27:38,640 --> 02:27:46,800
And we saw that you should always try to use torches implementation of certain operations,
1451
02:27:46,800 --> 02:27:51,360
except if they're basic operations, like plus multiplication and whatnot,
1452
02:27:51,360 --> 02:27:57,840
because chances are it's a lot faster version than if you would do things by hand. And also,
1453
02:27:57,840 --> 02:28:04,240
it's a lot less code. Like compared to this, this is pretty verbose code compared to just a matrix
1454
02:28:04,240 --> 02:28:10,000
multiply these two tensors. But there's something that we didn't allude to in the last video.
1455
02:28:10,000 --> 02:28:14,720
There's a couple of rules that need to be satisfied when performing matrix multiplication.
1456
02:28:14,720 --> 02:28:20,960
It worked for us because we have a rather simple tensor. But once you start to build larger tensors,
1457
02:28:20,960 --> 02:28:25,120
you might run into one of the most common errors in deep learning. I'm going to write this down
1458
02:28:25,120 --> 02:28:32,560
actually here. This is one to be very familiar with. One of the most common errors in deep
1459
02:28:32,560 --> 02:28:39,760
learning, we've already alluded to this as well, is shape errors. So let's jump back to this in a
1460
02:28:39,760 --> 02:28:50,400
minute. I just want to write this up here. So there are two rules, or two main rules,
1461
02:28:50,400 --> 02:28:56,560
that performing matrix multiplication needs to satisfy. Otherwise, we're going to get an error.
1462
02:28:57,120 --> 02:29:06,640
So number one is the inner dimensions must match. Let's see what this means.
1463
02:29:06,640 --> 02:29:14,880
So if we want to have two tensors of shape, three by two, and then we're going to use the at symbol.
1464
02:29:15,920 --> 02:29:21,360
Now, you might be asking, why the at symbol? Well, the at symbol is like an operator
1465
02:29:21,360 --> 02:29:27,120
symbol for matrix multiplication. So I just want to give you an example. If we go tensor at tensor,
1466
02:29:29,200 --> 02:29:34,400
where at stands for matrix multiplication, we get tensor 14, which is exactly the same as what we got there.
1467
02:29:34,400 --> 02:29:39,280
Should you use at or should you use matmul? I would personally recommend using matmul.
1468
02:29:39,280 --> 02:29:43,920
It's a little bit clearer; at can sometimes get confusing because it's not as common as seeing
1469
02:29:43,920 --> 02:29:49,680
something like matmul. So we'll get rid of that, but I'm just using it up here for brevity.
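In code, that looks something like this:

import torch

tensor = torch.tensor([1, 2, 3])
print(tensor @ tensor)               # tensor(14)
print(torch.matmul(tensor, tensor))  # tensor(14), same result, just a bit more readable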
1470
02:29:50,320 --> 02:29:56,800
And then we're going to go three, two. Now, this won't work. We'll see why in a second.
1471
02:29:56,800 --> 02:30:06,800
But if we go two, three, at, and then we have three, two, this will work. Or, and then if we go
1472
02:30:06,800 --> 02:30:13,120
the reverse, say threes on the outside, twos here. And then we have twos on the inside and threes
1473
02:30:13,120 --> 02:30:19,520
on the outside, this will work. Now, why is this? Well, this is the rule number one. The inner
1474
02:30:19,520 --> 02:30:26,960
dimensions must match. So the inner dimensions... what I mean by this is, let's create a torch
1475
02:30:26,960 --> 02:30:36,880
rand of size three, two. And then we'll get its shape. So we have... so if we created a tensor
1476
02:30:36,880 --> 02:30:42,960
like this, three, two, and then if we created another tensor, well, let me just show you straight
1477
02:30:42,960 --> 02:30:53,760
up: torch dot matmul, torch dot rand... watch, this won't work. We'll get an error. There we go. So
1478
02:30:53,760 --> 02:30:59,600
this is one of the most common errors that you're going to face in deep learning is that matrix
1479
02:30:59,600 --> 02:31:04,480
one and matrix two shapes cannot be multiplied because it doesn't satisfy rule number one.
1480
02:31:04,480 --> 02:31:11,600
The inner dimensions must match. And so what I mean by inner dimensions is this dimension multiplied
1481
02:31:11,600 --> 02:31:19,120
by this dimension. So say we were trying to multiply three, two by three, two, these are the inner
1482
02:31:19,120 --> 02:31:29,040
dimensions. Now this will work. Because why? The inner dimensions match. Two, three by three, two,
1483
02:31:29,040 --> 02:31:40,640
two, three by three, two. Now notice how the inner dimensions, inner, inner match. Let's see what
1484
02:31:40,640 --> 02:31:48,480
comes out here. Look at that. And now this is where rule two comes into play. Two. The resulting
1485
02:31:48,480 --> 02:32:03,520
matrix has the shape of the outer dimensions. So we've just seen this one: two, three at three, two,
1486
02:32:04,560 --> 02:32:12,960
which, remember, at is matrix multiply. So we have a matrix of shape two, three,
1487
02:32:12,960 --> 02:32:20,400
matrix multiply a matrix of three, two, the inner dimensions match. So it works. The resulting shape
1488
02:32:20,400 --> 02:32:30,160
is what? Two, two. Just as we've seen here, we've got a shape of two, two. Now what if we did
1489
02:32:31,120 --> 02:32:38,080
the reverse? What if we did this one that also will work? Three on the outside. What do you think
1490
02:32:38,080 --> 02:32:44,000
is going to happen here? In fact, I encourage you to pause the video and give it a go. So this
1491
02:32:44,000 --> 02:32:49,680
is going to result in a three three matrix. But don't take my word for it. Let's have a look. Three,
1492
02:32:50,560 --> 02:32:57,120
put two on the inside and we'll put two on the inside here and then three on the outside. What
1493
02:32:57,120 --> 02:33:05,920
does it give us? Oh, look at that. A three three. One, two, three. One, two, three. Now what if we
1494
02:33:05,920 --> 02:33:11,600
were to change this? Two and two. This can be almost any number you want. Let's change them both
1495
02:33:11,600 --> 02:33:18,320
to 10. What's going to happen? Will this work? What's the resulting shape going to be? So the
1496
02:33:18,320 --> 02:33:24,160
inner dimensions match? What's rule number two? The resulting matrix has the shape of the outer
1497
02:33:24,160 --> 02:33:29,200
dimension. So what do you think is going to be the shape of this resulting matrix multiplication?
1498
02:33:29,200 --> 02:33:39,360
Well, let's have a look. It's still three three. Wow. Now what if we go 10? 10 on the outside
1499
02:33:40,640 --> 02:33:46,400
and 10 and 10 on the inside? What do we get? Well, we get, I'm not going to count all of those,
1500
02:33:46,400 --> 02:33:54,480
but if we just go shape, we get 10 by 10. So these are the two main rules of matrix multiplication.
1501
02:33:54,480 --> 02:33:58,880
If you're running into an error where the matrix multiplication can't work... so let's say this was
1502
02:33:58,880 --> 02:34:05,440
10 and this was seven. Watch what's going to happen? We can't multiply them because the inner
1503
02:34:05,440 --> 02:34:10,160
dimensions do not match. We don't have 10 and 10. We have 10 and seven. But then when we change
1504
02:34:10,160 --> 02:34:17,120
this so that they match, we get 10 and 10. Beautiful. So now let's create a little bit more of a
1505
02:34:17,120 --> 02:34:23,840
specific example. We'll create two tensors. We'll come down. Actually, to prevent this video from
1506
02:34:23,840 --> 02:34:28,240
being too long, I've got an error in the word error. That's funny. We'll go on with one of those
1507
02:34:28,240 --> 02:34:32,000
common errors in deep learning shape errors. We've just seen it, but I'm going to get a little bit
1508
02:34:32,000 --> 02:34:38,080
more specific with that shape error in the next video. Before we do that, have a look at matrix
1509
02:34:38,080 --> 02:34:43,680
multiplication. There's a website, my other favorite website. I told you I've got two. This is my
1510
02:34:43,680 --> 02:34:49,120
other one. Matrix multiplication dot XYZ. This is your challenge before the next video. Put in
1511
02:34:49,120 --> 02:34:57,600
some random numbers here, whatever you want, two, 10, five, six, seven, eight, whatever you want. Change
1512
02:34:57,600 --> 02:35:05,280
these around a bit, three, four. Well, that's a five, not a four. And then multiply and just watch
1513
02:35:05,280 --> 02:35:10,560
what happens. That's all I'd like you to do. Just watch what happens and we're going to replicate
1514
02:35:10,560 --> 02:35:16,080
something like this in PyTorch code in the next video. I'll see you there.
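Before you jump into the website, here's a small sketch of the two rules with torch.rand, if you'd like to poke at the shapes yourself:

import torch

# Rule 1: the inner dimensions must match
# torch.matmul(torch.rand(3, 2), torch.rand(3, 2))  # error: (3, 2) @ (3, 2) won't work
print(torch.matmul(torch.rand(3, 2), torch.rand(2, 3)).shape)   # torch.Size([3, 3])

# Rule 2: the resulting matrix has the shape of the outer dimensions
print(torch.matmul(torch.rand(2, 3), torch.rand(3, 2)).shape)      # torch.Size([2, 2])
print(torch.matmul(torch.rand(10, 10), torch.rand(10, 10)).shape)  # torch.Size([10, 10])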
1515
02:35:16,080 --> 02:35:22,960
Welcome back. In the last video, we discussed a little bit more about matrix multiplication,
1516
02:35:22,960 --> 02:35:29,280
but we're not done there. We looked at two of the main rules of matrix multiplication,
1517
02:35:29,280 --> 02:35:34,960
and we saw a few errors of what happens if those rules aren't satisfied, particularly if the
1518
02:35:34,960 --> 02:35:41,040
inner dimensions don't match. So this is what I've been alluding to as one of the most common
1519
02:35:41,040 --> 02:35:46,560
errors in deep learning, and that is shape errors. Because neural networks are comprised of lots of
1520
02:35:46,560 --> 02:35:52,720
matrix multiplication operations, if you have some sort of tensor shape error somewhere
1521
02:35:52,720 --> 02:35:59,200
in your neural network, chances are you're going to get a shape error. So now let's investigate
1522
02:35:59,200 --> 02:36:06,560
how we can deal with those. So let's create some tensors: shapes for matrix multiplication.
1523
02:36:06,560 --> 02:36:12,000
And I also showed you the website, sorry, matrix multiplication dot xyz. I hope you had a go at
1524
02:36:12,000 --> 02:36:15,520
typing in some numbers here and visualizing what happens, because we're going to reproduce
1525
02:36:15,520 --> 02:36:22,240
something very similar to what happens here, but with PyTorch code. Shapes for matrix multiplication,
1526
02:36:22,240 --> 02:36:29,440
we have tensor a, let's create this as torch dot tensor. We're going to create a tensor with
1527
02:36:29,440 --> 02:36:36,720
just the elements one, two, all the way up to, let's just go to six, hey, that'll be enough. Six,
1528
02:36:36,720 --> 02:36:44,560
wonderful. And then tensor b can be equal to a torch tensor
1529
02:36:47,440 --> 02:36:53,520
of where we're going to go for this one. Let's go seven, 10, this will be a little bit confusing
1530
02:36:53,520 --> 02:37:03,040
this one, but then we'll go eight, 11, and this will go up to 12, nine, 12. So it's the same
1531
02:37:04,240 --> 02:37:09,040
sort of sequence as what's going on here, but they've been swapped around. So we've got the
1532
02:37:09,040 --> 02:37:14,400
vertical axis here, instead of one, two, three, four, this is just seven, eight, nine, 10, 11, 12.
1533
02:37:14,960 --> 02:37:19,360
But let's now try and perform a matrix multiplication. How do we do that?
1534
02:37:19,360 --> 02:37:25,760
Torch dot matmul for matrix multiplication. PS, torch also has torch dot mm, which stands
1535
02:37:25,760 --> 02:37:30,320
for matrix multiplication, which is a short version. So I'll just write down here so that you know
1536
02:37:30,960 --> 02:37:42,080
tensor a, tensor b. I'm going to write: torch dot mm is the same as torch dot matmul. It's an alias
1537
02:37:42,080 --> 02:37:50,320
for writing less code. This is literally how common matrix multiplications are in PyTorch
1538
02:37:50,320 --> 02:37:56,720
is that they've made torch dot mm as an alias for matmul. So you have to type four fewer characters
1539
02:37:56,720 --> 02:38:02,720
using torch dot mm instead of matmul. But I like to write matmul because it's a little bit
1540
02:38:02,720 --> 02:38:09,280
like it explains what it does a little bit more than mm. So what do you think's going to happen
1541
02:38:09,280 --> 02:38:14,880
here? It's okay if you're not sure. But what you could probably do to find out is check the
1542
02:38:14,880 --> 02:38:20,320
shapes of these. Does this operation matrix multiplication satisfy the rules that we just
1543
02:38:20,320 --> 02:38:24,800
discussed? Especially this one. This is the main one. The inner dimensions must match.
1544
02:38:25,600 --> 02:38:34,160
Well, let's have a look, hey? Oh, no, mat one and mat two shapes cannot be multiplied.
1545
02:38:34,160 --> 02:38:39,360
Three by two and three by two. This is very similar to what we went through in the last video.
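For reference, here's roughly what's on the screen at this point:

import torch

tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]])
tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]])
# torch.mm is an alias for torch.matmul
# torch.mm(tensor_A, tensor_B)  # RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)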
1546
02:38:39,360 --> 02:38:42,640
But now we've got some actual numbers there. Let's check the shape.
1547
02:38:44,400 --> 02:38:51,360
Oh, torch size three two. Torch size three two now. In the last video we created a random tensor
1548
02:38:51,360 --> 02:38:57,120
and we could adjust the shape on the fly. But these tensors already exist. How might we adjust
1549
02:38:57,120 --> 02:39:05,280
the shape of these? Well, now I'm going to introduce you to another very common operation or tensor
1550
02:39:05,280 --> 02:39:13,520
manipulation that you'll see. And that is the transpose. To fix our tensor shape issues,
1551
02:39:13,520 --> 02:39:31,440
we can manipulate the shape of one of our tensors using a transpose. And so, all right here,
1552
02:39:32,000 --> 02:39:38,000
we're going to see this anyway, but I'm going to define it in words. A transpose switches the
1553
02:39:38,000 --> 02:39:48,800
axes or dimensions of a given tensor. So let's see this in action. If we go, and the way to do it,
1554
02:39:48,800 --> 02:40:00,880
is you can go tensor b dot t. Let's see what happens. Let's look at the original tensor b as well.
1555
02:40:01,600 --> 02:40:06,480
So dot t stands for transpose. And that's a little bit hard to read, so we might do these on
1556
02:40:06,480 --> 02:40:18,560
different lines, tensor b. We'll get rid of that. So you see what's happened here. Instead of
1557
02:40:18,560 --> 02:40:24,880
tensor b, this is the original one. We might put the original on top. Instead of the original one
1558
02:40:24,880 --> 02:40:30,960
having seven, eight, nine, 10, 11, 12 down the vertical, the transpose has transposed it to seven,
1559
02:40:30,960 --> 02:40:35,920
eight, nine across the horizontal and 10, 11, 12 down here. Now, if we get the shape of this,
1560
02:40:36,720 --> 02:40:43,200
tensor b dot shape, let's have a look at that. Let's have a look at the original shape, tensor b dot
1561
02:40:43,200 --> 02:40:55,280
shape. What's happened? Oh, no, we've still got three, two. Oh, that's what I've missed out here.
1562
02:40:55,280 --> 02:41:00,960
I've got a typo. Excuse me. I thought I was, you think code that you've written is working,
1563
02:41:00,960 --> 02:41:06,080
but then you realize you've got something as small as just a dot t missing, and it throws off your
1564
02:41:06,080 --> 02:41:13,920
whole train of thought. So you're seeing these errors on the fly here. Now, tensor b is this,
1565
02:41:13,920 --> 02:41:19,600
but its shape is torch dot size three, two. And if we try to matrix multiply three, two, and three,
1566
02:41:19,600 --> 02:41:26,560
two, tensor a and tensor b, we get an error. Why? Because the inner dimensions do not match.
1567
02:41:26,560 --> 02:41:35,520
But if we perform a transpose on tensor b, we switch the dimensions around. So now,
1568
02:41:35,520 --> 02:41:43,680
we perform a transpose with tensor b dot t, t's for transpose. We have, this is the important
1569
02:41:43,680 --> 02:41:49,360
point as well. We still have the same elements. It's just that they've been rearranged. They've
1570
02:41:49,360 --> 02:41:56,480
been transposed. So now, tensor b still has the same information encoded, but rearranged.
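As a sketch, using the tensor_A and tensor_B created above:

print(tensor_B, tensor_B.shape)       # shape torch.Size([3, 2])
print(tensor_B.T, tensor_B.T.shape)   # shape torch.Size([2, 3]), same values, rearranged
output = torch.matmul(tensor_A, tensor_B.T)
print(output, output.shape)           # inner dimensions now match, output shape torch.Size([3, 3])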
1571
02:41:56,480 --> 02:42:01,840
So now we have torch size two, three. And so when we try to matrix multiply these,
1572
02:42:02,400 --> 02:42:09,360
we satisfy the first criteria. And now look at the output of the matrix multiplication of tensor a
1573
02:42:09,360 --> 02:42:17,440
and tensor b dot t transposed is three, three. And that is because of the second rule of matrix
1574
02:42:17,440 --> 02:42:23,760
multiplication. The resulting matrix has the shape of the outer dimensions. So we've got three,
1575
02:42:23,760 --> 02:42:31,280
two matrix multiply two, three results in a shape of three, three. So let's prettify some of this,
1576
02:42:31,280 --> 02:42:36,240
and we'll print out what's going on here. Just so we know, we can step through it,
1577
02:42:36,240 --> 02:42:41,360
because right now we've just got code all over the place a bit. Let's see here, the matrix
1578
02:42:41,360 --> 02:42:54,880
multiplication operation works when tensor b is transposed. And in a second, I'm going to
1579
02:42:54,880 --> 02:42:58,880
show you what this looks like visually. But right now we've done it with PyTorch code,
1580
02:42:58,880 --> 02:43:03,040
which might be a little confusing. And that's perfectly fine. Matrix multiplication takes a
1581
02:43:03,040 --> 02:43:10,400
little while and a little practice. So original shapes is going to be tensor a dot shape. Let's
1582
02:43:10,400 --> 02:43:20,640
see what this is. And tensor b equals tensor b dot shape. But the reason why we're spending so
1583
02:43:20,640 --> 02:43:26,240
much time on this is because as you'll see, as you get more and more into neural networks and
1584
02:43:26,240 --> 02:43:32,800
deep learning, the matrix multiplication operation is one of the most or if not the most common.
1585
02:43:32,800 --> 02:43:40,080
Same shape as above, because we haven't changed tensor a shape, we've only changed tensor b shape,
1586
02:43:40,080 --> 02:43:50,880
or we've transposed it. And then in tensor b dot transpose equals, we want tensor b dot
1587
02:43:50,880 --> 02:43:57,600
t dot shape. Wonderful. And then if we print, let's just print out, oops,
1588
02:43:57,600 --> 02:44:06,160
print, I spelled the wrong word there, print. We want, what are we multiplying here? This is
1589
02:44:06,160 --> 02:44:11,120
one of the ways, remember our motto of visualize, visualize, visualize, well, this is how I visualize,
1590
02:44:11,120 --> 02:44:19,680
visualize, visualize things, shape, let's do the at symbol for brevity, tensor, and let's get b dot
1591
02:44:19,680 --> 02:44:27,840
t dot shape. We'll put down our little rule here, inner dimensions must match. And then print,
1592
02:44:29,120 --> 02:44:37,600
let's get the output output, I'll put that on a new line. The output is going to equal
1593
02:44:37,600 --> 02:44:43,040
torch dot, or our outputs already here, but we're going to rewrite it for a little bit of practice,
1594
02:44:43,040 --> 02:44:55,120
tensor a, tensor b dot t. And then we can go print output. And then finally, print, let's get it on a
1595
02:44:55,120 --> 02:45:00,800
new line as well, the output shape, a fair bit going on here. But we're going to step through it,
1596
02:45:00,800 --> 02:45:06,960
and it's going to help us understand a little bit about what's going on. That's the data visualizer's
1597
02:45:06,960 --> 02:45:16,880
motto. There we go. Okay, so the original shapes are what torch size three two, and torch size three
1598
02:45:16,880 --> 02:45:23,120
two, the new shapes tensor a stays the same, we haven't changed tensor a, and then we have tensor
1599
02:45:23,120 --> 02:45:31,360
b dot t is torch size two three, then we multiply a three by two by a two by three. So the inner
1600
02:45:31,360 --> 02:45:37,520
dimensions must match, which is correct, they do match two and two. Then we have an output of tensor
1601
02:45:37,520 --> 02:45:45,680
of 27, 30, 33, 61, 68, 75, etc. And the output shape is what? The output shape is the outer
1602
02:45:45,680 --> 02:45:52,800
dimensions three three. Now, of course, you could rearrange this maybe transpose tensor a instead of
1603
02:45:52,800 --> 02:45:58,160
tensor b, have a play around with it. See if you can create some more errors trying to multiply these
1604
02:45:58,160 --> 02:46:03,600
two, and see what happens if you transpose tensor a instead of tensor b, that's my challenge. But
1605
02:46:03,600 --> 02:46:11,040
before we finish this video, how about we just recreate what we've done here with this cool website
1606
02:46:11,040 --> 02:46:17,440
matrix multiplication. So what did we have? We had tensor a, which is one to six, let's recreate
1607
02:46:17,440 --> 02:46:28,080
this, remove that, this is going to be one, two, three, four, five, six, and then we want to increase
1608
02:46:28,080 --> 02:46:37,200
this, and this is going to be seven, eight, nine, 10, 11, 12. Is that the right way of doing things?
1609
02:46:38,160 --> 02:46:43,440
So this is already transposed, just to let you know. So this is the equivalent of tensor b
1610
02:46:43,440 --> 02:46:54,880
on the right here, tensor b dot t. So let me just show you, if we go tensor b dot transpose,
1611
02:46:55,920 --> 02:47:01,440
which original version was that, but we're just passing in the transpose version to our matrix
1612
02:47:01,440 --> 02:47:06,960
multiplication website. And then if we click multiply, this is what's happening behind the
1613
02:47:06,960 --> 02:47:12,160
scenes with our PyTorch code of matmul. We have one times seven plus two times 10. Did you see
1614
02:47:12,160 --> 02:47:16,960
that little flippy thing that it did? That's where the 27 comes from. And then if we come down here,
1615
02:47:17,600 --> 02:47:22,640
what's our first element? 27 when we matrix multiply them. Then if we do the same thing,
1616
02:47:22,640 --> 02:47:28,480
the next step, we get 30 and 61, from a combination of these numbers, do it again,
1617
02:47:29,360 --> 02:47:36,240
33, 68, 95, from a combination of these numbers, again, and again, and finally we end up with
1618
02:47:36,240 --> 02:47:44,000
exactly what we have here. So that's a little bit of practice for you to go through is to create
1619
02:47:44,000 --> 02:47:49,520
some of your own tensors can be almost whatever you want. And then try to matrix multiply them
1620
02:47:49,520 --> 02:47:54,000
with different shapes. See what happens when you transpose and what different values you get.
1621
02:47:54,000 --> 02:47:57,680
And if you'd like to visualize it, you could write out something like this. That really
1622
02:47:57,680 --> 02:48:01,760
helps me understand matrix multiplication. And then if you really want to visualize it,
1623
02:48:01,760 --> 02:48:07,520
you can go through this website and recreate your target tensors in something like this.
1624
02:48:07,520 --> 02:48:12,080
I'm not sure how long you can go. But yeah, that should be enough to get started.
1625
02:48:12,080 --> 02:48:14,480
So give that a try and I'll see you in the next video.
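If you'd like to replicate what the website is animating, here's a rough sketch of matrix multiplication written out with plain Python loops, just for understanding (in practice you'd use torch.matmul):

import torch

A = torch.tensor([[1, 2], [3, 4], [5, 6]])    # shape (3, 2), same as tensor_A
B = torch.tensor([[7, 8, 9], [10, 11, 12]])   # shape (2, 3), same as tensor_B.T
out = torch.zeros(A.shape[0], B.shape[1], dtype=A.dtype)
for i in range(A.shape[0]):           # each row of A
    for j in range(B.shape[1]):       # each column of B
        for k in range(A.shape[1]):   # sum of row-times-column products
            out[i, j] += A[i, k] * B[k, j]
print(out)                 # tensor([[ 27,  30,  33], [ 61,  68,  75], [ 95, 106, 117]])
print(torch.matmul(A, B))  # same result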
1626
02:48:17,600 --> 02:48:22,560
Welcome back. In the last few videos, we've covered one of the most fundamental operations
1627
02:48:22,560 --> 02:48:27,760
in neural networks. And that is matrix multiplication. But now it's time to move on.
1628
02:48:27,760 --> 02:48:36,400
And let's cover tensor aggregation. And what I mean by that is finding the min, max, mean,
1629
02:48:37,040 --> 02:48:44,000
sum, et cetera, tensor aggregation of certain tensor values. So for whatever reason, you may
1630
02:48:44,000 --> 02:48:49,040
want to find the minimum value of a tensor, the maximum value, the mean, the sum, what's going on
1631
02:48:49,040 --> 02:48:54,160
there. So let's have a look at a few PyTorch methods that are built in to do all of these.
1632
02:48:54,160 --> 02:48:59,520
And again, if you're finding one of these values, it's called tensor aggregation because you're
1633
02:48:59,520 --> 02:49:04,640
going from what's typically a large amount of numbers to a small amount of numbers. So the min
1634
02:49:04,640 --> 02:49:12,160
of this tensor would be 27. So you're turning it from nine elements to one element, hence
1635
02:49:12,160 --> 02:49:20,560
aggregation. So let's create a tensor. Create a tensor: x equals torch dot... let's use arange.
1636
02:49:20,560 --> 02:49:27,760
We'll create maybe a zero to 100 with a step of 10. Sounds good to me. And we can find the min
1637
02:49:30,640 --> 02:49:36,640
by going, can we do torch dot min? Maybe we can. Or we could also go
1638
02:49:38,880 --> 02:49:39,520
x dot min.
1639
02:49:39,520 --> 02:49:52,400
And then we can do the same, find the max torch dot max and x dot max. Now how do you think we
1640
02:49:52,400 --> 02:50:03,520
might get the average? So let's try it out. Or find the mean, find the mean torch dot mean
1641
02:50:03,520 --> 02:50:13,440
x. Oops, we don't have an x. Is this going to work? What's happened? Mean input data type
1642
02:50:13,440 --> 02:50:19,360
should be either floating point or complex dtypes, got long instead. Ha ha. Finally,
1643
02:50:19,360 --> 02:50:23,840
I knew the error would show its face eventually. Remember how I said it right up here that
1644
02:50:24,640 --> 02:50:29,360
we've covered a fair bit already. But right up here, some of the most common errors that
1645
02:50:29,360 --> 02:50:33,040
you're going to run into is tensor is not the right data type, not the right shape. We've seen
1646
02:50:33,040 --> 02:50:36,880
that with matrix multiplication, not the right device. We haven't seen that yet. But not the
1647
02:50:36,880 --> 02:50:42,960
right data type. This is one of those times. So it turns out that the tensor that we created,
1648
02:50:42,960 --> 02:50:46,640
x is of the data type... x dot dtype.
1649
02:50:48,720 --> 02:50:53,360
Int64, which is long. So if we go to... let's look up torch dot tensor.
1650
02:50:53,360 --> 02:51:04,000
This is where they're getting long from. We've seen long before, it's int64. Where's that? Oh, long.
1651
02:51:04,000 --> 02:51:08,560
Yeah. So, long tensor. That's what it's saying. And it turns out that the torch mean function
1652
02:51:09,120 --> 02:51:15,200
can't work on tensors with data type long. So what can we do here? Well, we can change
1653
02:51:15,200 --> 02:51:24,480
the data type of x. So let's go torch mean x type and change it to float 32. Or before we do that,
1654
02:51:25,360 --> 02:51:32,720
if we go to torch dot mean, is this going to tell us that it needs a dtype? Oh, dtype.
1655
02:51:32,720 --> 02:51:48,560
Optional: the desired data type. Does it say float 32? It doesn't tell us. Ah, so this is
1656
02:51:48,560 --> 02:51:52,640
another one of those little hidden things that you're going to come across. And you only really
1657
02:51:52,640 --> 02:51:58,880
come across this by writing code is that sometimes the documentation doesn't really tell you explicitly
1658
02:51:58,880 --> 02:52:06,000
what dtype the input should be, the input tensor. However, we find out with this error message
1659
02:52:06,000 --> 02:52:12,080
that it should either be a floating point or a complex dtype, not a long. So we can convert it
1660
02:52:12,080 --> 02:52:19,840
to torch float 32. So all we've done is gone x type as type float 32. Let's see what happens here.
1661
02:52:19,840 --> 02:52:27,200
45 beautiful. And then the same thing, if we went, can we do x dot mean? Is that going to work as well?
1662
02:52:29,520 --> 02:52:38,400
Oh, same thing. So if we go x dot type torch dot float 32, get the mean of that. There we go.
1663
02:52:38,960 --> 02:52:44,480
So that is, I knew it would come up eventually. A beautiful example of finding the right data
1664
02:52:44,480 --> 02:52:56,560
type. Let me just put a note here. Note the torch dot mean function requires a tensor of float 32.
1665
02:52:57,440 --> 02:53:05,760
So far, we've seen two of the major errors in PyTorch: data type and shape issues. What's
1666
02:53:05,760 --> 02:53:12,320
another one that we said? Oh, sum. So find the sum. For the sum we want x dot sum, or maybe we
1667
02:53:12,320 --> 02:53:17,280
just do torch dot sum first. Keep it in line with what's going on above and x dot sum.
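Pulling that together, here's a minimal sketch of the aggregation methods and the dtype fix:

import torch

x = torch.arange(0, 100, 10)     # tensor([ 0, 10, 20, ..., 90]), dtype torch.int64
print(torch.min(x), x.min())     # tensor(0) both ways
print(torch.max(x), x.max())     # tensor(90) both ways
# torch.mean(x)                  # RuntimeError: mean needs a floating point (or complex) dtype, not long
print(torch.mean(x.type(torch.float32)))  # tensor(45.)
print(x.type(torch.float32).mean())       # same thing, as a method call
print(torch.sum(x), x.sum())     # tensor(450) both ways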
1668
02:53:18,960 --> 02:53:24,720
Which one of these should you use, like torch dot sum of x or x dot sum? Personally,
1669
02:53:24,720 --> 02:53:30,080
I prefer torch dot max, but you'll also probably see me at points write this. It really depends
1670
02:53:30,080 --> 02:53:35,840
on what's going on. I would say pick whichever style you prefer. And because behind the scenes,
1671
02:53:35,840 --> 02:53:40,960
they're calling the same methodology. Pick whichever style you prefer and stick with that
1672
02:53:40,960 --> 02:53:47,040
throughout your code. For now, let's leave it at that for tensor aggregation. That's
1673
02:53:47,600 --> 02:53:53,040
finding the min, max, mean and sum. In the next video, we're going to look at finding the positional
1674
02:53:53,040 --> 02:54:00,000
min and max, which is also known as argmax and argmin, or vice versa. So actually, that's a
1675
02:54:00,000 --> 02:54:05,280
little bit of a challenge for the next video is see how you can find out what the positional
1676
02:54:05,280 --> 02:54:12,240
min and max is of this. And what I mean by that is which index does the max value occur at and
1677
02:54:12,240 --> 02:54:17,600
which index of this tensor does the min occur at? You'll probably want to look into the methods
1678
02:54:17,600 --> 02:54:23,520
argmin, torch dot argmin for that one, and torch dot argmax for that. But we'll cover that in the
1679
02:54:23,520 --> 02:54:32,320
next video. I'll see you there. Welcome back. In the last video, we learned all about tensor
1680
02:54:32,320 --> 02:54:37,280
aggregation. And we found the min the max the mean and the sum. And we also ran into one of the most
1681
02:54:37,280 --> 02:54:44,160
common issues in PyTorch and deep learning and neural networks in general. And that was wrong
1682
02:54:44,160 --> 02:54:50,240
data types. And so we solved that issue by converting because some functions such as torch dot mean
1683
02:54:50,240 --> 02:54:56,800
require a specific data type as input. And we created our tensor here, which was by
1684
02:54:56,800 --> 02:55:03,360
default torch int64. However, torch dot mean requires torch dot float 32. We saw that in an error.
1685
02:55:03,360 --> 02:55:08,480
We fixed that by changing the type of the inputs. I also issued you the challenge of finding
1686
02:55:10,560 --> 02:55:17,760
finding the positional min and max. And you might have found that you can use the
1687
02:55:17,760 --> 02:55:30,000
argmin for the minimum. Let's remind ourselves of what x is: x. So this means, at the tensor index of
1688
02:55:30,000 --> 02:55:36,800
tensor x. If we find the argument, that is the minimum value, which is zero. So at index zero,
1689
02:55:36,800 --> 02:55:44,960
we get the value zero. So that's at zero there. Zero there. This is an index value. So this is
1690
02:55:44,960 --> 02:55:56,240
what argmin stands for: find the position in the tensor that has the minimum value with argmin.
1691
02:55:57,840 --> 02:56:05,520
And then returns index position of target tensor
1692
02:56:05,520 --> 02:56:15,520
where the minimum value occurs. Now, let's just change x to start from one,
1693
02:56:18,240 --> 02:56:27,680
just so... there we go. So the argmin is still position zero, position zero. So this is an index
1694
02:56:27,680 --> 02:56:34,240
value. And then if we index on x at the zeroth index, we get one. So the minimum value in
1695
02:56:34,240 --> 02:56:43,920
x is one. And then the maximum, you might guess, is find the position in tensor that has the maximum
1696
02:56:43,920 --> 02:56:51,200
value with argmax. And it's going to be the same thing, except it'll be the maximum, which is,
1697
02:56:51,200 --> 02:57:00,960
which position index nine. So if we go zero, one, two, three, four, five, six, seven, eight,
1698
02:57:00,960 --> 02:57:09,920
nine. And then if we index on x for the ninth element, we get 91 beautiful. Now these two are
1699
02:57:09,920 --> 02:57:17,760
useful because, yes, if you want to find the minimum of a tensor, you can just use min. But
1700
02:57:17,760 --> 02:57:22,720
sometimes you don't want the actual minimum value, you just want to know where it appears,
1701
02:57:22,720 --> 02:57:28,080
particularly with the argmax value. This is helpful for when we use the softmax activation
1702
02:57:28,080 --> 02:57:32,640
function later on. Now we haven't covered that yet. So I'm not going to allude too much to it.
1703
02:57:32,640 --> 02:57:38,560
But just remember, to find the positional min and max, you can use argmin and argmax.
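Here's the positional min and max as a quick sketch:

import torch

x = torch.arange(1, 100, 10)   # tensor([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])
print(x.argmin())              # tensor(0), the index where the minimum value lives
print(x[x.argmin()])           # tensor(1)
print(x.argmax())              # tensor(9), the index where the maximum value lives
print(x[x.argmax()])           # tensor(91)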
1704
02:57:39,280 --> 02:57:44,240
So that's all we need to cover with that. Let's keep going in the next video. I'll see you then.
1705
02:57:47,680 --> 02:57:53,280
Welcome back. So we've covered a fair bit of ground. And just to let you know, I took a little break
1706
02:57:53,280 --> 02:57:57,840
after going through all of these. And I'd just like to show you how I get back to where I'm at,
1707
02:57:57,840 --> 02:58:04,080
because if we tried to just write x here and press shift and enter, because our Colab
1708
02:58:04,080 --> 02:58:08,240
was disconnected, it's now connecting, because as soon as you press any button in Colab, it's
1709
02:58:08,240 --> 02:58:15,440
going to reconnect. It's going to try to connect, initialize, and then x is probably not going to
1710
02:58:15,440 --> 02:58:21,520
be stored in memory anymore. So there we go. Name x is not defined. That's because the Colab
1711
02:58:21,520 --> 02:58:26,880
state gets reset if you take a break for a couple of hours. This is to ensure Google can keep
1712
02:58:26,880 --> 02:58:31,760
providing resources for free. And it deletes everything to ensure that there's no compute
1713
02:58:31,760 --> 02:58:38,560
resources that are being wasted. So to get back to here, I'm just going to go restart and run all.
1714
02:58:38,560 --> 02:58:44,160
You don't necessarily have to restart the notebook. You could also go, do we have run all? Yeah,
1715
02:58:44,160 --> 02:58:48,480
we could do run before. That'll run every cell before this. We could run after we could run the
1716
02:58:48,480 --> 02:58:53,360
selection, which is this cell here. I'm going to click run all, which is just going to go through
1717
02:58:53,360 --> 02:59:01,360
every single cell that we've coded above and run them all. However, it will also stop at the errors
1718
02:59:01,360 --> 02:59:06,720
where I've left in on purpose. So remember when we ran into a shape error? Well, because this error,
1719
02:59:06,720 --> 02:59:11,280
we didn't fix it. I left it there on purpose so that we could keep seeing a shape error. It's
1720
02:59:11,280 --> 02:59:17,680
going to stop at this cell. So we're going to have to run every cell after the error cell.
1721
02:59:17,680 --> 02:59:22,560
So see how it's going to run these now. They run fine. And then we get right back to where we were,
1722
02:59:22,560 --> 02:59:30,240
which was X. So that's just a little tidbit of how I get back into coding. Let's now cover reshaping,
1723
02:59:32,320 --> 02:59:39,280
stacking, squeezing, and unsqueezing. You might be thinking, squeezing and unsqueezing. What are
1724
02:59:39,280 --> 02:59:45,440
you talking about, Daniel? Well, it's all to do with tenses. And you're like, are we going to
1725
02:59:45,440 --> 02:59:50,000
squeeze our tenses? Give them a hug. Are we going to let them go by unsqueezing them?
1726
02:59:50,000 --> 02:59:56,960
Well, let's quickly define what these are. So, reshaping. We saw before that one of the most common
1727
02:59:56,960 --> 03:00:01,600
errors in machine learning and deep learning is shape mismatches with matrices because they
1728
03:00:01,600 --> 03:00:10,240
have to satisfy certain rules. So reshape reshapes an input tensor to a defined shape.
1729
03:00:10,880 --> 03:00:15,360
Now, we're just defining these things in words right now, but we're going to see it in code in
1730
03:00:15,360 --> 03:00:26,000
just a minute. There's also view, which is return a view of an input tensor of certain shape,
1731
03:00:26,560 --> 03:00:34,800
but keep the same memory as the original tensor. So we'll see what view is in a second.
1732
03:00:34,800 --> 03:00:40,000
Reshaping and view are quite similar, but a view always shares the same memory as the original
1733
03:00:40,000 --> 03:00:45,760
tensor. It just shows you the same tensor, but from a different perspective, a different shape.
1734
03:00:46,320 --> 03:00:55,680
And then we have stacking, which is combining multiple tensors on top of each other, a vstack
1735
03:00:55,680 --> 03:01:05,520
for vertical stack, or side by side, an hstack. Let's see what different types of torch stacks there are.
1736
03:01:05,520 --> 03:01:09,680
Again, this is how I research different things. If I wanted to learn something new, I would search
1737
03:01:09,680 --> 03:01:16,400
torch, something, stack: concatenates a sequence of tensors along a new dimension. Okay. So maybe we
1738
03:01:16,400 --> 03:01:21,360
don't need hstack or vstack; we can just define what dimension we'd like to combine them on.
1739
03:01:21,360 --> 03:01:28,240
I wonder if there is a torch V stack. Torch V stack. Oh, there it is. And is there a torch H stack for
1740
03:01:28,240 --> 03:01:34,640
horizontal stack? There is a H stack. Beautiful. So we'll focus on just the plain stack. If you
1741
03:01:34,640 --> 03:01:39,040
want to have a look at V stack, it'll be quite similar to what we're going to do with stack
1742
03:01:39,040 --> 03:01:42,800
and same with H stack. Again, this is just words for now. We're going to see the code in a minute.
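While they're fresh, here's a small hedged sketch of torch.stack, torch.vstack and torch.hstack (we'll stick with plain stack in the videos):

import torch

x = torch.arange(1., 10.)                    # tensor([1., 2., ..., 9.])
print(torch.stack([x, x, x], dim=0).shape)   # torch.Size([3, 9]), stacked along a new first dimension
print(torch.stack([x, x, x], dim=1).shape)   # torch.Size([9, 3]), stacked along a new second dimension
print(torch.vstack([x, x]).shape)            # torch.Size([2, 9]), vertical stack
print(torch.hstack([x, x]).shape)            # torch.Size([18]), horizontal stack, joined end to end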
1743
03:01:43,680 --> 03:01:52,080
So there's also squeeze, which removes all one dimensions. I'm going to put one in code,
1744
03:01:52,960 --> 03:01:58,080
dimensions from a tensor. We'll see what that looks like. And then there's unsqueeze,
1745
03:01:58,080 --> 03:02:11,920
which adds a one dimension to our target tensor. And then finally, there's permute, which is return
1746
03:02:11,920 --> 03:02:25,360
a view of the input with dimensions permuted. So swapped in a certain way. So a fair few methods
1747
03:02:25,360 --> 03:02:32,640
here. But essentially the crux of all of these, the main point of all of these, is to manipulate
1748
03:02:32,640 --> 03:02:39,920
our tensors in some way to change their shape or change their dimension. Because again, one of the
1749
03:02:39,920 --> 03:02:45,200
number one issues in machine learning and deep learning is tensor shape issues. So let's start
1750
03:02:45,200 --> 03:02:51,440
off by creating a tensor and have a look at each of these. Let's create a tensor. And then we're
1751
03:02:51,440 --> 03:02:56,480
going to just import torch. We don't have to, but this will just enable us to run the notebook
1752
03:02:56,480 --> 03:03:02,320
directly from this cell if we wanted to, instead of having to run everything above here. So let's
1753
03:03:02,320 --> 03:03:09,440
create another x, torch dot arange, because torch dot range is deprecated. I'm just going to add a few code
1754
03:03:09,440 --> 03:03:15,440
cells here so that I can scroll and that's in the middle of the screen there. Beautiful. So let's
1755
03:03:15,440 --> 03:03:22,560
just make it between one and 10 nice and simple. And then let's have a look at X and X dot shape.
1756
03:03:22,560 --> 03:03:30,160
What does this give us? Okay, beautiful. So we've got the numbers from one to nine. Our tensor is
1757
03:03:30,160 --> 03:03:40,960
of shape torch size nine. Let's start with reshape. So how about we add an extra dimension. So then
1758
03:03:40,960 --> 03:03:48,800
we have X reshaped equals X dot reshape. Now a key thing to keep in mind about the reshape
1759
03:03:48,800 --> 03:03:54,080
is that the dimensions have to be compatible with the original dimensions. So we're going to
1760
03:03:54,080 --> 03:03:59,680
change the shape of our original tensor with a reshape. And we try to change it into the shape
1761
03:03:59,680 --> 03:04:06,640
one seven. Does that work with the number nine? Well, let's find out, hey, let's check out X reshaped.
1762
03:04:06,640 --> 03:04:16,480
And then we'll look at X reshaped dot shape. What's this going to do? Oh, why do we get an error there?
1763
03:04:16,480 --> 03:04:21,280
Well, it's telling us here. This is what PyTorch is actually really good at, is giving us
1764
03:04:21,280 --> 03:04:26,720
errors for what's going wrong. We have one seven is invalid for input size of nine.
1765
03:04:26,720 --> 03:04:32,720
Well, why is that? Well, we're trying to squeeze nine elements into a tensor of one
1766
03:04:32,720 --> 03:04:40,000
times seven into seven elements. But if we change this to nine, what do we get? Ah, so do you notice
1767
03:04:40,000 --> 03:04:45,280
what just happened here? We just added a single dimension. See the single square bracket with
1768
03:04:45,280 --> 03:04:51,360
the extra shape here. What if we wanted to add two? Can we do that? No, we can't. Why is that?
1769
03:04:51,920 --> 03:04:57,680
Well, because two nine is invalid for input size nine, because two times nine is what?
1770
03:04:57,680 --> 03:05:02,880
18. So we're trying to double the amount of elements without having double the amount of elements.
1771
03:05:02,880 --> 03:05:07,920
So if we change this back to one, what happens if we change these around nine one? What does this
1772
03:05:07,920 --> 03:05:14,480
do? Oh, a little bit different there. So now instead of adding one on the first dimension or
1773
03:05:14,480 --> 03:05:21,920
the zeroth dimension, because Python is zero indexed, we added it on the first dimension,
1774
03:05:21,920 --> 03:05:27,200
which is giving us a square bracket here if we go back. So we add it to the outside here,
1775
03:05:27,200 --> 03:05:30,960
because we've put the one first. And then if we wanted to add it on the inside,
1776
03:05:32,080 --> 03:05:37,920
we put the one at the end, as in nine one. So then we've got the torch size nine one. Now, let's try
1777
03:05:37,920 --> 03:05:45,120
change the view, change the view. So just to reiterate, the reshape has to be compatible
1778
03:05:45,120 --> 03:05:50,880
with the original size. So how about we change this to one to 10? So we have a size of 10,
1779
03:05:50,880 --> 03:05:57,440
and then we can go five, two, what happens there? Oh, it's compatible because five times two equals
1780
03:05:57,440 --> 03:06:05,680
10. And then what's another way we could do this? How about we make it up to 12? So we've got 12
1781
03:06:05,680 --> 03:06:12,960
elements, and then we can go three, four. The code cell's taking a little while to run here.
1782
03:06:12,960 --> 03:06:20,320
Then we'll go back to nine, just so we've got the original there.
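(For reference, here's a minimal runnable sketch of the reshape experiments just described; the values 1 to 9, then 10 and 12 elements, match what's typed in the video, but the variable names are just my own.)

import torch

# A 1-D tensor of the values 1..9 (torch.arange excludes the end value)
x = torch.arange(1., 10.)
print(x, x.shape)                 # torch.Size([9])

# A reshape has to be compatible with the number of elements
x_reshaped = x.reshape(1, 9)      # adds an extra dimension on the outside
print(x_reshaped, x_reshaped.shape)

print(x.reshape(9, 1).shape)      # puts the extra dimension on the inside -> [9, 1]

# An incompatible reshape raises an error, e.g.:
# x.reshape(1, 7)  -> shape '[1, 7]' is invalid for input of size 9

# With 10 or 12 elements, other factorisations work
print(torch.arange(1., 11.).reshape(5, 2).shape)   # torch.Size([5, 2])
print(torch.arange(1., 13.).reshape(3, 4).shape)   # torch.Size([3, 4])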
1783
03:06:22,400 --> 03:06:31,040
Whoops, they're going to be incompatible. Oh, so this is another thing. This is good.
1784
03:06:31,040 --> 03:06:35,280
We're getting some errors on the fly here. Sometimes you'll get saved failed with Google
1785
03:06:35,280 --> 03:06:40,960
CoLab, and automatic saving failed. What you can do to fix this is just either keep coding,
1786
03:06:40,960 --> 03:06:46,960
keep running some cells, and CoLab will fix itself in the background, or restart the notebook,
1787
03:06:46,960 --> 03:06:52,400
close it, and open again. So we've got size nine, or size eight, sorry, incompatible.
1788
03:06:54,000 --> 03:06:59,920
But this is good. You're seeing the errors that come up on the fly, rather than me sort of just
1789
03:06:59,920 --> 03:07:03,440
telling you what the errors are, you're seeing them as they come up for me. I'm trying to live
1790
03:07:03,440 --> 03:07:09,120
code this, and this is what's going to happen when you start to use Google CoLab, and subsequently
1791
03:07:09,120 --> 03:07:16,400
other forms of Jupyter Notebooks. But now let's get into the view, so we can go z equals,
1792
03:07:16,400 --> 03:07:25,840
let's change the view of x. View will change it to one nine, and then we'll go z, and then z dot shape.
1793
03:07:29,680 --> 03:07:36,640
Ah, we get the same thing here. So view is quite similar to reshape. Remember, though, that a
1794
03:07:36,640 --> 03:07:44,960
view shares the memory with the original tensor. So z is just a different view of x. So z shares
1795
03:07:44,960 --> 03:07:54,640
the same memory as what x does. So let's exemplify this. So changing z changes x, because a view of
1796
03:07:54,640 --> 03:08:04,640
a tensor shares the same memory as the original input. So let's just change z, change the first
1797
03:08:04,640 --> 03:08:11,360
element by using indexing here. So we're targeting one, we'll set this to equal five, and then we'll
1798
03:08:11,360 --> 03:08:19,440
see what z and x equal. Yeah, so see, we've got z, the first one here, we change the first element,
1799
03:08:19,440 --> 03:08:25,360
the zeroth element, to five. And the same thing happens with x, because we changed the first element of z.
1800
03:08:25,360 --> 03:08:32,160
So because z is a view of x, the first element of x changes as well. But let's keep going. How
1801
03:08:32,160 --> 03:08:37,040
about we stack some tensors on top of each other? And we'll see what the stack function does in
1802
03:08:37,040 --> 03:08:48,080
torch. So stack tensors on top of each other. And I'll just see if I press command S to save,
1803
03:08:48,080 --> 03:08:55,200
maybe we'll get this fixed. Or maybe it just will fix itself. Oh, notebook is saved.
1804
03:08:56,960 --> 03:09:01,280
Unless you've made some extensive changes that you're worried about losing, you could just
1805
03:09:01,280 --> 03:09:07,200
download this notebook, so file download, and upload it to collab. But usually if you click yes,
1806
03:09:08,880 --> 03:09:13,200
it sort of resolves itself. Yeah, there we go. All changes saved. So that's beautiful
1807
03:09:13,200 --> 03:09:18,720
troubleshooting on the fly. I like that. So x stack, let's stack some tensors together,
1808
03:09:18,720 --> 03:09:25,200
equals torch stack. Let's go x x x, because if we look at what the doc string of stack is,
1809
03:09:25,200 --> 03:09:32,160
will we get this in collab? Or we just go to the documentations? Yeah. So list, it takes a list of
1810
03:09:32,160 --> 03:09:37,200
tensors, and concatenates a sequence of tensors along a new dimension. And we define the dimension,
1811
03:09:37,200 --> 03:09:42,480
the dimension by default is zero. That's a little bit hard to read for me. So tensors,
1812
03:09:42,480 --> 03:09:46,560
dim equals zero. If we come into here, the default dimension is zero. Let's see what happens when
1813
03:09:46,560 --> 03:09:52,640
we play around with the dimension here. So we've got four x's. And the first one, we'll just do it
1814
03:09:52,640 --> 03:10:01,120
by default, x stack. Okay, wonderful. So they're stacked vertically. Let's see what happens if we
1815
03:10:01,120 --> 03:10:07,600
change this to one. Oh, they rearranged a little and stack like that. What happens if we change it
1816
03:10:07,600 --> 03:10:13,120
to two? Does it have a dimension two? Oh, we can't do that. Well, that's because the original shape
1817
03:10:13,120 --> 03:10:19,680
of x is incompatible with using dimension two. So the only real way to get used to what happens
1818
03:10:19,680 --> 03:10:23,520
here by stacking them on top of each other is to play around with the different values for the
1819
03:10:23,520 --> 03:10:30,000
dimension. So dim zero, dim one, they look a little bit different there. Now they're on top of each
1820
03:10:30,000 --> 03:10:37,840
other. And so the first zero index is now the zeroth tensor. And then same with two being there,
1821
03:10:37,840 --> 03:10:44,240
three and so on. But we'll leave it at the default. And there's also v stack and h stack. I'll leave
1822
03:10:44,240 --> 03:10:52,000
that to you to practice those. But I think from memory v stack is using dimension equals zero.
1823
03:10:52,000 --> 03:10:57,680
Or h stack is like using dimension equals one. I may have those back to front. You can correct me
1824
03:10:57,680 --> 03:11:04,640
if I'm wrong there. Now let's move on. We're going to now have a look at squeeze and unsqueeze.
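(A small sketch of the view and stack behaviour covered above, assuming the same x of nine values. And to settle the vstack/hstack question: they concatenate along an existing dimension rather than adding a new one like torch.stack, so for 1-D inputs vstack gives the same shape as stack with dim=0, while hstack simply joins the values end to end.)

import torch

x = torch.arange(1., 10.)

# view returns the same data with a new shape and SHARES memory with x
z = x.view(1, 9)
z[:, 0] = 5                        # changing z changes x too
print(z, x)                        # both now start with 5.

# stack concatenates along a NEW dimension
print(torch.stack([x, x, x, x], dim=0).shape)   # torch.Size([4, 9])
print(torch.stack([x, x, x, x], dim=1).shape)   # torch.Size([9, 4])

# vstack/hstack concatenate along an existing dimension instead
print(torch.vstack([x, x, x, x]).shape)         # torch.Size([4, 9])
print(torch.hstack([x, x, x, x]).shape)         # torch.Size([36]) for 1-D inputs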
1825
03:11:05,600 --> 03:11:10,880
So actually, I'm going to get you to practice this. So see if you can look up torch squeeze
1826
03:11:10,880 --> 03:11:16,720
and torch unsqueeze. And see if you can try them out. We've created a tensor here. We've used
1827
03:11:16,720 --> 03:11:22,720
reshape and view and we've used stack. The usage of squeeze and unsqueeze is quite similar. So give
1828
03:11:22,720 --> 03:11:27,040
that a go. And to prevent this video from getting too long, we'll do them together in the next video.
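(If you'd like a starting point for that challenge, here's a rough sketch; the tensor values and variable names are just placeholders.)

import torch

x_reshaped = torch.arange(1., 10.).reshape(1, 9)

# squeeze removes all size-one dimensions
x_squeezed = x_reshaped.squeeze()
print(x_reshaped.shape, "->", x_squeezed.shape)    # torch.Size([1, 9]) -> torch.Size([9])

# unsqueeze adds a size-one dimension at the dim you choose
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
print(x_squeezed.shape, "->", x_unsqueezed.shape)  # torch.Size([9]) -> torch.Size([1, 9])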
1829
03:11:29,760 --> 03:11:36,080
Welcome back. In the last video, I issued the challenge of trying out torch dot squeeze,
1830
03:11:36,080 --> 03:11:45,680
which removes all single dimensions from a target tensor. And how would you try that out? Well,
1831
03:11:45,680 --> 03:11:51,760
here's what I would have done. I'd go to torch dot squeeze and see what happens. Open up the
1832
03:11:51,760 --> 03:11:58,480
documentation. Squeeze input dimension returns a tensor with all the dimensions of input size
1833
03:11:58,480 --> 03:12:04,720
one removed. And does it have some demonstrations? Yes, it does. Wow. Okay. So you could copy this in
1834
03:12:04,720 --> 03:12:11,840
straight into a notebook, copy it here. But what I'd actually encourage you to do quite often is
1835
03:12:11,840 --> 03:12:17,600
if you're looking up a new torch method you haven't used, code all of the example by hand. And then
1836
03:12:17,600 --> 03:12:22,800
just practice what the inputs and outputs look like. So x is the input here. Check the size of x,
1837
03:12:23,360 --> 03:12:30,080
squeeze x, well, set the squeeze of x to y, check the size of y. So let's replicate something
1838
03:12:30,080 --> 03:12:38,000
similar to this. We'll go into here, we'll look at x reshaped and we'll remind ourselves of x reshaped
1839
03:12:38,000 --> 03:12:49,600
dot shape. And then how about we see what x reshaped dot squeeze looks like. Okay. What happened here?
1840
03:12:50,400 --> 03:12:55,360
Well, we started with two square brackets. And we started with a shape of one nine
1841
03:12:55,360 --> 03:13:02,560
and removes all single dimensions from a target tensor. And now if we call the squeeze method on
1842
03:13:02,560 --> 03:13:09,120
x reshaped, we only have one square bracket here. So what do you think the shape of x reshaped dot
1843
03:13:09,120 --> 03:13:17,120
squeeze is going to be? We'll check the shape here. It's just nine. So that's the squeeze method,
1844
03:13:17,120 --> 03:13:24,640
removes all single dimensions. If we had one one nine, it would remove all of the ones. So it would
1845
03:13:24,640 --> 03:13:31,360
just end up being nine as well. Now, let's write some print statements so we can have a little
1846
03:13:31,360 --> 03:13:39,440
pretty output. So previous tensor, this is what I like to do. This is a form of visualize, visualize,
1847
03:13:39,440 --> 03:13:46,080
visualize. If I'm trying to get my head around something, I print out each successive change
1848
03:13:46,080 --> 03:13:51,280
to see what's happening. That way, I can go, Oh, okay. So that's what it was there. And then I
1849
03:13:51,280 --> 03:13:57,360
called that line of code there. Yes, it's a bit tedious. But you do this half a dozen times, a
1850
03:13:57,360 --> 03:14:02,480
fair few times. I mean, I still do it a lot of the time, even though I've written thousands of lines
1851
03:14:02,480 --> 03:14:07,680
of machine learning code. But it starts to become instinct after a while, you start to go, Oh, okay,
1852
03:14:07,680 --> 03:14:13,920
I've got a dimension mismatch on my tensors. So I need to squeeze them before I put them into a
1853
03:14:13,920 --> 03:14:23,040
certain function. It takes a little while, but with practice, it's just like riding a bike, right? And that,
1854
03:14:23,040 --> 03:14:27,600
I'd say, is like when you first start: you're all wobbly, all over the place, having to look up
1855
03:14:27,600 --> 03:14:32,720
the documentation, not that there's much documentation for riding a bike, you just kind of keep trying.
1856
03:14:32,720 --> 03:14:38,480
But that's the style of coding I'd like you to adopt: just try it first. Then if you're stuck,
1857
03:14:38,480 --> 03:14:42,640
go to the documentation, look something up, print it out like this, what we're doing,
1858
03:14:42,640 --> 03:14:47,440
quite cumbersome. But this is going to give us a good explanation for what's happening. Here's our
1859
03:14:47,440 --> 03:14:53,120
previous tensor x reshaped. And then if we look at the shape of x reshaped, it's one nine. And then
1860
03:14:53,120 --> 03:14:57,600
if we call the squeeze method, which removes all single dimensions from a target tensor,
1861
03:14:57,600 --> 03:15:04,160
we have the new tensor, which has one square bracket removed. And the new shape is all single
1862
03:15:04,160 --> 03:15:09,840
dimensions removed. So it's still the original values, but just a different dimension. Now,
1863
03:15:09,840 --> 03:15:14,800
let's do the same as what we've done here with unsqueeze. So we've given our tensors a hug and
1864
03:15:14,800 --> 03:15:18,480
squeezed out all the single dimensions of them. Now we're going to unsqueeze them. We're going to
1865
03:15:18,480 --> 03:15:25,040
take a step back and let them grow a bit. So torch unsqueeze adds a single dimension
1866
03:15:26,480 --> 03:15:34,720
to a target tensor at a specific dim dimension. Now that's another thing to note in PyTorch whenever
1867
03:15:34,720 --> 03:15:40,000
it says dim, that's dimension as in this is a zeroth dimension, first dimension. And if there
1868
03:15:40,000 --> 03:15:45,680
was more here, we'd go two, three, four, five, six, et cetera, because tensors can have
1869
03:15:45,680 --> 03:15:56,320
unlimited dimensions. So let's print the previous target, x squeezed. So we'll get this squeezed version
1870
03:15:56,320 --> 03:16:02,720
of our tensor, which is x squeezed up here. And then we'll go print. The previous shape
1871
03:16:02,720 --> 03:16:14,400
is going to be x squeezed dot shape. And then we're going to add an extra dimension with unsqueeze.
1872
03:16:17,360 --> 03:16:24,400
There we go: x unsqueezed equals x squeezed, so our tensor from before, where we removed the single
1873
03:16:24,400 --> 03:16:32,160
dimension. And we're going to put in unsqueeze, dim, we'll do it on the zeroth dimension. And I
1874
03:16:32,160 --> 03:16:35,840
want you to have a think about what this is going to output even before we run the code.
1875
03:16:35,840 --> 03:16:39,680
Just think about, because we've added an extra dimension on the zeroth dimension,
1876
03:16:39,680 --> 03:16:45,440
what's the new shape of the unsqueeze tensor going to be? So we're going to go x unsqueezed.
1877
03:16:47,120 --> 03:16:56,320
And then we're going to go print, we'll get our new tensor shape, which is going to be x unsqueezed
1878
03:16:56,320 --> 03:17:04,080
dot shape. All right, let's have a look. There we go. So there's our previous tensor,
1879
03:17:04,080 --> 03:17:10,560
which is the squeezed version, just as a single dimension here. And then we have our new tensor,
1880
03:17:10,560 --> 03:17:16,240
which with the unsqueeze method on dimension zero, we've added a square bracket on the zeroth
1881
03:17:16,240 --> 03:17:20,160
dimension, which is this one here. Now what do you think's going to happen if I change this to one?
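(Here's a tiny sketch you can run to answer that question for yourself; the variable name is just a placeholder.)

import torch

x_squeezed = torch.arange(1., 10.)            # shape [9]
print(x_squeezed.unsqueeze(dim=0).shape)      # torch.Size([1, 9]) - bracket added on the outside
print(x_squeezed.unsqueeze(dim=1).shape)      # torch.Size([9, 1]) - each element gets wrapped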
1882
03:17:20,160 --> 03:17:28,880
Where's the single dimension going to be added? Let's have a look. Ah, so instead of adding the
1883
03:17:28,880 --> 03:17:34,480
single dimension on the zeroth dimension, we've added it on the first dimension here. It's quite
1884
03:17:34,480 --> 03:17:40,640
confusing because Python is zero-indexed. So I kind of want to, my brain's telling me to, say first,
1885
03:17:40,640 --> 03:17:45,920
but it's really the zeroth index here or the zeroth dimension. Now let's change this back to
1886
03:17:45,920 --> 03:17:52,320
zero. But that's just another way of exploring things. Every time there's like a parameter that
1887
03:17:52,320 --> 03:17:58,320
we have here, dim equals something like that could be shape, could be size, whatever, try
1888
03:17:58,320 --> 03:18:02,640
changing the values. That's what I'd encourage you to do. And even write some print code like
1889
03:18:02,640 --> 03:18:09,600
we've done here. Now there's one more we want to try out. And that's permute. So torch dot permute
1890
03:18:09,600 --> 03:18:23,840
rearranges the dimensions of a target tensor in a specified order. So if we wanted to check out,
1891
03:18:23,840 --> 03:18:30,080
let's get rid of some of these extra tabs. Torch dot permute. Let's have a look. This one took me
1892
03:18:30,080 --> 03:18:36,080
a little bit of practice to get used to. Because again, working with zeroth dimensions, even though
1893
03:18:36,080 --> 03:18:41,760
it seems like the first one. So returns a view. Okay. So we know that a view shares the memory of
1894
03:18:41,760 --> 03:18:47,280
the original input tensor with its dimensions permuted. So permuted for me, I didn't really know
1895
03:18:47,280 --> 03:18:53,040
what that word meant. I just have mapped in my own memory that permute means rearrange dimensions.
1896
03:18:53,600 --> 03:18:58,480
So the example here is we start with a random tensor, we check the size, and then we'd have
1897
03:18:58,480 --> 03:19:04,320
torch permute. We're going to swap the order of the dimensions. So the second dimension is first,
1898
03:19:04,320 --> 03:19:10,480
the zeroth dimension is in the middle, and the first dimension is here. So these are dimension
1899
03:19:10,480 --> 03:19:17,600
values. So if we have torch randn two, three, five, then two, zero, one has changed this one to be
1900
03:19:17,600 --> 03:19:24,480
over here. And then zero, one, which is two, three, is now two, three there. So let's try something similar
1901
03:19:24,480 --> 03:19:30,800
to this. So one of the common places you'll be using permute, or you might see permute being
1902
03:19:30,800 --> 03:19:37,040
used is with images. So images have a specific data format. We've kind of seen a little bit
1903
03:19:37,040 --> 03:19:44,320
before, not too much. X original equals torch dot rand, size equals. So for an image tensor,
1904
03:19:44,880 --> 03:19:50,800
we go height width color channels on the end. So I'll just write this down. So this is height
1905
03:19:50,800 --> 03:19:57,040
width color channels. Remember, much of, and I'm going to spell color Australian style,
1906
03:19:57,040 --> 03:20:04,080
much of deep learning is turning your data into numerical representations. And this is quite common
1907
03:20:04,080 --> 03:20:10,000
numerical representation of image data. You have a tensor dimension for the height, a tensor dimension
1908
03:20:10,000 --> 03:20:14,240
for the width, and a tensor dimension for the color channels, which is red, green, and blue,
1909
03:20:14,240 --> 03:20:20,080
because a certain number of red, green, and blue creates almost any color. Now, if we want to
1910
03:20:20,080 --> 03:20:31,840
permute this, so permute the original tensor to rearrange the axis or dimension, axis or dimension,
1911
03:20:31,840 --> 03:20:40,080
are kind of used in the same light for tensors or dim order. So let's switch the color channels
1912
03:20:40,080 --> 03:20:45,680
to be the first or the zeroth dimension. So instead of height width color channels,
1913
03:20:45,680 --> 03:20:51,200
it'll be color channels height width. How would we do that with permute? Let's give it a shot.
1914
03:20:51,840 --> 03:21:01,200
X permuted equals X original dot permute. And we're going to take the second dimension,
1915
03:21:01,200 --> 03:21:06,640
because this takes a series of dims here. So the second dimension is color channels. Remember,
1916
03:21:06,640 --> 03:21:13,200
zero, one, two. So two, we want two first, then we want the height, which is a zero. And then we
1917
03:21:13,200 --> 03:21:24,160
want the width, which is one. And now, this shifts axis zero to one, one to two,
1918
03:21:24,800 --> 03:21:35,360
and two to zero. So this is the order as well. This two maps to zero. This zero maps to the first
1919
03:21:35,360 --> 03:21:41,360
index. This one maps to this index. But that's enough talk about it. Let's see what it looks like.
1920
03:21:41,360 --> 03:21:51,840
So print, previous shape, X original dot shape. And then we go here, print new shape. This will
1921
03:21:51,840 --> 03:22:01,120
be the permuted version. We want X permuted dot shape. Let's see what this looks like. Wonderful.
1922
03:22:01,120 --> 03:22:06,000
That's exactly what we wanted. So you see, let's just write a little note here. Now this is
1923
03:22:06,000 --> 03:22:14,960
color channels, height, width. So the same data is going to be in both of these tensors. So X
1924
03:22:14,960 --> 03:22:20,320
original and X permuted, it's just viewed from a different point of view. Because remember, a
1925
03:22:20,320 --> 03:22:26,480
permute is a view. And what did we discuss? A view shares the same memory as the original tensor.
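(A minimal sketch of the permute example just walked through, assuming a random image-shaped tensor; the variable names mirror the video and the 728218 value is only there to show the shared memory.)

import torch

# A fake image tensor: [height, width, colour_channels]
x_original = torch.rand(size=(224, 224, 3))

# permute returns a view with the dimensions rearranged: colour channels first
x_permuted = x_original.permute(2, 0, 1)      # [colour_channels, height, width]
print(x_original.shape)                        # torch.Size([224, 224, 3])
print(x_permuted.shape)                        # torch.Size([3, 224, 224])

# Because a permute is a view, both tensors share the same memory
x_original[0, 0, 0] = 728218
print(x_permuted[0, 0, 0])                     # tensor(728218.)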
1926
03:22:26,480 --> 03:22:32,880
So X permuted will share the same place in memory as X original, even though it's from a different
1927
03:22:32,880 --> 03:22:37,920
shape. So a little challenge before we move on to the next video for you, or before you move
1928
03:22:37,920 --> 03:22:46,080
on to the next video, try change one of the values in X original. Have a look at X original.
1929
03:22:46,080 --> 03:22:54,560
And see if that same value, it could be, let's get one of this zero, zero, get all of the dimensions
1930
03:22:54,560 --> 03:23:07,360
here, zero. See what that is? Or can we get a single value maybe? Oops. Oh, no, we'll need a zero
1931
03:23:07,360 --> 03:23:14,560
here, getting some practice on indexing here. Oh, zero, zero, zero. There we go. Okay, so maybe
1932
03:23:14,560 --> 03:23:22,000
we set that to some value, whatever you choose, and see if that changes in X permuted. So give
1933
03:23:22,000 --> 03:23:29,840
that a shot, and I'll see you in the next video. Welcome back. In the last video, we covered
1934
03:23:29,840 --> 03:23:36,480
squeezing, unsqueezing, and permuting, which I'm not going to lie, these concepts are quite a
1935
03:23:36,480 --> 03:23:41,600
lot to take in, but just so you're aware of them. Remember, what are they working towards? They're
1936
03:23:41,600 --> 03:23:46,640
helping us fix shape and dimension issues with our tensors, which is one of the most common
1937
03:23:46,640 --> 03:23:51,680
issues in deep learning and neural networks. And I usually do the little challenge of changing a
1938
03:23:51,680 --> 03:23:58,480
value of X original to highlight the fact that permute returns a different view of the original
1939
03:23:58,480 --> 03:24:04,960
tensor. And a view in PyTorch shares memory with that original tensor. So if we change the value
1940
03:24:04,960 --> 03:24:12,880
at zero, zero, zero of X original to, in my case, 728218, it happens the same value gets copied across
1941
03:24:12,880 --> 03:24:20,240
to X permuted. So with that being said, we looked at selecting data from tensors here, and this is
1942
03:24:20,240 --> 03:24:25,520
using a technique called indexing. So let's just rehash that, because this is another thing that
1943
03:24:25,520 --> 03:24:30,560
can be a little bit of a hurdle when first working with multi dimensional tensors. So let's see how
1944
03:24:30,560 --> 03:24:38,480
we can select data from tensors with indexing. So if you've ever done indexing, indexing,
1945
03:24:39,840 --> 03:24:46,400
with PyTorch is similar to indexing with NumPy. If you've ever worked with NumPy,
1946
03:24:46,400 --> 03:24:51,760
and you've done indexing, selecting data from arrays, NumPy uses an array as its main data type,
1947
03:24:51,760 --> 03:24:57,680
PyTorch uses tensors. It's very similar. So let's again start by creating a tensor.
1948
03:24:58,560 --> 03:25:04,960
And again, I'm just going to add a few code cells here, so I can make my screen right in the middle.
1949
03:25:04,960 --> 03:25:10,720
Now we're going to import torch. Again, we don't need to import torch all the time,
1950
03:25:10,720 --> 03:25:18,720
just so you can run the notebook from here later on. X equals torch dot, let's create an arange again,
1951
03:25:18,720 --> 03:25:24,320
just nice and simple. This is how I like to work out the fundamentals too, is just create the small
1952
03:25:24,320 --> 03:25:30,000
range, reshape it, and the reshape has to be compatible with the original dimension. So we go
1953
03:25:30,000 --> 03:25:35,840
one, three, three, and why is this because torch a range is going to return us nine values, because
1954
03:25:35,840 --> 03:25:43,120
it's from the start here to the end minus one, and then one times three times three is what is
1955
03:25:43,120 --> 03:25:53,200
nine. So let's have a look x x dot shape. Beautiful. So we have one, two, three, four, five, six,
1956
03:25:53,200 --> 03:25:59,760
seven, eight, nine of size one. So we have this is the outer bracket here, which is going to contain
1957
03:25:59,760 --> 03:26:09,920
all of this. And then we have three, which is this one here, one, two, three. And then we have three,
1958
03:26:09,920 --> 03:26:19,600
which is one, two, three. Now let's work with this. Let's index on our new tensor. So let's see what
1959
03:26:19,600 --> 03:26:29,760
happens when we get x zero, this is going to index on the first bracket. So we get this one here. So
1960
03:26:29,760 --> 03:26:35,280
we've indexed on the first dimension here, the zero dimension on this one here, which is why we get
1961
03:26:35,280 --> 03:26:47,120
what's inside here. And then let's try again, let's index on the middle bracket. So dimension
1962
03:26:47,120 --> 03:26:56,400
one. So we got to go x, and then zero, and then zero. Let's see what happens there. Now is this the
1963
03:26:56,400 --> 03:27:04,960
same as going x zero, zero? It is, there we go. So it depends on what you want to use. Sometimes
1964
03:27:04,960 --> 03:27:10,880
I prefer to go like this. So I know that I'm getting the first bracket, and then the zeroth
1965
03:27:10,880 --> 03:27:15,440
version of that first bracket. So then we have these three values here. Now what do you think
1966
03:27:15,440 --> 03:27:20,960
what's going to happen if we index on third dimension or the second dimension here? Well,
1967
03:27:20,960 --> 03:27:29,760
let's find out. So let's index on the most in our bracket, which is last dimension.
1968
03:27:31,120 --> 03:27:38,480
So we have x zero, zero, zero. What number is this going to give us back? Well, x zero,
1969
03:27:39,280 --> 03:27:44,160
on the zero dimension gives us back this middle tensor. And then if x zero, zero gives us back
1970
03:27:44,160 --> 03:27:51,040
the zeroth index of the middle tensor. If we go x zero, zero, zero is going to give us the zeroth
1971
03:27:52,000 --> 03:27:59,840
tensor, the zeroth index, and the zeroth element. A lot to take in there. But what we've done is
1972
03:27:59,840 --> 03:28:06,880
we've just broken it down step by step. We've got this first zero targets this outer bracket
1973
03:28:06,880 --> 03:28:14,800
and returns us all of this. And then zero, zero targets this first because of this first zero,
1974
03:28:14,800 --> 03:28:21,520
and then the zero here targets this. And then if we go zero, zero, zero, we target this,
1975
03:28:22,080 --> 03:28:27,760
then we target this, and then we get this back because we are getting the zeroth index here.
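(In code, that step-by-step drill-down might look like this quick sketch.)

import torch

x = torch.arange(1, 10).reshape(1, 3, 3)
print(x[0])          # the inner 3x3 tensor (indexing the outer, size-one dimension)
print(x[0][0])       # tensor([1, 2, 3]) - first row of that 3x3
print(x[0][0][0])    # tensor(1) - a single element
# x[0, 0, 0] is equivalent to x[0][0][0]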
1976
03:28:27,760 --> 03:28:34,400
So if we change this to one, what do we get back? Two. And if we change these all to one,
1977
03:28:34,400 --> 03:28:42,640
what will we get? This is a bit of trivia here, or a challenge. So we're going one, one, one.
1978
03:28:45,520 --> 03:28:50,320
Let's see what happens. Oh, no, did you catch that before I ran the code? I did that one quite
1979
03:28:50,320 --> 03:28:56,080
quickly. We have index one is out of bounds. Why is that? Well, because this dimension is only one
1980
03:28:56,080 --> 03:29:00,640
here. So we can only index on the zero. That's where it gets a little bit confusing because this
1981
03:29:00,640 --> 03:29:05,200
says one, but because it's only got a size of one there, we can only index with zero on that dimension. But
1982
03:29:05,200 --> 03:29:13,760
what if we do 011? What does that give us? Five. Beautiful. So I'd like to issue you the challenge
1983
03:29:13,760 --> 03:29:20,240
of how about getting number nine? How would you get number nine? So rearrange this code to get
1984
03:29:20,240 --> 03:29:24,880
number nine. That's your challenge. Now, I just want to show you as well, is you can use,
1985
03:29:24,880 --> 03:29:37,600
you can also use, you might see this, the colon to select all of a target dimension. So let's say
1986
03:29:37,600 --> 03:29:45,360
we wanted to get all of the zeroth dimension, but the zero element from that. We can get 123.
1987
03:29:46,000 --> 03:29:51,040
And then let's say we want to say get all values of the zeroth and first dimensions,
1988
03:29:51,040 --> 03:29:58,080
but only index one of the second dimension. Oh, that was a mouthful. But get all values of
1989
03:29:58,080 --> 03:30:06,720
zeroth and first dimensions, but only index one of second dimension. So let's break this
1990
03:30:06,720 --> 03:30:14,880
down step by step. We want all values of zeroth and first dimensions, but only index one of the
1991
03:30:14,880 --> 03:30:22,720
second dimension. We press enter, shift enter, 258. So what did we get there? 258. Okay. So we've
1992
03:30:22,720 --> 03:30:30,160
got all elements of the zeroth and first dimension, which will return us this thing here.
1993
03:30:30,160 --> 03:30:37,920
But then we only want 258, which is the first element here of the second dimension, which is
1994
03:30:37,920 --> 03:30:43,840
this three there. So quite confusing. But with some practice, you can figure out how to select
1995
03:30:43,840 --> 03:30:49,280
almost any numbers you want from any kind of tensor that you have. So now let's try again,
1996
03:30:49,280 --> 03:30:59,520
get all values of the zero dimension, but only the one index value of the first and second
1997
03:30:59,520 --> 03:31:04,560
dimension. So what might this look like? Let's break it down again. So we come down here x,
1998
03:31:05,120 --> 03:31:09,520
and we're going to go all values of the zero dimension because zero comes first. And then we
1999
03:31:09,520 --> 03:31:15,040
want only the one index value of the first and only the one index value of the second.
2000
03:31:15,680 --> 03:31:20,560
What is this going to give us? Five. Oh, we selected the middle element. So really,
2001
03:31:20,560 --> 03:31:27,520
this line of code is exactly the same as this line of code here, except we've got the square
2002
03:31:27,520 --> 03:31:33,040
brackets on the outside here, because we've got this colon there. So if we change this to a zero,
2003
03:31:34,560 --> 03:31:38,720
we remove that. But because we've got the colon there, we've selected all the
2004
03:31:38,720 --> 03:31:45,040
dimensions. So we get back the square bracket there, something to keep in mind. Finally,
2005
03:31:45,040 --> 03:31:57,760
let's just go one more. So get index zero of zero and first dimension, and all values of second
2006
03:31:57,760 --> 03:32:06,640
dimension. So x zero, zero. So zero, the index of zero and first dimension, zero, zero,
2007
03:32:06,640 --> 03:32:11,520
and all values of the second dimension. What have we just done here? We've got tensor one,
2008
03:32:11,520 --> 03:32:19,680
two, three, lovely. This code again is equivalent to what we've done up here. This has a colon
2009
03:32:19,680 --> 03:32:24,960
on the end. But what this line explicitly says without the colon is, hey, give us all the
2010
03:32:24,960 --> 03:32:30,480
values on the remaining dimension there. So my challenge for you is to take this tensor that we
2011
03:32:30,480 --> 03:32:42,000
have got here and index on it to return nine. So I'll write down here, index on x to return nine.
2012
03:32:42,000 --> 03:32:54,160
So if you have a look at x, as well as index on x to return three, six, nine. So these values
2013
03:32:54,160 --> 03:33:02,160
here. So give those both a go and I'll see you in the next video. Welcome back. How'd you go?
2014
03:33:02,160 --> 03:33:07,360
Did you give the challenge ago? I finished the last video with issuing the challenge to index on
2015
03:33:07,360 --> 03:33:13,600
x to return nine and index on x to return three, six, nine. Now here's what I came up with. Again,
2016
03:33:13,600 --> 03:33:16,960
there's a few different ways that you could approach both of these. But this is just what
2017
03:33:16,960 --> 03:33:25,760
I've found. So because x is one, three, three in size, well, those are its dimensions. If we want to
2018
03:33:25,760 --> 03:33:31,760
select nine, we need zero, which is this first outer bracket to get all of these elements. And
2019
03:33:31,760 --> 03:33:37,520
then we need two to select this bottom one here. And then we need this final two to select the
2020
03:33:37,520 --> 03:33:43,520
second dimension of this bottom one here. And then for three, six, nine, we need all of the
2021
03:33:43,520 --> 03:33:47,840
elements in the zeroth dimension, all of the elements in the
2022
03:33:47,840 --> 03:33:56,080
first dimension. And then we get two, which is this three, six, nine set up here. So that's how I
2023
03:33:56,080 --> 03:34:00,560
would practice indexing, start with whatever shape tensor you like, create it something like this,
2024
03:34:00,560 --> 03:34:05,920
and then see how you can write different indexing to select whatever number you pick.
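(In code, the two challenge answers, plus the colon-style slicing used above, might look something like this sketch.)

import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

print(x[:, 0])       # tensor([[1, 2, 3]]) - all of dim 0, index 0 of dim 1
print(x[:, :, 1])    # tensor([[2, 5, 8]]) - index 1 of the last dimension
print(x[:, 1, 1])    # tensor([5])
print(x[0, 0, :])    # tensor([1, 2, 3]) - same values as x[0][0]

print(x[0, 2, 2])    # tensor(9)          - index on x to return 9
print(x[:, :, 2])    # tensor([[3, 6, 9]]) - index on x to return 3, 6, 9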
2025
03:34:05,920 --> 03:34:18,160
So now let's move on to the next part, which is PyTorch tensors and NumPy. So NumPy is a
2026
03:34:18,160 --> 03:34:25,440
popular scientific, very popular. PyTorch actually requires NumPy when you install PyTorch. Popular
2027
03:34:25,440 --> 03:34:37,120
scientific Python numerical computing library, that's a bit of a mouthful. And because of this,
2028
03:34:37,120 --> 03:34:46,880
PyTorch has functionality to interact with it. So quite often, you might start off with,
2029
03:34:46,880 --> 03:34:52,320
let's change this into Markdown, you might start off with your data, because it's numerical format,
2030
03:34:52,320 --> 03:35:03,600
you might start off with data in NumPy, NumPy array, want in PyTorch tensor. Because your
2031
03:35:03,600 --> 03:35:07,680
data might be represented by NumPy because it started in NumPy, but say you want to do
2032
03:35:07,680 --> 03:35:12,320
some deep learning on it and you want to leverage PyTorch's deep learning capabilities,
2033
03:35:12,320 --> 03:35:17,360
well, you might want to change your data from NumPy to a PyTorch tensor. And PyTorch has a
2034
03:35:17,360 --> 03:35:26,320
method to do this, which is torch from NumPy, which will take in an ND array, which is NumPy's
2035
03:35:26,320 --> 03:35:31,840
main data type, and change it into a torch tensor. We'll see this in a second. And then if you want
2036
03:35:31,840 --> 03:35:38,560
to go from PyTorch tensor to NumPy because you want to use some sort of NumPy method,
2037
03:35:38,560 --> 03:35:47,200
well, the method to do this is torch dot tensor, and you can call dot NumPy on it. But this is all
2038
03:35:47,200 --> 03:35:55,600
just talking about in words, let's see it in action. So NumPy array to tensor. Let's try this out
2039
03:35:55,600 --> 03:36:04,560
first. So we'll import torch so we can run this cell on its own, and then import NumPy as np,
2040
03:36:04,560 --> 03:36:10,400
the common naming convention for NumPy, we're going to create an array in NumPy. And we're
2041
03:36:10,400 --> 03:36:18,960
going to just put one to eight, arange. And then we're going to go tensor equals torch from NumPy
2042
03:36:20,240 --> 03:36:26,320
because we want to go from NumPy array to a torch tensor. So we use from NumPy, and then we pass
2043
03:36:26,320 --> 03:36:35,040
in array, and then we have array and tensor. Wonderful. So there's our NumPy array, and our torch
2044
03:36:35,040 --> 03:36:41,600
tensor with the same data. But what you might notice here is that the D type for the tensor is
2045
03:36:41,600 --> 03:36:49,280
torch dot float 64. Now why is this? It's because NumPy's default data type. Oh, D type
2046
03:36:49,280 --> 03:36:57,840
is float 64. Whereas tensor, what have we discussed before? What's pytorch's default data type?
2047
03:36:58,560 --> 03:37:05,440
float 64. Well, that's not PyTorch's default data type. If we were to create torch arange,
2048
03:37:06,000 --> 03:37:10,560
1.0 to 8.0, by default, pytorch is going to create it in
2049
03:37:10,560 --> 03:37:21,520
float 32. So just be aware of that. If you are going from NumPy to pytorch, the default NumPy
2050
03:37:21,520 --> 03:37:28,720
data type is float 64. And pytorch reflects that data type when you use the from NumPy method.
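(A short sketch of that dtype behaviour; the .type(torch.float32) call is one way to get back to PyTorch's usual default.)

import torch
import numpy as np

array = np.arange(1.0, 8.0)                 # NumPy defaults to float64
tensor = torch.from_numpy(array)
print(tensor.dtype)                          # torch.float64 - dtype carried over from NumPy

# Convert if you want PyTorch's usual default of float32
tensor_float32 = torch.from_numpy(array).type(torch.float32)
print(tensor_float32.dtype)                  # torch.float32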
2051
03:37:28,720 --> 03:37:36,240
I wonder if there's a D type. Can we go D type equals torch dot float 32? Takes no keyword.
2052
03:37:36,240 --> 03:37:43,040
Okay. But how could we change the data type here? Well, we could go type torch float 32.
2053
03:37:44,800 --> 03:37:52,400
Yeah, that will give us a tensor D type of float 32 instead of float 64. Beautiful. I'll just keep
2054
03:37:52,400 --> 03:38:06,320
that there so you know: warning, when converting from NumPy to PyTorch, PyTorch reflects NumPy's
2055
03:38:06,320 --> 03:38:17,920
default data type of float 64, unless specified otherwise. Because, what have we discussed,
2056
03:38:17,920 --> 03:38:24,560
when you're trying to perform certain calculations, you might run into a data type issue. So you might
2057
03:38:24,560 --> 03:38:32,720
need to convert the type from float 64 to float 32. Now, let's see what happens. What do you think
2058
03:38:32,720 --> 03:38:40,240
will happen if we change the array? We change the value of an array. Well, let's find out.
2059
03:38:40,240 --> 03:38:52,080
So change the value of array. The question is, what will this do to tensor? Because we've used
2060
03:38:52,080 --> 03:38:58,000
the from NumPy method, do you think if we change the array, the tensor will change? So let's try
2061
03:38:58,000 --> 03:39:06,800
this array equals array plus one. So we're just adding one to every value in the array. Now,
2062
03:39:06,800 --> 03:39:15,520
what is the array and the tensor going to look like? Uh huh. So array, we only change the first
2063
03:39:15,520 --> 03:39:21,520
value there. Oh, sorry, we change every value because we have one to seven. Now it's two, three,
2064
03:39:21,520 --> 03:39:26,000
four, five, six, seven, eight. We change the value from the array. It doesn't change the
2065
03:39:26,000 --> 03:39:32,240
value of the tensor. So that's just something to keep in mind. What's happened is that array plus one creates
2066
03:39:32,240 --> 03:39:37,120
a brand-new array in memory here, so the tensor, which was built from the old array, doesn't change when we reassign the
2067
03:39:37,120 --> 03:39:43,360
original array. So now let's go from tensor to NumPy. If you wanted to go back to NumPy,
2068
03:39:43,360 --> 03:39:49,440
tensor to NumPy array. So we'll start with a tensor. We could use the one we have right now,
2069
03:39:49,440 --> 03:39:52,880
but we're going to create another one, but we'll create one of ones just for fun.
2070
03:39:53,680 --> 03:40:01,600
One rhymes with fun. NumPy tensor equals. How do we go to NumPy? Well, we have
2071
03:40:01,600 --> 03:40:08,480
torch dot tensor dot NumPy. So we just simply call NumPy on here. And then we have tensor
2072
03:40:08,480 --> 03:40:14,080
and NumPy tensor. What data type do you think the NumPy tensor is going to have?
2073
03:40:14,080 --> 03:40:19,040
Because we've returned it to NumPy. PyTorch's default data type is
2074
03:40:21,360 --> 03:40:26,560
Float 32. So if we change that to NumPy, what's going to be the D type of the NumPy tensor?
2075
03:40:26,560 --> 03:40:36,800
NumPy tensor dot D type. It reflects the original D type of what you set the tensor as. So just
2076
03:40:36,800 --> 03:40:41,360
keep that in mind. If you're going between PyTorch and NumPy, default data type of NumPy is
2077
03:40:41,360 --> 03:40:47,120
float 64, whereas the default data type of PyTorch is float 32. So that may cause some errors if
2078
03:40:47,120 --> 03:40:51,600
you're doing different kinds of calculations. Now, what do you think is going to happen if we
2079
03:40:51,600 --> 03:40:58,800
went from our tensor to an array, if we change the tensor, change the tensor, what happens to
2080
03:41:01,760 --> 03:41:11,280
NumPy tensor? So we get tensor equals tensor plus one. And then we go NumPy tensor.
2081
03:41:11,920 --> 03:41:19,280
Oh, we'll get tensor as well. So our tensor is now all twos because we added one to the ones.
2082
03:41:19,280 --> 03:41:24,960
But our NumPy tensor remains the same, unchanged, because tensor plus one created a brand-new tensor rather than changing the old one in place.
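(One subtlety worth flagging here: per the torch.from_numpy and Tensor.numpy documentation, both directions actually share memory with the source object on the CPU, so in-place changes do propagate. It's the reassignment, tensor equals tensor plus one, that creates a new tensor and leaves the NumPy array untouched. A small sketch, with names loosely matching the video:)

import torch

tensor = torch.ones(7)
numpy_tensor = tensor.numpy()       # shares memory with `tensor`

tensor = tensor + 1                 # reassignment: a NEW tensor, numpy_tensor is untouched
print(numpy_tensor)                 # still all 1.0

tensor2 = torch.ones(7)
numpy_tensor2 = tensor2.numpy()
tensor2.add_(1)                     # in-place op on the SAME memory
print(numpy_tensor2)                # now all 2.0 - the change propagates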
2083
03:41:24,960 --> 03:41:31,600
So that's how we go in between PyTorch and NumPy. If you'd like to look up more, I'd encourage
2084
03:41:31,600 --> 03:41:40,160
you to go PyTorch and NumPy. So warm up NumPy, beginner. There's a fair few tutorials here on
2085
03:41:40,160 --> 03:41:45,840
PyTorch because NumPy is so prevalent, they work pretty well together. So have a look at that.
2086
03:41:45,840 --> 03:41:50,080
There's a lot going on there. There's a few more links, I'd encourage you to check out,
2087
03:41:50,080 --> 03:41:54,800
but we've covered some of the main ones that you'll see in practice. With that being said,
2088
03:41:54,800 --> 03:42:00,800
let's now jump into the next video where we're going to have a look at the concept of reproducibility.
2089
03:42:00,800 --> 03:42:05,200
If you'd like to look that up, I'd encourage you to search PyTorch's reproducibility and see
2090
03:42:05,200 --> 03:42:12,880
what you can find. Otherwise, I'll see you in the next video. Welcome back. It's now time for us
2091
03:42:12,880 --> 03:42:19,600
to cover the topic of reproducibility. If I could even spell it, that would be fantastic.
2092
03:42:19,600 --> 03:42:30,480
Reproducibility. Trying to take the random out of random. So we've touched upon the concept of
2093
03:42:30,480 --> 03:42:35,040
neural networks harnessing the power of randomness. And what I mean by that is we haven't actually
2094
03:42:35,040 --> 03:42:40,320
built our own neural network yet, but we will be doing that. And we've created tenses full of random
2095
03:42:40,320 --> 03:42:53,600
values. And so in short, how our neural network learns is start with random numbers, perform tensor
2096
03:42:53,600 --> 03:43:05,760
operations, update random numbers to try and make them better representations of the data. Again,
2097
03:43:05,760 --> 03:43:18,400
again, again, again, again. However, if you're trying to do reproducible experiments, sometimes
2098
03:43:18,400 --> 03:43:22,880
you don't want so much randomness. And what I mean by this is if we were creating random tensors,
2099
03:43:23,440 --> 03:43:28,080
from what we've seen so far is that every time we create a random tensor, let's create one here,
2100
03:43:28,080 --> 03:43:36,320
torch dot rand, and we'll create it of three three. Every time we run this cell, it gives us new numbers.
2101
03:43:36,320 --> 03:43:43,920
So 7 7 5 2. There we go. Rand again. Right. So we get a whole bunch of random numbers here.
2102
03:43:45,040 --> 03:43:50,160
Every single time. But what if you were trying to share this notebook with a friend,
2103
03:43:50,160 --> 03:43:55,840
so say you went up share and you clicked the share link and you sent that to someone and you're like,
2104
03:43:55,840 --> 03:44:00,400
hey, try out this machine learning experiment I did. And you wanted a little less randomness
2105
03:44:00,400 --> 03:44:06,640
because neural networks start with random numbers. How might you do that? Well, let's
2106
03:44:06,640 --> 03:44:20,560
write this down. To reduce the randomness in neural networks, PyTorch has the concept of a
2107
03:44:20,560 --> 03:44:27,840
random seed. So we're going to see this in action. But essentially, let's write this down,
2108
03:44:27,840 --> 03:44:41,840
essentially what the random seed does is flavor the randomness. So because of how computers work,
2109
03:44:41,840 --> 03:44:46,720
there's actually no true randomness. And actually, there's arguments about this,
2110
03:44:46,720 --> 03:44:50,640
and it's quite a big debate in the computer science topic, whatnot, but I am not a computer
2111
03:44:50,640 --> 03:44:56,160
scientist, I am a machine learning engineer. So computers are fundamentally deterministic.
2112
03:44:56,160 --> 03:45:01,360
It means they run the same steps over and over again. So the randomness we're using here
2113
03:45:01,360 --> 03:45:06,160
is referred to as pseudo randomness or generated randomness. And the random seed,
2114
03:45:06,160 --> 03:45:11,760
which is what you see a lot in machine learning experiments, flavors that randomness. So let's
2115
03:45:11,760 --> 03:45:16,000
see it in practice. And at the end of this video, I'll give you two resources that I'd recommend
2116
03:45:16,000 --> 03:45:21,680
to learn a little bit more about the concept of pseudo randomness and reproducibility in pytorch.
2117
03:45:22,240 --> 03:45:28,240
Let's start by importing torch so you could start this notebook right from here. Create two random
2118
03:45:28,240 --> 03:45:38,960
tensors. We'll just call this random tensor a equals torch dot rand and we'll go three four
2119
03:45:38,960 --> 03:45:48,480
and we'll go random tensor b equals torch dot rand same size three four. And then if we have a
2120
03:45:48,480 --> 03:45:59,600
look at let's go print random tensor a print random tensor b. And then let's print to see if
2121
03:45:59,600 --> 03:46:08,480
they're equal anywhere random tensor a equals equals equals random tensor b. Now what do you
2122
03:46:08,480 --> 03:46:15,520
think this is going to do? If we have a look at one equals one, what does it return? True.
2123
03:46:16,240 --> 03:46:21,200
So this is comparison operator to compare two different tensors. We're creating two random
2124
03:46:21,200 --> 03:46:25,280
tensors here. We're going to have a look at them. We'd expect them to be full of random values.
2125
03:46:25,280 --> 03:46:29,680
Do you think any of the values in each of these random tensors is going to be equal to each other?
2126
03:46:31,280 --> 03:46:36,320
Well, there is a chance that they are, but it's highly unlikely. I'll be quite surprised if they are.
2127
03:46:36,320 --> 03:46:43,600
Oh, again, my connection might be a little bit. Oh, there we go. Beautiful. So we have tensor a
2128
03:46:44,240 --> 03:46:51,440
tensor of three four with random numbers. And we have tensor b of three four with random numbers.
2129
03:46:51,440 --> 03:46:55,920
So if we were, if I was to share this notebook with my friend or my colleague or even you,
2130
03:46:56,480 --> 03:47:00,880
if you ran this cell, you are going to get random numbers as well. And you have every chance of
2131
03:47:00,880 --> 03:47:05,760
replicating one of these numbers. But again, it's highly unlikely. So again, I'm getting that
2132
03:47:05,760 --> 03:47:10,480
automatic save failed. You might get that if your internet connection is dropping out, maybe that's
2133
03:47:10,480 --> 03:47:15,360
something going on with my internet connection. But again, as we've seen, usually this resolves
2134
03:47:15,360 --> 03:47:20,640
itself. If you try a few times, I'll just keep coding. If it really doesn't resolve itself,
2135
03:47:20,640 --> 03:47:26,800
you can go file, then download notebook, or save a copy in Drive. You can download the
2136
03:47:26,800 --> 03:47:32,800
notebook, save it to your local machine, re upload it to upload notebook and start again in another
2137
03:47:32,800 --> 03:47:38,080
Google Colab instance. But there we go. It fixed itself. Wonderful troubleshooting on the fly.
2138
03:47:38,880 --> 03:47:45,920
So the way we make these reproducible is through the concept of a random seed. So let's have a
2139
03:47:45,920 --> 03:47:58,560
look at that. Let's make some random, but reproducible, tensors. So import torch. And we're going to
2140
03:47:58,560 --> 03:48:13,840
set the random seed by going torch dot manual seed, random seed. Oh, we don't have random seed set yet.
2141
03:48:14,480 --> 03:48:20,720
I'm going to set my random seed. You set the random seed to some numerical value. 42 is a common
2142
03:48:20,720 --> 03:48:26,320
one. You might see zero. You might see one, two, three, four. Essentially, you can set it to whatever
2143
03:48:26,320 --> 03:48:33,280
you want. And each of these, you can think of 77, 100, as different flavors of randomness. So
2144
03:48:33,280 --> 03:48:39,680
I like to use 42, because it's the answer to the universe. And then we go random seed. And now
2145
03:48:39,680 --> 03:48:50,720
let's create some random tensors. Random tensor C, with the flavor of our random seed: three,
2146
03:48:50,720 --> 03:48:59,360
four. And then we're going to go random tensor D equals torch dot rand three, four. Now, let's
2147
03:48:59,360 --> 03:49:10,800
see what happens. We'll print out random tensor C. And we'll print out random tensor D. And then
2148
03:49:10,800 --> 03:49:20,880
we'll print out to see if they're equal anywhere. Random tensor C equals random tensor D. So let's
2149
03:49:20,880 --> 03:49:32,000
find out what happens. Huh, what gives? Well, we've got randomness. We set the random seed. We're
2150
03:49:32,000 --> 03:49:42,640
telling PyTorch to flavor our randomness with 42, torch manual seed. Hmm, let's try setting the manual
2151
03:49:42,640 --> 03:49:52,240
seed each time we call a random method. We go there. Ah, much better. So now we've got some
2152
03:49:52,240 --> 03:49:59,600
flavored randomness. So a thing to keep in mind is that if you want to use the torch manual seed,
2153
03:49:59,600 --> 03:50:06,320
generally it only works for one block of code if you're using a notebook. So that's just
2154
03:50:06,320 --> 03:50:10,160
something to keep in mind. If you're creating random tensors, one after the other, we're using
2155
03:50:10,160 --> 03:50:15,280
assignment like this, you should use torch dot manual seed every time you want to call the rand
2156
03:50:15,280 --> 03:50:20,800
method or some sort of randomness. However, if we're using other torch processes, usually what
2157
03:50:20,800 --> 03:50:25,520
you might see is torch manual seed is set right at the start of a cell. And then a whole bunch
2158
03:50:25,520 --> 03:50:31,200
of code is done down here. But because we're calling subsequent methods here, we have to reset
2159
03:50:31,200 --> 03:50:36,720
the random seed. Otherwise, if we don't do this, we comment this line, it's going to flavor the
2160
03:50:36,720 --> 03:50:42,640
randomness of torch random tensor C with torch manual seed. But then random tensor D is just
2161
03:50:42,640 --> 03:50:48,800
going to have no flavor. It's not going to use a random seed. So we reset it there. Wonderful.
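(A compact sketch of the reproducibility pattern just described, re-seeding before each call to torch.rand; 42 is just the seed used in the video.)

import torch

RANDOM_SEED = 42

torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

torch.manual_seed(RANDOM_SEED)              # reset the seed before the next rand call
random_tensor_D = torch.rand(3, 4)

print(random_tensor_C == random_tensor_D)   # all True - the two tensors match exactly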
2162
03:50:48,800 --> 03:50:56,400
So I wonder, does this have a seed method? Let's go torch dot rand. Does this have seed?
2163
03:50:57,040 --> 03:51:02,480
Sometimes they have a seed method. Seed, no, it doesn't. Okay, that's all right.
2164
03:51:03,440 --> 03:51:08,000
The more you learn, but there's documentation for torch dot rand. And I said that I was going to
2165
03:51:08,000 --> 03:51:14,080
link at the end of this video. So the manual seed, or the random seed, which in
2166
03:51:14,080 --> 03:51:19,680
torch is called a manual seed, is a way to flavor the randomness. So these numbers, as you see,
2167
03:51:19,680 --> 03:51:24,880
are still quite random. But the random seed just makes them reproducible. So if I was to share this
2168
03:51:24,880 --> 03:51:28,720
with you, if you had to run this block of code, ideally, you're going to get the same numerical
2169
03:51:28,720 --> 03:51:35,360
output here. So with that being said, I'd like to refer you to the PyTorch reproducibility
2170
03:51:35,360 --> 03:51:40,240
document, because we've only really scratched the surface of reproducibility. We've covered
2171
03:51:40,240 --> 03:51:48,400
one of the main ones. But this is a great document on how to go through reproducibility in PyTorch.
2172
03:51:48,400 --> 03:51:53,040
So this is your extra curriculum for this, even if you don't understand what's going on in a lot
2173
03:51:53,040 --> 03:51:58,320
of the code here, just be aware of reproducibility, because it's an important topic in machine
2174
03:51:58,320 --> 03:52:06,240
learning and deep learning. So I'll put this here, extra resources for reproducibility.
2175
03:52:06,240 --> 03:52:14,640
So we go PyTorch randomness, we'll change this into markdown. And then finally, the concept
2176
03:52:14,640 --> 03:52:22,320
of a random seed, there's the Wikipedia random seed page. So the random seed is quite a universal concept,
2177
03:52:22,320 --> 03:52:27,280
not just for PyTorch; there's a random seed in NumPy as well. So if you'd like to see what
2178
03:52:27,280 --> 03:52:33,120
this means, yeah, initialize a pseudo random number generator. So that's a big word, pseudo random
2179
03:52:33,120 --> 03:52:38,160
number generator. But if you'd like to learn more about random number generation in computing,
2180
03:52:38,160 --> 03:52:43,440
and what a random seed does, I'd refer you to check out this documentation here.
2181
03:52:44,720 --> 03:52:50,400
Whoo, far out, we have covered a lot. But there's a couple more topics you should really be aware
2182
03:52:50,400 --> 03:52:55,600
of to finish off the PyTorch fundamentals. You got this. I'll see you in the next video.
2183
03:52:55,600 --> 03:53:04,400
Welcome back. Now, let's talk about the important concept of running tensors or Py
2184
03:53:04,400 --> 03:53:17,440
Torch objects. So running tensors and PyTorch objects on GPUs, and making faster computations.
2185
03:53:17,440 --> 03:53:26,480
So we've discussed that GPUs, let me just scroll down a little bit here, GPUs equal faster
2186
03:53:26,480 --> 03:53:40,560
computation on numbers. Thanks to CUDA plus NVIDIA hardware plus PyTorch working behind the
2187
03:53:40,560 --> 03:53:50,400
scenes to make everything hunky dory. Good. That's what hunky dory means, by the way,
2188
03:53:50,400 --> 03:53:55,920
if you've never heard that before. So let's have a look at how we do this. Now, we first need to
2189
03:53:55,920 --> 03:54:02,960
talk about, let's go here, one, getting a GPU. There's a few different ways; we've seen one before.
2190
03:54:02,960 --> 03:54:12,000
Number one easiest is to use what we're using right now. Use Google Colab for a free GPU.
2191
03:54:13,360 --> 03:54:18,880
But there's also Google Colab Pro. And I think there might even be, let's look up Google Colab
2192
03:54:19,520 --> 03:54:25,840
Pro. Choose the best that's right for you. I use Google Colab Pro because I use it almost every day.
2193
03:54:25,840 --> 03:54:32,320
So yeah, I pay for Colab Pro. You can use Colab for free, which might be what you're using.
2194
03:54:32,320 --> 03:54:38,800
There's also Colab Pro Plus, which has a lot more advantages as well. But Colab Pro is giving me
2195
03:54:38,800 --> 03:54:45,040
faster GPUs, so access to faster GPUs, which means you spend less time waiting while your code is running.
2196
03:54:45,040 --> 03:54:50,400
More memory, longer run time, so it'll last a bit longer if you leave it running idle.
2197
03:54:50,400 --> 03:54:55,760
And then Colab Pro Plus again is a step up from that. I personally haven't had a need yet to use
2198
03:54:55,760 --> 03:55:01,760
Google Colab Pro Plus. You can complete this whole course on the free tier as well. But as you start
2199
03:55:01,760 --> 03:55:06,160
to code more, as you start to run bigger models, as you start to want to compute more, you might
2200
03:55:06,160 --> 03:55:14,400
want to look into something like Google Colab Pro. Or let's go here. Options to upgrade as well.
2201
03:55:14,400 --> 03:55:26,800
And then another way is use your own GPU. Now this takes a little bit of setup and requires
2202
03:55:26,800 --> 03:55:39,920
the investment of purchasing a GPU. There's lots of options. So one of my favorite posts for
2203
03:55:39,920 --> 03:55:50,640
getting a GPU is, yeah, the best GPUs for deep learning in 2020, or something like this.
2204
03:55:51,200 --> 03:56:00,240
What do we got? Deep learning? Tim Dettmers. This is, yeah, which GPUs to get for deep learning?
2205
03:56:00,240 --> 03:56:07,120
Now, I believe at the time of this video, I think it's been updated since this date. So don't take
2206
03:56:07,120 --> 03:56:14,640
my word for it. But this is a fantastic blog post for figuring out what GPUs see this post
2207
03:56:14,640 --> 03:56:27,840
for what option to get. And then number three is use cloud computing. So such as
2208
03:56:28,800 --> 03:56:33,840
GCP, which is Google Cloud Platform, AWS, which is Amazon Web Services, or Azure.
2209
03:56:33,840 --> 03:56:45,120
These services, and Azure is by Microsoft, allow you to rent computers on the cloud and access
2210
03:56:45,120 --> 03:56:51,600
them. So the first option, using Google Colab, which is what we're using, is by far the easiest
2211
03:56:51,600 --> 03:56:57,440
and free. So there are big advantages there. However, the downside is that you have to use a website
2212
03:56:57,440 --> 03:57:01,680
here, Google Colab, you can't run it locally. You don't get the benefit of using cloud computing,
2213
03:57:01,680 --> 03:57:07,680
but my personal workflow is I run basically all of my small scale experiments and things like
2214
03:57:07,680 --> 03:57:13,280
learning new stuff in Google Colab. And then if I want to upgrade things, run bigger experiments,
2215
03:57:13,280 --> 03:57:18,800
I have my own dedicated deep learning PC, which I have built with a big powerful GPU. And then
2216
03:57:18,800 --> 03:57:25,200
also I use cloud computing if necessary. So that's my workflow. Start with Google Colab.
2217
03:57:25,200 --> 03:57:30,160
And then these two, if I need to do some larger experiments. But because this is the beginning
2218
03:57:30,160 --> 03:57:34,880
of the course, we can just stick with Google Colab for the time being. But I thought I'd make you aware
2219
03:57:34,880 --> 03:57:44,640
of these other two options. And if you'd like to set up a GPU, so for 2 and 3, PyTorch plus
2220
03:57:44,640 --> 03:57:56,000
GPU drivers, which is CUDA, takes a little bit of setting up. To do this, refer to PyTorch
2221
03:57:56,000 --> 03:58:07,200
setup documentation. So if we go to pytorch.org, they have some great setup guides here,
2222
03:58:07,200 --> 03:58:12,720
get started. And we have start locally. This is if you want to run on your local machine,
2223
03:58:12,720 --> 03:58:18,400
such as a Linux setup. This is what I have: Linux, CUDA 11.3. It's going to give you a
2224
03:58:18,400 --> 03:58:26,080
conda install command to use conda. And then if you want to use cloud partners, which is Alibaba
2225
03:58:26,080 --> 03:58:31,360
Cloud, Amazon Web Services, Google Cloud Platform, this is where you'll want to go. So I'll just link
2226
03:58:31,360 --> 03:58:38,560
this in here. But for this course, we're going to be focusing on using Google Colab. So now,
2227
03:58:38,560 --> 03:58:43,920
let's see how we might get a GPU in Google Colab. And we've already covered this, but I'm going to
2228
03:58:43,920 --> 03:58:50,640
cover it again just so you know. We're going to change the runtime type. You can go in any notebook and
2229
03:58:50,640 --> 03:58:58,880
do this, runtime type, hardware accelerator, we can select GPU, click save. Now this is going to
2230
03:58:58,880 --> 03:59:07,680
restart our runtime and connect us to our runtime, aka a Google compute instance with a GPU. And so
2231
03:59:07,680 --> 03:59:18,400
now if we run NVIDIA SMI, I have a Tesla P100 GPU. So let's look at this Tesla P100
2232
03:59:21,360 --> 03:59:28,240
GPU. Do we have an image? Yeah, so this is the GPU that I've got running, not the Tesla car,
2233
03:59:28,240 --> 03:59:35,120
the GPU. So this is quite a powerful GPU. That is because I have upgraded to Colab Pro. Now,
2234
03:59:35,120 --> 03:59:40,560
if you're not using Colab Pro, you might get something like a Tesla K80, which is a slightly
2235
03:59:40,560 --> 03:59:48,240
less powerful GPU than a Tesla P100, but still a GPU nonetheless and will still work faster than
2236
03:59:48,240 --> 03:59:53,840
just running PyTorch code on the pure CPU, which is the default in Google Colab and the default
2237
03:59:53,840 --> 04:00:02,880
in PyTorch. And so now we can also check to see if we have GPU access with PyTorch. So let's go
2238
04:00:02,880 --> 04:00:11,600
here. This is number two now. Check for GPU access with PyTorch. So this is a little command that's
2239
04:00:11,600 --> 04:00:20,480
going to allow us or tell us if PyTorch, just having the GPU here, this is by the way, another
2240
04:00:20,480 --> 04:00:27,440
thing that Colab has a good setup with, is that all the connections between PyTorch and the NVIDIA
2241
04:00:27,440 --> 04:00:34,640
GPU are set up for us. Whereas when you set it up on your own GPU or using cloud computing,
2242
04:00:34,640 --> 04:00:38,560
there are a few steps you have to go through, which we're not going to cover in this course.
2243
04:00:38,560 --> 04:00:42,480
I'd highly recommend you go through the getting started locally set up if you want to do that,
2244
04:00:43,040 --> 04:00:49,440
to connect PyTorch to your own GPU. So let's check for the GPU access with PyTorch.
2245
04:00:49,440 --> 04:00:58,800
This is another advantage of using Google Colab. Almost zero set up to get started. So import
2246
04:00:58,800 --> 04:01:08,960
torch and then we're going to go torch dot cuda dot is available. And remember, cuda is
2247
04:01:08,960 --> 04:01:16,800
NVIDIA's programming interface that allows us to use GPUs for numerical computing. There we go,
2248
04:01:16,800 --> 04:01:22,240
beautiful. So big advantage of Google Colab is we get access to a free GPU. In my case, I'm paying
2249
04:01:22,240 --> 04:01:26,800
for the faster GPU, but in your case, you're more than welcome to use the free version.
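As a quick, hedged sketch of the checks being run here (in a Colab cell, the leading ! runs a shell command):

    !nvidia-smi  # shows the GPU Colab has assigned to this runtime (if any)

    import torch
    torch.cuda.is_available()  # True if PyTorch can see a CUDA-capable GPU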
2250
04:01:26,800 --> 04:01:34,400
All that means is it'll be slightly slower than a faster GPU here. And we now have access to GPUs
2251
04:01:34,400 --> 04:01:44,560
with PyTorch. So there is one more thing known as device agnostic code. So set up device agnostic
2252
04:01:44,560 --> 04:01:50,320
code. Now, this is an important concept in PyTorch because wherever you run PyTorch, you might not
2253
04:01:50,320 --> 04:01:57,760
always have access to a GPU. But if there was access to a GPU, you'd like it to use it if it's
2254
04:01:57,760 --> 04:02:05,280
available. So one of the ways that this is done in PyTorch is to set the device variable. Now,
2255
04:02:05,280 --> 04:02:09,520
really, you could set this to any variable you want, but you're going to see it used as device
2256
04:02:09,520 --> 04:02:21,280
quite often. So cuda if torch dot cuda is available. Else CPU. So all this is going to say, and we'll
2257
04:02:21,280 --> 04:02:29,280
see where we use the device variable later on is set the device to use cuda if it's available. So
2258
04:02:29,280 --> 04:02:35,040
if it is, so if that returns true. If it's not available, if we don't have access to a GPU that PyTorch can use,
2259
04:02:35,040 --> 04:02:41,120
just default to the CPU. So with that being said, there's one more thing. You can also count the
2260
04:02:41,120 --> 04:02:45,520
number of GPUs. So this won't really apply to us for now because we're just going to stick with
2261
04:02:45,520 --> 04:02:51,040
using one GPU. But as you upgrade your PyTorch experiments and machine learning experiments,
2262
04:02:51,040 --> 04:02:55,360
you might have access to more than one GPU. So you can also count the devices here.
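A minimal sketch of the device-agnostic setup and device count described here:

    import torch

    # Device-agnostic code: use the GPU if it's available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Count how many GPUs PyTorch can see
    torch.cuda.device_count()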
2263
04:02:57,200 --> 04:03:02,640
We have access to one GPU, which is this here. So the reason why you might want to count the number
2264
04:03:02,640 --> 04:03:08,960
of devices is because if you're running huge models on large data sets, you might want to run one
2265
04:03:08,960 --> 04:03:16,080
model on a certain GPU, another model on another GPU, and so on and so on. But final thing before
2266
04:03:16,080 --> 04:03:24,720
we finish this video is if we go PyTorch device agnostic code, cuda semantics, there's a little
2267
04:03:24,720 --> 04:03:31,120
section in here called best practices. This is basically what we just covered, which is setting
2268
04:03:31,120 --> 04:03:37,600
the device argument. Now this is using argparse, but, so yeah, there we go: args.device,
2269
04:03:37,600 --> 04:03:44,320
torch.device, cuda, args.device, torch.device, CPU. So this is one way to set it from the Python
2270
04:03:45,120 --> 04:03:50,720
arguments when you're running scripts, but we're using the version of running it through a notebook.
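For reference, the script-style version in the CUDA semantics best practices looks roughly like this (the --disable-cuda flag follows the docs example):

    import argparse
    import torch

    parser = argparse.ArgumentParser(description="PyTorch Example")
    parser.add_argument("--disable-cuda", action="store_true", help="Disable CUDA")
    args = parser.parse_args()

    # Set args.device to the GPU if allowed and available, else the CPU
    if not args.disable_cuda and torch.cuda.is_available():
        args.device = torch.device("cuda")
    else:
        args.device = torch.device("cpu")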
2271
04:03:51,760 --> 04:03:57,040
So check this out. I'll just link this here, device agnostic code. It's okay if you're not sure
2272
04:03:57,040 --> 04:04:01,040
of what's going on here. We're going to cover it a little bit more later on throughout the course,
2273
04:04:01,680 --> 04:04:12,640
but right here for PyTorch, since it's capable of running compute on the GPU or CPU,
2274
04:04:12,640 --> 04:04:27,840
it's best practice to set up device agnostic code, e.g. run on GPU if available,
2275
04:04:29,360 --> 04:04:37,440
else default to CPU. So check out the best practices for using cuda, which is namely setting up
2276
04:04:37,440 --> 04:04:44,000
device agnostic code. And let's in the next video, see what I mean about setting our PyTorch tensors
2277
04:04:44,000 --> 04:04:51,840
and objects to the target device. Welcome back. In the last video, we checked out a few different
2278
04:04:51,840 --> 04:04:58,480
options for getting a GPU, and then getting PyTorch to run on the GPU. And for now we're using
2279
04:04:58,480 --> 04:05:04,240
Google Colab, which is the easiest way to get set up because it gives us free access to a GPU,
2280
04:05:04,240 --> 04:05:11,040
faster ones if you set up with Colab Pro, and it comes with PyTorch automatically set up to
2281
04:05:11,040 --> 04:05:20,000
use the GPU if it's available. So now let's see how we can actually use the GPU. So to do so,
2282
04:05:20,000 --> 04:05:33,120
we'll look at putting tensors and models on the GPU. So the reason we want our tensors slash models
2283
04:05:33,120 --> 04:05:43,040
on the GPU is because using GPU results in faster computations. And if we're getting our machine
2284
04:05:43,040 --> 04:05:48,080
learning models to find patterns in numbers, GPUs are great at doing numerical calculations.
2285
04:05:48,080 --> 04:05:52,800
And the numerical calculations we're going to be doing are tensor operations like we saw above.
2286
04:05:53,520 --> 04:05:59,360
So the tensor operations, well, we've covered a lot. Somewhere here, tensor operations,
2287
04:05:59,360 --> 04:06:04,160
there we go, manipulating tensor operations. So if we can run these computations faster,
2288
04:06:04,160 --> 04:06:10,080
we can discover patterns in our data faster, we can do more experiments, and we can work towards
2289
04:06:10,080 --> 04:06:15,280
finding the best possible model for whatever problem that we're working on. So let's see,
2290
04:06:15,920 --> 04:06:21,840
we'll create a tensor, as usual, create a tensor. Now the default is on the CPU.
2291
04:06:21,840 --> 04:06:30,160
So tensor equals torch dot tensor. And we'll just make it a nice simple one, one, two, three.
2292
04:06:30,720 --> 04:06:38,480
And let's write here, tensor not on GPU will print out tensor. And this is where we can use,
2293
04:06:39,440 --> 04:06:47,040
we saw this parameter before device. Can we pass it in here? Device equals CPU.
2294
04:06:47,040 --> 04:06:54,880
Let's see what this comes out with. There we go. So if we print it out, tensor 123 is on the CPU.
2295
04:06:54,880 --> 04:07:02,560
But even if we got rid of that device parameter, by default, it's going to be on the CPU. Wonderful.
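A small sketch of the tensor being created here (CPU is the default device, so the device argument is optional):

    import torch

    # Create a tensor (defaults to the CPU)
    tensor = torch.tensor([1, 2, 3], device="cpu")
    print(tensor, tensor.device)  # tensor([1, 2, 3]) cpu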
2296
04:07:02,560 --> 04:07:08,880
So now PyTorch makes it quite easy to move things to, and I'm saying to for a reason,
2297
04:07:08,880 --> 04:07:18,320
to the GPU, or to, even better, the target device. So if the GPU is available, we use CUDA.
2298
04:07:18,320 --> 04:07:23,840
If it's not, it uses CPU. This is why we set up the device variable. So let's see,
2299
04:07:24,560 --> 04:07:28,080
move tensor to GPU. If available,
2300
04:07:28,080 --> 04:07:40,240
tensor on GPU equals tensor dot to device. Now let's have a look at this, tensor on GPU.
2301
04:07:43,520 --> 04:07:48,960
So this is going to shift the tensor that we created up here to the target device.
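A sketch of the move, assuming the device variable set up earlier ("cuda" if available, else "cpu"):

    # Move the tensor to the target device (GPU if available)
    tensor_on_gpu = tensor.to(device)
    tensor_on_gpu  # e.g. tensor([1, 2, 3], device='cuda:0') when a GPU is present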
2302
04:07:50,720 --> 04:07:57,040
Wonderful. Look at that. So now our tensor 123 is on device CUDA zero. Now this is the index of
2303
04:07:57,040 --> 04:08:02,240
the GPU that we're using, because we only have one, it's going to be at index zero. So later on,
2304
04:08:02,240 --> 04:08:06,960
when you start to do bigger experiments and work with multiple GPUs, you might have different tensors
2305
04:08:06,960 --> 04:08:12,640
that are stored on different GPUs. But for now, we're just sticking with one GPU, keeping it nice
2306
04:08:12,640 --> 04:08:18,960
and simple. And so you might have a case where you want to move, oh, actually, the reason why we
2307
04:08:18,960 --> 04:08:25,680
set up device agnostic code is again, this code would work if we run this, regardless if we had,
2308
04:08:25,680 --> 04:08:32,240
so it won't error out. But regardless of whether we had a GPU or not, this code will work. So whatever device
2309
04:08:32,240 --> 04:08:38,400
we have access to, whether it's only a CPU or whether it's a GPU, this tensor will move to whatever
2310
04:08:38,400 --> 04:08:44,480
target device. But since we have a GPU available, it goes there. You'll see this a lot. This to
2311
04:08:44,480 --> 04:08:50,240
method moves tensors and it can also be used for models. We're going to see that later on. So just
2312
04:08:50,240 --> 04:08:58,080
keep to(device) in mind. And then you might want to, for some computations, such as using NumPy,
2313
04:08:58,080 --> 04:09:06,320
NumPy only works with the CPU. So you might want to move tensors back to the CPU, moving tensors back
2314
04:09:06,320 --> 04:09:13,280
to the CPU. So can you guess how we might do that? It's okay if you don't know. We haven't covered a
2315
04:09:13,280 --> 04:09:17,760
lot of things, but I'm going to challenge you anyway, because that's the fun part of thinking
2316
04:09:17,760 --> 04:09:27,200
about something. So let's see how we can do it. Let's write down if tensor is on GPU, can't transform
2317
04:09:27,200 --> 04:09:33,840
it to NumPy. So let's see what happens if we take our tensor on the GPU and try to go NumPy.
2318
04:09:34,640 --> 04:09:39,760
What happens? Well, we get an error. So this is another huge error. Remember the top three
2319
04:09:39,760 --> 04:09:44,640
errors in deep learning or pytorch? There's lots of them, but number one, shape errors,
2320
04:09:44,640 --> 04:09:51,520
number two, data type issues. And with pytorch, number three is device issues. So can't convert
2321
04:09:51,520 --> 04:09:58,400
CUDA zero device type tensor to NumPy. So NumPy doesn't work with the GPU. Use tensor dot CPU
2322
04:09:58,400 --> 04:10:04,320
to copy the tensor to host memory first. So if we call tensor dot CPU, it's going to bring our
2323
04:10:04,320 --> 04:10:10,320
target tensor back to the CPU. And then we should be able to use it with NumPy. So
2324
04:10:10,320 --> 04:10:26,480
to fix the GPU tensor with NumPy issue, we can first set it to the CPU. So tensor back on CPU
2325
04:10:27,680 --> 04:10:34,480
equals tensor on GPU dot CPU. We're just taking what this said here. That's a beautiful thing
2326
04:10:34,480 --> 04:10:39,280
about PyTorch: very helpful error messages. And then we're going to go NumPy.
2327
04:10:39,280 --> 04:10:45,520
And then if we go tensor back on CPU, is this going to work? Let's have a look. Oh, of course,
2328
04:10:45,520 --> 04:10:53,280
it's not because I typed it wrong. And I've typed it again twice. Third time, third time's a charm.
2329
04:10:54,320 --> 04:11:00,880
There we go. Okay, so that works because we've put it back to the CPU first before calling NumPy.
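A sketch of the fix just described, copying the tensor back to host memory before handing it to NumPy:

    # NumPy only works on the CPU, so copy the tensor back first
    tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
    tensor_back_on_cpu  # array([1, 2, 3])

    # The original GPU tensor is unchanged by the copy
    tensor_on_gpu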
2330
04:11:00,880 --> 04:11:07,520
And then if we refer back to our tensor on the GPU, because we've reassociated this, again,
2331
04:11:07,520 --> 04:11:14,640
we've got typos galore classic, because we've reassigned tensor back on CPU, our tensor on
2332
04:11:14,640 --> 04:11:22,080
GPU remains unchanged. So that's the four main things about working with pytorch on the GPU.
2333
04:11:22,080 --> 04:11:26,560
There are a few more tidbits such as multiple GPUs, but now you've got the fundamentals. We're
2334
04:11:26,560 --> 04:11:30,400
going to stick with using one GPU. And if you'd like to later on once you've learned a bit more
2335
04:11:30,400 --> 04:11:36,080
research into multiple GPUs, well, as you might have guessed, pytorch has functionality for that too.
2336
04:11:36,080 --> 04:11:42,320
So have a go at getting access to a GPU using colab, check to see if it's available, set up device
2337
04:11:42,320 --> 04:11:48,000
agnostic code, create a few dummy tensors and just set them to different devices, see what happens
2338
04:11:48,000 --> 04:11:53,360
if you change the device parameter, run a few errors by trying to do some NumPy calculations
2339
04:11:53,360 --> 04:11:58,480
with tensors on the GPU, and then bring those tensors on the GPU back to NumPy and see what happens
2340
04:11:58,480 --> 04:12:05,920
there. So I think we've covered, I think we've reached the end of the fundamentals. We've covered
2341
04:12:05,920 --> 04:12:10,960
a fair bit. Introduction to tensors, the minmax, a whole bunch of stuff inside the introduction
2342
04:12:10,960 --> 04:12:16,880
to tensors, finding the positional minmax, reshaping, indexing, working with tensors and NumPy,
2343
04:12:16,880 --> 04:12:24,720
reproducibility, using a GPU and moving stuff back to the CPU. Far out. Now you're probably wondering,
2344
04:12:24,720 --> 04:12:29,680
Daniel, we've covered a whole bunch. What should I do to practice all this? Well, I'm glad you asked.
2345
04:12:29,680 --> 04:12:35,840
Let's cover that in the next video. Welcome back. And you should be very proud of your
2346
04:12:35,840 --> 04:12:41,120
self right now. We've been through a lot, but we've covered a whole bunch of PyTorch fundamentals.
2347
04:12:41,120 --> 04:12:45,040
These are going to be the building blocks that we use throughout the rest of the course.
2348
04:12:45,680 --> 04:12:51,760
But before moving on to the next section, I'd encourage you to try out what you've learned
2349
04:12:51,760 --> 04:12:59,680
through the exercises and extra curriculum. Now, I've set up a few exercises here based off
2350
04:12:59,680 --> 04:13:05,120
everything that we've covered. If you go into learn pytorch.io, go to the section that we're
2351
04:13:05,120 --> 04:13:09,680
currently on. This is going to be the case for every section, by the way. So just keep this in mind,
2352
04:13:10,240 --> 04:13:15,360
is we're working on PyTorch fundamentals. Now, if you go to the PyTorch fundamentals notebook,
2353
04:13:15,360 --> 04:13:20,000
this is going to refresh, but that if you scroll down to the table of contents at the bottom of
2354
04:13:20,000 --> 04:13:26,000
each one is going to be some exercises and extra curriculum. So these exercises here,
2355
04:13:26,560 --> 04:13:31,440
such as documentation reading, because you've seen me refer to the PyTorch documentation
2356
04:13:31,440 --> 04:13:36,720
for almost everything we've covered, but it's important to become familiar with that.
2357
04:13:36,720 --> 04:13:42,000
So exercise number one is read some of the documentation. Exercise number two is create a
2358
04:13:42,000 --> 04:13:48,160
random tensor with shape (7, 7). Three, perform a matrix multiplication on the tensor from two
2359
04:13:48,160 --> 04:13:53,520
with another random tensor. So these exercises are all based off what we've covered here.
2360
04:13:53,520 --> 04:13:59,680
So I'd encourage you to reference what we've covered in whichever notebook you choose,
2361
04:13:59,680 --> 04:14:04,240
could be this learn pytorch.io, could be going back through the one we've just coded together
2362
04:14:04,240 --> 04:14:15,440
in the video. So I'm going to link this here, exercises, see exercises for this notebook here.
2363
04:14:16,880 --> 04:14:23,680
So then how should you approach these exercises? So one way would be to just read them here,
2364
04:14:23,680 --> 04:14:32,240
and then in Colab we'll go file, new notebook, wait for the notebook to load. Then you could call this
2365
04:14:32,240 --> 04:14:40,240
zero zero pytorch exercises or something like that, and then you could start off by importing
2366
04:14:40,240 --> 04:14:46,080
torch, and then away you go. For me, I'd probably set this up on one side of the screen, this one
2367
04:14:46,080 --> 04:14:51,760
up on the other side of the screen, and then I just have the exercises here. So number one,
2368
04:14:51,760 --> 04:14:56,080
I'm not going to really write much code for that, but you could have documentation reading here.
2369
04:14:57,440 --> 04:15:02,800
And then so this encourages you to read through torch.tensor and go through there
2370
04:15:04,000 --> 04:15:08,720
for 10 minutes or so. And then for the other ones, we've got create a random tensor with shape
2371
04:15:08,720 --> 04:15:17,120
seven seven. So we just comment that out. So torch dot rand seven seven, and there we go.
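For example, a minimal sketch of exercises 2 and 3 (the second tensor's shape and the transpose are assumptions to make the inner dimensions line up):

    import torch

    # 2. Create a random tensor with shape (7, 7)
    tensor_a = torch.rand(7, 7)

    # 3. Matrix multiply it with another random tensor
    tensor_b = torch.rand(1, 7)
    result = torch.matmul(tensor_a, tensor_b.T)  # (7, 7) @ (7, 1) -> (7, 1)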
2372
04:15:17,120 --> 04:15:22,160
Some are as easy as that. Some are a little bit more complex. As we go throughout the course,
2373
04:15:22,160 --> 04:15:25,520
these exercises are going to get a little bit more in depth as we've learned more.
2374
04:15:26,560 --> 04:15:32,480
But if you'd like an exercise template, you can come back to the GitHub. This is the home for all
2375
04:15:32,480 --> 04:15:38,880
of the course materials. You can go into extras and then exercises. I've created templates for
2376
04:15:38,880 --> 04:15:46,480
each of the exercises. So pytorch fundamentals exercises. If you open this up, this is a template
2377
04:15:46,480 --> 04:15:51,600
for all of the exercises. So you see there, create a random tensor with shape seven seven.
2378
04:15:51,600 --> 04:15:55,840
These are all just headings. And if you'd like to open this in CoLab and work on it,
2379
04:15:55,840 --> 04:16:02,400
how can you do that? Well, you can copy this link here. Come to Google CoLab. We'll go file,
2380
04:16:03,040 --> 04:16:11,600
open notebook, GitHub. You can type in the link there. Click search. What's this going to do?
2381
04:16:11,600 --> 04:16:17,360
Boom. Pytorch fundamentals exercises. So now you can go through all of the exercises. This
2382
04:16:17,360 --> 04:16:23,920
will be the same for every module on the course and test your knowledge. Now it is open book. You
2383
04:16:23,920 --> 04:16:30,800
can use the notebook here, the ones that we've coded together. But I would encourage you to try
2384
04:16:30,800 --> 04:16:35,520
to do these things on your own first. If you get stuck, you can always reference back. And then
2385
04:16:35,520 --> 04:16:41,280
if you'd like to see an example solutions, you can go back to the extras. There's a solutions folder
2386
04:16:41,280 --> 04:16:46,320
as well. And that's where the solutions live. So the fundamental exercise solutions. But again,
2387
04:16:46,320 --> 04:16:52,480
I would encourage you to try these out, at least give them a go before having a look at the solutions.
2388
04:16:53,360 --> 04:16:58,240
So just keep that in mind at the end of every module, there's exercises and extra curriculum.
2389
04:16:58,240 --> 04:17:03,360
The exercises will be code based. The extra curriculum is usually like reading based.
2390
04:17:03,360 --> 04:17:07,520
So spend one hour going through the Pytorch basics tutorial. I recommend the quick start
2391
04:17:07,520 --> 04:17:12,160
in tensor sections. And then finally to learn more on how a tensor can represent data,
2392
04:17:12,160 --> 04:17:17,520
watch the video what's a tensor which we referred to throughout this. But massive effort on finishing
2393
04:17:17,520 --> 04:17:29,760
the Pytorch fundamentals section. I'll see you in the next section. Friends, welcome back to
2394
04:17:31,760 --> 04:17:36,240
the Pytorch workflow module. Now let's have a look at what we're going to get into.
2395
04:17:36,240 --> 04:17:43,920
So this is a Pytorch workflow. And I say a because it's one of many. When you get into
2396
04:17:43,920 --> 04:17:47,360
deep learning machine learning, you'll find that there's a fair few ways to do things. But here's
2397
04:17:47,360 --> 04:17:51,760
the rough outline of what we're going to do. We're going to get our data ready and turn it into
2398
04:17:51,760 --> 04:17:56,800
tensors because remember a tensor can represent almost any kind of data. We're going to pick or
2399
04:17:56,800 --> 04:18:00,880
build or pick a pre-trained model. We'll pick a loss function and an optimizer. Don't worry if
2400
04:18:00,880 --> 04:18:03,920
you don't know what they are. We're going to cover this. We're going to build a training loop,
2401
04:18:03,920 --> 04:18:09,200
fit the model to make a prediction. So fit the model to the data that we have. We'll learn how
2402
04:18:09,200 --> 04:18:14,400
to evaluate our models. We'll see how we can improve through experimentation and we'll save
2403
04:18:14,400 --> 04:18:19,280
and reload our trained model. So if you wanted to export your model from a notebook and use it
2404
04:18:19,280 --> 04:18:25,520
somewhere else, this is what you want to be doing. And so where can you get help? Probably the most
2405
04:18:25,520 --> 04:18:29,520
important thing is to follow along with the code. We'll be coding all of this together.
2406
04:18:29,520 --> 04:18:35,840
Remember, motto number one: if in doubt, run the code. Try it for yourself. That's how I learn best.
2407
04:18:35,840 --> 04:18:40,800
I write code, I try it, I get it wrong, I try again, and keep going until I get it right.
2408
04:18:41,760 --> 04:18:46,080
Read the doc string because that's going to show you some documentation about the functions that
2409
04:18:46,080 --> 04:18:51,200
we're using. So on a Mac, you can use shift command and space in Google Colab or if you're on a Windows
2410
04:18:51,200 --> 04:18:56,800
PC, it might be control here. If you're still stuck, try searching for it. You'll probably come
2411
04:18:56,800 --> 04:19:01,440
across resources such as stack overflow or the PyTorch documentation. We've already seen this
2412
04:19:01,440 --> 04:19:05,760
a whole bunch and we're probably going to see it a lot more throughout this entire course actually
2413
04:19:05,760 --> 04:19:11,120
because that's going to be the ground truth of everything PyTorch. Try again. And finally,
2414
04:19:11,120 --> 04:19:15,760
if you're still stuck, ask a question. So the best place to ask a question will be
2415
04:19:15,760 --> 04:19:20,000
at the PyTorch deep learning slash discussions tab. And then if we go to GitHub,
2416
04:19:20,640 --> 04:19:25,280
that's just under here. So mrdbourke, PyTorch deep learning. This is all the course materials.
2417
04:19:25,280 --> 04:19:30,960
We see here, this is your ground truth for the entire course. And then if you have a question,
2418
04:19:30,960 --> 04:19:36,160
go to the discussions tab, new discussion, you can ask a question there. And don't forget to
2419
04:19:36,160 --> 04:19:41,120
please put the video and the code that you're trying to run. That way we can reference
2420
04:19:41,120 --> 04:19:47,440
what's going on and help you out there. And also, don't forget, there is the book version of the
2421
04:19:47,440 --> 04:19:52,480
course. So learn pytorch.io. By the time you watch this video, it'll probably have all the chapters
2422
04:19:52,480 --> 04:19:56,960
here. But here's what we're working through. This is what the videos are based on. All of this,
2423
04:19:56,960 --> 04:20:00,720
we're going to go through all of this. How fun is that? But this is just reference material.
2424
04:20:00,720 --> 04:20:06,880
So you can read this at your own time. We're going to focus on coding together. And speaking of coding.
2425
04:20:09,840 --> 04:20:12,160
Let's code. I'll see you over at Google Colab.
2426
04:20:14,080 --> 04:20:21,840
Oh, right. Well, let's get hands on with some code. I'm going to come over to colab.research.google.com.
2427
04:20:21,840 --> 04:20:28,000
You may already have that bookmark. And I'm going to start a new notebook. So we're going to do
2428
04:20:28,000 --> 04:20:33,760
everything from scratch here. We'll let this load up. I'm just going to zoom in a little bit.
2429
04:20:35,360 --> 04:20:44,160
Beautiful. And now I'm going to title this 01 pytorch workflow. And I'm going to put the video
2430
04:20:45,040 --> 04:20:50,080
ending on here so that you know that this notebook's from the video. Why is that? Because in the
2431
04:20:50,080 --> 04:20:54,480
course resources, we have the original notebook here, which is what this video notebook is going
2432
04:20:54,480 --> 04:20:59,440
to be based off. You can refer to this notebook as reference for what we're going to go through.
2433
04:20:59,440 --> 04:21:03,520
It's got a lot of pictures and beautiful text annotations. We're going to be focused on the
2434
04:21:03,520 --> 04:21:08,640
code in the videos. And then of course, you've got the book version of the notebook as well,
2435
04:21:08,640 --> 04:21:14,240
which is just a different formatted version of this exact same notebook. So I'm going to link
2436
04:21:14,240 --> 04:21:24,720
both of these up here. So let's write in here, pytorch workflow. And let's explore an example,
2437
04:21:25,520 --> 04:21:36,720
pytorch end to end workflow. And then I'm going to put the resources. So ground truth notebook.
2438
04:21:36,720 --> 04:21:44,400
We go here. And I'm also going to put the book version.
2439
04:21:44,400 --> 04:21:58,080
Book version of notebook. And finally, ask a question, which will be where at the discussions
2440
04:21:58,080 --> 04:22:04,560
page. Then we'll go there. Beautiful. Let's turn this into markdown. So let's get started. Let's
2441
04:22:04,560 --> 04:22:10,160
just jump right in and start what we're covering. So this is the trend I want to start getting
2442
04:22:10,160 --> 04:22:14,880
towards is rather than spending a whole bunch of time going through keynotes and slides,
2443
04:22:14,880 --> 04:22:19,920
I'd rather we just code together. And then we explain different things as they need to be
2444
04:22:19,920 --> 04:22:23,840
explained because that's what you're going to be doing if you end up writing a lot of pytorch is
2445
04:22:23,840 --> 04:22:29,440
you're going to be writing code and then looking things up as you go. So I'll get out of these
2446
04:22:29,440 --> 04:22:34,640
extra tabs. I don't think we need them. Just these two will be the most important. So what we're
2447
04:22:34,640 --> 04:22:38,560
covering, let's create a little dictionary so we can check this if we wanted to later on.
2448
04:22:39,200 --> 04:22:44,640
So referring to our pytorch workflows, at least the example one that we're going to go through,
2449
04:22:45,280 --> 04:22:51,840
which is just here. So we're going to go through all six of these steps, maybe a little bit of
2450
04:22:51,840 --> 04:22:57,360
each one, but just to see it going from this to this, that's what we're really focused on. And then
2451
04:22:57,360 --> 04:23:04,240
we're going to go, throughout the rest of the course, really dig deep into all of these. So what
2452
04:23:04,240 --> 04:23:10,320
we're covering number one is data preparing and loading. Number two is we're going to see how we
2453
04:23:10,320 --> 04:23:16,080
can build a machine learning model in pytorch or a deep learning model. And then we're going
2454
04:23:16,080 --> 04:23:23,120
to see how we're going to fit our model to the data. So this is called training. So fit is another
2455
04:23:23,120 --> 04:23:27,520
word. As I said in machine learning, there's a lot of different names for similar things,
2456
04:23:27,520 --> 04:23:32,880
kind of confusing, but you'll pick it up with time. So we're going to once we've trained a model,
2457
04:23:32,880 --> 04:23:37,440
we're going to see how we can make predictions and evaluate those predictions,
2458
04:23:37,440 --> 04:23:43,200
evaluating a model. If you make predictions, it's often referred to as inference. I typically
2459
04:23:43,200 --> 04:23:47,920
say making predictions, but inference is another very common term. And then we're going to look
2460
04:23:47,920 --> 04:23:54,480
at how we can save and load a model. And then we're going to put it all together. So a little bit
2461
04:23:54,480 --> 04:24:01,120
different from the visual version we have of the pytorch workflow. So if we go back to here,
2462
04:24:02,080 --> 04:24:08,480
I might zoom in a little. There we go. So we're going to focus on this one later on,
2463
04:24:08,480 --> 04:24:12,560
improve through experimentation. We're just going to focus on the getting data ready,
2464
04:24:12,560 --> 04:24:17,600
building a model, fitting the model, evaluating model, save and reload. So we'll see this one more,
2465
04:24:18,400 --> 04:24:21,680
like in depth later on, but I'll hint at different things that you can do
2466
04:24:21,680 --> 04:24:26,400
for this while we're working through this workflow. And so let's put that in here.
2467
04:24:26,960 --> 04:24:31,840
And then if we wanted to refer to this later, we can just go what we're covering.
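A hypothetical sketch of that little dictionary (the wording of the keys is illustrative, not the exact notebook text):

    what_were_covering = {
        1: "data (prepare and load)",
        2: "build model",
        3: "fitting the model to data (training)",
        4: "making predictions and evaluating a model (inference)",
        5: "saving and loading a model",
        6: "putting it all together",
    }
    what_were_covering  # something to refer back to later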
2468
04:24:34,400 --> 04:24:39,360
Oh, this is going to connect, of course. Beautiful. So we can refer to this later on,
2469
04:24:39,360 --> 04:24:45,920
if we wanted to. And we're going to start by import torch. We're going to get pytorch ready
2470
04:24:45,920 --> 04:24:52,160
to go import nn. So I'll write a note here. And then we haven't seen this one before, but
2471
04:24:52,160 --> 04:24:56,240
we're going to see a few things that we haven't seen, but that's okay. We'll explain it as we go.
2472
04:24:56,240 --> 04:25:03,360
So nn contains all of pytorch's building blocks for neural networks.
2473
04:25:03,360 --> 04:25:10,160
And how would we learn more about torch nn? Well, if we just go torch.nn, here's how I'd
2474
04:25:10,160 --> 04:25:15,760
learn about it, pytorch documentation. Beautiful. Look at all these. These are the basic building
2475
04:25:15,760 --> 04:25:20,560
blocks for graphs. Now, when you see the word graph, it's referring to a computational graph,
2476
04:25:20,560 --> 04:25:24,320
which is in the case of neural networks, let's look up a photo of a neural network.
2477
04:25:24,320 --> 04:25:33,680
Images, this is a graph. So if you start from here, you're going to go towards the right.
2478
04:25:33,680 --> 04:25:38,560
There's going to be many different pictures. So yeah, this is a good one. Input layer. You have
2479
04:25:38,560 --> 04:25:45,360
a hidden layer, hidden layer to output layer. So torch.nn comprises a whole bunch of
2480
04:25:45,360 --> 04:25:50,720
different layers. So you can see layers, layers, layers. And each one of these, you can see input
2481
04:25:50,720 --> 04:25:57,040
layer, hidden layer one, hidden layer two. So it's our job as data scientists and machine
2482
04:25:57,040 --> 04:26:03,040
learning engineers to combine these torch dot nn building blocks to build things such as these.
2483
04:26:03,040 --> 04:26:08,800
Now, it might not be exactly like this, but that's the beauty of pytorch is that you can
2484
04:26:08,800 --> 04:26:13,440
combine these in almost any different way to build any kind of neural network you can imagine.
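As a hedged illustration only (layer sizes are arbitrary and this is not the model we build in the course), combining a few torch.nn building blocks might look like:

    import torch
    from torch import nn

    # Input layer -> non-linearity -> output layer
    model = nn.Sequential(
        nn.Linear(in_features=4, out_features=8),
        nn.ReLU(),
        nn.Linear(in_features=8, out_features=1),
    )
    model(torch.rand(1, 4))  # pass a dummy input through the computational graph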
2485
04:26:14,640 --> 04:26:19,840
And so let's keep going. That's torch nn. We're going to get hands on with it,
2486
04:26:19,840 --> 04:26:24,880
rather than just talk about it. And we're going to need matplotlib because what's our other
2487
04:26:24,880 --> 04:26:31,200
motto? Our data explorers motto is visualize, visualize, visualize. And let's check our pytorch
2488
04:26:31,200 --> 04:26:37,840
version. Pytorch version torch dot version. So this is just to show you you'll need
2489
04:26:39,360 --> 04:26:46,240
at least this version. So 1.10 plus cu111. CU stands for CUDA.
2490
04:26:46,240 --> 04:26:50,080
That means we've got access to CUDA. We don't have a GPU on this runtime yet,
2491
04:26:50,080 --> 04:26:54,640
because we haven't gone to GPU. We might do that later.
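A sketch of the setup cell described here:

    import torch
    from torch import nn             # nn contains all of PyTorch's building blocks for neural networks
    import matplotlib.pyplot as plt  # for visualize, visualize, visualize

    torch.__version__  # e.g. '1.10.0+cu111'; newer versions should still work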
2492
04:26:56,320 --> 04:27:02,880
So if you have a version that's lower than this, say 1.8.0, you'll want PyTorch 1.10 at least.
2493
04:27:02,880 --> 04:27:08,000
If you have a version higher than this, your code should still work. But that's about enough
2494
04:27:08,000 --> 04:27:13,280
for this video. We've got our workflow ready to set up our notebook, our video notebook.
2495
04:27:13,280 --> 04:27:17,360
We've got the resources. We've got what we're covering. We've got our dependencies.
2496
04:27:17,360 --> 04:27:24,240
Let's in the next one get started on one data, preparing and loading.
2497
04:27:26,320 --> 04:27:27,360
I'll see you in the next video.
2498
04:27:29,760 --> 04:27:36,880
Let's now get on to the first step of our pytorch workflow. And that is data, preparing and loading.
2499
04:27:36,880 --> 04:27:43,680
Now, I want to stress data can be almost anything in machine learning.
2500
04:27:44,640 --> 04:27:50,640
I mean, you could have an Excel spreadsheet, which is rows and columns,
2501
04:27:51,440 --> 04:27:58,800
nice and formatted data. You could have images of any kind. You could have videos. I mean,
2502
04:27:58,800 --> 04:28:09,360
YouTube has lots of data. You could have audio like songs or podcasts. You could have even DNA
2503
04:28:09,360 --> 04:28:14,640
these days. Patterns in DNA are starting to get discovered by machine learning. And then, of course,
2504
04:28:14,640 --> 04:28:20,720
you could have text like what we're writing here. And so what we're going to be focusing on
2505
04:28:20,720 --> 04:28:26,640
throughout this entire course is the fact that machine learning is a game of two parts.
2506
04:28:26,640 --> 04:28:41,920
So one, get data into a numerical representation, and two, build a model to learn patterns in that
2507
04:28:41,920 --> 04:28:47,200
numerical representation. Of course, there's more around it. Yes, yes, yes. I understand you can
2508
04:28:47,200 --> 04:28:52,560
get as complex as you like, but these are the main two concepts. And machine learning, when I say
2509
04:28:52,560 --> 04:28:59,280
machine learning, the same goes for deep learning, you need some kind of, oh, numerical. Numerical,
2510
04:28:59,280 --> 04:29:04,000
I like that word, numerical representation. Then you want to build a model to learn patterns
2511
04:29:04,000 --> 04:29:11,520
in that numerical representation. And if you want, I've got a nice pretty picture that describes that
2512
04:29:11,520 --> 04:29:16,400
machine learning a game of two parts. Let's refer to our data. Remember, data can be almost
2513
04:29:16,400 --> 04:29:22,000
anything. These are our inputs. So the first step that we want to do is create some form
2514
04:29:22,000 --> 04:29:28,240
of numerical encoding in the form of tensors to represent these inputs. How this looks will be
2515
04:29:28,880 --> 04:29:33,840
dependent on the data, depending on the numerical encoding you choose to use. Then we're going to
2516
04:29:33,840 --> 04:29:38,880
build some sort of neural network to learn a representation, which is also referred to as
2517
04:29:38,880 --> 04:29:45,200
patterns features or weights within that numerical encoding. It's going to output that
2518
04:29:45,200 --> 04:29:50,560
representation. And then we want to do something without representation, such as in the case of
2519
04:29:50,560 --> 04:29:55,760
this, we're doing image recognition, image classification: is it a photo of ramen or spaghetti?
2520
04:29:55,760 --> 04:30:02,560
Is this tweet spam or not spam? Is this audio file saying what it says here? I'm not going to say
2521
04:30:02,560 --> 04:30:08,320
this because my audio assistant that's also named to this word here is close by and I don't want it
2522
04:30:08,320 --> 04:30:16,880
to go off. So this is our game of two parts. One here is convert our data into a numerical
2523
04:30:16,880 --> 04:30:23,040
representation. And two here is build a model or use a pre trained model to find patterns in
2524
04:30:23,040 --> 04:30:29,280
that numerical representation. And so we've got a little stationary picture here, turn data into
2525
04:30:29,280 --> 04:30:34,880
numbers, part two, build a model to learn patterns in numbers. So with that being said,
2526
04:30:34,880 --> 04:30:46,720
now let's create some data to showcase this. So to showcase this, let's create some known
2527
04:30:49,280 --> 04:30:56,000
data using the linear regression formula. Now, if you're not sure what linear regression is,
2528
04:30:56,000 --> 04:31:03,120
or the formula is, let's have a look linear regression formula. This is how I'd find it.
2529
04:31:03,120 --> 04:31:09,920
Okay, we have some fancy Greek letters here. But essentially, we have y equals a function of x
2530
04:31:09,920 --> 04:31:16,320
and b plus epsilon. Okay. Well, there we go. A linear regression line has the equation in the
2531
04:31:16,320 --> 04:31:20,960
form of y equals a plus bx. Oh, I like this one better. This is nice and simple. We're going to
2532
04:31:20,960 --> 04:31:27,760
start from as simple as possible and work up from there. So y equals a plus bx, where x is the
2533
04:31:27,760 --> 04:31:34,480
explanatory variable, and y is the dependent variable. The slope of the line is b. And the
2534
04:31:34,480 --> 04:31:41,440
slope is also known as the gradient. And a is the intercept. Okay, the value of when y
2535
04:31:42,160 --> 04:31:48,720
when x equals zero. Now, this is just text on a page. This is a formula on a page. You know how I
2536
04:31:48,720 --> 04:31:59,520
like to learn things? Let's code it out. So let's write it here. We'll use a linear regression formula
2537
04:31:59,520 --> 04:32:08,400
to make a straight line with known parameters. I'm going to write this down because parameter
2538
04:32:10,160 --> 04:32:16,880
is a common word that you're going to hear in machine learning as well. So a parameter is
2539
04:32:16,880 --> 04:32:22,800
something that a model learns. So for our data set, if machine learning is a game of two parts,
2540
04:32:22,800 --> 04:32:27,760
we're going to start with this. Number one is going to be done for us, because we're going to
2541
04:32:27,760 --> 04:32:35,280
start with a known representation, a known data set. And then we want our model to learn that
2542
04:32:35,280 --> 04:32:40,000
representation. This is all just talk, Daniel, let's get into coding. Yes, you're right. You're
2543
04:32:40,000 --> 04:32:46,560
right. Let's do it. So create known parameters. So I'm going to use a little bit different
2544
04:32:46,560 --> 04:32:54,640
names to what that Google definition did. So weight is going to be 0.7 and bias is going to be 0.3.
2545
04:32:55,280 --> 04:33:00,720
Now weight and bias are another common two terms that you're going to hear in neural networks.
2546
04:33:01,440 --> 04:33:07,680
So just keep that in mind. But for us, this is going to be the equivalent of our weight will be B
2547
04:33:08,640 --> 04:33:15,120
and our bias will be A. But forget about this for the time being. Let's just focus on the code.
2548
04:33:15,120 --> 04:33:22,400
So we know these numbers. But we want to build a model that is able to estimate these numbers.
2549
04:33:23,600 --> 04:33:28,800
How? By looking at different examples. So let's create some data here. We're going to create a
2550
04:33:28,800 --> 04:33:34,640
range of numbers. Start equals zero, end equals one. We're going to create some numbers between
2551
04:33:34,640 --> 04:33:41,040
zero and one. And they're going to have a gap. So the step the gap is going to be 0.02.
2552
04:33:41,040 --> 04:33:45,280
Now we're going to create an X variable. Why is X a capital here?
2553
04:33:47,200 --> 04:33:52,480
Well, it's because typically X in machine learning you'll find is a matrix or a tensor.
2554
04:33:52,480 --> 04:33:58,320
And if we remember back to the fundamentals, a capital represents a matrix or a tensor
2555
04:33:58,320 --> 04:34:03,040
and a lowercase represents a vector. But in our case it's going to be a little confusing because
2556
04:34:03,040 --> 04:34:09,360
X is a vector. But later on, X will start to be a tensor and a matrix. So for now,
2557
04:34:09,360 --> 04:34:12,720
we'll just keep the capital, non-capital notation.
2558
04:34:15,600 --> 04:34:24,160
We're going to create the formula here, which is remember how I said our weight is in this case,
2559
04:34:24,880 --> 04:34:31,920
the B and our bias is the A. So we've got the same formula here. Y equals weight times X plus
2560
04:34:31,920 --> 04:34:38,320
bias. Now let's have a look at these different numbers. So we'll view the first 10 of X and we'll
2561
04:34:38,320 --> 04:34:43,840
view the first 10 of Y. We'll have a look at the length of X and we'll have a look at the length of
2562
04:34:43,840 --> 04:34:55,280
Y. Wonderful. So we've got some values here. We've got 50 numbers of each. This is a little
2563
04:34:55,280 --> 04:34:59,600
confusing. Let's just view the first 10 of X and Y first. And then we can have a look at the
2564
04:34:59,600 --> 04:35:11,520
length here. So what we're going to be doing is building a model to learn some values,
2565
04:35:12,960 --> 04:35:20,640
to look at the X values here and learn what the associated Y value is and the relationship
2566
04:35:20,640 --> 04:35:25,760
between those. Of course, we know what the relationship is between X and Y because we've
2567
04:35:25,760 --> 04:35:32,400
coded this formula here. But you won't always know that in the wild. That is the whole premise of
2568
04:35:32,400 --> 04:35:38,160
machine learning. This is our ideal output and this is our input. The whole premise of machine
2569
04:35:38,160 --> 04:35:44,640
learning is to learn a representation of the input and how it maps to the output. So here are our
2570
04:35:44,640 --> 04:35:51,040
input numbers and these are our output numbers. And we know that the parameters of the weight and
2571
04:35:51,040 --> 04:35:55,760
bias are 0.7 and 0.3. We could have set these to whatever we want, by the way. I just like the
2572
04:35:55,760 --> 04:36:01,520
number 7 and 3. You could set these to 0.9, whatever, whatever. The premise would be the same.
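A sketch of the data being created in this cell (unsqueeze is explained in a moment; it just adds an extra dimension we'll need for modelling later):

    import torch

    # Create known parameters
    weight = 0.7  # b in y = a + b*x
    bias = 0.3    # a in y = a + b*x

    # Create inputs and labels
    start, end, step = 0, 1, 0.02
    X = torch.arange(start, end, step).unsqueeze(dim=1)
    y = weight * X + bias

    X[:10], y[:10], len(X), len(y)  # 50 samples of each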
2573
04:36:02,160 --> 04:36:06,080
So, oh, and what I've just done here, I kind of just coded this without talking.
2574
04:36:06,880 --> 04:36:14,160
But I just did torch arange and it starts at 0 and it ends at 1 and the step is 0.02. So there
2575
04:36:14,160 --> 04:36:22,640
we go, 0.00, 0.02, 0.04. And I've unsqueezed it. So what does unsqueeze do? Removes the extra
2576
04:36:22,640 --> 04:36:29,120
dimensions. Oh, sorry, adds an extra dimension. I'm getting confused here. So if we remove that,
2577
04:36:31,920 --> 04:36:37,120
we get no extra square bracket. But if we add unsqueeze, you'll see that we need this later on
2578
04:36:37,120 --> 04:36:42,960
for when we're doing models. Wonderful. So let's just leave it at that. That's enough for this
2579
04:36:42,960 --> 04:36:47,040
video, we've got some data to work with. Don't worry if this is a little bit confusing for now,
2580
04:36:47,040 --> 04:36:52,800
we're going to keep coding on and see what we can do to build a model to infer patterns in this
2581
04:36:52,800 --> 04:36:58,720
data. But right now, I want you to have a think, this is tensor data, but it's just numbers on a
2582
04:36:58,720 --> 04:37:05,440
page. What might be a better way to hint, this is a hint by the way, visualize it. What's our
2583
04:37:05,440 --> 04:37:12,800
data explorer's motto? Let's have a look at that in the next video. Welcome back. In the last
2584
04:37:12,800 --> 04:37:18,560
video, we created some numbers on a page using the linear regression formula with some known
2585
04:37:18,560 --> 04:37:22,880
parameters. Now, there's a lot going on here, but that's all right. We're going to keep building
2586
04:37:22,880 --> 04:37:28,160
upon what we've done and learn by doing. So in this video, we're going to cover one of the most
2587
04:37:28,160 --> 04:37:35,840
important concepts in machine learning in general. So splitting data into training and test sets.
2588
04:37:35,840 --> 04:37:45,200
One of the most important concepts in machine learning in general. Now, I know I've said this
2589
04:37:45,200 --> 04:37:52,880
already a few times. One of the most important concepts, but truly, this is possibly, in terms
2590
04:37:52,880 --> 04:37:58,240
of data, this is probably the number one thing that you need to be aware of. And if you've come
2591
04:37:58,240 --> 04:38:02,480
from a little bit of a machine learning background, you probably well and truly know all about this.
2592
04:38:02,480 --> 04:38:08,240
But we're going to cover it again anyway. So let's jump into some pretty pictures. Oh, look at that
2593
04:38:08,240 --> 04:38:13,120
one speaking of pretty pictures. But that's not what we're focused on now. We're looking at the
2594
04:38:13,120 --> 04:38:18,160
three data sets. And I've written down here possibly the most important concept in machine
2595
04:38:18,160 --> 04:38:23,840
learning, because it definitely is from a data perspective. So the course materials,
2596
04:38:24,560 --> 04:38:29,680
imagine you're at university. So this is going to be the training set. And then you have the
2597
04:38:29,680 --> 04:38:34,000
practice exam, which is the validation set. Then you have the final exam, which is the test set.
2598
04:38:34,640 --> 04:38:41,200
And the goal of all of this is for generalization. So let's step back. So say you're trying to learn
2599
04:38:41,200 --> 04:38:46,080
something at university or through this course, you might have all of the materials, which is your
2600
04:38:46,080 --> 04:38:54,640
training set. So this is where our model learns patterns from. And then to practice what you've
2601
04:38:54,640 --> 04:39:01,360
done, you might have a practice exam. So the mid semester exam or something like that. Now,
2602
04:39:01,360 --> 04:39:06,400
let's just see if you're learning the course materials well. So in the case of our model,
2603
04:39:06,400 --> 04:39:13,440
we might tune our model on this practice exam. So we might find that on the validation set,
2604
04:39:14,000 --> 04:39:20,480
our model doesn't do too well. And we adjusted a bit, and then we retrain it, and then it does
2605
04:39:20,480 --> 04:39:27,120
better. Before finally, at the end of semester, the most important exam is your final exam. And
2606
04:39:27,120 --> 04:39:32,000
this is to see if you've gone through the entire course materials, and you've learned some things.
2607
04:39:32,000 --> 04:39:36,800
Now you can adapt to unseen material. And that's a big point here. We're going to see this in
2608
04:39:36,800 --> 04:39:44,000
practice is that when the model learns something on the course materials, it never sees the validation
2609
04:39:44,000 --> 04:39:51,520
set or the test set. So say we started with 100 data points, you might use 70 of those data points
2610
04:39:51,520 --> 04:39:57,600
for the training material. You might use 15% of those data points, so 15 for the practice.
2611
04:39:57,600 --> 04:40:03,440
And you might use 15 for the final exam. So this final exam is just like if you're at university
2612
04:40:03,440 --> 04:40:08,480
learning something is to see if, hey, have you learned any skills from this material at all?
2613
04:40:08,480 --> 04:40:15,120
Are you ready to go into the wild into the quote unquote real world? And so this final exam is to
2614
04:40:15,120 --> 04:40:22,640
test your model's generalization, because it's never seen this data before. So let's define it: generalization
2615
04:40:22,640 --> 04:40:27,840
is the ability for a machine learning model or a deep learning model to perform well on data it
2616
04:40:27,840 --> 04:40:32,240
hasn't seen before, because that's our whole goal, right? We want to build a machine learning model
2617
04:40:32,240 --> 04:40:38,800
on some training data that we can deploy in our application or production setting. And then
2618
04:40:38,800 --> 04:40:44,320
more data comes in that it hasn't seen before. And it can make decisions based on that new data
2619
04:40:44,320 --> 04:40:48,480
because of the patterns it's learned in the training set. So just keep this in mind,
2620
04:40:48,480 --> 04:40:54,880
three data sets training validation test. And if we jump in to the learn pytorch book,
2621
04:40:54,880 --> 04:41:04,800
we've got split data. So we're going to create three sets. Or in our case, we're only going to
2622
04:41:04,800 --> 04:41:10,080
create two, a training and a test set. Why is that? Because you don't always need a validation set.
2623
04:41:10,720 --> 04:41:18,160
There is often a use case for a validation set. But the main two that are always used is the training
2624
04:41:18,160 --> 04:41:23,920
set and the testing set. And how much should you split? Well, usually for the training set,
2625
04:41:23,920 --> 04:41:27,920
you'll have 60 to 80% of your data. If you do create a validation set, you'll have somewhere
2626
04:41:27,920 --> 04:41:33,280
between 10 and 20. And if you do create a testing set, it's a similar split to the validation set,
2627
04:41:33,280 --> 04:41:40,080
you'll have between 10 and 20%. So training always, testing always, validation often, but
2628
04:41:40,800 --> 04:41:46,320
not always. So with that being said, I'll let you refer to those materials if you want. But now
2629
04:41:46,320 --> 04:41:55,680
let's create a training and test set with our data. So we saw before that we have 50 points,
2630
04:41:55,680 --> 04:42:01,920
we have X and Y, we have one to one ratio. So one value of X relates to one value of Y.
2631
04:42:01,920 --> 04:42:08,880
And we know that the split now for the training set is 60 to 80%. And the test set is 10 to 20%.
2632
04:42:09,600 --> 04:42:14,640
So let's go with the upper bounds of each of these, 80% and 20%, which is a very common split,
2633
04:42:14,640 --> 04:42:25,120
actually 80, 20. So let's go create a train test split. And we're going to go train split.
2634
04:42:25,760 --> 04:42:32,880
We'll create a number here so we can see how much. So we want an integer of 0.8, which is 80%
2635
04:42:32,880 --> 04:42:39,280
of the length of X. What does that give us? Train split should be about 40 samples. Wonderful.
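A sketch of the 80/20 split that gets written over the next few lines, assuming the X and y created earlier:

    # Create train/test split (80% train, 20% test)
    train_split = int(0.8 * len(X))  # ~40 samples
    X_train, y_train = X[:train_split], y[:train_split]
    X_test, y_test = X[train_split:], y[train_split:]

    len(X_train), len(y_train), len(X_test), len(y_test)  # 40, 40, 10, 10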
2636
04:42:39,280 --> 04:42:46,240
So we're going to create 40 samples of X and 40 samples of Y. Our model will train on those 40
2637
04:42:46,240 --> 04:42:54,080
samples to predict what? The other 10 samples. So let's see this in practice. So X train,
2638
04:42:55,280 --> 04:43:03,680
Y train equals X. And we're going to use indexing to get all of the samples up until the train
2639
04:43:03,680 --> 04:43:10,560
split. That's what this colon does here. So hey, X up until the train split, Y up until the train
2640
04:43:10,560 --> 04:43:17,120
split, and then for the testing. Oh, thanks for that autocorrect, Colab, but I didn't actually need that
2641
04:43:17,120 --> 04:43:25,520
one. X test. Y test equals X. And then we're going to get everything from the train split onwards.
2642
04:43:25,520 --> 04:43:36,280
So the index onwards, that's what this notation means here. And Y from the train split onwards as
2643
04:43:36,280 --> 04:43:43,520
well. Now, there are many different ways to create a train and test split. Ours is quite simple here,
2644
04:43:43,520 --> 04:43:48,640
but that's because we're working with quite a simple data set. One of the most popular methods
2645
04:43:48,640 --> 04:43:53,880
that I like is scikit learns train test split. We're going to see this one later on. It adds a
2646
04:43:53,880 --> 04:43:59,960
little bit of randomness into splitting your data. But that's for another video, just to make you
2647
04:43:59,960 --> 04:44:09,400
aware of it. So let's go length X train. We should have 40 training samples to
2648
04:44:09,400 --> 04:44:24,280
how many testing samples? Length of X test and length of Y test. Wonderful: 40, 40, 10, 10, because we have
2649
04:44:24,280 --> 04:44:31,160
training features, training labels, testing features, testing labels. So essentially what we've
2650
04:44:31,160 --> 04:44:37,560
created here is now a training set. We've split our data. Training set could also be referred to
2651
04:44:37,560 --> 04:44:43,160
as a training split, yet another example of where machine learning has different names for different
2652
04:44:43,160 --> 04:44:48,920
things. So set, split, same thing; training split, test split. This is what we've created. Remember,
2653
04:44:48,920 --> 04:44:53,880
the validation set is used often, but not always because our data set is quite simple. We're just
2654
04:44:53,880 --> 04:44:59,960
sticking with the necessities training and test. But keep this in mind. One of your biggest,
2655
04:45:00,520 --> 04:45:06,360
biggest, biggest hurdles in machine learning will be creating proper training and test sets. So
2656
04:45:06,360 --> 04:45:11,560
it's a very important concept. With that being said, I did issue the challenge in the last video
2657
04:45:11,560 --> 04:45:17,000
to visualize these numbers on a page. We haven't done that in this video. So let's move towards
2658
04:45:17,000 --> 04:45:22,920
that next. I'd like you to think of how could you make these more visual? Right. These are just
2659
04:45:22,920 --> 04:45:33,880
numbers on a page right now. Maybe matplotlib can help. Let's find out. Hey, hey, hey, welcome
2660
04:45:33,880 --> 04:45:40,360
back. In the last video, we split our data into training and test sets. And now later on,
2661
04:45:40,360 --> 04:45:44,680
we're going to be building a model to learn patterns in the training data to relate to the
2662
04:45:44,680 --> 04:45:50,120
testing data. But as I said, right now, our data is just numbers on a page. It's kind of
2663
04:45:50,120 --> 04:45:54,760
hard to understand. You might be able to understand this, but I prefer to get visual. So let's write
2664
04:45:54,760 --> 04:46:04,280
this down. How might we better visualize our data? And I'll put a capital here. So we're grammatically
2665
04:46:04,280 --> 04:46:17,960
correct. And this is where the data explorer's motto comes in. Visualize, visualize, visualize.
2666
04:46:18,680 --> 04:46:23,640
Ha ha. Right. So if ever you don't understand a concept, one of the best ways to start
2667
04:46:23,640 --> 04:46:29,400
understanding it more for me is to visualize it. So let's write a function to do just that.
2668
04:46:29,400 --> 04:46:34,600
We're going to call this plot predictions. We'll see why we call it this later on. That's the
2669
04:46:34,600 --> 04:46:39,080
benefit of making these videos is that I've got a plan for the future. Although it might seem
2670
04:46:39,080 --> 04:46:43,000
like I'm winging it, there is a little bit of behind the scenes happening here. So we'll have
2671
04:46:43,000 --> 04:46:51,000
the train data, which is our X train. And then we'll have the train labels, which is our Y train.
2672
04:46:51,000 --> 04:46:58,200
And we'll also have the test data. Yeah, that's a good idea. X test. And we'll also have the test
2673
04:46:58,200 --> 04:47:05,960
labels, equals Y test. Excuse me. I was looking at too many X's there. And then the predictions.
2674
04:47:05,960 --> 04:47:11,400
And we'll set this to none, because we don't have any predictions yet. But as you might have guessed,
2675
04:47:11,400 --> 04:47:16,440
we might have some later on. So we'll put a little doc string here, so that we're being nice and
2676
04:47:16,440 --> 04:47:26,120
Pythonic. So plots training data, test data, and compares predictions. Nice and simple.
2677
04:47:28,120 --> 04:47:33,880
Nothing too outlandish. And then we're going to create a figure. This is where matplotlib comes
2678
04:47:33,880 --> 04:47:41,640
in. Plt dot figure. And we'll go figsize equals 10, 7, which is my favorite hand in poker.
2679
04:47:41,640 --> 04:47:46,920
It also happens to be a good dimension for a matplotlib plot. And we'll plot the training data in blue.
2680
04:47:47,880 --> 04:47:54,760
Plt dot scatter, train data. Creating a scatter plot here. We'll see what it does in a second.
2681
04:47:55,560 --> 04:48:00,840
Color. We're going to give this a color of B for blue. That's what C stands for in matplotlib's
2682
04:48:00,840 --> 04:48:09,480
scatter. We'll go size equals four and label equals training data. Now, where could you find
2683
04:48:09,480 --> 04:48:14,440
information about this scatter function here? We've got command shift space. Is that going to
2684
04:48:14,440 --> 04:48:19,160
give us a little bit of a doc string? Or sometimes, if command shift space is not working,
2685
04:48:19,720 --> 04:48:24,040
you can also hover over this bracket. I think you can even hover over this.
2686
04:48:26,280 --> 04:48:32,760
There we go. But this is a little hard for me to read. Like it's there, but it's got a lot going
2687
04:48:32,760 --> 04:48:46,840
on. X, Y, S, C, cmap. I just like to go matplotlib scatter. There we go. We've got a whole
2688
04:48:46,840 --> 04:48:52,040
bunch of information there. A little bit easier to read for me here. And then you can see some
2689
04:48:52,040 --> 04:48:58,680
examples. Beautiful. So now let's jump back into here. So in our function plot predictions,
2690
04:48:58,680 --> 04:49:03,720
we've taken some training data, test data. We've got the training data plotting in blue. What
2691
04:49:03,720 --> 04:49:10,200
color should we use for the testing data? How about green? I like that idea. Plt dot scatter.
2692
04:49:10,840 --> 04:49:17,720
Test data. Green's my favorite color. What's your favorite color? C equals G. You might be
2693
04:49:17,720 --> 04:49:22,200
able to just plot it in your favorite color here. Just remember though, it'll be a little bit
2694
04:49:22,200 --> 04:49:26,840
different from the videos. And then we're going to call this testing data. So just the exact same
2695
04:49:26,840 --> 04:49:33,560
line as above, but with a different set of data. Now, let's check if there are predictions. So
2696
04:49:33,560 --> 04:49:44,120
are there predictions? So if predictions is not none, let's plot the predictions, plot the
2697
04:49:44,120 --> 04:49:58,840
predictions, if they exist. So plt dot scatter, test data. And why are we plotting the test data?
2698
04:49:58,840 --> 04:50:03,960
Remember, what is our scatter function? Let's go back up to here. It takes in x and y. So
2699
04:50:04,920 --> 04:50:10,200
our predictions are going to be compared to the testing data labels. So that's the whole
2700
04:50:10,200 --> 04:50:14,680
game that we're playing here. We're going to train our model on the training data.
2701
04:50:15,320 --> 04:50:19,400
And then to evaluate it, we're going to get our model to predict the y values
2702
04:50:20,280 --> 04:50:28,120
as with the input of x test. And then to evaluate our model, we compare how good our models
2703
04:50:28,120 --> 04:50:35,320
predictions are. In other words, predictions versus the actual values of the test data set.
2704
04:50:35,320 --> 04:50:42,280
But we're going to see this in practice. Rather than just talk about it. So let's do our predictions
2705
04:50:42,280 --> 04:50:55,320
in red. And label equals predictions. Wonderful. So let's also show the legend, because, I mean,
2706
04:50:55,320 --> 04:51:01,320
we're legends. So we could just put in a mirror here. Now I'm kidding. Legend is going to show
2707
04:51:01,320 --> 04:51:10,760
our labels on the matplotlib plot. So prop equals size, and prop stands for properties. Well,
2708
04:51:11,640 --> 04:51:16,040
it may or may not. I just like to think it does. That's how I remember it. So we have a beautiful
2709
04:51:16,040 --> 04:51:24,040
function here to plot our data. Should we try it out? Remember, we've got hard coded inputs here,
2710
04:51:24,040 --> 04:51:28,360
so we don't actually need to input anything to our function. We've got our train and test data
2711
04:51:28,360 --> 04:51:32,840
ready to go. If in doubt, run the code, let's check it out. Did we make a mistake in our plot
2712
04:51:32,840 --> 04:51:40,840
predictions function? You might have caught it. Hey, there we go. Beautiful. So because we don't
2713
04:51:40,840 --> 04:51:46,120
have any predictions, we get no red dots. But this is what we're trying to do. We've got a simple
2714
04:51:46,120 --> 04:51:51,000
straight line. You can't get a much more simple data set than that. So we've got our training data
2715
04:51:51,000 --> 04:51:56,440
in blue, and we've got our testing data in green. So the whole idea of what we're going to be doing
2716
04:51:56,440 --> 04:52:00,520
with our machine learning model is we don't actually really need to build a machine learning
2717
04:52:00,520 --> 04:52:05,960
model for this. We could do other things, but machine learning is fun. So we're going to take
2718
04:52:05,960 --> 04:52:11,160
in the blue dots. There's quite a pattern here, right? This is the relationship we have an x value
2719
04:52:11,160 --> 04:52:17,720
here, and we have a y value. So we're going to build a model to try and learn the pattern
2720
04:52:17,720 --> 04:52:25,160
of these blue dots, so that if we fed our model the x values of the green dots,
2721
04:52:25,160 --> 04:52:29,560
could it predict the appropriate y values for that? Because remember, these are the test data set.
2722
04:52:29,560 --> 04:52:37,400
So pass our model x test to predict y test. So blue dots as input, green dots as the ideal output.
2723
04:52:37,400 --> 04:52:42,360
This is the ideal output, a perfect model would have red dots over the top of the green dots. So
2724
04:52:42,360 --> 04:52:47,640
that's what we will try to work towards. Now, we know the relationship between x and y.
2725
04:52:48,200 --> 04:52:53,160
How do we know that? Well, we set that up above here. This is our weight and bias.
2726
04:52:53,160 --> 04:52:59,560
We created that line y equals weight times x plus bias, which is the simple version of the
2727
04:52:59,560 --> 04:53:05,080
linear regression formula. So mx plus c, you might have heard that in high school algebra,
2728
04:53:05,080 --> 04:53:11,720
so gradient times x plus the intercept. That's what we've got. With that being said,
2729
04:53:11,720 --> 04:53:16,920
let's move on to the next video and build a model. Well, this is exciting. I'll see you there.
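(Here is a minimal sketch of the plot_predictions function built in this video, assuming matplotlib is available and the split tensors from earlier exist; the exact legend size is an illustrative choice:)

    import matplotlib.pyplot as plt

    def plot_predictions(train_data=X_train, train_labels=y_train,
                         test_data=X_test, test_labels=y_test,
                         predictions=None):
        """Plots training data, test data and compares predictions."""
        plt.figure(figsize=(10, 7))
        # Plot training data in blue
        plt.scatter(train_data, train_labels, c="b", s=4, label="Training data")
        # Plot test data in green
        plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data")
        if predictions is not None:
            # Plot the predictions in red (predictions are made on the test data)
            plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")
        # Show the legend
        plt.legend(prop={"size": 14})

    plot_predictions()  # no predictions yet, so only blue and green dots appear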
2730
04:53:16,920 --> 04:53:24,760
Welcome back. In the last video, we saw how to get visual with our data. We followed the data
2731
04:53:24,760 --> 04:53:31,080
explorer's motto of visualize, visualize, visualize. And we've got an idea of the training data that
2732
04:53:31,080 --> 04:53:36,760
we're working with and the testing data that we're trying to build a model to learn the patterns
2733
04:53:36,760 --> 04:53:44,200
in the training data, essentially this upwards trend here, to be able to predict the testing data.
2734
04:53:44,200 --> 04:53:49,560
So I just want to give you another heads up. I took a little break after the recording last
2735
04:53:49,560 --> 04:53:54,760
video. And so now my colab notebook has disconnected. So I'm going to click reconnect.
2736
04:53:55,480 --> 04:54:02,920
And my variables here may not work. So this is what might happen on your end. If you take a break
2737
04:54:02,920 --> 04:54:08,200
from using Google Colab and come back, if I try to run this function, they might have been saved,
2738
04:54:08,200 --> 04:54:14,600
it looks like they have. But if not, you can go restart and run all. This is typically one of the
2739
04:54:14,600 --> 04:54:23,240
most helpful troubleshooting steps of using Google Colab. If a cell, say down here isn't working,
2740
04:54:23,240 --> 04:54:32,520
you can always rerun the cells above. And that may help with a lower cell here, such as if this
2741
04:54:32,520 --> 04:54:38,600
function wasn't instantiated because this cell wasn't run, and we couldn't run this cell here,
2742
04:54:38,600 --> 04:54:43,320
which calls this function here, we just have to rerun this cell above so that we can run this one.
2743
04:54:43,960 --> 04:54:51,960
But now let's get into building our first PyTorch model. We're going to jump straight into the code.
2744
04:54:51,960 --> 04:54:58,760
So our first PyTorch model. Now this is very exciting.
2745
04:54:58,760 --> 04:55:09,480
Let's do it. So we'll turn this into Markdown. Now we're going to create a linear regression model.
2746
04:55:09,480 --> 04:55:15,720
So look at linear regression formula again, we're going to create a model that's essentially going
2747
04:55:15,720 --> 04:55:23,480
to run this computation. So we need to create a model that has a parameter for A, a parameter for B,
2748
04:55:23,480 --> 04:55:29,640
and in our case it's going to be weight and bias, and a way to do this forward computation.
2749
04:55:29,640 --> 04:55:36,040
What I mean by that, we're going to see with code. So let's do it. We'll do it with pure PyTorch.
2750
04:55:36,040 --> 04:55:44,040
So create a linear regression model class. Now if you're not experienced with using Python classes,
2751
04:55:44,040 --> 04:55:49,240
I'm going to be using them throughout the course, and I'm going to call this one linear regression
2752
04:55:49,240 --> 04:55:56,440
model. If you haven't dealt with Python classes before, that's okay. I'm going to be explaining
2753
04:55:56,440 --> 04:56:03,000
what we're doing as we're doing it. But if you'd like a deeper dive, I'd recommend the Real Python
2754
04:56:04,120 --> 04:56:12,520
article on classes, OOP in Python 3. That's a good rhyme. So I'm just going to link this here.
2755
04:56:12,520 --> 04:56:23,400
Because we're going to be building classes throughout the course,
2756
04:56:23,400 --> 04:56:31,480
I'd recommend getting familiar with OOP, which is object oriented programming, a little bit of a
2757
04:56:31,480 --> 04:56:43,880
mouthful, hence OOP. To do so, you can use the following resource from Real Python.
2758
04:56:43,880 --> 04:56:48,280
But I'm not going to go through that now. I'd rather just code it out and talk it out while we
2759
04:56:48,280 --> 04:56:53,080
do it. So we've got a class here. Now the first thing you might notice is that the class inherits
2760
04:56:53,080 --> 04:57:00,040
from nn.module. And you might be wondering, well, what's nn.module? Well, let's write down here,
2761
04:57:00,040 --> 04:57:12,840
almost everything in PyTorch inherits from nn.module. So you can imagine nn.module as the
2762
04:57:12,840 --> 04:57:20,520
Lego building bricks of a PyTorch model. And so nn.Module has a lot of helpful inbuilt things that are
2763
04:57:20,520 --> 04:57:25,000
going to help us build our PyTorch models. And of course, how could you learn more about it?
2764
04:57:25,000 --> 04:57:33,080
Well, you could go nn.module, PyTorch. Module. Here we go. Base class for all neural network
2765
04:57:33,080 --> 04:57:38,760
modules. Wonderful. Your models should also subclass this class. So that's what we're building. We're
2766
04:57:38,760 --> 04:57:44,120
building our own PyTorch model. And so the documentation here says that your models should
2767
04:57:44,120 --> 04:57:49,800
also subclass this class. And another thing with PyTorch, this is what makes it, it might seem very
2768
04:57:49,800 --> 04:57:56,360
confusing when you first begin. But modules can contain other modules. So what I mean by being a
2769
04:57:56,360 --> 04:58:01,880
Lego brick is that you can stack these modules on top of each other and make progressively more
2770
04:58:01,880 --> 04:58:08,360
complex neural networks as you go. But we'll leave that for later on. For now, we're going to start
2771
04:58:08,360 --> 04:58:15,640
with something nice and simple. And let's clean up our web browser. So we're going to create a
2772
04:58:15,640 --> 04:58:23,720
constructor here, which is with the init function. It's going to take self as a parameter. If you're
2773
04:58:23,720 --> 04:58:29,000
not sure of what's going on here, just follow along with the code for now. And I'd encourage you
2774
04:58:29,000 --> 04:58:37,640
to read this documentation here after the video. So then we have super dot init. I know when I
2775
04:58:37,640 --> 04:58:40,920
first started learning this, I was like, why do we have to write init twice? And then what's
2776
04:58:40,920 --> 04:58:47,640
super and all that jazz. But just for now, just take this as being some required Python syntax.
2777
04:58:48,280 --> 04:58:54,040
And then we have self dot weights. So that means we're going to create a weights parameter. We'll
2778
04:58:54,040 --> 04:58:59,720
see why we do this in a second. And to create that parameter, we're going to use nn dot parameter.
2779
04:59:00,280 --> 04:59:08,280
And just a quick reminder that we imported nn from torch before. And if you remember,
2780
04:59:08,280 --> 04:59:15,880
nn is the building block layer for neural networks. And within nn, so nn stands for neural network
2781
04:59:15,880 --> 04:59:24,520
is Module. So we've got nn dot parameter. Now, we're going to start with random parameters.
2782
04:59:25,240 --> 04:59:32,120
So torch dot rand n. One, we're going to talk through each of these in a second. So I'm also
2783
04:59:32,120 --> 04:59:39,560
going to put requires, requires grad equals true. We haven't touched any of these, but that's okay.
2784
04:59:40,120 --> 04:59:50,360
D type equals torch dot float. So let's see what nn parameter tells us. What do we have here?
2785
04:59:53,080 --> 04:59:58,440
A kind of tensor that is to be considered a module parameter. So we've just created a module
2786
04:59:58,440 --> 05:00:04,280
using nn module. Parameters are torch tensor subclasses. So this is a tensor in itself
2787
05:00:05,000 --> 05:00:09,480
that have a very special property when used with modules. When they're assigned as a module
2788
05:00:09,480 --> 05:00:14,760
attribute, they are automatically added to the list of its parameters, and will appear, e.g.,
2789
05:00:14,760 --> 05:00:20,440
in module dot parameters iterator. Oh, we're going to see that later on. Assigning a tensor
2790
05:00:20,440 --> 05:00:28,040
doesn't have such effect. So we're creating a parameter here. Now requires grad. What does that
2791
05:00:28,040 --> 05:00:32,680
mean? Well, rather than just try to read the doc string in Colab, let's look it up.
2792
05:00:32,680 --> 05:00:42,600
nn dot parameter. What does it say requires grad optional. If the parameter requires gradient.
2793
05:00:43,400 --> 05:00:51,160
Hmm. What does requires gradient mean? Well, let's come back to that in a second. And then
2794
05:00:51,160 --> 05:00:56,680
for now, I just want you to think about it. D type equals torch dot float. Now,
2795
05:00:56,680 --> 05:01:02,920
the data type here, torch dot float, is, as we've discussed before, the default
2796
05:01:02,920 --> 05:01:08,360
for PyTorch. This could also be written as torch dot float 32. So we're just going to
2797
05:01:08,360 --> 05:01:14,920
leave it as torch float 32, because PyTorch likes to work with float 32. Now, do we have
2798
05:01:17,160 --> 05:01:24,280
this by default? We do. So we don't necessarily have to set requires grad equals true. So just
2799
05:01:24,280 --> 05:01:33,000
keep that in mind. So now we've created a parameter for the weights. We also have to create a parameter
2800
05:01:33,000 --> 05:01:41,080
for the bias. Let's finish creating this. And then we'll write the code, then we'll talk about it.
2801
05:01:41,080 --> 05:01:52,120
So rand n. Now requires grad equals true. And d type equals torch dot float. There we go.
2802
05:01:52,120 --> 05:02:01,000
And now we're going to write a forward method. So forward method to define the computation
2803
05:02:02,040 --> 05:02:14,520
in the model. So let's go def forward, which takes in self and a parameter x, which is data,
2804
05:02:14,520 --> 05:02:23,720
which X is expected to be of type torch tensor. And it returns a torch dot tensor. And then we go
2805
05:02:23,720 --> 05:02:28,760
here. And so we say X, we don't necessarily need this comment. I'm just going to write it anyway.
2806
05:02:28,760 --> 05:02:36,280
X is the input data. So in our case, it might be the training data. And then from here, we want
2807
05:02:36,280 --> 05:02:46,440
it to return self dot weights times X plus self dot bias. Now, where have we seen this before?
2808
05:02:47,480 --> 05:02:56,440
Well, this is the linear regression formula. Now, let's take a step back into how we created our data.
2809
05:02:56,440 --> 05:02:59,560
And then we'll go back through and talk a little bit more about what's going on here.
2810
05:02:59,560 --> 05:03:08,680
So if we go back up to our data, where did we create that? We created it here. So you see how
2811
05:03:08,680 --> 05:03:16,520
we've created known parameters, weight and bias. And then we created our y variable, our target,
2812
05:03:16,520 --> 05:03:23,320
using the linear regression formula, weight times X plus bias, and X was a range of numbers.
2813
05:03:23,320 --> 05:03:29,560
So what we've done with our linear regression model that we've created from scratch,
2814
05:03:29,560 --> 05:03:37,880
if we go down here, we've created a parameter, weights. This could just be weight, if we wanted to.
2815
05:03:38,440 --> 05:03:44,840
We've created a parameter here. So when we created our data, we knew what the parameters weight and
2816
05:03:44,840 --> 05:03:52,200
bias were. The whole goal of our model is to start with random numbers. So these are going to be
2817
05:03:52,200 --> 05:03:58,440
random parameters. And to look at the data, which in our case will be the training samples,
2818
05:03:59,160 --> 05:04:07,400
and update those random numbers to represent the pattern here. So ideally, our model, if it's
2819
05:04:07,400 --> 05:04:13,800
learning correctly, will take our weight, which is going to be a random value, and our bias,
2820
05:04:13,800 --> 05:04:18,120
which is going to be a random value. And it will run it through this forward calculation,
2821
05:04:18,120 --> 05:04:25,720
which is the same formula that we use to create our data. And it will adjust the weight and bias
2822
05:04:25,720 --> 05:04:34,520
to represent as close as possible, if not perfect, the known parameters. So that's the premise of
2823
05:04:34,520 --> 05:04:41,960
machine learning. And how does it do this? Through an algorithm called gradient descent. So I'm just
2824
05:04:41,960 --> 05:04:46,360
going to write this down because we've talked a lot about this, but I'd like to just tie it together
2825
05:04:46,360 --> 05:05:01,880
here. So what our model does, so start with random values, weight and bias, look at training data,
2826
05:05:01,880 --> 05:05:21,480
and adjust the random values to better represent the, or get closer to the ideal values. So the
2827
05:05:21,480 --> 05:05:33,240
weight and bias values we use to create the data. So that's what it's going to do. It's going to
2828
05:05:33,240 --> 05:05:39,000
start with random values, and then continually look at our training data to see if it can adjust
2829
05:05:39,000 --> 05:05:46,120
those random values to be what would represent this straight line here. Now, how does it do so?
2830
05:05:46,120 --> 05:06:01,240
How does it do so? Through two main algorithms. So one is gradient descent, and two is back
2831
05:06:01,240 --> 05:06:12,120
propagation. So I'm going to leave it here for the time being, but we're going to continue talking
2832
05:06:12,120 --> 05:06:21,480
about this. Gradient descent is why we have requires grad equals true. And so what this is going to
2833
05:06:21,480 --> 05:06:28,680
do is when we run computations using this model here, pytorch is going to keep track of the gradients
2834
05:06:28,680 --> 05:06:36,280
of our weights parameter and our bias parameter. And then it's going to update them through a
2835
05:06:36,280 --> 05:06:43,000
combination of gradient descent and back propagation. Now, I'm going to leave this as extracurricular
2836
05:06:43,000 --> 05:06:46,120
for you to look through and gradient descent and back propagation. I'm going to add some
2837
05:06:46,120 --> 05:06:51,240
resources here. There will also be plenty of resources in the pytorch workflow fundamentals
2838
05:06:51,240 --> 05:06:57,000
book chapter on how these algorithms work behind the scenes. We're going to be focused on the code,
2839
05:06:57,000 --> 05:07:02,280
the pytorch code, to trigger these algorithms behind the scenes. So pytorch, lucky for us,
2840
05:07:02,280 --> 05:07:09,080
has implemented gradient descent and back propagation for us. So we're writing the higher level code
2841
05:07:09,080 --> 05:07:14,040
here to trigger these two algorithms. So in the next video, we're going to step through this a
2842
05:07:14,040 --> 05:07:21,560
little bit more, and then further discuss some of the most useful and required modules of pytorch,
2843
05:07:21,560 --> 05:07:27,560
particularly nn and a couple of others. So let's leave it there, and I'll see you in the next video.
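(For reference, here is a sketch consolidating the model class written across the last few videos, assuming torch is installed:)

    import torch
    from torch import nn

    # Create a linear regression model class
    class LinearRegressionModel(nn.Module):  # almost everything in PyTorch inherits from nn.Module
        def __init__(self):
            super().__init__()
            # Start with random values for weights and bias, and let PyTorch track their gradients
            self.weights = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
            self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

        # forward() defines the computation the model performs on its input data
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.weights * x + self.bias  # the linear regression formula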
2844
05:07:27,560 --> 05:07:35,240
Welcome back. In the last video, we covered a whole bunch in creating our first pytorch model
2845
05:07:35,240 --> 05:07:40,040
that inherits from nn.module. We talked about object oriented programming and how a lot of
2846
05:07:40,040 --> 05:07:45,400
pytorch uses object oriented programming. I can't say that. I might just say OOP for now.
2847
05:07:45,400 --> 05:07:51,480
What I've done since last video, though, is I've added two resources here for gradient descent
2848
05:07:51,480 --> 05:07:57,880
and back propagation. These are two of my favorite videos on YouTube by the channel
2849
05:07:57,880 --> 05:08:02,280
3Blue1Brown. So this is on gradient descent. I would highly recommend watching this entire series,
2850
05:08:02,280 --> 05:08:08,360
by the way. So that's your extra curriculum for this video, in particular, and for this course overall
2851
05:08:08,360 --> 05:08:13,240
is to go through these two videos. Even if you're not sure entirely what's happening,
2852
05:08:13,240 --> 05:08:17,960
you will gain an intuition for the code that we're going to be writing with pytorch.
2853
05:08:17,960 --> 05:08:23,480
So just keep that in mind as we go forward, a lot of what pytorch is doing behind the scenes for us
2854
05:08:23,480 --> 05:08:32,520
is taking care of these two algorithms for us. And we also created two parameters here in our model
2855
05:08:32,520 --> 05:08:39,720
where we've instantiated them as random values. So one parameter for each of the ones that we use,
2856
05:08:39,720 --> 05:08:44,680
the weight and bias for our data set. And now I want you to keep in mind that we're working
2857
05:08:44,680 --> 05:08:50,440
with a simple data set here. So we've created our known parameters. But in a data set that you
2858
05:08:50,440 --> 05:08:54,760
haven't created by yourself, you've maybe gathered that from the internet, such as images,
2859
05:08:55,560 --> 05:09:02,840
you won't be necessarily defining these parameters. Instead, another module from nn will define the
2860
05:09:02,840 --> 05:09:10,760
parameters for you. And we'll work out what those parameters should end up being. But since we're
2861
05:09:10,760 --> 05:09:16,760
working with a simple data set, we can define our two parameters that we're trying to estimate.
2862
05:09:16,760 --> 05:09:21,720
This is a key point: our model is going to start with random values. That's the
2863
05:09:21,720 --> 05:09:27,240
annotation I've added here. Start with a random weight value using torch dot rand n. And then we've
2864
05:09:27,240 --> 05:09:32,920
told it that it can update via gradient descent. So pytorch is going to track the gradients of
2865
05:09:32,920 --> 05:09:37,720
this parameter for us. And then we've told it that the D type we want is float 32. We don't
2866
05:09:37,720 --> 05:09:43,080
necessarily need these two set explicitly, because a lot of the time the default in pytorch is to
2867
05:09:43,080 --> 05:09:49,080
set these two, requires grad equals true and d type equals torch dot float, for us
2868
05:09:49,080 --> 05:09:54,280
behind the scenes. But just to keep things as fundamental and as straightforward as possible,
2869
05:09:54,280 --> 05:10:01,000
we've set all of this explicitly. So let's jump into the keynote. I'd just like to explain
2870
05:10:01,000 --> 05:10:06,840
what's going on one more time in a visual sense. So here's the exact code that we've
2871
05:10:06,840 --> 05:10:12,760
just written. I've just copied it from here. And I've just made it a little bit more colorful.
2872
05:10:13,480 --> 05:10:21,160
But here's what's going on. So when you build a model in pytorch, it subclasses the nn.Module
2873
05:10:21,160 --> 05:10:27,560
class. This contains all the building blocks for neural networks. So our class of model, subclasses
2874
05:10:27,560 --> 05:10:36,680
nn.Module. Now, inside the constructor, we initialize the model parameters. Now, as we'll see,
2875
05:10:36,680 --> 05:10:44,600
later on with bigger models, we won't necessarily always explicitly create the weights and biases.
2876
05:10:45,160 --> 05:10:49,880
We might initialize whole layers. Now, this is a concept we haven't touched on yet, but
2877
05:10:50,440 --> 05:10:57,480
we might initialize a list of layers or whatever we need. So basically, what happens in here is that
2878
05:10:57,480 --> 05:11:04,760
we create whatever variables that we need for our model to use. And so these could be different
2879
05:11:04,760 --> 05:11:10,200
layers from torch.nn, single parameters, which is what we've done in our case, hard coded values,
2880
05:11:10,200 --> 05:11:18,760
or even functions. Now, we've explicitly set requires grad equals true for our model parameters.
2881
05:11:19,320 --> 05:11:24,200
So this, in turn, means that pytorch behind the scenes will track all of the gradients
2882
05:11:24,200 --> 05:11:31,960
for these parameters here for use with torch dot autograd. So the torch dot autograd module of PyTorch is what
2883
05:11:31,960 --> 05:11:36,840
implements gradient descent. Now, a lot of this will happen behind the scenes for when we write
2884
05:11:36,840 --> 05:11:41,320
our pytorch training code. So if you'd like to know what's happening behind the scenes,
2885
05:11:41,320 --> 05:11:45,000
I'd highly recommend checking out these two videos, hence why I've linked them here.
2886
05:11:46,920 --> 05:11:52,280
Oh, and for many torch.nn modules, requires grad equals true is set by default.
2887
05:11:53,720 --> 05:12:00,200
Finally, we've got a forward method. Now, any subclass of nn.Module, which is what we've done,
2888
05:12:00,200 --> 05:12:05,800
requires a forward method. Now, we can see this in the documentation. If we go torch
2889
05:12:06,520 --> 05:12:07,800
dot nn.Module.
2890
05:12:10,440 --> 05:12:13,160
Click on module. Do we have forward?
2891
05:12:16,680 --> 05:12:22,040
Yeah, there we go. So forward, we've got a lot of things built into an nn.Module.
2892
05:12:22,760 --> 05:12:28,680
So you see here, this is a subclass of an nn.Module. And then we have forward.
2893
05:12:28,680 --> 05:12:34,280
So forward is what defines the computation performed at every call. So if we were
2894
05:12:34,280 --> 05:12:39,800
to call linear regression model and put some data through it, the forward method is the
2895
05:12:39,800 --> 05:12:46,360
operation that this module does that this model does. And in our case, our forward method is
2896
05:12:46,360 --> 05:12:52,840
the linear regression function. So keep this in mind, any subclass of nn.Module needs to
2897
05:12:52,840 --> 05:12:56,920
override the forward method. So you need to define a forward method if you're going to subclass
2898
05:12:56,920 --> 05:13:03,480
nn.Module. We'll see this very hands on. But for now, I believe that's enough coverage of what
2899
05:13:03,480 --> 05:13:10,120
we've done. If you have any questions, remember, you can ask it in the discussions. We've got a
2900
05:13:10,120 --> 05:13:17,560
fair bit going on here. But I think we've broken it down a fair bit. The next step is for us to,
2901
05:13:17,560 --> 05:13:22,280
I know I mentioned this in a previous video is to cover some PyTorch model building essentials.
2902
05:13:22,280 --> 05:13:27,560
But we're going to cover a few more of them. We've seen some already. But the next way to really
2903
05:13:27,560 --> 05:13:33,000
start to understand what's going on is to check the contents of our model, train one, and make
2904
05:13:33,000 --> 05:13:38,520
some predictions with it. So let's get hands on with that in the next few videos. I'll see you there.
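(A tiny sketch of that last point: calling the model like a function goes through nn.Module's call machinery, which runs the forward() method we defined. The dummy tensor here is just illustrative, not the course data:)

    model = LinearRegressionModel()
    dummy_x = torch.arange(0.0, 5.0)   # illustrative input tensor
    output = model(dummy_x)            # runs the forward() computation: weights * x + bias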
2905
05:13:42,040 --> 05:13:47,640
Welcome back. In the last couple of videos, we stepped through creating our first PyTorch model.
2906
05:13:47,640 --> 05:13:52,520
And it looks like there's a fair bit going on here. But some of the main takeaways is that almost
2907
05:13:52,520 --> 05:14:00,040
every model in PyTorch inherits from nn.Module. And if you are going to inherit from nn.Module,
2908
05:14:00,040 --> 05:14:04,360
you should override the forward method to define what computation is happening in your model.
2909
05:14:05,160 --> 05:14:10,680
And for later on, when our model is learning things, in other words, updating its weights and
2910
05:14:10,680 --> 05:14:17,880
bias values from random values to values that better fit the data, it's going to do so via
2911
05:14:17,880 --> 05:14:22,840
gradient descent and back propagation. And so these two videos are some extra curriculum
2912
05:14:22,840 --> 05:14:27,880
for what's happening behind the scenes. But we haven't actually written any code yet to trigger
2913
05:14:27,880 --> 05:14:33,000
these two. So I'll refer back to these when we actually do write code to do that. For now,
2914
05:14:33,000 --> 05:14:41,240
we've just got a model that defines some forward computation. But speaking of models, let's have
2915
05:14:41,240 --> 05:14:45,880
a look at a couple of PyTorch model building essentials. So we're not going to write too much
2916
05:14:45,880 --> 05:14:50,680
code for this video, and it's going to be relatively short. But I just want to introduce you to some
2917
05:14:50,680 --> 05:14:54,920
of the main classes that you're going to be interacting with in PyTorch. And we've seen
2918
05:14:54,920 --> 05:15:02,040
some of these already. So one of the first is torch.nn. So contains all of the building blocks
2919
05:15:02,040 --> 05:15:08,200
for computational graphs. Computational graphs is another word for neural networks.
2920
05:15:09,320 --> 05:15:15,240
Well, actually computational graphs is quite general. I'll just write here, a neural network
2921
05:15:15,960 --> 05:15:28,360
can be considered a computational graph. So then we have torch.nn.parameter. We've seen this.
2922
05:15:28,360 --> 05:15:38,680
So what parameters should our model try and learn? And then we can write here often a PyTorch
2923
05:15:38,680 --> 05:15:50,040
layer from torch.nn will set these for us. And then we've got torch.nn.module, which is
2924
05:15:50,040 --> 05:16:00,440
what we've seen here. And so torch.nn.module is the base class for all neural network modules.
2925
05:16:03,240 --> 05:16:13,640
If you subclass it, you should overwrite forward, which is what we've done here. We've created our
2926
05:16:13,640 --> 05:16:19,960
own forward method. So what else should we cover here? We're going to see these later
2927
05:16:19,960 --> 05:16:28,600
on, but I'm going to put it here, torch.optim. This is where the optimizers in PyTorch live.
2928
05:16:29,320 --> 05:16:39,160
They will help with gradient descent. So an optimizer: as we've said before, we know
2929
05:16:39,720 --> 05:16:44,760
that our model starts with random values. And it looks at training data and adjusts the random
2930
05:16:44,760 --> 05:16:51,080
values to better represent the ideal values. The optimizer contains algorithms that are going to
2931
05:16:51,640 --> 05:16:58,840
optimize these values, instead of being random, to being values that better represent our data.
2932
05:16:59,400 --> 05:17:08,680
So those algorithms live in torch.optim. And then one more for now, I'll link to extra resources.
2933
05:17:08,680 --> 05:17:13,240
And we're going to cover them as we go. That's how I like to do things, cover them as we need them.
2934
05:17:13,240 --> 05:17:19,480
So all nn.module. So this is the forward method. I'm just going to explicitly say here that all
2935
05:17:19,480 --> 05:17:30,840
nn.module subclasses require you to overwrite forward. This method defines what happens
2936
05:17:31,640 --> 05:17:39,560
in the forward computation. So in our case, if we were to pass some data to our linear regression
2937
05:17:39,560 --> 05:17:45,400
model, the forward method would take that data and perform this computation here.
2938
05:17:45,960 --> 05:17:49,640
And as your models get bigger and bigger, ours is quite straightforward here.
2939
05:17:49,640 --> 05:17:54,680
This forward computation can be as simple or as complex as you like, depending on what you'd
2940
05:17:54,680 --> 05:18:02,280
like your model to do. And so I've got a nice and fancy slide here, which basically reiterates
2941
05:18:02,280 --> 05:18:06,040
what we've just discussed. PyTorch is central neural network building modules.
2942
05:18:06,040 --> 05:18:17,320
So the modules: torch.nn, torch.nn.Module, torch.optim, torch.utils.data.Dataset. We haven't actually talked
2943
05:18:17,320 --> 05:18:22,440
about this yet. And I believe there's one more data loader. We're going to see these two later on.
2944
05:18:22,440 --> 05:18:27,400
But these are very helpful when you've got a bit more of a complicated data set. In our case,
2945
05:18:27,400 --> 05:18:32,360
we've got just 50 numbers for our data set. We've got a simple straight line. But when we need
2946
05:18:32,360 --> 05:18:38,280
to create more complex data sets, we're going to use these. So this will help us build models.
2947
05:18:39,160 --> 05:18:45,640
This will help us optimize our models parameters. And this will help us load data. And if you'd
2948
05:18:45,640 --> 05:18:50,920
like more, one of my favorite resources is the PyTorch cheat sheet. Again, we're referring
2949
05:18:50,920 --> 05:18:56,440
back to the documentation. See, all of this documentation, right? As I said, this course is
2950
05:18:56,440 --> 05:19:01,560
not a replacement for the documentation. It's just my interpretation of how one should best
2951
05:19:01,560 --> 05:19:08,760
become familiar with PyTorch. So we've got imports: the general import torch, from torch.utils.data import Dataset,
2952
05:19:08,760 --> 05:19:13,960
DataLoader. Oh, did you look at that? We've got that mentioned here: data, Dataset, DataLoader.
2953
05:19:14,520 --> 05:19:20,840
And TorchScript and JIT, neural network API, ONNX. I'll let you go through here.
2954
05:19:21,720 --> 05:19:26,360
We're covering some of the most fundamental ones here. But there's, of course, PyTorch is
2955
05:19:26,360 --> 05:19:32,680
quite a big library. So some extra curricula for this video would be to go through this for
2956
05:19:32,680 --> 05:19:36,520
five to 10 minutes and just read. You don't have to understand them all. We're going to start to
2957
05:19:36,520 --> 05:19:40,520
get more familiar with all of these. Well, not all of them, because, I mean, that would require
2958
05:19:40,520 --> 05:19:46,840
making videos for the whole documentation. But a lot of these through writing them via code.
2959
05:19:47,880 --> 05:19:54,040
So that's enough for this video. I'll link this PyTorch cheat sheet in the video here.
2960
05:19:54,040 --> 05:20:01,160
And in the next video, how about we, we haven't actually checked out what happens if we do
2961
05:20:01,160 --> 05:20:06,840
create an instance of our linear regression model. I think we should do that. I'll see you there.
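(A short sketch tying the building essentials together; the choice of SGD and the learning rate value here are illustrative assumptions, not something set in the course yet:)

    model = LinearRegressionModel()    # a torch.nn.Module subclass with a forward() method
    list(model.parameters())           # the torch.nn.Parameter values the model will try to learn
    optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)  # torch.optim helps with gradient descent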
2962
05:20:09,720 --> 05:20:16,680
Welcome back. In the last video, we covered some of the PyTorch model building essentials. And look,
2963
05:20:16,680 --> 05:20:21,560
I linked a cheat sheet here. There's a lot going on. There's a lot of text going on in the page.
2964
05:20:21,560 --> 05:20:27,560
Of course, the reference material for here is in the Learn PyTorch book. PyTorch model building
2965
05:20:27,560 --> 05:20:32,520
essentials under 0.1, which is the notebook we're working on here. But I couldn't help myself.
2966
05:20:32,520 --> 05:20:37,320
I wanted to add some color to this. So before we inspect our model, let's just add a little bit
2967
05:20:37,320 --> 05:20:43,560
of color to our text on the page. We go to whoa. Here's our workflow. This is what we're covering
2968
05:20:43,560 --> 05:20:50,280
in this video, right? Or in this module, 0.1. But to get data ready, here are some of the most
2969
05:20:50,280 --> 05:20:55,560
important PyTorch modules. Torchvision.transforms. We'll see that when we cover computer vision later
2970
05:20:55,560 --> 05:21:00,520
on. Torch.utils.data.data set. So that's if we want to create a data set that's a little bit
2971
05:21:00,520 --> 05:21:05,000
more complicated. But because our data set is so simple, we haven't used either of these
2972
05:21:05,000 --> 05:21:12,040
data set creator or data loader. And if we go to build or pick a model, well, we can use torch.nn.
2973
05:21:12,040 --> 05:21:19,240
We've seen that one. We've seen torch.nn.module. So in our case, we're building a model. But if we
2974
05:21:19,240 --> 05:21:22,840
wanted a pre-trained model, well, there's some computer vision models that have already been
2975
05:21:22,840 --> 05:21:28,920
built for us in torchvision.models. Now torchvision stands for PyTorch's computer vision
2976
05:21:28,920 --> 05:21:34,040
module. So we haven't covered that either. But this is just a spoiler for what's coming on
2977
05:21:34,040 --> 05:21:39,400
later on. Then if the optimizer, if we wanted to optimize our model's parameters to better
2978
05:21:39,400 --> 05:21:45,640
represent a data set, we can go to torch.optim. Then if we wanted to evaluate the model,
2979
05:21:45,640 --> 05:21:49,320
well, we've got torch metrics for that. We haven't seen that, but we're going to be
2980
05:21:49,320 --> 05:21:53,640
hands-on with all of these later on. Then if we wanted to improve through experimentation,
2981
05:21:53,640 --> 05:22:00,280
we've got torch.utils.tensorboard. Hmm. What's this? But again, if you want more,
2982
05:22:00,280 --> 05:22:04,360
there's some at the PyTorch cheat sheet. But now this is just adding a little bit of color
2983
05:22:04,360 --> 05:22:09,000
and a little bit of code to our PyTorch workflow. And with that being said, let's get a little bit
2984
05:22:09,000 --> 05:22:18,520
deeper into what we've built, which is our first PyTorch model. So checking the contents of our
2985
05:22:18,520 --> 05:22:29,720
PyTorch model. So now we've created a model. Let's see what's inside. You might already be able
2986
05:22:30,760 --> 05:22:36,680
to guess this by the fact of what we've created in the constructor here in the init function.
2987
05:22:36,680 --> 05:22:42,680
So what do you think we have inside our model? And how do you think we'd look in that? Now,
2988
05:22:42,680 --> 05:22:45,720
of course, these are questions you might not have the answer to because you've just, you're like,
2989
05:22:45,720 --> 05:22:49,800
Daniel, I'm just starting to learn PyTorch. I don't know these, but I'm asking you just to start
2990
05:22:49,800 --> 05:22:59,400
thinking about these different things, you know? So we can check out our model parameters or what's
2991
05:22:59,400 --> 05:23:11,080
inside our model using, wait for it, dot parameters. Oh, don't you love it when things are nice and
2992
05:23:11,080 --> 05:23:16,760
simple? Well, let's check it out. Hey, well, first things we're going to do is let's create a random
2993
05:23:16,760 --> 05:23:25,560
seed. Now, why are we creating a random seed? Well, because recall, we're creating these parameters
2994
05:23:25,560 --> 05:23:32,360
with random values. And if we were to create them without a random seed, we would get different
2995
05:23:32,360 --> 05:23:38,680
values every time. So for the sake of the educational sense, for the sake of this video,
2996
05:23:38,680 --> 05:23:44,600
we're going to create a manual seed here, torch dot manual seed. I'm going to use 42 or maybe 43,
2997
05:23:44,600 --> 05:23:52,360
I could use 43. No, 42, because I love 42. It's the answer to the universe. And we're going to create
2998
05:23:52,360 --> 05:24:01,400
an instance of the model that we created. So this is a subclass of nn.Module.
2999
05:24:02,840 --> 05:24:07,640
So let's do it. Model zero, because it's going to be the zeroth model, the first model that
3000
05:24:07,640 --> 05:24:15,080
we've ever created in this whole course, how amazing linear regression model, which is what
3001
05:24:15,080 --> 05:24:21,160
our class is called. So we can just call it like that. That's all I'm doing, just calling this class.
3002
05:24:21,160 --> 05:24:27,160
And so let's just see what happens there. And then if we go model zero, what does it give us? Oh,
3003
05:24:27,160 --> 05:24:32,440
linear regression. Okay, it doesn't give us much. But we want to find out what's going on in here.
3004
05:24:32,440 --> 05:24:45,880
So check out the parameters. So model zero dot parameters. What do we get from this? Oh, a generator.
3005
05:24:45,880 --> 05:24:53,880
Well, let's turn this into a list that'll be better to look at. There we go. Oh, how exciting is that?
3006
05:24:53,880 --> 05:25:01,880
So parameter containing. Look at the values tensor requires grad equals true parameter containing
3007
05:25:01,880 --> 05:25:11,880
wonderful. So these are our model parameters. So why are they the values that they are? Well,
3008
05:25:11,880 --> 05:25:20,040
it's because we've used torch rand n. Let's see what happens if we go, let's just create torch dot
3009
05:25:20,040 --> 05:25:26,600
rand n one, what happens? We get a value like that. And now if we run this again,
3010
05:25:28,520 --> 05:25:32,440
we get the same values. But if we run this again, so keep this in one two, three, four,
3011
05:25:32,440 --> 05:25:38,040
five, actually, that's, wow, that's pretty cool that we got a random value that was all in order,
3012
05:25:38,040 --> 05:25:44,040
four in a row. Can we do it twice in a row? Probably not. Oh, we get it the same one. Now,
3013
05:25:44,040 --> 05:25:49,560
why is that? Oh, we get a different one. Did we just get the same one twice? Oh, my gosh,
3014
05:25:49,560 --> 05:25:55,960
we got the same value twice in a row. You saw that. You saw that. That's incredible. Now,
3015
05:25:55,960 --> 05:26:01,640
the reason why we get this is because this one is different every time because there's no random
3016
05:26:01,640 --> 05:26:12,600
seed. Watch if we put the random seed here, torch dot manual seed, 42, 3, 3, 6, 7, what happens?
3017
05:26:13,640 --> 05:26:20,680
3, 3, 6, 7, what happens? 3, 3, 6, 7. Okay. And what if we commented out the random seed
3018
05:26:20,680 --> 05:26:27,080
here, initialized our model, different values, two, three, five, two, three, four, five, it must
3019
05:26:27,080 --> 05:26:34,280
like that value. Oh, my goodness. Let me know if you get that value, right? So if we keep going,
3020
05:26:34,280 --> 05:26:39,080
we get different values every single time. Why is this? Why are we getting different values
3021
05:26:39,080 --> 05:26:43,320
every single time? You might be, Daniel, you sound like a broken record, but I'm trying to
3022
05:26:43,320 --> 05:26:49,640
really drive home the fact that we initialize our models with random parameters. So this is the
3023
05:26:49,640 --> 05:26:53,560
essence of what our machine learning models and deep learning models are going to do. Start with
3024
05:26:53,560 --> 05:26:59,640
random values, weights and bias. Maybe we've only got two parameters here, but the future models
3025
05:26:59,640 --> 05:27:03,960
that we build might have thousands. And so of course, we're not going to do them all by hand.
3026
05:27:03,960 --> 05:27:08,920
We'll see how we do that later on. But for now, we start with random values. And our ideal model
3027
05:27:08,920 --> 05:27:13,640
will look at the training data and adjust these random values. But just so that we can get
3028
05:27:13,640 --> 05:27:20,520
reproducible results, I'll get rid of this cell. I've set the random seed here. So you should be
3029
05:27:20,520 --> 05:27:24,760
getting similar values to this. If you're not, because there's maybe some sort of pytorch update
3030
05:27:24,760 --> 05:27:29,160
and how the random seed is calculated, you might get slightly different values. But for now,
3031
05:27:29,160 --> 05:27:36,360
we'll use torch dot manual seed 42. And I want you to just be aware this can be a little bit confusing.
3032
05:27:37,320 --> 05:27:45,080
If you just do the list of parameters, for me, I understand it better if I list the named parameters.
3033
05:27:45,080 --> 05:27:54,040
So the way we do that is with model zero, and we call state dict on it. This is going to give us
3034
05:27:54,040 --> 05:27:59,960
our dictionary of the parameters of our model. So as you can see here, we've got weights,
3035
05:27:59,960 --> 05:28:06,680
and we've got bias, and they are random values. So where did weights and bias come from? Well,
3036
05:28:06,680 --> 05:28:12,040
of course, they came from here, weights, bias. But of course, as well up here,
3037
05:28:12,040 --> 05:28:22,280
we've got known parameters. So now our whole goal is what? Our whole goal is to build code,
3038
05:28:22,280 --> 05:28:27,960
or write code, that is going to allow our model to look at these blue dots here,
3039
05:28:28,600 --> 05:28:40,040
and adjust these weights and bias values to be as close as possible to the known weight and bias.
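(A consolidated sketch of the inspection steps from this video, assuming the class from earlier is defined:)

    torch.manual_seed(42)              # make the random starting values reproducible
    model_0 = LinearRegressionModel()  # parameters are initialized with random values
    list(model_0.parameters())         # raw parameter tensors
    model_0.state_dict()               # named parameters: an OrderedDict with 'weights' and 'bias'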
3040
05:28:40,040 --> 05:28:49,160
Now, how do we go from here and here to here and here? Well, we're going to see that in future
3041
05:28:49,160 --> 05:28:57,800
videos, but the closer we get these values to these two, the better we're going to be able to
3042
05:28:58,360 --> 05:29:05,880
predict and model our data. Now, this principle, I cannot stress enough, is the fundamental
3043
05:29:05,880 --> 05:29:10,440
entire foundation, the fundamental foundation. Well, good description, Daniel. The entire
3044
05:29:10,440 --> 05:29:16,120
foundation of deep learning, we start with some random values, and we use gradient descent and
3045
05:29:16,120 --> 05:29:22,280
back propagation, plus whatever data that we're working with to move these random values as close
3046
05:29:22,280 --> 05:29:29,640
as possible to the ideal values. And in most cases, you won't know what the ideal values are.
3047
05:29:30,280 --> 05:29:33,640
But in our simple case, we already know what the ideal values are.
3048
05:29:33,640 --> 05:29:38,920
So just keep that in mind going forward. The premise of deep learning is to start with random
3049
05:29:38,920 --> 05:29:46,040
values and make them more representative closer to the ideal values. With that being said,
3050
05:29:46,040 --> 05:29:50,920
let's try and make some predictions with our model as it is. I mean, it's got random values.
3051
05:29:50,920 --> 05:29:55,480
How do you think the predictions will go? So I think in the next video, we'll make some predictions
3052
05:29:55,480 --> 05:30:03,960
on this test data and see what they look like. I'll see you there. Welcome back. In the last
3053
05:30:03,960 --> 05:30:10,520
video, we checked out the internals of our first PyTorch model. And we found out that because we're
3054
05:30:10,520 --> 05:30:17,000
creating the parameters of our model with torch dot rand n, they begin as
3055
05:30:17,000 --> 05:30:22,360
random variables. And we also discussed the entire premise of deep learning is to start with random
3056
05:30:22,360 --> 05:30:28,200
numbers and slowly progress those towards more ideal numbers, slightly less random numbers based
3057
05:30:28,200 --> 05:30:36,280
on the data. So let's see, before we start to improve these numbers, let's see what their predictive
3058
05:30:36,280 --> 05:30:41,880
power is like right now. Now you might be able to guess how well these random numbers will be
3059
05:30:41,880 --> 05:30:47,960
able to predict on our data. You're not sure what that predicting means? Let's have a look. So making
3060
05:30:47,960 --> 05:30:56,280
predictions using torch dot inference mode, something we haven't seen. But as always, we're going to
3061
05:30:56,280 --> 05:31:07,160
discuss it while we use it. So to check our models predictive power, let's see how well
3062
05:31:07,160 --> 05:31:18,760
it predicts Y test based on X test. Because remember again, another premise of a machine
3063
05:31:18,760 --> 05:31:24,120
learning model is to take some features as input and make some predictions close to some sort of
3064
05:31:24,120 --> 05:31:39,240
labels. So when we pass data through our model, it's going to run it through the forward method.
3065
05:31:41,400 --> 05:31:47,480
So here's where it's a little bit confusing. We defined a forward method and it takes X as input.
3066
05:31:47,480 --> 05:31:52,280
Now I've done a little x, but we're going to pass in a large X as its input. But the reason why I've
3067
05:31:52,280 --> 05:31:57,480
done a little X is because oftentimes in pytorch code, you're going to find all over the internet
3068
05:31:57,480 --> 05:32:03,240
is that X is quite common, commonly used in the forward method here, like this as the input data.
3069
05:32:03,240 --> 05:32:06,200
So I've just left it there because that's what you're going to find quite often.
3070
05:32:07,080 --> 05:32:13,000
So let's test it out. We haven't discussed what inference mode does yet, but we will make predictions
3071
05:32:13,000 --> 05:32:19,480
with model. So with torch dot inference mode, let's use it. And then we will discuss what's going
3072
05:32:19,480 --> 05:32:32,920
on. Y preds equals model zero X test. So that's all we're doing. We're passing the X test data
3073
05:32:34,520 --> 05:32:41,400
through our model. Now, when we pass this X test in here, let's remind ourselves of what X test is.
3074
05:32:41,400 --> 05:32:51,320
X test: 10 values here. And our ideal model will predict the exact values of Y test.
3075
05:32:51,320 --> 05:32:58,200
So this is what our model will do if it's a perfect model. It will take these X test values as input,
3076
05:32:58,200 --> 05:33:06,360
and it will return these Y test values as output. That's an ideal model. So the predictions are the
3077
05:33:06,360 --> 05:33:11,720
exact same as the test data set. How do you think our model will go considering it's starting with
3078
05:33:11,720 --> 05:33:19,160
random values as its parameters? Well, let's find out, hey. So let's check out Y preds.
3079
05:33:20,760 --> 05:33:27,320
Oh, what's happened here? Not implemented error. Ah, this is an error I get quite often in Google
3080
05:33:27,320 --> 05:33:33,880
Colab when I'm creating a PyTorch model. Now, it usually happens. I'm glad we've stumbled upon
3081
05:33:33,880 --> 05:33:38,600
this. And I think I know the fix. But if not, we might see a little bit of troubleshooting in this
3082
05:33:38,600 --> 05:33:46,760
video is that when we create this, if you see this not implemented error, right, it's saying that
3083
05:33:46,760 --> 05:33:52,280
the forward method. Here we go. Forward not implemented. There we go. It's a little bit of a rabbit hole,
3084
05:33:52,280 --> 05:33:57,240
this not implemented error. I've come across it a fair few times and it took me a while to figure
3085
05:33:57,240 --> 05:34:04,760
out that for some reason the spacing. So in Python, you know how you have space space and that defines
3086
05:34:04,760 --> 05:34:09,480
a function space space. There's another thing there and another line there. For some reason,
3087
05:34:09,480 --> 05:34:13,240
if you look at this line in my notebook, and by the way, if you don't have these lines or if you
3088
05:34:13,240 --> 05:34:19,720
don't have these numbers, you can go into tools, settings, editor, and then you can define them here.
3089
05:34:19,720 --> 05:34:25,240
So show line numbers, show notation guides, all that sort of jazz there. You can customize what's
3090
05:34:25,240 --> 05:34:31,720
going on. But I just have these two on because I've run into this error a fair few times. And so
3091
05:34:31,720 --> 05:34:39,080
it's because this forward method is not in line with this bracket here. So we need to highlight
3092
05:34:39,080 --> 05:34:45,560
this and click shift tab, move it over. So now you see that it's in line here. And then if we run
3093
05:34:45,560 --> 05:34:51,640
this, it won't change any output there. See, that's the hidden gotcha: when we ran this before,
3094
05:34:51,640 --> 05:35:01,320
it found no error. But then when we run it down here, it works. So just keep that in mind. I'm
3095
05:35:01,320 --> 05:35:07,720
really glad we stumbled upon that because indentation errors, not implemented errors,
3096
05:35:07,720 --> 05:35:12,440
one of the most common errors you'll find in PyTorch, or in, well, when you're writing PyTorch
3097
05:35:12,440 --> 05:35:18,840
code in Google Colab. I'm not sure why, but it just happens. So these are our model's predictions
3098
05:35:18,840 --> 05:35:24,040
so far, by running the test data through our model's forward method that we defined. And so if
3099
05:35:24,040 --> 05:35:31,720
we look at Y test, are these close? Oh my gosh, they are shocking. So why don't we visualize them?
3100
05:35:33,240 --> 05:35:38,920
Plot predictions. And we're going to put in predictions equals y_preds.
3101
05:35:40,920 --> 05:35:46,840
Let's have a look. Oh my goodness. All the way over here. Remember how we discussed before
3102
05:35:46,840 --> 05:35:52,520
that an ideal model will have, what, red dots on top of the green dots because our ideal model
3103
05:35:52,520 --> 05:35:57,880
will be perfectly predicting the test data. So right now, because our model is initialized with
3104
05:35:57,880 --> 05:36:04,680
random parameters, it's basically making random predictions. So they're extremely far from where
3105
05:36:04,680 --> 05:36:09,880
our ideal predictions are. We have some training data, and our model's predictions,
3106
05:36:09,880 --> 05:36:14,600
when we first create our model will be quite bad. But we want to write some code that will
3107
05:36:14,600 --> 05:36:19,640
hopefully move these red dots closer to these green dots. We're going to see how we can do that in
3108
05:36:20,280 --> 05:36:26,760
later videos. But we did one thing up here, which we haven't discussed, which is with torch dot
3109
05:36:26,760 --> 05:36:33,800
inference mode. Now this is a context manager, which is what happens when we're making predictions.
3110
05:36:33,800 --> 05:36:38,760
So making predictions. Another word for predictions is inference; PyTorch uses inference. So I'll try
3111
05:36:38,760 --> 05:36:43,320
to use that a bit more, but I like to use predictions as well. We could also just go
3112
05:36:43,320 --> 05:36:52,120
y_preds equals model_0(X_test). And we're going to get quite a similar output.
3113
05:36:56,120 --> 05:37:02,120
Right. But I've put on inference mode because I want to start making that a habit for later on,
3114
05:37:02,120 --> 05:37:06,360
when we make predictions, put on inference mode. Now why do this? You might notice something different.
3115
05:37:07,320 --> 05:37:12,440
What's the difference here between the outputs? Y preds equals model. There's no inference mode
3116
05:37:12,440 --> 05:37:19,160
here, no context manager. Do you notice that there's a grad function here? And we don't need to go
3117
05:37:19,160 --> 05:37:24,920
into discussing what exactly this is doing here. But do you notice that this one is lacking that
3118
05:37:24,920 --> 05:37:30,680
grad function? So do you remember how behind the scenes I said that PyTorch does a few things
3119
05:37:31,880 --> 05:37:36,680
with requires grad equals true, it keeps track of the gradients of different parameters so that
3120
05:37:36,680 --> 05:37:44,360
they can be used in gradient descent and back propagation. Now what inference mode does is it
3121
05:37:44,360 --> 05:37:53,080
turns off that gradient tracking. So it essentially removes all of the, because when we're doing
3122
05:37:53,080 --> 05:37:56,920
inference, we're not doing training. So we don't need to keep track of the gradient. So we don't
3123
05:37:56,920 --> 05:38:03,800
need to keep track of how we should update our models. So inference mode disables all of the
3124
05:38:03,800 --> 05:38:09,960
useful things that are available during training. What's the benefit of this? Well, it means that
3125
05:38:09,960 --> 05:38:15,800
PyTorch behind the scenes is keeping track of less data. So in turn, it will, with our small
3126
05:38:15,800 --> 05:38:20,520
data set, it probably won't be too dramatic. But with a larger data set, it means that your
3127
05:38:20,520 --> 05:38:26,680
predictions will potentially be a lot faster because a whole bunch of numbers aren't being
3128
05:38:26,680 --> 05:38:31,800
kept track of or a whole bunch of things that you don't need during prediction mode or inference
3129
05:38:31,800 --> 05:38:37,560
mode, that's why it's called inference mode, aren't being saved to memory. If you'd like to
3130
05:38:37,560 --> 05:38:44,760
learn more about this, you can go "PyTorch inference mode Twitter". I just remember to search for Twitter
3131
05:38:44,760 --> 05:38:53,800
because they did a big tweet storm about it. Here we go. So oh, this is another thing that we can
3132
05:38:53,800 --> 05:38:57,960
cover. I'm going to copy this in here. But there's also a blog post about what's going on behind
3133
05:38:57,960 --> 05:39:03,320
the scenes. Long story short, it makes your code faster. Want to make your inference code in
3134
05:39:03,320 --> 05:39:08,360
PyTorch run faster? Here's a quick thread on doing exactly that. And that's what we're doing. So
3135
05:39:09,160 --> 05:39:14,840
I'm going to write down here. See more on inference mode here.
3136
05:39:17,400 --> 05:39:24,440
And I just want to highlight something as well is that they referenced torch no grad with the
3137
05:39:24,440 --> 05:39:29,000
torch inference mode context manager. Inference mode is fairly new in PyTorch. So you might
3138
05:39:29,000 --> 05:39:34,600
see a lot of existing PyTorch code with torch dot no grad. You can use this as well.
3139
05:39:35,320 --> 05:39:40,360
y_preds equals model_0(X_test). And this will do much of the same as what inference mode is doing.
3140
05:39:40,360 --> 05:39:45,640
But inference mode has a few things that are advantages over no grad, which are discussed in
3141
05:39:45,640 --> 05:39:51,960
this thread here. But if we do this, we get very similar output to what we got before.
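As a rough sketch of the comparison being made (reusing the model_0 and X_test names from the sketch above), both context managers return predictions without gradient tracking attached, while a plain call does not:

```python
import torch

# Plain forward pass: the output tracks gradients (it carries a grad_fn).
y_preds_plain = model_0(X_test)
print(y_preds_plain.requires_grad)      # True

# torch.no_grad(): the older way to turn gradient tracking off for inference.
with torch.no_grad():
    y_preds_no_grad = model_0(X_test)
print(y_preds_no_grad.requires_grad)    # False

# torch.inference_mode(): newer, does the same plus some extra optimizations,
# so it's the generally preferred option for inference.
with torch.inference_mode():
    y_preds_inference = model_0(X_test)
print(y_preds_inference.requires_grad)  # False
```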
3142
05:39:51,960 --> 05:39:58,360
Grad function. But as you'll read in here and in the PyTorch documentation, inference mode is
3143
05:39:58,360 --> 05:40:05,800
the favored way of doing inference for now. I just wanted to highlight this. So you can also do
3144
05:40:05,800 --> 05:40:26,440
something similar with torch dot no grad. However, inference mode is preferred. Alrighty. So I'm
3145
05:40:26,440 --> 05:40:32,920
just going to comment this out. So we just have one thing going on there. The main takeaway
3146
05:40:32,920 --> 05:40:38,680
from this video is that when we're making predictions, we use the context manager torch
3147
05:40:38,680 --> 05:40:44,040
dot inference mode. And right now, because our models variables or internal parameters are
3148
05:40:44,040 --> 05:40:51,640
randomly initialized, our models predictions are as good as random. So they're actually not too far
3149
05:40:51,640 --> 05:40:58,600
off where our values are. At least the red dots aren't like scattered all over here. But in the
3150
05:40:58,600 --> 05:41:04,840
upcoming videos, we're going to be writing some PyTorch training code to move these values
3151
05:41:04,840 --> 05:41:12,440
closer to the green dots by looking at the training data here. So with that being said,
3152
05:41:12,440 --> 05:41:20,120
I'll see you in the next video. Friends, welcome back. In the last video, we saw that our model
3153
05:41:20,120 --> 05:41:26,120
performs pretty poorly. Like, ideally, these red dots should be in line with these green dots.
3154
05:41:26,120 --> 05:41:33,080
And we know that because why? Well, it's because our model is initialized with random parameters.
3155
05:41:33,080 --> 05:41:38,120
And I just want to put a little note here. You don't necessarily have to initialize your model
3156
05:41:38,120 --> 05:41:43,480
with random parameters. You could initialize it with, say, zero. Yeah, these two values,
3157
05:41:43,480 --> 05:41:49,400
weights and bias, could be zero and you could go from there. Or you could also use the parameters
3158
05:41:49,400 --> 05:41:53,640
from another model. But we're going to see that later on. That's something called transfer learning.
3159
05:41:53,640 --> 05:42:00,680
That's just a little spoiler for what's to come. And so we've also discussed that an ideal model
3160
05:42:00,680 --> 05:42:09,720
will replicate these known parameters. So in other words, start with random unknown parameters,
3161
05:42:09,720 --> 05:42:17,080
these two values here. And then we want to write some code for our model to move towards estimating
3162
05:42:17,080 --> 05:42:22,920
the ideal parameters here. Now, I just want to be explicit here and write down some intuition
3163
05:42:22,920 --> 05:42:27,400
before we jump into the training code. But this is very exciting. We're about to get into
3164
05:42:27,400 --> 05:42:33,240
training our very first machine learning model. So let's write here: the whole idea of training
3165
05:42:34,120 --> 05:42:49,160
is for a model to move from some unknown parameters, these may be random to some known parameters.
3166
05:42:49,160 --> 05:43:02,040
Or in other words, from a poor representation of the data to a better representation
3167
05:43:02,840 --> 05:43:09,480
of the data. And so in our case, would you say that our models representation of the green dots
3168
05:43:09,480 --> 05:43:14,920
here with this red dots, is that a good representation? Or is that a poor representation?
3169
05:43:14,920 --> 05:43:21,640
I mean, I don't know about you, but I would say that to me, this is a fairly poor representation.
3170
05:43:21,640 --> 05:43:28,600
And one way to measure the difference between your model's outputs, in our case, the red dots,
3171
05:43:28,600 --> 05:43:36,040
the predictions, and the testing data, is to use a loss function. So I'm going to write
3172
05:43:36,040 --> 05:43:40,520
this down here. This is what we're moving towards. We're moving towards training, but we need a
3173
05:43:40,520 --> 05:43:49,560
way to measure how poorly our models predictions are doing. So one way to measure how poor or how
3174
05:43:49,560 --> 05:44:00,040
wrong your models predictions are, is to use a loss function. And so if we go pytorch loss
3175
05:44:00,040 --> 05:44:06,200
functions, we're going to see that pytorch has a fair few loss functions built in. But the essence
3176
05:44:06,200 --> 05:44:11,960
of all of them is quite similar. So just wait for this to load my internet's going a little bit
3177
05:44:11,960 --> 05:44:15,320
slow today, but that's okay. We're not in a rush here. We're learning something fun.
3178
05:44:16,040 --> 05:44:20,840
If I search here for loss, loss functions, here we go. So yeah, this is torch.nn. These are the
3179
05:44:20,840 --> 05:44:25,240
basic building blocks for graphs, whole bunch of good stuff in here, including loss functions.
3180
05:44:25,240 --> 05:44:29,960
Beautiful. And this is another thing to note as well, another one of those scenarios where
3181
05:44:29,960 --> 05:44:35,960
there's more words for the same thing. You might also see a loss function referred to as a criterion.
3182
05:44:36,520 --> 05:44:42,200
There's another word called cost function. So I might just write this down so you're aware of it.
3183
05:44:42,200 --> 05:44:47,160
Yeah, cost function versus loss function. And maybe some formal definitions about what all of these
3184
05:44:47,160 --> 05:44:51,480
are. Maybe they're used in different fields. But in our case, we're focused on machine learning,
3185
05:44:51,480 --> 05:45:02,280
right? So I'm just going to go note, loss function may also be called cost function or criterion in
3186
05:45:03,080 --> 05:45:13,320
different areas. For our case, we're going to refer to it as a loss function. And let's
3187
05:45:13,320 --> 05:45:17,400
just formally define a loss function here, because we're going to go through a fair few steps in
3188
05:45:17,400 --> 05:45:22,680
the upcoming videos. So this is a warning, nothing we can't handle. But I want to put some formal
3189
05:45:22,680 --> 05:45:26,280
definitions on things. We're going to see them in practice. That's what I prefer to do,
3190
05:45:26,280 --> 05:45:30,760
rather than just sit here defining stuff. This lecture has already had enough text on the page.
3191
05:45:30,760 --> 05:45:38,520
So hurry up and get into coding Daniel. A loss function is a function to measure how wrong your
3192
05:45:38,520 --> 05:45:50,840
model's predictions are compared to the ideal outputs. So lower is better. So ideally, think of a measurement:
3193
05:45:50,840 --> 05:45:55,400
how could we measure the difference between the red dots and the green dots? One of the
3194
05:45:55,400 --> 05:46:00,760
simplest ways to do so would be just measure the distance here, right? So if we go, let's just
3195
05:46:00,760 --> 05:46:09,560
estimate this is 0.35 to 0.8, thereabouts. So what's the difference there? About 0.45.
3196
05:46:09,560 --> 05:46:14,120
Then we could do the same again for all of these other dots, and then maybe take the average of that.
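As a rough worked example of that idea (the numbers below are made-up, eyeballed values in the style of the 0.35 and 0.8 just mentioned, not the actual model outputs):

```python
import torch

y_test = torch.tensor([0.80, 0.83, 0.86])   # illustrative ideal values (green dots)
y_preds = torch.tensor([0.35, 0.37, 0.40])  # illustrative (bad) predictions (red dots)

# Mean absolute error: the average distance between predictions and targets.
mae = torch.mean(torch.abs(y_preds - y_test))
print(mae)  # tensor(0.4567) -> on average each red dot is ~0.46 away from its green dot
```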
3197
05:46:15,320 --> 05:46:19,640
Now, if you've worked with loss functions before, you might have realized that I've just
3198
05:46:19,640 --> 05:46:25,320
reproduced mean absolute error. But we're going to get to that in a minute. So we need a loss
3199
05:46:25,320 --> 05:46:30,520
function. I'm going to write down another little dot point here. This is just setting up intuition.
3200
05:46:30,520 --> 05:46:37,240
Things we need to train. We need a loss function. This is PyTorch. And this is machine learning
3201
05:46:37,240 --> 05:46:42,440
in general, actually. But we're focused on PyTorch. We need an optimizer. What does the optimizer do?
3202
05:46:43,000 --> 05:46:52,520
Takes into account the loss of a model and adjusts the model's parameters. So the parameters, recall, are
3203
05:46:52,520 --> 05:47:01,880
our weight and bias values, weight and bias. We can check those by
3204
05:47:01,880 --> 05:47:10,600
going model dot parameters. But, oh, that's going to give us a generator,
3205
05:47:10,600 --> 05:47:17,640
isn't it? Wait, did we not define the model yet? What did we call our model? Oh, model zero. Excuse me.
3206
05:47:17,640 --> 05:47:22,840
I forgot where. I'm going to build a lot of models in this course. So we're giving them numbers.
3207
05:47:24,120 --> 05:47:27,800
Model zero dot parameters. Yeah, we've got a generator. So we'll turn that into a list.
3208
05:47:28,360 --> 05:47:32,600
But model zero, if we want to get them labeled, we want state dict here.
3209
05:47:35,480 --> 05:47:39,720
There we go. So our weight is this value. That's a random value we've set. And there's the bias.
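In code form, the inspection being described is roughly this (assuming the model_0 from earlier; the exact numbers depend on the random seed, so the ones in the comment are just an example):

```python
# .parameters() returns a generator, so wrap it in list() to see the values.
print(list(model_0.parameters()))

# .state_dict() gives the same values, labelled by name (weights and bias).
print(model_0.state_dict())
# e.g. OrderedDict([('weights', tensor([0.3367])), ('bias', tensor([0.1288]))])
```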
3210
05:47:39,720 --> 05:47:45,080
And now we've only got two parameters for our model. So it's quite simple. However, the principles
3211
05:47:45,080 --> 05:47:50,040
that we're learning here are going to be the same principles, taking a loss function,
3212
05:47:50,040 --> 05:47:55,320
trying to minimize it, so getting it to lower. So the ideal model will predict exactly what our
3213
05:47:55,320 --> 05:48:02,760
test data is. And an optimizer will take into account the loss and will adjust a model's parameter.
3214
05:48:02,760 --> 05:48:08,440
In our case, weights and bias. So let's finish this definition: takes into account the
3215
05:48:08,440 --> 05:48:15,880
loss of a model and adjust the model's parameters, e.g. weight and bias, in our case, to improve the
3216
05:48:15,880 --> 05:48:32,760
loss function. And specifically, for PyTorch, we need a training loop and a testing loop.
3217
05:48:32,760 --> 05:48:40,040
Now, this is what we're going to work towards building throughout the next couple of videos.
3218
05:48:40,040 --> 05:48:44,120
We're going to focus on these two first, the loss function and optimizer. There's the formal
3219
05:48:44,120 --> 05:48:47,320
definition of those. You're going to find many different definitions. That's how I'm going to
3220
05:48:47,320 --> 05:48:52,040
find them. Loss function measures how wrong your model's predictions are, lower is better,
3221
05:48:52,040 --> 05:48:57,560
optimizer takes into account the loss of your model. So how wrong it is, and starts to move
3222
05:48:57,560 --> 05:49:04,440
these two values into a way that improves where these red dots end up. But these, again, these
3223
05:49:04,440 --> 05:49:11,160
principles of a loss function and an optimizer can be for models with two parameters or models
3224
05:49:11,160 --> 05:49:17,080
with millions of parameters, can be for computer vision models, or could be for simple models like
3225
05:49:17,080 --> 05:49:22,760
ours that predict the dots on a straight line. So with that being said, let's jump into the next
3226
05:49:22,760 --> 05:49:28,120
video. We'll start to look a little deeper into a loss function for our problem, and an optimizer.
3227
05:49:28,840 --> 05:49:36,280
I'll see you there. Welcome back. We're in the exciting streak of videos coming up here. I mean,
3228
05:49:36,280 --> 05:49:40,680
the whole course is fun. Trust me. But this is really exciting because training your first machine
3229
05:49:40,680 --> 05:49:45,320
learning model seems a little bit like magic, but it's even more fun when you're writing the code
3230
05:49:45,320 --> 05:49:50,360
yourself what's going on behind the scenes. So we discussed that the whole concept of training
3231
05:49:50,360 --> 05:49:54,760
is from going unknown parameters, random parameters, such as what we've got so far
3232
05:49:54,760 --> 05:49:59,800
to parameters that better represent the data. And we spoke of the concept of a loss function.
3233
05:49:59,800 --> 05:50:04,440
We want to minimize the loss function. That is the whole idea of a training loop in PyTorch,
3234
05:50:04,440 --> 05:50:10,600
or an optimization loop in PyTorch. And an optimizer is one of those ways that can
3235
05:50:10,600 --> 05:50:18,440
nudge the parameters of our model. In our case, weights or bias towards values rather than just
3236
05:50:18,440 --> 05:50:24,680
being random values like they are now towards values that lower the loss function. And if we
3237
05:50:24,680 --> 05:50:29,000
lower the loss function, what does a loss function do? It measures how wrong our models
3238
05:50:29,000 --> 05:50:34,040
predictions are compared to the ideal outputs. So if we lower that, well, hopefully we move
3239
05:50:34,040 --> 05:50:40,680
these red dots towards the green dots. And so as you might have guessed, PyTorch has some built
3240
05:50:40,680 --> 05:50:47,400
in functionality for implementing loss functions and optimizers. And by the way, what we're covering
3241
05:50:47,400 --> 05:50:52,920
so far is in the train model section of the PyTorch workflow fundamentals, I've got a little
3242
05:50:52,920 --> 05:50:57,480
nice table here, which describes a loss function. What does it do? Where does it live in PyTorch?
3243
05:50:57,480 --> 05:51:02,120
Common values, we're going to see some of these hands on. If you'd like to read about it,
3244
05:51:02,120 --> 05:51:07,160
of course, you have the book version of the course here. So loss functions in PyTorch,
3245
05:51:07,160 --> 05:51:11,800
I'm just in the docs, torch.nn. Look at this. Look at all these loss functions. There's far too many
3246
05:51:11,800 --> 05:51:16,200
for us to go through all in one hit. So we're just going to focus on some of the most common ones.
3247
05:51:16,200 --> 05:51:22,280
Look at that. We've got about what's our 15 loss functions, something like that? Well, truth be
3248
05:51:22,280 --> 05:51:28,680
told, which one should you use? You're not really going to know unless you start to work hands
3249
05:51:28,680 --> 05:51:34,360
on with different problems. And so in our case, we're going to be looking at L1 loss. And this is
3250
05:51:34,360 --> 05:51:39,480
an again, once more another instance where different machine learning libraries have different names
3251
05:51:39,480 --> 05:51:46,200
for the same thing, this is mean absolute error, which we kind of discussed in the last video,
3252
05:51:46,200 --> 05:51:51,800
which is if we took the distance from this red dot to this green dot and say at 0.4, they're about
3253
05:51:51,800 --> 05:51:58,200
0.4, 0.4, and then took the mean, well, we've got the mean absolute error. But in PyTorch,
3254
05:51:58,200 --> 05:52:03,480
they call it L1 loss, which is a little bit confusing because then we go to MSE loss,
3255
05:52:03,480 --> 05:52:09,880
which is mean squared error, which is L2. So naming conventions just takes a little bit of getting
3256
05:52:09,880 --> 05:52:16,040
used to this is a warning for you. So let's have a look at the L1 loss function. Again,
3257
05:52:16,040 --> 05:52:19,960
I'm just making you aware of where the other loss functions are. We'll do with some binary
3258
05:52:19,960 --> 05:52:25,160
cross entropy loss later in the course. And maybe even is that categorical cross entropy?
3259
05:52:26,120 --> 05:52:31,640
We'll see that later on. But all the others will be problem specific. For now, a couple of loss
3260
05:52:31,640 --> 05:52:37,240
functions like this, L1 loss, MSE loss, we use for regression problems. So that's predicting a number.
3261
05:52:38,040 --> 05:52:43,800
Cross entropy loss is a loss that you use with classification problems. But we'll see those hands
3262
05:52:43,800 --> 05:52:49,960
on later on. Let's have a look at L1 loss. So L1 loss creates a criterion. As I said, you might
3263
05:52:49,960 --> 05:52:55,000
hear the word criterion used in PyTorch for a loss function. I typically call them loss functions.
3264
05:52:55,000 --> 05:52:59,480
The literature typically calls it loss functions. That measures the mean absolute error. There we
3265
05:52:59,480 --> 05:53:07,000
go. L1 loss is the mean absolute error between each element in the input X and target Y. Now,
3266
05:53:07,000 --> 05:53:11,240
your extracurricular, as you might have guessed, is to read through the documentation for the
3267
05:53:11,240 --> 05:53:16,440
different loss functions, especially L1 loss. But for the sake of this video, let's just implement
3268
05:53:16,440 --> 05:53:22,840
it for ourselves. Oh, and if you want a little bit of a graphic, I've got one here. This is where
3269
05:53:22,840 --> 05:53:28,680
we're up to, by the way, picking a loss function optimizer for step two. This is a fun part, right?
3270
05:53:28,680 --> 05:53:33,560
We're getting into training a model. So we've got mean absolute error. Here's that graph we've
3271
05:53:33,560 --> 05:53:38,440
seen before. Oh, look at this. Okay. So we've got the difference here. I've actually measured
3272
05:53:38,440 --> 05:53:44,200
this before in the past. So I kind of knew what it was. Mean absolute error is if we repeat for
3273
05:53:44,200 --> 05:53:50,520
all samples in our set that we're working with. And if we take the absolute difference between
3274
05:53:50,520 --> 05:53:56,920
these two dots, well, then we take the mean, we've got mean absolute error. So MAE loss equals
3275
05:53:56,920 --> 05:54:01,000
torch dot mean... we could write it out. That's the beauty of PyTorch, right? We could write this out.
3276
05:54:01,000 --> 05:54:08,520
Or we could use the torch.nn version, which is recommended. So let's jump in. There's a colorful
3277
05:54:08,520 --> 05:54:14,760
slide describing what we're about to do. So let's go set up a loss function. And then we're also
3278
05:54:14,760 --> 05:54:27,960
going to put in here, set up an optimizer. So let's call it loss_fn equals nn.L1Loss.
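A sketch of the cell being written (loss_fn is the name used in the narration; the tensors in the sanity check below are made up, just to show that nn.L1Loss behaves like the manual MAE from before):

```python
import torch
from torch import nn

# Set up a loss function: L1Loss in PyTorch is mean absolute error (MAE).
loss_fn = nn.L1Loss()

# Quick sanity check on made-up tensors: L1Loss matches a hand-rolled MAE.
preds = torch.tensor([0.35, 0.37, 0.40])
targets = torch.tensor([0.80, 0.83, 0.86])
print(loss_fn(preds, targets))                 # tensor(0.4567)
print(torch.mean(torch.abs(preds - targets)))  # same value
```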
3279
05:54:29,400 --> 05:54:32,520
Simple as that. And then if we have a look at what's our loss function, what does this say?
3280
05:54:34,280 --> 05:54:37,000
Oh my goodness. My internet is going quite slow today.
3281
05:54:38,600 --> 05:54:42,360
It's raining outside. So there might be some delays somewhere. But that's right. Gives us a
3282
05:54:42,360 --> 05:54:48,360
chance to sit here and be mindful about what we're doing. Look at that. Okay. Loss function.
3283
05:54:48,360 --> 05:54:53,560
L1 loss. Beautiful. So we've got a loss function. Our objective for training a machine learning
3284
05:54:53,560 --> 05:54:58,600
model will be to, let's go back and look at the colorful graphic, to minimize these
3285
05:54:58,600 --> 05:55:05,720
distances here. And in turn, minimize the overall value of MAE. That is our goal.
3286
05:55:05,720 --> 05:55:12,200
If our red dots line up with our green dots, we will have a loss value of zero, the ideal point
3287
05:55:12,200 --> 05:55:18,920
for a model to be. And so let's go here. We now need an optimizer. As we discussed before,
3288
05:55:18,920 --> 05:55:23,960
the optimizer takes into account the loss of a model. So these two work in tandem.
3289
05:55:23,960 --> 05:55:27,400
That's why I've put them as similar steps if we go back a few slides.
3290
05:55:28,760 --> 05:55:34,680
So this is why I put these as 2.1. Often picking a loss function and optimizer in PyTorch
3291
05:55:34,680 --> 05:55:40,200
come as part of the same package because they work together. The optimizer's objective is to
3292
05:55:40,200 --> 05:55:45,880
give the model values. So parameters like a weight and a bias that minimize the loss function.
3293
05:55:45,880 --> 05:55:53,000
They work in tandem. And so let's see what an optimizer optimizes. Where might that be?
3294
05:55:53,000 --> 05:55:59,480
What if we search here? I typically don't use this search because I prefer just using Google
3295
05:55:59,480 --> 05:56:08,520
search. But does this give us optimizer? Hey, there we go. So again, pytorch has torch.optim
3296
05:56:09,640 --> 05:56:16,280
which is where the optimizers are. Torch.optim. Let me put this link in here.
3297
05:56:17,800 --> 05:56:21,800
This is another bit of your extracurricular. If you want to read more about different optimizers
3298
05:56:21,800 --> 05:56:26,920
in pytorch, as you might have guessed, they have a few. Torch.optim is a package implementing
3299
05:56:26,920 --> 05:56:32,520
various optimization algorithms. Most commonly used methods are already supported and the interface
3300
05:56:32,520 --> 05:56:38,120
is general enough so that more sophisticated ones can also be easily integrated in the future.
3301
05:56:38,120 --> 05:56:42,840
So if we have a look at what algorithms exist here, again, we're going to throw a lot of names
3302
05:56:42,840 --> 05:56:50,600
at you. But in the literature, a lot of them that have made it into here are already good working
3303
05:56:50,600 --> 05:56:55,880
algorithms. So it's a matter of picking whichever one's best for your problem. How do you find that
3304
05:56:55,880 --> 05:57:04,120
out? Well, SGD, stochastic gradient descent, is possibly the most popular. However, there are
3305
05:57:04,120 --> 05:57:11,160
some iterations on SGD, such as Adam, which is another one that's really popular. So again,
3306
05:57:11,160 --> 05:57:16,200
this is another one of those "machine learning is part art, part science" things; it's trial and error of
3307
05:57:16,200 --> 05:57:20,280
figuring out what works best for your problem for us. We're going to start with SGD because
3308
05:57:20,280 --> 05:57:25,400
it's the most popular. And if you were paying attention to a previous video, you might have
3309
05:57:25,400 --> 05:57:31,800
seen that I said, look up gradient descent. Where have we got this? Gradient descent. There we go.
3310
05:57:32,360 --> 05:57:38,360
So this is one of the main algorithms that improves our models. So gradient descent and back
3311
05:57:38,360 --> 05:57:43,720
propagation. So if we have a look at this stochastic gradient descent, bit of a tongue twister,
3312
05:57:43,720 --> 05:57:49,400
is random gradient descent. So that's what stochastic means. So basically, our model
3313
05:57:49,400 --> 05:57:58,360
improves by taking random numbers, let's go down here, here, and randomly adjusting them
3314
05:57:58,360 --> 05:58:04,440
so that they minimize the loss. And on to our optimizer, that's right here, our optimizer:
3315
05:58:04,440 --> 05:58:11,560
torch.optim. Let's implement SGD, stochastic gradient descent. We're going to write this here,
3316
05:58:11,560 --> 05:58:20,840
stochastic gradient descent. It starts by randomly adjusting these values. And once it's found
3317
05:58:20,840 --> 05:58:26,280
some random values or random steps that have minimized the loss value, we're going to see
3318
05:58:26,280 --> 05:58:32,600
this in action later on, it's going to continue adjusting them in that direction. So say it says,
3319
05:58:32,600 --> 05:58:37,880
oh, weights, if I increase the weights, it reduces the loss. So it's going to keep increasing the
3320
05:58:37,880 --> 05:58:44,760
weights until the weights no longer reduce the loss. Maybe it gets to a point at say 0.65.
3321
05:58:44,760 --> 05:58:48,760
If you increase the weights anymore, the loss is going to go up. So the optimizer is like,
3322
05:58:48,760 --> 05:58:53,320
well, I'm going to stop there. And then for the bias, the same thing happens. If it decreases the
3323
05:58:53,320 --> 05:58:57,640
bias and finds that the loss increases, well, it's going to go, well, I'm going to try increasing
3324
05:58:57,640 --> 05:59:04,600
the bias instead. So again, one last summary of what's going on here, a loss function measures
3325
05:59:04,600 --> 05:59:09,720
how wrong our model is. And the optimizer adjusts our model's parameters, no matter whether there's
3326
05:59:09,720 --> 05:59:15,320
two parameters or millions of them to reduce the loss. There are a couple of things that
3327
05:59:15,320 --> 05:59:23,160
an optimizer needs to take in. It needs to take in as an argument, params. So this is if we go to
3328
05:59:23,160 --> 05:59:30,600
SGD, I'm just going to link this as well. SGD, there's the formula of what SGD does. I look at this
3329
05:59:30,600 --> 05:59:35,240
and I go, hmm, there's a lot going on here. And it took me a while to understand that. So I like to
3330
05:59:35,240 --> 05:59:43,640
see it in code. So we need params. This is short for what parameters should I optimize as an optimizer.
3331
05:59:43,640 --> 05:59:49,880
And then we also need an LR, which stands for, I'm going to write this in a comment, LR equals
3332
05:59:49,880 --> 05:59:55,320
learning rate, possibly the most, oh, I didn't even type rate, did I possibly the most important
3333
05:59:55,320 --> 06:00:02,360
hyper parameter you can set? So let me just remind you, I'm throwing lots of words out here, but I'm
3334
06:00:02,360 --> 06:00:07,240
kind of like trying to write notes about what we're doing. Again, we're going to see these in action
3335
06:00:07,240 --> 06:00:21,160
in a second. So check out our model's parameters. So a parameter is a value that the model sets
3336
06:00:21,160 --> 06:00:32,920
itself. So learning rate equals possibly the most important learning hyper parameter. I don't
3337
06:00:32,920 --> 06:00:39,160
need learning there, do I? Hyper parameter. And a hyper parameter is a value that us as a data scientist
3338
06:00:39,160 --> 06:00:46,840
or a machine learning engineer set ourselves, you can set. So the learning rate is, in our case,
3339
06:00:46,840 --> 06:00:52,440
let's go 0.01. You're like, Daniel, where did I get this value from? Well, again, these type of
3340
06:00:52,440 --> 06:01:00,360
values come with experience. I think it actually says it in here, LR, LR 0.1. Yeah, okay, so the
3341
06:01:00,360 --> 06:01:07,080
default is 0.1. But then if we go back to Optim, I think I saw it somewhere. Did I see it somewhere?
3342
06:01:07,080 --> 06:01:16,280
0.0? Yeah, there we go. Yeah, so a lot of the default settings are pretty good in torch optimizers.
3343
06:01:16,280 --> 06:01:22,520
However, the learning rate, what does it actually do? We could go 0.01. These are all common values
3344
06:01:22,520 --> 06:01:30,680
here: 0.001. I'm not sure exactly why. Oh, model, it should be model zero. The learning rate says
3345
06:01:30,680 --> 06:01:36,600
to our optimizer, yes, it's going to optimize our parameters here. But the higher the learning
3346
06:01:36,600 --> 06:01:43,400
rate, the more it adjusts each of these parameters in one hit. So let's say it's 0.01. And it's going
3347
06:01:43,400 --> 06:01:49,560
to optimize this value here. So it's going to take that big of a step. If we changed it to here,
3348
06:01:49,560 --> 06:01:56,280
it's going to take a big step on this three. And if we changed it to all the way to the end 0.01,
3349
06:01:56,280 --> 06:02:01,640
it's only going to change this value. So the smaller the learning rate, the smaller the change
3350
06:02:01,640 --> 06:02:06,200
in the parameter, the larger the learning rate, the larger the change in the parameter.
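Putting the last few cells together, the optimizer setup being described looks roughly like this (a sketch assuming the model_0 from earlier; lr=0.01 is the value picked in the narration):

```python
import torch

# Set up an optimizer: stochastic gradient descent (SGD).
# params -> which parameters the optimizer should update (our weight and bias)
# lr     -> learning rate, a hyperparameter we set ourselves; a bigger lr makes
#           bigger changes to each parameter per step, a smaller lr makes smaller ones
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.01)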
3351
06:02:06,200 --> 06:02:13,320
So we've set up a loss function. We've set up an optimizer. Let's now move on to the next step
3352
06:02:13,320 --> 06:02:20,840
in our training workflow. And that's by building a training loop. Far out. This is exciting. I'll
3353
06:02:20,840 --> 06:02:29,400
see you in the next video. Welcome back. In the last video, we set up a loss function. And we set
3354
06:02:29,400 --> 06:02:35,240
up an optimizer. And we discussed the roles of each. So loss function measures how wrong our model
3355
06:02:35,240 --> 06:02:41,640
is. The optimizer talks to the loss function and goes, well, if I change these parameters a certain
3356
06:02:41,640 --> 06:02:47,480
way, does that reduce the loss function at all? And if it does, yes, let's keep adjusting them in
3357
06:02:47,480 --> 06:02:53,800
that direction. If it doesn't, let's adjust them in the opposite direction. And I just want to show
3358
06:02:53,800 --> 06:02:58,920
you I added a little bit of text here just to concretely put down what we were discussing.
3359
06:02:58,920 --> 06:03:05,320
Inside the optimizer, you'll often have to set two parameters, params and lr, where params is
3360
06:03:05,320 --> 06:03:10,840
the model parameters you'd like to optimize for an example, in our case, params equals our model
3361
06:03:10,840 --> 06:03:16,760
zero parameters, which were, of course, a weight and a bias. And the learning rate, which is lr
3362
06:03:16,760 --> 06:03:22,440
in optimizer, lr stands for learning rate. And the learning rate is a hyper parameter. Remember,
3363
06:03:22,440 --> 06:03:27,560
a hyper parameter is a value that we the data scientist or machine learning engineer sets,
3364
06:03:27,560 --> 06:03:35,000
whereas a parameter is what the model sets itself. The learning rate defines how big or small our optimizer changes
3365
06:03:35,000 --> 06:03:41,080
the model parameters. So a small learning rate, so the smaller this value results in small
3366
06:03:41,080 --> 06:03:47,240
changes, a large learning rate results in large changes. So another question might be,
3367
06:03:47,880 --> 06:03:53,320
well, very valid question. Hey, I put this here already, is which loss function and which optimizer
3368
06:03:53,320 --> 06:03:58,760
should I use? So this is another tough one, because it's problem specific. But with experience
3369
06:03:58,760 --> 06:04:02,920
and machine learning, I'm showing you one example here, you'll get an idea of what works for your
3370
06:04:02,920 --> 06:04:08,040
particular problem. For a regression problem like ours, a loss function of L1 loss, which is MAE
3371
06:04:08,040 --> 06:04:14,200
in PyTorch, and an optimizer like torch.optim.SGD, stochastic gradient descent,
3372
06:04:14,200 --> 06:04:18,600
will suffice. But for a classification problem, we're going to see this later on.
3373
06:04:18,600 --> 06:04:23,240
Not this one specifically, whether a photo is a cat or a dog, that's just an example of a binary
3374
06:04:23,240 --> 06:04:28,840
classification problem, you might want to use a binary classification loss. But with that being
3375
06:04:28,840 --> 06:04:35,640
said, we are now moving on to, well, here's our whole goal: to reduce the MAE of our model.
3376
06:04:35,640 --> 06:04:40,840
Let's get the workflow. We've done these two steps. Now we want to build a training loop. So
3377
06:04:40,840 --> 06:04:44,760
let's get back into here. There's going to be a fair few steps going on. We've already covered
3378
06:04:44,760 --> 06:04:54,520
a few, but hey, nothing we can't handle together. So building a training loop in pytorch.
3379
06:04:56,040 --> 06:05:01,320
So I thought about just talking about what's going on in the training loop, but we can talk
3380
06:05:01,320 --> 06:05:06,600
about the steps after we've coded them. How about we do that? So we want to build a training loop
3381
06:05:06,600 --> 06:05:16,200
and a testing loop. How about we do that? So a couple of things we need in a training loop.
3382
06:05:16,840 --> 06:05:20,440
So there's going to be a fair few steps here if you've never written a training loop before,
3383
06:05:20,440 --> 06:05:25,240
but that is completely fine because you'll find that the first couple of times that you write this,
3384
06:05:25,240 --> 06:05:28,920
you'll be like, oh my gosh, there's too much going on here. But then when you have practice,
3385
06:05:28,920 --> 06:05:33,480
you'll go, okay, I see what's going on here. And then eventually you'll write them with your
3386
06:05:33,480 --> 06:05:38,280
eyes closed. I've got a fun song for you to help you out remembering things. It's called the
3387
06:05:38,280 --> 06:05:43,400
unofficial pytorch optimization loop song. We'll see that later on, or actually, I'll probably leave
3388
06:05:43,400 --> 06:05:48,600
that as an extension, but you'll see that you can also functionize these things, which we will do
3389
06:05:48,600 --> 06:05:53,160
later in the course so that you can just write them once and then forget about them. But we're
3390
06:05:53,160 --> 06:05:57,640
going to write it all from scratch to begin with so we know what's happening. So we want to,
3391
06:05:57,640 --> 06:06:06,280
or actually step zero, is loop through the data. So we want to look at the data multiple times
3392
06:06:06,280 --> 06:06:11,240
because our model is going to, at first, start with random predictions on the data, make some
3393
06:06:11,240 --> 06:06:15,400
predictions. We're trying to improve those. We're trying to minimize the loss to make those
3394
06:06:15,400 --> 06:06:23,400
predictions. We do a forward pass. So forward pass. Why is it called a forward pass? So this
3395
06:06:23,400 --> 06:06:34,360
involves data moving through our model's forward functions. Now that I say functions because there
3396
06:06:34,360 --> 06:06:39,240
might be plural, there might be more than one. And the forward method recall, we wrote in our model
3397
06:06:39,240 --> 06:06:46,840
up here: forward. A forward pass is our data going through this function here. And if you want to
3398
06:06:46,840 --> 06:06:54,600
look at it visually, let's look up a neural network graphic. Images, a forward pass is just
3399
06:06:55,400 --> 06:07:01,640
data moving from the inputs to the output layer. So starting here input layer moving through the
3400
06:07:01,640 --> 06:07:07,560
model. So that's a forward pass, also called forward propagation. Another time we'll have
3401
06:07:07,560 --> 06:07:14,040
more than one name is used for the same thing. So we'll go back down here, forward pass. And
3402
06:07:14,040 --> 06:07:22,760
I'll just write here also called forward propagation, propagation. Wonderful. And then we need to
3403
06:07:22,760 --> 06:07:33,480
calculate the loss. So forward pass. Let me write this. To calculate or to make predictions,
3404
06:07:33,480 --> 06:07:46,200
make predictions on data. So calculate the loss, compare forward pass predictions. Oh, there's
3405
06:07:46,200 --> 06:07:50,760
thunder going on in the background here at my place. We might be in for a storm. Perfect time to write
3406
06:07:50,760 --> 06:07:55,960
code, compare forward pass predictions to ground truth labels. We're going to see all this in code
3407
06:07:56,520 --> 06:08:01,160
in a second, calculate the loss. And then we're going to go optimizer zero grad. We haven't
3408
06:08:01,160 --> 06:08:04,760
spoken about what this is, but that's okay. We're going to see that in a second. I'm not going to
3409
06:08:04,760 --> 06:08:09,960
put too much there. Loss backward. We haven't discussed this one either. There's probably three
3410
06:08:09,960 --> 06:08:15,080
steps that we haven't really discussed. We've discussed the idea behind them, but not too much
3411
06:08:15,080 --> 06:08:23,800
in depth. Optimizer step. So this one, loss backward, is: move backwards. If the forward pass
3412
06:08:23,800 --> 06:08:29,880
is forwards, like through the network, the forward pass is data goes into out. The backward pass
3413
06:08:29,880 --> 06:08:35,960
data goes, our calculations happen backwards. So we'll see what that is in a second. Where were
3414
06:08:35,960 --> 06:08:40,120
we over here? We've got too much going on. I'm getting rid of these moves backwards through
3415
06:08:41,480 --> 06:08:51,880
the network to calculate the gradients. Oh, oh, the gradients of each of the parameters
3416
06:08:53,320 --> 06:08:58,840
of our model with respect to the loss. Oh my gosh, that is an absolute mouthful,
3417
06:08:58,840 --> 06:09:06,600
but that'll do for now. Optimizer step. This is going to use the optimizer to adjust our
3418
06:09:06,600 --> 06:09:16,200
model's parameters to try and improve the loss. So remember how I said in a previous video
3419
06:09:16,200 --> 06:09:21,720
that I'd love you to watch the two videos I linked above. One on gradient descent and one
3420
06:09:21,720 --> 06:09:25,960
on back propagation. If you did, you might have seen like there's a fair bit of math going on in
3421
06:09:25,960 --> 06:09:33,000
there. Well, that's essentially how our model goes from random parameters to better parameters,
3422
06:09:33,000 --> 06:09:37,720
using math. One of the main things I get asked about machine learning is, how do I
3423
06:09:37,720 --> 06:09:42,600
learn machine learning if I didn't do math? Well, the beautiful thing about PyTorch is that it
3424
06:09:42,600 --> 06:09:47,880
implements a lot of the math of back propagation. So this is back propagation. I'm going to write
3425
06:09:47,880 --> 06:09:53,160
this down here. This is an algorithm called back, back propagation, hence the loss backward. We're
3426
06:09:53,160 --> 06:10:00,280
going to see this in code in a second, don't you worry? And this is gradient descent. So these
3427
06:10:00,280 --> 06:10:06,760
two algorithms drive the majority of our learning. So back propagation, calculate the gradients of
3428
06:10:06,760 --> 06:10:11,400
the parameters of our model with respect to the loss function. And optimizer step
3429
06:10:11,400 --> 06:10:16,600
will trigger code to run gradient descent, which is to minimize the gradients. Because, what is a
3430
06:10:16,600 --> 06:10:24,360
gradient? Let's look this up. What is a gradient? I know we haven't written a code yet, but we're
3431
06:10:24,360 --> 06:10:32,280
going to do that. Images. Gradient, there we go. Changing y, changing x. Gradient is from high
3432
06:10:32,280 --> 06:10:38,360
school math. Gradient is a slope. So if you were on a hill, let's find a picture of a hill.
3433
06:10:38,360 --> 06:10:50,440
Picture of a hill. There we go. This is a great big hill. So if you were on the top of this hill,
3434
06:10:50,440 --> 06:10:56,520
and you wanted to get to the bottom, how would you get to the bottom? Well, of course, you just
3435
06:10:56,520 --> 06:11:00,920
walked down the hill. But if you're a machine learning model, what are you trying to do? Let's
3436
06:11:00,920 --> 06:11:05,560
imagine your loss is the height of this hill. You start off with your losses really high, and you
3437
06:11:05,560 --> 06:11:10,200
want to take your loss down to zero, which is the bottom, right? Well, if you measure the gradient
3438
06:11:10,200 --> 06:11:17,800
of the hill, the bottom of the hill is in the opposite direction to where the gradient is steep.
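To make the hill picture concrete, here's a tiny hand-rolled gradient descent step on a single toy parameter (a sketch of the idea, not the PyTorch optimizer itself; the loss function and numbers are made up):

```python
import torch

# Toy example: loss = (w - 3)^2 has its minimum (the bottom of the hill) at w = 3.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1

for step in range(3):
    loss = (w - 3) ** 2
    loss.backward()            # compute the gradient of the loss with respect to w
    with torch.no_grad():
        w -= lr * w.grad       # step downhill: move opposite to the gradient
        w.grad.zero_()         # reset the gradient for the next step
    print(step, w.item(), loss.item())
# w moves from 0.0 towards 3.0 and the loss shrinks towards 0.
```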
3439
06:11:18,360 --> 06:11:23,560
Does that make sense? So the gradient here is an incline. We want our model to move towards the
3440
06:11:23,560 --> 06:11:27,800
gradient being nothing, which is down here. And you could argue, yeah, the gradient's probably
3441
06:11:27,800 --> 06:11:31,160
nothing up the top here, but let's just for argument's sake say that we want to get to the
3442
06:11:31,160 --> 06:11:35,480
bottom of the hill. So we're measuring the gradient, and one of the ways an optimisation algorithm
3443
06:11:35,480 --> 06:11:42,520
works is it moves our model parameters so that the gradient equals zero, and then if the gradient
3444
06:11:43,080 --> 06:11:48,520
of the loss equals zero, well, the loss equals zero too. So now let's write some code. So we're
3445
06:11:48,520 --> 06:11:53,720
going to set up a parameter called or a variable called epochs. And we're going to start with one,
3446
06:11:53,720 --> 06:11:59,160
even though this could be any value, let me define these as we go. So we're going to write code to
3447
06:11:59,160 --> 06:12:10,440
do all of this. So epochs, an epoch is one loop through the data dot dot dot. So epochs, we're
3448
06:12:10,440 --> 06:12:15,320
going to start with one. So one time through all of the data, we don't have much data. And so
3449
06:12:15,880 --> 06:12:25,720
for epoch, let's go this, this is step zero, zero, loop through the data. By the way, when I say
3450
06:12:25,720 --> 06:12:32,520
loop through the data, I want you to do all of these steps within the loop. And do dot dot dot
3451
06:12:33,480 --> 06:12:39,080
loop through the data. So for epoch in range epochs, even though it's only going to be one,
3452
06:12:39,080 --> 06:12:44,120
we can adjust this later. And because epochs, we've set this ourselves, it is a,
3453
06:12:45,880 --> 06:12:55,080
this is a hyper parameter, because we've set it ourselves. And I know you could argue that,
3454
06:12:55,080 --> 06:13:01,640
hey, our machine learning parameters of model zero, or our model parameters, model zero aren't
3455
06:13:01,640 --> 06:13:06,440
actually parameters, because we've set them. But in the models that you build in the future,
3456
06:13:06,440 --> 06:13:12,600
they will likely be set automatically rather than you setting them explicitly like we've done when
3457
06:13:12,600 --> 06:13:17,080
we created model zero. And oh my gosh, this is taking quite a while to run. That's all right.
3458
06:13:17,080 --> 06:13:20,920
We don't need it to run fast. We just need to write some more code. Come on then.
3459
06:13:20,920 --> 06:13:27,480
There's a step here I haven't discussed either. Set the model to training mode. So pytorch models
3460
06:13:27,480 --> 06:13:32,520
have a couple of different modes. The default is training mode. So we can set it to training
3461
06:13:32,520 --> 06:13:40,200
mode by going like this. Train. So what does train mode do in a pytorch model? My goodness.
3462
06:13:40,200 --> 06:13:44,600
Is there a reason my internet is going this slow? That's all right. I'm just going to
3463
06:13:44,600 --> 06:13:56,920
discuss this by just talking, regardless. Train mode. Train mode in PyTorch sets... oh, there we go.
3464
06:13:57,560 --> 06:14:06,680
Requires grad equals true. Now I wonder, if we do with torch dot no grad, remember no grad is similar
3465
06:14:06,680 --> 06:14:14,360
to inference mode. Will this adjust? See, I just wanted to take note of requires grad equals
3466
06:14:14,360 --> 06:14:19,560
true. Actually, what I might do is we do this in a different cell. Watch this. This is just going
3467
06:14:19,560 --> 06:14:24,600
to be rather than me just spit words at you. I reckon we might be able to get it work in doing
3468
06:14:24,600 --> 06:14:31,560
this. Oh, that didn't list the model parameters. Why did that not come out? Model zero dot eval.
3469
06:14:33,320 --> 06:14:40,120
So there's two modes: eval mode and train mode. Model dot eval dot parameters? Hey, we're experimenting
3470
06:14:40,120 --> 06:14:44,680
together on the fly here. And actually, this is what I want you to do is I want you to experiment
3471
06:14:44,680 --> 06:14:53,000
with different things. It's not going to say requires grad equals false. Hmm. With torch dot no
3472
06:14:53,000 --> 06:15:03,000
grad. Model zero dot parameters. I don't know if this will work, but it definitely works behind
3473
06:15:03,000 --> 06:15:07,240
the scenes. And what I mean by works behind the scenes, not here, is that it works behind the scenes
3474
06:15:07,240 --> 06:15:11,400
when calculations have been made, but not if we're trying to explicitly print things out.
3475
06:15:12,760 --> 06:15:16,680
Well, that's an experiment that I thought was going to work and it didn't work. So train
3476
06:15:16,680 --> 06:15:26,120
mode in PyTorch sets all parameters that require gradients to require gradients.
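A small sketch of the experiment being attempted (assuming the model_0 and X_test names from earlier). One note on why printing the parameters didn't show anything different: train() and eval() toggle the module's training flag (which matters for layers like dropout and batch norm), they don't flip requires_grad on the parameters; turning gradient tracking off for predictions is done with a context manager instead.

```python
import torch

model_0.train()
print(model_0.training)                                  # True
print([p.requires_grad for p in model_0.parameters()])   # [True, True]

model_0.eval()
print(model_0.training)                                  # False
print([p.requires_grad for p in model_0.parameters()])   # still [True, True]

# Gradient tracking is switched off for inference with a context manager:
with torch.inference_mode():
    y_preds = model_0(X_test)
```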
3477
06:15:27,240 --> 06:15:32,120
So do you remember with the picture of the hill? I spoke about how we're trying to minimize the
3478
06:15:32,120 --> 06:15:37,480
gradient. So the gradient is the steepness of the hill. If the height of the hill is a loss function
3479
06:15:37,480 --> 06:15:42,760
and we want to take that down to zero, we want to take the gradient down to zero. So same thing
3480
06:15:42,760 --> 06:15:49,880
with the gradients of our model parameters, which are here with respect to the loss function,
3481
06:15:49,880 --> 06:15:54,840
we want to try and minimize that gradient. So that's gradient descent is take that gradient down to
3482
06:15:54,840 --> 06:16:05,160
zero. So, model dot train. And then there's also model zero dot eval, which turns off gradient
3483
06:16:05,160 --> 06:16:12,840
tracking. So we're going to see that later on. But for now, I feel like this video is getting far
3484
06:16:12,840 --> 06:16:17,560
too long. Let's finish the training loop in the next video. I'll see you there.
3485
06:16:19,960 --> 06:16:24,520
Friends, welcome back. In the last video, I promised a lot of code, but we didn't get there. We
3486
06:16:24,520 --> 06:16:29,320
discussed some important steps. I forgot how much behind the scenes there is to a PyTorch training
3487
06:16:29,320 --> 06:16:34,280
loop. And I think it's important to spend the time that we did discussing what's going on,
3488
06:16:34,280 --> 06:16:38,920
because there's a fair few steps. But once you know what's going on, I mean, later on, we don't
3489
06:16:38,920 --> 06:16:42,920
have to write all the code that we're going to write in this video, you can functionize it. We're
3490
06:16:42,920 --> 06:16:47,320
going to see that later on in the course, and it's going to run behind the scenes for us. But we're
3491
06:16:47,320 --> 06:16:52,440
spending a fair bit of time here, because this is literally the crux of how our model learns. So
3492
06:16:52,440 --> 06:16:58,920
let's get into it. So now we're going to implement the forward pass, which involves our model's
3493
06:16:58,920 --> 06:17:05,320
forward function, which we defined up here. When we built our model, the forward pass runs through
3494
06:17:05,320 --> 06:17:12,120
this code here. So let's just write that. So in our case, because we're training, I'm just
3495
06:17:12,120 --> 06:17:19,800
going to write here: this is training. We're going to see dot eval later on. We'll talk
3496
06:17:19,800 --> 06:17:25,240
about that when it comes. Let's do the forward pass. So the forward pass, we want to pass data
3497
06:17:25,240 --> 06:17:31,480
through our model's forward method. We can do this quite simply by going y pred. So y predictions,
3498
06:17:31,480 --> 06:17:37,880
because remember, our ideal model uses X test to predict y test
3499
06:17:38,440 --> 06:17:44,520
on our test data set. We make predictions on our test data set. We learn on our training data set.
3500
06:17:44,520 --> 06:17:49,080
So we're passing... actually, I'm going to get rid of that because we don't need that. So we're
3501
06:17:49,080 --> 06:17:56,840
passing X train to our model, and model zero is our current model. There we go. So we learn
3502
06:17:56,840 --> 06:18:03,000
patterns on the training data to evaluate our model on the test data. Number two, where we are.
3503
06:18:05,160 --> 06:18:10,200
So we have to calculate the loss. Now, in a previous video, we set up a loss function.
3504
06:18:10,200 --> 06:18:15,000
So this is going to help us calculate the... what kind of loss are we using? We want to calculate
3505
06:18:15,000 --> 06:18:21,880
the MAE. So the difference or the distance between our red dot and a green dot. And the formula would
3506
06:18:21,880 --> 06:18:27,400
be the same if we had 10,000 red dots and 10,000 green dots, we're calculating how far they are
3507
06:18:27,400 --> 06:18:37,320
apart. And then we're taking the mean of that value. So let's go back here. So calculate the loss.
3508
06:18:38,040 --> 06:18:42,760
And in our case, we're going to set loss equal to our loss function, which is L one loss in
3509
06:18:42,760 --> 06:18:53,240
PyTorch, but it is MAE: y_pred and y_train. So we're calculating the difference between our model's
3510
06:18:53,240 --> 06:18:59,800
predictions on the training data set and the ideal training values. And if you want to go into
3511
06:19:00,680 --> 06:19:08,360
torch dot NN loss functions, that's going to show you the order because sometimes this confuses me
3512
06:19:08,360 --> 06:19:14,920
to what order the values go in here, but it goes prediction first, then labels and I may be wrong
3513
06:19:14,920 --> 06:19:20,600
there because I get confused here. My dyslexia kicks in, but I'm pretty sure it's predictions first,
3514
06:19:20,600 --> 06:19:29,720
then actual labels. Do we have an example of where it's used? Yeah, input first, target next.
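A tiny sketch of that ordering point, using the y_pred and y_train names from the loop being written (loss_fn is the nn.L1Loss instance set up earlier):

```python
# nn.L1Loss follows the documentation's (input, target) order:
# predictions first, then the ground-truth labels.
loss = loss_fn(y_pred, y_train)                # positional: input, then target
loss = loss_fn(input=y_pred, target=y_train)   # same thing, spelled out
```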
3515
06:19:30,360 --> 06:19:35,240
So there we go. And truth be told, because it's mean absolute error, it shouldn't actually matter
3516
06:19:35,240 --> 06:19:40,920
too much. But in the case of staying true to the documentation, let's do inputs first and then
3517
06:19:40,920 --> 06:19:48,200
targets next for the rest of the course. Then we're going to go optimizer zero grad. Hmm,
3518
06:19:48,760 --> 06:19:52,520
haven't discussed this one, but that's okay. I'm going to write the code and then I'm going to
3519
06:19:52,520 --> 06:19:59,080
discuss what it does. So what does this do? Actually, before we discuss this, I'm going to write
3520
06:19:59,080 --> 06:20:04,680
these two steps because they kind of all work together. And it's a lot easier to discuss what
3521
06:20:04,680 --> 06:20:13,560
optimizer zero grad does in the context of having everything else perform back propagation
3522
06:20:14,680 --> 06:20:24,440
on the loss with respect to the parameters of the model. Back propagation is going to take
3523
06:20:24,440 --> 06:20:29,160
the loss value. So loss dot backward. I always say backwards, but it's just backward. That's the code
3524
06:20:29,160 --> 06:20:41,320
there. And then number five is step the optimizer. So perform gradient descent. So optimizer dot
3525
06:20:41,320 --> 06:20:47,720
step. Oh, look at us. We just wrote the five major steps of a training loop. Now let's discuss
3526
06:20:47,720 --> 06:20:54,040
how all of these work together. So it's kind of strange, like the ordering of these, you might
3527
06:20:54,040 --> 06:20:59,720
think, oh, what order should I do these in? Typically the forward pass and the loss come straight up.
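Putting the five steps together, the training loop being written looks roughly like this (a sketch assuming the epochs, model_0, loss_fn, optimizer, X_train and y_train names from the earlier cells; the testing loop comes later):

```python
import torch

torch.manual_seed(42)
epochs = 1  # an epoch is one loop through the data

for epoch in range(epochs):
    # Put the model into training mode
    model_0.train()

    # 1. Forward pass: run the training data through forward()
    y_pred = model_0(X_train)

    # 2. Calculate the loss (how wrong the predictions are)
    loss = loss_fn(y_pred, y_train)

    # 3. Zero the optimizer's gradients (they accumulate by default)
    optimizer.zero_grad()

    # 4. Backpropagation: compute gradients of the loss w.r.t. each parameter
    loss.backward()

    # 5. Step the optimizer: gradient descent, adjust parameters to reduce the loss
    optimizer.step()
```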
3528
06:20:59,720 --> 06:21:05,160
Then there's a little bit of ambiguity around what order these have to come in. But the optimizer
3529
06:21:05,160 --> 06:21:12,520
step should come after the back propagation. So I just like to keep this order how it is because
3530
06:21:12,520 --> 06:21:17,720
this works. Let's just keep it that way. But what happens here? Well, it also is a little bit
3531
06:21:17,720 --> 06:21:23,320
confusing in the first iteration of the loop because we've got zero grad. But what happens here is
3532
06:21:23,320 --> 06:21:29,480
that the optimizer makes some calculations in how it should adjust model parameters with regards to
3533
06:21:29,480 --> 06:21:38,520
the back propagation of the loss. And so, by default, how the optimizer
3534
06:21:38,520 --> 06:21:54,600
makes will accumulate through the loop. So we have to zero them above in step three
3535
06:21:54,600 --> 06:22:01,240
for the next iteration of the loop. So a big long comment there. But what this is saying is,
3536
06:22:01,240 --> 06:22:06,760
let's say we go through the loop and the optimizer chooses a value of one, change it by one. And
3537
06:22:06,760 --> 06:22:10,360
then it goes through a loop again, if we didn't zero it, if we didn't take it to zero, because
3538
06:22:10,360 --> 06:22:16,120
that's what it is doing, it's going one to zero, it would go, okay, next one, two, three, four,
3539
06:22:16,120 --> 06:22:21,240
five, six, seven, eight, all through the loop, right? Because we're looping here. If this was
3540
06:22:21,800 --> 06:22:27,240
10, it would accumulate the value that it's supposed to change 10 times. But we want it to start
3541
06:22:27,240 --> 06:22:32,840
again, start fresh each iteration of the loop. And now the reason why it accumulates, that's
3542
06:22:32,840 --> 06:22:37,000
pretty deep in the pytorch documentation. From my understanding, there's something to do with
3543
06:22:37,000 --> 06:22:41,480
like efficiency of computing. If you find out what the exact reason is, I'd love to know.
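(To see the accumulation being described, here's a small illustrative sketch, not the course notebook: a single toy parameter whose .grad keeps adding up across backward passes unless the gradients are zeroed.)

import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.01)

for i in range(3):
    loss = (w * 2) ** 2   # toy loss, gradient with respect to w is 8 * w = 8.0
    loss.backward()
    print(i, w.grad)      # without zeroing: 8.0, 16.0, 24.0 (the gradients accumulate)

# with optimizer.zero_grad() at the start of each iteration,
# w.grad would be 8.0 every time instead of accumulating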
3544
06:22:42,040 --> 06:22:49,320
So we have to zero it, then we perform back propagation. If you recall, back propagation is
3545
06:22:49,320 --> 06:22:57,000
discussed in here. And then with optimizer step, we perform gradient descent. So the beauty of PyTorch,
3546
06:22:57,000 --> 06:23:02,520
this is the beauty of pytorch, is that it will perform back propagation, we're going to have a
3547
06:23:02,520 --> 06:23:10,040
look at this in a second, and gradient descent for us. So to prevent this video from getting too long,
3548
06:23:10,040 --> 06:23:15,160
I know we've just written code, but I would like you to practice writing a training loop
3549
06:23:15,160 --> 06:23:19,160
yourself, just write this code, and then run it and see what happens. Actually, you can comment
3550
06:23:19,160 --> 06:23:23,480
this out, we're going to write the testing loop in a second. So your extra curriculum for this
3551
06:23:23,480 --> 06:23:31,720
video is, one, to rewrite this training loop, and two, to sing the PyTorch optimization loop
3552
06:23:31,720 --> 06:23:37,320
song, let's go into here. If you want to remember the steps, well, I've got a song for you. This is
3553
06:23:37,320 --> 06:23:41,720
the training loop song, we haven't discussed the test step, but maybe you could try this yourself.
3554
06:23:42,520 --> 06:23:47,640
So this is an old version of the song, actually, I've got a new one for you. But let's sing this
3555
06:23:47,640 --> 06:23:54,840
together. It's training time. So we do the forward pass, calculate the loss, optimizer zero grad,
3556
06:23:54,840 --> 06:24:01,160
loss backwards, optimizer step, step, step. Now you only have to call optimizer step once,
3557
06:24:01,160 --> 06:24:08,280
this is just for jingle purposes. But for test time, let's test with torch no grad, do the forward
3558
06:24:08,280 --> 06:24:15,640
pass, calculate the loss, watch it go down, down, down. That's from my Twitter, but this is a way
3559
06:24:15,640 --> 06:24:22,200
that I help myself remember the steps that are going on in the code here. And if you want the
3560
06:24:22,200 --> 06:24:31,400
video version of it, well, you're just going to have to search unofficial pytorch optimisation loop
3561
06:24:31,400 --> 06:24:38,440
song. Oh, look at that, who's that guy? Well, he looks pretty cool. So I'll let you check that
3562
06:24:38,440 --> 06:24:46,440
out in your own time. But for now, go back through the training loop steps. I've got a colorful
3563
06:24:46,440 --> 06:24:50,280
graphic coming up in the next video, we're going to write the testing steps. And then we're going
3564
06:24:50,280 --> 06:24:55,000
to go back one more time and talk about what's happening in each of them. And again, if you'd
3565
06:24:55,000 --> 06:24:59,720
like some even more extra curriculum, don't forget the videos I've shown you on back propagation
3566
06:24:59,720 --> 06:25:05,800
and gradient descent. But for now, let's leave this video here. I'll see you in the next one.
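(If you'd like a single reference while you practise rewriting the loop, here is a minimal sketch of the five steps just described. It assumes model_0, loss_fn, optimizer, X_train and y_train already exist from earlier in the notebook.)

epochs = 100  # hypothetical number of epochs

for epoch in range(epochs):
    model_0.train()                  # put the model in training mode

    # 1. Forward pass
    y_pred = model_0(X_train)

    # 2. Calculate the loss (predictions first, then labels)
    loss = loss_fn(y_pred, y_train)

    # 3. Zero the optimizer's gradients (they accumulate by default)
    optimizer.zero_grad()

    # 4. Perform backpropagation on the loss
    loss.backward()

    # 5. Step the optimizer (perform gradient descent)
    optimizer.step()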
3567
06:25:05,800 --> 06:25:12,680
Friends, welcome back. In the last few videos, we've been discussing the steps in a training
3568
06:25:12,680 --> 06:25:17,880
loop in pytorch. And there's a fair bit going on. So in this video, we're going to step back
3569
06:25:17,880 --> 06:25:23,400
through what we've done just to recap. And then we're going to get into testing. And it's nice
3570
06:25:23,400 --> 06:25:28,520
and early where I am right now. The sun's about to come up. It's a very, very beautiful morning
3571
06:25:28,520 --> 06:25:34,040
to be writing code. So let's jump in. We've got a little song here for what we're doing in the
3572
06:25:34,040 --> 06:25:42,360
training steps. For an epoch in a range, call model.train, do the forward pass, calculate the loss,
3573
06:25:42,360 --> 06:25:51,160
optimizer zero grad, loss backward, optimizer step, step, step. That's the little jingle I use to
3574
06:25:51,160 --> 06:25:55,720
remember the steps in here, because the first time you write it, there's a fair bit going on.
3575
06:25:55,720 --> 06:26:01,480
But subsequent steps and subsequent times that you do write it, you'll start to memorize this.
3576
06:26:01,480 --> 06:26:06,840
And even better later on, we're going to put it into a function so that we can just call it
3577
06:26:06,840 --> 06:26:12,600
over and over and over and over again. With that being said, let's jump in to a colorful slide,
3578
06:26:13,160 --> 06:26:18,280
because that's a lot of code on the page. Let's add some color to it, understand what's happening.
3579
06:26:18,280 --> 06:26:24,840
That way you can refer to this and go, Hmm, I see what's going on now. So for the loop, this is why
3580
06:26:24,840 --> 06:26:30,760
it's called a training loop. We step through a number of epochs. One epoch is a single forward
3581
06:26:30,760 --> 06:26:36,760
pass through the data. So pass the data through the model for a number of epochs. Epochs is a
3582
06:26:36,760 --> 06:26:42,600
hyper parameter, which means you could set it to 100, you could set it to 1000, you could set it
3583
06:26:42,600 --> 06:26:49,720
to one as we're going to see later on in this video. We skip this step with the colors, but
3584
06:26:49,720 --> 06:26:55,720
we put the model in training mode; we call model.train. This is the default mode that the model is in.
3585
06:26:55,720 --> 06:27:01,400
Essentially, it sets up a whole bunch of settings behind the scenes in our model parameters so that
3586
06:27:01,400 --> 06:27:07,400
they can track the gradients and do a whole bunch of learning behind the scenes with these
3587
06:27:07,400 --> 06:27:13,640
functions down here. PyTorch does a lot of this for us. So the next step is the forward pass.
3588
06:27:14,280 --> 06:27:18,920
We perform a forward pass on the training data in the training loop. This is an important note.
3589
06:27:18,920 --> 06:27:24,760
In the training loop is where the model learns patterns on the training data. Whereas in the
3590
06:27:24,760 --> 06:27:30,440
testing loop, we haven't got to that yet is where we evaluate the patterns that our model has learned
3591
06:27:30,440 --> 06:27:36,120
or the parameters that our model has learned on unseen data. So we pass the data through the model,
3592
06:27:36,120 --> 06:27:41,160
this will perform the forward method located within the model object. So because we created
3593
06:27:41,160 --> 06:27:46,600
a model object. You can actually call your models whatever you want, but as good practice,
3594
06:27:46,600 --> 06:27:51,080
you'll often see it just called model. And if you remember, we'll go back to the code.
3595
06:27:51,080 --> 06:27:59,240
We created a forward method in our model up here, which is this, because our linear regression model,
3596
06:27:59,240 --> 06:28:06,120
class subclasses nn.Module, we need to create our own custom forward method. So that's why it's
3597
06:28:06,120 --> 06:28:11,320
called a forward pass, because, well, the technical term is forward propagation.
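(As a reminder of the kind of model being referred to, here's a sketch of a linear regression module with its own forward method. The names weights and bias mirror what the course has been using, but treat this as illustrative rather than the exact notebook cell.)

import torch
from torch import nn

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # parameters start random; requires_grad=True lets PyTorch track their gradients
        self.weights = nn.Parameter(torch.randn(1, requires_grad=True))
        self.bias = nn.Parameter(torch.randn(1, requires_grad=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the forward pass: forward propagation from input to output
        return self.weights * x + self.bias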
3598
06:28:11,320 --> 06:28:20,920
So if we have a look at a neural network picture, forward propagation just means going through
3599
06:28:20,920 --> 06:28:26,680
the network from the input to the output, there's a thing called back propagation, which we're going
3600
06:28:26,680 --> 06:28:31,960
to discuss in a second, which happens when we call loss.backward, which is going backward through
3601
06:28:31,960 --> 06:28:40,440
the model. But let's return to our colorful slide. We've done the forward pass, call a forward method,
3602
06:28:40,440 --> 06:28:48,360
which performs some calculation on the data we pass it. Next is we calculate the loss value,
3603
06:28:48,360 --> 06:28:53,720
how wrong the model's predictions are. And this will depend on what loss function you use,
3604
06:28:53,720 --> 06:28:58,200
what kind of predictions your model is outputting, and what kind of true values you have.
3605
06:28:58,760 --> 06:29:03,880
But that's what we're doing here. We're comparing our model's predictions on the training data
3606
06:29:03,880 --> 06:29:11,320
to what they should ideally be. And these will be the training labels. The next step, we zero
3607
06:29:11,320 --> 06:29:17,320
the optimizer gradients. So why do we do this? Well, it's a little confusing for the first epoch in
3608
06:29:17,320 --> 06:29:23,240
the loop. But as we get down to optimizer dot step here, the gradients that the optimizer
3609
06:29:23,240 --> 06:29:30,840
calculates accumulate over time so that for each epoch for each loop step, we want them to go back
3610
06:29:30,840 --> 06:29:37,480
to zero. And now the exact reason behind why the optimizer accumulates gradients is buried somewhere
3611
06:29:37,480 --> 06:29:43,240
within the PyTorch documentation. I'm not sure of the exact reason; from memory, it's because of
3612
06:29:43,240 --> 06:29:48,680
compute optimization. It just adds them up in case you wanted to know what they were. But if
3613
06:29:48,680 --> 06:29:57,160
you find out exactly, I'd love to know. Next step is to perform back propagation on the loss function.
3614
06:29:57,160 --> 06:30:02,760
That's what we're calling loss.backward. Now back propagation is where we compute the gradient of
3615
06:30:02,760 --> 06:30:08,680
every parameter with requires grad equals true. And if you recall, we go back to our code.
3616
06:30:08,680 --> 06:30:16,360
We've set requires grad equals true for our parameters. Now the reason we've set requires
3617
06:30:16,360 --> 06:30:23,400
grad equals true is not only so back propagation can be performed on it. But let me show you what
3618
06:30:23,400 --> 06:30:30,600
the gradients look like. So let's go loss function curve. That's a good idea. So we're looking for
3619
06:30:30,600 --> 06:30:38,600
so we're looking for some sort of convex curve here. There we go. L2 loss. We're using L1 loss
3620
06:30:38,600 --> 06:30:43,800
at the moment. Is there a better one here? All we need is just a nice looking curve. Here we go.
3621
06:30:44,760 --> 06:30:51,160
So this is why we keep track of the gradients behind the scenes. PyTorch is going to create
3622
06:30:51,160 --> 06:30:56,440
some sort of curve for all of our parameters that looks like this. Now this is just a 2d plot.
3623
06:30:56,440 --> 06:31:02,120
So the reason why we're just using an example from Google images is one, because you're going to
3624
06:31:02,120 --> 06:31:08,200
spend a lot of your time Googling different things. And two, in practice, when you have your own
3625
06:31:08,200 --> 06:31:14,760
custom neural networks, right now we only have two parameters. So it's quite easy to visualize a
3626
06:31:14,760 --> 06:31:21,800
loss function curve like this. But when you have say 10 million parameters, you basically can't
3627
06:31:21,800 --> 06:31:27,000
visualize what's going on. And so PyTorch again will take care of these things behind the scenes.
3628
06:31:27,000 --> 06:31:33,400
But what it's doing is, when we say requires grad, PyTorch is going to track the gradients
3629
06:31:33,400 --> 06:31:39,480
of each of our parameters. And so what we're trying to do here with back propagation and
3630
06:31:39,480 --> 06:31:46,840
subsequently gradient descent is calculate where the lowest point is. Because this is a loss function,
3631
06:31:46,840 --> 06:31:53,240
this is MSE loss, we could trade this out for MAE loss in our case, or L1 loss, for our specific
3632
06:31:53,240 --> 06:31:59,800
problem. But this is some sort of parameter. And we calculate the gradients because what is the
3633
06:31:59,800 --> 06:32:11,800
gradient? Let's have a look. What is a gradient? A gradient is an inclined part of a road or railway.
3634
06:32:11,800 --> 06:32:17,160
Now we want it in machine learning. What's it going to give us in machine learning, a gradient
3635
06:32:17,160 --> 06:32:23,000
is a derivative of a function that has more than one input variable. Okay, let's dive in a little
3636
06:32:23,000 --> 06:32:28,280
deeper. See, here's some beautiful loss landscapes. We're trying to get to the bottom of here. This
3637
06:32:28,280 --> 06:32:35,800
is what gradient descent is all about. So oh, there we go. So this is a cost function, which is also a
3638
06:32:35,800 --> 06:32:40,920
loss function. We start with a random initial variable. What have we done? We started with a
3639
06:32:40,920 --> 06:32:47,080
random initial variable. Right? Okay. And then we take a learning step. Beautiful. This is W. So
3640
06:32:47,080 --> 06:32:52,120
this could be our weight parameter. Okay, we're connecting the dots here. This is exciting.
3641
06:32:52,840 --> 06:32:57,000
We've got a lot of tabs here, but that's all right. We'll bring this all together in a second.
3642
06:32:57,000 --> 06:33:02,040
And what we're trying to do is come to the minimum. Now, why do we need to calculate the gradients?
3643
06:33:02,040 --> 06:33:08,120
Well, the gradient is what? Oh, value of weight. Here we go. This is even better.
3644
06:33:08,120 --> 06:33:14,280
I love Google images. So this is our loss. And this is a value of a weight. So we calculate the
3645
06:33:14,280 --> 06:33:23,240
gradients. Why? Because the gradient is the slope of a line or the steepness. And so if we
3646
06:33:23,240 --> 06:33:28,440
calculate the gradient here, and we find that it's really steep right up the top of this,
3647
06:33:29,160 --> 06:33:34,520
this incline, we might head in the opposite direction to that gradient. That's what gradient
3648
06:33:34,520 --> 06:33:40,360
descent is. And so if we go down here, now, what are these step points? There's a little thing that
3649
06:33:40,360 --> 06:33:44,120
I wrote down in the last video at the end of the last video I haven't told you about yet,
3650
06:33:44,120 --> 06:33:50,520
but I was waiting for a moment like this. And if you recall, I said kind of all of these three steps
3651
06:33:50,520 --> 06:33:55,080
optimizer zero grad, loss backward, optimizer step, are all together. So we calculate the
3652
06:33:55,080 --> 06:33:59,960
gradients because we want to head in the opposite direction of that gradient to get to a gradient
3653
06:33:59,960 --> 06:34:05,080
value of zero. And if we get to a gradient value of zero with a loss function, well, then the loss
3654
06:34:05,080 --> 06:34:10,920
is also zero. So that's why we keep track of a gradient with requires grad equals true.
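(Here's a small sketch of what requires_grad buys us: after loss.backward(), each tracked parameter carries a .grad value that the optimizer then steps against. It assumes the model_0, loss_fn, X_train and y_train from earlier.)

# forward pass and loss as usual
y_pred = model_0(X_train)
loss = loss_fn(y_pred, y_train)

# backpropagation fills in .grad for every parameter with requires_grad=True
loss.backward()

for name, param in model_0.named_parameters():
    # the sign and size of the gradient tell the optimizer which way (and how far) to move
    print(name, param.grad)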
3655
06:34:10,920 --> 06:34:15,960
And again, PyTorch does a lot of this behind the scenes. And if you want to dig more into
3656
06:34:15,960 --> 06:34:20,920
what's going on here, I'm going to show you some extra resources for back propagation,
3657
06:34:20,920 --> 06:34:25,800
which is calculating this gradient curve here, and gradient descent, which is finding the bottom
3658
06:34:25,800 --> 06:34:30,680
of it towards the end of this video. And again, if we started over this side, we would just go
3659
06:34:30,680 --> 06:34:36,520
in the opposite direction of this. So maybe this is a positive gradient here, and we just go in the
3660
06:34:36,520 --> 06:34:41,640
opposite direction here. We want to get to the bottom. That is the main point of gradient descent.
3661
06:34:42,680 --> 06:34:50,600
And so if we come back, I said, just keep this step size in mind here. If we come back to where
3662
06:34:50,600 --> 06:34:56,600
we created our loss function and optimizer, I put a little tidbit here for the optimizer.
3663
06:34:57,480 --> 06:35:00,920
Because we've written a lot of code, and we haven't really discussed what's going on, but
3664
06:35:00,920 --> 06:35:06,360
I like to do things on the fly as we need them. So inside our optimizer, we'll have two main
3665
06:35:06,920 --> 06:35:11,560
parameters. The first is params, the model parameters you'd like to optimize,
3666
06:35:11,560 --> 06:35:17,160
params equals model zero dot parameters in our case. And then PyTorch is going to create
3667
06:35:17,160 --> 06:35:22,360
something similar to this curve, not visually, but just mathematically behind the scenes for
3668
06:35:22,360 --> 06:35:27,240
every parameter. Now, this is a value of weight. So this would just be potentially the weight
3669
06:35:27,240 --> 06:35:32,040
parameter of our network. But again, if you have 10 million parameters, there's no way you could
3670
06:35:32,040 --> 06:35:36,920
just create all of these curves yourself. That's the beauty of PyTorch. It's doing this behind the
3671
06:35:36,920 --> 06:35:43,960
scenes through a mechanism called torch autograd, which is auto gradient calculation. And there's
3672
06:35:43,960 --> 06:35:48,360
beautiful documentation on this. If you'd like to read more on how it works, please go through
3673
06:35:48,360 --> 06:35:53,000
that. But essentially behind the scenes, it's doing a lot of this for us for each parameter.
3674
06:35:53,000 --> 06:35:58,760
That's the optimizer. Then within the optimizer, once we've told it what parameters to optimize,
3675
06:35:58,760 --> 06:36:04,680
we have the learning rate. So the learning rate is another hyper parameter that defines how big or
3676
06:36:04,680 --> 06:36:11,080
small the optimizer changes the parameters with each step. So a small learning rate results in
3677
06:36:11,080 --> 06:36:16,520
small changes, whereas a large learning rate results in large changes. And so if we look at this
3678
06:36:16,520 --> 06:36:22,760
curve here, we might at the beginning start with large steps, so we can get closer and closer to
3679
06:36:22,760 --> 06:36:27,960
the bottom. But then as we get closer and closer to the bottom, to prevent stepping over to this
3680
06:36:27,960 --> 06:36:34,360
side of the curve, we might do smaller and smaller steps. And the optimizer in PyTorch,
3681
06:36:34,360 --> 06:36:39,400
there are optimizers that do that for us. But there is also another concept called learning
3682
06:36:39,400 --> 06:36:46,040
rate scheduling, which is, again, something you can look up if you'd like to do more. But
3683
06:36:46,040 --> 06:36:51,400
learning rate scheduling essentially says, hey, maybe start with some big steps. And then as we
3684
06:36:51,400 --> 06:36:57,480
get closer and closer to the bottom, reduce how big the steps are that we take. Because if you've
3685
06:36:57,480 --> 06:37:04,600
ever seen a coin, coin at the back of couch. This is my favorite analogy for this. If you've ever
3686
06:37:04,600 --> 06:37:11,080
tried to reach a coin at the back of a couch, like this excited young chap, if you're reaching
3687
06:37:11,080 --> 06:37:17,000
towards the back of a couch, you take quite big steps, say if your arm was over here,
3688
06:37:17,000 --> 06:37:22,520
you would take quite big steps until you get to about here. And in the closer you get to the coin,
3689
06:37:22,520 --> 06:37:27,880
the smaller and smaller your steps are. Otherwise, what's going to happen? The coin is going to be
3690
06:37:27,880 --> 06:37:34,440
lost. Or if you took steps that were too small, you'd never get to the coin. It would take forever to get there.
3691
06:37:34,440 --> 06:37:40,440
So that's the concept of the learning rate. If you take too big a step, you're going to just end up
3692
06:37:40,440 --> 06:37:46,040
over here. If you take too small a step, it's going to take you forever to get to the bottom here.
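(The learning rate scheduling idea mentioned above could look something like this sketch, using torch.optim.lr_scheduler.StepLR to shrink the step size as training goes on. The lr, step_size and gamma values here are only illustrative, and model_0 is assumed from earlier.)

import torch

optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)

# every 50 epochs, multiply the learning rate by 0.5 (big steps first, smaller steps later)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

for epoch in range(200):
    # ...usual training steps: forward pass, loss, optimizer.zero_grad(), loss.backward()...
    optimizer.step()       # usual gradient descent step
    scheduler.step()       # then reduce the learning rate on schedule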
3693
06:37:46,040 --> 06:37:49,640
And this bottom point is called convergence. That's another term you're going to come across. I
3694
06:37:49,640 --> 06:37:53,560
know I'm throwing a lot of different terms at you, but that's the whole concept of the learning
3695
06:37:53,560 --> 06:37:59,480
rate. How big is your step down here? In gradient descent. Gradient descent is this. Back propagation
3696
06:37:59,480 --> 06:38:05,640
is calculating these derivative curves or the gradient curves for each of the parameters in our
3697
06:38:05,640 --> 06:38:12,360
model. So let's get out of here. We'll go back to our training steps. Where were we? I think we're
3698
06:38:12,360 --> 06:38:19,640
up to back propagation. Have we done backward? Yes. So the back propagation is where we do the
3699
06:38:19,640 --> 06:38:25,080
backward steps. So the forward pass, forward propagation, go from input to output. Back propagation,
3700
06:38:25,080 --> 06:38:30,040
we take the gradients of the loss function with respect to each parameter in our model
3701
06:38:30,040 --> 06:38:35,320
by going backwards. That's what happens when we call loss.backward. PyTorch does that for us
3702
06:38:35,320 --> 06:38:43,720
behind the scenes. And then finally, step number five is step the optimizer. We've kind of discussed
3703
06:38:43,720 --> 06:38:51,560
that. As I said, if we take a step, let's get our loss curve back up. Loss function curve.
3704
06:38:51,560 --> 06:38:59,080
Doesn't really matter what curve we use. The optimizer step is taking a step this way to try
3705
06:38:59,080 --> 06:39:06,360
and optimize the parameters so that we can get down to the bottom here. And I also just noted
3706
06:39:06,360 --> 06:39:09,960
here that you can turn all of this into a function so we don't necessarily have to remember to
3707
06:39:09,960 --> 06:39:15,240
write these every single time. The ordering of this, you'll want to do the forward pass first.
3708
06:39:15,240 --> 06:39:19,560
And then calculate the loss because you can't calculate the loss unless you do the forward pass.
3709
06:39:19,560 --> 06:39:24,680
I like this ordering here of these three as well. But you also want to do the optimizer step
3710
06:39:24,680 --> 06:39:29,880
after the loss backward. So this is my favorite ordering. It works. If you like this ordering,
3711
06:39:29,880 --> 06:39:35,080
you can take that as well. With that being said, I think this video has gotten long enough.
3712
06:39:35,080 --> 06:39:42,680
In the next video, I'd like to step through this training loop one epoch at a time so that we can
3713
06:39:42,680 --> 06:39:47,080
see, I know I've just thrown a lot of words at you that this optimizer is going to try and
3714
06:39:47,080 --> 06:39:53,400
optimize our parameters each step. But let's see that in action how our parameters of our model
3715
06:39:53,400 --> 06:40:00,760
actually change every time we go through each one of these steps. So I'll see you in the next video.
3716
06:40:00,760 --> 06:40:09,160
Let's step through our model. Welcome back. And we've spent a fair bit of time on the training loop
3717
06:40:09,160 --> 06:40:13,160
and the testing loop. Well, we haven't even got to that yet, but there's a reason behind this,
3718
06:40:13,160 --> 06:40:17,400
because this is possibly one of the most important things aside from getting your data ready,
3719
06:40:17,400 --> 06:40:22,920
which we're going to see later on, in PyTorch deep learning: writing the training loop,
3720
06:40:22,920 --> 06:40:27,000
because this is literally how your model learns patterns in data. So that's why we're
3721
06:40:27,000 --> 06:40:31,320
spending a fair bit of time on here. And we'll get to the testing loop, because that's how you
3722
06:40:31,320 --> 06:40:35,960
evaluate the patterns that your model has learned from data, which is just as important as learning
3723
06:40:35,960 --> 06:40:41,640
the patterns themselves. And following on from the last couple of videos, I've just linked some
3724
06:40:42,440 --> 06:40:46,760
YouTube videos that I would recommend for extra curriculum for back propagation,
3725
06:40:46,760 --> 06:40:54,440
which is what happens when we call loss.backward down here. And for the optimizer step,
3726
06:40:54,440 --> 06:40:59,560
gradient descent is what's happening there. So I've linked some extra resources for what's going
3727
06:40:59,560 --> 06:41:04,120
on behind the scenes there from a mathematical point of view. Remember, this course focuses on
3728
06:41:04,120 --> 06:41:09,320
writing PyTorch code. But if you'd like to dive into what math PyTorch is triggering behind the
3729
06:41:09,320 --> 06:41:15,640
scenes, I'd highly recommend these two videos. And I've also added a note here as to which
3730
06:41:15,640 --> 06:41:20,280
loss function and optimizer should I use, which is a very valid question. And again,
3731
06:41:20,280 --> 06:41:26,120
it's another one of those things that's going to be problem specific. But with experience over time,
3732
06:41:26,120 --> 06:41:30,440
you work with machine learning problems, you write a lot of code, you get an idea of what works
3733
06:41:30,440 --> 06:41:35,480
and what doesn't with your particular problem set. For example, like a regression problem,
3734
06:41:35,480 --> 06:41:41,400
like ours, regression is again predicting a number. We use MAE loss, which PyTorch calls
3735
06:41:41,400 --> 06:41:48,200
L1 loss. You could also use MSE loss, and an optimizer like torch.optim's stochastic gradient
3736
06:41:48,200 --> 06:41:53,960
descent will suffice. But for classification, you might want to look into, for binary classification,
3737
06:41:53,960 --> 06:41:58,520
a binary cross entropy loss, but we'll look at a classification problem later on in the course.
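(A hedged summary of the pairings being described, regression versus binary classification; model_0 is assumed from earlier, and BCEWithLogitsLoss is just one common binary cross entropy option.)

import torch
from torch import nn

# Regression (predicting a number), as in this notebook:
loss_fn = nn.L1Loss()            # MAE, which PyTorch calls L1Loss
# loss_fn = nn.MSELoss()         # mean squared error would also work
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)

# Binary classification (covered later in the course):
# loss_fn = nn.BCEWithLogitsLoss()   # binary cross entropy with a built-in sigmoid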
3738
06:41:59,240 --> 06:42:06,200
For now, I'd like to demonstrate what's going on in the steps here. So let's go model zero.
3739
06:42:07,240 --> 06:42:10,280
Let's look up the state dict and see what the parameters are for now.
3740
06:42:10,280 --> 06:42:17,880
Now they aren't the original ones I don't think. Let's re-instantiate our model so we get
3741
06:42:17,880 --> 06:42:27,720
some new parameters. Yeah, we recreated it here. I might just get rid of that. So we'll rerun our
3742
06:42:27,720 --> 06:42:35,800
model code, rerun model state dict. And we will create an instance of our model and just make
3743
06:42:35,800 --> 06:42:40,520
sure your parameters are something similar to this. If they're not exactly like that, it doesn't
3744
06:42:40,520 --> 06:42:45,400
matter. But yeah, I'm just going to showcase it; you'll see on my screen what's going on anyway.
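(If you're following along, the re-creation step looks roughly like this; the class name matches the earlier sketch, and the printed numbers are only an example, so your exact values may differ.)

import torch

torch.manual_seed(42)                 # seed so the random starting parameters are reproducible
model_0 = LinearRegressionModel()     # a fresh instance means fresh random parameters

print(model_0.state_dict())
# e.g. OrderedDict([('weights', tensor([0.3367])), ('bias', tensor([0.1288]))])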
3745
06:42:46,280 --> 06:42:55,240
State dict: 0.3367 for the weight and 0.1288 for the bias. And again, I can't stress enough: we've
3746
06:42:55,240 --> 06:43:00,360
only got two parameters for our model, and we've set them ourselves. Future models that you build
3747
06:43:00,360 --> 06:43:05,160
and later ones in the course will have much, much more. And we won't actually explicitly set any
3748
06:43:05,160 --> 06:43:11,480
of them ourselves. We'll check out some predictions. They're going to be terrible because we're using
3749
06:43:11,480 --> 06:43:17,640
random parameters to begin with. But we'll set up a new loss function and an optimizer. Optimizer
3750
06:43:17,640 --> 06:43:24,200
is going to optimize our model zero parameters, the weight and bias. The learning rate is 0.01,
3751
06:43:24,200 --> 06:43:30,120
which is a relatively large step. That could be a bit smaller. Remember, the larger the learning rate,
3752
06:43:30,120 --> 06:43:34,920
the bigger the step, the more the optimizer will try to change these parameters every step.
3753
06:43:34,920 --> 06:43:40,440
But let's stop talking about it. Let's see it in action. I've set a manual seed here too, by the way,
3754
06:43:40,440 --> 06:43:45,080
because the optimizer steps are going to be quite random as well, depending on how the models
3755
06:43:45,080 --> 06:43:50,680
predictions go. But this is just to try and make it as reproducible as possible. So keep this in mind,
3756
06:43:50,680 --> 06:43:55,640
if you get different values to what we're going to output here from my screen to your screen,
3757
06:43:55,640 --> 06:44:02,520
don't worry too much. What's more important is the direction they're going. So ideally,
3758
06:44:02,520 --> 06:44:08,760
we're moving these values here. This is from when we did one epoch before. We're moving these values
3759
06:44:08,760 --> 06:44:15,240
closer to the true values. And in practice, you won't necessarily know what the true values are.
3760
06:44:15,240 --> 06:44:19,080
But that's where evaluation of your model comes in. We're going to cover that when we write a
3761
06:44:19,080 --> 06:44:26,680
testing loop. So let's run one epoch. Now I'm going to keep that down there. Watch what happens.
3762
06:44:26,680 --> 06:44:33,480
We've done one epoch, just a single epoch. We've done the forward pass. We've calculated the loss.
3763
06:44:33,480 --> 06:44:39,160
We've done optimizer zero grad. We've performed back propagation. And we've stepped the optimizer.
3764
06:44:39,160 --> 06:44:45,240
What is stepping the optimizer do? It updates our model parameters to try and get them further
3765
06:44:45,240 --> 06:44:51,160
closer towards the weight and bias. If it does that, the loss will be closer to zero. That's what
3766
06:44:51,160 --> 06:44:58,920
it's trying to do. How about we print out the loss at the same time. Print loss and the loss.
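(Stepping through it one epoch at a time, as on screen, might look like this sketch, assuming the training-loop pieces from above.)

for epoch in range(1):                  # run just a single epoch at a time
    model_0.train()
    y_pred = model_0(X_train)           # 1. forward pass
    loss = loss_fn(y_pred, y_train)     # 2. calculate the loss
    optimizer.zero_grad()               # 3. zero the accumulated gradients
    loss.backward()                     # 4. backpropagation
    optimizer.step()                    # 5. gradient descent step

print(f"Loss: {loss}")
print(model_0.state_dict())             # watch the weight and bias creep toward the ideal values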
3767
06:45:00,040 --> 06:45:08,280
Let's take another step. So the loss is 0.301. Now we check the weights and the bias. We've changed
3768
06:45:08,280 --> 06:45:17,480
again three, three, four, four, five, one, four, eight, eight. We go again. The loss is going down.
3769
06:45:17,480 --> 06:45:24,200
Check it. Hey, look at that. The values are getting closer to where they should be, if ever so slightly.
3770
06:45:25,720 --> 06:45:31,960
Loss went down again. Oh my goodness, this is so amazing. Look, we're training our,
3771
06:45:31,960 --> 06:45:39,400
let's print this out in the same cell. Print our model state dict. We're training our first
3772
06:45:39,400 --> 06:45:44,680
machine learning model here, people. This is very exciting, even if it's only step by step and it's
3773
06:45:44,680 --> 06:45:50,200
only a small model. This is very important. Loss is going down again. Values are getting closer to
3774
06:45:50,200 --> 06:45:55,720
where they should be. Again, we won't really know where they should be in real problems, but for
3775
06:45:55,720 --> 06:46:00,920
now we do. So let's just get excited. The real way to sort of measure your model's progress in
3776
06:46:00,920 --> 06:46:06,600
practice is a lower loss value. Remember, lower is better. A loss value measures how wrong your
3777
06:46:06,600 --> 06:46:11,480
model is. We're going down. We're going in the right direction. So that's what I meant by,
3778
06:46:11,480 --> 06:46:16,760
as long as your values are going in a similar direction, so down. We're writing similar code
3779
06:46:16,760 --> 06:46:21,960
here, but if your values are slightly different in terms of the exact numbers, don't worry too
3780
06:46:21,960 --> 06:46:26,680
much because that's inherent to the randomness of machine learning, because the steps that the
3781
06:46:26,680 --> 06:46:31,480
optimizer are taking are inherently random, but they're sort of pushed in a direction.
3782
06:46:32,120 --> 06:46:37,400
So we're doing gradient descent here. This is beautiful. How low can we get the loss? How about
3783
06:46:37,400 --> 06:46:43,960
we try to get to 0.1? Look at that. We're getting close to 0.1. And then, I mean, we don't have to
3784
06:46:43,960 --> 06:46:51,320
do this hand by hand. The bias is getting close to where it exactly should be. We're below 0.1.
3785
06:46:51,320 --> 06:46:56,920
Beautiful. So that was only about, say, 10 passes through the data, but now you're seeing it in
3786
06:46:56,920 --> 06:47:02,840
practice. You're seeing it happen. You're seeing gradient descent. Let's go gradient descent work
3787
06:47:02,840 --> 06:47:08,760
in action. We've got images. This is what's happening. We've got our cost function. J is
3788
06:47:08,760 --> 06:47:13,960
another term for cost function, which is also our loss function. We start with an initial weight.
3789
06:47:13,960 --> 06:47:20,360
What have we done? We started with an initial weight, this value here. And what are we doing?
3790
06:47:20,360 --> 06:47:24,840
We've measured the gradient pytorch has done that behind the scenes for us. Thank you pytorch.
3791
06:47:24,840 --> 06:47:29,800
And we're taking steps towards the minimum. That's what we're trying to do. If we minimize the
3792
06:47:29,800 --> 06:47:36,360
gradient of our weight, we minimize the cost function, which is also a loss function. We could
3793
06:47:36,360 --> 06:47:43,400
keep going here for hours and get the loss as low as we want. But my challenge for you, or actually,
3794
06:47:43,400 --> 06:47:47,880
how about we make some predictions with our model we've got right now? Let's make some predictions.
3795
06:47:47,880 --> 06:47:52,280
So with torch dot inference mode, we'll make some predictions together. And then I'm going
3796
06:47:52,280 --> 06:47:58,040
to set you a challenge. How about you run this code here for 100 epochs after this video,
3797
06:47:58,040 --> 06:48:02,200
and then you make some predictions and see how that goes. So y_preds, remember how
3798
06:48:02,200 --> 06:48:09,480
poor our predictions are? y_preds_new equals, we just do the forward pass here, model zero
3799
06:48:09,480 --> 06:48:16,200
on the test data. Let's just remind ourselves quickly of how poor our previous predictions were.
3800
06:48:16,760 --> 06:48:23,160
Plot predictions, predictions equals y_preds. Do we still have this saved? y_preds?
3801
06:48:23,160 --> 06:48:30,440
Hopefully, this is still saved. There we go. Shocking predictions, but we've just done 10 or so
3802
06:48:30,440 --> 06:48:35,960
epochs. So 10 or so training steps have our predictions. Do they look any better? Let's run
3803
06:48:35,960 --> 06:48:41,720
this. We'll copy this code. You know my rule. I don't really like to copy code, but in this case,
3804
06:48:41,720 --> 06:48:47,480
I just want to exemplify a point. I like to write all the code myself. What have we got? y_preds
3805
06:48:47,480 --> 06:48:54,920
new? Look at that. We are moving our predictions, the red dots, closer to the green dots.
3806
06:48:54,920 --> 06:48:59,720
This is what's happening. We're reducing the loss. In other words, we're reducing the difference
3807
06:48:59,720 --> 06:49:05,640
between our models predictions and our ideal outcomes through the power of back propagation
3808
06:49:05,640 --> 06:49:10,840
and gradient descent. So this is super exciting. We're training our first machine learning model.
3809
06:49:10,840 --> 06:49:17,080
My challenge to you is to run this code here. Change epochs to 100. See how low you can get this
3810
06:49:17,080 --> 06:49:24,600
loss value and run some predictions, plot them. And I think it's time to start testing. So give
3811
06:49:24,600 --> 06:49:30,360
that a go yourself, and then we'll write some testing code in the next video. I'll see you there.
3812
06:49:32,280 --> 06:49:38,760
Welcome back. In the last video, we did something super exciting. We saw our loss go down. So the
3813
06:49:38,760 --> 06:49:44,200
loss is, remember, how different our model's predictions are to what we'd ideally like them to be. And we saw
3814
06:49:44,200 --> 06:49:50,920
our model update its parameters through the power of back propagation and gradient descent, all
3815
06:49:50,920 --> 06:49:57,640
taken care of behind the scenes for us by PyTorch. So thank you, PyTorch. And again, if you'd like
3816
06:49:57,640 --> 06:50:03,000
some extra resources on what's actually happening from a math perspective for back propagation and
3817
06:50:03,000 --> 06:50:08,520
gradient descent, I would refer you to these. Otherwise, this is also how I learn about things.
3818
06:50:08,520 --> 06:50:14,600
Gradient descent. There we go. How does gradient descent work? And then we've got back propagation.
3819
06:50:15,640 --> 06:50:21,640
And just to reiterate, I am doing this and just Googling these things because that's what you're
3820
06:50:21,640 --> 06:50:25,560
going to do in practice. You're going to come across a lot of different things that aren't
3821
06:50:25,560 --> 06:50:31,320
covered in this course. And this is seriously what I do day to day as a machine learning engineer
3822
06:50:31,320 --> 06:50:37,160
if I don't know what's going on. Just go to Google, read, watch a video, write some code,
3823
06:50:37,160 --> 06:50:43,000
and then I build my own intuition for it. But with that being said, I also issued you the challenge
3824
06:50:43,000 --> 06:50:50,440
of trying to run this training code for 100 epochs. Did you give that a go? I hope you did. And
3825
06:50:50,440 --> 06:50:55,560
how low did your loss value go? Did the weights and bias get anywhere close to where they should have
3826
06:50:55,560 --> 06:51:01,160
been? How do the predictions look? Now, I'm going to save that for later on, running this code for
3827
06:51:01,160 --> 06:51:07,000
100 epochs. For now, let's write some testing code. And just a note, you don't necessarily have to
3828
06:51:07,000 --> 06:51:11,880
write the training and testing loop together. You can functionize them, which we will be doing later
3829
06:51:11,880 --> 06:51:17,240
on. But for the sake of this intuition building and code practicing, and the first time that we're
3830
06:51:17,240 --> 06:51:23,560
writing this code together, I'm going to write them together. So testing code, we call model.eval,
3831
06:51:23,560 --> 06:51:33,480
what does this do? So this turns off different settings in the model not needed for evaluation
3832
06:51:33,480 --> 06:51:39,560
slash testing. This can be a little confusing to remember when you're writing testing code. But
3833
06:51:39,560 --> 06:51:44,920
we're going to do it a few times until it's habit. So just make it a habit. If you're training your
3834
06:51:44,920 --> 06:51:50,120
model, call model dot train to make sure it's in training mode. If you're testing or evaluating
3835
06:51:50,120 --> 06:51:55,640
your model, so that's what eval stands for, evaluate, call model dot eval. So it turns off
3836
06:51:55,640 --> 06:52:00,280
different settings in the model not needed for evaluation slash testing. This is things like drop
3837
06:52:00,280 --> 06:52:06,840
out. We haven't seen what dropout is, slash batch norm layers. But if we go into torch dot
3838
06:52:06,840 --> 06:52:12,920
nn, I'm sure you'll come across these things in your future machine learning endeavors. So drop
3839
06:52:12,920 --> 06:52:21,400
out, dropout layers. There we go. And batch norm. Do we have batch norm? There we go. If you'd
3840
06:52:21,400 --> 06:52:27,400
like to work out what they are, feel free to check out the documentation. Just take it from me for
3841
06:52:27,400 --> 06:52:34,520
now that model.eval turns off different settings not needed for evaluation and testing. Then we
3842
06:52:34,520 --> 06:52:41,240
set up with torch dot inference mode, inference mode. So what does this do? Let's write down here.
3843
06:52:42,600 --> 06:52:49,480
So this turns off gradient tracking. So as we discussed, if we have parameters in our model,
3844
06:52:49,480 --> 06:52:56,600
and it actually turns off a couple more things behind the scenes,
3845
06:52:57,640 --> 06:53:03,400
these are things again, not needed for testing. So we discussed that if parameters in our model
3846
06:53:03,400 --> 06:53:07,960
have requires grad equals true, which is the default for many different parameters in pytorch,
3847
06:53:08,840 --> 06:53:14,600
pytorch will behind the scenes keep track of the gradients of our model and use them in
3848
06:53:14,600 --> 06:53:20,520
loss.backward and optimizer step for back propagation and gradient descent. However,
3849
06:53:21,560 --> 06:53:26,040
we only need those two back propagation and gradient descent during training because that
3850
06:53:26,040 --> 06:53:31,800
is when our model is learning. When we are testing, we are just evaluating the parameters the patterns
3851
06:53:31,800 --> 06:53:36,920
that our model has learned on the training data set. So we don't need to do any learning
3852
06:53:36,920 --> 06:53:41,960
when we're testing. So we turn off the things that we don't need. And is this going to have
3853
06:53:41,960 --> 06:53:47,400
the correct spacing for me? I'm not sure we'll find out. So we still do the forward pass
3854
06:53:49,080 --> 06:53:53,960
in testing mode, do the forward pass. And if you want to look up torch inference mode,
3855
06:53:53,960 --> 06:53:58,280
just go torch inference mode. There's a great tweet about it that pytorch did, which explains
3856
06:53:58,280 --> 06:54:05,400
what's happening. I think we've covered this before, but yeah, want to make your inference
3857
06:54:05,400 --> 06:54:11,240
code in PyTorch run faster. Here's a quick thread on doing exactly that. So inference
3858
06:54:11,240 --> 06:54:16,200
mode is similar to torch no grad. Again, you might see torch no grad. I think I'll write that down just to
3859
06:54:17,560 --> 06:54:22,360
let you know. But here's what's happening behind the scenes. A lot of optimization code,
3860
06:54:22,360 --> 06:54:26,200
which is beautiful. This is why we're using PyTorch, so that our code runs nice and fast.
3861
06:54:26,920 --> 06:54:33,480
Let me go there. You may also see with torch dot no grad in older pytorch code. It does
3862
06:54:33,480 --> 06:54:39,880
similar things, but inference mode is the faster way of doing things according to the thread.
3863
06:54:39,880 --> 06:54:43,320
And there's a blog post attached to that thread as well, I believe.
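(Both context managers are used the same way; inference mode is the newer, faster option. A quick comparison sketch, assuming model_0 and X_test from earlier.)

model_0.eval()                       # put the model in evaluation mode first

with torch.inference_mode():         # preferred: no gradient tracking plus extra optimizations
    y_preds = model_0(X_test)

with torch.no_grad():                # older code does a similar job like this
    y_preds = model_0(X_test)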
3864
06:54:44,120 --> 06:54:55,400
So you may also see torch dot no grad in older pytorch code, which would be valid. But again,
3865
06:54:55,400 --> 06:55:01,320
inference mode is the better way of doing things. So do forward pass. So let's get our model. We
3866
06:55:01,320 --> 06:55:06,520
want to create test predictions here. So we're going to go model zero. There's a lot of code
3867
06:55:06,520 --> 06:55:12,040
going on here, but I'm going to just step by step it in a second. We'll go back through it all.
3868
06:55:12,840 --> 06:55:17,960
And then number two is calculate the loss. Now we're doing the test predictions here,
3869
06:55:17,960 --> 06:55:24,680
calculate the loss, test predictions with model zero. So now we want to calculate, well, what we want
3870
06:55:24,680 --> 06:55:31,480
to calculate is the test loss. So this will be our loss function, the difference between the test
3871
06:55:31,480 --> 06:55:37,640
pred and the test labels. That's important. So for testing, we're working with test data,
3872
06:55:37,640 --> 06:55:43,080
for training, we're working with training data. Model learns patterns on the training data,
3873
06:55:43,080 --> 06:55:48,520
and it evaluates those patterns that it's learned, the different parameters on the testing data. It
3874
06:55:48,520 --> 06:55:53,960
has never seen before, just like in a university course, you'd study the course materials, which
3875
06:55:53,960 --> 06:55:58,680
is the training data, and you'd evaluate your knowledge on materials you'd hopefully never
3876
06:55:58,680 --> 06:56:03,800
seen before, unless you sort of were friends with your professor, and they gave you the exam before
3877
06:56:03,800 --> 06:56:08,760
the actual exam. That would be cheating, right? So that's a very important point for the test data
3878
06:56:08,760 --> 06:56:15,160
set. Don't let your model see the test data set before you evaluate it. Otherwise, you'll get
3879
06:56:15,160 --> 06:56:21,400
poor results. And then let's print out what's happening. Epoch, we're going to go epoch,
3880
06:56:21,400 --> 06:56:25,320
and then I will introduce you to my little jingle to remember all of these steps because
3881
06:56:25,320 --> 06:56:31,800
there's a lot going on. Don't you worry. I know there's a lot going on, but again, with practice,
3882
06:56:31,800 --> 06:56:40,520
we're going to know what's happening here like the back of our hand. All right.
3883
06:56:41,400 --> 06:56:49,000
So do we need this? Oh, yeah, we could say that. Oh, no, we don't need test here. Loss. This is
3884
06:56:49,000 --> 06:56:59,160
loss, not test. Print out what's happening. Okay. And we don't actually need to do this
3885
06:56:59,160 --> 06:57:06,760
every epoch. We could just go, say, if epoch modulo 10 equals zero, print out what's happening.
3886
06:57:06,760 --> 06:57:11,640
Let's do that rather than clutter everything up, print it out, and we'll print out this.
3887
06:57:12,920 --> 06:57:17,720
So let's just step through what's happening. We've got 100 epochs. That's what we're about to run,
3888
06:57:17,720 --> 06:57:22,760
100 epochs. Our model has trained for about 10 epochs so far. So it's got a good base. Maybe we'll just
3889
06:57:22,760 --> 06:57:31,480
get rid of that base. Start a new instance of our model. So we'll come right back down.
3890
06:57:33,080 --> 06:57:38,040
So our model is back to randomly initialized parameters, but of course, randomly initialized
3891
06:57:38,040 --> 06:57:44,360
flavored with a random seed of 42. Lovely, lovely. And so we've got our training code here. We've
3892
06:57:44,360 --> 06:57:49,560
discussed what's happening there. Now, we've got our testing code. We call model dot eval,
3893
06:57:49,560 --> 06:57:54,680
which turns off different settings in the model, not needed for evaluation slash testing. We call
3894
06:57:54,680 --> 06:57:59,560
with torch inference mode context manager, which turns off gradient tracking and a couple more
3895
06:57:59,560 --> 06:58:05,960
things behind the scenes to make our code faster. We do the forward pass. We do the test predictions.
3896
06:58:05,960 --> 06:58:11,160
We pass our model, the test data, the test features to calculate the test predictions.
3897
06:58:11,160 --> 06:58:15,880
Then we calculate the loss using our loss function. We can use the same loss function that we used
3898
06:58:15,880 --> 06:58:21,240
for the training data. And it's called the test loss, because it's on the test data set.
3899
06:58:21,240 --> 06:58:25,720
And then we print out what's happening, because we want to know what's happening while our
3900
06:58:25,720 --> 06:58:30,280
model's training, we don't necessarily have to do this. But the beauty of PyTorch is you can
3901
06:58:30,280 --> 06:58:35,080
use basic Python printing statements to see what's happening with your model. And so,
3902
06:58:35,080 --> 06:58:38,920
because we're doing 100 epochs, we don't want to clutter up everything here. So we'll just
3903
06:58:38,920 --> 06:58:44,680
print out what's happening every 10th epoch. Again, you can customize this as much as you like
3904
06:58:44,680 --> 06:58:49,720
what's printing out here. This is just one example. If you had other metrics here, such as calculating
3905
06:58:49,720 --> 06:58:55,000
model accuracy, we might see that later on, hint hint. We might print out our model accuracy.
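(Putting the training and testing steps together, the loop being described might be sketched like this, assuming the model, data splits, loss_fn and optimizer from earlier in the notebook.)

epochs = 100

for epoch in range(epochs):
    ### Training
    model_0.train()
    y_pred = model_0(X_train)
    loss = loss_fn(y_pred, y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ### Testing
    model_0.eval()
    with torch.inference_mode():
        test_pred = model_0(X_test)
        test_loss = loss_fn(test_pred, y_test)

    # print a status update every 10th epoch to avoid cluttering the output
    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")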
3906
06:58:55,640 --> 06:59:00,280
So this is very exciting. Are you ready to run 100 epochs? How low do you think our loss can go?
3907
06:59:02,200 --> 06:59:07,720
This loss was after about 10 epochs. So let's just save this here. Let's give it a go. Ready?
3908
06:59:07,720 --> 06:59:17,000
Three, two, one. Let's run. Oh my goodness. Look at that. Waits. Here we go. Every 10 epochs
3909
06:59:17,000 --> 06:59:22,520
we're printing out what's happening. So at epoch zero, we started with a loss of 0.312. Look at it go
3910
06:59:22,520 --> 06:59:28,920
down. Yes, that's what we want. And our weights and bias, are they moving towards our ideal weight
3911
06:59:28,920 --> 06:59:34,520
and bias values of 0.7 and 0.3? Yes, they're moving in the right direction here. The loss is
3912
06:59:34,520 --> 06:59:43,080
going down. Epoch 20, wonderful. Epoch 30, even better. 40, 50, going down, down, down. Yes,
3913
06:59:43,080 --> 06:59:48,200
this is what we want. This is what we want. Now, we're predicting a straight line here. Look how
3914
06:59:48,200 --> 06:59:54,840
low the loss gets. After 100 epochs, we've got about three times less than what we had before.
3915
06:59:55,880 --> 07:00:03,640
And then we've got these values are quite close to where they should be, 0.5629, 0.3573. We'll make
3916
07:00:03,640 --> 07:00:08,680
some predictions. What do they look like? y_preds_new? These are the original predictions
3917
07:00:08,680 --> 07:00:15,320
with random values. And if we make y_preds_new, look how close it is after 100 epochs.
3918
07:00:15,960 --> 07:00:20,920
Now, what's our, do we print out the test loss? Oh no, we're printing out loss as well.
3919
07:00:21,480 --> 07:00:25,160
Let's get rid of that. I think this is this. Yeah, that's this statement here. Our code would have
3920
07:00:25,160 --> 07:00:30,680
been much cleaner if we didn't have that, but that's all right. Life goes on. So our test loss,
3921
07:00:30,680 --> 07:00:35,400
because this is the test predictions that we're making, is not as low as our training loss.
3922
07:00:36,520 --> 07:00:42,200
I wonder how we could get that lower. What do you think we could do? What if we just trained it for
3923
07:00:42,200 --> 07:00:46,440
longer? And what would happen? How do you think you could get these red dots to line up with these
3924
07:00:46,440 --> 07:00:51,960
green dots? Do you think you could? So that's my challenge to you for the next video.
3925
07:00:51,960 --> 07:00:56,360
Think of something that you could do to get these red dots to match up with these green dots,
3926
07:00:56,360 --> 07:01:01,880
maybe train for longer. How do you think you could do that? So give that a shot. And I'll see
3927
07:01:01,880 --> 07:01:08,440
you in the next video, where we'll review what our testing code is doing. I'll see you there.
3928
07:01:10,280 --> 07:01:15,960
Welcome back. In the last video, we did something super exciting. We trained our model for 100 epochs
3929
07:01:15,960 --> 07:01:21,880
and look how good the predictions got. But I finished it off with challenging you to see if you could
3930
07:01:21,880 --> 07:01:27,800
align the red dots with the green dots. And it's okay if you're not sure of the best way to do
3931
07:01:27,800 --> 07:01:31,320
that. That's what we're here for. We're here to learn the best ways to do these things
3932
07:01:31,320 --> 07:01:36,520
together. But you might have had the idea of potentially training the model for a little bit
3933
07:01:36,520 --> 07:01:43,800
longer. So how could we do that? Well, we could just rerun this code. So the model is going to
3934
07:01:43,800 --> 07:01:49,400
remember the parameters that it has from what we've done here. And if we rerun it, well, it's going
3935
07:01:49,400 --> 07:01:55,000
to start from where it finished off, which is already pretty good for our data set. And then
3936
07:01:55,000 --> 07:01:59,880
it's going to try and improve them even more. This is, I can't stress enough, like what we are
3937
07:01:59,880 --> 07:02:04,680
doing here is going to be very similar throughout the entire rest of the course for training more
3938
07:02:04,680 --> 07:02:10,200
and more models. So this step that we've done here for training our model and evaluating it
3939
07:02:10,760 --> 07:02:18,280
is seriously one of the fundamental steps of deep learning with PyTorch: training and evaluating
3940
07:02:18,280 --> 07:02:23,800
a model. And we've just done it, albeit to predict some red dots and green dots.
3941
07:02:25,080 --> 07:02:29,960
That's all right. So let's try to line them up, hey, red dots onto green dots. I reckon if we
3942
07:02:29,960 --> 07:02:36,520
train it for another 100 epochs, we should get pretty darn close. Ready? Three, two, one. I'm
3943
07:02:36,520 --> 07:02:41,480
going to run this cell again. Runs really quick because our data's nice and simple. But
3944
07:02:41,480 --> 07:02:49,640
look at this, the loss. We started at 0.0244. Where do we get down to? 0.008. Oh my goodness. So we've
3945
07:02:49,640 --> 07:02:55,720
improved it by another three X or so. And now this is where our model has got really good.
3946
07:02:55,720 --> 07:03:03,720
On the test loss, we've gone from 0.0564. We've gone down to 0.005. So almost a 10x improvement there.
3947
07:03:04,360 --> 07:03:10,680
And so we make some more predictions. What are our model parameters? Remember the ideal ones here.
3948
07:03:10,680 --> 07:03:15,400
We won't necessarily know them in practice, but because we're working with a simple data set,
3949
07:03:15,400 --> 07:03:21,080
we know what the ideal parameters are. Model zero state dict weights. These are what they
3950
07:03:21,080 --> 07:03:27,880
previously were. What are they going to change to? Oh, would you look at that? Oh,
3951
07:03:27,880 --> 07:03:34,440
0.6990. Now, again, if yours are very slightly different to mine, don't worry too much. That is
3952
07:03:34,440 --> 07:03:39,880
the inherent randomness of machine learning and deep learning. Even though we set a manual seed,
3953
07:03:39,880 --> 07:03:46,680
it may be slightly different. The direction is more important. So if your number here is not
3954
07:03:46,680 --> 07:03:52,760
exactly what mine is, it should still be quite close to 0.7. And the same thing with this one.
3955
07:03:52,760 --> 07:03:57,880
If it's not exactly what mine is, don't worry too much. The same with all of these loss values
3956
07:03:57,880 --> 07:04:03,320
as well. The direction is more important. So we're pretty darn close. How do these predictions
3957
07:04:03,320 --> 07:04:10,440
look? Remember, these are the original ones. We started with random. And now we've trained a model.
3958
07:04:10,440 --> 07:04:16,520
So close. So close to being exactly that. So a little bit off. But that's all right. We could
3959
07:04:16,520 --> 07:04:21,880
tweak a few things to improve this. But I think that's well and truly enough for this example
3960
07:04:21,880 --> 07:04:26,680
purpose. You see what's happened. Of course, we could just create a model and set the parameters
3961
07:04:26,680 --> 07:04:30,840
ourselves manually. But where would be the fun in that? We just wrote some machine learning code
3962
07:04:30,840 --> 07:04:38,040
to do it for us with the power of back propagation and gradient descent. Now in the last video,
3963
07:04:38,040 --> 07:04:43,000
we wrote the testing loop. We discussed a few other steps here. But now let's go over it with
3964
07:04:43,000 --> 07:04:49,320
a colorful slide. Hey, because I mean, code on a page is nice, but colors are even nicer. Oh,
3965
07:04:49,880 --> 07:04:55,640
we haven't done this. We might set this up in this video too. But let's just discuss what's going on.
3966
07:04:56,280 --> 07:05:00,600
Create an empty list for storing useful values. So this is helpful for tracking model progress.
3967
07:05:00,600 --> 07:05:05,560
How about we just do this right now? Hey, we'll go here and we'll go,
3968
07:05:07,960 --> 07:05:13,480
So what did we have? Epoch count equals that. And then we'll go
3969
07:05:14,920 --> 07:05:19,320
loss values. So why do we keep track of these? It's because
3970
07:05:21,560 --> 07:05:27,240
if we want to monitor our models progress, this is called tracking experiments. So track
3971
07:05:27,240 --> 07:05:33,160
different values. If we wanted to try and improve upon our current model with a future model. So
3972
07:05:33,160 --> 07:05:39,160
our current results, such as this, if we wanted to try and improve upon it, we might build an
3973
07:05:39,160 --> 07:05:43,720
entire other model. And we might train it in a different setup. We might use a different learning
3974
07:05:43,720 --> 07:05:48,600
rate. We might use a whole bunch of different settings, but we track the values so that we
3975
07:05:48,600 --> 07:05:54,440
can compare future experiments to past experiments, like the brilliant scientists that we are.
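As a rough sketch of the bookkeeping we're about to write (using the same list names the notebook uses), it's just three plain Python lists that get appended to at each logging step inside the loop:

    # empty lists for tracking experiment values over time
    epoch_count = []
    loss_values = []
    test_loss_values = []

    # then, inside the loop, at each logging step:
    # epoch_count.append(epoch)           # which epoch we're up to
    # loss_values.append(loss)            # current training loss
    # test_loss_values.append(test_loss)  # current test loss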
3976
07:05:54,440 --> 07:06:02,920
And so where could we use these lists? Well, we're calculating the loss here. And we're calculating
3977
07:06:02,920 --> 07:06:13,000
the test loss here. So maybe each time we append what's going on here as we do a status update.
3978
07:06:13,000 --> 07:06:24,360
So epoch count dot append, and we're going to go a current epoch. And then we'll go loss values
3979
07:06:24,360 --> 07:06:34,520
dot append, a current loss value. And then we'll do test loss values dot append, the current test
3980
07:06:34,520 --> 07:06:41,960
loss values. Wonderful. And now let's re-instantiate our model so that it starts from fresh. So this
3981
07:06:41,960 --> 07:06:46,120
is just creating another instance. So we're just going to re-initialize our model parameters to
3982
07:06:46,120 --> 07:06:50,360
start from zero. If we wanted to, we could functionize all of this so we don't have to
3983
07:06:50,360 --> 07:06:55,320
go right back up to the top of the code. But just for demo purposes, we're doing it how we're doing
3984
07:06:55,320 --> 07:07:00,440
it. And I'm going to run this for let's say 200 epochs, because that's what we ended up doing,
3985
07:07:00,440 --> 07:07:06,520
right? We ran it for 200 epochs, because we did 100 epochs twice. And I want to show you something
3986
07:07:06,520 --> 07:07:10,520
beautiful, one of the most beautiful sights in machine learning. So there we go, we run it for
3987
07:07:10,520 --> 07:07:17,000
200 epochs, we start with a fairly high training loss value and a fairly high test loss value. So
3988
07:07:17,000 --> 07:07:23,000
remember, what is our loss value? It's MAE. So if we go back, yeah, this is what we're measuring
3989
07:07:23,000 --> 07:07:30,120
for loss. So this means for the test loss on average, each of our dot points here, the red
3990
07:07:30,120 --> 07:07:38,920
predictions are 0.481 away. That's the average distance between the dot points. And then ideally, what
3991
07:07:38,920 --> 07:07:45,320
are we doing? We're trying to minimize this distance. That's the MAE. So the mean absolute error.
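As a quick aside on what that number means, MAE is just the mean of the absolute differences between predictions and targets. A tiny self-contained sketch with made-up numbers (not the notebook's data):

    import torch
    from torch import nn

    y_true = torch.tensor([0.3, 0.5, 0.7])     # hypothetical targets
    y_pred = torch.tensor([0.25, 0.55, 0.75])  # hypothetical predictions

    loss_fn = nn.L1Loss()           # L1Loss is MAE in PyTorch
    print(loss_fn(y_pred, y_true))  # tensor(0.0500), same as torch.mean(torch.abs(y_pred - y_true))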
3992
07:07:45,320 --> 07:07:51,640
And we get it right down to 0.05. And if we make predictions, what do we have here, we get very
3993
07:07:51,640 --> 07:07:58,520
close to the ideal weight and bias, make our predictions, have a look at the new predictions.
3994
07:07:58,520 --> 07:08:02,200
Yeah, very small distance here. Beautiful. That's a low loss value.
3995
07:08:03,000 --> 07:08:08,680
Ideally, they'd line up, but we've got as close as we can for now. So this is one of the most
3996
07:08:08,680 --> 07:08:15,400
beautiful sights in machine learning. So plot the loss curves. So let's make a plot, because what
3997
07:08:15,400 --> 07:08:23,560
we're doing, we were tracking the value of epoch count, loss values and test loss values.
3998
07:08:24,440 --> 07:08:31,000
Let's have a look at what these all look like. So epoch count goes up, loss values ideally go down.
3999
07:08:31,560 --> 07:08:36,920
So we'll get rid of that. We're going to create a plot p l t dot plot. We're going to step back
4000
07:08:36,920 --> 07:08:45,640
through the test loop in a second with some colorful slides, label equals train loss.
4001
07:08:48,120 --> 07:08:53,560
And then we're going to go plot. You might be able to tell what's going on here. Test loss
4002
07:08:53,560 --> 07:09:00,040
values. We're going to visualize it, because that's the data explorer's motto, right, is visualize,
4003
07:09:00,040 --> 07:09:06,680
visualize, visualize. This is equals. See, Colab does this autocorrect. That doesn't really work
4004
07:09:06,680 --> 07:09:13,800
very well. And I don't know when it does it and when it doesn't. And, oh no, we didn't,
4005
07:09:13,800 --> 07:09:17,000
we didn't say loss values. So that's a good autocorrect. Thank you, Colab.
4006
07:09:18,920 --> 07:09:24,920
So, training loss and test loss curves. So this is another term you're going to come across
4007
07:09:24,920 --> 07:09:30,280
often is a loss curve. Now you might be able to think about a loss curve. If we're doing a loss
4008
07:09:30,280 --> 07:09:34,760
curve, and it's starting at the start of training, what do we want that curve to do?
4009
07:09:36,360 --> 07:09:42,760
What do we want our loss value to do? We want it to go down. So what should an ideal loss
4010
07:09:42,760 --> 07:09:47,160
curve look like? Well, we're about to see a couple. Let's have a look. Oh, what do we got wrong?
4011
07:09:47,160 --> 07:09:57,160
Well, we need to, I'll turn it into NumPy. Is this what we're getting wrong? So why is this wrong?
4012
07:09:58,920 --> 07:10:02,840
Loss values. Why are we getting an issue? Test loss values.
4013
07:10:05,480 --> 07:10:13,240
Ah, it's because they're all tensor values. So I think we should, let's,
4014
07:10:13,240 --> 07:10:22,200
I might change this to NumPy. Oh, can I just do that? If I just call this as a NumPy array,
4015
07:10:22,200 --> 07:10:28,280
we're going to try and fix this on the fly, people. NumPy array, we'll just turn this into a NumPy
4016
07:10:28,280 --> 07:10:37,320
array. Let's see if we get NumPy. I'm figuring these things out together. Import NumPy as np,
4017
07:10:37,320 --> 07:10:44,280
because matplotlib works with NumPy. Yeah, there we go. So can we do loss values? Maybe
4018
07:10:45,720 --> 07:10:51,160
I'm going to try one thing, torch dot tensor, loss values, and then call
4019
07:10:53,720 --> 07:10:58,520
CPU dot NumPy. See what happens here.
4020
07:10:58,520 --> 07:11:06,440
There we go. Okay, so let's just copy this. So what we're doing here is
4021
07:11:06,440 --> 07:11:13,240
our loss values are still in PyTorch, and they can't be, because matplotlib works with
4022
07:11:14,360 --> 07:11:19,560
NumPy. And so what we're doing here is we're converting our loss values of the training loss
4023
07:11:19,560 --> 07:11:25,240
to NumPy. And if you recall from the fundamentals section, we call CPU and NumPy. I wonder if we
4024
07:11:25,240 --> 07:11:31,240
can just do straight up NumPy, because we're not working on the GPU. Yeah, okay, we don't need
4025
07:11:31,240 --> 07:11:36,120
CPU because we're not working on the GPU yet, but we might need that later on. Will this work?
4026
07:11:36,120 --> 07:11:40,920
Beautiful. There we go. One of the most beautiful sights in machine learning is a declining loss
4027
07:11:40,920 --> 07:11:47,640
curve. So this is how we keep track of our experiments, or one way, quite rudimentary. We'd like to
4028
07:11:47,640 --> 07:11:53,320
automate this later on. But I'm just showing you one way to keep track of what's happening.
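Putting that plotting step together in one place, here's a minimal sketch. The tracked values are stand-ins here; in the notebook they come from the lists filled in during training. Since matplotlib works with NumPy, the tensors get converted first (the .cpu() call matters once values live on a GPU):

    import torch
    import matplotlib.pyplot as plt

    # stand-in tracked values (the real ones come from the training loop)
    epoch_count = [0, 100, 200]
    loss_values = [torch.tensor(0.31), torch.tensor(0.02), torch.tensor(0.008)]
    test_loss_values = [torch.tensor(0.48), torch.tensor(0.05), torch.tensor(0.005)]

    # convert lists of tensors to NumPy so matplotlib can plot them
    train_losses = torch.tensor(loss_values).cpu().numpy()
    test_losses = torch.tensor(test_loss_values).cpu().numpy()

    plt.plot(epoch_count, train_losses, label="Train loss")
    plt.plot(epoch_count, test_losses, label="Test loss")
    plt.title("Training and test loss curves")
    plt.ylabel("Loss")
    plt.xlabel("Epochs")
    plt.legend()
    plt.show()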
4029
07:11:53,320 --> 07:11:58,280
So the training loss curve is going down here. The training loss starts at 0.3, and then it goes
4030
07:11:58,280 --> 07:12:03,480
right down. The beautiful thing is they match up. If there was too big a distance between the
4031
07:12:03,480 --> 07:12:09,080
train loss and the test loss, then we're running into some problems. But if they
4032
07:12:09,080 --> 07:12:14,440
match up closely at some point, that means our model is converging and the loss is getting as
4033
07:12:14,440 --> 07:12:20,040
close to zero as it possibly can. If we trained for longer, maybe the loss will go almost basically
4034
07:12:20,040 --> 07:12:25,480
to zero. But that's an experiment I'll leave you to try to train that model for longer.
4035
07:12:25,480 --> 07:12:32,680
Let's just step back through our testing loop to finish off this video. So we did that. We created
4036
07:12:32,680 --> 07:12:37,800
empty lists for storing useful values. Told the
4037
07:12:37,800 --> 07:12:43,080
model that we want to evaluate. So we put it in evaluation mode.
4038
07:12:43,080 --> 07:12:47,400
This turns off functionality used for training but not needed for evaluation, such as dropout and batch
4039
07:12:47,400 --> 07:12:51,720
normalization layers. If you want to learn more about them, you can look them up in the documentation.
4040
07:12:52,680 --> 07:12:58,520
Turn on torch inference mode. So this is for faster performance. So we don't necessarily need this,
4041
07:12:58,520 --> 07:13:03,320
but it's good practice. So I'm going to say that yes, turn on torch inference mode. So this
4042
07:13:03,320 --> 07:13:08,120
disables functionality such as gradient tracking for inference. Gradient tracking is not needed
4043
07:13:08,120 --> 07:13:14,440
for inference only for training. Now we pass the test data through the model. So this will call
4044
07:13:14,440 --> 07:13:19,080
the model's implemented forward method. The forward pass is the exact same as what we did in the
4045
07:13:19,080 --> 07:13:25,560
training loop, except we're doing it on the test data. So the big thing to note there: training loop,
4046
07:13:25,560 --> 07:13:32,280
training data, testing loop, testing data. Then we calculate the test loss value,
4047
07:13:32,280 --> 07:13:37,240
how wrong the models predictions are on the test data set. And of course, lower is better.
4048
07:13:37,800 --> 07:13:43,320
And finally, we print out what's happening. So we can keep track of what's going on during
4049
07:13:43,320 --> 07:13:47,720
training. We don't necessarily have to do this. You can customize this print value to print out
4050
07:13:47,720 --> 07:13:54,520
almost whatever you want, because PyTorch basically interacts very beautifully with pure
4051
07:13:54,520 --> 07:14:00,600
Python. And then we keep track of the values of what's going on on epochs and train loss and test
4052
07:14:00,600 --> 07:14:06,120
loss. We could keep track of other values here. But for now, we're just going, okay, what's the loss
4053
07:14:06,120 --> 07:14:11,800
value at a particular epoch for the training set? And for the test set. And of course, all of this
4054
07:14:11,800 --> 07:14:16,200
could be put into a function. And that way we won't have to remember these steps off by heart.
4055
07:14:16,200 --> 07:14:21,640
But the reason why we've spent so much time on this is because we're going to be using this
4056
07:14:21,640 --> 07:14:26,040
training and test functionality for all of the models that we build throughout this course.
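Condensed into one place, the pattern being described looks roughly like this. It's a sketch that assumes model_0, loss_fn, optimizer and the X/y train and test tensors from the earlier cells already exist:

    epochs = 200

    for epoch in range(epochs):
        ### Training
        model_0.train()                   # training mode
        y_pred = model_0(X_train)         # 1. forward pass
        loss = loss_fn(y_pred, y_train)   # 2. calculate the loss
        optimizer.zero_grad()             # 3. zero the gradients
        loss.backward()                   # 4. backpropagation
        optimizer.step()                  # 5. gradient descent step

        ### Testing
        model_0.eval()                    # turn off training-only behaviour
        with torch.inference_mode():      # no gradient tracking needed for inference
            test_pred = model_0(X_test)
            test_loss = loss_fn(test_pred, y_test)

        if epoch % 10 == 0:               # print and track what's happening
            epoch_count.append(epoch)
            loss_values.append(loss)
            test_loss_values.append(test_loss)
            print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")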
4057
07:14:26,600 --> 07:14:31,880
So give yourself a pat on the back for getting through all of these videos. We've written a lot
4058
07:14:31,880 --> 07:14:37,000
of code. We've discussed a lot of steps. But if you'd like a song to remember what's happening,
4059
07:14:37,000 --> 07:14:43,160
let's finish this video off with my unofficial PyTorch optimization loop song.
4060
07:14:43,160 --> 07:14:51,320
So for an epoch in a range, go model dot train, do the forward pass, calculate the loss, optimizer
4061
07:14:51,320 --> 07:14:59,000
zero grad, loss backward, optimizer step, step, step. No, you only have to call this once.
4062
07:14:59,000 --> 07:15:05,720
But now let's test, go model dot eval with torch inference mode, do the forward pass,
4063
07:15:05,720 --> 07:15:11,160
calculate the loss. And then the real song goes for another epoch because you keep going back
4064
07:15:11,160 --> 07:15:18,840
through. But we finish off with printing out what's happening. And then of course, we evaluate what's
4065
07:15:18,840 --> 07:15:23,880
going on. With that being said, it's time to move on to another thing. But if you'd like to review
4066
07:15:23,880 --> 07:15:29,640
what's happening, please, please, please try to run this code for yourself again and check out the
4067
07:15:29,640 --> 07:15:35,640
slides and also check out the extra curriculum. Oh, by the way, if you want a link to all
4068
07:15:35,640 --> 07:15:41,320
of the extra curriculum, just go to the book version of the course. And it's all going to be in here.
4069
07:15:41,880 --> 07:15:47,960
So that's there ready to go. Everything I link as extra curriculum will be in the extra curriculum
4070
07:15:47,960 --> 07:15:57,400
of each chapter. I'll see you in the next video. Welcome back. In the last video, we saw how to
4071
07:15:57,400 --> 07:16:03,560
train our model and evaluate it by not only looking at the loss metrics and the loss curves,
4072
07:16:03,560 --> 07:16:07,640
but we also plotted our predictions and we compared them. Hey, have a look at these random
4073
07:16:07,640 --> 07:16:12,520
predictions. Quite terrible. But then we trained a model using the power of back propagation and
4074
07:16:12,520 --> 07:16:17,880
gradient descent. And now look at our predictions. They're almost exactly where we want them to be.
4075
07:16:18,520 --> 07:16:22,040
And so you might be thinking, well, we've trained this model and it took us a while to
4076
07:16:22,040 --> 07:16:28,040
write all this code to get some good predictions. How might we run that model again? So I took
4077
07:16:28,040 --> 07:16:33,160
a little break after the last video, but now I've come back and you might notice that my Google
4078
07:16:33,160 --> 07:16:40,520
Colab notebook has disconnected. So what does this mean if I was to run this? Is it going to work?
4079
07:16:41,080 --> 07:16:47,480
I'm going to connect to a new Google Colab instance. But will we have all of the code that we've run
4080
07:16:47,480 --> 07:16:53,160
above? You might have already experienced this if you took a break before and came back to the
4081
07:16:53,160 --> 07:16:59,000
videos. Ah, so plot predictions is no longer defined. And do you know what that means? That
4082
07:16:59,000 --> 07:17:04,280
means that our model is also no longer defined. So we would have lost our model. We would have
4083
07:17:04,280 --> 07:17:10,040
lost all of that effort of training. Now, luckily, we didn't train the model for too long. So we can
4084
07:17:10,040 --> 07:17:16,120
just go run time, run all. And it's going to rerun all of the previous cells and be quite quick.
4085
07:17:17,160 --> 07:17:21,800
Because we're working with a small data set and using a small model. But we've been through all
4086
07:17:21,800 --> 07:17:26,680
of this code. Oh, what have we got wrong here? Model zero state dict. Well, that's all right.
4087
07:17:26,680 --> 07:17:31,720
This is good. We're finding errors. So if you want to as well, you can just go run after. It's going
4088
07:17:31,720 --> 07:17:38,600
to run all of the cells after. Beautiful. And we come back down. There's our model training.
4089
07:17:38,600 --> 07:17:42,840
We're getting very similar values to what we got before. There's the loss curves. Beautiful.
4090
07:17:42,840 --> 07:17:47,480
Still going. Okay. Now our predictions are back because we've rerun all the cells and we've got
4091
07:17:47,480 --> 07:17:58,200
our model here. So what we might cover in this video is saving a model in PyTorch. Because if
4092
07:17:58,200 --> 07:18:03,320
we're training a model and you get to a certain point, especially when you have a larger model,
4093
07:18:03,320 --> 07:18:09,000
you probably want to save it and then reuse it in this particular notebook itself. Or you might
4094
07:18:09,000 --> 07:18:13,400
want to save it somewhere and send it to your friend so that your friend can try it out. Or you
4095
07:18:13,400 --> 07:18:18,680
might want to use it in a week's time. And if Google Colab is disconnected, you might want to
4096
07:18:18,680 --> 07:18:24,440
be able to load it back in somehow. So now let's see how we can save our models in PyTorch. So
4097
07:18:25,960 --> 07:18:32,520
I'm going to write down here. There are three main methods you should know about
4098
07:18:34,360 --> 07:18:40,520
for saving and loading models in PyTorch because of course with saving comes loading. So we're
4099
07:18:40,520 --> 07:18:47,800
going to over the next two videos discuss saving and loading. So one is torch.save. And as you might
4100
07:18:47,800 --> 07:19:01,560
guess, this allows you to save a PyTorch object in Python's pickle format. So you may or may not
4101
07:19:01,560 --> 07:19:09,880
be aware of Python pickle. There we go. Python object serialization. There we go. So we've got
4102
07:19:09,880 --> 07:19:15,480
the pickle module implements binary protocols for serializing
4103
07:19:15,480 --> 07:19:21,800
and deserializing a Python object. So serializing, as I understand it, means saving, and deserializing
4104
07:19:21,800 --> 07:19:28,200
means loading. So this is what PyTorch uses behind the scenes, and it comes from pure Python.
4105
07:19:28,920 --> 07:19:35,080
So if we go back here in Python's pickle format, number two is torch.load, which you might be able
4106
07:19:35,080 --> 07:19:44,040
to guess what that does as well, allows you to load a saved PyTorch object. And number three is
4107
07:19:44,040 --> 07:19:53,640
also very important, is torch.nn.Module.load_state_dict. Now what does this allow you to do? Well,
4108
07:19:54,200 --> 07:20:02,760
this allows you to load a model's saved dictionary, or saved state dictionary. Yeah, that's what we'll
4109
07:20:02,760 --> 07:20:08,360
call it: saved state dictionary. Beautiful. And what's the model state dict? Well, let's have a look,
4110
07:20:08,360 --> 07:20:14,600
model zero dot state dict. The beauty of PyTorch is that it stores a lot of your model's important
4111
07:20:14,600 --> 07:20:21,080
parameters in just a simple Python dictionary. Now it might not be that simple because our model,
4112
07:20:21,080 --> 07:20:25,960
again, only has two parameters. In the future, you may be working with models with millions of
4113
07:20:25,960 --> 07:20:32,520
parameters. So looking directly at the state dict may not be as simple as what we've got here.
4114
07:20:32,520 --> 07:20:39,800
But the principle is still the same. It's still a dictionary that holds the state of your model.
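To make that concrete, a small self-contained sketch (a throwaway nn.Linear model, not the notebook's model) showing that both a model and an optimizer expose a state_dict():

    import torch
    from torch import nn

    model = nn.Linear(in_features=1, out_features=1)                 # throwaway model
    optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)

    print(model.state_dict())
    # OrderedDict([('weight', tensor([[...]])), ('bias', tensor([...]))])

    print(optimizer.state_dict())
    # {'state': {}, 'param_groups': [{'lr': 0.01, 'momentum': 0, ...}]}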
4115
07:20:39,800 --> 07:20:43,960
And so these three methods, I want to show you where they're from, because this is going to be
4116
07:20:43,960 --> 07:20:49,800
your extra curriculum, save and load models, your extra curriculum for this video.
4117
07:20:50,920 --> 07:20:56,760
If we go into here, this is a very, very, very important piece of PyTorch documentation,
4118
07:20:56,760 --> 07:21:02,200
or maybe even a tutorial. So your extra curriculum for this video is to go through it.
4119
07:21:02,200 --> 07:21:07,080
Here we go. We've got torch.save, torch.load, and torch.nn.Module.load
4120
07:21:07,080 --> 07:21:12,200
state dict. That's where I've got the three things that we've just written down. And there's a fair
4121
07:21:12,200 --> 07:21:17,160
few different pieces of information. So what is a state dict? So in PyTorch, the learnable
4122
07:21:17,160 --> 07:21:22,280
parameters, i.e. the weights and biases of a torch.nn.Module, which is our model.
4123
07:21:22,280 --> 07:21:27,800
Remember, our model subclasses nn.Module. They're contained in the model's parameters, accessed
4124
07:21:27,800 --> 07:21:33,960
with model.parameters. A state dict is simply a Python dictionary object that maps each layer
4125
07:21:33,960 --> 07:21:39,320
to its parameter tensor. That's what we've seen. And so then if we define a model,
4126
07:21:39,320 --> 07:21:44,040
we can initialize the model. And if we wanted to print the state dict, we can use that.
4127
07:21:44,040 --> 07:21:49,000
The optimizer also has a state dict. So that's something to be aware of. You can go optimizer.state
4128
07:21:49,000 --> 07:21:55,800
dict. And then you get an output here. And this is the saving and loading a model for inference section. So
4129
07:21:55,800 --> 07:22:00,040
inference, again, is making a prediction. That's probably what we want to do in the future at some
4130
07:22:00,040 --> 07:22:05,160
point. For now, we've made predictions right within our notebook. But if we wanted to use our model
4131
07:22:05,160 --> 07:22:11,320
outside of our notebook, say in an application, or in another notebook that's not this one,
4132
07:22:11,320 --> 07:22:16,200
you'll want to know how to save and load it. So the recommended way of saving and loading a
4133
07:22:16,200 --> 07:22:21,480
PyTorch model is by saving its state dict. Now, there is another method down here,
4134
07:22:21,480 --> 07:22:27,320
which is saving and loading the entire model. So your extracurricular for this lesson,
4135
07:22:29,320 --> 07:22:33,800
we're going to go through the code to do this. But your extracurricular is to read all of the
4136
07:22:34,840 --> 07:22:40,040
sections in here, and then figure out what the pros and cons are of saving and loading the entire
4137
07:22:40,040 --> 07:22:46,280
model versus saving and loading just the state dict. So that's a challenge for you for this video.
4138
07:22:46,280 --> 07:22:50,120
I'm going to link this in here. And now let's write some code to save our model.
4139
07:22:50,120 --> 07:23:07,320
So PyTorch save and load code. Code tutorial plus extracurricular. So if we go
4140
07:23:10,040 --> 07:23:16,840
saving our PyTorch model. So what might we want? What do you think the save parameter takes?
4141
07:23:16,840 --> 07:23:24,440
If we have torch.save, what do you think it takes inside it? Well, let's find out together.
4142
07:23:24,440 --> 07:23:29,480
Hey, so let's import pathlib. We're going to see why in a second. This is Python's
4143
07:23:29,480 --> 07:23:35,000
module for dealing with writing file paths. So if we wanted to save something, well, this is Google
4144
07:23:35,000 --> 07:23:41,160
Colab's file section over here. But just remember, if we do save this from within Google Colab,
4145
07:23:41,160 --> 07:23:48,040
the model will disappear if our Google Colab notebook instance disconnects. So I'll show you
4146
07:23:48,040 --> 07:23:55,880
how to download it from Google Colab if you want. Google Colab also has a way, save from
4147
07:23:57,080 --> 07:24:02,760
Google Colab to Google Drive, to save it to your Google Drive if you wanted to. But I'll leave you
4148
07:24:02,760 --> 07:24:09,000
to look at that on your own if you like. So we're first going to create a model directory.
4149
07:24:09,000 --> 07:24:15,640
So create models directory. So this is going to help us create a folder over here called models.
4150
07:24:15,640 --> 07:24:21,480
And of course, we could create this by hand by adding a new folder here somewhere. But I like
4151
07:24:21,480 --> 07:24:28,440
to do it with code. So model path, we're going to set this to path, which is using the path library
4152
07:24:28,440 --> 07:24:34,920
here to create us a path called models. Simple. We're just going to save all of our models to
4153
07:24:34,920 --> 07:24:41,400
the models directory. And then with model path, we're going to make that
4154
07:24:41,400 --> 07:24:47,880
directory: model path dot mkdir, for make directory. We're going to set parents equals true.
4155
07:24:49,000 --> 07:24:53,880
And we're also going to set exist okay equals true. That means if it already exists,
4156
07:24:53,880 --> 07:24:59,160
it won't throw us an error. It will try to create it. But if it already exists, it'll just recreate
4157
07:24:59,160 --> 07:25:04,040
the parent directory, or it'll leave it there. It won't error out on us. We're also going to
4158
07:25:04,040 --> 07:25:10,600
create a model save path. This way, we can give our model a name. Right now, it's just model zero.
4159
07:25:12,520 --> 07:25:17,560
We want to save it under some name to the models directory. So let's create the model name.
4160
07:25:18,600 --> 07:25:25,720
Model name equals 01. I'm going to call it 01 for the section. That way, if we have more models
4161
07:25:25,720 --> 07:25:30,120
later on in the course, we know which ones come from where. You might create your own naming
4162
07:25:30,120 --> 07:25:37,320
convention: 01 pytorch workflow model zero dot pth. And now this is another
4163
07:25:37,320 --> 07:25:46,280
important point. PyTorch objects usually have the extension dot pt, for PyTorch, or dot pth.
4164
07:25:46,280 --> 07:25:52,360
So if we go in here, and if we look up dot pth, yeah, a common convention is to save models
4165
07:25:52,360 --> 07:25:58,600
using either a dot pt or dot pth file extension. I'll let you choose which one you like. I like
4166
07:25:58,600 --> 07:26:05,480
dot pth. So if we go down here dot pth, they both result in the same thing. You just have to remember
4167
07:26:05,480 --> 07:26:11,960
to make sure you write the right loading path and right saving path. So now we're going to create
4168
07:26:11,960 --> 07:26:17,240
our model save path, which is going to be our model path. And because we're using pathlib,
4169
07:26:17,240 --> 07:26:23,880
we can use this syntax that we've got here, model path slash model name. And then if we just print out
4170
07:26:23,880 --> 07:26:32,760
model save path, what does this look like? There we go. So it creates a PosixPath
4171
07:26:32,760 --> 07:26:40,760
using the pathlib library of models slash 01 pytorch workflow model zero dot pth. We haven't
4172
07:26:40,760 --> 07:26:45,560
saved our model there yet. It's just got the path that we want to save our model ready. So if we
4173
07:26:45,560 --> 07:26:52,520
refresh this, we've got models over here. Do we have anything in there? No, we don't yet. So now
4174
07:26:52,520 --> 07:26:59,240
is our step to save the model. So three is save the model state dict. Why are we saving the state
4175
07:26:59,240 --> 07:27:04,440
dict? Because that's the recommended way of doing things. If we come up here, saving and loading the
4176
07:27:04,440 --> 07:27:09,400
model for inference, save and load the state dict, which is recommended. We could also save the entire
4177
07:27:09,400 --> 07:27:15,240
model. But that's part of your extra curriculum to look into that. So let's use some syntax. It's
4178
07:27:15,240 --> 07:27:20,200
quite like this torch dot save. And then we pass it an object. And we pass it a path of where to
4179
07:27:20,200 --> 07:27:24,760
save it. We already have a path. And good thing is we already have a model. So we just have to call
4180
07:27:24,760 --> 07:27:36,360
this. Let's try it out. So let's go print f saving model to and we'll put in the path here.
4181
07:27:37,640 --> 07:27:44,040
Model save path. I like to print out some things here and there that way. We know what's going on.
4182
07:27:44,040 --> 07:27:51,880
And I don't need that capital. Why do I? Getting a little bit trigger happy here with the typing.
4183
07:27:51,880 --> 07:27:57,480
So torch dot save. And we're going to pass in the object parameter here. And if we looked up torch
4184
07:27:57,480 --> 07:28:07,480
save, we can go. What does this code take? So torch save object f. What is f? A file like object.
4185
07:28:07,480 --> 07:28:13,800
Okay. Or a string or OS path like object. Beautiful. That's what we've got. A path like
4186
07:28:13,800 --> 07:28:21,480
object containing a file name. So let's jump back into here. The object is what? It's our model zero
4187
07:28:21,480 --> 07:28:29,320
dot state dict. That's what we're saving. And then the file path is model save path. You ready?
4188
07:28:29,320 --> 07:28:35,800
Let's run this and see what happens. Beautiful. Saving model to models. So it's our model path.
4189
07:28:35,800 --> 07:28:39,480
And there's our model there. So if we refresh this, what do we have over here?
4190
07:28:39,480 --> 07:28:44,680
Wonderful. We've saved our trained model. So that means we could potentially if we wanted to,
4191
07:28:44,680 --> 07:28:49,480
you could download this file here. That's going to download it from Google Colab to your local
4192
07:28:49,480 --> 07:28:56,440
machine. That's one way to do it. But there's also a guide here to save from Google Colaboratory
4193
07:28:56,440 --> 07:29:01,160
to Google Drive. That way you could use it later on. So there's many different ways.
4194
07:29:01,160 --> 07:29:06,440
The beauty of PyTorch is its flexibility. So now we've got a saved model. But let's just check
4195
07:29:06,440 --> 07:29:15,080
using our LS command. We're going to check models. Yeah, let's just check models. This is going to
4196
07:29:15,080 --> 07:29:23,800
check here. So this is list. Wonderful. There's our 01 pytorch workflow model zero dot pth. Now,
4197
07:29:23,800 --> 07:29:29,480
of course, we've saved a model. How about we try loading it back in and seeing how it works. So if
4198
07:29:29,480 --> 07:29:35,160
you want a challenge, read ahead in the documentation and try to use torch dot load to bring our model
4199
07:29:35,160 --> 07:29:42,520
back in. See what happens. I'll see you in the next video. Welcome back. In the last video, we wrote
4200
07:29:42,520 --> 07:29:47,320
some code here to save our PyTorch model. I'm just going to exit out of a couple of things
4201
07:29:47,320 --> 07:29:53,080
that we don't need just to clear up the screen. And now we've got our dot pth file, because remember
4202
07:29:53,080 --> 07:29:58,840
dot pt or dot pth is a common convention for saving a PyTorch model. We've got it saved there,
4203
07:29:58,840 --> 07:30:03,640
and we didn't necessarily have to write all of this path style code. But this is just handy for
4204
07:30:03,640 --> 07:30:11,000
later on if we wanted to functionize this and create it in, say, a save dot py file over here,
4205
07:30:11,000 --> 07:30:16,280
so that we could just call our save function and pass it in a file path where we wanted to save
4206
07:30:16,280 --> 07:30:21,480
like a directory and a name, and then it'll save it exactly how we want it for later on.
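As a sketch of what that hypothetical helper could look like (save_model is a name invented here, not something PyTorch provides), wrapping the pathlib and torch.save steps from the previous video:

    from pathlib import Path

    import torch

    def save_model(model: torch.nn.Module, target_dir: str, model_name: str):
        """Saves a model's state_dict to target_dir/model_name (hypothetical helper)."""
        target_dir_path = Path(target_dir)
        target_dir_path.mkdir(parents=True, exist_ok=True)  # create the folder if needed

        assert model_name.endswith(".pt") or model_name.endswith(".pth"), \
            "model_name should end with '.pt' or '.pth'"
        model_save_path = target_dir_path / model_name

        print(f"Saving model to: {model_save_path}")
        torch.save(obj=model.state_dict(), f=model_save_path)

    # usage, assuming model_0 from the earlier cells:
    # save_model(model_0, target_dir="models", model_name="01_pytorch_workflow_model_0.pth")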
4207
07:30:22,280 --> 07:30:27,960
But now we've got a saved model. I issued a challenge of trying to load that model in.
4208
07:30:27,960 --> 07:30:33,880
So do we have torch dot load in here? Did you try that out? We've got, oh, we've got a few options
4209
07:30:33,880 --> 07:30:39,720
here. Wonderful. But we're using one of the first ones. So let's go back up here. If we wanted to
4210
07:30:39,720 --> 07:30:46,440
check the documentation for torch dot load, we've got this option here, load. What happens? Loads
4211
07:30:46,440 --> 07:30:52,600
an object saved with torch dot save from a file. Torch dot load uses Python's unpickling
4212
07:30:52,600 --> 07:30:59,080
facilities, but treats storages, which underlie tensors, specially. They are first deserialized
4213
07:30:59,080 --> 07:31:05,720
on the CPU and are then moved to the device they were saved from. Wonderful. So this is moved to the
4214
07:31:05,720 --> 07:31:11,160
device. Later on, when we're using a GPU, this is just something to keep in mind. We'll see that
4215
07:31:11,160 --> 07:31:17,240
when we start to use a CPU and a GPU. But for now, let's practice using the torch dot load method
4216
07:31:17,240 --> 07:31:24,200
and see how we can do it. So we'll come back here and we'll go loading a pytorch model.
4217
07:31:24,760 --> 07:31:31,560
And since we, I'm just going to start writing here, since we saved our model's state dict,
4218
07:31:32,360 --> 07:31:36,200
so just the dictionary of parameters from a model, rather than
4219
07:31:36,200 --> 07:31:47,320
the entire model, we'll create a new instance of our model class and load the state dict,
4220
07:31:49,480 --> 07:31:54,040
load the saved state dict (that's better) into that.
4221
07:31:55,480 --> 07:32:01,560
Now, this is just words on a page. Let's see this in action. So to load in a state dict,
4222
07:32:01,560 --> 07:32:05,160
which is what we saved; we didn't save the entire model itself, which is one option.
4223
07:32:05,160 --> 07:32:11,480
That's extra curriculum, but we saved just the model's state dict. So if we remind ourselves what
4224
07:32:11,480 --> 07:32:17,160
model zero dot state dict looks like, we saved just this. So to load this in, we have to
4225
07:32:20,360 --> 07:32:27,800
instantiate a new class or a new instance of our linear regression model class. So to load in a
4226
07:32:27,800 --> 07:32:40,120
saved state dict, we have to instantiate a new instance of our model class. So let's call this
4227
07:32:40,120 --> 07:32:46,120
loaded model zero. I like that. That way we can differentiate because it's still going to be the
4228
07:32:46,120 --> 07:32:52,120
same parameters as model zero, but this way we know that this instance is the loaded version,
4229
07:32:52,120 --> 07:32:57,320
not just the version we've been training before. So we'll create a new version of it here,
4230
07:32:57,320 --> 07:33:02,920
linear regression model. This is just the code that we wrote above, linear regression model.
4231
07:33:03,480 --> 07:33:14,040
And then we're going to load the saved state dict of model zero. And so this will update the new
4232
07:33:14,040 --> 07:33:23,080
instance with updated parameters. So let's just check before we load it, we haven't written any
4233
07:33:23,080 --> 07:33:27,480
code to actually load anything. What does loaded model zero look like? What does the state dict look like here?
4234
07:33:28,520 --> 07:33:31,080
It won't have anything. It'll be initialized with what?
4235
07:33:31,960 --> 07:33:38,600
Oh, loaded. That's what I called it, loaded. See how it's initialized with random parameters.
4236
07:33:38,600 --> 07:33:43,960
So essentially all we're doing when we load a state dictionary into our new instance of our
4237
07:33:43,960 --> 07:33:49,880
model is that we're going, hey, take the saved state dict from this model and plug it into this.
4238
07:33:49,880 --> 07:33:56,200
So let's see what happens when we do that. So loaded model zero. Remember how I said there's
4239
07:33:56,200 --> 07:34:03,240
a method to also be aware of up here, which is torch nn module dot load state dict. And because
4240
07:34:03,240 --> 07:34:09,800
our model is, what? It's a subclass of torch dot nn dot module. So we can call load state dict
4241
07:34:09,800 --> 07:34:16,920
on our model directly or on our instance. So recall linear regression model is a subclass
4242
07:34:16,920 --> 07:34:24,280
of nn dot module. So let's call load state dict. And this is where we call the torch dot load
4243
07:34:24,280 --> 07:34:31,720
method. And then we pass it the model save path. Is that what we call it? Because torch dot load,
4244
07:34:31,720 --> 07:34:39,560
it takes in f. So what's f? A file-like object, or a string or OS path-like object. So that's
4245
07:34:39,560 --> 07:34:45,720
why we created this path like object up here. Model save path. So all we're doing here,
4246
07:34:45,720 --> 07:34:51,000
we're creating a new instance, linear regression model, which is a subclass of nn dot module.
4247
07:34:51,000 --> 07:34:58,520
And then on that instance, we're calling load state dict with torch dot load of the model save path.
4248
07:34:58,520 --> 07:35:04,120
Because what's saved at the model save path? Our previous model's state dict, which is here.
4249
07:35:04,120 --> 07:35:09,960
So if we run this, let's see what happens. All keys match successfully. That is beautiful.
4250
07:35:09,960 --> 07:35:15,800
And so see the values here, the loaded state dict of model zero. Well, let's check the loaded version
4251
07:35:15,800 --> 07:35:22,760
of that. We now have wonderful, we have the exact same values as above. But there's a little
4252
07:35:22,760 --> 07:35:30,040
way that we can test this. So how about we go make some predictions. So make some predictions.
4253
07:35:30,040 --> 07:35:39,960
Just to make sure, with our loaded model. So let's put it in eval mode. Because when you make
4254
07:35:39,960 --> 07:35:45,320
predictions, you want it in evaluation mode. So it goes a little bit faster. And we want to
4255
07:35:45,320 --> 07:35:53,240
also use inference mode. So with torch dot inference mode for making predictions. We want to write
4256
07:35:53,240 --> 07:35:58,120
this loaded model preds, we're going to make some predictions on the test data as well. So loaded
4257
07:35:58,120 --> 07:36:04,040
model zero, we're going to forward pass on the X test data. And then we can have a look at the
4258
07:36:04,040 --> 07:36:12,920
loaded model preds. Wonderful. And then to see if the two models are the same, we can compare
4259
07:36:14,040 --> 07:36:23,320
loaded model preds with the original model preds. So y preds. These should be equivalent: equals
4260
07:36:23,320 --> 07:36:31,880
equals loaded model preds. Do we have the same thing? False, false, false, what's going on here?
4261
07:36:32,840 --> 07:36:42,440
Y preds. How different are they? Oh, where's that happened? Have we made some
4262
07:36:42,440 --> 07:36:50,600
model preds with this yet? So how about we make some model preds? This is troubleshooting on
4263
07:36:50,600 --> 07:37:00,520
the fly team. So let's go model zero dot eval. And then with torch dot inference mode,
4264
07:37:00,520 --> 07:37:06,040
this is how we can check to see that our two models are actually equivalent. Y preds equals,
4265
07:37:06,040 --> 07:37:13,160
I have a feeling y preds is actually saved somewhere else, equals model zero. And then we pass it the
4266
07:37:13,160 --> 07:37:22,040
X test data. And then we might move this above here. And then have a look at what y preds equals.
4267
07:37:23,000 --> 07:37:30,760
Do we get the same output? Yes, we should. Wonderful. Okay, beautiful. So now we've covered
4268
07:37:30,760 --> 07:37:36,360
saving and loading models, or specifically saving the model's state dict. So we saved it here with
4269
07:37:36,360 --> 07:37:42,920
this code. And then we loaded it back in with load state dict plus torch load. And then we
4270
07:37:42,920 --> 07:37:48,360
checked it by testing equivalence of the predictions of each of our models. So the original
4271
07:37:48,360 --> 07:37:53,720
one that we trained here, model zero, and the loaded version of it here. So that's saving and
4272
07:37:53,720 --> 07:37:58,680
loading a model in pytorch. There are a few more things that we could cover. But I'm going to leave
4273
07:37:58,680 --> 07:38:04,760
that for extra curriculum. We've covered the two main things or three main things. One, two, three.
4274
07:38:04,760 --> 07:38:09,240
If you'd like to read more, I'd highly encourage you to go through and read this tutorial here.
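As a compact recap of the pattern covered across these videos: save the state_dict, load it into a fresh instance, then check the two models make the same predictions. This sketch assumes model_0, the LinearRegressionModel class, MODEL_SAVE_PATH and X_test from the earlier cells:

    # 1. Save the trained model's state_dict
    torch.save(obj=model_0.state_dict(), f=MODEL_SAVE_PATH)

    # 2. Load it into a new instance of the same model class
    loaded_model_0 = LinearRegressionModel()
    loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

    # 3. Verify the original and loaded models make the same predictions
    model_0.eval()
    loaded_model_0.eval()
    with torch.inference_mode():
        y_preds = model_0(X_test)
        loaded_model_preds = loaded_model_0(X_test)

    print(y_preds == loaded_model_preds)  # should be all True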
4275
07:38:09,240 --> 07:38:14,680
But with that being said, we've covered a fair bit of ground over the last few videos. How about
4276
07:38:14,680 --> 07:38:20,280
we do a few videos where we put everything together just to reiterate what we've done.
4277
07:38:20,280 --> 07:38:23,240
I think that'll be good practice. I'll see you in the next video.
4278
07:38:25,400 --> 07:38:30,360
Welcome back. Over the past few videos, we've covered a whole bunch of ground in a pytorch
4279
07:38:30,360 --> 07:38:35,800
workflow, starting with data, then building a model. Well, we split the data, then we built a
4280
07:38:35,800 --> 07:38:41,000
model. We looked at the model building essentials. We checked the contents of our model. We made
4281
07:38:41,000 --> 07:38:46,680
some predictions with a very poor model because it's based off random numbers. We spent a whole
4282
07:38:46,680 --> 07:38:50,760
bunch of time figuring out how we could train a model. We figured out what the loss function is.
4283
07:38:50,760 --> 07:38:57,320
We saw an optimizer. We wrote a training and test loop. We then learned how to save and load a
4284
07:38:57,320 --> 07:39:03,000
model in pytorch. So now I'd like to spend the next few videos putting all this together. We're
4285
07:39:03,000 --> 07:39:07,400
not going to spend as much time on each step, but we're just going to have some practice together
4286
07:39:07,400 --> 07:39:13,320
so that we can reiterate all the things that we've done. So putting it all together,
4287
07:39:14,280 --> 07:39:24,840
let's go back through the steps above and see it all in one place. Wonderful.
4288
07:39:24,840 --> 07:39:33,240
So we're going to start off with 6.1 and we'll go have a look at our workflow. So 6.1 is data,
4289
07:39:35,080 --> 07:39:39,800
but we're going to do one step before that. And I'm just going to get rid of this so we have a bit
4290
07:39:39,800 --> 07:39:46,200
more space. So we've got our data ready. We've turned it into tensors way back at the start.
4291
07:39:46,200 --> 07:39:50,600
Then we built a model and then we picked a loss function and an optimizer. We built a training
4292
07:39:50,600 --> 07:39:55,400
loop. We trained our model. We made some predictions. We saw that they were better. We evaluated our
4293
07:39:55,400 --> 07:40:00,280
model. We didn't use torch metrics, but we got visual. We saw our red dots starting to line up
4294
07:40:00,280 --> 07:40:04,680
with the green dots. We haven't really improved through experimentation. We did a little bit of
4295
07:40:04,680 --> 07:40:10,120
it though, as in we saw that if we trained our model for more epochs, we got better results.
4296
07:40:10,120 --> 07:40:14,360
So you could argue that we have done a little bit of this, but there are other ways to experiment.
4297
07:40:14,360 --> 07:40:19,240
We're going to cover those throughout the course. And then we saw how to save and reload a trained
4298
07:40:19,240 --> 07:40:24,760
model. So we've been through this entire workflow, which is quite exciting, actually.
4299
07:40:24,760 --> 07:40:29,480
So now let's go back through it, but we're going to do it a bit quicker than what we've done before,
4300
07:40:29,480 --> 07:40:35,800
because I believe you've got the skills to do so now. So let's start by importing pytorch.
4301
07:40:36,840 --> 07:40:41,000
So you could start the code from here if you wanted to. And matplotlib. And actually,
4302
07:40:41,000 --> 07:40:46,680
if you want, you can pause this video and try to recode all of the steps that we've done
4303
07:40:46,680 --> 07:40:50,760
by putting some headers here, like data, and then build a model and then train the model,
4304
07:40:50,760 --> 07:40:57,800
save and load a model, whatever, and try to code it out yourself. If not, feel free to follow along
4305
07:40:57,800 --> 07:41:05,000
with me and we'll do it together. So import torch, from torch import... oh, it would help if I could spell
4306
07:41:05,000 --> 07:41:10,920
torch. Import nn, because we've seen that we use nn quite a bit. And we're going to also
4307
07:41:10,920 --> 07:41:15,160
import matplotlib, because we like to make some plots, because we like to get visual.
4308
07:41:15,160 --> 07:41:22,440
Visualize, visualize, visualize. As plt. And we're going to check out the PyTorch version.
4309
07:41:24,200 --> 07:41:28,200
That way we know if you're on an older version, some of the code might not work here. But if you're
4310
07:41:28,200 --> 07:41:34,680
on a newer version, it should work. If it doesn't, let me know. There we go. 1.10. I'm using 1.10
4311
07:41:34,680 --> 07:41:39,640
for this. By the time you watch this video, there may be a later version out. And we're also going
4312
07:41:39,640 --> 07:41:46,280
to let's create some device agnostic code. So create device agnostic code, because I think we're
4313
07:41:46,280 --> 07:41:59,640
up to this step now. This means if we've got access to a GPU, our code will use it for potentially
4314
07:41:59,640 --> 07:42:15,320
faster computing. If no GPU is available, the code will default to using CPU. We don't necessarily
4315
07:42:15,320 --> 07:42:19,640
need to use a GPU for our particular problem that we're working on right now because it's a small
4316
07:42:19,640 --> 07:42:25,320
model and it's a small data set, but it's good practice to write device agnostic code. So that
4317
07:42:25,320 --> 07:42:31,880
means our code will use a GPU if it's available, or a CPU by default, if a GPU is not available.
4318
07:42:31,880 --> 07:42:38,520
So set up device agnostic code. We're going to be using a similar setup to this throughout the
4319
07:42:38,520 --> 07:42:43,640
entire course from now on. So that's why we're bringing it back. CUDA is available. So remember
4320
07:42:43,640 --> 07:42:50,760
CUDA is NVIDIA's programming framework for their GPUs, else use CPU. And we're going to print
4321
07:42:50,760 --> 07:43:00,200
what device are we using? Device. So what we might do is if we ran this, it should be just a CPU
4322
07:43:00,200 --> 07:43:06,760
for now, right? Yours might be different to this if you've enabled a GPU, but let's change this
4323
07:43:06,760 --> 07:43:12,360
over to use CUDA. And we can do that if you're using Google Colab, we can change the runtime type
4324
07:43:12,360 --> 07:43:17,240
by selecting GPU here. And then I'm going to save this, but what's going to happen is it's
4325
07:43:17,240 --> 07:43:21,960
going to restart the runtime. So we're going to lose all of the code that we've written above.
4326
07:43:22,760 --> 07:43:30,760
How can we get it all back? Well, we can go. Run all. This is going to run all of the cells
4327
07:43:30,760 --> 07:43:35,320
above here. They should all work and it should be quite quick because our model and data aren't
4328
07:43:35,320 --> 07:43:42,920
too big. And if it all worked, we should have CUDA as our device that we can use here. Wonderful.
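The device-agnostic setup dictated above, in one place:

    import torch

    # use an NVIDIA GPU via CUDA if one is available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")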
4329
07:43:42,920 --> 07:43:48,360
So the beauty of Google Colab is that they've given us access to an NVIDIA GPU. So thank you,
4330
07:43:48,360 --> 07:43:54,760
Google Colab. Just once again, I'm paying for the paid version of Google Colab. You don't have to.
4331
07:43:54,760 --> 07:44:00,200
The free version should give you access to a GPU, albeit it might not be as late a version of
4332
07:44:00,200 --> 07:44:06,600
GPU as the pro versions give access to. But this will be more than enough for what we're about to
4333
07:44:06,600 --> 07:44:12,280
recreate. So I feel like that's enough for this video. We've got some device agnostic code ready
4334
07:44:12,280 --> 07:44:18,120
to go. And for the next few videos, we're going to be rebuilding this except using device agnostic
4335
07:44:18,120 --> 07:44:24,920
code. So give it a shot yourself. There's nothing in here that we haven't covered before. So I'll
4336
07:44:24,920 --> 07:44:32,040
see you in the next video. Let's create some data. Welcome back. In the last video, we set up some
4337
07:44:32,040 --> 07:44:36,520
device agnostic code and we got ready to start putting everything we've learned together.
4338
07:44:36,520 --> 07:44:41,560
So now let's continue with that. We're going to recreate some data. Now we could just copy this
4339
07:44:41,560 --> 07:44:46,680
code, but we're going to write it out together so we can have some practice creating a dummy data
4340
07:44:46,680 --> 07:44:51,720
set. And we want to get to about this stage in this video. So we want to have some data that we can
4341
07:44:51,720 --> 07:44:57,560
plot so that we can build a model to once again, learn on the blue dots to predict the green dots.
4342
07:44:58,280 --> 07:45:03,080
So we'll come down here data. I'm going to get out of this as well so that we have a bit more room.
4343
07:45:03,080 --> 07:45:19,400
Let's now create some data using the linear regression formula of y equals weight times
4344
07:45:19,400 --> 07:45:29,240
features plus bias. And you may have heard this as y equals mx plus c or mx plus b or something like
4345
07:45:29,240 --> 07:45:34,840
that, or you can substitute these for different names. I mean, when I learned this in high school,
4346
07:45:34,840 --> 07:45:41,160
it was y equals mx plus c. Yours might be slightly different. Yeah, bx plus a. That's what they use
4347
07:45:41,160 --> 07:45:45,080
here. A whole bunch of different ways to name things, but they're all describing the same thing.
4348
07:45:45,720 --> 07:45:51,800
So let's see this in code rather than formulaic examples. So we're going to create our weight,
4349
07:45:51,800 --> 07:45:58,200
which is 0.7, and a bias, which is 0.3. These are the values we previously used. For a challenge, you
4350
07:45:58,200 --> 07:46:04,280
could change these to 0.1 maybe and 0.2. These could be whatever values you'd like to set them as.
4351
07:46:05,240 --> 07:46:11,880
So weight and bias, the principle is going to be the same thing. We're going to try and build a
4352
07:46:11,880 --> 07:46:18,840
model to estimate these values. So we're going to start at 0 and we're going to end at 1.
4353
07:46:19,640 --> 07:46:24,200
So we can just create a straight line and we're going to fill in those between 0 and 1 with a
4354
07:46:24,200 --> 07:46:32,680
step of 0.02. And now we'll create X and y, which are the features and labels
4355
07:46:32,680 --> 07:46:40,360
actually. So X is our features and y are our labels. X equals torch dot arange, and X is a
4356
07:46:40,360 --> 07:46:47,320
capital. Why is that? Because typically X is a feature matrix. Even though ours is just a vector now,
4357
07:46:47,320 --> 07:46:50,920
we're going to unsqueeze this so we don't run into dimensionality issues later on.
4358
07:46:50,920 --> 07:47:00,360
You can check this for yourself: without unsqueeze, errors will pop up. And y equals weight times
4359
07:47:00,360 --> 07:47:05,800
x plus bias. You see how we're going a little bit faster now? This is sort of the pace that we're
4360
07:47:05,800 --> 07:47:11,240
going to start going for things that we've already covered. If we haven't covered something, we'll
4361
07:47:11,240 --> 07:47:15,640
slow down, but if we have covered something, I'm going to step it through. We're going to start
4362
07:47:15,640 --> 07:47:22,440
speeding things up a little. So if we get some values here, wonderful. We've got some x values
4363
07:47:22,440 --> 07:47:28,120
and they correlate to some y values. We're going to try and use the training values of x to predict
4364
07:47:28,120 --> 07:47:34,200
the training values of y and subsequently for the test values. Oh, and speaking of training and test
4365
07:47:34,200 --> 07:47:41,320
values, how about we split the data? So let's split the data. Split data. So we'll create the
4366
07:47:41,320 --> 07:47:47,240
train split equals int 0.8. We're going to use 80%, which is where 0.8 comes from,
4367
07:47:47,800 --> 07:47:52,680
times the length of X. So we use 80% of our samples for training, which is a typical
4368
07:47:52,680 --> 07:47:59,880
training and test split, 80/20 or thereabouts. You could use 70/30. You could use 90/10.
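Pulling the data creation and the 80/20 split into one place (same values and names as dictated here):

    import torch

    # known parameters (the model will try to estimate these)
    weight = 0.7
    bias = 0.3

    # create features and labels with the linear regression formula y = weight * x + bias
    start, end, step = 0, 1, 0.02
    X = torch.arange(start, end, step).unsqueeze(dim=1)  # unsqueeze avoids shape issues later
    y = weight * X + bias

    # 80/20 train/test split
    train_split = int(0.8 * len(X))
    X_train, y_train = X[:train_split], y[:train_split]
    X_test, y_test = X[train_split:], y[train_split:]

    print(len(X_train), len(y_train), len(X_test), len(y_test))  # 40 40 10 10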
4369
07:47:59,880 --> 07:48:04,200
It all depends on how much data you have. There's a lot of things in machine learning that are
4370
07:48:04,200 --> 07:48:10,280
quite flexible. Train split, we're going to index on our data here so that we can create our splits.
4371
07:48:10,280 --> 07:48:19,080
Google Colab auto corrected my code in a non-helpful way just then. And we're going to do the
4372
07:48:19,080 --> 07:48:27,080
opposite split for the testing data. Now let's have a look at the lengths of these. If my calculations
4373
07:48:27,080 --> 07:48:37,560
are correct, we should have about 40 training samples and 10 testing samples. And again, this
4374
07:48:37,560 --> 07:48:42,760
may change in the future. When you work with larger data sets, you might have 100,000 training
4375
07:48:42,760 --> 07:48:50,360
samples and 20,000 testing samples. The ratio will often be quite similar. And then let's plot
4376
07:48:50,360 --> 07:48:59,480
what's going on here. So plot the data and note, if you don't have the plot predictions
4377
07:48:59,480 --> 07:49:08,360
function loaded, this will error. So we can just run plot predictions here if we wanted to. And
4378
07:49:08,360 --> 07:49:16,360
we'll pass it in X train, Y train, X test, Y test. And this should come up with our
4379
07:49:17,960 --> 07:49:22,120
plot. Wonderful. So we've just recreated the data that we've been previously using. We've got
4380
07:49:22,120 --> 07:49:26,920
blue dots to predict green dots. But if this function errors out because you've started the notebook
4381
07:49:26,920 --> 07:49:33,640
from here, right from this cell, and you've gone down from there, just remember, you'll just have
4382
07:49:33,640 --> 07:49:40,040
to go up here and copy this function. We don't have to do it because we've run all the cells,
4383
07:49:40,040 --> 07:49:45,960
but if you haven't run that cell previously, you could put it here and then run it, run it,
4384
07:49:46,520 --> 07:49:54,360
and we'll get the same outcome here. Wonderful. So what's next? Well, if we go back to our workflow,
4385
07:49:54,360 --> 07:50:00,600
we've just created some data. And have we turned it into tensors yet? I think it's just still, oh,
4386
07:50:00,600 --> 07:50:07,000
yeah, it is. It's tensors because we used PyTorch to create it. But now we're up to building or
4387
07:50:07,000 --> 07:50:12,600
picking a model. So we've built a model previously. We did that back in build model. So you could
4388
07:50:12,600 --> 07:50:16,200
refer to that code and try to build a model to fit the data that's going on here. So that's
4389
07:50:16,200 --> 07:50:23,400
your challenge for the next video. So building a PyTorch linear model. And why do we call it linear?
4390
07:50:23,400 --> 07:50:29,960
Because linear refers to a straight line. What's nonlinear? Non-straight. So I'll see you in the
4391
07:50:29,960 --> 07:50:35,400
next video. Give it a shot before we get there. But we're going to build a PyTorch linear model.
4392
07:50:39,160 --> 07:50:44,520
Welcome back. We're going through some steps to recreate everything that we've done. In the last
4393
07:50:44,520 --> 07:50:51,080
video, we created some dummy data. And we've got a straight line here. So now by the workflow,
4394
07:50:51,080 --> 07:50:54,840
we're up to building a model or picking a model. In our case, we're going to build one
4395
07:50:54,840 --> 07:51:00,360
to suit our problem. So we've got some linear data. And I've put building a PyTorch linear model
4396
07:51:00,360 --> 07:51:04,520
here. I issued you the challenge of giving it a go. You could do exactly the same steps that
4397
07:51:04,520 --> 07:51:08,760
we've done in build model. But I'm going to be a little bit cheeky and introduce something
4398
07:51:09,400 --> 07:51:17,160
new here. And that is the power of torch.nn. So let's see it. What we're going to do is we're
4399
07:51:17,160 --> 07:51:28,040
going to create a linear model by subclassing nn.Module, because why? A lot of PyTorch models
4400
07:51:28,040 --> 07:51:33,880
subclass nn.Module. So, class linear regression, what should we call this one?
4401
07:51:34,520 --> 07:51:41,080
Linear regression model V2. How about that? And we'll subclass nn.Module. Very similar code to
4402
07:51:41,080 --> 07:51:46,120
what we've been writing so far, or to when we first created our linear regression model.
4403
07:51:46,120 --> 07:51:53,160
And then we're going to put the standard constructor code here, def init underscore underscore.
4404
07:51:53,160 --> 07:51:59,720
And it's going to take as an argument self. And then we're going to call super dot underscore
4405
07:51:59,720 --> 07:52:07,880
underscore init underscore underscore, brackets. Now, if you recall above,
4406
07:52:08,440 --> 07:52:15,160
back in the build model section, we initialized these parameters ourselves. And I've been hinting
4407
07:52:15,160 --> 07:52:23,320
at, in videos we've seen before, that oftentimes you won't necessarily initialize the
4408
07:52:23,320 --> 07:52:31,160
parameters yourself. You'll instead initialize layers that have the parameters built into those
4409
07:52:31,160 --> 07:52:37,800
layers. We still have to create a forward method. But what we're going to see is how we can use our
4410
07:52:37,800 --> 07:52:43,960
torch linear layer to do these steps for us. So let's write the code and then we'll step through it.
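Here's roughly where the next few minutes of code end up, so you can see the whole class in one place before we build it up (the instantiation at the bottom is just for this sketch, to peek at the parameters):

    import torch
    from torch import nn

    class LinearRegressionModelV2(nn.Module):
        def __init__(self):
            super().__init__()
            # nn.Linear creates the weight and bias parameters for us
            self.linear_layer = nn.Linear(in_features=1, out_features=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # forward computation: pass the data through the linear layer
            return self.linear_layer(x)

    torch.manual_seed(42)
    model_1 = LinearRegressionModelV2()
    print(model_1.state_dict())  # one weight and one bias, randomly initialized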
4411
07:52:43,960 --> 07:52:53,320
So we'll go use nn.Linear, because why? We're building a linear regression model and our data is linear.
4412
07:52:53,320 --> 07:53:00,760
And in the past, our previous model implemented the linear regression formula. So, for creating the
4413
07:53:00,760 --> 07:53:11,640
model parameters. So we can go self dot linear layer equals. So this is constructing a variable
4414
07:53:11,640 --> 07:53:20,600
that this class can use: self.linear_layer = nn.Linear. Remember, nn in PyTorch stands for
4415
07:53:20,600 --> 07:53:27,880
neural network. And we have in features as one of the parameters and out features as another
4416
07:53:27,880 --> 07:53:35,240
parameter. This means we want to take as input of size one and output of size one. Where does that
4417
07:53:35,240 --> 07:53:46,200
come from? Well, if we have a look at x train and y train, we have one value of x. Maybe there's
4418
07:53:46,200 --> 07:53:58,440
too many here, so let's just view the first five of each, x_train[:5] and y_train[:5]. So recall, we have one value of x
4419
07:53:58,440 --> 07:54:04,120
equates to one value of y. So that means within this linear layer, we want to take in one feature
4420
07:54:04,120 --> 07:54:11,480
x to output one feature y. And we're using just one layer here. So the input and the output shapes
4421
07:54:11,480 --> 07:54:18,520
of your model in features, out features, what data goes in and what data comes out. These values
4422
07:54:18,520 --> 07:54:23,160
will be highly dependent on the data that you're working with. And we're going to see different
4423
07:54:23,160 --> 07:54:29,000
data or different examples of input features and output features all throughout this course. So
4424
07:54:29,000 --> 07:54:34,680
but that is what's happening. We have one in feature to one out feature. Now what's happening
4425
07:54:34,680 --> 07:54:42,120
inside nn.Linear? Let's have a look: torch.nn.Linear. We'll go to the documentation:
4426
07:54:43,720 --> 07:54:48,920
applies a linear transformation to the incoming data. Where have we seen this before?
4427
07:54:49,720 --> 07:54:55,640
y = xA^T + b. Now they're using different letters, but we've got the same formula as
4428
07:54:55,640 --> 07:55:03,080
what's happening up here. Look, it's the same formula as our data: weight times x plus bias. And then if
4429
07:55:03,080 --> 07:55:11,320
we look up linear regression formula once again, linear regression formula. We've got this formula
4430
07:55:11,320 --> 07:55:19,240
here. Now again, these letters can be replaced by whatever letters you like. But this linear layer
4431
07:55:19,240 --> 07:55:27,000
is implementing the linear regression formula that we created in our model before. So it's
4432
07:55:27,000 --> 07:55:34,040
essentially doing this part for us. And behind the scenes, the layer creates these parameters for us.
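As a minimal sketch of that idea (assuming a single nn.Linear(1, 1) layer, as in our model):

import torch
from torch import nn

layer = nn.Linear(in_features=1, out_features=1)
print(layer.weight, layer.bias)   # randomly initialised nn.Parameter values, created for us
x = torch.tensor([[1.0]])         # a single sample with one feature
print(layer(x))                   # equivalent to x @ layer.weight.T + layer.bias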
4433
07:55:34,600 --> 07:55:39,880
So that's a big piece of the puzzle of PyTorch: as I've said, you won't always be
4434
07:55:39,880 --> 07:55:45,960
initializing the parameters of your model yourself. You'll generally initialize layers. And then you'll
4435
07:55:45,960 --> 07:55:52,280
use those layers in some forward computation. So let's see how we could do that. So we've got a linear
4436
07:55:52,280 --> 07:55:58,120
layer which takes in features of one and out features of one. What should we do now? Well, because
4437
07:55:58,120 --> 07:56:05,880
we've subclassed nn.Module, we need to override the forward method. So we need to tell our model
4438
07:56:05,880 --> 07:56:10,600
what it should do as the forward computation. And in here it's going to take self as input,
4439
07:56:10,600 --> 07:56:15,720
as well as x, which is conventional for the input data. And then we're just going to return
4440
07:56:15,720 --> 07:56:24,920
here, self.linear_layer(x). Right. And actually, we might use some typing here to say that this
4441
07:56:24,920 --> 07:56:31,960
should be a torch.Tensor. And it's also going to return a torch.Tensor. That's using Python's
4442
07:56:31,960 --> 07:56:37,320
type hints. So this is just saying, hey, x should be a torch.Tensor. And I'm going to return you a
4443
07:56:37,320 --> 07:56:43,000
torch tensor, because I'm going to pass x through the linear layer, which is expecting one in feature
4444
07:56:43,000 --> 07:56:48,520
and one out feature. And it's going to do this linear transform. That's another word for it. Again,
4445
07:56:48,520 --> 07:56:53,720
PyTorch and machine learning in general have many different names for the same thing. I would call
4446
07:56:53,720 --> 07:57:03,720
this linear layer. I'm going to write here, also called linear transform, probing layer,
4447
07:57:03,720 --> 07:57:12,920
fully connected layer, dense layer in TensorFlow. So a whole bunch of different names for
4448
07:57:12,920 --> 07:57:17,800
the same thing, but they're all implementing a linear transform. They're all implementing a
4449
07:57:17,800 --> 07:57:24,760
version of linear regression, y = xA^T + b — in features, out features —
4450
07:57:24,760 --> 07:57:32,280
wonderful. So let's see this in action. So we're going to go set the manual seed so we can
4451
07:57:32,280 --> 07:57:42,200
get reproducibility as well, torch dot manual seed. And we're going to set model one equals
4452
07:57:42,920 --> 07:57:48,280
linear regression. This is model one, because we've already got model zero, linear regression
4453
07:57:48,280 --> 07:57:54,440
V two, and we're going to check model one, and we're going to check its state dictionary,
4454
07:57:55,080 --> 07:58:01,160
state dict. There we go. What do we have inside this ordered dict? Has that not created anything
4455
07:58:01,160 --> 07:58:14,840
for us? Model one dot state dict... ordered dict. We haven't got anything here in the linear regression
4456
07:58:14,840 --> 07:58:24,680
model V two. Ideally, this should be outputting a weight and a bias. Yeah, variables, weight,
4457
07:58:24,680 --> 07:58:29,640
and bias. Let's dig through our code line by line and see what we've got wrong. Ah, did you notice
4458
07:58:29,640 --> 07:58:34,840
this? The init function, so the constructor, had the wrong number of underscores. So it was never
4459
07:58:34,840 --> 07:58:42,840
actually constructing this linear layer. Troubleshooting on the fly, team. There we go. Beautiful. So we
4460
07:58:42,840 --> 07:58:50,440
have a linear layer, and it has created for us, inside it, a weight and a bias. So effectively,
4461
07:58:50,440 --> 07:58:55,800
we've replaced the code we wrote above for build model, initializing a weight and bias parameter
4462
07:58:55,800 --> 07:59:01,240
with the linear layer. And you might be wondering why the values are slightly different, even though
4463
07:59:01,240 --> 07:59:07,160
we've used the manual seed. This goes behind the scenes of how PyTorch creates its different
4464
07:59:07,160 --> 07:59:11,560
layers. It's probably using a different form of randomness to create different types of
4465
07:59:11,560 --> 07:59:17,720
variables. So just keep that in mind. And to see this in action, we have a conversion here.
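Here's roughly what that instantiation and check looks like (the seed value of 42 is an assumption; the video doesn't state the number here):

torch.manual_seed(42)  # seed value is an assumption, any fixed seed gives reproducibility
model_1 = LinearRegressionModelV2()
print(model_1.state_dict())
# OrderedDict with 'linear_layer.weight' and 'linear_layer.bias' entries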
4466
07:59:18,760 --> 07:59:24,600
So this is what's going on. We've converted, this is our original model class, linear regression.
4467
07:59:24,600 --> 07:59:29,880
We initialize our model parameters here. We've got a weight and a bias. But instead, we've
4468
07:59:29,880 --> 07:59:36,040
swapped this in our linear regression model V2. This should be V2 to use linear layer. And then
4469
07:59:36,040 --> 07:59:42,200
in the forward method, we had to write the formula manually here when we initialize the parameters
4470
07:59:42,200 --> 07:59:48,600
manually. But because of the power of torch.nn, we have just passed it through the linear layer,
4471
07:59:48,600 --> 07:59:54,280
which is going to perform some predefined forward computation in this layer. So this
4472
07:59:54,280 --> 07:59:59,640
style of what's going on here is how you're going to see the majority of your PyTorch
4473
07:59:59,640 --> 08:00:07,000
deep learning models created using pre-existing layers from the torch.nn module. So if we go back
4474
08:00:07,000 --> 08:00:14,840
into torch.nn, torch.nn, we have a lot of different layers here. So we have convolutional layers,
4475
08:00:14,840 --> 08:00:19,480
pooling layers, padding layers, normalization, recurrent, transformer, linear, we're using a
4476
08:00:19,480 --> 08:00:24,200
linear layer, dropout, et cetera, et cetera. So for all of the common layers in deep learning,
4477
08:00:24,200 --> 08:00:28,600
because that's what neural networks are, they're layers of different mathematical transformations,
4478
08:00:29,160 --> 08:00:34,840
PyTorch has a lot of pre-built implementations. So that's a little bit of a sneaky trick that
4479
08:00:34,840 --> 08:00:39,960
I've done to alter our model. But we've still got basically the exact same model as we had before.
4480
08:00:39,960 --> 08:00:44,840
So what's next? Well, it's to train this model. So let's do that in the next video.
4481
08:00:44,840 --> 08:00:54,360
Welcome back. So in the last video, we built a PyTorch linear model, nice and simple using a
4482
08:00:54,360 --> 08:01:01,800
single nn.Linear layer with one in feature, one out feature. And we overrode the forward method
4483
08:01:01,800 --> 08:01:09,160
of nn.Module using the linear layer that we created up here. So what's going to happen is when we do
4484
08:01:09,160 --> 08:01:13,960
the forward pass on our model, we're going to put some data in and it's going to go through
4485
08:01:13,960 --> 08:01:18,840
this linear layer, which behind the scenes, as we saw with torch.nn.Linear,
4486
08:01:20,120 --> 08:01:27,320
behind the scenes, it's going to perform the linear regression formula here. So y equals x,
4487
08:01:27,320 --> 08:01:35,080
A transpose plus b. But in our case, we've got a weight and a bias. So let's go back. It's now time to write
4488
08:01:35,080 --> 08:01:43,640
some training code. But before we do, let's set the model to use the target device. And so in
4489
08:01:43,640 --> 08:01:50,440
our case, we've got a device of CUDA. But because we've written device agnostic code, if we didn't
4490
08:01:50,440 --> 08:01:57,720
have access to a CUDA device, a GPU, our default device would be a CPU. So let's check the model
4491
08:01:57,720 --> 08:02:04,840
device. We can do that first up here, check the model current device, because we're going to use
4492
08:02:04,840 --> 08:02:12,120
the GPU here, or we're going to write device agnostic code. That's better to say device agnostic code.
4493
08:02:12,120 --> 08:02:19,400
That's the proper terminology. What device are we currently using? This is the CPU, right?
4494
08:02:19,400 --> 08:02:27,800
So by default, the model will end up on the CPU. But if we call model_1.to(device),
4495
08:02:27,800 --> 08:02:31,960
what do you think it's going to do now? If our current target device is CUDA, we've seen what
4496
08:02:31,960 --> 08:02:38,200
.to() does in the fundamentals section: .to() is going to send the model to the GPU memory. So now let's
4497
08:02:38,200 --> 08:02:45,960
check where the parameters of our model live, with .device. We've sent them to the device; previously
4498
08:02:45,960 --> 08:02:50,440
it was the CPU. It's going to take a little bit longer while the GPU gets fired up and
4499
08:02:50,440 --> 08:02:56,360
PyTorch goes, Hey, I'm about to send you this model. You ready for it? Boom, there we go. Wonderful.
4500
08:02:56,360 --> 08:03:02,600
So now our model is on the device or the target device, which is CUDA. And if CUDA wasn't available,
4501
08:03:02,600 --> 08:03:07,480
the target device would be CPU. So this would just come out just exactly how we've got it here.
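A short sketch of the device-agnostic setup described here:

# Set up device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"

print(next(model_1.parameters()).device)  # cpu by default
model_1.to(device)                        # send the model to the target device
print(next(model_1.parameters()).device)  # cuda:0 if a GPU is available, otherwise cpu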
4502
08:03:07,480 --> 08:03:13,160
But with that being said, now let's get on to some training code. And this is the fun part.
4503
08:03:13,160 --> 08:03:18,760
What do we have to do? We've already seen this for training. I'm just going to clear up our
4504
08:03:18,760 --> 08:03:25,000
workspace a little bit here. For training, we need, this is part of the PyTorch workflow,
4505
08:03:25,000 --> 08:03:29,960
we need a loss function. What does a loss function do? Measures how wrong our model is,
4506
08:03:29,960 --> 08:03:37,800
we need an optimizer, we need a training loop and a testing loop. And the optimizer, what does that
4507
08:03:37,800 --> 08:03:44,600
do? Well, it optimizes the parameters of our model. So in our case, model one dot state dict,
4508
08:03:44,600 --> 08:03:50,200
what do we have? So we have some parameters here within the linear layer, we have a weight,
4509
08:03:50,200 --> 08:03:56,520
and we have a bias. The optimizer is going to optimize these random parameters so that they
4510
08:03:56,520 --> 08:04:01,880
hopefully reduce the loss function, which remember the loss function measures how wrong our model
4511
08:04:01,880 --> 08:04:06,120
is. So in our case, because we're working with the regression problem, let's set up the loss
4512
08:04:06,120 --> 08:04:12,520
function. And by the way, all of these steps are part of the workflow. We've got data ready,
4513
08:04:12,520 --> 08:04:17,800
we've built or picked a model, we're using a linear model. Now we're up to here 2.1 pick a loss
4514
08:04:17,800 --> 08:04:22,040
function and an optimizer. We're going to build a training loop in the same session,
4515
08:04:22,040 --> 08:04:26,680
because you know what, we're getting pretty darn good at this, loss function equals what?
4516
08:04:27,240 --> 08:04:32,600
Well, we're going to use L1 loss. So let's set that up, nn.L1Loss, which is the
4517
08:04:32,600 --> 08:04:42,360
same as MAE. And if we wanted to set up our optimizer, what optimizer could we use? Well, PyTorch offers
4518
08:04:42,360 --> 08:04:49,240
a lot of optimizers in torch.optim. SGD, that's stochastic gradient descent, because remember,
4519
08:04:49,240 --> 08:04:56,040
gradient descent is the algorithm that optimizes our model parameters. Adam is another popular option.
4520
08:04:56,040 --> 08:05:01,080
For now, we're going to stick with SGD. LR, which stands for learning rate. In other words,
4521
08:05:01,080 --> 08:05:07,960
how big of a step our optimizer will change our parameters by with every iteration. A smaller
4522
08:05:07,960 --> 08:05:15,800
learning rate, such as 0.001, will be a small step. And then a large learning rate, such as 0.1,
4523
08:05:15,800 --> 08:05:22,840
will be a larger step. Too big of a step and our model learns too much and explodes; too small of a
4524
08:05:22,840 --> 08:05:28,680
step and our model never learns anything. But oh, we actually have to pass params first. I forgot
4525
08:05:28,680 --> 08:05:33,720
about that. I got ahead of myself with a learning rate. Params is the parameters we'd like our
4526
08:05:33,720 --> 08:05:39,880
optimizer to optimize. So in our case, it's model one dot parameters, because model one is our current
4527
08:05:39,880 --> 08:05:47,560
target model. Beautiful. So we've got a loss function and an optimizer. Now, let's write a training
4528
08:05:47,560 --> 08:05:54,040
loop. So I'm going to set torch manual seed so we can try and get results as reproducible as
4529
08:05:54,040 --> 08:05:59,320
possible. Remember, if you get different numbers to what I'm getting, don't worry too much if they're
4530
08:05:59,320 --> 08:06:05,160
not exactly the same, the direction is more important. So that means if my loss function is
4531
08:06:05,160 --> 08:06:10,280
getting smaller, yours should be getting smaller too. Don't worry too much if your fourth decimal
4532
08:06:10,280 --> 08:06:17,240
place isn't the same as what my values are. So we have a training loop ready to be written here.
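Before we fill in the loop, here's a compact sketch of the setup so far (the exact learning rate value is an assumption; pick one that suits your problem):

loss_fn = nn.L1Loss()  # same as MAE (mean absolute error)
optimizer = torch.optim.SGD(params=model_1.parameters(),  # parameters to optimize
                            lr=0.01)                      # learning rate (value assumed)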
4533
08:06:17,240 --> 08:06:22,120
Epochs, how many should we do? Well, we did 200 last time and that worked pretty well. So let's do 200
4534
08:06:22,120 --> 08:06:28,360
again. Did you go through the extra curriculum yet? Did you watch the video for the unofficial
4535
08:06:28,360 --> 08:06:36,440
PyTorch optimization loop song yet? This one here, listen to the unofficial PyTorch optimization
4536
08:06:36,440 --> 08:06:45,960
loop song. If not, it's okay. Let's sing it together. So for an epoch in range, epochs, we're going to
4537
08:06:45,960 --> 08:06:51,000
go through the song in a second. We're going to set the model to train. In our case, it's model one,
4538
08:06:51,880 --> 08:06:57,400
model dot train. Now, step number one is what? Do the forward pass. This is where we calculate
4539
08:06:57,400 --> 08:07:02,760
the predictions. So we calculate the predictions by passing the training data through our model.
4540
08:07:03,320 --> 08:07:09,960
And in our case, because the forward method in model one implements the linear layer,
4541
08:07:09,960 --> 08:07:15,720
this data is going to go through the linear layer, which is torch.nn.Linear, and go through
4542
08:07:15,720 --> 08:07:21,880
the linear regression formula. And then we calculate the loss, which is how wrong our models predictions
4543
08:07:21,880 --> 08:07:30,120
are. So the loss value equals loss_fn. And here we're going to pass in y_pred and y_train.
4544
08:07:31,160 --> 08:07:37,640
Then what do we do? We zero the optimizer, optimizer.zero_grad(), because by default,
4545
08:07:37,640 --> 08:07:44,120
the optimizer is going to accumulate gradients behind the scenes. So every epoch, we want to
4546
08:07:44,120 --> 08:07:50,920
reduce those back to zero. So it starts fresh. We're going to perform backpropagation here,
4547
08:07:50,920 --> 08:07:57,720
backpropagation, by calling loss.backward(). If the forward pass goes forward through the
4548
08:07:57,720 --> 08:08:03,400
network, the backward pass goes backwards through the network, calculating the gradients for the
4549
08:08:03,400 --> 08:08:09,640
loss function with respect to each parameter in the model. So optimizer.step(), this next part,
4550
08:08:09,640 --> 08:08:15,480
is going to look at those gradients and go, you know what? Which way should I optimize the parameters?
4551
08:08:15,480 --> 08:08:20,840
So because the optimizer is optimizing the model parameters, it's going to look at the
4552
08:08:20,840 --> 08:08:26,280
loss and go, you know what? I'm going to adjust the weight to be increased. And I'm going to lower
4553
08:08:26,280 --> 08:08:32,440
the bias and see if that reduces the loss. And then we can do testing. We can do both of these in
4554
08:08:32,440 --> 08:08:36,680
the same hit. Now we are moving quite fast through this because we spent a whole bunch of time
4555
08:08:36,680 --> 08:08:42,360
discussing what's going on here. So for testing, what do we do? We set the model into evaluation
4556
08:08:42,360 --> 08:08:47,160
mode. That's going to turn off things like dropout and batch normalization layers. We don't have any
4557
08:08:47,160 --> 08:08:53,080
of that in our model for now, but it's good practice to always call eval whenever you're
4558
08:08:53,080 --> 08:08:58,200
doing testing. And same with inference mode. We don't need to track gradients and a whole bunch of
4559
08:08:58,200 --> 08:09:02,840
other things PyTorch does behind the scenes when we're testing or making predictions. So we use
4560
08:09:02,840 --> 08:09:08,040
the inference mode context manager. This is where we're going to create test pred, which is going
4561
08:09:08,040 --> 08:09:13,720
to be our test predictions, because here we're going to pass the test data features, forward
4562
08:09:13,720 --> 08:09:20,280
pass through our model. And then we can calculate the test loss, which is our loss function. And we're
4563
08:09:20,280 --> 08:09:28,760
going to compare the test pred to Y test. Wonderful. And then we can print out what's happening.
4564
08:09:28,760 --> 08:09:40,040
So what should we print out? How about: if epoch modulo 10 equals zero (epoch % 10 == 0). So every 10 epochs,
4565
08:09:40,040 --> 08:09:51,560
let's print something out, print. We'll do an F string here, epoch is epoch. And then we'll go
4566
08:09:51,560 --> 08:09:58,680
loss, which is the training loss, and just be equal to the loss. And then we'll go test loss is
4567
08:09:58,680 --> 08:10:08,360
equal to test loss. So do you think this will work? It's okay if you're not sure. But let's find
4568
08:10:08,360 --> 08:10:15,000
out together, hey, oh, we've got a, we need a bracket there. Oh my goodness, what's going on?
4569
08:10:15,000 --> 08:10:22,600
Runtime error: expected all tensors to be on the same device. Oh, of course. Do you know what's
4570
08:10:22,600 --> 08:10:28,760
happening here? But we found at least two devices, CUDA and CPU. Yes, of course, that's what's happened.
4571
08:10:28,760 --> 08:10:36,280
So what have we done? Up here, we put our model on the GPU. But what's going on here? Our data?
4572
08:10:36,280 --> 08:10:43,880
Is our data on the GPU? No, it's not. By default, it's on the CPU. So we haven't written device
4573
08:10:43,880 --> 08:10:50,360
agnostic code for our data. So let's write it here, put data on the target device.
4574
08:10:52,040 --> 08:10:59,960
Device agnostic code for data. So remember, one of the biggest issues with pytorch aside from
4575
08:10:59,960 --> 08:11:06,200
shape errors is that you should have your data or all of the things that you're computing with
4576
08:11:06,200 --> 08:11:11,880
on the same device. So that's why if we set up device agnostic code for our model,
4577
08:11:11,880 --> 08:11:20,200
we have to do the same for our data. So now let's put X train to device. Y train equals Y train
4578
08:11:20,200 --> 08:11:25,320
to device. This is going to create device agnostic code. In our case, it's going to use CUDA because
4579
08:11:25,320 --> 08:11:32,840
we have access to a CUDA device. But if we don't, this code will still work. It will still default
4580
08:11:32,840 --> 08:11:38,600
to CPU. So this is good. I like that we got that error because that's the sort of thing you're
4581
08:11:38,600 --> 08:11:42,520
going to come across in practice, right? So now let's run this. What's happening here?
4582
08:11:43,240 --> 08:11:50,680
Hey, look at that. Wonderful. So our loss starts up here nice and high. And then it starts to go
4583
08:11:50,680 --> 08:11:56,200
right down here for the training data. And then the same for the testing data. Beautiful.
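Putting the last few videos together, a condensed sketch of the loop we just ran might look like this (assuming X_train, y_train, X_test, y_test from earlier plus the loss_fn and optimizer above; the seed value is an assumption):

torch.manual_seed(42)
epochs = 200

# Put the data on the target device (device-agnostic code for data)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

for epoch in range(epochs):
    model_1.train()
    y_pred = model_1(X_train)         # 1. forward pass
    loss = loss_fn(y_pred, y_train)   # 2. calculate the loss
    optimizer.zero_grad()             # 3. zero the optimizer's gradients
    loss.backward()                   # 4. backpropagation
    optimizer.step()                  # 5. step the optimizer

    # Testing
    model_1.eval()
    with torch.inference_mode():
        test_pred = model_1(X_test)
        test_loss = loss_fn(test_pred, y_test)

    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")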
4584
08:11:56,840 --> 08:12:02,520
Right up here. And then all the way down. Okay. So this looks pretty good on the test data set. So
4585
08:12:02,520 --> 08:12:08,200
how can we check this? How can we evaluate our model? Well, one way is to check its state
4586
08:12:08,200 --> 08:12:16,280
dict. So, state dict. What have we got here? What are our weight and bias? Oh my gosh, so close.
4587
08:12:16,840 --> 08:12:24,040
So we just set weight and bias before to be 0.7 and 0.3. So this is what our model has estimated
4588
08:12:24,040 --> 08:12:31,000
our parameters to be based on the training data. 0.6968. That's pretty close to 0.7,
4589
08:12:31,000 --> 08:12:38,920
nearly perfect. And the same thing with the bias, 0.3025, versus the perfect value of 0.3. But remember,
4590
08:12:38,920 --> 08:12:44,680
in practice, you won't necessarily know what the ideal parameters are. This is just to exemplify
4591
08:12:44,680 --> 08:12:50,120
what our model is doing behind the scenes. It's moving towards some ideal representative
4592
08:12:50,120 --> 08:12:56,520
parameters of whatever data we're working with. So in the next video, I'd like you to give it a go
4593
08:12:56,520 --> 08:13:02,120
of — before we get to the next video — making some predictions with our model and plotting them on the
4594
08:13:02,120 --> 08:13:08,600
original data. How closely do the red dots match up with the green dots? And you can use this plot
4595
08:13:08,600 --> 08:13:14,040
predictions formula or function that we've been using in the past. So give that a go and I'll
4596
08:13:14,040 --> 08:13:19,240
see you in the next video. But congratulations. Look how quickly we just trained a model using
4597
08:13:19,240 --> 08:13:25,080
the steps that we've covered in a bunch of videos so far and device agnostic code. So good.
4598
08:13:25,080 --> 08:13:33,000
I'll see you soon. In the last video, we did something very, very exciting. We worked through
4599
08:13:33,000 --> 08:13:38,600
training an entire neural network. Some of these steps took us an hour or so worth of videos to
4600
08:13:38,600 --> 08:13:43,480
go back through before. But we coded that in one video. So are you ready? Let's sing the song just to
4601
08:13:43,480 --> 08:13:49,720
remind ourselves of what's going on. For an epoch in a range, call model dot train, do the forward
4602
08:13:49,720 --> 08:13:59,320
pass, calculate the loss, optimizer zero grad, loss backward, optimizer step, step, step, let's
4603
08:13:59,320 --> 08:14:06,600
test, call model dot eval, with torch inference mode, do the forward pass, calculate the loss,
4604
08:14:06,600 --> 08:14:15,240
print out what's happening. And then we do it again, again, again, for another epoch in a range.
4605
08:14:15,240 --> 08:14:18,440
Now I'm kidding. We'll just leave it there. We'll just leave it there. But that's the
4606
08:14:18,440 --> 08:14:24,680
unofficial pytorch optimization loop song. We created some device agnostic code so that we could
4607
08:14:24,680 --> 08:14:29,880
make the calculations on the same device as our model, because the model is also using device
4608
08:14:29,880 --> 08:14:36,200
agnostic code. And so now we've got to evaluate our models. We've looked at the loss and the test
4609
08:14:36,200 --> 08:14:41,560
loss here. And we know that our model's loss is going down. But what does this actually equate to
4610
08:14:41,560 --> 08:14:45,880
when it makes predictions? That's what we're most interested in, right? And we've looked at the
4611
08:14:45,880 --> 08:14:51,960
parameters. They're pretty close to the ideal parameters. So at the end of last video, I issued
4612
08:14:51,960 --> 08:15:00,760
you the challenge of making and evaluating predictions: to make some predictions and plot them. I hope
4613
08:15:00,760 --> 08:15:08,120
you gave it a shot. Let's see what it looks like together. Hey, so turn the model into evaluation
4614
08:15:08,120 --> 08:15:13,080
mode. Why? Because every time we're making predictions or inference, we want our model to be in
4615
08:15:13,080 --> 08:15:18,680
eval mode. And every time we're training, we want our model to be in training mode. And then we're
4616
08:15:18,680 --> 08:15:26,280
going to make predictions on the test data, because we train on the train data, and we evaluate our
4617
08:15:26,280 --> 08:15:31,320
model on the test data — data that our model has never actually seen, except for when it makes
4618
08:15:31,320 --> 08:15:37,720
predictions. With torch inference mode, we turn on inference mode whenever we make inference or
4619
08:15:37,720 --> 08:15:43,960
predictions. So we're going to set y_preds equal to model one, and the test data goes in here.
4620
08:15:43,960 --> 08:15:50,840
Let's have a look at what the y_preds look like. Wonderful. So we've got a tensor here. It shows
4621
08:15:50,840 --> 08:15:56,040
us that they're still on the device CUDA. Why is that? Well, that's because previously we set the
4622
08:15:56,040 --> 08:16:03,000
model one to the device, the target device, the same with the test data. So subsequently,
4623
08:16:03,000 --> 08:16:09,240
our predictions are also on the CUDA device. Now, let's bring in the plot predictions function here.
4624
08:16:09,880 --> 08:16:17,800
So check out our model predictions visually. We're going to adhere to the data explorer's motto
4625
08:16:17,800 --> 08:16:26,200
of visualize, visualize, visualize: plot_predictions. And predictions are going to be set
4626
08:16:26,200 --> 08:16:33,160
equal to y_preds. And let's have a look. How good do these look? Oh, no.
4627
08:16:35,640 --> 08:16:41,640
Oh, we've got another error, a TypeError: can't convert CUDA device type tensor to NumPy.
4628
08:16:41,640 --> 08:16:48,280
Oh, of course. Look what we've done. So our plot predictions function, if we go back up,
4629
08:16:48,280 --> 08:16:53,320
where did we define that? What does our plot predictions function use? It uses matplotlib,
4630
08:16:53,320 --> 08:17:01,800
of course, and matplotlib works with NumPy, not pytorch. And NumPy is CPU based. So of course,
4631
08:17:01,800 --> 08:17:07,320
we're running into another error down here, because we just said that our predictions are on the CUDA
4632
08:17:07,320 --> 08:17:13,640
device. They're not on the CPU. They're on a GPU. So it's giving us this helpful information here.
4633
08:17:13,640 --> 08:17:19,400
Use Tensor.cpu() to copy the tensor to host memory first. So this is our tensor. Let's call
4634
08:17:19,400 --> 08:17:26,200
.cpu() and see what happens then. Is that going to go to the CPU? Oh, my goodness. Look at that.
4635
08:17:27,160 --> 08:17:34,200
Look at that. Go the linear layer. The red dots, the predictions are basically on top of the testing
4636
08:17:34,200 --> 08:17:38,680
data. That is very exciting. Now again, you may not get the exact same numbers here, and that is
4637
08:17:38,680 --> 08:17:43,880
perfectly fine. But the direction should be quite similar. So your red dots should be basically on
4638
08:17:43,880 --> 08:17:50,040
top of the green dots, if not very slightly off. But that's okay. That's okay. We just want to focus
4639
08:17:50,040 --> 08:17:57,080
on the direction here. So thanks to the power of back propagation here and gradient descent,
4640
08:17:57,080 --> 08:18:05,080
our models random parameters have updated themselves to be as close as possible to the ideal parameters.
4641
08:18:05,080 --> 08:18:09,240
And now the predictions are looking pretty darn good for what we're trying to predict.
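A sketch of the prediction-and-plot step just shown (plot_predictions is the helper function defined earlier in the course notebook):

model_1.eval()
with torch.inference_mode():
    y_preds = model_1(X_test)

# matplotlib works with NumPy, which is CPU-based, so copy the predictions back first
plot_predictions(predictions=y_preds.cpu())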
4642
08:18:09,240 --> 08:18:13,080
But we're not finished there. We've just finished training this model. What would happen if our
4643
08:18:13,080 --> 08:18:18,120
notebook disconnected right now? Well, that wouldn't be ideal, would it? So in the next part,
4644
08:18:18,120 --> 08:18:27,640
we're going to move on to 6.5, saving and loading a trained model. So I'm going to give you a
4645
08:18:27,640 --> 08:18:33,480
challenge here as well, is to go ahead and go back and refer to this code here, saving model
4646
08:18:33,480 --> 08:18:39,800
in PyTorch, loading a PyTorch model, and see if you can save model one, the state dictionary of
4647
08:18:39,800 --> 08:18:45,000
model one, and load it back in and get something similar to this. Give that a shot, and I'll see you
4648
08:18:45,000 --> 08:18:53,080
in the next video. Welcome back. In the last video, we saw the power of the torch.nn.linear layer,
4649
08:18:53,080 --> 08:18:58,520
and back propagation and gradient descent. And we've got some pretty darn good predictions
4650
08:18:58,520 --> 08:19:03,800
out of our model. So that's very exciting. Congratulations. You've now trained two machine
4651
08:19:03,800 --> 08:19:10,280
learning models. But it's not over yet. We've got to save and load our trained model. So I
4652
08:19:10,280 --> 08:19:14,440
issued you the challenge in the last video to try and save and load the model yourself. I hope
4653
08:19:14,440 --> 08:19:19,240
you gave that a go. But we're going to do that together in this video. So we're going to start
4654
08:19:19,240 --> 08:19:26,360
by importing path because we would like a file path to save our model to. And the first step we're
4655
08:19:26,360 --> 08:19:32,360
going to do is create models directory. We don't have to recreate this because I believe we already
4656
08:19:32,360 --> 08:19:37,880
have one. But I'm going to put the code here just for completeness. And this is just so if you
4657
08:19:37,880 --> 08:19:46,040
didn't have a models directory, this would create one. So model path is going to go to path
4658
08:19:48,200 --> 08:19:56,440
models. And then we'd like to call model path dot mkdir — mkdir for make directory.
4659
08:19:56,440 --> 08:20:03,080
We'll set parents equal to True. And exist_ok, that'll also be True. So we won't get an error.
4660
08:20:03,080 --> 08:20:07,480
Oh my gosh, Google Colab, I didn't want that. We won't get an error if it already exists.
4661
08:20:08,040 --> 08:20:15,480
And two, we're going to create a model save path. So if you recall that pytorch objects in general
4662
08:20:15,480 --> 08:20:21,960
have the extension of what? There's a little pop quiz before we get to the end of this sentence.
4663
08:20:21,960 --> 08:20:27,640
So this is going to be pytorch workflow for this module that we're going through. This one here,
4664
08:20:27,640 --> 08:20:35,480
chapter 01 pytorch workflow model one. And they usually have the extension .pt for PyTorch, or
4665
08:20:35,480 --> 08:20:43,320
.pth for PyTorch as well. I like .pth. But just remember, sometimes you might come across slightly
4666
08:20:43,320 --> 08:20:49,160
different versions of that, .pt or .pth. And we're going to create the model save name or the save
4667
08:20:49,160 --> 08:20:55,640
path — a save path is probably a better way to put it — which is going to be the model path. And then, because we're
4668
08:20:55,640 --> 08:21:02,280
using the pathlib module from Python, we can join it with the model name. And so if we look at this,
4669
08:21:02,280 --> 08:21:11,240
what do we get for model save path? We should get... oh, Path is not defined. Oh, too many capitals here,
4670
08:21:11,240 --> 08:21:18,440
Daniel. The reason why I'm doing these in capitals is because oftentimes hyperparameters, such as epochs
4671
08:21:18,440 --> 08:21:24,920
in machine learning, are written as capitals. LR could be the learning rate. And then you could have
4672
08:21:25,720 --> 08:21:33,240
model name as well. Yeah, yeah, yeah. But that's just a little bit of nomenclature trivia for
4673
08:21:33,240 --> 08:21:39,560
later on. And model save path, we've done that. Now we're going to save the model state dictionary
4674
08:21:39,560 --> 08:21:46,280
rather than the whole model — save the model state dict — which you will find the pros and cons of
4675
08:21:46,280 --> 08:21:53,000
where? In the PyTorch documentation for saving and loading models, which was a little bit of extra
4676
08:21:53,000 --> 08:21:57,720
curriculum for a previous video. But let's have a look at our model save path; we'll print it out.
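A sketch of the saving code being assembled here (the directory and file names follow the video's spoken naming, so treat the exact strings as assumptions):

from pathlib import Path

# 1. Create a models directory (no error if it already exists)
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create a model save path
MODEL_NAME = "01_pytorch_workflow_model_1.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# 3. Save the model's state dict
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_1.state_dict(), f=MODEL_SAVE_PATH)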
4677
08:21:58,280 --> 08:22:06,120
And we'll go torch.save; we'll set the object that we're trying to save equal to model one dot state
4678
08:22:06,120 --> 08:22:12,600
dict, which is going to contain our trained model parameters. We can inspect what's going on in here,
4679
08:22:12,600 --> 08:22:18,840
state dict. It'll show us our model parameters. Remember, because we're only using a single linear
4680
08:22:18,840 --> 08:22:24,840
layer, we only have two parameters. But in practice, when you use a model with maybe hundreds of layers
4681
08:22:24,840 --> 08:22:30,360
or tens of millions of parameters, viewing the state dict explicitly, like we are now,
4682
08:22:30,360 --> 08:22:35,960
might not be too viable of an option. But the principle still remains: a state dict contains
4683
08:22:35,960 --> 08:22:43,240
all of the model's trained or associated parameters, and what state they're in. And the file path we're
4684
08:22:43,240 --> 08:22:49,880
going to use is, of course, the model save path, which we've seen here is a PosixPath. Let's save
4685
08:22:49,880 --> 08:22:57,080
our model. Wonderful saving model to this file path here. And if we have a look at our folder,
4686
08:22:57,080 --> 08:23:02,520
we should have two saved models now. Beautiful, two saved models. This one is from the workflow
4687
08:23:02,520 --> 08:23:08,680
we did before up here, saving a model in PyTorch, loading a PyTorch model. And now the one we've got,
4688
08:23:08,680 --> 08:23:15,400
of course, model one is the one that we've just saved. Beautiful. So now let's load a model. We're
4689
08:23:15,400 --> 08:23:19,640
going to do both of these in one video. Load a PyTorch model. You know what, because we've had a
4690
08:23:19,640 --> 08:23:25,400
little bit of practice so far, and we're going to pick up the pace. So let's go loaded, let's call
4691
08:23:25,400 --> 08:23:31,560
it, we'll create a new instance of loaded model one, which is, of course, our linear regression model
4692
08:23:31,560 --> 08:23:37,080
V2, which is the version two of our linear regression model class, which subclasses, what?
4693
08:23:37,640 --> 08:23:43,960
Subclasses nn.Module. So if we go back up here to where we created it. So linear regression
4694
08:23:43,960 --> 08:23:50,120
model V2 uses a linear layer rather than the previous iteration of linear regression model,
4695
08:23:50,120 --> 08:23:59,240
which we created right up here. If we go up to here, which explicitly defined the parameters,
4696
08:23:59,240 --> 08:24:03,880
and then implemented a linear regression formula in the forward method, the difference between
4697
08:24:03,880 --> 08:24:10,520
what we've got now is we use PyTorch's pre-built linear layer, and then we call that linear layer
4698
08:24:10,520 --> 08:24:15,000
in the forward method, which is probably the far more popular way of building PyTorch models,
4699
08:24:15,000 --> 08:24:21,160
is stacking together pre-built NN layers, and then calling them in some way in the forward method.
4700
08:24:21,160 --> 08:24:32,680
So let's load it in. So we'll create a new instance of linear regression model V2, and now what do
4701
08:24:32,680 --> 08:24:37,480
we do? We've created a new instance, I'm just going to get out of this, make some space for us.
4702
08:24:38,520 --> 08:24:45,640
We want to load the model state dict, the saved model one state dict, which is the state dict that
4703
08:24:45,640 --> 08:24:52,200
we just saved beforehand. So we can do this by going loaded model one, calling the load state
4704
08:24:52,200 --> 08:24:58,680
dict method, and then passing it torch.load, and then the file path of where we saved that
4705
08:24:58,680 --> 08:25:05,320
PyTorch object before. But the reason why we use pathlib is so that we can just call model
4706
08:25:05,320 --> 08:25:13,560
save path in here. Wonderful. And then let's check out what's going on. Or actually, we need to
4707
08:25:13,560 --> 08:25:22,200
put the target model or the loaded model to the device. The reason being is because we're doing all
4708
08:25:22,200 --> 08:25:30,840
our computing with device agnostic code. So let's send it to the device. And I think that'll be about
4709
08:25:30,840 --> 08:25:36,600
it. Let's see if this works. Oh, there we go. Linear regression model V2 in features one,
4710
08:25:36,600 --> 08:25:43,480
out features one, bias equals true. Wonderful. Let's check those parameters. Hey, next loaded model
4711
08:25:43,480 --> 08:25:51,560
one dot parameters. Are they on the right device? Let's have a look. Beautiful. And let's just check
4712
08:25:51,560 --> 08:25:57,960
the loaded state dictionary of loaded model one. Do we have the same values as we had previously?
4713
08:25:57,960 --> 08:26:04,840
Yes, we do. Okay. So to conclusively make sure what's going on, let's evaluate the loaded model.
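A sketch of the loading-and-checking code described in this part:

# 1. Create a new instance of the model class
loaded_model_1 = LinearRegressionModelV2()

# 2. Load the saved state dict into it
loaded_model_1.load_state_dict(torch.load(MODEL_SAVE_PATH))

# 3. Put the loaded model on the target device
loaded_model_1.to(device)

# 4. Evaluate it: its predictions should match the original model's
loaded_model_1.eval()
with torch.inference_mode():
    loaded_model_1_preds = loaded_model_1(X_test)
print(y_preds == loaded_model_1_preds)  # should be all True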
4714
08:26:04,840 --> 08:26:11,480
Evaluate loaded model, loaded model one. What do we do for making predictions? Or what do we do to
4715
08:26:11,480 --> 08:26:17,560
evaluate? We call dot eval. And then if we're going to make some predictions, we use torch
4716
08:26:17,560 --> 08:26:23,000
inference mode, with torch inference mode. And then let's create loaded model one preds,
4717
08:26:24,760 --> 08:26:32,680
equals loaded model one. And we'll pass it the test data. And now let's check for equality
4718
08:26:32,680 --> 08:26:39,000
between y_preds, which are our previous model one preds that we made up here — y_preds.
4719
08:26:39,800 --> 08:26:45,080
And we're going to compare them to the fresh loaded model one preds. And should they be the same?
4720
08:26:50,280 --> 08:26:57,400
Yes, they are beautiful. And we can see that they're both on the device CUDA. How amazing is that? So
4721
08:26:57,400 --> 08:27:02,680
I want to give you a big congratulations, because you've come such a long way. We've gone through
4722
08:27:02,680 --> 08:27:08,200
the entire PyTorch workflow from making data, preparing and loading it to building a model.
4723
08:27:08,200 --> 08:27:13,160
All of the steps that come in building a model, there's a whole bunch there, making predictions,
4724
08:27:13,160 --> 08:27:18,040
training a model, we spent a lot of time going through the training steps. But trust me, it's
4725
08:27:18,040 --> 08:27:23,000
worth it, because we're going to be using these exact steps all throughout the course. And in fact,
4726
08:27:23,000 --> 08:27:27,640
you're going to be using these exact steps when you build PyTorch models after this course. And
4727
08:27:27,640 --> 08:27:32,040
then we looked at how to save a model so we don't lose all our work, we looked at loading a model,
4728
08:27:32,040 --> 08:27:37,720
and then we put it all together using the exact same problem, but in far less time. And as you'll
4729
08:27:37,720 --> 08:27:42,680
see later on, we can actually make this even quicker by functionalizing some of the code we've already
4730
08:27:42,680 --> 08:27:47,240
written. But I'm going to save that for later. I'll see you in the next video, where I'm just
4731
08:27:47,240 --> 08:27:51,880
going to show you where you can find some exercises and all of the extra curriculum I've been talking
4732
08:27:51,880 --> 08:28:00,760
about throughout this section 01 PyTorch workflow. I'll see you there. Welcome back. In the last
4733
08:28:00,760 --> 08:28:06,760
video, we finished up putting things together by saving and loading our trained model, which is
4734
08:28:06,760 --> 08:28:12,520
super exciting, because we've come to the end of the PyTorch workflow section. So now, this section
4735
08:28:12,520 --> 08:28:18,760
is going to be exercises and extra curriculum, or better yet, where you can find them. So I'm
4736
08:28:18,760 --> 08:28:25,640
going to turn this into markdown. And I'm going to write here for exercises and extra curriculum.
4737
08:28:27,320 --> 08:28:35,240
Refer to. So within the book version of the course materials, which is at learnpytorch.io,
4738
08:28:35,240 --> 08:28:40,520
we're in the 01 section PyTorch workflow fundamentals. There'll be more here by the time you watch
4739
08:28:40,520 --> 08:28:45,480
this video likely. And then if we go down here, at the end of each of these sections, we've got
4740
08:28:45,480 --> 08:28:51,880
the table of contents over here. We've got exercises and extra curriculum. I listed a bunch of things
4741
08:28:51,880 --> 08:28:58,760
throughout this series of 01 videos, like what's gradient descent and what's back propagation. So
4742
08:28:58,760 --> 08:29:04,120
I've got plenty of resources to learn more on that. There's the loading and saving PyTorch
4743
08:29:04,120 --> 08:29:09,640
documentation. There's the PyTorch cheat sheet. There's a great article by Jeremy Howard for a
4744
08:29:09,640 --> 08:29:14,520
deeper understanding of what's going on in torch.nn. And there's, of course, the unofficial PyTorch
4745
08:29:14,520 --> 08:29:21,160
optimization loop song by yours truly, which is a bit of fun. And here's some exercises. So
4746
08:29:21,160 --> 08:29:27,640
the exercises here are all based on the code that we wrote throughout section 01. So there's
4747
08:29:27,640 --> 08:29:32,760
nothing in the exercises that we haven't exactly covered. And if so, I'll be sure to put a note
4748
08:29:32,760 --> 08:29:37,880
in the exercise itself. But we've got create a straight line data set using the linear regression
4749
08:29:37,880 --> 08:29:44,760
formula. And then build a model by subclassing nn.Module. So for these exercises, there's an
4750
08:29:44,760 --> 08:29:50,040
exercise notebook template, which is, of course, linked here. And in the PyTorch deep learning
4751
08:29:50,040 --> 08:29:56,040
GitHub, if we go into here, and then if we go into extras, and if we go into exercises, you'll
4752
08:29:56,040 --> 08:30:01,400
find all of these templates here. They're numbered by the same section that we're in. This is PyTorch
4753
08:30:01,400 --> 08:30:07,640
workflow exercises. So if you wanted to complete these exercises, you could click this notebook
4754
08:30:07,640 --> 08:30:16,280
here, open in Google CoLab. I'll just wait for this to load. There we go. And you can start to
4755
08:30:16,280 --> 08:30:21,800
write some code here. You could save a copy of this in your own Google Drive and go through this.
4756
08:30:21,800 --> 08:30:26,840
It's got some notes here on what you should be doing. You can, of course, refer to the text-based
4757
08:30:26,840 --> 08:30:31,480
version of them. They're all here. And then if you want an example of what some solutions look
4758
08:30:31,480 --> 08:30:36,920
like, now, please, I can't stress enough that I would highly, highly recommend trying the exercises
4759
08:30:36,920 --> 08:30:43,080
yourself. You can use the book that we've got here. This is just all the code from the videos.
4760
08:30:43,080 --> 08:30:47,800
You can use this. You can use, I've got so many notebooks here now, you can use all of the code
4761
08:30:47,800 --> 08:30:52,760
that we've written here to try and complete the exercises. But please give them a go yourself.
4762
08:30:52,760 --> 08:30:57,960
And then if you go back into the extras folder, you'll also find solutions. And this is just one
4763
08:30:57,960 --> 08:31:03,400
example solutions for section 01. But I'm going to get out of that so you can't cheat and look
4764
08:31:03,400 --> 08:31:08,680
at the solutions first. But there's a whole bunch of extra resources all contained within
4765
08:31:09,400 --> 08:31:16,600
the PyTorch deep learning repo — extras, exercises, solutions — and they're also in the book version
4766
08:31:16,600 --> 08:31:21,880
of the course. So I'm just going to link this in here. I'm going to put this right at the bottom
4767
08:31:21,880 --> 08:31:28,520
here. Wonderful. But that is it. That is the end of the section 01 PyTorch workflow.
4768
08:31:28,520 --> 08:31:33,560
So exciting. We went through basically all of the steps in a PyTorch workflow,
4769
08:31:33,560 --> 08:31:38,600
getting data ready, turning it into tensors, build or pick a model, picking a loss function and an
4770
08:31:38,600 --> 08:31:42,200
optimizer. We built a training loop. We fit the model to the data. We made a prediction.
4771
08:31:42,200 --> 08:31:47,080
We evaluated our model. We improved through experimentation by training for more epochs.
4772
08:31:47,080 --> 08:31:51,240
We'll do more of this later on. And we saved and reload our trained model.
4773
08:31:51,240 --> 08:32:01,480
But that's going to finish 01. I will see you in the next section. Friends, welcome back.
4774
08:32:02,040 --> 08:32:07,400
We've got another very exciting module. You ready? Neural network classification with
4775
08:32:10,440 --> 08:32:15,720
PyTorch. Now, once we get to the end, combining this module with the last one, which was
4776
08:32:15,720 --> 08:32:20,520
regression. So remember classification is predicting a thing, but we're going to see this in a second.
4777
08:32:20,520 --> 08:32:25,080
And regression is predicting a number. Once we've covered this, we've covered two of the
4778
08:32:25,080 --> 08:32:30,280
biggest problems in machine learning, predicting a number or predicting a thing.
4779
08:32:30,280 --> 08:32:36,680
So let's start off with before we get into any ideas or code, where can you get help?
4780
08:32:38,120 --> 08:32:43,240
First things first is follow along with the code. If you can, if in doubt, run the code.
4781
08:32:44,120 --> 08:32:48,280
Try it for yourself. Write the code. I can't stress how important this is.
4782
08:32:48,280 --> 08:32:53,560
If you're still stuck, press shift, command, and space to read the doc string of any of the
4783
08:32:53,560 --> 08:32:59,160
functions that we're running. If you are on Windows, it might be control. I'm on a Mac, so I put command
4784
08:32:59,160 --> 08:33:04,280
here. If you're still stuck, search for your problem. If an error comes up, just copy and paste
4785
08:33:04,280 --> 08:33:09,160
that into Google. That's what I do. You might come across resources like Stack Overflow or,
4786
08:33:09,160 --> 08:33:14,600
of course, the PyTorch documentation. We'll be referring to this a lot again throughout this
4787
08:33:14,600 --> 08:33:21,480
section. And then finally, oh wait, if you're still stuck, try again. If in doubt, run the code.
4788
08:33:21,480 --> 08:33:25,960
And then finally, if you're still stuck, don't forget, you can ask a question. The best place to
4789
08:33:25,960 --> 08:33:31,160
do so will be on the course GitHub, which will be at the discussions page, which is linked here.
4790
08:33:32,120 --> 08:33:36,440
If we load this up, there's nothing here yet, because as I record these videos, the course
4791
08:33:36,440 --> 08:33:42,920
hasn't launched yet, but press new discussion. Talk about what you've got. Problem with XYZ.
4792
08:33:42,920 --> 08:33:48,600
Let's go ahead. Leave a video number here and a timestamp, and that way, we'll be able to help
4793
08:33:48,600 --> 08:33:54,440
you out as best as possible. So video number, timestamp, and then your question here, and you
4794
08:33:54,440 --> 08:34:01,080
can select Q&A. Finally, don't forget that this notebook that we're about to go through is based
4795
08:34:01,080 --> 08:34:05,960
on chapter two of the Zero to Mastery Learn PyTorch for deep learning, which is neural network
4796
08:34:05,960 --> 08:34:11,720
classification with PyTorch. All of the text-based code that we're about to write is here. That
4797
08:34:11,720 --> 08:34:17,720
was a little spoiler. And don't forget, this is the home page. So my GitHub repo slash PyTorch
4798
08:34:17,720 --> 08:34:23,320
deep learning for all of the course materials, everything you need will be here. So that's very
4799
08:34:23,320 --> 08:34:28,200
important. How can you get help? But this is the number one. Follow along with the code and try
4800
08:34:28,200 --> 08:34:32,680
to write it yourself. Well, with that being said, when we're talking about classification,
4801
08:34:32,680 --> 08:34:38,040
what is a classification problem? Now, as I said, classification is one of the main problems of
4802
08:34:38,040 --> 08:34:43,000
machine learning. So you probably already deal with classification problems or machine learning
4803
08:34:43,000 --> 08:34:50,280
powered classification problems every day. So let's have a look at some examples. Is this email
4804
08:34:50,280 --> 08:34:57,400
spam or not spam? Did you check your emails this morning or last night or whenever? So chances are
4805
08:34:57,400 --> 08:35:00,920
that there was some sort of machine learning model behind the scenes. It may have been a neural
4806
08:35:00,920 --> 08:35:07,160
network, it may have not, that decided that some of your emails were spam. So to Daniel,
4807
08:35:07,160 --> 08:35:11,240
at mrdbourke.com: hey Daniel, this deep learning course is incredible. I can't wait to use what
4808
08:35:11,240 --> 08:35:15,800
I've learned. Oh, that's such a nice message. If you want to send that email directly to me,
4809
08:35:15,800 --> 08:35:21,640
you can. That's my actual email address. But if you want to send me this email, well, hopefully
4810
08:35:21,640 --> 08:35:27,400
my email, which is hosted by some email service detects this as spam because although that is a
4811
08:35:27,400 --> 08:35:32,200
lot of money and it would be very nice, I think if someone can't spell too well, are they really
4812
08:35:32,200 --> 08:35:37,640
going to pay me this much money? So thank you email provider for classifying this as spam. And now
4813
08:35:37,640 --> 08:35:44,680
because this is one thing or another, not spam or spam, this is binary classification. So in this
4814
08:35:44,680 --> 08:35:51,960
case, it might be one here and this is a zero or zero or one. So one thing or another, that's binary
4815
08:35:51,960 --> 08:35:57,960
classification. If you can split it into one thing or another, binary classification. And then we
4816
08:35:57,960 --> 08:36:04,840
have an example of say we had the question, we asked our photos app on our smartphone or whatever
4817
08:36:04,840 --> 08:36:10,440
device you're using: is this a photo of sushi, steak or pizza? We wanted to search our photos for every
4818
08:36:10,440 --> 08:36:15,880
time we've eaten sushi or every time we've eaten steak or every time we've eaten pizza far out
4819
08:36:15,880 --> 08:36:21,080
and this looks delicious. But this is multi class classification. Now, why is this? Because we've
4820
08:36:21,080 --> 08:36:27,000
got more than two things. We've got 1, 2, 3. And now this could be 10 different foods. It could be 100
4821
08:36:27,000 --> 08:36:33,400
different foods. It could be 1000 different categories. So the image net data set, which is a popular
4822
08:36:33,400 --> 08:36:42,680
data set for computer vision, image net, we go to here, does it say 1000 anywhere, 1k or 1000?
4823
08:36:43,720 --> 08:36:50,600
No, it doesn't. But if we go image net 1k, download image net data, maybe it's here.
4824
08:36:50,600 --> 08:36:58,840
It won't say it, but you just, oh, there we go, 1000 object classes. So this is multi class
4825
08:36:58,840 --> 08:37:05,400
classification because it has 1000 classes, that's a lot, right? So that's multi class classification,
4826
08:37:05,400 --> 08:37:12,440
more than one thing or another. And finally, we might have multi label classification,
4827
08:37:13,000 --> 08:37:17,240
which is: what tags should this article have? When I first got into machine learning, I got these
4828
08:37:17,240 --> 08:37:22,600
two mixed up a whole bunch of times. Multi class classification has multiple classes such as sushi
4829
08:37:22,600 --> 08:37:28,920
steak, pizza, but assigns one label to each. So this photo would be sushi in an ideal world. This is
4830
08:37:28,920 --> 08:37:35,160
steak and this is pizza. So one label to each. Whereas multi label classification means you could
4831
08:37:35,160 --> 08:37:41,720
have multiple different classes. But each of your target samples such as this Wikipedia article,
4832
08:37:41,720 --> 08:37:47,000
what tags should this article have? It may have more than one label. It might have three labels,
4833
08:37:47,000 --> 08:37:54,200
it might have 10 labels. In fact, what if we went to the Wikipedia page for deep learning Wikipedia
4834
08:37:54,840 --> 08:38:01,320
and does it have any labels? Oh, there we go. Where was that? I mean, you can try this yourself.
4835
08:38:01,320 --> 08:38:05,160
This is just the Wikipedia page for deep learning. There is a lot, there we go categories deep
4836
08:38:05,160 --> 08:38:10,760
learning, artificial neural networks, artificial intelligence and emerging technologies. So that
4837
08:38:10,760 --> 08:38:15,640
is an example. If we wanted to build a machine learning model to say, read all of the text in
4838
08:38:15,640 --> 08:38:21,320
here and then go tell me what are the most relevant categories to this article? It might come up
4839
08:38:21,320 --> 08:38:26,760
with something like these. In this case, because it has one, two, three, four, it has multiple labels
4840
08:38:26,760 --> 08:38:33,320
rather than just one label of deep learning, it could be multi label classification. So we'll go
4841
08:38:33,320 --> 08:38:38,520
back. But there's a few more. These will get you quite far in the world of classification.
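To make the distinction concrete, here's a hypothetical sketch of how the targets might look for each type (the class numbering and shapes are made up purely for illustration):

import torch

# Binary: one thing or another (e.g. 0 = not spam, 1 = spam), one value per sample
binary_targets = torch.tensor([0., 1., 1., 0.])

# Multi-class: one label per sample, chosen from more than two classes
# (e.g. 0 = sushi, 1 = steak, 2 = pizza)
multiclass_targets = torch.tensor([0, 2, 1])

# Multi-label: each sample can have several labels at once
# (e.g. one row per article, one column per possible tag)
multilabel_targets = torch.tensor([[1, 0, 1, 1],
                                   [0, 1, 0, 0]])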
4842
08:38:38,520 --> 08:38:45,480
So let's dig a little deeper on binary versus multi class classification. You may have already
4843
08:38:45,480 --> 08:38:52,200
experienced this. So in my case, if I search on my phone in the photos app for photos of a dog,
4844
08:38:52,200 --> 08:38:56,680
it might come here. If I search for photos of a cat, it might come up with this. But if I wanted
4845
08:38:56,680 --> 08:39:01,400
to train an algorithm to detect the difference between photos of these are my two dogs.
4846
08:39:01,400 --> 08:39:06,040
Aren't they cute? They're nice and tired and they're sleeping like a person. This is seven.
4847
08:39:06,040 --> 08:39:11,720
Number seven, that's her name. And this is Bella. This is a cat that me and my partner rescued.
4848
08:39:11,720 --> 08:39:16,600
And so I'm not sure what this cat's name is actually. So I'd love to give it a name, but I can't.
4849
08:39:16,600 --> 08:39:22,280
So binary classification, if we wanted to build an algorithm, we wanted to feed it, say, 10,000
4850
08:39:22,280 --> 08:39:27,640
photos of dogs and 10,000 photos of cats. And then we wanted to find a random image on the
4851
08:39:27,640 --> 08:39:32,600
internet and pass it through to our model and say, hey, is this a dog or is this a cat? It would
4852
08:39:32,600 --> 08:39:39,080
be binary classification because the options are one thing or another dog or cat. But then for
4853
08:39:39,080 --> 08:39:44,040
multi-class classification, let's say we've been working on a farm and we've been taking some photos
4854
08:39:44,040 --> 08:39:49,080
of chickens because they're groovy, right? Well, we updated our model and added some chicken photos
4855
08:39:49,080 --> 08:39:54,360
in there. We would now be working with a multi-class classification problem because we've got more
4856
08:39:54,360 --> 08:40:01,640
than one thing or another. So let's jump in to what we're going to cover. This is broadly,
4857
08:40:01,640 --> 08:40:06,200
by the way, because this is just text on a page. You know, I like to just write code for what we're
4858
08:40:06,200 --> 08:40:10,840
actually doing. So we're going to look at the architecture of a neural network classification
4859
08:40:10,840 --> 08:40:15,480
model. We're going to check what the input shapes and output shapes of a classification model are
4860
08:40:15,480 --> 08:40:20,840
features and labels. In other words, because remember, machine learning models, neural networks
4861
08:40:20,840 --> 08:40:27,320
love to have numerical inputs. And those numerical inputs often come as tensors. Tensors have different
4862
08:40:27,320 --> 08:40:32,120
shapes, depending on what data you're working with. We're going to see all of this in code, creating
4863
08:40:32,120 --> 08:40:36,200
custom data to view, fit and predict on. We're going to go back through our steps in modeling.
4864
08:40:36,200 --> 08:40:41,720
We covered this a fair bit in the previous section, but creating a model for neural network classification.
4865
08:40:41,720 --> 08:40:45,480
It's a little bit different to what we've done, but not too outlandishly different. We're going to
4866
08:40:45,480 --> 08:40:50,280
see how we can set up a loss function and an optimizer for a classification model. We'll
4867
08:40:50,280 --> 08:40:56,040
recreate a training loop and an evaluating loop or a testing loop. We'll see how we can save and
4868
08:40:56,040 --> 08:41:01,480
load our models. We'll harness the power of nonlinearity. Well, what does that even mean? Well,
4869
08:41:01,480 --> 08:41:07,000
if you think of what a linear line is, what is that? It's a straight line. So you might be
4870
08:41:07,000 --> 08:41:12,040
able to guess what a nonlinear line looks like. And then we'll look at different classification
4871
08:41:12,040 --> 08:41:17,640
evaluation methods. So ways that we can evaluate our classification models. And how are we going
4872
08:41:17,640 --> 08:41:24,360
to do all of this? Well, of course, we're going to be part cook, part chemist, part artist, part
4873
08:41:24,360 --> 08:41:30,600
scientist. But for me, I personally prefer the cook side of things because we're going to be cooking
4874
08:41:30,600 --> 08:41:36,600
up lots of code. So in the next video, before we get into coding, let's do a little bit more on
4875
08:41:36,600 --> 08:41:44,120
what are some classification inputs and outputs. I'll see you there. Welcome back. In the last
4876
08:41:44,120 --> 08:41:48,600
video, we had a little bit of a brief overview of what a classification problem is. But now,
4877
08:41:48,600 --> 08:41:53,400
let's start to get more hands on by discussing what the actual inputs to a classification problem
4878
08:41:53,400 --> 08:41:59,880
look like and the outputs look like. And so let's say we had our beautiful food photos from before,
4879
08:41:59,880 --> 08:42:04,840
and we were trying to build this app here called maybe food vision to understand what
4880
08:42:04,840 --> 08:42:12,600
foods are in the photos that we take. And so what might this look like? Well, let's break it down
4881
08:42:13,160 --> 08:42:18,600
to inputs, some kind of machine learning algorithm, and then outputs. In this case,
4882
08:42:18,600 --> 08:42:24,600
the inputs we want to numerically represent these images in some way, shape or form. Then we want
4883
08:42:24,600 --> 08:42:29,320
to build a machine learning algorithm. Hey, one might actually exist. We're going to see this later
4884
08:42:29,320 --> 08:42:33,800
on in the transfer learning section for our problem. And then we want some sort of outputs. And in
4885
08:42:33,800 --> 08:42:39,080
the case of food vision, we want to know, okay, this is a photo of sushi. And this is a photo of
4886
08:42:39,080 --> 08:42:46,280
steak. And this is a photo of pizza. You could get more hands on and technical and complicated, but
4887
08:42:46,280 --> 08:42:52,840
we're just going to stick with single label multi class classification. So it could be a sushi photo,
4888
08:42:52,840 --> 08:42:58,760
it could be a steak photo, or it could be a pizza photo. So how might we numerically represent
4889
08:42:58,760 --> 08:43:05,080
these photos? Well, let's just say we had a function in our app that every photo that gets taken
4890
08:43:05,080 --> 08:43:11,720
automatically gets resized into a square into 224 width and 224 height. This is actually quite a
4891
08:43:11,720 --> 08:43:18,520
common dimensionality for computer vision problems. And so we've got the width dimension, we've got
4892
08:43:18,520 --> 08:43:23,800
the height, and then we've got this C here, which isn't immediately recognizable. But in the case
4893
08:43:23,800 --> 08:43:29,720
of pictures, they often get represented by width, height, color channels. And the color channels are
4894
08:43:29,720 --> 08:43:37,000
red, green and blue, which means each pixel in this image has some value of red, green and blue that
4895
08:43:37,000 --> 08:43:43,000
makes whatever color is displayed here. And this is one way that we can numerically represent an
4896
08:43:43,000 --> 08:43:50,280
image by taking its width, its height and color channels, and whatever number makes up this
4897
08:43:50,280 --> 08:43:54,440
particular image. We're going to see this later on when we work with computer vision problems.
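As a rough sketch of that idea (the real pipeline comes later in the computer vision section), a random tensor can stand in for one of these resized photos:

```python
import torch

# A minimal sketch: a random tensor standing in for one 224x224 RGB image,
# in [width, height, colour_channels] order.
image = torch.rand(size=(224, 224, 3))
print(image.shape)   # torch.Size([224, 224, 3])
print(image[0, 0])   # one pixel: a red, green and blue value between 0 and 1
```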
4898
08:43:55,080 --> 08:44:01,480
So we create a numerical encoding, which is the pixel values here. Then we import the pixel values
4899
08:44:01,480 --> 08:44:07,160
of each of these images into a machine learning algorithm, which often already exists. And if
4900
08:44:07,160 --> 08:44:12,040
it doesn't exist for our particular problem, hey, well, we're learning the skills to build them now,
4901
08:44:12,040 --> 08:44:17,720
we could use pytorch to build a machine learning algorithm for this. And then outputs, what might
4902
08:44:17,720 --> 08:44:23,560
these look like? Well, in this case, these are prediction probabilities, which the outputs of
4903
08:44:23,560 --> 08:44:28,440
machine learning models give - they're never actually discrete, as in, "this is definitely pizza".
4904
08:44:28,440 --> 08:44:35,240
It will give some sort of probability value between zero and one - say, the closer to one,
4905
08:44:35,240 --> 08:44:42,520
the more confident our model is that it's going to be pizza. And the closer to zero means that,
4906
08:44:42,520 --> 08:44:47,880
hey, this photo of pizza, let's say this one, and we're trying to predict sushi. Well,
4907
08:44:48,680 --> 08:44:53,080
it doesn't think that it's sushi. So it's giving it quite a low value here. And then the same for
4908
08:44:53,080 --> 08:44:58,600
steak, but it's really high, the value here for pizza. We're going to see this hands on. And then
4909
08:44:58,600 --> 08:45:03,880
it's the opposite here. So it might have got this one wrong. But with more training and more data,
4910
08:45:03,880 --> 08:45:07,320
we could probably improve this prediction. That's the whole idea of machine learning,
4911
08:45:07,320 --> 08:45:14,280
is that if you adjust the algorithm, if you adjust the data, you can improve your predictions. And so
4912
08:45:16,360 --> 08:45:22,040
the ideal outputs that we have here, this is what our model's going to output. But for our case of
4913
08:45:22,040 --> 08:45:28,200
building out food vision, we want to bring them back to labels. So we could just put all of these numbers
4914
08:45:28,200 --> 08:45:33,720
on the screen here, but that's not really going to help people. We want to put out labels of what's
4915
08:45:33,720 --> 08:45:39,640
going on here. So we can write code to transfer these prediction probabilities into these labels
4916
08:45:39,640 --> 08:45:44,600
too. And so how did these labels come about? How do these predictions come about? Well,
4917
08:45:44,600 --> 08:45:48,920
it comes from looking at lots of different samples. So this loop, we could keep going,
4918
08:45:48,920 --> 08:45:54,120
improve these, find the ones where it's wrong, add more images here, train the model again,
4919
08:45:54,120 --> 08:46:00,200
and then make our app better. And so if we want to look at this from a shape perspective,
4920
08:46:01,240 --> 08:46:06,840
we want to create some tensors for an image classification example. So we're building food vision.
4921
08:46:08,200 --> 08:46:12,040
We've got an image again, this is just reiterating on some of the things that we've discussed.
4922
08:46:12,040 --> 08:46:17,720
We've got a width of 224 and a height of 224. This could be different. This could be 300, 300.
4923
08:46:17,720 --> 08:46:23,160
This could be whatever values that you decide to use. Then we numerically encode it in some way,
4924
08:46:23,160 --> 08:46:27,720
shape or form. We use this as the inputs to our machine learning algorithm, because of what?
4925
08:46:27,720 --> 08:46:31,800
Computers and machine learning algorithms, they love numbers. They can find patterns in here
4926
08:46:31,800 --> 08:46:35,400
that we couldn't necessarily find. Or maybe we could, if you had a long enough time,
4927
08:46:35,400 --> 08:46:39,560
but I'd rather write an algorithm to do it for me. Then it has some outputs,
4928
08:46:39,560 --> 08:46:44,280
which come in the form of prediction probabilities - the closer to one, the more confident the model is
4929
08:46:44,280 --> 08:46:49,400
and saying, hey, I'm pretty damn confident that this is a photo of sushi. I don't think it's a
4930
08:46:49,400 --> 08:46:55,320
photo of steak. So I'm giving that zero. It might be a photo of pizza, but I don't really think so.
4931
08:46:55,320 --> 08:47:01,560
So I'm giving it quite a low prediction probability. And so if we have a look at what the shapes are
4932
08:47:01,560 --> 08:47:05,640
for our tensors here, if this doesn't make sense, don't worry. We're going to see the code to do
4933
08:47:05,640 --> 08:47:11,000
all of this later on. But for now, we're just focusing on a classification input and output.
4934
08:47:11,000 --> 08:47:17,080
The big takeaway from here is numerical encoding for the inputs and numerical encoding for the outputs. But we want to
4935
08:47:17,080 --> 08:47:22,200
change these numerical encodings from the outputs into something that we understand, say the word sushi.
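Here's a minimal sketch of that idea with made-up prediction probabilities - the class names and numbers are illustrative only:

```python
import torch

class_names = ["sushi", "steak", "pizza"]
pred_probs = torch.tensor([0.97, 0.00, 0.03])       # hypothetical model output for one image
pred_label = class_names[int(pred_probs.argmax())]  # pick the class with the highest probability
print(pred_label)  # sushi
```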
4936
08:47:22,840 --> 08:47:28,200
But this tensor may be batch size. We haven't seen what batch size is. That's all right. We're
4937
08:47:28,200 --> 08:47:33,960
going to cover it. Color channels, width, height. So this is represented as a tensor of those dimensions.
4938
08:47:33,960 --> 08:47:38,760
It could be none here. None is a typical value for a batch size, which means it's blank. So when
4939
08:47:38,760 --> 08:47:43,880
we use our model and we train it, all the code that we write with pytorch will fill in this behind
4940
08:47:43,880 --> 08:47:50,120
the scenes. And then we have three here, which is color channels. And we have 224, which is the width.
4941
08:47:50,120 --> 08:47:55,880
And we have 224 as well, which is the height. Now there is some debate in the field on the ordering.
4942
08:47:55,880 --> 08:48:01,080
We're using an image as our particular example here on the ordering of these shapes. So say,
4943
08:48:01,080 --> 08:48:06,040
for example, you might have height, width, color channels - typically width and height come together
4944
08:48:06,040 --> 08:48:10,840
in this order, or they're just side by side in the tensor in terms of where their dimension
4945
08:48:10,840 --> 08:48:17,000
appears. But color channels sometimes comes first. That means after the batch size or at the end here.
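As a hedged aside, here is a minimal sketch of what those two orderings look like, using torch.permute to swap between them (only the dimension order changes, not the values):

```python
import torch

image_channels_last = torch.rand(size=(224, 224, 3))         # [width, height, colour_channels]
image_channels_first = image_channels_last.permute(2, 0, 1)  # [colour_channels, width, height]
print(image_channels_last.shape)   # torch.Size([224, 224, 3])
print(image_channels_first.shape)  # torch.Size([3, 224, 224])
```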
4946
08:48:17,000 --> 08:48:22,920
But PyTorch, the default for now is color channels, width, height, though you can write code to change
4947
08:48:22,920 --> 08:48:30,120
this order because tensors are quite flexible. And so, or, the shape could be 32 for the batch size,
4948
08:48:30,120 --> 08:48:35,880
3, 224, 224, because 32 is a very common batch size. And you don't believe me?
4949
08:48:35,880 --> 08:48:46,280
Well, let's go here. Yann LeCun, 32 batch size. Now what is a batch size? Great tweet. Just keep
4950
08:48:46,280 --> 08:48:51,320
this in mind for later on. Training with large mini batches is bad for your health. More importantly,
4951
08:48:51,320 --> 08:48:55,880
it's bad for your test error. Friends don't let friends use mini batches larger than 32. So this
4952
08:48:55,880 --> 08:49:03,240
is quite an old tweet. However, it still stands quite true. Because like today, it's 2022 when
4953
08:49:03,240 --> 08:49:08,760
I'm recording these videos, there are batch sizes a lot larger than 32. But 32 works pretty darn
4954
08:49:08,760 --> 08:49:16,280
well for a lot of problems. And so this means that if we go back to our slide, that if we use
4955
08:49:16,280 --> 08:49:22,440
a batch size of 32, our machine learning algorithm looks at 32 images at a time. Now why does it do
4956
08:49:22,440 --> 08:49:27,960
this? Well, because sadly, our computers don't have infinite compute power. In an ideal world,
4957
08:49:27,960 --> 08:49:32,920
we'd look at thousands of images at a time, but it turns out that using a multiple of eight here
4958
08:49:32,920 --> 08:49:39,240
is actually quite efficient. And so if we have a look at the output shape here, why is it three?
4959
08:49:39,800 --> 08:49:45,240
Well, because we're working with three different classes, one, two, three. So we've got shape equals
4960
08:49:45,240 --> 08:49:52,840
three. Now, of course, as you could imagine, these might change depending on the problem you're working
4961
08:49:52,840 --> 08:49:58,680
with. So say if we just wanted to predict if a photo was a cat or a dog, we still might have this
4962
08:49:58,680 --> 08:50:03,880
same representation here because this is the image representation. However, the shape here
4963
08:50:03,880 --> 08:50:09,400
may be two, or will be two because it's cat or dog, rather than three classes here, but a little
4964
08:50:09,400 --> 08:50:13,880
bit confusing as well with binary classification, you could have the shape just being one here.
4965
08:50:13,880 --> 08:50:19,880
But we're going to see this all hands on. Just remember, the shapes vary with whatever problem
4966
08:50:19,880 --> 08:50:26,920
you're working on. The principle of encoding your data as a numerical representation stays the same
4967
08:50:26,920 --> 08:50:33,080
for the inputs. And the outputs will often be some form of prediction probability based on whatever
4968
08:50:33,080 --> 08:50:39,880
class you're working with. So in the next video, right before we get into coding, let's just discuss
4969
08:50:39,880 --> 08:50:45,000
the high level architecture of a classification model. And remember, architecture is just like
4970
08:50:45,000 --> 08:50:52,200
the schematic of what a neural network is. I'll see you there. Welcome back. In the last video,
4971
08:50:52,200 --> 08:50:58,200
we saw some example classification inputs and outputs. The main takeaway is that the inputs to a
4972
08:50:58,200 --> 08:51:02,760
classification model, particularly a neural network, want to be some form of numerical
4973
08:51:02,760 --> 08:51:09,160
representation. And the outputs are often some form of prediction probability. So let's discuss
4974
08:51:09,160 --> 08:51:13,720
the typical architecture of a classification model. And hey, this is just going to be text
4975
08:51:13,720 --> 08:51:19,240
on a page, but we're going to be building a fair few of these. So we've got some hyper parameters
4976
08:51:19,240 --> 08:51:26,280
over here. We've got binary classification. And we've got multi class classification. Now,
4977
08:51:26,280 --> 08:51:30,840
there are some similarities between the two in terms of what problem we're working with.
4978
08:51:30,840 --> 08:51:36,600
But there also are some differences here. And by the way, this has all come from, if we go
4979
08:51:36,600 --> 08:51:41,080
to the book version of the course, we've got what is a classification problem. And we've got
4980
08:51:41,080 --> 08:51:47,640
architecture of a classification neural network. So all of this text is available at learnpytorch.io
4981
08:51:47,640 --> 08:51:53,960
and in section two. So we come back. So the input layer shape, which is typically
4982
08:51:53,960 --> 08:51:59,960
decided by the parameter in_features, as you can see here, is the same as the number of features.
4983
08:51:59,960 --> 08:52:04,200
So if we were working on a problem, such as trying to predict whether someone had
4984
08:52:04,200 --> 08:52:09,560
heart disease or not, we might have five input features, such as one for age, a number for age,
4985
08:52:09,560 --> 08:52:16,360
it might be in my case, 28, sex could be male, height, 180 centimeters. If I've been growing
4986
08:52:16,360 --> 08:52:21,720
overnight, it's really close to 177. Weight - well, it depends on how much I've eaten, but it's around
4987
08:52:21,720 --> 08:52:27,480
about 75 kilos and smoking status, which is zero. So it could be zero or one, because remember,
4988
08:52:27,480 --> 08:52:33,160
we want numerical representation. So for sex, it could be zero for male, one for female,
4989
08:52:33,160 --> 08:52:37,240
height could be its number, weight could be its number as well. All of these numbers could be
4990
08:52:37,240 --> 08:52:43,480
more, could be less as well. So this is really flexible. And it's a hyper parameter. Why? Because
4991
08:52:43,480 --> 08:52:48,840
we decide the values for each of these. So in the case of our image prediction problem,
4992
08:52:48,840 --> 08:52:53,160
we could have in_features equals three for the number of color channels. And then we go
4993
08:52:54,520 --> 08:52:59,720
hidden layers. So there's the blue circle here. I forgot that this was all timed and colorful.
4994
08:52:59,720 --> 08:53:05,720
But let's just discuss hidden layers. Each of these is a layer - nn.Linear, nn.Linear,
4995
08:53:05,720 --> 08:53:11,160
nn.ReLU and nn.Linear. So that's the kind of syntax you'll see in PyTorch for a
4996
08:53:11,160 --> 08:53:15,960
layer - nn dot something. Now, there are many different types of layers in PyTorch.
4997
08:53:15,960 --> 08:53:22,840
If we go torch.nn, basically everything in here is a layer in a neural network. And then if we
4998
08:53:22,840 --> 08:53:30,040
look up what a neural network looks like, neural network, recall that all of these are different
4999
08:53:30,040 --> 08:53:37,160
layers of some kind of mathematical operation. Input layer, hidden layer, you could have as
5000
08:53:37,160 --> 08:53:44,600
many hidden layers as you want. Do we have ResNet architecture? The ResNet architecture,
5001
08:53:44,600 --> 08:53:50,440
some of them have 50 layers. Look at this. Each one of these is a layer. And this is only the
5002
08:53:50,440 --> 08:53:56,120
34 layer version. I mean, there's ResNet 152, which is 152 layers. We're not at that yet.
5003
08:53:56,840 --> 08:54:02,600
But we're working up the tools to get to that stage. Let's come back to here. The neurons per
5004
08:54:02,600 --> 08:54:09,800
hidden layer. So we've got these out_features - the green circle, the green square. Now, this is,
5005
08:54:09,800 --> 08:54:17,080
if we go back to our neural network picture, this is these. Each one of these little things
5006
08:54:17,080 --> 08:54:24,360
is a neuron, some sort of parameter. So if we had 100, what would that look like? Well,
5007
08:54:24,360 --> 08:54:30,120
we'd have a fairly big graphic. So this is why I like to teach with code because you could customize
5008
08:54:30,120 --> 08:54:35,640
this as flexible as you want. So behind the scenes, PyTorch is going to create 100 of these little
5009
08:54:35,640 --> 08:54:40,600
circles for us. And within each circle is what? Some sort of mathematical operation.
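As a rough sketch of how those hyperparameters show up in code (the numbers are assumptions borrowed from the examples above, not a prescribed architecture):

```python
import torch
from torch import nn

# in_features=5 for five hypothetical input features (age, sex, height, weight, smoking status),
# 100 hidden units per hidden layer, out_features=3 for three classes.
model = nn.Sequential(
    nn.Linear(in_features=5, out_features=100),    # input layer shape -> hidden units
    nn.ReLU(),                                     # hidden layer activation
    nn.Linear(in_features=100, out_features=100),  # another hidden layer
    nn.ReLU(),
    nn.Linear(in_features=100, out_features=3)     # output layer shape (one value per class)
)
print(model)
```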
5010
08:54:41,320 --> 08:54:45,960
So if we come back, what do we got next? Output layer shape. So this is how many output features
5011
08:54:45,960 --> 08:54:50,920
we have. So in the case of binary classification, it's one - one class or the other. We're going to
5012
08:54:50,920 --> 08:54:56,200
see this later on. For multi-class classification, you might have three output features,
5013
08:54:56,200 --> 08:55:02,520
one per class, e.g., one for food, person or dog, if you're building a food, person or dog,
5014
08:55:02,520 --> 08:55:08,680
image classification model. Hidden layer activation, which is, we haven't seen these yet.
5015
08:55:08,680 --> 08:55:15,000
ReLU, which is a rectified linear unit, but it can be many others because PyTorch, of course, has what?
5016
08:55:15,000 --> 08:55:19,960
Has a lot of non-linear activations. We're going to see this later on. Remember, I'm kind of planting
5017
08:55:19,960 --> 08:55:25,560
the seed here. We've seen what a linear line is, but I want you to imagine what a non-linear line is.
5018
08:55:25,560 --> 08:55:30,280
It's going to be a bit of a superpower for our classification problem. What else do we have?
5019
08:55:30,280 --> 08:55:35,320
Output activation. We haven't got that here, but we'll also see this later on, which could be
5020
08:55:35,320 --> 08:55:41,000
sigmoid - which is generally sigmoid for binary classification - and softmax for multi-class
5021
08:55:41,000 --> 08:55:45,560
classification. A lot of these things are just names on a page. We haven't seen them yet.
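Here's a minimal sketch of those two output activations on some made-up logits:

```python
import torch

# Binary classification: sigmoid squashes a single logit into a probability between 0 and 1
binary_logit = torch.tensor([0.8])
print(torch.sigmoid(binary_logit))

# Multi-class classification: softmax turns several logits into probabilities that sum to 1
multi_class_logits = torch.tensor([2.0, 0.5, -1.0])
print(torch.softmax(multi_class_logits, dim=0))
```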
5022
08:55:45,560 --> 08:55:49,880
I like to teach them as we see them, but this is just a general overview of what we're going to
5023
08:55:49,880 --> 08:55:55,720
cover. Loss function. What does a loss function do? It measures how
5024
08:55:55,720 --> 08:56:00,680
wrong our model's predictions are compared to what the ideal predictions are. So for binary
5025
08:56:00,680 --> 08:56:06,440
classification, we might use binary cross entropy loss in PyTorch, and for multi-class
5026
08:56:06,440 --> 08:56:12,520
classification, we might just use cross entropy rather than binary cross entropy. Get it?
5027
08:56:12,520 --> 08:56:18,840
Binary classification? Binary cross entropy? And then optimizer. SGD is stochastic gradient descent.
5028
08:56:18,840 --> 08:56:24,280
We've seen that one before. Another common option is the Adam optimizer, and of course,
5029
08:56:24,280 --> 08:56:32,440
the torch.optim package has plenty more options. So this is an example multi-class classification
5030
08:56:32,440 --> 08:56:37,160
problem. This network here - why is that? And we haven't actually seen nn.Sequential,
5031
08:56:37,160 --> 08:56:41,400
but as you could imagine, Sequential stands for - it just goes through each of these steps in order.
5032
08:56:42,120 --> 08:56:47,080
So multi-class classification, because it has three output features, more than one thing or
5033
08:56:47,080 --> 08:56:53,240
another. So three for food, person or dog, but going back to our food vision problem,
5034
08:56:53,240 --> 08:57:00,040
we could have the input as sushi, steak, or pizza. So we've got three output features,
5035
08:57:00,040 --> 08:57:06,600
which would be one prediction probability per class of image. We have three classes, sushi,
5036
08:57:06,600 --> 08:57:13,320
steak, or pizza. Now, I think we've done enough talking here, and enough just pointing to text
5037
08:57:13,320 --> 08:57:20,760
on slides. How about in the next video? Let's code. I'll see you in Google CoLab.
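Before we jump into Colab, here is a rough sketch of what setting up a loss function and an optimizer tends to look like - the placeholder model and learning rates are assumptions, not the course's exact values:

```python
import torch
from torch import nn

# Placeholder model purely for illustration.
model = nn.Linear(in_features=2, out_features=1)

# Binary classification -> binary cross entropy (the "with logits" version is common in PyTorch)
loss_fn = nn.BCEWithLogitsLoss()
# Multi-class classification -> cross entropy
# loss_fn = nn.CrossEntropyLoss()

# Optimizer: stochastic gradient descent (Adam is another common option)
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)
# optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)
```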
5038
08:57:22,440 --> 08:57:28,200
Welcome back. Now, we've done enough theory of what a classification problem is, what the inputs
5039
08:57:28,200 --> 08:57:33,160
and outputs are and the typical architecture. Let's get in and write some code. So I'm going to
5040
08:57:33,960 --> 08:57:42,280
get out of this and go to colab.research.google.com, so we can start writing some PyTorch code.
5041
08:57:42,280 --> 08:57:48,760
I'm going to click new notebook. We're going to start exactly from scratch. I'm going to name this
5042
08:57:48,760 --> 08:57:58,840
section two, and let's call it neural network classification with PyTorch. I'm going to put
5043
08:57:58,840 --> 08:58:04,840
underscore video, because I'll just show you, you'll see this in the GitHub repo. But for all the
5044
08:58:04,840 --> 08:58:09,560
video notebooks, the ones that I write code during these videos that you're watching, the exact code
5045
08:58:09,560 --> 08:58:14,760
is going to be saved on the GitHub repo under video notebooks. So there's 00, which is the
5046
08:58:14,760 --> 08:58:19,160
fundamentals, and there's the workflow underscore video. But the reference notebook with all the
5047
08:58:19,160 --> 08:58:26,520
pretty pictures and stuff is in the main folder here. So, PyTorch classification dot ipynb -
5048
08:58:26,520 --> 08:58:32,280
actually, maybe we'll just rename it to PyTorch classification. But we know it's with
5049
08:58:32,280 --> 08:58:40,120
neural networks. PyTorch classification. Okay, and let's go here. We'll add a nice title. So 02,
5050
08:58:40,120 --> 08:58:49,480
neural network classification with PyTorch. And so we'll remind ourselves, classification is a
5051
08:58:49,480 --> 08:59:00,520
problem of predicting whether something is one thing or another. And there can be multiple
5052
08:59:02,040 --> 08:59:09,800
things as the options, such as email, spam or not spam, photos of dogs or cats or pizza or
5053
08:59:09,800 --> 08:59:19,720
sushi or steak. Lots of talk about food. And then I'm just going to link in here, this resource,
5054
08:59:19,720 --> 08:59:25,560
because this is the book version of the course. These are what the videos are based off. So book
5055
08:59:25,560 --> 08:59:34,680
version of this notebook. And then all the resources are in here. All other resources
5056
08:59:34,680 --> 08:59:46,520
in the GitHub, and then, if stuck, ask a question here, which is under the discussions tab. We'll
5057
08:59:46,520 --> 08:59:51,960
copy that in here. That way we've got everything linked and ready to go. But as always, what's our
5058
08:59:51,960 --> 08:59:58,280
first step in our workflow? This is a little test. See if you remember. Well, it's data, of course,
5059
08:59:58,280 --> 09:00:03,240
because all machine learning problems start with some form of data. We can't write a machine
5060
09:00:03,240 --> 09:00:09,160
learning algorithm to learn patterns in data that doesn't exist. So let's do this. This video, we're
5061
09:00:09,160 --> 09:00:14,760
going to make some data. Of course, you might start with some of your own that exists. But for now,
5062
09:00:14,760 --> 09:00:18,840
we're going to focus on just the concepts around the workflow. So we're going to make our own
5063
09:00:18,840 --> 09:00:24,600
custom data set. And to do so, I'll write the code first, and then I'll show you where I get it from.
5064
09:00:24,600 --> 09:00:29,960
We're going to import the scikit-learn library. One of the beautiful things about Google Colab
5065
09:00:29,960 --> 09:00:36,760
is that it has scikit-learn available. If you're not sure what scikit-learn is, it's a very popular
5066
09:00:36,760 --> 09:00:42,120
machine learning library. PyTorch is mainly focused on deep learning, but scikit-learn is
5067
09:00:42,120 --> 09:00:47,400
focused on a lot of things around machine learning. So Google Colab, thank you for having scikit
5068
09:00:47,400 --> 09:00:53,320
-learn already installed for us. But we're going to import the make_circles data set. And rather
5069
09:00:53,320 --> 09:01:00,920
than talk about what it does, let's see what it does. So make 1000 samples. We're going to go N
5070
09:01:00,920 --> 09:01:10,040
samples equals 1000. And we're going to create circles. You might be wondering why circles. Well,
5071
09:01:10,040 --> 09:01:16,040
we're going to see exactly why circles later on. So X and Y, we're going to use this variable.
5072
09:01:16,040 --> 09:01:23,000
How would you say it - nomenclature - of a capital X and a lowercase y. Why is that? Because X is typically a matrix of
5073
09:01:23,000 --> 09:01:32,040
features, and y the labels. So let's go here: make_circles. And we're going to make n_samples. So 1000 different
5074
09:01:32,040 --> 09:01:36,600
samples. We're going to add some noise in there. Just put a little bit of randomness. Why not?
5075
09:01:36,600 --> 09:01:42,520
You can increase this as you want. I found that 0.03 is fairly good for what we're doing. And
5076
09:01:42,520 --> 09:01:46,680
then we're going to also pass in the random_state variable, which is equivalent to
5077
09:01:46,680 --> 09:01:53,400
setting a random seed. So we're flavoring the randomness here. Wonderful. So now let's
5078
09:01:53,400 --> 09:02:00,040
have a look at the length of X, which should be what? And length of Y. Oh, we don't have Y
5079
09:02:00,040 --> 09:02:07,320
underscore getting a bit trigger happy with this keyboard here. 1000. So we have 1000 samples of
5080
09:02:07,320 --> 09:02:14,200
X paired with 1000 samples of y - features, labels. So let's have a look at the
5081
09:02:14,200 --> 09:02:24,360
first five of X. So print first five samples of X. And then we'll put in here X. And we can index
5082
09:02:24,360 --> 09:02:33,240
on this with five because we're adhering to the data explorer's motto of visualize, visualize, visualize.
5083
09:02:34,360 --> 09:02:39,640
first five samples of y. And then we're going to go y - same thing here.
5084
09:02:39,640 --> 09:02:47,480
Wonderful. Let's have a look. Maybe we'll get a new line in here. Just so
5085
09:02:50,600 --> 09:02:57,160
it looks a bit better. Wonderful. So, numerical. Our samples are already numerical. This is one of
5086
09:02:57,160 --> 09:03:01,640
the reasons why we're creating our own data set. We'll see later on how we get non numerical data
5087
09:03:01,640 --> 09:03:07,800
into numbers. But for now, our data is numerical, which means we can learn it with our model or
5088
09:03:07,800 --> 09:03:14,440
we can build a model to learn patterns in here. So this sample has the label of one. And this
5089
09:03:14,440 --> 09:03:19,880
sample has the label of one as well. Now, how many features do we have per sample? If I highlight
5090
09:03:19,880 --> 09:03:26,520
this line, how many features is this? It would make it a bit easier if there was a comma here,
5091
09:03:26,520 --> 09:03:33,960
but we have two features of X, which relates to one label of Y. And so far, we've only seen,
5092
09:03:33,960 --> 09:03:39,800
let's have a look at all of Y. We've got zero and one. So we've got two classes. What does this
5093
09:03:39,800 --> 09:03:46,760
mean? Zero or one? One thing or another? Well, it looks like binary classification to me,
5094
09:03:46,760 --> 09:03:52,120
because we've only got zero or one. If there was zero, one, two, it would be
5095
09:03:53,160 --> 09:03:57,640
multi class classification, because we have more than two things. So let's X out of this.
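For reference, a sketch of the data creation code just described (the random_state value of 42 is an assumption - any fixed seed works):

```python
from sklearn.datasets import make_circles

n_samples = 1000

X, y = make_circles(n_samples,
                    noise=0.03,        # a little bit of randomness
                    random_state=42)   # seed so the random samples are reproducible

print(len(X), len(y))  # 1000 1000
print(X[:5])           # two features per sample
print(y[:5])           # labels are 0 or 1 -> binary classification
```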
5096
09:03:57,640 --> 09:04:03,800
Let's keep going and do a little bit more data exploration. So how about we make a data frame?
5097
09:04:03,800 --> 09:04:11,960
With pandas, of our circle data. There is truly no definite way of how to explore data.
5098
09:04:11,960 --> 09:04:17,800
For me, I like to visualize it multiple different ways, or even look at random samples. In the case
5099
09:04:17,800 --> 09:04:26,040
of large data sets, such as images or text or whatnot. If you have 10 million samples, perhaps
5100
09:04:26,040 --> 09:04:34,200
visualizing them one by one is not the best way to do so. So random can help you out there.
5101
09:04:34,200 --> 09:04:39,800
So we're going to create a data frame, and we can insert a dictionary here. So I'm going to call
5102
09:04:39,800 --> 09:04:47,960
the features in this part of X, X1, and these are going to be X2. So let's say I'll write some code
5103
09:04:47,960 --> 09:04:58,360
to index on this. So everything in the zero index will be X1. And everything in the first index,
5104
09:04:59,000 --> 09:05:04,280
there we go, will be X2. Let me clean up this code. This should be on different lines,
5105
09:05:05,160 --> 09:05:15,720
enter. And then we've got, let's put in the label as Y. So this is just a dictionary here.
5106
09:05:15,720 --> 09:05:22,600
So the X1 key maps to X index zero. X2 - a little bit confusing because of zero indexing - but X feature one,
5107
09:05:22,600 --> 09:05:28,760
X feature two, and the label is Y. Let's see what this looks like. We'll look at the first 10 samples.
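A sketch of the DataFrame code just described, assuming X and y from make_circles are in scope:

```python
import pandas as pd

# First feature column -> "X1", second -> "X2", labels -> "label"
circles = pd.DataFrame({"X1": X[:, 0],
                        "X2": X[:, 1],
                        "label": y})
print(circles.head(10))
```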
5108
09:05:29,800 --> 09:05:36,920
Okay, beautiful. So we've got X1, some numerical value, X2, another numerical value, correlates
5109
09:05:36,920 --> 09:05:46,600
to or matches up with label zero. But then this one, 0442208, and negative that number matches up
5110
09:05:46,600 --> 09:05:52,840
with label zero. So I can't tell what the patterns are just looking at these numbers. You might be
5111
09:05:52,840 --> 09:05:57,240
able to, but I definitely can't. We've got some ones. All these numbers look the same to me. So
5112
09:05:57,880 --> 09:06:04,040
what can we do next? Well, how about we visualize, visualize, visualize, and instead of just numbers
5113
09:06:04,040 --> 09:06:11,880
in a table, let's get graphical this time, visualize, visualize, visualize. So we're going to bring in
5114
09:06:11,880 --> 09:06:19,560
our friendly matplotlib - import matplotlib - which is a very powerful plotting library. I'm just
5115
09:06:19,560 --> 09:06:28,440
going to add some cells here. So we've got some space - matplotlib.pyplot as plt. That's right.
5116
09:06:28,440 --> 09:06:35,240
We've got this plt.scatter. We're going to do a scatter plot - x equals X, and we want the first index.
5117
09:06:35,880 --> 09:06:43,160
And then Y is going to be X as well. So that's going to appear on the Y axis. And then we want to
5118
09:06:43,160 --> 09:06:48,520
color it with labels. We're going to see what this looks like in a second. And then the color map,
5119
09:06:49,720 --> 09:06:56,760
cmap stands for color map - it's going to be plt.cm, and then RdYlBu - red, yellow, blue,
5120
09:06:56,760 --> 09:07:01,480
one of my favorite color outputs. So let's see what this looks like. You ready?
5121
09:07:03,720 --> 09:07:10,680
Ah, there we go. There's our circles. That's a lot better for me. So what do you think we're
5122
09:07:10,680 --> 09:07:15,320
going to try and do here? If this is our data and we're working on classification,
5123
09:07:16,360 --> 09:07:22,200
we're trying to predict if something is one thing or another. So our problem is we want to
5124
09:07:22,200 --> 09:07:29,080
try and separate these two circles. So say given a number here or given two numbers and X one
5125
09:07:29,080 --> 09:07:34,520
and an X two, which are coordinates here, we want to predict the label. Is it going to be a blue
5126
09:07:34,520 --> 09:07:40,920
dot or is it going to be a red dot? So we're working with binary classification. So we have
5127
09:07:40,920 --> 09:07:46,920
one thing or another. Do we have a blue dot or a red dot? So this is going to be our toy data here.
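A sketch of the scatter plot just described, again assuming X and y are in scope:

```python
import matplotlib.pyplot as plt

plt.scatter(x=X[:, 0],
            y=X[:, 1],
            c=y,                 # colour each dot by its label (0 or 1)
            cmap=plt.cm.RdYlBu)  # red/yellow/blue colour map
plt.show()
```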
5128
09:07:46,920 --> 09:07:50,840
And a toy problem is, let me just write this down. This is a common thing that you'll also
5129
09:07:50,840 --> 09:08:01,400
hear in machine learning. Note, the data we're working with is often referred to as a toy data set,
5130
09:08:02,520 --> 09:08:15,960
a data set that is small enough to experiment on, but still sizable enough to practice the
5131
09:08:15,960 --> 09:08:20,760
fundamentals. And that's what we're really after in this notebook is to practice the fundamentals
5132
09:08:20,760 --> 09:08:27,000
of neural network classification. So we've got a perfect data set to do this. And by the way,
5133
09:08:27,000 --> 09:08:32,200
we've got this from scikit-learn. So this little function here made all of these samples for us.
5134
09:08:32,760 --> 09:08:38,520
And how could you find out more about this function here? Well, you could go scikit-learn
5135
09:08:39,080 --> 09:08:43,720
classification data sets. There are actually a few more in here that we could have done.
5136
09:08:43,720 --> 09:08:49,160
I just like the circle one. Toy data sets, we saw that. So this is like a toy box of different
5137
09:08:49,160 --> 09:08:54,280
data sets. So if you'd like to learn more about some data sets that you can have a look in here
5138
09:08:54,280 --> 09:08:59,400
and potentially practice on with neural networks or other forms of machine learning models from
5139
09:08:59,400 --> 09:09:04,360
scikit-learn, check out scikit-learn - I can't speak highly enough of it. I know this is a PyTorch
5140
09:09:04,360 --> 09:09:08,840
course. We're not focused on this, but they kind of all come together in terms of the machine
5141
09:09:08,840 --> 09:09:12,840
learning and deep learning world. You might use something from scikit-learn, like we've done here,
5142
09:09:12,840 --> 09:09:17,160
to practice something. And then you might use PyTorch for something else, like what we're
5143
09:09:17,160 --> 09:09:23,160
doing here. Now, with that being said, what are the input and output shapes of our problem?
5144
09:09:25,480 --> 09:09:30,280
Have a think about that. And also have a think about how we'd split this into training and test.
5145
09:09:31,800 --> 09:09:36,280
So give those a go. We covered those concepts in some previous videos,
5146
09:09:36,280 --> 09:09:39,560
but we'll do them together in the next video. I'll see you there.
5147
09:09:39,560 --> 09:09:46,760
Welcome back. In the last video, we made some classification data so that we can
5148
09:09:46,760 --> 09:09:52,920
practice building a neural network in PyTorch to separate the blue dots from the red dots.
5149
09:09:52,920 --> 09:09:56,920
So let's keep pushing forward on that. And I'll just clean up here a little bit,
5150
09:09:56,920 --> 09:10:02,360
but where are we in our workflow? What have we done so far? Well, we've got our data ready a
5151
09:10:02,360 --> 09:10:06,840
little bit. We haven't turned it into tenses. So let's do that in this video, and then we'll
5152
09:10:06,840 --> 09:10:14,920
keep pushing through all of these. So in here, I'm going to make this heading 1.1. Check input
5153
09:10:14,920 --> 09:10:19,880
and output shapes. The reason we're focused a lot on input and output shapes is why,
5154
09:10:20,520 --> 09:10:27,480
because machine learning deals a lot with numerical representations as tensors. And input and output
5155
09:10:27,480 --> 09:10:32,360
shapes are some of the most common sources of errors - like if you have a mismatch between your input and
5156
09:10:32,360 --> 09:10:36,760
output shapes of a certain layer or an output layer, you're going to run into a lot of errors
5157
09:10:36,760 --> 09:10:42,360
there. So that's why it's good to get acquainted with whatever data you're using, what are the
5158
09:10:42,360 --> 09:10:50,760
input shapes and what are the output shapes you'd like. So in our case, we can go x dot shape
5159
09:10:50,760 --> 09:10:56,840
and y dot shape. So we're working with NumPy arrays here if we just look at x. That's what the
5160
09:10:56,840 --> 09:11:02,040
make_circles function has created for us. We've got an array, but as our workflow says,
5161
09:11:02,040 --> 09:11:06,760
we'd like it in tensors. If we're working with PyTorch, we want our data to be represented as
5162
09:11:06,760 --> 09:11:13,080
PyTorch tensors of that data type. And so we've got a shape here, we've got a thousand samples,
5163
09:11:13,080 --> 09:11:19,000
and x has two features, and y has no features. It's just a single number. It's a scalar. So it
5164
09:11:19,000 --> 09:11:23,960
doesn't have a shape here. So there's a thousand samples of y, a thousand samples of x, and two features
5165
09:11:23,960 --> 09:11:30,520
of x equals one y label. Now, if you're working with a larger problem, you might have a thousand
5166
09:11:30,520 --> 09:11:38,840
samples of x, but x is represented by 128 different numbers, or 200 numbers, or as high as you want,
5167
09:11:38,840 --> 09:11:44,760
or just 10 or something like that. So just keep in mind that this number is quite flexible of how
5168
09:11:44,760 --> 09:11:52,440
many features represent a label - y is the label here. But let's keep going. So view the first
5169
09:11:52,440 --> 09:12:01,000
example of features and labels. So let's make it explicit with what we've just been discussing.
5170
09:12:01,000 --> 09:12:06,680
We'll write some code to do so. We'll get the first sample of x, which is the zero index,
5171
09:12:06,680 --> 09:12:13,480
and we'll get the first sample of y, which is also the zero index. We could get really anyone
5172
09:12:13,480 --> 09:12:22,200
because they're all of the same shape. But print values for one sample of x. What does this equal?
5173
09:12:22,200 --> 09:12:34,840
X sample, and the same for y, which is y sample. And then we want to go print f string for one
5174
09:12:34,840 --> 09:12:45,560
sample of x. We'll get the shape here. X sample dot shape, and the same for y, and then we'll get
5175
09:12:45,560 --> 09:12:53,480
y sample dot shape. Beautiful. What's this going to do? Well, we've got one sample of x. So this
5176
09:12:53,480 --> 09:13:02,520
sample here of these numbers, we've got a lot going on here. 75424625 and 0231 48074. I mean,
5177
09:13:02,520 --> 09:13:07,880
you can try to find some patterns in those. If you do, all the best here, and the same for y. So this
5178
09:13:07,880 --> 09:13:14,280
is, we have the y sample, this correlates to a number one, a label of one. And then we have
5179
09:13:14,280 --> 09:13:20,680
shapes for one sample of x, which is two. So we have two features of x for one y. It's a little bit confusing
5180
09:13:20,680 --> 09:13:26,040
here because y is a scalar, which doesn't actually have a shape. It's just one value. So for me,
5181
09:13:26,040 --> 09:13:31,320
in terms of speaking this, teaching it out loud, it'll be two features of x trying to predict
5182
09:13:31,320 --> 09:13:39,640
one number for y. And so let's now create another heading, which is 1.2. Let's get our data into
5183
09:13:39,640 --> 09:13:46,040
tensors - turn data into tensors. We have to convert them from NumPy. And we also want to create
5184
09:13:46,040 --> 09:13:51,000
train and test splits. Now, even though we're working with a toy data set here, the principle
5185
09:13:51,000 --> 09:13:57,480
of turning data into tensors and creating train and test splits will stay around for almost any
5186
09:13:57,480 --> 09:14:02,520
data set that you're working with. So let's see how we can do that. So we want to turn data
5187
09:14:02,520 --> 09:14:11,240
into tensors. And for this, we need to import torch - get PyTorch - and we'll check the torch version.
5188
09:14:11,240 --> 09:14:20,600
It has to be at least 1.10. And I might just put this down in the next cell. Just make sure we can
5189
09:14:20,600 --> 09:14:27,160
import PyTorch. There we go, 1.10 plus cu111. If your version is higher than that, that is okay.
5190
09:14:27,160 --> 09:14:33,640
The code below should still work. And if it doesn't, let me know. So x equals torch dot
5191
09:14:34,760 --> 09:14:42,600
from NumPy. Why are we doing this? Well, it's because x is a NumPy array. And if we go x dot,
5192
09:14:42,600 --> 09:14:52,760
does it have a dtype attribute? float64. Can we just go type, or maybe type()? Oh, there we go.
5193
09:14:52,760 --> 09:15:01,160
numpy.ndarray. We can just go type(X). numpy.ndarray. So we want it in a torch tensor. So we're
5194
09:15:01,160 --> 09:15:05,880
going to go from NumPy. We saw this in the fundamental section. And then we're going to change it into
5195
09:15:05,880 --> 09:15:12,760
type torch.float. torch.float is an alias for float32. We could type the same thing - these two are
5196
09:15:12,760 --> 09:15:19,000
equivalent. I'm just going to type torch.float for writing less code. And then we're going to go
5197
09:15:19,000 --> 09:15:24,120
the same with y - torch.from_numpy. Now, why do we turn it into a torch float? Well, that's
5198
09:15:24,120 --> 09:15:33,240
because if you recall, the default type of NumPy arrays is - if we go, I might just put this in
5199
09:15:33,240 --> 09:15:41,560
a comment, X.dtype - is float64. There we go. However, PyTorch's default type is float32.
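A sketch of the conversion just described, assuming X and y are the NumPy arrays from make_circles:

```python
import torch

# float64 (NumPy's default) -> float32 (PyTorch's default)
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

print(X[:5], y[:5])
print(X.dtype, y.dtype)  # torch.float32 torch.float32
```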
5200
09:15:41,560 --> 09:15:46,360
So we're changing it into pytorch's default type. Otherwise, if we didn't have this little
5201
09:15:46,360 --> 09:15:51,880
section of code here, .type(torch.float), our tensors would be float64 as well. And that
5202
09:15:51,880 --> 09:15:58,200
may cause errors later on. So we're just going for the default data type within pytorch. And so
5203
09:15:58,200 --> 09:16:04,920
now let's have a look at the first five values of x and the first five values of y. What do we
5204
09:16:04,920 --> 09:16:11,800
have? Beautiful. We have tensor data types here. And now if we check the data type of x and we
5205
09:16:11,800 --> 09:16:19,640
check the data type of y, what do we have? And then one more, we'll just go type x. So we have
5206
09:16:19,640 --> 09:16:27,000
our data into tensors. Wonderful. So now it's torch.Tensor. Beautiful. But now we would like
5207
09:16:27,000 --> 09:16:38,120
training and test sets. So let's go split data into training and test sets. And a very, very popular
5208
09:16:38,120 --> 09:16:45,160
way to split data is a random split. So before, I issued the challenge of how you would split this
5209
09:16:45,160 --> 09:16:50,760
into a training and test set. So because these data points are kind of scattered all over the
5210
09:16:50,760 --> 09:16:58,680
place, we could split them randomly. So let's see what that looks like. To do so, I'm going to
5211
09:16:58,680 --> 09:17:05,000
use our faithful scikit-learn again. Remember how I said scikit-learn has a lot of beautiful methods
5212
09:17:05,000 --> 09:17:09,400
and functions for a whole bunch of different machine learning purposes. Well, one of them is
5213
09:17:09,400 --> 09:17:16,040
for a train test split. Oh my goodness - PyTorch - I didn't want autocorrect there. Train test split.
5214
09:17:16,040 --> 09:17:20,520
Now you might be able to guess what this does. These videos are going to be a battle between me and
5215
09:17:20,520 --> 09:17:26,200
Colab's autocorrect. Sometimes it's good. Other times it's not. So we're going to set this code
5216
09:17:26,200 --> 09:17:31,080
up. I'm going to write it or we're going to write it together. So we've got x train for our training
5217
09:17:31,080 --> 09:17:36,360
features and X test for our testing features. And then we also want our training labels and
5218
09:17:36,360 --> 09:17:43,080
our testing labels. That order is the order that train test split works in. And then we have train
5219
09:17:43,080 --> 09:17:48,200
test split. Now if we wrote this function and we wanted to find out more, I can press command
5220
09:17:48,200 --> 09:17:53,800
shift space, which is what I just did to have this. But truly, I don't have a great time reading all
5221
09:17:53,800 --> 09:18:01,160
of this. You might. But for me, I just like going train test split. And possibly one of the first
5222
09:18:01,160 --> 09:18:06,760
functions that appears, yes, is scikit-learn. How good is that? So sklearn.model_selection
5223
09:18:06,760 --> 09:18:13,800
.train_test_split. Now: split arrays or matrices into random train and test subsets. Beautiful.
5224
09:18:13,800 --> 09:18:19,000
We've got a code example of what's going on here. You can read what the different parameters do.
5225
09:18:19,000 --> 09:18:23,880
But we're going to see them in action. This is just another example of where machine learning
5226
09:18:23,880 --> 09:18:29,080
libraries such as scikit-learn - we've used matplotlib, we've used pandas - they all interact
5227
09:18:29,080 --> 09:18:34,360
together to serve a great purpose. But now let's pass in our features and our labels.
5228
09:18:35,320 --> 09:18:41,560
This is the order that they come in, by the way. Oh, and we have the returns splitting. So the
5229
09:18:41,560 --> 09:18:47,320
order here - I've got the order goes X train, X test, y train, y test - took me a little while to
5230
09:18:47,320 --> 09:18:51,560
remember this order. But once you've created enough training test splits with this function,
5231
09:18:51,560 --> 09:18:56,280
you kind of know this off by heart. So just remember features first train first and then labels.
5232
09:18:57,320 --> 09:19:02,680
And we jump back in here. So I'm going to put in the test size parameter of 0.2.
5233
09:19:02,680 --> 09:19:10,200
This is percentage wise. So let me just write here 0.2 equals 20% of data will be test.
5234
09:19:10,200 --> 09:19:19,240
And 80% will be train. If we wanted to do a 50 50 split, that kind of split doesn't usually
5235
09:19:19,240 --> 09:19:27,080
happen, but you could go 0.5. But the test size says, hey, how big and percentage wise do you want
5236
09:19:27,080 --> 09:19:33,880
your test data to be? And so behind the scenes train test split will calculate what's 20% of
5237
09:19:33,880 --> 09:19:39,400
our x and y samples. So we'll see how many there is in a second. But let's also put a random state
5238
09:19:39,400 --> 09:19:45,640
in here. Because if you recall back in the documentation, train test split splits data
5239
09:19:45,640 --> 09:19:52,040
randomly into random train and test subsets. And random state, what does that do for us? Well,
5240
09:19:52,040 --> 09:20:00,040
this is a random seed equivalent, very similar to torch.manual_seed. However, because we are
5241
09:20:00,040 --> 09:20:07,000
using scikit-learn, setting torch.manual_seed will only affect PyTorch code rather than
5242
09:20:07,000 --> 09:20:14,040
scikit-learn code. So we do this so that we get similar random splits. As in, I get a similar
5243
09:20:14,040 --> 09:20:20,680
random split to what your random split is. And in fact, they should be exactly the same. So let's
5244
09:20:20,680 --> 09:20:30,360
run this. And then we'll check the length of x train. And length of x test. So if we have 1000
5245
09:20:30,360 --> 09:20:35,800
total samples, and I know that because above in our make circles function, we said we want
5246
09:20:35,800 --> 09:20:40,440
1000 samples, that could be 10,000, that could be 100. That's the beauty of creating your own
5247
09:20:40,440 --> 09:20:47,480
data set. And we have length y train. If we have 20% testing values, how many samples are going
5248
09:20:47,480 --> 09:20:57,080
to be dedicated to the test sample? 20% of 1000 is 200. And 80% - because training is
5249
09:20:57,080 --> 09:21:06,280
going to be the training here. So 100% minus 20% is 80%. So 80% of 1000 is... let's find out.
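A sketch of the split just described (the random_state value of 42 is an assumption):

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.2,    # 20% test, 80% train
                                                     random_state=42)  # reproducible split

print(len(X_train), len(X_test), len(y_train), len(y_test))  # 800 200 800 200
```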
5250
09:21:07,480 --> 09:21:15,480
Run all - beautiful. So we have 800 training samples, 200 testing samples. This is going to be the
5251
09:21:15,480 --> 09:21:20,600
data set that we're going to be working with. So in the next video, we've now got training and
5252
09:21:20,600 --> 09:21:26,520
test sets, we've started to move through our beautiful pytorch workflow here. We've got our
5253
09:21:26,520 --> 09:21:30,840
data ready, we've turned it into tenses, we've created a training and test split. Now it's time
5254
09:21:30,840 --> 09:21:36,120
to build or pick a model. So I think we're still in the building phase. Let's do that in the next
5255
09:21:36,120 --> 09:21:45,080
video. Welcome back. In the last video, we split our data into training and test sets. And because
5256
09:21:45,080 --> 09:21:51,960
we did 80 20 split, we've got about 800 samples to train on, and 200 samples to test on. Remember,
5257
09:21:51,960 --> 09:21:59,160
the training set is so that the model can learn patterns, patterns that represent this data set
5258
09:21:59,160 --> 09:22:04,680
here, the circles data set, red dots or blue dots. And the test data set is so that we can
5259
09:22:04,680 --> 09:22:10,200
evaluate those patterns. And I took a little break before, but you can tell that because my
5260
09:22:10,200 --> 09:22:15,080
notebook is disconnected. But if I wanted to reconnect it, what could I do? We can go here,
5261
09:22:15,080 --> 09:22:19,800
run time, run before that's going to run all of the cells before. It shouldn't take too long
5262
09:22:19,800 --> 09:22:25,560
because we haven't done any large computations. But this is good timing because we're up to part
5263
09:22:25,560 --> 09:22:32,680
two, building a model. And so there's a fair few steps here, but nothing that we haven't covered
5264
09:22:32,680 --> 09:22:43,160
before, we're going to break it down. So let's build a model to classify our blue and red dots.
5265
09:22:43,160 --> 09:22:55,720
And to do so, we want to... tensors, not "tenses". That's all right. So let me just make
5266
09:22:55,720 --> 09:23:01,560
some space here. There we go. So number one, let's set up device agnostic code. So we get in the
5267
09:23:01,560 --> 09:23:11,160
habit of creating that. So our code will run on an accelerator. I can't even spell accelerator.
5268
09:23:11,160 --> 09:23:19,320
It doesn't matter. You know what I mean? GPU. If there is one. Two. What should we do next?
5269
09:23:20,040 --> 09:23:23,720
Well, we should construct a model. Because if we want to build a model, we need a model.
5270
09:23:24,280 --> 09:23:30,280
Construct a model. And we're going to go by subclassing nn.Module.
5271
09:23:31,160 --> 09:23:36,200
Now we saw this in the previous section - we subclassed nn.Module. In fact, all models
5272
09:23:36,200 --> 09:23:45,560
in PyTorch subclass nn.Module. And let's go: define loss function and optimizer.
5273
09:23:47,480 --> 09:23:55,240
And finally - gah, Colab's autocorrect is not ideal - and then we'll create a training
5274
09:23:56,200 --> 09:24:00,600
and test loop. Though this will probably be in the next section. We'll focus on building a model
5275
09:24:00,600 --> 09:24:05,400
here. And of course, all of these steps are in line with what? They're in line with this.
5276
09:24:05,400 --> 09:24:09,320
So we don't have device agnostic code here, but we're just going to do it anyway so that we have
5277
09:24:09,320 --> 09:24:13,640
a habit. These are the main steps: pick or build a pretrained model to suit your problem, pick a
5278
09:24:13,640 --> 09:24:19,640
loss function and optimizer, build a training loop. So let's have a look. How can we start this off?
5279
09:24:19,640 --> 09:24:26,680
So we will import PyTorch. And then - we've already done this, but we're going to do it anyway
5280
09:24:26,680 --> 09:24:34,280
for completeness, just in case you wanted to run your code from here - import torch and nn. And we're
5281
09:24:34,280 --> 09:24:44,280
going to make device agnostic code. So we'll set the device equal to "cuda" if torch.cuda
5282
09:24:45,480 --> 09:24:53,400
is available, else "cpu", which will be the default. The CPU is the default: if there's no GPU,
5283
09:24:53,400 --> 09:24:58,840
which means that CUDA is not available, all of our PyTorch code will default to using the CPU
5284
09:24:58,840 --> 09:25:05,560
device. Now we haven't set up a GPU yet so far. You may have, but as you see, my target device is
5285
09:25:05,560 --> 09:25:11,400
currently CPU. How about we set up a GPU? We can go into here: Runtime, change runtime type,
5286
09:25:11,960 --> 09:25:17,320
GPU. And I'm going to click save. Now this is going to restart the runtime and reconnect.
5287
09:25:18,360 --> 09:25:23,960
So once it reconnects beautiful, we could actually just run this code cell here.
5288
09:25:23,960 --> 09:25:29,640
This is going to set up the GPU device, but because we're only running this cell, if we were to just
5289
09:25:29,640 --> 09:25:37,080
use X_train, it's not been defined. So because we restarted our runtime, let's run all, or we can
5290
09:25:37,080 --> 09:25:45,320
just run before. So this is going to rerun all of these cells here. And do we have X train now?
5291
09:25:45,320 --> 09:25:51,720
Let's have a look. Wonderful. Yes, we do. Okay, beautiful. So we've got device agnostic code.
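For reference, a minimal sketch of the device-agnostic setup described above, assuming only that PyTorch is installed (the variable name device follows the narration):

# Device-agnostic setup (a sketch of the cell described above).
import torch
from torch import nn  # nn is imported here for the model-building steps that follow

# Use the GPU ("cuda") if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)  # "cuda" on a GPU runtime, "cpu" otherwise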
5292
09:25:51,720 --> 09:25:56,680
In the next video, let's get on to constructing a model. I'll see you there.
5293
09:25:58,680 --> 09:26:04,040
Welcome back. In the last video, we set up some device agnostic code. So this is going to come in
5294
09:26:04,040 --> 09:26:09,480
later on when we send our model to the target device, and also our data to the target device.
5295
09:26:09,480 --> 09:26:13,160
This is an important step because that way, if someone else was able to run your code or you
5296
09:26:13,160 --> 09:26:17,560
were to run your code in the future, because we've set it up to be device agnostic,
5297
09:26:17,560 --> 09:26:21,720
by default it will run on the CPU. But if there's an accelerator present,
5298
09:26:22,360 --> 09:26:27,480
well, that means that it might go faster because it's using a GPU rather than just using a CPU.
5299
09:26:28,040 --> 09:26:32,920
So we're up to step two here: construct a model by subclassing nn.Module. I think we're going to
5300
09:26:33,880 --> 09:26:38,600
write a little bit of text here just to plan out the steps that we're doing. Now we've
5301
09:26:38,600 --> 09:26:48,920
set up device-agnostic code. Let's create a model, and we're going to break it down. We've got some
5302
09:26:48,920 --> 09:26:55,480
sub-steps up here. We're going to break even this one down into some sub-sub-steps. So number
5303
09:26:55,480 --> 09:27:04,440
one is we're going to subclass nn.Module. And a reminder here, I want to make some space,
5304
09:27:04,440 --> 09:27:11,160
just so we're coding in about the middle of the page. So almost all models in PyTorch,
5305
09:27:11,720 --> 09:27:17,240
subclass nn.Module, because there are some great things that it does for us behind the
5306
09:27:17,240 --> 09:27:29,400
scenes. And step two is we're going to create two nn.Linear layers. And we want these
5307
09:27:29,400 --> 09:27:37,760
to be capable of handling our data. So, capable of handling the shapes of our data.
5308
09:27:37,760 --> 09:27:46,120
Step three, we want to define a forward method. Why do we want to define a forward method?
5309
09:27:46,120 --> 09:27:52,640
Well, because we're subclassing nn.Module, right? And so we're going to define a forward
5310
09:27:52,640 --> 09:28:07,520
method that outlines the forward pass or forward computation of the model. And number four, we want
5311
09:28:07,520 --> 09:28:12,960
to instantiate... well, this doesn't really have to be part of creating the model, but we're going to do
5312
09:28:12,960 --> 09:28:27,280
it anyway: instantiate an instance of our model class and send it to the target device. So there are
5313
09:28:27,280 --> 09:28:31,440
going to be a couple of little different steps here, but nothing too dramatic that we haven't really
5314
09:28:31,440 --> 09:28:39,360
covered before. So let's go: number one, construct a model that subclasses nn.Module.
5315
09:28:39,360 --> 09:28:44,720
So I'm going to code this all out. Well, we're going to code this all out together. And then we'll
5316
09:28:44,720 --> 09:28:49,200
go back through and discuss it, and then maybe draw a few pictures or something to check out
5317
09:28:49,200 --> 09:28:56,000
what's actually happening. So, CircleModelV0, because we're going to try and split some circles,
5318
09:28:56,000 --> 09:29:02,800
red and blue circles. This is our data up here. This is why it's called circle model, because we're
5319
09:29:02,800 --> 09:29:09,200
trying to separate the blue and red circles using a neural network. So we've subclassed
5320
09:29:09,200 --> 09:29:14,880
nn.Module. And when we create a class in Python, we'll create a constructor here, __init__, and then
5321
09:29:15,520 --> 09:29:22,160
put in super().__init__(). And then inside the constructor, we're going to create our
5322
09:29:22,160 --> 09:29:27,920
layers. So this is number two: create two nn.Linear layers capable of handling the shapes
5323
09:29:27,920 --> 09:29:39,120
of our data. So I'm going to write this down here: 2. Create two nn.Linear layers capable
5324
09:29:39,120 --> 09:29:47,680
of handling the shapes of our data. And so if we have a look at X train, what are the shapes here?
5325
09:29:47,680 --> 09:29:52,400
What's the input shape? Because X train is our features, right? Now features are going to go
5326
09:29:52,400 --> 09:30:00,800
into our model. So we have 800 training samples. This is the first number here of size two each.
5327
09:30:01,360 --> 09:30:07,600
So 800 of these and inside each is two numbers. Again, depending on the data set you're working
5328
09:30:07,600 --> 09:30:13,920
with, your features may be 100 in length, a vector of 100, or maybe a different size tensor all
5329
09:30:13,920 --> 09:30:18,080
altogether, or there may be millions. It really depends on what data set you're working with.
5330
09:30:18,080 --> 09:30:21,920
Because we're working with a simple data set, we're going to focus on that. But the principle
5331
09:30:21,920 --> 09:30:27,440
is still the same. You need to define a neural network layer that is capable of handling your
5332
09:30:27,440 --> 09:30:33,920
input features. So we're going to make self.layer_1 equal to nn.Linear. And then if we wanted
5333
09:30:33,920 --> 09:30:38,960
to find out what's going on in nn.Linear, we could press Shift+Command+Space on my computer,
5334
09:30:38,960 --> 09:30:43,760
because it's a Mac, or maybe Shift+Control+Space if you're on Windows. So we're going to define the
5335
09:30:43,760 --> 09:30:50,240
in_features. What would in_features be here? Well, we just decided that our X has two features.
5336
09:30:50,240 --> 09:30:55,440
So in_features is going to be two. And now what are the out features? This one is a little bit tricky.
5337
09:30:58,560 --> 09:31:04,480
So in our case, we could have out features equal to one if we wanted to just pass a single linear
5338
09:31:04,480 --> 09:31:10,400
layer, but we want to create two linear layers here. So why would out features be one? Well,
5339
09:31:10,400 --> 09:31:16,960
that's because if we have a look at the first sample of Y train, we'd want the output to match it,
5340
09:31:16,960 --> 09:31:25,040
or maybe we'll look at the first five. We want to map one sample of X to one sample of Y and Y
5341
09:31:25,040 --> 09:31:29,600
has a shape of one. Oh, well, really, it's nothing because it's a scalar, but we would still put
5342
09:31:29,600 --> 09:31:34,240
one here so that it outputs just one number. But we're going to change this up. We're going to put
5343
09:31:34,240 --> 09:31:40,320
it into five and we're going to create a second layer. Now, this is an important point of joining
5344
09:31:40,320 --> 09:31:47,040
together neural network layers: the in features here. What do you think the in features of our second layer is
5345
09:31:47,040 --> 09:31:52,640
going to be, if we've produced an out feature of five here? Now, this number is arbitrary. We could
5346
09:31:52,640 --> 09:32:00,480
do 128. We could do 256. Generally, it's multiples of 8, 64. We're just doing five now because we're
5347
09:32:00,480 --> 09:32:04,880
keeping it nice and simple. We could do eight; multiples of eight are a convention because of the efficiency
5348
09:32:04,880 --> 09:32:09,360
of computing. I don't know enough about computer hardware to know exactly why that's the case,
5349
09:32:09,360 --> 09:32:14,800
but that's just a rule of thumb in machine learning. So the in features here has to match up with the
5350
09:32:14,800 --> 09:32:20,960
out features of a previous layer. Otherwise, we'll get shape mismatch errors. And so let's go here
5351
09:32:20,960 --> 09:32:26,000
out features. So we're going to treat this as the output layer. So this is the out features equals
5352
09:32:26,000 --> 09:32:35,840
one. So takes in two features and upscales to five features. So five numbers. So what this does,
5353
09:32:35,840 --> 09:32:41,760
what this layer is going to do is take in these two numbers of X and perform the nn.Linear computation. Let's
5354
09:32:41,760 --> 09:32:47,520
have a look at what equation it does. nn.Linear is going to perform this function here
5355
09:32:48,240 --> 09:32:56,160
on the inputs. And it's going to upscale it to five features. Now, why would we do that? Well,
5356
09:32:56,160 --> 09:33:01,440
the rule of thumb here, because this is denoted as a hidden unit, or how many hidden neurons there
5357
09:33:01,440 --> 09:33:06,160
are. The rule of thumb is that the more hidden features there are, the more opportunity our model
5358
09:33:06,160 --> 09:33:11,360
has to learn patterns in the data. So to begin with, it only has two numbers to learn patterns on,
5359
09:33:11,360 --> 09:33:18,000
but when we upscale it to five, it has five numbers to learn patterns on. Now, you might think,
5360
09:33:18,000 --> 09:33:22,400
why don't we just go straight to like 10,000 or something? But there is like an upper limit here
5361
09:33:22,400 --> 09:33:27,360
to sort of where the benefits start to trail off. We're just using five because it keeps it nice
5362
09:33:27,360 --> 09:33:32,960
and simple. And then the in features of the next layer is five, so that these two line up. We're
5363
09:33:32,960 --> 09:33:37,680
going to map this out visually in a moment, but let's keep coding. We've got in features two for
5364
09:33:37,680 --> 09:33:46,320
X. And now this is the output layer. So takes in five features from previous layer and outputs
5365
09:33:46,320 --> 09:33:55,600
a single feature. And now this is the same shape, the same shape as y. So what is our next step? We
5366
09:33:55,600 --> 09:34:01,360
want to define a forward method, a forward computation or forward pass. So the forward method is going to
5367
09:34:01,360 --> 09:34:07,200
define the forward computation. And as an input, it's going to take X, which is some form of data.
5368
09:34:07,200 --> 09:34:13,440
And now here's where we can use layer one and layer two. So now let's just go return.
5369
09:34:14,720 --> 09:34:19,520
Or we'll put a note here of what we're doing. Three, we're going to define a forward method
5370
09:34:19,520 --> 09:34:28,960
that outlines the forward pass. So, forward, and we're going to return. And here's some notation we
5371
09:34:28,960 --> 09:34:34,000
haven't quite seen yet. And then we're going to go self.layer_2. And inside the brackets we'll
5372
09:34:34,000 --> 09:34:41,360
have self.layer_1, and inside those brackets we're going to have X. So the way this goes is X goes
5373
09:34:41,360 --> 09:34:50,480
into layer one. And then the output of layer one goes into layer two. So whatever data we have,
5374
09:34:50,480 --> 09:34:58,000
so our training data, X train goes into layer one performs the linear calculation here. And then
5375
09:34:58,000 --> 09:35:07,280
it goes into layer two. And then layer two is going to output, go to the output. So X is the input,
5376
09:35:07,280 --> 09:35:15,600
layer 1 computation, layer 2, output. So we've done that. Now let's do step four, which is
5377
09:35:16,400 --> 09:35:27,960
instantiate an instance of our model class. And send it to the target device. So this is our model
5378
09:35:27,960 --> 09:35:33,920
class, circle model V zero. We're just going to create a model because it's the first model we've
5379
09:35:33,920 --> 09:35:41,920
created in this section. Let's call it model_0. And we're going to go CircleModelV0. And then
5380
09:35:41,920 --> 09:35:47,600
we're going to call .to and we're going to pass in device, because that's our target device.
5381
09:35:47,600 --> 09:35:54,560
Let's now have a look at model zero. And then Oh, typo. Yeah, classic.
5382
09:35:54,560 --> 09:36:02,720
What did we get wrong here? Oh, did we not pass in self self? Oh, there we go.
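For reference, here is a sketch of the model being built in this video, following the four sub-steps in the narration. The names CircleModelV0 and model_0 follow the narration, but treat the code itself as an illustrative reconstruction rather than the exact notebook cell:

# Sketch of the model described above (an illustrative reconstruction, not the exact notebook cell).
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Construct a model class that subclasses nn.Module
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        # 2. Create 2 nn.Linear layers capable of handling the shapes of our data
        # (nn.Linear computes y = x @ W.T + b on its input)
        self.layer_1 = nn.Linear(in_features=2, out_features=5)  # takes in 2 features, upscales to 5
        self.layer_2 = nn.Linear(in_features=5, out_features=1)  # takes in 5, outputs 1 (same shape as y)

    # 3. Define a forward() method that outlines the forward pass
    def forward(self, x):
        return self.layer_2(self.layer_1(x))  # x -> layer_1 -> layer_2 -> output

# 4. Instantiate an instance of the model class and send it to the target device
model_0 = CircleModelV0().to(device)
print(next(model_0.parameters()).device)  # should show the target device, e.g. cuda:0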
5383
09:36:06,480 --> 09:36:11,520
Little typo classic. But the beautiful thing about creating a class here is that we could put
5384
09:36:11,520 --> 09:36:16,160
this into a Python file, such as model.py. And then we wouldn't necessarily have to rewrite
5385
09:36:16,160 --> 09:36:21,200
this all the time, we could just call it. And so let's just check what device it's on.
5386
09:36:21,200 --> 09:36:27,120
So target device is CUDA, because we've got a GPU, thank you, Google Colab. And then if we wanted
5387
09:36:27,120 --> 09:36:36,480
to, let's go next(model_0.parameters()), and then we'll check .device.
5388
09:36:38,720 --> 09:36:44,320
CUDA, beautiful. So that means our model's parameters are on the CUDA device. Now we've covered enough
5389
09:36:44,320 --> 09:36:49,520
code in here for this video. So if you want to understand it a little bit more, go back through
5390
09:36:49,520 --> 09:36:54,320
it. But we're going to come back in the next video and make it a little bit more visual. So I'll see
5391
09:36:54,320 --> 09:37:02,080
you there. Welcome back. In the last video, we did something very, very exciting. We created our
5392
09:37:02,080 --> 09:37:07,680
first multi layer neural network. But right now, this is just code on a page. But truly, this is
5393
09:37:07,680 --> 09:37:12,160
what the majority of building machine learning models in PyTorch is going to look like. You're
5394
09:37:12,160 --> 09:37:18,880
going to create some layers, or a simple or as complex as you like. And then you're going to
5395
09:37:18,880 --> 09:37:26,160
use those layers in some form of forward computation to create the forward pass. So let's make this a
5396
09:37:26,160 --> 09:37:32,240
little bit more visual. If we go over to the TensorFlow playground, and now TensorFlow is another
5397
09:37:32,240 --> 09:37:37,440
deep learning framework similar to PyTorch, it just allows you to write code such as this,
5398
09:37:38,240 --> 09:37:43,040
to build neural networks, fit them to some sort of data to find patterns and data,
5399
09:37:43,040 --> 09:37:50,320
and then use those machine learning models in your applications. But let's create this. Oh,
5400
09:37:50,320 --> 09:37:55,280
by the way, this is playground.tensorFlow.org. This is a neural network that we can train in
5401
09:37:55,280 --> 09:38:01,200
the browser if we really wanted to. So that's pretty darn cool. But we've got a data set here,
5402
09:38:01,200 --> 09:38:07,520
which is kind of similar to the data set that we're working with. We have a look at our circles one.
5403
09:38:07,520 --> 09:38:12,240
Let's just say it's close enough. It's circular. That's what we're after. But if we increase this,
5404
09:38:12,240 --> 09:38:17,280
we've got five neurons now. We've got two features here, X1 and X2. What is this
5405
09:38:17,280 --> 09:38:21,120
reminding you of? There's a lot of things going on here that we haven't covered
5406
09:38:21,120 --> 09:38:25,920
yet, but don't worry too much. We're just focused on this neural network here. So we've got some
5407
09:38:25,920 --> 09:38:31,360
features as the input. We've got five hidden units. This is exactly what's going on with the model
5408
09:38:31,360 --> 09:38:38,000
that we just built. We pass in X1 and X2, our values. So if we go back to our data set,
5409
09:38:38,000 --> 09:38:46,720
these are X1 and X2. We pass those in. So we've got two input features. And then we pass them to a
5410
09:38:46,720 --> 09:38:52,240
hidden layer, a single hidden layer, with five neurons. What have we just built? If we come down
5411
09:38:52,240 --> 09:38:59,440
into here to our model, we've got in features two, out features five. And then that feeds into
5412
09:38:59,440 --> 09:39:05,360
another layer, which has in features five and out features one. So this is the exact same model
5413
09:39:05,360 --> 09:39:10,160
that we've built here. Now, if we just turn this back to linear activation, because we're sticking
5414
09:39:10,160 --> 09:39:15,280
with linear for now, we'll have a look at different forms of activation functions later on. And maybe
5415
09:39:15,280 --> 09:39:21,280
we put the learning rate... we've set the learning rate to 0.01. We've got epochs here, we've got classification.
5416
09:39:21,920 --> 09:39:25,760
And we're going to try and fit this neural network to this data. Let's see what happens.
5417
09:39:25,760 --> 09:39:37,040
Oh, the test loss, it's sitting at about halfway, 0.5. So about 50% loss. So if we only have two
5418
09:39:37,040 --> 09:39:44,160
classes and we've got a loss of 50%, what does that mean? Well, the perfect loss was zero.
5419
09:39:44,800 --> 09:39:50,400
And the worst loss was one. Then we just divide one by two and get 50%. But we've only got two
5420
09:39:50,400 --> 09:39:57,280
classes. So that means if our model was just randomly guessing, it would get a loss of about 0.5,
5421
09:39:57,280 --> 09:40:01,520
because you could just randomly guess whatever data point belongs to blue or orange in this case.
5422
09:40:02,080 --> 09:40:07,440
So in a binary classification problem, if you have the same number of samples in each class,
5423
09:40:07,440 --> 09:40:12,720
in this case, blue dots and orange dots, randomly guessing will get you about 50%. Just like tossing
5424
09:40:12,720 --> 09:40:18,080
a coin, toss a coin 100 times and you get about 50 50 might be a little bit different, but it's
5425
09:40:18,080 --> 09:40:24,240
around about that over the long term. So we've just fit for 3000 epochs. And we're still not getting
5426
09:40:24,240 --> 09:40:30,080
any better loss. Hmm. I wonder if that's going to be the case for our neural network. And so to
5427
09:40:30,080 --> 09:40:35,440
draw this in a different way, I'm going to come over to a little tool called FigJam, which is just a
5428
09:40:35,440 --> 09:40:40,400
whiteboard that we can put shapes on, and it runs in the browser. So this is going to be nothing
5429
09:40:40,400 --> 09:40:49,280
fancy. It's going to be a simple diagram. Say this is our input. And I'm going to make this green
5430
09:40:49,280 --> 09:40:54,240
because my favorite color is green. And then we're going to have, let's make some different
5431
09:40:54,240 --> 09:41:03,360
colored dots. I want a blue dot here. So this can be dot one, and dot two, I'll put another dot
5432
09:41:03,360 --> 09:41:11,120
here. I'll zoom out a little so we have a bit more space. Well, maybe that was too much. 50%
5433
09:41:11,120 --> 09:41:17,280
looks all right. So let me just move this around, move these up a little. So we're building a neural
5434
09:41:17,280 --> 09:41:24,240
network here. This is exactly what we just built. And so we'll go here. Well, maybe we'll put this
5435
09:41:24,240 --> 09:41:29,920
as input X one. So this will make a little bit more sense. And then we'll maybe we can copy this.
5436
09:41:29,920 --> 09:41:39,680
Now this is X two. And then we have some form of output. Let's make this one. And we're going to
5437
09:41:39,680 --> 09:41:48,400
color this orange. So output. Right. So you can imagine how we got connected dots here.
5438
09:41:48,400 --> 09:41:54,320
They will connect these. So our inputs are going to go through all of these. I wonder if I can
5439
09:41:54,320 --> 09:41:59,360
draw here. Okay, this is going to be a little bit more complex, but that's all right. So this
5440
09:41:59,360 --> 09:42:04,560
is what we've done. We've got two input features here. And if we wanted to keep drawing these,
5441
09:42:04,560 --> 09:42:09,280
we could all of these input features are going to go through all of these hidden units that we
5442
09:42:09,280 --> 09:42:14,480
have. I just drew the same arrow twice. That's okay. But this is what's happening in the forward
5443
09:42:14,480 --> 09:42:20,640
computation method. It can be a little bit confusing for when we coded it out. Why is that? Well,
5444
09:42:20,640 --> 09:42:27,040
from here, it looks like we've only got an input layer into a single hidden layer in the blue.
5445
09:42:27,040 --> 09:42:34,320
And an output layer. But truly, this is the same exact shape. You get the point. And then all of
5446
09:42:34,320 --> 09:42:40,480
these go to the output. But we're going to see this computationally later on. So whatever data set
5447
09:42:40,480 --> 09:42:45,040
you're working with, you're going to have to manufacture some form of input layer. Now this
5448
09:42:45,040 --> 09:42:51,040
may be you might have 10 of these if you have 10 features. Or four of them if you have four
5449
09:42:51,040 --> 09:42:58,320
features. And then if you wanted to adjust these, well, you could increase the number of hidden
5450
09:42:58,320 --> 09:43:03,520
units or the number of out features of a layer. What just has to match up is that the layer it's
5451
09:43:03,520 --> 09:43:09,360
going into has to have a shape that matches what's coming out of here. So just keep that in mind
5452
09:43:09,360 --> 09:43:15,040
as you're going on. And in our case, we only have one output. So we have the output here,
5453
09:43:15,040 --> 09:43:20,880
which is y. So this is a visual version. We've got the TensorFlow playground. You could play
5454
09:43:20,880 --> 09:43:26,960
around with that. You can change this to increase. Maybe you want five hidden layers with five neurons
5455
09:43:26,960 --> 09:43:34,160
in each. This is a fun way to explore. This is a challenge, actually, go to playground.tensorflow.org,
5456
09:43:34,160 --> 09:43:38,960
replicate this network and see if it fits on this type of data. What do you think, will it?
5457
09:43:39,920 --> 09:43:43,840
Well, we're going to have to find out in the next few videos. So I'm going to show you in the
5458
09:43:43,840 --> 09:43:50,720
next video another way to create the network that we just created. This one here with even less
5459
09:43:50,720 --> 09:43:58,080
code than what we've done before. I'll see you there. Welcome back. In the last video, what we
5460
09:43:58,080 --> 09:44:02,240
discussed, well, actually, in the previous video to last, we coded up this neural network here,
5461
09:44:02,240 --> 09:44:08,400
CircleModelV0. By subclassing nn.Module, we created two linear layers, which are
5462
09:44:08,400 --> 09:44:14,160
capable of handling the shape of our data: in features of two, because we have two X features.
5463
09:44:14,160 --> 09:44:19,120
Out features of five: we're upscaling the two features to five so that it gives our network more of a
5464
09:44:19,120 --> 09:44:24,560
chance to learn. And then because we've upscaled it to five features, the next subsequent layer
5465
09:44:24,560 --> 09:44:29,600
has to be able to handle five features as input. And then we have one output feature because that's
5466
09:44:29,600 --> 09:44:34,320
the same shape as our Y here. Then we got a little bit visual by using the TensorFlow
5467
09:44:34,320 --> 09:44:39,680
playground. Did you try out that challenge, making five hidden layers with five neurons? Did it work?
5468
09:44:41,200 --> 09:44:45,440
And then we also got a little bit visual in Figma as well. This is just another way of
5469
09:44:45,440 --> 09:44:49,440
visualizing different things. You might have to do this a fair few times when you first start
5470
09:44:49,440 --> 09:44:55,440
with neural networks. But once you get a bit of practice, you can start to infer what's going on
5471
09:44:55,440 --> 09:45:01,440
through just pure code. So now let's keep pushing forward. How about we replicate this
5472
09:45:01,440 --> 09:45:07,280
with a simpler way? Because our network is quite simple, that means it only has two layers.
5473
09:45:07,280 --> 09:45:17,280
That means we can use... let's replicate the model above using nn.Sequential. And I'm going to code
5474
09:45:17,280 --> 09:45:22,480
this out. And then we can look up what nn.Sequential is. But I think you'll be able to comprehend what's
5475
09:45:22,480 --> 09:45:29,440
happening by just looking at it. So nn, which is torch.nn. We can do torch.nn, but we've already
5476
09:45:29,440 --> 09:45:37,840
imported nn. We're going to call nn.sequential. And then we're going to go nn.linear. And what
5477
09:45:37,840 --> 09:45:45,920
was the in features of our nn.linear? Well, it was two because we have two in features. And then
5478
09:45:45,920 --> 09:45:50,080
we're going to replicate the same out features. Remember, we could customize this to whatever we
5479
09:45:50,080 --> 09:45:59,040
want: 10, 100, 128. I'm going to keep it at five, nice and simple. And then we go nn.Linear. And
5480
09:45:59,040 --> 09:46:03,920
the in features of this next layer is going to be five because the out features of the previous
5481
09:46:03,920 --> 09:46:09,840
layer was five. And then finally, the out features here is going to be one because we want one y
5482
09:46:09,840 --> 09:46:15,200
value to our two x features. And then I'm going to send that to the device. And then I'm going to
5483
09:46:15,200 --> 09:46:24,000
have a look at model zero. So this is, of course, going to override our previous model zero. But
5484
09:46:24,000 --> 09:46:28,640
have a look. The only thing different is that this is from the circle model V zero class. We
5485
09:46:28,640 --> 09:46:35,840
subclassed nn.Module. And the only difference is the name here. This is just from Sequential.
5486
09:46:36,640 --> 09:46:42,480
And so can you see what's going on here? So as you might have guessed sequential,
5487
09:46:43,280 --> 09:46:49,600
it implements most of this code for us behind the scenes. Because we've told it that it's going
5488
09:46:49,600 --> 09:46:53,840
to be sequential, it's just going to go, hey, step the code through this layer, and then step
5489
09:46:53,840 --> 09:47:00,640
the code through this layer. And it outputs basically the same model, rather than us creating our own
5490
09:47:00,640 --> 09:47:04,400
forward method. You might be thinking, Daniel, why didn't you show us this earlier? That looks
5491
09:47:04,400 --> 09:47:10,640
like such an easy way to create a neural network compared to this. Well, yes, you're 100% right.
5492
09:47:10,640 --> 09:47:17,520
That is an easier way to create a neural network. However, the benefit of subclassing, and that's
5493
09:47:17,520 --> 09:47:22,720
why I started from here, is that when you have more complex operations, such as things you'd
5494
09:47:22,720 --> 09:47:29,040
like to construct in here, and a more complex forward pass, it's important to know how to
5495
09:47:29,040 --> 09:47:34,080
build your own subclasses of nn.Module. But for simple, straightforward stepping through
5496
09:47:34,080 --> 09:47:39,840
each layer one by one, so this layer first, and then this layer, you can use nn.Sequential.
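Here is a sketch of the nn.Sequential replica being described, with the same layer shapes as the subclassed model above (the variable name model_0 follows the narration and, as noted shortly, it overrides the previous model_0):

# Replicate the model above using nn.Sequential (a sketch; same layer shapes as the subclassed version).
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model_0 = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1)
).to(device)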
5497
09:47:39,840 --> 09:47:49,360
In fact, we could move this code up into here. So we could do this: self dot... we'll call this
5498
09:47:49,360 --> 09:48:00,480
two_linear_layers, equals nn.Sequential. And we could have layer one, we could go
5499
09:48:01,760 --> 09:48:10,720
self.layer_1. Or actually, we'll just recode it, we'll go nn.Linear. So
5500
09:48:10,720 --> 09:48:16,320
it's the same code as what we've got below: in features. If I could type, that'll be great,
5501
09:48:16,320 --> 09:48:24,800
in_features equals two, out_features equals five. And then we go nn.Linear. And then we go
5502
09:48:24,800 --> 09:48:30,560
in_features equals... what? Equals five, because it has to line up; out_features equals one.
5503
09:48:32,080 --> 09:48:36,480
And then we've got two linear layers. And then if we wanted to get rid of this, return
5504
09:48:36,480 --> 09:48:48,160
two_linear_layers, and we'll pass it X. Remake it. There we go. Well, because we've created these as
5505
09:48:48,160 --> 09:48:55,600
well, let's get rid of that. Beautiful. So that's the exact same model, but just using
5506
09:48:55,600 --> 09:49:00,880
nn.Sequential. Now I'm going to get rid of this so that our code is not too verbose. That means a lot
5507
09:49:00,880 --> 09:49:08,080
going on. But this is the flexibility of PyTorch. So just keep in mind that there's a fair few ways
5508
09:49:08,080 --> 09:49:16,080
to make a model. The simplest is probably Sequential. And then subclassing... well, this is a little bit
5509
09:49:16,080 --> 09:49:22,960
more complicated than what we've got. But this can extend to handle a lot more complex neural networks,
5510
09:49:22,960 --> 09:49:28,320
which you'll likely have to be building later on. So let's keep pushing forward. Let's see what
5511
09:49:28,320 --> 09:49:32,000
happens if we pass some data through here. So we'll just rerun this cell to make sure we've got
5512
09:49:32,000 --> 09:49:38,480
our model zero instantiated. We'll make some predictions with the model. So of course, if we
5513
09:49:38,480 --> 09:49:46,080
have a look at our model_0.state_dict()... oh, this will be a good experiment. So look at this.
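If you want to look at the same thing, here is a hedged sketch of inspecting the parameters (it assumes the nn.Sequential model_0 from above):

# Inspect the randomly initialized parameters (assumes the nn.Sequential model_0 from above).
print(model_0.state_dict())
# Expected keys and shapes for the two-layer model:
#   0.weight -> [5, 2], 0.bias -> [5]   (first nn.Linear layer)
#   1.weight -> [1, 5], 1.bias -> [1]   (second nn.Linear layer)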
5514
09:49:46,080 --> 09:49:53,440
So we have weight, a weight tensor, a bias tensor, a weight tensor, and a bias tensor. So this is
5515
09:49:53,440 --> 09:49:59,040
for the first, or zeroth, layer, these two here with the zero dot prefix, and then the one dot weight is
5516
09:49:59,040 --> 09:50:06,320
for, of course, the first-index layer. Now have a look inside here. Now you see how out features
5517
09:50:06,320 --> 09:50:14,800
is five. Well, that's why our bias parameter has five values here. And the same thing for this weight
5518
09:50:14,800 --> 09:50:23,200
value here. And the weight value here, why does this have 10 values? One, two, three, four, five, six,
5519
09:50:23,200 --> 09:50:31,280
seven, eight, nine, 10, because two times five equals 10. So this is just with a simple two layer
5520
09:50:31,280 --> 09:50:36,800
network, look at all the numbers that are going on behind the scenes. Imagine coding all of these
5521
09:50:36,800 --> 09:50:43,600
by hand. Like there's something like 20 numbers or something here. Now we've only done two layers
5522
09:50:43,600 --> 09:50:48,160
here. Now the beauty of this is that in the previous section, we created all of the weight
5523
09:50:48,160 --> 09:50:53,200
and biases using nn.Parameter and random values. You'll notice that these are all random
5524
09:50:53,200 --> 09:50:59,680
too. Again, if yours are different to mine, don't worry too much, because they're going to be initialized
5525
09:50:59,680 --> 09:51:05,120
randomly and we haven't set a random seed. But the thing to note here is that PyTorch is creating
5526
09:51:05,120 --> 09:51:10,400
all of these parameters for us behind the scenes. And now when we do back propagation and gradient
5527
09:51:10,400 --> 09:51:15,440
descent, when we code our training loop, the optimizer is going to change all of these values
5528
09:51:15,440 --> 09:51:20,720
ever so slightly to try and better fit or better represent the data so that we can split our two
5529
09:51:20,720 --> 09:51:31,440
circles. And so you can imagine how verbose this could get if we had say 50 layers with 128 different
5530
09:51:31,440 --> 09:51:36,320
features of each. So let's change this up, see what happens. Watch how quickly the numbers get
5531
09:51:36,320 --> 09:51:41,680
out of hand. Look at that. We just changed one value and look how many parameters our model has.
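Rather than counting those numbers by hand, a small illustrative helper (not from the course) can tally them:

# Count trainable parameters instead of reading them off the state_dict by hand
# (illustrative helper, not from the course).
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_parameters(model_0))  # 2*5 + 5 + 5*1 + 1 = 21 for the original 2 -> 5 -> 1 model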
5532
09:51:41,680 --> 09:51:48,240
So you might be able to calculate that by hand, but I personally don't want to. So we're going to
5533
09:51:48,240 --> 09:51:53,680
let PyTorch take care of a lot of that for us behind the scenes. So for now we're keeping it simple,
5534
09:51:53,680 --> 09:51:59,360
but that's how we can crack our models open and have a look at what's going on. Now that was a
5535
09:51:59,360 --> 09:52:03,600
little detour. It's time to make some predictions with random numbers. I just wanted to highlight
5536
09:52:03,600 --> 09:52:09,840
the fact that our model is in fact instantiated with random numbers here. So untrained_preds
5537
09:52:09,840 --> 09:52:15,840
equals model_0, and we're going to pass in X_test. And of course, we have to send the test data to the
5538
09:52:15,840 --> 09:52:21,200
device. Otherwise, if it's on a different device, we'll get errors because PyTorch likes to make
5539
09:52:21,200 --> 09:52:28,320
calculations on the same device. So we'll go print. Let's do a nice print statement of length of
5540
09:52:28,320 --> 09:52:35,360
predictions. We're going to go len of untrained_preds, we'll pass that in there.
5541
09:52:36,080 --> 09:52:43,040
And then we'll go... oh no, we need a squiggly bracket. And then we'll go shape. Shape is going to be
5542
09:52:43,600 --> 09:52:51,360
untrained_preds.shape. So this is, again, following the data explorer's motto of visualize,
5543
09:52:51,360 --> 09:52:57,680
visualize, visualize. And sometimes print is one of the best ways to do so. So length of test samples,
5544
09:52:58,880 --> 09:53:02,640
you might already know this, or we've already checked this together, haven't we? X test.
5545
09:53:04,160 --> 09:53:11,600
And then we're going to get the shape, which is going to be X test dot shape. Wonderful. And then
5546
09:53:11,600 --> 09:53:19,520
we're going to print. What's our little error here? Oh no, Colab's tricking me. So let's go first
5547
09:53:19,520 --> 09:53:28,080
10 predictions. And we're going to go untrained_preds. So how do you think these predictions will
5548
09:53:28,080 --> 09:53:34,960
fare? They're made with random numbers. And what are we trying to predict again? Well,
5549
09:53:34,960 --> 09:53:41,360
we're trying to predict whether a dot is a red dot or a blue dot or zero or one. And then we'll go
5550
09:53:41,360 --> 09:53:49,680
first 10 labels is going to be, we'll get this on the next line. And we'll go Y test.
5551
09:53:52,560 --> 09:53:59,280
Beautiful. So let's have a look at this untrained predictions. So we have length of predictions
5552
09:53:59,280 --> 09:54:05,760
is 200. Length of test samples is 200. But the shapes are different. What's going on here?
5553
09:54:05,760 --> 09:54:16,400
Y test. And let's have a look at X test. Oh, well, I better just have a look at Y test.
5554
09:54:18,720 --> 09:54:26,560
Why don't we have a two there? Oh, I've done X_test.shape. Oh, that's the test samples. That's
5555
09:54:26,560 --> 09:54:32,240
okay. And then the predictions are one. Oh, yes. So Y test. Let's just check the first 10 X test.
5556
09:54:32,240 --> 09:54:38,240
So a little bit of clarification needed here with your shapes. So maybe we'll get this over here
5557
09:54:38,240 --> 09:54:45,440
because I like to do features first and then labels. What did we miss here? Oh, X test 10
5558
09:54:46,080 --> 09:54:50,720
and Y test. See, we're troubleshooting on the fly here. This is what you're going to do with
5559
09:54:50,720 --> 09:54:54,880
a lot of your code. So there's our test values. There's the ideal labels. But our predictions,
5560
09:54:54,880 --> 09:54:58,960
they don't look like our labels. What's going on here? We can see that they're on the CUDA device,
5561
09:54:58,960 --> 09:55:04,480
which is good. We set that. We can see that they've got gradient tracking. Oh, we didn't, with
5562
09:55:05,040 --> 09:55:10,640
torch... we didn't do inference mode here. That's a poor habit of ours. Excuse me. Let's inference
5563
09:55:10,640 --> 09:55:16,320
mode this. There we go. So you notice that the gradient tracking goes away there. And so our
5564
09:55:16,320 --> 09:55:22,640
predictions are nowhere near what our test labels are. But also, they're not even in the same like
5565
09:55:22,640 --> 09:55:29,200
ball park. Like these are whole numbers, one or zero. And these are all floats between one and
5566
09:55:29,200 --> 09:55:36,240
zero. Hmm. So maybe rounding them, will that do something? So where are our preds here? So
5567
09:55:36,800 --> 09:55:45,600
we go torch.round. What happens there? Oh, they're all zero. Well, our model is probably
5568
09:55:45,600 --> 09:55:49,680
going to get about 50% accuracy. Why is that? Because all the predictions look like they're
5569
09:55:49,680 --> 09:55:57,280
going to be zero. And they've only got two options, basically head or tails. So when we create our
5570
09:55:57,280 --> 09:56:02,640
model and when we evaluate it, we want our predictions to be in the same format as our labels. But
5571
09:56:02,640 --> 09:56:06,960
we're going to cover some steps that we can take to do that in a second. What's important to take
5572
09:56:06,960 --> 09:56:10,720
away from this is that there's another way to replicate the model we've made above using
5573
09:56:10,720 --> 09:56:15,680
nn.Sequential. We've just replicated the same model as what we've got here. And nn.
5574
09:56:15,680 --> 09:56:21,280
Sequential is a simpler way of creating a PyTorch model. But it's limited because it literally
5575
09:56:21,280 --> 09:56:26,960
just sequentially steps through each layer in order. Whereas in here, you can get as creative as you
5576
09:56:26,960 --> 09:56:32,480
want with the forward computation. And then inside our model, pytorch has behind the scenes
5577
09:56:32,480 --> 09:56:38,960
created us some weight and bias tensors for each of our layers with regards to the shapes that
5578
09:56:38,960 --> 09:56:46,800
we've set. And so the handy thing about this is that if we got quite ridiculous with our layers,
5579
09:56:46,800 --> 09:56:50,640
pytorch would still do the same thing behind the scenes, create a whole bunch of random numbers for
5580
09:56:50,640 --> 09:56:56,480
us. And because our numbers are random, it looks like our model isn't making very good predictions.
5581
09:56:56,480 --> 09:57:00,560
But we're going to fix this in the next few videos when we move on to
5582
09:57:02,720 --> 09:57:06,560
fitting the model to the data and making a prediction. But before we do that, we need to
5583
09:57:06,560 --> 09:57:11,200
pick up a loss function and an optimizer and build a training loop. So let's get on to these two things.
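Before moving on, here is a hedged sketch consolidating the untrained-prediction step from this video; it assumes model_0, X_test and y_test exist as tensors from the earlier cells:

# Make predictions with the untrained model (a sketch; assumes model_0, X_test, y_test from earlier).
with torch.inference_mode():  # turns off gradient tracking while predicting
    untrained_preds = model_0(X_test.to(device))

print(f"Length of predictions: {len(untrained_preds)}, Shape: {untrained_preds.shape}")
print(f"Length of test samples: {len(y_test)}, Shape: {y_test.shape}")
print(f"First 10 predictions:\n{untrained_preds[:10]}")  # raw outputs (logits), not 0/1 labels yet
print(f"First 10 test labels:\n{y_test[:10]}")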
5584
09:57:13,680 --> 09:57:19,120
Welcome back. So over the past few videos, we've been setting up a classification model to deal
5585
09:57:19,120 --> 09:57:23,600
with our specific shape of data. Now recall, depending on the data set that you're working
5586
09:57:23,600 --> 09:57:28,400
with will depend on what layers you use. For now we're keeping it simple, and nn.Linear is one
5587
09:57:28,400 --> 09:57:33,680
of the simplest layers in PyTorch. We've got two input features, we're upscaling that to five
5588
09:57:33,680 --> 09:57:38,960
output features. So we have five hidden units, and then we have one output feature. And that's in line
5589
09:57:38,960 --> 09:57:46,240
with the shape of our data. So two features of x equals one number for y. So now let's continue
5590
09:57:46,240 --> 09:57:52,720
on with the modeling workflow where we're up to. We have: build or pick a model. So we've built a model. Now we
5591
09:57:52,720 --> 09:57:59,280
need to pick a loss function and optimizer. We're getting good at this. So let's go here,
5592
09:57:59,280 --> 09:58:06,880
set up loss function and optimizer. Now here comes the question. If we're working on classification
5593
09:58:06,880 --> 09:58:13,520
previously, we used, let's go to the next one, nn.L1Loss for regression, which is
5594
09:58:13,520 --> 09:58:19,120
MAE, mean absolute error. Just a heads up, that won't necessarily work with a classification problem.
5595
09:58:19,120 --> 09:58:31,040
So which loss function or optimizer should you use? So again, this is problem specific. But with
5596
09:58:31,040 --> 09:58:37,680
a little bit of practice, you'll get used to using different ones. So for example, for regression,
5597
09:58:38,720 --> 09:58:43,040
you might want... regression is predicting a number. And I know it can get confusing because
5598
09:58:43,040 --> 09:58:47,360
it looks like we're predicting a number here, we are essentially predicting a number. But this
5599
09:58:47,360 --> 09:58:54,560
relates to a class. So for regression, you might want MAE or MSE, which are mean
5600
09:58:56,000 --> 09:59:07,920
absolute error and mean squared error. And for classification, you might want binary cross entropy
5601
09:59:09,120 --> 09:59:16,880
or categorical cross entropy, which is sometimes just referred to as cross entropy. Now, where would
5602
09:59:16,880 --> 09:59:24,000
you find these things out? Well, through the internet, of course. So you could go, what is
5603
09:59:24,000 --> 09:59:30,240
binary cross entropy? I'm going to leave this for your extra curriculum to read through.
5604
09:59:30,240 --> 09:59:35,120
We've got a fair few resources here. Understanding binary cross entropy slash log loss
5605
09:59:36,800 --> 09:59:42,880
by Daniel Godoy. Oh, yes. Great first name, my friend. This is actually the article that I
5606
09:59:42,880 --> 09:59:46,480
would recommend if you want to learn what's going on behind the scenes with binary cross
5607
09:59:46,480 --> 09:59:51,520
entropy. For now, there's a lot of math there. We're going to be writing code to implement this. So
5608
09:59:51,520 --> 09:59:56,720
PyTorch has done this for us. Essentially, what does a loss function do? Let's remind ourselves.
5609
09:59:58,000 --> 10:00:09,280
Go down here. As a reminder, the loss function measures how wrong your model's predictions are.
5610
10:00:09,280 --> 10:00:17,120
So I'm also going to leave a reference here; I've got a little table here in the book version of
5611
10:00:17,120 --> 10:00:23,120
this course. So, 02 neural network classification with PyTorch, set up loss function and optimizer.
5612
10:00:23,120 --> 10:00:27,280
So we've got some example loss functions and optimizers here, such as stochastic gradient
5613
10:00:27,280 --> 10:00:32,880
descent, or the SGD optimizer; the Adam optimizer is also very popular. So I've got problem type here,
5614
10:00:32,880 --> 10:00:37,920
and then the PyTorch code that we can implement this with. We've got binary cross entropy loss.
5615
10:00:37,920 --> 10:00:44,000
We've got cross entropy loss, mean absolute error, MAE, mean squared error, MSE. So you want to use
5616
10:00:44,000 --> 10:00:48,080
these two for regression. There are other different loss functions you could use, but these are some
5617
10:00:48,080 --> 10:00:52,240
of the most common. That's what I'm focusing on, the most common ones that are going to get you
5618
10:00:52,240 --> 10:00:57,440
through a fair few problems. We've got binary classification, multi-class classification. What
5619
10:00:57,440 --> 10:01:03,600
are we working with? We're working with binary classification. So we're going to look at torch.nn
5620
10:01:03,600 --> 10:01:10,000
BCEWithLogitsLoss, where BCE stands for binary cross entropy, with logits. What the hell is a logit?
5621
10:01:10,640 --> 10:01:14,880
And BCELoss. Now if this is confusing, then trust me, when I first started using PyTorch,
5622
10:01:14,880 --> 10:01:18,880
I got a little bit confused about why they have two here, but we're going to explore that anyway.
5623
10:01:18,880 --> 10:01:26,160
So what is a logit? So if you search what is a logit, you'll get this and you'll get statistics
5624
10:01:26,160 --> 10:01:30,240
and you'll get the log odds formula. In fact, if you want to read more, I would highly encourage it.
5625
10:01:30,240 --> 10:01:35,120
So you could go through all of this. We're going to practice writing code for it instead.
5626
10:01:35,680 --> 10:01:42,880
Luckily PyTorch does this for us, but logit is kind of confusing in deep learning. So if we go
5627
10:01:42,880 --> 10:01:48,960
what is a logit in deep learning, it kind of means a different thing. It's kind of just a name of what
5628
10:01:49,680 --> 10:01:55,200
yeah, there we go. What is the word logits in TensorFlow? As I said, TensorFlow is another
5629
10:01:55,200 --> 10:02:00,800
deep learning framework. So let's close this. What do we got? We've got a whole bunch of
5630
10:02:00,800 --> 10:02:08,080
definitions here. Logits layer. Yeah. This is one of my favorites. In context of deep learning,
5631
10:02:08,080 --> 10:02:13,920
the logits layer means the layer that feeds into the softmax. So softmax is a form of activation.
5632
10:02:13,920 --> 10:02:17,840
We're going to see all of this later on because this is just words on a page right now. Softmax
5633
10:02:17,840 --> 10:02:22,560
or other such normalization. So the output of the softmax are the probabilities for the
5634
10:02:22,560 --> 10:02:29,200
classification task and its input is the logits layer. Whoa, there's a lot going on here. So let's
5635
10:02:29,200 --> 10:02:35,360
just take a step back and get into writing some code. And for optimizers, I'm just going to complete
5636
10:02:35,360 --> 10:02:48,720
this. And for optimizers, two of the most common and useful are SGD and Adam. However, PyTorch
5637
10:02:48,720 --> 10:02:55,760
has many built in options. And as you start to learn more about the world of machine learning,
5638
10:02:55,760 --> 10:03:04,320
you'll find more if you go to torch.optim or torch.nn. So if we look at nn, what do we have in here?
5639
10:03:04,320 --> 10:03:09,120
Loss functions. There we go. Beautiful. That's what we're after. L1 loss, which is MAE,
5640
10:03:09,120 --> 10:03:14,560
MSELoss, cross entropy loss, CTC loss, all of these different types of loss here will depend
5641
10:03:14,560 --> 10:03:18,800
on the problem you're working on. But I'm here to tell you that for regression and classification,
5642
10:03:18,800 --> 10:03:24,000
these are two of the most common. See, this is that confusion again: BCELoss, BCEWith
5643
10:03:24,000 --> 10:03:30,000
LogitsLoss. What the hell is a logit? My goodness. Okay, that's enough. And optim,
5644
10:03:30,000 --> 10:03:34,480
these are different optimizers. We've got probably a dozen or so here. Algorithms.
5645
10:03:35,680 --> 10:03:40,800
Adadelta, Adagrad, Adam. This can be pretty full on when you first get started. But for now,
5646
10:03:40,800 --> 10:03:46,560
just stick with SGD and the Adam optimizer. They're two of the most common. Again, they may not
5647
10:03:46,560 --> 10:03:51,840
perform the best on every single problem, but they will get you fairly far just knowing those.
5648
10:03:51,840 --> 10:03:57,520
And then you'll pick up some of these extra ones as you go. But let's just get rid of all of,
5649
10:03:57,520 --> 10:04:05,440
maybe we'll, so I'll put this in here, this link. So we'll create our loss function. For the loss
5650
10:04:05,440 --> 10:04:20,240
function, we're going to use torch.nn.BCEWithLogitsLoss. This is exciting. For more on what
5651
10:04:21,440 --> 10:04:27,600
binary cross entropy, which is BCE (a lot of abbreviations in machine learning and deep learning),
5652
10:04:27,600 --> 10:04:39,280
is, check out this article. And then for a definition of what a logit is, we're going to see a
5653
10:04:39,280 --> 10:04:44,320
logit in a second in deep learning. Because again, deep learning is one of those fields,
5654
10:04:44,320 --> 10:04:48,560
a machine learning, which likes to be a bit rebellious, you know, likes to be a bit different
5655
10:04:48,560 --> 10:04:53,600
from the pure mathematics type of fields and statistics in general. It's this beautiful
5656
10:04:53,600 --> 10:05:03,280
gestaltism. And for different optimizers, see torch.optim. But we've covered a few of these
5657
10:05:03,280 --> 10:05:11,920
things before. And finally, I'm going to put up here, and then for some common choices of loss
5658
10:05:11,920 --> 10:05:17,920
functions and optimizers. Now, don't worry too much. This is why I'm linking all of these extra
5659
10:05:17,920 --> 10:05:23,280
resources. A lot of this is covered in the book. So as we just said, set up loss function,
5660
10:05:23,280 --> 10:05:27,760
optimizer, we just talked about these things. But I mean, you can just go to this book website
5661
10:05:27,760 --> 10:05:33,200
and reference it. Oh, we don't want that. We want this link. Come on, I knew you can't even
5662
10:05:33,200 --> 10:05:37,600
copy and paste. How are you supposed to code? I know I've been promising code this whole time,
5663
10:05:37,600 --> 10:05:43,440
so let's write some. So let's set up the loss function. What did we say it was? We're going to
5664
10:05:43,440 --> 10:05:54,160
call it loss_fn, L-O-S-S underscore F-N, for loss function. And we're going to call
5665
10:05:54,160 --> 10:06:01,040
BCEWithLogitsLoss. This has the sigmoid activation function built in. And we haven't covered what
5666
10:06:01,040 --> 10:06:06,240
the sigmoid activation function is, but we are going to; don't you worry about that. Built in.
5667
10:06:07,120 --> 10:06:11,360
In fact, if you wanted to learn what the sigmoid activation function is, how could you find out
5668
10:06:11,360 --> 10:06:17,520
sigmoid activation function? But we're going to see it in action. Activation functions in neural
5669
10:06:17,520 --> 10:06:21,760
networks. This is the beautiful thing about machine learning. There's so much stuff out there.
5670
10:06:21,760 --> 10:06:26,080
People have written some great articles. You've got formulas here. PyTorch has implemented that
5671
10:06:26,080 --> 10:06:33,120
behind the scenes for us. So thank you, PyTorch. But if you recall, sigmoid activation function
5672
10:06:33,120 --> 10:06:38,640
built in, where did we discuss the architecture of a classification network? What do we have here?
5673
10:06:38,640 --> 10:06:44,720
Right back in the zeroth chapter of this little online book thing that we heard here. Binary
5674
10:06:44,720 --> 10:06:52,560
classification. We have output activation. Oh, look at that. So sigmoid, torch.sigmoid, in
5675
10:06:52,560 --> 10:06:59,040
PyTorch. All right. And then for multi-class classification, we need the softmax. Okay. Names
5676
10:06:59,040 --> 10:07:04,880
on a page again, but this is just a reference table so we can keep coming back to. So let's just
5677
10:07:04,880 --> 10:07:11,520
keep going with this. I just want to highlight the fact that nn.BCELoss also exists. So
5678
10:07:12,400 --> 10:07:23,760
this one, BCELoss, requires inputs to have gone through the sigmoid activation function
5679
10:07:24,960 --> 10:07:33,200
prior to input to BCELoss. And so let's look up the documentation. I'm going to comment that
5680
10:07:33,200 --> 10:07:37,040
out because we're going to stick with using this one. Now, why would we stick with using this one?
5681
10:07:37,040 --> 10:07:41,920
Let's check out the documentation, hey, torch.nn. And I realize this video is all over the
5682
10:07:41,920 --> 10:07:48,240
place, but we're going to step back through BCE loss with logits. Did I even say this right?
5683
10:07:49,760 --> 10:07:55,520
With logits loss. So, with... I got the 'with' around the wrong way. So let's check this out. So this
5684
10:07:55,520 --> 10:08:02,400
loss combines a sigmoid layer with the BCE loss in one single class. So if we go back to the code,
5685
10:08:02,400 --> 10:08:11,760
BCELoss is this. So if we combined nn.Sequential, and then we passed in nn.Sigmoid,
5686
10:08:11,760 --> 10:08:22,240
and then we went nn.BCELoss, we'd get something similar to this. But if we keep reading
5687
10:08:22,240 --> 10:08:27,360
in the documentation... because I just literally read that it combines sigmoid with BCE
5688
10:08:27,360 --> 10:08:33,520
loss. But if we go back to the documentation, why do we want to use it? So this version is more
5689
10:08:33,520 --> 10:08:41,680
numerically stable than using a plain sigmoid followed by a BCELoss, as by
5690
10:08:41,680 --> 10:08:47,520
combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical
5691
10:08:47,520 --> 10:08:53,120
stability. Beautiful. So if we use this loss function for our binary cross entropy,
5692
10:08:53,120 --> 10:08:59,600
we get some numeric stability. Wonderful. So there's our loss function. We've got the sigmoid
5693
10:08:59,600 --> 10:09:05,360
activation function built in. And so we're going to see the difference between them later on,
5694
10:09:05,360 --> 10:09:11,840
like in the flesh, optimizer, we're going to choose, hmm, let's stick with SGD, hey,
5695
10:09:11,840 --> 10:09:17,040
old faithful stochastic gradient descent. And we have to set the parameters here, the parameters
5696
10:09:17,040 --> 10:09:23,280
parameter, params, equal to our model parameters. That would be like saying, hey, stochastic gradient descent,
5697
10:09:23,280 --> 10:09:30,720
please update... if we get another code cell in here... please update our model's parameters
5698
10:09:32,240 --> 10:09:37,440
with respect to the loss, because we'd like our loss function to go down. So these two are going
5699
10:09:37,440 --> 10:09:42,480
to work in tandem again, when we write our training loop, and we'll set our learning rate to 0.1.
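As a sketch of the setup just described (names follow the narration, learning rate 0.1 as stated, and model_0 is assumed from earlier):

# Set up the loss function and optimizer described above (a sketch).
# BCEWithLogitsLoss = sigmoid activation + BCELoss rolled into one, which is more numerically stable.
loss_fn = nn.BCEWithLogitsLoss()

# Stochastic gradient descent will nudge model_0's parameters to try to reduce the loss.
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)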
5700
10:09:42,480 --> 10:09:46,960
We'll see where that gets us. So that's what the optimizer is going to do. It's going to optimize
5701
10:09:46,960 --> 10:09:52,400
all of these parameters for us, which is amazing. And the principle would be the same, even if there
5702
10:09:52,400 --> 10:09:59,280
were 100 layers here, and 10,000 or a million different parameters here. So we've got a loss function,
5703
10:09:59,280 --> 10:10:05,440
we've got an optimizer. And how about we create an evaluation metric. So let's calculate
5704
10:10:06,240 --> 10:10:11,680
accuracy at the same time, because accuracy is very helpful with classification problems.
5705
10:10:11,680 --> 10:10:21,120
Now, what is accuracy? Well, we could look up the formula for accuracy. So, true positives plus true
5706
10:10:21,120 --> 10:10:25,920
negatives, over all predictions, times 100. Okay, let's see if we can implement something similar to that
5707
10:10:25,920 --> 10:10:32,800
just using pure PyTorch. Now, why would we want accuracy? Because accuracy is: out of 100
5708
10:10:32,800 --> 10:10:42,960
examples. What percentage does our model get right? So for example, if we had a coin toss,
5709
10:10:42,960 --> 10:10:50,240
and we did 100 coin tosses, and our model predicted the outcome correctly 99 out of 100 times,
5710
10:10:50,240 --> 10:10:56,320
it would have an accuracy of 99%, because it got one wrong. So 99 out of 100,
5711
10:10:56,320 --> 10:11:07,360
it gets it right. So, def accuracy_fn, an accuracy function, and we're going to pass it y_true. So
5712
10:11:07,360 --> 10:11:12,480
remember, any type of evaluation function or loss function is comparing the predictions to
5713
10:11:12,480 --> 10:11:19,280
the ground truth labels. So correct equals, this is going to see how many of our y true
5714
10:11:19,280 --> 10:11:27,440
or y_pred values are correct. So torch.eq stands for, hey, how many of these samples of y_true are
5715
10:11:27,440 --> 10:11:32,560
equal to y pred? And then we're going to get the sum of that, and we need to get the item from
5716
10:11:32,560 --> 10:11:38,080
that because we want it as a single value in Python. And then we're going to calculate the
5717
10:11:38,080 --> 10:11:45,520
accuracy, ACC stands for accuracy, equals correct, divided by the length of samples that we have
5718
10:11:45,520 --> 10:11:52,720
as input. And then we're going to times that by 100, and then return the accuracy. Wonderful.
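Putting the accuracy function just described into code (a sketch matching the narration):

# Accuracy: out of 100 examples, what percentage does the model get right?
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()  # how many predictions match the labels
    acc = (correct / len(y_pred)) * 100
    return acc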
5719
10:11:52,720 --> 10:11:57,520
So we now have an accuracy function. We're going to see how all three of these come into play
5720
10:11:57,520 --> 10:12:02,160
when we write a training loop, which we might as well get started on in the next few videos, hey,
5721
10:12:02,960 --> 10:12:09,600
I'll see you there. Welcome back. In the last video, we discussed some different loss function
5722
10:12:09,600 --> 10:12:15,200
options for our classification models. So we learned that if we're working with binary cross
5723
10:12:15,200 --> 10:12:21,120
entropy, or binary classification problems, we want to use binary cross entropy. And PyTorch
5724
10:12:21,120 --> 10:12:27,040
has two different types of binary cross entropy loss, except one is a bit more numerically stable.
5725
10:12:27,040 --> 10:12:31,840
That's the BCE with logits loss, because it has a sigmoid activation function built in.
5726
10:12:31,840 --> 10:12:37,520
So that's straight from the PyTorch documentation. And for optimizers, we have a few
5727
10:12:37,520 --> 10:12:42,560
different choices as well. So if we check out this section here in the PyTorch book, we have a
5728
10:12:42,560 --> 10:12:47,360
few different loss functions and optimizers for different problems and the PyTorch code that
5729
10:12:47,360 --> 10:12:52,400
we can implement. But the premise is still the same across the board of different problems.
5730
10:12:52,400 --> 10:12:58,800
The loss function measures how wrong our model is. And the goal of the optimizer is to optimize
5731
10:12:58,800 --> 10:13:05,680
the model parameters in such a way that the loss function goes down. And we also implemented our
5732
10:13:05,680 --> 10:13:13,360
own accuracy function metric, which is going to evaluate our model's predictions using accuracy
5733
10:13:13,360 --> 10:13:21,040
as an evaluation metric, rather than just loss. So let's now work on training a model.
5734
10:13:22,000 --> 10:13:28,640
So what should we do first? Well, do you remember the steps in a PyTorch training loop?
5735
10:13:28,640 --> 10:13:39,680
So to train our model, we're going to need to build a training loop. So if you watch the video
5736
10:13:39,680 --> 10:13:47,680
on the PyTorch song, so if you Google unofficial PyTorch song, you should find my, there we go,
5737
10:13:47,680 --> 10:13:51,920
the unofficial PyTorch optimization loop song. We're not going to watch that. That's going to
5738
10:13:51,920 --> 10:13:56,640
be a little tidbit for the steps that we're going to code out. But that's just a fun little jingle
5739
10:13:56,640 --> 10:14:01,840
to remember these steps here. So if we go into the book section, this is number three train model,
5740
10:14:01,840 --> 10:14:07,920
exactly where we're up to here. But we have PyTorch training loop steps. Remember, for an
5741
10:14:07,920 --> 10:14:16,160
epoch in a range, do the forward pass, calculate the loss, optimizer zero grad, loss backward,
5742
10:14:16,160 --> 10:14:22,560
optimizer step, step, step. We could keep singing this all day. You could keep reading those steps all
5743
10:14:22,560 --> 10:14:30,640
day, but it's better to code them. But let's write this out. So one, forward pass, two, calculate the loss,
5744
10:14:31,520 --> 10:14:40,320
three, optimizer zero grad, four. What do we do? Loss backward. So back propagation,
5745
10:14:40,320 --> 10:14:45,920
I'll just write that up in here back propagation. We've linked to some extra resources. If you'd
5746
10:14:45,920 --> 10:14:51,920
like to find out what's going on in back propagation, we're focused on code here, and then gradient
5747
10:14:51,920 --> 10:15:06,160
descent. So optimizer step. So build a training loop with the following steps. However, I've kind
5748
10:15:06,160 --> 10:15:10,400
of mentioned a few things that need to be taken care of before we talk about the forward pass.
5749
10:15:10,400 --> 10:15:16,880
So we've talked about logits. We looked up what the hell is a logit. So if we go into this stack
5750
10:15:16,880 --> 10:15:21,760
overflow answer, we saw machine learning, what is a logit? How about we see that? We need to
5751
10:15:21,760 --> 10:15:27,280
do a few steps. So I'm going to write this down. Let's get a bit of clarity here, Daniel.
5752
10:15:27,280 --> 10:15:30,400
We're kind of all over the place at the moment, but that's all right. That's the exciting part
5753
10:15:30,400 --> 10:15:39,600
of machine learning. So, let's go from raw logits to prediction probabilities
5754
10:15:40,480 --> 10:15:47,200
to prediction labels. That's what we want. Because to truly evaluate our model, we want to
5755
10:15:47,200 --> 10:15:56,400
so let's write in here: our model outputs are going to be raw logits. So that's the definition of a
5756
10:15:56,400 --> 10:16:00,720
logit in machine learning and deep learning. You might read some other definitions, but for us,
5757
10:16:00,720 --> 10:16:07,920
the raw outputs of our model, model zero are going to be referred to as logits. So then model zero,
5758
10:16:07,920 --> 10:16:18,960
so whatever comes out of here are logits. So we can convert these logits into prediction probabilities
5759
10:16:20,560 --> 10:16:33,440
by passing them to some kind of activation function, e.g. sigmoid for binary cross entropy
5760
10:16:33,440 --> 10:16:45,680
and softmax for multi-class classification. I've got binary class e-fication. I have to
5761
10:16:45,680 --> 10:16:51,920
sound it out every time I spell it for binary classification. So class e-fication. So we're
5762
10:16:51,920 --> 10:16:57,520
going to see multi-class classification later on, but we want some prediction probabilities.
5763
10:16:57,520 --> 10:17:02,160
We're going to see what they look like in a minute. So we want to go from logits to prediction
5764
10:17:02,160 --> 10:17:13,920
probabilities to prediction labels. Then we can convert our model's prediction probabilities to
5765
10:17:15,120 --> 10:17:24,480
prediction labels by either rounding them or taking the argmax.
5766
10:17:24,480 --> 10:17:34,560
So round is for binary classification and argmax will be for the outputs of the softmax activation
5767
10:17:34,560 --> 10:17:41,040
function, but let's see it in action first. So I've called the logits are the raw outputs of our
5768
10:17:41,040 --> 10:17:49,360
model with no activation function. So view the first five outputs of the forward pass
5769
10:17:49,360 --> 10:17:56,960
on the test data. So of course, our model is still instantiated with random values. So we're
5770
10:17:56,960 --> 10:18:02,960
going to set up a variable here, y logits, and model zero, we're going to pass it the test data.
5771
10:18:02,960 --> 10:18:09,760
So x test, not text, dot to device, because our model is currently on our CUDA device and we need
5772
10:18:09,760 --> 10:18:15,680
our test data on the same device or target device. Remember, that's why we're writing device
5773
10:18:15,680 --> 10:18:20,960
agnostic code. So this would work regardless of whether there's a GPU active or not. Let's have
5774
10:18:20,960 --> 10:18:27,840
a look at the logits. Oh, okay. Right now, we've got some positive values here. And we can see that
5775
10:18:27,840 --> 10:18:33,760
they're on the CUDA device. And we can see that they're tracking gradients. Now, ideally,
5776
10:18:34,480 --> 10:18:40,080
we would have run torch dot inference mode here, because we're making predictions. And the rule
5777
10:18:40,080 --> 10:18:44,080
of thumb is whenever you make predictions with your model, you turn it into eval mode.
5778
10:18:44,080 --> 10:18:48,880
We just have to remember to turn it back to train when we want to train and you run torch dot
5779
10:18:48,880 --> 10:18:53,280
inference mode. So we get a very similar set up here. We just don't have the gradients being
5780
10:18:53,280 --> 10:19:00,480
tracked anymore. Okay. So these are called logits. The logits are the raw outputs of our model,
5781
10:19:00,480 --> 10:19:06,800
without being passed to any activation function. So an activation function is something a little
5782
10:19:06,800 --> 10:19:13,840
separate from a layer. So if we come up here, we've used layer. So in the neural networks that we
5783
10:19:13,840 --> 10:19:19,360
start to build and the ones that you'll subsequently build are comprised of layers and activation
5784
10:19:19,360 --> 10:19:24,000
functions, we're going to make the concept of an activation function a little bit more clear later
5785
10:19:24,000 --> 10:19:30,080
on. But for now, just treat it all as some form of mathematical operation. So if we were to pass
5786
10:19:30,080 --> 10:19:35,680
data through this model, what is happening? Well, it's going through the linear layer. Now recall,
5787
10:19:35,680 --> 10:19:40,960
we've seen this a few times now, torch nn Linear. If we pass data through a linear layer,
5788
10:19:40,960 --> 10:19:47,040
it's applying the linear transformation on the incoming data. So it's performing this
5789
10:19:47,040 --> 10:19:53,600
mathematical operation behind the scenes. So y, the output, equals the input x multiplied by a
5790
10:19:53,600 --> 10:19:59,200
weight tensor A (this could really be W), which is transposed, so this is doing a dot product,
5791
10:19:59,200 --> 10:20:05,120
plus a bias term here. And then if we jump into our model's state dict, we've got weight
5792
10:20:05,120 --> 10:20:09,600
and we've got bias. So that's the formula that's happening in these two layers. It will be different
5793
10:20:09,600 --> 10:20:14,240
depending on the layer that we choose. But for now, we're sticking with linear. And so the
5794
10:20:14,240 --> 10:20:21,120
raw output of our data going through our two layers, the logits is going to be this information
5795
10:20:21,120 --> 10:20:30,640
here. However, it's not in the same format as our test data. And so if we want to make a comparison
5796
10:20:30,640 --> 10:20:36,640
to how good our model is performing, we need apples to apples. So we need this in the same format
5797
10:20:36,640 --> 10:20:44,000
as this, which it's not, of course. So we need to go to the next step. Let's use the sigmoid. So, use the
5798
10:20:44,000 --> 10:20:54,720
sigmoid activation function on our model logits. So why are we using sigmoid? Well, recall in a
5799
10:20:54,720 --> 10:21:02,320
binary classification architecture, the output activation is the sigmoid function here. So now
5800
10:21:02,320 --> 10:21:07,840
let's jump back into here. And we're going to create some pred probs, which stands for prediction probabilities. We're going to call torch sigmoid
5801
10:21:07,840 --> 10:21:16,000
on our model logits to turn them into prediction probabilities. So y pred probs
5802
10:21:16,000 --> 10:21:23,200
equals torch sigmoid on y logits. And now let's have a look at y pred probs. What do we get from
5803
10:21:23,200 --> 10:21:29,760
this? Oh, we still get numbers on a page, goodness gracious me. But the important point
5804
10:21:29,760 --> 10:21:36,160
now is that they've gone through the sigmoid activation function, which means now we can pass these
5805
10:21:37,200 --> 10:21:42,480
to a torch dot round function. Let's have a look at this torch dot round. And what do we get?
5806
10:21:42,480 --> 10:21:52,560
Predprobs. Oh, the same format as what we've got here. Now you might be asking like, why don't we
5807
10:21:52,560 --> 10:21:59,360
just put torch dot round here? Well, this step is required, we can't just do it on
5808
10:21:59,360 --> 10:22:04,880
the raw logits. We need to use the sigmoid activation function here to turn it into prediction
5809
10:22:04,880 --> 10:22:10,960
probabilities. And now what is a prediction probability? Well, that's a value usually between 0 and 1
5810
10:22:10,960 --> 10:22:16,960
for how likely our model thinks it's a certain class. And in the case of binary cross entropy,
5811
10:22:16,960 --> 10:22:24,240
these prediction probability values, let me just write this out in text. So for our prediction
5812
10:22:24,240 --> 10:22:39,440
probability values, we need to perform a range style rounding on them. So this is a decision
5813
10:22:39,440 --> 10:22:48,400
boundary. So this will make more sense when we go: y pred probs, if it's equal to 0.5 or greater
5814
10:22:48,400 --> 10:22:59,360
than 0.5, we set y equal to one. So y equal one. So class one, whatever that is, a red dot or a
5815
10:22:59,360 --> 10:23:08,800
blue dot, and then y pred probs, if it is less than 0.5, we set y equal to zero. So this is class
5816
10:23:08,800 --> 10:23:18,080
zero. You can also adjust this decision boundary. So say, if you wanted to increase this value,
5817
10:23:18,080 --> 10:23:28,560
so anything over 0.7 is one. And below that is zero. But generally, most commonly, you'll find
5818
10:23:28,560 --> 10:23:35,920
it split at 0.5. So let's keep going. Let's actually see this in action. So how about we
5819
10:23:35,920 --> 10:23:47,520
recode this? So find the predicted probabilities. And so we want no, sorry, we want the predicted
5820
10:23:47,520 --> 10:23:52,960
labels, that's what we want. So when we're evaluating our model, we want to convert the outputs of
5821
10:23:52,960 --> 10:23:58,640
our model, the outputs of our model are here, the logits, the raw outputs of our model are
5822
10:23:58,640 --> 10:24:06,320
logits. And then we can convert those logits to prediction probabilities using the sigmoid function
5823
10:24:06,320 --> 10:24:14,240
on the logits. And then we want to find the predicted labels. So we go raw logits output of our model,
5824
10:24:14,240 --> 10:24:19,600
prediction probabilities after passing them through an activation function, and then prediction labels.
5825
10:24:19,600 --> 10:24:26,080
This is the steps we want to take with the outputs of our model. So find the predicted labels.
5826
10:24:26,080 --> 10:24:31,120
Let's go in here a little bit different to our regression problem previously, but nothing we can't
5827
10:24:31,120 --> 10:24:39,120
handle. Torch round, we're going to go y-pred-probs. So I like to name it y-pred-probs for prediction
5828
10:24:39,120 --> 10:24:47,040
probabilities and y-preds for prediction labels. Now let's go in full if we wanted to. So y-pred
5829
10:24:47,040 --> 10:24:54,480
labels equals torch dot round torch dot sigmoid. So sigmoid activation function for binary cross
5830
10:24:54,480 --> 10:25:01,680
entropy, and model zero, x test dot to device. Truly this should be within inference mode code,
5831
10:25:01,680 --> 10:25:08,560
but for now we'll just leave it like this to have a single example of what's going on here.
5832
10:25:08,560 --> 10:25:13,840
Now I just need to count one, two, there we go. That's where we want the index. We just want it
5833
10:25:13,840 --> 10:25:22,880
on five examples. So check for equality. And we want print, torch dot eq. We're going to check
5834
10:25:22,880 --> 10:25:34,240
y preds dot squeeze is equal to y pred labels. So we're just doing the exact same thing. And we
5835
10:25:34,240 --> 10:25:38,720
need squeeze here to get rid of the extra dimension that comes out. You can try doing this without
5836
10:25:38,720 --> 10:25:49,600
squeeze. So get rid of extra dimension once again. We want y-pred's dot squeeze. Fair bit of code
5837
10:25:49,600 --> 10:25:56,640
there, but this is what's happened here. We create y preds. So we turn the y pred
5838
10:25:56,640 --> 10:26:03,280
probs into y preds. And then we just do the full step over again. So we make predictions with
5839
10:26:03,280 --> 10:26:13,440
our model, we get the raw logits. So this is logits to pred probs to pred labels. So the raw
5840
10:26:13,440 --> 10:26:18,720
outputs of our model are logits. We turn the logits into prediction probabilities using torch
5841
10:26:18,720 --> 10:26:25,600
sigmoid. And we turn the prediction probabilities into prediction labels using torch dot round.
5842
10:26:25,600 --> 10:26:31,200
And we fulfill this criteria here. So everything above 0.5. This is what torch dot round does.
5843
10:26:31,200 --> 10:26:37,280
Turns it into a 1. Everything below 0.5 turns it into a 0. The predictions right now are going to
5844
10:26:37,280 --> 10:26:44,640
be quite terrible because our model is using random numbers. But y-pred's found with the steps above
5845
10:26:44,640 --> 10:26:50,400
is the same as y pred labels done all in one hit, thanks to this check for equality using
5846
10:26:50,400 --> 10:26:56,400
torch dot eq, y preds dot squeeze. And we just do the squeeze to get rid of the extra dimensions.
5847
10:26:56,400 --> 10:27:03,280
And we have out here some labels that look like our actual y-test labels. They're in the same format,
5848
10:27:03,280 --> 10:27:10,000
but of course they're not the same values because this model is using random weights to make predictions.
5849
10:27:10,000 --> 10:27:17,280
So we've done a fair few steps here, but I believe we are now in the right space to start building
5850
10:27:17,280 --> 10:27:24,480
a training and test loop. So let's write that down here: 3.2, building a training and testing loop.
5851
10:27:24,480 --> 10:27:29,280
You might want to have a go at this yourself. So we've got all the steps that we need to do the
5852
10:27:29,280 --> 10:27:35,200
forward pass. But the reason we've done this step here, the logits, then the pred probs and the
5853
10:27:35,200 --> 10:27:43,680
pred labels, is because the inputs to our loss function up here, this requires, so BCE with
5854
10:27:43,680 --> 10:27:50,560
logits loss, requires what? Well, we're going to see that in the next video, but I'd encourage
5855
10:27:50,560 --> 10:27:55,840
you to give it a go at implementing these steps here. Remember the jingle for an epoch in a range,
5856
10:27:55,840 --> 10:28:02,720
do the forward pass, calculate the loss, which is BCE with logits loss, optimizer zero grad,
5857
10:28:02,720 --> 10:28:10,560
which is this one here, loss backward, optimizer step, step, step. Let's do that together in the
5858
10:28:10,560 --> 10:28:17,440
next video. Welcome back. In the last few videos, we've been working through creating a model for
5859
10:28:17,440 --> 10:28:22,240
a classification problem. And now we're up to training a model. And we've got some steps here,
5860
10:28:22,240 --> 10:28:29,520
but we started off by discussing the concept of logits. Logits are the raw output of the model,
5861
10:28:29,520 --> 10:28:33,920
whatever comes out of the forward functions of the layers in our model. And then we discussed how
5862
10:28:33,920 --> 10:28:38,160
we can turn those logits into prediction probabilities using an activation function,
5863
10:28:38,160 --> 10:28:44,720
such as sigmoid for binary classification, and softmax for multi class classification.
5864
10:28:44,720 --> 10:28:48,720
We haven't seen softmax yet, but we're going to stick with sigmoid for now because we have
5865
10:28:48,720 --> 10:28:54,080
binary classification data. And then we can convert that from prediction probabilities
5866
10:28:54,080 --> 10:28:58,960
to prediction labels. Because remember, when we want to evaluate our model, we want to compare
5867
10:28:58,960 --> 10:29:06,640
apples to apples. We want our model's predictions to be in the same format as our test labels.
5868
10:29:06,640 --> 10:29:12,000
And so I took a little break after the previous video. So my Colab notebook has once again
5869
10:29:12,000 --> 10:29:17,200
disconnected. So I'm just going to run all of the cells before here. It's going to reconnect up
5870
10:29:17,200 --> 10:29:22,800
here. We should still have a GPU present. That's a good thing about Google Colab: if you
5871
10:29:22,800 --> 10:29:30,880
change the runtime type to GPU, it'll save that wherever it saves the Google Colab notebook,
5872
10:29:30,880 --> 10:29:36,000
so that when you restart it, it should still have a GPU present. And how can we check that,
5873
10:29:36,000 --> 10:29:41,280
of course? Well, we can type in device, we can run that cell. And we can also check
5874
10:29:42,000 --> 10:29:48,720
Nvidia SMI. It'll tell us if we have an Nvidia GPU with CUDA enabled ready to go.
5875
10:29:48,720 --> 10:29:58,080
So what's our device? CUDA. Wonderful. And Nvidia SMI. Excellent. I have a Tesla P100 GPU.
5876
10:29:58,080 --> 10:30:03,840
Ready to go. So with that being said, let's start to write a training loop. Now we've done this before,
5877
10:30:03,840 --> 10:30:09,840
and we've got the steps up here. Do the forward pass, calculate the loss. We've spent enough on
5878
10:30:09,840 --> 10:30:14,000
this. So we're just going to jump straight into writing code. There is a little tidbit in this one,
5879
10:30:14,000 --> 10:30:19,040
though, but we'll conquer that when we get to it. So I'm going to set a manual seed,
5880
10:30:20,240 --> 10:30:26,160
torch dot manual seed. And I'm going to use my favorite number, 42. This is just to ensure
5881
10:30:26,160 --> 10:30:32,000
reproducibility, if possible. Now, I also want you to be aware that there is also another
5882
10:30:32,000 --> 10:30:39,280
form of manual seed, which is a CUDA random seed. Do we have the PyTorch documentation?
5883
10:30:39,280 --> 10:30:51,680
Yeah, reproducibility. So torch dot CUDA dot manual seed dot seed. Hmm. There is a CUDA
5884
10:30:51,680 --> 10:31:02,240
seed somewhere. Let's try and find out. CUDA. I think PyTorch have just had an upgrade to
5885
10:31:02,240 --> 10:31:09,600
their documentation. Seed. Yeah, there we go. Okay. I knew it was there. So torch dot CUDA
5886
10:31:09,600 --> 10:31:15,040
dot manual seed. So if we're using CUDA, we have a CUDA manual seed as well. So let's see what
5887
10:31:15,040 --> 10:31:21,280
happens if we put that, torch dot CUDA dot manual seed 42. We don't necessarily have to put
5888
10:31:21,280 --> 10:31:26,320
these. It's just to try and get numbers as reproducible as possible on your screen and my screen.
5889
10:31:26,880 --> 10:31:31,520
Again, what is more important is not necessarily the numbers exactly being the same lining up
5890
10:31:31,520 --> 10:31:36,800
between our screens. It's more so the direction of which way they're going. So let's set the number
5891
10:31:36,800 --> 10:31:43,200
of epochs. We're going to train for 100 epochs. epochs equals 100. But again, as you might have
5892
10:31:43,200 --> 10:31:48,560
guessed, the CUDA manual seed is for if you're doing operations on a CUDA device, which in our
5893
10:31:48,560 --> 10:31:54,880
case, we are. Well, then perhaps we'd want them to be as reproducible as possible. So speaking of
5894
10:31:54,880 --> 10:32:00,800
CUDA devices, let's put the data on the target device, because we're writing
5895
10:32:00,800 --> 10:32:06,400
device agnostic code here. So I'm going to write x train, y train equals x train dot to device,
5896
10:32:06,400 --> 10:32:12,400
comma y train dot to device, that'll take care of the training data. And I'm going to do the
5897
10:32:12,400 --> 10:32:19,120
same for the testing data, equals x test dot to device. Because if we're going to run our model
5898
10:32:20,400 --> 10:32:25,840
on the CUDA device, we want our data to be there too. And the way we're writing our code,
5899
10:32:25,840 --> 10:32:31,280
our code is going to be device agnostic. Have I said that enough yet? So let's also build our
5900
10:32:31,280 --> 10:32:35,840
training and evaluation loop. Because we've covered the steps in here before, we're going to start
5901
10:32:35,840 --> 10:32:41,360
working a little bit faster through here. And don't worry, I think you're up to it. So for an epoch
5902
10:32:41,360 --> 10:32:46,800
in a range of epochs, what do we do? We start with training. So let me just write this.
5903
10:32:48,320 --> 10:32:53,200
Training: model zero dot train. That's the model we're working with. We call dot train,
5904
10:32:53,200 --> 10:32:57,440
which is the default, but we're going to do that anyway. And as you might have guessed,
5905
10:32:57,440 --> 10:33:02,880
the code that we're writing here is, you can functionize this. So we're going to do this later
5906
10:33:02,880 --> 10:33:07,440
on. But just for now, the next couple of videos, the next module or two, we're going to keep
5907
10:33:07,440 --> 10:33:14,080
writing out the training loop in full. So this is the part, the forward pass, where there's a
5908
10:33:14,080 --> 10:33:18,880
little bit of a tidbit here compared to what we've done previously. And that is because we're
5909
10:33:18,880 --> 10:33:24,400
outputting raw logits here, if we just pass our data straight to the model. So model zero
5910
10:33:24,400 --> 10:33:30,080
x train. And we're going to squeeze them here to get rid of an extra dimension. You can try to
5911
10:33:30,080 --> 10:33:34,400
see what the output sizes look like without squeeze. But we're just going to call squeeze
5912
10:33:34,400 --> 10:33:39,760
here. Remember, squeeze removes an extra one dimension from a tensor. And then to convert it
5913
10:33:39,760 --> 10:33:47,520
into prediction labels, we go torch dot round. And torch dot sigmoid, because torch dot sigmoid
5914
10:33:47,520 --> 10:33:53,040
is an activation function, which is going to convert the logits
5915
10:33:53,040 --> 10:33:57,680
into prediction probabilities. So that's on y logits. And I'm just going to put a note here. So this
5916
10:33:57,680 --> 10:34:08,880
is going to turn logits into pred probs into pred labels. So we've done the forward pass.
5917
10:34:08,880 --> 10:34:12,560
So that's a little tidbit there. We could have done all of this in one step, but I'll show you
5918
10:34:12,560 --> 10:34:19,760
why we broke this apart. So now we're going to calculate loss slash accuracy. We don't necessarily
5919
10:34:19,760 --> 10:34:26,320
have to calculate the accuracy. But we did make an accuracy function up here. So that we can
5920
10:34:26,320 --> 10:34:31,360
calculate the accuracy during training, we could just stick with only calculating the loss. But
5921
10:34:31,360 --> 10:34:37,520
sometimes it's cool to visualize different metrics loss plus a few others while your model is training.
5922
10:34:37,520 --> 10:34:44,640
So let's write some code to do that here. So we'll start off by going loss equals loss
5923
10:34:44,640 --> 10:34:53,600
fn on y logits. Ah, here's the difference from what we've done before. Previously in the notebook
5924
10:34:53,600 --> 10:34:58,880
zero one, up to zero two now, we passed in the prediction right here. But because what's our
5925
10:34:58,880 --> 10:35:04,880
loss function? Let's have a look at our loss function. Let's just call that see what it returns.
5926
10:35:04,880 --> 10:35:15,280
BCE with logits loss. So the BCE with logits expects logits as input. So as you might have guessed,
5927
10:35:15,280 --> 10:35:24,720
loss function without logits. If we had nn dot BCE loss, notice how we haven't got with logits.
5928
10:35:25,280 --> 10:35:32,640
And then we called loss f n, f n stands for function, by the way, without logits. What do we get?
5929
10:35:32,640 --> 10:35:40,480
So BCE loss. So this loss expects prediction probabilities as input. So let's write some code
5930
10:35:40,480 --> 10:35:45,040
to differentiate between these two. As I said, we're going to be sticking with this one.
5931
10:35:45,040 --> 10:35:53,120
Why is that? Because if we look up torch BCE with logits loss, the documentation states that it's
5932
10:35:53,120 --> 10:35:59,280
more numerically stable. So this loss combines a sigmoid layer and the BCE loss into one single
5933
10:35:59,280 --> 10:36:05,760
class, and is more numerically stable. So let's come back here, we'll keep writing some code.
5934
10:36:06,880 --> 10:36:13,280
And the accuracy is going to be accuracy f n. So our accuracy function, there's a little bit of a
5935
10:36:13,280 --> 10:36:19,440
difference here is y true equals y train for the training data. So this will be the training
5936
10:36:19,440 --> 10:36:29,040
accuracy. And then we have y pred equals y pred. So this is our own custom accuracy function
5937
10:36:29,040 --> 10:36:33,680
that we wrote ourselves. This is a testament to the Pythonic nature of PyTorch as well.
5938
10:36:33,680 --> 10:36:37,120
We've just got a pure Python function that we've slotted into our training loop,
5939
10:36:37,680 --> 10:36:40,480
which is essentially what the loss function is behind the scenes.
5940
10:36:41,440 --> 10:36:52,400
Now, let's write here: nn dot BCE with logits loss expects raw logits. So the raw output
5941
10:36:52,400 --> 10:37:00,240
of our model as input. Now, what if we were using a BCE loss on its own here? Well, let's just write
5942
10:37:00,240 --> 10:37:07,520
some code for that. So let's call loss function. And then we want to pass in y pred. Or we can
5943
10:37:07,520 --> 10:37:14,320
just go torch sigmoid on y logits. So why would we pass in torch sigmoid on the logits here? Because
5944
10:37:14,320 --> 10:37:21,760
remember, calling torch dot sigmoid on our logits turns our logits into prediction probabilities.
5945
10:37:21,760 --> 10:37:31,520
And then we would pass in y train here. So if this was BCE loss, it expects prediction
5946
10:37:32,320 --> 10:37:40,000
probabilities as input. So does that make sense? That's the difference between with logits. So
5947
10:37:40,000 --> 10:37:45,440
our loss function requires logits as input. Whereas if we just did straight up BCE loss,
5948
10:37:45,440 --> 10:37:52,240
we need to call torch dot sigmoid on the logits because it expects prediction probabilities as
5949
10:37:52,240 --> 10:37:59,280
input. Now, I'm going to comment that out because our loss function is BCE with logits loss. But
5950
10:37:59,280 --> 10:38:04,640
just keep that in mind. If, for some reason, you stumble across some PyTorch code that's using BCE loss,
5951
10:38:04,640 --> 10:38:10,160
not BCE with logits loss, and you find that torch dot sigmoid is called here, or you come across
5952
10:38:10,160 --> 10:38:16,400
some errors, because your inputs to your loss function are not what it expects. So with that
5953
10:38:16,400 --> 10:38:23,040
being said, we can keep going with our other steps. So we're up to optimizer zero grad. So
5954
10:38:23,040 --> 10:38:32,080
optimizer dot zero grad. Oh, this is step three, by the way. And what's after this? Once we've
5955
10:38:32,080 --> 10:38:40,960
zero grad the optimizer, we do number four, which is loss backward. We can go loss dot backward. And then
5956
10:38:40,960 --> 10:38:48,080
we go what's next? Optimizer step step step. So optimizer dot step. And I'm singing the unofficial
5957
10:38:48,080 --> 10:38:56,000
PyTorch optimization loop song there. This is back propagation. Calculate the gradients with respect
5958
10:38:56,000 --> 10:39:01,840
to all of the parameters in the model. And the optimizer step is update the parameters to reduce
5959
10:39:01,840 --> 10:39:08,720
the loss. So gradient descent, hence the descent. Now, if we want to do testing,
5960
10:39:09,440 --> 10:39:14,480
well, we know what to do here now, we go model zero, what do we do? We call model dot
5961
10:39:14,480 --> 10:39:18,880
eval when we're testing. And if we're making predictions, that's what we do when we test,
5962
10:39:18,880 --> 10:39:24,640
we make predictions on the test data set, using the patterns that our model has learned on the
5963
10:39:24,640 --> 10:39:29,440
training data set, we turn on inference mode, because we're doing inference, we want to do the
5964
10:39:29,440 --> 10:39:36,560
forward pass. And of course, we're going to compute the test logits, because logits are the raw output
5965
10:39:36,560 --> 10:39:42,640
of our model with no modifications. X test dot squeeze, we're going to get rid of an extra one
5966
10:39:42,640 --> 10:39:48,880
dimension there. Then we create the test pred, which is we have to do a similar calculation to
5967
10:39:48,880 --> 10:39:54,880
what we've done here for the test pred, which is torch dot round. For our binary classification,
5968
10:39:54,880 --> 10:39:59,280
we want prediction probabilities, which we're going to create by calling the sigmoid function
5969
10:39:59,280 --> 10:40:05,680
on the test logits, prediction probabilities that are 0.5 or above to go to one, and prediction
5970
10:40:05,680 --> 10:40:15,200
probabilities under 0.5 to go to label zero. So two is calculate the test loss, test loss
5971
10:40:15,840 --> 10:40:22,080
slash accuracy. How would we do this? Well, just as we've done before, and we're going to go
5972
10:40:22,080 --> 10:40:29,760
loss FN on test logits, because our loss function, what are we using? We're using BCE with logits loss,
5973
10:40:29,760 --> 10:40:34,960
expects logits as input. Where do we find that out? In the documentation, of course,
5974
10:40:34,960 --> 10:40:40,000
then we come back here, test logits, we're going to compare that to the Y test labels.
5975
10:40:40,000 --> 10:40:45,280
And then for the test accuracy, what are we going to do? We're going to call accuracy FN
5976
10:40:45,280 --> 10:40:55,600
on Y true equals Y test, and Y pred equals test pred. Now you might be thinking, why did I switch
5977
10:40:55,600 --> 10:41:00,880
up the order here for these? Oh, and by the way, this is important to know with logits loss.
5978
10:41:03,120 --> 10:41:09,680
So with these loss functions, the order here matters of which way you put in your parameters.
5979
10:41:09,680 --> 10:41:13,760
So predictions come first, and then true labels for our loss function. You might be
5980
10:41:13,760 --> 10:41:19,040
wondering why I've done it the reverse for our accuracy function, Y true and Y pred. That's just
5981
10:41:19,040 --> 10:41:24,960
because I like to be confusing. Well, not really. It's because if we go to scikit-learn, I base a
5982
10:41:24,960 --> 10:41:32,720
lot of my code structure on how scikit-learn structures things. The scikit-learn metrics accuracy
5983
10:41:32,720 --> 10:41:40,160
score goes Y true Y pred. So I base it off that order, because the scikit-learn metrics package
5984
10:41:40,160 --> 10:41:48,960
is very helpful. So I've based our evaluation metric function off this one. Whereas PyTorch's
5985
10:41:48,960 --> 10:41:53,680
loss function does it in the reverse order, and it's important to get these in the right order.
5986
10:41:53,680 --> 10:41:59,200
Exactly why they do it in that order. I couldn't tell you why. And we've got one final step, which
5987
10:41:59,200 --> 10:42:10,240
is to print out what's happening. So how about we go, we're doing a lot of epochs here, 100 epochs.
5988
10:42:10,240 --> 10:42:17,680
So we'll check if the epoch divides evenly by 10, to print out every 10th epoch. And we have a
5989
10:42:17,680 --> 10:42:22,960
couple of different metrics that we can print out this time. So we're going to print out the epoch
5990
10:42:22,960 --> 10:42:31,200
number epoch. And then we're going to print out the loss. So loss, how many decimal points?
5991
10:42:31,200 --> 10:42:36,400
We'll go point five here. This is going to be the training loss. We'll also do the accuracy,
5992
10:42:36,400 --> 10:42:42,560
which will be the training accuracy. We could write train acc here for our variable to be a little bit,
5993
10:42:42,560 --> 10:42:46,640
make them a little bit more understandable. And then we go here, but we're just going to leave
5994
10:42:46,640 --> 10:42:54,160
it as loss and accuracy for now, because we've got test loss over here, test loss. And we're
5995
10:42:54,160 --> 10:43:00,640
going to do the same five decimal points here. And then we're going to go test accuracy as well.
5996
10:43:01,520 --> 10:43:08,880
Test acc, we'll go point two for the accuracy. And because it's accuracy, we want a percentage. This
5997
10:43:08,880 --> 10:43:13,680
is the percent out of 100 guesses. What's the percentage that our model gets right on the training
5998
10:43:13,680 --> 10:43:18,640
data and the testing data, as long as we've coded all the functions correctly. Now,
5999
10:43:18,640 --> 10:43:23,280
we've got a fair few steps here. My challenge to you is to run this. And if there are any errors,
6000
10:43:23,280 --> 10:43:27,680
try to fix them. No doubt there's probably one or two or maybe more that we're going to have to
6001
10:43:27,680 --> 10:43:33,120
fix in the next video. But speaking of next videos, I'll see you there. Let's train our first
6002
10:43:33,840 --> 10:43:38,480
classification model. Well, this is very exciting. I'll see you soon.
6003
10:43:38,480 --> 10:43:46,160
Welcome back. In the last video, we wrote a mammoth amount of code, but nothing that we
6004
10:43:46,160 --> 10:43:50,240
can't handle. We've been through a lot of these steps. We did have to talk about a few tidbits
6005
10:43:50,240 --> 10:43:56,160
between using different loss functions, namely the BCE loss, which is binary cross entropy loss,
6006
10:43:56,160 --> 10:44:02,480
and the BCE with logits loss. We discussed that the BCE loss in PyTorch expects prediction
6007
10:44:02,480 --> 10:44:08,960
probabilities as input. So we have to convert our model's logits. Logits are the raw output of the
6008
10:44:08,960 --> 10:44:16,240
model to prediction probabilities using the torch dot sigmoid activation function. And if we're using
6009
10:44:16,240 --> 10:44:23,520
BCE with logits loss, it expects raw logits as input as sort of the name hints at. And so we
6010
10:44:23,520 --> 10:44:29,840
just pass it the raw logits straight away. Whereas our own custom accuracy function compares labels
6011
10:44:29,840 --> 10:44:34,080
to labels. And that's kind of what we've been stepping through over the last few videos,
6012
10:44:34,080 --> 10:44:39,600
is going from logits to pred probs to pred labels, because the ideal output of our model is
6013
10:44:39,600 --> 10:44:46,080
some kind of label that we as humans can interpret. And so let's keep pushing forward. You may have
6014
10:44:46,080 --> 10:44:50,080
already tried to run this training loop. I don't know if it works. We wrote all this code
6015
10:44:50,080 --> 10:44:54,000
together in the last video. And there's probably an error somewhere. So are you ready? We're going to train
6016
10:44:54,000 --> 10:45:00,800
our first classification model together for 100 epochs. If it all goes to plan in three, two,
6017
10:45:00,800 --> 10:45:07,280
one, let's run. Oh my gosh, it actually worked the first time. I promise you, I didn't change
6018
10:45:07,280 --> 10:45:13,280
anything in here from the last video. So let's inspect what's going on. It trains pretty fast.
6019
10:45:13,280 --> 10:45:18,640
Why? Well, because we're using a GPU, so it's going to be accelerated as much as it can anyway.
6020
10:45:18,640 --> 10:45:22,640
And our data set is quite small. And our network is quite small. So you won't always
6021
10:45:22,640 --> 10:45:28,160
get networks training this fast. They did 100 epochs in like a second. So the loss. Oh,
6022
10:45:29,280 --> 10:45:37,360
0.69973. It doesn't go down very much. The accuracy even starts high and then goes down.
6023
10:45:38,480 --> 10:45:43,440
What's going on here? Our model doesn't seem to be learning anything. So what would an ideal
6024
10:45:43,440 --> 10:45:49,760
accuracy be? An ideal accuracy is 100. And what's an ideal loss value? Well, zero, because lower
6025
10:45:49,760 --> 10:45:56,720
is better for loss. Hmm, this is confusing. And now if we go, have a look at our blue and red
6026
10:45:56,720 --> 10:46:05,600
dots. Where's our data? So I reckon, do we still have a data frame here? How many samples do we
6027
10:46:05,600 --> 10:46:13,600
have of each? Let's inspect. Let's do some data analysis. Where do we create a data frame here?
6028
10:46:13,600 --> 10:46:20,080
Now, circles, do we still have this instantiated? Circles dot label dot, we're going to call on
6029
10:46:20,080 --> 10:46:28,880
pandas here, value counts. Is this going to output how many of each? Okay. Wow, we've got 500 of
6030
10:46:28,880 --> 10:46:36,960
class one and 500 of class zero. So we have 500 red dots and blue dots, which means we have a
6031
10:46:36,960 --> 10:46:42,800
balanced data set. So if we're getting, we're basically trying to predict heads or tails here.
6032
10:46:42,800 --> 10:46:47,920
So if we're getting an accuracy of under 50%, or about 50%, if you rounded it up.
6033
10:46:49,360 --> 10:46:54,640
Our model is basically doing as well as guessing. Well, what gives? Well, I think we should get
6034
10:46:54,640 --> 10:46:59,760
visual with this. So let's make some predictions with our model, because these are just numbers
6035
10:46:59,760 --> 10:47:04,960
on the page. It's hard to interpret what's going on. But our intuition now is because we have 500
6036
10:47:04,960 --> 10:47:09,920
samples of each, or in the case of the training data set, we have 400 of each because we have
6037
10:47:09,920 --> 10:47:15,920
800 samples in the training data set. And we have in the testing data set, we have 200 total
6038
10:47:15,920 --> 10:47:20,320
samples. So we have 100 of each. We're basically doing a coin flip here. Our model is as good as
6039
10:47:20,320 --> 10:47:28,640
guessing. So, time to investigate why our model is not learning. And one of the ways we can do
6040
10:47:28,640 --> 10:47:35,280
that is by visualizing our predictions. So let's write down here: from the metrics, it looks like
6041
10:47:35,280 --> 10:47:48,160
our model isn't learning anything. So to inspect it, let's make some predictions and make them
6042
10:47:48,160 --> 10:47:59,680
visual. And we're right down here. In other words, visualize, visualize, visualize. All right.
6043
10:48:00,320 --> 10:48:04,000
So we've trained a model. We've at least got the structure for the training code here.
6044
10:48:04,000 --> 10:48:08,720
But this is the right training code. We've written this code before. So you know that this set up
6045
10:48:08,720 --> 10:48:12,880
for training code does allow a model to train. So there must be something wrong with either
6046
10:48:12,880 --> 10:48:17,840
how we've built our model or the data set. But let's keep going and investigate together.
6047
10:48:17,840 --> 10:48:24,400
So to do so, I've got a function that I've pre-built earlier. Did I mention that we're learning side
6048
10:48:24,400 --> 10:48:28,480
by side of a machine learning cooking show? So this is an ingredient I prepared earlier,
6049
10:48:28,480 --> 10:48:38,160
a part of a dish. So to do so, we're going to import a function called plot decision,
6050
10:48:39,120 --> 10:48:42,320
or maybe I'll turn this into code, plot decision boundary.
6051
10:48:44,800 --> 10:48:50,960
Welcome to the cooking show, cooking with machine learning. What model will we cook up today?
6052
10:48:50,960 --> 10:48:59,360
So if we go to pytorch deep learning, well, it's already over here, but this is the home repo for
6053
10:48:59,360 --> 10:49:03,920
the course, the link for this will be scattered everywhere. But there's a little function here
6054
10:49:03,920 --> 10:49:08,480
called helper functions dot py, which I'm going to fill up with helper functions throughout the
6055
10:49:08,480 --> 10:49:13,360
course. And this is the one I'm talking about here, plot decision boundary. Now we could just
6056
10:49:13,360 --> 10:49:18,320
copy this into our notebook, or I'm going to write some code to import this programmatically,
6057
10:49:18,320 --> 10:49:22,960
so we can use other functions from in here. Here's our plot predictions function that we made in
6058
10:49:22,960 --> 10:49:30,480
the last section, zero one, but this plot decision boundary is a function that I got inspired by
6059
10:49:30,480 --> 10:49:36,880
to create from madewithml.com. Now this is another resource, a little bit of an aside,
6060
10:49:36,880 --> 10:49:41,600
I highly recommend going through this by Goku Mohandas. It gives you the foundations of neural
6061
10:49:41,600 --> 10:49:49,360
networks and also MLOps, which is a field based on getting your neural networks and machine
6062
10:49:49,360 --> 10:49:56,160
learning models into applications that other people can use. So I can't recommend this resource
6063
10:49:56,160 --> 10:50:02,080
enough. So please, please, please check that out if you want another resource for machine learning,
6064
10:50:02,080 --> 10:50:06,480
but this is where this helper function came from. So thank you, Goku Mohandas. I've made a little
6065
10:50:06,480 --> 10:50:12,800
bit of modifications for this course, but not too many. So we could either copy that, paste it in
6066
10:50:12,800 --> 10:50:20,960
here, or we could write some code to import it for us magically, or using the power of the internet,
6067
10:50:20,960 --> 10:50:24,400
right, because that's what we are. We're programmers, we're machine learning engineers, we're data
6068
10:50:24,400 --> 10:50:31,120
scientists. So, from pathlib. So, the requests module in Python is a module that allows you to make
6069
10:50:31,120 --> 10:50:36,400
requests, a request is like going to a website, hey, I'd like to get this code from you, or this
6070
10:50:36,400 --> 10:50:40,560
information from you, can you please send it to me? So that's what that allows us to do,
6071
10:50:40,560 --> 10:50:46,480
and pathlib, we've seen pathlib before, but it allows us to create file paths. Because why? Well,
6072
10:50:46,480 --> 10:50:54,080
we want to save this helper functions dot py script to our Google Colab files. And so we can do this
6073
10:50:54,080 --> 10:51:03,760
with a little bit of code. So, download helper functions from the Learn PyTorch repo if it's not
6074
10:51:04,320 --> 10:51:13,920
already downloaded. So let's see how we can do that. So we're going to write some if else code to
6075
10:51:13,920 --> 10:51:23,520
check to see if the path of helper functions dot py already exists, because we don't want to download it again.
6076
10:51:23,520 --> 10:51:30,560
So at the moment, it doesn't exist. So this if statement is going to return false. So let's just
6077
10:51:30,560 --> 10:51:37,440
print out what it does if it returns true: helper functions dot py already exists. We could
6078
10:51:37,440 --> 10:51:43,920
even probably do a try and except loop, but if else will help us out for now. So if it exists,
6079
10:51:43,920 --> 10:51:53,600
else, print downloading helper functions dot py. So ours doesn't exist. So it's going to make a
6080
10:51:53,600 --> 10:52:02,400
request, or let's set up our request: requests dot get. And here's where we can put in a URL. But we
6081
10:52:02,400 --> 10:52:09,360
need the raw version of it. So this is the raw version. If we go back, this is just pytorch deep
6082
10:52:09,360 --> 10:52:15,920
learning the repo for this course slash helper functions. If I click raw, I'm going to copy that.
6083
10:52:16,560 --> 10:52:21,040
Oh, don't want to go in there, want to go into requests dot get, type that in, this has to be in a
6084
10:52:21,040 --> 10:52:28,880
string format. So we get the raw URL. And then we're going to go with open, we're going to open
6085
10:52:28,880 --> 10:52:39,200
a file called helper functions dot py. And we're going to set the context to be write binary,
6086
10:52:39,200 --> 10:52:46,640
which is wb, as f. F is a common short way of writing file. Because we're going to call
6087
10:52:46,640 --> 10:52:53,920
file dot write, and then request dot content. So this code is basically saying hey requests,
6088
10:52:53,920 --> 10:52:59,120
get the information that's at this link here, which is of course, all of this code here,
6089
10:52:59,120 --> 10:53:05,280
which is a Python script. And then we're going to create a file called helper functions dot py,
6090
10:53:05,280 --> 10:53:09,920
which gives us write permissions. We're going to name it F, which is short for file. And then
6091
10:53:09,920 --> 10:53:17,280
we're going to call on it file dot write the content of the request. So instead of talking
6092
10:53:17,280 --> 10:53:22,240
through it, how about we see it in action? We'll know if it works if we can from helper functions
6093
10:53:22,240 --> 10:53:28,080
import plot predictions, we're going to use plot predictions later on, as well as plot decision
6094
10:53:28,080 --> 10:53:35,920
boundary. So plot predictions we wrote in the last section. Wonderful. I'm going to write here,
6095
10:53:35,920 --> 10:53:42,080
downloading helper functions dot py. Did it work? We have helper functions dot py. Look at that,
6096
10:53:42,080 --> 10:53:47,360
we've done it programmatically. Can we view this in Google Colab? Oh my goodness, yes we can.
6097
10:53:47,360 --> 10:53:52,880
And look at that. So this may evolve by the time you do the course, but these are just some general
6098
10:53:52,880 --> 10:53:56,880
helper functions rather than writing all of this out. If you would like to know what's going on
6099
10:53:56,880 --> 10:54:01,440
in plot decision boundary, I encourage you to read through here. And what's going on,
6100
10:54:01,440 --> 10:54:07,360
you can step by step at yourself. There is nothing here that you can't tackle yourself. It's all
6101
10:54:07,360 --> 10:54:12,080
just Python code, no secrets just Python code. We've got we're making predictions with a
6102
10:54:12,080 --> 10:54:17,280
PyTorch model. And then we're testing for multi class or binary. So we're going to get out of that.
6103
10:54:17,280 --> 10:54:23,840
But now let's see the ultimate test is if the plot decision boundary function works. So again,
6104
10:54:23,840 --> 10:54:31,040
we could discuss plot decision boundary of the model. We could discuss what it does behind the scenes
6105
10:54:33,040 --> 10:54:38,640
till the cows come home. But we're going to see it in real life here. I like to get visual.
6106
10:54:38,640 --> 10:54:45,520
So fig size 12, six, we're going to create a plot here, because we are adhering to the data
6107
10:54:45,520 --> 10:54:51,760
explorer's motto of visualize visualize visualize. And we want to subplot because we're going to
6108
10:54:51,760 --> 10:54:59,760
compare our training and test sets here, train. And then we're going to go PLT, or actually we'll
6109
10:54:59,760 --> 10:55:05,280
plot the first one, plot decision boundary. Now, because we're doing a training plot here,
6110
10:55:05,280 --> 10:55:11,520
we're going to pass in model zero and X train and Y train. Now, this is the order that the
6111
10:55:11,520 --> 10:55:18,960
parameters go in. If we press command shift space, I believe Google Colab, if it's working with me,
6112
10:55:18,960 --> 10:55:25,680
we'll put up a doc string. There we go, plot decision boundary. Look at the inputs that it
6113
10:55:25,680 --> 10:55:31,120
takes: model, which is a torch nn dot Module. And we've got X, which is our X value, which is a
6114
10:55:31,120 --> 10:55:36,640
torch tensor, and Y, which is our torch tensor value here. So that's for the training data.
6115
10:55:36,640 --> 10:55:42,240
Now, let's do the same for the testing data, plt dot subplot. This is going to be one, two,
6116
10:55:42,240 --> 10:55:47,600
two for the index. This is just number of rows of the plot, number of columns. And this is the
6117
10:55:47,600 --> 10:55:52,640
index. So this plot will appear on the first slot. We're going to see this anyway. Anything
6118
10:55:52,640 --> 10:55:58,480
below this code will appear on the second slot, PLT dot title. And we're going to call this one
6119
10:55:58,480 --> 10:56:04,160
test. Then we're going to call plot decision boundary. If this works, this is going to be some
6120
10:56:04,160 --> 10:56:11,440
serious magic. I love visualization functions in machine learning. Okay, you ready? Three,
6121
10:56:11,440 --> 10:56:19,520
two, one, let's check it out. How's our model doing? Oh, look at that. Oh, now it's clear.
6122
10:56:19,520 --> 10:56:24,160
So behind the scenes, these are the plots that plot decision boundary is making. Of course,
6123
10:56:24,160 --> 10:56:28,800
this is the training data. This is the testing data, not as many dot points here, but the same
6124
10:56:28,800 --> 10:56:33,440
sort of line of what's going on. So this is the line that our model is trying to draw through the
6125
10:56:33,440 --> 10:56:38,960
data. No wonder it's getting about 50% accuracy and the loss isn't going down. It's just trying
6126
10:56:38,960 --> 10:56:43,600
to split the data straight through the middle. It's drawing a straight line. But our data is
6127
10:56:43,600 --> 10:56:50,320
circular. Why do you think it's drawing a straight line? Well, do you think it has anything to do
6128
10:56:50,320 --> 10:56:55,920
with the fact that our model is just made using pure linear layers? Let's go back to our model.
6129
10:56:55,920 --> 10:57:01,680
What's it comprised of? Just a couple of linear layers. What's a linear line? If we look up linear
6130
10:57:01,680 --> 10:57:06,480
line, is this going to work with me? I don't actually think it might. There we go. Linear line,
6131
10:57:06,480 --> 10:57:10,720
all straight lines. So I want you to have a think about this, even if you're completely
6132
10:57:10,720 --> 10:57:17,840
new to deep learning, you can answer this question. Can we ever separate this circular data
6133
10:57:17,840 --> 10:57:24,320
with straight lines? I mean, maybe we could if we drew straight lines here, but then trying to
6134
10:57:24,320 --> 10:57:28,480
curve them around. But there's an easier way. We're going to see that later on. For now,
6135
10:57:29,120 --> 10:57:35,760
how about we try to improve our model? So the model that we built, we've got 100 epochs.
6136
10:57:35,760 --> 10:57:39,920
I wonder if our model will improve if we trained it for longer. So that's a little bit of a challenge
6137
10:57:39,920 --> 10:57:46,080
before the next video. See if you can train the model for 1000 epochs. Does that improve the
6138
10:57:46,080 --> 10:57:51,040
results here? And if it doesn't improve the results here, have a think about why that might be.
6139
10:57:52,400 --> 10:57:59,200
I'll see you in the next video. Welcome back. In the last video, we wrote some code to download
6140
10:57:59,200 --> 10:58:04,080
a series of helper functions from our helper functions dot py. And later on, you'll see why
6141
10:58:04,080 --> 10:58:10,240
this is quite standard practice as you write more and more code: write some code, store it
6142
10:58:10,240 --> 10:58:15,600
somewhere such as a Python script like this. And then instead of us rewriting everything that we
6143
10:58:15,600 --> 10:58:20,640
have in helper functions, we just import them and then use them later on. This is similar to
6144
10:58:20,640 --> 10:58:25,680
what we've been doing with PyTorch. PyTorch is essentially just a collection of Python scripts
6145
10:58:25,680 --> 10:58:30,880
that we're using to build neural networks. Well, there's a lot more than what we've just done.
6146
10:58:30,880 --> 10:58:34,640
I mean, we've got one here, but PyTorch is a collection of probably hundreds of different
6147
10:58:34,640 --> 10:58:39,600
Python scripts. But that's beside the point. We're trying to train a model here to separate
6148
10:58:39,600 --> 10:58:45,440
blue and red dots. But our current model is only drawing straight lines. And I got you to
6149
10:58:45,440 --> 10:58:50,960
have a think about whether our straight line model, our linear model could ever separate this data.
6150
10:58:50,960 --> 10:58:56,160
Maybe it could. And I issued the challenge to see if it could if you trained for 1000 epochs.
6151
10:58:56,960 --> 10:59:05,360
So did it improve at all? Is the accuracy any higher? Well, speaking of training for more
6152
10:59:05,360 --> 10:59:13,760
epochs, we're up to section number five, improving a model. This is from a model perspective. So now
6153
10:59:13,760 --> 10:59:20,720
let's discuss some ways. If you were getting results after you train a machine learning model or a
6154
10:59:20,720 --> 10:59:24,240
deep learning model, whatever kind of model you're working with, and you weren't happy with those
6155
10:59:24,240 --> 10:59:30,240
results. So how could you go about improving them? So this is going to be a little bit of an overview
6156
10:59:30,240 --> 10:59:37,840
of what we're going to get into. So one way is to add more layers. So give the model more chances
6157
10:59:37,840 --> 10:59:45,200
to learn about patterns in the data. Why would that help? Because if our model currently has two
6158
10:59:45,200 --> 10:59:56,400
layers, model zero dot state dict. Well, we've got however many numbers here, 20 or so. So this
6159
10:59:56,400 --> 11:00:00,720
is the zeroth layer. This is the first layer. If we had 10 of these, well, we'd have 10 times the
6160
11:00:00,720 --> 11:00:06,720
amount of parameters to try and learn the patterns in this data, a representation of this data.
6161
11:00:06,720 --> 11:00:16,480
Another way is to add more hidden units. So what I mean by that is we created this model here,
6162
11:00:16,480 --> 11:00:24,160
and each of these layers has five hidden units. The first one outputs, out features equals five,
6163
11:00:24,160 --> 11:00:31,920
and this one takes in features equals five. So we could go from five hidden units to
6164
11:00:31,920 --> 11:00:41,520
10 hidden units. The same principle as above applies here: the more parameters our model has
6165
11:00:41,520 --> 11:00:46,720
to represent our data, the better, potentially. Now, I say potentially here because some of these things
6166
11:00:46,720 --> 11:00:52,720
might not necessarily work. Our data set's quite simple. So maybe if we added too many layers,
6167
11:00:52,720 --> 11:00:56,800
our model's trying to learn things that are too complex; it's trying to adjust too many numbers
6168
11:00:56,800 --> 11:01:01,840
for the data set that we have. The same thing goes for more hidden units. What other options do we
6169
11:01:01,840 --> 11:01:08,640
have? Well, we could fit for longer, give the model more of a chance to learn because every epoch
6170
11:01:08,640 --> 11:01:13,600
is one pass through the data. So maybe 100 times looking at this data set wasn't enough.
6171
11:01:13,600 --> 11:01:18,160
So maybe you could fit for 1000 times, which was the challenge. Then there's changing the
6172
11:01:18,160 --> 11:01:23,200
activation functions, which we're using sigmoid at the moment, which is generally the activation
6173
11:01:23,200 --> 11:01:28,240
function you use for a binary classification problem. But there are also activation functions
6174
11:01:28,240 --> 11:01:34,240
you can put within your model. Hmm, there's a little hint that we'll get to that later.
6175
11:01:34,240 --> 11:01:40,880
Then there's change the learning rate. So the learning rate is the amount the optimizer will
6176
11:01:40,880 --> 11:01:46,720
adjust these every epoch. And if it's too small, our model might not learn anything because it's
6177
11:01:46,720 --> 11:01:52,000
taking forever to change these numbers. But on the other side of things, if the learning
6178
11:01:52,000 --> 11:01:58,560
rate is too high, these updates might be too large. And our model might just explode. There's an
6179
11:01:58,560 --> 11:02:05,760
actual problem in machine learning called exploding gradient problem, where the numbers just get
6180
11:02:05,760 --> 11:02:10,800
too large. On the other side, there's also a vanishing gradients problem, where the gradients
6181
11:02:10,800 --> 11:02:17,520
just go basically to zero too quickly. And then there's also changing the loss function. But I feel
6182
11:02:17,520 --> 11:02:23,520
like for now, sigmoid and binary cross entropy, pretty good, pretty standard. So we're going to
6183
11:02:23,520 --> 11:02:28,560
have a look at some options here, add more layers and fit for longer, maybe changing the learning
6184
11:02:28,560 --> 11:02:33,680
rate. But let's just add a little bit of color to what we've been talking about. Right now,
6185
11:02:33,680 --> 11:02:37,200
we've fit the model to the data and made a prediction. I'm just going to step through this.
6186
11:02:38,080 --> 11:02:42,240
Where are we up to? We've done this, we've done this, we've done these two, we've built a training
6187
11:02:42,240 --> 11:02:46,400
loop, we've fit the model to the data, made a prediction, we've evaluated our model visually,
6188
11:02:46,400 --> 11:02:50,240
and we're not happy with that. So we're up to number five, we're going to improve through
6189
11:02:50,240 --> 11:02:54,560
experimentation. We don't need to use TensorBoard just yet; we're going to talk about this at a
6190
11:02:54,560 --> 11:03:01,200
high level. TensorBoard is a tool or a utility that works with PyTorch and helps you to monitor experiments.
6191
11:03:01,200 --> 11:03:05,520
We'll see that later on. And then we'll get to this, we won't save our model until we've got one
6192
11:03:05,520 --> 11:03:10,800
that we're happy with. And so if we look at what we've just talked about improving a model from a
6193
11:03:10,800 --> 11:03:16,800
model's perspective, let's talk about the things we've talked about with some color this time. So
6194
11:03:16,800 --> 11:03:21,120
say we've got a model here, this isn't the exact model that we're working with, but it's similar
6195
11:03:21,120 --> 11:03:27,200
structure. We've got one, two, three, four layers, we've got a loss function, BCE with logits loss,
6196
11:03:27,200 --> 11:03:32,640
we've got an optimizer, optimizer stochastic gradient descent, and if we did write some training code,
6197
11:03:32,640 --> 11:03:38,320
this is 10 epochs. And then the testing code here, I've just cut it out because it wouldn't fit on
6198
11:03:38,320 --> 11:03:43,040
the slide. Then if we wanted to go to a larger model, let's add some color here so we can highlight
6199
11:03:43,040 --> 11:03:49,680
what's happening, adding layers. Okay, so this one's got one, two, three, four, five, six layers.
6200
11:03:50,320 --> 11:03:56,160
And we've got another color here, which is I'd say this is like a little bit of a greeny blue
6201
11:03:56,160 --> 11:04:01,200
increase the number of hidden units. Okay, so the hidden units are these features here.
6202
11:04:01,200 --> 11:04:08,000
We've gone from 100 to 128 to 128. Remember, the out features of a previous layer have to line up
6203
11:04:08,000 --> 11:04:14,400
with the in features of a next layer. Then we've gone to 256. Wow. So remember how I said multiples
6204
11:04:14,400 --> 11:04:18,400
of eight are pretty good generally in deep learning? Well, this is where these numbers come from.
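As a rough sketch of what widening the hidden units looks like in code (the sizes here are illustrative), note how each layer's in features must match the previous layer's out features:

```python
from torch import nn

# Hidden sizes picked as multiples of eight; the exact numbers are illustrative
wider_model = nn.Sequential(
    nn.Linear(in_features=2, out_features=128),
    nn.Linear(in_features=128, out_features=128),
    nn.Linear(in_features=128, out_features=256),
    nn.Linear(in_features=256, out_features=1),
)
```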
6205
11:04:19,280 --> 11:04:25,440
And then what else do we have? Change slash add activation functions. We haven't seen this before:
6206
11:04:25,440 --> 11:04:29,440
nn.ReLU. If you want to jump ahead and have a look at what nn.ReLU is,
6207
11:04:29,440 --> 11:04:35,280
how would you find out about it? Well, I'd just Google nn.ReLU. But we're going to have
6208
11:04:35,280 --> 11:04:41,120
a look at what this is later on. We can see here that this one's got one, but this larger model has
6209
11:04:41,120 --> 11:04:48,080
some relu's scattered between the linear layers. Hmm, maybe that's a hint. If we combine a linear
6210
11:04:48,080 --> 11:04:54,000
layer with a relu, what's a relu layer? I'm not going to spoil this. We're going to find out
6211
11:04:54,000 --> 11:05:00,160
later on. Change the optimization function. Okay. So we've got SGD. Do you recall how I said
6212
11:05:00,160 --> 11:05:05,600
Adam is another popular one that works fairly well across a lot of problems as well. So Adam
6213
11:05:05,600 --> 11:05:11,200
might be a better option for us here. The learning rate as well. So maybe this learning rate was a
6214
11:05:11,200 --> 11:05:16,240
little too high. And so we've divided it by 10. And then finally, fitting for longer. So instead
6215
11:05:16,240 --> 11:05:21,920
of 10 epochs, we've gone to 100. So how about we try to implement some of these with our own model
6216
11:05:21,920 --> 11:05:27,120
to see if it improves what we've got going on here? Because frankly, like, this isn't
6217
11:05:27,120 --> 11:05:30,800
satisfactory. We're trying to build a neural network here. Neural networks are supposed to be
6218
11:05:30,800 --> 11:05:35,360
these models that can learn almost anything. And we can't even separate some blue dots from
6219
11:05:35,360 --> 11:05:41,120
some red dots. So in the next video, how about we run through writing some code to do some of
6220
11:05:41,120 --> 11:05:46,640
these steps here? In fact, if you want to try yourself, I'd highly encourage that. So I'd start
6221
11:05:46,640 --> 11:05:51,840
with trying to add some more layers and add some more hidden units and fitting for longer. You can
6222
11:05:51,840 --> 11:05:59,520
keep all of the other settings the same for now. But I'll see you in the next video. Welcome back.
6223
11:05:59,520 --> 11:06:05,040
In the last video, we discussed some options to improve our model from a model perspective. And
6224
11:06:05,040 --> 11:06:08,960
namely, we're trying to improve it so that the predictions are better, so that the patterns it
6225
11:06:08,960 --> 11:06:14,640
learns better represent the data. So we can separate blue dots from red dots. And you might be wondering
6226
11:06:14,640 --> 11:06:23,600
why we said from a model perspective here. So let me just write these down. These options are all
6227
11:06:23,600 --> 11:06:35,360
from a model's perspective, because they deal directly with the model, rather than the data.
6228
11:06:36,000 --> 11:06:41,680
So there's another way to improve a model's results if the model is sound already:
6229
11:06:41,680 --> 11:06:47,920
in machine learning and deep learning, you may be aware that generally if you have more data samples,
6230
11:06:47,920 --> 11:06:53,840
the model learns or gets better results because it has more opportunity to learn. There's a few
6231
11:06:53,840 --> 11:06:59,040
other ways to improve a model from a data perspective, but we're going to focus on improving a model
6232
11:06:59,040 --> 11:07:12,400
from a model's perspective. And because these options are all values we as machine learning
6233
11:07:13,760 --> 11:07:26,480
engineers and data scientists can change, they are referred to as hyper parameters.
6234
11:07:26,480 --> 11:07:33,520
So a little bit of an important distinction here. Parameters are the numbers within a model.
6235
11:07:33,520 --> 11:07:37,920
The parameters here, like these values, the weights and biases are parameters,
6236
11:07:37,920 --> 11:07:43,840
are the values a model updates by itself. Hyper parameters are what we as machine learning
6237
11:07:43,840 --> 11:07:48,480
engineers and data scientists can change, such as adding more layers, more hidden units, fitting for a longer
6238
11:07:48,480 --> 11:07:54,640
number of epochs, activation functions, learning rate, loss functions are hyper parameters because
6239
11:07:54,640 --> 11:08:00,800
they're values that we can change. So let's change some of the hyper parameters of our model.
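To make the parameter versus hyperparameter distinction concrete, here's a small illustrative sketch (the specific values are just examples):

```python
from torch import nn

model = nn.Linear(in_features=2, out_features=1)

# Parameters: values the model/optimizer updates by itself during training
for name, param in model.named_parameters():
    print(name, param.shape)  # weight: [1, 2], bias: [1]

# Hyperparameters: values we choose ourselves (example settings)
hyperparameters = {"hidden_units": 10, "num_layers": 3, "learning_rate": 0.1, "epochs": 1000}
```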
6240
11:08:01,600 --> 11:08:08,400
So we'll create circle model V1. We're going to inherit from nn.Module as well. We could write this
6241
11:08:08,400 --> 11:08:14,560
model using nn.sequential, but we're going to subclass nn.module for practice.
6242
11:08:15,680 --> 11:08:21,120
Why would we use nn.sequential? Well, because as you'll see, our model is not too complicated,
6243
11:08:21,120 --> 11:08:31,520
but we subclass nn.module. In fact, nn.sequential. So if we write here, nn.sequential is also a
6244
11:08:31,520 --> 11:08:40,720
version of nn.module. But we subclass nn.module here for one for practice and for later on,
6245
11:08:40,720 --> 11:08:45,920
if we wanted to, or if you wanted to make more complex models, you're going to see a subclass
6246
11:08:45,920 --> 11:08:52,800
of nn.module a lot in the wild. So the first change we're going to update is the number
6247
11:08:52,800 --> 11:09:01,680
of hidden units. So out features, I might write this down before we do it. Let's try and improve
6248
11:09:01,680 --> 11:09:14,160
our model by adding more hidden units. So this will go from five and we'll increase it to 10.
6249
11:09:14,160 --> 11:09:23,280
And we want to increase the number of layers. So we want to go from two to three. We'll add an
6250
11:09:23,280 --> 11:09:33,120
extra layer and then increase the number of epochs. So we're going to go from 100 to 1,000. Now,
6251
11:09:33,120 --> 11:09:39,200
what can you, we're going to put on our scientist hats for a second. What would be the problem with
6252
11:09:39,200 --> 11:09:45,920
the way we're running this experiment? If we're doing all three things in one hit, why might that
6253
11:09:45,920 --> 11:09:52,080
be problematic? Well, because we might not know which one offered the improvement if there is
6254
11:09:52,080 --> 11:09:57,280
any improvement or degradation. So just to keep in mind going forward, I'm just doing this as an
6255
11:09:57,280 --> 11:10:02,000
example of how we can change all of these. But generally, when you're doing machine learning
6256
11:10:02,000 --> 11:10:10,160
experiments, you'd only like to change one value at a time and track the results. So that's called
6257
11:10:10,160 --> 11:10:14,240
experiment tracking in machine learning. We're going to have a look at experiment tracking a
6258
11:10:14,240 --> 11:10:19,520
little later on in the course, but just keep that in mind. A scientist likes to change one
6259
11:10:20,480 --> 11:10:25,520
variable of what's going on so that they can control what's happening. But we're going to
6260
11:10:25,520 --> 11:10:32,640
create this next layer here, layer two. And of course, it takes the same number of in features as the out
6261
11:10:32,640 --> 11:10:40,480
features of the previous layer. Layer one's in features are two because, why? Our X train has... let's look at just
6262
11:10:40,480 --> 11:10:47,040
the first five samples... it has two features. So now we're going to create self layer three, which
6263
11:10:47,040 --> 11:10:53,280
equals nn dot Linear. The in features here are going to be 10. Why? Because the layer above
6264
11:10:53,280 --> 11:10:58,560
has out features equals 10. So what we've changed here so far: the hidden units, which previously
6265
11:10:58,560 --> 11:11:05,440
in V0 of this model were five, and now we've got a third layer, where previously there were
6266
11:11:05,440 --> 11:11:12,240
two. So these are two of our main changes here. And out features equals one, because why? Let's
6267
11:11:12,240 --> 11:11:19,120
have a look at... speaking of y, our y is just one number. So remember, the shapes, the input and
6268
11:11:19,120 --> 11:11:23,520
output shapes of a model is one of the most important things in deep learning. We're going to see
6269
11:11:23,520 --> 11:11:28,000
different values for the shapes later on. But because we're working with this data set, we're
6270
11:11:28,000 --> 11:11:34,960
focused on two in features and one out feature. So now that we've got our layers prepared,
6271
11:11:34,960 --> 11:11:41,600
what's next? Well, we have to override the forward method, because every subclass of
6272
11:11:41,600 --> 11:11:49,280
nn.Module has to implement a forward method. So what are we going to do here? Well, we could,
6273
11:11:49,280 --> 11:11:55,680
let me just show you one option. We could go z, which would be z for logits. Logits is actually
6274
11:11:55,680 --> 11:12:03,040
represented by z, fun fact. But you could actually put any variable here. So this could be x one,
6275
11:12:03,040 --> 11:12:07,440
or you could reset x if you wanted to. I just like putting a different one because it's a little
6276
11:12:07,440 --> 11:12:14,560
less confusing for me. And then we could go update z by going self layer two. And then the,
6277
11:12:14,560 --> 11:12:21,280
because z above is the output of layer one, it now goes into here. And then if we go z,
6278
11:12:22,560 --> 11:12:28,080
again, equals self layer three, what's this going to take? It's going to take z from above.
6279
11:12:28,800 --> 11:12:33,200
So this is saying, hey, give me x, put it through layer one, assign it to z. And then
6280
11:12:33,200 --> 11:12:39,520
create a new variable z or override z with self layer two with z from before as the input. And
6281
11:12:39,520 --> 11:12:44,800
then we've got z again: the output of layer two is the input for layer three. And then we could
6282
11:12:44,800 --> 11:12:51,920
return z. So that's just passing our data through each one of these layers here. But a way that
6283
11:12:51,920 --> 11:13:00,080
you can leverage speedups in PyTorch is to call them all at once. So layer three, and we're going
6284
11:13:00,080 --> 11:13:07,920
to put self dot layer two. And this is generally how I'm going to write them. But it also behind
6285
11:13:07,920 --> 11:13:13,280
the scenes, because it's performing all the operations at once, you leverage whatever speed
6286
11:13:13,280 --> 11:13:20,320
ups you can get. Oh, this should be layer one. So it goes in order here. So what's happening?
6287
11:13:20,320 --> 11:13:26,000
Well, it's computing the inside of the brackets first. So layer one, x is going through layer one.
6288
11:13:26,000 --> 11:13:33,200
And then the output of x into layer one is going into layer two. And then the same again,
6289
11:13:33,200 --> 11:13:44,800
for layer three. So this way of writing operations leverages speedups, where possible,
6290
11:13:47,360 --> 11:13:54,560
behind the scenes. And so we've done our forward method there. We're just passing our data through
6291
11:13:54,560 --> 11:14:01,360
layers with extra hidden units, and an extra layer overall. So now let's create an instance of
6292
11:14:01,360 --> 11:14:07,200
circle model v one, which we're going to set to model one. And we're going to write circle model
6293
11:14:07,200 --> 11:14:13,280
v one. And we're going to send it to the target device, because we like writing device agnostic code.
6294
11:14:14,400 --> 11:14:18,560
And then we're going to check out model one. So let's have a look at what's going on there.
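Here's a sketch of the kind of three-layer model described in this video (treat it as an approximation of the course code rather than the exact cell):

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class CircleModelV1(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=10)   # 2 input features -> 10 hidden units
        self.layer_2 = nn.Linear(in_features=10, out_features=10)  # in features match the previous out features
        self.layer_3 = nn.Linear(in_features=10, out_features=1)   # 1 output logit for binary classification

    def forward(self, x):
        # Computed inside-out: x -> layer_1 -> layer_2 -> layer_3
        return self.layer_3(self.layer_2(self.layer_1(x)))

model_1 = CircleModelV1().to(device)
print(model_1)
```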
6295
11:14:18,560 --> 11:14:25,760
Beautiful. So now we have a three layered model with more hidden units. So I wonder if we trained
6296
11:14:25,760 --> 11:14:31,680
this model for longer, are we going to get improvements here? So my challenge to you is we've already
6297
11:14:31,680 --> 11:14:36,320
done these steps before. We're going to do them over the next couple of videos for completeness.
6298
11:14:38,320 --> 11:14:45,360
But we need to what create a loss function. So I'll give you a hint. It's very similar to the one
6299
11:14:45,360 --> 11:14:52,560
we've already used. And we need to create an optimizer. And then once we've done that, we need to
6300
11:14:52,560 --> 11:15:01,520
write a training and evaluation loop for model one. So give that a shot. Otherwise, I'll see you
6301
11:15:01,520 --> 11:15:09,280
in the next video. We'll do this all together. Welcome back. In the last video, we subclassed
6302
11:15:09,280 --> 11:15:15,840
nn.Module to create circle model V1, which is an upgrade on circle model V zero, in the
6303
11:15:15,840 --> 11:15:22,480
sense that we added more hidden units, from five to 10, and we added a whole extra layer.
6304
11:15:23,120 --> 11:15:28,960
And we've got an instance of it ready to go. So where are we up to in the workflow? We've got our data.
6305
11:15:28,960 --> 11:15:33,360
Well, we haven't changed the data. So we've built our new model. We now need to pick a loss function.
6306
11:15:33,360 --> 11:15:36,880
And I hinted at before that we're going to use the same loss function as before.
6307
11:15:36,880 --> 11:15:41,120
The same optimizer. You might have already done all of these steps. So you may know whether this
6308
11:15:41,120 --> 11:15:45,440
model works on our data set or not. But that's what we're going to work towards finding out in
6309
11:15:45,440 --> 11:15:50,240
this video. So we've built our new model. Now let's pick a loss function and optimizer. We could
6310
11:15:50,240 --> 11:15:54,960
almost do all of this with our eyes closed now, build a training loop, fit the model to the data,
6311
11:15:54,960 --> 11:16:00,320
make a prediction and evaluate the model. We'll come back here. And let's set up a loss function.
6312
11:16:00,320 --> 11:16:07,040
And by the way, if you're wondering, like, why would adding more features here, we've kind of
6313
11:16:07,040 --> 11:16:12,960
hinted at this before. And why would an extra layer improve our model? Well, again, it's back
6314
11:16:12,960 --> 11:16:19,040
to the fact that if we add more neurons, if we add more hidden units, and if we add more layers,
6315
11:16:19,040 --> 11:16:24,560
it just gives our model more numbers to adjust. So look at what's going on here, layer one,
6316
11:16:24,560 --> 11:16:32,320
layer two. Look how many more we have compared to model zero dot state dict.
6317
11:16:36,480 --> 11:16:41,040
We have all of these. This is model zero. And we just upgraded it. Look how many more we have
6318
11:16:41,040 --> 11:16:47,360
from just adding an extra layer and more hidden units. So now our optimizer can change
6319
11:16:47,360 --> 11:16:53,360
these values to hopefully create a better representation of the data we're trying to fit.
6320
11:16:53,360 --> 11:17:00,720
So we just have more opportunity to learn patterns in our target data set. So that's the theory
6321
11:17:00,720 --> 11:17:06,960
behind it. So let's get rid of these. Let's create a loss function. What are we going to use? Well,
6322
11:17:06,960 --> 11:17:15,440
we're going to use nn dot BCE with logit's loss. And our optimizer is going to be what? We're
6323
11:17:15,440 --> 11:17:21,680
going to keep that the same as before, torch dot optim dot SGD. But we have to be aware that
6324
11:17:21,680 --> 11:17:28,640
because we're using a new model, we have to pass in params of model one. These are the parameters
6325
11:17:28,640 --> 11:17:34,720
we want to optimize. And the LR is going to be 0.1. Is that the same LR we use before learning
6326
11:17:34,720 --> 11:17:42,160
rate? 0.1. Oh, potentially that our learning rate may be too big. 0.1. Where do we create our
6327
11:17:42,160 --> 11:17:48,720
optimizer? So we've written a lot of code here. Optimizer. There we go. 0.1. That's all right.
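A sketch of the loss function and optimizer setup just described, assuming model_1 from the sketch above:

```python
import torch
from torch import nn

loss_fn = nn.BCEWithLogitsLoss()  # works on raw logits (sigmoid is built in)
optimizer = torch.optim.SGD(params=model_1.parameters(), lr=0.1)
```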
6328
11:17:48,720 --> 11:17:54,640
So we'll keep it at 0.1 just to keep as many things the same as possible. So we're going to set up
6329
11:17:54,640 --> 11:18:02,480
torch dot manual seed 42 to make training as reproducible as possible torch dot CUDA dot manual
6330
11:18:02,480 --> 11:18:10,560
seed 42. Now, as I said before, don't worry too much if your numbers aren't exactly the same as mine.
6331
11:18:10,560 --> 11:18:17,360
The direction is more important, whether it's good or bad direction. So now let's set up epochs.
6332
11:18:17,360 --> 11:18:25,200
We want to train for longer this time as well. So 1000 epochs. This is one of our three improvements
6333
11:18:25,200 --> 11:18:30,000
that we're trying to do. Adding more hidden units, increase the number of layers and increase the
6334
11:18:30,000 --> 11:18:35,600
number of epochs. So we're going to give our model 1000 looks at the data to try and improve
6335
11:18:35,600 --> 11:18:43,360
its patterns. So put data on the target device. We want to write device agnostic code. And yes,
6336
11:18:43,360 --> 11:18:49,120
we've already done this, but we're going to write it out again for practice because even though we
6337
11:18:49,120 --> 11:18:54,960
could functionize a lot of this, it's good while we're still in the foundation stages to practice
6338
11:18:54,960 --> 11:19:00,000
what's going on here, because I want you to be able to do this with your eyes closed before we
6339
11:19:00,000 --> 11:19:06,320
start to functionize it. So put the training data and the testing data to the target device,
6340
11:19:06,320 --> 11:19:14,320
whatever it is, CPU or GPU. And then we're going to, well, what's our song? For an epoch in range.
6341
11:19:16,160 --> 11:19:20,400
Let's loop through the epochs. We're going to start off with training. What do we do for training? Well,
6342
11:19:20,400 --> 11:19:27,440
we set model one to train. And then what's our first step? Well, we have to forward pass. What's
6343
11:19:27,440 --> 11:19:33,440
our outputs of the model? Well, the raw outputs of a model are logits. So model one, we're going
6344
11:19:33,440 --> 11:19:37,920
to pass it the training data. We're going to squeeze it so that we get rid of an extra one
6345
11:19:37,920 --> 11:19:41,520
dimension. If you don't believe me that we would like to get rid of that one dimension,
6346
11:19:41,520 --> 11:19:47,280
try running the code without that dot squeeze. And y pred equals torch dot round
6347
11:19:48,880 --> 11:19:57,600
of torch dot sigmoid. Why are we calling sigmoid on our logits? To go from logits to prediction
6348
11:19:57,600 --> 11:20:07,920
probabilities to prediction labels. And then what do we do next? Well, we calculate the loss
6349
11:20:09,040 --> 11:20:18,080
slash accuracy to here. And remember, accuracy is optional, but loss is not optional. So we're
6350
11:20:18,080 --> 11:20:23,200
going to pass in here, our loss function is going to take in. I wonder if it'll work with just straight
6351
11:20:23,200 --> 11:20:30,800
up y pred? I don't think it will, because we need logits in here. Y logits and y
6352
11:20:30,800 --> 11:20:37,680
train? Because, why? Oh, Google Colab correcting the wrong thing. We have y logits because we're
6353
11:20:37,680 --> 11:20:44,800
using BCE with logits loss here. So let's keep pushing forward. We want our accuracy now,
6354
11:20:44,800 --> 11:20:49,920
which is our accuracy function. And we're going to pass in the order here, which is the reverse
6355
11:20:49,920 --> 11:20:55,120
of above, a little confusing, but I've kept the evaluation function in the same order as
6356
11:20:55,120 --> 11:21:02,960
scikit-learn. Y pred equals y pred. Three, we're going to zero the gradients of the optimizer,
6357
11:21:03,760 --> 11:21:08,160
optimizer zero grad. And you might notice that we've started to pick up the pace a little.
6358
11:21:08,800 --> 11:21:13,200
That is perfectly fine. If I'm typing too fast, you can always slow down the video,
6359
11:21:13,840 --> 11:21:17,680
or you could just watch what we're doing and then code it out yourself afterwards,
6360
11:21:17,680 --> 11:21:21,840
the code resources will always be available. We're going to take the loss backward
6361
11:21:22,880 --> 11:21:28,560
and perform back propagation. The only reason we're going faster is because we've covered
6362
11:21:28,560 --> 11:21:34,480
these steps. So anything that we sort of spend time here, we've covered in a previous video,
6363
11:21:34,480 --> 11:21:39,920
optimizer step. And this is where the adjustments to all of our models parameters are going to take
6364
11:21:39,920 --> 11:21:46,320
place to hopefully create a better representation of the data. And then we've got testing. What's
6365
11:21:46,320 --> 11:21:51,200
the first step that we do in testing? Well, we call model one dot eval to put it in evaluation
6366
11:21:51,200 --> 11:21:55,920
mode. And because we're making predictions, we're going to turn on torch inference mode
6367
11:21:55,920 --> 11:21:59,680
predictions. I call them predictions. Some other places call it inference.
6368
11:22:01,760 --> 11:22:05,440
Remember machine learning has a lot of different names for the same thing.
6369
11:22:05,440 --> 11:22:11,680
Forward pass. So we're going to create the test logits here. Equals model one X test.
6370
11:22:11,680 --> 11:22:16,480
And we're going to squeeze them because we don't want the extra one dimension. Just going to
6371
11:22:16,480 --> 11:22:20,880
add some code cells here so that we have more space and I'm typing in the middle of the screen.
6372
11:22:22,000 --> 11:22:26,640
Then I'm going to put in test pred here. How do we get from logits to predictions? Well,
6373
11:22:26,640 --> 11:22:32,080
we go torch dot round. And then we go torch dot sigmoid. Why sigmoid? Because we're working with a
6374
11:22:32,080 --> 11:22:37,280
binary classification problem. And to convert logits from a binary classification problem
6375
11:22:37,280 --> 11:22:43,680
to prediction probabilities, we use the sigmoid activation function. And then we're going to
6376
11:22:43,680 --> 11:22:53,280
calculate the loss. So how wrong is our model on the test data? So test loss equals loss function.
6377
11:22:53,280 --> 11:23:00,240
We're going to pass it in the test logits. And then we're going to pass it in Y test for the ideal
6378
11:23:00,240 --> 11:23:06,880
labels. And then we're going to also calculate test accuracy. And test accuracy is going to
6379
11:23:06,880 --> 11:23:16,240
take in Y true equals Y test. So the test labels and Y pred equals test pred. So the test predictions
6380
11:23:17,280 --> 11:23:25,280
test predictions here. And our final step is to print out what's happening. So print out what's
6381
11:23:25,280 --> 11:23:31,360
happening. Oh, every tutorial needs a song. If I could, I'd teach everything with song.
6382
11:23:31,360 --> 11:23:38,160
Song and dance. So because we're training for 1000 epochs, how about every 100 epochs we print
6383
11:23:38,160 --> 11:23:45,040
out something. So print f string, and we're going to write epoch in here. So we know what epoch our
6384
11:23:45,040 --> 11:23:50,960
models on. And then we're going to print out the loss. Of course, this is going to be the training
6385
11:23:50,960 --> 11:23:57,200
loss. Because the test loss has test at the front of it. And then accuracy here. Now, of course,
6386
11:23:57,200 --> 11:24:05,760
this is going to be the training accuracy. We go here. And then we're going to pipe. And we're
6387
11:24:05,760 --> 11:24:11,760
going to print out the test loss. And we want the test loss here. We're going to take this to five
6388
11:24:11,760 --> 11:24:17,120
decimal places. Again, when we see the printouts of the different values, do not worry too much
6389
11:24:17,120 --> 11:24:24,080
about the exact numbers on my screen appearing on your screen, because that is inherent to the
6390
11:24:24,080 --> 11:24:31,280
randomness of machine learning. The direction is more important. Have we got...
6391
11:24:31,280 --> 11:24:35,360
we need a percentage sign here, because that's going to be a bit more complete for accuracy.
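Putting the whole cell together, here's a sketch of the training and testing loop described above. It assumes X_train, y_train, X_test, y_test from the circles data set and the accuracy_fn helper written earlier in the course, so treat the names as approximations:

```python
import torch

torch.manual_seed(42)
torch.cuda.manual_seed(42)

epochs = 1000

# Put data on the target device (device-agnostic code)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

for epoch in range(epochs):
    ### Training
    model_1.train()
    y_logits = model_1(X_train).squeeze()              # forward pass -> raw logits
    y_pred = torch.round(torch.sigmoid(y_logits))      # logits -> probabilities -> labels
    loss = loss_fn(y_logits, y_train)                  # BCEWithLogitsLoss expects logits
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)   # accuracy_fn: helper defined earlier in the course
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ### Testing
    model_1.eval()
    with torch.inference_mode():
        test_logits = model_1(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.5f}, Acc: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")
```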
6392
11:24:35,360 --> 11:24:40,000
Have we got any errors here? I don't know. I'm just, we've just all coded this free hand,
6393
11:24:40,000 --> 11:24:44,880
right? There's a lot of code going on here. So we're about to train our next model,
6394
11:24:44,880 --> 11:24:49,440
which is the biggest model we've built so far in this course, three layers, 10 hidden units on
6395
11:24:49,440 --> 11:25:02,000
each layer. Let's see what we've got. Three, two, one, run. Oh, what? What? A thousand epochs,
6396
11:25:02,000 --> 11:25:08,560
an extra hidden layer, more hidden units. And we still, our model is still basically a coin toss.
6397
11:25:08,560 --> 11:25:12,320
50%. Now, this can't be for real. Let's plot the decision boundary.
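A sketch of the plotting cell: plot_decision_boundary comes from the helper_functions.py file downloaded earlier, and its (model, X, y) signature is assumed here:

```python
import matplotlib.pyplot as plt
from helper_functions import plot_decision_boundary  # downloaded earlier from the course GitHub

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_1, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_1, X_test, y_test)
```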
6398
11:25:12,320 --> 11:25:22,080
Plot the decision boundary. To find out, let's get a bit visual. Plot figure, actually, to prevent us
6399
11:25:22,080 --> 11:25:28,480
from writing out all of the plot code, let's just go up here, and we'll copy this. Now, you know,
6400
11:25:28,480 --> 11:25:32,960
I'm not the biggest fan of copying code. But for this case, we've already written it. So there's
6401
11:25:32,960 --> 11:25:38,080
nothing really new here to cover. And we're going to just change this from model zero to model one,
6402
11:25:38,080 --> 11:25:42,480
because why it's our new model that we just trained. And so behind the scenes, plot decision
6403
11:25:42,480 --> 11:25:48,720
boundary is going to make predictions with the target model on the target data set and put it
6404
11:25:48,720 --> 11:25:56,000
into a nice visual representation for us. Oh, I said nice visual representation. What does this
6405
11:25:56,000 --> 11:26:01,600
look like? We've just got a coin toss on our data set. Our model is just again, it's trying
6406
11:26:01,600 --> 11:26:08,800
to draw a straight line to separate circular data. Now, why is this? Our model is based on linear,
6407
11:26:08,800 --> 11:26:18,080
is our data nonlinear? Hmm, maybe I've revealed a few of my tricks. I've done a couple of reveals
6408
11:26:18,080 --> 11:26:24,080
over the past few videos. But this is still quite annoying. And it can be fairly annoying
6409
11:26:24,080 --> 11:26:30,320
when you're training models and they're not working. So how about we verify that this model
6410
11:26:30,320 --> 11:26:36,080
can learn anything? Because right now it's just basically guessing for our data set.
6411
11:26:36,080 --> 11:26:42,000
So this model looks a lot like the model we built in section 01. Let's go back to this.
6412
11:26:42,000 --> 11:26:48,320
This is the learn pytorch.io book pytorch workflow fundamentals. Where did we create a model model
6413
11:26:48,320 --> 11:26:55,840
building essentials? Where did we build a model? Linear regression model? Yeah, here. And then
6414
11:26:55,840 --> 11:27:05,120
dot linear. But we built this model down here. So all we've changed from 01 to here is we've added
6415
11:27:05,120 --> 11:27:11,440
a couple of layers. The forward computation is quite similar. If this model can learn something
6416
11:27:11,440 --> 11:27:18,000
on a straight line, can this model learn something on a straight line? So that's my challenge to you
6417
11:27:18,000 --> 11:27:25,200
is grab the data set that we created in this previous notebook. So data, you could just
6418
11:27:25,200 --> 11:27:31,120
reproduce this exact data set. And see if you can write some code to fit the model that we built
6419
11:27:31,120 --> 11:27:40,080
here. This one here on the data set that we created in here. Because I want to verify that
6420
11:27:40,080 --> 11:27:46,080
this model can learn anything. Because right now it seems like it's not learning anything at all.
6421
11:27:46,080 --> 11:27:50,720
And that's quite frustrating. So give that a shot. And I'll see you in the next video.
6422
11:27:50,720 --> 11:27:58,320
Welcome back. In the past few videos, we've tried to build a model to separate the blue from red
6423
11:27:58,320 --> 11:28:04,320
dots yet. Our previous efforts have proven futile, but don't worry. We're going to get there. I promise
6424
11:28:04,320 --> 11:28:08,320
you we're going to get there. And I may have a little bit of inside information here. But we're
6425
11:28:08,320 --> 11:28:13,440
going to build a model to separate these blue dots from red dots, a fundamental classification model.
6426
11:28:14,080 --> 11:28:20,480
And we tried a few things in the last couple of videos, such as training for longer, so more epochs.
6427
11:28:20,480 --> 11:28:26,320
We added another layer. We increased the hidden units because we learned of a few methods to
6428
11:28:26,320 --> 11:28:30,880
improve a model from a model perspective, such as upgrading the hyperparameters, such as number
6429
11:28:30,880 --> 11:28:36,240
of layers, more hidden units, fitting for longer, changing the activation functions,
6430
11:28:36,240 --> 11:28:41,040
changing the learning rate, we haven't quite done that one yet, and changing the loss function.
6431
11:28:41,920 --> 11:28:47,520
One way that I like to troubleshoot problems is I'm going to put a subheading here, 5.1.
6432
11:28:47,520 --> 11:28:55,040
We're going to prepare or preparing data to see if our model can fit a straight line.
6433
11:28:56,560 --> 11:29:06,640
So one way to troubleshoot, this is my trick for troubleshooting problems, especially neural
6434
11:29:06,640 --> 11:29:13,440
networks, but just machine learning in general, to troubleshoot a larger problem is to test out
6435
11:29:13,440 --> 11:29:21,920
a smaller problem. And so why is this? Well, because we know that we had something working
6436
11:29:21,920 --> 11:29:29,120
in a previous section, so 01, PyTorch, workflow fundamentals, we built a model here that worked.
6437
11:29:29,680 --> 11:29:36,400
And if we go right down, we know that this linear model can fit a straight line. So we're going
6438
11:29:36,400 --> 11:29:40,640
to replicate a data set to fit a straight line to see if the model that we're building here
6439
11:29:40,640 --> 11:29:46,560
can learn anything at all, because right now it seems like it can't. It's just tossing a coin
6440
11:29:46,560 --> 11:29:53,920
displayed between our data here, which is not ideal. So let's make some data. But yeah, this is the idea:
6441
11:29:54,480 --> 11:30:00,000
let's create a smaller problem, one that we know that works, and then add more complexity to try
6442
11:30:00,000 --> 11:30:05,120
and solve our larger problem. So create some data. This is going to be the same as notebook 01.
6443
11:30:05,120 --> 11:30:13,040
And I'm going to set up weight equals 0.7 bias equals 0.3. We're going to move quite quickly
6444
11:30:13,040 --> 11:30:19,840
through this because we've seen this in module one, but the overall takeaway from this is we're
6445
11:30:19,840 --> 11:30:25,440
going to see if our model works on any kind of problem at all, or do we have something fundamentally
6446
11:30:25,440 --> 11:30:33,600
wrong, create data. We're going to call it x regression, because it's a straight line, and we
6447
11:30:33,600 --> 11:30:38,480
want it to predict a number rather than a class. So you might be thinking, oh, we might have to change
6448
11:30:38,480 --> 11:30:47,280
a few things of our model architecture. Well, we'll see that in a second dot unsqueeze. And we're
6449
11:30:47,280 --> 11:30:53,840
going to go on the first dimension here, or dim equals one. And y regression, we're going to use
6450
11:30:53,840 --> 11:31:00,160
the linear regression formula as well: weight times x (x regression, that is), because we're working
6451
11:31:00,160 --> 11:31:29,160
with a new data set here, plus the bias. So this is linear regression formula. Without epsilon. So it's a simplified version of linear regression, but the same formula that we've seen in a previous section. So now let's check the data. Nothing we really haven't covered here, but we're going to do a sanity check on it to make sure that we're dealing with what we're dealing with.
6452
11:31:29,160 --> 11:31:58,160
What we're dealing with is not just a load of garbage. Because it's all about the data in machine learning, I can't stress this to you enough. The data explorer's motto is to visualize, visualize, visualize. Oh, what did we get wrong here? Unsqueeze. Did you notice that typo? Why didn't you say something? I'm kidding. There we go. Okay, so we've got 100 samples of x. We've got a different step size here, but that's all right. Let's have a little bit of fun with this. And we've got one x value, which is, you know, a little bit more...
6453
11:31:58,160 --> 11:32:14,160
One x value per y value is a very similar data set to what we use before. Now, what do we do once we have a data set? Well, if we haven't already got training and test splits, we better make them. So create train and test splits.
6454
11:32:14,160 --> 11:32:27,160
And then we're going to go train split. We're going to use 80% equals int 0.8 times the length of, or we could just put 100 in there.
6455
11:32:27,160 --> 11:32:40,160
But we're going to be specific here. And then we're going to go x train regression, y train regression equals. What are these equal? Well, we're going to go on x regression.
6456
11:32:40,160 --> 11:32:55,160
And we're going to index up to the train split on the x. And then for the y, y regression, we're going to index up to the train split.
6457
11:32:55,160 --> 11:33:09,160
Wonderful. And then we can do the same on the test or creating the test data. Nothing really new here that we need to discuss. We're creating training and test sets. What do they do for each of them?
6458
11:33:09,160 --> 11:33:21,160
Well, the model is going to hopefully learn patterns in the training data set that is able to model the testing data set. And we're going to see that in a second.
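A sketch of the data creation and splitting just described (the range and step of the data are assumptions chosen to give 100 samples):

```python
import torch

# Create a simple straight line data set (same idea as notebook 01)
weight, bias = 0.7, 0.3
X_regression = torch.arange(0, 1, 0.01).unsqueeze(dim=1)  # 100 samples, shape [100, 1]
y_regression = weight * X_regression + bias                # linear regression formula (no noise term)

# 80/20 train/test split
train_split = int(0.8 * len(X_regression))
X_train_regression, y_train_regression = X_regression[:train_split], y_regression[:train_split]
X_test_regression, y_test_regression = X_regression[train_split:], y_regression[train_split:]

print(len(X_train_regression), len(X_test_regression))  # 80 20
```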
6459
11:33:21,160 --> 11:33:37,160
So if we check the length of each, what do we have? Length x train regression. We might just check x train x test regression. What do we have here?
6460
11:33:37,160 --> 11:33:52,160
And then we're going to go length y train regression. Long variable names here. Excuse me for that. But we want to keep it separate from our already existing x and y data. What values do we have here?
6461
11:33:52,160 --> 11:34:12,160
80, 20, 80, 20, beautiful. So 80 training samples to 20 testing samples. That should be enough. Now, because we've got our helper functions file here... and if you don't have this, remember, we wrote some code up here before to... where is it?
6462
11:34:12,160 --> 11:34:30,160
To download it from the course GitHub, and we imported plot predictions from it. Now, if we have a look at helper functions.py, it contains the plot predictions function that we created in the last section, section 0.1. There we go. Plot predictions.
6463
11:34:30,160 --> 11:34:41,160
So we're just running this exact same function here, or we're about to run it. It's going to save us from re typing out all of this. That's the beauty of having a helper functions.py file.
6464
11:34:41,160 --> 11:34:52,160
So if we come down here, let's plot our data to visually inspect it. Right now, it's just numbers on a page. And we're not really going to plot any predictions because we don't have any predictions yet.
6465
11:34:52,160 --> 11:35:06,160
But we'll pass in the train data is equal to X train regression. And then the next one is the train labels, which is equal to Y train regression.
6466
11:35:06,160 --> 11:35:27,160
And then we have the test data, which is equal to X test regression. And then we have the test labels. Now, I think this should be labels too. Yeah, there we go. Y test regression. I might be proven wrong as we try to run this function.
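The plotting call being written looks roughly like this; plot_predictions comes from helper_functions.py and the keyword names follow the description above:

```python
from helper_functions import plot_predictions

plot_predictions(
    train_data=X_train_regression,
    train_labels=y_train_regression,
    test_data=X_test_regression,
    test_labels=y_test_regression,
)
```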
6467
11:35:27,160 --> 11:35:42,160
Okay, there we go. So we have some training data and we have some testing data. Now, do you think that our model, model one (let's have a look at what model one is), could fit this data?
6468
11:35:42,160 --> 11:35:53,160
Does it have the right amount of in and out features? We may have to adjust these slightly. So I'd like you to think about that. Do we have to change the input features to our model for this data set?
6469
11:35:53,160 --> 11:36:00,160
And do we have to change the out features of our model for this data set? We'll find out in the next video.
6470
11:36:00,160 --> 11:36:16,160
Welcome back. We're currently working through a little side project here, but really the philosophy of what we're doing. We just created a straight line data set because we know that we've built a model in the past back in section 01 to fit a straight line data set.
6471
11:36:16,160 --> 11:36:26,160
And why are we doing this? Well, because the model that we've built so far is not fitting or not working on our circular data set here on our classification data set.
6472
11:36:26,160 --> 11:36:38,160
And so one way to troubleshoot a larger problem is to test out a smaller problem first. So later on, if you're working with a big machine learning data set, you'd probably start with a smaller portion of that data set first.
6473
11:36:38,160 --> 11:36:46,160
Likewise, with a larger machine learning model, instead of starting with a huge model, you'll start with a small model.
6474
11:36:46,160 --> 11:36:55,160
So we're taking a step back here to see if our model is going to learn anything at all on a straight line data set so that we can improve it for a non-straight line data set.
6475
11:36:55,160 --> 11:37:07,160
And there's another hint. Oh, we're going to cover it in a second. I promise you. But let's see how now we can adjust model one to fit a straight line.
6476
11:37:07,160 --> 11:37:16,160
And I'll repeat the question from the end of the last video: do we have to adjust the parameters of model one in any way, shape or form to fit this straight line data?
6477
11:37:16,160 --> 11:37:26,160
And you may have realized or you may not have that our model one is set up for our classification data, which has two X input features.
6478
11:37:26,160 --> 11:37:37,160
Whereas this data, if we go X train regression, how many input features do we have? We just get the first sample.
6479
11:37:37,160 --> 11:37:52,160
There's only one value. Or maybe we get the first 10. There's only one value per, let's remind ourselves, this is input and output shapes, one of the most fundamental things in machine learning and deep learning.
6480
11:37:52,160 --> 11:38:01,160
And trust me, I still get this wrong all the time. So that's why I'm harping on about it. We have one feature per one label. So we have to adjust our model slightly.
6481
11:38:01,160 --> 11:38:08,160
We have to change the in features to be one instead of two. The out features can stay the same because we want one number to come out.
6482
11:38:08,160 --> 11:38:23,160
So what we're going to do is code up a little bit different version of model one. So same architecture as model one. But using NN dot sequential, we're going to do the faster way of coding a model here.
6483
11:38:23,160 --> 11:38:30,160
Let's create model two with nn.Sequential. The only thing that's going to change is the number of input features.
6484
11:38:30,160 --> 11:38:42,160
So this will be the exact same code as model one. And the only difference, as I said, will be features or in features is one. And then we'll go out features equals 10.
6485
11:38:42,160 --> 11:38:51,160
So 10 hidden units in the first layer. And of course, the second layer, the number of features here has to line up with the out features of the previous layer.
6486
11:38:51,160 --> 11:39:01,160
This one's going to output 10 features as well. So we're scaling things up from one feature to 10 to try and give our model as much of a chance or as many parameters as possible.
6487
11:39:01,160 --> 11:39:08,160
Of course, we could make this number quite large. We could make it a thousand features if we want. But there is an upper bound on these things.
6488
11:39:08,160 --> 11:39:14,160
And I'm going to let you find those in your experience as a machine learning engineer and a data scientist.
6489
11:39:14,160 --> 11:39:23,160
But for now, we're keeping it nice and small. So we can run as many experiments as possible. Beautiful. Look at that. We've created a sequential model. What happens with NN dot sequential?
6490
11:39:23,160 --> 11:39:31,160
Data goes in here, passes through this layer. Then it passes through this layer. Then it passes through this layer. And what happens when it goes through the layer?
6491
11:39:31,160 --> 11:39:39,160
It triggers the layers forward method, the internal forward method. In the case of NN dot linear, we've seen it. It's got the linear regression formula.
6492
11:39:39,160 --> 11:39:50,160
So if we go NN dot linear, it performs this mathematical operation, the linear transformation. But we've seen that before. Let's keep pushing forward.
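As a quick aside, here's a small sketch showing that nn.Linear really is just the linear transformation y = x @ W.T + b:

```python
import torch
from torch import nn

torch.manual_seed(42)
linear = nn.Linear(in_features=1, out_features=10)
x = torch.rand(5, 1)  # 5 samples, 1 feature each

# nn.Linear computes x @ weight.T + bias behind the scenes
manual = x @ linear.weight.T + linear.bias
print(torch.allclose(linear(x), manual))  # True
```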
6493
11:39:50,160 --> 11:40:00,160
Let's create a loss and an optimizer loss and optimize. We're going to work through our workflow. So loss function, we have to adjust this slightly.
6494
11:40:00,160 --> 11:40:10,160
We're going to use the L1 loss because why we're dealing with a regression problem here rather than a classification problem. And our optimizer, what can we use for our optimizer?
6495
11:40:10,160 --> 11:40:21,160
How about we bring in just the exact same optimizer SGD that we've been using for our classification data. So model two dot params or parameters.
6496
11:40:21,160 --> 11:40:30,160
Always get a little bit confused. And we'll give it an LR of 0.1 because that's what we've been using so far. This is the params here.
6497
11:40:30,160 --> 11:40:38,160
So we want our optimizer to optimize our model two parameters here with a learning rate of 0.1. The learning rate is what?
6498
11:40:38,160 --> 11:40:47,160
The amount each parameter will be adjusted by each epoch, or the multiplier that will be applied to each parameter update.
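Here's a sketch of model_2 plus the loss function and optimizer described in this passage (the names are approximations of the course code):

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Same architecture as model_1, but written with nn.Sequential and with
# in_features=1 because the straight line data has one feature per sample
model_2 = nn.Sequential(
    nn.Linear(in_features=1, out_features=10),
    nn.Linear(in_features=10, out_features=10),
    nn.Linear(in_features=10, out_features=1),
).to(device)

loss_fn = nn.L1Loss()  # MAE loss for regression
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)
```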
6499
11:40:47,160 --> 11:41:00,160
So now let's train the model. Do you think we could do that in this video? I think we can. So we might just train it on the training data set and then we can evaluate it on the test data set separately.
6500
11:41:00,160 --> 11:41:13,160
So we'll set up both manual seeds, the regular one and the CUDA one, because we've set our model to the device up here. So it should be on the GPU, or whatever device you have active.
6501
11:41:13,160 --> 11:41:21,160
So set the number of epochs. How many epochs should we set? Well, we set a thousand before, so we'll keep it at that.
6502
11:41:21,160 --> 11:41:30,160
epochs equals a thousand. And now we're getting really good at this sort of stuff here. Let's put our data. Put the data on the target device.
6503
11:41:30,160 --> 11:41:42,160
And I know we've done a lot of similar steps before, but there's a reason for that. I've kept all these in here because I'd like you, by the end of this course, to sort of know all of this stuff off by heart.
6504
11:41:42,160 --> 11:41:47,160
And even if you don't know it all off by heart, because trust me, I don't, you know where to look.
6505
11:41:47,160 --> 11:42:00,160
So X train regression, we're going to send this to device. And then we're going to go Y train regression, just a reminder or something to get you to think while we're writing this code.
6506
11:42:00,160 --> 11:42:09,160
What would happen if we didn't put our data on the same device as a model? We've seen that error come up before, but what happens?
6507
11:42:09,160 --> 11:42:16,160
Well, I've just kind of given that away, haven't I, Daniel? Well, that was a great question. Our code will error out.
6508
11:42:16,160 --> 11:42:22,160
Oh, well, don't worry. There's plenty of questions I've been giving you that I haven't given the answer to yet.
6509
11:42:22,160 --> 11:42:30,160
Device. Beautiful. We've got device agnostic code for the model and for the data. And now let's loop through epochs.
6510
11:42:30,160 --> 11:42:39,160
So train. We're going to for epoch in range epochs for an epoch in a range. Do the forward pass.
6511
11:42:39,160 --> 11:42:49,160
Calculate the loss. So Y pred equals model two. This is the forward pass. X train regression.
6512
11:42:49,160 --> 11:42:58,160
It's all going to work out hunky-dory because our model and our data are on the same device. Loss equals, what? We're going to bring in our loss function.
6513
11:42:58,160 --> 11:43:09,160
Then we're going to compare the predictions to Y train regression to the Y labels. What do we do next?
6514
11:43:09,160 --> 11:43:16,160
Optimizer zero grad. Optimizer dot zero grad. We're doing all of this with our comments. Look at us go.
6515
11:43:16,160 --> 11:43:24,160
Loss backward, and what's next? Optimizer step, step, step. And of course, we could do some testing here.
6516
11:43:24,160 --> 11:43:33,160
Testing. We'll go model two dot eval. And then we'll go with torch dot inference mode.
6517
11:43:33,160 --> 11:43:41,160
We'll do the forward pass. We'll create the test predictions, equals model two on X test regression.
6518
11:43:41,160 --> 11:43:51,160
And then we'll go the test loss equals loss FN on the test predictions and versus the Y test labels.
6519
11:43:51,160 --> 11:44:00,160
Beautiful. Look at that. We've just done an optimization loop, something we spent a whole hour on before, maybe even longer, in about ten lines of code.
6520
11:44:00,160 --> 11:44:05,160
And of course, we could shorten this by making these a function. But we're going to see that later on.
6521
11:44:05,160 --> 11:44:13,160
I'd rather us give a little bit of practice while this is still a bit fresh. Print out what's happening.
6522
11:44:13,160 --> 11:44:21,160
Let's print out what's happening. What should we do? So because we're training for a thousand epochs, I like the idea of printing out something every 100 epochs.
6523
11:44:21,160 --> 11:44:33,160
That should be about enough of a step. Epoch. What do we got? We'll put in the epoch here with the F string and then we'll go to loss, which will be loss.
6524
11:44:33,160 --> 11:44:42,160
And maybe we'll get the first five of those five decimal places that is. We don't have an accuracy, do we?
6525
11:44:42,160 --> 11:44:50,160
Because we're working with regression. And we'll get the test loss out here. And that's going to be.5F as well.
6526
11:44:50,160 --> 11:44:58,160
Beautiful. Have we got any mistakes? I don't think we do. We didn't even run this code cell before. We'll just run these three again, see if we got...
6527
11:44:58,160 --> 11:45:04,160
Look at that. Oh my goodness. Is our loss... Our loss is going down.
6528
11:45:04,160 --> 11:45:09,160
So that means our model must be learning something.
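A sketch of the regression training and testing loop just written, assuming model_2, loss_fn, optimizer and the regression tensors from the sketches above:

```python
import torch

torch.manual_seed(42)
torch.cuda.manual_seed(42)

epochs = 1000

# Put the regression data on the target device
X_train_regression = X_train_regression.to(device)
y_train_regression = y_train_regression.to(device)
X_test_regression = X_test_regression.to(device)
y_test_regression = y_test_regression.to(device)

for epoch in range(epochs):
    ### Training
    model_2.train()
    y_pred = model_2(X_train_regression)        # forward pass
    loss = loss_fn(y_pred, y_train_regression)  # calculate the loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ### Testing
    model_2.eval()
    with torch.inference_mode():
        test_pred = model_2(X_test_regression)
        test_loss = loss_fn(test_pred, y_test_regression)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.5f} | Test loss: {test_loss:.5f}")
```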
6529
11:45:09,160 --> 11:45:17,160
Now, what if we adjusted the learning rate here? I think if we went 0.01 or something, will that do anything?
6530
11:45:17,160 --> 11:45:25,160
Oh, yes. Look how low our loss gets on the test data set. But let's confirm that. We've got to make some predictions.
6531
11:45:25,160 --> 11:45:30,160
Well, maybe we should do that in the next video. Yeah, this one's getting too long. But how good's that?
6532
11:45:30,160 --> 11:45:37,160
We created a straight line data set and we've created a model to fit it. We set up a loss and an optimizer already.
6533
11:45:37,160 --> 11:45:43,160
And we put the data on the target device. We trained and we tested so our model must be learning something.
6534
11:45:43,160 --> 11:45:48,160
But I'd like you to give a shot at confirming that by using our plot predictions function.
6535
11:45:48,160 --> 11:45:58,160
So make some predictions with our trained model. Don't forget to turn on inference mode. And we should see some red dots here fairly close to the green dots on the next plot.
6536
11:45:58,160 --> 11:46:02,160
Give that a shot and I'll see you in the next video.
6537
11:46:02,160 --> 11:46:11,160
Welcome back. In the last video, we did something very exciting. We solved a smaller problem that's giving us a hint towards our larger problem.
6538
11:46:11,160 --> 11:46:17,160
So we know that the model that we've previously been building, model two, has the capacity to learn something.
6539
11:46:17,160 --> 11:46:25,160
Now, how did we know that? Well, it's because we created this straight line data set. We replicated the architecture that we used for model one.
6540
11:46:25,160 --> 11:46:35,160
Recall that model one didn't work very well on our classification data. But with a little bit of an adjustment such as changing the number of in features.
6541
11:46:35,160 --> 11:46:44,160
And not too much different training code except for a different loss function because, well, we use MAE loss with regression data.
6542
11:46:44,160 --> 11:46:50,160
And we changed the learning rate slightly because we found that maybe our model could learn a bit better.
6543
11:46:50,160 --> 11:46:59,160
And again, I'd encourage you to play around with different values of the learning rate. In fact, anything that we've changed, try and change it yourself and just see what happens.
6544
11:46:59,160 --> 11:47:04,160
That's one of the best ways to learn what goes on with machine learning models.
6545
11:47:04,160 --> 11:47:10,160
But we trained for the same number of epochs. We set up device agnostic code. We did a training and testing loop.
6546
11:47:10,160 --> 11:47:15,160
Look at these loops. Oh, my goodness. Well done. And our loss went down.
6547
11:47:15,160 --> 11:47:23,160
So, hmm. What does that tell us? Well, it tells us that model two or the specific architecture has some capacity to learn something.
6548
11:47:23,160 --> 11:47:28,160
So we must be missing something. And we're going to get to that in a minute, I promise you.
6549
11:47:28,160 --> 11:47:35,160
But we're just going to confirm that our model has learned something and it's not just numbers on a page going down by getting visual.
6550
11:47:35,160 --> 11:47:43,160
So turn on... We're going to make some predictions and plot them. And you may have already done this because I issued that challenge at the end of the last video.
6551
11:47:43,160 --> 11:47:53,160
So turn on evaluation mode. Let's go model two dot eval. And let's make predictions, which are also known as inference.
6552
11:47:53,160 --> 11:48:02,160
And we're going to go with torch dot inference mode... with torch dot inference mode.
6553
11:48:02,160 --> 11:48:10,160
Make some predictions. We're going to save them as y preds, and we're going to use model two and pass it X test regression.
6554
11:48:10,160 --> 11:48:16,160
This should all work because we've set up device agnostic code, plot data and predictions.
6555
11:48:16,160 --> 11:48:23,160
To do this, we can of course use our plot predictions function that we imported via our helper_functions.py file.
6556
11:48:23,160 --> 11:48:27,160
The code for that is just a few cells above if you'd like to check that out.
6557
11:48:27,160 --> 11:48:33,160
But let's set up the train data here. Train data parameter, which is x train regression.
6558
11:48:33,160 --> 11:48:48,160
And my goodness, Google Colab. I'm already typing fast enough. You don't have to slow me down by giving me the wrong autocorrects. Train labels equals y train regression.
6559
11:48:48,160 --> 11:48:53,160
And then we're going to pass in our test_data, which equals X_test_regression.
6560
11:48:53,160 --> 11:49:03,160
And then we're going to pass in test_labels, which is y_test_regression. We've got too many variables going on here. My goodness gracious.
6561
11:49:03,160 --> 11:49:08,160
We could have done better with naming, but this will do for now. And then predictions is y_preds.
6562
11:49:08,160 --> 11:49:13,160
And then if we plot this, what does it look like? Oh, no, we got an error.
6563
11:49:13,160 --> 11:49:23,160
Now secretly, I kind of knew that that was coming ahead of time. That's the advantage of being the host of this machine learning cooking show. So type error. How do we fix this?
6564
11:49:23,160 --> 11:49:34,160
Remember how I asked you in one of the last videos what would happen if our data wasn't on the same device as our model? Well, we get an error, right? But this is a little bit different as well.
6565
11:49:34,160 --> 11:49:45,160
We've seen this one before. We've got: can't convert CUDA device type tensor to NumPy. Where is this coming from? Well, because our plot_predictions function uses matplotlib.
6566
11:49:45,160 --> 11:49:55,160
And behind the scenes, matplotlib references NumPy, which is another numerical computing library. However, NumPy works on the CPU rather than the GPU.
6567
11:49:55,160 --> 11:50:11,160
So we have to call .cpu(), this helpful message is telling us: call Tensor.cpu() before we use our tensors with NumPy. So let's just call .cpu() on all of our tensor inputs here and see if this solves our problem.
6568
11:50:11,160 --> 11:50:22,160
Wonderful. Looks like it does. Oh my goodness. Look at those red dots so close. Well, okay. So this just confirms our suspicions. What we kind of already knew is that our model did have some capacity to learn.
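For reference, a sketch of that fixed plotting call, assuming the plot_predictions parameters named above (train_data, train_labels, test_data, test_labels, predictions):

    # Matplotlib works with NumPy, which lives on the CPU, so call .cpu() on every tensor input
    plot_predictions(train_data=X_train_regression.cpu(),
                     train_labels=y_train_regression.cpu(),
                     test_data=X_test_regression.cpu(),
                     test_labels=y_test_regression.cpu(),
                     predictions=y_preds.cpu())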
6569
11:50:22,160 --> 11:50:34,160
It's just the data set: when we changed the data set, it worked. So, hmm. Is it our data that our model can't learn on, like this circular data, or is it the model itself?
6570
11:50:34,160 --> 11:50:44,160
Remember, our model is only comprised of linear functions. What is linear? Linear is a straight line, but is our data made of just straight lines?
6571
11:50:44,160 --> 11:50:58,160
I think it's got some nonlinearities in there. So the big secret I've been holding back will reveal itself starting from the next video. So if you want a head start on it, I'd go to torch.nn.
6572
11:50:58,160 --> 11:51:15,160
And if we have a look at the documentation, we've been speaking a lot about linear functions. What are these nonlinear activations? And I'll give you another spoiler. We've actually seen one of these nonlinear activations throughout this notebook.
6573
11:51:15,160 --> 11:51:27,160
So go and check that out. See what you can infer from that. And I'll see you in the next video. Let's get started with nonlinearities. Welcome back.
6574
11:51:27,160 --> 11:51:38,160
In the last video, we saw that the model that we've been building has some potential to learn. I mean, look at these predictions. You could get a little bit better, of course, get the red dots on top of the green dots.
6575
11:51:38,160 --> 11:51:47,160
But we're just going to leave that; the trend is what we're after. Our model has some capacity to learn, except this is straight line data.
6576
11:51:47,160 --> 11:51:57,160
And we've been hinting at it a fair bit: we're using linear functions. And if we look up linear data, what does it look like?
6577
11:51:57,160 --> 11:52:05,160
Well, it's quite a straight line. If we go linear and just search linear, what does this give us? Linear means straight. There we go, straight.
6578
11:52:05,160 --> 11:52:14,160
And then what happens if we search for nonlinear? I kind of hinted at this as well. Nonlinear. Oh, we get some curves. We get curved lines.
6579
11:52:14,160 --> 11:52:20,160
So linear functions. Straight. Nonlinear functions. Hmm.
6580
11:52:20,160 --> 11:52:34,160
Now, this is one of the beautiful things about machine learning. And I'm not sure about you, but when I was in high school, I kind of learned a concept called line of best fit, or y equals mx plus c, or
6581
11:52:34,160 --> 11:52:41,160
y equals mx plus b. And it looks something like this. And then if you wanted to go over these, you use quadratic functions and a whole bunch of other stuff.
6582
11:52:41,160 --> 11:52:51,160
But one of the most fundamental things about machine learning is that we build neural networks and deep down neural networks are just a combination.
6583
11:52:51,160 --> 11:52:56,160
It could be a large combination of linear functions and nonlinear functions.
6584
11:52:56,160 --> 11:53:11,160
So that's why in torch.nn, we have nonlinear activations and we have all these other different types of layers. But essentially, what they're doing deep down is combining straight lines with, if we go back up to our data, non straight lines.
6585
11:53:11,160 --> 11:53:21,160
So, of course, our model didn't work before because we've only given it the power to use linear lines. We've only given it the power to use straight lines.
6586
11:53:21,160 --> 11:53:29,160
But our data is what? It's curved. Although it's simple, we need nonlinearity to be able to model this data set.
6587
11:53:29,160 --> 11:53:38,160
And now, let's say we were building a pizza detection model. So let's look up some images of pizza, one of my favorite foods, images.
6588
11:53:38,160 --> 11:53:43,160
Pizza, right? So could you model pizza with just straight lines?
6589
11:53:43,160 --> 11:53:53,160
You're thinking, Daniel, you can't be serious. A computer vision model doesn't look for just straight lines in this. And I'd argue that, yes, it does, except we also add some curved lines in here.
6590
11:53:53,160 --> 11:54:02,160
That's the beauty of machine learning. Could you imagine trying to write the rules of an algorithm to detect that this is a pizza? Maybe you could put in, oh, it's a curve here.
6591
11:54:02,160 --> 11:54:14,160
And if you see red, no, no, no, no. Imagine if you're trying to do a hundred different foods. Your program would get really large. Instead, we give our machine learning models, if we come down to the model that we created.
6592
11:54:14,160 --> 11:54:22,160
We give our deep learning models the capacity to use linear and nonlinear functions. We haven't seen any nonlinear layers just yet.
6593
11:54:22,160 --> 11:54:27,160
Or maybe we've hinted at some, but that's all right. So we stack these on top of each other, these layers.
6594
11:54:27,160 --> 11:54:38,160
And then the model figures out what patterns in the data it should use, what lines it should draw to draw patterns to not only pizza, but another food such as sushi.
6595
11:54:38,160 --> 11:54:50,160
If we wanted to build a food image classification model, it would do this. The principle remains the same. So the question I'm going to pose to you, we'll get out of this, is, we'll come down here.
6596
11:54:50,160 --> 11:54:58,160
We've unlocked the missing piece or about to. We're going to cover it over the next couple of videos, the missing piece of our model.
6597
11:54:58,160 --> 11:55:05,160
And this is a big one. This is going to follow you throughout all of machine learning and deep learning: nonlinearity.
6598
11:55:05,160 --> 11:55:25,160
So the question here is, what patterns could you draw if you were given an infinite amount of straight and non straight lines?
6599
11:55:25,160 --> 11:55:39,160
Or, in machine learning terms, an "infinite" amount, though really it is finite; calling it infinite is a technicality.
6600
11:55:39,160 --> 11:55:45,160
It could be a million parameters, or it could be, as we've got, probably a hundred parameters in our model.
6601
11:55:45,160 --> 11:55:56,160
So just imagine a large amount of straight and non straight lines, an infinite amount of linear and nonlinear functions.
6602
11:55:56,160 --> 11:56:10,160
You could draw some pretty intricate patterns, couldn't you? And that's what gives machine learning and especially neural networks the capacity to not only fit a straight line here, but to separate two different circles.
6603
11:56:10,160 --> 11:56:19,160
But also to do crazy things like drive a self-driving car, or at least power the vision system of a self-driving car.
6604
11:56:19,160 --> 11:56:24,160
Of course, after that, you need some programming to plan what to actually do with what you see in an image.
6605
11:56:24,160 --> 11:56:29,160
But we're getting ahead of ourselves here. Let's now start diving into nonlinearity.
6606
11:56:29,160 --> 11:56:35,160
And the whole idea here is combining the power of linear and nonlinear functions.
6607
11:56:35,160 --> 11:56:44,160
Straight lines and non straight lines. Our classification data is not comprised of just straight lines. It's circles, so we need nonlinearity here.
6608
11:56:44,160 --> 11:56:54,160
So recreating nonlinear data, red and blue circles. We don't need to recreate this, but we're going to do it anyway for completeness.
6609
11:56:54,160 --> 11:57:01,160
So let's get a little bit of a practice. Make and plot data. This is so that you can practice the use of nonlinearity on your own.
6610
11:57:01,160 --> 11:57:09,160
So import matplotlib.pyplot as plt. We're going to go a bit faster here because we've covered this code above.
6611
11:57:09,160 --> 11:57:15,160
So import make_circles. We're just going to recreate the exact same circle data set that we've created above.
6612
11:57:15,160 --> 11:57:21,160
Number of samples. We'll create a thousand. And we're going to create x and y equals what?
6613
11:57:21,160 --> 11:57:25,160
Make circles. Pass it in number of samples. Beautiful.
6614
11:57:25,160 --> 11:57:33,160
Colab, please. I wonder if I can turn off autocorrect in Colab. I'm happy to just see all of my errors in the flesh. See? Look at that. I don't want that.
6615
11:57:33,160 --> 11:57:40,160
I want noise like that. Maybe I'll do that in the next video. We're not going to spend time here looking around how to do it.
6616
11:57:40,160 --> 11:57:46,160
We can work that out on the fly later. For now, I'm too excited to share with you the power of nonlinearity.
6617
11:57:46,160 --> 11:57:55,160
So here, x, we're just going to plot what's going on. We've got two x features and we're going to color it with the flavor of y because we're doing a binary classification.
6618
11:57:55,160 --> 11:58:08,160
And we're going to use one of my favorite cmaps, which is a color map. And we're going to go plt.cm for the cmap, red-blue.
6619
11:58:08,160 --> 11:58:10,160
What do we get?
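A rough sketch of the data recreation described above; the noise and random_state values, and the exact red-blue colormap name, are assumptions based on the earlier dataset:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_circles

    n_samples = 1000
    X, y = make_circles(n_samples, noise=0.03, random_state=42)  # assumed settings
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)        # color the dots by their label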
6620
11:58:10,160 --> 11:58:19,160
Okay, red circle, blue circle. Hey, is it the same color as what's above? I like this color better.
6621
11:58:19,160 --> 11:58:23,160
Did we get that right up here?
6622
11:58:23,160 --> 11:58:28,160
Oh, my goodness. Look how much code we've written. Yeah, I like the other blue. I'm going to bring this down here.
6623
11:58:28,160 --> 11:58:37,160
It's all about aesthetics in machine learning. It's not just numbers on a page, don't you know? How could you be so crass? Let's go there.
6624
11:58:37,160 --> 11:58:43,160
Okay, that's a better color, red and blue. That's more lively, isn't it? So now let's convert to train and test.
6625
11:58:43,160 --> 11:58:47,160
And then we can start to build a model with nonlinearity. Oh, this is so good.
6626
11:58:47,160 --> 11:58:58,160
Okay, convert data to tensors and then to train and test splits. Nothing we haven't covered here before.
6627
11:58:58,160 --> 11:59:08,160
So import torch. It never hurts to practice code, right? Import torch, and from sklearn.model_selection...
6628
11:59:08,160 --> 11:59:18,160
...import train_test_split, so that we can split our red and blue dots randomly. And we're going to turn the data into tensors.
6629
11:59:18,160 --> 11:59:28,160
And we'll go X equals torch.from_numpy, and we'll pass in X here. And then we'll change it into type torch.float.
6630
11:59:28,160 --> 11:59:33,160
Why do we do this? Well, because, oh, my goodness, autocorrect. It's getting the best of me here.
6631
11:59:33,160 --> 11:59:38,160
You know, watching me live code this stuff and battle with autocorrect. That's what this whole course is.
6632
11:59:38,160 --> 11:59:44,160
Are we really teaching PyTorch? Or am I just battling with Google Colab's autocorrect?
6633
11:59:44,160 --> 11:59:52,160
We are turning it into torch.float with .type here because, well, NumPy's default datatype, which is what make_circles uses behind the scenes, is float64, whereas PyTorch defaults to float32.
6634
11:59:52,160 --> 11:59:59,160
A lot of other machine learning libraries actually use NumPy: pandas is built on NumPy, scikit-learn does a lot of NumPy.
6635
11:59:59,160 --> 12:00:06,160
Matplotlib, NumPy. That's just showing there. What's the word? Is it ubiquitous, ubiquity? I'm not sure, maybe.
6636
12:00:06,160 --> 12:00:10,160
If not, you can correct me. The ubiquity of NumPy.
6637
12:00:10,160 --> 12:00:17,160
And test sets. But we're using PyTorch to leverage the power of autograd, which is what powers our gradient descent.
6638
12:00:17,160 --> 12:00:21,160
And the fact that it can use GPUs.
6639
12:00:21,160 --> 12:00:30,160
So we're creating train and test splits here with train_test_split(X, y).
6640
12:00:30,160 --> 12:00:36,160
And we're going to go test_size equals 0.2. And we're going to set the random state.
6641
12:00:36,160 --> 12:00:46,160
random_state equals 42. And then we'll view our first five samples. What are these going to be?
6642
12:00:46,160 --> 12:00:52,160
Tensors. Fingers crossed... we haven't got an error. Beautiful. We have tensors here.
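A sketch of the tensor conversion and split being written here, assuming the 80/20 split and random state just described:

    import torch
    from sklearn.model_selection import train_test_split

    # Turn the NumPy arrays into float32 tensors (NumPy defaults to float64)
    X = torch.from_numpy(X).type(torch.float)
    y = torch.from_numpy(y).type(torch.float)

    # Split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                        test_size=0.2,
                                                        random_state=42)
    X_train[:5], y_train[:5]  # view the first five samples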
6643
12:00:52,160 --> 12:00:56,160
Okay. Now we're up to the exciting part. We've got our data set back.
6644
12:00:56,160 --> 12:00:59,160
I think it's time to build a model with nonlinearity.
6645
12:00:59,160 --> 12:01:06,160
So if you'd like to peek ahead, check out torch.nn again. This is a little bit of a spoiler.
6646
12:01:06,160 --> 12:01:12,160
Go into the nonlinear activation. See if you can find the one that we've already used. That's your challenge.
6647
12:01:12,160 --> 12:01:21,160
Can you find the one we've already used? And go into here and search what is our nonlinear function.
6648
12:01:21,160 --> 12:01:28,160
So give that a go and see what comes up. I'll see you in the next video.
6649
12:01:28,160 --> 12:01:34,160
Welcome back. Now put your hand up if you're ready to learn about nonlinearity.
6650
12:01:34,160 --> 12:01:38,160
And I know I can't see your hands up, but I better see some hands up or I better feel some hands up
6651
12:01:38,160 --> 12:01:44,160
because my hands up because nonlinearity is a magic piece of the puzzle that we're about to learn about.
6652
12:01:44,160 --> 12:01:50,160
So let's title this section building a model with nonlinearity.
6653
12:01:50,160 --> 12:02:03,160
So just to re-emphasize linear equals straight lines and in turn nonlinear equals non-straight lines.
6654
12:02:03,160 --> 12:02:09,160
And I left off at the end of the last video giving you the challenge of checking out the torch.nn module,
6655
12:02:09,160 --> 12:02:13,160
looking for the nonlinear function that we've already used.
6656
12:02:13,160 --> 12:02:19,160
Now where would you go to find such a thing and oh, what do we have here? Nonlinear activations.
6657
12:02:19,160 --> 12:02:27,160
And there's going to be a fair few things here, but essentially all of the modules within torch.nn
6658
12:02:27,160 --> 12:02:31,160
are either some form of layer in a neural network if we recall.
6659
12:02:31,160 --> 12:02:35,160
Let's go to a neural network. We've seen the anatomy of a neural network.
6660
12:02:35,160 --> 12:02:41,160
Generally you'll have an input layer and then multiple hidden layers and some form of output layer.
6661
12:02:41,160 --> 12:02:49,160
Well, these multiple hidden layers can be almost any combination of what's in torch.nn.
6662
12:02:49,160 --> 12:02:53,160
And in fact, they can almost be any combination of function you could imagine.
6663
12:02:53,160 --> 12:02:57,160
Whether they work or not is another question.
6664
12:02:57,160 --> 12:03:02,160
But PyTorch implements some of the most common layers that you would have as hidden layers.
6665
12:03:02,160 --> 12:03:07,160
And they might be pooling layers, padding layers, activation functions.
6666
12:03:07,160 --> 12:03:14,160
And they all have the same premise. They perform some sort of mathematical operation on an input.
6667
12:03:14,160 --> 12:03:22,160
And so if we look into the nonlinear activation functions, you might find nn.Sigmoid.
6668
12:03:22,160 --> 12:03:27,160
Where have we used this before? There's a sigmoid activation function in math terminology.
6669
12:03:27,160 --> 12:03:31,160
It takes some input x, performs this operation on it.
6670
12:03:31,160 --> 12:03:36,160
And here's what it looks like if we did it on a straight line, but I think we should put this in practice.
6671
12:03:36,160 --> 12:03:39,160
And if you want an example, well, there's an example there.
6672
12:03:39,160 --> 12:03:43,160
All of the other nonlinear activations have examples as well.
6673
12:03:43,160 --> 12:03:46,160
But I'll let you go through all of these in your own time.
6674
12:03:46,160 --> 12:03:48,160
Otherwise we're going to be here forever.
6675
12:03:48,160 --> 12:03:50,160
And nn.ReLU is another common one.
6676
12:03:50,160 --> 12:03:54,160
We saw that when we looked at the architecture of a classification network.
6677
12:03:54,160 --> 12:04:02,160
So with that being said, how about we start to code a classification model with nonlinearity.
6678
12:04:02,160 --> 12:04:08,160
And of course, if you wanted to, you could look up what is a nonlinear function.
6679
12:04:08,160 --> 12:04:12,160
If you wanted to learn more, nonlinear means the graph is not a straight line.
6680
12:04:12,160 --> 12:04:16,160
Oh, beautiful. So that's how I'd learn about nonlinear functions.
6681
12:04:16,160 --> 12:04:20,160
But while we're here together, how about we write some code.
6682
12:04:20,160 --> 12:04:28,160
So let's go build a model with nonlinear activation functions.
6683
12:04:28,160 --> 12:04:33,160
And just one more thing before, just to re-emphasize what we're doing here.
6684
12:04:33,160 --> 12:04:38,160
Before we write this code, I've got, I just remembered, I've got a nice slide,
6685
12:04:38,160 --> 12:04:43,160
which is the question we posed in the previous video, the missing piece, nonlinearity.
6686
12:04:43,160 --> 12:04:50,160
But the question I want you to think about is what could you draw if you had an unlimited amount of straight,
6687
12:04:50,160 --> 12:04:54,160
in other words, linear, and non-straight, nonlinear, lines?
6688
12:04:54,160 --> 12:05:02,160
So we've seen previously that we can build a model, a linear model to fit some data that's in a straight line, linear data.
6689
12:05:02,160 --> 12:05:09,160
But when we're working with nonlinear data, well, we need the power of nonlinear functions.
6690
12:05:09,160 --> 12:05:14,160
So this is circular data. And now, this is only a 2D plot, keep in mind there.
6691
12:05:14,160 --> 12:05:19,160
Whereas neural networks and machine learning models can work with numbers that are in hundreds of dimensions,
6692
12:05:19,160 --> 12:05:25,160
impossible for us humans to visualize, but since computers love numbers, it's a piece of cake to them.
6693
12:05:25,160 --> 12:05:32,160
So from torch import nn, and then we're going to create our first neural network with nonlinear activations.
6694
12:05:32,160 --> 12:05:36,160
This is so exciting. So let's create a class here.
6695
12:05:36,160 --> 12:05:42,160
We'll create circle model. We've got circle model V1 already. We're going to create circle model V2.
6696
12:05:42,160 --> 12:05:50,160
And we'll inherit from nn.Module. And then we'll write the constructor, which is the init function,
6697
12:05:50,160 --> 12:05:57,160
and we'll pass in self here. And then we'll go self, or super sorry, too many S words.
6698
12:05:57,160 --> 12:06:04,160
Dot underscore underscore init underscore underscore: super().__init__(). There we go. So we've got the constructor here.
6699
12:06:04,160 --> 12:06:13,160
And now let's create a layer one: self.layer_1 equals, just the same as what we've used before, nn.Linear.
6700
12:06:13,160 --> 12:06:20,160
We're going to create this quite similar to the model that we've built before, except with one added feature.
6701
12:06:20,160 --> 12:06:26,160
And we're going to create in features, which is akin to the number of X features that we have here.
6702
12:06:26,160 --> 12:06:31,160
Again, if this was different, if we had three X features, we might change this to three.
6703
12:06:31,160 --> 12:06:38,160
But because we're working with two, we'll leave it as that. We'll keep out features as 10, so that we have 10 hidden units.
6704
12:06:38,160 --> 12:06:47,160
And then we'll go layer two, again nn.Linear. These values here are very customizable because, why? Because they're hyperparameters.
6705
12:06:47,160 --> 12:06:53,160
So let's line up the in features of layer two with the out features of layer one, and we'll do the same with layer three.
6706
12:06:53,160 --> 12:06:59,160
Because layer three is going to take the outputs of layer two. So it needs in features of 10.
6707
12:06:59,160 --> 12:07:06,160
And we want layer three to be the output layer, and we want one number as output, so we'll set one here.
6708
12:07:06,160 --> 12:07:13,160
Now, here's the fun part. We're going to introduce a nonlinear function. We're going to introduce the relu function.
6709
12:07:13,160 --> 12:07:18,160
Now, we've seen sigmoid. Relu is another very common one. It's actually quite simple.
6710
12:07:18,160 --> 12:07:22,160
But let's write it out first: self.relu equals nn.ReLU.
6711
12:07:22,160 --> 12:07:32,160
So remember, torch dot nn stores a lot of existing nonlinear activation functions, so that we don't necessarily have to code them ourselves.
6712
12:07:32,160 --> 12:07:37,160
However, if we did want to code a relu function, let me show you. It's actually quite simple.
6713
12:07:37,160 --> 12:07:48,160
If we dive into nn dot relu, or relu, however you want to say it, I usually say relu, applies the rectified linear unit function element wise.
6714
12:07:48,160 --> 12:07:52,160
So that means element wise on every element in our input tensor.
6715
12:07:52,160 --> 12:07:58,160
And so it stands for rectified linear unit, and here's what it does. Basically, it takes an input.
6716
12:07:58,160 --> 12:08:06,160
If the input is negative, it turns the input to zero, and it leaves the positive inputs how they are.
6717
12:08:06,160 --> 12:08:08,160
And so this line is not straight.
6718
12:08:08,160 --> 12:08:14,160
Now, you could argue, yeah, well, it's straight here and then straight there, but this is a form of a nonlinear activation function.
6719
12:08:14,160 --> 12:08:19,160
So it goes boom, if it was linear, it would just stay straight there like that.
6720
12:08:19,160 --> 12:08:23,160
But let's see it in practice. Do you think this is going to improve our model?
6721
12:08:23,160 --> 12:08:28,160
Well, let's find out together, hey, forward, we need to implement the forward method.
6722
12:08:28,160 --> 12:08:38,160
And here's what we're going to do. Where should we put our nonlinear activation functions?
6723
12:08:38,160 --> 12:08:48,160
So I'm just going to put a note here: ReLU is a nonlinear activation function.
6724
12:08:48,160 --> 12:08:55,160
And remember, wherever I say function, it's just performing some sort of operation on a numerical input.
6725
12:08:55,160 --> 12:09:01,160
So we're going to put a nonlinear activation function in between each of our layers.
6726
12:09:01,160 --> 12:09:05,160
So let me show you what this looks like, self dot layer three.
6727
12:09:05,160 --> 12:09:12,160
We're going to start from the outside in self dot relu, and then we're going to go self dot layer two.
6728
12:09:12,160 --> 12:09:16,160
And then we're going to go self dot relu.
6729
12:09:16,160 --> 12:09:21,160
And then there's a fair bit going on here, but nothing we can't handle layer one. And then here's the X.
6730
12:09:21,160 --> 12:09:30,160
So what happens is our data goes into layer one, which performs a linear operation with nn.Linear.
6731
12:09:30,160 --> 12:09:34,160
Then we pass the output of layer one to a relu function.
6732
12:09:34,160 --> 12:09:42,160
So we, where's ReLU up here, we turn all of the negative outputs of layer one to zero,
6733
12:09:42,160 --> 12:09:45,160
and we keep the positives how they are.
6734
12:09:45,160 --> 12:09:48,160
And then we do the same here with layer two.
6735
12:09:48,160 --> 12:09:53,160
And then finally, the outputs of layer three stay as they are. We've got out features there.
6736
12:09:53,160 --> 12:09:58,160
We don't have a relu on the end here, because we're going to pass the outputs to the sigmoid function later on.
6737
12:09:58,160 --> 12:10:04,160
And if we really wanted to, we could put self.sigmoid here, equals nn.Sigmoid.
6738
12:10:04,160 --> 12:10:08,160
But I'm going to, that's just one way of constructing it.
6739
12:10:08,160 --> 12:10:16,160
We're just going to apply the sigmoid function to the logits of our model, because what are the logits, the raw output of our model.
6740
12:10:16,160 --> 12:10:22,160
And so let's instantiate our model. This is going to be called model three, which is a little bit confusing, but we're up to model three,
6741
12:10:22,160 --> 12:10:28,160
which is circle model V two, and we're going to send that to the target device.
6742
12:10:28,160 --> 12:10:32,160
And then let's check model three. What does this look like?
6743
12:10:35,160 --> 12:10:45,160
Wonderful. So it doesn't actually show us where the relu's appear, but it just shows us what are the parameters of our circle model V two.
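Putting the description above together, here's a sketch of CircleModelV2, assuming the `device` variable set up by the device-agnostic code earlier:

    from torch import nn

    class CircleModelV2(nn.Module):
        def __init__(self):
            super().__init__()
            self.layer_1 = nn.Linear(in_features=2, out_features=10)
            self.layer_2 = nn.Linear(in_features=10, out_features=10)
            self.layer_3 = nn.Linear(in_features=10, out_features=1)
            self.relu = nn.ReLU()  # nonlinear activation, has no learnable parameters

        def forward(self, x):
            # ReLU sits in between the linear layers, working from the outside in
            return self.layer_3(self.relu(self.layer_2(self.relu(self.layer_1(x)))))

    model_3 = CircleModelV2().to(device)
    model_3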
6744
12:10:45,160 --> 12:10:56,160
Now, I'd like you to have a think about this. And my challenge to you is to go ahead and see if this model is capable of working on our data, on our circular data.
6745
12:10:56,160 --> 12:11:01,160
So we've got the data sets ready. You need to set up some training code.
6746
12:11:01,160 --> 12:11:06,160
My challenge to you is write that training code and see if this model works.
6747
12:11:06,160 --> 12:11:09,160
But we're going to go through that over the next few videos.
6748
12:11:09,160 --> 12:11:17,160
And also, my other challenge to you is to go to the TensorFlow Playground and recreate our neural network here.
6749
12:11:17,160 --> 12:11:23,160
You can have two hidden layers. Does this go to 10? Well, it only goes to eight. We'll keep this at five.
6750
12:11:23,160 --> 12:11:29,160
So build something like this. So we've got two layers with five. It's a little bit different to ours because we've got two layers with 10.
6751
12:11:29,160 --> 12:11:36,160
And then put the learning rate to 0.01. What do we have? 0.1 with stochastic gradient descent.
6752
12:11:36,160 --> 12:11:40,160
We've been using 0.1, so we'll leave that. So this is the TensorFlow Playground.
6753
12:11:40,160 --> 12:11:49,160
And then change the activation here. Instead of linear, which we've used before, change it to relu, which is what we're using.
6754
12:11:49,160 --> 12:11:54,160
And press play here and see what happens. I'll see you in the next video.
6755
12:11:54,160 --> 12:12:00,160
Welcome back. In the last video, I left off leaving the challenge of recreating this model here.
6756
12:12:00,160 --> 12:12:06,160
It's not too difficult to do. We've got two hidden layers and five neurons. We've got our data set, which looks kind of like ours.
6757
12:12:06,160 --> 12:12:11,160
But the main points here are have to learning rate of 0.1, which is what we've been using.
6758
12:12:11,160 --> 12:12:20,160
But to change it from, we've previously used a linear activation to change it from linear to relu, which is what we've got set up here in the code.
6759
12:12:20,160 --> 12:12:25,160
Now, remember, relu is a popular and effective nonlinear activation function.
6760
12:12:25,160 --> 12:12:31,160
And we've been discussing that we need nonlinearity to model nonlinear data.
6761
12:12:31,160 --> 12:12:36,160
And so that's the crux of what neural networks are.
6762
12:12:36,160 --> 12:12:40,160
Artificial neural networks, not to get confused with the brain neural networks, but who knows?
6763
12:12:40,160 --> 12:12:44,160
This might be how they work, too. I don't know. I'm not a neurosurgeon or a neuroscientist.
6764
12:12:44,160 --> 12:12:51,160
Artificial neural networks are a large combination of linear.
6765
12:12:51,160 --> 12:13:05,160
So this is straight and non-straight nonlinear functions, which are potentially able to find patterns in data.
6766
12:13:05,160 --> 12:13:09,160
And so for our data set, it's quite small. It's just a blue and a red circle.
6767
12:13:09,160 --> 12:13:17,160
But this same principle applies for larger data sets and larger models combined linear and nonlinear functions.
6768
12:13:17,160 --> 12:13:21,160
So we've got a few tabs going on here. Let's get rid of some. Let's come back to here.
6769
12:13:21,160 --> 12:13:24,160
Did you try this out? Does it work? Do you think it'll work?
6770
12:13:24,160 --> 12:13:29,160
I don't know. Let's find out together. Ready? Three, two, one.
6771
12:13:29,160 --> 12:13:31,160
Look at that.
6772
12:13:31,160 --> 12:13:37,160
Almost instantly the training loss goes down to zero and the test loss is basically zero as well. Look at that.
6773
12:13:37,160 --> 12:13:45,160
That's amazing. We can stop that there. And if we change the learning rate, maybe a little lower, let's see what happens.
6774
12:13:45,160 --> 12:13:49,160
It takes a little bit longer to get to where it wants to go to.
6775
12:13:49,160 --> 12:13:53,160
See, that's the power of changing the learning rate. Let's make it really small. What happens here?
6776
12:13:53,160 --> 12:13:57,160
So that was about 300 epochs. The loss started to go down.
6777
12:13:57,160 --> 12:14:01,160
If we change it to be really small, oh, we're getting a little bit of a trend.
6778
12:14:01,160 --> 12:14:06,160
Is it starting to go down? We're already surpassed the epochs that we had.
6779
12:14:06,160 --> 12:14:11,160
So see how the learning rate is much smaller? That means our model is learning much slower.
6780
12:14:11,160 --> 12:14:17,160
So this is just a beautiful visual way of demonstrating different values of the learning rate.
6781
12:14:17,160 --> 12:14:21,160
We could sit here all day and it might not get much lower, but let's increase it by 10x.
6782
12:14:21,160 --> 12:14:27,160
And that was over 1,000 epochs and it's still at about 0.5, let's say.
6783
12:14:27,160 --> 12:14:31,160
Oh, we're getting better. Oh, we're going faster already.
6784
12:14:31,160 --> 12:14:37,160
So not even at 500 or so epochs, we're about 0.4.
6785
12:14:37,160 --> 12:14:41,160
That's the power of the learning rate. We'll increase it by another 10x.
6786
12:14:41,160 --> 12:14:45,160
We'll reset. Start again. Oh, would you look at that much faster this time.
6787
12:14:45,160 --> 12:14:50,160
That is beautiful. Oh, there's nothing better than watching a loss curve go down.
6788
12:14:50,160 --> 12:14:55,160
In the world of machine learning, that is. And then we reset that again.
6789
12:14:55,160 --> 12:15:01,160
And let's change it right back to what we had. And we get to 0 in basically under 100 epochs.
6790
12:15:01,160 --> 12:15:05,160
So that's the power of the learning rate, little visual representation.
6791
12:15:05,160 --> 12:15:10,160
Working on learning rates, it's time for us to build an optimizer and a loss function.
6792
12:15:10,160 --> 12:15:15,160
So that's right here. We've got our nonlinear model set up loss and optimizer.
6793
12:15:15,160 --> 12:15:19,160
You might have already done this because the code, this is code that we've written before,
6794
12:15:19,160 --> 12:15:23,160
but we're going to redo it for completeness and practice.
6795
12:15:23,160 --> 12:15:28,160
So we want a loss function. We're working with logits here and we're working with binary cross entropy.
6796
12:15:28,160 --> 12:15:30,160
So what loss do we use?
6797
12:15:30,160 --> 12:15:34,160
Binary cross entropy. Sorry, we're working with a binary classification problem.
6798
12:15:34,160 --> 12:15:38,160
Blue dots or red dots. And for the optimizer: torch.optim.
6799
12:15:38,160 --> 12:15:42,160
What are some other binary classification problems that you can think of?
6800
12:15:42,160 --> 12:15:46,160
We want model three dot parameters.
6801
12:15:46,160 --> 12:15:49,160
They're the parameters that we want to optimize this model here.
6802
12:15:49,160 --> 12:15:57,160
And we're going to set our LR to 0.1, just like we had in the TensorFlow playground.
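As a sketch, the loss function and optimizer being set up here (stochastic gradient descent is assumed, matching the TensorFlow Playground comparison):

    # Binary classification with raw logits -> BCEWithLogitsLoss
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.SGD(params=model_3.parameters(), lr=0.1)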
6803
12:15:57,160 --> 12:16:03,160
Beautiful. So some other binary classification problems I can think of would be email.
6804
12:16:03,160 --> 12:16:07,160
Spam or not spam. Credit cards:
6805
12:16:07,160 --> 12:16:12,160
So equals fraud or not fraud.
6806
12:16:12,160 --> 12:16:15,160
What else? You might have insurance claims.
6807
12:16:15,160 --> 12:16:19,160
Equals who's at fault or not at fault.
6808
12:16:19,160 --> 12:16:23,160
If someone puts in a claim speaking about a car crash, whose fault was it?
6809
12:16:23,160 --> 12:16:26,160
Was the person submitting the claim? Were they at fault?
6810
12:16:26,160 --> 12:16:30,160
Or was the person who was also mentioned in the claim? Are they not at fault?
6811
12:16:30,160 --> 12:16:34,160
So there's many more, but they're just some I can think of off the top of my head.
6812
12:16:34,160 --> 12:16:38,160
But now let's train our model with nonlinearity.
6813
12:16:38,160 --> 12:16:42,160
Oh, we're on a roll here.
6814
12:16:42,160 --> 12:16:45,160
Training a model with nonlinearity.
6815
12:16:45,160 --> 12:16:50,160
So we've seen that if we introduce a nonlinear activation function within a model,
6816
12:16:50,160 --> 12:16:56,160
remember this is a linear activation function, and if we train this, the loss doesn't go down.
6817
12:16:56,160 --> 12:17:01,160
But if we just adjust this to add a relu in here, we get the loss going down.
6818
12:17:01,160 --> 12:17:06,160
So hopefully this replicates with our pure PyTorch code.
6819
12:17:06,160 --> 12:17:09,160
So let's do it, hey?
6820
12:17:09,160 --> 12:17:12,160
So we're going to create random seeds.
6821
12:17:12,160 --> 12:17:16,160
Because we're working with CUDA, we'll introduce the CUDA random seed as well.
6822
12:17:16,160 --> 12:17:22,160
torch.manual_seed. Again, don't worry too much if the numbers on your screen aren't exactly what mine are.
6823
12:17:22,160 --> 12:17:26,160
That's due to the inherent randomness of machine learning.
6824
12:17:26,160 --> 12:17:31,160
In fact, in stochastic gradient descent, stochastic again stands for random.
6825
12:17:31,160 --> 12:17:35,160
And we're just setting up the seeds here so that they can be as close as possible.
6826
12:17:35,160 --> 12:17:38,160
But the direction is more important.
6827
12:17:38,160 --> 12:17:45,160
So if my loss goes down, your loss should also go down. Then, put the data on the target device.
6828
12:17:45,160 --> 12:17:48,160
And then we're going to go Xtrain.
6829
12:17:48,160 --> 12:17:52,160
So this is setting up device agnostic code. We've done this before.
6830
12:17:52,160 --> 12:17:55,160
But we're going to do it again for completeness.
6831
12:17:55,160 --> 12:17:59,160
Just to practice every step of the puzzle. That's what we want to do.
6832
12:17:59,160 --> 12:18:03,160
We want to have experience. That's what this course is. It's a momentum builder.
6833
12:18:03,160 --> 12:18:13,160
So that when you go to other repos and machine learning projects that use PyTorch, you can go, oh, does this code set up device-agnostic code?
6834
12:18:13,160 --> 12:18:18,160
What problem are we working on? Is it binary or multi-class classification?
6835
12:18:18,160 --> 12:18:22,160
So let's go loop through data.
6836
12:18:22,160 --> 12:18:26,160
Again, we've done this before, but we're going to set up the epochs.
6837
12:18:26,160 --> 12:18:29,160
Let's do 1000 epochs. Why not?
6838
12:18:29,160 --> 12:18:33,160
So we can go for epoch in range epochs.
6839
12:18:33,160 --> 12:18:37,160
What do we do here? Well, we want to train. So this is training code.
6840
12:18:37,160 --> 12:18:40,160
We set our model to training mode: model_3.train().
6841
12:18:40,160 --> 12:18:44,160
And I want you to start to think about how could we functionalize this training code?
6842
12:18:44,160 --> 12:18:47,160
We're going to start to move towards that in a future video.
6843
12:18:47,160 --> 12:18:51,160
So one is forward pass. We've got the logits. Why the logits?
6844
12:18:51,160 --> 12:18:57,160
Well, because the raw outputs of our model, without any activation function towards the final layer, are...
6845
12:18:57,160 --> 12:19:00,160
...classified as logits, or called logits.
6846
12:19:00,160 --> 12:19:10,160
And then we create y-pred as in prediction labels by rounding the output of torch dot sigmoid of the logits.
6847
12:19:10,160 --> 12:19:20,160
So this is going to take us from logits to prediction probabilities to prediction labels.
6848
12:19:20,160 --> 12:19:26,160
And then we can go to step two, which is calculate the loss.
6849
12:19:26,160 --> 12:19:30,160
That's from my unofficial PyTorch song. Calculate the loss.
6850
12:19:30,160 --> 12:19:35,160
We go loss equals loss_fn on y_logits and y_train.
6851
12:19:35,160 --> 12:19:48,160
Because remember, we've got BCEWithLogitsLoss, and it takes in logits as the first input.
6852
12:19:48,160 --> 12:19:53,160
And that's going to calculate the loss between our model's logits and the y training labels.
6853
12:19:53,160 --> 12:19:58,160
And we will go here, we'll calculate accuracy using our accuracy function.
6854
12:19:58,160 --> 12:20:06,160
And this one is a little bit backwards compared to pytorch, but we pass in the y training labels first.
6855
12:20:06,160 --> 12:20:12,160
But it's constructed this way because it's in the same style as scikit-learn.
6856
12:20:12,160 --> 12:20:21,160
Three, we go optimizer zero grad. We zero the gradients of the optimizer so that it can start from fresh.
6857
12:20:21,160 --> 12:20:25,160
Calculating the ideal gradients every epoch.
6858
12:20:25,160 --> 12:20:28,160
So it's going to reset every epoch, which is fine.
6859
12:20:28,160 --> 12:20:34,160
Then we're going to perform backpropagation. PyTorch is going to take care of that for us when we call loss.backward().
6860
12:20:34,160 --> 12:20:42,160
And then we will perform gradient descent. So step the optimizer to see how we should improve our model parameters.
6861
12:20:42,160 --> 12:20:45,160
So optimizer dot step.
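Here's a sketch of the training steps described so far, assuming an accuracy_fn helper and the X_train/y_train/X_test/y_test tensors from above; the seed value of 42 is an assumption. The testing step that goes inside the same loop is sketched a little further down.

    torch.manual_seed(42)
    torch.cuda.manual_seed(42)

    # Put the data on the target device
    X_train, y_train = X_train.to(device), y_train.to(device)
    X_test, y_test = X_test.to(device), y_test.to(device)

    epochs = 1000
    for epoch in range(epochs):
        model_3.train()
        # 1. Forward pass: raw logits, then logits -> probabilities -> labels
        y_logits = model_3(X_train).squeeze()
        y_pred = torch.round(torch.sigmoid(y_logits))
        # 2. Calculate loss (BCEWithLogitsLoss expects logits) and accuracy
        loss = loss_fn(y_logits, y_train)
        acc = accuracy_fn(y_true=y_train, y_pred=y_pred)
        # 3. Zero the optimizer's gradients
        optimizer.zero_grad()
        # 4. Backpropagation
        loss.backward()
        # 5. Gradient descent step
        optimizer.step()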
6862
12:20:45,160 --> 12:20:50,160
Oh, and I want to show you, speaking of model parameters: let's check our model_3.state_dict().
6863
12:20:50,160 --> 12:20:56,160
So the relu activation function actually doesn't have any parameters.
6864
12:20:56,160 --> 12:21:02,160
So you'll notice here, we've got a weight and a bias for layer one, layer two, and layer three.
6865
12:21:02,160 --> 12:21:09,160
So the relu function here doesn't have any parameters to optimize. If we go nn dot relu.
6866
12:21:09,160 --> 12:21:12,160
Does it say what it implements? There we go.
6867
12:21:12,160 --> 12:21:19,160
So it's just the maximum of zero or x. So it takes the input and takes the max of zero or x.
6868
12:21:19,160 --> 12:21:26,160
And so when it takes the max of zero or x, if it's a negative number, zero is going to be higher than a negative number.
6869
12:21:26,160 --> 12:21:29,160
So that's why it zeroes all of the negative inputs.
6870
12:21:29,160 --> 12:21:38,160
And then it leaves the positive inputs how they are because the max of a positive input versus zero is the positive input.
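A tiny illustration of that max(0, x) behaviour:

    torch.relu(torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0]))
    # -> tensor([0., 0., 0., 1., 3.])  negatives become zero, positives pass through unchanged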
6871
12:21:38,160 --> 12:21:43,160
So this has no parameters to optimize. That's part of why it's so effective, because, think about it:
6872
12:21:43,160 --> 12:21:47,160
Every parameter in our model needs some little bit of computation to adjust.
6873
12:21:47,160 --> 12:21:51,160
And so the more parameters we add to our model, the more compute that is required.
6874
12:21:51,160 --> 12:22:00,160
So generally, the kind of trade-off in machine learning is that, yes, more parameters have more of an ability to learn, but you need more compute.
6875
12:22:00,160 --> 12:22:07,160
So let's go model_3.eval(). And we're going to go with torch.inference_mode.
6876
12:22:07,160 --> 12:22:13,160
If I could spell inference, that'd be fantastic. We're going to do what? We're going to do the forward pass.
6877
12:22:13,160 --> 12:22:17,160
So test logits equals model three on the test data.
6878
12:22:17,160 --> 12:22:29,160
And then we're going to calculate the test pred labels by calling torch dot round on torch dot sigmoid on the test logits.
6879
12:22:29,160 --> 12:22:33,160
And then we can calculate the test loss. How do we do that?
6880
12:22:33,160 --> 12:22:43,160
And then we can also calculate the test accuracy. I'm just going to give myself some more space here.
6881
12:22:43,160 --> 12:22:54,160
So I can code in the middle of the screen. test_acc equals our accuracy function, where we're going to pass in y_true equals y_test.
6882
12:22:54,160 --> 12:23:04,160
And then we will pass in y_pred equals test_pred.
6883
12:23:04,160 --> 12:23:08,160
Beautiful. A final step here is to print out what's happening.
6884
12:23:08,160 --> 12:23:13,160
Now, this will be very important because one, it's fun to know what your model is doing.
6885
12:23:13,160 --> 12:23:20,160
And two, if our model does actually learn, I'd like to see the loss values go down and the accuracy values go up.
6886
12:23:20,160 --> 12:23:30,160
As I said, there's nothing much more beautiful in the world of machine learning than watching a loss function go down or a loss value go down and watching a loss curve go down.
6887
12:23:30,160 --> 12:23:36,160
So let's print out the current epoch and then we'll print out the loss, which will just be the training loss.
6888
12:23:36,160 --> 12:23:41,160
And we'll take that to four decimal places. And then we'll go accuracy here.
6889
12:23:41,160 --> 12:23:54,160
And this will be acc, and we'll take this to two decimal places and put a little percentage sign there. And then we'll break it up by putting in the test loss here.
6890
12:23:54,160 --> 12:24:03,160
Because remember our model learns patterns on the training data set and then evaluates those patterns on the test data set.
6891
12:24:03,160 --> 12:24:14,160
And we'll pass in test_acc here. No doubt there might be an error or two within all of this code, but we're going to try and run this, because we've seen this code before. I think we're ready.
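For reference, a sketch of the testing step and printout described above, indented to sit inside the same `for epoch in range(epochs)` loop from the earlier sketch; printing every 100 epochs is an assumption:

        # Testing step
        model_3.eval()
        with torch.inference_mode():
            test_logits = model_3(X_test).squeeze()  # squeeze to match y_test's shape
            test_pred = torch.round(torch.sigmoid(test_logits))
            test_loss = loss_fn(test_logits, y_test)
            test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

        # Print out what's happening
        if epoch % 100 == 0:
            print(f"Epoch: {epoch} | Loss: {loss:.4f}, Acc: {acc:.2f}% | "
                  f"Test loss: {test_loss:.4f}, Test acc: {test_acc:.2f}%")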
6892
12:24:14,160 --> 12:24:19,160
We're training our first model here with non-linearities built into the model.
6893
12:24:19,160 --> 12:24:24,160
You ready? Three, two, one, let's go.
6894
12:24:24,160 --> 12:24:36,160
Oh, of course. Module torch.cuda has no attribute... ah, that's just a typo. It should be manual_seed. Man-u-al.
6895
12:24:36,160 --> 12:24:38,160
There we go. Have to sound that out.
6896
12:24:38,160 --> 12:24:46,160
Another one. What do we get wrong here? Oh, target size must be same as input size. Where did it mess up here?
6897
12:24:46,160 --> 12:24:55,160
What do we get wrong? Test loss, test logits on Y test. Hmm.
6898
12:24:55,160 --> 12:25:05,160
So these two aren't matching up. Model three X test and Y test. What's the size of?
6899
12:25:05,160 --> 12:25:10,160
So let's do some troubleshooting on the fly. Hey, not everything always works out as you want.
6900
12:25:10,160 --> 12:25:18,160
So length of X test, we've got a shape issue here. Remember how I said one of the most common issues in deep learning is a shape issue?
6901
12:25:18,160 --> 12:25:21,160
We've got the same shape here.
6902
12:25:21,160 --> 12:25:33,160
Let's check test logits dot shape and Y test dot shape. We'll print this out.
6903
12:25:33,160 --> 12:25:42,160
So 200. Oh, here's what we have to do. That's what we missed dot squeeze. Oh, see how I've been hinting at the fact that we needed to call dot squeeze.
6904
12:25:42,160 --> 12:25:48,160
So this is where the discrepancy is. Our test logits dot shape. We've got an extra dimension here.
6905
12:25:48,160 --> 12:25:53,160
And what are we getting here? A value error on the target size, which is a shape mismatch.
6906
12:25:53,160 --> 12:25:59,160
So we've got: target size torch.Size([200]) must be the same as input size torch.Size([200, 1]).
6907
12:25:59,160 --> 12:26:04,160
So did we squeeze this? Oh, that's why the training worked. Okay, so we've missed this.
6908
12:26:04,160 --> 12:26:12,160
Let's just get rid of this. So we're getting rid of the extra one dimension by using squeeze, which is the one dimension here.
6909
12:26:12,160 --> 12:26:19,160
We should have everything lined up. There we go. Okay. Look at that. Yes.
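A quick way to see the shape mismatch that caused the error, and the squeeze that fixes it:

    print(model_3(X_test).shape)            # torch.Size([200, 1]) -- extra trailing dimension
    print(model_3(X_test).squeeze().shape)  # torch.Size([200])    -- now matches y_test.shape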
6910
12:26:19,160 --> 12:26:24,160
Now accuracy has gone up, albeit not by too much. It's still not perfect.
6911
12:26:24,160 --> 12:26:32,160
So really we'd like this to be towards 100% lost to be lower. But I feel like we've got a better performing model. Don't you?
6912
12:26:32,160 --> 12:26:38,160
Now that is the power of non linearity. All we did was we added in a relu layer or just two of them.
6913
12:26:38,160 --> 12:26:51,160
ReLU here, ReLU here. But what did we do? We gave our model the power of straight lines and non-straight lines, of linear and nonlinear functions.
6914
12:26:51,160 --> 12:26:56,160
So it can potentially draw a line to separate these circles.
6915
12:26:56,160 --> 12:27:05,160
So in the next video, let's draw a line, plot our model decision boundary using our function and see if it really did learn anything.
6916
12:27:05,160 --> 12:27:08,160
I'll see you there.
6917
12:27:08,160 --> 12:27:14,160
Welcome back. In the last video, we trained our first model, and as you can tell, I've got the biggest smile on my face,
6918
12:27:14,160 --> 12:27:22,160
but we trained our first model that harnesses both the power of straight lines and non straight lines or linear functions and non linear functions.
6919
12:27:22,160 --> 12:27:29,160
And by the 1000th epoch, we look like we're getting a bit better results than just pure guessing, which is 50%.
6920
12:27:29,160 --> 12:27:36,160
Because we have 500 samples of red dots and 500 samples of blue dots. So we have evenly balanced classes.
6921
12:27:36,160 --> 12:27:46,160
Now, we've seen that if we added a relu activation function with a data set similar to ours with a TensorFlow playground, the model starts to fit.
6922
12:27:46,160 --> 12:27:51,160
But it doesn't work with just linear. There's a few other activation functions that you could play around with here.
6923
12:27:51,160 --> 12:27:57,160
You could play around with the learning rate, regularization. If you're not sure what that is, I'll leave that as extra curriculum to look up.
6924
12:27:57,160 --> 12:28:03,160
But we're going to retire the TensorFlow Playground for now because we're going to go back to writing code.
6925
12:28:03,160 --> 12:28:09,160
So let's get out of that. Let's get out of that. We now have to evaluate our model because right now it's just numbers on a page.
6926
12:28:09,160 --> 12:28:18,160
So let's write down here 6.4. What do we like to do to evaluate things? It's visualize, visualize, visualize.
6927
12:28:18,160 --> 12:28:25,160
So evaluating a model trained with nonlinear activation functions.
6928
12:28:25,160 --> 12:28:34,160
And we also discussed the point that neural networks are really just a big combination of linear and nonlinear functions trying to draw patterns in data.
6929
12:28:34,160 --> 12:28:41,160
So with that being said, let's make some predictions with our Model 3, our most recently trained model.
6930
12:28:41,160 --> 12:28:46,160
We'll put it into eval mode and then we'll set up inference mode.
6931
12:28:46,160 --> 12:28:55,160
And then we'll go y_preds equals torch.round and then torch.sigmoid.
6932
12:28:55,160 --> 12:29:00,160
We could functionalize this, of course, Model 3 and then pass in X test.
6933
12:29:00,160 --> 12:29:06,160
And you know what? We're going to squeeze these here because we ran into some troubles in the previous video.
6934
12:29:06,160 --> 12:29:14,160
I actually really liked that we did because then we got to troubleshoot a shape error on the fly because that's one of the most common issues you're going to come across in deep learning.
6935
12:29:14,160 --> 12:29:21,160
So y_preds, let's check them out, and then let's check out y_test.
6936
12:29:21,160 --> 12:29:23,160
We want the first 10 of y_test.
6937
12:29:23,160 --> 12:29:31,160
So remember, when we're evaluating predictions, we want them to be in the same format as our original labels.
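A sketch of the prediction cell being described, going logits to probabilities to labels and squeezing so the format matches y_test:

    model_3.eval()
    with torch.inference_mode():
        y_preds = torch.round(torch.sigmoid(model_3(X_test))).squeeze()
    y_preds[:10], y_test[:10]  # compare predictions to labels, apples to apples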
6938
12:29:31,160 --> 12:29:33,160
We want to compare apples to apples.
6939
12:29:33,160 --> 12:29:36,160
And if we compare the format here, do these two things look the same?
6940
12:29:36,160 --> 12:29:39,160
Yes, they do. They're both on CUDA and they're both floats.
6941
12:29:39,160 --> 12:29:42,160
We can see that it's got this one wrong.
6942
12:29:42,160 --> 12:29:47,160
Whereas the other ones look pretty good. Hmm, this might look pretty good if we visualize it.
6943
12:29:47,160 --> 12:29:53,160
So now let's, you might have already done this because I issued the challenge of plotting the decision boundaries.
6944
12:29:53,160 --> 12:30:08,160
Plot decision boundaries. Let's go plt.figure, and we're going to set up the figsize to equal (12, 6), because, again, one of the advantages of hosting a machine learning cooking show is that you can code ahead of time.
6945
12:30:08,160 --> 12:30:13,160
And then we can go plt.title is "Train".
6946
12:30:13,160 --> 12:30:18,160
And then we're going to call our plot decision boundary function, which we've seen before.
6947
12:30:18,160 --> 12:30:20,160
Plot decision boundary.
6948
12:30:20,160 --> 12:30:22,160
And we're going to pass this one in.
6949
12:30:22,160 --> 12:30:28,160
We could do model three, but we could also pass in one of our older models, model one, which doesn't use nonlinearity.
6950
12:30:28,160 --> 12:30:31,160
In fact, I reckon that'll be a great comparison.
6951
12:30:31,160 --> 12:30:39,160
So we'll also create another plot here for the test data and this will be on index number two.
6952
12:30:39,160 --> 12:30:46,160
So remember, subplot is a number of rows, number of columns, index where the plot appears.
6953
12:30:46,160 --> 12:30:48,160
We'll give this one a title.
6954
12:30:48,160 --> 12:30:50,160
Plot dot title.
6955
12:30:50,160 --> 12:30:53,160
This will be "Test". And, Google Colab...
6956
12:30:53,160 --> 12:30:54,160
I didn't want that.
6957
12:30:54,160 --> 12:31:00,160
As I said, this course is also a battle between me and Google Colab's autocorrect.
6958
12:31:00,160 --> 12:31:04,160
So we're going model three and we'll pass in the test data here.
6959
12:31:04,160 --> 12:31:10,160
And behind the scenes, our plot decision boundary function will create a beautiful graphic for us,
6960
12:31:10,160 --> 12:31:17,160
perform some predictions on the X, the features input, and then we'll compare them with the Y values.
6961
12:31:17,160 --> 12:31:19,160
Let's see what's going on here.
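A sketch of the comparison plot being built here, assuming the plot_decision_boundary helper from helper_functions.py and model_1 being the earlier linear-only model:

    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)                               # rows, columns, index
    plt.title("Train")
    plot_decision_boundary(model_1, X_train, y_train)  # no nonlinearity
    plt.subplot(1, 2, 2)
    plt.title("Test")
    plot_decision_boundary(model_3, X_test, y_test)    # with nonlinearity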
6962
12:31:19,160 --> 12:31:21,160
Oh, look at that.
6963
12:31:21,160 --> 12:31:25,160
Yes, our first nonlinear model.
6964
12:31:25,160 --> 12:31:29,160
Okay, it's not perfect, but it is certainly much better than the models that we had before.
6965
12:31:29,160 --> 12:31:30,160
Look at this.
6966
12:31:30,160 --> 12:31:32,160
Model one has no nonlinearity.
6967
12:31:32,160 --> 12:31:35,160
Model one equals no nonlinearity.
6968
12:31:35,160 --> 12:31:38,160
I've got double negative there.
6969
12:31:38,160 --> 12:31:44,160
Whereas model three equals has nonlinearity.
6970
12:31:44,160 --> 12:31:51,160
So do you see the power of nonlinearity or better yet the power of linearity or linear straight lines with non straight lines?
6971
12:31:51,160 --> 12:31:55,160
So I feel like we could do better than this, though.
6972
12:31:55,160 --> 12:32:02,160
Here's your challenge: can you improve model three to do better?
6973
12:32:02,160 --> 12:32:05,160
What did we get?
6974
12:32:05,160 --> 12:32:13,160
About 79% accuracy. So: do better than 80% accuracy on the test data.
6975
12:32:13,160 --> 12:32:15,160
I think you can.
6976
12:32:15,160 --> 12:32:16,160
So that's the challenge.
6977
12:32:16,160 --> 12:32:20,160
And if you're looking for hints on how to do so, where can you look?
6978
12:32:20,160 --> 12:32:22,160
Well, we've covered this improving a model.
6979
12:32:22,160 --> 12:32:26,160
So maybe you add some more layers, maybe you add more hidden units.
6980
12:32:26,160 --> 12:32:27,160
Maybe you fit for longer.
6981
12:32:27,160 --> 12:32:32,160
Maybe, if you add more layers, you put a ReLU activation function on top of those as well.
6982
12:32:32,160 --> 12:32:36,160
Maybe you lower the learning rate because right now we've got 0.1.
6983
12:32:36,160 --> 12:32:39,160
So give this a shot, try and improve it.
6984
12:32:39,160 --> 12:32:40,160
I think you can do it.
6985
12:32:40,160 --> 12:32:42,160
But we're going to push forward.
6986
12:32:42,160 --> 12:32:45,160
That's going to be your challenge for some extra curriculum.
6987
12:32:45,160 --> 12:32:51,160
I think in the next section, we've seen our nonlinear activation functions in action.
6988
12:32:51,160 --> 12:32:54,160
Let's write some code to replicate them.
6989
12:32:54,160 --> 12:32:57,160
I'll see you there.
6990
12:32:57,160 --> 12:32:58,160
Welcome back.
6991
12:32:58,160 --> 12:33:02,160
In the last video, I left off with the challenge of improving model three to do better than
6992
12:33:02,160 --> 12:33:04,160
80% accuracy on the test data.
6993
12:33:04,160 --> 12:33:06,160
I hope you gave it a shot.
6994
12:33:06,160 --> 12:33:08,160
But here are some of the things I would have done.
6995
12:33:08,160 --> 12:33:12,160
I'd potentially add more layers, maybe increase the number of hidden units,
6996
12:33:12,160 --> 12:33:17,160
and then, if we needed to, fit for longer, and maybe lower the learning rate to 0.01.
6997
12:33:17,160 --> 12:33:22,160
But I'll leave that for you to explore because that's the motto of the data scientists, right?
6998
12:33:22,160 --> 12:33:25,160
Is to experiment, experiment, experiment.
6999
12:33:25,160 --> 12:33:27,160
So let's go in here.
7000
12:33:27,160 --> 12:33:31,160
We've seen our nonlinear activation functions in practice.
7001
12:33:31,160 --> 12:33:33,160
Let's replicate them.
7002
12:33:33,160 --> 12:33:38,160
So replicating nonlinear activation functions.
7003
12:33:38,160 --> 12:33:46,160
And remember neural networks rather than us telling the model what to learn.
7004
12:33:46,160 --> 12:33:52,160
We give it the tools to discover patterns in data.
7005
12:33:52,160 --> 12:34:01,160
And it tries to figure out the best patterns on its own.
7006
12:34:01,160 --> 12:34:04,160
And what are these tools?
7007
12:34:04,160 --> 12:34:06,160
That's right down here.
7008
12:34:06,160 --> 12:34:07,160
We've seen this in action.
7009
12:34:07,160 --> 12:34:13,160
And these tools are linear and nonlinear functions.
7010
12:34:13,160 --> 12:34:18,160
So a neural network is a big stack of linear and nonlinear functions.
7011
12:34:18,160 --> 12:34:21,160
For us, we've only got about four layers or so, four or five layers.
7012
12:34:21,160 --> 12:34:24,160
But as I said, other networks can get much larger.
7013
12:34:24,160 --> 12:34:26,160
But the premise remains.
7014
12:34:26,160 --> 12:34:30,160
Some form of linear and nonlinear manipulation of the data.
7015
12:34:30,160 --> 12:34:32,160
So let's get out of this.
7016
12:34:32,160 --> 12:34:36,160
Let's make our workspace a little bit more cleaner.
7017
12:34:36,160 --> 12:34:38,160
Replicating nonlinear activation functions.
7018
12:34:38,160 --> 12:34:40,160
So let's create a tensor to start with.
7019
12:34:40,160 --> 12:34:43,160
Everything starts from the tensor.
7020
12:34:43,160 --> 12:34:47,160
And we'll go A equals torch.arange.
7021
12:34:47,160 --> 12:34:52,160
And we're going to create a range from negative 10 to 10 with a step of one.
7022
12:34:52,160 --> 12:34:57,160
And we can set the dtype here to equal torch.float32.
7023
12:34:57,160 --> 12:34:59,160
But we don't actually need to.
7024
12:34:59,160 --> 12:35:00,160
That's going to be the default.
7025
12:35:00,160 --> 12:35:06,160
So if we set A here, A dot D type.
7026
12:35:06,160 --> 12:35:10,160
Then we've got torch float 32, and I'm pretty sure if we get rid of that...
7027
12:35:10,160 --> 12:35:13,160
Oh, we've got torch int 64.
7028
12:35:13,160 --> 12:35:15,160
Why is that happening?
7029
12:35:15,160 --> 12:35:18,160
Well, let's check out A.
7030
12:35:18,160 --> 12:35:23,160
Oh, it's because we've got integers as our values because we have a step as one.
7031
12:35:23,160 --> 12:35:26,160
If we turn this into a float, what's going to happen?
7032
12:35:26,160 --> 12:35:28,160
We get float 32.
7033
12:35:28,160 --> 12:35:29,160
But we'll keep it.
7034
12:35:29,160 --> 12:35:31,160
Otherwise, this is going to be what?
7035
12:35:31,160 --> 12:35:32,160
About a hundred numbers?
7036
12:35:32,160 --> 12:35:33,160
Yeah, no, that's too many.
7037
12:35:33,160 --> 12:35:41,160
Let's keep it at negative 10 to 10 and we'll set the D type here to torch float 32.
7038
12:35:41,160 --> 12:35:42,160
Beautiful.
7039
12:35:42,160 --> 12:35:46,160
So it looks like PyTorch's default data type for integers is int 64.
7040
12:35:46,160 --> 12:35:52,160
But we're going to work with float 32 because, if our data wasn't float 32 with
7041
12:35:52,160 --> 12:35:56,160
the functions we're about to create, we might run into some errors.
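In code, the tensor setup described here looks something like this, a minimal sketch assuming only that PyTorch is installed:

import torch

# Create a range of values from -10 to 9 (step 1); without an explicit dtype this would be int64
A = torch.arange(-10, 10, 1, dtype=torch.float32)
print(A.dtype)  # torch.float32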
7042
12:35:56,160 --> 12:35:59,160
So let's visualize this data.
7043
12:35:59,160 --> 12:36:04,160
I want you to guess, is this a straight line or non-straight line?
7044
12:36:04,160 --> 12:36:05,160
You've got three seconds.
7045
12:36:05,160 --> 12:36:09,160
One, two, three.
7046
12:36:09,160 --> 12:36:11,160
Straight line.
7047
12:36:11,160 --> 12:36:12,160
There we go.
7048
12:36:12,160 --> 12:36:16,160
We've got negative 10 to positive 10 up here or nine.
7049
12:36:16,160 --> 12:36:17,160
Close enough.
7050
12:36:17,160 --> 12:36:19,160
And so how would we turn this straight line?
7051
12:36:19,160 --> 12:36:22,160
If it's a straight line, it's linear.
7052
12:36:22,160 --> 12:36:26,160
How would we perform the relu activation function on this?
7053
12:36:26,160 --> 12:36:31,160
Now, we could of course call torch relu on A.
7054
12:36:31,160 --> 12:36:34,160
Actually, let's in fact just plot this.
7055
12:36:34,160 --> 12:36:37,160
PLT dot plot on torch relu.
7056
12:36:37,160 --> 12:36:39,160
What does this look like?
7057
12:36:39,160 --> 12:36:40,160
Boom, there we go.
7058
12:36:40,160 --> 12:36:42,160
But we want to replicate the relu function.
7059
12:36:42,160 --> 12:36:44,160
So let's go nn dot relu.
7060
12:36:44,160 --> 12:36:46,160
What does it do?
7061
12:36:46,160 --> 12:36:48,160
We've seen this before.
7062
12:36:48,160 --> 12:36:49,160
So we need the max.
7063
12:36:49,160 --> 12:36:51,160
We need to return based on an input.
7064
12:36:51,160 --> 12:36:54,160
We need the max of zero and x.
7065
12:36:54,160 --> 12:36:56,160
So let's give it a shot.
7066
12:36:56,160 --> 12:36:58,160
We'll come here.
7067
12:36:58,160 --> 12:37:00,160
Again, we need more space.
7068
12:37:00,160 --> 12:37:02,160
There can never be enough code space here.
7069
12:37:02,160 --> 12:37:03,160
I like writing lots of code.
7070
12:37:03,160 --> 12:37:04,160
I don't know about you.
7071
12:37:04,160 --> 12:37:06,160
But let's go relu.
7072
12:37:06,160 --> 12:37:09,160
We'll take an input x, which will be some form of tensor.
7073
12:37:09,160 --> 12:37:13,160
And we'll go return torch dot maximum.
7074
12:37:13,160 --> 12:37:15,160
I think you could just do torch dot max.
7075
12:37:15,160 --> 12:37:17,160
But we'll try maximum.
7076
12:37:17,160 --> 12:37:21,160
Torch dot tensor zero.
7077
12:37:21,160 --> 12:37:26,160
So the maximum is going to return the max between whatever this is.
7078
12:37:26,160 --> 12:37:29,160
One option and whatever the other option is.
7079
12:37:29,160 --> 12:37:34,160
So inputs must be tensors.
7080
12:37:34,160 --> 12:37:40,160
So maybe we could just give a type hint here that this is torch dot tensor.
7081
12:37:40,160 --> 12:37:42,160
And this should return a tensor too.
7082
12:37:42,160 --> 12:37:44,160
Return torch dot tensor.
7083
12:37:44,160 --> 12:37:45,160
Beautiful.
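So the custom ReLU ends up as a tiny function, a sketch assuming the tensor A from earlier:

def relu(x: torch.Tensor) -> torch.Tensor:
    # ReLU: take the elementwise maximum of 0 and the input,
    # so negatives become 0 and positives pass through unchanged
    return torch.maximum(torch.tensor(0), x)

relu(A)  # compare with torch.relu(A)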
7084
12:37:45,160 --> 12:37:47,160
You're ready to try it out.
7085
12:37:47,160 --> 12:37:49,160
Let's see what our relu function does.
7086
12:37:49,160 --> 12:37:51,160
Relu A.
7087
12:37:51,160 --> 12:37:52,160
Wonderful.
7088
12:37:52,160 --> 12:37:55,160
It looks like we got quite a similar output to before.
7089
12:37:55,160 --> 12:37:57,160
Here's our original A.
7090
12:37:57,160 --> 12:37:59,160
So we've got negative numbers.
7091
12:37:59,160 --> 12:38:00,160
There we go.
7092
12:38:00,160 --> 12:38:05,160
So recall that the relu activation function turns all negative numbers into zero
7093
12:38:05,160 --> 12:38:08,160
because it takes the maximum between zero and the input.
7094
12:38:08,160 --> 12:38:11,160
And if the input's negative, well then zero is bigger than it.
7095
12:38:11,160 --> 12:38:15,160
And it leaves all of the positive values as they are.
7096
12:38:15,160 --> 12:38:17,160
So that's the beauty of relu.
7097
12:38:17,160 --> 12:38:20,160
Quite simple, but very effective.
7098
12:38:20,160 --> 12:38:25,160
So let's plot relu activation function.
7099
12:38:25,160 --> 12:38:26,160
Our custom one.
7100
12:38:26,160 --> 12:38:29,160
We will go PLT dot plot.
7101
12:38:29,160 --> 12:38:33,160
We'll call our relu function on A.
7102
12:38:33,160 --> 12:38:36,160
Let's see what this looks like.
7103
12:38:36,160 --> 12:38:38,160
Oh, look at us go.
7104
12:38:38,160 --> 12:38:39,160
Well done.
7105
12:38:39,160 --> 12:38:42,160
Just the exact same as the torch relu function.
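And the two plots side by side, assuming A, relu and matplotlib from the earlier cells:

import matplotlib.pyplot as plt

plt.plot(torch.relu(A))  # PyTorch's built-in ReLU
plt.plot(relu(A))        # our replication, same shape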
7106
12:38:42,160 --> 12:38:43,160
Easy as that.
7107
12:38:43,160 --> 12:38:48,160
And what's another nonlinear activation function that we've used before?
7108
12:38:48,160 --> 12:38:54,160
Well, I believe one of them is if we go down to here, what did we say before?
7109
12:38:54,160 --> 12:38:55,160
Sigmoid.
7110
12:38:55,160 --> 12:38:56,160
Where is that?
7111
12:38:56,160 --> 12:38:57,160
Where are you, Sigmoid?
7112
12:38:57,160 --> 12:38:58,160
Here we go.
7113
12:38:58,160 --> 12:38:59,160
Hello, Sigmoid.
7114
12:38:59,160 --> 12:39:02,160
Oh, this has got a little bit more going on here.
7115
12:39:02,160 --> 12:39:06,160
One over one plus exponential of negative x.
7116
12:39:06,160 --> 12:39:12,160
So Sigmoid or this little symbol for Sigmoid of x, which is an input.
7117
12:39:12,160 --> 12:39:13,160
We get this.
7118
12:39:13,160 --> 12:39:15,160
So let's try and replicate this.
7119
12:39:15,160 --> 12:39:18,160
I might just bring this one in here.
7120
12:39:18,160 --> 12:39:23,160
Right now, let's do the same for Sigmoid.
7121
12:39:23,160 --> 12:39:26,160
So what do we have here?
7122
12:39:26,160 --> 12:39:29,160
Well, we want to create a custom Sigmoid.
7123
12:39:29,160 --> 12:39:32,160
And we want to have some sort of input, x.
7124
12:39:32,160 --> 12:39:39,160
And we want to return one divided by, do we have the function in Sigmoid?
7125
12:39:39,160 --> 12:39:43,160
One divided by one plus exponential.
7126
12:39:43,160 --> 12:39:50,160
One plus torch dot exp for exponential on negative x.
7127
12:39:50,160 --> 12:39:55,160
And we might put the bottom side in brackets so that it does that operation.
7128
12:39:55,160 --> 12:39:58,160
I reckon that looks all right to me.
7129
12:39:58,160 --> 12:40:04,160
So one divided by one plus torch exponential of negative x.
7130
12:40:04,160 --> 12:40:05,160
Do we have that?
7131
12:40:05,160 --> 12:40:06,160
Yes, we do.
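As a sketch, the custom sigmoid is just the formula written out in torch code:

def sigmoid(x: torch.Tensor) -> torch.Tensor:
    # sigmoid(x) = 1 / (1 + e^(-x)); brackets keep the denominator together
    return 1 / (1 + torch.exp(-x))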
7132
12:40:06,160 --> 12:40:08,160
Well, there's only one real way to find out.
7133
12:40:08,160 --> 12:40:11,160
Let's plot the torch version of Sigmoid.
7134
12:40:11,160 --> 12:40:14,160
Torch dot Sigmoid and we'll pass in x.
7135
12:40:14,160 --> 12:40:16,160
See what happens.
7136
12:40:16,160 --> 12:40:19,160
And then, oh, we have a.
7137
12:40:19,160 --> 12:40:20,160
My bad.
7138
12:40:20,160 --> 12:40:21,160
A is our tensor.
7139
12:40:21,160 --> 12:40:22,160
What do we get?
7140
12:40:22,160 --> 12:40:24,160
We get a curved line.
7141
12:40:24,160 --> 12:40:25,160
Wonderful.
7142
12:40:25,160 --> 12:40:27,160
And then we go plt dot plot.
7143
12:40:27,160 --> 12:40:30,160
And we're going to use our Sigmoid function on a.
7144
12:40:30,160 --> 12:40:33,160
Did we replicate torch's Sigmoid function?
7145
12:40:33,160 --> 12:40:35,160
Yes, we did.
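In code, the comparison we just plotted looks roughly like this (assuming A and sigmoid from above):

plt.plot(torch.sigmoid(A))  # PyTorch's built-in sigmoid
plt.plot(sigmoid(A))        # our replication, same S-shaped curve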
7146
12:40:35,160 --> 12:40:38,160
Ooh, now.
7147
12:40:38,160 --> 12:40:41,160
See, this is what's happening behind the scenes with our neural networks.
7148
12:40:41,160 --> 12:40:45,160
Of course, you could do more complicated activation functions or layers and whatnot.
7149
12:40:45,160 --> 12:40:47,160
And you can try to replicate them.
7150
12:40:47,160 --> 12:40:49,160
In fact, that's a great exercise to try and do.
7151
12:40:49,160 --> 12:40:54,160
But essentially, across the videos and the sections that we've done, we've replicated our linear layer.
7152
12:40:54,160 --> 12:40:56,160
And we've replicated the relu.
7153
12:40:56,160 --> 12:41:00,160
So we've actually built this model from scratch, or we could if we really wanted to.
7154
12:41:00,160 --> 12:41:06,160
But it's a lot easier to use PyTorch's layers because we're building neural networks here like Lego bricks,
7155
12:41:06,160 --> 12:41:08,160
stacking together these layers in some way, shape, or form.
7156
12:41:08,160 --> 12:41:14,160
And because they're a part of PyTorch, we know that they've been error-tested and they compute as fast as possible
7157
12:41:14,160 --> 12:41:18,160
behind the scenes and use GPU and get a whole bunch of benefits.
7158
12:41:18,160 --> 12:41:23,160
PyTorch offers a lot of benefits by using these layers rather than writing them ourselves.
7159
12:41:23,160 --> 12:41:25,160
And so this is what our model is doing.
7160
12:41:25,160 --> 12:41:31,160
Literally, to learn these values and decrease the loss function and increase the accuracy,
7161
12:41:31,160 --> 12:41:37,160
It's combining linear layers and nonlinear layers or nonlinear functions.
7162
12:41:37,160 --> 12:41:39,160
Where's our relu function here?
7163
12:41:39,160 --> 12:41:41,160
A relu function like this behind the scenes.
7164
12:41:41,160 --> 12:41:47,160
So just combining linear and nonlinear functions to fit a data set.
7165
12:41:47,160 --> 12:41:54,160
And that premise remains even on our small data set and on very large data sets and very large models.
7166
12:41:54,160 --> 12:41:58,160
So with that being said, I think it's time for us to push on.
7167
12:41:58,160 --> 12:42:00,160
We've covered a fair bit of code here.
7168
12:42:00,160 --> 12:42:04,160
But we've worked on a binary classification problem.
7169
12:42:04,160 --> 12:42:08,160
Have we worked on a multi-class classification problem yet?
7170
12:42:08,160 --> 12:42:11,160
Do we have that here? Where's my fun graphic?
7171
12:42:11,160 --> 12:42:15,160
We have multi-class classification.
7172
12:42:15,160 --> 12:42:17,160
I think that's what we cover next.
7173
12:42:17,160 --> 12:42:22,160
We're going to put together all of the steps in our workflow that we've covered for binary classification.
7174
12:42:22,160 --> 12:42:26,160
But now let's move on to a multi-class classification problem.
7175
12:42:26,160 --> 12:42:32,160
If you're with me, I'll see you in the next video.
7176
12:42:32,160 --> 12:42:33,160
Welcome back.
7177
12:42:33,160 --> 12:42:37,160
In the last few videos we've been harnessing the power of nonlinearity.
7178
12:42:37,160 --> 12:42:41,160
Specifically non-straight line functions and we replicated some here.
7179
12:42:41,160 --> 12:42:49,160
And we learned that a neural network combines linear and nonlinear functions to find patterns in data.
7180
12:42:49,160 --> 12:42:53,160
And for our simple red versus blue dots, once we added a little bit of nonlinearity,
7181
12:42:53,160 --> 12:42:58,160
we found the secret sauce to start separating our blue and red dots.
7182
12:42:58,160 --> 12:43:02,160
And I also issued you the challenge to try and improve this and I think you can do it.
7183
12:43:02,160 --> 12:43:04,160
So hopefully you've given that a go.
7184
12:43:04,160 --> 12:43:06,160
But now let's keep pushing forward.
7185
12:43:06,160 --> 12:43:09,160
We're going to reiterate over basically everything that we've done,
7186
12:43:09,160 --> 12:43:15,160
except this time from the point of view of a multi-class classification problem.
7187
12:43:15,160 --> 12:43:26,160
So I believe we're up to section eight, putting it all together with a multi-class classification problem.
7188
12:43:26,160 --> 12:43:27,160
Beautiful.
7189
12:43:27,160 --> 12:43:37,160
And recall the difference between binary classification equals one thing or another such as cat versus dog.
7190
12:43:37,160 --> 12:43:47,160
If you were building a cat versus dog image classifier, spam versus not spam for say emails that were spam or not spam or
7191
12:43:47,160 --> 12:43:52,160
even internet posts on Facebook or Twitter or one of the other internet services.
7192
12:43:52,160 --> 12:43:57,160
And then fraud or not fraud for credit card transactions.
7193
12:43:57,160 --> 12:44:05,160
And then multi-class classification is more than one thing or another.
7194
12:44:05,160 --> 12:44:11,160
So we could have cat versus dog versus chicken.
7195
12:44:11,160 --> 12:44:14,160
So I think we've got all the skills to do this.
7196
12:44:14,160 --> 12:44:18,160
Our architecture might be a little bit different for a multi-class classification problem.
7197
12:44:18,160 --> 12:44:20,160
But we've got so many building blocks now.
7198
12:44:20,160 --> 12:44:21,160
It's not funny.
7199
12:44:21,160 --> 12:44:27,160
Let's clean this up and we'll add some more code cells, just to reiterate.
7200
12:44:27,160 --> 12:44:29,160
So we've gone over nonlinearity.
7201
12:44:29,160 --> 12:44:34,160
The question is, what could you draw if you had an unlimited amount of straight (linear) and non-straight
7202
12:44:34,160 --> 12:44:38,160
(nonlinear) lines? I believe you could draw some pretty intricate patterns.
7203
12:44:38,160 --> 12:44:41,160
And that is what our neural networks are doing behind the scenes.
7204
12:44:41,160 --> 12:44:47,160
And so we also learned that if we wanted to just replicate some of these nonlinear functions,
7205
12:44:47,160 --> 12:44:51,160
some of the ones that we've used before, we could create a range.
7206
12:44:51,160 --> 12:44:54,160
Linear activation is just the line itself.
7207
12:44:54,160 --> 12:45:00,160
And then if we wanted to do sigmoid, we get this curve here.
7208
12:45:00,160 --> 12:45:07,160
And then if we wanted to do relu, well, we saw how to replicate the relu function as well.
7209
12:45:07,160 --> 12:45:09,160
These both are nonlinear.
7210
12:45:09,160 --> 12:45:18,160
And of course, torch.nn has far more nonlinear activations where they came from just as it has far more different layers.
7211
12:45:18,160 --> 12:45:20,160
And you'll get used to these with practice.
7212
12:45:20,160 --> 12:45:22,160
And that's what we're doing here.
7213
12:45:22,160 --> 12:45:24,160
So let's go back to the keynote.
7214
12:45:24,160 --> 12:45:26,160
So this is what we're going to be working on.
7215
12:45:26,160 --> 12:45:27,160
Multi-class classification.
7216
12:45:27,160 --> 12:45:29,160
So there's one of the big differences here.
7217
12:45:29,160 --> 12:45:33,160
We use the softmax activation function versus sigmoid.
7218
12:45:33,160 --> 12:45:35,160
There's another big difference here.
7219
12:45:35,160 --> 12:45:39,160
Instead of binary cross entropy, we use just cross entropy.
7220
12:45:39,160 --> 12:45:42,160
But I think most of it's going to stay the same.
7221
12:45:42,160 --> 12:45:44,160
We're going to see this in action in a second.
7222
12:45:44,160 --> 12:45:48,160
But let's just describe our problem space.
7223
12:45:48,160 --> 12:45:52,160
Just to go visual, we've covered a fair bit here.
7224
12:45:52,160 --> 12:45:53,160
Well done, everyone.
7225
12:45:53,160 --> 12:45:56,160
So binary versus multi-class classification.
7226
12:45:56,160 --> 12:45:59,160
Binary one thing or another.
7227
12:45:59,160 --> 12:46:00,160
Zero or one.
7228
12:46:00,160 --> 12:46:02,160
Multi-class could be three things.
7229
12:46:02,160 --> 12:46:04,160
Could be a thousand things.
7230
12:46:04,160 --> 12:46:05,160
Could be 5,000 things.
7231
12:46:05,160 --> 12:46:07,160
Could be 25 things.
7232
12:46:07,160 --> 12:46:09,160
So more than one thing or another.
7233
12:46:09,160 --> 12:46:12,160
But that's the basic premise we're going to go with.
7234
12:46:12,160 --> 12:46:14,160
Let's create some data, hey?
7235
12:46:14,160 --> 12:46:15,160
8.1.
7236
12:46:15,160 --> 12:46:21,160
Creating a toy multi-class data set.
7237
12:46:21,160 --> 12:46:26,160
And so to create our data set, we're going to import our dependencies.
7238
12:46:26,160 --> 12:46:29,160
We're going to re-import torch, even though we already have it.
7239
12:46:29,160 --> 12:46:31,160
Just for a little bit of completeness.
7240
12:46:31,160 --> 12:46:33,160
And we're going to go map plotlib.
7241
12:46:33,160 --> 12:46:37,160
So we can plot, as always, we like to get visual where we can.
7242
12:46:37,160 --> 12:46:40,160
Visualize, visualize, visualize.
7243
12:46:40,160 --> 12:46:44,160
We're going to import from scikitlearn.datasets.
7244
12:46:44,160 --> 12:46:47,160
Let's get make blobs.
7245
12:46:47,160 --> 12:46:49,160
Now, where would I get this from?
7246
12:46:49,160 --> 12:46:51,160
SKlearn.datasets.
7247
12:46:51,160 --> 12:46:53,160
What do we get?
7248
12:46:53,160 --> 12:46:54,160
Toy data sets.
7249
12:46:54,160 --> 12:46:57,160
Do we have classification?
7250
12:46:57,160 --> 12:46:58,160
Toy data sets.
7251
12:46:58,160 --> 12:47:02,160
Do we have blobs?
7252
12:47:02,160 --> 12:47:06,160
If we just go make scikitlearn.
7253
12:47:06,160 --> 12:47:10,160
Classification data sets.
7254
12:47:10,160 --> 12:47:12,160
What do we get?
7255
12:47:12,160 --> 12:47:15,160
Here's one option.
7256
12:47:15,160 --> 12:47:17,160
There's also make blobs.
7257
12:47:17,160 --> 12:47:18,160
Beautiful.
7258
12:47:18,160 --> 12:47:19,160
Make blobs.
7259
12:47:19,160 --> 12:47:20,160
This is a code for that.
7260
12:47:20,160 --> 12:47:22,160
So let's just copy this in here.
7261
12:47:22,160 --> 12:47:23,160
And make blobs.
7262
12:47:23,160 --> 12:47:25,160
We're going to see this in action anyway.
7263
12:47:25,160 --> 12:47:26,160
Make blobs.
7264
12:47:26,160 --> 12:47:29,160
As you might have guessed, it makes some blobs for us.
7265
12:47:29,160 --> 12:47:31,160
I like blobs.
7266
12:47:31,160 --> 12:47:33,160
It's a fun word to say.
7267
12:47:33,160 --> 12:47:34,160
Blobs.
7268
12:47:34,160 --> 12:47:39,160
So we want train test split because we want to make a data set and then we want to split
7269
12:47:39,160 --> 12:47:40,160
it into train and test.
7270
12:47:40,160 --> 12:47:42,160
Let's set the number of hyper parameters.
7271
12:47:42,160 --> 12:47:48,160
So set the hyper parameters for data creation.
7272
12:47:48,160 --> 12:47:51,160
Now I got these from the documentation here.
7273
12:47:51,160 --> 12:47:52,160
Number of samples.
7274
12:47:52,160 --> 12:47:53,160
How many blobs do we want?
7275
12:47:53,160 --> 12:47:55,160
How many features do we want?
7276
12:47:55,160 --> 12:47:58,160
So say, for example, we wanted two different classes.
7277
12:47:58,160 --> 12:48:00,160
That would be binary classification.
7278
12:48:00,160 --> 12:48:02,160
Say, for example, you wanted 10 classes.
7279
12:48:02,160 --> 12:48:03,160
You could set this to 10.
7280
12:48:03,160 --> 12:48:05,160
And we're going to see what the others are in practice.
7281
12:48:05,160 --> 12:48:09,160
But if you want to read through them, you can well and truly do that.
7282
12:48:09,160 --> 12:48:12,160
So let's set up.
7283
12:48:12,160 --> 12:48:13,160
We want num classes.
7284
12:48:13,160 --> 12:48:15,160
Let's double what we've been working with.
7285
12:48:15,160 --> 12:48:18,160
We've been working with two classes, red dots or blue dots.
7286
12:48:18,160 --> 12:48:19,160
Let's step it up a notch.
7287
12:48:19,160 --> 12:48:20,160
We'll go to four classes.
7288
12:48:20,160 --> 12:48:21,160
Watch out, everyone.
7289
12:48:21,160 --> 12:48:24,160
And we're going to go number of features will be two.
7290
12:48:24,160 --> 12:48:26,160
So we have the same number of features.
7291
12:48:26,160 --> 12:48:28,160
And then the random seed is going to be 42.
7292
12:48:28,160 --> 12:48:31,160
You might be wondering why these are capitalized.
7293
12:48:31,160 --> 12:48:38,160
Well, generally, if we do have some hyper parameters that we say set at the start of a notebook,
7294
12:48:38,160 --> 12:48:43,160
you'll find it's quite common for people to write them as capital letters just to say
7295
12:48:43,160 --> 12:48:46,160
that, hey, these are some settings that you can change.
7296
12:48:46,160 --> 12:48:51,160
You don't have to, but I'm just going to introduce that anyway because you might stumble upon it yourself.
7297
12:48:51,160 --> 12:48:54,160
So create multi-class data.
7298
12:48:54,160 --> 12:48:58,160
We're going to use the make blobs function here.
7299
12:48:58,160 --> 12:49:02,160
So we're going to create some x blobs, some feature blobs and some label blobs.
7300
12:49:02,160 --> 12:49:04,160
Let's see what these look like in a second.
7301
12:49:04,160 --> 12:49:08,160
I know I'm just saying blobs a lot.
7302
12:49:08,160 --> 12:49:11,160
But we pass in here, none samples.
7303
12:49:11,160 --> 12:49:12,160
How many do we want?
7304
12:49:12,160 --> 12:49:14,160
Let's create a thousand as well.
7305
12:49:14,160 --> 12:49:17,160
That could really be a hyper parameter, but we'll just leave that how it is for now.
7306
12:49:17,160 --> 12:49:23,160
Number of features is going to be num features.
7307
12:49:23,160 --> 12:49:29,160
Centers equals num classes.
7308
12:49:29,160 --> 12:49:33,160
So we're going to create four classes because we've set up num classes equal to four.
7309
12:49:33,160 --> 12:49:36,160
And then we're going to go center standard deviation.
7310
12:49:36,160 --> 12:49:41,160
We'll give them a little shake up, add a little bit of randomness in here.
7311
12:49:41,160 --> 12:49:45,160
Give the clusters a little shake up.
7312
12:49:45,160 --> 12:49:46,160
We'll mix them up a bit.
7313
12:49:46,160 --> 12:49:48,160
Make it a bit hard for our model.
7314
12:49:48,160 --> 12:49:50,160
But we'll see what this does in a second.
7315
12:49:50,160 --> 12:49:55,160
Random state equals random seed, which is our favorite random seed 42.
7316
12:49:55,160 --> 12:49:59,160
Of course, you can set it whatever number you want, but I like 42.
7317
12:49:59,160 --> 12:50:02,160
Oh, and we need a comma here, of course.
7318
12:50:02,160 --> 12:50:03,160
Beautiful.
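Here's a sketch of the data-creation cell we're building (note the parameter is cluster_std, the spelling that catches us out shortly):

from sklearn.datasets import make_blobs

# Set the hyperparameters for data creation
NUM_CLASSES = 4
NUM_FEATURES = 2
RANDOM_SEED = 42

# Create the multi-class data
X_blob, y_blob = make_blobs(n_samples=1000,
                            n_features=NUM_FEATURES,
                            centers=NUM_CLASSES,     # four clusters of points
                            cluster_std=1.5,         # give the clusters a little shake up
                            random_state=RANDOM_SEED)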
7319
12:50:03,160 --> 12:50:05,160
Now, what do we have to do here?
7320
12:50:05,160 --> 12:50:09,160
Well, because we're using scikit-learn and scikit-learn leverages NumPy.
7321
12:50:09,160 --> 12:50:12,160
So let's turn our data into tensors.
7322
12:50:12,160 --> 12:50:14,160
Turn data into tensors.
7323
12:50:14,160 --> 12:50:16,160
And how do we do that?
7324
12:50:16,160 --> 12:50:22,160
Well, we grab x blob and we call torch dot from NumPy on it.
7325
12:50:22,160 --> 12:50:24,160
If I could type, that would be fantastic.
7326
12:50:24,160 --> 12:50:25,160
That's all right.
7327
12:50:25,160 --> 12:50:26,160
We're doing pretty well today.
7328
12:50:26,160 --> 12:50:28,160
Haven't made too many typos.
7329
12:50:28,160 --> 12:50:31,160
We did make a few in a couple of videos before, but hey.
7330
12:50:31,160 --> 12:50:33,160
I'm only human.
7331
12:50:33,160 --> 12:50:39,160
So we're going to torch from NumPy and we're going to pass in the y blob.
7332
12:50:39,160 --> 12:50:45,160
And we'll turn it into torch dot float because, remember, NumPy defaults to float 64, whereas
7333
12:50:45,160 --> 12:50:47,160
PyTorch likes float 32.
7334
12:50:47,160 --> 12:50:53,160
So split into training and test.
7335
12:50:53,160 --> 12:50:58,160
And we're going to create x blob train,
7336
12:50:58,160 --> 12:51:01,160
x blob test.
7337
12:51:01,160 --> 12:51:04,160
We'll keep the blob nomenclature here.
7338
12:51:04,160 --> 12:51:08,160
y blob train and y blob test.
7339
12:51:08,160 --> 12:51:13,160
And here's again where we're going to leverage the train test split function from scikit-learn.
7340
12:51:13,160 --> 12:51:15,160
So thank you for that scikit-learn.
7341
12:51:15,160 --> 12:51:18,160
x blob and we're going to pass the y blob.
7342
12:51:18,160 --> 12:51:22,160
So features, labels, x is the features, y are the labels.
7343
12:51:22,160 --> 12:51:25,160
And a test size, we've been using a test size of 20%.
7344
12:51:25,160 --> 12:51:29,160
That means 80% of the data will be for the training data.
7345
12:51:29,160 --> 12:51:31,160
That's a fair enough split with our data set.
7346
12:51:31,160 --> 12:51:37,160
And we're going to set the random seed to random seed because generally normally train test split is random,
7347
12:51:37,160 --> 12:51:42,160
but because we want some reproducibility here, we're passing random seeds.
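In code, the conversion and split described above look something like this, assuming X_blob and y_blob from the make_blobs sketch:

import torch
from sklearn.model_selection import train_test_split

# Turn data into tensors (NumPy defaults to float64, PyTorch prefers float32)
X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.float)

# Split into training and test sets (80/20), with a seed for reproducibility
X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(
    X_blob, y_blob, test_size=0.2, random_state=RANDOM_SEED)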
7348
12:51:42,160 --> 12:51:45,160
Finally, we need to get visual.
7349
12:51:45,160 --> 12:51:46,160
So let's plot the data.
7350
12:51:46,160 --> 12:51:51,160
Right now we've got a whole bunch of code and a whole bunch of talking, but not too much visuals going on.
7351
12:51:51,160 --> 12:51:56,160
So we'll write down here, visualize, visualize, visualize.
7352
12:51:56,160 --> 12:52:00,160
And we can call in plot.figure.
7353
12:52:00,160 --> 12:52:02,160
What size do we want?
7354
12:52:02,160 --> 12:52:09,160
I'm going to use my favorite hand in poker, which is 10-7, because it's generally worked out to be a good plot size.
7355
12:52:09,160 --> 12:52:16,160
In my experience, anyway, we'll go x blob.
7356
12:52:16,160 --> 12:52:22,160
And we want the zero index here, and then we'll grab x blob as well.
7357
12:52:22,160 --> 12:52:25,160
And you might notice that we're visualizing the whole data set here.
7358
12:52:25,160 --> 12:52:27,160
That's perfectly fine.
7359
12:52:27,160 --> 12:52:33,160
We could visualize train and test separately if we really wanted to, but I'll leave that as a little challenge for you.
7360
12:52:33,160 --> 12:52:38,160
And we're going to go red, yellow, blue.
7361
12:52:38,160 --> 12:52:40,160
Wonderful.
7362
12:52:40,160 --> 12:52:41,160
What do we get wrong?
7363
12:52:41,160 --> 12:52:43,160
Oh, of course we got something wrong.
7364
12:52:43,160 --> 12:52:46,160
Center STD, did we spell center wrong?
7365
12:52:46,160 --> 12:52:47,160
Cluster STD.
7366
12:52:47,160 --> 12:52:49,160
That's what I missed.
7367
12:52:49,160 --> 12:52:52,160
So, cluster STD.
7368
12:52:52,160 --> 12:52:53,160
Standard deviation.
7369
12:52:53,160 --> 12:52:54,160
What do we get wrong?
7370
12:52:54,160 --> 12:52:55,160
Random seed.
7371
12:52:55,160 --> 12:52:57,160
Oh, this needs to be random state.
7372
12:52:57,160 --> 12:52:59,160
Oh, another typo.
7373
12:52:59,160 --> 12:53:00,160
You know what?
7374
12:53:00,160 --> 12:53:02,160
Just as I said, I wasn't getting too many typos.
7375
12:53:02,160 --> 12:53:03,160
I'll get three.
7376
12:53:03,160 --> 12:53:04,160
There we go.
7377
12:53:04,160 --> 12:53:05,160
Look at that.
7378
12:53:05,160 --> 12:53:08,160
Our first multi-class classification data set.
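The visualisation cell ends up looking something like this once the typos are fixed:

import matplotlib.pyplot as plt

# Visualize, visualize, visualize
plt.figure(figsize=(10, 7))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu)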
7379
12:53:08,160 --> 12:53:11,160
So if we set this to zero, what does it do to our clusters?
7380
12:53:11,160 --> 12:53:15,160
Let's take note of what's going on here, particularly the space between all of the dots.
7381
12:53:15,160 --> 12:53:20,160
Now, if we set this cluster STD to zero, what happens?
7382
12:53:20,160 --> 12:53:23,160
We get dots that are really just, look at that.
7383
12:53:23,160 --> 12:53:24,160
That's too easy.
7384
12:53:24,160 --> 12:53:26,160
Let's mix it up, all right?
7385
12:53:26,160 --> 12:53:28,160
Now, you can pick whatever value you want here.
7386
12:53:28,160 --> 12:53:34,160
I'm going to use 1.5, because now we need to build a model that's going to draw some lines between these four colors.
7387
12:53:34,160 --> 12:53:36,160
Two axes, four different classes.
7388
12:53:36,160 --> 12:53:42,160
But it's not going to be perfect because we've got some red dots that are basically in the blue dots.
7389
12:53:42,160 --> 12:53:45,160
And so, what's our next step?
7390
12:53:45,160 --> 12:53:47,160
Well, we've got some data ready.
7391
12:53:47,160 --> 12:53:49,160
It's now time to build a model.
7392
12:53:49,160 --> 12:53:51,160
So, I'll see you in the next video.
7393
12:53:51,160 --> 12:53:55,160
Let's build our first multi-class classification model.
7394
12:53:57,160 --> 12:53:58,160
Welcome back.
7395
12:53:58,160 --> 12:54:03,160
In the last video, we created our multi-class classification data set,
7396
12:54:03,160 --> 12:54:06,160
using scikit-learn's make-blobs function.
7397
12:54:06,160 --> 12:54:08,160
And now, why are we doing this?
7398
12:54:08,160 --> 12:54:12,160
Well, because we're going to put all of what we've covered so far together.
7399
12:54:12,160 --> 12:54:17,160
But instead of using binary classification or working with binary classification data,
7400
12:54:17,160 --> 12:54:20,160
we're going to do it with multi-class classification data.
7401
12:54:20,160 --> 12:54:26,160
So, with that being said, let's get into building our multi-class classification model.
7402
12:54:26,160 --> 12:54:29,160
So, we'll create a little heading here.
7403
12:54:29,160 --> 12:54:36,160
Building a multi-class classification model in PyTorch.
7404
12:54:36,160 --> 12:54:39,160
And now, I want you to have a think about this.
7405
12:54:39,160 --> 12:54:42,160
We spent the last few videos covering non-linearity.
7406
12:54:42,160 --> 12:54:46,160
Does this data set need non-linearity?
7407
12:54:46,160 --> 12:54:51,160
As in, could we separate this data set with pure straight lines?
7408
12:54:51,160 --> 12:54:54,160
Or do we need some non-straight lines as well?
7409
12:54:54,160 --> 12:54:56,160
Have a think about that.
7410
12:54:56,160 --> 12:55:01,160
It's okay if you're not sure, we're going to be building a model to fit this data anyway,
7411
12:55:01,160 --> 12:55:03,160
or draw patterns in this data anyway.
7412
12:55:03,160 --> 12:55:06,160
And now, before we get into coding a model,
7413
12:55:06,160 --> 12:55:10,160
so for multi-class classification, we've got this.
7414
12:55:10,160 --> 12:55:14,160
For the input layer shape, we need to define the in features.
7415
12:55:14,160 --> 12:55:18,160
So, how many in features do we have? And for the hidden layers?
7416
12:55:18,160 --> 12:55:23,160
Well, we could set this to whatever we want, but we're going to keep it nice and simple for now.
7417
12:55:23,160 --> 12:55:28,160
For the number of neurons per hidden layer, again, this could be almost whatever we want,
7418
12:55:28,160 --> 12:55:32,160
but because we're working with a relatively small data set,
7419
12:55:32,160 --> 12:55:36,160
we've only got four different classes, we've only got a thousand data points,
7420
12:55:36,160 --> 12:55:39,160
we'll keep it small as well, but you could change this.
7421
12:55:39,160 --> 12:55:43,160
Remember, you can change any of these because they're hyper parameters.
7422
12:55:43,160 --> 12:55:49,160
For the output layer shape, well, how many output features do we want?
7423
12:55:49,160 --> 12:55:53,160
We need one per class, how many classes do we have?
7424
12:55:53,160 --> 12:55:59,160
We have four clusters of different dots here, so we'll need four output features.
7425
12:55:59,160 --> 12:56:03,160
And then if we go back, we have an output activation of softmax, we haven't seen that yet,
7426
12:56:03,160 --> 12:56:09,160
and then we have a loss function, rather than binary cross entropy, we have cross entropy.
7427
12:56:09,160 --> 12:56:15,160
And then optimizer as well is the same as binary classification, two of the most common
7428
12:56:15,160 --> 12:56:19,160
are SGD, stochastic gradient descent, or the Adam optimizer,
7429
12:56:19,160 --> 12:56:23,160
but of course, the torch.optim package has many different options as well.
7430
12:56:23,160 --> 12:56:28,160
So let's push forward and create our first multi-class classification model.
7431
12:56:28,160 --> 12:56:33,160
First, we're going to create, we're going to get into the habit of creating
7432
12:56:33,160 --> 12:56:39,160
device agnostic code, and we'll set the device here, equals CUDA,
7433
12:56:39,160 --> 12:56:44,160
nothing we haven't seen before, but again, we're doing this to put it all together,
7434
12:56:44,160 --> 12:56:47,160
so that we have a lot of practice.
7435
12:56:47,160 --> 12:56:54,160
Is available, else CPU, and let's go device.
7436
12:56:54,160 --> 12:56:58,160
So we should have a GPU available, beautiful CUDA.
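As a sketch, the device-agnostic setup is the same line we've used before:

import torch

# Create device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)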
7437
12:56:58,160 --> 12:57:03,160
Now, of course, if you don't, you can go change runtime type, select GPU here,
7438
12:57:03,160 --> 12:57:08,160
that will restart the runtime, you'll have to run all of the code that's before this cell as well,
7439
12:57:08,160 --> 12:57:11,160
but I'm going to be using a GPU.
7440
12:57:11,160 --> 12:57:14,160
You don't necessarily need one because our data set's quite small,
7441
12:57:14,160 --> 12:57:20,160
and our models aren't going to be very large, but we set this up so we have device agnostic code.
7442
12:57:20,160 --> 12:57:25,160
And so let's build a multi-class classification model.
7443
12:57:25,160 --> 12:57:31,160
Look at us go, just covering all of the foundations of classification in general here,
7444
12:57:31,160 --> 12:57:38,160
and we now know that we can combine linear and non-linear functions to create
7445
12:57:38,160 --> 12:57:42,160
neural networks that can find patterns in almost any kind of data.
7446
12:57:42,160 --> 12:57:46,160
So I'm going to call my class here blob model, and it's going to, of course,
7447
12:57:46,160 --> 12:57:51,160
inherit from nn.module, and we're going to upgrade our class here.
7448
12:57:51,160 --> 12:57:54,160
We're going to take some inputs here, and I'll show you how to do this.
7449
12:57:54,160 --> 12:57:58,160
If you're familiar with Python classes, you would have already done stuff like this,
7450
12:57:58,160 --> 12:58:01,160
but we're going to set some parameters for our models,
7451
12:58:01,160 --> 12:58:05,160
because as you write more and more complex classes, you'll want to take inputs here.
7452
12:58:05,160 --> 12:58:11,160
And I'm going to pre-build the, or pre-set the hidden units parameter to eight.
7453
12:58:11,160 --> 12:58:15,160
Because I've decided, you know what, I'm going to start off with eight hidden units,
7454
12:58:15,160 --> 12:58:19,160
and if I wanted to change this to 128, I could.
7455
12:58:19,160 --> 12:58:22,160
But in the constructor here, we've got some options.
7456
12:58:22,160 --> 12:58:24,160
So we have input features.
7457
12:58:24,160 --> 12:58:28,160
We're going to set these programmatically as inputs to our class when we instantiate it.
7458
12:58:28,160 --> 12:58:31,160
The same with output features as well.
7459
12:58:31,160 --> 12:58:35,160
And so here, we're going to call self.
7460
12:58:35,160 --> 12:58:36,160
Oh, no, super.
7461
12:58:36,160 --> 12:58:37,160
Sorry.
7462
12:58:37,160 --> 12:58:40,160
I always get this mixed up dot init.
7463
12:58:40,160 --> 12:58:42,160
And underscore underscore.
7464
12:58:42,160 --> 12:58:43,160
Beautiful.
7465
12:58:43,160 --> 12:58:46,160
So we could do a doc string here as well.
7466
12:58:46,160 --> 12:58:48,160
So let's write in this.
7467
12:58:48,160 --> 12:58:55,160
Initializes multi-class classification.
7468
12:58:55,160 --> 12:59:01,160
If I could spell class e-fication model.
7469
12:59:01,160 --> 12:59:03,160
Oh, this is great.
7470
12:59:03,160 --> 12:59:05,160
And then we have some arcs here.
7471
12:59:05,160 --> 12:59:08,160
This is just a standard way of writing doc strings.
7472
12:59:08,160 --> 12:59:13,160
If you want to find out, this is Google Python doc string guide.
7473
12:59:13,160 --> 12:59:15,160
There we go.
7474
12:59:15,160 --> 12:59:16,160
Google Python style guide.
7475
12:59:16,160 --> 12:59:19,160
This is where I get mine from.
7476
12:59:19,160 --> 12:59:20,160
You can scroll through this.
7477
12:59:20,160 --> 12:59:22,160
This is just a way to write Python code.
7478
12:59:22,160 --> 12:59:23,160
Yeah, there we go.
7479
12:59:23,160 --> 12:59:26,160
So we've got a little sentence saying what's going on.
7480
12:59:26,160 --> 12:59:27,160
We've got arcs.
7481
12:59:27,160 --> 12:59:31,160
We've got returns and we've got errors if something's going on.
7482
12:59:31,160 --> 12:59:33,160
So I highly recommend checking that out.
7483
12:59:33,160 --> 12:59:34,160
Just a little tidbit.
7484
12:59:34,160 --> 12:59:37,160
So this is if someone was to use our class later on.
7485
12:59:37,160 --> 12:59:39,160
They know what the input features are.
7486
12:59:39,160 --> 12:59:48,160
Input features, which is an int, which is number of input features to the model.
7487
12:59:48,160 --> 12:59:52,160
And then, of course, we've got output features, which is also an int.
7488
12:59:52,160 --> 12:59:56,160
Which is number of output features of the model.
7489
12:59:56,160 --> 13:00:00,160
And we've got the red line here is telling us we've got something wrong, but that's okay.
7490
13:00:00,160 --> 13:00:01,160
And then the hidden features.
7491
13:00:01,160 --> 13:00:07,160
Oh, well, this is number of output classes for the case of multi-class classification.
7492
13:00:07,160 --> 13:00:13,160
And then the hidden units.
7493
13:00:13,160 --> 13:00:23,160
Int and then number of hidden units between layers and then the default is eight.
7494
13:00:23,160 --> 13:00:24,160
Beautiful.
7495
13:00:24,160 --> 13:00:26,160
And then under that, we'll just do that.
7496
13:00:26,160 --> 13:00:29,160
Is that going to fix itself?
7497
13:00:29,160 --> 13:00:30,160
Yeah, there we go.
7498
13:00:30,160 --> 13:00:32,160
We could put in what it returns.
7499
13:00:32,160 --> 13:00:34,160
Returns, whatever it returns.
7500
13:00:34,160 --> 13:00:40,160
And then an example use case, but I'll leave that for you to fill out.
7501
13:00:40,160 --> 13:00:41,160
If you like.
7502
13:00:41,160 --> 13:00:44,160
So let's instantiate some things here.
7503
13:00:44,160 --> 13:00:50,160
What we might do is write self dot linear layer stack.
7504
13:00:50,160 --> 13:00:52,160
Self dot linear layer stack.
7505
13:00:52,160 --> 13:00:56,160
And we will set this as nn dot sequential.
7506
13:00:56,160 --> 13:00:58,160
Ooh, we haven't seen this before.
7507
13:00:58,160 --> 13:01:01,160
But we're just going to look at a different way of writing a model here.
7508
13:01:01,160 --> 13:01:04,160
Previously, when we created a model, what did we do?
7509
13:01:04,160 --> 13:01:10,160
Well, we instantiated each layer as its own parameter here.
7510
13:01:10,160 --> 13:01:15,160
And then we called on them one by one, but we did it in a straightforward fashion.
7511
13:01:15,160 --> 13:01:19,160
So that's why we're going to use sequential here to just step through our layers.
7512
13:01:19,160 --> 13:01:24,160
We're not doing anything too fancy, so we'll just set up a sequential stack of layers here.
7513
13:01:24,160 --> 13:01:30,160
And recall that sequential just steps through, passes the data through each one of these layers one by one.
7514
13:01:30,160 --> 13:01:38,160
And because we've set up the parameters up here, input features can equal to input features.
7515
13:01:38,160 --> 13:01:41,160
And output features, what is this going to be?
7516
13:01:41,160 --> 13:01:45,160
Is this going to be output features or is this going to be hidden units?
7517
13:01:45,160 --> 13:01:48,160
It's going to be hidden units because it's not the final layer.
7518
13:01:48,160 --> 13:01:53,160
We want the final layer to output our output features.
7519
13:01:53,160 --> 13:01:59,160
So input features, this will be hidden units because remember the subsequent layer needs to line up with the previous layer.
7520
13:01:59,160 --> 13:02:04,160
Output features, we're going to create another one that outputs hidden units.
7521
13:02:04,160 --> 13:02:16,160
And then we'll go nn dot linear, in features equals hidden units, because it takes the output features of the previous layer.
7522
13:02:16,160 --> 13:02:19,160
So as you see here, the output features of this feeds into here.
7523
13:02:19,160 --> 13:02:22,160
The output features of this feeds into here.
7524
13:02:22,160 --> 13:02:25,160
And then finally, this is going to be our final layer.
7525
13:02:25,160 --> 13:02:26,160
We'll do three layers.
7526
13:02:26,160 --> 13:02:31,160
Output features equals output features.
7527
13:02:31,160 --> 13:02:35,160
Wonderful. So how do we know the values of each of these?
7528
13:02:35,160 --> 13:02:43,160
Well, let's have a look at xtrain.shape and ytrain.shape.
7529
13:02:43,160 --> 13:02:46,160
So in the case of x, we have two input features.
7530
13:02:46,160 --> 13:02:51,160
And in the case of y, well, this is a little confusing as well because y is a scalar.
7531
13:02:51,160 --> 13:02:55,160
But what do you think the values for y are going to be?
7532
13:02:55,160 --> 13:03:01,160
Well, let's go NP. Or is there torch.unique? I'm not sure. Let's find out together, hey?
7533
13:03:01,160 --> 13:03:03,160
Torch unique.
7534
13:03:03,160 --> 13:03:07,160
Zero on one, ytrain. Oh, we need y blob train. That's right, blob.
7535
13:03:07,160 --> 13:03:11,160
I'm too used to writing blob.
7536
13:03:11,160 --> 13:03:15,160
And we need blob train, but I believe it's the same here.
7537
13:03:15,160 --> 13:03:18,160
And then blob.
7538
13:03:18,160 --> 13:03:22,160
There we go. So we have four classes.
7539
13:03:22,160 --> 13:03:26,160
So we need an output features value of four.
7540
13:03:26,160 --> 13:03:34,160
And now if we wanted to add nonlinearity here, we could put it in between our layers here like this.
7541
13:03:34,160 --> 13:03:41,160
But I asked the question before, do you think that this data set needs nonlinearity?
7542
13:03:41,160 --> 13:03:43,160
Well, let's leave it in there to begin with.
7543
13:03:43,160 --> 13:03:46,160
And one of the challenges for you, oh, do we need commas here?
7544
13:03:46,160 --> 13:03:48,160
I think we need commas here.
7545
13:03:48,160 --> 13:03:54,160
One of the challenges for you will be to test the model with nonlinearity
7546
13:03:54,160 --> 13:03:56,160
and without nonlinearity.
7547
13:03:56,160 --> 13:03:59,160
So let's just leave it in there for the time being.
7548
13:03:59,160 --> 13:04:01,160
What's missing from this?
7549
13:04:01,160 --> 13:04:03,160
Well, we need a forward method.
7550
13:04:03,160 --> 13:04:07,160
So def forward self X. What can we do here?
7551
13:04:07,160 --> 13:04:12,160
Well, because we've created this as a linear layer stack using nn.sequential,
7552
13:04:12,160 --> 13:04:18,160
we can just go return linear layer stack and pass it X.
7553
13:04:18,160 --> 13:04:20,160
And what's going to happen?
7554
13:04:20,160 --> 13:04:25,160
Whatever input goes into the forward method is just going to go through these layers sequentially.
7555
13:04:25,160 --> 13:04:30,160
Oh, we need to put self here because we've initialized it in the constructor.
7556
13:04:30,160 --> 13:04:31,160
Beautiful.
7557
13:04:31,160 --> 13:04:41,160
And now let's create an instance of blob model and send it to the target device.
7558
13:04:41,160 --> 13:04:45,160
We'll go model four equals blob model.
7559
13:04:45,160 --> 13:04:51,160
And then we can use our input features parameter, which is this one here.
7560
13:04:51,160 --> 13:04:54,160
And we're going to pass it a value of what?
7561
13:04:54,160 --> 13:04:55,160
Two.
7562
13:04:55,160 --> 13:04:58,160
And then output features. Why? Because we have two X features.
7563
13:04:58,160 --> 13:05:02,160
Now, the output features value is going to be the same as the number of classes that we have: four.
7564
13:05:02,160 --> 13:05:05,160
If we had 10 classes, we'd set it to 10.
7565
13:05:05,160 --> 13:05:06,160
So we'll go four.
7566
13:05:06,160 --> 13:05:09,160
And then the hidden units is going to be eight by default.
7567
13:05:09,160 --> 13:05:12,160
So we don't have to put this here, but we're going to put it there anyway.
7568
13:05:12,160 --> 13:05:17,160
And then, of course, we're going to send this to device.
7569
13:05:17,160 --> 13:05:24,160
And then we're going to go model four.
7570
13:05:24,160 --> 13:05:26,160
What do we get wrong here?
7571
13:05:26,160 --> 13:05:30,160
Unexpected keyword argument output features.
7572
13:05:30,160 --> 13:05:31,160
Do we spell something wrong?
7573
13:05:31,160 --> 13:05:33,160
No doubt. We've got a spelling mistake.
7574
13:05:33,160 --> 13:05:40,160
Output features. Output features.
7575
13:05:40,160 --> 13:05:42,160
Oh, out features.
7576
13:05:42,160 --> 13:05:48,160
Ah, that's what we needed. Out features, not output.
7577
13:05:48,160 --> 13:05:50,160
I've got a little confused there.
7578
13:05:50,160 --> 13:05:51,160
Okay.
7579
13:05:51,160 --> 13:05:53,160
There we go. Okay, beautiful.
7580
13:05:53,160 --> 13:05:56,160
So just recall that the parameter here for nn dot linear.
7581
13:05:56,160 --> 13:05:57,160
Did you pick up on that?
7582
13:05:57,160 --> 13:05:59,160
Is out features not output features.
7583
13:05:59,160 --> 13:06:05,160
Output features, a little confusing here, is our final output layer's number of features there.
7584
13:06:05,160 --> 13:06:11,160
So we've now got a multi-class classification model that lines up with the data that we're using.
7585
13:06:11,160 --> 13:06:13,160
So the shapes line up. Beautiful.
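Putting it all together, here's a sketch of the class and the instantiation from this video (with out_features spelled correctly, and the ReLU layers you're challenged to experiment with):

from torch import nn

class BlobModel(nn.Module):
    def __init__(self, input_features: int, output_features: int, hidden_units: int = 8):
        """Initializes a multi-class classification model.

        Args:
            input_features (int): Number of input features to the model.
            output_features (int): Number of output features (number of output classes).
            hidden_units (int): Number of hidden units between layers, default 8.
        """
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            nn.ReLU(),  # try the model with and without these nonlinearities
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            nn.ReLU(),
            nn.Linear(in_features=hidden_units, out_features=output_features),
        )

    def forward(self, x):
        # Pass the input through the layers sequentially
        return self.linear_layer_stack(x)

# 2 input features (X has 2 columns), 4 output features (4 classes), 8 hidden units
model_4 = BlobModel(input_features=2, output_features=4, hidden_units=8).to(device)
model_4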
7586
13:06:13,160 --> 13:06:15,160
Well, what's next?
7587
13:06:15,160 --> 13:06:20,160
Well, we have to create a loss function. And, of course, a training loop.
7588
13:06:20,160 --> 13:06:25,160
So I'll see you in the next few videos. And let's do that together.
7589
13:06:25,160 --> 13:06:31,160
Welcome back. In the last video, we created our multi-class classification model.
7590
13:06:31,160 --> 13:06:35,160
And we did so by subclassing nn dot module.
7591
13:06:35,160 --> 13:06:39,160
And we set up a few parameters for our class constructor here.
7592
13:06:39,160 --> 13:06:44,160
So that when we made an instance of the blob model, we could customize the input features.
7593
13:06:44,160 --> 13:06:49,160
The output features. Remember, this lines up with how many features X has.
7594
13:06:49,160 --> 13:06:54,160
And the output features here lines up with how many classes are in our data.
7595
13:06:54,160 --> 13:06:58,160
So if we had 10 classes, we could change this to 10. And it would line up.
7596
13:06:58,160 --> 13:07:02,160
And then if we wanted 128 hidden units, well, we could change that.
7597
13:07:02,160 --> 13:07:07,160
So we're getting a little bit more programmatic with how we create models here.
7598
13:07:07,160 --> 13:07:14,160
And as you'll see later on, a lot of the things that we've built in here can also be functionalized in a similar matter.
7599
13:07:14,160 --> 13:07:16,160
But let's keep pushing forward. What's our next step?
7600
13:07:16,160 --> 13:07:23,160
If we build a model, if we refer to the workflow, you'd see that we have to create a loss function.
7601
13:07:23,160 --> 13:07:32,160
And an optimizer for a multi-class classification model.
7602
13:07:32,160 --> 13:07:36,160
And so what's our option here for creating a loss function?
7603
13:07:36,160 --> 13:07:39,160
Where do we find loss functions in PyTorch? I'm just going to get out of this.
7604
13:07:39,160 --> 13:07:44,160
And I'll make a new tab here. And if we search torch.nn
7605
13:07:44,160 --> 13:07:50,160
Because torch.nn has the basic building blocks for graphs. In other words, neural networks.
7606
13:07:50,160 --> 13:07:55,160
Where do we find loss functions? Hmm, here we go. Beautiful.
7607
13:07:55,160 --> 13:08:00,160
So we've seen that L1 loss or MSE loss could be used for regression, predicting a number.
7608
13:08:00,160 --> 13:08:07,160
And I'm here to tell you as well that for classification, we're going to be looking at cross entropy loss.
7609
13:08:07,160 --> 13:08:14,160
Now, this is for multi-class classification. For binary classification, we work with BCE loss.
7610
13:08:14,160 --> 13:08:20,160
And of course, there's a few more here, but I'm going to leave that as something that you can explore on your own.
7611
13:08:20,160 --> 13:08:24,160
Let's jump in to cross entropy loss.
7612
13:08:24,160 --> 13:08:30,160
So what do we have here? This criterion computes. Remember, a loss function in PyTorch is also referred to as a criterion.
7613
13:08:30,160 --> 13:08:36,160
You might also see loss function referred to as cost function, C-O-S-T.
7614
13:08:36,160 --> 13:08:43,160
But I call them loss functions. So this criterion computes the cross entropy loss between input and target.
7615
13:08:43,160 --> 13:08:49,160
Okay, so the input is something, and the target is our target labels.
7616
13:08:49,160 --> 13:08:54,160
It is useful when training a classification problem with C classes. There we go.
7617
13:08:54,160 --> 13:09:00,160
So that's what we're doing. We're training a classification problem with C classes, C is a number of classes.
7618
13:09:00,160 --> 13:09:06,160
If provided the optional argument, weight should be a 1D tensor assigning a weight to each of the classes.
7619
13:09:06,160 --> 13:09:15,160
So we don't have to apply a weight here, but why would you apply a weight? Well, it says, if we look at weight here,
7620
13:09:15,160 --> 13:09:20,160
this is particularly useful when you have an unbalanced training set. So just keep this in mind as you're going forward.
7621
13:09:20,160 --> 13:09:29,160
If you wanted to train a dataset that has imbalanced samples, in our case we have the same number of samples for each class,
7622
13:09:29,160 --> 13:09:33,160
but sometimes you might come across a dataset where maybe you only have 10 yellow dots.
7623
13:09:33,160 --> 13:09:39,160
And maybe you have 500 blue dots and only 100 red and 100 light blue dots.
7624
13:09:39,160 --> 13:09:44,160
So you have an unbalanced dataset. So that's where you can come in and have a look at the weight parameter here.
7625
13:09:44,160 --> 13:09:51,160
But for now, we're just going to keep things simple. We have a balanced dataset, and we're going to focus on using this loss function.
7626
13:09:51,160 --> 13:09:59,160
If you'd like to read more, please, you can read on here. And if you wanted to find out more, you could go, what is cross entropy loss?
7627
13:09:59,160 --> 13:10:05,160
And I'm sure you'll find a whole bunch of loss functions. There we go. There's the ML cheat sheet. I love that.
7628
13:10:05,160 --> 13:10:11,160
The ML glossary, that's one of my favorite websites. Towards data science, you'll find that website, Wikipedia.
7629
13:10:11,160 --> 13:10:17,160
Machine learning mastery is also another fantastic website. But you can do that all in your own time.
7630
13:10:17,160 --> 13:10:26,160
Let's code together, hey. We'll set up a loss function. Oh, and one more resource before we get into code is that we've got the architecture,
7631
13:10:26,160 --> 13:10:37,160
well, the typical architecture of a classification model. The loss function for multi-class classification is cross entropy or torch.nn.cross entropy loss.
7632
13:10:37,160 --> 13:10:51,160
Let's code it out. If in doubt, code it out. So create a loss function for multi-class classification.
7633
13:10:51,160 --> 13:11:03,160
And then we go, loss fn equals nn dot cross entropy loss. Beautiful. And then we want to create an optimizer.
7634
13:11:03,160 --> 13:11:12,160
Create an optimizer for multi-class classification. And then the beautiful thing about optimizers is they're quite flexible.
7635
13:11:12,160 --> 13:11:20,160
They can go across a wide range of different problems. So the optimizer. So two of the most common, and I say most common because they work quite well.
7636
13:11:20,160 --> 13:11:30,160
Across a wide range of problems. So that's why I've only listed two here. But of course, within the torch dot optim module, you will find a lot more different optimizers.
7637
13:11:30,160 --> 13:11:43,160
But let's stick with SGD for now. And we'll go back and go optimizer equals torch dot optim for optimizer, dot SGD for stochastic gradient descent.
7638
13:11:43,160 --> 13:11:51,160
The parameters we want our optimizer to optimize model four, we're up to our fourth model already. Oh my goodness.
7639
13:11:51,160 --> 13:12:00,160
Model four dot parameters. And we'll set the learning rate to 0.1. Of course, you could change the learning rate if you wanted to.
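So the two lines we just wrote look something like this (assuming model_4 and the nn import from the previous sketch):

# Create a loss function for multi-class classification
loss_fn = nn.CrossEntropyLoss()

# Create an optimizer (SGD here; torch.optim has many others, such as Adam)
optimizer = torch.optim.SGD(params=model_4.parameters(), lr=0.1)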
7640
13:12:00,160 --> 13:12:09,160
In fact, I'd encourage you to see what happens if you do because, well, the learning rate is a hyper parameter.
7641
13:12:09,160 --> 13:12:22,160
I'm better at writing code than I am at spelling. You can change. Wonderful. So we've now got a loss function and an optimizer for a multi class classification problem.
7642
13:12:22,160 --> 13:12:26,160
What's next? Well, we could start to build.
7643
13:12:26,160 --> 13:12:35,160
Building a training loop. We could start to do that, but I think we should have a look at what the outputs of our model are.
7644
13:12:35,160 --> 13:12:47,160
So more specifically, getting prediction probabilities for a multi-class PyTorch model.
7645
13:12:47,160 --> 13:12:56,160
So my challenge to you before the next video is to have a look at what happens if you pass x blob test through a model.
7646
13:12:56,160 --> 13:13:01,160
And remember, what is a model's raw output? What is that referred to as?
7647
13:13:01,160 --> 13:13:06,160
Oh, I'll let you have a think about that before the next video. I'll see you there.
7648
13:13:06,160 --> 13:13:13,160
Welcome back. In the last video, we created a loss function and an optimizer for our multi class classification model.
7649
13:13:13,160 --> 13:13:21,160
And recall the loss function measures how wrong our model's predictions are.
7650
13:13:21,160 --> 13:13:35,160
And the optimizer updates our model parameters to try and reduce the loss.
7651
13:13:35,160 --> 13:13:44,160
So that's what that does. And I also issued the challenge of doing a forward pass with model four, which is the most recent model that we created.
7652
13:13:44,160 --> 13:13:53,160
And oh, did I just give you some code that wouldn't work? Did I do that on purpose? Maybe, maybe not, you'll never know.
7653
13:13:53,160 --> 13:14:00,160
So if this did work, what are the raw outputs of our model? Let's get some raw outputs of our model.
7654
13:14:00,160 --> 13:14:04,160
And if you recall, the raw outputs of a model are called logits.
7655
13:14:04,160 --> 13:14:11,160
So we got a runtime error: expected all tensors to be on the same device. Ah, of course. Why did this come up?
7656
13:14:11,160 --> 13:14:20,160
Well, because if we go next model for dot parameters, and if we check device, what happens here?
7657
13:14:20,160 --> 13:14:29,160
Oh, we need to bring this in. Our model is on the CUDA device, whereas our data is on the CPU still.
7658
13:14:29,160 --> 13:14:35,160
Can we go X? Is our data a tensor? Can we check the device parameter of that? I think we can.
7659
13:14:35,160 --> 13:14:41,160
I might be proven wrong here. Oh, it's on the CPU. Of course, we're getting a runtime error.
7660
13:14:41,160 --> 13:14:45,160
Did you catch that one? If you did, well done. So let's see what happens.
7661
13:14:45,160 --> 13:14:53,160
But before we do a forward pass, how about we turn our model into eval mode to make some predictions with torch dot inference mode?
7662
13:14:53,160 --> 13:14:59,160
We'll make some predictions. We don't necessarily have to do this because it's just tests, but it's a good habit.
7663
13:14:59,160 --> 13:15:08,160
Oh, y preds equals, what do we get? Y preds. And maybe we'll just view the first 10.
7664
13:15:08,160 --> 13:15:17,160
What do we get here? Oh, my goodness. How's that for numbers on a page? Is this the same format as our data or our test labels?
7665
13:15:17,160 --> 13:15:25,160
Let's have a look. No, it's not. Okay. Oh, we need y_blob_test. Excuse me.
7666
13:15:25,160 --> 13:15:33,160
We're going to make that mistake a fair few times here. So we need to get this into the format of this. Hmm.
7667
13:15:33,160 --> 13:15:42,160
How can we do that? Now, I want you to notice one thing as well: we have one value here per one value there, except that this one is actually four values.
7668
13:15:42,160 --> 13:15:54,160
Now, why is that? We have one, two, three, four. Well, that is because we set the out features up here. Our model outputs four features per sample.
7669
13:15:54,160 --> 13:16:01,160
So each sample right now has four numbers associated with it. And what are these called? These are the logits.
7670
13:16:01,160 --> 13:16:10,160
Now, what we have to do here, so let's just write this down in order to evaluate and train and test our model.
7671
13:16:10,160 --> 13:16:25,160
We need to convert our model's outputs, which are logits, to prediction probabilities, and then to prediction labels.
7672
13:16:25,160 --> 13:16:34,160
So we've done this before, but for binary classification. So we have to go from logits to predprobs to pred labels.
7673
13:16:34,160 --> 13:16:43,160
All right, I think we can do this. So we've got some logits here. Now, how do we convert these logits to prediction probabilities?
7674
13:16:43,160 --> 13:16:50,160
Well, we use an activation function. And if we go back to our architecture, what's our output activation here?
7675
13:16:50,160 --> 13:17:00,160
For a binary classification, we use sigmoid. But for multi-class classification, these are the two main differences between multi-class classification and binary classification.
7676
13:17:00,160 --> 13:17:06,160
One uses softmax, one uses cross entropy. And it's going to take a little bit of practice to know this off by heart.
7677
13:17:06,160 --> 13:17:11,160
It took me a while, but that's why we have nice tables like this. And that's why we write a lot of code together.
7678
13:17:11,160 --> 13:17:20,160
So we're going to use the softmax function here to convert our logits, our model's raw outputs, which is this here, to prediction probabilities.
7679
13:17:20,160 --> 13:17:30,160
And let's see that. So convert our model's logit outputs to prediction probabilities.
7680
13:17:30,160 --> 13:17:39,160
So let's create y_pred_probs. So I like to call prediction probabilities pred probs for short.
7681
13:17:39,160 --> 13:17:47,160
So torch.softmax. And then we go y_logits. And we want it across the first dimension.
7682
13:17:47,160 --> 13:17:55,160
So let's have a look. If we print y_logits, we'll get the first five values there. And then look at the conversion here.
7683
13:17:55,160 --> 13:18:06,160
y_logits? Oh, y_pred_probs? That's what we want to compare. Pred probs, first five. Let's check this out.
7684
13:18:06,160 --> 13:18:16,160
Oh, what did we get wrong here? y_logits? Do we have y_logits? Oh, no. We should change this to y_logits, because really that's the raw output of our model here.
7685
13:18:16,160 --> 13:18:26,160
y_logits. Let's rerun that. Check that. We know that these are different to these, but we ideally want these to be in the same format as these, our test labels.
7686
13:18:26,160 --> 13:18:36,160
These are our models predictions. And now we should be able to convert. There we go. Okay, beautiful. What's happening here? Let's just get out of this.
7687
13:18:36,160 --> 13:18:45,160
And we will add a few code cells here. So we have some space. Now, if you wanted to find out what's happening with torch dot softmax, what could you do?
7688
13:18:45,160 --> 13:18:57,160
We could go torch softmax. See what's happening. Softmax. Okay, so here's the function that's happening. We replicated some nonlinear activation functions before.
7689
13:18:57,160 --> 13:19:04,160
So if you wanted to replicate this, what could you do? Well, if in doubt, code it out. You could code this out. You've got the tools to do so.
7690
13:19:04,160 --> 13:19:16,160
We've got softmax: to some input x, it takes the exponential of x over the sum of the exponentials of x. So torch exp of x over the sum of torch exp of x. So I think you could code that out if you wanted to.
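A quick, hedged sketch of replicating softmax by hand, as suggested here (y_logits below is just a stand-in for the model's raw outputs; a production implementation would subtract the max first for numerical stability):

    import torch

    def softmax(x: torch.Tensor, dim: int = 1) -> torch.Tensor:
        # exponential of each element divided by the sum of exponentials along dim
        return torch.exp(x) / torch.exp(x).sum(dim=dim, keepdim=True)

    y_logits = torch.randn(5, 4)  # stand-in logits: 5 samples, 4 classes
    print(torch.allclose(softmax(y_logits), torch.softmax(y_logits, dim=1)))  # True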
7691
13:19:16,160 --> 13:19:27,160
But let's for now just stick with what we've got. We've got some logits here, and we've got some softmax, some logits that have been passed through the softmax function.
7692
13:19:27,160 --> 13:19:35,160
So that's what's happened here. We've passed our logits as the input here, and it's gone through this activation function.
7693
13:19:35,160 --> 13:19:43,160
These are prediction probabilities. And you might be like, Daniel, these are still just numbers on a page. But you also notice that none of them are negative.
7694
13:19:43,160 --> 13:19:50,160
Okay, and there's another little tidbit about what's going on here. If we sum one of them up, let's get the first one.
7695
13:19:50,160 --> 13:19:58,160
Will this work? And if we go torch dot sum, what happens?
7696
13:19:58,160 --> 13:20:09,160
Ooh, they all sum up to one. So that's one of the effects of the softmax function. And then if we go torch.max of y_pred_probs.
7697
13:20:09,160 --> 13:20:12,160
So this is a prediction probability.
7698
13:20:12,160 --> 13:20:25,160
For multi class, you'll find that for this particular sample here, the 0th sample, this is the maximum number. And so our model, what this is saying is our model is saying, this is the prediction probability.
7699
13:20:25,160 --> 13:20:34,160
This is how much I think it is class 0. This number here, it's in order. This is how much I think it is class 1. This is how much I think it is class 2.
7700
13:20:34,160 --> 13:20:43,160
This is how much I think it is class 3. And so we have one value for each of our four classes, a little bit confusing because it's 0th indexed.
7701
13:20:43,160 --> 13:20:55,160
But the maximum value here is this index. And so how would we get the particular index value of whatever the maximum number is across these values?
7702
13:20:55,160 --> 13:21:08,160
Well, we can take the argmax and we get tensor 1. So for this particular sample, this one here, our model, and these guesses or these predictions aren't very good.
7703
13:21:08,160 --> 13:21:17,160
Why is that? Well, because our model is still just predicting with random numbers, we haven't trained it yet. So this is just random output here, basically.
7704
13:21:17,160 --> 13:21:30,160
But for now, the premise still remains that our model thinks that for this sample using random numbers, it thinks that index 1 is the right class or class number 1 for this particular sample.
7705
13:21:30,160 --> 13:21:39,160
And then for this next one, what's the maximum number here? I think it would be the 0th index and the same for the next one. What's the maximum number here?
7706
13:21:39,160 --> 13:21:46,160
Well, it would be the 0th index as well. But of course, these numbers are going to change once we've trained our model.
7707
13:21:46,160 --> 13:22:02,160
So how do we get the maximum index value of all of these? So this is where we can go, convert our model's prediction probabilities to prediction labels.
7708
13:22:02,160 --> 13:22:18,160
So let's do that. We can go y_preds equals torch.argmax on y_pred_probs. And we go across the first dimension as well. So now let's have a look at y_preds.
7709
13:22:18,160 --> 13:22:36,160
Do we have prediction labels in the same format as our y_blob_test? Beautiful. Yes, we do. Although many of them are wrong, as you can see; ideally they would line up with each other.
7710
13:22:36,160 --> 13:22:44,160
But because our model is making predictions with random numbers, so our model hasn't been trained, all of these are basically random outputs.
7711
13:22:44,160 --> 13:22:52,160
So hopefully once we train our model, it's going to line up the values of the predictions are going to line up with the values of the test labels.
7712
13:22:52,160 --> 13:23:03,160
But that is how we go from our model's raw outputs to prediction probabilities to prediction labels for a multi-class classification problem.
7713
13:23:03,160 --> 13:23:25,160
So let's just add the steps here: logits, the raw output of the model; pred probs, use torch.softmax (the softmax activation function) to get the prediction probabilities; pred labels, take the argmax of the prediction probabilities.
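Putting those three steps together, a minimal sketch (assuming model_4, X_blob_test and y_blob_test from the course code):

    import torch

    model_4.eval()
    with torch.inference_mode():
        y_logits = model_4(X_blob_test)            # 1. logits: the raw outputs of the model

    y_pred_probs = torch.softmax(y_logits, dim=1)  # 2. logits -> prediction probabilities
    y_preds = torch.argmax(y_pred_probs, dim=1)    # 3. prediction probabilities -> prediction labels

    print(y_pred_probs[0].sum())                   # each row of probabilities sums to 1
    print(y_preds[:10])
    print(y_blob_test[:10])                        # compare predicted labels to the test labels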
7714
13:23:25,160 --> 13:23:37,160
So we're going to see this in action later on when we evaluate our model, but I feel like now that we know how to go from logits to prediction probabilities to pred labels, we can write a training loop.
7715
13:23:37,160 --> 13:23:51,160
So let's set that up. 8.5, create a training loop, and testing loop for a multi-class pytorch model. This is so exciting.
7716
13:23:51,160 --> 13:23:59,160
I'll see you in the next video. Let's build our first training and testing loop for a multi-class pytorch model, and I'll give you a little hint.
7717
13:23:59,160 --> 13:24:05,160
It's quite similar to the training and testing loops we've built before, so you might want to give it a shot. I think you can.
7718
13:24:05,160 --> 13:24:09,160
Otherwise, we'll do it together in the next video.
7719
13:24:09,160 --> 13:24:20,160
Welcome back. In the last video, we covered how to go from raw logits, which is the output of the model, the raw output of the model for a multi-class pytorch model.
7720
13:24:20,160 --> 13:24:38,160
Then we turned our logits into prediction probabilities using torch.softmax, and then we turn those prediction probabilities into prediction labels by taking the argmax, which returns the index of where the maximum value occurs in the prediction probability.
7721
13:24:38,160 --> 13:24:51,160
So for this particular sample, with these four values, because it outputs four values, because we're working with four classes, if we were working with 10 classes, it would have 10 values, the principle of these steps would still be the same.
7722
13:24:51,160 --> 13:25:03,160
So for this particular sample, this is the value that's the maximum, so we would take that index, which is 1. For this one, the index 0 has the maximum value.
7723
13:25:03,160 --> 13:25:11,160
For this sample, same again, and then same again, I mean, these prediction labels are just random, right? So they're quite terrible.
7724
13:25:11,160 --> 13:25:17,160
But now we're going to change that, because we're going to build a training and testing loop for our multi-class model.
7725
13:25:17,160 --> 13:25:24,160
Let's do that. So fit the multi-class model to the data.
7726
13:25:24,160 --> 13:25:29,160
Let's go set up some manual seeds.
7727
13:25:29,160 --> 13:25:38,160
Torch dot manual seed, again, don't worry too much if our numbers on the page are not exactly the same. That's inherent to the randomness of machine learning.
7728
13:25:38,160 --> 13:25:47,160
We're setting up the manual seeds to try and get them as close as possible, but these do not guarantee complete determinism, which means the same output.
7729
13:25:47,160 --> 13:25:51,160
But we're going to try. The direction is more important.
7730
13:25:51,160 --> 13:26:01,160
Set number of epochs. We're going to go epochs. How about we just do 100? I reckon we'll start with that. We can bump it up to 1000 if we really wanted to.
7731
13:26:01,160 --> 13:26:10,160
Let's put the data to the target device. What's our target device? Well, it doesn't really matter because we've set device agnostic code.
7732
13:26:10,160 --> 13:26:18,160
So whether we're working with a CPU or a GPU, our code will use whatever device is available. I'm typing blog again.
7733
13:26:18,160 --> 13:26:27,160
So we've got x blob train, y blob train. This is going to go where? It's going to go to the device.
7734
13:26:27,160 --> 13:26:43,160
And y blob train to device. And we're going to go x blob test. And then y blob test equals x blob test to device.
7735
13:26:43,160 --> 13:26:54,160
Otherwise, we'll get device issues later on, and we'll send this to device as well. Beautiful. Now, what do we do now? Well, we loop through data.
7736
13:26:54,160 --> 13:27:01,160
Loop through data. So for an epoch in range epochs for an epoch in a range.
7737
13:27:01,160 --> 13:27:05,160
Epochs. I don't want that autocorrect. Come on, Google Colab. Work with me here.
7738
13:27:05,160 --> 13:27:12,160
We're training our first multi-class classification model. This is serious business. No, I'm joking. It's actually quite fun.
7739
13:27:12,160 --> 13:27:21,160
So model four dot train. And let's do the forward pass. I'm not going to put much commentary here because we've been through this before.
7740
13:27:21,160 --> 13:27:29,160
But what are the logits? The logits are raw outputs of our model. So we'll just go x blob train.
7741
13:27:29,160 --> 13:27:41,160
And x test. I didn't want that. X blob train. Why did that do that? I need to turn off auto correct in Google Colab. I've been saying it for a long time.
7742
13:27:41,160 --> 13:27:50,160
Y pred equals torch dot softmax. So what are we doing here? We're going from logits to prediction probabilities here.
7743
13:27:50,160 --> 13:28:03,160
So torch softmax. Y logits. Across the first dimension. And then we can take the argmax of this and dim equals one.
7744
13:28:03,160 --> 13:28:08,160
In fact, I'm going to show you a little bit of, oh, I've written blog here. Maybe auto correct would have been helpful for that.
7745
13:28:08,160 --> 13:28:16,160
A little trick: you don't actually have to do the torch.softmax on the logits. Here's a little test for you.
7746
13:28:16,160 --> 13:28:23,160
Just take the argmax of the logits. And see, do you get the same similar outputs as what you get here?
7747
13:28:23,160 --> 13:28:31,160
So I've seen that done before, but for completeness, we're going to use the softmax activation function because you'll often see this in practice.
7748
13:28:31,160 --> 13:28:41,160
And now what do we do? We calculate the loss. So the loss_fn. We're going to use categorical cross entropy here, or just cross entropy loss.
7749
13:28:41,160 --> 13:28:52,160
So if we check our loss function, what do we have? We have cross entropy loss. We're going to compare our model's logits to y_blob_train.
7750
13:28:52,160 --> 13:28:58,160
And then what are we going to do? We're going to calculate the accuracy because we're working with the classification problem.
7751
13:28:58,160 --> 13:29:06,160
It'd be nice if we had accuracy as well as loss. Accuracy is one of the main classification evaluation metrics.
7752
13:29:06,160 --> 13:29:18,160
y_pred equals y_pred. And now what do we do? Well, we have to zero grad the optimizer. Optimizer zero grad.
7753
13:29:18,160 --> 13:29:25,160
Then we go loss backward. And then we step the optimizer. Optimizer step, step, step.
7754
13:29:25,160 --> 13:29:33,160
So none of these steps we haven't covered before. We do the forward pass. We calculate the loss and any evaluation metric we choose to do so.
7755
13:29:33,160 --> 13:29:39,160
We zero the optimizer. We perform back propagation on the loss. And we step the optimizer.
7756
13:29:39,160 --> 13:29:47,160
The optimizer will hopefully behind the scenes update the parameters of our model to better represent the patterns in our training data.
7757
13:29:47,160 --> 13:29:53,160
And so we're going to go testing code here. What do we do for testing code? Well, or inference code.
7758
13:29:53,160 --> 13:29:58,160
We set our model to eval mode.
7759
13:29:58,160 --> 13:30:04,160
That's going to turn off a few things behind the scenes that our model doesn't need such as dropout layers, which we haven't covered.
7760
13:30:04,160 --> 13:30:10,160
But you're more than welcome to check them out if you go torch.nn.
7761
13:30:10,160 --> 13:30:18,160
Dropout layers. Do we have dropout? Dropout layers. Beautiful. And another one that it turns off is batch norm.
7762
13:30:18,160 --> 13:30:24,160
Beautiful. And also you could search this: what does model.eval do?
7763
13:30:24,160 --> 13:30:29,160
And you might come across a Stack Overflow question. One of my favorite resources.
7764
13:30:29,160 --> 13:30:34,160
So there's a little bit of extra curriculum. But I prefer to see things in action.
7765
13:30:34,160 --> 13:30:40,160
So with torch inference mode, again, this turns off things like gradient tracking and a few more things.
7766
13:30:40,160 --> 13:30:45,160
So we get as fast code as possible, because we don't need to track gradients when we're making predictions.
7767
13:30:45,160 --> 13:30:49,160
We just need to use the parameters that our model has learned.
7768
13:30:49,160 --> 13:30:57,160
We want X blob test to go to our model here for the test logits. And then for the test preds, we're going to do the same step as what we've done here.
7769
13:30:57,160 --> 13:31:04,160
We're going to go torch dot softmax on the test logits across the first dimension.
7770
13:31:04,160 --> 13:31:12,160
And we're going to call the argmax on that to get the index value of where the maximum prediction probability value occurs.
7771
13:31:12,160 --> 13:31:19,160
We're going to calculate the test loss or loss function. We're going to pass in what the test logits here.
7772
13:31:19,160 --> 13:31:24,160
Then we're going to pass in y_blob_test to compare the test logits against.
7773
13:31:24,160 --> 13:31:32,160
Our loss function is going to do some things that convert the test logits into the same format as our test labels and then return us a value for that.
7774
13:31:32,160 --> 13:31:40,160
Then we'll also calculate the test accuracy here by passing in the y_true as y_blob_test.
7775
13:31:40,160 --> 13:31:43,160
And we have the y pred equals y pred.
7776
13:31:43,160 --> 13:31:44,160
Wonderful.
7777
13:31:44,160 --> 13:31:46,160
And then what's our final step?
7778
13:31:46,160 --> 13:31:53,160
Well, we want to print out what's happening because I love seeing metrics as our model trains.
7779
13:31:53,160 --> 13:31:55,160
It's one of my favorite things to watch.
7780
13:31:55,160 --> 13:32:00,160
If we go if epoch, let's do it every 10 epochs because we've got 100 so far.
7781
13:32:00,160 --> 13:32:02,160
It equals zero.
7782
13:32:02,160 --> 13:32:08,160
Let's print out a nice f string with epoch.
7783
13:32:08,160 --> 13:32:11,160
And then we're going to go loss.
7784
13:32:11,160 --> 13:32:12,160
What do we put in here?
7785
13:32:12,160 --> 13:32:20,160
We'll get our loss value, but we'll take it to four decimal places and we'll get the training accuracy, which will be acc.
7786
13:32:20,160 --> 13:32:26,160
And we'll take this to two decimal places and we'll get a nice percentage sign there.
7787
13:32:26,160 --> 13:32:33,160
And we'll go test loss equals test loss and we'll go there.
7788
13:32:33,160 --> 13:32:38,160
And finally, we'll go test acc at the end here, test acc.
7789
13:32:38,160 --> 13:32:41,160
Now, I'm sure by now we've written a fair few of these.
7790
13:32:41,160 --> 13:32:46,160
You're either getting sick of them or you're like, wow, I can actually do the steps through here.
7791
13:32:46,160 --> 13:32:49,160
And so don't worry, we're going to be functionalizing all of this later on,
7792
13:32:49,160 --> 13:32:54,160
but I thought I'm going to include them as much as possible so that we can practice as much as possible together.
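For reference, a condensed sketch of the loop we end up with (model_4, the blob tensors, loss_fn, optimizer and accuracy_fn are all assumed from the course code):

    import torch

    torch.manual_seed(42)
    epochs = 100

    for epoch in range(epochs):
        ### Training
        model_4.train()
        y_logits = model_4(X_blob_train)                       # forward pass -> logits
        y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)  # logits -> probs -> labels
        loss = loss_fn(y_logits, y_blob_train)                 # CrossEntropyLoss works on raw logits
        acc = accuracy_fn(y_true=y_blob_train, y_pred=y_pred)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        ### Testing
        model_4.eval()
        with torch.inference_mode():
            test_logits = model_4(X_blob_test)
            test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)
            test_loss = loss_fn(test_logits, y_blob_test)
            test_acc = accuracy_fn(y_true=y_blob_test, y_pred=test_pred)

        if epoch % 10 == 0:
            print(f"Epoch: {epoch} | Loss: {loss:.4f}, Acc: {acc:.2f}% | "
                  f"Test loss: {test_loss:.4f}, Test acc: {test_acc:.2f}%")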
7793
13:32:54,160 --> 13:32:55,160
So you ready?
7794
13:32:55,160 --> 13:32:59,160
We're about to train our first multi-class classification model.
7795
13:32:59,160 --> 13:33:03,160
In three, two, one, let's go.
7796
13:33:03,160 --> 13:33:04,160
No typos.
7797
13:33:04,160 --> 13:33:05,160
Of course.
7798
13:33:05,160 --> 13:33:07,160
What do we get wrong here?
7799
13:33:07,160 --> 13:33:09,160
Oh, this is a fun error.
7800
13:33:09,160 --> 13:33:11,160
Runtime error.
7801
13:33:11,160 --> 13:33:16,160
NLL loss forward reduce CUDA kernel 2D index: not implemented for float.
7802
13:33:16,160 --> 13:33:19,160
Okay, that's a pretty full on bunch of words there.
7803
13:33:19,160 --> 13:33:21,160
I don't really know how to describe that.
7804
13:33:21,160 --> 13:33:22,160
But here's a little hint.
7805
13:33:22,160 --> 13:33:23,160
We've got float there.
7806
13:33:23,160 --> 13:33:25,160
So we know that float is what?
7807
13:33:25,160 --> 13:33:26,160
Float is a form of data.
7808
13:33:26,160 --> 13:33:28,160
It's a data type.
7809
13:33:28,160 --> 13:33:31,160
So potentially because that's our hint.
7810
13:33:31,160 --> 13:33:33,160
We said not implemented for float.
7811
13:33:33,160 --> 13:33:35,160
So maybe we've got something wrong up here.
7812
13:33:35,160 --> 13:33:38,160
Our data is of the wrong type.
7813
13:33:38,160 --> 13:33:43,160
Can you see anywhere where our data might be the wrong type?
7814
13:33:43,160 --> 13:33:45,160
Well, let's print it out.
7815
13:33:45,160 --> 13:33:47,160
Where's our issue here?
7816
13:33:47,160 --> 13:33:48,160
Why logits?
7817
13:33:48,160 --> 13:33:50,160
Why blob train?
7818
13:33:50,160 --> 13:33:51,160
Okay.
7819
13:33:51,160 --> 13:33:53,160
Why blob train?
7820
13:33:53,160 --> 13:33:54,160
And why logits?
7821
13:33:54,160 --> 13:33:57,160
What does why blob train look like?
7822
13:33:57,160 --> 13:34:01,160
Why blob train?
7823
13:34:01,160 --> 13:34:02,160
Okay.
7824
13:34:02,160 --> 13:34:08,160
And what's the D type here?
7825
13:34:08,160 --> 13:34:09,160
Float.
7826
13:34:09,160 --> 13:34:10,160
Okay.
7827
13:34:10,160 --> 13:34:13,160
So it's not implemented for float.
7828
13:34:13,160 --> 13:34:14,160
Hmm.
7829
13:34:14,160 --> 13:34:16,160
Maybe we have to turn them into a different data type.
7830
13:34:16,160 --> 13:34:26,160
What if we went type torch.LongTensor?
7831
13:34:26,160 --> 13:34:28,160
What happens here?
7832
13:34:28,160 --> 13:34:31,160
Expected all tensors to be on the same device but found at least two devices.
7833
13:34:31,160 --> 13:34:32,160
Oh, my goodness.
7834
13:34:32,160 --> 13:34:34,160
What do we got wrong here?
7835
13:34:34,160 --> 13:34:37,160
Type torch long tensor.
7836
13:34:37,160 --> 13:34:38,160
Friends.
7837
13:34:38,160 --> 13:34:39,160
Guess what?
7838
13:34:39,160 --> 13:34:40,160
I found it.
7839
13:34:40,160 --> 13:34:44,160
And so it was to do with this pesky little data type issue here.
7840
13:34:44,160 --> 13:34:49,160
So if we run this again and now this one took me a while to find and I want you to know that,
7841
13:34:49,160 --> 13:34:53,160
that behind the scenes, even though, again, this is a machine learning cooking show,
7842
13:34:53,160 --> 13:34:56,160
it still takes a while to troubleshoot code and you're going to come across this.
7843
13:34:56,160 --> 13:35:00,160
But I thought rather than spend 10 minutes doing it in a video, I'll show you what I did.
7844
13:35:00,160 --> 13:35:04,160
So we went through this and we found that, hmm, there's something going on here.
7845
13:35:04,160 --> 13:35:06,160
I don't quite know what this is.
7846
13:35:06,160 --> 13:35:11,160
And that seems quite like a long string of words, not implemented for float.
7847
13:35:11,160 --> 13:35:14,160
And then we looked back at the line where it went wrong.
7848
13:35:14,160 --> 13:35:21,160
And so that we know that maybe the float is hinting at that one of these two tensors is of the wrong data type.
7849
13:35:21,160 --> 13:35:24,160
Now, why would we think that it's the wrong data type?
7850
13:35:24,160 --> 13:35:32,160
Well, because anytime you see float or int or something like that, it generally hints at one of your data types being wrong.
7851
13:35:32,160 --> 13:35:40,160
And so the error is actually right back up here where we created our tensor data.
7852
13:35:40,160 --> 13:35:45,160
So we turned our labels here into float, which generally is okay in PyTorch.
7853
13:35:45,160 --> 13:35:51,160
However, this one should be of type torch dot long tensor, which we haven't seen before.
7854
13:35:51,160 --> 13:35:58,160
But if we go into torch long tensor, let's have a look torch dot tensor.
7855
13:35:58,160 --> 13:36:01,160
Do we have long tensor?
7856
13:36:01,160 --> 13:36:02,160
Here we go.
7857
13:36:02,160 --> 13:36:04,160
64 bit integer signed.
7858
13:36:04,160 --> 13:36:08,160
So why do we need torch dot long tensor?
7859
13:36:08,160 --> 13:36:10,160
And again, this took me a while to find.
7860
13:36:10,160 --> 13:36:20,160
And so I want to express this that in your own code, you probably will butt your head up against some issues and errors that do take you a while to find.
7861
13:36:20,160 --> 13:36:22,160
And data types is one of the main ones.
7862
13:36:22,160 --> 13:36:29,160
But if we look in the documentation for cross entropy loss, the way I kind of found this out was this little hint here.
7863
13:36:29,160 --> 13:36:36,160
The performance of this criterion is generally better when the target contains class indices, as this allows for optimized computation.
7864
13:36:36,160 --> 13:36:40,160
But I read this and it says target contains class indices.
7865
13:36:40,160 --> 13:36:46,160
I'm like, hmm, ours are indices already, but maybe they should be integers and not floats.
7866
13:36:46,160 --> 13:36:54,160
But then if you actually just look at the sample code, you would find that they use d type equals torch dot long.
7867
13:36:54,160 --> 13:37:03,160
Now, that's the thing with a lot of code around the internet is that sometimes the answer you're looking for is a little bit buried.
7868
13:37:03,160 --> 13:37:10,160
But if in doubt, run the code and butt your head up against a wall for a bit and keep going.
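A small, hedged sketch of the data type fix in isolation: nn.CrossEntropyLoss wants the targets to be integer class indices (torch.long / int64), not floats.

    import torch
    from torch import nn

    loss_fn = nn.CrossEntropyLoss()
    logits = torch.randn(5, 4)                   # 5 samples, 4 classes
    labels = torch.tensor([0., 1., 2., 3., 1.])  # float labels here would raise a RuntimeError
    labels = labels.type(torch.LongTensor)       # the fix: cast to a 64-bit integer tensor
    print(labels.dtype)                          # torch.int64
    print(loss_fn(logits, labels))               # now the loss computes fine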
7869
13:37:10,160 --> 13:37:14,160
So let's just rerun all of this and see do we have an error here?
7870
13:37:14,160 --> 13:37:19,160
Let's train our first multi-class classification model together.
7871
13:37:19,160 --> 13:37:20,160
No errors, fingers crossed.
7872
13:37:20,160 --> 13:37:22,160
But what did we get wrong here?
7873
13:37:22,160 --> 13:37:24,160
OK, so we've got different size.
7874
13:37:24,160 --> 13:37:27,160
We're slowly working through all of the errors in deep learning here.
7875
13:37:27,160 --> 13:37:31,160
Value error, input batch size 200 to match target size 200.
7876
13:37:31,160 --> 13:37:40,160
So this is telling me maybe our test data, which is of size 200, is getting mixed up with our training data, which is of size 800.
7877
13:37:40,160 --> 13:37:49,160
So we've got test loss, the test logits, model four.
7878
13:37:49,160 --> 13:37:50,160
What's the size?
7879
13:37:50,160 --> 13:37:57,160
Let's print out, print test_logits.shape and y_blob_test.
7880
13:37:57,160 --> 13:38:01,160
So troubleshooting on the fly here, everyone.
7881
13:38:01,160 --> 13:38:03,160
What do we got?
7882
13:38:03,160 --> 13:38:06,160
Torch size 800.
7883
13:38:06,160 --> 13:38:09,160
Where are our test labels coming from?
7884
13:38:09,160 --> 13:38:13,160
y_blob_test equals, oh, there we go.
7885
13:38:13,160 --> 13:38:15,160
Ah, did you catch that before?
7886
13:38:15,160 --> 13:38:17,160
Maybe you did, maybe you didn't.
7887
13:38:17,160 --> 13:38:19,160
But I think we should be right here.
7888
13:38:19,160 --> 13:38:24,160
Now if we just comment out this line, so we've had a data type issue and we've had a shape issue.
7889
13:38:24,160 --> 13:38:28,160
Two of the main issues in machine learning. Oh, and again, we've had some issues.
7890
13:38:28,160 --> 13:38:29,160
y_blob_test.
7891
13:38:29,160 --> 13:38:30,160
What's going on here?
7892
13:38:30,160 --> 13:38:33,160
I thought we just changed the shape.
7893
13:38:33,160 --> 13:38:41,160
Oh, no, we have to go up and reassign it again, because now this is definitely y_blob, yes.
7894
13:38:41,160 --> 13:38:49,160
Let's rerun all of this, reassign our data.
7895
13:38:49,160 --> 13:38:53,160
We are running into every single error here, but I'm glad we're doing this because otherwise you might not see how to
7896
13:38:53,160 --> 13:38:56,160
troubleshoot these type of things.
7897
13:38:56,160 --> 13:38:59,160
So the size of a tensor must match the size.
7898
13:38:59,160 --> 13:39:01,160
Oh, we're getting the issue here.
7899
13:39:01,160 --> 13:39:03,160
Test preds.
7900
13:39:03,160 --> 13:39:04,160
Oh, my goodness.
7901
13:39:04,160 --> 13:39:06,160
We have written so much code here.
7902
13:39:06,160 --> 13:39:07,160
Test preds.
7903
13:39:07,160 --> 13:39:12,160
So instead of y_pred, this should be test_pred.
7904
13:39:12,160 --> 13:39:13,160
Fingers crossed.
7905
13:39:13,160 --> 13:39:15,160
Are we training our first model yet or what?
7906
13:39:15,160 --> 13:39:16,160
There we go.
7907
13:39:16,160 --> 13:39:18,160
Okay, I'm going to printing out some stuff.
7908
13:39:18,160 --> 13:39:20,160
I don't really want to print out that stuff.
7909
13:39:20,160 --> 13:39:23,160
I want to see the loss go down, so I'm going to.
7910
13:39:23,160 --> 13:39:29,160
So friends, I hope you know that we've just been through some of the most fundamental troubleshooting steps.
7911
13:39:29,160 --> 13:39:32,160
And you might say, oh, Daniel, there's a cop out because you're just coding wrong.
7912
13:39:32,160 --> 13:39:36,160
And in fact, I code wrong all the time.
7913
13:39:36,160 --> 13:39:42,160
But we've now worked out how to troubleshoot shape errors and data type errors.
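A tiny sanity-check habit that catches both classes of error early (tensor names assumed from the course code):

    print(X_blob_train.shape, y_blob_train.shape, y_blob_train.dtype)  # e.g. torch.Size([800, 2]) torch.Size([800]) torch.int64
    print(X_blob_test.shape, y_blob_test.shape, y_blob_test.dtype)     # e.g. torch.Size([200, 2]) torch.Size([200]) torch.int64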
7914
13:39:42,160 --> 13:39:43,160
But look at this.
7915
13:39:43,160 --> 13:39:46,160
After all of that, thank goodness.
7916
13:39:46,160 --> 13:39:51,160
Our loss and accuracy go in the directions that we want them to go.
7917
13:39:51,160 --> 13:39:55,160
So our loss goes down and our accuracy goes up.
7918
13:39:55,160 --> 13:39:56,160
Beautiful.
7919
13:39:56,160 --> 13:40:01,160
So it looks like that our model is working on a multi-class classification data set.
7920
13:40:01,160 --> 13:40:03,160
So how do we check that?
7921
13:40:03,160 --> 13:40:08,160
Well, we're going to evaluate it in the next step by visualize, visualize, visualize.
7922
13:40:08,160 --> 13:40:10,160
So you might want to give that a shot.
7923
13:40:10,160 --> 13:40:14,160
See if you can use our plot decision boundary function.
7924
13:40:14,160 --> 13:40:17,160
We'll use our model to separate the data here.
7925
13:40:17,160 --> 13:40:21,160
So it's going to be much the same as what we did for binary classification.
7926
13:40:21,160 --> 13:40:25,160
But this time we're using a different model and a different data set.
7927
13:40:25,160 --> 13:40:28,160
I'll see you there.
7928
13:40:28,160 --> 13:40:29,160
Welcome back.
7929
13:40:29,160 --> 13:40:33,160
In the last video, we went through some of the steps that we've been through before
7930
13:40:33,160 --> 13:40:36,160
in terms of training and testing a model.
7931
13:40:36,160 --> 13:40:42,160
But we also butted our heads up against two of the most common issues in machine learning and deep learning in general.
7932
13:40:42,160 --> 13:40:45,160
And that is data type issues and shape issues.
7933
13:40:45,160 --> 13:40:48,160
But luckily we were able to resolve them.
7934
13:40:48,160 --> 13:40:54,160
And trust me, you're going to run across many of them in your own deep learning and machine learning endeavors.
7935
13:40:54,160 --> 13:40:59,160
So I'm glad that we got to have a look at them and sort of I could show you what I do to troubleshoot them.
7936
13:40:59,160 --> 13:41:02,160
But in reality, it's a lot of experimentation.
7937
13:41:02,160 --> 13:41:08,160
Run the code, see what errors come out, Google the errors, read the documentation, try again.
7938
13:41:08,160 --> 13:41:16,160
But with that being said, it looks like that our model, our multi-class classification model has learned something.
7939
13:41:16,160 --> 13:41:19,160
The loss is going down, the accuracy is going up.
7940
13:41:19,160 --> 13:41:31,160
But we can further evaluate this by making and evaluating predictions with a PyTorch multi-class model.
7941
13:41:31,160 --> 13:41:33,160
So how do we make predictions?
7942
13:41:33,160 --> 13:41:36,160
We've seen this step before, but let's reiterate.
7943
13:41:36,160 --> 13:41:40,160
Make predictions: we're going to set our model to what mode? Eval mode.
7944
13:41:40,160 --> 13:41:44,160
And then we're going to turn on what context manager? Inference mode.
7945
13:41:44,160 --> 13:41:47,160
Because we want to make inference, we want to make predictions.
7946
13:41:47,160 --> 13:41:49,160
Now what do we make predictions on?
7947
13:41:49,160 --> 13:41:52,160
Or what are the predictions? They're going to be logits because why?
7948
13:41:52,160 --> 13:41:55,160
They are the raw outputs of our model.
7949
13:41:55,160 --> 13:41:59,160
So we'll take model four, which we just trained and we'll pass it the test data.
7950
13:41:59,160 --> 13:42:02,160
Well, it needs to be blob test, by the way.
7951
13:42:02,160 --> 13:42:04,160
I keep getting that variable mixed up.
7952
13:42:04,160 --> 13:42:06,160
We just had enough problems with the data, Daniel.
7953
13:42:06,160 --> 13:42:09,160
We don't need any more. You're completely right.
7954
13:42:09,160 --> 13:42:10,160
I agree with you.
7955
13:42:10,160 --> 13:42:13,160
But we're probably going to come across some more problems in the future.
7956
13:42:13,160 --> 13:42:14,160
Don't you worry about that.
7957
13:42:14,160 --> 13:42:17,160
So let's view the first 10 predictions.
7958
13:42:17,160 --> 13:42:21,160
Why logits? What do they look like?
7959
13:42:21,160 --> 13:42:23,160
All right, just numbers on the page. They're raw logits.
7960
13:42:23,160 --> 13:42:31,160
Now how do we go from logits to prediction probabilities?
7961
13:42:31,160 --> 13:42:32,160
How do we do that?
7962
13:42:32,160 --> 13:42:40,160
With a multi-class model, we go y-pred-probs equals torch.softmax on the y-logits.
7963
13:42:40,160 --> 13:42:43,160
And we want to do it across the first dimension.
7964
13:42:43,160 --> 13:42:46,160
And what do we have when we go pred-probs?
7965
13:42:46,160 --> 13:42:50,160
Let's go up to the first 10.
7966
13:42:50,160 --> 13:42:52,160
Are we apples to apples yet?
7967
13:42:52,160 --> 13:42:58,160
What does our y_blob_test look like?
7968
13:42:58,160 --> 13:43:00,160
We're not apples to apples yet, but we're close.
7969
13:43:00,160 --> 13:43:02,160
So these are prediction probabilities.
7970
13:43:02,160 --> 13:43:04,160
You'll notice that we get some fairly different values here.
7971
13:43:04,160 --> 13:43:08,160
And remember, the one closest to one here, the value closest to one,
7972
13:43:08,160 --> 13:43:12,160
which looks like it's this, the index of the maximum value
7973
13:43:12,160 --> 13:43:15,160
is going to be our model's predicted class.
7974
13:43:15,160 --> 13:43:17,160
So this index is index one.
7975
13:43:17,160 --> 13:43:19,160
And does it correlate here? Yes.
7976
13:43:19,160 --> 13:43:20,160
One, beautiful.
7977
13:43:20,160 --> 13:43:23,160
Then we have index three, which is the maximum value here.
7978
13:43:23,160 --> 13:43:25,160
Three, beautiful.
7979
13:43:25,160 --> 13:43:27,160
And then we have, what do we have here?
7980
13:43:27,160 --> 13:43:30,160
Index two, yes.
7981
13:43:30,160 --> 13:43:31,160
Okay, wonderful.
7982
13:43:31,160 --> 13:43:32,160
But let's not step through that.
7983
13:43:32,160 --> 13:43:33,160
We're programmers.
7984
13:43:33,160 --> 13:43:34,160
We can do this with code.
7985
13:43:34,160 --> 13:43:40,160
So now let's go from pred-probs to pred-labels.
7986
13:43:40,160 --> 13:43:44,160
So y-pred-equals, how do we do that?
7987
13:43:44,160 --> 13:43:50,160
Well, we can do torch.argmax on the y-pred-probs.
7988
13:43:50,160 --> 13:43:52,160
And then we can pass dim equals one.
7989
13:43:52,160 --> 13:43:54,160
We could also do it this way.
7990
13:43:54,160 --> 13:43:57,160
So y-pred-probs call dot-argmax.
7991
13:43:57,160 --> 13:43:59,160
There's no real difference between these two.
7992
13:43:59,160 --> 13:44:03,160
But we're just going to do it this way, called torch.argmax.
7993
13:44:03,160 --> 13:44:05,160
y-pred-es, let's view the first 10.
7994
13:44:05,160 --> 13:44:12,160
Are we now comparing apples to apples when we go y-blob test?
7995
13:44:12,160 --> 13:44:14,160
Yes, we are.
7996
13:44:14,160 --> 13:44:15,160
Have a go at that.
7997
13:44:15,160 --> 13:44:20,160
Look, one, three, two, one, zero, three, one, three, two, one, zero, three.
7998
13:44:20,160 --> 13:44:21,160
Beautiful.
7999
13:44:21,160 --> 13:44:24,160
Now, we could line these up and look at and compare them all day.
8000
13:44:24,160 --> 13:44:25,160
I mean, that would be fun.
8001
13:44:25,160 --> 13:44:29,160
But I know something that would be even more fun.
8002
13:44:29,160 --> 13:44:30,160
Let's get visual.
8003
13:44:30,160 --> 13:44:33,160
So plot dot figure.
8004
13:44:33,160 --> 13:44:39,160
And we're going to go figsize equals 12, 6, just because the beauty of this
8005
13:44:39,160 --> 13:44:43,160
being a cooking show is I kind of know what ingredients work from ahead of time.
8006
13:44:43,160 --> 13:44:46,160
Despite what you saw in the last video with all of that trouble shooting.
8007
13:44:46,160 --> 13:44:50,160
But I'm actually glad that we did that because seriously.
8008
13:44:50,160 --> 13:44:53,160
Shape issues and data type issues.
8009
13:44:53,160 --> 13:44:55,160
You're going to come across a lot of them.
8010
13:44:55,160 --> 13:44:59,160
The two are the main issues I troubleshoot, aside from device issues.
8011
13:44:59,160 --> 13:45:05,160
So let's go x-blob train and y-blob train.
8012
13:45:05,160 --> 13:45:08,160
And we're going to do another plot here.
8013
13:45:08,160 --> 13:45:11,160
We're going to get subplot one, two, two.
8014
13:45:11,160 --> 13:45:14,160
And we're going to do this for the test data.
8015
13:45:14,160 --> 13:45:17,160
Test and then plot decision boundary.
8016
13:45:17,160 --> 13:45:30,160
Plot decision boundary with model four on x-blob test and y-blob test as well.
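The plotting cell being described looks roughly like this, assuming the course's plot_decision_boundary helper from helper_functions.py is imported:

    import matplotlib.pyplot as plt

    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.title("Train")
    plot_decision_boundary(model_4, X_blob_train, y_blob_train)
    plt.subplot(1, 2, 2)
    plt.title("Test")
    plot_decision_boundary(model_4, X_blob_test, y_blob_test)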
8017
13:45:30,160 --> 13:45:31,160
Let's see this.
8018
13:45:31,160 --> 13:45:32,160
Did we train a multi-class?
8019
13:45:32,160 --> 13:45:33,160
Oh my goodness.
8020
13:45:33,160 --> 13:45:34,160
Yes, we did.
8021
13:45:34,160 --> 13:45:37,160
Our code worked faster than I can speak.
8022
13:45:37,160 --> 13:45:39,160
Look at that beautiful looking plot.
8023
13:45:39,160 --> 13:45:42,160
We've separated our data almost as best as what we could.
8024
13:45:42,160 --> 13:45:45,160
Like there's some here that are quite inconspicuous.
8025
13:45:45,160 --> 13:45:48,160
And now what's the thing about these lines?
8026
13:45:48,160 --> 13:45:52,160
Would this model have worked? I posed the question a fair few videos ago,
8027
13:45:52,160 --> 13:45:56,160
when we created our multi-class model, of whether we could separate this data
8028
13:45:56,160 --> 13:45:59,160
without nonlinear functions.
8029
13:45:59,160 --> 13:46:01,160
So how about we just test that?
8030
13:46:01,160 --> 13:46:04,160
Since we've got the code ready, let's go back up.
8031
13:46:04,160 --> 13:46:06,160
We've got nonlinear functions here.
8032
13:46:06,160 --> 13:46:07,160
We've got relu here.
8033
13:46:07,160 --> 13:46:10,160
So I'm just going to recreate our model there.
8034
13:46:10,160 --> 13:46:11,160
So I just took relu out.
8035
13:46:11,160 --> 13:46:12,160
That's all I did.
8036
13:46:12,160 --> 13:46:15,160
Commented it out, this code will still all work.
8037
13:46:15,160 --> 13:46:16,160
Or fingers crossed it will.
8038
13:46:16,160 --> 13:46:18,160
Don't count your chickens before they hatch.
8039
13:46:18,160 --> 13:46:19,160
Daniel, come on.
8040
13:46:19,160 --> 13:46:21,160
We're just going to rerun all of these cells.
8041
13:46:21,160 --> 13:46:23,160
All the code's going to stay the same.
8042
13:46:23,160 --> 13:46:26,160
All we did was we took the nonlinearity out of our model.
8043
13:46:26,160 --> 13:46:28,160
Is it still going to work?
8044
13:46:28,160 --> 13:46:29,160
Oh my goodness.
8045
13:46:29,160 --> 13:46:31,160
It still works.
8046
13:46:31,160 --> 13:46:33,160
Now why is that?
8047
13:46:33,160 --> 13:46:36,160
Well, you'll notice that the lines are a lot straighter here.
8048
13:46:36,160 --> 13:46:38,160
Did we get different metrics?
8049
13:46:38,160 --> 13:46:39,160
I'll leave that for you to compare.
8050
13:46:39,160 --> 13:46:41,160
Maybe these will be a little bit different.
8051
13:46:41,160 --> 13:46:43,160
I don't think they're too far different.
8052
13:46:43,160 --> 13:46:48,160
But that is because our data is linearly separable.
8053
13:46:48,160 --> 13:46:51,160
So we can draw straight lines only to separate our data.
8054
13:46:51,160 --> 13:46:54,160
However, a lot of the data that you deal with in practice
8055
13:46:54,160 --> 13:46:57,160
will require linear and nonlinear.
8056
13:46:57,160 --> 13:46:59,160
Hence why we spent a lot of time on that.
8057
13:46:59,160 --> 13:47:01,160
Like the circle data that we covered before.
8058
13:47:01,160 --> 13:47:03,160
And let's look up an image of a pizza.
8059
13:47:03,160 --> 13:47:08,160
If you're building a food vision model to take photos of food
8060
13:47:08,160 --> 13:47:11,160
and separate different classes of food,
8061
13:47:11,160 --> 13:47:14,160
could you do this with just straight lines?
8062
13:47:14,160 --> 13:47:17,160
You might be able to, but I personally don't think
8063
13:47:17,160 --> 13:47:19,160
that I could build a model to do such a thing.
8064
13:47:19,160 --> 13:47:22,160
And in fact, PyTorch makes it so easy to add nonlinearities
8065
13:47:22,160 --> 13:47:24,160
to our model, we might as well have them in
8066
13:47:24,160 --> 13:47:27,160
so that our model can use it if it needs it
8067
13:47:27,160 --> 13:47:29,160
and if it doesn't need it, well, hey,
8068
13:47:29,160 --> 13:47:32,160
it's going to build a pretty good model as we saw before
8069
13:47:32,160 --> 13:47:35,160
if we included the nonlinearities in our model.
8070
13:47:35,160 --> 13:47:37,160
So we could uncomment these and our model is still
8071
13:47:37,160 --> 13:47:38,160
going to perform quite well.
8072
13:47:38,160 --> 13:47:40,160
That is the beauty of neural networks,
8073
13:47:40,160 --> 13:47:43,160
is that they decide the numbers that should
8074
13:47:43,160 --> 13:47:45,160
represent our data the best.
8075
13:47:45,160 --> 13:47:49,160
And so, with that being said, we've evaluated our model,
8076
13:47:49,160 --> 13:47:51,160
we've trained our multi-class classification model,
8077
13:47:51,160 --> 13:47:54,160
we've put everything together, we've gone from binary
8078
13:47:54,160 --> 13:47:57,160
classification to multi-class classification.
8079
13:47:57,160 --> 13:48:00,160
I think there's just one more thing that we should cover
8080
13:48:00,160 --> 13:48:04,160
and that is, let's go here, section number nine,
8081
13:48:04,160 --> 13:48:08,160
a few more classification metrics.
8082
13:48:08,160 --> 13:48:12,160
So, as I said before, evaluating a model,
8083
13:48:12,160 --> 13:48:15,160
let's just put it here, to evaluate our model,
8084
13:48:15,160 --> 13:48:18,160
our classification models, that is,
8085
13:48:18,160 --> 13:48:22,160
evaluating a model is just as important as training a model.
8086
13:48:22,160 --> 13:48:24,160
So, I'll see you in the next video.
8087
13:48:24,160 --> 13:48:28,160
Let's cover a few more classification metrics.
8088
13:48:28,160 --> 13:48:29,160
Welcome back.
8089
13:48:29,160 --> 13:48:31,160
In the last video, we evaluated our
8090
13:48:31,160 --> 13:48:34,160
multi-class classification model visually.
8091
13:48:34,160 --> 13:48:36,160
And we saw that it did pretty darn well,
8092
13:48:36,160 --> 13:48:39,160
because our data turned out to be linearly separable.
8093
13:48:39,160 --> 13:48:41,160
So, our model, even without non-linear functions,
8094
13:48:41,160 --> 13:48:43,160
could separate the data here.
8095
13:48:43,160 --> 13:48:46,160
However, as I said before, most of the data that you deal with
8096
13:48:46,160 --> 13:48:50,160
will require some form of linear and non-linear function.
8097
13:48:50,160 --> 13:48:53,160
So, just keep that in mind, and the beauty of PyTorch is
8098
13:48:53,160 --> 13:48:56,160
that it allows us to create models with linear
8099
13:48:56,160 --> 13:48:59,160
and non-linear functions quite flexibly.
8100
13:48:59,160 --> 13:49:01,160
So, let's write down here.
8101
13:49:01,160 --> 13:49:04,160
If we wanted to further evaluate our classification models,
8102
13:49:04,160 --> 13:49:06,160
we've seen accuracy.
8103
13:49:06,160 --> 13:49:08,160
So, accuracy is one of the main methods
8104
13:49:08,160 --> 13:49:10,160
of evaluating classification models.
8105
13:49:10,160 --> 13:49:14,160
So, this is like saying, out of 100 samples,
8106
13:49:14,160 --> 13:49:18,160
how many does our model get right?
8107
13:49:18,160 --> 13:49:21,160
And so, we've seen our model right now
8108
13:49:21,160 --> 13:49:23,160
is at a testing accuracy of nearly 100%.
8109
13:49:23,160 --> 13:49:25,160
So, it's nearly perfect.
8110
13:49:25,160 --> 13:49:27,160
But, of course, there were a few tough samples,
8111
13:49:27,160 --> 13:49:29,160
which are, I mean, a little bit hard.
8112
13:49:29,160 --> 13:49:31,160
Some of them are even within the other samples,
8113
13:49:31,160 --> 13:49:33,160
so you can forgive it a little bit here
8114
13:49:33,160 --> 13:49:36,160
for not being exactly perfect.
8115
13:49:36,160 --> 13:49:38,160
What are some other metrics here?
8116
13:49:38,160 --> 13:49:41,160
Well, we've also got precision,
8117
13:49:41,160 --> 13:49:44,160
and we've also got recall.
8118
13:49:44,160 --> 13:49:46,160
Both of these will be pretty important
8119
13:49:46,160 --> 13:49:50,160
when you have classes with different amounts of values in them.
8120
13:49:50,160 --> 13:49:52,160
So, precision and recall.
8121
13:49:52,160 --> 13:49:57,160
So, accuracy is pretty good to use when you have balanced classes.
8122
13:49:57,160 --> 13:50:00,160
So, this is just text on a page for now.
8123
13:50:00,160 --> 13:50:03,160
F1 score, which combines precision and recall.
8124
13:50:03,160 --> 13:50:05,160
There's also a confusion matrix,
8125
13:50:05,160 --> 13:50:09,160
and there's also a classification report.
8126
13:50:09,160 --> 13:50:12,160
So, I'm going to show you a few code examples
8127
13:50:12,160 --> 13:50:14,160
of where you can access these,
8128
13:50:14,160 --> 13:50:17,160
and I'm going to leave it to you as extra curriculum
8129
13:50:17,160 --> 13:50:20,160
to try each one of these out.
8130
13:50:20,160 --> 13:50:23,160
So, let's go into the keynote.
8131
13:50:23,160 --> 13:50:25,160
And by the way, you should pat yourself on the back here
8132
13:50:25,160 --> 13:50:28,160
because we've just gone through all of the PyTorch workflow
8133
13:50:28,160 --> 13:50:30,160
for a classification problem.
8134
13:50:30,160 --> 13:50:32,160
Not only just binary classification,
8135
13:50:32,160 --> 13:50:35,160
we've done multi-class classification as well.
8136
13:50:35,160 --> 13:50:38,160
So, let's not stop there, though.
8137
13:50:38,160 --> 13:50:40,160
Remember, building a model,
8138
13:50:40,160 --> 13:50:43,160
evaluating a model is just as important as building a model.
8139
13:50:43,160 --> 13:50:46,160
So, we've been through non-linearity.
8140
13:50:46,160 --> 13:50:49,160
We've seen how we could replicate non-linear functions.
8141
13:50:49,160 --> 13:50:52,160
We've talked about the machine learning explorer's motto,
8142
13:50:52,160 --> 13:50:55,160
visualize, visualize, visualize.
8143
13:50:55,160 --> 13:50:58,160
Machine learning practitioners motto is experiment, experiment, experiment.
8144
13:50:58,160 --> 13:51:01,160
I think I called that the machine learning or data scientist motto.
8145
13:51:01,160 --> 13:51:03,160
Same thing, you know?
8146
13:51:03,160 --> 13:51:05,160
And steps in modeling with PyTorch.
8147
13:51:05,160 --> 13:51:06,160
We've seen this in practice,
8148
13:51:06,160 --> 13:51:08,160
so we don't need to look at these slides.
8149
13:51:08,160 --> 13:51:10,160
I mean, they'll be available on the GitHub if you want them,
8150
13:51:10,160 --> 13:51:11,160
but here we are.
8151
13:51:11,160 --> 13:51:14,160
Some common classification evaluation methods.
8152
13:51:14,160 --> 13:51:15,160
So, we have accuracy.
8153
13:51:15,160 --> 13:51:17,160
There's the formal formula if you want,
8154
13:51:17,160 --> 13:51:20,160
but there's also code, which is what we've been focusing on.
8155
13:51:20,160 --> 13:51:23,160
So, we wrote our own accuracy function, which replicates this.
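That accuracy function is essentially the formula on the slide in code; a sketch of the version written earlier in the course:

    import torch

    def accuracy_fn(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
        correct = torch.eq(y_true, y_pred).sum().item()  # count how many predictions match the labels
        return (correct / len(y_pred)) * 100             # return a percentage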
8156
13:51:23,160 --> 13:51:26,160
By the way, Tp stands for not toilet paper,
8157
13:51:26,160 --> 13:51:28,160
it stands for true positive,
8158
13:51:28,160 --> 13:51:33,160
Tn is true negative, false positive, Fp, false negative, Fn.
8159
13:51:33,160 --> 13:51:36,160
And so, the code, we could do torch metrics.
8160
13:51:36,160 --> 13:51:37,160
Oh, what's that?
8161
13:51:37,160 --> 13:51:38,160
But when should you use it?
8162
13:51:38,160 --> 13:51:40,160
The default metric for classification problems.
8163
13:51:40,160 --> 13:51:43,160
Note, it is not the best for imbalanced classes.
8164
13:51:43,160 --> 13:51:45,160
So, if you had, for example,
8165
13:51:45,160 --> 13:51:48,160
1,000 samples of one class,
8166
13:51:48,160 --> 13:51:50,160
so, number one, label number one,
8167
13:51:50,160 --> 13:51:54,160
but you had only 10 samples of class zero.
8168
13:51:54,160 --> 13:51:58,160
So, accuracy might not be the best to use for then.
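A quick illustration of why: on that 1,000-versus-10 split, a model that always predicts class 1 still scores about 99% accuracy while being useless on class 0.

    import torch

    y_true = torch.cat([torch.ones(1000), torch.zeros(10)]).long()  # 1,000 of class 1, 10 of class 0
    y_pred = torch.ones_like(y_true)                                # "always predict class 1"
    print((y_pred == y_true).float().mean())                        # tensor(0.9901)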
8169
13:51:58,160 --> 13:52:00,160
For imbalanced data sets,
8170
13:52:00,160 --> 13:52:03,160
you might want to look into precision and recall.
8171
13:52:03,160 --> 13:52:05,160
So, there's a great article called,
8172
13:52:05,160 --> 13:52:09,160
I think it's beyond accuracy, precision and recall,
8173
13:52:09,160 --> 13:52:12,160
which gives a fantastic overview of, there we go.
8174
13:52:12,160 --> 13:52:14,160
This is what I'd recommend.
8175
13:52:14,160 --> 13:52:17,160
There we go, by Will Koehrsen.
8176
13:52:17,160 --> 13:52:22,160
So, I'd highly recommend this article as some extra curriculum here.
8177
13:52:22,160 --> 13:52:29,160
See this article for when to use precision recall.
8178
13:52:29,160 --> 13:52:31,160
We'll go there.
8179
13:52:31,160 --> 13:52:32,160
Now, if we look back,
8180
13:52:32,160 --> 13:52:35,160
there is the formal formula for precision,
8181
13:52:35,160 --> 13:52:38,160
true positive over true positive plus false positive.
8182
13:52:38,160 --> 13:52:41,160
So, higher precision leads to less false positives.
8183
13:52:41,160 --> 13:52:44,160
So, if false positives are not ideal,
8184
13:52:44,160 --> 13:52:46,160
you probably want to increase precision.
8185
13:52:46,160 --> 13:52:49,160
If false negatives are not ideal,
8186
13:52:49,160 --> 13:52:51,160
you want to increase your recall metric.
8187
13:52:51,160 --> 13:52:55,160
However, you should be aware that there is such thing as a precision recall trade-off.
8188
13:52:55,160 --> 13:52:58,160
And you're going to find this in your experimentation.
8189
13:52:58,160 --> 13:53:01,160
Precision recall trade-off.
8190
13:53:01,160 --> 13:53:05,160
So, that means that, generally, if you increase precision,
8191
13:53:05,160 --> 13:53:07,160
you lower recall.
8192
13:53:07,160 --> 13:53:11,160
And, inversely, if you increase recall, you lower precision.
8193
13:53:11,160 --> 13:53:14,160
So, check out that, just to be aware of that.
8194
13:53:14,160 --> 13:53:18,160
But, again, you're going to learn this through practice of evaluating your models.
8195
13:53:18,160 --> 13:53:21,160
If you'd like some code to do precision and recall,
8196
13:53:21,160 --> 13:53:24,160
you've got torchmetrics.precision, or torchmetrics.recall,
8197
13:53:24,160 --> 13:53:26,160
as well as scikit-learn.
8198
13:53:26,160 --> 13:53:30,160
So scikit-learn has implementations for many different classification metrics.
8199
13:53:30,160 --> 13:53:34,160
Torchmetrics is a PyTorch-like library.
8200
13:53:34,160 --> 13:53:38,160
And then we have F1 score, which combines precision and recall.
8201
13:53:38,160 --> 13:53:42,160
So, it's a good combination if you want something in between these two.
8202
13:53:42,160 --> 13:53:45,160
And then, finally, there's a confusion matrix.
8203
13:53:45,160 --> 13:53:49,160
I haven't listed here a classification report, but I've listed it up here.
8204
13:53:49,160 --> 13:53:53,160
And we can see a classification report in scikit-learn.
8205
13:53:53,160 --> 13:53:55,160
Classification report.
8206
13:53:55,160 --> 13:53:59,160
Classification report kind of just puts together all of the metrics that we've talked about.
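As a hedged sketch, scikit-learn can produce all of these from the predictions we already have (tensors moved to the CPU first so NumPy can read them):

    from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

    y_true = y_blob_test.cpu().numpy()
    y_hat = y_preds.cpu().numpy()

    print(precision_score(y_true, y_hat, average="macro"))
    print(recall_score(y_true, y_hat, average="macro"))
    print(f1_score(y_true, y_hat, average="macro"))
    print(classification_report(y_true, y_hat))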
8207
13:53:59,160 --> 13:54:03,160
And we can go there.
8208
13:54:03,160 --> 13:54:06,160
But I've been talking a lot about torchmetrics.
8209
13:54:06,160 --> 13:54:09,160
So let's look up torchmetrics' accuracy.
8210
13:54:09,160 --> 13:54:11,160
Torchmetrics.
8211
13:54:11,160 --> 13:54:13,160
So this is a library.
8212
13:54:13,160 --> 13:54:16,160
I don't think it comes with Google Colab at the moment,
8213
13:54:16,160 --> 13:54:19,160
but you can import torchmetrics, and you can initialize a metric.
8214
13:54:19,160 --> 13:54:24,160
So we've built our own accuracy function, but the beauty of using torchmetrics
8215
13:54:24,160 --> 13:54:27,160
is that it uses PyTorch-like code.
8216
13:54:27,160 --> 13:54:31,160
So we've got metric, preds, and target.
8217
13:54:31,160 --> 13:54:36,160
And then we can find out what the value of the accuracy is.
8218
13:54:36,160 --> 13:54:42,160
And if you wanted to implement your own metrics, you could subclass the metric class here.
8219
13:54:42,160 --> 13:54:44,160
But let's just practice this.
8220
13:54:44,160 --> 13:54:51,160
So let's check to see if I'm going to grab this and copy this in here.
8221
13:54:51,160 --> 13:55:00,160
If you want access to a lot of PyTorch metrics, see torchmetrics.
8222
13:55:00,160 --> 13:55:03,160
So can we import torchmetrics?
8223
13:55:03,160 --> 13:55:08,160
Maybe it's already in Google Colab.
8224
13:55:08,160 --> 13:55:09,160
No, not here.
8225
13:55:09,160 --> 13:55:10,160
But that's all right.
8226
13:55:10,160 --> 13:55:13,160
We'll go pip install torchmetrics.
8227
13:55:13,160 --> 13:55:16,160
So Google Colab has access to torchmetrics.
8228
13:55:16,160 --> 13:55:19,160
And that's going to download from torchmetrics.
8229
13:55:19,160 --> 13:55:20,160
It shouldn't take too long.
8230
13:55:20,160 --> 13:55:21,160
It's quite a small package.
8231
13:55:21,160 --> 13:55:22,160
Beautiful.
8232
13:55:22,160 --> 13:55:29,160
And now we're going to go from torchmetrics import Accuracy.
8233
13:55:29,160 --> 13:55:30,160
Wonderful.
8234
13:55:30,160 --> 13:55:33,160
And let's see how we can use this.
8235
13:55:33,160 --> 13:55:34,160
So setup metric.
8236
13:55:34,160 --> 13:55:38,160
So we're going to go torchmetric underscore accuracy.
8237
13:55:38,160 --> 13:55:40,160
We could call it whatever we want, really.
8238
13:55:40,160 --> 13:55:42,160
But we need accuracy here.
8239
13:55:42,160 --> 13:55:44,160
We're just going to set up the class.
8240
13:55:44,160 --> 13:55:53,160
And then we're going to calculate the accuracy of our multi-class model by calling torchmetric accuracy.
8241
13:55:53,160 --> 13:55:58,160
And we're going to pass it y_preds and y_blob_test.
8242
13:55:58,160 --> 13:56:01,160
Let's see what happens here.
8243
13:56:01,160 --> 13:56:04,160
Oh, what did we get wrong?
8244
13:56:04,160 --> 13:56:05,160
Runtime error.
8245
13:56:05,160 --> 13:56:08,160
Expected all tensors to be on the same device, but found at least two devices.
8246
13:56:08,160 --> 13:56:11,160
Oh, of course.
8247
13:56:11,160 --> 13:56:16,160
Now, remember how I said torchmetrics implements PyTorch like code?
8248
13:56:16,160 --> 13:56:20,160
Well, let's check what device this is on.
8249
13:56:20,160 --> 13:56:22,160
Oh, it's on the CPU.
8250
13:56:22,160 --> 13:56:27,160
So something to be aware of that if you use torchmetrics, you have to make sure your metrics
8251
13:56:27,160 --> 13:56:32,160
are on the same device by using device agnostic code as your data.
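A sketch of that torchmetrics usage end to end. Note this assumes a recent torchmetrics release, which asks for a task argument; the version used in the video just calls Accuracy() with no arguments.

    # !pip install torchmetrics
    import torch
    from torchmetrics import Accuracy

    device = "cuda" if torch.cuda.is_available() else "cpu"
    torchmetrics_accuracy = Accuracy(task="multiclass", num_classes=4).to(device)  # metric on the same device as the data
    print(torchmetrics_accuracy(y_preds, y_blob_test))  # e.g. tensor(0.9950, device='cuda:0')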
8252
13:56:32,160 --> 13:56:34,160
So if we run this, what do we get?
8253
13:56:34,160 --> 13:56:35,160
Beautiful.
8254
13:56:35,160 --> 13:56:43,160
We get an accuracy of 99.5%, which is in line with the accuracy function that we coded ourselves.
8255
13:56:43,160 --> 13:56:47,160
So if you'd like a lot of pre-built metrics functions, be sure to see either
8256
13:56:47,160 --> 13:56:53,160
scikit-learn for any of these or torchmetrics for any PyTorch-like metrics.
8257
13:56:53,160 --> 13:56:56,160
But just be aware, if you use the PyTorch version, they have to be on the same
8258
13:56:56,160 --> 13:56:57,160
device.
8259
13:56:57,160 --> 13:57:01,160
And if you'd like to install it, what do we have?
8260
13:57:01,160 --> 13:57:02,160
Where's the metrics?
8261
13:57:02,160 --> 13:57:03,160
Module metrics?
8262
13:57:03,160 --> 13:57:05,160
Do we have classification?
8263
13:57:05,160 --> 13:57:06,160
There we go.
8264
13:57:06,160 --> 13:57:11,160
So look how many different types of classification metrics there are from torchmetrics.
8265
13:57:11,160 --> 13:57:13,160
So I'll leave that for you to explore.
8266
13:57:13,160 --> 13:57:16,160
The resources for this will be here.
8267
13:57:16,160 --> 13:57:20,160
This is an extracurricular article for when to use precision recall.
8268
13:57:20,160 --> 13:57:26,160
And another extracurricular would be to go through the torchmetrics module for 10 minutes
8269
13:57:26,160 --> 13:57:30,160
and have a look at the different metrics for classification.
8270
13:57:30,160 --> 13:57:36,160
So with that being said, I think we've covered a fair bit.
8271
13:57:36,160 --> 13:57:40,160
But I think it's also time for you to practice what you've learned.
8272
13:57:40,160 --> 13:57:43,160
So let's cover some exercises in the next video.
8273
13:57:43,160 --> 13:57:46,160
I'll see you there.
8274
13:57:46,160 --> 13:57:47,160
Welcome back.
8275
13:57:47,160 --> 13:57:52,160
In the last video, we looked at a few more classification metrics, a little bit of a high
8276
13:57:52,160 --> 13:57:57,160
level overview for some more ways to evaluate your classification models.
8277
13:57:57,160 --> 13:58:01,160
And I linked some extracurricular here that you might want to look into as well.
8278
13:58:01,160 --> 13:58:04,160
But we have covered a whole bunch of code together.
8279
13:58:04,160 --> 13:58:07,160
But now it's time for you to practice some of this stuff on your own.
8280
13:58:07,160 --> 13:58:10,160
And so I have some exercises prepared.
8281
13:58:10,160 --> 13:58:12,160
Now, where do you go for the exercises?
8282
13:58:12,160 --> 13:58:16,160
Well, remember on the learnpytorch.io book, for each one of these chapters, there's
8283
13:58:16,160 --> 13:58:17,160
a section.
8284
13:58:17,160 --> 13:58:19,160
Now, just have a look at how much we've covered.
8285
13:58:19,160 --> 13:58:21,160
If I scroll, just keep scrolling.
8286
13:58:21,160 --> 13:58:22,160
Look at that.
8287
13:58:22,160 --> 13:58:23,160
We've covered all of that in this module.
8288
13:58:23,160 --> 13:58:24,160
That's a fair bit of stuff.
8289
13:58:24,160 --> 13:58:28,160
But down the bottom of each one is an exercises section.
8290
13:58:28,160 --> 13:58:32,160
So all exercises are focusing on practicing the code in the sections above, all of these
8291
13:58:32,160 --> 13:58:33,160
sections here.
8292
13:58:33,160 --> 13:58:39,160
I've got number one, two, three, four, five, six, seven.
8293
13:58:39,160 --> 13:58:42,160
Yeah, seven exercises, nice, writing plenty of code.
8294
13:58:42,160 --> 13:58:44,160
And then, of course, extracurricular.
8295
13:58:44,160 --> 13:58:50,160
So these are some challenges that I've mentioned throughout the entire section zero two.
8296
13:58:50,160 --> 13:58:52,160
But I'm going to link this in here.
8297
13:58:52,160 --> 13:58:53,160
Exercises.
8298
13:58:53,160 --> 13:58:57,160
But, of course, you can just find it on the learnpytorch.io book.
8299
13:58:57,160 --> 13:59:01,160
So if we come in here and we just create another heading.
8300
13:59:01,160 --> 13:59:02,160
Exercises.
8301
13:59:02,160 --> 13:59:10,160
And extracurricular.
8302
13:59:10,160 --> 13:59:14,160
And then we just write in here.
8303
13:59:14,160 --> 13:59:18,160
See exercises and extracurricular.
8304
13:59:18,160 --> 13:59:20,160
Here.
8305
13:59:20,160 --> 13:59:27,160
And so if you'd like a template of the exercise code, you can go to the PyTorch deep learning
8306
13:59:27,160 --> 13:59:28,160
repo.
8307
13:59:28,160 --> 13:59:32,160
And then within the extras folder, we have exercises and solutions.
8308
13:59:32,160 --> 13:59:35,160
You might be able to guess what's in each of these exercises.
8309
13:59:35,160 --> 13:59:39,160
We have O2 PyTorch classification exercises.
8310
13:59:39,160 --> 13:59:41,160
This is going to be some skeleton code.
8311
13:59:41,160 --> 13:59:44,160
And then, of course, we have the solutions as well.
8312
13:59:44,160 --> 13:59:46,160
Now, this is just one form of solutions.
8313
13:59:46,160 --> 13:59:51,160
But I'm not going to look at those because I would recommend you looking at the exercises
8314
13:59:51,160 --> 13:59:54,160
first before you go into the solutions.
8315
13:59:54,160 --> 13:59:57,160
So we have things like import torch.
8316
13:59:57,160 --> 13:59:59,160
Set up device agnostic code.
8317
13:59:59,160 --> 14:00:01,160
Create a data set.
8318
14:00:01,160 --> 14:00:03,160
Turn data into a data frame.
8319
14:00:03,160 --> 14:00:05,160
And then et cetera, et cetera.
8320
14:00:05,160 --> 14:00:08,160
For the things that we've done throughout this section.
8321
14:00:08,160 --> 14:00:10,160
So give that a go.
8322
14:00:10,160 --> 14:00:11,160
Try it on your own.
8323
14:00:11,160 --> 14:00:15,160
And if you get stuck, you can refer to the notebook that we've coded together.
8324
14:00:15,160 --> 14:00:16,160
All of this code here.
8325
14:00:16,160 --> 14:00:21,160
You can refer to the documentation, of course.
8326
14:00:21,160 --> 14:00:26,160
And then you can refer to as a last resort, the solutions notebooks.
8327
14:00:26,160 --> 14:00:28,160
So give that a shot.
8328
14:00:28,160 --> 14:00:31,160
And congratulations on finishing.
8329
14:00:31,160 --> 14:00:35,160
Section 02 PyTorch classification.
8330
14:00:35,160 --> 14:00:38,160
Now, if you're still there, you're still with me.
8331
14:00:38,160 --> 14:00:39,160
Let's move on to the next section.
8332
14:00:39,160 --> 14:00:43,160
We're going to cover a few more things of deep learning with PyTorch.
8333
14:00:43,160 --> 14:00:48,160
I'll see you there.
8334
14:00:48,160 --> 14:00:50,160
Hello, and welcome back.
8335
14:00:50,160 --> 14:00:52,160
We've got another section.
8336
14:00:52,160 --> 14:00:56,160
We've got computer vision and convolutional neural networks with.
8337
14:00:56,160 --> 14:00:58,160
PyTorch.
8338
14:00:58,160 --> 14:01:03,160
Now, computer vision is one of my favorite, favorite deep learning topics.
8339
14:01:03,160 --> 14:01:06,160
But before we get into the materials, let's answer a very important question.
8340
14:01:06,160 --> 14:01:09,160
And that is, where can you get help?
8341
14:01:09,160 --> 14:01:13,160
So, first and foremost, is to follow along with the code as best you can.
8342
14:01:13,160 --> 14:01:16,160
We're going to be writing a whole bunch of PyTorch computer vision code.
8343
14:01:16,160 --> 14:01:17,160
And remember our motto.
8344
14:01:17,160 --> 14:01:19,160
If in doubt, run the code.
8345
14:01:19,160 --> 14:01:22,160
See what the inputs and outputs are of your code.
8346
14:01:22,160 --> 14:01:24,160
And that is, try it yourself.
8347
14:01:24,160 --> 14:01:28,160
If you need the doc string to read about what the function you're using does,
8348
14:01:28,160 --> 14:01:31,160
you can press Shift + Command + Space in Google Colab.
8349
14:01:31,160 --> 14:01:33,160
Or it might be control if you're on Windows.
8350
14:01:33,160 --> 14:01:37,160
Otherwise, if you're still stuck, you can search for the code that you're running.
8351
14:01:37,160 --> 14:01:40,160
You might come across stack overflow or the PyTorch documentation.
8352
14:01:40,160 --> 14:01:43,160
We've spent a bunch of time in the PyTorch documentation already.
8353
14:01:43,160 --> 14:01:48,160
And we're going to be referencing a whole bunch in the next module in section three.
8354
14:01:48,160 --> 14:01:49,160
We're up to now.
8355
14:01:49,160 --> 14:01:53,160
If you go through all of these four steps, the next step is to try it again.
8356
14:01:53,160 --> 14:01:55,160
If in doubt, run the code.
8357
14:01:55,160 --> 14:02:00,160
And then, of course, if you're still stuck, you can ask a question on the PyTorch deep learning repo.
8358
14:02:00,160 --> 14:02:02,160
Discussions tab.
8359
14:02:02,160 --> 14:02:06,160
Now, if we open this up, we can go new discussion.
8360
14:02:06,160 --> 14:02:09,160
And you can write here section 03 for the computer vision.
8361
14:02:09,160 --> 14:02:15,160
My problem is, and then in here, you can write some code.
8362
14:02:15,160 --> 14:02:17,160
Be sure to format it as best you can.
8363
14:02:17,160 --> 14:02:19,160
That way it'll help us answer it.
8364
14:02:19,160 --> 14:02:23,160
And then go, what's happening here?
8365
14:02:23,160 --> 14:02:27,160
Now, why do I format the code in these back ticks here?
8366
14:02:27,160 --> 14:02:32,160
It's so that it looks like code and that it's easier to read when it's formatted on the GitHub discussion.
8367
14:02:32,160 --> 14:02:33,160
Then you can select a category.
8368
14:02:33,160 --> 14:02:39,160
If you have a general chat, an idea, a poll, a Q&A, or a show and tell of something you've made,
8369
14:02:39,160 --> 14:02:41,160
or what you've learned from the course.
8370
14:02:41,160 --> 14:02:43,160
For question and answering, you want to put it as Q&A.
8371
14:02:43,160 --> 14:02:45,160
Then you can click start discussion.
8372
14:02:45,160 --> 14:02:47,160
And it'll appear here.
8373
14:02:47,160 --> 14:02:50,160
And that way, they'll be searchable and we'll be able to help you out.
8374
14:02:50,160 --> 14:02:52,160
So I'm going to get out of this.
8375
14:02:52,160 --> 14:02:56,160
And oh, speaking of resources, we've got the PyTorch deep learning repo.
8376
14:02:56,160 --> 14:02:58,160
The links will be where you need the links.
8377
14:02:58,160 --> 14:03:04,160
All of the code that we're going to write in this section is contained within the section 3 notebook.
8378
14:03:04,160 --> 14:03:06,160
PyTorch computer vision.
8379
14:03:06,160 --> 14:03:09,160
Now, this is a beautiful notebook annotated with heaps of text and images.
8380
14:03:09,160 --> 14:03:13,160
You can go through this on your own time and use it as a reference to help out.
8381
14:03:13,160 --> 14:03:19,160
If you get stuck on any of the code we write through the videos, check it out in this notebook because it's probably here somewhere.
8382
14:03:19,160 --> 14:03:22,160
And then finally, let's get out of these.
8383
14:03:22,160 --> 14:03:24,160
If we come to the book version of the course,
8384
14:03:24,160 --> 14:03:26,160
this is learnpytorch.io.
8385
14:03:26,160 --> 14:03:27,160
We've got home.
8386
14:03:27,160 --> 14:03:30,160
This will probably be updated by the time you look at that.
8387
14:03:30,160 --> 14:03:34,160
But we have section 03, which is PyTorch computer vision.
8388
14:03:34,160 --> 14:03:39,160
It's got all of the information about what we're going to cover in a book format.
8389
14:03:39,160 --> 14:03:42,160
And you can, of course, skip ahead to different subtitles.
8390
14:03:42,160 --> 14:03:44,160
See what we're going to write here.
8391
14:03:44,160 --> 14:03:49,160
All of the links you need and extra resources will be at learnpytorch.io.
8392
14:03:49,160 --> 14:03:52,160
And for this section, it's PyTorch computer vision.
8393
14:03:52,160 --> 14:03:57,160
With that being said, speaking of computer vision, you might have the question,
8394
14:03:57,160 --> 14:04:00,160
what is a computer vision problem?
8395
14:04:00,160 --> 14:04:06,160
Well, if you can see it, it could probably be phrased as some sort of computer vision problem.
8396
14:04:06,160 --> 14:04:08,160
That's how broad computer vision is.
8397
14:04:08,160 --> 14:04:11,160
So let's have a few concrete examples.
8398
14:04:11,160 --> 14:04:14,160
We might have a binary classification problem,
8399
14:04:14,160 --> 14:04:17,160
such as if we wanted to have two different images.
8400
14:04:17,160 --> 14:04:19,160
Is this photo of steak or pizza?
8401
14:04:19,160 --> 14:04:22,160
We might build a model that understands what steak looks like in an image.
8402
14:04:22,160 --> 14:04:24,160
This is a beautiful dish that I cooked, by the way.
8403
14:04:24,160 --> 14:04:27,160
This is me eating pizza at a cafe with my dad.
8404
14:04:27,160 --> 14:04:31,160
And so we could have binary classification, one thing or another.
8405
14:04:31,160 --> 14:04:35,160
And so our machine learning model may take in the pixels of an image
8406
14:04:35,160 --> 14:04:39,160
and understand the different patterns that go into what a steak looks like
8407
14:04:39,160 --> 14:04:41,160
and the same thing with a pizza.
8408
14:04:41,160 --> 14:04:46,160
Now, the important thing to note is that we won't actually be telling our model what to learn.
8409
14:04:46,160 --> 14:04:50,160
It will learn those patterns itself from different examples of images.
8410
14:04:50,160 --> 14:04:55,160
Then we could step things up and have a multi-class classification problem.
8411
14:04:55,160 --> 14:04:56,160
You're noticing a trend here.
8412
14:04:56,160 --> 14:05:00,160
We've covered classification before, but classification can be quite broad.
8413
14:05:00,160 --> 14:05:06,160
It can be across different domains, such as vision or text or audio.
8414
14:05:06,160 --> 14:05:09,160
But if we were working with multi-class classification for an image problem,
8415
14:05:09,160 --> 14:05:13,160
we might have, is this a photo of sushi, steak or pizza?
8416
14:05:13,160 --> 14:05:15,160
And then we have three classes instead of two.
8417
14:05:15,160 --> 14:05:19,160
But again, this could be 100 classes, such as what Nutrify uses,
8418
14:05:19,160 --> 14:05:21,160
which is an app that I'm working on.
8419
14:05:21,160 --> 14:05:23,160
We go to Nutrify.app.
8420
14:05:23,160 --> 14:05:25,160
This is bare bones at the moment.
8421
14:05:25,160 --> 14:05:29,160
But right now, Nutrify can classify up to 100 different foods.
8422
14:05:29,160 --> 14:05:33,160
So if you were to upload an image of food, let's give it a try.
8423
14:05:33,160 --> 14:05:39,160
Nutrify, we'll go into images, and we'll go into sample food images.
8424
14:05:39,160 --> 14:05:41,160
And how about some chicken wings?
8425
14:05:41,160 --> 14:05:43,160
What does it classify this as?
8426
14:05:43,160 --> 14:05:45,160
Chicken wings. Beautiful.
8427
14:05:45,160 --> 14:05:49,160
And then if we upload an image of not food, maybe.
8428
14:05:49,160 --> 14:05:50,160
Let's go to Nutrify.
8429
14:05:50,160 --> 14:05:52,160
This is on my computer, by the way.
8430
14:05:52,160 --> 14:05:54,160
You might not have a sample folder set up like this.
8431
14:05:54,160 --> 14:05:57,160
And then if we upload a photo of a Cybertruck, what does it say?
8432
14:05:57,160 --> 14:05:58,160
No food found.
8433
14:05:58,160 --> 14:06:00,160
Please try another image.
8434
14:06:00,160 --> 14:06:04,160
So behind the scenes, Nutrify is using the pixels of an image
8435
14:06:04,160 --> 14:06:06,160
and then running them through a machine learning model
8436
14:06:06,160 --> 14:06:09,160
and classifying it first, whether it's food or not food.
8437
14:06:09,160 --> 14:06:13,160
And then if it is food, classifying it as what food it is.
8438
14:06:13,160 --> 14:06:16,160
So right now it works for 100 different foods.
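Purely as a hypothetical sketch of that two-stage idea (none of these names are Nutrify's actual code, and the models are assumed to be already-trained classifiers):

import torch

def predict_food(image: torch.Tensor, food_notfood_model, food_model, class_names):
    # Stage 1: is there food in the image at all? (assume index 1 means "food")
    is_food = food_notfood_model(image.unsqueeze(0)).argmax(dim=1).item() == 1
    if not is_food:
        return "No food found. Please try another image."
    # Stage 2: which of the known foods is it?
    food_idx = food_model(image.unsqueeze(0)).argmax(dim=1).item()
    return class_names[food_idx]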
8439
14:06:16,160 --> 14:06:18,160
So if we have a look at all these, it can classify apples,
8440
14:06:18,160 --> 14:06:21,160
artichokes, avocados, barbecue sauce.
8441
14:06:21,160 --> 14:06:24,160
Each of these work at different levels of performance,
8442
14:06:24,160 --> 14:06:27,160
but that's just something to keep in mind of what you can do.
8443
14:06:27,160 --> 14:06:30,160
So the whole premise of Nutrify is to upload a photo of food
8444
14:06:30,160 --> 14:06:33,160
and then learn about the nutrition about it.
8445
14:06:33,160 --> 14:06:35,160
So let's go back to our keynote.
8446
14:06:35,160 --> 14:06:37,160
What's another example?
8447
14:06:37,160 --> 14:06:40,160
Well, we could use computer vision for object detection,
8448
14:06:40,160 --> 14:06:42,160
where you might answer the question is,
8449
14:06:42,160 --> 14:06:44,160
where's the thing we're looking for?
8450
14:06:44,160 --> 14:06:48,160
So for example, this car here, I caught them on security camera,
8451
14:06:48,160 --> 14:06:51,160
actually did a hit and run on my new car,
8452
14:06:51,160 --> 14:06:54,160
wasn't that much of an expensive car, but I parked it on the street
8453
14:06:54,160 --> 14:06:57,160
and this person, the trailer came off the back of their car
8454
14:06:57,160 --> 14:07:00,160
and hit my car and then they just picked the trailer up
8455
14:07:00,160 --> 14:07:02,160
and drove away.
8456
14:07:02,160 --> 14:07:06,160
But I went to my neighbor's house and had to look at their security footage
8457
14:07:06,160 --> 14:07:08,160
and they found this car.
8458
14:07:08,160 --> 14:07:11,160
So potentially, you could design a machine learning model
8459
14:07:11,160 --> 14:07:13,160
to find this certain type of car.
8460
14:07:13,160 --> 14:07:16,160
It was an orange ute, by the way, but the images were in black and white
8461
14:07:16,160 --> 14:07:19,160
to detect whether it ever recognizes a similar car
8462
14:07:19,160 --> 14:07:21,160
that comes across the street and you could go,
8463
14:07:21,160 --> 14:07:23,160
hey, did you crash into my car the other day?
8464
14:07:23,160 --> 14:07:25,160
I didn't actually find who it was.
8465
14:07:25,160 --> 14:07:27,160
So sadly, it was a hit and run.
8466
14:07:27,160 --> 14:07:30,160
But that's object detection, finding something in an image.
8467
14:07:30,160 --> 14:07:32,160
And then you might want to find out
8468
14:07:32,160 --> 14:07:34,160
whether the different sections in this image.
8469
14:07:34,160 --> 14:07:38,160
So this is a great example of what Apple uses on their devices,
8470
14:07:38,160 --> 14:07:43,160
iPhones and iPads and whatnot, to segregate or segment
8471
14:07:43,160 --> 14:07:46,160
the different sections of an image, so person one, person two,
8472
14:07:46,160 --> 14:07:49,160
skin tones, hair, sky, original.
8473
14:07:49,160 --> 14:07:53,160
And then it enhances each of these sections in different ways.
8474
14:07:53,160 --> 14:07:56,160
So that's a practice known as computational photography.
8475
14:07:56,160 --> 14:08:00,160
But the whole premise is how do you segment different portions of an image?
8476
14:08:00,160 --> 14:08:02,160
And then there's a great blog post here
8477
14:08:02,160 --> 14:08:04,160
that talks about how it works and what it does
8478
14:08:04,160 --> 14:08:06,160
and what kind of model that's used.
8479
14:08:06,160 --> 14:08:10,160
I'll leave that as extra curriculum if you'd like to look into it.
8480
14:08:10,160 --> 14:08:13,160
So if you have these images, how do you enhance the sky?
8481
14:08:13,160 --> 14:08:16,160
How do you make the skin tones look how they should?
8482
14:08:16,160 --> 14:08:19,160
How do you remove the background if you really wanted to?
8483
14:08:19,160 --> 14:08:21,160
So all of this happens on device.
8484
14:08:21,160 --> 14:08:24,160
So that's where I got that image from, by the way.
8485
14:08:24,160 --> 14:08:26,160
Semantic masks.
8486
14:08:26,160 --> 14:08:29,160
And this is another great blog, Apple Machine Learning Research.
8487
14:08:29,160 --> 14:08:33,160
So to keep this in mind, we're about to see another example for computer vision,
8488
14:08:33,160 --> 14:08:35,160
which is Tesla Computer Vision.
8489
14:08:35,160 --> 14:08:39,160
A lot of companies have websites such as Apple Machine Learning Research
8490
14:08:39,160 --> 14:08:44,160
where they share a whole bunch of what they're up to in the world of machine learning.
8491
14:08:44,160 --> 14:08:48,160
So in Tesla's case, they have eight cameras on each of their self-driving cars
8492
14:08:48,160 --> 14:08:52,160
that fuels their full self-driving beta software.
8493
14:08:52,160 --> 14:08:56,160
And so they use computer vision to understand what's going on in an image
8494
14:08:56,160 --> 14:08:58,160
and then plan what's going on.
8495
14:08:58,160 --> 14:09:00,160
So this is three-dimensional vector space.
8496
14:09:00,160 --> 14:09:04,160
And what this means is they're basically taking these different viewpoints
8497
14:09:04,160 --> 14:09:08,160
from the eight different cameras, feeding them through some form of neural network,
8498
14:09:08,160 --> 14:09:13,160
and turning the representation of the environment around the car into a vector.
8499
14:09:13,160 --> 14:09:15,160
So a long string of numbers.
8500
14:09:15,160 --> 14:09:17,160
And why will it do that?
8501
14:09:17,160 --> 14:09:20,160
Well, because computers understand numbers far more than they understand images.
8502
14:09:20,160 --> 14:09:23,160
So we might be able to recognize what's happening here.
8503
14:09:23,160 --> 14:09:27,160
But for a computer to understand it, we have to turn it into vector space.
8504
14:09:27,160 --> 14:09:30,160
And so if you want to have a look at how Tesla uses computer vision,
8505
14:09:30,160 --> 14:09:33,160
so this is from Tesla's AI Day video.
8506
14:09:33,160 --> 14:09:35,160
I'm not going to play it all because it's three hours long,
8507
14:09:35,160 --> 14:09:38,160
but I watched it and I really enjoyed it.
8508
14:09:38,160 --> 14:09:40,160
So there's some information there.
8509
14:09:40,160 --> 14:09:42,160
And there's a little tidbit there.
8510
14:09:42,160 --> 14:09:45,160
If you go to two hours and one minute and 31 seconds on the same video,
8511
14:09:45,160 --> 14:09:48,160
have a look at what Tesla do.
8512
14:09:48,160 --> 14:09:52,160
Well, would you look at that? Where have we seen that before?
8513
14:09:52,160 --> 14:09:56,160
That's some device-agnostic code, but with Tesla's custom dojo chip.
8514
14:09:56,160 --> 14:09:58,160
So Tesla uses PyTorch.
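And that device-agnostic pattern is the same one we've been writing; a minimal sketch:

import torch

# Use an accelerator if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

some_tensor = torch.rand(3, 3).to(device)
print(some_tensor.device)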
8515
14:09:58,160 --> 14:10:00,160
So the exact same code that we're writing,
8516
14:10:00,160 --> 14:10:03,160
Tesla uses similar PyTorch code to, of course,
8517
14:10:03,160 --> 14:10:05,160
they write PyTorch code to suit their problem.
8518
14:10:05,160 --> 14:10:09,160
But nonetheless, they use PyTorch code to train their machine learning models
8519
14:10:09,160 --> 14:10:12,160
that power their self-driving software.
8520
14:10:12,160 --> 14:10:14,160
So how cool is that?
8521
14:10:14,160 --> 14:10:16,160
And if you want to have a look at another example,
8522
14:10:16,160 --> 14:10:19,160
there's plenty of different Tesla self-driving videos.
8523
14:10:19,160 --> 14:10:21,160
So, oh, we can just play it right here.
8524
14:10:21,160 --> 14:10:22,160
I was going to click the link.
8525
14:10:22,160 --> 14:10:24,160
So look, this is what happens.
8526
14:10:24,160 --> 14:10:26,160
If we have a look in the environment,
8527
14:10:26,160 --> 14:10:29,160
Tesla, the cameras, understand what's going on here.
8528
14:10:29,160 --> 14:10:31,160
And then it computes it into this little graphic here
8529
14:10:31,160 --> 14:10:33,160
on your heads-up display in the car.
8530
14:10:33,160 --> 14:10:35,160
And it kind of understands, well, I'm getting pretty close to this car.
8531
14:10:35,160 --> 14:10:37,160
I'm getting pretty close to that car.
8532
14:10:37,160 --> 14:10:40,160
And then it uses this information about what's happening,
8533
14:10:40,160 --> 14:10:43,160
this perception, to plan where it should drive next.
8534
14:10:43,160 --> 14:10:49,160
And I believe here it ends up going into it.
8535
14:10:49,160 --> 14:10:51,160
It has to stop.
8536
14:10:51,160 --> 14:10:53,160
Yeah, there we go.
8537
14:10:53,160 --> 14:10:54,160
Because we've got a stop sign.
8538
14:10:54,160 --> 14:10:55,160
Look at that.
8539
14:10:55,160 --> 14:10:56,160
It's perceiving the stop sign.
8540
14:10:56,160 --> 14:10:57,160
It's got two people here.
8541
14:10:57,160 --> 14:10:59,160
It just saw a car drive past across this street.
8542
14:10:59,160 --> 14:11:00,160
So that is pretty darn cool.
8543
14:11:00,160 --> 14:11:03,160
That's just one example of computer vision, one of many.
8544
14:11:03,160 --> 14:11:07,160
And how would you find out what computer vision can be used for?
8545
14:11:07,160 --> 14:11:09,160
Here's what I do.
8546
14:11:09,160 --> 14:11:12,160
What can computer vision be used for?
8547
14:11:12,160 --> 14:11:14,160
Plenty more resources.
8548
14:11:14,160 --> 14:11:15,160
So, oh, there we go.
8549
14:11:15,160 --> 14:11:19,160
27 most popular computer vision applications in 2022.
8550
14:11:19,160 --> 14:11:22,160
So we've covered a fair bit there.
8551
14:11:22,160 --> 14:11:25,160
But what are we going to cover specifically with PyTorch code?
8552
14:11:25,160 --> 14:11:28,160
Well, broadly, like that.
8553
14:11:28,160 --> 14:11:32,160
We're going to get a vision data set to work with using torch vision.
8554
14:11:32,160 --> 14:11:35,160
So PyTorch has a lot of different domain libraries.
8555
14:11:35,160 --> 14:11:38,160
Torch vision helps us deal with computer vision problems.
8556
14:11:38,160 --> 14:11:42,160
And there's existing data sets that we can leverage to play around with computer vision.
8557
14:11:42,160 --> 14:11:45,160
We're going to have a look at the architecture of a convolutional neural network,
8558
14:11:45,160 --> 14:11:47,160
also known as a CNN with PyTorch.
8559
14:11:47,160 --> 14:11:51,160
We're going to look at an end-to-end multi-class image classification problem.
8560
14:11:51,160 --> 14:11:53,160
So multi-class is what?
8561
14:11:53,160 --> 14:11:54,160
More than one thing or another?
8562
14:11:54,160 --> 14:11:56,160
Could be three classes, could be a hundred.
8563
14:11:56,160 --> 14:11:59,160
We're going to look at the steps of modeling with CNNs in PyTorch.
8564
14:11:59,160 --> 14:12:02,160
So we're going to create a convolutional network with PyTorch.
8565
14:12:02,160 --> 14:12:05,160
We're going to pick a loss function and an optimizer to suit our problem.
8566
14:12:05,160 --> 14:12:08,160
We're going to train a model, training a model a model.
8567
14:12:08,160 --> 14:12:10,160
A little bit of a typo there.
8568
14:12:10,160 --> 14:12:12,160
And then we're going to evaluate a model, right?
8569
14:12:12,160 --> 14:12:15,160
So we might have typos with our text, but we'll have less typos in the code.
8570
14:12:15,160 --> 14:12:17,160
And how are we going to do this?
8571
14:12:17,160 --> 14:12:20,160
Well, we could do it like a cook, or we could do it like a chemist.
8572
14:12:20,160 --> 14:12:22,160
Well, we're going to do it a little bit of both.
8573
14:12:22,160 --> 14:12:24,160
Part art, part science.
8574
14:12:24,160 --> 14:12:29,160
But since this is a machine learning cooking show, we're going to be cooking up lots of code.
8575
14:12:29,160 --> 14:12:34,160
So in the next video, we're going to cover the inputs and outputs of a computer vision problem.
8576
14:12:34,160 --> 14:12:36,160
I'll see you there.
8577
14:12:36,160 --> 14:12:40,160
So in the last video, we covered what we're going to cover, broadly.
8578
14:12:40,160 --> 14:12:43,160
And we saw some examples of what computer vision problems are.
8579
14:12:43,160 --> 14:12:48,160
Essentially, anything that you're able to see, you can potentially turn into a computer vision problem.
8580
14:12:48,160 --> 14:12:54,160
And we're going to be cooking up lots of machine learning, or specifically PyTorch, computer vision code.
8581
14:12:54,160 --> 14:12:56,160
You see I fixed that typo.
8582
14:12:56,160 --> 14:13:00,160
Now let's talk about what the inputs and outputs are of a typical computer vision problem.
8583
14:13:00,160 --> 14:13:04,160
So let's start with a multi-classification example.
8584
14:13:04,160 --> 14:13:09,160
And say we wanted to take photos of different foods and recognize what they were.
8585
14:13:09,160 --> 14:13:13,160
So we're replicating the functionality of Nutrify.
8586
14:13:13,160 --> 14:13:16,160
So take a photo of food and learn about it.
8587
14:13:16,160 --> 14:13:22,160
So we might start with a bunch of food images that have a height and width of some sort.
8588
14:13:22,160 --> 14:13:27,160
So we have width equals 224, height equals 224, and then they have three color channels.
8589
14:13:27,160 --> 14:13:28,160
Why three?
8590
14:13:28,160 --> 14:13:32,160
Well, that's because we have a value for red, green and blue.
8591
14:13:32,160 --> 14:13:40,160
So if we look this up, if we go red, green, blue image format.
8592
14:13:40,160 --> 14:13:43,160
So 24-bit RGB images.
8593
14:13:43,160 --> 14:13:51,160
So a lot of images or digital images have some value for a red pixel, a green pixel and a blue pixel.
8594
14:13:51,160 --> 14:13:58,160
And if you were to convert images into numbers, they get represented by some value of red, some value of green and some value of blue.
8595
14:13:58,160 --> 14:14:02,160
That is exactly the same as how we'd represent these images.
8596
14:14:02,160 --> 14:14:09,160
So for example, this pixel here might be a little bit more red, a little less blue, and a little less green because it's close to orange.
8597
14:14:09,160 --> 14:14:11,160
And then we convert that into numbers.
8598
14:14:11,160 --> 14:14:18,160
So what we're trying to do here is essentially what we try to do with all of the data we have in machine learning: represent it as numbers.
8599
14:14:18,160 --> 14:14:23,160
So the typical image format to represent an image because we're using computer vision.
8600
14:14:23,160 --> 14:14:25,160
So we're trying to figure out what's in an image.
8601
14:14:25,160 --> 14:14:31,160
The typical way to represent that is in a tensor that has a value for the height, width and color channels.
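For example, a single 224 by 224 RGB image could be represented as a tensor of random values like this (just a toy illustration, not a real photo):

import torch

# [height, width, colour_channels] - one value each for red, green and blue per pixel.
image = torch.rand(size=(224, 224, 3))
print(image.shape)  # torch.Size([224, 224, 3])
print(image[0, 0])  # the red, green and blue values of the top-left pixel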
8602
14:14:31,160 --> 14:14:34,160
And so we might numerically encode these.
8603
14:14:34,160 --> 14:14:37,160
In other words, represent our images as a tensor.
8604
14:14:37,160 --> 14:14:40,160
And this would be the inputs to our machine learning algorithm.
8605
14:14:40,160 --> 14:14:49,160
And in many cases, depending on what problem you're working on, an existing algorithm already exists for many of the most popular computer vision problems.
8606
14:14:49,160 --> 14:14:51,160
And if it doesn't, you can build one.
8607
14:14:51,160 --> 14:14:57,160
And then you might fashion this machine learning algorithm to output the exact shapes that you want.
8608
14:14:57,160 --> 14:14:59,160
In our case, we want three outputs.
8609
14:14:59,160 --> 14:15:02,160
We want one output for each class that we have.
8610
14:15:02,160 --> 14:15:05,160
We want a prediction probability for sushi.
8611
14:15:05,160 --> 14:15:07,160
We want a prediction probability for steak.
8612
14:15:07,160 --> 14:15:09,160
And we want a prediction probability for pizza.
8613
14:15:09,160 --> 14:15:17,160
Now in our case, in this iteration, it looks like our model got one of them wrong because the highest value was assigned to the wrong class here.
8614
14:15:17,160 --> 14:15:21,160
So for the second image, it assigned a prediction probability of 0.81 for sushi.
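So conceptually, the outputs look something like this (made-up numbers, assuming the class order is pizza, steak, sushi):

import torch

class_names = ["pizza", "steak", "sushi"]

# Hypothetical raw model outputs (logits) for one image, turned into prediction probabilities.
logits = torch.tensor([0.2, 0.1, 2.3])
pred_probs = torch.softmax(logits, dim=0)
print(pred_probs)                               # roughly tensor([0.0993, 0.0898, 0.8109])
print(class_names[pred_probs.argmax().item()])  # 'sushi'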
8615
14:15:21,160 --> 14:15:26,160
Now, keep in mind that you could change these classes to whatever your particular problem is.
8616
14:15:26,160 --> 14:15:29,160
I'm just simplifying this and making it three.
8617
14:15:29,160 --> 14:15:31,160
You could have a hundred.
8618
14:15:31,160 --> 14:15:32,160
You could have a thousand.
8619
14:15:32,160 --> 14:15:34,160
You could have five.
8620
14:15:34,160 --> 14:15:37,160
It's just, it depends on what you're working with.
8621
14:15:37,160 --> 14:15:43,160
And so we might use these predicted outputs to enhance our app.
8622
14:15:43,160 --> 14:15:46,160
So if someone wants to take a photo of their plate of sushi, our app might say,
8623
14:15:46,160 --> 14:15:48,160
hey, this is a photo of sushi.
8624
14:15:48,160 --> 14:15:53,160
Here's some information about those sushi rolls, or the same for steak, the same for pizza.
8625
14:15:53,160 --> 14:15:57,160
Now it might not always get it right because after all, that's what machine learning is.
8626
14:15:57,160 --> 14:15:59,160
It's probabilistic.
8627
14:15:59,160 --> 14:16:02,160
So how would we improve these results here?
8628
14:16:02,160 --> 14:16:08,160
Well, we could show our model more and more images of sushi steak and pizza
8629
14:16:08,160 --> 14:16:12,160
so that it builds up a better internal representation of said images.
8630
14:16:12,160 --> 14:16:17,160
So when it looks at images it's never seen before or images outside its training data set,
8631
14:16:17,160 --> 14:16:19,160
it's able to get better results.
8632
14:16:19,160 --> 14:16:24,160
But just keep in mind this whole process is similar no matter what computer vision problem you're working with.
8633
14:16:24,160 --> 14:16:27,160
You need a way to numerically encode your information.
8634
14:16:27,160 --> 14:16:30,160
You need a machine learning model that's capable of fitting the data
8635
14:16:30,160 --> 14:16:34,160
in the way that you would like it to be fit in our case classification.
8636
14:16:34,160 --> 14:16:37,160
You might have a different type of model if you're working with object detection,
8637
14:16:37,160 --> 14:16:40,160
a different type of model if you're working with segmentation.
8638
14:16:40,160 --> 14:16:44,160
And then you need to fashion the outputs in a way that best suit your problem as well.
8639
14:16:44,160 --> 14:16:47,160
So let's push forward.
8640
14:16:47,160 --> 14:16:52,160
Oh, by the way, the model that often does this is a convolutional neural network.
8641
14:16:52,160 --> 14:16:54,160
In other words, a CNN.
8642
14:16:54,160 --> 14:16:58,160
However, you can use many other different types of machine learning algorithms here.
8643
14:16:58,160 --> 14:17:03,160
It's just that convolutional neural networks typically perform the best with image data.
8644
14:17:03,160 --> 14:17:09,160
Although with recent research, there is the transformer architecture or deep learning model
8645
14:17:09,160 --> 14:17:13,160
that also performs fairly well or very well with image data.
8646
14:17:13,160 --> 14:17:15,160
So just keep that in mind going forward.
8647
14:17:15,160 --> 14:17:18,160
But for now we're going to focus on convolutional neural networks.
8648
14:17:18,160 --> 14:17:23,160
And so we might have input and output shapes because remember one of the chief machine learning problems
8649
14:17:23,160 --> 14:17:30,160
is making sure that your tensor shapes line up with each other, the input and output shapes.
8650
14:17:30,160 --> 14:17:35,160
So if we encoded this image of steak here, we might have a dimensionality of batch size,
8651
14:17:35,160 --> 14:17:37,160
width, height, color channels.
8652
14:17:37,160 --> 14:17:39,160
And now the ordering here could be improved.
8653
14:17:39,160 --> 14:17:41,160
It's usually height then width.
8654
14:17:41,160 --> 14:17:42,160
So alphabetical order.
8655
14:17:42,160 --> 14:17:44,160
And then color channels last.
8656
14:17:44,160 --> 14:17:47,160
So we might have the shape of None, 224, 224, 3.
8657
14:17:47,160 --> 14:17:48,160
Now where does this come from?
8658
14:17:48,160 --> 14:17:50,160
So none could be the batch size.
8659
14:17:50,160 --> 14:17:56,160
Now it's none because we can set the batch size to whatever we want, say for example 32.
8660
14:17:56,160 --> 14:18:02,160
Then we might have a height of 224 and a width of 224 and three color channels.
8661
14:18:02,160 --> 14:18:05,160
Now height and width are also customizable.
8662
14:18:05,160 --> 14:18:08,160
You might change this to be 512 by 512.
8663
14:18:08,160 --> 14:18:11,160
What that would mean is that you have more numbers representing your image.
8664
14:18:11,160 --> 14:18:19,160
And in sense would take more computation to figure out the patterns because there is simply more information encoded in your image.
8665
14:18:19,160 --> 14:18:23,160
But 224 by 224 is a common starting point for images.
8666
14:18:23,160 --> 14:18:28,160
And then 32 is also a very common batch size, as we've seen in previous videos.
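Expressed with code, a batch of 32 such images in channels-last format might look like this:

import torch

BATCH_SIZE = 32

# [batch_size, height, width, colour_channels] - the None on the slide is the batch
# dimension, which we're free to set ourselves.
batch_of_images = torch.rand(size=(BATCH_SIZE, 224, 224, 3))
print(batch_of_images.shape)  # torch.Size([32, 224, 224, 3])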
8667
14:18:28,160 --> 14:18:33,160
But again, this could be changed depending on the hardware you're using, depending on the model you're using.
8668
14:18:33,160 --> 14:18:35,160
You might have a batch size of 64.
8669
14:18:35,160 --> 14:18:37,160
You might have a batch size of 512.
8670
14:18:37,160 --> 14:18:39,160
It's all problem specific.
8671
14:18:39,160 --> 14:18:41,160
And that's this line here.
8672
14:18:41,160 --> 14:18:44,160
These will vary depending on the problem you're working on.
8673
14:18:44,160 --> 14:18:49,160
So in our case, our output shape is three because we have three different classes for now.
8674
14:18:49,160 --> 14:18:53,160
But again, if you have a hundred, you might have an output shape of a hundred.
8675
14:18:53,160 --> 14:18:56,160
If you have a thousand, you might have an output shape of a thousand.
8676
14:18:56,160 --> 14:19:00,160
The same premise of this whole pattern remains though.
8677
14:19:00,160 --> 14:19:07,160
Numerically encode your data, feed it into a model, and then make sure the output shape fits your specific problem.
8678
14:19:07,160 --> 14:19:15,160
And so, for this section, Computer Vision with PyTorch, we're going to be building CNNs to do this part.
8679
14:19:15,160 --> 14:19:21,160
We're actually going to do all of the parts here, but we're going to focus on building a convolutional neural network
8680
14:19:21,160 --> 14:19:25,160
to try and find patterns in data, because it's not always guaranteed that it will.
8681
14:19:25,160 --> 14:19:28,160
Finally, let's look at one more problem.
8682
14:19:28,160 --> 14:19:34,160
Say you had grayscale images of fashion items, and you have quite small images.
8683
14:19:34,160 --> 14:19:36,160
They're only 28 by 28.
8684
14:19:36,160 --> 14:19:38,160
The exact same pattern is going to happen.
8685
14:19:38,160 --> 14:19:42,160
You numerically represent it, use it as inputs to a machine learning algorithm,
8686
14:19:42,160 --> 14:19:46,160
and then hopefully your machine learning algorithm outputs the right type of clothing that it is.
8687
14:19:46,160 --> 14:19:48,160
In this case, it's a t-shirt.
8688
14:19:48,160 --> 14:19:55,160
But I've got dot dot dot here because we're going to be working on a problem that uses ten different types of items of clothing.
8689
14:19:55,160 --> 14:19:58,160
And the images are grayscale, so there's not much detail.
8690
14:19:58,160 --> 14:20:02,160
So hopefully our machine learning algorithm can recognize what's going on in these images.
8691
14:20:02,160 --> 14:20:05,160
There might be a boot, there might be a shirt, there might be pants, there might be a dress,
8692
14:20:05,160 --> 14:20:07,160
etc, etc.
8693
14:20:07,160 --> 14:20:14,160
But we numerically encode our images into a dimensionality of batch size, height, width, color channels.
8694
14:20:14,160 --> 14:20:23,160
This is known as NHWC, or number of images in a batch, height, width, C for color channels.
8695
14:20:23,160 --> 14:20:25,160
This is color channels last.
8696
14:20:25,160 --> 14:20:27,160
Why am I showing you two forms of this?
8697
14:20:27,160 --> 14:20:31,160
Do you notice color channels in this one is color channels first?
8698
14:20:31,160 --> 14:20:33,160
So color channels height width?
8699
14:20:33,160 --> 14:20:38,160
Well, because you come across a lot of different representations of data full stop,
8700
14:20:38,160 --> 14:20:42,160
but particularly image data in PyTorch and other libraries,
8701
14:20:42,160 --> 14:20:45,160
many libraries expect color channels last.
8702
14:20:45,160 --> 14:20:49,160
However, PyTorch currently at the time of recording this video may change in the future,
8703
14:20:49,160 --> 14:20:53,160
defaults to representing image data with color channels first.
8704
14:20:53,160 --> 14:20:59,160
Now this is very important because you will get errors if your dimensionality is in the wrong order.
8705
14:20:59,160 --> 14:21:05,160
And so there are ways to go in between these two, and there's a lot of debate of which format is the best.
8706
14:21:05,160 --> 14:21:09,160
It looks like color channels last is going to win over the long term, just because it's more efficient,
8707
14:21:09,160 --> 14:21:12,160
but again, that's outside the scope, but just keep this in mind.
8708
14:21:12,160 --> 14:21:19,160
We're going to write code to go between these two, but it's the same data just represented in a different order.
8709
14:21:19,160 --> 14:21:25,160
And so we could rearrange these shapes to how we want color channels last or color channels first.
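A common way to go between the two orderings is torch.permute, which rearranges the dimensions without changing the underlying data, for example:

import torch

# A single grayscale image in channels-last format: [height, width, colour_channels].
image_hwc = torch.rand(size=(28, 28, 1))

# Rearrange to channels-first, which PyTorch layers currently default to: [C, H, W].
image_chw = image_hwc.permute(2, 0, 1)

print(image_hwc.shape)  # torch.Size([28, 28, 1])
print(image_chw.shape)  # torch.Size([1, 28, 28])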
8710
14:21:25,160 --> 14:21:31,160
And once again, the shapes will vary depending on the problem that you're working on.
8711
14:21:31,160 --> 14:21:35,160
So with that being said, we've covered the input and output shapes.
8712
14:21:35,160 --> 14:21:37,160
How are we going to see them with code?
8713
14:21:37,160 --> 14:21:43,160
Well, of course we're going to be following the PyTorch workflow that we've done.
8714
14:21:43,160 --> 14:21:47,160
So we need to get our data ready, turn it into tenses in some way, shape or form.
8715
14:21:47,160 --> 14:21:49,160
We can do that with torchvision transforms.
8716
14:21:49,160 --> 14:21:52,160
Oh, we haven't seen that one yet, but we will.
8717
14:21:52,160 --> 14:21:57,160
We can use torch.utils.data.Dataset and torch.utils.data.DataLoader.
8718
14:21:57,160 --> 14:22:00,160
We can then build a model or pick a pre-trained model to suit our problem.
8719
14:22:00,160 --> 14:22:06,160
We've got a whole bunch of modules to help us with that: torch.nn.Module, torchvision.models.
8720
14:22:06,160 --> 14:22:11,160
And then we have an optimizer and a loss function.
8721
14:22:11,160 --> 14:22:16,160
We can evaluate the model using torch metrics, or we can code our own metric functions.
8722
14:22:16,160 --> 14:22:20,160
We can of course improve through experimentation, which we will see later on,
8723
14:22:20,160 --> 14:22:21,160
which we've actually done that, right?
8724
14:22:21,160 --> 14:22:23,160
We've done improvement through experimentation.
8725
14:22:23,160 --> 14:22:25,160
We've tried different models, we've tried different things.
8726
14:22:25,160 --> 14:22:31,160
And then finally, we can save and reload our trained model if we wanted to use it elsewhere.
8727
14:22:31,160 --> 14:22:33,160
So with that being said, we've covered the workflow.
8728
14:22:33,160 --> 14:22:36,160
This is just a high-level overview of what we're going to code.
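As a very rough sketch of how the first few of those pieces fit together (FashionMNIST and the single linear layer here are placeholder choices just for illustration, not necessarily what we'll build):

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# 1. Get data ready: turn images into tensors with a transform.
train_data = datasets.FashionMNIST(root="data", train=True, download=True,
                                   transform=transforms.ToTensor())
train_dataloader = DataLoader(train_data, batch_size=32, shuffle=True)

# 2. Build or pick a model (placeholder: flatten the image and use one linear layer).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# 3. Pick a loss function and an optimizer.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)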
8729
14:22:36,160 --> 14:22:41,160
You might be asking the question, what is a convolutional neural network, or a CNN?
8730
14:22:41,160 --> 14:22:43,160
Let's answer that in the next video.
8731
14:22:43,160 --> 14:22:46,160
I'll see you there.
8732
14:22:46,160 --> 14:22:47,160
Welcome back.
8733
14:22:47,160 --> 14:22:52,160
In the last video, we saw examples of computer vision input and output shapes.
8734
14:22:52,160 --> 14:22:58,160
And we kind of hinted at the fact that convolutional neural networks are deep learning models, or CNNs,
8735
14:22:58,160 --> 14:23:02,160
that are quite good at recognizing patterns in images.
8736
14:23:02,160 --> 14:23:07,160
So we left off the last video with the question, what is a convolutional neural network?
8737
14:23:07,160 --> 14:23:09,160
And where could you find out about that?
8738
14:23:09,160 --> 14:23:13,160
What is a convolutional neural network?
8739
14:23:13,160 --> 14:23:15,160
Here's one way to find out.
8740
14:23:15,160 --> 14:23:19,160
And I'm sure, as you've seen, there's a lot of resources for such things.
8741
14:23:19,160 --> 14:23:22,160
A comprehensive guide to convolutional neural networks.
8742
14:23:22,160 --> 14:23:24,160
Which one of these is the best?
8743
14:23:24,160 --> 14:23:26,160
Well, it doesn't really matter.
8744
14:23:26,160 --> 14:23:28,160
The best one is the one that you understand the best.
8745
14:23:28,160 --> 14:23:29,160
So there we go.
8746
14:23:29,160 --> 14:23:31,160
There's a great video from Code Basics.
8747
14:23:31,160 --> 14:23:34,160
I've seen that one before, simple explanation of convolutional neural network.
8748
14:23:34,160 --> 14:23:38,160
I'll leave you to research these things on your own.
8749
14:23:38,160 --> 14:23:42,160
And if you wanted to look at images, there's a whole bunch of images.
8750
14:23:42,160 --> 14:23:45,160
I prefer to learn things by writing code.
8751
14:23:45,160 --> 14:23:47,160
Because remember, this course is code first.
8752
14:23:47,160 --> 14:23:51,160
As a machine learning engineer, 99% of my time is spent writing code.
8753
14:23:51,160 --> 14:23:53,160
So that's what we're going to focus on.
8754
14:23:53,160 --> 14:23:56,160
But anyway, here's the typical architecture of a CNN.
8755
14:23:56,160 --> 14:23:58,160
In other words, a convolutional neural network.
8756
14:23:58,160 --> 14:24:01,160
If you hear me say CNN, I'm not talking about the news website.
8757
14:24:01,160 --> 14:24:05,160
In this course, I'm talking about the architecture convolutional neural network.
8758
14:24:05,160 --> 14:24:11,160
So this is some PyTorch code that we're going to be working towards building.
8759
14:24:11,160 --> 14:24:14,160
But we have some hyperparameters slash layer types here.
8760
14:24:14,160 --> 14:24:15,160
We have an input layer.
8761
14:24:15,160 --> 14:24:19,160
So we have an input layer, which takes some in channels, and an input shape.
8762
14:24:19,160 --> 14:24:23,160
Because remember, it's very important in machine learning and deep learning to line up your
8763
14:24:23,160 --> 14:24:27,160
input and output shapes of whatever model you're using, whatever problem you're working with.
8764
14:24:27,160 --> 14:24:30,160
Then we have some sort of convolutional layer.
8765
14:24:30,160 --> 14:24:33,160
Now, what might happen in a convolutional layer?
8766
14:24:33,160 --> 14:24:39,160
Well, as you might have guessed, as what happens in many neural networks, is that the layers
8767
14:24:39,160 --> 14:24:42,160
perform some sort of mathematical operation.
8768
14:24:42,160 --> 14:24:48,160
Now, convolutional layers perform a convolving window operation across an image or across
8769
14:24:48,160 --> 14:24:49,160
a tensor.
8770
14:24:49,160 --> 14:24:53,160
And discover patterns using, let's have a look, actually.
8771
14:24:53,160 --> 14:24:59,160
Let's go, nn.Conv2d.
8772
14:24:59,160 --> 14:25:01,160
There we go.
8773
14:25:01,160 --> 14:25:02,160
This is what happens.
8774
14:25:02,160 --> 14:25:10,160
So the output of our network equals a bias plus the sum of the weight tensor over the
8775
14:25:10,160 --> 14:25:14,160
convolutional channel out, okay, times input.
8776
14:25:14,160 --> 14:25:19,160
Now, if you want to dig deeper into what is actually going on here, you're more than welcome to
8777
14:25:19,160 --> 14:25:20,160
do that.
8778
14:25:20,160 --> 14:25:23,160
But we're going to be writing code that leverages torch.nn.Conv2d.
8779
14:25:23,160 --> 14:25:29,160
And we're going to fix up all of these hyperparameters here so that it works with our problem.
8780
14:25:29,160 --> 14:25:33,160
Now, what you need to know here is that this is a bias term.
8781
14:25:33,160 --> 14:25:34,160
We've seen this before.
8782
14:25:34,160 --> 14:25:36,160
And this is a weight matrix.
8783
14:25:36,160 --> 14:25:39,160
So a bias vector typically and a weight matrix.
8784
14:25:39,160 --> 14:25:42,160
And they operate over the input.
8785
14:25:42,160 --> 14:25:45,160
But we'll see these later on with code.
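For example, a quick sketch of creating one of these layers and inspecting its weight and bias (random values, just to see the shapes):

import torch
from torch import nn

# A convolutional layer: 3 colour channels in, 10 feature maps out, 3x3 convolving window.
conv_layer = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, stride=1, padding=1)

# A fake batch of one 64x64 RGB image in channels-first format [N, C, H, W].
fake_image = torch.rand(size=(1, 3, 64, 64))
output = conv_layer(fake_image)

print(conv_layer.weight.shape)  # torch.Size([10, 3, 3, 3]) - the weight tensor
print(conv_layer.bias.shape)    # torch.Size([10])          - the bias term
print(output.shape)             # torch.Size([1, 10, 64, 64])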
8786
14:25:45,160 --> 14:25:47,160
So just keep that in mind.
8787
14:25:47,160 --> 14:25:48,160
This is what's happening.
8788
14:25:48,160 --> 14:25:53,160
As with every layer in a neural network, some form of operation is happening on our input
8789
14:25:53,160 --> 14:25:54,160
data.
8790
14:25:54,160 --> 14:25:59,160
These operations happen layer by layer until eventually, hopefully, they can be turned into
8791
14:25:59,160 --> 14:26:02,160
some usable output.
8792
14:26:02,160 --> 14:26:04,160
So let's jump back in here.
8793
14:26:04,160 --> 14:26:09,160
Then we have a hidden activation slash nonlinear activation, because why do we use nonlinear
8794
14:26:09,160 --> 14:26:10,160
activations?
8795
14:26:10,160 --> 14:26:17,160
Well, it's because if our data was nonlinear, non-straight lines, we need the help of straight
8796
14:26:17,160 --> 14:26:21,160
and non-straight lines to model it, to draw patterns in it.
8797
14:26:21,160 --> 14:26:24,160
Then we typically have a pooling layer.
8798
14:26:24,160 --> 14:26:26,160
And I want you to take this architecture.
8799
14:26:26,160 --> 14:26:31,160
I've said typical here for a reason because these type of architectures are changing all
8800
14:26:31,160 --> 14:26:32,160
the time.
8801
14:26:32,160 --> 14:26:34,160
So this is just one typical example of a CNN.
8802
14:26:34,160 --> 14:26:37,160
It's about as basic as a CNN as you can get.
8803
14:26:37,160 --> 14:26:40,160
So over time, you will start to learn to build more complex models.
8804
14:26:40,160 --> 14:26:44,160
You will not only start to learn to build them, you will just start to learn to use them,
8805
14:26:44,160 --> 14:26:47,160
as we'll see later on in the transfer learning section of the course.
8806
14:26:47,160 --> 14:26:49,160
And then we have an output layer.
8807
14:26:49,160 --> 14:26:51,160
So do you notice the trend here?
8808
14:26:51,160 --> 14:26:55,160
We have an input layer and then we have multiple hidden layers that perform some sort of mathematical
8809
14:26:55,160 --> 14:26:56,160
operation on our data.
8810
14:26:56,160 --> 14:27:02,160
And then we have an output slash linear layer that converts our output into the ideal shape
8811
14:27:02,160 --> 14:27:03,160
that we'd like.
8812
14:27:03,160 --> 14:27:05,160
So we have an output shape here.
8813
14:27:05,160 --> 14:27:08,160
And then how does this look in process?
8814
14:27:08,160 --> 14:27:11,160
Well, we put in some images, they go through all of these layers here because we've used
8815
14:27:11,160 --> 14:27:12,160
an nn.Sequential.
8816
14:27:12,160 --> 14:27:18,960
And then hopefully this forward method returns x in a usable state that
8817
14:27:18,960 --> 14:27:20,960
we can convert into class names.
8818
14:27:20,960 --> 14:27:25,960
And then we could integrate this into our computer vision app in some way, shape or form.
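And as a hedged preview, one simplified way to stack such a model with nn.Sequential might look like this (the layer choices and hyperparameters here are placeholders, not the exact model we'll build):

import torch
from torch import nn

tiny_cnn = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, padding=1),  # input/convolutional layer
    nn.ReLU(),                                                            # non-linear activation
    nn.MaxPool2d(kernel_size=2),                                          # pooling layer
    nn.Flatten(),
    nn.Linear(in_features=10 * 14 * 14, out_features=10)                  # output layer (10 classes)
)

# Forward pass with a fake batch of one 28x28 grayscale image [N, C, H, W].
x = torch.rand(size=(1, 1, 28, 28))
print(tiny_cnn(x).shape)  # torch.Size([1, 10])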
8819
14:27:25,960 --> 14:27:28,960
And here's the asterisk here.
8820
14:27:28,960 --> 14:27:32,960
Note, there are almost an unlimited amount of ways you could stack together a convolutional
8821
14:27:32,960 --> 14:27:33,960
neural network.
8822
14:27:33,960 --> 14:27:35,960
This slide only demonstrates one.
8823
14:27:35,960 --> 14:27:38,960
So just keep that in mind, only demonstrates one.
8824
14:27:38,960 --> 14:27:41,960
But the best way to practice this sort of stuff is not to stare at a page.
8825
14:27:41,960 --> 14:27:44,960
It's to, if in doubt, code it out.
8826
14:27:44,960 --> 14:27:49,960
So let's code. I'll see you in Google Colab.
8827
14:27:49,960 --> 14:27:50,960
Welcome back.
8828
14:27:50,960 --> 14:27:54,960
Now, we've discussed a bunch of fundamentals about computer vision problems and convolutional
8829
14:27:54,960 --> 14:27:55,960
neural networks.
8830
14:27:55,960 --> 14:27:58,960
But rather than talk to more slides, well, let's start to code them out.
8831
14:27:58,960 --> 14:28:02,960
I'm going to meet you at colab.research.google.com.
8832
14:28:02,960 --> 14:28:05,960
Just going to clean up some of these tabs.
8833
14:28:05,960 --> 14:28:08,960
And I'm going to start a new notebook.
8834
14:28:08,960 --> 14:28:15,960
And then I'm going to name this one, this is going to be 03 PyTorch computer vision.
8835
14:28:15,960 --> 14:28:17,960
And I'm going to call mine video.
8836
14:28:17,960 --> 14:28:24,960
So just so it has the video tag, because if we go in here, if we go video notebooks of
8837
14:28:24,960 --> 14:28:28,960
the PyTorch deep learning repo, the video notebooks are stored in here.
8838
14:28:28,960 --> 14:28:29,960
They've got the underscore video tag.
8839
14:28:29,960 --> 14:28:33,960
So the video notebooks have all of the code I write exactly in the video.
8840
14:28:33,960 --> 14:28:36,960
But there are some reference notebooks to go along with it.
8841
14:28:36,960 --> 14:28:40,960
Let me just write a heading here, PyTorch computer vision.
8842
14:28:40,960 --> 14:28:45,960
And I'll put a resource here, see reference notebook.
8843
14:28:45,960 --> 14:28:48,960
Now, of course, this is the one that's the ground truth.
8844
14:28:48,960 --> 14:28:54,960
It's got all of the code that we're going to be writing.
8845
14:28:54,960 --> 14:28:56,960
I'm going to put that in here.
8846
14:28:56,960 --> 14:28:58,960
Explained with text and images and whatnot.
8847
14:28:58,960 --> 14:29:03,960
And then finally, we've got 'see reference online book'.
8848
14:29:03,960 --> 14:29:09,960
And that is at learnpytorch.io at section number three, PyTorch computer vision.
8849
14:29:09,960 --> 14:29:11,960
I'm going to put that in there.
8850
14:29:11,960 --> 14:29:14,960
And then I'm going to turn this into markdown with command mm.
8851
14:29:14,960 --> 14:29:15,960
Beautiful.
8852
14:29:15,960 --> 14:29:16,960
So let's get started.
8853
14:29:16,960 --> 14:29:18,960
I'm going to get rid of this, get rid of this.
8854
14:29:18,960 --> 14:29:20,960
How do we start this off?
8855
14:29:20,960 --> 14:29:26,960
Well, I believe there are some computer vision libraries that you should be aware of.
8856
14:29:26,960 --> 14:29:29,960
Computer vision libraries in PyTorch.
8857
14:29:29,960 --> 14:29:32,960
So this is just going to be a text based cell.
8858
14:29:32,960 --> 14:29:43,960
But the first one is torch vision, which is the base domain library for PyTorch computer vision.
8859
14:29:43,960 --> 14:29:47,960
So if we look up torch vision, what do we find?
8860
14:29:47,960 --> 14:29:50,960
We have torch vision 0.12.
8861
14:29:50,960 --> 14:29:54,960
That's the version that torch vision is currently up to at the time of recording this.
8862
14:29:54,960 --> 14:29:59,960
So in here, this is very important to get familiar with if you're working on computer vision problems.
8863
14:29:59,960 --> 14:30:03,960
And of course, in the documentation, this is just another tidbit.
8864
14:30:03,960 --> 14:30:05,960
We have torch audio for audio problems.
8865
14:30:05,960 --> 14:30:11,960
We have torchtext for text, torchvision, which is what we're working on, TorchRec for recommendation systems,
8866
14:30:11,960 --> 14:30:18,960
torchdata for dealing with different data pipelines, TorchServe, which is for serving PyTorch models,
8867
14:30:18,960 --> 14:30:20,960
and PyTorch on XLA.
8868
14:30:20,960 --> 14:30:24,960
So I believe that stands for accelerated linear algebra devices.
8869
14:30:24,960 --> 14:30:26,960
You don't have to worry about these ones for now.
8870
14:30:26,960 --> 14:30:28,960
We're focused on torch vision.
8871
14:30:28,960 --> 14:30:33,960
However, if you would like to learn more about a particular domain, this is where you would go to learn more.
8872
14:30:33,960 --> 14:30:36,960
So there's a bunch of different stuff that's going on here.
8873
14:30:36,960 --> 14:30:38,960
Transforming and augmenting images.
8874
14:30:38,960 --> 14:30:42,960
So fundamentally, computer vision is dealing with things in the form of images.
8875
14:30:42,960 --> 14:30:45,960
Even a video gets converted to an image.
8876
14:30:45,960 --> 14:30:47,960
We have models and pre-trained weights.
8877
14:30:47,960 --> 14:30:53,960
So as I referenced before, you can use an existing model that works on an existing computer vision problem for your own problem.
8878
14:30:53,960 --> 14:30:57,960
We're going to cover that in section, I think it's six, for transfer learning.
8879
14:30:57,960 --> 14:31:03,960
And then we have data sets, which is a bunch of computer vision data sets, utils, operators, a whole bunch of stuff here.
8880
14:31:03,960 --> 14:31:07,960
So PyTorch is really, really good for computer vision.
8881
14:31:07,960 --> 14:31:09,960
I mean, look at all the stuff that's going on here.
8882
14:31:09,960 --> 14:31:11,960
But that's enough talking about it.
8883
14:31:11,960 --> 14:31:12,960
Let's just put it in here.
8884
14:31:12,960 --> 14:31:14,960
Torch vision. This is the main one.
8885
14:31:14,960 --> 14:31:16,960
I'm not going to link to all of these.
8886
14:31:16,960 --> 14:31:20,960
All of the links for these, by the way, is in the book version of the course PyTorch Computer Vision.
8887
14:31:20,960 --> 14:31:23,960
And we have what we're going to cover.
8888
14:31:23,960 --> 14:31:27,960
And finally, computer vision libraries in PyTorch.
8889
14:31:27,960 --> 14:31:30,960
Torch vision, data sets, models, transforms, et cetera.
8890
14:31:30,960 --> 14:31:32,960
But let's just write down the other ones.
8891
14:31:32,960 --> 14:31:38,960
So we have torchvision.datasets, something to be aware of.
8892
14:31:38,960 --> 14:31:47,960
So get data sets and data loading functions for computer vision here.
8893
14:31:47,960 --> 14:31:50,960
Then we have torch vision.
8894
14:31:50,960 --> 14:31:55,960
And torchvision.models is where we get pre-trained computer vision models.
8895
14:31:55,960 --> 14:32:00,960
So when I say pre-trained computer vision models, we're going to cover this more in transfer learning, as I said.
8896
14:32:00,960 --> 14:32:06,960
Pre-trained computer vision models are models that have been already trained on some existing vision data
8897
14:32:06,960 --> 14:32:11,960
and have trained weights, trained patterns that you can leverage for your own problems,
8898
14:32:11,960 --> 14:32:16,960
that you can leverage for your own problems.
8899
14:32:16,960 --> 14:32:20,960
Then we have torch vision.transforms.
8900
14:32:20,960 --> 14:32:35,960
And then we have functions for manipulating your vision data, which is, of course, images to be suitable for use with an ML model.
8901
14:32:35,960 --> 14:32:41,960
So remember, what do we have to do when we have image data or almost any kind of data?
8902
14:32:41,960 --> 14:32:47,960
For machine learning, we have to prepare it in a way so it can't just be pure images, so that's what transforms help us out with.
8903
14:32:47,960 --> 14:32:53,960
Transforms helps to turn our image data into numbers so we can use it with a machine learning model.
8904
14:32:53,960 --> 14:32:57,960
And then, of course, we have some things from torch.utils.data.
8905
14:32:57,960 --> 14:33:02,960
This isn't vision-specific, it's for the entirety of PyTorch, and that's Dataset.
8906
14:33:02,960 --> 14:33:10,960
So if we wanted to create our own data set with our own custom data, we have the base data set class for PyTorch.
8907
14:33:10,960 --> 14:33:15,960
And then finally we have torch.utils.data.DataLoader.
8908
14:33:15,960 --> 14:33:22,960
These are just good to be aware of because you'll almost always use some form of data set slash data loader with whatever PyTorch problem you're working on.
8909
14:33:22,960 --> 14:33:29,960
So this creates a Python iterable over a data set.
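To summarize the libraries just listed, here's a minimal sketch of the corresponding imports (module names as they appear in the PyTorch and torchvision documentation):

import torchvision
from torchvision import datasets     # example computer vision datasets and data loading functions
from torchvision import models       # pre-trained computer vision models
from torchvision import transforms   # functions for manipulating images for use with a model
from torch.utils.data import Dataset, DataLoader  # base Dataset class + Python iterable over a Dataset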
8910
14:33:29,960 --> 14:33:30,960
Wonderful.
8911
14:33:30,960 --> 14:33:33,960
I think these are most of the libraries that we're going to be using in this section.
8912
14:33:33,960 --> 14:33:37,960
Let's import some of them, hey, so we can see what's going on.
8913
14:33:37,960 --> 14:33:41,960
Let's go import PyTorch.
8914
14:33:41,960 --> 14:33:43,960
Import PyTorch.
8915
14:33:43,960 --> 14:33:46,960
So import torch.
8916
14:33:46,960 --> 14:33:49,960
We're also going to get NN, which stands for neural network.
8917
14:33:49,960 --> 14:33:50,960
What's in NN?
8918
14:33:50,960 --> 14:33:57,960
Well, in NN, of course, we have lots of layers, lots of loss functions, a whole bunch of different stuff for building neural networks.
8919
14:33:57,960 --> 14:34:02,960
We're going to also import torch vision.
8920
14:34:02,960 --> 14:34:05,960
And then we're going to go from torch vision.
8921
14:34:05,960 --> 14:34:13,960
Import data sets because we're going to be using data sets later on to get a data set to work with from torch vision.
8922
14:34:13,960 --> 14:34:16,960
We'll import transforms.
8923
14:34:16,960 --> 14:34:23,960
You could also go from torchvision.transforms import ToTensor.
8924
14:34:23,960 --> 14:34:26,960
This is one of the main ones you'll see for computer vision problems, ToTensor.
8925
14:34:26,960 --> 14:34:28,960
You can imagine what it does.
8926
14:34:28,960 --> 14:34:29,960
But let's have a look.
8927
14:34:29,960 --> 14:34:33,960
Transforms, ToTensor.
8928
14:34:33,960 --> 14:34:36,960
Transforming and augmenting images.
8929
14:34:36,960 --> 14:34:37,960
So look where we are.
8930
14:34:37,960 --> 14:34:41,960
We're in pytorch.org slash vision slash stable slash transforms.
8931
14:34:41,960 --> 14:34:42,960
Over here.
8932
14:34:42,960 --> 14:34:44,960
So we're in the torch vision section.
8933
14:34:44,960 --> 14:34:47,960
And we're just looking at transforming and augmenting images.
8934
14:34:47,960 --> 14:34:49,960
So transforming.
8935
14:34:49,960 --> 14:34:50,960
What do we have?
8936
14:34:50,960 --> 14:34:53,960
Transforms are common image transformations available in the transforms module.
8937
14:34:53,960 --> 14:34:56,960
They can be chained together using Compose.
8938
14:34:56,960 --> 14:34:57,960
Beautiful.
8939
14:34:57,960 --> 14:35:01,960
So if we have ToTensor, what does this do?
8940
14:35:01,960 --> 14:35:05,960
Converts a PIL image or NumPy ndarray to a tensor.
8941
14:35:05,960 --> 14:35:06,960
Beautiful.
8942
14:35:06,960 --> 14:35:08,960
That's what we want to do later on, isn't it?
8943
14:35:08,960 --> 14:35:14,960
Well, this is kind of me giving you a spoiler is we want to convert our images into tensors so that we can use those with our models.
8944
14:35:14,960 --> 14:35:22,960
But there's a whole bunch of different transforms here, and actually one piece of your extra curriculum is to read through each of these packages for 10 minutes.
8945
14:35:22,960 --> 14:35:29,960
So that's about an hour of reading, but it will definitely help you later on if you get familiar with using the pytorch documentation.
8946
14:35:29,960 --> 14:35:32,960
After all, this course is just a momentum builder.
8947
14:35:32,960 --> 14:35:34,960
We're going to write heaps of PyTorch code.
8948
14:35:34,960 --> 14:35:38,960
But fundamentally, you'll be teaching yourself a lot of stuff by reading the documentation.
8949
14:35:38,960 --> 14:35:40,960
Let's keep going with this.
8950
14:35:40,960 --> 14:35:42,960
Where were we up to?
8951
14:35:42,960 --> 14:35:48,960
When we're getting familiar with our data, matplotlib is going to be fundamental for visualization.
8952
14:35:48,960 --> 14:35:55,960
Remember, the data explorer's motto, visualize, visualize, visualize, become one with the data.
8953
14:35:55,960 --> 14:36:00,960
So we're going to import matplotlib.pyplot as plt.
8954
14:36:00,960 --> 14:36:03,960
And then finally, let's check the versions.
8955
14:36:03,960 --> 14:36:10,960
So print torch.__version__ and print torchvision.__version__.
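As a sketch, the imports and version check we've just written could look like this (the exact printed versions will depend on your environment):

import torch
from torch import nn                 # nn contains PyTorch's layers and loss functions
import torchvision
from torchvision import datasets, transforms
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

# Check the versions (this section assumes roughly torch 1.10+ / torchvision 0.11+)
print(torch.__version__)
print(torchvision.__version__)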
8956
14:36:10,960 --> 14:36:15,960
So by the time you watch this, there might be a newer version of each of these modules out.
8957
14:36:15,960 --> 14:36:18,960
If there's any errors in the code, please let me know.
8958
14:36:18,960 --> 14:36:22,960
But this is just a bare minimum version that you'll need to complete this section.
8959
14:36:22,960 --> 14:36:32,960
I believe at the moment, Google Colab is running 1.11 for torch and maybe 1.10.
8960
14:36:32,960 --> 14:36:34,960
We'll find out in a second.
8961
14:36:34,960 --> 14:36:35,960
It just connected.
8962
14:36:35,960 --> 14:36:38,960
So we're importing pytorch.
8963
14:36:38,960 --> 14:36:40,960
Okay, there we go.
8964
14:36:40,960 --> 14:36:46,960
So my pytorch version is 1.10 and it's got CUDA available and torch vision is 0.11.
8965
14:36:46,960 --> 14:36:50,960
So just make sure if you're running in Google Colab, if you're running this at a later date,
8966
14:36:50,960 --> 14:36:54,960
you probably have at minimum these versions, you might even have a later version.
8967
14:36:54,960 --> 14:36:58,960
So these are the minimum versions required for this upcoming section.
8968
14:36:58,960 --> 14:37:02,960
So we've covered the base computer vision libraries in pytorch.
8969
14:37:02,960 --> 14:37:03,960
We've got them ready to go.
8970
14:37:03,960 --> 14:37:07,960
How about in the next video, we cover getting a data set.
8971
14:37:07,960 --> 14:37:09,960
I'll see you there.
8972
14:37:09,960 --> 14:37:11,960
Welcome back.
8973
14:37:11,960 --> 14:37:15,960
So in the last video, we covered some of the fundamental computer vision libraries in pytorch.
8974
14:37:15,960 --> 14:37:19,960
The main one being torch vision and then modules that stem off torch vision.
8975
14:37:19,960 --> 14:37:25,960
And then of course, we've got torch utils dot data dot data set, which is the base data set class for pytorch
8976
14:37:25,960 --> 14:37:29,960
and data loader, which creates a Python iterable over a data set.
8977
14:37:29,960 --> 14:37:32,960
So let's begin where most machine learning projects do.
8978
14:37:32,960 --> 14:37:36,960
And that is getting a data set, getting a data set.
8979
14:37:36,960 --> 14:37:38,960
I'm going to turn this into markdown.
8980
14:37:38,960 --> 14:37:46,960
And the data set that we're going to be using to demonstrate some computer vision techniques is Fashion MNIST.
8981
14:37:46,960 --> 14:37:57,960
So the data set we'll be using is Fashion MNIST, which is a take on the original MNIST data set,
8982
14:37:57,960 --> 14:38:05,960
the MNIST database, which stands for Modified National Institute of Standards and Technology database, which is kind of like the hello world
8983
14:38:05,960 --> 14:38:11,960
in machine learning and computer vision. These are sample images from the MNIST test data set,
8984
14:38:11,960 --> 14:38:15,960
which are grayscale images of handwritten digits.
8985
14:38:15,960 --> 14:38:21,960
So this, I believe was originally used for trying to find out if you could use computer vision at a postal service
8986
14:38:21,960 --> 14:38:24,960
to, I guess, recognize post codes and whatnot.
8987
14:38:24,960 --> 14:38:27,960
I may be wrong about that, but that's what I know.
8988
14:38:27,960 --> 14:38:29,960
Yeah, 1998.
8989
14:38:29,960 --> 14:38:32,960
So all the way back at 1998, how cool is that?
8990
14:38:32,960 --> 14:38:36,960
So this was basically where convolutional neural networks were founded.
8991
14:38:36,960 --> 14:38:38,960
I'll let you read up on the history of that.
8992
14:38:38,960 --> 14:38:44,960
But neural network started to get so good that this data set was quite easy for them to do really well.
8993
14:38:44,960 --> 14:38:46,960
And that's when Fashion MNIST came out.
8994
14:38:46,960 --> 14:38:50,960
So this is a little bit harder if we go into here.
8995
14:38:50,960 --> 14:38:53,960
This is by Zalando Research, Fashion MNIST.
8996
14:38:53,960 --> 14:38:58,960
And it's of grayscale images of pieces of clothing.
8997
14:38:58,960 --> 14:39:06,960
So like we saw before the input and output, what we're going to be trying to do is turning these images of clothing into numbers
8998
14:39:06,960 --> 14:39:11,960
and then training a computer vision model to recognize what the different styles of clothing are.
8999
14:39:11,960 --> 14:39:15,960
And here's a dimensionality plot of all the different items of clothing.
9000
14:39:15,960 --> 14:39:19,960
Visualizing where similar items are grouped together, there's the shoes and whatnot.
9001
14:39:19,960 --> 14:39:21,960
Is this interactive?
9002
14:39:21,960 --> 14:39:23,960
Oh no, it's a video.
9003
14:39:23,960 --> 14:39:24,960
Excuse me.
9004
14:39:24,960 --> 14:39:25,960
There we go.
9005
14:39:25,960 --> 14:39:27,960
To serious machine learning researchers.
9006
14:39:27,960 --> 14:39:29,960
We are talking about replacing MNIST.
9007
14:39:29,960 --> 14:39:31,960
MNIST is too easy.
9008
14:39:31,960 --> 14:39:32,960
MNIST is overused.
9009
14:39:32,960 --> 14:39:35,960
MNIST cannot represent modern CV tasks.
9010
14:39:35,960 --> 14:39:41,960
So even now Fashion MNIST, I would say, has also been pretty much solved, but it's a good way to get started.
9011
14:39:41,960 --> 14:39:43,960
Now, where could we find such a data set?
9012
14:39:43,960 --> 14:39:45,960
We could download it from GitHub.
9013
14:39:45,960 --> 14:39:50,960
But if we come back to the torchvision documentation, have a look at data sets.
9014
14:39:50,960 --> 14:39:52,960
We have a whole bunch of built-in data sets.
9015
14:39:52,960 --> 14:39:56,960
And remember, this is your extra curricular to read through these for 10 minutes or so each.
9016
14:39:56,960 --> 14:39:58,960
But we have an example.
9017
14:39:58,960 --> 14:40:02,960
We could download ImageNet if we want.
9018
14:40:02,960 --> 14:40:05,960
We also have some base classes here for custom data sets.
9019
14:40:05,960 --> 14:40:07,960
We'll see that later on.
9020
14:40:07,960 --> 14:40:10,960
But if we scroll through, we have image classification data sets.
9021
14:40:10,960 --> 14:40:11,960
Caltech 101.
9022
14:40:11,960 --> 14:40:13,960
I didn't even know what all of these are.
9023
14:40:13,960 --> 14:40:14,960
There's a lot here.
9024
14:40:14,960 --> 14:40:15,960
CIFAR-100.
9025
14:40:15,960 --> 14:40:18,960
So that's an example of 100 different items.
9026
14:40:18,960 --> 14:40:22,960
So that would be a 100 class, multi-class classification problem.
9027
14:40:22,960 --> 14:40:24,960
CIFAR-10 is 10 classes.
9028
14:40:24,960 --> 14:40:26,960
We have MNIST.
9029
14:40:26,960 --> 14:40:28,960
We have Fashion MNIST.
9030
14:40:28,960 --> 14:40:29,960
Oh, that's the one we're after.
9031
14:40:29,960 --> 14:40:36,960
But this is basically what you would do to download a data set from torchvision.datasets.
9032
14:40:36,960 --> 14:40:39,960
You would download the data in some way, shape, or form.
9033
14:40:39,960 --> 14:40:41,960
And then you would turn it into a data loader.
9034
14:40:41,960 --> 14:40:51,960
So ImageNet is one of the most popular or is probably the gold standard data set for computer vision evaluation.
9035
14:40:51,960 --> 14:40:52,960
It's quite a big data set.
9036
14:40:52,960 --> 14:40:53,960
It's got millions of images.
9037
14:40:53,960 --> 14:40:58,960
But that's the beauty of torchvision: it allows us to download example data sets
9038
14:40:58,960 --> 14:41:00,960
that we can practice on.
9039
14:41:00,960 --> 14:41:03,960
and even perform research on, from a built-in module.
9040
14:41:03,960 --> 14:41:07,960
So let's now have a look at the Fashion MNIST data set.
9041
14:41:07,960 --> 14:41:09,960
How might we get this?
9042
14:41:09,960 --> 14:41:12,960
So we've got some example code here, or this is the documentation.
9043
14:41:12,960 --> 14:41:15,960
torchvision.datasets.FashionMNIST.
9044
14:41:15,960 --> 14:41:16,960
We have to pass in a root.
9045
14:41:16,960 --> 14:41:19,960
So where do we want to download the data set?
9046
14:41:19,960 --> 14:41:22,960
We also have to pass in whether we want the training version of the data set
9047
14:41:22,960 --> 14:41:24,960
or whether we want the testing version of the data set.
9048
14:41:24,960 --> 14:41:26,960
Do we want to download it?
9049
14:41:26,960 --> 14:41:27,960
Yes or no?
9050
14:41:27,960 --> 14:41:31,960
Should we transform the data in any way shape or form?
9051
14:41:31,960 --> 14:41:36,960
So we're going to be downloading images through this function call or this class call.
9052
14:41:36,960 --> 14:41:39,960
Do we want to transform those images in some way?
9053
14:41:39,960 --> 14:41:42,960
What do we have to do to images before we can use them with a model?
9054
14:41:42,960 --> 14:41:45,960
We have to turn them into a tensor, so we might look into that in a moment.
9055
14:41:45,960 --> 14:41:50,960
And target transform is do we want to transform the labels in any way shape or form?
9056
14:41:50,960 --> 14:41:54,960
So often the data sets that you download from torchvision.datasets
9057
14:41:54,960 --> 14:41:59,960
are pre formatted in a way that they can be quite easily used with PyTorch.
9058
14:41:59,960 --> 14:42:02,960
But that won't always be the case with your own custom data sets.
9059
14:42:02,960 --> 14:42:07,960
However, what we're about to cover is just important to get an idea of what the computer vision workflow is.
9060
14:42:07,960 --> 14:42:13,960
And then later on you can start to customize how you get your data in the right format to be used with the model.
9061
14:42:13,960 --> 14:42:15,960
Then we have some different parameters here and whatnot.
9062
14:42:15,960 --> 14:42:20,960
Rather than just look at the documentation up and down, let's code it out.
9063
14:42:20,960 --> 14:42:31,960
So we'll be using Fashion MNIST, and we'll start by, I'm going to just put this here, from torchvision.datasets.
9064
14:42:31,960 --> 14:42:36,960
And we'll put the link there and we'll start by getting the training data.
9065
14:42:36,960 --> 14:42:38,960
Set up training data.
9066
14:42:38,960 --> 14:42:43,960
I'm just going to make some code cells here so that I can code in the middle of the screen.
9067
14:42:43,960 --> 14:42:50,960
Set up training data. Training data equals data sets dot fashion MNIST.
9068
14:42:50,960 --> 14:42:55,960
Because recall, we've already imported datasets from torchvision.
9069
14:42:55,960 --> 14:43:01,960
We don't need to import this again, I'm just doing it for demonstration purposes, but from torchvision import datasets,
9070
14:43:01,960 --> 14:43:04,960
so we can just call data sets dot fashion MNIST.
9071
14:43:04,960 --> 14:43:06,960
And then we're going to type in root.
9072
14:43:06,960 --> 14:43:09,960
See how the doc string comes up and tells us what's going on.
9073
14:43:09,960 --> 14:43:15,960
I personally find this a bit hard to read in Google Colab, so if I'm looking up the documentation,
9074
14:43:15,960 --> 14:43:17,960
I like to just go into here.
9075
14:43:17,960 --> 14:43:19,960
But let's code it out.
9076
14:43:19,960 --> 14:43:25,960
So root is going to be data, so where to download data to.
9077
14:43:25,960 --> 14:43:27,960
We'll see what this does in a minute.
9078
14:43:27,960 --> 14:43:28,960
Then we're going to go train.
9079
14:43:28,960 --> 14:43:31,960
We want the training version of the data set.
9080
14:43:31,960 --> 14:43:36,960
So as I said, a lot of the data sets that you find in torchvision.datasets
9081
14:43:36,960 --> 14:43:40,960
have been formatted into training data set and testing data set already.
9082
14:43:40,960 --> 14:43:47,960
So this Boolean tells us do we want the training data set?
9083
14:43:47,960 --> 14:43:51,960
So if that was false, we would get the testing data set of fashion MNIST.
9084
14:43:51,960 --> 14:43:53,960
Do we want to download it?
9085
14:43:53,960 --> 14:43:56,960
Do we want to download?
9086
14:43:56,960 --> 14:43:57,960
Yes, no.
9087
14:43:57,960 --> 14:44:00,960
So yes, we do. We're going to set that to true.
9088
14:44:00,960 --> 14:44:03,960
Now what sort of transform do we want to do?
9089
14:44:03,960 --> 14:44:07,960
So because we're going to be downloading images and what do we have to do to our images
9090
14:44:07,960 --> 14:44:11,960
to use them with a machine learning model, we have to convert them into tensors.
9091
14:44:11,960 --> 14:44:20,960
So I'm going to pass the transform ToTensor, but we could also just go torchvision.transforms.ToTensor.
9092
14:44:20,960 --> 14:44:23,960
That would be the exact same thing as what we just did before.
9093
14:44:23,960 --> 14:44:27,960
And then the target transform, do we want to transform the labels?
9094
14:44:27,960 --> 14:44:28,960
No, we don't.
9095
14:44:28,960 --> 14:44:31,960
We're going to see how they come, or the target, sorry.
9096
14:44:31,960 --> 14:44:34,960
PyTorch, this is another way, another naming convention.
9097
14:44:34,960 --> 14:44:37,960
Often uses target for the target that you're trying to predict.
9098
14:44:37,960 --> 14:44:42,960
So using data to predict the target, whereas I often say using data to predict a label.
9099
14:44:42,960 --> 14:44:44,960
They're the same thing.
9100
14:44:44,960 --> 14:44:49,960
So how do we want to transform the data?
9101
14:44:49,960 --> 14:44:56,960
And how do we want to transform the labels?
9102
14:44:56,960 --> 14:45:01,960
And then we're going to do the same for the test data.
9103
14:45:01,960 --> 14:45:03,960
So we're going to go data sets.
9104
14:45:03,960 --> 14:45:05,960
You might know what to do here.
9105
14:45:05,960 --> 14:45:09,960
It's going to be the exact same code as above, except we're going to change one line.
9106
14:45:09,960 --> 14:45:11,960
We want to store it in data.
9107
14:45:11,960 --> 14:45:16,960
We want to set train to False because we want the testing version.
9108
14:45:16,960 --> 14:45:18,960
Do we want to download it?
9109
14:45:18,960 --> 14:45:19,960
Yes, we do.
9110
14:45:19,960 --> 14:45:21,960
Do we want to transform the data?
9111
14:45:21,960 --> 14:45:26,960
Yes, we do, we want to use ToTensor to convert our image data to tensors.
9112
14:45:26,960 --> 14:45:29,960
And do we want to do a target transform?
9113
14:45:29,960 --> 14:45:30,960
Well, no, we don't.
9114
14:45:30,960 --> 14:45:33,960
We want to keep the label slash the targets as they are.
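Putting the two calls together, here's a sketch of the train and test dataset setup being described (argument names are from torchvision.datasets.FashionMNIST):

from torchvision import datasets
from torchvision.transforms import ToTensor

# Setup training data
train_data = datasets.FashionMNIST(
    root="data",            # where to download data to
    train=True,             # get the training split
    download=True,          # download if it's not already on disk
    transform=ToTensor(),   # turn the PIL images into tensors
    target_transform=None   # leave the labels/targets as they are
)

# Setup testing data
test_data = datasets.FashionMNIST(
    root="data",
    train=False,            # get the testing split
    download=True,
    transform=ToTensor(),
    target_transform=None
)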
9115
14:45:33,960 --> 14:45:35,960
Let's see what happens when we run this.
9116
14:45:35,960 --> 14:45:39,960
Oh, downloading Fashion MNIST, beautiful.
9117
14:45:39,960 --> 14:45:41,960
So this is going to download all of the labels.
9118
14:45:41,960 --> 14:45:42,960
What do we have?
9119
14:45:42,960 --> 14:45:47,960
Train images, train labels, lovely, test images, test labels, beautiful.
9120
14:45:47,960 --> 14:45:52,960
So that's how quickly we can get a data set by using torch vision data sets.
9121
14:45:52,960 --> 14:45:56,960
Now, if we have a look over here, we have a data folder because we set the root to be
9122
14:45:56,960 --> 14:45:57,960
data.
9123
14:45:57,960 --> 14:46:00,960
Now, if we look what's inside here, we have fashion MNIST, exactly what we wanted.
9124
14:46:00,960 --> 14:46:05,960
Then we have the raw, and then we have a whole bunch of files here, which torch vision has
9125
14:46:05,960 --> 14:46:08,960
converted into data sets for us.
9126
14:46:08,960 --> 14:46:10,960
So let's get out of that.
9127
14:46:10,960 --> 14:46:16,960
And this process would be much the same if we used almost any data set in here.
9128
14:46:16,960 --> 14:46:19,960
They might be slightly different depending on what the documentation says and depending
9129
14:46:19,960 --> 14:46:21,560
on what the data set is.
9130
14:46:21,560 --> 14:46:27,760
But that is how easy torch vision data sets makes it to practice on example computer vision
9131
14:46:27,760 --> 14:46:29,160
data sets.
9132
14:46:29,160 --> 14:46:31,360
So let's go back.
9133
14:46:31,360 --> 14:46:35,840
Let's check out some parameters or some attributes of our data.
9134
14:46:35,840 --> 14:46:38,960
How many samples do we have?
9135
14:46:38,960 --> 14:46:45,760
So we'll check the lengths.
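As a one-liner sketch, continuing from the train_data and test_data above:

# How many samples are in each split?
print(len(train_data), len(test_data))  # 60000 10000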
9136
14:46:45,760 --> 14:46:51,160
So we have 60,000 training examples and 10,000 testing examples.
9137
14:46:51,160 --> 14:46:54,680
So what we're going to be doing is we're going to be building a computer vision model to
9138
14:46:54,680 --> 14:47:00,160
find patterns in the training data and then use those patterns to predict on the test
9139
14:47:00,160 --> 14:47:01,400
data.
9140
14:47:01,400 --> 14:47:04,080
And so let's see a first training example.
9141
14:47:04,080 --> 14:47:08,280
See the first training example.
9142
14:47:08,280 --> 14:47:12,400
So we can just index on the train data.
9143
14:47:12,400 --> 14:47:18,640
Let's get the zero index and then we're going to have a look at the image and the label.
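A quick sketch of that indexing, assuming the train_data we downloaded earlier:

# See the first training sample: an (image, label) tuple
image, label = train_data[0]
print(image)   # a [1, 28, 28] tensor of pixel values
print(label)   # 9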
9144
14:47:18,640 --> 14:47:19,640
Oh my goodness.
9145
14:47:19,640 --> 14:47:21,560
A whole bunch of numbers.
9146
14:47:21,560 --> 14:47:26,080
Now you see what ToTensor has done for us?
9147
14:47:26,080 --> 14:47:30,680
So we've downloaded some images, and thanks to this torchvision transforms ToTensor.
9148
14:47:30,680 --> 14:47:32,400
How would we find the documentation for this?
9149
14:47:32,400 --> 14:47:36,680
Well, we could go and see what this does, transforms ToTensor.
9150
14:47:36,680 --> 14:47:40,080
We could go ToTensor.
9151
14:47:40,080 --> 14:47:41,080
There we go.
9152
14:47:41,080 --> 14:47:42,080
What does this do?
9153
14:47:42,080 --> 14:47:43,680
Convert a PIL image.
9154
14:47:43,680 --> 14:47:47,700
So that's a Python Imaging Library image, or NumPy array, to a tensor.
9155
14:47:47,700 --> 14:47:49,760
This transform does not support torch script.
9156
14:47:49,760 --> 14:47:55,080
So it converts a PIL image or NumPy array of height, width, color channels in the range 0 to 255
9157
14:47:55,080 --> 14:47:57,320
to a torch float tensor of shape.
9158
14:47:57,320 --> 14:47:58,320
See here?
9159
14:47:58,320 --> 14:48:03,280
This is what I was talking about how PyTorch defaults with a lot of transforms to CHW.
9160
14:48:03,280 --> 14:48:07,960
So color channels first height then width in that range of zero to one.
9161
14:48:07,960 --> 14:48:12,520
So typically red, green and blue values are between zero and 255.
9162
14:48:12,520 --> 14:48:15,080
But neural networks like things between zero and one.
9163
14:48:15,080 --> 14:48:21,000
And in this case, it is now in the shape of color channels first, then height, then width.
9164
14:48:21,000 --> 14:48:26,520
However, some other machine learning libraries prefer height, width, then color channels.
9165
14:48:26,520 --> 14:48:27,520
Just keep that in mind.
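To make that dimension ordering concrete, here's a small sketch using a random stand-in tensor rather than a real Fashion MNIST image:

import torch

image = torch.rand(1, 28, 28)        # stand-in for a grayscale image tensor in CHW format
image_hwc = image.permute(1, 2, 0)   # rearrange to HWC (height, width, color channels)
print(image.shape)                   # torch.Size([1, 28, 28])
print(image_hwc.shape)               # torch.Size([28, 28, 1])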
9166
14:48:27,520 --> 14:48:29,920
We're going to see this in practice later on.
9167
14:48:29,920 --> 14:48:30,920
So we've got an image.
9168
14:48:30,920 --> 14:48:31,920
We've got a label.
9169
14:48:31,920 --> 14:48:33,800
Let's check out some more details about it.
9170
14:48:33,800 --> 14:48:34,800
Remember how we discussed?
9171
14:48:34,800 --> 14:48:36,640
Oh, there's our label, by the way.
9172
14:48:36,640 --> 14:48:45,200
So nine. We can go train_data.classes to find some information about our class names.
9173
14:48:45,200 --> 14:48:46,800
Class names.
9174
14:48:46,800 --> 14:48:48,800
Beautiful.
9175
14:48:48,800 --> 14:48:55,600
So number nine would be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
9176
14:48:55,600 --> 14:48:59,400
So this particular tensor seems to relate to an ankle boot.
9177
14:48:59,400 --> 14:49:00,840
How would we find that out?
9178
14:49:00,840 --> 14:49:01,840
Well, one second.
9179
14:49:01,840 --> 14:49:04,560
I'm just going to show you one more thing, class to IDX.
9180
14:49:04,560 --> 14:49:07,480
Let's go train_data.class_to_idx.
9181
14:49:07,480 --> 14:49:09,240
What does this give us?
9182
14:49:09,240 --> 14:49:10,240
Class to IDX.
9183
14:49:10,240 --> 14:49:16,400
This is going to give us a dictionary of different labels and their corresponding index.
9184
14:49:16,400 --> 14:49:20,880
So if our machine learning model predicted nine or class nine, we can convert that to
9185
14:49:20,880 --> 14:49:24,560
ankle boot using this attribute of the train data.
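As a small sketch, continuing from the train_data and label from the earlier cells:

# Class names as a list, and the mapping from class name to index
class_names = train_data.classes
class_to_idx = train_data.class_to_idx
print(class_names)          # ['T-shirt/top', 'Trouser', ..., 'Ankle boot']
print(class_to_idx)         # {'T-shirt/top': 0, 'Trouser': 1, ..., 'Ankle boot': 9}
print(class_names[label])   # 9 -> 'Ankle boot' for the first sample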
9186
14:49:24,560 --> 14:49:26,840
There are more attributes that you can have a look at if you like.
9187
14:49:26,840 --> 14:49:32,560
You can go train_data. and then I just push tab to find out a bunch of different things.
9188
14:49:32,560 --> 14:49:33,560
You can go data.
9189
14:49:33,560 --> 14:49:37,280
That'll be the images, and then I believe you can also go targets.
9190
14:49:37,280 --> 14:49:43,280
So targets, that's all the labels, which is one big long tensor.
9191
14:49:43,280 --> 14:49:45,760
Now let's check the shape.
9192
14:49:45,760 --> 14:49:49,560
Check the shape of our image.
9193
14:49:49,560 --> 14:49:53,240
So image.shape and label.shape.
9194
14:49:53,240 --> 14:49:55,240
What are we going to get from that?
9195
14:49:55,240 --> 14:49:58,040
Oh, label doesn't have a shape.
9196
14:49:58,040 --> 14:49:59,040
Why is that?
9197
14:49:59,040 --> 14:50:01,000
Well, because it's only an integer.
9198
14:50:01,000 --> 14:50:02,000
So oh, beautiful.
9199
14:50:02,000 --> 14:50:03,080
Look at that.
9200
14:50:03,080 --> 14:50:06,440
So our image shape is we have a color channel of one.
9201
14:50:06,440 --> 14:50:13,360
So let me print this out in something prettier, print image shape, which is going to be image
9202
14:50:13,360 --> 14:50:14,360
shape.
9203
14:50:14,360 --> 14:50:19,200
Remember how I said it's very important to be aware of the input and output shapes of
9204
14:50:19,200 --> 14:50:20,880
your models and your data.
9205
14:50:20,880 --> 14:50:23,720
It's all part of becoming one with the data.
9206
14:50:23,720 --> 14:50:28,920
So that is what our image shape is.
9207
14:50:28,920 --> 14:50:36,480
And then if we go next, this is print image label, which is label, but we'll index on
9208
14:50:36,480 --> 14:50:39,880
class names for label.
9209
14:50:39,880 --> 14:50:44,040
And then we'll run that. Wonderful.
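A sketch of those two print statements, assuming image, label and class_names from the cells above:

print(f"Image shape: {image.shape}")         # torch.Size([1, 28, 28]) -> [color_channels, height, width]
print(f"Image label: {class_names[label]}")  # 'Ankle boot'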
9210
14:50:44,040 --> 14:50:47,920
So our image shape is currently in the format of color channels height width.
9211
14:50:47,920 --> 14:50:50,400
We got a bunch of different numbers that's representing our image.
9212
14:50:50,400 --> 14:50:51,760
It's black and white.
9213
14:50:51,760 --> 14:50:54,000
It only has one color channel.
9214
14:50:54,000 --> 14:50:56,880
Why do you think it only has one color channel?
9215
14:50:56,880 --> 14:51:00,600
Because it's black and white. So if we jump back into the keynote, Fashion MNIST, we've already
9216
14:51:00,600 --> 14:51:04,280
discussed this, grayscale images have one color channel.
9217
14:51:04,280 --> 14:51:07,840
So that means that for black, the pixel value is zero.
9218
14:51:07,840 --> 14:51:12,000
And for white, it's some value for whatever color is going on here.
9219
14:51:12,000 --> 14:51:16,240
So if it's a very high number, say it's one, it's going to be pure white.
9220
14:51:16,240 --> 14:51:20,640
If it's like 0.001, it might be a faint white pixel.
9221
14:51:20,640 --> 14:51:23,480
But if it's exactly zero, it's going to be black.
9222
14:51:23,480 --> 14:51:27,800
So color images have three color channels for red, green and blue, grayscale have one
9223
14:51:27,800 --> 14:51:29,760
color channel.
9224
14:51:29,760 --> 14:51:34,680
But I think we've done enough of visualizing our images as numbers.
9225
14:51:34,680 --> 14:51:39,800
How about in the next video, we visualize our image as an image?
9226
14:51:39,800 --> 14:51:42,560
I'll see you there.
9227
14:51:42,560 --> 14:51:43,720
Welcome back.
9228
14:51:43,720 --> 14:51:49,160
So in the last video, we checked the input output shapes of our data, and we downloaded
9229
14:51:49,160 --> 14:51:54,880
the fashion MNIST data set, which is comprised of images or grayscale images of T-shirts,
9230
14:51:54,880 --> 14:52:00,520
trousers, pullovers, dress, coat, sandal, shirt, sneaker, bag, ankle boot.
9231
14:52:00,520 --> 14:52:06,120
Now we want to see if we can build a computer vision model to decipher what's going on in
9232
14:52:06,120 --> 14:52:07,120
fashion MNIST.
9233
14:52:07,120 --> 14:52:14,080
So to separate, to classify different items of clothing based on their numerical representation.
9234
14:52:14,080 --> 14:52:18,800
And part of becoming one with the data is, of course, checking the input output shapes
9235
14:52:18,800 --> 14:52:19,880
of it.
9236
14:52:19,880 --> 14:52:24,480
So this is a fashion MNIST data set from Zalando Research.
9237
14:52:24,480 --> 14:52:27,320
Now if you recall, why did we look at our input and output shapes?
9238
14:52:27,320 --> 14:52:29,360
Well, this is what we looked at before.
9239
14:52:29,360 --> 14:52:34,360
We have 28 by 28 grayscale images that we want to represent as a tensor.
9240
14:52:34,360 --> 14:52:37,960
We want to use them as input into a machine learning algorithm, typically a computer vision
9241
14:52:37,960 --> 14:52:39,840
algorithm, such as a CNN.
9242
14:52:39,840 --> 14:52:45,520
And we want to have some sort of outputs that are formatted in the ideal shape that we'd
9243
14:52:45,520 --> 14:52:46,520
like.
9244
14:52:46,520 --> 14:52:49,200
So in our case, we have 10 different types of clothing.
9245
14:52:49,200 --> 14:52:54,800
So we're going to have an output shape of 10, but our input shape is what?
9246
14:52:54,800 --> 14:52:59,280
So by default, PyTorch turns tensors into color channels first.
9247
14:52:59,280 --> 14:53:03,080
So we have an input shape of none, one, 28, 28.
9248
14:53:03,080 --> 14:53:07,520
So none is going to be our batch size, which of course we can set that to whatever we'd
9249
14:53:07,520 --> 14:53:08,520
like.
9250
14:53:08,520 --> 14:53:15,040
Now input shape format is in NCHW, or in other words, color channels first.
9251
14:53:15,040 --> 14:53:18,880
But just remember, if you're working with some other machine learning libraries, you
9252
14:53:18,880 --> 14:53:21,760
may want to use color channels last.
9253
14:53:21,760 --> 14:53:24,280
So let's have a look at where that might be the case.
9254
14:53:24,280 --> 14:53:27,400
We're going to visualize our images.
9255
14:53:27,400 --> 14:53:30,480
So I make a little heading here, 1.2.
9256
14:53:30,480 --> 14:53:35,800
Now this is all part of becoming one with the data.
9257
14:53:35,800 --> 14:53:39,760
In other words, understanding its input and output shapes, how many samples there are,
9258
14:53:39,760 --> 14:53:44,440
what they look like, visualize, visualize, visualize.
9259
14:53:44,440 --> 14:53:45,440
Let's import matplotlib.
9260
14:53:45,440 --> 14:53:56,280
I'm just going to add a few code cells here, import matplotlib.pyplot as plt.
9261
14:53:56,280 --> 14:54:03,720
Now let's create our image and label is our train data zero, and we're going to print
9262
14:54:03,720 --> 14:54:10,400
the image shape so we can understand what inputs are going into our matplotlib function.
9263
14:54:10,400 --> 14:54:15,440
And then we're going to go plot.imshow, and we're going to pass in our image and see
9264
14:54:15,440 --> 14:54:21,320
what happens, because recall what does our image look like, image?
9265
14:54:21,320 --> 14:54:24,400
Our image is this big tensor of numbers.
9266
14:54:24,400 --> 14:54:27,160
And we've got an image shape of 1, 28, 28.
9267
14:54:27,160 --> 14:54:29,840
Now what happens if we call plot.imshow?
9268
14:54:29,840 --> 14:54:31,840
What happens there?
9269
14:54:31,840 --> 14:54:40,200
Oh, we get an error: invalid shape (1, 28, 28) for image data.
9270
14:54:40,200 --> 14:54:45,440
Now as I said, this is one of the most common errors in machine learning is a shape issue.
9271
14:54:45,440 --> 14:54:50,360
So the shape of your input tensor doesn't match the expected shape of that tensor.
9272
14:54:50,360 --> 14:54:56,200
So this is one of those scenarios where our data format, so color channels first, doesn't
9273
14:54:56,200 --> 14:54:59,080
match up with what matplotlib is expecting.
9274
14:54:59,080 --> 14:55:04,280
So matplotlib expects either just height and width, so no color channel for grayscale
9275
14:55:04,280 --> 14:55:08,080
images, or it also expects the color channels to be last.
9276
14:55:08,080 --> 14:55:12,800
So we'll see that later on, but for grayscale, we can get rid of that extra dimension by
9277
14:55:12,800 --> 14:55:16,360
passing in image.squeeze.
9278
14:55:16,360 --> 14:55:18,440
So do you recall what squeeze does?
9279
14:55:18,440 --> 14:55:21,160
It's going to remove that singular dimension.
9280
14:55:21,160 --> 14:55:25,960
If we have a look at what goes on now, beautiful, we get an ankle boot.
9281
14:55:25,960 --> 14:55:31,080
Well, that's a very pixelated ankle boot, but we're only dealing with 28 by 28 pixels,
9282
14:55:31,080 --> 14:55:33,280
so not a very high definition image.
9283
14:55:33,280 --> 14:55:34,760
Let's add the title to it.
9284
14:55:34,760 --> 14:55:37,600
We're going to add in the label.
9285
14:55:37,600 --> 14:55:41,520
Beautiful.
9286
14:55:41,520 --> 14:55:43,360
So we've got the number nine here.
9287
14:55:43,360 --> 14:55:46,960
So where if we go up to here, that's an ankle boot.
9288
14:55:46,960 --> 14:55:48,640
Now let's plot this in grayscale.
9289
14:55:48,640 --> 14:55:49,760
How might we do that?
9290
14:55:49,760 --> 14:55:50,760
We can do the same thing.
9291
14:55:50,760 --> 14:55:52,320
We can go plt.imshow.
9292
14:55:52,320 --> 14:55:56,560
We're going to pass in image.squeeze.
9293
14:55:56,560 --> 14:56:00,120
And we're going to change the color map, cmap equals gray.
9294
14:56:00,120 --> 14:56:03,400
So in matplotlib, if you ever have to change the colors of your plot, you want to look
9295
14:56:03,400 --> 14:56:09,960
into the cmap property or parameter, or sometimes it's also shortened to just c.
9296
14:56:09,960 --> 14:56:14,600
But in this case, imshow uses cmap, and we want plt.title, and we're going to put
9297
14:56:14,600 --> 14:56:19,640
it in class names and the label integer here.
9298
14:56:19,640 --> 14:56:21,240
So we have a look at it now.
9299
14:56:21,240 --> 14:56:25,240
We have an ankle boot, and we can remove the axes too if we wanted, with plt.axis,
9300
14:56:25,240 --> 14:56:27,080
and turn that off.
9301
14:56:27,080 --> 14:56:28,600
That's going to remove the axes.
9302
14:56:28,600 --> 14:56:29,600
So there we go.
9303
14:56:29,600 --> 14:56:30,960
That's the type of images that we're dealing with.
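Pulling the single-image plot together as a sketch (assuming the train_data and class_names from earlier):

import matplotlib.pyplot as plt

image, label = train_data[0]
plt.imshow(image.squeeze(), cmap="gray")  # squeeze removes the single color channel dimension
plt.title(class_names[label])             # 'Ankle boot'
plt.axis(False)                           # turn the axes off
plt.show()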
9304
14:56:30,960 --> 14:56:33,160
But that's only a singular image.
9305
14:56:33,160 --> 14:56:38,480
How about we harness the power of randomness and have a look at some random images from
9306
14:56:38,480 --> 14:56:39,880
our data set?
9307
14:56:39,880 --> 14:56:42,200
So how would we do this?
9308
14:56:42,200 --> 14:56:44,200
Let's go plot more images.
9309
14:56:44,200 --> 14:56:46,680
We'll set a random seed.
9310
14:56:46,680 --> 14:56:51,640
So you and I are both looking at as similar as possible images, 42.
9311
14:56:51,640 --> 14:56:57,360
Now we'll create a plot by calling plot.figure, and we're going to give it a size.
9312
14:56:57,360 --> 14:56:59,520
We might create a nine by nine grid.
9313
14:56:59,520 --> 14:57:03,040
So we want to see nine random images from our data set.
9314
14:57:03,040 --> 14:57:06,800
So rows, cols, or sorry, maybe we'll do four by four.
9315
14:57:06,800 --> 14:57:07,800
That'll give us 16.
9316
14:57:07,800 --> 14:57:16,080
We're going to go for i in range, and we're going to go one to rows times columns plus
9317
14:57:16,080 --> 14:57:17,080
one.
9318
14:57:17,080 --> 14:57:18,320
So we can print i.
9319
14:57:18,320 --> 14:57:20,280
What's that going to give us?
9320
14:57:20,280 --> 14:57:21,960
We want to see 16 images.
9321
14:57:21,960 --> 14:57:24,200
Or thereabouts.
9322
14:57:24,200 --> 14:57:30,360
So 16 random images, but using a manual seed of 42, of our data set.
9323
14:57:30,360 --> 14:57:33,760
This is one of my favorite things to do with any type of data set that I'm looking
9324
14:57:33,760 --> 14:57:36,760
at, whether it be text, image, audio, doesn't matter.
9325
14:57:36,760 --> 14:57:41,800
I like to randomly have a look at a whole bunch of samples at the start so that I can
9326
14:57:41,800 --> 14:57:45,840
become one with the data.
9327
14:57:45,840 --> 14:57:50,240
With that being said, let's use this loop to grab some random indexes.
9328
14:57:50,240 --> 14:57:55,880
We can do so using torch.randint, so a random integer between zero and the length of
9329
14:57:55,880 --> 14:57:57,280
the training data.
9330
14:57:57,280 --> 14:58:01,920
This is going to give us a random integer in the range of zero and however many training
9331
14:58:01,920 --> 14:58:06,760
samples we have, which in our case is what, 60,000 or thereabouts.
9332
14:58:06,760 --> 14:58:11,160
So we want to create the size of one, and we want to get the item from that so that we
9333
14:58:11,160 --> 14:58:12,560
have a random index.
9334
14:58:12,560 --> 14:58:14,320
What is this going to give us?
9335
14:58:14,320 --> 14:58:19,800
Oh, excuse me, maybe we print that out.
9336
14:58:19,800 --> 14:58:20,800
There we go.
9337
14:58:20,800 --> 14:58:21,800
So we have random images.
9338
14:58:21,800 --> 14:58:25,320
Now, because we're using manual seed, it will give us the same numbers every time.
9339
14:58:25,320 --> 14:58:30,520
So we have three, seven, five, four, two, three, seven, five, four, two.
9340
14:58:30,520 --> 14:58:36,240
And then if we just commented out the random seed, we'll get different numbers every time.
9341
14:58:36,240 --> 14:58:39,240
But this is just to demonstrate, we'll keep the manual seed there for now.
9342
14:58:39,240 --> 14:58:43,800
You can comment that out if you want different numbers or different images, different indexes
9343
14:58:43,800 --> 14:58:45,000
each time.
9344
14:58:45,000 --> 14:58:49,640
So we'll create the image and the label by indexing on the training data at the random
9345
14:58:49,640 --> 14:58:53,560
index that we're generating.
9346
14:58:53,560 --> 14:58:55,240
And then we'll create our plot.
9347
14:58:55,240 --> 14:59:02,920
So fig, or we'll add a subplot, fig.add_subplot, and we're going to go rows, cols, i.
9348
14:59:02,920 --> 14:59:06,120
So at the i-th index, we're going to add a subplot.
9349
14:59:06,120 --> 14:59:08,960
Remember, we set rows and columns up to here.
9350
14:59:08,960 --> 14:59:13,280
And then we're going to go plt.imshow. What are we going to show? We're going to show
9351
14:59:13,280 --> 14:59:17,400
our image, but we have to squeeze it to get rid of that singular dimension as the color
9352
14:59:17,400 --> 14:59:18,400
channel.
9353
14:59:18,400 --> 14:59:20,000
Otherwise, we end up with an issue with matplotlib.
9354
14:59:20,000 --> 14:59:21,840
We're going to use a color map of gray.
9355
14:59:21,840 --> 14:59:24,000
So it looks like the image we plotted above.
9356
14:59:24,000 --> 14:59:28,640
And then for our title, it's going to be our class names indexed with our label.
9357
14:59:28,640 --> 14:59:31,400
And then we don't want the accesses because that's going to clutter up our plot.
9358
14:59:31,400 --> 14:59:32,880
Let's see what this looks like.
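Here's a sketch of the full plotting loop being described (again assuming train_data and class_names exist from the earlier cells; comment out the manual seed for different images each run):

import torch
import matplotlib.pyplot as plt

torch.manual_seed(42)
fig = plt.figure(figsize=(9, 9))
rows, cols = 4, 4
for i in range(1, rows * cols + 1):
    # Pick a random index into the training data
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(rows, cols, i)
    plt.imshow(img.squeeze(), cmap="gray")
    plt.title(class_names[label])
    plt.axis(False)
plt.show()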
9359
14:59:32,880 --> 14:59:35,720
Oh my goodness, look at that.
9360
14:59:35,720 --> 14:59:36,720
It worked first.
9361
14:59:36,720 --> 14:59:37,720
Go.
9362
14:59:37,720 --> 14:59:40,560
Usually visualizations take a fair bit of trial and error.
9363
14:59:40,560 --> 14:59:45,320
So we have ankle boots, we have shirts, we have bags, we have ankle boots, sandal, shirt,
9364
14:59:45,320 --> 14:59:46,320
pullover.
9365
14:59:46,320 --> 14:59:52,040
Oh, do you notice something about the data set right now, pullover and shirt?
9366
14:59:52,040 --> 14:59:53,400
To me, they look quite similar.
9367
14:59:53,400 --> 14:59:57,320
Do you think that will cause an issue later on when our model is trying to predict between
9368
14:59:57,320 --> 14:59:59,120
a pullover and a shirt?
9369
14:59:59,120 --> 15:00:02,080
How about if we look at some more images?
9370
15:00:02,080 --> 15:00:06,880
We'll get rid of the random seed so we can have a look at different styles.
9371
15:00:06,880 --> 15:00:13,320
So we have a sandal, ankle boot, coat, t-shirt/top, shirt. Oh, is that a little bit confusing
9372
15:00:13,320 --> 15:00:16,960
that we have a class for t-shirt and top and shirt?
9373
15:00:16,960 --> 15:00:24,080
Like I'm not sure about you, but what's the difference between a t-shirt and a shirt?
9374
15:00:24,080 --> 15:00:27,600
This is just something to keep in mind as a t-shirt and top, does that look like it could
9375
15:00:27,600 --> 15:00:30,400
be maybe even a dress?
9376
15:00:30,400 --> 15:00:32,000
Like the shape is there.
9377
15:00:32,000 --> 15:00:34,880
So this is just something to keep in mind going forward.
9378
15:00:34,880 --> 15:00:39,720
The chances are if we get confused on our, like you and I looking at our data set, if
9379
15:00:39,720 --> 15:00:43,600
we get confused about different samples and what they're labeled with, our model might
9380
15:00:43,600 --> 15:00:45,920
get confused later on.
9381
15:00:45,920 --> 15:00:50,360
So let's have a look at one more and then we'll go into the next video.
9382
15:00:50,360 --> 15:00:56,200
So we have sneaker, trouser, shirt, sandal, dress, pullover, bag, bag, t-shirt, oh, that's
9383
15:00:56,200 --> 15:00:57,200
quite a difficult one.
9384
15:00:57,200 --> 15:01:00,280
It doesn't look like there's even much going on in that image.
9385
15:01:00,280 --> 15:01:05,080
But the whole premise of building machine learning models to do this would be could you
9386
15:01:05,080 --> 15:01:10,200
write a program that would take in the shapes of these images and figure out, write a rule-based
9387
15:01:10,200 --> 15:01:14,320
program that would go, hey, if it looks like a rectangle with a buckle in the middle,
9388
15:01:14,320 --> 15:01:16,160
it's probably a bag?
9389
15:01:16,160 --> 15:01:22,200
I mean, you probably could after a while, but I prefer to write machine learning algorithms
9390
15:01:22,200 --> 15:01:24,080
to figure out patterns and data.
9391
15:01:24,080 --> 15:01:26,120
So let's start moving towards that.
9392
15:01:26,120 --> 15:01:30,160
We're now going to go on figuring out how we can prepare this data to be loaded into
9393
15:01:30,160 --> 15:01:31,520
a model.
9394
15:01:31,520 --> 15:01:33,200
I'll see you there.
9395
15:01:33,200 --> 15:01:36,480
All right, all right, all right.
9396
15:01:36,480 --> 15:01:42,880
So we've got 60,000 images of clothing that we'd like to build a computer vision model
9397
15:01:42,880 --> 15:01:44,680
to classify into 10 different classes.
9398
15:01:44,680 --> 15:01:49,560
And now that we've visualized a fair few of these samples, do you think that we could
9399
15:01:49,560 --> 15:01:54,760
model these with just linear lines, so straight lines, or do you think we'll need a model
9400
15:01:54,760 --> 15:01:56,640
with nonlinearity?
9401
15:01:56,640 --> 15:01:58,760
So I'm going to write that down.
9402
15:01:58,760 --> 15:02:13,640
So do you think these items of clothing images could be modeled with pure linear lines, or
9403
15:02:13,640 --> 15:02:17,360
do you think we'll need nonlinearity?
9404
15:02:17,360 --> 15:02:20,560
Don't have to answer that now.
9405
15:02:20,560 --> 15:02:21,560
We could test that out later on.
9406
15:02:21,560 --> 15:02:26,600
You might want to skip ahead and try to build a model yourself with linear lines or nonlinearities.
9407
15:02:26,600 --> 15:02:32,560
We've covered linear lines and nonlinearities before, but let's now start to prepare our
9408
15:02:32,560 --> 15:02:38,280
data even further to prepare data loader.
9409
15:02:38,280 --> 15:02:46,600
So right now, our data is in the form of PyTorch data sets.
9410
15:02:46,600 --> 15:02:48,720
So let's have a look at it.
9411
15:02:48,720 --> 15:02:52,480
Train data.
9412
15:02:52,480 --> 15:02:53,480
There we go.
9413
15:02:53,480 --> 15:02:55,920
So we have data set, which is of fashion MNIST.
9414
15:02:55,920 --> 15:03:01,040
And then if we go test data, we see a similar thing except we have a different number of
9415
15:03:01,040 --> 15:03:02,040
data points.
9416
15:03:02,040 --> 15:03:05,680
We have the same transform on each, we've turned them into tensors.
9417
15:03:05,680 --> 15:03:09,600
So we want to convert them from a data set, which is a collection of all of our data,
9418
15:03:09,600 --> 15:03:10,600
into a data loader.
9419
15:03:10,600 --> 15:03:22,400
Well, a data loader turns our data set into a Python iterable.
9420
15:03:22,400 --> 15:03:26,680
So I'm going to turn this into Markdown, beautiful.
9421
15:03:26,680 --> 15:03:31,240
More specifically, specificalee, can I spell right?
9422
15:03:31,240 --> 15:03:35,680
I don't know, we want to just code right, we're not here to learn spelling.
9423
15:03:35,680 --> 15:03:45,880
We want to turn our data into batches, or mini batches.
9424
15:03:45,880 --> 15:03:48,960
Why would we do this?
9425
15:03:48,960 --> 15:03:54,440
Well, we may get away with it by building a model to look at all 60,000 samples of our
9426
15:03:54,440 --> 15:03:56,840
current data set, because it's quite small.
9427
15:03:56,840 --> 15:04:01,320
It's only comprised of images of 28 by 28 pixels.
9428
15:04:01,320 --> 15:04:07,040
And when I say quite small, yes, 60,000 images is actually quite small for a deep learning
9429
15:04:07,040 --> 15:04:08,640
scale data set.
9430
15:04:08,640 --> 15:04:12,360
Modern data sets could be in the millions of images.
9431
15:04:12,360 --> 15:04:20,320
But if our computer hardware was able to look at 60,000 samples of 28 by 28 at one time,
9432
15:04:20,320 --> 15:04:22,240
it would need a fair bit of memory.
9433
15:04:22,240 --> 15:04:28,240
So we have RAM space up here, we have GPU memory, we have compute memory.
9434
15:04:28,240 --> 15:04:33,480
But chances are that it might not be able to store millions of images in memory.
9435
15:04:33,480 --> 15:04:40,480
So what you do is you break a data set from say 60,000 into groups of batches or mini
9436
15:04:40,480 --> 15:04:41,480
batches.
9437
15:04:41,480 --> 15:04:45,040
So we've seen batch size before, why would we do this?
9438
15:04:45,040 --> 15:04:58,000
Well, one, it is more computationally efficient, as in your computing hardware may not be able
9439
15:04:58,000 --> 15:05:06,320
to look at, as in store in memory, 60,000 images in one hit.
9440
15:05:06,320 --> 15:05:11,960
So we break it down to 32 images at a time.
9441
15:05:11,960 --> 15:05:14,800
This would be batch size of 32.
9442
15:05:14,800 --> 15:05:17,400
Now again, 32 is a number that you can change.
9443
15:05:17,400 --> 15:05:22,080
32 is just a common batch size that you'll see with many beginner style problems.
9444
15:05:22,080 --> 15:05:24,400
As you go on, you'll see different batch sizes.
9445
15:05:24,400 --> 15:05:29,200
This is just to exemplify the concept of mini batches, which is very common in deep
9446
15:05:29,200 --> 15:05:30,200
learning.
9447
15:05:30,200 --> 15:05:32,200
And why else would we do this?
9448
15:05:32,200 --> 15:05:41,120
The second point or the second main point is it gives our neural network more chances
9449
15:05:41,120 --> 15:05:45,480
to update its gradients per epoch.
9450
15:05:45,480 --> 15:05:48,880
So what I mean by this, this will make more sense when we write a training loop.
9451
15:05:48,880 --> 15:05:54,120
But if we were to just look at 60,000 images at one time, we would per epoch.
9452
15:05:54,120 --> 15:05:59,440
So per iteration through the data, we would only get one update per epoch across our entire
9453
15:05:59,440 --> 15:06:00,440
data set.
9454
15:06:00,440 --> 15:06:06,360
Whereas if we look at 32 images at a time, our neural network updates its internal states,
9455
15:06:06,360 --> 15:06:10,240
its weights, every 32 images, thanks to the optimizer.
9456
15:06:10,240 --> 15:06:12,800
This will make a lot more sense once we write our training loop.
9457
15:06:12,800 --> 15:06:17,440
But these are the two of the main reasons for turning our data into mini batches in the
9458
15:06:17,440 --> 15:06:18,960
form of a data loader.
9459
15:06:18,960 --> 15:06:22,640
Now if you'd like to learn more about the theory behind this, I would highly recommend
9460
15:06:22,640 --> 15:06:25,720
looking up Andrew Ng mini-batches.
9461
15:06:25,720 --> 15:06:28,720
There's a great lecture on that.
9462
15:06:28,720 --> 15:06:33,640
So yeah, large-scale machine learning, mini batch gradient descent, mini batch gradient
9463
15:06:33,640 --> 15:06:34,640
descent.
9464
15:06:34,640 --> 15:06:36,280
Yeah, that's what it's called mini batch gradient descent.
9465
15:06:36,280 --> 15:06:39,960
If you look up some results on that, you'll find a whole bunch of stuff.
9466
15:06:39,960 --> 15:06:48,080
I might just link this one, I'm going to pause that, I'm going to link this in there.
9467
15:06:48,080 --> 15:06:56,880
So for more on mini batches, see here.
9468
15:06:56,880 --> 15:07:00,920
Now to see this visually, I've got a slide prepared for this.
9469
15:07:00,920 --> 15:07:02,480
So this is what we're going to be working towards.
9470
15:07:02,480 --> 15:07:03,920
There's our input and output shapes.
9471
15:07:03,920 --> 15:07:08,480
We want to create batch size of 32 across all of our 60,000 training images.
9472
15:07:08,480 --> 15:07:12,840
And we're actually going to do the same for our testing images, but we only have 10,000
9473
15:07:12,840 --> 15:07:14,360
testing images.
9474
15:07:14,360 --> 15:07:17,640
So this is what our data set's going to look like, batched.
9475
15:07:17,640 --> 15:07:23,240
So we're going to write some code, namely using the DataLoader from torch.utils.data.
9476
15:07:23,240 --> 15:07:26,120
We're going to pass it a data set, which is our train data.
9477
15:07:26,120 --> 15:07:28,880
We're going to give it a batch size, which we can define as whatever we want.
9478
15:07:28,880 --> 15:07:31,560
For us, we're going to use 32 to begin with.
9479
15:07:31,560 --> 15:07:35,360
And we're going to set shuffle equals true if we're using the training data.
9480
15:07:35,360 --> 15:07:38,200
Why would we set shuffle equals true?
9481
15:07:38,200 --> 15:07:43,120
Well, in case our data set for some reason has order, say we had all of the pants images
9482
15:07:43,120 --> 15:07:47,440
in a row, we had all of the T-shirt images in a row, we had all the sandal images in
9483
15:07:47,440 --> 15:07:48,440
a row.
9484
15:07:48,440 --> 15:07:51,640
We don't want our neural network to necessarily remember the order of our data.
9485
15:07:51,640 --> 15:07:56,280
We just want it to remember individual patterns between different classes.
9486
15:07:56,280 --> 15:07:59,440
So we shuffle up the data, we mix it, we mix it up.
9487
15:07:59,440 --> 15:08:02,200
And then it looks something like this.
9488
15:08:02,200 --> 15:08:06,840
So we might have batch number zero, and then we have 32 samples.
9489
15:08:06,840 --> 15:08:12,040
Now I ran out of space when I was creating these, but you get the idea, that goes up to 32.
9490
15:08:12,040 --> 15:08:14,440
So this is setting batch size equals 32.
9491
15:08:14,440 --> 15:08:17,700
So we look at 32 samples per batch.
9492
15:08:17,700 --> 15:08:22,720
We mix all the samples up, and we go batch, batch, batch, batch, batch, and we'll have,
9493
15:08:22,720 --> 15:08:26,160
however many batches we have, we'll have number of samples divided by the batch size.
9494
15:08:26,160 --> 15:08:31,000
So 60,000 divided by 32, what's that, 1800 or something like that?
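(For the record, 60,000 / 32 = 1,875 batches of training data, and 10,000 / 32 comes to 313 batches of test data, with the final batch holding the leftover 16 samples.)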
9495
15:08:31,000 --> 15:08:33,120
So this is what we're going to be working towards.
9496
15:08:33,120 --> 15:08:36,320
I did want to write some code in this video, but I think to save it getting too long, we're
9497
15:08:36,320 --> 15:08:38,040
going to write this code in the next video.
9498
15:08:38,040 --> 15:08:42,960
If you would like to give this a go on your own, here's most of the code we have to do.
9499
15:08:42,960 --> 15:08:46,800
So there's the train data loader, do the same for the test data loader.
9500
15:08:46,800 --> 15:08:53,640
And I'll see you in the next video, and we're going to batchify our fashion MNIST data set.
9501
15:08:53,640 --> 15:08:54,720
Welcome back.
9502
15:08:54,720 --> 15:08:59,400
In the last video, we had a brief overview of the concept of mini batches.
9503
15:08:59,400 --> 15:09:05,720
And so rather than our computer looking at 60,000 images in one hit, we break things down.
9504
15:09:05,720 --> 15:09:07,960
We turn it into batches of 32.
9505
15:09:07,960 --> 15:09:11,280
Again, the batch size will vary depending on what problem you're working on.
9506
15:09:11,280 --> 15:09:15,280
But 32 is quite a good value to start with and try out.
9507
15:09:15,280 --> 15:09:20,000
And we do this for two main reasons, if we jump back to the code, why would we do this?
9508
15:09:20,000 --> 15:09:22,160
It is more computationally efficient.
9509
15:09:22,160 --> 15:09:27,920
So if we have a GPU with, say, 10 gigabytes of memory, it might not be able to store all
9510
15:09:27,920 --> 15:09:29,920
60,000 images in one hit.
9511
15:09:29,920 --> 15:09:35,000
Now our data set, because it's quite small, it may be able to, but it's better practice
9512
15:09:35,000 --> 15:09:38,400
for later on to turn things into mini batches.
9513
15:09:38,400 --> 15:09:42,600
And it also gives our neural network more chances to update its gradients per epoch,
9514
15:09:42,600 --> 15:09:45,680
which will make a lot more sense once we write our training loop.
9515
15:09:45,680 --> 15:09:48,240
But for now, we've spoken enough about the theory.
9516
15:09:48,240 --> 15:09:50,640
Let's write some code to do so.
9517
15:09:50,640 --> 15:09:57,120
So I'm going to import data loader from torch dot utils dot data, import data loader.
9518
15:09:57,120 --> 15:10:02,680
And this principle, by the way, preparing a data loader goes the same for not only images,
9519
15:10:02,680 --> 15:10:08,520
but for text, for audio, whatever sort of data you're working with, mini batches will
9520
15:10:08,520 --> 15:10:13,440
follow you along or batches of data will follow you along throughout a lot of different deep
9521
15:10:13,440 --> 15:10:15,280
learning problems.
9522
15:10:15,280 --> 15:10:18,960
So set up the batch size hyper parameter.
9523
15:10:18,960 --> 15:10:23,560
Remember, a hyper parameter is a value that you can set yourself.
9524
15:10:23,560 --> 15:10:25,960
So batch size equals 32.
9525
15:10:25,960 --> 15:10:27,600
And as is common practice.
9526
15:10:27,600 --> 15:10:29,040
You might see it typed as capitals.
9527
15:10:29,040 --> 15:10:34,480
You won't always see it, but you'll quite often see a hyperparameter typed in capitals.
9528
15:10:34,480 --> 15:10:38,760
And then we're going to turn data sets into iterables.
9529
15:10:38,760 --> 15:10:40,920
So batches.
9530
15:10:40,920 --> 15:10:46,040
So we're going to create a train data loader here of our fashion MNIST data set.
9531
15:10:46,040 --> 15:10:47,720
We're going to use data loader.
9532
15:10:47,720 --> 15:10:49,680
We're going to see what the doc string is.
9533
15:10:49,680 --> 15:10:56,280
Or actually, let's look at the documentation torch data loader.
9534
15:10:56,280 --> 15:11:01,040
This is some extra curriculum for you too, by the way, is to read this data page torch
9535
15:11:01,040 --> 15:11:05,800
utils dot data, because no matter what problem you're working on with deep learning or PyTorch,
9536
15:11:05,800 --> 15:11:07,520
you're going to be working with data.
9537
15:11:07,520 --> 15:11:09,960
So spend 10 minutes just reading through here.
9538
15:11:09,960 --> 15:11:13,880
I think I might have already assigned this, but this is just so important that it's worth
9539
15:11:13,880 --> 15:11:15,600
going through again.
9540
15:11:15,600 --> 15:11:17,120
Read through all of this.
9541
15:11:17,120 --> 15:11:20,440
Even if you don't understand all of what's going on, it just helps you know where
9542
15:11:20,440 --> 15:11:22,400
to look for certain things.
9543
15:11:22,400 --> 15:11:23,840
So what does it take?
9544
15:11:23,840 --> 15:11:25,080
Data loader takes a data set.
9545
15:11:25,080 --> 15:11:28,280
We need to set the batch size to something; the default is one.
9546
15:11:28,280 --> 15:11:32,400
That means that it would create a batch of one image at a time in our case.
9547
15:11:32,400 --> 15:11:33,920
Do we want to shuffle it?
9548
15:11:33,920 --> 15:11:35,520
Do we want to use a specific sampler?
9549
15:11:35,520 --> 15:11:37,920
There's a few more things going on.
9550
15:11:37,920 --> 15:11:39,120
Number of workers.
9551
15:11:39,120 --> 15:11:44,120
Number of workers stands for how many cores on our machine we want to use to load data.
9552
15:11:44,120 --> 15:11:47,240
Generally the higher the better for this one, but we're going to keep most of these as
9553
15:11:47,240 --> 15:11:51,960
the default because most of them are set to pretty good values to begin with.
9554
15:11:51,960 --> 15:11:55,160
I'll let you read more into the other parameters here.
9555
15:11:55,160 --> 15:12:00,960
We're going to focus on the first three data set batch size and shuffle true or false.
9556
15:12:00,960 --> 15:12:02,840
Let's see what we can do.
9557
15:12:02,840 --> 15:12:08,200
So data set equals our train data, which is 60,000 fashion MNIST.
9558
15:12:08,200 --> 15:12:12,760
And then we have a batch size, which we're going to set to our batch size hyper parameter.
9559
15:12:12,760 --> 15:12:15,480
So we're going to have a batch size of 32.
9560
15:12:15,480 --> 15:12:17,800
And then finally, do we want to shuffle the training data?
9561
15:12:17,800 --> 15:12:19,960
Yes, we do.
9562
15:12:19,960 --> 15:12:23,360
And then we're going to do the same thing for the test data loader, except we're not
9563
15:12:23,360 --> 15:12:25,120
going to shuffle the test data.
9564
15:12:25,120 --> 15:12:31,800
Now, you can shuffle the test data if you want, but in my experience, it's actually easier
9565
15:12:31,800 --> 15:12:36,840
to evaluate different models when the test data isn't shuffled.
9566
15:12:36,840 --> 15:12:39,640
So you shuffle the training data to remove order.
9567
15:12:39,640 --> 15:12:41,640
And so your model doesn't learn order.
9568
15:12:41,640 --> 15:12:46,960
But for evaluation purposes, it's generally good to have your test data in the same order
9569
15:12:46,960 --> 15:12:51,480
because our model will never actually see the test data set during training.
9570
15:12:51,480 --> 15:12:53,800
We're just using it for evaluation.
9571
15:12:53,800 --> 15:12:56,680
So the order doesn't really matter to the test data loader.
9572
15:12:56,680 --> 15:13:01,160
It's just easier if we don't shuffle it, because then if we evaluate it multiple times, it's
9573
15:13:01,160 --> 15:13:03,680
not been shuffled every single time.
9574
15:13:03,680 --> 15:13:05,360
So let's run that.
9575
15:13:05,360 --> 15:13:14,640
And then we're going to check it out, our train data loader and our test data loader.
9576
15:13:14,640 --> 15:13:15,640
Beautiful.
9577
15:13:15,640 --> 15:13:20,400
Instances of torch utils data, data loader, data loader.
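As a rough sketch of the DataLoader setup being narrated here (the variable names train_data and test_data are assumed to be the FashionMNIST datasets created earlier in the notebook):

```python
from torch.utils.data import DataLoader

BATCH_SIZE = 32  # hyperparameter: how many samples per batch

# train_data / test_data are assumed from earlier in the notebook
train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              shuffle=True)    # shuffle the training data

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             shuffle=False)    # keep the test data in a fixed order
```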
9578
15:13:20,400 --> 15:13:25,480
And now let's check out what we've created, hey, I always like to print different attributes
9579
15:13:25,480 --> 15:13:28,840
of whatever we make, check out what we've created.
9580
15:13:28,840 --> 15:13:32,280
This is all part of becoming one with the data.
9581
15:13:32,280 --> 15:13:40,920
So print F, I'm going to go data loaders, and then pass in, this is just going to output
9582
15:13:40,920 --> 15:13:43,760
basically the exact same as what we've got above.
9583
15:13:43,760 --> 15:13:45,840
This data loader.
9584
15:13:45,840 --> 15:13:50,120
And we can also see what attributes we can get from each of these by going train data
9585
15:13:50,120 --> 15:13:51,120
loader.
9586
15:13:51,120 --> 15:13:54,680
I don't need caps lock there, train data loader, full stop.
9587
15:13:54,680 --> 15:13:55,680
And then we can go tab.
9588
15:13:55,680 --> 15:13:57,520
We've got a whole bunch of different attributes.
9589
15:13:57,520 --> 15:13:58,960
We've got a batch size.
9590
15:13:58,960 --> 15:13:59,960
We've got our data set.
9591
15:13:59,960 --> 15:14:05,760
Do we want to drop the last batch, as in if our batch size doesn't divide evenly into our 60,000 samples?
9592
15:14:05,760 --> 15:14:07,760
Do we want to get rid of the last batch?
9593
15:14:07,760 --> 15:14:10,440
Say for example, the last batch only had 10 samples.
9594
15:14:10,440 --> 15:14:11,960
Do we want to just drop that?
9595
15:14:11,960 --> 15:14:14,840
Do we want to pin the memory that's going to help later on if we wanted to load our
9596
15:14:14,840 --> 15:14:15,840
data faster?
9597
15:14:15,840 --> 15:14:18,000
A whole bunch of different stuff here.
9598
15:14:18,000 --> 15:14:22,680
If you'd like to research more, you can find all the stuff about what's going on here in
9599
15:14:22,680 --> 15:14:24,680
the documentation.
9600
15:14:24,680 --> 15:14:26,400
But let's just keep pushing forward.
9601
15:14:26,400 --> 15:14:27,800
What else do we want to know?
9602
15:14:27,800 --> 15:14:31,800
So let's find the length of the train data loader.
9603
15:14:31,800 --> 15:14:37,560
We will go length train data loader.
9604
15:14:37,560 --> 15:14:42,640
So this is going to tell us how many batches there are, in batches of our batch
9605
15:14:42,640 --> 15:14:44,560
size.
9606
15:14:44,560 --> 15:14:51,120
And we want print length of test data loader.
9607
15:14:51,120 --> 15:14:59,720
We want length test data loader batches of batch size dot dot dot.
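A minimal sketch of the print statements being described here, assuming the variable names above:

```python
print(f"DataLoaders: {train_dataloader, test_dataloader}")
print(f"Length of train_dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test_dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")
# 60,000 / 32 = 1875 training batches; 10,000 / 32 rounds up to 313 test batches
```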
9608
15:14:59,720 --> 15:15:01,160
So let's find out some information.
9609
15:15:01,160 --> 15:15:02,160
What do we have?
9610
15:15:02,160 --> 15:15:03,760
Oh, there we go.
9611
15:15:03,760 --> 15:15:06,760
So we're just seeing what we saw before with this one.
9612
15:15:06,760 --> 15:15:08,560
But this is more interesting here.
9613
15:15:08,560 --> 15:15:09,560
Length of train data loader.
9614
15:15:09,560 --> 15:15:13,280
Yeah, we have about 1,875 batches of 32.
9615
15:15:13,280 --> 15:15:20,560
So if we do 60,000 training samples divided by 32, yeah, it comes out to 1,875.
9616
15:15:20,560 --> 15:15:26,720
And if we did the same with 10,000 testing samples divided by 32, it comes out at 313.
9617
15:15:26,720 --> 15:15:27,720
This gets rounded up.
9618
15:15:27,720 --> 15:15:32,480
So this is what I meant, that the last batch will have maybe not 32 because 32 doesn't
9619
15:15:32,480 --> 15:15:36,160
divide evenly into 10,000, but that's okay.
9620
15:15:36,160 --> 15:15:44,920
And so this means that our model is going to look at 1,875 individual batches of 32
9621
15:15:44,920 --> 15:15:49,800
images, rather than just one big batch of 60,000 images.
9622
15:15:49,800 --> 15:15:55,480
Now of course, the number of batches we have will change if we change the batch size.
9623
15:15:55,480 --> 15:15:58,400
So we have 469 batches of 128.
9624
15:15:58,400 --> 15:16:01,200
And if we reduce this down to one, what do we get?
9625
15:16:01,200 --> 15:16:03,040
We have a batch per sample.
9626
15:16:03,040 --> 15:16:09,040
So 60,000 batches of 1, 10,000 batches of 1, we're going to stick with 32.
9627
15:16:09,040 --> 15:16:10,520
But now let's visualize.
9628
15:16:10,520 --> 15:16:12,520
So we've got them in train data loader.
9629
15:16:12,520 --> 15:16:17,400
How would we visualize a batch or a single image from a batch?
9630
15:16:17,400 --> 15:16:18,840
So let's show a sample.
9631
15:16:18,840 --> 15:16:21,640
I'll show you how you can interact with a data loader.
9632
15:16:21,640 --> 15:16:25,040
We're going to use randomness as well.
9633
15:16:25,040 --> 15:16:31,480
So we'll set a manual seed and then we'll get a random index, random idx equals torch
9634
15:16:31,480 --> 15:16:33,560
rand int.
9635
15:16:33,560 --> 15:16:37,880
We're going to go from zero to length of train features batch.
9636
15:16:37,880 --> 15:16:39,960
Oh, where did I get that from?
9637
15:16:39,960 --> 15:16:40,960
Excuse me.
9638
15:16:40,960 --> 15:16:42,320
Getting ahead of myself here.
9639
15:16:42,320 --> 15:16:48,440
I want to check out what's inside the training data loader.
9640
15:16:48,440 --> 15:16:51,320
We'll check out what's inside the training data loader because the test data loader is
9641
15:16:51,320 --> 15:16:53,360
going to be similar.
9642
15:16:53,360 --> 15:16:55,680
So we want the train features batch.
9643
15:16:55,680 --> 15:17:01,120
So I say features as in the images themselves and the train labels batch is going to be
9644
15:17:01,120 --> 15:17:05,740
the labels of our data set or the targets in pytorch terminology.
9645
15:17:05,740 --> 15:17:08,040
So next, iter, data loader.
9646
15:17:08,040 --> 15:17:16,600
So because our data loader has 1875 batches of 32, we're going to turn it into an iterable
9647
15:17:16,600 --> 15:17:24,840
with iter and we're going to get the next batch with next and then we can go here train features
9648
15:17:24,840 --> 15:17:32,080
batch.shape and we'll get train labels batch.shape.
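A sketch of the batch inspection being narrated (variable names assumed from the notebook):

```python
# Turn the DataLoader into an iterator and pull out one batch to inspect it
train_features_batch, train_labels_batch = next(iter(train_dataloader))
print(train_features_batch.shape)  # torch.Size([32, 1, 28, 28]) -> [batch, colour_channels, height, width]
print(train_labels_batch.shape)    # torch.Size([32]) -> one label per sample in the batch
```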
9649
15:17:32,080 --> 15:17:34,520
What do you think this is going to give us?
9650
15:17:34,520 --> 15:17:35,520
Well, there we go.
9651
15:17:35,520 --> 15:17:36,520
Look at that.
9652
15:17:36,520 --> 15:17:37,520
So we have a tensor.
9653
15:17:37,520 --> 15:17:40,440
Each batch we have 32 samples.
9654
15:17:40,440 --> 15:17:45,680
So this is batch size and this is color channels and this is height and this is width.
9655
15:17:45,680 --> 15:17:49,800
And then we have 32 labels associated with the 32 samples.
9656
15:17:49,800 --> 15:17:56,040
Now where have we seen this before, if we go back through our keynote input and output
9657
15:17:56,040 --> 15:17:57,040
shapes.
9658
15:17:57,040 --> 15:18:00,080
So we have shape equals 32, 28, 28, 1.
9659
15:18:00,080 --> 15:18:06,280
So this is color channels last, but ours is currently in color channels first.
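Purely as an illustration of the channels-first versus channels-last difference mentioned here (this isn't part of the course code):

```python
import torch

# PyTorch defaults to channels-first: [batch, colour_channels, height, width].
# If a channels-last layout like the slide's [32, 28, 28, 1] were ever needed,
# permute can rearrange the dimensions:
channels_first = torch.rand(32, 1, 28, 28)   # dummy batch
channels_last = channels_first.permute(0, 2, 3, 1)
print(channels_last.shape)                   # torch.Size([32, 28, 28, 1])
```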
9660
15:18:06,280 --> 15:18:11,520
Now again, I sound like a broken record here, but these will vary depending on the problem
9661
15:18:11,520 --> 15:18:12,720
you're working with.
9662
15:18:12,720 --> 15:18:17,640
If we had larger images, what would change? Well, the height and width dimensions would change.
9663
15:18:17,640 --> 15:18:21,960
If we had color images, the color dimension would change, but the premise is still the
9664
15:18:21,960 --> 15:18:22,960
same.
9665
15:18:22,960 --> 15:18:26,880
We're turning our data into batches so that we can pass that to a model.
9666
15:18:26,880 --> 15:18:27,960
Let's come back.
9667
15:18:27,960 --> 15:18:30,240
Let's keep going with our visualization.
9668
15:18:30,240 --> 15:18:36,720
So we want to visualize one of the random samples from a batch and then we're going to
9669
15:18:36,720 --> 15:18:44,000
go image label equals train features batch and we're going to get the random IDX from
9670
15:18:44,000 --> 15:18:50,920
that and we'll get the train labels batch and we'll get the random IDX from that.
9671
15:18:50,920 --> 15:18:56,920
So we're matching up on the, we've got one batch here, train features batch, train labels
9672
15:18:56,920 --> 15:19:03,040
batch and we're just getting the image and the label at a random index within that batch.
9673
15:19:03,040 --> 15:19:07,600
So excuse me, I need to set this equal there.
9674
15:19:07,600 --> 15:19:13,080
And then we're going to go plt dot imshow, what are we going to show?
9675
15:19:13,080 --> 15:19:16,440
We're going to show the image but we're going to have to squeeze it to remove that singular
9676
15:19:16,440 --> 15:19:23,000
dimension and then we'll set the C map equal to gray and then we'll go PLT dot title, we'll
9677
15:19:23,000 --> 15:19:29,440
set the title which is going to be the class names indexed by the label integer and then
9678
15:19:29,440 --> 15:19:32,240
we can turn off the axes.
9679
15:19:32,240 --> 15:19:37,960
You can use off here or you can use false, depends on what you'd like to use.
9680
15:19:37,960 --> 15:19:43,480
Let's print out the image size because you can never know enough about your data and
9681
15:19:43,480 --> 15:19:55,920
then print, let's also get the label, label and label shape or label size.
9682
15:19:55,920 --> 15:20:02,960
Our label will be just a single integer so it might not have a shape but that's okay.
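A sketch of the visualization being described, assuming the seed value and that class_names was built earlier from the dataset:

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(42)  # assumed seed value
random_idx = torch.randint(0, len(train_features_batch), size=[1]).item()
img, label = train_features_batch[random_idx], train_labels_batch[random_idx]

plt.imshow(img.squeeze(), cmap="gray")  # squeeze removes the single colour-channel dimension
plt.title(class_names[label])           # class_names is assumed from earlier in the notebook
plt.axis(False)
print(f"Image size: {img.shape}")                    # torch.Size([1, 28, 28])
print(f"Label: {label}, label size: {label.shape}")  # a scalar tensor, so shape is torch.Size([])
```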
9683
15:20:02,960 --> 15:20:03,960
Let's have a look.
9684
15:20:03,960 --> 15:20:04,960
Oh, bag.
9685
15:20:04,960 --> 15:20:07,800
See, look, that's quite hard to understand.
9686
15:20:07,800 --> 15:20:10,200
I wouldn't be able to detect that that's a bag.
9687
15:20:10,200 --> 15:20:12,840
Could you tell me how you'd write a program to understand that?
9688
15:20:12,840 --> 15:20:15,640
That just looks like a warped rectangle to me.
9689
15:20:15,640 --> 15:20:19,600
But if we had to look at another one, we'll get another random, oh, we've got a random
9690
15:20:19,600 --> 15:20:23,480
seed so it's going to produce the same image each time.
9691
15:20:23,480 --> 15:20:25,760
So we have a shirt, okay, a shirt.
9692
15:20:25,760 --> 15:20:28,280
So we see the image size there, 1, 28, 28.
9693
15:20:28,280 --> 15:20:34,240
Now, recall that the image size is, it's a single image so it doesn't have a batch dimension.
9694
15:20:34,240 --> 15:20:37,840
So this is just color channels height width.
9695
15:20:37,840 --> 15:20:44,520
We'll go again, label four, which is a coat and we could keep doing this to become more
9696
15:20:44,520 --> 15:20:45,800
and more familiar with our data.
9697
15:20:45,800 --> 15:20:52,680
But these are all from this particular batch that we created here, coat and we'll do one
9698
15:20:52,680 --> 15:20:53,680
more, another coat.
9699
15:20:53,680 --> 15:20:55,600
We'll do one more just to make sure it's not a coat.
9700
15:20:55,600 --> 15:20:56,600
There we go.
9701
15:20:56,600 --> 15:20:57,600
We've got a bag.
9702
15:20:57,600 --> 15:20:58,600
Beautiful.
9703
15:20:58,600 --> 15:21:02,200
So we've now turned our data into data loaders.
9704
15:21:02,200 --> 15:21:07,880
So we could use these to pass them into a model, but we don't have a model.
9705
15:21:07,880 --> 15:21:12,280
So I think it's time in the next video, we start to build model zero.
9706
15:21:12,280 --> 15:21:14,440
We start to build a baseline.
9707
15:21:14,440 --> 15:21:17,840
I'll see you in the next video.
9708
15:21:17,840 --> 15:21:18,840
Welcome back.
9709
15:21:18,840 --> 15:21:24,120
So in the last video, we got our data sets or our data set into data loaders.
9710
15:21:24,120 --> 15:21:31,040
So now we have 1,875 batches of 32 images off of the training data set rather than 60,000
9711
15:21:31,040 --> 15:21:33,040
in one big data set.
9712
15:21:33,040 --> 15:21:38,960
And we have 313 batches of 32 for the test data set.
9713
15:21:38,960 --> 15:21:41,760
Then we learned how to visualize it from a batch.
9714
15:21:41,760 --> 15:21:47,280
And we saw that we have still the same image size, one color channel, 28, 28.
9715
15:21:47,280 --> 15:21:52,720
All we've done is we've turned them into batches so that we can pass them to our model.
9716
15:21:52,720 --> 15:21:55,480
And speaking of model, let's have a look at our workflow.
9717
15:21:55,480 --> 15:21:56,480
Where are we up to?
9718
15:21:56,480 --> 15:21:58,120
Well, we've got our data ready.
9719
15:21:58,120 --> 15:22:03,840
We've turned it into tensors through a combination of torch vision transforms, torch utils data
9720
15:22:03,840 --> 15:22:04,840
dot data set.
9721
15:22:04,840 --> 15:22:08,320
We didn't have to use that one because torch vision dot data sets did it for us with the
9722
15:22:08,320 --> 15:22:11,360
fashion MNIST data set, but we did use that one.
9723
15:22:11,360 --> 15:22:18,000
We did torch utils dot data, the data loader to turn our data sets into data loaders.
9724
15:22:18,000 --> 15:22:21,840
Now we're up to building or picking a pre-trained model to suit your problem.
9725
15:22:21,840 --> 15:22:23,720
So let's start simply.
9726
15:22:23,720 --> 15:22:25,720
Let's build a baseline model.
9727
15:22:25,720 --> 15:22:29,640
And this is very exciting because we're going to build our first model, our first computer
9728
15:22:29,640 --> 15:22:33,560
vision model, albeit a baseline, but that's an important step.
9729
15:22:33,560 --> 15:22:35,880
So I'm just going to write down here.
9730
15:22:35,880 --> 15:22:46,520
When starting to build a series of machine learning modeling experiments, it's best practice
9731
15:22:46,520 --> 15:22:48,880
to start with a baseline model.
9732
15:22:48,880 --> 15:22:55,440
I'm going to turn this into markdown.
9733
15:22:55,440 --> 15:22:57,200
A baseline model.
9734
15:22:57,200 --> 15:23:01,520
So a baseline model is a simple model.
9735
15:23:01,520 --> 15:23:12,080
You will try and improve upon it with subsequent models slash experiments.
9736
15:23:12,080 --> 15:23:22,080
So you start simply, in other words, start simply and add complexity when necessary because
9737
15:23:22,080 --> 15:23:24,240
neural networks are pretty powerful, right?
9738
15:23:24,240 --> 15:23:28,760
And so they have a tendency to almost do too well on our data set.
9739
15:23:28,760 --> 15:23:33,040
That's a concept known as overfitting, which we'll cover a little bit more later.
9740
15:23:33,040 --> 15:23:35,960
But we build a simple model to begin with, a baseline.
9741
15:23:35,960 --> 15:23:40,800
And then our whole goal will be to run experiments, according to the workflow, improve through
9742
15:23:40,800 --> 15:23:41,800
experimentation.
9743
15:23:41,800 --> 15:23:43,000
Again, this is just a guide.
9744
15:23:43,000 --> 15:23:47,200
It's not set in stone, but this is the general pattern of how things go.
9745
15:23:47,200 --> 15:23:51,520
Get data ready, build a model, fit the model, evaluate, improve the model.
9746
15:23:51,520 --> 15:23:54,440
So the first model that we build is generally a baseline.
9747
15:23:54,440 --> 15:23:57,640
And then later on, we want to improve through experimentation.
9748
15:23:57,640 --> 15:23:59,840
So let's start building a baseline.
9749
15:23:59,840 --> 15:24:03,320
But I'm going to introduce to you a new layer that we haven't seen before.
9750
15:24:03,320 --> 15:24:06,040
That is creating a flatten layer.
9751
15:24:06,040 --> 15:24:07,480
Now what is a flatten layer?
9752
15:24:07,480 --> 15:24:11,760
Well, this is best seen when we code it out.
9753
15:24:11,760 --> 15:24:15,800
So let's create a flatten model, which is just going to be nn.flatten.
9754
15:24:15,800 --> 15:24:18,000
And where could we find the documentation for this?
9755
15:24:18,000 --> 15:24:24,720
We go nn flatten, flatten in pytorch, what does it do?
9756
15:24:24,720 --> 15:24:30,000
Flattens a contiguous range of dims into a tensor, for use with Sequential.
9757
15:24:30,000 --> 15:24:35,200
So there's an example there, but I'd rather, if in doubt, code it out.
9758
15:24:35,200 --> 15:24:36,880
So we'll create the flatten layer.
9759
15:24:36,880 --> 15:24:42,240
And of course, nn.Flatten, or any nn.Module, could be used as a model on its own.
9760
15:24:42,240 --> 15:24:48,480
So we're going to get a single sample.
9761
15:24:48,480 --> 15:24:52,680
So x equals train features batch.
9762
15:24:52,680 --> 15:24:54,320
Let's get the first one, zero.
9763
15:24:54,320 --> 15:24:56,000
What does this look like?
9764
15:24:56,000 --> 15:25:04,240
So it's a tensor, x, maybe we get the shape of it as well, x shape.
9765
15:25:04,240 --> 15:25:05,240
What do we get?
9766
15:25:05,240 --> 15:25:06,240
There we go.
9767
15:25:06,240 --> 15:25:07,580
So that's the shape of x.
9768
15:25:07,580 --> 15:25:09,760
Keep that in mind when we pass it through the flatten layer.
9769
15:25:09,760 --> 15:25:13,600
Do you have an inkling of what flatten might do?
9770
15:25:13,600 --> 15:25:18,120
So our shape to begin with is what, 1, 28, 28.
9771
15:25:18,120 --> 15:25:22,440
Now let's flatten the sample.
9772
15:25:22,440 --> 15:25:28,120
So output equals, we're going to pass it to the flatten model, x.
9773
15:25:28,120 --> 15:25:32,200
So this is going to perform the forward pass internally on the flatten layer.
9774
15:25:32,200 --> 15:25:34,760
So perform forward pass.
9775
15:25:34,760 --> 15:25:37,480
Now let's print out what happened.
9776
15:25:37,480 --> 15:25:49,160
Print, shape before flattening equals x dot shape.
9777
15:25:49,160 --> 15:25:56,640
And we're going to print shape after flattening equals output dot shape.
9778
15:25:56,640 --> 15:26:02,200
So we're just taking the output of the flatten model and printing its shape here.
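A sketch of the flatten demo being narrated (variable names assumed from the notebook):

```python
from torch import nn

flatten_model = nn.Flatten()  # a single nn module can be used on its own like a tiny model

x = train_features_batch[0]   # one sample, shape [1, 28, 28]
output = flatten_model(x)     # forward pass through the flatten layer

print(f"Shape before flattening: {x.shape}")      # torch.Size([1, 28, 28])
print(f"Shape after flattening: {output.shape}")  # torch.Size([1, 784]) -> [colour_channels, height*width]
```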
9779
15:26:02,200 --> 15:26:06,400
Oh, do you notice what happened?
9780
15:26:06,400 --> 15:26:12,200
Well, we've gone from 1, 28, 28 to 1, 784.
9781
15:26:12,200 --> 15:26:16,200
Wow what does the output look like?
9782
15:26:16,200 --> 15:26:17,200
Output.
9783
15:26:17,200 --> 15:26:23,720
Oh, the values are now all in one big vector, and if we squeeze that we can remove the extra
9784
15:26:23,720 --> 15:26:24,720
dimension.
9785
15:26:24,720 --> 15:26:27,920
So we've got one big vector of values.
9786
15:26:27,920 --> 15:26:29,840
Now where did this number come from?
9787
15:26:29,840 --> 15:26:33,800
Well, if we take this and this is what shape is it?
9788
15:26:33,800 --> 15:26:34,960
We've got color channels.
9789
15:26:34,960 --> 15:26:35,960
We've got height.
9790
15:26:35,960 --> 15:26:46,840
We've got width, and now we've flattened it to be color channels, height times width.
9791
15:26:46,840 --> 15:26:53,240
So we've got one big feature vector because 28 by 28 equals what?
9792
15:26:53,240 --> 15:26:58,400
We've got one value per pixel, 784.
9793
15:26:58,400 --> 15:27:02,400
One value per pixel in our output vector.
9794
15:27:02,400 --> 15:27:06,200
Now where did we see this before?
9795
15:27:06,200 --> 15:27:10,760
If we go back to our keynote, if we have a look at how Tesla takes eight cameras and then
9796
15:27:10,760 --> 15:27:16,560
it turns it into a three dimensional vector space, vector space.
9797
15:27:16,560 --> 15:27:17,960
So that's what we're trying to do here.
9798
15:27:17,960 --> 15:27:22,200
We're trying to encode whatever data we're working with; it's the same in Tesla's case.
9799
15:27:22,200 --> 15:27:24,200
They have eight cameras.
9800
15:27:24,200 --> 15:27:28,200
Now theirs has more dimensions than ours because they have the time aspect because they're
9801
15:27:28,200 --> 15:27:30,400
dealing with video and they have multiple different camera angles.
9802
15:27:30,400 --> 15:27:32,360
We're just dealing with a single image here.
9803
15:27:32,360 --> 15:27:34,520
But regardless, the concept is the same.
9804
15:27:34,520 --> 15:27:39,440
We're trying to condense information down into a single vector space.
9805
15:27:39,440 --> 15:27:43,280
And so if we come back to here, why might we do this?
9806
15:27:43,280 --> 15:27:48,280
Well, it's because we're going to build a baseline model and we're going to use a linear
9807
15:27:48,280 --> 15:27:50,360
layer as the baseline model.
9808
15:27:50,360 --> 15:27:53,480
And the linear layer can't handle multi dimensional data like this.
9809
15:27:53,480 --> 15:27:56,960
We want it to have a single vector as input.
9810
15:27:56,960 --> 15:28:00,600
Now this will make a lot more sense after we've coded up our model.
9811
15:28:00,600 --> 15:28:11,520
Let's do that. From torch import nn, and then we're going to go class FashionMNIST model
9812
15:28:11,520 --> 15:28:12,520
V zero.
9813
15:28:12,520 --> 15:28:16,520
We're going to inherit from nn dot module.
9814
15:28:16,520 --> 15:28:19,960
And inside here, we're going to have an init function in the constructor.
9815
15:28:19,960 --> 15:28:22,040
We're going to pass in self.
9816
15:28:22,040 --> 15:28:26,840
We're going to have an input shape, which we'll use a type hint, which will take an integer
9817
15:28:26,840 --> 15:28:31,040
because remember, input shape is very important for machine learning models.
9818
15:28:31,040 --> 15:28:34,600
We're going to define a number of hidden units, which will also be an integer, and then we're
9819
15:28:34,600 --> 15:28:38,280
going to define our output shape, which will be what do you think our output shape will
9820
15:28:38,280 --> 15:28:39,280
be?
9821
15:28:39,280 --> 15:28:41,920
How many classes are we dealing with?
9822
15:28:41,920 --> 15:28:43,320
We're dealing with 10 different classes.
9823
15:28:43,320 --> 15:28:47,560
So our output shape will be, I'll save that for later on.
9824
15:28:47,560 --> 15:28:53,360
I'll let you guess for now, or you might already know, we're going to initialize it.
9825
15:28:53,360 --> 15:28:56,720
And then we're going to create our layer stack.
9826
15:28:56,720 --> 15:29:02,920
self.layer stack equals nn.sequential, recall that sequential, whatever you put inside sequential,
9827
15:29:02,920 --> 15:29:06,600
if data goes through sequential, it's going to go through it layer by layer.
9828
15:29:06,600 --> 15:29:10,120
So let's create our first layer, which is going to be nn.flatten.
9829
15:29:10,120 --> 15:29:15,600
So that means anything that comes into this first layer, what's going to happen to it?
9830
15:29:15,600 --> 15:29:18,720
It's going to flatten its extra dimensions here.
9831
15:29:18,720 --> 15:29:22,600
So it's going to flatten these into something like this.
9832
15:29:22,600 --> 15:29:25,800
So we're going to flatten it first, flatten our data.
9833
15:29:25,800 --> 15:29:28,640
Then we're going to pass in our linear layer.
9834
15:29:28,640 --> 15:29:33,840
And we're going to have how many in features? This is going to be input shape, because we're
9835
15:29:33,840 --> 15:29:36,360
going to define our input shape here.
9836
15:29:36,360 --> 15:29:43,160
And then we're going to go out features, equals hidden units.
9837
15:29:43,160 --> 15:29:46,040
And then we're going to create another linear layer here.
9838
15:29:46,040 --> 15:29:50,040
And we're going to set up in features equals hidden units.
9839
15:29:50,040 --> 15:29:51,560
Why are we doing this?
9840
15:29:51,560 --> 15:29:54,280
And then out features equals output shape.
9841
15:29:54,280 --> 15:29:58,840
Why are we putting the same out features here as the in features here?
9842
15:29:58,840 --> 15:30:05,600
Well, because subsequent layers, the input of this layer here, its input shape has to
9843
15:30:05,600 --> 15:30:09,000
line up with the output shape of this layer here.
9844
15:30:09,000 --> 15:30:15,560
Hence why we use out features as hidden units for the output of this nn.linear layer.
9845
15:30:15,560 --> 15:30:22,200
And then we use in features as hidden units for the input value of this hidden layer here.
9846
15:30:22,200 --> 15:30:24,160
So let's keep going.
9847
15:30:24,160 --> 15:30:25,160
Let's go def.
9848
15:30:25,160 --> 15:30:30,040
We'll create the forward pass here, because if we subclass nn.module, we have to override
9849
15:30:30,040 --> 15:30:31,640
the forward method.
9850
15:30:31,640 --> 15:30:33,760
The forward method is going to define what?
9851
15:30:33,760 --> 15:30:38,000
It's going to define the forward computation of our model.
9852
15:30:38,000 --> 15:30:43,560
So we're just going to return self.layer stack of x.
9853
15:30:43,560 --> 15:30:49,280
So our model is going to take some input, x, which could be here, x.
9854
15:30:49,280 --> 15:30:53,400
In our case, it's going to be a batch at a time, and then it's going to pass each sample
9855
15:30:53,400 --> 15:30:54,400
through the flatten layer.
9856
15:30:54,400 --> 15:30:59,080
It's going to pass the output of the flatten layer to this first linear layer, and it's
9857
15:30:59,080 --> 15:31:04,240
going to pass the output of this linear layer to this linear layer.
9858
15:31:04,240 --> 15:31:05,240
So that's it.
9859
15:31:05,240 --> 15:31:08,640
Our model is just two linear layers with a flatten layer.
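A minimal sketch of the baseline model class being described (the class and argument names follow the narration; exact comments are assumptions):

```python
from torch import nn

class FashionMNISTModelV0(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # squash [1, 28, 28] into a 784-long vector for the linear layer
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.Linear(in_features=hidden_units, out_features=output_shape)  # in matches previous out
        )

    def forward(self, x):
        return self.layer_stack(x)
```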
9860
15:31:08,640 --> 15:31:11,480
The flatten layer has no learnable parameters.
9861
15:31:11,480 --> 15:31:13,680
Only these two do.
9862
15:31:13,680 --> 15:31:16,400
And we have no nonlinearities.
9863
15:31:16,400 --> 15:31:18,920
So do you think this will work?
9864
15:31:18,920 --> 15:31:21,080
Does our data set need nonlinearities?
9865
15:31:21,080 --> 15:31:26,920
Well, we can find out once we fit our model to the data, but let's set up an instance
9866
15:31:26,920 --> 15:31:27,920
of our model.
9867
15:31:27,920 --> 15:31:33,000
So torch dot manual seed.
9868
15:31:33,000 --> 15:31:38,320
Let's go set up model with input parameters.
9869
15:31:38,320 --> 15:31:44,800
So we have model zero equals fashion MNIST model, which is just the same class that we
9870
15:31:44,800 --> 15:31:47,440
wrote above.
9871
15:31:47,440 --> 15:31:53,600
And here's where we're going to define the input shape equals 784.
9872
15:31:53,600 --> 15:31:56,920
Where will I get that from?
9873
15:31:56,920 --> 15:31:59,320
Well, that's here.
9874
15:31:59,320 --> 15:32:00,600
That's 28 by 28.
9875
15:32:00,600 --> 15:32:04,800
So the output of flatten needs to be the input shape here.
9876
15:32:04,800 --> 15:32:10,240
So we could put 28 by 28 there, or we're just going to put 784 and then write a comment
9877
15:32:10,240 --> 15:32:11,240
here.
9878
15:32:11,240 --> 15:32:14,440
This is 28 by 28.
9879
15:32:14,440 --> 15:32:22,400
Now if we go, I wonder if nn.linear will tell us, nn.linear will tell us what it expects
9880
15:32:22,400 --> 15:32:25,560
as in features.
9881
15:32:25,560 --> 15:32:32,760
Size of each input sample, shape, where star means any number of dimensions, including
9882
15:32:32,760 --> 15:32:38,720
none in features, linear weight, well, let's figure it out.
9883
15:32:38,720 --> 15:32:43,120
Let's see what happens. If in doubt, code it out, hey, we'll see what we can do.
9884
15:32:43,120 --> 15:32:47,240
Hidden units equals, let's go with 10 to begin with.
9885
15:32:47,240 --> 15:32:53,120
How many units in the hidden layer?
9886
15:32:53,120 --> 15:32:57,200
And then the output shape is going to be what?
9887
15:32:57,200 --> 15:33:06,560
Output shape is length of class names, which will be 1 for every class.
9888
15:33:06,560 --> 15:33:07,800
Beautiful.
9889
15:33:07,800 --> 15:33:09,320
And now let's go model zero.
9890
15:33:09,320 --> 15:33:11,760
We're going to keep it on the CPU to begin with.
9891
15:33:11,760 --> 15:33:16,880
We could write device-agnostic code, but to begin, we're going to send it to the CPU.
9892
15:33:16,880 --> 15:33:20,320
I might just put that up here, actually, to CPU.
9893
15:33:20,320 --> 15:33:25,240
And then let's have a look at model zero.
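A sketch of the instantiation being narrated (seed value assumed, class_names assumed from the notebook):

```python
torch.manual_seed(42)  # assumed seed value

model_0 = FashionMNISTModelV0(
    input_shape=784,                # 28 * 28, the flattened image
    hidden_units=10,                # units in the hidden layer
    output_shape=len(class_names)   # one output per class
).to("cpu")                         # keep the baseline on the CPU for now

model_0
```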
9894
15:33:25,240 --> 15:33:26,800
Wonderful.
9895
15:33:26,800 --> 15:33:29,280
So we can try to do a dummy forward pass and see what happens.
9896
15:33:29,280 --> 15:33:37,600
So let's create dummy x equals torch, rand, we'll create it as the same size of image.
9897
15:33:37,600 --> 15:33:38,600
Just a singular image.
9898
15:33:38,600 --> 15:33:43,920
So this is going to be a batch of one, color channels one, height 28, width 28.
9899
15:33:43,920 --> 15:33:50,080
And we're going to go model zero and pass through dummy x.
9900
15:33:50,080 --> 15:33:54,680
So this is going to send dummy x through the forward method.
9901
15:33:54,680 --> 15:33:56,600
Let's see what happens.
9902
15:33:56,600 --> 15:33:59,000
Okay, wonderful.
9903
15:33:59,000 --> 15:34:06,280
So we get an output of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 logits.
9904
15:34:06,280 --> 15:34:07,280
Beautiful.
9905
15:34:07,280 --> 15:34:08,280
That's exactly what we want.
9906
15:34:08,280 --> 15:34:11,600
We have one logit value per class that we have.
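A sketch of the dummy forward pass being described:

```python
# A dummy forward pass to sanity-check input/output shapes
dummy_x = torch.rand([1, 1, 28, 28])  # [batch, colour_channels, height, width]
model_0(dummy_x)                      # returns a tensor of shape [1, 10]: one logit per class
```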
9907
15:34:11,600 --> 15:34:15,760
Now what would happen if we got rid of flatten?
9908
15:34:15,760 --> 15:34:18,960
Then we ran this, ran this, ran this.
9909
15:34:18,960 --> 15:34:20,520
What do we get?
9910
15:34:20,520 --> 15:34:25,560
Oh, mat one and mat two shapes cannot be multiplied.
9911
15:34:25,560 --> 15:34:29,160
So we have 28 by 28 and 7.
9912
15:34:29,160 --> 15:34:34,800
Okay, what happens if we change our input shape to 28?
9913
15:34:34,800 --> 15:34:37,560
We're getting shape mismatches here.
9914
15:34:37,560 --> 15:34:38,560
What happens here?
9915
15:34:38,560 --> 15:34:45,080
Oh, okay, we get an interesting output, but this is still not the right shape, is it?
9916
15:34:45,080 --> 15:34:46,600
So that's where the flatten layer comes in.
9917
15:34:46,600 --> 15:34:48,600
What is the shape of this?
9918
15:34:48,600 --> 15:34:52,440
Oh, we get 1, 1, 28, 10.
9919
15:34:52,440 --> 15:34:58,560
Oh, so that's why we put in flatten so that it combines it into a vector.
9920
15:34:58,560 --> 15:35:01,560
So we get rid of this, see if we just leave it in this shape?
9921
15:35:01,560 --> 15:35:05,680
We get 28 different samples of 10, which is not what we want.
9922
15:35:05,680 --> 15:35:09,200
We want to compress our image into a singular vector and pass it in.
9923
15:35:09,200 --> 15:35:13,240
So let's re-instantiate the flatten layer and let's make sure we've got the right input
9924
15:35:13,240 --> 15:35:19,080
shape here, 28 by 28, and let's pass it through, torch size 1, 10.
9925
15:35:19,080 --> 15:35:22,960
That's exactly what we want, 1 logit per class.
9926
15:35:22,960 --> 15:35:27,320
So this could be a bit fiddly when you first start, but it's also a lot of fun once you
9927
15:35:27,320 --> 15:35:28,800
get it to work.
9928
15:35:28,800 --> 15:35:32,400
And so just keep that in mind, I showed you what it looks like when you have an error.
9929
15:35:32,400 --> 15:35:37,320
One of the biggest errors that you're going to face in machine learning is different tensor
9930
15:35:37,320 --> 15:35:39,200
shape mismatches.
9931
15:35:39,200 --> 15:35:43,960
So just keep in mind the data that you're working with and then have a look at the documentation
9932
15:35:43,960 --> 15:35:48,000
for what input shape certain layers expect.
9933
15:35:48,000 --> 15:35:51,840
So with that being said, I think it's now time that we start moving towards training
9934
15:35:51,840 --> 15:35:53,360
our model.
9935
15:35:53,360 --> 15:35:56,800
I'll see you in the next video.
9936
15:35:56,800 --> 15:35:57,800
Welcome back.
9937
15:35:57,800 --> 15:36:02,400
In the last video, we created model zero, which is going to be our baseline model for
9938
15:36:02,400 --> 15:36:08,560
our computer vision problem of detecting different types of clothing in 28 by 28 gray scale
9939
15:36:08,560 --> 15:36:09,560
images.
9940
15:36:09,560 --> 15:36:14,800
And we also learned, or we rehashed on, the concept of
9941
15:36:14,800 --> 15:36:19,560
making sure our input and output shapes line up with where they need to be.
9942
15:36:19,560 --> 15:36:22,800
We also did a dummy forward pass with some dummy data.
9943
15:36:22,800 --> 15:36:26,880
This is a great way to troubleshoot to see if your model shapes are correct.
9944
15:36:26,880 --> 15:36:32,000
If they come out correctly and if the inputs are lining up with where they need to be.
9945
15:36:32,000 --> 15:36:37,440
And just to rehash on what our model is going to be or what's inside our model, if we check
9946
15:36:37,440 --> 15:36:44,520
model zero state dict, what we see here is that our first layer has a weight tensor.
9947
15:36:44,520 --> 15:36:51,480
It also has a bias and our next layer has a weight tensor and it also has a bias.
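A short sketch of the parameter check being described; the exact key names are assumptions based on the Sequential layer stack above:

```python
# Inspect the randomly initialised parameters of the baseline model
model_0.state_dict()
# An OrderedDict with keys along the lines of 'layer_stack.1.weight',
# 'layer_stack.1.bias', 'layer_stack.2.weight', 'layer_stack.2.bias'
# (index 0 is the parameter-free Flatten layer).
```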
9948
15:36:51,480 --> 15:36:56,840
So these are of course initialized with random values, but the whole premise of deep learning
9949
15:36:56,840 --> 15:37:02,200
and machine learning is to pass data through our model and use our optimizer to update
9950
15:37:02,200 --> 15:37:07,000
these random values to better represent the features in our data.
9951
15:37:07,000 --> 15:37:10,560
And I keep saying features, but I just want to rehash on that before we move on to the
9952
15:37:10,560 --> 15:37:12,040
next thing.
9953
15:37:12,040 --> 15:37:14,400
Features in data could be almost anything.
9954
15:37:14,400 --> 15:37:17,400
So for example, the feature of this bag could be that it's got a rounded handle at the
9955
15:37:17,400 --> 15:37:18,400
top.
9956
15:37:18,400 --> 15:37:19,400
It has an edge over here.
9957
15:37:19,400 --> 15:37:21,000
It has an edge over there.
9958
15:37:21,000 --> 15:37:25,680
Now, we aren't going to tell our model what features to learn about the data.
9959
15:37:25,680 --> 15:37:29,680
The whole premise of it is to, or the whole fun, the whole magic behind machine learning
9960
15:37:29,680 --> 15:37:33,520
is that it figures out what features to learn.
9961
15:37:33,520 --> 15:37:39,680
And so that is what the weight and bias matrices or tensors will represent: different features
9962
15:37:39,680 --> 15:37:40,840
in our images.
9963
15:37:40,840 --> 15:37:45,960
And there could be many because we have 60,000 images of 10 classes.
9964
15:37:45,960 --> 15:37:46,960
So let's keep pushing forward.
9965
15:37:46,960 --> 15:37:50,960
It's now time to set up a loss function and an optimizer.
9966
15:37:50,960 --> 15:37:58,160
Speaking of optimizers, so 3.1 set up loss optimizer and evaluation metrics.
9967
15:37:58,160 --> 15:38:02,760
Now recall in notebook two, I'm going to turn this into markdown.
9968
15:38:02,760 --> 15:38:06,200
We created, oh, I don't need an emoji there.
9969
15:38:06,200 --> 15:38:10,160
So this is, by the way, we're just moving through this workflow.
9970
15:38:10,160 --> 15:38:11,680
We've got our data ready into tensors.
9971
15:38:11,680 --> 15:38:12,960
We've built a baseline model.
9972
15:38:12,960 --> 15:38:16,240
It's now time to pick a loss function and an optimizer.
9973
15:38:16,240 --> 15:38:20,240
So we go back to Google Chrome.
9974
15:38:20,240 --> 15:38:21,880
That's right here.
9975
15:38:21,880 --> 15:38:22,880
Loss function.
9976
15:38:22,880 --> 15:38:24,520
What's our loss function going to be?
9977
15:38:24,520 --> 15:38:35,280
Since we're working with multi-class data, our loss function will be nn dot cross entropy
9978
15:38:35,280 --> 15:38:37,280
loss.
9979
15:38:37,280 --> 15:38:43,840
And our optimizer, we've got a few options here with the optimizer, but we've had practice
9980
15:38:43,840 --> 15:38:49,480
in the past with SGD, which stands for stochastic gradient descent, and the Adam optimizer.
9981
15:38:49,480 --> 15:38:56,240
So our optimizer, let's just stick with SGD, which is kind of the entry level optimizer
9982
15:38:56,240 --> 15:39:05,800
torch dot optim dot SGD for stochastic gradient descent.
9983
15:39:05,800 --> 15:39:17,400
And finally, our evaluation metric, since we're working on a classification problem, let's
9984
15:39:17,400 --> 15:39:25,160
use accuracy as our evaluation metric.
9985
15:39:25,160 --> 15:39:28,680
So recall that accuracy is a classification evaluation metric.
9986
15:39:28,680 --> 15:39:30,240
Now, where can we find this?
9987
15:39:30,240 --> 15:39:37,280
Well, if we go into learnpytorch.io, this is the beauty of having online reference material.
9988
15:39:37,280 --> 15:39:42,400
In here, neural network classification with PyTorch, in this notebook, section 02, we
9989
15:39:42,400 --> 15:39:45,960
created, do we have different classification methods?
9990
15:39:45,960 --> 15:39:47,360
Yes, we did.
9991
15:39:47,360 --> 15:39:51,840
So we've got a whole bunch of different options here for classification evaluation metrics.
9992
15:39:51,840 --> 15:39:56,280
We've got accuracy, precision, recall, F1 score, a confusion matrix.
9993
15:39:56,280 --> 15:39:57,960
Now we have some code that we could use.
9994
15:39:57,960 --> 15:40:02,160
If we wanted to use torch metrics for accuracy, we could.
9995
15:40:02,160 --> 15:40:06,960
And torch metrics is a beautiful library that has a lot of evaluation metrics.
9996
15:40:06,960 --> 15:40:09,800
Oh, it doesn't exist.
9997
15:40:09,800 --> 15:40:11,440
What happened to torch metrics?
9998
15:40:11,440 --> 15:40:13,440
Maybe I need to fix that.
9999
15:40:13,440 --> 15:40:15,440
Link.
10000
15:40:15,440 --> 15:40:20,800
Torch metrics has a whole bunch of different PyTorch metrics.
10001
15:40:20,800 --> 15:40:23,400
So very useful library.
10002
15:40:23,400 --> 15:40:29,880
But we also coded a function in here, which is accuracy FN.
10003
15:40:29,880 --> 15:40:35,040
So we could copy this, straight into our notebook here.
10004
15:40:35,040 --> 15:40:40,200
Or I've also, if we go to the PyTorch deep learning GitHub, I'll just bring it over here.
10005
15:40:40,200 --> 15:40:43,440
I've also put it in helper functions.py.
10006
15:40:43,440 --> 15:40:48,140
And this is a script of common functions that we've used throughout the course, including
10007
15:40:48,140 --> 15:40:51,000
if we find accuracy function here.
10008
15:40:51,000 --> 15:40:52,000
Calculate accuracy.
10009
15:40:52,000 --> 15:40:58,280
Now, how would we get this helper functions file, this Python file, into our notebook?
10010
15:40:58,280 --> 15:41:01,560
One way is to just copy the code itself, straight here.
10011
15:41:01,560 --> 15:41:04,080
But let's import it as a Python script.
10012
15:41:04,080 --> 15:41:09,200
So import requests, and we're going to go from pathlib import Path.
10013
15:41:09,200 --> 15:41:14,880
So we want to download, and this is actually what you're going to see, very common practice
10014
15:41:14,880 --> 15:41:20,000
in larger Python projects, especially deep learning and machine learning projects, is
10015
15:41:20,000 --> 15:41:24,040
different functionality split up in different Python files.
10016
15:41:24,040 --> 15:41:27,440
And that way, you don't have to keep rewriting the same code over and over again.
10017
15:41:27,440 --> 15:41:30,520
Like you know how we've written a training and testing loop a fair few times?
10018
15:41:30,520 --> 15:41:35,640
Well, if we've written it once and it works, we might want to save that to a .py file so
10019
15:41:35,640 --> 15:41:37,560
we can import it later on.
10020
15:41:37,560 --> 15:41:42,360
So let's now write some code to import this helper functions.py file into our notebook
10021
15:41:42,360 --> 15:41:43,360
here.
10022
15:41:43,360 --> 15:41:49,720
So download helper functions from learn pytorch repo.
10023
15:41:49,720 --> 15:41:57,720
So we're going to check if our helper functions.py, if this already exists, we don't want
10024
15:41:57,720 --> 15:41:59,000
to download it.
10025
15:41:59,000 --> 15:42:07,960
So we'll print helper functions.py already exists, skipping download, skipping download
10026
15:42:07,960 --> 15:42:08,960
dot, dot, dot.
10027
15:42:08,960 --> 15:42:11,520
And we're going to go else here.
10028
15:42:11,520 --> 15:42:20,560
If it doesn't exist, so we're going to download it, downloading helper functions.py.
10029
15:42:20,560 --> 15:42:27,720
And we're going to create a request here with the requests library: request equals requests dot get.
10030
15:42:27,720 --> 15:42:31,400
Now here's where we have to pass in the URL of this file.
10031
15:42:31,400 --> 15:42:33,240
It's not this URL here.
10032
15:42:33,240 --> 15:42:38,360
When dealing with GitHub, to get the actual URL to a file, to any file, you have to
10033
15:42:38,360 --> 15:42:39,600
click the raw button.
10034
15:42:39,600 --> 15:42:43,440
So I'll just go back and show you, click raw here.
10035
15:42:43,440 --> 15:42:45,160
And we're going to copy this raw URL.
10036
15:42:45,160 --> 15:42:47,160
See how it's just text here?
10037
15:42:47,160 --> 15:42:50,920
This is what we want to download into our Colab notebook.
10038
15:42:50,920 --> 15:42:54,520
And we're going to write it in there, request equals requests dot get.
10039
15:42:54,520 --> 15:42:59,760
And we're going to go with open, and here's where we're going to save our helper functions
10040
15:42:59,760 --> 15:43:01,240
.py.
10041
15:43:01,240 --> 15:43:05,920
We're going to write binary as file, F is for file.
10042
15:43:05,920 --> 15:43:10,360
We're going to go F.write, request.content.
10043
15:43:10,360 --> 15:43:16,280
So what this is saying is Python is going to create a file called helper functions.py
10044
15:43:16,280 --> 15:43:22,160
and give it write binary permissions as F, F is for file, short for file.
10045
15:43:22,160 --> 15:43:28,240
And then we're going to say F.write, request, get that information from helper functions
10046
15:43:28,240 --> 15:43:33,600
.py here, and write your content to this file here.
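A sketch of the download code being described; the raw URL below is an assumption and should match whatever the Raw button on the course GitHub repo shows:

```python
import requests
from pathlib import Path

# Download helper_functions.py from the course GitHub repo if it isn't already present
if Path("helper_functions.py").is_file():
    print("helper_functions.py already exists, skipping download...")
else:
    print("Downloading helper_functions.py")
    request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")  # assumed raw URL
    with open("helper_functions.py", "wb") as f:  # "wb" = write binary
        f.write(request.content)

from helper_functions import accuracy_fn  # the accuracy function written in notebook 02
```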
10047
15:43:33,600 --> 15:43:39,440
So let's give that a shot.
10048
15:43:39,440 --> 15:43:41,840
Beautiful.
10049
15:43:41,840 --> 15:43:45,320
So downloading helper functions.py, let's have a look in here.
10050
15:43:45,320 --> 15:43:47,520
Do we have helper functions.py?
10051
15:43:47,520 --> 15:43:49,720
Yes, we do.
10052
15:43:49,720 --> 15:43:50,720
Wonderful.
10053
15:43:50,720 --> 15:43:53,800
We can import our accuracy function.
10054
15:43:53,800 --> 15:43:54,800
Where is it?
10055
15:43:54,800 --> 15:43:55,800
There we go.
10056
15:43:55,800 --> 15:43:57,760
Import accuracy function.
10057
15:43:57,760 --> 15:44:02,480
So it's very common practice when writing lots of Python code to put helper functions
10058
15:44:02,480 --> 15:44:04,600
into .py scripts.
10059
15:44:04,600 --> 15:44:10,760
So let's import the accuracy metric.
10060
15:44:10,760 --> 15:44:13,360
Accuracy metric from helper functions.
10061
15:44:13,360 --> 15:44:15,480
Of course, we could have used torch metrics as well.
10062
15:44:15,480 --> 15:44:19,600
That's another perfectly valid option, but I just thought I'd show you what it's like
10063
15:44:19,600 --> 15:44:23,800
to import your own helper function script.
10064
15:44:23,800 --> 15:44:27,760
Of course, you can customize helper functions.py to have whatever you want in there.
10065
15:44:27,760 --> 15:44:28,760
So see this?
10066
15:44:28,760 --> 15:44:32,640
We've got from helper functions, import accuracy function.
10067
15:44:32,640 --> 15:44:33,640
What's this saying?
10068
15:44:33,640 --> 15:44:34,640
Could not be resolved.
10069
15:44:34,640 --> 15:44:37,200
Is this going to work?
10070
15:44:37,200 --> 15:44:38,280
It did.
10071
15:44:38,280 --> 15:44:43,760
And now, if we go accuracy function, do we get a doc string?
10072
15:44:43,760 --> 15:44:45,560
Hmm.
10073
15:44:45,560 --> 15:44:47,720
Seems like colab isn't picking things up, but that's all right.
10074
15:44:47,720 --> 15:44:48,720
It looks like it still worked.
10075
15:44:48,720 --> 15:44:52,040
We'll find out later on if it actually works when we train our model.
10076
15:44:52,040 --> 15:44:58,000
So set up loss function and optimizer.
10077
15:44:58,000 --> 15:45:04,640
So I'm going to set up the loss function equals nn dot cross entropy loss.
10078
15:45:04,640 --> 15:45:10,000
And I'm going to set up the optimizer here as we discussed before as torch dot optim
10079
15:45:10,000 --> 15:45:13,120
dot SGD for stochastic gradient descent.
10080
15:45:13,120 --> 15:45:18,480
The parameters I want to optimize are the parameters from model zero, our baseline model,
10081
15:45:18,480 --> 15:45:21,440
which we had a look at before, which are all these random numbers.
10082
15:45:21,440 --> 15:45:25,360
We'd like our optimizer to tweak them in some way, shape, or form to better represent our
10083
15:45:25,360 --> 15:45:26,680
data.
10084
15:45:26,680 --> 15:45:28,680
And then I'm going to set the learning rate here.
10085
15:45:28,680 --> 15:45:30,520
How much should they be tweaked each epoch?
10086
15:45:30,520 --> 15:45:32,880
I'm going to set it to 0.1.
10087
15:45:32,880 --> 15:45:36,080
Nice and high because our data set is quite simple.
10088
15:45:36,080 --> 15:45:37,720
It's 28 by 28 images.
10089
15:45:37,720 --> 15:45:39,240
There are 60,000 of them.
10090
15:45:39,240 --> 15:45:44,320
But again, if this doesn't work, we can always adjust this and experiment, experiment,
10091
15:45:44,320 --> 15:45:45,320
experiment.
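A sketch of the loss function and optimizer setup being narrated (variable names assumed from the notebook):

```python
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()  # multi-class classification loss

optimizer = torch.optim.SGD(params=model_0.parameters(),  # the parameters to update
                            lr=0.1)                       # learning rate: how much to tweak them per step
```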
10092
15:45:45,320 --> 15:45:46,320
So let's run that.
10093
15:45:46,320 --> 15:45:47,320
We've got a loss function.
10094
15:45:47,320 --> 15:45:49,680
Is this going to give me a doc string?
10095
15:45:49,680 --> 15:45:50,880
There we go.
10096
15:45:50,880 --> 15:45:54,040
So calculates accuracy between truth and predictions.
10097
15:45:54,040 --> 15:45:55,880
Now, where does this doc string come from?
10098
15:45:55,880 --> 15:45:59,960
Well, let's have a look, helper functions.
10099
15:45:59,960 --> 15:46:02,520
That's what we wrote before.
10100
15:46:02,520 --> 15:46:06,880
Good on us for writing good doc strings, accuracy function.
10101
15:46:06,880 --> 15:46:12,200
Well, we're going to test all these out in the next video when we write a training loop.
10102
15:46:12,200 --> 15:46:18,600
So, oh, actually, I think we might do one more function before we write a training loop.
10103
15:46:18,600 --> 15:46:21,440
How about we create a function to time our experiments?
10104
15:46:21,440 --> 15:46:24,080
Yeah, let's give that a go in the next video.
10105
15:46:24,080 --> 15:46:27,200
I'll see you there.
10106
15:46:27,200 --> 15:46:28,200
Welcome back.
10107
15:46:28,200 --> 15:46:32,920
In the last video, we downloaded our helper functions.py script and imported our accuracy
10108
15:46:32,920 --> 15:46:35,920
function that we made in notebook two.
10109
15:46:35,920 --> 15:46:40,080
But we could really beef this up, our helper functions.py file.
10110
15:46:40,080 --> 15:46:43,080
We could put a lot of different helper functions in there and import them so we didn't have
10111
15:46:43,080 --> 15:46:44,080
to rewrite them.
10112
15:46:44,080 --> 15:46:46,560
That's just something to keep in mind for later on.
10113
15:46:46,560 --> 15:46:51,720
But now, let's create a function to time our experiments.
10114
15:46:51,720 --> 15:46:55,040
So creating a function to time our experiments.
10115
15:46:55,040 --> 15:47:00,520
So one of the things about machine learning is that it's very experimental.
10116
15:47:00,520 --> 15:47:03,160
You've probably gathered that so far.
10117
15:47:03,160 --> 15:47:04,640
So let's write here.
10118
15:47:04,640 --> 15:47:10,120
So machine learning is very experimental.
10119
15:47:10,120 --> 15:47:18,800
Two of the main things you'll often want to track are, one, your model's performance
10120
15:47:18,800 --> 15:47:24,800
such as its loss and accuracy values, et cetera.
10121
15:47:24,800 --> 15:47:29,200
And two, how fast it runs.
10122
15:47:29,200 --> 15:47:36,000
So usually you want a high-performance and fast model; that's the ideal scenario.
10123
15:47:36,000 --> 15:47:40,400
However, you could imagine that if you increase your model's performance, you might have
10124
15:47:40,400 --> 15:47:41,680
a bigger neural network.
10125
15:47:41,680 --> 15:47:43,280
It might have more layers.
10126
15:47:43,280 --> 15:47:45,760
It might have more hidden units.
10127
15:47:45,760 --> 15:47:49,760
It might degrade how fast it runs because you're simply making more calculations.
10128
15:47:49,760 --> 15:47:52,440
So there's often a trade-off between these two.
10129
15:47:52,440 --> 15:47:58,120
And how fast it runs will really be important if you're running a model, say, on the internet
10130
15:47:58,120 --> 15:48:02,360
or say on a dedicated GPU or say on a mobile device.
10131
15:48:02,360 --> 15:48:05,280
So these are two things to really keep in mind.
10132
15:48:05,280 --> 15:48:10,080
So because we're tracking our model's performance with our loss value and our accuracy function,
10133
15:48:10,080 --> 15:48:14,240
let's now write some code to check how fast it runs.
10134
15:48:14,240 --> 15:48:18,520
And I did this on purpose above: I kept our model on the CPU.
10135
15:48:18,520 --> 15:48:23,240
So we're also going to compare later on how fast our model runs on the CPU versus how
10136
15:48:23,240 --> 15:48:25,760
fast it runs on the GPU.
10137
15:48:25,760 --> 15:48:28,200
So that's something that's coming up.
10138
15:48:28,200 --> 15:48:30,040
Let's write a function here.
10139
15:48:30,040 --> 15:48:32,680
We're going to use the timeit module from Python.
10140
15:48:32,680 --> 15:48:37,800
So from timeit, import the default timer, which I'm going to call timer.
10141
15:48:37,800 --> 15:48:44,880
So if we go Python default timer, do we get the documentation for, here we go, timeit.
10142
15:48:44,880 --> 15:48:49,640
So do we have default timer, wonderful.
10143
15:48:49,640 --> 15:48:55,240
So the default timer, which is always time.perf_counter, you can read more about Python timing
10144
15:48:55,240 --> 15:48:57,040
functions in here.
10145
15:48:57,040 --> 15:49:01,040
But this is essentially just going to say, hey, this is the exact time that our code
10146
15:49:01,040 --> 15:49:02,040
started.
10147
15:49:02,040 --> 15:49:05,800
And then we're going to create another stop for when our code stopped.
10148
15:49:05,800 --> 15:49:07,640
And then we're going to compare the start and stop times.
10149
15:49:07,640 --> 15:49:11,360
And that's going to basically be how long our model took to train.
10150
15:49:11,360 --> 15:49:16,080
So we're going to go def print train time.
10151
15:49:16,080 --> 15:49:18,320
This is just going to be a display function.
10152
15:49:18,320 --> 15:49:24,480
So start, we're going to give it the float type hint, by the way, start and end time.
10153
15:49:24,480 --> 15:49:29,480
So the essence of this function will be to compare start and end time.
10154
15:49:29,480 --> 15:49:36,600
And we're going to set the torch or the device here, we'll pass this in as torch dot device.
10155
15:49:36,600 --> 15:49:40,480
And we're going to set that default to none, because we want to compare how fast our model
10156
15:49:40,480 --> 15:49:42,560
runs on different devices.
10157
15:49:42,560 --> 15:49:49,000
So I'm just going to write a little doc string here, prints, difference between start and
10158
15:49:49,000 --> 15:49:51,000
end time.
10159
15:49:51,000 --> 15:49:54,680
And then of course, we could add more there for the arguments, but that's a quick one liner.
10160
15:49:54,680 --> 15:49:56,380
It tells us what our function does.
10161
15:49:56,380 --> 15:49:59,560
So total time equals end minus start.
10162
15:49:59,560 --> 15:50:05,560
And then print, we're going to write here train time on, whichever device we're using
10163
15:50:05,560 --> 15:50:08,680
might be CPU, might be GPU.
10164
15:50:08,680 --> 15:50:16,960
Total time equals, we'll go to three and we'll say seconds, three decimal places that is
10165
15:50:16,960 --> 15:50:20,840
and return total time.
10166
15:50:20,840 --> 15:50:21,840
Beautiful.
10167
15:50:21,840 --> 15:50:32,880
So for example, we could do start time equals timer, and then end time equals timer.
10168
15:50:32,880 --> 15:50:37,640
And then we can put in here some code between those two.
10169
15:50:37,640 --> 15:50:44,200
And then if we go print train time, oh, maybe we need to call the timer like this, we'll find out. If
10170
15:50:44,200 --> 15:50:48,560
in doubt, code it out, you know, we'll see if it works.
10171
15:50:48,560 --> 15:50:57,400
Start equals start time, and end equals end time, and device equals.
10172
15:50:57,400 --> 15:51:04,400
We're running on the CPU right now, CPU, let's see if this works, wonderful.
10173
15:51:04,400 --> 15:51:07,640
So it's a very small number here.
10174
15:51:07,640 --> 15:51:13,880
So train time on CPU, very small number, because the start time is basically on this
10175
15:51:13,880 --> 15:51:19,760
exact line, and because it takes basically no time to run, then the end time is on here, we get
10176
15:51:19,760 --> 15:51:25,600
3.304 times 10 to the power of negative five.
10177
15:51:25,600 --> 15:51:30,120
So quite a small number, but if we put some modeling code in here, it's going to measure
10178
15:51:30,120 --> 15:51:35,360
the start time of this cell, it's going to run our modeling code in there, then we have the
10179
15:51:35,360 --> 15:51:39,240
end time, and then we find out how long our model took to train.
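For reference, here is a minimal sketch of the timing helper being described; it assumes only PyTorch and the Python standard library:

import torch
from timeit import default_timer as timer

def print_train_time(start: float, end: float, device: torch.device = None):
    """Prints difference between start and end time."""
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

# Quick check: time an (almost) empty span of code on the CPU
start_time = timer()
# ... the code you want to time would go here ...
end_time = timer()
print_train_time(start=start_time, end=end_time, device="cpu")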
10180
15:51:39,240 --> 15:51:44,160
So with that being said, I think we've got all of the pieces of the puzzle for creating
10181
15:51:44,160 --> 15:51:47,120
some training and testing functions.
10182
15:51:47,120 --> 15:51:50,200
So we've got a loss function, we've got an optimizer, we've got an evaluation metric, we've
10183
15:51:50,200 --> 15:51:55,280
got a timing function, we've got a model, we've got some data.
10184
15:51:55,280 --> 15:51:59,320
How about we train our first baseline computer vision model in the next video?
10185
15:51:59,320 --> 15:52:01,880
I'll see you there.
10186
15:52:01,880 --> 15:52:04,080
Good morning.
10187
15:52:04,080 --> 15:52:06,800
Well might not be morning wherever you are in the world.
10188
15:52:06,800 --> 15:52:10,880
It's nice and early here, I'm up recording some videos, because we have a lot of momentum
10189
15:52:10,880 --> 15:52:14,560
going with this, but look at this, I took a little break last night, I have a runtime
10190
15:52:14,560 --> 15:52:19,480
disconnected, but this is just what's going to happen if you're using Google Colab.
10191
15:52:19,480 --> 15:52:24,320
Since I use Google Colab Pro, completely unnecessary for the course, but I just found it worth
10192
15:52:24,320 --> 15:52:30,080
it for how much I use Google Colab, I get longer idle timeouts, so that means that my
10193
15:52:30,080 --> 15:52:33,440
Colab notebook will stay persistent for a longer time.
10194
15:52:33,440 --> 15:52:38,760
But of course overnight it's going to disconnect, so I click reconnect, and then if I want to
10195
15:52:38,760 --> 15:52:45,640
get back to wherever we were, because we downloaded some data from torchvision.datasets, I have
10196
15:52:45,640 --> 15:52:47,800
to rerun all of these cells.
10197
15:52:47,800 --> 15:52:53,680
So a nice shortcut, we might have seen this before, is to just come down to where we were,
10198
15:52:53,680 --> 15:52:58,560
and if all the code above works, oh there we go, I wrote myself some notes of where we're
10199
15:52:58,560 --> 15:53:01,120
up to.
10200
15:53:01,120 --> 15:53:05,400
Let's go run before, so this is just going to run all the cells above, and we're up
10201
15:53:05,400 --> 15:53:11,640
to here, 3.3 creating a training loop, and training a model on batches of data.
10202
15:53:11,640 --> 15:53:15,880
So that's going to be a little bit interesting, and I wrote myself another reminder here, this
10203
15:53:15,880 --> 15:53:20,400
is a little bit of behind the scenes: the optimizer will update a model's parameters
10204
15:53:20,400 --> 15:53:23,480
once per batch rather than once per epoch.
10205
15:53:23,480 --> 15:53:28,360
So I'll hold myself to that note, and make sure I let you know.
10206
15:53:28,360 --> 15:53:32,040
So we're going to make another title here.
10207
15:53:32,040 --> 15:53:40,480
Let's go creating a training loop, and training a model on batches of data.
10208
15:53:40,480 --> 15:53:44,440
So something a little bit different to what we may have seen before if we haven't created
10209
15:53:44,440 --> 15:53:52,960
batches of data using data loader, and recall that just up above here, we've got something
10210
15:53:52,960 --> 15:53:55,360
like 1800 there, there we go.
10211
15:53:55,360 --> 15:54:00,400
So we've split our data into batches, rather than our model looking at 60,000 images of
10212
15:54:00,400 --> 15:54:07,440
fashion MNIST data at one time, it's going to look at 1875 batches of 32, so 32 images
10213
15:54:07,440 --> 15:54:14,400
at a time, of the training data set, and 313 batches of 32 of the test data set.
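As a quick sanity check of those batch counts (assuming the standard FashionMNIST split of 60,000 training and 10,000 test images and a batch size of 32):

import math

BATCH_SIZE = 32
train_batches = 60_000 / BATCH_SIZE            # 1875.0 -> 1875 full training batches
test_batches = math.ceil(10_000 / BATCH_SIZE)  # 312.5 rounds up to 313 test batches (the last one is partial)
print(train_batches, test_batches)             # 1875.0 313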
10214
15:54:14,400 --> 15:54:19,400
So let's go to training loop and train our first model.
10215
15:54:19,400 --> 15:54:23,960
So I'm going to write out a few steps actually, because we have to do a little bit differently
10216
15:54:23,960 --> 15:54:25,480
to what we've done before.
10217
15:54:25,480 --> 15:54:31,080
So one, we want to loop through epochs, so a number of epochs.
10218
15:54:31,080 --> 15:54:35,280
Loop through training batches, and by the way, you might be able to hear some birds singing,
10219
15:54:35,280 --> 15:54:39,080
the sun is about to rise, I hope you enjoy them as much as I do.
10220
15:54:39,080 --> 15:54:45,840
So we're going to perform training steps, and we're going to calculate the
10221
15:54:45,840 --> 15:54:49,400
train loss per batch.
10222
15:54:49,400 --> 15:54:54,680
So this is going to be one of the differences between our previous training loops.
10223
15:54:54,680 --> 15:54:59,280
And this is going to, after number two, we're going to loop through the testing batches.
10224
15:54:59,280 --> 15:55:04,840
So we'll train and evaluate our model in the same step, or the same loop.
10225
15:55:04,840 --> 15:55:08,120
And we're going to perform testing steps.
10226
15:55:08,120 --> 15:55:17,080
And then we're going to calculate the test loss per batch as well, per batch.
10227
15:55:17,080 --> 15:55:23,680
Wonderful, four, we're going to, of course, print out what's happening.
10228
15:55:23,680 --> 15:55:28,360
You may have seen the unofficial PyTorch optimization loop theme song.
10229
15:55:28,360 --> 15:55:33,200
And we're going to time it all for fun, of course, because that's what our timing function
10230
15:55:33,200 --> 15:55:34,760
is for.
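Put together as an outline, those five steps look roughly like this. It's a skeleton only; epochs and the data loader names are the ones built earlier in the notebook:

for epoch in range(epochs):                            # 1. loop through epochs
    for batch, (X, y) in enumerate(train_dataloader):  # 2. loop through training batches
        ...                                            #    training steps + accumulate the train loss per batch
    for X_test, y_test in test_dataloader:             # 3. loop through testing batches
        ...                                            #    testing steps + accumulate the test loss per batch
    print(f"Epoch: {epoch}")                           # 4. print out what's happening
# 5. time the whole cell with timer() / print_train_time()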
10231
15:55:34,760 --> 15:55:36,160
So let's get started.
10232
15:55:36,160 --> 15:55:40,000
There's a fair few steps here, but nothing that we can't handle.
10233
15:55:40,000 --> 15:55:43,640
And remember the motto, if in doubt, code it out.
10234
15:55:43,640 --> 15:55:46,680
Well, there's another one, if in doubt, run the code, but we haven't written any code to
10235
15:55:46,680 --> 15:55:48,400
run just yet.
10236
15:55:48,400 --> 15:55:52,400
So we're going to import TQDM for a progress bar.
10237
15:55:52,400 --> 15:55:57,600
If you haven't seen TQDM before, it's a very good Python progress bar that you can add
10238
15:55:57,600 --> 15:55:59,960
with a few lines of code.
10239
15:55:59,960 --> 15:56:00,960
So this is just the GitHub.
10240
15:56:00,960 --> 15:56:05,200
It's open source software, one of my favorite pieces of software, and it's going to give
10241
15:56:05,200 --> 15:56:12,160
us a progress bar to let us know how many epochs our training loop has gone through.
10242
15:56:12,160 --> 15:56:15,640
It doesn't have much overhead, but if you want to learn more about it, please refer
10243
15:56:15,640 --> 15:56:17,640
to the TQDM GitHub.
10244
15:56:17,640 --> 15:56:23,680
However, the beautiful thing is that Google CoLab has TQDM built in because it's so good
10245
15:56:23,680 --> 15:56:25,320
and so popular.
10246
15:56:25,320 --> 15:56:28,800
So we're going to import from TQDM.auto.
10247
15:56:28,800 --> 15:56:34,960
So there's a few different types of TQDM progress bars. The .auto version is just going to recognize what
10248
15:56:34,960 --> 15:56:37,040
compute environment we're using.
10249
15:56:37,040 --> 15:56:41,120
And it's going to give us the best type of progress bar for what we're doing.
10250
15:56:41,120 --> 15:56:45,520
So for example, Google CoLab is running a Jupyter Notebook behind the scenes.
10251
15:56:45,520 --> 15:56:52,520
So the progress bar for Jupyter Notebooks is a little bit different to Python scripts.
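As a tiny standalone illustration of what tqdm gives you (not the course code, just an example):

from tqdm.auto import tqdm
import time

for epoch in tqdm(range(3)):  # wrap any iterable and tqdm draws a live progress bar
    time.sleep(0.5)           # stand-in for a training/testing pass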
10252
15:56:52,520 --> 15:56:58,800
So now let's set the seed and start the timer.
10253
15:56:58,800 --> 15:57:03,280
We want to write all of our training loop in this single cell here.
10254
15:57:03,280 --> 15:57:08,400
And then once it starts, once we run this cell, we want the timer to start so that we
10255
15:57:08,400 --> 15:57:12,800
can time how long the entire cell takes to run.
10256
15:57:12,800 --> 15:57:24,920
So we'll go train time start on CPU equals timer, which we set up before, beautiful.
10257
15:57:24,920 --> 15:57:28,480
Now we're going to set the number of epochs.
10258
15:57:28,480 --> 15:57:34,200
Now we're going to keep this small for faster training time so we can run more experiments.
10259
15:57:34,200 --> 15:57:39,720
So we'll keep this small for faster training time.
10260
15:57:39,720 --> 15:57:41,160
That's another little tidbit.
10261
15:57:41,160 --> 15:57:43,840
Do you notice how quickly all of the cells ran above?
10262
15:57:43,840 --> 15:57:48,600
Well, that's because we're using a relatively small data set.
10263
15:57:48,600 --> 15:57:51,960
In the beginning, when you're running experiments, you want them to run quite quickly so that
10264
15:57:51,960 --> 15:57:53,480
you can run them more often.
10265
15:57:53,480 --> 15:57:57,760
So you can learn more about your data so that you can try different things, try different
10266
15:57:57,760 --> 15:57:59,080
models.
10267
15:57:59,080 --> 15:58:02,520
So this is why we're using number of epochs equals three.
10268
15:58:02,520 --> 15:58:07,080
We start with three so that our experiment runs in 30 seconds or a minute or so.
10269
15:58:07,080 --> 15:58:11,200
That way, if something doesn't work, we haven't wasted so much time waiting for a model to
10270
15:58:11,200 --> 15:58:12,920
train.
10271
15:58:12,920 --> 15:58:16,160
Later on, we could train it for 100 epochs if we wanted to.
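In code, that setup is roughly the following; the seed value of 42 is just an example, any number works:

import torch
from timeit import default_timer as timer

torch.manual_seed(42)              # example seed for reproducibility
train_time_start_on_cpu = timer()  # start timing the whole training cell
epochs = 3                         # keep the epoch count small for fast experiments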
10272
15:58:16,160 --> 15:58:18,680
So we're going to create a training and test loop.
10273
15:58:18,680 --> 15:58:25,160
So for epoch in tqdm(range(epochs)), let's get this going.
10274
15:58:25,160 --> 15:58:31,960
So for TQDM to work, we just wrap our iterator with TQDM and you'll see later on how this
10275
15:58:31,960 --> 15:58:32,960
tracks the progress.
10276
15:58:32,960 --> 15:58:36,360
So I'm going to put out a little print statement here.
10277
15:58:36,360 --> 15:58:39,080
We'll go epoch.
10278
15:58:39,080 --> 15:58:41,720
This is just going to say what epoch we're on.
10279
15:58:41,720 --> 15:58:44,240
We'll go here.
10280
15:58:44,240 --> 15:58:48,800
That's something that I like to do quite often is put little print statements here and there
10281
15:58:48,800 --> 15:58:51,160
so that we know what's going on.
10282
15:58:51,160 --> 15:58:53,000
So let's set up the training.
10283
15:58:53,000 --> 15:58:55,320
We're going to have to instantiate the train loss.
10284
15:58:55,320 --> 15:58:57,960
We're going to set that to zero to begin with.
10285
15:58:57,960 --> 15:59:03,060
And we're going to cumulatively add some values to the train loss here and then we'll
10286
15:59:03,060 --> 15:59:09,160
see later on how this accumulates and we can calculate the training loss per batch.
10287
15:59:09,160 --> 15:59:12,720
That's what we're doing up here, calculate the train loss per batch.
10288
15:59:12,720 --> 15:59:17,320
And then finally, at the end of the loop, we will divide our training loss by the number
10289
15:59:17,320 --> 15:59:22,720
of batches so we can get the average training loss per batch and that will give us the training
10290
15:59:22,720 --> 15:59:24,960
loss per epoch.
10291
15:59:24,960 --> 15:59:25,960
Now that's a lot of talking.
10292
15:59:25,960 --> 15:59:27,760
If that doesn't make sense, remember.
10293
15:59:27,760 --> 15:59:29,880
But if and out, code it out.
10294
15:59:29,880 --> 15:59:34,680
So add a loop to loop through the training batches.
10295
15:59:34,680 --> 15:59:40,880
So because our data is batchified now, and I've got a crow or maybe a kookaburra sitting
10296
15:59:40,880 --> 15:59:47,800
on the roof across from my apartment, it's singing its song this morning, lovely.
10297
15:59:47,800 --> 15:59:51,320
So we're going to loop through our training batch data.
10298
15:59:51,320 --> 15:59:57,080
So I've got for batch, comma (X, y), because remember our training batches come in the
10299
15:59:57,080 --> 15:59:59,000
form of X.
10300
15:59:59,000 --> 16:00:02,880
So that's our data, or our images, and y, which is the label.
10301
16:00:02,880 --> 16:00:09,800
You could call these image and label, or data and target if you'd like, but it's convention
10302
16:00:09,800 --> 16:00:13,440
to often call your features X and your labels Y.
10303
16:00:13,440 --> 16:00:20,160
We've seen this before, and we're going to enumerate the train data loader as well.
10304
16:00:20,160 --> 16:00:23,760
We do this so we can keep track of the number of batches we've been through.
10305
16:00:23,760 --> 16:00:25,720
So that will give us batch there.
10306
16:00:25,720 --> 16:00:31,160
I'm going to set model zero to training mode because even though that's the default, we
10307
16:00:31,160 --> 16:00:34,240
just want to make sure that it's in training mode.
10308
16:00:34,240 --> 16:00:35,600
Now we're going to do the forward pass.
10309
16:00:35,600 --> 16:00:39,560
If you remember, what are the steps we apply in our optimization loop?
10310
16:00:39,560 --> 16:00:40,960
We do the forward pass.
10311
16:00:40,960 --> 16:00:47,720
We calculate the loss of the minus zero grad, last backwards, up to minus a step, step,
10312
16:00:47,720 --> 16:00:48,720
step.
10313
16:00:48,720 --> 16:00:49,720
So let's do that.
10314
16:00:49,720 --> 16:00:54,440
Hey, model zero, we'll put the features through there and then we're going to calculate the
10315
16:00:54,440 --> 16:00:57,200
loss.
10316
16:00:57,200 --> 16:00:58,560
We've been through these steps before.
10317
16:00:58,560 --> 16:01:03,600
So we're not going to spend too much time on the exact steps here, but we're just going
10318
16:01:03,600 --> 16:01:04,880
to practice writing them out.
10319
16:01:04,880 --> 16:01:08,320
And of course, later on, you might be thinking, Daniel, how come we haven't functionalized
10320
16:01:08,320 --> 16:01:09,720
this training loop already?
10321
16:01:09,720 --> 16:01:12,920
We've seemed to write the same generic code over and over again.
10322
16:01:12,920 --> 16:01:17,120
Well, that's because we like to practice writing PyTorch code, right?
10323
16:01:17,120 --> 16:01:18,480
We're going to functionalize them later on.
10324
16:01:18,480 --> 16:01:20,040
Don't you worry about that.
10325
16:01:20,040 --> 16:01:24,400
So here's another little step that we haven't done before is we have the training loss.
10326
16:01:24,400 --> 16:01:28,920
And so because we've set that to zero to begin with, we're going to accumulate the training
10327
16:01:28,920 --> 16:01:31,520
loss values every batch.
10328
16:01:31,520 --> 16:01:33,680
So we're going to just add it up here.
10329
16:01:33,680 --> 16:01:37,520
And then later on, we're going to divide it by the total number of batches to get the
10330
16:01:37,520 --> 16:01:39,920
average loss per batch.
10331
16:01:39,920 --> 16:01:45,600
So you see how this loss calculation is within the batch loop here?
10332
16:01:45,600 --> 16:01:49,760
So this means that one batch of data is going to go through the model.
10333
16:01:49,760 --> 16:01:53,440
And then we're going to calculate the loss on one batch of data.
10334
16:01:53,440 --> 16:01:57,960
And this loop is going to continue until it's been through all of the batches in the train
10335
16:01:57,960 --> 16:01:59,480
data loader.
10336
16:01:59,480 --> 16:02:03,160
So 1875 steps or whatever there was.
10337
16:02:03,160 --> 16:02:07,620
So accumulate train loss.
10338
16:02:07,620 --> 16:02:15,720
And then we're going to do optimizer zero grad, optimizer dot zero grad.
10339
16:02:15,720 --> 16:02:18,400
And then number four is what?
10340
16:02:18,400 --> 16:02:20,320
Loss backward.
10341
16:02:20,320 --> 16:02:21,320
Loss backward.
10342
16:02:21,320 --> 16:02:22,920
We'll do the back propagation step.
10343
16:02:22,920 --> 16:02:27,540
And then finally, we've got number five, which is optimizer step.
10344
16:02:27,540 --> 16:02:35,000
So this is where I left my little note above to remind me and to also let you know, highlight
10345
16:02:35,000 --> 16:02:40,640
that the optimizer will update a model's parameters once per batch rather than once per epoch.
10346
16:02:40,640 --> 16:02:45,160
So you see how we've got a for loop inside our epoch loop here.
10347
16:02:45,160 --> 16:02:46,960
So the batch loop.
10348
16:02:46,960 --> 16:02:50,880
So this is what I meant that the optimizer, this is one of the advantages of using mini
10349
16:02:50,880 --> 16:02:55,640
batches is not only is it more memory efficient because we're not loading 60,000 images into
10350
16:02:55,640 --> 16:02:57,480
memory at a time.
10351
16:02:57,480 --> 16:03:04,440
We are updating our model's parameters once per batch rather than waiting for it to see
10352
16:03:04,440 --> 16:03:07,040
the whole data set. So with every batch,
10353
16:03:07,040 --> 16:03:12,000
our model is hopefully getting slightly better.
10354
16:03:12,000 --> 16:03:17,560
So that is because the optimizer dot step call is within the batch loop rather than the
10355
16:03:17,560 --> 16:03:20,400
epoch loop.
10356
16:03:20,400 --> 16:03:24,040
So let's now print out what's happening.
10357
16:03:24,040 --> 16:03:26,440
Print out what's happening.
10358
16:03:26,440 --> 16:03:32,480
So if batch, let's do it every 400 or so batches because we have a lot of batches.
10359
16:03:32,480 --> 16:03:37,480
We don't want to print out too often, otherwise we'll just fill our screen with numbers.
10360
16:03:37,480 --> 16:03:41,520
That might not be a bad thing, but 400 seems a good number.
10361
16:03:41,520 --> 16:03:45,720
That'll be about five printouts if we have 2000 batches.
10362
16:03:45,720 --> 16:03:51,720
So print looked at, and of course you can adjust this to whatever you would like.
10363
16:03:51,720 --> 16:03:56,520
That's the flexibility of PyTorch, flexibility of Python as well.
10364
16:03:56,520 --> 16:03:58,720
So looked at how many samples have we looked at?
10365
16:03:58,720 --> 16:04:02,640
So we're going to take the batch number, multiply it by the length of X; the length of X is going
10366
16:04:02,640 --> 16:04:07,120
to be 32 because that is our batch size.
10367
16:04:07,120 --> 16:04:13,920
Then we're going to just write down here the total number of items that we've got in
10368
16:04:13,920 --> 16:04:19,720
our data set, and we can access that by going train data loader dot data set.
10369
16:04:19,720 --> 16:04:24,880
So that's going to give us length of the data set contained within our train data loader,
10370
16:04:24,880 --> 16:04:30,320
which is you might be able to guess 60,000 or should be.
10371
16:04:30,320 --> 16:04:37,360
Now we have to, because we've been accumulating the train loss, this is going to be quite
10372
16:04:37,360 --> 16:04:41,560
high because we've been adding every single time we've calculated the loss, we've been
10373
16:04:41,560 --> 16:04:45,240
adding it to the train loss, the overall value per batch.
10374
16:04:45,240 --> 16:04:50,320
So now let's adjust it. If we wanted to find out, see how now we've got this line, we're outside
10375
16:04:50,320 --> 16:04:51,960
of the batch loop.
10376
16:04:51,960 --> 16:04:59,320
We want to adjust our training loss to get the average training loss per batch per epoch.
10377
16:04:59,320 --> 16:05:01,640
So we're coming back to the epoch loop here.
10378
16:05:01,640 --> 16:05:06,120
A little bit confusing, but you just line up where the loops are, and this is going to
10379
16:05:06,120 --> 16:05:09,920
help you figure out what context you're computing in.
10380
16:05:09,920 --> 16:05:12,400
So now we are in the epoch loop.
10381
16:05:12,400 --> 16:05:21,720
So divide total train loss by length of train data loader, oh, this is so exciting, training
10382
16:05:21,720 --> 16:05:23,360
our biggest model yet.
10383
16:05:23,360 --> 16:05:29,440
So train loss equals or divide equals, we're going to reassign the train loss, we're going
10384
16:05:29,440 --> 16:05:32,880
to divide it by the length of the train data loader.
10385
16:05:32,880 --> 16:05:33,880
So why do we do this?
10386
16:05:33,880 --> 16:05:38,800
Well, because we've accumulated the train loss here for every batch in the train data
10387
16:05:38,800 --> 16:05:43,960
loader, but we want to average it out across how many batches there are in the train data
10388
16:05:43,960 --> 16:05:45,560
loader.
10389
16:05:45,560 --> 16:05:51,400
So this value will be quite high until we readjust it to find the average loss per epoch, because
10390
16:05:51,400 --> 16:05:53,320
we are in the epoch loop.
10391
16:05:53,320 --> 16:05:57,520
All right, there are a few steps going on, but that's all right, we'll figure this out,
10392
16:05:57,520 --> 16:06:01,120
or see what's happening, in a minute. Let's code up the testing loop.
10393
16:06:01,120 --> 16:06:03,680
So testing, what do we have to do for testing?
10394
16:06:03,680 --> 16:06:06,680
Well, let's set up a test loss variable.
10395
16:06:06,680 --> 16:06:10,880
Why don't we do accuracy for testing as well?
10396
16:06:10,880 --> 16:06:14,640
Did we do accuracy for training?
10397
16:06:14,640 --> 16:06:17,720
We didn't do accuracy for training, but that's all right, we'll stick to doing accuracy for
10398
16:06:17,720 --> 16:06:18,720
testing.
10399
16:06:18,720 --> 16:06:25,600
We'll go model zero dot eval, we'll put it in evaluation mode, and we'll turn on our
10400
16:06:25,600 --> 16:06:30,720
inference mode context manager with torch dot inference mode.
10401
16:06:30,720 --> 16:06:36,240
Now we'll do the same thing for x, y in test data loader, we don't need to keep track
10402
16:06:36,240 --> 16:06:40,320
of the batches here again in the test data loader.
10403
16:06:40,320 --> 16:06:46,360
So we'll just loop through x, so features, images, and labels in our test data loader.
10404
16:06:46,360 --> 16:06:51,840
We're going to do the forward pass, because the test loop, we don't have an optimization
10405
16:06:51,840 --> 16:06:57,240
step, we are just passing our data through the model and evaluating the patterns it learned
10406
16:06:57,240 --> 16:06:58,240
on the training data.
10407
16:06:58,240 --> 16:07:01,640
So we're going to pass in x here.
10408
16:07:01,640 --> 16:07:08,600
This might be a little bit confusing, let's do this x test, y test.
10409
16:07:08,600 --> 16:07:12,840
That way we don't get confused with our x above for the training set.
10410
16:07:12,840 --> 16:07:20,000
Now we're going to calculate the loss accumulatively, I might have said that wrong, hard to
10411
16:07:20,000 --> 16:07:21,640
sound that out.
10412
16:07:21,640 --> 16:07:22,840
What do we have here?
10413
16:07:22,840 --> 16:07:28,120
So we've got our test loss variable that we just assigned to zero above, just up here.
10414
16:07:28,120 --> 16:07:30,240
So we're going to do test loss plus equals.
10415
16:07:30,240 --> 16:07:32,800
We're doing this in one step here.
10416
16:07:32,800 --> 16:07:35,840
Test pred, y test.
10417
16:07:35,840 --> 16:07:40,560
So we're comparing our test prediction to our y test labels, our test labels.
10418
16:07:40,560 --> 16:07:44,800
Now we're going to back out of the for loop here, because that's all we have to do, the
10419
16:07:44,800 --> 16:07:47,480
forward pass and calculate the loss for the test data set.
10420
16:07:47,480 --> 16:07:50,960
Oh, I said we're going to calculate the accuracy.
10421
16:07:50,960 --> 16:07:51,960
Silly me.
10422
16:07:51,960 --> 16:07:53,800
So calculate accuracy.
10423
16:07:53,800 --> 16:07:57,440
Let's go test acc.
10424
16:07:57,440 --> 16:07:59,800
And we've got plus equals.
10425
16:07:59,800 --> 16:08:02,200
We can bring out our accuracy function here.
10426
16:08:02,200 --> 16:08:07,840
That's what we downloaded from our helper functions dot pi before, y true equals y test.
10427
16:08:07,840 --> 16:08:13,280
And then y pred equals test, pred dot arg max, dim equals one.
10428
16:08:13,280 --> 16:08:14,320
Why do we do this?
10429
16:08:14,320 --> 16:08:18,520
Well, because recall that the outputs of our model, the raw outputs of our model are going
10430
16:08:18,520 --> 16:08:25,320
to be logits and our accuracy function expects our true labels and our predictions to be
10431
16:08:25,320 --> 16:08:27,120
in the same format.
10432
16:08:27,120 --> 16:08:32,000
If our test pred is just logits, we have to call argmax to find the index of the logit with
10433
16:08:32,000 --> 16:08:35,840
the highest value, and that will be the prediction label.
10434
16:08:35,840 --> 16:08:39,840
And so then we're comparing labels to labels.
10435
16:08:39,840 --> 16:08:42,080
That's what the arg max does here.
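A quick standalone example of why taking the argmax straight on the logits works: softmax doesn't change which index is largest, so the predicted label is the same either way.

import torch

logits = torch.tensor([[1.2, 0.3, 2.5],
                       [0.1, 3.0, 0.2]])  # made-up raw outputs: 2 samples, 3 classes
probs = torch.softmax(logits, dim=1)      # prediction probabilities
print(probs.argmax(dim=1))                # tensor([2, 1])
print(logits.argmax(dim=1))               # tensor([2, 1]) -> same labels, so the softmax step can be skipped here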
10436
16:08:42,080 --> 16:08:49,640
So we can back out of the batch loop now, and we're going to now calculate
10437
16:08:49,640 --> 16:08:58,080
the test loss average per batch.
10438
16:08:58,080 --> 16:09:05,480
So let's go here, test loss, divide equals length test data loader.
10439
16:09:05,480 --> 16:09:11,960
So because we were in the context of the loop here, of the batch loop, our test loss and
10440
16:09:11,960 --> 16:09:17,200
test accuracy values are per batch and accumulated every single batch.
10441
16:09:17,200 --> 16:09:22,960
So now we're just dividing them by how many batches we had, test data loader, and the
10442
16:09:22,960 --> 16:09:30,960
same thing for the accuracy, calculate the acc, or test acc, average per batch.
10443
16:09:30,960 --> 16:09:39,440
So this is giving us test loss and test accuracy per epoch, test acc divide equals length,
10444
16:09:39,440 --> 16:09:44,400
test data loader, wonderful, we're so close to finishing this up.
10445
16:09:44,400 --> 16:09:47,880
And now we'll come back to where's our epoch loop.
10446
16:09:47,880 --> 16:09:52,960
These indentation lines are very helpful in Google Colab; we scroll down.
10447
16:09:52,960 --> 16:09:57,680
I believe if you want them, you can go settings or something like that, yeah, settings.
10448
16:09:57,680 --> 16:09:59,960
That's where you can get these lines from if you don't have them.
10449
16:09:59,960 --> 16:10:03,800
So print out what's happening.
10450
16:10:03,800 --> 16:10:12,280
We are going to print an f-string with a new line, and let's get the train loss in here.
10451
16:10:12,280 --> 16:10:16,480
Train loss, and we'll print that to four decimal places.
10452
16:10:16,480 --> 16:10:22,360
And then we'll get the test loss, of course, test loss and we'll go, we'll get that to four
10453
16:10:22,360 --> 16:10:24,640
decimal places as well.
10454
16:10:24,640 --> 16:10:35,400
And then we'll get the test acc, the test accuracy, we'll get that to four decimal places as well.
10455
16:10:35,400 --> 16:10:38,920
Point four f, wonderful.
10456
16:10:38,920 --> 16:10:44,400
And then finally, one more step, ooh, we've written a lot of code in this video.
10457
16:10:44,400 --> 16:10:48,440
We want to calculate the training time because that's another thing that we want to track.
10458
16:10:48,440 --> 16:10:51,240
We want to see how long our model is taken to train.
10459
16:10:51,240 --> 16:11:01,960
So train time end on CPU is going to equal the timer and then we're going to get the
10460
16:11:01,960 --> 16:11:06,920
total train time model zero so we can set up a variable for this so we can compare our
10461
16:11:06,920 --> 16:11:09,000
modeling experiments later on.
10462
16:11:09,000 --> 16:11:20,400
We're going to go print train time, start equals train time start on CPU, end equals
10463
16:11:20,400 --> 16:11:23,840
train time end on CPU.
10464
16:11:23,840 --> 16:11:32,080
And finally, the device is going to be string next model zero dot parameters.
10465
16:11:32,080 --> 16:11:37,520
So we're just, this is one way of checking where our model zero parameters live.
10466
16:11:37,520 --> 16:11:43,080
So beautiful, all right.
10467
16:11:43,080 --> 16:11:44,240
Have we got enough brackets there?
10468
16:11:44,240 --> 16:11:46,720
I don't think we do.
10469
16:11:46,720 --> 16:11:47,720
Okay.
10470
16:11:47,720 --> 16:11:48,720
There we go.
10471
16:11:48,720 --> 16:11:49,720
Whoo.
10472
16:11:49,720 --> 16:11:52,360
I'll just show you what the output of this is.
10473
16:11:52,360 --> 16:11:59,360
So next, model zero dot parameters, what does this give us?
10474
16:11:59,360 --> 16:12:05,040
Oh, can we go device here?
10475
16:12:05,040 --> 16:12:12,560
Oh, what do we have here?
10476
16:12:12,560 --> 16:12:14,160
Model zero dot parameters.
10477
16:12:14,160 --> 16:12:20,720
I thought this was a little trick.
10478
16:12:20,720 --> 16:12:29,080
And then if we go next, we get parameter containing.
10479
16:12:29,080 --> 16:12:34,040
I thought we could get device, oh, there we go.
10480
16:12:34,040 --> 16:12:35,520
Excuse me.
10481
16:12:35,520 --> 16:12:36,520
That's how we get it.
10482
16:12:36,520 --> 16:12:37,960
That's how we get the device that it's on.
10483
16:12:37,960 --> 16:12:41,720
So let me just turn this.
10484
16:12:41,720 --> 16:12:45,280
This is what the output of that is going to be: CPU.
10485
16:12:45,280 --> 16:12:47,040
That's what we're after.
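For reference, here's that device check on its own, with a tiny stand-in model so it runs by itself:

import torch
from torch import nn

m = nn.Linear(in_features=2, out_features=1)  # stand-in model just for illustration
print(next(m.parameters()).device)            # device(type='cpu')
print(str(next(m.parameters()).device))       # 'cpu' -> handy for labelling the timing results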
10486
16:12:47,040 --> 16:12:50,320
So troubleshooting on the fly here.
10487
16:12:50,320 --> 16:12:51,720
Hopefully all of this code works.
10488
16:12:51,720 --> 16:12:53,400
So we went through all of our steps.
10489
16:12:53,400 --> 16:12:56,360
We're looping through epochs at the top level here.
10490
16:12:56,360 --> 16:12:59,280
We looped through the training batches, performed the training steps.
10491
16:12:59,280 --> 16:13:04,600
So our training loop, forward pass, loss calculation, optimizer zero grad, loss backwards, calculate
10492
16:13:04,600 --> 16:13:06,600
the loss per batch, accumulate those.
10493
16:13:06,600 --> 16:13:11,640
We do the same for the testing batches except without the optimizer steps and print out
10494
16:13:11,640 --> 16:13:14,360
what's happening and we time it all for fun.
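Putting the whole video together, here's a sketch of the loop just described. It assumes model_0, train_dataloader, test_dataloader, loss_fn, optimizer and accuracy_fn were created earlier in the notebook, along with the timer and print_train_time helpers; the seed value is just an example.

import torch
from timeit import default_timer as timer
from tqdm.auto import tqdm

torch.manual_seed(42)                 # example seed
train_time_start_on_cpu = timer()     # start the timer for the whole cell
epochs = 3

for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")

    ### Training
    train_loss = 0
    for batch, (X, y) in enumerate(train_dataloader):
        model_0.train()
        y_pred = model_0(X)                 # 1. forward pass
        loss = loss_fn(y_pred, y)           # 2. calculate the loss (per batch)
        train_loss += loss                  #    accumulate the train loss
        optimizer.zero_grad()               # 3. optimizer zero grad
        loss.backward()                     # 4. loss backward
        optimizer.step()                    # 5. optimizer step (updates once per batch, not once per epoch)

        if batch % 400 == 0:
            print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} samples")

    train_loss /= len(train_dataloader)     # average train loss per batch for this epoch

    ### Testing
    test_loss, test_acc = 0, 0
    model_0.eval()
    with torch.inference_mode():
        for X_test, y_test in test_dataloader:
            test_pred = model_0(X_test)                       # forward pass only, no optimization step
            test_loss += loss_fn(test_pred, y_test)           # accumulate the test loss per batch
            test_acc += accuracy_fn(y_true=y_test,
                                    y_pred=test_pred.argmax(dim=1))
        test_loss /= len(test_dataloader)   # average test loss per batch
        test_acc /= len(test_dataloader)    # average test accuracy per batch

    print(f"\nTrain loss: {train_loss:.4f} | Test loss: {test_loss:.4f}, Test acc: {test_acc:.4f}")

# Time how long the whole thing took on the current device
train_time_end_on_cpu = timer()
total_train_time_model_0 = print_train_time(start=train_time_start_on_cpu,
                                            end=train_time_end_on_cpu,
                                            device=str(next(model_0.parameters()).device))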
10495
16:13:14,360 --> 16:13:18,720
A fair bit going on here, but if you don't think there's any errors, give that a go, run
10496
16:13:18,720 --> 16:13:19,720
that code.
10497
16:13:19,720 --> 16:13:24,320
I'm going to leave this one on a cliffhanger and we're going to see if this works in the
10498
16:13:24,320 --> 16:13:25,320
next video.
10499
16:13:25,320 --> 16:13:28,880
I'll see you there.
10500
16:13:28,880 --> 16:13:29,880
Welcome back.
10501
16:13:29,880 --> 16:13:32,120
The last video was pretty full on.
10502
16:13:32,120 --> 16:13:35,520
We did a fair few steps, but this is all good practice.
10503
16:13:35,520 --> 16:13:38,800
The best way to learn PyTorch code is to write more PyTorch code.
10504
16:13:38,800 --> 16:13:41,080
So did you try it out?
10505
16:13:41,080 --> 16:13:42,080
Did you run this code?
10506
16:13:42,080 --> 16:13:43,080
Did it work?
10507
16:13:43,080 --> 16:13:44,760
Did we probably have an error somewhere?
10508
16:13:44,760 --> 16:13:46,160
Well, let's find out together.
10509
16:13:46,160 --> 16:13:47,160
You ready?
10510
16:13:47,160 --> 16:13:51,600
Let's train our biggest model yet in three, two, one, boom.
10511
16:13:51,600 --> 16:13:54,000
Oh, of course we did.
10512
16:13:54,000 --> 16:13:55,200
What do we have?
10513
16:13:55,200 --> 16:13:57,040
What's going on?
10514
16:13:57,040 --> 16:13:58,520
Indentation error.
10515
16:13:58,520 --> 16:14:00,360
Ah, classic.
10516
16:14:00,360 --> 16:14:03,760
So print out what's happening.
10517
16:14:03,760 --> 16:14:06,440
Do we not have an indent there?
10518
16:14:06,440 --> 16:14:12,760
Oh, is that not in line with where it needs to be?
10519
16:14:12,760 --> 16:14:14,080
Excuse me.
10520
16:14:14,080 --> 16:14:15,080
Okay.
10521
16:14:15,080 --> 16:14:16,320
Why is this not in line?
10522
16:14:16,320 --> 16:14:19,280
So this is strange to me, enter.
10523
16:14:19,280 --> 16:14:23,520
How did this all get off by one?
10524
16:14:23,520 --> 16:14:25,600
I'm not sure, but this is just what you'll face.
10525
16:14:25,600 --> 16:14:29,640
Like sometimes you'll write this beautiful code that should work, but the main error
10526
16:14:29,640 --> 16:14:33,480
of your entire code is that it's off by a single space.
10527
16:14:33,480 --> 16:14:39,880
I'm not sure how that happened, but we're just going to pull this all into line.
10528
16:14:39,880 --> 16:14:43,320
We could have done this by selecting it all, but we're going to do it line by line just
10529
16:14:43,320 --> 16:14:50,720
to make sure that everything's in the right order, beautiful, and we print out what's
10530
16:14:50,720 --> 16:14:51,720
happening.
10531
16:14:51,720 --> 16:14:54,760
Three, two, one, round two.
10532
16:14:54,760 --> 16:14:55,760
We're going.
10533
16:14:55,760 --> 16:14:56,760
Okay.
10534
16:14:56,760 --> 16:14:57,840
So this is the progress bar I was talking about.
10535
16:14:57,840 --> 16:14:58,840
Look at that.
10536
16:14:58,840 --> 16:14:59,840
How beautiful is that?
10537
16:14:59,840 --> 16:15:01,440
Oh, we're going quite quickly through all of our samples.
10538
16:15:01,440 --> 16:15:03,080
I need to talk faster.
10539
16:15:03,080 --> 16:15:04,080
Oh, there we go.
10540
16:15:04,080 --> 16:15:05,080
We've got some good results.
10541
16:15:05,080 --> 16:15:09,080
We've got the train loss, the test loss, and the test accuracy is pretty darn
10542
16:15:09,080 --> 16:15:10,080
good.
10543
16:15:10,080 --> 16:15:11,600
Oh my goodness.
10544
16:15:11,600 --> 16:15:15,600
This is a good baseline already, 67%.
10545
16:15:15,600 --> 16:15:19,280
So this is showing us it's about seven seconds per iteration.
10546
16:15:19,280 --> 16:15:21,600
Remember TQDM is tracking how many epochs.
10547
16:15:21,600 --> 16:15:22,720
We're going through.
10548
16:15:22,720 --> 16:15:27,880
So we have three epochs and our print statement is just saying, hey, we've looked at zero
10549
16:15:27,880 --> 16:15:33,720
out of 60,000 samples and we looked at 12,000 out of 60,000 samples and we finished on
10550
16:15:33,720 --> 16:15:41,000
epoch two because it's zero indexed, and we have a train loss of 0.4550 and a test
10551
16:15:41,000 --> 16:15:49,160
loss of 0.476 and a test accuracy of 83.4265, and a training time of just over 21 seconds, or
10552
16:15:49,160 --> 16:15:50,920
just under 22.
10553
16:15:50,920 --> 16:15:55,600
So keep in mind that your numbers may not be the exact same as mine.
10554
16:15:55,600 --> 16:16:02,600
They should be in the same realm as mine, but due to inherent randomness of machine learning,
10555
16:16:02,600 --> 16:16:05,480
even if we set the manual seed, they might be slightly different.
10556
16:16:05,480 --> 16:16:10,520
So don't worry too much about that and what I mean by in the same realm, if your accuracy
10557
16:16:10,520 --> 16:16:15,880
is 25 rather than 83, well then probably something's wrong there.
10558
16:16:15,880 --> 16:16:20,120
But if it's 83.6, well then that's not too bad.
10559
16:16:20,120 --> 16:16:24,720
And the same with the train time on CPU, this will be heavily dependent, how long it takes
10560
16:16:24,720 --> 16:16:30,160
to train will be heavily dependent on the hardware that you're using behind the scenes.
10561
16:16:30,160 --> 16:16:32,360
So I'm using Google Colab Pro.
10562
16:16:32,360 --> 16:16:37,240
Now that may mean I get a faster CPU than the free version of Google Colab.
10563
16:16:37,240 --> 16:16:44,280
It also depends on what CPU is available in Google's computer warehouse where Google
10564
16:16:44,280 --> 16:16:47,320
Colab is hosted, as to how fast this will be.
10565
16:16:47,320 --> 16:16:49,080
So just keep that in mind.
10566
16:16:49,080 --> 16:16:53,960
If your time is 10 times that, then there's probably something wrong.
10567
16:16:53,960 --> 16:16:58,280
If your time is 10 times less than that, well, hey, keep using that hardware because that's
10568
16:16:58,280 --> 16:16:59,720
pretty darn good.
10569
16:16:59,720 --> 16:17:01,600
So let's keep pushing forward.
10570
16:17:01,600 --> 16:17:04,680
This will be our baseline that we try to improve upon.
10571
16:17:04,680 --> 16:17:10,800
So we have an accuracy of 83.5 and we have a train time of 20 or so seconds.
10572
16:17:10,800 --> 16:17:16,120
So we'll see what we can do with a model on the GPU later and then also later on a
10573
16:17:16,120 --> 16:17:18,480
convolutional neural network.
10574
16:17:18,480 --> 16:17:22,880
So let's evaluate our model. Where were we up to? What did we just do?
10575
16:17:22,880 --> 16:17:23,880
We built a training loop.
10576
16:17:23,880 --> 16:17:24,880
So we've done that.
10577
16:17:24,880 --> 16:17:25,880
That was a fair bit of code.
10578
16:17:25,880 --> 16:17:30,280
But now we're up to we fit the model to the data and make a prediction.
10579
16:17:30,280 --> 16:17:34,880
Let's do these two combined, hey, we'll evaluate our model.
10580
16:17:34,880 --> 16:17:36,240
So we'll come back.
10581
16:17:36,240 --> 16:17:42,320
Number four is make predictions and get model zero results.
10582
16:17:42,320 --> 16:17:47,600
Now we're going to create a function to do this because we want to build multiple models
10583
16:17:47,600 --> 16:17:54,440
and that way, if we have, say, model 0, 1, 2, 3, we can pass it to our function to evaluate
10584
16:17:54,440 --> 16:17:58,240
that model and then we can compare the results later on.
10585
16:17:58,240 --> 16:17:59,880
So that's something to keep in mind.
10586
16:17:59,880 --> 16:18:04,400
If you're going to be writing a bunch of code multiple times, you probably want to
10587
16:18:04,400 --> 16:18:09,600
functionize it, and we could definitely do that for our training and test loops.
10588
16:18:09,600 --> 16:18:11,400
But we'll see that later on.
10589
16:18:11,400 --> 16:18:14,200
So let's go def eval model.
10590
16:18:14,200 --> 16:18:19,320
So evaluate a given model, we'll pass it in a model, which will be a torch dot nn dot
10591
16:18:19,320 --> 16:18:21,720
module, that's the type hint.
10592
16:18:21,720 --> 16:18:29,560
And we'll pass it in a data loader, which will be of type torch dot utils dot data dot
10593
16:18:29,560 --> 16:18:32,480
data loader.
10594
16:18:32,480 --> 16:18:38,200
And then we'll pass in the loss function so that it can calculate the loss.
10595
16:18:38,200 --> 16:18:41,920
We could pass in an evaluation metric if we wanted to track that too.
10596
16:18:41,920 --> 16:18:44,960
So this will be torch nn dot module as well.
10597
16:18:44,960 --> 16:18:47,720
And then, oh, there we go.
10598
16:18:47,720 --> 16:18:51,560
Speaking of an evaluation function, let's pass in our accuracy function as well.
10599
16:18:51,560 --> 16:18:54,120
And I don't want L, I want that.
10600
16:18:54,120 --> 16:19:06,880
So we want to return a dictionary containing the results of model predicting on data loader.
10601
16:19:06,880 --> 16:19:07,880
So that's what we want.
10602
16:19:07,880 --> 16:19:10,800
We're going to return a dictionary of model results.
10603
16:19:10,800 --> 16:19:14,160
That way we could call this function multiple times with different models and different
10604
16:19:14,160 --> 16:19:20,160
data loaders and then compare the dictionaries full of results depending on which model we
10605
16:19:20,160 --> 16:19:21,720
passed in here.
10606
16:19:21,720 --> 16:19:27,000
So let's set up loss and accuracy equals zero, zero, we'll start those off.
10607
16:19:27,000 --> 16:19:32,320
We'll go, this is going to be much the same as our testing loop above, except it's going
10608
16:19:32,320 --> 16:19:35,760
to be functionalized and we're going to return a dictionary.
10609
16:19:35,760 --> 16:19:41,360
So we'll turn on our context manager for inferencing with torch dot inference mode.
10610
16:19:41,360 --> 16:19:46,360
Now we're going to loop through the data loader and we'll get the x and y values.
10611
16:19:46,360 --> 16:19:51,120
So the x will be our data, the y will be our ideal labels, we'll make predictions with
10612
16:19:51,120 --> 16:19:52,120
the model.
10613
16:19:52,120 --> 16:19:53,960
In other words, do the forward pass.
10614
16:19:53,960 --> 16:19:57,360
So we'll go y pred equals model on x.
10615
16:19:57,360 --> 16:20:02,560
Now we don't have to specify what model it is because we've got the model parameter up
10616
16:20:02,560 --> 16:20:03,560
here.
10617
16:20:03,560 --> 16:20:08,440
So we're starting to make our functions here or this function generalizable.
10618
16:20:08,440 --> 16:20:12,240
So it could be used with almost any model and any data loader.
10619
16:20:12,240 --> 16:20:20,160
So we want to accumulate the loss and accuracy values per batch because this is within the
10620
16:20:20,160 --> 16:20:22,960
batch loop here per batch.
10621
16:20:22,960 --> 16:20:28,840
And then we're going to go loss plus equals loss function, we'll pass it in the y pred
10622
16:20:28,840 --> 16:20:33,680
and the y the true label and we'll do the same with the accuracy.
10623
16:20:33,680 --> 16:20:41,360
So except this time we use our accuracy function, we'll send in y true equals y and y pred equals
10624
16:20:41,360 --> 16:20:47,520
y pred dot argmax because the raw outputs of our model are logits.
10625
16:20:47,520 --> 16:20:51,160
And if we want to convert them into labels, we could take the softmax for the prediction
10626
16:20:51,160 --> 16:20:56,760
probabilities, but we could also take the argmax and just by skipping the softmax step, the
10627
16:20:56,760 --> 16:21:03,680
argmax will get the index where the highest value logit is, dim equals one.
10628
16:21:03,680 --> 16:21:07,880
And then we're going to make sure that we're still within the context manager here.
10629
16:21:07,880 --> 16:21:11,800
So with torch inference mode, but outside the loop.
10630
16:21:11,800 --> 16:21:14,160
So that'll be this line here.
10631
16:21:14,160 --> 16:21:24,200
We're going to scale the loss and acc to find the average loss slash acc per batch.
10632
16:21:24,200 --> 16:21:30,400
So loss will divide and assign to the length of the data loader.
10633
16:21:30,400 --> 16:21:35,360
So that'll divide and reassign it to however many batches are in our data loader that we
10634
16:21:35,360 --> 16:21:41,200
pass into our eval model function, then we'll do the same thing for the accuracy here.
10635
16:21:41,200 --> 16:21:44,160
Length data loader, beautiful.
10636
16:21:44,160 --> 16:21:48,440
And now we're going to return a dictionary here.
10637
16:21:48,440 --> 16:21:54,760
So return, we can return the model name by inspecting the model.
10638
16:21:54,760 --> 16:21:58,760
We get an attribute of the model, which is its class name.
10639
16:21:58,760 --> 16:22:00,680
I'll show you how you can do that.
10640
16:22:00,680 --> 16:22:06,000
So this is helpful to track if you've created multiple different models and given them different
10641
16:22:06,000 --> 16:22:10,560
class names, you can access the name attribute.
10642
16:22:10,560 --> 16:22:17,160
So this only works when model was created with a class.
10643
16:22:17,160 --> 16:22:19,960
So you just have to ensure that your models have different class names.
10644
16:22:19,960 --> 16:22:24,480
If you want to do it like that, because we're going to do it like that, we can set the model
10645
16:22:24,480 --> 16:22:26,640
name to be its class name.
10646
16:22:26,640 --> 16:22:29,640
We'll get the model loss, which is just this value here.
10647
16:22:29,640 --> 16:22:34,160
After it's been scaled, we'll turn it into a single value by taking dot item.
10648
16:22:34,160 --> 16:22:39,200
And then we'll go model dot acc, or we'll get model underscore acc for the model's accuracy.
10649
16:22:39,200 --> 16:22:41,200
We'll do the same thing here.
10650
16:22:41,200 --> 16:22:42,200
Acc.
10651
16:22:42,200 --> 16:22:47,120
I don't think we need to take the item because accuracy comes back in a different form.
10652
16:22:47,120 --> 16:22:50,840
We'll find out, if in doubt, code it out.
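Here's a sketch of the eval_model function as described, assuming the accuracy_fn downloaded from helper_functions.py takes y_true and y_pred keyword arguments, as used above:

import torch

def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn):
    """Returns a dictionary containing the results of model predicting on data_loader."""
    loss, acc = 0, 0
    model.eval()                              # evaluation mode, mirroring the test loop above
    with torch.inference_mode():
        for X, y in data_loader:
            y_pred = model(X)                 # forward pass
            loss += loss_fn(y_pred, y)        # accumulate the loss per batch
            acc += accuracy_fn(y_true=y,
                               y_pred=y_pred.argmax(dim=1))  # logits -> prediction labels
        loss /= len(data_loader)              # average loss per batch
        acc /= len(data_loader)               # average accuracy per batch
    return {"model_name": model.__class__.__name__,  # only works if the model was made with a class
            "model_loss": loss.item(),
            "model_acc": acc}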
10653
16:22:50,840 --> 16:22:53,760
So calculate model zero results on test data set.
10654
16:22:53,760 --> 16:22:57,400
And I want to let you know that you can create your own functions here to do almost whatever
10655
16:22:57,400 --> 16:22:58,560
you want.
10656
16:22:58,560 --> 16:23:01,840
I've just decided that this is going to be helpful for the models and the data that
10657
16:23:01,840 --> 16:23:03,000
we're building.
10658
16:23:03,000 --> 16:23:07,600
But keep that in mind that your models, your data sets might be different and will likely
10659
16:23:07,600 --> 16:23:09,120
be different in the future.
10660
16:23:09,120 --> 16:23:14,200
So you can create these functions for whatever use case you need.
10661
16:23:14,200 --> 16:23:21,320
Model zero results equals eval model.
10662
16:23:21,320 --> 16:23:25,120
So we're just going to call our function that we've just created here.
10663
16:23:25,120 --> 16:23:27,840
Model is going to equal model zero.
10664
16:23:27,840 --> 16:23:30,840
The data loader is going to equal what?
10665
16:23:30,840 --> 16:23:35,400
The test data loader, of course, because we want to evaluate it on the test data set.
10666
16:23:35,400 --> 16:23:40,480
And we're going to send in our loss function, which is loss function that we assigned above
10667
16:23:40,480 --> 16:23:42,680
just before our training loop.
10668
16:23:42,680 --> 16:23:49,520
If we come up here, our loss function is up here, and then if we go back down, we have
10669
16:23:49,520 --> 16:23:54,600
our accuracy function is equal to our accuracy function.
10670
16:23:54,600 --> 16:23:57,920
We just pass another function in there, beautiful.
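And calling it looks something like this; the DataLoader and function names are the ones assumed earlier in the notebook, and your exact numbers will differ:

model_0_results = eval_model(model=model_0,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn)
model_0_results
# -> a dict along the lines of {"model_name": ..., "model_loss": ~0.47, "model_acc": ~83.4}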
10671
16:23:57,920 --> 16:23:59,520
And let's see if this works.
10672
16:23:59,520 --> 16:24:00,520
Model zero results.
10673
16:24:00,520 --> 16:24:04,600
Did you see any typos likely or errors in our code?
10674
16:24:04,600 --> 16:24:06,320
How do you think our model did?
10675
16:24:06,320 --> 16:24:08,840
Well, let's find out.
10676
16:24:08,840 --> 16:24:11,120
Oh, there we go.
10677
16:24:11,120 --> 16:24:12,520
We got model accuracy.
10678
16:24:12,520 --> 16:24:15,320
Can you see how we could reuse this dictionary later on?
10679
16:24:15,320 --> 16:24:20,120
So if we had model one results, model two results, we could use these dictionaries and compare
10680
16:24:20,120 --> 16:24:21,120
them all together.
10681
16:24:21,120 --> 16:24:22,280
So we've got our model name.
10682
16:24:22,280 --> 16:24:29,560
Our version zero, the model has an accuracy of 83.42 and a loss of 0.47 on the test data
10683
16:24:29,560 --> 16:24:30,560
loader.
10684
16:24:30,560 --> 16:24:32,880
Again, your numbers may be slightly different.
10685
16:24:32,880 --> 16:24:34,600
They should be in the same realm.
10686
16:24:34,600 --> 16:24:37,880
But if they're not the exact same, don't worry too much.
10687
16:24:37,880 --> 16:24:44,120
If they're 20 accuracy points less and the loss is 10 times higher, then you should probably
10688
16:24:44,120 --> 16:24:47,520
go back through your code and check if something is wrong.
10689
16:24:47,520 --> 16:24:51,360
And I believe if we wanted to do a progress bar here, could we do that?
10690
16:24:51,360 --> 16:24:52,360
TQDM.
10691
16:24:52,360 --> 16:24:55,000
Let's have a look, eh?
10692
16:24:55,000 --> 16:24:57,760
Oh, look at that progress bar.
10693
16:24:57,760 --> 16:24:58,920
That's very nice.
10694
16:24:58,920 --> 16:25:02,200
So that's nice and quick because it's only on 313 batches.
10695
16:25:02,200 --> 16:25:04,400
It goes quite quick.
10696
16:25:04,400 --> 16:25:06,600
So now, what's next?
10697
16:25:06,600 --> 16:25:11,480
Well, we've built model one... we've got a model zero, sorry, I'm getting ahead of myself.
10698
16:25:11,480 --> 16:25:12,920
We've got a baseline here.
10699
16:25:12,920 --> 16:25:15,160
We've got a way to evaluate our model.
10700
16:25:15,160 --> 16:25:16,760
What's our workflow say?
10701
16:25:16,760 --> 16:25:17,760
So we've got our data ready.
10702
16:25:17,760 --> 16:25:18,760
We've done that.
10703
16:25:18,760 --> 16:25:19,760
We've picked or built a model.
10704
16:25:19,760 --> 16:25:20,760
We've picked a loss function.
10705
16:25:20,760 --> 16:25:21,760
We've built an optimizer.
10706
16:25:21,760 --> 16:25:23,320
We've created a training loop.
10707
16:25:23,320 --> 16:25:24,800
We've fit the model to the data.
10708
16:25:24,800 --> 16:25:26,000
We've made a prediction.
10709
16:25:26,000 --> 16:25:29,600
We've evaluated the model using loss and accuracy.
10710
16:25:29,600 --> 16:25:34,600
We could evaluate it by making some predictions, but we'll save that for later on as in visualizing
10711
16:25:34,600 --> 16:25:36,360
some predictions.
10712
16:25:36,360 --> 16:25:39,720
I think we're up to improving through experimentation.
10713
16:25:39,720 --> 16:25:41,280
So let's give that a go, hey?
10714
16:25:41,280 --> 16:25:45,720
Do you recall that we trained model zero on the CPU?
10715
16:25:45,720 --> 16:25:50,040
How about we build model one and start to train it on the GPU?
10716
16:25:50,040 --> 16:25:55,880
So in the next section, let's create number five, which is set up device agnostic code.
10717
16:25:55,880 --> 16:26:01,840
So we've done this one before: for using a GPU if there is one.
10718
16:26:01,840 --> 16:26:07,080
So my challenge to you for the next video is to set up some device agnostic code.
10719
16:26:07,080 --> 16:26:11,600
So you might have to go into CoLab if you haven't got a GPU active, change runtime type
10720
16:26:11,600 --> 16:26:16,120
to GPU, and then because it might restart the runtime, you might have to rerun all of
10721
16:26:16,120 --> 16:26:20,840
the cells above so that we get our helper functions file back and the data and whatnot.
10722
16:26:20,840 --> 16:26:27,840
So set up some device agnostic code and I'll see you in the next video.
10723
16:26:27,840 --> 16:26:28,840
How'd you go?
10724
16:26:28,840 --> 16:26:31,920
You should give it a shot, did you set up some device agnostic code?
10725
16:26:31,920 --> 16:26:34,360
I hope you gave it a go, but let's do it together.
10726
16:26:34,360 --> 16:26:35,360
This won't take too long.
10727
16:26:35,360 --> 16:26:38,680
The last two videos have been quite long.
10728
16:26:38,680 --> 16:26:43,800
So if I wanted to set device agnostic code, I want to see if I have a GPU available, do
10729
16:26:43,800 --> 16:26:44,800
I?
10730
16:26:44,800 --> 16:26:47,080
I can check it with nvidia-smi.
10731
16:26:47,080 --> 16:26:50,640
That fails because I haven't activated a GPU in CoLab yet.
10732
16:26:50,640 --> 16:26:55,080
I can also check here, torch CUDA is available.
10733
16:26:55,080 --> 16:27:00,920
That way, PyTorch will check if there's a GPU available with CUDA, and there's not.
10734
16:27:00,920 --> 16:27:06,040
So let's fix these two because we want to start using a GPU and we want to set up device
10735
16:27:06,040 --> 16:27:07,040
agnostic code.
10736
16:27:07,040 --> 16:27:12,600
So no matter what hardware our system is running, PyTorch leverages it.
10737
16:27:12,600 --> 16:27:18,120
So we're going to select GPU here, I'm going to click save and you'll notice that our Google
10738
16:27:18,120 --> 16:27:22,360
CoLab notebook will start to reset and we'll start to connect.
10739
16:27:22,360 --> 16:27:23,360
There we go.
10740
16:27:23,360 --> 16:27:28,480
We've got a GPU on the back end: Python 3 Google Compute Engine back end, GPU.
10741
16:27:28,480 --> 16:27:31,800
Do we have to reset this?
10742
16:27:31,800 --> 16:27:39,120
NVIDIA SMI, wonderful, I have a Tesla T4 GPU with 16 gigabytes of memory, that is wonderful.
10743
16:27:39,120 --> 16:27:40,920
And now do we have a GPU available?
10744
16:27:40,920 --> 16:27:43,480
Oh, torch is not defined.
10745
16:27:43,480 --> 16:27:46,320
Well, do you notice the numbers of these cells?
10746
16:27:46,320 --> 16:27:52,400
One, two, that means because we've reset our runtime to have a GPU, we have to rerun
10747
16:27:52,400 --> 16:27:54,000
all the cells above.
10748
16:27:54,000 --> 16:27:58,160
So we can go run before, that's going to run all the cells above, make sure that we download
10749
16:27:58,160 --> 16:28:03,680
the data, make sure that we download the helper functions file, we go back up, we should see
10750
16:28:03,680 --> 16:28:05,640
our data may be downloading.
10751
16:28:05,640 --> 16:28:07,200
It shouldn't take too long.
10752
16:28:07,200 --> 16:28:12,280
That is another advantage of using a relatively small data set that is already saved on PyTorch
10753
16:28:12,280 --> 16:28:14,240
data sets.
10754
16:28:14,240 --> 16:28:17,600
Just keep in mind that if you use a larger data set and you have to re-download it into
10755
16:28:17,600 --> 16:28:22,800
Google Colab, it may take a while to run, and if you build bigger models, they may take
10756
16:28:22,800 --> 16:28:23,800
a while to run.
10757
16:28:23,800 --> 16:28:27,960
So just keep that in mind for your experiments going forward, start small, increase when
10758
16:28:27,960 --> 16:28:28,960
necessary.
10759
16:28:28,960 --> 16:28:34,520
So we'll re-run this, we'll re-run this, and finally we're going to, oh, there we go,
10760
16:28:34,520 --> 16:28:41,400
we've got a GPU, wonderful, but we'll write some device-agnostic code here, set up device-agnostic
10761
16:28:41,400 --> 16:28:42,720
code.
10762
16:28:42,720 --> 16:28:48,840
So import torch. Now realistically you quite often do this at the start of every notebook,
10763
16:28:48,840 --> 16:28:52,880
but I just wanted to highlight how we might do it if we're in the middle, and I wanted
10764
16:28:52,880 --> 16:28:58,880
to practice running a model on a CPU only before stepping things up and going to a GPU.
10765
16:28:58,880 --> 16:29:06,840
So device equals CUDA, this is for our device-agnostic code, if torch dot CUDA is available, and it
10766
16:29:06,840 --> 16:29:11,400
looks like this is going to return true, else use the CPU.
10767
16:29:11,400 --> 16:29:16,840
And then we're going to check device, wonderful, CUDA.
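Here's the device-agnostic setup in full, a pattern you can reuse at the top of almost any notebook:

import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
device  # 'cuda' on a GPU runtime, 'cpu' otherwise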
10768
16:29:16,840 --> 16:29:20,840
So we've got some device-agnostic code ready to go, I think it's time we built another
10769
16:29:20,840 --> 16:29:22,280
model.
10770
16:29:22,280 --> 16:29:26,000
And I asked the question before, do you think that the data set that we're working with
10771
16:29:26,000 --> 16:29:28,600
requires nonlinearity?
10772
16:29:28,600 --> 16:29:34,840
So the shirts, and the bags, and the shoes, do we need nonlinear functions to model this?
10773
16:29:34,840 --> 16:29:40,840
Well it looks like our baseline model without nonlinearities did pretty well at modeling
10774
16:29:40,840 --> 16:29:47,680
our data, so we've got a pretty good test accuracy value, so 83%, so out of 100 images
10775
16:29:47,680 --> 16:29:53,320
it predicts the right one, 83% of the time, 83 times out of 100, it did pretty well without
10776
16:29:53,320 --> 16:29:55,400
nonlinearities.
10777
16:29:55,400 --> 16:30:00,400
Why don't we try a model that uses nonlinearities and it runs on the GPU?
10778
16:30:00,400 --> 16:30:04,640
So you might want to give that a go, see if you can create a model with nonlinear functions,
10779
16:30:04,640 --> 16:30:11,200
try nn.ReLU, run it on the GPU, and see how it goes, otherwise we'll do it together in
10780
16:30:11,200 --> 16:30:15,040
the next video, I'll see you there.
10781
16:30:15,040 --> 16:30:20,240
Hello everyone, and welcome back, we are making some terrific progress, let's see how far
10782
16:30:20,240 --> 16:30:24,640
we've come, we've got a data set, we've prepared our data loaders, we've built a baseline model,
10783
16:30:24,640 --> 16:30:30,160
and we've trained it, evaluated it, now it's time, oh, and the last video we set up device
10784
16:30:30,160 --> 16:30:37,080
agnostic code, but where are we in our little framework? We're up to improving through experimentation,
10785
16:30:37,080 --> 16:30:40,720
and quite often that is building a different model and trying it out, it could be using
10786
16:30:40,720 --> 16:30:44,120
more data, it could be tweaking a whole bunch of different things.
10787
16:30:44,120 --> 16:30:49,840
So let's get into some coding, I'm going to write it here, model one, I believe we're
10788
16:30:49,840 --> 16:30:56,520
up to section six now, model one is going to be building a better model with nonlinearity,
10789
16:30:56,520 --> 16:31:00,960
so I asked you to do the challenge in the last video to give it a go, to try and build
10790
16:31:00,960 --> 16:31:05,600
a model with nonlinearity. I hope you gave it a go, because if there's anything
10791
16:31:05,600 --> 16:31:09,680
I'm trying to impart on you in this course, it's to give things a go, to try things out
10792
16:31:09,680 --> 16:31:13,960
because that's what machine learning and coding is all about, trying things out, giving it
10793
16:31:13,960 --> 16:31:22,320
a go, but let's write down here, we learned about the power of nonlinearity in notebook
10794
16:31:22,320 --> 16:31:31,160
02, so if we go to the learnpytorch.io book, we go to section number two, we'll just wait
10795
16:31:31,160 --> 16:31:36,920
for this to load, and then if we come down here, we can search for nonlinearity, the missing
10796
16:31:36,920 --> 16:31:41,920
piece nonlinearity, so I'm going to get this and just copy that in there, if you want to
10797
16:31:41,920 --> 16:31:47,040
see what nonlinearity helps us do, it helps us model nonlinear data, and in the case of
10798
16:31:47,040 --> 16:31:51,960
a circle, can we model that with straight lines, in other words, linear lines?
10799
16:31:51,960 --> 16:31:57,320
Well, linear means straight, nonlinear means non-straight, and so we learned that through
10800
16:31:57,320 --> 16:32:02,360
the power of linear and nonlinear functions, neural networks can model almost any kind
10801
16:32:02,360 --> 16:32:08,280
of data if we pair them in the right way, so you can go back through and read that there,
10802
16:32:08,280 --> 16:32:15,600
but I prefer to code things out and try it out on our data, so let's create a model with
10803
16:32:15,600 --> 16:32:24,960
nonlinear and linear layers, but we also saw that our model with just linear layers can
10804
16:32:24,960 --> 16:32:29,680
model our data, it's performing quite well, so that's where the experimentation side of
10805
16:32:29,680 --> 16:32:34,440
things will come into play, sometimes you won't know what a model will do, whether it
10806
16:32:34,440 --> 16:32:39,320
will work or won't work on your data set, but that is where we try different things
10807
16:32:39,320 --> 16:32:45,360
out, so we come up here, we look at our data, hmm, that looks actually quite linear to
10808
16:32:45,360 --> 16:32:49,400
me as a bag, like it's just some straight lines, you could maybe model that with just
10809
16:32:49,400 --> 16:32:54,680
straight lines, but there are some things which you could potentially classify as nonlinear
10810
16:32:54,680 --> 16:33:00,320
in here, it's hard to tell without knowing, so let's give it a go, let's write a nonlinear
10811
16:33:00,320 --> 16:33:06,800
model which is going to be quite similar to model zero here, except we're going to intersperse
10812
16:33:06,800 --> 16:33:13,080
some relu layers in between our linear layers, so recall that relu is a nonlinear activation
10813
16:33:13,080 --> 16:33:19,520
function, and relu has the formula, if something comes in and it's a negative value, relu is
10814
16:33:19,520 --> 16:33:23,800
going to turn that negative into a zero, and if something is positive, relu is just going
10815
16:33:23,800 --> 16:33:32,760
So let's create another class here, FashionMNISTModelV1, and we're
10816
16:33:32,760 --> 16:33:39,080
going to subclass from nn.module, beautiful, and then we're going to initialize our model,
10817
16:33:39,080 --> 16:33:45,280
it's going to be quite the same as what we created before, we want an input shape, that's
10818
16:33:45,280 --> 16:33:50,120
going to be an integer, and then we want a number of hidden units, and that's going
10819
16:33:50,120 --> 16:33:57,560
to be an int here, and then we want an output shape, int, and I want to stress as well that
10820
16:33:57,560 --> 16:34:03,840
although we're creating a class here with these inputs, classes are as flexible as functions,
10821
16:34:03,840 --> 16:34:08,000
so if you need different use cases for your modeling classes, just keep that in mind that
10822
16:34:08,000 --> 16:34:14,680
you can build that functionality in, self dot layer stack, we're going to spell layer stack
10823
16:34:14,680 --> 16:34:21,200
correctly, and we're going to set this equal to nn dot sequential, because we just want
10824
16:34:21,200 --> 16:34:26,680
a sequential set of layers, the first one's going to be nn dot flatten, which is going
10825
16:34:26,680 --> 16:34:36,480
to flatten inputs into a single vector, and then we're going to go nn.Linear,
10826
16:34:36,480 --> 16:34:39,720
because we want to flatten our stuff because we want it to be the right shape, if we don't
10827
16:34:39,720 --> 16:34:46,760
flatten it, we get shape issues, input shape, and then the out features of our linear layer
10828
16:34:46,760 --> 16:34:53,040
is going to be the hidden units, hidden units, I'm just going to make some code cells here
10829
16:34:53,040 --> 16:34:58,960
so that my code goes into the middle of the screen, then here is where we're going to
10830
16:34:58,960 --> 16:35:03,720
add a nonlinear layer, so this is where we're going to add in a relu function, and where
10831
16:35:03,720 --> 16:35:08,120
might we put these? Well, generally, you'll have a linear function followed by a nonlinear
10832
16:35:08,120 --> 16:35:13,800
function in the construction of neural networks. However, neural networks are as customizable
10833
16:35:13,800 --> 16:35:19,800
as you can imagine, whether they work or not is a different question. So we'll go output
10834
16:35:19,800 --> 16:35:25,360
shape here, as the out features, oh, did we mess this one up? Yes, we did. This needs
10835
16:35:25,360 --> 16:35:33,360
to be hidden units. And why is that? Well, it's because the output shape of this linear
10836
16:35:33,360 --> 16:35:38,000
layer here needs to match up with the input shape of this linear layer here. The relu
10837
16:35:38,000 --> 16:35:42,240
layer won't change the shape of our data. And you could test that out by printing the
10838
16:35:42,240 --> 16:35:47,680
different shapes if you'd like. And then we're going to finish off with another nonlinear
10839
16:35:47,680 --> 16:35:54,560
layer at the end. Relu. Now, do you think that this will improve our model's results
10840
16:35:54,560 --> 16:35:59,800
or not? Well, it's hard to tell without trying it out, right? So let's continue building
10841
16:35:59,800 --> 16:36:05,360
our model. We have to override the forward method, self, X. X is going to be, we'll give
10842
16:36:05,360 --> 16:36:09,360
a type hint in here, this is going to be a torch tensor as the input. And then we're just going
10843
16:36:09,360 --> 16:36:16,920
to return what's happening here, we go self dot layer stack X. So that just means that
10844
16:36:16,920 --> 16:36:20,680
X is going to pass through our layer stack here. And we could customize this, we could
10845
16:36:20,680 --> 16:36:26,720
try it just with one nonlinear activation. This is actually our previous network, just
10846
16:36:26,720 --> 16:36:31,640
with those commented out. All we've done is added in two ReLU functions.
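For reference, here's a rough sketch of the class we've just talked through, using the same constructor arguments described above (input shape, hidden units, output shape):

import torch
from torch import nn

class FashionMNISTModelV1(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # flatten the 28x28 image into a single vector of 784 values
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.ReLU(),     # nonlinear activation in between the linear layers
            nn.Linear(in_features=hidden_units, out_features=output_shape),
            nn.ReLU()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer_stack(x)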
10847
16:36:31,640 --> 16:36:38,040
And so I'm going to run that, beautiful. What should we do next? Well, hold on,
10848
16:36:38,040 --> 16:36:46,240
previously we ran our last model, model zero, on, if we go to parameters, do we run
10849
16:36:46,240 --> 16:36:54,680
this on the GPU or the CPU? On the CPU. So how about we try out our fashion MNIST model
10850
16:36:54,680 --> 16:37:00,920
V1 running on the device that we just set up, which should be CUDA. Wonderful. So
10851
16:37:00,920 --> 16:37:09,160
we can instantiate. So create an instance of model one. So we want model one or actually
10852
16:37:09,160 --> 16:37:14,480
we'll set up a manual seed here so that whenever we create a new instance of a model, it's
10853
16:37:14,480 --> 16:37:18,520
going to be instantiated with the same random numbers. We don't necessarily have to set a random
10854
16:37:18,520 --> 16:37:25,240
seed, but we do so anyway so that our values are quite similar on your end and my end. The input
10855
16:37:25,240 --> 16:37:32,200
shape is going to be 784. Where does that come from? Well, that's because this is the
10856
16:37:32,200 --> 16:37:42,800
output of the flatten layer after our 28 by 28 image goes in. Then we're going to set
10857
16:37:42,800 --> 16:37:45,720
up the hidden units. We're going to use the same number of hidden units as before, which
10858
16:37:45,720 --> 16:37:51,440
is going to be 10. And then the output shape is what? We need one value, one output neuron
10859
16:37:51,440 --> 16:37:56,200
for each of our classes. So length of the class names. And then we're going to send
10860
16:37:56,200 --> 16:38:03,320
this to the target device so we can write send to the GPU if it's available. So now
10861
16:38:03,320 --> 16:38:08,040
that we've set up device agnostic code in the last video, we can just put two device
10862
16:38:08,040 --> 16:38:16,720
instead of hard coding that. And so if we check, so this was the output for model zero's device,
10863
16:38:16,720 --> 16:38:23,080
let's now check model one's device, model one parameters, and we can check where those
10864
16:38:23,080 --> 16:38:31,960
parameters live by using the device attribute. Beautiful. So our model one is now living
10865
16:38:31,960 --> 16:38:37,320
on the GPU CUDA at index zero. Index zero means that it's on the first GPU that we have
10866
16:38:37,320 --> 16:38:44,680
available. We only have one GPU available, so it's on this Tesla T4 GPU.
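And here's a sketch of that instantiation step, assuming class_names, device and the FashionMNISTModelV1 class have already been defined earlier in the notebook:

torch.manual_seed(42)
model_1 = FashionMNISTModelV1(
    input_shape=784,               # 28 * 28 = 784, the output of nn.Flatten
    hidden_units=10,               # same number of hidden units as the baseline
    output_shape=len(class_names)  # one output neuron per class
).to(device)                       # send the model to the GPU if it's available

next(model_1.parameters()).device  # e.g. device(type='cuda', index=0)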
10867
16:38:44,680 --> 16:38:49,080
Now, we've got a couple more things to do. Now that we've created another model, if
10868
16:38:49,080 --> 16:38:53,360
we go back to our workflow, we've just built a model here. What do we have to do after
10869
16:38:53,360 --> 16:38:58,160
we built a model? We have to instantiate a loss function and an optimizer. Now we've
10870
16:38:58,160 --> 16:39:02,120
done both of those things for model zero. So that's what we're going to do in the next
10871
16:39:02,120 --> 16:39:07,040
video. But I'd like you to go ahead and try to create a loss function for our model and
10872
16:39:07,040 --> 16:39:11,920
optimizer for model one. The hint is that they can be the exact same loss function and
10873
16:39:11,920 --> 16:39:19,360
optimizer as model zero. So give that a shot and I'll see you in the next video. Welcome
10874
16:39:19,360 --> 16:39:24,600
back. In the last video, we created another model. So we're continuing with our modeling
10875
16:39:24,600 --> 16:39:29,560
experiments. And the only difference here between fashion MNIST model V1 and V0 is that
10876
16:39:29,560 --> 16:39:35,160
we've added in nonlinear layers. Now we don't know for sure, we could think or guess, whether
10877
16:39:35,160 --> 16:39:39,520
they would help improve our model. And with practice, you can start to understand how
10878
16:39:39,520 --> 16:39:44,160
different functions will influence your neural networks. But I prefer to, if in doubt, code
10879
16:39:44,160 --> 16:39:49,960
it out, run lots of different experiments. So let's continue. We now have to create
10880
16:39:49,960 --> 16:39:58,240
a loss function, loss, optimizer, and evaluation metrics. So we've done this for model zero.
10881
16:39:58,240 --> 16:40:01,920
So we're not going to spend too much time explaining what's going on here. And we've
10882
16:40:01,920 --> 16:40:06,560
done this a fair few times now. So from helper functions, which is the script we downloaded
10883
16:40:06,560 --> 16:40:11,560
before, we're going to import our accuracy function. And we're going to set up a loss
10884
16:40:11,560 --> 16:40:16,360
function, which is we're working with multi class classification. So what loss function
10885
16:40:16,360 --> 16:40:25,040
do we typically use? nn.CrossEntropyLoss. And our optimizer is going to be
10886
16:40:25,040 --> 16:40:31,280
torch.optim.SGD. And we're going to optimize, this time I'll put in the params
10887
16:40:31,280 --> 16:40:37,880
keyword here, model one dot parameters. And the learning rate, we're just going to keep
10888
16:40:37,880 --> 16:40:42,400
it the same as our previous model. And that's a thing to keep a note of for your experiments.
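For reference, a rough sketch of that cell, assuming model_1 exists and helper_functions.py (with accuracy_fn) was downloaded earlier in the notebook:

from helper_functions import accuracy_fn  # downloaded earlier

loss_fn = nn.CrossEntropyLoss()  # loss function for multi-class classification
optimizer = torch.optim.SGD(params=model_1.parameters(),
                            lr=0.1)  # learning rate kept the same as the previous model (0.1 in my case)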
10889
16:40:42,400 --> 16:40:46,560
When you're running a fair few experiments, you only really want to tweak a couple of things
10890
16:40:46,560 --> 16:40:51,120
or maybe just one thing per experiment, that way you can really narrow down what actually
10891
16:40:51,120 --> 16:40:55,920
influences your model and what improves it slash what doesn't improve it. And a little
10892
16:40:55,920 --> 16:41:05,560
pop quiz. What does a loss function do? This is going to measure how wrong our model is.
10893
16:41:05,560 --> 16:41:15,360
And what does the optimizer do? Tries to update our models parameters to reduce the
10894
16:41:15,360 --> 16:41:21,360
loss. So that's what these two functions are going to be doing. The accuracy function is
10895
16:41:21,360 --> 16:41:26,440
of course going to be measuring our model's accuracy. We measure the accuracy because that's
10896
16:41:26,440 --> 16:41:33,560
one of the base classification metrics. So we'll run this. Now what's next? We're getting
10897
16:41:33,560 --> 16:41:38,080
quite good at this. We've picked a loss function and an optimizer. Now we're going to build
10898
16:41:38,080 --> 16:41:43,960
a training loop. However, we spent quite a bit of time doing that in a previous video.
10899
16:41:43,960 --> 16:41:49,000
If we go up here, that was our eval_model function. Oh, that was helpful. We turned it
10900
16:41:49,000 --> 16:41:55,400
into a function. How about we do the same with these? Why don't we make a function for
10901
16:41:55,400 --> 16:42:03,640
our training loop as well as our testing loop? So I think you can give this a go. We're going
10902
16:42:03,640 --> 16:42:09,240
to make a function in the next video for training. We're going to call that train step. And
10903
16:42:09,240 --> 16:42:14,120
we'll create a function for testing called test step. Now they'll both have to take in
10904
16:42:14,120 --> 16:42:18,360
some parameters. I'll let you figure out what they are. But otherwise, we're going to code
10905
16:42:18,360 --> 16:42:24,040
that up together in the next video. So I'll see you there.
10906
16:42:24,040 --> 16:42:28,800
So we've got a loss function ready and an optimizer. What's our next step? Well, it's
10907
16:42:28,800 --> 16:42:32,720
to create training and evaluation loops. So let's make a heading here. We're going to
10908
16:42:32,720 --> 16:42:40,680
call this functionizing training and evaluation or slash testing loops because we've written
10909
16:42:40,680 --> 16:42:48,160
similar code quite often for training and evaluating slash testing our models. Now we're
10910
16:42:48,160 --> 16:42:52,840
going to start moving towards functionizing code that we've written before because that's
10911
16:42:52,840 --> 16:42:56,640
not only a best practice, it helps reduce errors because if you're writing a training
10912
16:42:56,640 --> 16:43:01,160
loop all the time, we may get it wrong. If we've got one that works for our particular
10913
16:43:01,160 --> 16:43:05,560
problem, hey, we might as well save that as a function so we can continually call that
10914
16:43:05,560 --> 16:43:11,240
over and over and over again. So how about we, and this is going to be very rare that
10915
16:43:11,240 --> 16:43:15,920
I'm going to allow you to do this, is that we're going to copy this training loop code, and you
10916
16:43:15,920 --> 16:43:22,400
might have already attempted to create this as a function. So let's create
10917
16:43:22,400 --> 16:43:34,640
a function for one training loop. And we're going to call this train step. And we're going
10918
16:43:34,640 --> 16:43:39,880
to create a function for the testing loop. We're going to call this test step. Now these
10919
16:43:39,880 --> 16:43:44,280
are just what I'm calling them. You can call them whatever you want. I just understand
10920
16:43:44,280 --> 16:43:50,520
it quite easily by calling it train step. And then we can for each epoch in a range,
10921
16:43:50,520 --> 16:43:55,480
we call our training step. And then the same thing for each epoch in a range, we can call
10922
16:43:55,480 --> 16:44:01,080
a testing step. This will make a lot more sense once we've coded it out. So let's put
10923
16:44:01,080 --> 16:44:07,840
the training code here. To functionize this, let's start it off with train step. Now what
10924
16:44:07,840 --> 16:44:12,240
parameters should our train step function take in? Well, let's think about this. We
10925
16:44:12,240 --> 16:44:21,840
need a model. We need a data loader. We need a loss function. And we need an optimizer.
10926
16:44:21,840 --> 16:44:28,640
We could also put in an accuracy function here if we wanted to. And potentially it's
10927
16:44:28,640 --> 16:44:33,960
not here, but we could put in what target device we'd like to compute on and make our
10928
16:44:33,960 --> 16:44:38,560
code device agnostic. So this is just the exact same code we went through before. We
10929
16:44:38,560 --> 16:44:43,480
loop through a data loader. We do the forward pass. We calculate the loss. We accumulate
10930
16:44:43,480 --> 16:44:49,960
it. We zero the optimizer. We perform backpropagation on the loss with respect to the parameters
10931
16:44:49,960 --> 16:44:54,520
of the model. And then we step the optimizer to hopefully improve the parameters of our
10932
16:44:54,520 --> 16:45:00,960
model to better predict the data that we're trying to predict. So let's craft a train
10933
16:45:00,960 --> 16:45:08,520
step function here. We'll take a model, which is going to be torch nn.module, type hint.
10934
16:45:08,520 --> 16:45:16,120
And we're going to put in a data loader, which is going to be of type torch utils dot data
10935
16:45:16,120 --> 16:45:21,000
dot data loader. Now we don't necessarily need to put this in these type hints, but
10936
16:45:21,000 --> 16:45:24,520
they're a relatively new addition to Python. And so you might start to see them more and
10937
16:45:24,520 --> 16:45:30,440
more. And it also just helps people understand what your code is expecting. So the loss
10938
16:45:30,440 --> 16:45:38,080
function, we're going to put in an optimizer, torch.optim, which is of type Optimizer.
10939
16:45:38,080 --> 16:45:42,200
We also want an accuracy function. We don't necessarily need this either. These are a
10940
16:45:42,200 --> 16:45:47,920
lot of nice-to-haves. The first four are probably the most important. And then the device. So
10941
16:45:47,920 --> 16:45:55,640
that is going to be torch.device equals device. So we'll just hard code that to be
10942
16:45:55,640 --> 16:46:04,560
our already set device parameter. And we'll just write in here, performs training step
10943
16:46:04,560 --> 16:46:15,360
with model, trying to learn on data loader. Nice and simple, we could make that more
10944
16:46:15,360 --> 16:46:20,400
explanatory if we wanted to, but we'll leave it at that for now. And so right at the start,
10945
16:46:20,400 --> 16:46:25,800
we're going to set up train loss and train acc equals zero, zero. We're going to introduce
10946
16:46:25,800 --> 16:46:30,680
accuracy here. So we can get rid of this. Let's just go through this line by line. What
10947
16:46:30,680 --> 16:46:37,400
do we need to do here? Well, we've got four batch XY in enumerate train data loader. But
10948
16:46:37,400 --> 16:46:42,640
we're going to change that to data loader up here. So we can just change this to data
10949
16:46:42,640 --> 16:46:50,480
loader. Wonderful. And now we've got model zero dot train. Do we want that? Well, no,
10950
16:46:50,480 --> 16:46:54,000
because we're going to keep this model agnostic, we want to be able to use any model with this
10951
16:46:54,000 --> 16:46:59,520
function. So let's get rid of this model zero dot train. We are missing one step here, which is to
10952
16:46:59,520 --> 16:47:10,600
put data on target device. And we could actually put this model dot train up here. Put model
10953
16:47:10,600 --> 16:47:15,680
into training mode. Now, this will be the default for the model. But just in case we're
10954
16:47:15,680 --> 16:47:21,320
going to call it anyway, model dot train, put data on the target device. So we're going
10955
16:47:21,320 --> 16:47:31,680
to go X, y equals X.to(device), y.to(device). Wonderful. And the forward pass, we
10956
16:47:31,680 --> 16:47:36,760
don't need to use model zero anymore. We're just going to use model that's up here. The
10957
16:47:36,760 --> 16:47:42,200
loss function can stay the same because we're passing in a loss function up there. The train
10958
16:47:42,200 --> 16:47:48,400
loss can be accumulated. That's fine. But we might also accumulate now the train accuracy,
10959
16:47:48,400 --> 16:47:57,960
to accumulate loss and accuracy per batch. So train acc plus equals our accuracy function
10960
16:47:57,960 --> 16:48:08,240
on Y true equals Y and Y pred equals Y pred. So the outputs here, Y pred, we need to take
10961
16:48:08,240 --> 16:48:14,960
the argmax of, because the model outputs the raw logits, and our accuracy
10962
16:48:14,960 --> 16:48:20,480
function expects our predictions to be in the same format as our true values. We need
10963
16:48:20,480 --> 16:48:24,960
to make sure that they are, so we can call the argmax here on the first dimension. This is
10964
16:48:24,960 --> 16:48:33,560
going to go from logits to prediction labels. We can keep the optimizer zero grad the same
10965
16:48:33,560 --> 16:48:37,600
because we're passing in an optimizer up here. We can keep the loss backwards because the
10966
16:48:37,600 --> 16:48:43,920
loss is just calculated there. We can keep optimizer step. And we could print out what's
10967
16:48:43,920 --> 16:48:50,520
happening. But we might change this up a little bit. We need to divide the total train loss
10968
16:48:50,520 --> 16:48:55,560
and accuracy. I just want to type in accuracy here because now we've added in accuracy metric
10969
16:48:55,560 --> 16:49:04,160
acc. So train acc divide-equals length of the train data loader. Oh, no, sorry. We can just use
10970
16:49:04,160 --> 16:49:13,240
the data loader here, data loader, data loader. And we're not going to print out per batch
10971
16:49:13,240 --> 16:49:18,200
here. I'm just going to get rid of this. We'll make at the end of this step, we will make
10972
16:49:18,200 --> 16:49:23,760
our print out here, print. Notice how it's at the end of the step because we're outside
10973
16:49:23,760 --> 16:49:30,800
the for loop now. So we're going to here, we're accumulating the loss on the training
10974
16:49:30,800 --> 16:49:35,520
data set and the accuracy on the training data set per batch. And then we're finding
10975
16:49:35,520 --> 16:49:39,640
out at the end of the training steps. So after it's been through all the batches in
10976
16:49:39,640 --> 16:49:45,000
the data loader, we're finding out what the average loss is per batch. And the average
10977
16:49:45,000 --> 16:49:53,120
accuracy is per batch. And now we're going to go train loss is going to be the train
10978
16:49:53,120 --> 16:50:07,680
loss to five decimal places. And then we're going to go train acc is going to be train acc. And we're going
10979
16:50:07,680 --> 16:50:20,160
to set that to .2f, get that there, percentage. Wonderful. So if all this works, we should
10980
16:50:20,160 --> 16:50:25,400
be able to call our train step function and pass it in a model, a data loader, a loss
10981
16:50:25,400 --> 16:50:30,760
function, an optimizer, an accuracy function and a device. And it should automatically
10982
16:50:30,760 --> 16:50:34,960
do all of these steps. So we're going to find that out in a later video. In the next video,
10983
16:50:34,960 --> 16:50:39,320
we're going to do the same thing we've just done for the training loop with the test step.
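Putting the walkthrough above together, here's a rough sketch of the train_step function we've just written, assuming accuracy_fn comes from helper_functions.py and device was set earlier:

def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    """Performs a training step with model trying to learn on data_loader."""
    train_loss, train_acc = 0, 0
    model.train()  # put model into training mode (the default, but we call it anyway)
    for batch, (X, y) in enumerate(data_loader):
        # Put data on the target device
        X, y = X.to(device), y.to(device)
        # 1. Forward pass (the model outputs raw logits)
        y_pred = model(X)
        # 2. Calculate loss and accuracy (accumulate per batch)
        loss = loss_fn(y_pred, y)
        train_loss += loss
        train_acc += accuracy_fn(y_true=y,
                                 y_pred=y_pred.argmax(dim=1))  # logits -> prediction labels
        # 3. Optimizer zero grad
        optimizer.zero_grad()
        # 4. Loss backward
        loss.backward()
        # 5. Optimizer step
        optimizer.step()
    # Divide total train loss and accuracy by the length of the data loader (average per batch)
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train acc: {train_acc:.2f}%")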
10984
16:50:39,320 --> 16:50:43,400
But here's your challenge for this video is to go up to the testing loop code we wrote
10985
16:50:43,400 --> 16:50:49,600
before and try to recreate the test step function in the same format that we've done here. So
10986
16:50:49,600 --> 16:50:56,080
give that a go. And I'll see you in the next video. Welcome back. In the last video, we
10987
16:50:56,080 --> 16:51:01,480
functionalized our training loop. So now we can call this train step function. And instead
10988
16:51:01,480 --> 16:51:06,120
of writing all this training loop code again, well, we can train our model through the art
10989
16:51:06,120 --> 16:51:11,400
of a function. Now let's do the same for our testing loop. So I issued you the challenge
10990
16:51:11,400 --> 16:51:15,680
in the last video to give it a go. I hope you did because that's the best way to practice
10991
16:51:15,680 --> 16:51:20,160
PyTorch code is to write more pytorch code. Let's put in a model, which is going to be
10992
16:51:20,160 --> 16:51:28,720
torch.nn.Module. And we're going to put in a data loader. Because we need a
10993
16:51:28,720 --> 16:51:33,360
model and we need data, the data loader is going to be, of course, the test data loader
10994
16:51:33,360 --> 16:51:38,760
here, torch dot utils dot data dot data loader. And then we're going to put in a loss function,
10995
16:51:38,760 --> 16:51:44,960
which is going to be torch.nn.Module as well, because we're going to use an nn
10996
16:51:44,960 --> 16:51:49,680
dot CrossEntropyLoss. We'll see that later on. We're going to put in an accuracy function.
10997
16:51:49,680 --> 16:51:53,440
We don't need an optimizer because we're not doing any optimization in the testing loop.
10998
16:51:53,440 --> 16:51:58,280
We're just evaluating. And the device can be torch dot device. And we're going to set
10999
16:51:58,280 --> 16:52:04,600
that as a default to the target device parameter. Beautiful. So we'll put a little docstring
11000
16:52:04,600 --> 16:52:18,520
here. So performs a testing loop step on model going over data loader. Wonderful. So now
11001
16:52:18,520 --> 16:52:23,560
let's set up a test loss and a test accuracy, because we'll measure test loss and accuracy
11002
16:52:23,560 --> 16:52:28,800
with our testing loop function. And we're going to set the model into, I'll just put a comment
11003
16:52:28,800 --> 16:52:38,960
here, put the model in eval mode. So model.eval(), we don't have to use any underscore zero
11004
16:52:38,960 --> 16:52:44,960
here as in model zero, because we have a model coming in at the top here. Now, what should we
11005
16:52:44,960 --> 16:52:50,680
do? Well, because we're performing a test step, we should turn on inference mode. So
11006
16:52:50,680 --> 16:52:57,840
turn on inference mode, inference mode context manager. Remember, whenever you're performing
11007
16:52:57,840 --> 16:53:02,920
predictions with your model, you should put it in model.eval(). And if you want as
11008
16:53:02,920 --> 16:53:07,320
many speedups as you can get, make sure the predictions are done within the inference
11009
16:53:07,320 --> 16:53:12,000
mode. Because remember, inference is another word for predictions within the inference
11010
16:53:12,000 --> 16:53:18,120
mode context manager. So we're going to loop through our data loader for X and Y in data
11011
16:53:18,120 --> 16:53:24,200
loader. We don't have to specify that this is X test or y test, we could if we wanted
11012
16:53:24,200 --> 16:53:31,220
to. But because we're in another function here, we can just go for X, Y in data loader,
11013
16:53:31,220 --> 16:53:40,520
we can do the forward pass. After we send the data to the target device, target device,
11014
16:53:40,520 --> 16:53:48,040
so we're going to have X, Y equals X dot two device. And the same thing with Y, we're
11015
16:53:48,040 --> 16:53:53,200
just doing best practice here, creating device agnostic code. Then what should we do? Well,
11016
16:53:53,200 --> 16:53:56,520
we should do the thing that I said before, which is the forward pass. Now that our data
11017
16:53:56,520 --> 16:54:02,320
and model be on the same device, we can create a variable here test pred equals model, we're
11018
16:54:02,320 --> 16:54:09,000
going to pass in X. And then what do we do? We can calculate the loss. So to calculate
11019
16:54:09,000 --> 16:54:17,800
the loss slash accuracy, we're going to accumulate it per batch. So we'll set up test loss equals
11020
16:54:17,800 --> 16:54:25,480
loss function. Oh, plus equals loss function. We're going to pass it in test pred and Y,
11021
16:54:25,480 --> 16:54:30,800
which is our truth label. And then the test acc, we will accumulate that as well, using
11022
16:54:30,800 --> 16:54:36,560
our accuracy function, we'll pass in Y true equals Y. And then Y pred, what do we have
11023
16:54:36,560 --> 16:54:43,960
to do to Y pred? Well, our test pred, we have to take the argmax to convert it, because
11024
16:54:43,960 --> 16:54:51,080
this is going to output raw logits. Remember, a model's raw output is referred to as logits.
11025
16:54:51,080 --> 16:55:01,080
And then here, we have to go from logits to prediction labels. Beautiful. Oh, little typo
11026
16:55:01,080 --> 16:55:07,840
here. Did you catch that one? Tab, tab. Beautiful. Oh, look how good this function is looking.
11027
16:55:07,840 --> 16:55:14,680
Now we're going to adjust the metrics. So adjust metrics and print out. You might notice
11028
16:55:14,680 --> 16:55:21,280
that we're outside of the batch loop here, right? So if we draw down from this line for
11029
16:55:21,280 --> 16:55:25,880
and we write some code here, we're still within the context manager. This is important because
11030
16:55:25,880 --> 16:55:33,800
if we want to adapt a value created inside the context manager, we have to modify it
11031
16:55:33,800 --> 16:55:39,880
still inside that context manager, otherwise PyTorch will throw an error. So try to write
11032
16:55:39,880 --> 16:55:46,800
this code if you want outside the context manager and see if it still works. So test loss, we're
11033
16:55:46,800 --> 16:55:54,400
going to adjust it to find out the average test loss and test accuracy per batch across
11034
16:55:54,400 --> 16:56:00,680
a whole step. So we're going to go length data loader. Now we're going to print out
11035
16:56:00,680 --> 16:56:06,840
what's happening. Print out what's happening. So test loss, which we put in here, well,
11036
16:56:06,840 --> 16:56:11,360
we're going to get the test loss. Let's get this to five decimal places. And then we're
11037
16:56:11,360 --> 16:56:18,040
going to go test acc. And we will get that to two decimal places. You could do this as
11038
16:56:18,040 --> 16:56:24,480
many decimals as you want. You could even multiply it by 100 to get it in proper accuracy format.
11039
16:56:24,480 --> 16:56:31,400
And we'll put a new line on the end here. Wonderful. So now it looks like we've got our functions.
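And a rough sketch of the matching test_step function we've just talked through, under the same assumptions (accuracy_fn from helper_functions.py, device set earlier):

def test_step(model: torch.nn.Module,
              data_loader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
    """Performs a testing loop step on model going over data_loader."""
    test_loss, test_acc = 0, 0
    model.eval()  # put the model in eval mode
    # Turn on the inference mode context manager
    with torch.inference_mode():
        for X, y in data_loader:
            # Send the data to the target device
            X, y = X.to(device), y.to(device)
            # 1. Forward pass (outputs raw logits)
            test_pred = model(X)
            # 2. Accumulate the loss and accuracy per batch
            test_loss += loss_fn(test_pred, y)
            test_acc += accuracy_fn(y_true=y,
                                    y_pred=test_pred.argmax(dim=1))  # logits -> labels
        # Adjust metrics: average loss/accuracy per batch (still inside the context manager)
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
        print(f"Test loss: {test_loss:.5f} | Test acc: {test_acc:.2f}%\n")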
11040
16:56:31,400 --> 16:56:37,840
I haven't run this cell yet, but we have one for a training step and one for a test step. So how do you think we
11041
16:56:37,840 --> 16:56:42,440
could replicate if we go back up to our training loop that we wrote before? How do you think
11042
16:56:42,440 --> 16:56:51,160
we could replicate the functionality of this, except this time using our functions? Well,
11043
16:56:51,160 --> 16:56:56,280
we could still use this for epoch in tqdm(range(epochs)). But then we would just call
11044
16:56:56,280 --> 16:57:01,840
our training step for this training code, our training step function. And we would call
11045
16:57:01,840 --> 16:57:07,800
our testing step function, passing in the appropriate parameters for our testing loop.
11046
16:57:07,800 --> 16:57:12,320
So that's what we'll do in the next video. We will leverage our two functions, train
11047
16:57:12,320 --> 16:57:18,080
step and test step to train model one. But here's your challenge for this video. Give
11048
16:57:18,080 --> 16:57:24,360
that a go. So use our training step and test step function to train model one for three
11049
16:57:24,360 --> 16:57:31,680
epochs and see how you go. But we'll do it together in the next video. Welcome back.
11050
16:57:31,680 --> 16:57:37,200
How'd you go? Did you create a training loop or a PyTorch optimization loop using our training
11051
16:57:37,200 --> 16:57:43,000
step function and a test step function? Were there any errors? In fact, I don't even know.
11052
16:57:43,000 --> 16:57:46,760
But how about we find out together? Hey, how do we combine these two functions to create
11053
16:57:46,760 --> 16:57:54,400
an optimization loop? So I'm going to go torch dot manual seed 42. And I'm going to measure
11054
16:57:54,400 --> 16:57:58,720
the time of how long our training and test loop takes. This time we're using a different
11055
16:57:58,720 --> 16:58:03,200
model. So this model uses nonlinearities and it's on the GPU. So that's the main thing
11056
16:58:03,200 --> 16:58:08,200
we want to compare is how long our model took on CPU versus GPU. So I'm going to import
11057
16:58:08,200 --> 16:58:16,640
from timeit import default_timer as timer. And I'm going to start the train time. Train
11058
16:58:16,640 --> 16:58:27,880
time start on GPU equals timer. And then I'm just going to write here, set epochs. I'm going to
11059
16:58:27,880 --> 16:58:32,560
set epochs equal to three, because we want to keep our training experiments as close
11060
16:58:32,560 --> 16:58:38,960
to the same as possible, so we can see what little changes do what. And then let's create
11061
16:58:38,960 --> 16:58:51,280
an optimization and evaluation loop using train step and test step. So we're going to loop
11062
16:58:51,280 --> 16:58:59,960
through the epochs for epoch in TQDM. So we get a nice progress bar in epochs. Then we're
11063
16:58:59,960 --> 16:59:08,160
going to print epoch. A little print out of what's going on. Epoch. And we'll get a new
11064
16:59:08,160 --> 16:59:12,120
line. And then maybe one, two, three, four, five, six, seven, eight or something like
11065
16:59:12,120 --> 16:59:16,680
that. Maybe I miscounted there. But that's all right. Train step. What do we have to
11066
16:59:16,680 --> 16:59:21,080
do for this? Now we have a little doc string. We have a model. What model would we like
11067
16:59:21,080 --> 16:59:26,360
to use? We'd like to use model one. We have a data loader. What data loader would we
11068
16:59:26,360 --> 16:59:32,720
like to use? Well, we'd like to use our train data loader. We also have a loss function,
11069
16:59:32,720 --> 16:59:44,200
which is our loss function. We have an optimizer, which is our optimizer. And we have an accuracy
11070
16:59:44,200 --> 16:59:53,040
function, which is our accuracy function. And oops, forgot to put fn. And finally, we have
11071
16:59:53,040 --> 16:59:58,200
a device, which equals device, but we're going to set that anyway. So how beautiful is that
11072
16:59:58,200 --> 17:00:02,000
for creating a training loop? Thanks to the code that we've functionalized before. And
11073
17:00:02,000 --> 17:00:07,120
just recall, we set our optimizer and loss function in a previous video. You could bring
11074
17:00:07,120 --> 17:00:12,280
these down here if you really wanted to, so that they're all in one place, either way
11075
17:00:12,280 --> 17:00:17,720
is fine. But we can just get rid of that because we've already set it. Now we're going to do
11076
17:00:17,720 --> 17:00:22,280
the same thing for our test step. So what do we need here? Let's check the doc string.
11077
17:00:22,280 --> 17:00:25,920
We could put a little bit more information in this doc string if we wanted to to really
11078
17:00:25,920 --> 17:00:30,400
make our code more reusable, and so that if someone else was to use our code, or even
11079
17:00:30,400 --> 17:00:35,800
us in the future knows what's going on. But let's just code it out because we're just
11080
17:00:35,800 --> 17:00:40,480
still fresh in our minds. Model equals model one. What's our data loader going to be for
11081
17:00:40,480 --> 17:00:45,760
the test step? It's going to be our test data loader. Then we're going to set in a loss
11082
17:00:45,760 --> 17:00:49,400
function, which is going to be just the same loss function. We don't need to use an optimizer
11083
17:00:49,400 --> 17:00:56,120
here because we are only evaluating our model, but we can pass in our accuracy function.
11084
17:00:56,120 --> 17:01:00,800
Accuracy function. And then finally, the device is already set, but we can just pass
11085
17:01:00,800 --> 17:01:08,160
it in anyway. Look at that. Our whole optimization loop in a few lines of code. Isn't that beautiful?
11086
17:01:08,160 --> 17:01:12,960
So these functions are something that you could put in, like our helper functions dot
11087
17:01:12,960 --> 17:01:17,720
py. And that way you could just import them later on. And you don't have to write your
11088
17:01:17,720 --> 17:01:22,600
training loops all over again. But we'll see a more of an example of that later on in
11089
17:01:22,600 --> 17:01:30,040
the course. So let's keep going. We want to measure the train time, right? So we're
11090
17:01:30,040 --> 17:01:34,920
going to create, once it's been through these steps, we're going to create train time end
11091
17:01:34,920 --> 17:01:41,040
on CPU. And then we're going to set that to the timer. So all this is going to do is
11092
17:01:41,040 --> 17:01:46,160
measure a point in time. Once this line of code is reached, it will have run all of these
11093
17:01:46,160 --> 17:01:50,920
lines of code. So it's going to perform the training and optimization loop. And then it's
11094
17:01:50,920 --> 17:01:57,120
going to, oh, excuse me, this should be GPU. It's going to measure a point in time here.
11095
17:01:57,120 --> 17:02:01,400
So once all this codes run, measure a point in time there. And then finally, we can go
11096
17:02:01,400 --> 17:02:08,880
total train time for model one is equal to print train time, which is our function that
11097
17:02:08,880 --> 17:02:14,600
we wrote before. And we pass it in a start time. And it prints the difference between
11098
17:02:14,600 --> 17:02:21,120
the start and end time on a target device. So let's do that. Start equals what? Train
11099
17:02:21,120 --> 17:02:31,960
time start on GPU. The end is going to be train time end on GPU. And the device is going
11100
17:02:31,960 --> 17:02:42,000
to be device. Beautiful. So are you ready to run our next modeling experiment model one?
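So, roughly, the optimization loop we've just assembled looks like this; it assumes train_dataloader and test_dataloader were created back in the data preparation section, and that print_train_time is the timing helper we wrote earlier (taking a start time, an end time and a device):

from timeit import default_timer as timer
from tqdm.auto import tqdm

torch.manual_seed(42)
train_time_start_on_gpu = timer()

epochs = 3  # keep the training experiments as close to the same as possible
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n--------")
    train_step(model=model_1,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_1,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

train_time_end_on_gpu = timer()
total_train_time_model_1 = print_train_time(start=train_time_start_on_gpu,
                                            end=train_time_end_on_gpu,
                                            device=device)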
11101
17:02:42,000 --> 17:02:46,080
We've got a model running on the GPU, and it's using nonlinear layers. And we want to
11102
17:02:46,080 --> 17:02:53,840
compare it to our first model, which our results were model zero results. And we have total
11103
17:02:53,840 --> 17:03:00,720
train time on model zero. Yes, we do. So this is what we're going for. Does our model
11104
17:03:00,720 --> 17:03:06,920
one beat these results? And does it beat this result here? So three, two, one, do we
11105
17:03:06,920 --> 17:03:14,160
have any errors? No, we don't. Okay. Train step got an unexpected keyword loss. Oh, did
11106
17:03:14,160 --> 17:03:20,440
you catch that? I didn't type in loss function. Let's run it again. There we go. Okay, we're
11107
17:03:20,440 --> 17:03:25,480
running. We've got a progress bar. It's going to output at the end of each epoch. There
11108
17:03:25,480 --> 17:03:32,880
we go. Training loss. All right. Test accuracy, training accuracy. This is so exciting. I
11109
17:03:32,880 --> 17:03:38,240
love watching neural networks train. Okay, we're improving per epoch. That's a good sign.
11110
17:03:38,240 --> 17:03:45,720
But we've still got a fair way to go. Oh, okay. So what do we have here? Well, we didn't
11111
17:03:45,720 --> 17:03:51,240
beat our, hmm, it looks like we didn't beat our model zero results with the nonlinear
11112
17:03:51,240 --> 17:03:58,560
layers. And we only just slightly had a faster training time. Now, again, your numbers might
11113
17:03:58,560 --> 17:04:02,720
not be the exact same as what I've got here. Right? So that's a big thing about machine
11114
17:04:02,720 --> 17:04:08,200
learning is that it uses randomness. So your numbers might be slightly different. The direction
11115
17:04:08,200 --> 17:04:13,360
should be quite similar. And we may be using different GPUs. So just keep that in mind.
11116
17:04:13,360 --> 17:04:18,920
Right now, running nvidia-smi, I'm using a Tesla T4, which is at the time of
11117
17:04:18,920 --> 17:04:25,360
recording this video, Wednesday, April 20, 2022 is a relatively fast GPU for making
11118
17:04:25,360 --> 17:04:29,840
inference. So just keep that in mind. Your GPU in the future may be different. And your
11119
17:04:29,840 --> 17:04:35,760
CPU that you run may also have a different time here. So if these numbers are like 10
11120
17:04:35,760 --> 17:04:40,680
times higher, you might want to look into seeing if there's some error in your code.
11121
17:04:40,680 --> 17:04:44,840
If they're 10 times lower, well, hey, you're running it on some fast hardware. So it looks
11122
17:04:44,840 --> 17:04:52,720
like my code is running on CUDA slightly faster than the CPU, but not dramatically faster.
11123
17:04:52,720 --> 17:04:57,080
And that's probably akin to the fact that our data set isn't too complex and our model
11124
17:04:57,080 --> 17:05:01,840
isn't too large. What I mean by that is our model doesn't have like a vast amount of
11125
17:05:01,840 --> 17:05:07,520
layers. Like, these are all the layers our model has.
11126
17:05:07,520 --> 17:05:13,840
And our data set is only comprised of 60,000 images that are 28 by 28. So as you can imagine,
11127
17:05:13,840 --> 17:05:18,760
the more parameters in your model, the more features in your data, the higher this time
11128
17:05:18,760 --> 17:05:25,560
is going to be. And you might sometimes even find that your model is faster on CPU. So
11129
17:05:25,560 --> 17:05:32,280
this is the train time on CPU. You might sometimes find that your model's training
11130
17:05:32,280 --> 17:05:38,240
time on a CPU is in fact faster for the exact same code running on a GPU. Now, why might
11131
17:05:38,240 --> 17:05:48,520
that be? Well, let's write down this here. Let's go note. Sometimes, depending on your
11132
17:05:48,520 --> 17:06:00,560
data slash hardware, you might find that your model trains faster on CPU than GPU. Now,
11133
17:06:00,560 --> 17:06:09,160
why is this? So one of the number one reasons is that one, it could be that the overhead
11134
17:06:09,160 --> 17:06:22,360
for copying data slash model to and from the GPU outweighs the compute benefits offered
11135
17:06:22,360 --> 17:06:28,680
by the GPU. So that's probably one of the number one reasons is that you have to, for
11136
17:06:28,680 --> 17:06:35,600
data to be processed on a GPU, you have to copy it because it is by default on the CPU.
11137
17:06:35,600 --> 17:06:40,840
If you have to copy it to that GPU, you have some overhead time for doing that copy into
11138
17:06:40,840 --> 17:06:45,600
the GPU memory. And then although the GPU will probably compute faster on that data
11139
17:06:45,600 --> 17:06:50,280
once it's there, you still have that back and forth of going between the CPU and the
11140
17:06:50,280 --> 17:07:01,480
GPU. And the number two reason is that the hardware you're using has a better CPU in
11141
17:07:01,480 --> 17:07:08,800
terms of compute capability than the GPU. Now, this is quite a bit rarer. Usually if
11142
17:07:08,800 --> 17:07:14,480
you're using a GPU like a fairly modern GPU, it will be faster at computing, deep learning
11143
17:07:14,480 --> 17:07:21,000
or running deep learning algorithms than your general CPU. But sometimes these numbers
11144
17:07:21,000 --> 17:07:24,920
of compute time are really dependent on the hardware that you're running. So you'll get
11145
17:07:24,920 --> 17:07:29,360
the biggest benefits of speedups on the GPU when you're running larger models, larger
11146
17:07:29,360 --> 17:07:34,720
data sets, and more compute intensive layers in your neural networks. And so if you'd like
11147
17:07:34,720 --> 17:07:39,160
a great article on how to get the most out of your GPUs, it's a little bit technical,
11148
17:07:39,160 --> 17:07:43,880
but this is something to keep in mind as you progress as a machine learning engineer is
11149
17:07:43,880 --> 17:07:54,960
how to make your GPUs go brrr. And I mean that, brrr, from first principles. There we
11150
17:07:54,960 --> 17:08:01,320
go. Making deep learning go brrr, as in your GPU is going brrr because it's running so
11151
17:08:01,320 --> 17:08:08,520
fast from first principles. So this is by Horace He who works on PyTorch. And it's
11152
17:08:08,520 --> 17:08:13,080
great. It talks about compute as a first principle. So here's what I mean by copying
11153
17:08:13,080 --> 17:08:17,080
memory and compute. There might be a fair few things you're not familiar with here,
11154
17:08:17,080 --> 17:08:21,840
but that's okay. But just be aware bandwidth. So bandwidth costs are essentially the cost
11155
17:08:21,840 --> 17:08:26,360
paid to move data from one place to another. That's what I was talking about copying stuff
11156
17:08:26,360 --> 17:08:32,800
from the CPU to the GPU. And then also there's one more, where is it overhead? Overhead is
11157
17:08:32,800 --> 17:08:37,240
basically everything else. I called it overhead. There are different terms for different things.
11158
17:08:37,240 --> 17:08:43,120
This article is excellent. So I'm going to just copy this in here. And you'll find this
11159
17:08:43,120 --> 17:08:51,680
in the resources, by the way. So for more on how to make your models compute faster,
11160
17:08:51,680 --> 17:08:59,520
see here. Lovely. So right now our baseline model is performing the best in terms of results.
11161
17:08:59,520 --> 17:09:05,800
And in terms of, or actually our model computing on the GPU is performing faster than our CPU.
11162
17:09:05,800 --> 17:09:10,360
Again yours might be slightly different. For my case, for my particular hardware, CUDA
11163
17:09:10,360 --> 17:09:16,840
is faster. Except model zero, our baseline is better than model one. So what's to do
11164
17:09:16,840 --> 17:09:24,400
next? Well, it's to keep experimenting, of course. I'll see you in the next video. Welcome
11165
17:09:24,400 --> 17:09:29,760
back. Now, before we move on to the next modeling experiment, let's get a results dictionary
11166
17:09:29,760 --> 17:09:35,400
for our model one, a model that we trained on. So just like we've got one for model zero,
11167
17:09:35,400 --> 17:09:39,800
let's create one of these for model one results. And we can create that with our eval_model
11168
17:09:39,800 --> 17:09:45,400
function. So we'll go right back down to where we were. I'll just get rid of this cell.
11169
17:09:45,400 --> 17:09:51,800
And let's type in here, get model one results dictionary. This is helpful. So later on,
11170
17:09:51,800 --> 17:09:56,680
we can compare all of our modeling results, because they'll all be in dictionary format.
11171
17:09:56,680 --> 17:10:05,200
So we're going to go model one results equals eval_model, passing in model equals model one.
11172
17:10:05,200 --> 17:10:12,040
And we can pass in a data loader, which is going to be our test data loader. Then we
11173
17:10:12,040 --> 17:10:16,840
can pass in a loss function, which is going to equal our loss function. And we can pass
11174
17:10:16,840 --> 17:10:25,680
in our accuracy function equals accuracy function. Wonderful. And then if we check out our model
11175
17:10:25,680 --> 17:10:34,760
one results, what do we get? Oh, no, we get an error. Do we get the code right? That looks
11176
17:10:34,760 --> 17:10:41,280
right to me. Oh, what does this say runtime error expected all tensors to be on the same
11177
17:10:41,280 --> 17:10:49,840
device, but found at least two devices, CUDA and CPU. Of course. So why did this happen?
11178
17:10:49,840 --> 17:10:54,880
Well, let's go back up to our eval_model function, wherever we defined that. Here we
11179
17:10:54,880 --> 17:11:02,200
go. Ah, I see. So this is a little gotcha in pytorch or in deep learning in general. There's
11180
17:11:02,200 --> 17:11:05,520
a saying in the industry that deep learning models fail silently. And this is kind of
11181
17:11:05,520 --> 17:11:13,040
one of those ones. It's because our data and our model are on different devices. So remember
11182
17:11:13,040 --> 17:11:19,560
how I said the three big errors are shape mismatches with your data and your model device
11183
17:11:19,560 --> 17:11:24,800
mismatches, which is what we've got so far. And then data type mismatches, which is if
11184
17:11:24,800 --> 17:11:28,980
your data is in the wrong data type to be computed on. So what we're going to have to
11185
17:11:28,980 --> 17:11:35,560
do to fix this is, let's bring our eval_model function down to where we were. And
11186
17:11:35,560 --> 17:11:42,000
just like we've done in our test step and train step functions, where we've created
11187
17:11:42,000 --> 17:11:47,160
device agnostic data here, we've sent our data to the target device, we'll do that exact
11188
17:11:47,160 --> 17:11:52,080
same thing in our eval_model function. And this is just a note for going forward. It's
11189
17:11:52,080 --> 17:11:58,040
always handy, where you can, to create device-agnostic code. So we've got our new eval_model
11190
17:11:58,040 --> 17:12:07,600
function here; for X, y in our data loader, let's make our data device agnostic. So just
11191
17:12:07,600 --> 17:12:12,960
like our model is device agnostic, we've sent it to the target device, we will do the same
11192
17:12:12,960 --> 17:12:20,240
here, x dot two device, and then y dot two device. Let's see if that works. We will
11193
17:12:20,240 --> 17:12:25,200
just rerun this cell up here. I'll grab this, we're just going to write the exact same
11194
17:12:25,200 --> 17:12:30,720
code as what we did before. But now it should work because we've sent our, we could actually
11195
17:12:30,720 --> 17:12:36,520
also just pass in the target device here, device equals device. That way we can pass
11196
17:12:36,520 --> 17:12:41,920
in whatever device we want to run it on. And we're going to just add in device here,
11197
17:12:41,920 --> 17:12:50,800
device equals device. And let's see if this runs correctly. Beautiful.
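For reference, a sketch of the updated eval_model with the device-agnostic fix (sending X and y to the same device as the model, and taking device as a parameter); the returned dictionary follows the same format we used for model zero's results:

def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn,
               device: torch.device = device):
    """Returns a dictionary containing the results of model predicting on data_loader."""
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in data_loader:
            # Make our data device agnostic (this is the fix for the device mismatch error)
            X, y = X.to(device), y.to(device)
            y_pred = model(X)
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__,
            "model_loss": loss.item(),
            "model_acc": acc}

model_1_results = eval_model(model=model_1,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)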
11198
17:12:50,800 --> 17:12:57,520
So if we compare this to our model zero results, it looks like our baseline's still out in front. But that's
11199
17:12:57,520 --> 17:13:02,280
okay. We're going to in the next video, start to step things up a notch and move on to convolutional
11200
17:13:02,280 --> 17:13:07,080
neural networks. This is very exciting. And by the way, just remember, if your numbers
11201
17:13:07,080 --> 17:13:12,760
here aren't exactly the same as mine, don't worry too much. If they're outlandishly different,
11202
17:13:12,760 --> 17:13:16,520
just go back through your code and see if it's maybe a cell hasn't been run correctly
11203
17:13:16,520 --> 17:13:20,760
or something like that. If there are a few decimal places off, that's okay. That's due
11204
17:13:20,760 --> 17:13:26,280
to the inherent randomness of machine learning and deep learning. But with that being said,
11205
17:13:26,280 --> 17:13:33,360
I'll see you in the next video. Let's get our hands on convolutional neural networks.
11206
17:13:33,360 --> 17:13:38,160
Welcome back. In the last video, we saw that our second modeling experiment, model one,
11207
17:13:38,160 --> 17:13:42,720
didn't quite beat our baseline. But now we're going to keep going with modeling experiments.
11208
17:13:42,720 --> 17:13:46,800
And we're going to move on to model two. And this is very exciting. We're going to build
11209
17:13:46,800 --> 17:13:55,200
a convolutional neural network, which are also known as CNN. CNNs are also known as
11210
17:13:55,200 --> 17:14:10,200
ConvNets. And CNNs are known for their capabilities to find patterns in visual data. So what are
11211
17:14:10,200 --> 17:14:14,440
we going to do? Well, let's jump back into the keynote. We had a look at this slide before
11212
17:14:14,440 --> 17:14:18,720
where this is the typical architecture of a CNN. There's a fair bit going on here, but
11213
17:14:18,720 --> 17:14:23,560
we're going to step through it one by one. We have an input layer, just like any other
11214
17:14:23,560 --> 17:14:29,120
deep learning model. We have to input some kind of data. We have a bunch of hidden layers
11215
17:14:29,120 --> 17:14:34,000
in our case in a convolutional neural network, you have convolutional layers. You often have
11216
17:14:34,000 --> 17:14:38,920
hidden activations or nonlinear activation layers. You might have a pooling layer. You
11217
17:14:38,920 --> 17:14:44,080
generally always have an output layer of some sort, which is usually a linear layer. And
11218
17:14:44,080 --> 17:14:48,560
so the values for each of these different layers will depend on the problem you're working
11219
17:14:48,560 --> 17:14:53,080
on. So we're going to work towards building something like this. And you'll notice that
11220
17:14:53,080 --> 17:14:57,320
a lot of the code is quite similar to the code that we've been writing before for other
11221
17:14:57,320 --> 17:15:02,120
PyTorch models. The only difference is in here is that we're going to use different
11222
17:15:02,120 --> 17:15:09,280
layer types. And so if we want to visualize a CNN in colored block form, we're going
11223
17:15:09,280 --> 17:15:13,480
to code this out in a minute. So don't worry too much. We have a simple CNN. You might
11224
17:15:13,480 --> 17:15:18,080
have an input, which could be this image of my dad eating some pizza with two thumbs
11225
17:15:18,080 --> 17:15:22,920
up. We're going to preprocess that input. We're going to, in other words, turn it into
11226
17:15:22,920 --> 17:15:29,680
a tensor in red, green and blue for an image. And then we're going to pass it through a
11227
17:15:29,680 --> 17:15:36,200
combination of convolutional layers, relu layers and pooling layers. Now again, this
11228
17:15:36,200 --> 17:15:40,720
is a thing to note about deep learning models. I don't want you to get too bogged down in
11229
17:15:40,720 --> 17:15:45,600
the order of how these layers go, because they can be combined in many different ways.
11230
17:15:45,600 --> 17:15:50,400
In fact, research is coming out almost every day, every week about how to best construct
11231
17:15:50,400 --> 17:15:56,840
these layers. The overall principle is what's more important is how do you get your inputs
11232
17:15:56,840 --> 17:16:01,560
into an idealized output? That's the fun part. And then of course, we have the linear output
11233
17:16:01,560 --> 17:16:06,080
layer, which is going to output however many classes or value for however many classes
11234
17:16:06,080 --> 17:16:13,800
that we have in the case of classification. And then if you want to make your CNN deeper,
11235
17:16:13,800 --> 17:16:19,240
this is where the deep comes from deep learning, you can add more layers. So the theory behind
11236
17:16:19,240 --> 17:16:24,120
this, or the practice behind this, is that the more layers you add to your deep learning
11237
17:16:24,120 --> 17:16:30,520
model, the more chances it has to find patterns in the data. Now, how does it find these patterns?
11238
17:16:30,520 --> 17:16:35,440
Well, each one of these layers here is going to perform, just like what we've seen before,
11239
17:16:35,440 --> 17:16:41,680
a different combination of mathematical operations on whatever data we feed it. And each subsequent
11240
17:16:41,680 --> 17:16:48,240
layer receives its input from the previous layer. That said, there are some advanced
11241
17:16:48,240 --> 17:16:52,440
networks that you'll probably come across later in your research and machine learning
11242
17:16:52,440 --> 17:16:57,640
career that use inputs from layers that are kind of over here or all the way down here or
11243
17:16:57,640 --> 17:17:02,280
something like that. They're known as residual connections. But that's beyond the scope of
11244
17:17:02,280 --> 17:17:06,960
what we're covering for now. We just want to build our first convolutional neural network.
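To make that layer ordering concrete before we code the real thing, here's a minimal sketch of the kind of stack being described. The layer choices and values here are placeholders of my own, not the model we build later in the course:

import torch
from torch import nn

# a rough sketch of a typical CNN layer ordering: conv -> nonlinearity -> pool,
# then flatten and a final linear "classifier" layer (all values are placeholders)
simple_cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(in_features=10*32*32, out_features=10)  # a 64x64 input is halved by the max pool
)

print(simple_cnn(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 10])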
11245
17:17:06,960 --> 17:17:11,920
And so let's go back to Google Chrome. I'm going to show you my favorite website to learn
11246
17:17:11,920 --> 17:17:17,680
about convolutional neural networks. It is the CNN explainer website. And this is going
11247
17:17:17,680 --> 17:17:22,000
to be part of your extra curriculum for this video is to spend 20 minutes clicking and
11248
17:17:22,000 --> 17:17:26,040
going through this entire website. We're not going to do that together because I would
11249
17:17:26,040 --> 17:17:30,720
like you to explore it yourself. That is the best way to learn. So what you'll notice up
11250
17:17:30,720 --> 17:17:36,920
here is we have some images of different sorts. And this is going to be our input. So
11251
17:17:36,920 --> 17:17:41,600
let's start with pizza. And then we have a convolutional layer, a relu layer, a conv
11252
17:17:41,600 --> 17:17:47,440
layer, a relu layer, a max pool layer, conv to relu to conv to relu to max pool. So this
11253
17:17:47,440 --> 17:17:51,880
architecture is a convolutional neural network. And it's running live in the browser. And
11254
17:17:51,880 --> 17:17:57,680
so we pass this image, you'll notice that it breaks down into red, green and blue. And
11255
17:17:57,680 --> 17:18:01,720
then it goes through each of these layers and something happens. And then finally, we
11256
17:18:01,720 --> 17:18:07,000
have an output. And you notice that the output has 10 different classes here, because we
11257
17:18:07,000 --> 17:18:14,920
have one, two, three, four, five, six, seven, eight, nine, 10, different classes of image
11258
17:18:14,920 --> 17:18:19,600
in this demo here. And of course, we could change this if we had 100 classes, we might
11259
17:18:19,600 --> 17:18:25,560
change this to 100. But the pieces of the puzzle here would still stay quite the same.
11260
17:18:25,560 --> 17:18:30,320
And you'll notice that the class pizza has the highest output value here, because our
11261
17:18:30,320 --> 17:18:35,840
image is of pizza. If we change to, what is this one, espresso, it's got the highest
11262
17:18:35,840 --> 17:18:40,200
value there. So this is a pretty well performing convolutional neural network. Then we have
11263
17:18:40,200 --> 17:18:45,800
a sports car. Now, if we click on each one of these, something is going to happen. Let's
11264
17:18:45,800 --> 17:18:52,600
find out. We have a convolutional layer. So we have an input of an image here that's 64
11265
17:18:52,600 --> 17:18:58,560
by 64 by three. This is color channels last format. So we have a kernel. And this kernel, this
11266
17:18:58,560 --> 17:19:02,000
is what happens inside a convolutional layer. And you might be going, well, there's a lot
11267
17:19:02,000 --> 17:19:06,400
going on here. And yes, of course, there is if this is the first time you've ever seen this.
11268
17:19:06,400 --> 17:19:11,680
But essentially, what's happening is a kernel, which is also known as a filter, is going
11269
17:19:11,680 --> 17:19:17,240
over our image pixel values, because of course, they will be in the format of a tensor. And
11270
17:19:17,240 --> 17:19:22,800
trying to find small little intricate patterns in that data. So if we have a look here, and
11271
17:19:22,800 --> 17:19:26,200
this is why it's so valuable to go through this and just play around with it, we start
11272
17:19:26,200 --> 17:19:29,920
in the top left corner, and then slowly move along, you'll see on the output on the right
11273
17:19:29,920 --> 17:19:33,440
hand side, we have another little square. And do you notice in the middle all of those
11274
17:19:33,440 --> 17:19:38,960
numbers changing? Well, that is the mathematical operation that's happening as a convolutional
11275
17:19:38,960 --> 17:19:44,920
layer convolves over our input image. How cool is that? And you might be able to see on the
11276
17:19:44,920 --> 17:19:49,600
output there that there are some slight values. Look around the headlight here. Do
11277
17:19:49,600 --> 17:19:57,240
you notice on the right how there's some activation? There's some red tiles there? Well, that
11278
17:19:57,240 --> 17:20:02,400
just means that potentially this layer or this hidden unit, and I want to zoom out for
11279
17:20:02,400 --> 17:20:10,960
a second, is we have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 hidden units. Each one of these is
11280
17:20:10,960 --> 17:20:15,960
going to learn a different feature about the data. And now the beauty of deep learning,
11281
17:20:15,960 --> 17:20:20,360
but also one of the curses of deep learning is that we don't actually control what each
11282
17:20:20,360 --> 17:20:26,120
one of these learns. The magic of deep learning is that it figures out itself what is
11283
17:20:26,120 --> 17:20:32,240
best to learn. If we go in here, notice that each one we click on has a different representation
11284
17:20:32,240 --> 17:20:37,800
on the right hand side. And so this is what's going to happen layer by layer as it goes
11285
17:20:37,800 --> 17:20:42,200
through the convolutional neural network. And so if you want to read about what is a convolutional
11286
17:20:42,200 --> 17:20:46,560
neural network, you can go through here. But we're going to replicate this exact neural
11287
17:20:46,560 --> 17:20:51,400
network here with PyTorch code. That's how I'd prefer to learn it. But if you want the
11288
17:20:51,400 --> 17:20:55,600
intuition behind it, the math behind it, you can check out all of these resources here.
11289
17:20:55,600 --> 17:21:01,200
That is your extra curriculum for this video. So we have an input layer, we have a convolutional
11290
17:21:01,200 --> 17:21:06,520
layer, you can see how the input gets modified by some sort of mathematical operation, which
11291
17:21:06,520 --> 17:21:12,360
is of course, the convolutional operation. And we have there all these different numbers finding
11292
17:21:12,360 --> 17:21:17,040
different patterns in the data. This is a really good example here. You notice that the output's
11293
17:21:17,040 --> 17:21:21,880
size slightly changes. That'll be a trend throughout each layer. And then we can understand
11294
17:21:21,880 --> 17:21:25,880
the different hyper parameters, but I'm going to leave this for you to explore on your own.
11295
17:21:25,880 --> 17:21:30,600
In the next video, we're going to start to write PyTorch code to replicate everything
11296
17:21:30,600 --> 17:21:40,600
that's going on here. So I'm going to link this in here to find out what's happening
11297
17:21:40,600 --> 17:21:50,680
inside a CNN, see this website here. So join me in the next video. This is super exciting.
11298
17:21:50,680 --> 17:21:55,880
We're going to build our first convolutional neural network for computer vision. I'll see
11299
17:21:55,880 --> 17:22:02,840
you there. Welcome back. In the last video, we went briefly through the CNN explainer
11300
17:22:02,840 --> 17:22:07,720
website, which is my favorite resource for learning about convolutional neural networks.
11301
17:22:07,720 --> 17:22:11,880
And of course, we could spend 20 minutes clicking through everything here to find out what's
11302
17:22:11,880 --> 17:22:17,560
going on with a convolutional neural network, or we could start to code one up. So how about
11303
17:22:17,560 --> 17:22:25,400
we do that? Hey, if in doubt, code it out. So we're going to create a convolutional neural
11304
17:22:25,400 --> 17:22:30,040
network. And what I'm going to do is I'm going to build this, or we're going to build this
11305
17:22:30,040 --> 17:22:35,880
model together in this video. And then because it's going to use layers or PyTorch layers
11306
17:22:35,880 --> 17:22:40,280
that we haven't looked at before, we're going to spend the next couple of videos stepping
11307
17:22:40,280 --> 17:22:45,480
through those layers. So just bear with me, as we code this entire model together, we'll
11308
17:22:45,480 --> 17:22:50,920
break it down in subsequent videos. So let's build our first convolutional neural
11309
17:22:50,920 --> 17:22:55,240
network. That's a mouthful, by the way, I'm just going to probably stick to saying CNN.
11310
17:22:55,240 --> 17:23:02,280
Fashion MNIST, we're up to model V2. We're going to subclass nn.Module, as we always do
11311
17:23:02,280 --> 17:23:08,840
when we're building a PyTorch model. And in here, we're going to say model architecture
11312
17:23:10,120 --> 17:23:17,720
that replicates the tiny VGG. And you might be thinking, where did you get that from, Daniel?
11313
17:23:17,720 --> 17:23:27,000
Model from CNN explainer website. And so oftentimes, when convolutional neural networks or new
11314
17:23:27,000 --> 17:23:31,320
types of architecture come out, the authors of the research paper that present the model
11315
17:23:31,320 --> 17:23:36,280
get to name the model. And so that way, in the future, you can refer to different types of
11316
17:23:36,280 --> 17:23:42,200
model architectures with just a simple name, like tiny VGG. And people kind of know what's going on.
11317
17:23:42,200 --> 17:23:50,040
So I believe somewhere on here, it's called tiny VGG, tiny VGG. We have nothing. Yeah,
11318
17:23:50,680 --> 17:23:59,400
there we go. In tiny VGG. And do we have more than one tiny, tiny, yeah, tiny VGG. And if we
11319
17:23:59,400 --> 17:24:08,360
look up VGG, conv net, VGG 16 was one of the original ones, VGG, very deep convolutional neural
11320
17:24:08,360 --> 17:24:14,760
networks, or VGGNet. There's also ResNet, which is another convolutional neural network.
11321
17:24:16,120 --> 17:24:21,960
You can also, I don't want to give you my location, Google, you can go popular CNN
11322
17:24:21,960 --> 17:24:28,680
architectures. And this will give you a fair few options. LeNet is one of the first, AlexNet,
11323
17:24:28,680 --> 17:24:33,560
ZFNet, a whole bunch of different resources. And also, how could you find out more about a
11324
17:24:33,560 --> 17:24:38,120
convolutional neural network? What is a convolutional neural network? You can go through that. But
11325
17:24:38,120 --> 17:24:43,320
let's stop that for a moment. Let's code this one up together. So we're going to initialize our
11326
17:24:44,280 --> 17:24:50,280
class here, def __init__. We're going to pass in an input shape, just like we often do.
11327
17:24:50,840 --> 17:24:57,240
We're going to put in a number of hidden units, which is an int. And we're going to put in an
11328
17:24:57,240 --> 17:25:04,360
output shape, which is an int. Wonderful. So nothing too outlandish that we haven't seen before there.
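As a reference, the constructor signature we're describing looks roughly like this. A sketch only; the class and argument names follow what's spoken in the course, and the body gets filled in over the next few steps:

from torch import nn

class FashionMNISTModelV2(nn.Module):
    """Sketch of the constructor we're setting up; the conv blocks and classifier come next."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        # conv blocks and the classifier layer get defined in here over the next few videos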
11329
17:25:04,360 --> 17:25:13,000
And we're going to go super().__init__() to initialize our initializer, for lack of a better way of
11330
17:25:13,000 --> 17:25:18,600
putting it. Now, we're going to create our neural network in a couple of blocks this time. And
11331
17:25:18,600 --> 17:25:24,200
you might often hear, when you learn more about convolutional neural networks, or I'll just tell
11332
17:25:24,200 --> 17:25:29,480
you now, that things are often referred to as convolutional blocks. So if we go back to
11333
17:25:29,480 --> 17:25:36,600
our keynote, this here, this combination of layers might be referred to as a convolutional block.
11334
17:25:36,600 --> 17:25:41,880
And a deeper CNN might be comprised of multiple convolutional blocks.
11335
17:25:42,680 --> 17:25:50,440
So to add to the confusion, a block is comprised of multiple layers. And then an overall architecture
11336
17:25:50,440 --> 17:25:56,520
is comprised of multiple blocks. And so the deeper and deeper your models get, the more blocks
11337
17:25:56,520 --> 17:26:01,960
it might be comprised of, and the more layers those blocks may be comprised of within them.
11338
17:26:02,920 --> 17:26:08,360
So it's kind of like Lego, which is very fun. So let's put together an nn.Sequential.
11339
17:26:09,640 --> 17:26:14,840
Now, the first few layers here that we're going to create in conv block one, uh,
11340
17:26:14,840 --> 17:26:21,880
nn.Conv2d. Oh, look at that. Us writing our first CNN layer. And we have to define something
11341
17:26:21,880 --> 17:26:29,240
here, which is in channels. So in channels refers to the number of channels in your visual data.
11342
17:26:29,240 --> 17:26:33,400
And we're going to put in input shape. So we're defining the input shape. This is going to be
11343
17:26:33,400 --> 17:26:39,160
the first layer in our model. The input shape is going to be what we define when we instantiate
11344
17:26:39,160 --> 17:26:44,920
this class. And then the out channels. Oh, what's the out channels going to be? Well, it's going
11345
17:26:44,920 --> 17:26:49,960
to be hidden units, just like we've done with our previous models. Now the difference here
11346
17:26:49,960 --> 17:26:55,320
is that in nn.Conv2d, we have a number of different hyperparameters that we can set.
11347
17:26:55,320 --> 17:26:59,080
I'm going to set some pretty quickly here, but then we're going to step back through them,
11348
17:26:59,080 --> 17:27:04,280
not only in this video, but in subsequent videos. We've got a fair bit going on here.
11349
17:27:04,280 --> 17:27:08,840
We've got in channels, which is our input shape. We've got out channels, which are our hidden units.
11350
17:27:08,840 --> 17:27:14,520
We've got a kernel size, which equals three. Or this could be a tuple as well, three by three.
11351
17:27:14,520 --> 17:27:20,280
But I just like to keep it as three. We've got a stride and we've got padding. Now,
11352
17:27:21,080 --> 17:27:25,560
because these are values we can set ourselves. What are they referred to as?
11353
17:27:26,840 --> 17:27:31,480
Let's write this down. Values we can set ourselves in our neural networks
11354
17:27:32,920 --> 17:27:40,360
are called hyperparameters. So these are the hyperparameters
11355
17:27:40,360 --> 17:27:46,200
of nn.Conv2d. And you might be thinking, what is the 2d for? Well, because we're working with
11356
17:27:46,200 --> 17:27:51,640
two-dimensional data, our images have height and width. There's also Conv1d for one-dimensional data,
11357
17:27:51,640 --> 17:27:55,320
Conv3d for three-dimensional data. We're going to stick with 2d for now.
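Putting those hyperparameters together, the first layer we just typed looks roughly like this. A sketch only; the placeholder values for input_shape and hidden_units are the ones we'll end up using for FashionMNIST:

from torch import nn

input_shape, hidden_units = 1, 10  # placeholder values for this sketch

conv_layer_1 = nn.Conv2d(in_channels=input_shape,   # number of channels in the input data
                         out_channels=hidden_units, # number of hidden units (filters) in the layer
                         kernel_size=3,             # a 3x3 filter, shorthand for (3, 3)
                         stride=1,                  # move the filter one pixel at a time
                         padding=1)                 # add one extra pixel around the edges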
11358
17:27:56,040 --> 17:28:02,040
And so what does each of these hyperparameters do? Well, we're going to go through what each one of them
11359
17:28:02,040 --> 17:28:07,480
does when we step through this particular layer step by step. What we've just done
11360
17:28:07,480 --> 17:28:14,600
is we've replicated this particular layer of the CNN explainer website. We've still got the
11361
17:28:14,600 --> 17:28:18,520
relu. We've still got another conv and a relu and a max pool and a conv and a relu and a
11362
17:28:18,520 --> 17:28:24,360
conv and a relu and a max pool. But this is the block I was talking about. This is one block here
11363
17:28:25,400 --> 17:28:29,720
of this neural network, or at least that's how I've broken it down. And this is another block.
11364
17:28:30,360 --> 17:28:34,680
You might notice that they're comprised of the same layers just stacked on top of each other.
11365
17:28:34,680 --> 17:28:39,640
And then we're going to have an output layer. And if you want to learn about where the hyper
11366
17:28:39,640 --> 17:28:45,560
parameters came from, what we just coded, where could you learn about those? Well, one, you could
11367
17:28:45,560 --> 17:28:52,920
go, of course, to the PyTorch documentation, PyTorch nn.Conv2d. You can read about it there.
11368
17:28:53,640 --> 17:28:58,200
There's the mathematical operation that we talked about or briefly stepped on before,
11369
17:28:58,200 --> 17:29:05,800
or touched on, stepped on. Is that the right word? So create a conv layer. It's there.
11370
17:29:06,440 --> 17:29:10,120
But also this is why I showed you this beautiful website so that you can read about these
11371
17:29:10,120 --> 17:29:15,080
hyper parameters down here. Understanding hyper parameters. So your extra curriculum for this
11372
17:29:15,080 --> 17:29:21,560
video is to go through this little graphic here and see if you can find out what padding means,
11373
17:29:21,560 --> 17:29:25,880
what the kernel size means, and what the stride means. I'm not going to read through this for you.
11374
17:29:25,880 --> 17:29:31,480
You can have a look at this interactive plot. We're going to keep coding because that's what
11375
17:29:31,480 --> 17:29:36,280
we're all about here. If in doubt, code it out. So we're going to now add a relu layer.
11376
17:29:37,240 --> 17:29:43,400
And then after that, we're going to add another conv 2d layer. And the in channels here is going
11377
17:29:43,400 --> 17:29:50,680
to be the hidden units, because we're going to take the output size of this layer and use it as
11378
17:29:50,680 --> 17:29:56,840
the input size to this layer. We're going to keep going here. Out channels equals hidden units again
11379
17:29:56,840 --> 17:30:03,880
in this case. And then the kernel size is going to be three as well. Stride will be one. Padding
11380
17:30:03,880 --> 17:30:08,840
will be one. Now, of course, we can change all of these values later on, but just bear with me
11381
17:30:08,840 --> 17:30:14,920
while we set them how they are. We'll have another relu layer. And then we're going to finish off
11382
17:30:14,920 --> 17:30:23,240
with an nn.MaxPool2d layer. Again, the 2d comes for the same reason we use Conv2d. We're working
11383
17:30:23,240 --> 17:30:29,080
with 2d data here. And we're going to set the kernel size here to be equal to two. And of course,
11384
17:30:29,080 --> 17:30:33,560
this can be a tuple as well. So it can be two by two. Now, where could you find out about nn max
11385
17:30:33,560 --> 17:30:42,920
pool 2d? Well, we go to the nn.MaxPool2d documentation. What does this do? Applies a 2d max pooling over an input
11386
17:30:42,920 --> 17:30:50,040
signal composed of several input planes. So it's taking the max of an input. And we've got some
11387
17:30:50,040 --> 17:30:55,400
parameters here, kernel size, the size of the window to take the max over. Now, where have we
11388
17:30:55,400 --> 17:31:01,560
seen a window before? I'm just going to close these. We come back up. Where did we see a window?
11389
17:31:01,560 --> 17:31:08,760
Let's dive into the max pool layer. See where my mouse is? Do you see that two by two? Well,
11390
17:31:08,760 --> 17:31:12,760
that's a window. Now, look at the difference between the input and the output. What's happening?
11391
17:31:13,320 --> 17:31:19,080
Well, we have a tile that's two by two, a window of four. And the max, we're taking the max of that
11392
17:31:19,080 --> 17:31:23,640
tile. In this case, it's zero. Let's find the actual value. There we go. So if you look at those
11393
17:31:23,640 --> 17:31:33,800
four numbers in the middle inside the max brackets, we have 0.07, 0.09, 0.06, 0.05. And the max of
11394
17:31:33,800 --> 17:31:39,880
all those is 0.09. And you'll notice that the input and the output shapes are different. The
11395
17:31:39,880 --> 17:31:46,280
output is half the size of the input. So that's what max pooling does, is it tries to take the max
11396
17:31:46,280 --> 17:31:54,120
value of whatever its input is, and then outputs it on the right here. And so as our data,
11397
17:31:54,120 --> 17:31:59,000
this is a trend in all of deep learning, actually. As our image moves through, this is what you'll
11398
17:31:59,000 --> 17:32:04,360
notice. Notice all the different shapes here. Even if you don't completely understand what's going
11399
17:32:04,360 --> 17:32:09,640
on here, you'll notice that the two values here on the left start to get smaller and smaller as
11400
17:32:09,640 --> 17:32:14,840
they go through the model. And what our model is trying to do here is take the input and learn a
11401
17:32:14,840 --> 17:32:20,600
compressed representation through each of these layers. So it's going to smoosh and smoosh and
11402
17:32:20,600 --> 17:32:27,800
smoosh trying to find the most generalizable patterns to get to the ideal output. And that
11403
17:32:27,800 --> 17:32:33,800
input is eventually going to be a feature vector to our final layer. So a lot going on there,
11404
17:32:33,800 --> 17:32:39,160
but let's keep coding. What we've just completed is this first block. We've got a conv layer,
11405
17:32:39,160 --> 17:32:44,120
a relu layer, a conv layer, a relu layer, and a max pool layer. Look at that, conv layer,
11406
17:32:44,120 --> 17:32:49,240
relu layer, conv layer, relu layer, max pool. Should we move on to the next block? We can do this
11407
17:32:49,240 --> 17:32:55,960
one a bit faster now because we've already coded the first one. So I'm going to do nn.Sequential as
11408
17:32:55,960 --> 17:33:02,680
well. And then we're going to go nn.Conv2d. We're going to set the in channels. What should the
11409
17:33:02,680 --> 17:33:08,600
in channels be here? Well, we're going to set it to hidden units as well because our network is
11410
17:33:08,600 --> 17:33:13,320
going to flow just straight through all of these layers. And the output size of this is going to
11411
17:33:13,320 --> 17:33:19,640
be hidden units. And so we want the in channels to match up with the previous layer's out channels.
11412
17:33:19,640 --> 17:33:28,040
So then we're going to go out channels equals hidden units as well. We're going to set the
11413
17:33:28,040 --> 17:33:36,120
kernel size, kernel size equals three, stride equals one, padding equals one, then what comes
11414
17:33:36,120 --> 17:33:43,560
next? Well, because the two blocks are identical, conv block one and conv block two, we can just go
11415
17:33:43,560 --> 17:33:52,200
the exact same combination of layers. And then nn.ReLU and nn.Conv2d with in channels equals hidden units.
11416
17:33:53,480 --> 17:33:59,000
Out channels equals, you might already know this, hidden units. Then we have kernel size
11417
17:33:59,880 --> 17:34:06,280
equals three, oh, 32, don't want it that big, stride equals one, padding equals one,
11418
17:34:06,280 --> 17:34:13,480
and what comes next? Well, we have another relu layer, relu, and then what comes after that?
11419
17:34:13,480 --> 17:34:22,200
We have another max pool. And then nn.MaxPool2d, kernel size equals two, beautiful. Now,
11420
17:34:22,200 --> 17:34:27,720
what have we coded up so far? We've got this block, number one, that's what this one on the inside
11421
17:34:27,720 --> 17:34:33,640
here. And then we have conv 2, relu 2, conv 2, relu 2, max pool 2. So we've built these
11422
17:34:33,640 --> 17:34:41,720
two blocks. Now, what do we need to do? Well, we need an output layer. And so what did we do before
11423
17:34:41,720 --> 17:34:49,640
when we made model one? We flattened the inputs of the final layer before we passed them to the last
11424
17:34:49,640 --> 17:34:57,320
linear layer. So flatten. So this is going to be the same kind of setup as our classifier layer.
11425
17:34:57,880 --> 17:35:02,520
Now, I say that on purpose, because that's what you'll generally hear the last output layer
11426
17:35:02,520 --> 17:35:07,640
in a classification model called: a classifier layer. So these first two layers
11427
17:35:07,640 --> 17:35:12,040
are going to be feature extractors. In other words, they're trying to learn the patterns that
11428
17:35:12,040 --> 17:35:18,120
best represent our data. And this final layer is going to take those features and classify them
11429
17:35:18,120 --> 17:35:24,120
into our target classes. Whatever our model thinks best suits those features, or whatever our model
11430
17:35:24,120 --> 17:35:29,960
thinks those features that it learned represent in terms of our classes. So let's code it out.
11431
17:35:29,960 --> 17:35:36,600
We'll go down here. Let's build our classifier layer. This is our biggest neural network yet.
11432
17:35:37,400 --> 17:35:44,120
You should be very proud. We have an nn.Sequential again. And we're going to pass in
11433
17:35:44,120 --> 17:35:53,240
an nn.Flatten, because the output of these two blocks is going to be a multi-dimensional tensor,
11434
17:35:53,240 --> 17:36:00,200
something similar to this size, 13 by 13 by 10. So we want to flatten the outputs into a single feature
11435
17:36:00,200 --> 17:36:05,640
vector. And then we want to pass that feature vector to an nn.Linear layer. And we're going to
11436
17:36:05,640 --> 17:36:13,720
go in features equals hidden units times something times something. Now, the reason I do this is
11437
17:36:13,720 --> 17:36:20,120
because we're going to find something out later on, or times zero, just so it doesn't error. But
11438
17:36:20,120 --> 17:36:25,160
sometimes calculating what your in features needs to be is quite tricky. And I'm going to
11439
17:36:25,160 --> 17:36:30,120
show you a trick that I use later on to figure it out. And then we have out features, which relates
11440
17:36:30,120 --> 17:36:35,880
to our output shape, which will be the length of how many classes we have, right? One value for
11441
17:36:35,880 --> 17:36:42,280
each class that we have. And so with that being said, now we've defined all of the
11442
17:36:42,280 --> 17:36:49,640
components of our tiny VGG architecture. There is a lot going on, but this is the same methodology
11443
17:36:49,640 --> 17:36:55,720
we've been using the whole time, defining some components, and then putting them together to
11444
17:36:55,720 --> 17:37:03,080
compute in some way in a forward method. So def forward(self, x). How are we going to do this?
11445
17:37:03,640 --> 17:37:11,480
We're going to set x equal to self.conv_block_1(x). So x is going to go through conv block one,
11446
17:37:11,480 --> 17:37:18,200
it's going to go through the conv 2d layer, relu layer, conv 2d layer, relu layer, max pool layer,
11447
17:37:18,200 --> 17:37:22,840
which will be the equivalent of an image going through this layer, this layer, this layer,
11448
17:37:22,840 --> 17:37:28,840
this layer, this layer, and then ending up here. So we'll set it to that. And then we can print out
11449
17:37:29,480 --> 17:37:36,680
x dot shape to get its shape. We'll check this later on. Then we pass x through conv block two,
11450
17:37:38,200 --> 17:37:42,760
which is just going to go through all of the layers in this block, which is equivalent to
11451
17:37:42,760 --> 17:37:48,520
the output of this layer going through all of these layers. And then because we've constructed a
11452
17:37:48,520 --> 17:37:54,120
classifier layer, we're going to take the output of this block, which is going to be here, and we're
11453
17:37:54,120 --> 17:37:59,960
going to pass it through our output layer, or what we've termed our classifier layer. I'll just
11454
17:37:59,960 --> 17:38:04,520
print out x dot shape here, so we can track the shape as our data moves through the architecture.
11455
17:38:04,520 --> 17:38:15,880
x = self.classifier(x). And then we're going to return x. Look at us go. We just built
11456
17:38:15,880 --> 17:38:22,040
our first convolutional neural network by replicating what's on the CNN explainer website.
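Pulling all of that together, here's a sketch of the model we just talked through. The attribute names (conv_block_1, conv_block_2, classifier) follow the names spoken in the video, and the times-zero placeholder in the Linear layer matches what we typed; the real in_features value gets worked out with a trick in an upcoming video:

import torch
from torch import nn

class FashionMNISTModelV2(nn.Module):
    """Model architecture that replicates the TinyVGG model from the CNN explainer website."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, out_channels=hidden_units,
                      kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units,
                      kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # hidden_units * 0 is only a placeholder so the cell runs (it triggers the
            # "initializing zero-element tensors is a no-op" warning); for 28x28 inputs
            # the real value works out to hidden_units * 7 * 7, calculated later on
            nn.Linear(in_features=hidden_units * 0, out_features=output_shape)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv_block_1(x)
        print(x.shape)  # track the shape as the data moves through the model
        x = self.conv_block_2(x)
        print(x.shape)
        x = self.classifier(x)
        return x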
11457
17:38:22,600 --> 17:38:28,840
Now, a very common practice in machine learning is to find some sort of architecture
11458
17:38:28,840 --> 17:38:35,560
that someone has found to work on some sort of problem and replicate it with code and see if it
11459
17:38:35,560 --> 17:38:41,640
works on your own problem. You'll see this quite often. And so now let's instantiate a model.
11460
17:38:42,200 --> 17:38:46,600
Go torch.manual_seed. We're going to instantiate our first convolutional neural network.
11461
17:38:48,520 --> 17:38:57,640
Model two equals FashionMNISTModelV2. And we are going to set the input shape.
11462
17:38:57,640 --> 17:39:04,920
Now, what will the input shape be? Well, I'll come to the layer up here. The input shape
11463
17:39:04,920 --> 17:39:12,280
is the number of channels in our images. So do we have an image ready to go? Let's check image.shape.
11464
17:39:12,920 --> 17:39:18,280
This is the number of color channels in our image. We have one. If we had color images,
11465
17:39:18,280 --> 17:39:23,640
we would set the input shape to three. So the difference between our convolutional neural network,
11466
17:39:23,640 --> 17:39:30,520
our CNN, tiny VGG, and the CNN explainer tiny VGG is that they are using color images. So
11467
17:39:30,520 --> 17:39:36,520
their input is three here. So one for each color channel, red, green and blue. Whereas we have
11468
17:39:37,160 --> 17:39:41,720
black and white images. So we have only one color channel. So we set the input shape to one.
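So the instantiation we're typing out looks something like this. A sketch, assuming class_names and device were set up earlier in the notebook and using the usual seed of 42 from this course:

torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1,                 # one color channel for grayscale images
                              hidden_units=10,               # same as the TinyVGG demo
                              output_shape=len(class_names)  # one output value per class
                              ).to(device)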
11469
17:39:42,360 --> 17:39:48,200
And then we're going to go hidden units equals 10, which is exactly the same as what tiny VGG
11470
17:39:48,200 --> 17:39:57,960
has used. 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. So that sets the hidden units value of each of our
11471
17:39:57,960 --> 17:40:04,440
layers. That's the power of creating an initializer with hidden units. And then finally, our output
11472
17:40:04,440 --> 17:40:09,320
shape is going to be, well, we've seen this before. This is going to be the length of our class names,
11473
17:40:09,320 --> 17:40:14,280
one value for each class in our data set. And of course, we're going to send this model to the
11474
17:40:14,280 --> 17:40:21,720
device. We're going to hit shift and enter. Oh, no, what did we get wrong? Out channels,
11475
17:40:21,720 --> 17:40:27,240
output shape. Where did I spell it wrong? Out channels, out channels, out channels. I forgot an L.
11476
17:40:28,040 --> 17:40:34,600
Of course, a typo. Oh, kernel size, another typo. Did you notice that?
11477
17:40:34,600 --> 17:40:42,600
Kernel size, kernel size, kernel size, kernel size. Where did we spell this wrong? Oh, here.
11478
17:40:44,440 --> 17:40:46,760
Kernel size. Are there any other typos? Probably.
11479
17:40:50,360 --> 17:40:55,640
Ah, beautiful. There we go. Okay, what have we got? Initializing zero-element tensors is a no-op.
11480
17:40:55,640 --> 17:41:00,120
Oh, so we've got an issue here, an error here, because I've got this. But this is fine, because
11481
17:41:00,120 --> 17:41:07,800
there's a trick to calculating this. We're going to cover this in another video. But
11482
17:41:07,800 --> 17:41:13,320
pat yourself on the back. We've written a fair bit of code here. This is a convolutional neural
11483
17:41:13,320 --> 17:41:19,240
network that replicates the tiny VGG architecture on the CNN explainer website. Now, don't forget,
11484
17:41:19,240 --> 17:41:24,120
your extra curriculum is to go through this website for at least 20 minutes and read about
11485
17:41:24,120 --> 17:41:29,000
what's happening in our models. We're focused on code here. But this is particularly where you
11486
17:41:29,000 --> 17:41:33,160
want to pay attention to. If you read through this understanding hyperparameters section and play around
11487
17:41:33,160 --> 17:41:37,880
with this, the next couple of videos will make a lot more sense. So read about padding,
11488
17:41:37,880 --> 17:41:43,800
read about kernel size and read about stride. I'll see you in the next video. We're going to go
11489
17:41:43,800 --> 17:41:51,080
through our network step by step. Welcome back. Now, I'm super stoked because in the last video,
11490
17:41:51,080 --> 17:41:58,120
we coded together our first ever convolutional neural network in PyTorch. So well done. We
11491
17:41:58,120 --> 17:42:03,560
replicated the tiny VGG architecture from the CNN explainer website, my favorite place for learning
11492
17:42:03,560 --> 17:42:10,600
about CNNs in the browser. So now we introduced two new layers that we haven't seen before,
11493
17:42:10,600 --> 17:42:17,800
conv2d and maxpool2d. But they have the same sort of premise as what we've been doing so far,
11494
17:42:17,800 --> 17:42:23,720
is that they're trying to learn the best features to represent our data in some way, shape or form.
11495
17:42:23,720 --> 17:42:29,640
Now, in the case of maxpool2d, it doesn't actually have any learnable parameters. It just takes
11496
17:42:29,640 --> 17:42:34,520
the max, but we're going to step through that later on. Let's use this video to step through
11497
17:42:34,520 --> 17:42:41,720
nn.Conv2d. We're going to do that with code. So I'll make a new heading here. 7.1
11498
17:42:43,080 --> 17:42:52,360
stepping through nn.Conv2d. Beautiful. Now, where could we find out what's going on
11499
17:42:52,360 --> 17:43:00,520
in nn.Conv2d? Well, of course, we have the documentation for nn.Conv2d. We've got PyTorch.
11500
17:43:00,520 --> 17:43:05,800
So if you want to learn the mathematical operation that's happening, we have this value here, this
11501
17:43:06,360 --> 17:43:11,480
operation here. Essentially, it's saying the output is equal to the bias term
11502
17:43:11,480 --> 17:43:18,040
plus the sum of the weight convolved with the input. So do you see how just the weight
11503
17:43:18,040 --> 17:43:23,560
matrix, the weight tensor and the bias value, manipulating our input in some way equals the output?
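For reference, the operation in the nn.Conv2d documentation is written roughly as

\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{\text{in}}-1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)

where \star is the 2D cross-correlation operator, N is the batch size, C is the number of channels, and H and W are the height and width.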
11504
17:43:24,440 --> 17:43:33,480
Now, if we map this, we've got batch size, channels in, height, width, channels out, out, out,
11505
17:43:34,040 --> 17:43:37,640
et cetera, et cetera. But we're not going to focus too much on this. If you'd like to
11506
17:43:37,640 --> 17:43:44,280
read more into that, you can. Let's try it with code. And we're going to reproduce this particular
11507
17:43:44,280 --> 17:43:50,760
layer here, the first layer of the CNN explainer website. And we're going to do it with a dummy input.
11508
17:43:50,760 --> 17:43:55,480
In fact, that's one of my favorite ways to test things. So I'm just going to link here the documentation.
11509
17:43:58,040 --> 17:44:09,400
See the documentation for nn.Conv2d here. And if you'd like to read through more of this,
11510
17:44:09,400 --> 17:44:12,760
of course, this is a beautiful place to learn about what's going on.
11511
17:44:12,760 --> 17:44:18,840
There's the shape, how to calculate the shape, height out, width out, et cetera. That's very
11512
17:44:18,840 --> 17:44:22,840
helpful if you need to calculate input and output shapes. But I'll show you my trick for doing so
11513
17:44:22,840 --> 17:44:30,440
later on. We have here, let's create some dummy data. So I'm going to set torch manual seed. We
11514
17:44:30,440 --> 17:44:38,600
need it to be the same size as our CNN explainer data. So 64, 64, 3. But we're going to do it
11515
17:44:38,600 --> 17:44:44,520
PyTorch style. This is color channels last. We're going to do color channels first. So how
11516
17:44:44,520 --> 17:44:52,920
about we create a batch of images? We're going to be writing torch.randn. And we're going to
11517
17:44:52,920 --> 17:45:01,480
pass in size equals 32, three, 64, 64. And then we're going to create a singular image by taking
11518
17:45:01,480 --> 17:45:09,160
the first of that. So images[0]. Now, let's get the image batch shape. Because a lot of
11519
17:45:09,160 --> 17:45:15,000
machine learning, as I've said before, and deep learning is making sure your data has the right
11520
17:45:15,000 --> 17:45:25,320
shape. So let's check images dot shape. And let's check single image shape. We're going to go test
11521
17:45:25,320 --> 17:45:32,200
image dot shape. And finally, we're going to print, what does the test image look like?
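In code, the dummy data setup we're typing is along these lines, a sketch using the image size from the CNN explainer:

import torch

torch.manual_seed(42)

# a batch of 32 random "images" in color-channels-first format: [batch, channels, height, width]
images = torch.randn(size=(32, 3, 64, 64))
test_image = images[0]  # grab a single image from the batch

print(f"Image batch shape: {images.shape}")       # torch.Size([32, 3, 64, 64])
print(f"Single image shape: {test_image.shape}")  # torch.Size([3, 64, 64])
print(f"Test image:\n{test_image}")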
11522
17:45:34,520 --> 17:45:39,240
We'll get this on a new line, hey, new line test image, this is of course not going to be an
11523
17:45:39,240 --> 17:45:44,440
actual image, it's just going to be a collection of random numbers. And of course, that is what
11524
17:45:44,440 --> 17:45:48,600
our model, model two, is currently comprised of. If we have a look at what's on the inside,
11525
17:45:49,400 --> 17:45:54,200
we are going to see a whole bunch of random numbers. Look at all this. What do we have?
11526
17:45:54,200 --> 17:46:02,120
If we scroll up, it's going to give us a name for something. We have conv block 2.2, we have a weight,
11527
17:46:02,120 --> 17:46:08,200
we have a bias, keep going up, we go right to the top, we have another weight, keep going down,
11528
17:46:08,200 --> 17:46:13,720
we have a bias, a weight, et cetera, et cetera. Now, our model is comprised of random numbers,
11529
17:46:13,720 --> 17:46:19,080
and what we are trying to do, just like with all of our other models, is pass data in and adjust the
11530
17:46:19,080 --> 17:46:25,160
random numbers within these layers to best represent our data. So let's see what happens
11531
17:46:25,160 --> 17:46:33,480
if we pass some random data through one of our Conv2d layers. So let's go here, we're going to
11532
17:46:33,480 --> 17:46:46,040
create a single Conv2d layer. So conv layer equals, what does it equal? nn.Conv2d,
11533
17:46:46,040 --> 17:46:51,960
and we're going to set the in channels equal to what? Oh, I revealed the answer too quickly.
11534
17:46:52,680 --> 17:46:59,880
Three. Why is it three? Well, it's because the in channels is the same number of color channels
11535
17:46:59,880 --> 17:47:09,400
as our images. So if we have a look at our test image shape, what do we have? Three, it has three
11536
17:47:09,400 --> 17:47:14,280
color channels. That is the same as the value here, except the order is reversed. This is color
11537
17:47:14,280 --> 17:47:20,840
channels last, PyTorch defaults to color channels first. So, or for now it does, in the future this
11538
17:47:20,840 --> 17:47:25,880
may change. So just keep that in mind. So out channels equals 10. This is equivalent to the
11539
17:47:25,880 --> 17:47:32,600
number of hidden units we have. One. Oh, I don't want that one just yet. One, two, three, four,
11540
17:47:32,600 --> 17:47:39,000
five, six, seven, eight, nine, 10. So we have that 10 there. So we have 10 there. And then we have
11541
17:47:39,000 --> 17:47:44,440
kernel size. Oh, what is the kernel? Well, it's not KFC. I can tell you that. And then we have
11542
17:47:44,440 --> 17:47:48,680
stride. And then we have padding. We're going to step through these in a second. But let's check
11543
17:47:48,680 --> 17:47:55,240
out the kernel. And this kernel can also be three by three. But it's a shortcut to just type in three.
11544
17:47:55,240 --> 17:47:59,400
So that's what it actually means. If you just type in a single number, it's equivalent to typing in
11545
17:47:59,400 --> 17:48:04,200
a tuple. Now, of course, you could find that out by reading through the documentation here.
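Here's a sketch of the single conv layer being created, with the same hyperparameter values we're discussing:

import torch
from torch import nn

torch.manual_seed(42)

# a single Conv2d layer, matching the first layer of the CNN explainer's TinyVGG
conv_layer = nn.Conv2d(in_channels=3,   # same as the number of color channels in test_image
                       out_channels=10, # equivalent to the 10 hidden units in the explainer
                       kernel_size=3,   # a 3x3 filter, shorthand for (3, 3)
                       stride=1,        # move the filter one pixel at a time
                       padding=0)       # no extra pixels around the edges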
11546
17:48:04,200 --> 17:48:09,320
But where did I get that value? Well, let's dive into this beautiful website. And let's see what's
11547
17:48:09,320 --> 17:48:15,880
happening. So we have a kernel here, which is also called a filter. So the thing I'm talking about
11548
17:48:15,880 --> 17:48:21,560
is this little square here, this kernel. Oh, we can see the weights there at the top. This is how
11549
17:48:21,560 --> 17:48:28,760
beautiful this website is. So if we go over there, this is what's going to happen. This is a convolution.
11550
17:48:28,760 --> 17:48:36,760
It starts with this little square, and it moves pixel by pixel across our image. And you'll notice
11551
17:48:36,760 --> 17:48:41,000
that the output is creating some sort of number there. And you'll notice in the middle, we have a
11552
17:48:41,000 --> 17:48:49,640
mathematical operation. This operation here is what's happening here. A weight times the input.
11553
17:48:50,920 --> 17:48:54,520
That's what we've got there. Now, the beauty of PyTorch is it does all of this behind the
11554
17:48:54,520 --> 17:48:59,160
scenes for us. So again, if you'd like to dig more into the mathematical operation behind the
11555
17:48:59,160 --> 17:49:03,480
scenes, you've got the resource here. And you've also got plenty of other resources online. We're
11556
17:49:03,480 --> 17:49:09,720
going to focus on code for now. So if we keep doing this across our entire image, we get this
11557
17:49:09,720 --> 17:49:14,680
output over here. So that's the kernel. And now where did I get three by three from? Well, look at
11558
17:49:14,680 --> 17:49:23,000
this. One, two, three, one, two, three, one, two, three, three by three, we have nine squares. Now,
11559
17:49:23,000 --> 17:49:28,680
if we scroll down, this was your extracurricular for the last video, understanding hyperparameters.
11560
17:49:28,680 --> 17:49:33,000
What happens if we change the kernel size to three by three? Have a look at the red square on the
11561
17:49:33,000 --> 17:49:38,840
left. Now, if we change it to two by two, it changed again. Three by three. This is our kernel,
11562
17:49:38,840 --> 17:49:43,880
or also known as a filter, passing across our image, performing some sort of mathematical
11563
17:49:43,880 --> 17:49:50,040
operation. And now the whole idea of a convolutional layer is to try and make sure that this kernel
11564
17:49:50,040 --> 17:49:55,880
performs the right operation to get the right output over here. Now, what do these kernels learn?
11565
17:49:55,880 --> 17:50:00,200
Well, that is entirely up to the model. That's the beauty of deep learning is that it
11566
17:50:00,200 --> 17:50:08,040
learns how to best represent our data, hopefully, on its own by looking at more data. And then so
11567
17:50:08,040 --> 17:50:13,160
if we jump back in here, so that's the equivalent of setting kernel size three by three. What if
11568
17:50:13,160 --> 17:50:17,720
we set the stride equal to one? Have we got this in the right order? It doesn't really matter.
11569
17:50:17,720 --> 17:50:25,480
Let's go through stride next. If we go to here, what does stride say? Stride of the convolution
11570
17:50:26,200 --> 17:50:32,040
of the convolving kernel. The default is one. Wonderful. Now, if we set the stride,
11571
17:50:32,040 --> 17:50:36,440
or if we keep it at one, it's a default one, it's going to hop over, watch the red square on the
11572
17:50:36,440 --> 17:50:43,480
left. It's going to hop over one pixel at a time. So the convolution, the convolving, happens one
11573
17:50:43,480 --> 17:50:48,920
pixel at a time. That's what the stride sets. Now, watch what happens to the output shape when I change the stride
11574
17:50:48,920 --> 17:50:58,680
value. Wow. Do you notice that it went down? So we have here, the kernel size
11575
17:50:58,680 --> 17:51:04,120
is still the same. But now we're jumping over two pixels at a time. Notice how on the left,
11576
17:51:04,120 --> 17:51:09,160
two pixels become available. And then if I jump over again, two pixels. So the reason why the
11577
17:51:09,160 --> 17:51:14,920
output compresses is because we're skipping some pixels as we go across the image. And now this
11578
17:51:14,920 --> 17:51:23,560
pattern happens throughout the entire network. That's one of the reasons why you see the size
11579
17:51:23,560 --> 17:51:31,160
of our input or the size of each layer go down over time. What our convolutional layer is doing,
11580
17:51:31,160 --> 17:51:36,280
and in fact, a lot of deep learning neural networks do, is they try to compress the input
11581
17:51:36,840 --> 17:51:43,960
into some representation that best suits the data. Because there would be no point in just memorizing
11582
17:51:43,960 --> 17:51:47,960
the exact patterns, you want to compress it in some way. Otherwise, you just might as well move
11583
17:51:47,960 --> 17:51:53,480
your input data around. You want to learn generalizable patterns that you can move around. And so we
11584
17:51:53,480 --> 17:51:58,280
keep going. We've got padding equals zero. Let's see what happens here. If we change the padding
11585
17:51:58,280 --> 17:52:06,600
value, what happens? Up, down. Notice the size here. Oh, we've added two extra pixels around the
11586
17:52:06,600 --> 17:52:11,960
edge. Now if we go down, one extra pixel. Now if we go zero, now why might we do that?
11587
17:52:13,000 --> 17:52:18,680
If we add some padding on the end, well, that's so that our kernel can operate on what's going on
11588
17:52:18,680 --> 17:52:23,880
here in the corner. In case there's some information on the edges of our image. Then you might be
11589
17:52:23,880 --> 17:52:28,040
thinking, Daniel, there's a whole bunch of values here. How do we know what to set them to?
11590
17:52:28,040 --> 17:52:31,880
Well, you notice that I've just copied exactly what is going on here.
11591
17:52:33,560 --> 17:52:39,080
There's a three by three kernel. There's no padding on the image. And the stride is just going
11592
17:52:39,080 --> 17:52:44,600
one by one. And so that's often very common in machine learning, is that when you're just getting
11593
17:52:44,600 --> 17:52:49,480
started and you're not sure what to set these values to, you just copy some existing
11594
17:52:49,480 --> 17:52:54,360
values from somewhere and see if it works on your own problem. And then if it doesn't, well,
11595
17:52:54,360 --> 17:53:00,600
you can adjust them. So let's see what happens when we do that. So pass the data through
11596
17:53:02,120 --> 17:53:09,320
the convolutional layer. So let's see what happens. Conv output equals conv layer.
11597
17:53:09,320 --> 17:53:15,400
Let's pass it our test image. And we'll check the conv output. What happens?
11598
17:53:17,720 --> 17:53:22,840
Oh no, we get an error. Of course we get a shape error. One of the most common issues of machine
11599
17:53:22,840 --> 17:53:28,680
learning and deep learning. So this is saying that our input for the conv layer expects a four
11600
17:53:28,680 --> 17:53:35,560
dimensional tensor, except it got a three dimensional input of size [3, 64, 64]. Now, how do we add an
11601
17:53:35,560 --> 17:53:44,920
extra dimension to our test image? Let's have a look. How would we add a batch dimension over on
11602
17:53:44,920 --> 17:53:53,240
the left here? We can go unsqueeze(0). So now we have a four dimensional tensor. Now, just keep
11603
17:53:53,240 --> 17:53:59,320
in mind that if you're running this layer, nn.Conv2d, on a PyTorch version that is, I believe
11604
17:53:59,320 --> 17:54:08,760
they fixed this or they changed it in PyTorch. What am I on? I think this Google Colab instance is
11605
17:54:08,760 --> 17:54:14,680
on 1.10. I think you might not get this error if you're running 1.11. So just keep that in mind.
11606
17:54:14,680 --> 17:54:20,920
Like this should work if you're running 1.11. But if it doesn't, you can always unsqueeze here.
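So the fix we're describing is just adding a batch dimension before the pass, something like this (continuing from the conv_layer and test_image sketches above):

# add a batch dimension with unsqueeze so the input is [batch, channels, height, width]
conv_output = conv_layer(test_image.unsqueeze(dim=0))
print(conv_output.shape)
# torch.Size([1, 10, 62, 62]) with kernel_size=3, stride=1, padding=0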
11607
17:54:22,440 --> 17:54:28,280
And let's see what happens. Look at that. We get another tensor output. Again,
11608
17:54:28,280 --> 17:54:32,520
this is just all random numbers though, because our test image is just random numbers. And our
11609
17:54:32,520 --> 17:54:38,280
conv layer is instantiated with random numbers. But we'll set the manual seed here. Now, if our
11610
17:54:38,280 --> 17:54:41,880
numbers are different, if your numbers are different to what's on my screen, don't worry
11611
17:54:41,880 --> 17:54:48,440
too much. Why is that? Because our conv layer is instantiated with random numbers. And our test
11612
17:54:48,440 --> 17:54:54,200
image is just random numbers as well. What we're paying attention to is the input and output shapes.
11613
17:54:55,240 --> 17:54:59,640
Do you see what just happened? We put our input image in there with three channels.
11614
17:54:59,640 --> 17:55:05,000
And now because we've set out channels to be 10, we've got 10. And we've got 62, 62. And this is
11615
17:55:05,000 --> 17:55:09,960
just the batch size. It just means one image. So essentially our random numbers, our test image,
11616
17:55:10,520 --> 17:55:15,080
have gone through the convolutional layer that we created, have gone through this mathematical
11617
17:55:15,080 --> 17:55:19,400
operation with regards to all the values that we've set, we've put the weight tensor, well,
11618
17:55:19,400 --> 17:55:24,360
actually PyTorch created that for us. PyTorch has done this whole operation for us. Thank you,
11619
17:55:24,360 --> 17:55:29,560
PyTorch. It's gone through all of these steps across. You could code this all by hand if you want,
11620
17:55:29,560 --> 17:55:34,920
but it's a lot easier and simpler to use a PyTorch layer. And it's done this. And now it's
11621
17:55:34,920 --> 17:55:41,720
created this output. Now, whatever this output is, I don't know, it is random numbers, but this
11622
17:55:41,720 --> 17:55:47,560
same process will happen if we use actual data as well. So let's see what happens if we change
11623
17:55:47,560 --> 17:55:54,440
the values. If we increase the kernel size, notice how our output has gotten smaller because we're using
11624
17:55:54,440 --> 17:55:59,080
a bigger kernel to convolve across the image. What if we put this back to three by three, back to what it
11625
17:55:59,080 --> 17:56:04,120
was and stride of two? What do you think will happen? Well, our output size basically halves
11626
17:56:04,120 --> 17:56:08,840
because we're skipping two pixels at a time. We'll put that back to one. What do you think will
11627
17:56:08,840 --> 17:56:16,840
happen if we set padding to one? 64, 64. We get basically the same size because we've added an
11628
17:56:16,840 --> 17:56:21,960
extra pixel around the edges. So you can play around with this. And in fact, I encourage you to
11629
17:56:21,960 --> 17:56:27,400
do this. This is what we just did: padding one, we just added an extra dummy zero pixel around the edges.
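If you want to see those effects side by side, a quick sketch like this prints the output shape for a few hyperparameter combinations. Under the hood they follow the formula from the docs, H_out = floor((H_in + 2*padding - kernel_size)/stride) + 1 (with a dilation of one):

import torch
from torch import nn

torch.manual_seed(42)
test_image = torch.randn(3, 64, 64)  # same shape as before, redefined so this sketch stands alone

for kernel_size, stride, padding in [(3, 1, 0), (5, 1, 0), (3, 2, 0), (3, 1, 1)]:
    layer = nn.Conv2d(in_channels=3, out_channels=10,
                      kernel_size=kernel_size, stride=stride, padding=padding)
    out = layer(test_image.unsqueeze(dim=0))
    print(f"kernel_size={kernel_size}, stride={stride}, padding={padding} -> {out.shape}")
# kernel_size=3, stride=1, padding=0 -> torch.Size([1, 10, 62, 62])
# kernel_size=5, stride=1, padding=0 -> torch.Size([1, 10, 60, 60])
# kernel_size=3, stride=2, padding=0 -> torch.Size([1, 10, 31, 31])
# kernel_size=3, stride=1, padding=1 -> torch.Size([1, 10, 64, 64])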
11630
17:56:27,400 --> 17:56:34,040
So practice with this, see what happens as you pass our test image, random numbers,
11631
17:56:34,040 --> 17:56:40,120
through a conv 2d layer with different values here. What do you think will happen if you change
11632
17:56:40,120 --> 17:56:48,840
this to 64? Give that a shot and I'll see you in the next video. Who's ready to step through
11633
17:56:48,840 --> 17:56:55,240
the nn.MaxPool2d layer? Put your hand up. I've got my hand up. So let's do it together, hey,
11634
17:56:55,240 --> 17:57:01,560
we've got 7.2. Now you might have already given this a shot yourself. Stepping through
11635
17:57:03,000 --> 17:57:12,200
nn.MaxPool2d. And this is what I do for a lot of different concepts that I haven't
11636
17:57:12,200 --> 17:57:17,640
gone through before is I just write some test code and see what the inputs and outputs are.
11637
17:57:18,200 --> 17:57:23,240
And so where could we find out about max pool 2d? Well, of course, we've got the documentation.
11638
17:57:23,240 --> 17:57:30,040
I'm just going to link this in here. MaxPool2d. In the simplest case, the output value of the
11639
17:57:30,040 --> 17:57:38,360
layer with input size (N, C, H, W) and output (N, C, H_out, W_out). By the way, this is number of batches,
11640
17:57:38,360 --> 17:57:44,680
color channels, height, width. And this is the output of that layer. And kernel size, which is
11641
17:57:44,680 --> 17:57:52,600
a parameter up here, (kH, kW), can be precisely described as: out is going to be the max of some
11642
17:57:52,600 --> 17:57:59,800
value, depending on the kernel size and the stride. So let's have a look at that in practice.
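To see that "max over a window" in practice before we touch our test image, here's a tiny example with hand-picked values (a sketch of my own, not from the video):

import torch
from torch import nn

# a tiny [batch, channels, height, width] tensor with easy-to-follow values
x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

max_pool = nn.MaxPool2d(kernel_size=2)  # 2x2 window, stride defaults to the kernel size
print(max_pool(x))
# tensor([[[[ 6.,  8.],
#           [14., 16.]]]])  -> each 2x2 window is replaced by its maximum, halving height and width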
11643
17:57:59,800 --> 17:58:03,960
And of course, you can read further through the documentation here. I'll just grab the link for
11644
17:58:03,960 --> 17:58:13,880
this actually. So it's here. Wonderful. And let's now first try it with our test image that we
11645
17:58:13,880 --> 17:58:20,840
created above. So just to highlight what the test image is: a bunch of random numbers in the same
11646
17:58:20,840 --> 17:58:27,160
shape as what a single image would be if we were to replicate the image size of the CNN explainer.
11647
17:58:28,440 --> 17:58:33,960
By the way, we'll have a look at a visual in a second of max pool here. But you can go through
11648
17:58:33,960 --> 17:58:39,640
that on your own time. Let's, if in doubt, code it out. So we're going to print out the original
11649
17:58:39,640 --> 17:58:48,760
image shape without unsqueezed dimension. Because recall that we had to add an extra dimension to
11650
17:58:48,760 --> 17:58:53,800
pass it through our Conv2d layer. Now, if you're using a later version of PyTorch, you might not
11651
17:58:53,800 --> 17:58:58,600
get an error if you only use a three dimensional image tensor and pass it through a conv layer.
11652
17:58:59,400 --> 17:59:08,520
So we're going to pass it in test image, original shape, test image dot shape. So this is just going
11653
17:59:08,520 --> 17:59:13,800
to tell us what the line of code in the cell above tells us. But that's fine. I like to make
11654
17:59:13,800 --> 17:59:21,640
pretty printouts, you know, test image with unsqueezed dimension. So this is just going to be our test
11655
17:59:21,640 --> 17:59:28,040
image. And we're going to see what happens when we unsqueeze a dimension, unsqueeze on zero
11656
17:59:28,040 --> 17:59:34,360
for the dimension. I was about to say first, but it's the zeroth. Now we're going to create a sample
11657
17:59:34,360 --> 17:59:44,680
nn.MaxPool2d layer. Because remember, even layers themselves in torch.nn are models
11658
17:59:45,480 --> 17:59:49,960
of their own accord. So we can just create a single, this is like creating a single layer model here.
11659
17:59:50,520 --> 17:59:55,400
We'll set the kernel size equal to two. And recall, if we go back to CNN explainer,
11660
17:59:55,960 --> 18:00:01,960
kernel size equal to two results in a two by two square, a two by two kernel that's going to
11661
18:00:01,960 --> 18:00:09,960
convolve over our image, like so. And this is an example input, an example output. And you can see
11662
18:00:09,960 --> 18:00:15,400
the operation that max pooling does here. So just keep that in mind as we pass some sample data
11663
18:00:15,400 --> 18:00:21,960
through our max pool layer. And now let's pass data through it. I actually will pass it through
11664
18:00:21,960 --> 18:00:28,040
just the conv layer first, through just the conv layer. Because that's sort of how you might stack
11665
18:00:28,040 --> 18:00:32,120
things, you might put a convolutional layer and then a max pool layer on top of that convolutional
11666
18:00:32,120 --> 18:00:39,640
layer. So test image through conv. We'll create a variable here, equals our conv layer.
11667
18:00:41,880 --> 18:00:48,840
It's going to take as an input our test image dot unsqueeze on the zero dimension again.
11668
18:00:50,280 --> 18:00:55,880
Beautiful. Now we're going to print out the shape here. This is just highlighting how I
11669
18:00:55,880 --> 18:01:00,440
like to troubleshoot things is I do one step, print the shape, one step, print the shape,
11670
18:01:00,440 --> 18:01:08,840
see what is happening as our data moves through various layers. So test image through conv.shape,
11671
18:01:08,840 --> 18:01:16,440
we'll see what our conv layer does to the shape of our data. And then we're going to pass data through
11672
18:01:19,480 --> 18:01:24,200
max pool layer, which is the layer we created a couple of lines above this one here.
11673
18:01:24,200 --> 18:01:34,440
So let's see what happens. Test image through conv and max
11674
18:01:34,440 --> 18:01:41,400
pool. So quite a long variable name here, but this is to help us avoid confusion of what's
11675
18:01:41,400 --> 18:01:47,320
going on. So we go test image through conv. So you notice how we're taking the output of our
11676
18:01:47,320 --> 18:01:53,720
convolutional layer, this here, and we're passing it through our max pool layer, which has another
11677
18:01:53,720 --> 18:02:02,840
typo. Wonderful. And finally, we'll print out the shape, shape after going through conv layer
11678
18:02:04,120 --> 18:02:13,160
and max pool layer. What happens here? So we want test image through conv and max pool.
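For reference, the cell being typed here ends up looking roughly like this. It's a sketch: test_image and conv_layer were created just above, and the exact hyperparameter values are whatever you set there, so treat the numbers as illustrative.

```python
import torch
from torch import nn

torch.manual_seed(42)

# A fake "image" like the one created earlier: [color_channels, height, width]
test_image = torch.randn(size=(3, 64, 64))

# A single conv layer and a single max pool layer (hyperparameters are illustrative)
conv_layer = nn.Conv2d(in_channels=3, out_channels=10,
                       kernel_size=3, stride=1, padding=1)
max_pool_layer = nn.MaxPool2d(kernel_size=2)

print(f"Test image original shape: {test_image.shape}")
print(f"Test image with unsqueezed dimension: {test_image.unsqueeze(dim=0).shape}")

# Pass through just the conv layer first (unsqueeze adds a batch dimension)
test_image_through_conv = conv_layer(test_image.unsqueeze(dim=0))
print(f"Shape after going through conv_layer(): {test_image_through_conv.shape}")

# Then pass the conv output through the max pool layer
test_image_through_conv_and_max_pool = max_pool_layer(test_image_through_conv)
print(f"Shape after going through conv_layer() and max_pool_layer(): "
      f"{test_image_through_conv_and_max_pool.shape}")
```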
11679
18:02:13,960 --> 18:02:21,000
Let's see how our max pool layer manipulates our test images shape. You ready? Three, two,
11680
18:02:21,000 --> 18:02:27,240
one, let's go. What do we get? Okay. So we have the test image original shape,
11681
18:02:27,240 --> 18:02:32,840
recall that our test image is just a collection of random numbers. And of course, our conv layer
11682
18:02:33,480 --> 18:02:39,480
is going to be instantiated with random numbers. And max pool actually has no parameters. It just
11683
18:02:39,480 --> 18:02:48,920
takes the maximum of a certain range of an input tensor. So when we unsqueeze the test image as the
11684
18:02:48,920 --> 18:02:55,480
input, we get an extra dimension here. When we pass it through our conv layer. Oh, where did this
11685
18:02:55,480 --> 18:03:03,640
64 come from? 1, 64, 64, 64. Let's go back up to our conv layer. Do you notice how we get the
11686
18:03:03,640 --> 18:03:09,320
64 there because we changed the out channels value? If we change this back to 10, like what's in the
11687
18:03:09,320 --> 18:03:17,240
CNN explainer model? One, two, three, four, five, six, seven, eight, nine, 10. What do you think will
11688
18:03:17,240 --> 18:03:25,560
happen there? Well, we get a little highlight here. 10. Then we keep going. I'll just get rid of
11689
18:03:25,560 --> 18:03:29,080
this extra cell. We don't need to check the version anymore. We'll check the test image
11690
18:03:29,080 --> 18:03:36,760
shape is still 3, 64, 64. But then as we pass it through the conv layer here, we get a different
11691
18:03:36,760 --> 18:03:42,120
size now. So it originally had three channels as the input for color channels, but we've upscaled
11692
18:03:42,120 --> 18:03:51,640
it to 10 so that we have 10 hidden units in our layer. And then we have 64 64. Now, again,
11693
18:03:51,640 --> 18:03:56,920
these shapes will change if we change the values of what's going on here. So we might put padding
11694
18:03:56,920 --> 18:04:05,240
to zero. What happens there? Instead of 64 64, we get 62 62. And then what happens after we pass
11695
18:04:05,240 --> 18:04:15,640
it through the conv layer and then through the max pool layer? We've got 1, 10, 64, 64. And now we have
11696
18:04:15,640 --> 18:04:23,240
1, 10, 32, 32. Now, why is that? Well, let's go back into the CNN explainer, jump into this max pool
11697
18:04:23,240 --> 18:04:27,960
layer here. Maybe this one because it's got a bit more going on. Do you notice on the left here is
11698
18:04:27,960 --> 18:04:33,400
the input? And we've got a two by two kernel here. And so the max pooling layer, what it does is it
11699
18:04:33,400 --> 18:04:39,960
takes the maximum of whatever the input is. So you'll notice the input is 60 60 in this case.
11700
18:04:40,920 --> 18:04:47,080
Whereas the output over here is 30 30. Now, why is that? Well, because the max operation here is
11701
18:04:47,080 --> 18:04:53,720
reducing it from a section of four numbers. So let's get one with a few different numbers.
11702
18:04:55,720 --> 18:05:01,240
There we go. That'll do. So it's taking it from four numbers and finding the maximum value within
11703
18:05:01,240 --> 18:05:09,480
those four numbers here. Now, why would it do that? So as we've discussed before, what a deep learning
11704
18:05:09,480 --> 18:05:15,560
neural network is trying to do or in this case, a CNN is take some input data and figure out
11705
18:05:15,560 --> 18:05:21,480
what features best represent whatever the input data is and compress them into a feature vector
11706
18:05:21,480 --> 18:05:28,680
that is going to be our output. Now, the reason being for that is because you could consider it
11707
18:05:28,680 --> 18:05:32,760
from a neural network's perspective is that intelligence is compression. So you're trying to
11708
18:05:32,760 --> 18:05:39,720
compress the patterns that make up actual data into a smaller vector space, go from a higher
11709
18:05:39,720 --> 18:05:46,280
dimensional space to a smaller vector space in terms of dimensionality of a tensor. But still,
11710
18:05:46,280 --> 18:05:52,520
this smaller dimensionality space represents the original data and can be used to predict on future
11711
18:05:52,520 --> 18:05:59,880
data. So that's the idea behind max pool is, hey, if we've got these learned features from our
11712
18:05:59,880 --> 18:06:05,960
convolutional layers, will the patterns, will the most important patterns stay around if we just
11713
18:06:05,960 --> 18:06:11,560
take the maximum of a certain section? So do you notice how the input here, we still have,
11714
18:06:11,560 --> 18:06:16,760
you can still see the outline of the car here, albeit a little bit more pixelated,
11715
18:06:16,760 --> 18:06:21,960
but just by taking the max of a certain region, we've got potentially the most important feature
11716
18:06:21,960 --> 18:06:27,880
of that little section. And now, of course, you could customize this value here. When we
11717
18:06:27,880 --> 18:06:32,520
create our max pool layer, you could increase the kernel size to four by four. What do you think
11718
18:06:32,520 --> 18:06:38,600
will happen if we can increase it to four? So here, we've got a two by two kernel. If we increase it
11719
18:06:38,600 --> 18:06:46,600
to four by four, what happens? Ah, do you notice that we've gone from 62 to 15, we've essentially
11720
18:06:46,600 --> 18:06:53,560
divided our feature space by four, we've compressed it even further. Now, will that work? Well,
11721
18:06:53,560 --> 18:06:57,560
I'm not sure. That's part of the experimental nature of machine learning, but we're going to
11722
18:06:57,560 --> 18:07:04,680
keep it at two for now. And so this is with our tensor here 6464. But now let's do the same as
11723
18:07:04,680 --> 18:07:09,800
what we've done above, but we'll do it with a smaller tensor so that we can really visualize
11724
18:07:09,800 --> 18:07:17,880
things. And we're going to just replicate the same operation that's going on here. So let's go here,
11725
18:07:17,880 --> 18:07:25,320
we'll create another random tensor. We'll set up the manual seed first. And we're going to create
11726
18:07:25,320 --> 18:07:35,160
a random tensor with a similar number of dimensions. Now, recall, the number of dimensions doesn't tell you the size of each, so this
11727
18:07:35,160 --> 18:07:43,080
has four dimensions, 1, 3, 64, 64. The dimensions can have different values within
11728
18:07:43,080 --> 18:07:50,920
themselves. So we want to create a four dimensional tensor similar to our images. So what that means is,
11729
18:07:50,920 --> 18:07:57,560
let me just show you, it's way easier to explain things when we've got code, with torch dot rand n.
11730
18:07:57,560 --> 18:08:05,400
And we're going to set it up as size equals one, one, two, two. We can have a look at this random
11731
18:08:05,400 --> 18:08:12,520
tensor. It's got four dimensions. One, two, three, four. So you could have a batch size,
11732
18:08:12,520 --> 18:08:18,280
color channels, and height width, a very small image, but it's a random image here. But this is
11733
18:08:18,280 --> 18:08:24,520
quite similar to what we've got going on here, right? Four numbers. Now, what do you think will
11734
18:08:24,520 --> 18:08:31,160
happen if we create a max pool layer, just like we've done above, create a max pool layer. So we
11735
18:08:31,160 --> 18:08:36,760
go max pool layer, just repeating the code that we have in the cell above, that's all right,
11736
18:08:36,760 --> 18:08:46,280
a little bit of practice. Kernel size equals two. And then we're going to pass the random tensor
11737
18:08:46,280 --> 18:09:05,000
through the max pool layer. So we'll go max pool tensor equals max pool layer. And we're going
11738
18:09:05,000 --> 18:09:10,600
to pass it in the random tensor. Wonderful. And then we can print out some shapes and print
11739
18:09:10,600 --> 18:09:15,480
out some tenses. As we always do to visualize, visualize, visualize. So we're going to write in
11740
18:09:15,480 --> 18:09:24,680
here max pool tensor on a new line. We'll get in the max pool tensor. We'll see what this looks
11741
18:09:24,680 --> 18:09:32,760
like. And we'll also print out max pool tensor shape. And we can probably print out random tensor
11742
18:09:32,760 --> 18:09:39,160
itself, as well as its shape as well. We'll get the shape here, dot shape. And we'll do the same
11743
18:09:39,160 --> 18:09:52,920
for the random tensor. So print, get a new line, random tensor, new line, random tensor. And then
11744
18:09:52,920 --> 18:10:02,360
we'll get the shape. Random tensor shape, random tensor. Oh, a lot of coding here. That's, that's
11745
18:10:02,360 --> 18:10:06,760
the fun part about machine learning, right? You get to write lots of code. Okay. So we're
11746
18:10:06,760 --> 18:10:11,160
visualizing what's going on with our random tensor. This is what's happening within the max pool layer.
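Here's a minimal sketch of that smaller demo, assuming the same kernel size of two:

```python
import torch
from torch import nn

torch.manual_seed(42)

# A tiny 4D tensor: [batch_size, color_channels, height, width]
random_tensor = torch.randn(size=(1, 1, 2, 2))

# Max pool layer with a 2x2 kernel
max_pool_layer = nn.MaxPool2d(kernel_size=2)

# Pass the random tensor through the max pool layer
max_pool_tensor = max_pool_layer(random_tensor)

print(f"Random tensor:\n{random_tensor}")
print(f"Random tensor shape: {random_tensor.shape}")      # torch.Size([1, 1, 2, 2])
print(f"\nMax pool tensor:\n{max_pool_tensor}")           # just the single maximum value
print(f"Max pool tensor shape: {max_pool_tensor.shape}")  # torch.Size([1, 1, 1, 1])
```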
11747
18:10:11,160 --> 18:10:15,320
We've seen this from a few different angles now. So we have a random tensor of numbers,
11748
18:10:15,320 --> 18:10:21,160
and we've got a size here. But the max pool tensor, once we pass our random tensor,
11749
18:10:21,800 --> 18:10:30,840
through the max pool layer, what happens? Well, we have 0.3367, 0.1288, 0.2345, 0.2303. Now,
11750
18:10:30,840 --> 18:10:37,720
what's the max of all these? Well, it takes the max, which here is 0.3367. Oh, and we've got the random
11751
18:10:37,720 --> 18:10:44,360
tensor down there. We don't want that. And see how we've reduced the shape from two by two to one
11752
18:10:44,360 --> 18:10:51,560
by one. Now, what's going on here? Just for one last time to reiterate, the convolutional layer
11753
18:10:52,200 --> 18:10:59,080
is trying to learn the most important features within an image. So if we jump into here,
11754
18:10:59,080 --> 18:11:06,200
now, what are they? Well, we don't decide what a convolutional layer learns. It learns these
11755
18:11:06,200 --> 18:11:12,200
features on its own. So the convolutional layer learns those features. We pass them through a
11756
18:11:12,200 --> 18:11:18,120
relu nonlinear activation in case our data requires nonlinear functions. And then we pass
11757
18:11:18,120 --> 18:11:24,600
those learned features through a max pool layer to compress them even further. So the convolutional
11758
18:11:24,600 --> 18:11:29,880
layer can compress the features into a smaller space. But the max pooling layer really compresses
11759
18:11:29,880 --> 18:11:36,440
them. So that's the entire idea. One more time, we start with some input data. We design a neural
11760
18:11:36,440 --> 18:11:41,000
network, in this case, a convolutional neural network, to learn a compressed representation
11761
18:11:41,000 --> 18:11:46,360
of what our input data is, so that we can use this compressed representation to later on make
11762
18:11:46,360 --> 18:11:52,360
predictions on images of our own. And in fact, you can try that out if you wanted to click here
11763
18:11:52,360 --> 18:11:58,760
and add your own image. So I'd give that a go. That's your extension for this video. But now we've
11764
18:11:58,760 --> 18:12:06,520
stepped through the max pool 2D layer and the conv 2D layer. I think it's time we started to try
11765
18:12:06,520 --> 18:12:14,200
and use our tiny VGG network. This is your challenge is to create a dummy tensor and pass it through
11766
18:12:14,200 --> 18:12:21,160
this model. Pass it through its forward layer and see what happens to the shape of your dummy tensor
11767
18:12:21,160 --> 18:12:28,280
as it moves through conv block 1 and conv block 2. And I'll show you my trick to calculating
11768
18:12:28,280 --> 18:12:33,000
the in features here for this final layer, which is equivalent to this final layer here.
11769
18:12:34,120 --> 18:12:35,160
I'll see you in the next video.
11770
18:12:37,800 --> 18:12:42,120
Over the last few videos, we've been replicating the tiny VGG architecture
11771
18:12:42,120 --> 18:12:47,400
from the CNN explainer website. And I hope you know that this is this actually quite exciting
11772
18:12:47,400 --> 18:12:53,160
because years ago, this would have taken months of work. And we've just covered we've broken it
11773
18:12:53,160 --> 18:12:57,720
down over the last few videos and rebuilt it ourselves with a few lines of PyTorch code.
11774
18:12:58,760 --> 18:13:04,520
So that just goes to show how powerful PyTorch is and how far the deep learning field has come.
11775
18:13:04,520 --> 18:13:10,040
But we're not finished yet. Let's just go over to our keynote. This is what we've done.
11776
18:13:10,040 --> 18:13:18,200
CNN explainer model. We have an input layer. We've created that. We have conv2d layers.
11777
18:13:18,760 --> 18:13:23,400
We've created those. We have ReLU activation layers. We've created those.
11778
18:13:24,280 --> 18:13:29,400
And finally, we have pooling layers. And then we finish off with an output layer.
11779
18:13:30,040 --> 18:13:34,360
But now let's see what happens when we actually pass some data through this entire model.
11780
18:13:34,360 --> 18:13:41,480
And as I've said before, this is actually quite a common practice is you replicate a model
11781
18:13:41,480 --> 18:13:46,920
that you found somewhere and then test it out with your own data. So we're going to start off
11782
18:13:46,920 --> 18:13:53,240
by using some dummy data to make sure that our model works. And then we're going to pass through.
11783
18:13:53,240 --> 18:13:58,520
Oh, I've got another slide for this. By the way, here's a breakdown of torch dot
11784
18:13:58,520 --> 18:14:04,280
nn dot Conv2d. If you'd like to see it in text form, nothing here that we really haven't discussed before, but
11785
18:14:04,280 --> 18:14:10,200
this will be in the slides if you would like to see it. Then we have a video animation.
11786
18:14:10,200 --> 18:14:14,840
We've seen this before, though. And plus, I'd rather you go through the CNN explainer website
11787
18:14:14,840 --> 18:14:18,840
on your own and explore these different values rather than me just keep talking about it.
11788
18:14:19,480 --> 18:14:24,920
Here's what we're working towards doing. We have a fashion MNIST data set. And we have
11789
18:14:24,920 --> 18:14:30,040
our inputs. We're going to numerically encode them. We've done that already. Then we have our
11790
18:14:30,040 --> 18:14:35,960
convolutional neural network, which is a combination of convolutional layers, nonlinear activation
11791
18:14:35,960 --> 18:14:40,600
layers, pooling layers. But again, these could be comprised in many different ways, shapes and
11792
18:14:40,600 --> 18:14:46,440
forms. In our case, we've just replicated the tiny VGG architecture. And then finally,
11793
18:14:46,440 --> 18:14:52,200
we want to have an output layer to predict what class of clothing a particular input image is.
11794
18:14:52,200 --> 18:15:02,360
And so let's go back. We have our CNN model here. And we've got model two. So let's just practice
11795
18:15:02,360 --> 18:15:06,920
a dummy forward pass here. We're going to come back up a bit to where we were. We'll make sure
11796
18:15:06,920 --> 18:15:15,240
we've got model two. And we get an error here because I've multiplied this by zero. So I'm going to
11797
18:15:15,240 --> 18:15:21,080
just remove that and keep it there. Let's see what happens if we create a dummy tensor and pass it
11798
18:15:21,080 --> 18:15:29,320
through here. Now, if you recall what our image is, do we have image? This is a fashion MNIST
11799
18:15:29,320 --> 18:15:38,280
image. So I wonder if we can go plt dot imshow, image. And I'm going to squeeze that.
11800
18:15:38,280 --> 18:15:48,440
And I'm going to set the cmap equal to gray. So this is our current image. Wonderful.
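In code, the plot being described is roughly this (image is the Fashion MNIST sample pulled out earlier in the notebook, so treat the variable name as an assumption):

```python
import matplotlib.pyplot as plt

# image has shape [1, 28, 28]; squeeze() drops the single color channel
# so matplotlib gets a [28, 28] array it can display
plt.imshow(image.squeeze(), cmap="gray")
```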
11801
18:15:48,440 --> 18:15:56,040
So there's our current image. So let's create a tensor. Or maybe we just try to pass this through
11802
18:15:56,040 --> 18:16:02,600
the model and see what happens. How about we try that model image? All right, we're going to try
11803
18:16:02,600 --> 18:16:12,600
the first forward pass. So pass image through model. What's going to happen? Well, we get an
11804
18:16:12,600 --> 18:16:17,400
error. Another shape mismatch. We've seen this before. How do we deal with this? Because what
11805
18:16:17,400 --> 18:16:26,440
is the shape of our current image? 1, 28, 28. Now, if you don't have this image instantiated,
11806
18:16:26,440 --> 18:16:33,080
you might have to go back up a few cells. Where did we create image? I'll just find this. So
11807
18:16:33,080 --> 18:16:38,280
just we created this a fairly long time ago. So I'm going to probably recreate it down the
11808
18:16:38,280 --> 18:16:45,080
bottom. My goodness, we've written a lot of code. Well, that'll do us. We could create a dummy tensor
11809
18:16:45,080 --> 18:16:51,880
if we wanted to. How about we do that? And then if you want to find, oh, right back up here,
11810
18:16:51,880 --> 18:16:57,720
we have an image. How about we do that? We can just do it with a dummy tensor. That's fine.
11811
18:16:58,760 --> 18:17:03,560
We can create one of the same size. But if you have image instantiated, you can try that out.
11812
18:17:03,560 --> 18:17:10,040
So there's an image. Let's now create an image that is, or a random tensor, that is the same
11813
18:17:10,040 --> 18:17:20,840
shape as our image. So rand image tensor equals what torch dot rand n. And we're going to pass in
11814
18:17:21,400 --> 18:17:28,520
size equals 1, 28, 28. Then if we get rand image tensor,
11815
18:17:32,440 --> 18:17:37,240
we check its shape. What do we get? So the same shape as our test image here,
11816
18:17:37,240 --> 18:17:40,280
but it's just going to be random numbers. But that's okay. We just want to highlight a point
11817
18:17:40,280 --> 18:17:45,800
here of input and output shapes. We want to make sure our model works. Can our random image tensor
11818
18:17:45,800 --> 18:17:50,360
go all the way through our model? That's what we want to find out. So we get an error here.
11819
18:17:50,360 --> 18:17:54,600
We have four dimensions, but our image is three dimensions. How do we add an extra dimension
11820
18:17:54,600 --> 18:17:58,920
for batch size? Now you might not get this error if you're running a later version of PyTorch.
11821
18:17:58,920 --> 18:18:07,400
Just keep that in mind. So unsqueeze zero. Oh, expected all tensors to be on the same device,
11822
18:18:07,400 --> 18:18:12,200
but found at least two devices. Again, we're going through all the three major issues in deep
11823
18:18:12,200 --> 18:18:17,960
learning. Shape mismatch, device mismatch, data type mismatch. So let's put this on the device,
11824
18:18:17,960 --> 18:18:21,640
to the target device, because we've set up device agnostic code.
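Putting those fixes together, the forward pass attempt looks roughly like this (model_2 and the device variable come from the device-agnostic setup earlier; a sketch of the call, not the final code):

```python
import torch

# Random tensor with the same shape as a Fashion MNIST image: [1, 28, 28]
rand_image_tensor = torch.randn(size=(1, 28, 28))

# Add a batch dimension with unsqueeze(0) and send it to the same device as the model,
# then pass it through the model (this is where the remaining shape error shows up)
model_2(rand_image_tensor.unsqueeze(0).to(device))
```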
11825
18:18:21,640 --> 18:18:27,960
Mat1 and mat2 shapes cannot be multiplied. Oh, but we do get some output here.
11826
18:18:28,920 --> 18:18:34,040
That is very exciting. So what I might do is move this a couple of cells up so that we can
11827
18:18:34,040 --> 18:18:40,760
tell what's going on. I'm going to delete this cell. So where do these shapes come from?
11828
18:18:42,040 --> 18:18:46,120
Well, we printed out the shapes there. And so this is what's happened when our,
11829
18:18:46,120 --> 18:18:51,320
I'll just create our random tensor. I'll bring our random tensor up a bit too. Let's bring this up.
11830
18:18:53,080 --> 18:19:01,960
There we go. So we pass our random image tensor through our model, and we've made sure it's
11831
18:19:01,960 --> 18:19:07,240
got four dimensions by unsqueeze zero. And we make sure it's on the same device as our model,
11832
18:19:07,240 --> 18:19:12,440
because our model has been sent to the GPU. And this is what happens as we pass our random
11833
18:19:12,440 --> 18:19:19,000
image tensor. We've got 1, 28, 28, whereas previously we've seen 64, 64, 3. I'm just going to clean this
11834
18:19:19,000 --> 18:19:27,160
up a bit. And we get different shapes here. So you'll notice that as our input, if it was 64, 64, 3,
11835
18:19:27,160 --> 18:19:33,400
goes through these layers, it gets shaped into different values. Now this is going to be universal
11836
18:19:33,400 --> 18:19:38,120
across all of the different data sets you work on, you will be working with different shapes.
11837
18:19:38,120 --> 18:19:44,200
So it's important to, and also quite fun, to troubleshoot what shapes you need to use for
11838
18:19:44,200 --> 18:19:48,360
your different layers. So this is where my trick comes in. To find out the shapes for different
11839
18:19:48,360 --> 18:19:53,720
layers, I often construct my models, how we've done here, as best I can with the information
11840
18:19:53,720 --> 18:19:58,360
that I've got, such as replicating what's here. But I don't really know what the output
11841
18:19:58,360 --> 18:20:03,480
shape is going to be before it goes into this final layer. And so I recreate the model as best
11842
18:20:03,480 --> 18:20:10,040
I can. And then I pass data through it in the form of a dummy tensor in the same shape as my
11843
18:20:10,040 --> 18:20:15,000
actual data. So we could customize this to be any shape that we wanted. And then I print the
11844
18:20:15,000 --> 18:20:21,880
shapes of what's happening through each of the forward pass steps. And so if we pass it through
11845
18:20:21,880 --> 18:20:27,560
this random tensor through the first conv block, it goes through these layers here. And then it
11846
18:20:27,560 --> 18:20:33,240
outputs a tensor with this size. So we've got 10, because that's how many output channels we've
11847
18:20:33,240 --> 18:20:41,640
set. And then 14, 14, because our 28, 28 tensor has gone through a max pool 2d layer and gone through
11848
18:20:41,640 --> 18:20:47,800
a convolutional layer. And then it goes through the next block, conv block two, which is because
11849
18:20:47,800 --> 18:20:52,440
we've put it in the forward method here. And then it outputs the shape. And if we go back down,
11850
18:20:53,080 --> 18:20:59,000
we now have a shape of 1, 10, 7, 7. So our previous tensor, the output of conv block one,
11851
18:20:59,000 --> 18:21:06,200
has gone from 14, 14 to 7, 7. So it's been compressed. So let me just write this down here,
11852
18:21:06,760 --> 18:21:13,880
output shape of conv block one, just so we get a little bit more information.
11853
18:21:15,800 --> 18:21:23,160
And I'm just going to copy this, put it in here, that will become block two.
11854
18:21:23,160 --> 18:21:31,160
And then finally, I want to know if I get an output shape of classifier.
11855
18:21:31,160 --> 18:21:39,160
So if I rerun all of this, I don't get an output shape of classifier. So my model is running into
11856
18:21:39,160 --> 18:21:45,160
trouble. Once it gets to, so I get the output of conv block one, I don't get an output of classifier.
11857
18:21:45,160 --> 18:21:51,160
So this is telling me that I have an issue with my classifier layer. Now, I know this
11858
18:21:51,160 --> 18:21:57,080
because, well, I've coded this model before, and the in features here,
11859
18:21:57,080 --> 18:22:00,600
we need a special calculation. So what is going on with our shapes?
11860
18:22:02,200 --> 18:22:07,880
Mat one and mat two shapes cannot be multiplied. So do you see here, what is the rule of matrix
11861
18:22:07,880 --> 18:22:12,840
multiplication? The inner dimensions here have to match. We've got 490. Where could that number
11862
18:22:12,840 --> 18:22:21,080
have come from? And we've got 10 times 10. Now, okay, I know I've set hidden units to 10.
11863
18:22:21,080 --> 18:22:28,680
So maybe that's where that 10 came from. And what is the output shape of conv
11864
18:22:28,680 --> 18:22:37,320
block two? So if we look, we've got the output shape of conv block two. Where does that go?
11865
18:22:38,520 --> 18:22:45,640
The output of conv block two goes into our classifier model. And then it gets flattened.
11866
18:22:45,640 --> 18:22:51,960
So that's telling us something there. And then our NN linear layer is expecting the output of
11867
18:22:51,960 --> 18:22:59,720
the flatten layer as its in features. So this is where my trick comes into play. I pass the
11868
18:22:59,720 --> 18:23:06,600
output of conv block two into the classifier layer. It gets flattened. And then that's what
11869
18:23:06,600 --> 18:23:16,840
my nn dot Linear layer is expecting. So what happens if we flatten this shape here? Do we get
11870
18:23:16,840 --> 18:23:28,600
this value? Let's have a look. So if we go 10 times seven times seven, 490. Now, where was this 10?
11871
18:23:28,600 --> 18:23:38,120
Well, that's our hidden units. And where were these sevens? Well, these sevens are the output
11872
18:23:38,120 --> 18:23:45,400
of conv block two. So that's my trick. I print the shapes of previous layers and see whether or
11873
18:23:45,400 --> 18:23:52,920
not they line up with subsequent layers. So if we go times seven, times seven, we're going to have
11874
18:23:52,920 --> 18:23:58,120
hidden units equals 10 times seven times seven. Where do we get the two sevens? Because that is
11875
18:23:58,120 --> 18:24:03,560
the output shape of conv block two. Do you see how this can be a little bit hard to calculate ahead
11876
18:24:03,560 --> 18:24:10,120
of time? Now, you could calculate this by hand if you went into nn dot Conv2d. But I prefer to write
11877
18:24:10,120 --> 18:24:15,400
code to calculate things for me. You can calculate that value by hand. If you go through,
11878
18:24:16,280 --> 18:24:22,360
the H out and W out formulas, you can add together all of the different parameters and multiply them and divide
11879
18:24:22,360 --> 18:24:27,640
them and whatnot. You can calculate the input and output shapes of your convolutional layers.
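If you did want to do it by hand, the height/width formula from the torch.nn.Conv2d documentation can be wrapped in a small helper like this (a sketch, not part of the course notebook):

```python
def conv2d_output_size(in_size: int, kernel_size: int,
                       stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Height or width of a Conv2d output, per the torch.nn.Conv2d docs formula."""
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Example: a 64x64 input with kernel_size=3, stride=1, padding=0 -> 62x62
print(conv2d_output_size(in_size=64, kernel_size=3))  # 62
```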
11880
18:24:28,200 --> 18:24:34,920
You're more than welcome to try that out by hand. But I prefer to code it out. If in doubt, code it
11881
18:24:34,920 --> 18:24:42,200
out. Now, let's see what happens if we run our random image tensor through our model. Now,
11882
18:24:42,200 --> 18:24:47,480
do you think it will work? Well, let's find out. All we've done is we've added this little line
11883
18:24:47,480 --> 18:24:53,720
here, times seven times seven. And we've calculated that because we've gone, huh, what if we pass a
11884
18:24:53,720 --> 18:25:00,280
tensor of this dimension through a flattened layer? And what is our rule of matrix multiplication?
11885
18:25:00,280 --> 18:25:06,280
The inner dimensions here must match. And why do we know that these are matrices? Well,
11886
18:25:06,280 --> 18:25:10,840
mat one and mat two shapes cannot be multiplied. And we know that inside a linear layer
11887
18:25:10,840 --> 18:25:19,160
is a matrix multiplication. So let's now give this a go. We'll see if it works.
11888
18:25:22,040 --> 18:25:28,680
Oh, ho ho. Would you look at that? That is so exciting. We have the output shape of the classifier
11889
18:25:28,680 --> 18:25:35,960
is one and 10. We have a look, we have one number one, two, three, four, five, six, seven, eight,
11890
18:25:35,960 --> 18:25:45,400
nine, 10, one number for each class in our data set. Wow. Just like the CNN explainer website,
11891
18:25:45,400 --> 18:25:51,240
we have 10 outputs here. We just happen to have 10 classes as well. Now, this number again could be
11892
18:25:51,240 --> 18:25:55,160
whatever you want. It could be 100, could be 30, could be three, depending on how many classes
11893
18:25:55,160 --> 18:26:01,160
you have. But we have just figured out the input and output shapes of each layer in our model.
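Here's the trick sketched in code. The names (conv_block_1, conv_block_2, classifier, hidden_units, class_names, model_2, device) mirror what's being built in the notebook; if your names differ, treat them as assumptions.

```python
# 1. Temporary shape printouts inside the model's forward() method:
#
#     def forward(self, x):
#         x = self.conv_block_1(x)
#         print(f"Output shape of conv_block_1: {x.shape}")
#         x = self.conv_block_2(x)
#         print(f"Output shape of conv_block_2: {x.shape}")
#         x = self.classifier(x)
#         print(f"Output shape of classifier: {x.shape}")
#         return x

# 2. A dummy forward pass with a tensor shaped like the real data
import torch
rand_image_tensor = torch.randn(size=(1, 28, 28))
model_2(rand_image_tensor.unsqueeze(0).to(device))

# 3. The printout shows conv_block_2 outputs [1, 10, 7, 7], so after nn.Flatten()
#    the classifier's linear layer needs:
#    nn.Linear(in_features=hidden_units * 7 * 7, out_features=len(class_names))
```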
11894
18:26:01,160 --> 18:26:08,680
So that's very exciting. I think it's now time we've passed a random tensor through. How about we
11895
18:26:08,680 --> 18:26:14,280
pass some actual data through our model? In the next video, let's use our train and test step
11896
18:26:14,280 --> 18:26:19,800
functions to train our first convolutional neural network. I'll see you there.
11897
18:26:24,120 --> 18:26:28,600
Well, let's get ready to train our first CNN. So what do we need? Where are we up to in the
11898
18:26:28,600 --> 18:26:33,720
workflow? Well, we've built a model and we've stepped through it. We know what's going on,
11899
18:26:33,720 --> 18:26:39,960
but let's really see what's going on by training this CNN or see if it trains because we don't
11900
18:26:39,960 --> 18:26:46,680
always know if it will on our own data set, which is of fashion MNIST. So we're going to set up a
11901
18:26:46,680 --> 18:26:54,520
loss function and optimizer for model two. And just as we've done before, model two, turn that
11902
18:26:54,520 --> 18:27:00,200
into markdown. I'll just show you the workflow again. So this is what we're doing. We've got some
11903
18:27:00,200 --> 18:27:06,040
inputs. We've got a numerical encoding. We've built this architecture and hopefully it helps us
11904
18:27:06,040 --> 18:27:13,160
learn or it helps us make a predictive model that we can input images such as grayscale images of
11905
18:27:13,160 --> 18:27:21,320
clothing and predict. And if we look where we are at the PyTorch workflow, we've got our data ready.
11906
18:27:21,320 --> 18:27:29,000
We've built our next model. Now here's where we're up to picking a loss function and an optimizer.
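As a quick reference, here's roughly what that setup ends up as; the walkthrough below builds it piece by piece (accuracy_fn is the helper imported from the course's helper_functions.py, so the exact names may differ in your notebook):

```python
import torch
from torch import nn
from helper_functions import accuracy_fn  # helper script downloaded earlier in the course

# Loss function for multi-class classification
loss_fn = nn.CrossEntropyLoss()

# Stochastic gradient descent on model_2's parameters, learning rate 0.1
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)
```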
11907
18:27:29,000 --> 18:27:38,120
So let's do that, hey, loss function, or we can do evaluation metrics as well. So set up loss
11908
18:27:38,120 --> 18:27:47,560
function slash eval metrics slash optimizer. And we want from helper functions, import accuracy
11909
18:27:47,560 --> 18:27:52,120
function, we don't need to reimport it, but we're going to do it anyway for completeness. Loss
11910
18:27:52,120 --> 18:27:58,520
function equals nn dot cross entropy loss, because we are working with a multi class classification
11911
18:27:58,520 --> 18:28:03,800
problem. And the optimizer, we're going to keep the same as what we've used before, torch dot
11912
18:28:03,800 --> 18:28:09,800
optim dot SGD. And we'll pass in this time, the params that we're trying to optimize are the
11913
18:28:09,800 --> 18:28:17,400
parameters of model two parameters. And we'll use a learning rate of 0.1. Run that. And just
11914
18:28:17,400 --> 18:28:25,160
to reiterate, here's what we're trying to optimize: model two dot state dict. We have a lot of random
11915
18:28:25,160 --> 18:28:31,560
weights in model two. Have a look at all this. There's the bias, there's the weight. We're going
11916
18:28:31,560 --> 18:28:37,720
to try and optimize these to help us predict on our fashion MNIST data set. So without any further
11917
18:28:37,720 --> 18:28:44,760
ado, let's in the next video, go to the workflow, we're going to build our training loop. But thanks
11918
18:28:44,760 --> 18:28:50,920
to what we did before, we've now got functions to do this for us. So if you want to give this a go,
11919
18:28:50,920 --> 18:28:58,520
use our train step and test step function to train model two. Try that out. And we'll do it
11920
18:28:58,520 --> 18:29:06,920
together in the next video. We're getting so close to training our model. Let's write some code to
11921
18:29:06,920 --> 18:29:11,720
train our first CNN model. Training and testing, I'm just going to make another heading
11922
18:29:11,720 --> 18:29:21,000
here. Model two, using our training and test functions. So we don't have to rewrite all of the
11923
18:29:21,000 --> 18:29:25,720
steps in a training loop and a testing loop, because we've already created that functionality
11924
18:29:25,720 --> 18:29:32,680
before through our train step function. There we go. Performs the training, or this should be
11925
18:29:32,680 --> 18:29:39,560
performs a training step with model trying to learn on data loader. So let's set this up.
11926
18:29:39,560 --> 18:29:45,720
We're going to set up torch manual seed 42, and we can set up a CUDA manual seed as well.
11927
18:29:46,600 --> 18:29:51,080
Just to try and make our experiments as reproducible as possible, because we're going to be using
11928
18:29:51,080 --> 18:29:56,360
CUDA, we're going to measure the time because we want to compare our models, not only their
11929
18:29:56,360 --> 18:30:02,920
performance in evaluation metrics, but how long they take to train, using timeit, because there's
11930
18:30:02,920 --> 18:30:10,200
no point having a model that performs really, really well, but takes 10 times longer to train.
11931
18:30:10,920 --> 18:30:16,520
Well, maybe there is, depending on what you're working on. Train time start model two equals timer,
11932
18:30:19,000 --> 18:30:24,600
and we're going to train and test model, but the time is just something to be aware of,
11933
18:30:24,600 --> 18:30:29,800
is that usually a better performing model will take longer to train. Not always the case, but
11934
18:30:29,800 --> 18:30:36,760
just something to keep in mind. So for epoch in, we're going to use TQDM to measure the progress.
11935
18:30:37,400 --> 18:30:40,680
We're going to create a range of epochs. We're just going to train for three epochs,
11936
18:30:40,680 --> 18:30:48,760
keeping our experiment short for now, just to see how they work, epoch, and we're going to
11937
18:30:48,760 --> 18:30:54,760
print a new line here. So for an epoch in a range, we're going to do the training step,
11938
18:30:54,760 --> 18:31:00,120
which is our train step function. The model is going to be equal to model two, which is our
11939
18:31:00,120 --> 18:31:05,240
convolutional neural network, our tiny VGG. The data loader is just going to be equal to the
11940
18:31:05,240 --> 18:31:10,120
train data loader, the same one we've used before. The loss function is going to be equal to the
11941
18:31:10,120 --> 18:31:16,600
loss function that we've set up above, loss FN. The optimizer as well is going to be
11942
18:31:17,160 --> 18:31:22,200
the optimizer in our case, stochastic gradient descent, optimizer equals optimizer,
11943
18:31:22,200 --> 18:31:26,920
then we set up the accuracy function, which is going to be equal to our accuracy function,
11944
18:31:27,480 --> 18:31:36,360
and the device is going to be the target device. How easy was that? Now we do the same for the
11945
18:31:36,360 --> 18:31:41,640
train or the testing step, sorry, the model is going to be equal to model two, and then the data
11946
18:31:41,640 --> 18:31:51,000
loader is going to be the test data loader, and then the loss function is going to be our same
11947
18:31:51,000 --> 18:31:56,520
our same loss function. And then we have no optimizer for this, we're just going to pass in the
11948
18:31:56,520 --> 18:32:02,920
accuracy function here. And then of course, the device is going to be equal to the device.
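Here's a sketch of the loop being written, including the timing described just after this; train_step(), test_step(), print_train_time(), the dataloaders and accuracy_fn are the ones created earlier in the course, so the exact identifiers are assumptions if yours differ:

```python
import torch
from tqdm.auto import tqdm
from timeit import default_timer as timer

torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Start the timer so we can compare training time across models
train_time_start_model_2 = timer()

epochs = 3
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n-------")
    train_step(model=model_2,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_2,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

# Stop the timer and report how long training took on which device
train_time_end_model_2 = timer()
total_train_time_model_2 = print_train_time(start=train_time_start_model_2,
                                            end=train_time_end_model_2,
                                            device=device)
```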
11949
18:32:03,800 --> 18:32:12,120
And then what do we do now? Well, we can measure the end time so that we know how long the code
11950
18:32:12,120 --> 18:32:20,120
here took to run. So let's go train time end for model two. This will be on the GPU, by the way,
11951
18:32:20,120 --> 18:32:24,600
but this time it's using a convolutional neural network. And the total train time,
11952
18:32:25,800 --> 18:32:32,440
total train time for model two is going to be equal to print train time, our function that we
11953
18:32:32,440 --> 18:32:37,640
created before as well, to help us measure start and end time. So we're going to pass in train
11954
18:32:37,640 --> 18:32:46,680
time start model two, and then end is going to be train time end model two. And then we're going
11955
18:32:46,680 --> 18:32:52,680
to print out the device that it's using as well. So you're ready? Are you ready to train our first
11956
18:32:52,680 --> 18:32:58,440
convolutional neural network? Hopefully this code works. We've created these functions before,
11957
18:32:58,440 --> 18:33:04,840
so it should be all right. But if in doubt, code it out; if in doubt, run the code. Let's see what
11958
18:33:04,840 --> 18:33:13,640
happens. Oh my goodness. Oh, of course. Oh, we forgot to comment out the output shapes.
11959
18:33:13,640 --> 18:33:20,520
So we get a whole bunch of outputs for our model, because what have we done? Back up here,
11960
18:33:21,320 --> 18:33:25,800
we forgot to. So this means every time our data goes through the forward pass, it's going to
11961
18:33:25,800 --> 18:33:33,560
be printing out the output shapes. So let's just comment out these. And I think this cell is going
11962
18:33:33,560 --> 18:33:40,200
to take quite a long time to run because it's got so many printouts. Yeah, see, streaming output
11963
18:33:40,200 --> 18:33:46,600
truncated to the last 5,000 lines. So we're going to try and stop that. Okay, there we go.
11964
18:33:46,600 --> 18:33:52,280
Beautiful. That actually worked. Sometimes it doesn't stop so quickly. So we're going to rerun
11965
18:33:52,280 --> 18:34:00,040
our fashion MNIST model V2 cell so that we comment out these print lines. And then we'll just rerun
11966
18:34:00,040 --> 18:34:07,000
these cells down here. Just go back through, fingers crossed there are no errors. And we'll train our
11967
18:34:07,000 --> 18:34:12,840
model again. Beautiful. Not as many printouts this time. So here we go. Our first CNN is training.
11968
18:34:12,840 --> 18:34:18,840
How do you think it'll go? Well, that's what we have printouts, right? So we can see the progress.
11969
18:34:18,840 --> 18:34:23,160
So you can see here all the functions that are being called behind the scenes from PyTorch. So
11970
18:34:23,160 --> 18:34:27,240
thank you to PyTorch for that. There's our, oh, our train step function was in there.
11971
18:34:28,120 --> 18:34:35,560
Train step. Wonderful. Beautiful. So there's epoch zero. Oh, we get a pretty good test accuracy.
11972
18:34:35,560 --> 18:34:41,480
How good is that? Test accuracy is climbing as well. Have we beaten our baseline? We're looking at
11973
18:34:41,480 --> 18:34:51,320
about 14 seconds per epoch here. And then the final epoch. What do we finish at? Oh, 88.5. Wow.
11974
18:34:51,320 --> 18:35:00,920
In 41.979, or thereabouts 42, seconds. Again, your mileage may vary. Don't worry too much if these
11975
18:35:00,920 --> 18:35:06,520
numbers aren't exactly the same on your screen and same with the training time because we might
11976
18:35:06,520 --> 18:35:15,720
be using slightly different hardware. What GPU do I have today? I have a Tesla P100 GPU. You might
11977
18:35:15,720 --> 18:35:21,000
not have the same GPU. So the training time, if this training time is something like 10 times
11978
18:35:21,000 --> 18:35:28,520
higher, you might want to look into what's going on. And if these values are like 10% lower or 10%
11979
18:35:28,520 --> 18:35:33,560
higher, you might want to see what's going on with your code as well. But let's now calculate
11980
18:35:33,560 --> 18:35:38,040
our Model 2 results. I think it is the best performing model that we have so far. Let's get
11981
18:35:38,040 --> 18:35:44,040
a results dictionary. Model 2 results. This is so exciting. We're learning the power of convolutional neural
11982
18:35:44,040 --> 18:35:50,520
networks. Model 2 results equals eval model. And this is a function that we've created before.
11983
18:35:52,440 --> 18:35:57,480
So returns a dictionary containing the results of a model predicting on data loader.
11984
18:35:57,480 --> 18:36:02,520
So now let's pass in the model, which will be our trained model two, and then we'll pass in the
11985
18:36:02,520 --> 18:36:09,960
data loader, which will be our test data loader. And then, oops, excuse me, typo, our loss function
11986
18:36:09,960 --> 18:36:16,840
will be, of course, our loss function. And the accuracy function will be accuracy function.
11987
18:36:17,480 --> 18:36:23,160
And the device is already set, but we can reset anyway, device equals device. And we'll check
11988
18:36:23,160 --> 18:36:34,200
out the Model 2 results. Make some predictions. Oh, look at that. Model accuracy 88. Does that
11989
18:36:34,200 --> 18:36:43,000
beat our baseline? Model 0 results. Oh, we did beat our baseline with a convolutional neural network.
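The evaluation call just made looks roughly like this (eval_model is the helper built earlier that returns a results dictionary, so the exact keys below are assumptions):

```python
# Get a results dictionary for the trained CNN on the test dataloader
model_2_results = eval_model(model=model_2,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)
model_2_results
# -> something like {"model_name": "FashionMNISTModelV2", "model_loss": ..., "model_acc": ...}
```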
11990
18:36:43,640 --> 18:36:50,360
All right. So I feel like that's, uh, that's quite exciting. But now let's keep going on. And, uh,
11991
18:36:50,360 --> 18:36:55,000
let's start to compare the results of all of our models. I'll see you in the next video.
11992
18:36:59,000 --> 18:37:04,760
Welcome back. Now, in the last video, we trained our first convolutional neural network. And
11993
18:37:04,760 --> 18:37:10,440
from the looks of things, it's improved upon our baseline. But let's make sure by comparing,
11994
18:37:10,440 --> 18:37:14,520
this is another important part of machine learning experiments is comparing the results
11995
18:37:14,520 --> 18:37:21,640
across your experiments, and training time too. Now, we've done that in a way where we've got
11996
18:37:21,640 --> 18:37:28,360
three dictionaries here of our model zero results, model one results, model two results. So how
11997
18:37:28,360 --> 18:37:36,600
about we create a data frame comparing them? So let's import pandas as PD. And we're going to
11998
18:37:36,600 --> 18:37:45,960
compare results equals PD dot data frame. And because our model results dictionaries, uh,
11999
18:37:45,960 --> 18:37:53,160
all have the same keys. Let's pass them in as a list. So model zero results, model one results,
12000
18:37:53,960 --> 18:38:00,920
and model two results to compare them. Wonderful. And that's what it looks like when we compare the results.
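In code, the comparison being built is roughly:

```python
import pandas as pd

# Each results dictionary has the same keys, so a list of them becomes the rows
compare_results = pd.DataFrame([model_0_results,
                                model_1_results,
                                model_2_results])
compare_results
```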
12001
18:38:00,920 --> 18:38:09,320
All righty. So recall our first model was our baseline V zero was just two linear layers.
12002
18:38:09,880 --> 18:38:18,040
And so we have an accuracy of 83.4 and a loss of 0.47. The next model was we trained on the GPU
12003
18:38:18,040 --> 18:38:26,200
and we introduced nonlinearities. So we actually found that that was worse off than our baseline.
12004
18:38:26,200 --> 18:38:32,360
But then we brought in the big guns. We brought in the tiny VGG architecture from the CNN explainer
12005
18:38:32,360 --> 18:38:38,200
website and trained our first convolutional neural network. And we got the best results so far.
12006
18:38:38,760 --> 18:38:43,080
But there's a lot more experiments that we could do. We could go back through our
12007
18:38:43,720 --> 18:38:50,920
tiny VGG and we could increase the number of hidden units. Where do we create our model up here?
12008
18:38:50,920 --> 18:38:55,880
We could increase this to say 30 and see what happens. That would be a good experiment to
12009
18:38:55,880 --> 18:39:01,400
try. And if we found that nonlinearities didn't help with our second model, we could comment out
12010
18:39:01,400 --> 18:39:07,800
the relu layers. We could of course change the kernel size, change the padding, change the max
12011
18:39:07,800 --> 18:39:12,440
pool. A whole bunch of different things that we could try here. We could train it for longer.
12012
18:39:12,440 --> 18:39:16,280
So maybe if we train it for 10 epochs, it would perform better. But these are just things to
12013
18:39:16,280 --> 18:39:20,920
keep in mind and try out. I'd encourage you to give them a go yourself. But for now, we've kept
12014
18:39:20,920 --> 18:39:26,840
all our experiments quite the same. How about we add the training time in to the results?
12015
18:39:26,840 --> 18:39:31,480
Because that's another important thing that we've been tracking as well. So we'll add
12016
18:39:32,440 --> 18:39:41,560
training time to results comparison. So the reason why we do this is because
12017
18:39:42,520 --> 18:39:47,800
if this model is performing quite well, even compared to our CNN, so a difference in about
12018
18:39:47,800 --> 18:39:53,880
5% accuracy, maybe that's tolerable in the space that we're working, except that this model
12019
18:39:54,440 --> 18:39:59,880
might actually train and perform inference 10 times faster than this model. So that's just
12020
18:39:59,880 --> 18:40:05,080
something to be aware of. It's called the performance speed trade off. So let's add another column
12021
18:40:05,080 --> 18:40:12,120
here, compare results. And we're going to add in, oh, excuse me, got a little error there. That's
12022
18:40:12,120 --> 18:40:17,640
all right. Got trigger happy on the shift and enter. Training time equals, we're going to add in,
12023
18:40:18,600 --> 18:40:26,760
we've got another list here is going to be total train time for model zero, and total train time
12024
18:40:27,560 --> 18:40:37,080
for model one, and total train time for model two. And then we have a look at our
12025
18:40:37,080 --> 18:40:46,040
our compare results dictionary, or sorry, compare results data frame. Wonderful.
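The extra column being added is roughly this (the total_train_time_* variables were returned by print_train_time earlier; the column name is illustrative):

```python
# Add the training time (in seconds) for each model to the comparison DataFrame
compare_results["training_time"] = [total_train_time_model_0,
                                    total_train_time_model_1,
                                    total_train_time_model_2]
compare_results
```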
12026
18:40:46,040 --> 18:40:50,520
So we see the comparison now. And this is another thing I keep stressing to keep in mind: if your numbers aren't exactly
12027
18:40:50,520 --> 18:40:55,480
the same as what I've got here, don't worry too much. Go back through the code and see if you've set up
12028
18:40:55,480 --> 18:40:59,480
the random seeds correctly, you might need a CUDA random seed. We may have missed one of those.
12029
18:41:00,200 --> 18:41:03,560
If your numbers are outlandishly different to these numbers, then you should go back through
12030
18:41:03,560 --> 18:41:08,200
your code and see if there's something wrong. And again, the training time will be highly
12031
18:41:08,200 --> 18:41:12,920
dependent on the compute environment you're using. So if you're running this notebook locally,
12032
18:41:12,920 --> 18:41:17,800
you might get faster training times. If you're running it on a different GPU to what I have,
12033
18:41:17,800 --> 18:41:23,640
NVIDIA SMI, you might get different training times. So I'm using a Tesla P100, which is quite a fast
12034
18:41:23,640 --> 18:41:28,840
GPU. But that's because I'm paying for Colab Pro, which generally gives you faster GPUs.
12035
18:41:28,840 --> 18:41:36,840
And model zero was trained on the CPU. So depending on what compute resource Google allocates to you
12036
18:41:36,840 --> 18:41:43,000
with Google Colab, this number might vary here. So just keep that in mind. These values training
12037
18:41:43,000 --> 18:41:48,760
time will be very dependent on the hardware you're using. But if your numbers are dramatically
12038
18:41:48,760 --> 18:41:52,840
different, well, then you might want to change something in your code and see what's going on.
12039
18:41:52,840 --> 18:42:01,960
And how about we finish this off with a graph? So let's go visualize our model results. And while
12040
18:42:01,960 --> 18:42:08,680
we're doing this, have a look at the data frame above. Is the performance here 10 seconds longer
12041
18:42:08,680 --> 18:42:15,480
training time worth that extra 5% of the results on the accuracy? Now in our case, we're using a
12042
18:42:15,480 --> 18:42:21,160
relatively toy problem. What I mean by toy problem is quite a simple data set to try and test this
12043
18:42:21,160 --> 18:42:27,080
out. But in your practice, that may be worth doing. If your model takes longer to train,
12044
18:42:27,080 --> 18:42:32,600
but gets quite a bit better performance, it really depends on the problem you're working with.
12045
18:42:33,240 --> 18:42:38,520
Compare results. And we're going to set the index as the model name, because I think that's
12046
18:42:38,520 --> 18:42:43,560
what we want our graph to be labeled with, the model name. And then we're going to plot, we want to compare
12047
18:42:43,560 --> 18:42:52,760
the model accuracy. And we want to plot, the kind is going to be equal to bar h, horizontal bar chart.
12048
18:42:53,400 --> 18:43:02,600
We've got plt xlabel, we're going to get accuracy as a percentage. And then we're going to go plt ylabel.
12049
18:43:02,600 --> 18:43:06,200
This is just something that you could share. If someone was asking, how did your modeling
12050
18:43:06,200 --> 18:43:10,360
experiments go on fashion MNIST? Well, here's what I've got. And then they ask you, well,
12051
18:43:10,360 --> 18:43:14,760
what's the fashion MNIST model V2? Well, you could say that's a convolutional neural network that
12052
18:43:14,760 --> 18:43:20,600
replicates the CNN explainer website and that trained on a GPU. How long did that
12053
18:43:20,600 --> 18:43:25,080
take to train? Well, then you've got the training time here. We could just do it as a vertical bar
12054
18:43:25,080 --> 18:43:32,360
chart. I did it as horizontal because a vertical one looks a bit funny to me. So horizontal, like that.
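The plotting code being described, as a sketch (the model_name and model_acc column names are whatever your eval_model helper returns, so treat them as assumptions):

```python
import matplotlib.pyplot as plt

# Horizontal bar chart of accuracy per model
compare_results.set_index("model_name")["model_acc"].plot(kind="barh")
plt.xlabel("accuracy (%)")
plt.ylabel("model")
```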
12055
18:43:32,360 --> 18:43:39,960
So the model names are over here. Wonderful. So now I feel like we've got a trained model.
12056
18:43:40,760 --> 18:43:45,960
How about we make some visual predictions? Because we've just got numbers on a page here,
12057
18:43:45,960 --> 18:43:51,880
but our model is trained on computer vision data. And the whole point of making a machine
12058
18:43:51,880 --> 18:43:57,800
learning model on computer vision data is to be able to visualize predictions. So let's give
12059
18:43:57,800 --> 18:44:02,840
that a shot, hey, in the next video, we're going to use our best performing model, fashion MNIST
12060
18:44:02,840 --> 18:44:08,040
model V2 to make predictions on random samples from the test data set. You might want to give
12061
18:44:08,040 --> 18:44:13,640
that a shot, make some predictions on random samples from the test data set, and plot them out with
12062
18:44:13,640 --> 18:44:19,400
their predictions as the title. So try that out. Otherwise, we'll do it together in the next video.
12063
18:44:19,400 --> 18:44:28,520
In the last video, we compared our models results. We tried three experiments. One was a basic linear
12064
18:44:28,520 --> 18:44:34,680
model. One was a linear model with nonlinear activations. And fashion MNIST model V2 is a
12065
18:44:34,680 --> 18:44:40,280
convolutional neural network. And we saw that from an accuracy perspective, our convolutional neural
12066
18:44:40,280 --> 18:44:45,800
network performed the best. However, it had the longest training time. And I just want to exemplify
12067
18:44:45,800 --> 18:44:50,680
the fact that the training time will vary depending on the hardware that you run on. We spoke about
12068
18:44:50,680 --> 18:44:56,200
this in the last video. However, I took a break after finishing the last video, reran all of the
12069
18:44:56,200 --> 18:45:01,320
cells that we've written, all of the code cells up here by coming back to the notebook and going
12070
18:45:01,320 --> 18:45:06,520
run all. And as you'll see, if you compare the training times here to the last video, we get
12071
18:45:06,520 --> 18:45:11,960
some different values. Now, I'm not sure exactly what hardware Google collab is using behind the
12072
18:45:11,960 --> 18:45:16,920
scenes. But this is just something to keep in mind, at least from now on, we know how to track
12073
18:45:16,920 --> 18:45:22,680
our different variables, such as how long our model takes to train and what its performance
12074
18:45:22,680 --> 18:45:30,840
values are. But it's time to get visual. So let's create another heading, make and evaluate. This
12075
18:45:30,840 --> 18:45:37,240
is one of my favorite steps after training a machine learning model. So make and evaluate random
12076
18:45:37,240 --> 18:45:44,200
predictions with the best model. So we're going to follow the data explorer's motto of getting
12077
18:45:44,200 --> 18:45:49,720
visual, visual, visual, or visualize, visualize, visualize. Let's make a function called make
12078
18:45:49,720 --> 18:45:55,960
predictions. And it's going to take a model, which will be a torch dot nn dot Module type.
12079
18:45:56,920 --> 18:46:02,840
It's also going to take some data, which can be a list. It'll also take a device type,
12080
18:46:02,840 --> 18:46:07,640
which will be torch dot device. And we'll set that by default to equal the default device that
12081
18:46:07,640 --> 18:46:12,680
we've already set up. And so what we're going to do is create an empty list for prediction
12082
18:46:12,680 --> 18:46:20,200
probabilities. Because what we'd like to do is just take random samples from the test data set,
12083
18:46:20,760 --> 18:46:26,280
make predictions on them using our model, and then plot those predictions. We want to visualize
12084
18:46:26,280 --> 18:46:32,120
them. And so we'll also turn our model into evaluation mode, because if you're making predictions with
12085
18:46:32,120 --> 18:46:37,640
your model, you should turn on evaluation mode. We'll also switch on the inference mode context
12086
18:46:37,640 --> 18:46:43,560
manager, because predictions is another word for inference. And we're going to loop through
12087
18:46:43,560 --> 18:46:51,800
for each sample in data. Let's prepare the sample. So this is going to take in
12088
18:46:52,760 --> 18:46:58,600
a single image. So we will unsqueeze it, because we need to add a batch size dimension
12089
18:46:58,600 --> 18:47:05,800
on the sample, we'll set dim equals to zero, and then we'll pass that to the device. So
12090
18:47:06,520 --> 18:47:14,440
add a batch dimension, that's with the unsqueeze, and pass to target device. That way, our data and
12091
18:47:14,440 --> 18:47:20,440
model are on the same device. And we can do a forward pass. Well, we could actually up here go
12092
18:47:21,800 --> 18:47:26,040
model dot to device. That way we know that we've got device agnostic code there.
12093
18:47:26,040 --> 18:47:33,000
Now let's do the forward pass, forward pass model outputs raw logits. So recall that if we have a
12094
18:47:33,000 --> 18:47:39,800
linear layer at the end of our model, it outputs raw logits. So pred logit for a single sample is
12095
18:47:39,800 --> 18:47:45,640
going to equal model. We pass the sample to our target model. And then we're going to get the
12096
18:47:45,640 --> 18:47:50,600
prediction probability. How do we get the prediction probability? So we want to go from
12097
18:47:50,600 --> 18:47:59,240
logit to prediction probability. Well, if we're working with a multi class classification problem,
12098
18:47:59,240 --> 18:48:05,320
we're going to use the softmax activation function on our pred logit. And we're going to squeeze
12099
18:48:05,320 --> 18:48:11,640
it so it gets rid of an extra dimension. And we're going to pass in dim equals zero. So that's going
12100
18:48:11,640 --> 18:48:17,800
to give us our prediction probability for a given sample. Now let's also turn our prediction
12101
18:48:17,800 --> 18:48:25,480
probabilities into prediction labels. So get pred. Well, actually, I think we're just going
12102
18:48:25,480 --> 18:48:31,160
to return the pred probs. Yeah, let's see what that looks like, because we've got a
12103
18:48:31,160 --> 18:48:37,640
an empty list up here for pred probs. So for matplotlib, we're going to have to use our data
12104
18:48:37,640 --> 18:48:43,320
on the CPU. So let's make sure it's on the CPU, because matplotlib doesn't work with the GPU.
12105
18:48:43,320 --> 18:48:51,400
So get pred prob off GPU for further calculations. So we're just hard coded in here to make sure
12106
18:48:51,400 --> 18:48:58,440
that our prediction probabilities are off the GPU. So pred probs, which is our list up here. We're
12107
18:48:58,440 --> 18:49:06,040
going to append the pred prob that we just calculated. But we're going to put it on the CPU. And then
12108
18:49:06,040 --> 18:49:12,840
let's go down here. And we're going to. So if we've done it right, we're going to have a list of
12109
18:49:12,840 --> 18:49:18,280
prediction probabilities relating to particular samples. So we're going to stack the pred probs
12110
18:49:19,000 --> 18:49:25,320
to turn list into a tensor. So this is only one way of doing things. There are many different ways
12111
18:49:25,320 --> 18:49:30,200
that you could make predictions and visualize them. I'm just exemplifying one way. So we're
12112
18:49:30,200 --> 18:49:34,920
going to torch stack, which is just going to say, hey, concatenate everything in the list to a
12113
18:49:34,920 --> 18:49:49,560
single tensor. So we might need to tab that over, tab, tab. Beautiful.
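Put together, the function being written looks something like this; as the narration says, it's just one way of doing it, and the default device argument assumes the device-agnostic variable set up earlier:

```python
import torch

def make_predictions(model: torch.nn.Module,
                     data: list,
                     device: torch.device = device):
    pred_probs = []
    model.to(device)
    model.eval()
    with torch.inference_mode():
        for sample in data:
            # Prepare the sample: add a batch dimension and send to the target device
            sample = torch.unsqueeze(sample, dim=0).to(device)

            # Forward pass (the model outputs raw logits)
            pred_logit = model(sample)

            # Logit -> prediction probability (softmax for multi-class classification)
            pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)

            # Matplotlib needs CPU data, so take the prediction probability off the GPU
            pred_probs.append(pred_prob.cpu())

    # Stack the list of prediction probabilities into a single tensor
    return torch.stack(pred_probs)
```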
12114
18:49:49,560 --> 18:49:54,760
So let's try this function in action and see what happens. I'm going to import random. And then I'm going to set the random
12115
18:49:54,760 --> 18:50:00,600
seed to 42. And then I'm going to create test samples as an empty list, because we want an empty
12116
18:50:00,600 --> 18:50:06,200
or we want a list of test samples to iterate through. And I'm going to create test labels also as an
12117
18:50:06,200 --> 18:50:10,760
empty list. So remember, when we are evaluating predictions, we want to compare them to the
12118
18:50:10,760 --> 18:50:15,480
ground truth. So we want to get some test samples. And then we want to get their actual labels so
12119
18:50:15,480 --> 18:50:21,720
that when our model makes predictions, we can compare them to their actual labels. So for sample,
12120
18:50:22,360 --> 18:50:29,240
comma label, in, we're going to use random to sample the test data. Now note that this is not
12121
18:50:29,240 --> 18:50:36,360
the test data loader. This is just test data. And we're going to set k equals to nine. And recall,
12122
18:50:36,360 --> 18:50:40,520
if you want to have a look at test data, what do we do here? We can just go test data,
12123
18:50:41,960 --> 18:50:47,400
which is our data set, not converted into a data loader yet. And then if we wanted to get the first
12124
18:50:47,400 --> 18:50:53,080
10 samples, can we do that? Only one element tensors can be converted into Python scalars. So if we
12125
18:50:53,080 --> 18:50:59,240
get the first zero, and maybe we can go up to 10. Yeah, there we go. And what's the shape of this?
12126
18:51:03,240 --> 18:51:10,920
Tuple object has no attribute shape. Okay, so we need to go image label equals that. And then can we check
12127
18:51:10,920 --> 18:51:18,520
the shape of the image label? Oh, because the labels are going to be integers.
12128
18:51:18,520 --> 18:51:25,800
Wonderful. So that's not the first 10 samples, but that's just what we get if we iterate through
12129
18:51:25,800 --> 18:51:32,520
the test data, we get an image tensor, and we get an associated label. So that's what we're doing
12130
18:51:32,520 --> 18:51:37,800
with this line here, we're just randomly sampling nine samples. And this could be any number you
12131
18:51:37,800 --> 18:51:41,560
want. I'm going to use nine, because this is a spoiler for later on, we're going to create a
12132
18:51:41,560 --> 18:51:48,280
three by three plot. So that just nine is just a fun number. So get some random samples from the
12133
18:51:48,280 --> 18:51:55,800
test data set. And then we can go test samples dot append sample. And we will go test labels dot
12134
18:51:55,800 --> 18:52:05,400
append label. And then let's go down here, view the first, maybe we go first sample shape.
12135
18:52:06,760 --> 18:52:11,720
So test samples zero dot shape.
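For reference, here is a small sketch of the random sampling step just described (test_data is the Fashion MNIST test Dataset from earlier in the notebook, not the DataLoader):

```python
import random

random.seed(42)  # comment this out for truly random samples

test_samples = []
test_labels = []
# Each item of the test Dataset is an (image tensor, integer label) pair
for sample, label in random.sample(list(test_data), k=9):
    test_samples.append(sample)
    test_labels.append(label)

test_samples[0].shape  # torch.Size([1, 28, 28]) -> colour channels, height, width
```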
12136
18:52:11,720 --> 18:52:19,000
And then if we get test samples, zero, we're going to get a tensor of image values. And then
12137
18:52:19,000 --> 18:52:28,920
if we wanted to plot that, can we go plt imshow, cmap equals gray. And we may have to squeeze
12138
18:52:28,920 --> 18:52:35,480
this, I believe, to remove the batch dimension. Let's see what happens. There we go.
12139
18:52:35,480 --> 18:52:41,000
Beautiful. So that's to me, a shoe, a high heel shoe of some sort. If we get the title,
12140
18:52:41,000 --> 18:52:50,760
plt dot title, test labels, let's see what this looks like. It's a five, which is, of course,
12141
18:52:50,760 --> 18:52:59,560
class names, we'll index on that. Sandal. Okay, beautiful. So we have nine random samples,
12142
18:52:59,560 --> 18:53:06,360
nine labels that are associated with that sample. Now let's make some predictions. So make predictions.
12143
18:53:06,360 --> 18:53:13,240
And this is one of my favorite things to do. I can't stress it enough is to randomly pick data
12144
18:53:13,240 --> 18:53:18,600
samples from the test data set and predict on them and do it over and over and over again to see
12145
18:53:18,600 --> 18:53:23,960
what the model is doing. So not only at the start of a problem, I'll just get the prediction
12146
18:53:23,960 --> 18:53:29,560
probabilities here. We're going to call our make predictions function. So not only at the start of
12147
18:53:29,560 --> 18:53:34,520
a problem should you become one with the data, even after you've trained a model, you'll want to
12148
18:53:34,520 --> 18:53:39,000
further become one with the data, but this time become one with your models predictions on the
12149
18:53:39,000 --> 18:53:46,840
data and see what happens. So view the first two prediction probabilities list. So we're just
12150
18:53:46,840 --> 18:53:51,160
using our make predictions function that we created before, passing it the model, the trained model
12151
18:53:51,160 --> 18:53:55,800
two, and we're passing it the data, which is the test samples, which is this list that we just
12152
18:53:55,800 --> 18:54:03,000
created up here, which is comprised of random samples from the test data set. Wonderful. So
12153
18:54:03,000 --> 18:54:09,480
let's go. Pred probs. Oh, we don't want to view them all. That's going to give us
12154
18:54:13,880 --> 18:54:19,480
Oh, we want to view the prediction probabilities for a given sample. And so how do we convert
12155
18:54:19,480 --> 18:54:25,720
prediction probabilities into labels? Because if we're trying to, if we have a look at test
12156
18:54:25,720 --> 18:54:32,040
labels, if we're trying to compare apples to apples, when we're evaluating our model, we want to,
12157
18:54:32,040 --> 18:54:37,320
we can't really necessarily compare the prediction probabilities straight to the test labels. So we
12158
18:54:37,320 --> 18:54:43,720
need to convert these prediction probabilities into prediction labels. So how can we do that?
12159
18:54:44,520 --> 18:54:50,920
Well, we can use argmax to take whichever value here, the index, in this case, this one,
12160
18:54:51,560 --> 18:54:58,120
the index of whichever value is the highest of these prediction probabilities. So let's see that
12161
18:54:58,120 --> 18:55:08,520
in action. Convert prediction probabilities to labels. So we'll go pred classes equals
12162
18:55:08,520 --> 18:55:14,360
pred probs, and we'll get the argmax across the first dimension. And now let's have a look at the
12163
18:55:14,360 --> 18:55:23,880
pred classes. Wonderful. So are they in the same format as our test labels? Yes, they are.
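As a quick sketch, the prediction and conversion steps just described look roughly like this (model_2 is the trained convolutional model and test_samples/test_labels are the random samples from above):

```python
# Make predictions on the nine random test samples
pred_probs = make_predictions(model=model_2, data=test_samples)

# Convert prediction probabilities to prediction labels by taking the index of
# the highest probability in each row (one row per sample)
pred_classes = pred_probs.argmax(dim=1)
pred_classes  # integer class indices, the same format as test_labels
```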
12164
18:55:23,880 --> 18:55:28,760
So if you'd like to go ahead: in the next video, we're going to plot these and compare them. So we're
12165
18:55:28,760 --> 18:55:33,560
going to write some code to create a matplotlib plotting function that's going to plot nine
12166
18:55:33,560 --> 18:55:41,160
different samples, along with their original labels, and their predicted label. So give that a shot,
12167
18:55:41,160 --> 18:55:44,920
we've just written some code here to make some predictions on random samples. If you'd like them
12168
18:55:44,920 --> 18:55:51,000
to be truly random, you can comment out the seed here, but I've just kept the seed at 42. So that
12169
18:55:51,000 --> 18:55:58,440
our random dot sample selects the same samples on your end and on my end. So in the next video,
12170
18:55:58,440 --> 18:56:08,120
let's plot these. Let's now continue following the data explorer's motto of visualize visualize
12171
18:56:08,120 --> 18:56:12,920
visualize. We have some prediction classes. We have some labels we'd like to compare them to.
12172
18:56:12,920 --> 18:56:17,000
You can compare them visually. It looks like our model is doing pretty good. But let's,
12173
18:56:17,000 --> 18:56:22,120
since we're making predictions on images, let's plot those images along with the predictions.
12174
18:56:22,760 --> 18:56:27,560
So I'm going to write some code here to plot the predictions. I'm going to create a matplotlib
12175
18:56:27,560 --> 18:56:34,360
figure. I'm going to set the fig size to nine and nine. Because we've got nine random samples,
12176
18:56:34,920 --> 18:56:39,160
you could, of course, change this to however many you want. I just found that a three by three
12177
18:56:39,160 --> 18:56:45,720
plot works pretty good in practice. And I'm going to set n rows. So for my matplotlib plot, I want
12178
18:56:45,720 --> 18:56:53,880
three rows. And I want three columns. And so I'm going to enumerate through the samples in test
12179
18:56:53,880 --> 18:57:02,920
samples. And then I'm going to create a subplot for each sample. So create a subplot. Because this
12180
18:57:02,920 --> 18:57:07,400
is going to create a subplot because it's within the loop. Each time it goes through a new sample,
12181
18:57:07,400 --> 18:57:17,240
create a subplot of n rows and n cols. And the index it's going to be on is going to be i plus
12182
18:57:17,240 --> 18:57:23,240
one, because it can't start at zero. So we just put i plus one in there. What's going on here?
12183
18:57:24,600 --> 18:57:31,720
Enumerate. Oh, excuse me. In enumerate, wonderful. So now we're going to plot the target image.
12184
18:57:31,720 --> 18:57:40,920
We can go plt dot imshow, we're going to get sample dot squeeze. Because we need to remove the
12185
18:57:40,920 --> 18:57:47,960
batch dimension. And then we're going to set the cmap equal to gray. What's this telling me
12186
18:57:47,960 --> 18:57:57,560
up here? Oh, no, that's correct. Next, we're going to find the prediction label in text form,
12187
18:57:57,560 --> 18:58:02,360
because we don't want it in a numeric form, we could do that. But we want to look at things
12188
18:58:02,360 --> 18:58:08,520
visually with human readable language, such as sandal for whatever class sandal is, whatever number
12189
18:58:08,520 --> 18:58:14,760
class that is. So we're going to set the pred label equals class names. And we're going to index
12190
18:58:15,320 --> 18:58:22,200
using the pred classes I value. So right now we're going to plot our sample. We're going to find
12191
18:58:22,200 --> 18:58:29,080
its prediction. And now we're going to get the truth label. So we also want this in text form.
12192
18:58:30,040 --> 18:58:35,160
And what is the truth label going to be? Well, the truth label is we're going to have to index
12193
18:58:35,720 --> 18:58:42,600
using class names and index on that using test labels I. So we're just matching up our indexes
12194
18:58:42,600 --> 18:58:50,120
here. Finally, we're going to create a title, create a title for the plot. And now here's what I like
12195
18:58:50,120 --> 18:58:54,680
to do as well. If we're getting visual, well, we might as well get really visual, right? So I
12196
18:58:54,680 --> 18:58:59,080
think we can change the color of the title text, depending if the prediction is right or wrong.
12197
18:58:59,720 --> 18:59:04,200
So I'm going to create a title using an F string, pred is going to be a pred label,
12198
18:59:04,760 --> 18:59:10,440
and truth label. We could even plot the prediction probabilities here if we wanted to. That might
12199
18:59:10,440 --> 18:59:16,840
be an extension that you might want to try. And so here we're going to check for equality
12200
18:59:16,840 --> 18:59:26,920
between pred and truth and change color of title text. So what I mean by this, it's going to be a
12201
18:59:26,920 --> 18:59:33,560
lot easier to explain if we just, if in doubt, code it out. So if the pred label equals the truth
12202
18:59:33,560 --> 18:59:41,880
label, so they're equal, I want the plot dot title to be the title text. But I want the font size,
12203
18:59:41,880 --> 18:59:50,200
well, the font size can be the same 10. I want the color to equal green. So if they're so green text,
12204
18:59:50,920 --> 19:00:00,920
if prediction, same as truth, and else I'm going to set the plot title to have title text font
12205
19:00:00,920 --> 19:00:07,880
size equals 10. And the color is going to be red. So does that make sense? All we're doing is we're
12206
19:00:07,880 --> 19:00:13,400
enumerating through our test samples that we got up here, test samples that we found randomly from
12207
19:00:13,400 --> 19:00:18,760
the test data set. And then each time we're creating a subplot, we're plotting our image,
12208
19:00:19,240 --> 19:00:23,480
we're finding the prediction label by indexing on the class names with our pred classes value,
12209
19:00:24,760 --> 19:00:29,800
we're getting the truth label, and we're creating a title for the plot that compares the pred label
12210
19:00:29,800 --> 19:00:36,520
to the truth. And we're changing the color of the title text, depending if the pred label is
12211
19:00:36,520 --> 19:00:45,160
correct or not.
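Here is a minimal sketch of the plotting loop just described (it assumes the test_samples, test_labels, pred_classes and class_names variables from the cells above):

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(9, 9))
nrows, ncols = 3, 3
for i, sample in enumerate(test_samples):
    # Create a subplot for each sample (subplot indexing starts at 1, hence i + 1)
    plt.subplot(nrows, ncols, i + 1)

    # Plot the target image (squeeze removes the colour channel dimension)
    plt.imshow(sample.squeeze(), cmap="gray")

    # Prediction label and truth label in human readable text form
    pred_label = class_names[pred_classes[i]]
    truth_label = class_names[test_labels[i]]

    # Title text comparing prediction and truth, green if right, red if wrong
    title_text = f"Pred: {pred_label} | Truth: {truth_label}"
    if pred_label == truth_label:
        plt.title(title_text, fontsize=10, c="g")
    else:
        plt.title(title_text, fontsize=10, c="r")
    plt.axis(False)  # turn the axes off for more real estate
```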
12212
19:00:45,160 --> 19:00:51,960
So let's see what happens. Did we get it right? Oh, yes, we did. Oh, I'm going to do one more thing. I want to turn off the axes, just so we get more real estate. I love these
12213
19:00:51,960 --> 19:00:57,720
kind of plots. It helps that our model got all of these predictions right. So look at this,
12214
19:00:58,360 --> 19:01:05,320
pred sandal, truth, sandal, pred trouser, truth trouser. So that's pretty darn good, right? See how,
12215
19:01:05,320 --> 19:01:10,280
for me, I much appreciate, like, I much prefer visualizing things. Numbers on a page look good,
12216
19:01:10,280 --> 19:01:15,160
but there's something, there's nothing quite like visualizing your machine learning models
12217
19:01:15,160 --> 19:01:19,800
predictions, especially when it gets it right. So how about we select some different random samples
12218
19:01:19,800 --> 19:01:24,920
up here, we could functionize this as well to do like all of this code in one hit, but that's all
12219
19:01:24,920 --> 19:01:31,160
right. We'll be a bit hacky for now. So this is just going to randomly sample with no seed at all.
12220
19:01:31,160 --> 19:01:36,600
So your samples might be different to mine, nine different samples. So this time we have an ankle
12221
19:01:36,600 --> 19:01:41,720
boot, we'll make some predictions, we'll just step through all of this code here. And oh,
12222
19:01:41,720 --> 19:01:47,000
there we go. It got one wrong. So all of these are correct. But this is more interesting as
12223
19:01:47,000 --> 19:01:51,320
well is where does your model get things wrong? So it predicted a dress, but this is a coat.
12224
19:01:52,280 --> 19:01:58,600
Now, do you think that this could potentially be a dress? To me, I could see that as being a dress.
12225
19:01:58,600 --> 19:02:02,920
So I kind of understand where the model's coming from there. Let's make some more random predictions.
12226
19:02:03,800 --> 19:02:06,280
We might do two more of these before we move on to the next video.
12227
19:02:07,480 --> 19:02:12,840
Oh, all correct. We're interested in getting some wrong here. So our model seems to be too good.
12228
19:02:16,840 --> 19:02:20,840
All correct again. Okay, one more time. If we don't get any wrong, we're going on to the next
12229
19:02:20,840 --> 19:02:27,240
video. But this is just really, oh, there we go. Two wrong. Beautiful. So predicted a dress,
12230
19:02:27,240 --> 19:02:31,400
and that's a shirt. Okay. I can kind of see where the model might have stuffed up there.
12231
19:02:31,400 --> 19:02:35,880
It's a little bit long for a shirt for me, but I can still understand that that would be a shirt.
12232
19:02:36,600 --> 19:02:42,280
And this is a pullover, but the truth is a coat. So maybe, maybe there's some issues with the labels.
12233
19:02:42,280 --> 19:02:46,680
And that's probably what you'll find in a lot of data sets, especially quite large ones.
12234
19:02:46,680 --> 19:02:51,160
Just with a sheer law of large numbers, there may be some truth labels in your data sets that
12235
19:02:51,160 --> 19:02:57,320
you work with that are wrong. And so that's why I like to see, compare the models predictions
12236
19:02:57,320 --> 19:03:02,360
versus the truth on a bunch of random samples to go, you know what, are our model's results
12237
19:03:02,360 --> 19:03:07,800
better or worse than they actually are. And that's what visualizing helps you do is figure out,
12238
19:03:07,800 --> 19:03:13,080
you know what, our model is actually, it says it's good on the accuracy. But when we visualize
12239
19:03:13,080 --> 19:03:19,080
the predictions, it's not too good. And vice versa, right? So you can keep playing around with this,
12240
19:03:19,080 --> 19:03:24,520
try, look at some more random samples by running this again. We'll do one more for good luck.
12241
19:03:24,520 --> 19:03:28,600
And then we'll move on to the next video. We're going to go on to another way. Oh,
12242
19:03:29,240 --> 19:03:35,160
see, this is another example. Some labels here could be confusing. And speaking of confusing,
12243
19:03:35,160 --> 19:03:40,200
well, that's going to be a spoiler for the next video. But do you see how the prediction is a
12244
19:03:40,200 --> 19:03:46,760
t-shirt top, but the truth is a shirt? To me, that label is kind of overlapping. Like, I don't know,
12245
19:03:46,760 --> 19:03:52,360
what's the difference between a t-shirt and a shirt? So that's something that you'll find
12246
19:03:52,360 --> 19:03:56,760
as you train models is maybe your model is going to tell you about your data as well.
12247
19:03:58,120 --> 19:04:04,040
And so we hinted that this is going to be confused. The model is confused between t-shirt top and
12248
19:04:04,040 --> 19:04:10,840
shirt. How about we plot a confusion matrix in the next video? I'll see you there.
12249
19:04:10,840 --> 19:04:18,760
We're up to a very exciting point in evaluating our machine learning model.
12250
19:04:18,760 --> 19:04:25,320
And that is visualizing, visualizing, visualizing. And we saw that in the previous video, our model
12251
19:04:25,320 --> 19:04:30,120
kind of gets a little bit confused. And in fact, I would personally get confused at the difference
12252
19:04:30,120 --> 19:04:38,760
between t-shirt slash top and a shirt. So these kind of insights into our model predictions
12253
19:04:38,760 --> 19:04:44,760
can also give us insights into maybe some of our labels could be improved. And another way to
12254
19:04:44,760 --> 19:04:53,000
check that is to make a confusion matrix. So let's do that, making a confusion matrix for further
12255
19:04:53,720 --> 19:05:00,120
prediction evaluation. Now, a confusion matrix is another one of my favorite ways of evaluating
12256
19:05:00,120 --> 19:05:04,760
a classification model, because that's what we're doing. We're doing multi class classification.
12257
19:05:04,760 --> 19:05:10,120
And if you recall, if we go back to section two of the learnpytorch.io book,
12258
19:05:11,080 --> 19:05:16,520
and then if we scroll down, we have a section here, more classification evaluation metrics.
12259
19:05:16,520 --> 19:05:20,760
So accuracy is probably the gold standard of classification evaluation.
12260
19:05:20,760 --> 19:05:25,880
There's precision, there's recall, there's F1 score, and there's a confusion matrix here.
12261
19:05:25,880 --> 19:05:30,920
So how about we try to build one of those? I want to get this and copy this.
12262
19:05:30,920 --> 19:05:42,280
So, I'll write down: a confusion matrix is a fantastic way of evaluating your classification models
12263
19:05:43,800 --> 19:05:52,440
visually. Beautiful. So we're going to break this down. First of all, to plot a
12264
19:05:52,440 --> 19:05:58,520
confusion matrix, we need to make predictions with our trained model on the test data set.
12265
19:05:58,520 --> 19:06:05,400
Number two, we're going to make a confusion matrix. And to do so, we're going to leverage
12266
19:06:05,400 --> 19:06:12,360
torch metrics (tricky, I always have to figure out how to spell metrics) and its confusion matrix. So recall
12267
19:06:12,360 --> 19:06:18,440
that torch metrics, which we've touched on before, is a great package for a whole
12268
19:06:18,440 --> 19:06:25,960
bunch of evaluation metrics of machine learning models in pytorch flavor. So if we find we've
12269
19:06:25,960 --> 19:06:30,760
got classification metrics, we've got audio, image, detection. Look how beautiful this is,
12270
19:06:30,760 --> 19:06:34,840
a bunch of different evaluation metrics. And if we go down over here, we've got confusion
12271
19:06:34,840 --> 19:06:43,240
matrix. So I only touched on five here, but or six. But if you look at torch metrics, they've got,
12272
19:06:43,240 --> 19:06:48,360
how many is that about 25 different classification metrics? So if you want some extra curriculum,
12273
19:06:48,360 --> 19:06:54,760
you can read through these. But let's go to confusion matrix. And if we look at some code here,
12274
19:06:54,760 --> 19:07:00,040
we've got torch metrics, confusion matrix, we need to pass in number of classes. We can
12275
19:07:00,040 --> 19:07:04,920
normalize if we want. And do you notice how this is quite similar to the pytorch documentation?
12276
19:07:05,640 --> 19:07:11,160
Well, that's the beautiful thing about torch metrics is that it's created with pytorch in mind.
12277
19:07:12,040 --> 19:07:15,320
So let's try out if you wanted to try it out on some
12278
19:07:16,280 --> 19:07:19,960
tester code, you could do it here. But since we've already got some of our own code,
12279
19:07:19,960 --> 19:07:28,200
let's just bring in this. And then number three is to plot it. We've got another helper package here,
12280
19:07:29,080 --> 19:07:37,800
plot the confusion matrix using ML extend. So this is another one of my favorite helper
12281
19:07:37,800 --> 19:07:41,880
libraries for machine learning things. It's got a lot of functionality that you can code up
12282
19:07:41,880 --> 19:07:47,240
yourself, but you often find yourself coding it a few too many times, such as plotting a confusion
12283
19:07:47,240 --> 19:07:57,400
matrix. So if we look up ML extend plot confusion matrix, this is a wonderful library. I believe it was
12284
19:07:58,680 --> 19:08:06,280
created by Sebastian Raschka, who's a machine learning researcher and also author of
12285
19:08:06,280 --> 19:08:11,400
a great book. There he is. Yeah, this is a side note: Machine Learning with PyTorch and
12286
19:08:11,400 --> 19:08:17,400
Scikit-Learn. I just got this book; it just got released at the start of 2022. And it's a great
12287
19:08:17,400 --> 19:08:22,200
book. So that's a little side note for learning more about machine learning with pytorch and scikit
12288
19:08:22,200 --> 19:08:27,160
learn. So shout out to Sebastian Raschka. Thank you for this package as well. This is going to
12289
19:08:27,160 --> 19:08:32,040
just help us plot a confusion matrix like this. So we'll have our predicted labels on the bottom
12290
19:08:32,040 --> 19:08:36,520
and our true labels on the side here. But we can just copy this code in here.
12291
19:08:36,520 --> 19:08:44,520
Link sorry, and then confusion matrix, we can copy that in here. The thing is that torch
12292
19:08:44,520 --> 19:08:51,560
metrics doesn't come with Google Colab. So if you're using Google Colab, I think ML extend does,
12293
19:08:51,560 --> 19:08:58,280
but we need a certain version of ML extend that Google Colab doesn't have yet. So we actually
12294
19:08:58,280 --> 19:09:05,720
need version 0.19.0. But we're going to import those in a second. Let's first make some predictions
12295
19:09:05,720 --> 19:09:12,600
across our entire test data set. So previously, we made some predictions only on nine random samples.
12296
19:09:13,800 --> 19:09:19,240
So random sample, we selected nine. You could, of course, change this number to predict on more.
12297
19:09:19,800 --> 19:09:25,240
But this was only on nine samples. Let's write some code to make predictions across our entire
12298
19:09:25,240 --> 19:09:32,120
test data set. So import tqdm.auto for progress bar tracking.
12299
19:09:32,120 --> 19:09:37,880
So tqdm.auto. We don't need to re-import it. I believe we've already got it above, but I'm just
12300
19:09:37,880 --> 19:09:44,520
going to do it anyway for completeness. And so we're going to make, this is step one, above,
12301
19:09:45,320 --> 19:09:53,480
make predictions, make predictions with trained model. Our trained model is model two. So let's
12302
19:09:53,480 --> 19:09:59,320
create an empty predictions list. So we can add our predictions to that. We're going to set our
12303
19:09:59,320 --> 19:10:06,360
model into evaluation mode. And we're going to set with torch inference mode as our context manager.
12304
19:10:06,360 --> 19:10:11,800
And then inside that, let's just build the same sort of code that we used for our testing loop,
12305
19:10:12,840 --> 19:10:16,040
except this time we're going to append all of our predictions to a list.
12306
19:10:18,040 --> 19:10:25,400
So we're going to iterate through the test data loader. And we can give our tqdm description.
12307
19:10:25,400 --> 19:10:30,200
We're going to say making predictions dot dot dot. You'll see what that looks like in a minute.
12308
19:10:31,240 --> 19:10:40,840
And here we are going to send the data and targets to target device. So x, y equals x
12309
19:10:42,040 --> 19:10:50,120
to device and y to device. Wonderful. And we're going to do the forward pass.
12310
19:10:50,120 --> 19:10:56,360
So we're going to create y logit. Remember, the raw outputs of a model with a linear layer at the
12311
19:10:56,360 --> 19:11:02,280
end are referred to as logits. And we don't need to calculate the loss, but we want to turn predictions
12312
19:11:03,480 --> 19:11:15,640
from logits to prediction probabilities to prediction labels. So we'll set here y pred equals torch
12313
19:11:15,640 --> 19:11:20,520
dot softmax. You could actually skip the torch softmax step if you wanted to and just take the
12314
19:11:20,520 --> 19:11:26,120
argmax of the logits. But we will just go from prediction probabilities to pred labels for completeness.
12315
19:11:27,080 --> 19:11:33,880
So squeeze and we're going to do it across the first dimension or the zeroth dimension. And then
12316
19:11:33,880 --> 19:11:39,560
we'll take the argmax of that across the first dimension as well. And a little tidbit. If you
12317
19:11:40,440 --> 19:11:44,360
take different dimensions here, you'll probably get different values. So just check the inputs
12318
19:11:44,360 --> 19:11:49,560
and outputs of your code to make sure you're using the right dimension here. And so let's go
12319
19:11:49,560 --> 19:11:56,520
put predictions on CPU for evaluation. Because if we're going to plot anything, matplotlib will
12320
19:11:56,520 --> 19:12:03,800
want them on the CPU. So we're going to append our predictions to y preds, y pred dot CPU.
12321
19:12:04,680 --> 19:12:10,280
Beautiful. And because we're going to have a list of different predictions, we can use concatenate
12322
19:12:10,280 --> 19:12:17,960
a list of predictions into a tensor. So let's just print out y preds. And so I can show you what
12323
19:12:17,960 --> 19:12:24,120
it looks like. And then if we go y pred tensor, this is going to turn our list of predictions
12324
19:12:24,120 --> 19:12:33,640
into a single tensor. And then we'll go y pred tensor. And we'll view the first 10. Let's see if this
12325
19:12:33,640 --> 19:12:39,320
works. So making predictions. Oh, would you look at that? Okay, so yeah, here's our list of
12326
19:12:39,320 --> 19:12:46,360
predictions. A big list of tensors. Right, we don't really want it like that. So if we get rid of
12327
19:12:46,360 --> 19:12:51,720
that, and there's our progress bar, it's going through each batch in the test data loader, so there's
12328
19:12:51,720 --> 19:13:00,520
313 batches of 32. So if we comment out print y preds, this line here torch dot cat y preds is
12329
19:13:00,520 --> 19:13:07,320
going to turn this these tensors into a single tensor, or this list of tensors into a single
12330
19:13:07,320 --> 19:13:13,720
tensor concatenate. Now, if we have a look, there we go, beautiful. And if we have a look at the
12331
19:13:13,720 --> 19:13:17,960
whole thing, we're making predictions every single time here, but that's all right. They are pretty
12332
19:13:17,960 --> 19:13:25,160
quick. There we go. One big long tensor. And then if we check length y pred tensor, there should be
12333
19:13:25,160 --> 19:13:34,200
one prediction per test sample. 10,000, beautiful.
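The whole-test-set prediction loop just described could look something like this sketch (model_2, device and test_dataloader are the notebook's variables; the exact softmax/argmax dims depend on your output shape, as noted above):

```python
import torch
from tqdm.auto import tqdm

# 1. Make predictions with the trained model across the whole test DataLoader
y_preds = []
model_2.eval()
with torch.inference_mode():
    for X, y in tqdm(test_dataloader, desc="Making predictions..."):
        # Send data and targets to the target device
        X, y = X.to(device), y.to(device)
        # Forward pass: raw logits -> prediction probabilities -> prediction labels
        y_logit = model_2(X)
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)
        # Put predictions on the CPU for evaluation/plotting later
        y_preds.append(y_pred.cpu())

# Concatenate the list of per-batch predictions into one long tensor
y_pred_tensor = torch.cat(y_preds)
len(y_pred_tensor)  # 10000, one prediction per test sample
```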
12334
19:13:34,200 --> 19:13:38,840
So now we need to install torch metrics, because torch metrics doesn't come with Google Colab at the time of recording. So let
12335
19:13:38,840 --> 19:13:45,640
me just show you: if we try to import torch metrics, it doesn't work. It might in the future, so just keep
12336
19:13:45,640 --> 19:13:50,040
that in mind, it might come with Google Colab because it's a pretty useful package. But let's
12337
19:13:50,040 --> 19:14:02,280
now install see if required packages are installed. And if not, install them. So we'll just install
12338
19:14:02,280 --> 19:14:08,200
torch metrics. We'll finish off this video by trying to import. We'll set up a try and except
12339
19:14:08,200 --> 19:14:13,640
block. So Python is going to try to import torch metrics and ML extend. I write it like this,
12340
19:14:13,640 --> 19:14:18,280
because you may already have torch metrics and ML extend if you're running this code on a local
12341
19:14:18,280 --> 19:14:23,720
machine. But if you're running it in Google Colab, which I'm sure many of you are, we are
12342
19:14:23,720 --> 19:14:29,880
going to try and import it anyway. And if it doesn't work, we're going to install it.
12343
19:14:29,880 --> 19:14:35,720
So ML extend, I'm just going to check the version here because we need version for our plot confusion
12344
19:14:35,720 --> 19:14:43,640
matrix function. This one, we need version 0.19.0 or higher. So I'm just going to write a little
12345
19:14:43,640 --> 19:14:54,040
statement here. Assert int ML extend dot version. So if these two, if this condition in the try
12346
19:14:54,040 --> 19:15:02,200
block is satisfied, it will skip the next step. Dot split, and I'm just going to check
12347
19:15:02,200 --> 19:15:10,600
the first index, as an int, is greater than or equal to 19. Otherwise, I'm going to raise an
12348
19:15:10,600 --> 19:15:20,280
error saying ML extend version should be 0.19.0 or higher. And so let me just show you what this
12349
19:15:20,280 --> 19:15:31,320
looks like. If we run this here, string and int, did I not turn it into a string? Oh, excuse me.
12350
19:15:32,040 --> 19:15:40,920
There we go. And I don't need that bracket on the end. There we go. So that's what I'm saying.
12351
19:15:40,920 --> 19:15:47,160
So this is just saying, hey, the version of ML extend that you have should be
12352
19:15:47,160 --> 19:15:53,640
0.19 or higher. Because right now Google Colab by default has 0.14; this may change in the future.
12353
19:15:53,640 --> 19:15:58,360
So let's finish off this except block. If the above condition fails, which it should,
12354
19:15:59,080 --> 19:16:05,960
we are going to pip install. So we're going to install this into Google Colab torch metrics.
12355
19:16:05,960 --> 19:16:11,480
We're going to do it quietly. And we're also going to pass the -U flag to upgrade ML extend.
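The install-if-missing cell being described looks roughly like this (a sketch for a Google Colab or Jupyter cell, since it uses the ! shell escape):

```python
# See if torchmetrics and mlxtend are installed; if not (or if mlxtend is too old), install them
try:
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
    # plot_confusion_matrix needs mlxtend 0.19.0 or higher
    assert int(mlxtend.__version__.split(".")[1]) >= 19, "mlxtend version should be 0.19.0 or higher"
except:
    !pip install -q torchmetrics -U mlxtend  # -q for quiet, -U to upgrade mlxtend
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
```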
12356
19:16:11,480 --> 19:16:20,760
So import torch metrics, ML extend afterwards, after it's been installed and upgraded. And print,
12357
19:16:20,760 --> 19:16:32,360
we're going to go ML extend version, going to go ML extend underscore version. And let's see what
12358
19:16:32,360 --> 19:16:38,520
happens if we run this. So we should see, yeah, some installation happening here. This is going
12359
19:16:38,520 --> 19:16:45,480
to install torch metrics. Oh, do we not have the upgraded version of ML extend? Let's have a look.
12360
19:16:45,480 --> 19:16:52,200
We may need to restart our Google Colab instance. Ah, okay, let's take this off. Quiet.
12361
19:16:55,480 --> 19:16:57,640
Is this going to tell us to restart Google Colab?
12362
19:17:00,600 --> 19:17:06,360
Well, let's restart our runtime. After you've run this cell, if you're using Google Colab,
12363
19:17:06,360 --> 19:17:11,480
you may have to restart your runtime to reflect the fact that we have the updated version of ML
12364
19:17:11,480 --> 19:17:19,560
extend. So I'm going to restart my runtime now. Otherwise, we won't be able to plot our confusion
12365
19:17:19,560 --> 19:17:25,400
matrix. We need 0.19.0. And I'm going to run all of these cells. So I'm going to pause the video
12366
19:17:25,400 --> 19:17:31,000
here, run all of the cells by clicking run all. Note, if you run into any errors, you will have
12367
19:17:31,000 --> 19:17:35,880
to run those cells manually. And then I'm going to get back down to this cell and make sure that I
12368
19:17:35,880 --> 19:17:42,120
have ML extend version 0.19.0. I'll see you in a few seconds.
12369
19:17:46,120 --> 19:17:50,760
I'm back. And just a little heads up. If you restart your runtime and click run all,
12370
19:17:50,760 --> 19:17:56,120
your Colab notebook will stop running cells if it runs into an error. So this is that error we
12371
19:17:56,120 --> 19:18:01,960
found in a previous video where our data and model were on different devices. So to skip past that,
12372
19:18:01,960 --> 19:18:09,400
we can just jump to the next cell and we can click run after. There we go. And it's going to run all
12373
19:18:09,400 --> 19:18:15,720
of the cells after for us. It's going to retrain our models. Everything's going to get rerun.
12374
19:18:15,720 --> 19:18:20,040
And then we're going to come right back down to where we were before trying to install the
12375
19:18:20,040 --> 19:18:25,240
updated version of ML extend. I'm going to write some more code while our code is running import
12376
19:18:25,240 --> 19:18:31,080
ML extend. And then I'm going to just make sure that we've got the right version here. You may
12377
19:18:31,080 --> 19:18:38,840
require a runtime restart. You may not. So just try to see after you've run this install of
12378
19:18:38,840 --> 19:18:43,880
torch metrics and upgrade of ML extend. See if you can re import ML extend. And if you have the
12379
19:18:43,880 --> 19:18:50,440
version 0.19.0 or above, we should be able to run the code. Yeah, there we go. Wonderful.
12380
19:18:50,440 --> 19:19:02,360
ML extend 0.19.0. And we've got ML extend version, assert, import. Beautiful. So we've got a lot
12381
19:19:02,360 --> 19:19:08,600
of extra code here. In the next video, let's move forward with creating a confusion matrix.
12382
19:19:08,600 --> 19:19:12,440
I just wanted to show you how to install and upgrade some packages in Google Colab if you
12383
19:19:12,440 --> 19:19:18,120
don't have them. But now we've got predictions across our entire test data set. And we're going
12384
19:19:18,120 --> 19:19:25,720
to be moving towards using confusion matrix function here to compare our predictions versus the target
12385
19:19:25,720 --> 19:19:32,280
data of our test data set. So I'll see in the next video, let's plot a confusion matrix.
12386
19:19:36,200 --> 19:19:40,840
Welcome back. In the last video, we wrote a bunch of code to import some extra libraries that we
12387
19:19:40,840 --> 19:19:45,960
need for plotting a confusion matrix. This is really helpful, by the way. Google Colab comes
12388
19:19:45,960 --> 19:19:50,040
with a lot of prebuilt installed stuff. But definitely later on down the track, you're going to need
12389
19:19:50,040 --> 19:19:55,640
to have some experience installing stuff. And this is just one way that you can do it. And we also
12390
19:19:55,640 --> 19:20:01,400
made predictions across our entire test data set. So we've got 10,000 predictions in this tensor.
12391
19:20:01,400 --> 19:20:06,520
And what we're going to do with a confusion matrix is confirm or compare these predictions
12392
19:20:06,520 --> 19:20:13,240
to the target labels in our test data set. So we've done step number one. And we've prepared
12393
19:20:13,240 --> 19:20:20,040
ourselves for step two and three, by installing torch metrics, and installing ML extend or the
12394
19:20:20,040 --> 19:20:25,480
later version of ML extend. So now let's go through step two, making a confusion matrix,
12395
19:20:25,480 --> 19:20:30,440
and step three plotting that confusion matrix. This is going to look so good. I love how good
12396
19:20:30,440 --> 19:20:35,880
confusion matrices look. So because we've got torch metrics now, we're going to import the
12397
19:20:35,880 --> 19:20:42,680
confusion matrix class. And from our ML extend, we're going to go into the plotting module,
12398
19:20:42,680 --> 19:20:49,720
and import plot confusion matrix. Recall that the documentation for both of these are
12399
19:20:50,680 --> 19:20:58,760
within torch metrics here, and within ML extend here. Let's see what they look like. So number two
12400
19:20:58,760 --> 19:21:07,000
is set up confusion matrix instance, and compare predictions to targets. That's what evaluating a
12401
19:21:07,000 --> 19:21:11,960
model is, right? Comparing our models predictions to the target predictions. So I'm going to set
12402
19:21:11,960 --> 19:21:18,760
up a confusion matrix under the variable conf mat, then I'm going to call the confusion matrix class
12403
19:21:18,760 --> 19:21:24,440
from torch metrics. And to set up an instance of it, I need to pass in the number of classes that
12404
19:21:24,440 --> 19:21:31,320
we have. So because we have 10 classes, they are all contained within class names. Recall that
12405
19:21:31,320 --> 19:21:36,120
class names is a list of all of the different classes that we're working with. So I'm just going
12406
19:21:36,120 --> 19:21:41,480
to pass in the number of classes as the length of our class names. And then I can use that
12407
19:21:41,480 --> 19:21:48,760
conf mat instance, confusion matrix instance, to create a confusion matrix tensor by passing
12408
19:21:48,760 --> 19:21:55,160
into conf mat, which is what I've just created up here. Conf mat, just like we do with our loss
12409
19:21:55,160 --> 19:22:03,240
function, I'm going to pass in preds equals our Y pred tensor, which is just above Y pred tensor
12410
19:22:03,240 --> 19:22:10,120
that we calculated all of the predictions on the test data set. There we go. That's our preds.
12411
19:22:10,120 --> 19:22:18,600
And our target is going to be equal to test data dot targets. And this is our test data data set
12412
19:22:18,600 --> 19:22:24,280
that we've seen before. So if we go test data and press tab, we've got a bunch of different
12413
19:22:24,280 --> 19:22:32,040
attributes, we can get the classes. And of course, we can get the targets, which is the labels.
12414
19:22:32,040 --> 19:22:38,280
PyTorch calls labels targets. I usually refer to them as labels, but the target is the test data
12415
19:22:38,280 --> 19:22:44,120
target. So we want to compare our models predictions on the test data set to our test data targets.
12416
19:22:44,920 --> 19:22:49,080
And so let's keep going forward. We're up to step number three now. So this is going to create
12417
19:22:49,080 --> 19:22:55,400
our confusion matrix tensor. Oh, let's see what that looks like, actually. Conf mat tensor.
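Steps one and two in code form look roughly like this sketch (y_pred_tensor is from the prediction loop above; note that newer torchmetrics versions also require task="multiclass" when creating ConfusionMatrix):

```python
from torchmetrics import ConfusionMatrix

# 2. Set up a confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names))  # newer torchmetrics: add task="multiclass"
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=test_data.targets)
confmat_tensor  # a 10x10 tensor of counts: rows are true labels, columns are predicted labels
```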
12418
19:22:58,360 --> 19:23:06,920
Oh, okay. So we've got a fair bit going on here. But let's turn this into a pretty version of this.
12419
19:23:06,920 --> 19:23:12,440
So along the bottom is going to be our predicted labels. And along the side here is going to be
12420
19:23:12,440 --> 19:23:17,480
our true labels. But this is where the power of ML extend comes in. We're going to plot our
12421
19:23:17,480 --> 19:23:25,000
confusion matrix. So let's create a figure and an axes. We're going to call the function plot
12422
19:23:25,000 --> 19:23:32,280
confusion matrix that we've just imported above. And we're going to pass in our conf mat equals
12423
19:23:32,280 --> 19:23:38,520
our conf mat tensor. But because we're working with matplotlib, it'll want it as NumPy.
12424
19:23:39,640 --> 19:23:47,720
So I'm just going to write here, matplotlib likes working with NumPy. And we're going to
12425
19:23:47,720 --> 19:23:53,960
pass in the class names so that we get labels for each of our rows and columns. Class names,
12426
19:23:53,960 --> 19:23:58,600
this is just a list of our text based class names. And then I'm going to set the fig size
12427
19:23:58,600 --> 19:24:05,000
to my favorite hand in poker, which is 10, 7. Also happens to be a good dimension for
12428
19:24:05,000 --> 19:24:12,440
Google Colab. Look at that. Oh, that is something beautiful to see.
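And step three, the plotting call just described, as a short sketch:

```python
from mlxtend.plotting import plot_confusion_matrix

# 3. Plot the confusion matrix (matplotlib likes working with NumPy, hence .numpy())
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(),  # confusion matrix tensor converted to a NumPy array
    class_names=class_names,          # text labels for the rows and columns
    figsize=(10, 7)
)
```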
12429
19:24:12,440 --> 19:24:18,360
Now, the ideal confusion matrix will have all of the diagonal cells darkened with all of the values
12430
19:24:18,360 --> 19:24:23,480
and no values here, no values here. Because that means that the predicted label lines up with the
12431
19:24:23,480 --> 19:24:29,240
true label. So in our case, we have definitely a very dark diagonal here. But let's dive into
12432
19:24:29,240 --> 19:24:34,600
some of the highest numbers here. It looks like our model is predicting shirt when the true label
12433
19:24:34,600 --> 19:24:40,040
is actually t shirt slash top. So that is reflective of what we saw before. Do we still have that
12434
19:24:40,040 --> 19:24:46,200
image there? Okay, we don't have an image there. But in a previous video, we saw that when we plotted
12435
19:24:46,200 --> 19:24:52,760
our predictions, the model predicted t shirt slash top when it was actually a shirt. And of course,
12436
19:24:52,760 --> 19:24:58,360
vice versa. So what's another one here? Looks like our model is predicting shirt when it's
12437
19:24:58,360 --> 19:25:05,800
actually a coat. And now this is something that you can use to visually inspect your data to see
12438
19:25:05,800 --> 19:25:12,280
if the the errors that your model is making make sense from a visual perspective. So it's getting
12439
19:25:12,280 --> 19:25:16,920
confused by predicting pull over when the actual label is coat, predicting pull over when the
12440
19:25:16,920 --> 19:25:22,200
actual label is shirt. So a lot of these things clothing wise and data wise may in fact look
12441
19:25:22,200 --> 19:25:27,720
quite the same. Here's a relatively large one as well. It's predicting sneaker when it should be
12442
19:25:27,720 --> 19:25:33,560
an ankle boot. So it's confusing two different types of shoes there. So this is just a way to
12443
19:25:33,560 --> 19:25:38,360
further evaluate your model and start to go. Hmm, maybe our labels are a little bit confusing.
12444
19:25:38,360 --> 19:25:43,720
Could we expand them a little bit more? So keep that in mind, a confusion matrix is one of the
12445
19:25:43,720 --> 19:25:50,920
most powerful ways to visualize your classification model predictions. And a really, really, really
12446
19:25:50,920 --> 19:25:55,880
helpful way of creating one is to use torch metrics confusion matrix. And to plot it,
12447
19:25:56,840 --> 19:26:02,440
you can use plot confusion matrix from ML extend. However, if you're using Google Colab for these,
12448
19:26:02,440 --> 19:26:08,920
you may need to import them or install them. So that's a confusion matrix. If you'd like
12449
19:26:08,920 --> 19:26:13,960
more classification metrics, you've got them here. And you've got, of course, more in torch
12450
19:26:13,960 --> 19:26:20,120
metrics. So give that a look. I think in the next video, we've done a fair bit of evaluation.
12451
19:26:20,120 --> 19:26:25,400
Where are we up to in our workflow? I believe it's time we saved and loaded our best trained model.
12452
19:26:25,400 --> 19:26:27,800
So let's give that a go. I'll see you in the next video.
12453
19:26:31,160 --> 19:26:36,840
In the last video, we created a beautiful confusion matrix with the power of torch metrics
12454
19:26:37,640 --> 19:26:44,520
and ML extend. But now it's time to save and load our best model. Because if we've evaluated
12455
19:26:44,520 --> 19:26:48,920
our convolutional neural network and we go, you know what, this model is pretty good. Let's export
12456
19:26:48,920 --> 19:26:54,840
it to a file so we can use it somewhere else. Let's see how we do that. And by the way, if we go into
12457
19:26:54,840 --> 19:27:02,360
our keynote, we've got evaluate model (torch metrics). We've been through this a fair few times
12458
19:27:02,360 --> 19:27:06,920
now. We've improved through experimentation. We haven't used TensorBoard yet, but that'll be
12459
19:27:06,920 --> 19:27:12,600
in a later video and save and reload your trained model. So here's where we're up to. If we've gone
12460
19:27:12,600 --> 19:27:16,200
through all these steps enough times and we're like, you know what, let's save our model so we
12461
19:27:16,200 --> 19:27:20,680
can use it elsewhere. And we can reload it to make sure that it's saved correctly.
12462
19:27:21,320 --> 19:27:26,840
Let's go through with this step. We want number 11. We're going to go save and load
12463
19:27:26,840 --> 19:27:31,400
best performing model. You may have already done this before. So if you've been through the other
12464
19:27:31,400 --> 19:27:35,960
parts of the course, you definitely have. So if you want to give that a go, pause the video now
12465
19:27:35,960 --> 19:27:43,480
and try it out yourself. I believe we did it in notebook number one. We have here we go,
12466
19:27:43,480 --> 19:27:48,040
saving and loading a PyTorch model. You can go through this section of section number one
12467
19:27:48,040 --> 19:27:53,960
on your own and see if you can do it. Otherwise, let's code it out together. So I'm going to start
12468
19:27:53,960 --> 19:28:00,680
off with importing Path from pathlib, because I like to create a model directory path.
12469
19:28:01,560 --> 19:28:08,520
So create model directory path. So my model path is going to be set equal to path. And I'm going
12470
19:28:08,520 --> 19:28:13,480
to save it to models. This is where I want to, I want to create a folder over here called models
12471
19:28:14,200 --> 19:28:22,680
and save my models to there. Model path dot mkdir for make directory. Parents, yes, I want to make
12472
19:28:22,680 --> 19:28:28,200
the parent directories if they don't exist. And exist_ok also equals true. So if we try to
12473
19:28:28,200 --> 19:28:33,720
create it, but it's already existing, we're not going to get an error. That's fine. And next,
12474
19:28:33,720 --> 19:28:38,200
we're going to create a model save path. Just going to add some code cells here. So we have
12475
19:28:38,200 --> 19:28:48,520
more space. Let's pass in here a model name. Going to set this equal to, since we're on section three,
12476
19:28:48,520 --> 19:28:56,600
I'm going to call this 03 pytorch computer vision model 2, as model two is our best model. And I'm going
12477
19:28:56,600 --> 19:29:04,680
to save it as dot pth for PyTorch. You can also save it as dot pt. I like to use dot pth. And we're
12478
19:29:04,680 --> 19:29:13,720
going to go model save path equal model path slash model name. So now if we have a look at this,
12479
19:29:13,720 --> 19:29:20,200
we're going to have a path called model save path. But it's going to be a POSIX path in models
12480
19:29:20,200 --> 19:29:24,600
03 pytorch computer vision model 2 dot pth. And if we have a look over here,
12481
19:29:25,320 --> 19:29:30,200
we should have, yeah, we have a models directory now. That's not going to have anything in it at
12482
19:29:30,200 --> 19:29:34,440
the moment. We've got our data directory that we had before there's fashion MNIST. This is a good
12483
19:29:34,440 --> 19:29:40,680
way to start setting up your directories, break them down data models, helper function files,
12484
19:29:40,680 --> 19:29:48,760
etc. But let's keep going. Let's save, save the model state dict. We're going to go print,
12485
19:29:49,800 --> 19:29:56,520
saving model to just going to give us some information about what's happening. Model save
12486
19:29:56,520 --> 19:30:03,160
path. And we can save a model by calling torch dot save. And we pass in the object that we want
12487
19:30:03,160 --> 19:30:10,520
to save using the object parameter, OBJ. When we get a doc string there, we're going to go model
12488
19:30:10,520 --> 19:30:16,280
two, we want to save the state dict. Recall that the state dict is going to be
12489
19:30:17,000 --> 19:30:21,880
our model's learned parameters on the data set, so that's all the weights and biases and all that
12490
19:30:21,880 --> 19:30:29,160
sort of jazz. Beautiful. So when we first created model two, these were all random numbers. They've
12491
19:30:29,160 --> 19:30:34,760
been or since we trained model two on our training data, these have all been updated to represent
12492
19:30:34,760 --> 19:30:40,520
the training images. And we can leverage these later on, as you've seen before, to make predictions.
12493
19:30:40,520 --> 19:30:45,320
So I'm not going to go through all those, but that's what we're saving. And the file path is
12494
19:30:45,320 --> 19:30:52,440
going to be our model save path. So let's run this and see what happens. Beautiful. We're saving our
12495
19:30:52,440 --> 19:30:58,680
model to our model directory. And now let's have a look in here. Do we have a model? Yes, we do.
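For reference, the saving cells just described boil down to something like this sketch (the directory and file name follow the ones used in the video):

```python
from pathlib import Path
import torch

# Create a model directory path
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# Create the model save path
MODEL_NAME = "03_pytorch_computer_vision_model_2.pth"  # .pt works too
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# Save the model state dict (the learned weights and biases)
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_2.state_dict(), f=MODEL_SAVE_PATH)
```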
12496
19:30:58,680 --> 19:31:03,640
Beautiful. So that's how quickly we can save a model. Of course, you can customize what the name is,
12497
19:31:03,640 --> 19:31:08,600
where you save it, et cetera, et cetera. Now, let's see what happens when we load it in.
12498
19:31:09,480 --> 19:31:13,880
So create a new instance, because we only saved the state dict of model two,
12499
19:31:14,440 --> 19:31:20,200
we need to create a new instance of our model two, or how it was created, which was with
12500
19:31:20,200 --> 19:31:27,080
our class fashion MNIST V two. If we saved the whole model, we could just import it to a new
12501
19:31:27,080 --> 19:31:32,360
variable. But I'll let you read back more on that on the different ways of saving a model in here.
12502
19:31:32,360 --> 19:31:38,280
There's also a link to the pytorch documentation would highly recommend that. But let's see it in
12503
19:31:38,280 --> 19:31:45,480
action, we need to create a new instance of our fashion MNIST model V two, which is our convolutional
12504
19:31:45,480 --> 19:31:50,840
neural network. So I'm going to set the manual seed. That way when we create a new instance,
12505
19:31:50,840 --> 19:31:56,200
it's instantiated with the same random numbers. So we're going to set up loaded model two,
12506
19:31:56,200 --> 19:32:04,440
equals fashion MNIST V two. And it's important here that we set it up with the same parameters
12507
19:32:04,440 --> 19:32:10,280
as our original saved model. So fashion MNIST V two. Oh, we've got a typo here.
12508
19:32:11,320 --> 19:32:17,000
Ah, fashion MNIST model V two. Wonderful. So the input shape is going to be one,
12509
19:32:17,000 --> 19:32:22,760
because that is the number of color channels in our test, in our images, test image dot shape.
12510
19:32:22,760 --> 19:32:29,400
Do we still have a test image should be? Oh, well, we've created a different one, but our image size,
12511
19:32:29,400 --> 19:32:39,080
our image shape is 1, 28, 28, that's image shape for color channels, height, width. Then we create it with
12512
19:32:39,080 --> 19:32:43,480
hidden units, we use 10 for hidden units. So we can just set that here. This is important,
12513
19:32:43,480 --> 19:32:46,600
they just have to match; otherwise, if the shapes aren't the same, what are we going to get? We're going
12514
19:32:46,600 --> 19:32:52,520
to get a shape mismatch error. And our output shape is what is also going to be 10 or
12515
19:32:52,520 --> 19:32:59,560
length of class names. If you have the class names variable instantiated, that is. So we're
12516
19:32:59,560 --> 19:33:06,040
going to load in the saved state dict, the one that we just saved. So we can go loaded model two,
12517
19:33:07,560 --> 19:33:15,400
dot load state dict. And we can pass in torch dot load in here. And the file that we want to load
12518
19:33:15,400 --> 19:33:22,840
or the file path is model save path up here. This is why I like to just save my path variables
12519
19:33:22,840 --> 19:33:28,360
to a variable so that I can just use them later on, instead of re typing out this all the time,
12520
19:33:28,360 --> 19:33:34,600
which is definitely prone to errors. So we're going to send the model to the target device.
12521
19:33:34,600 --> 19:33:44,040
Loaded model two dot to device. Beautiful. Let's see what happens here.
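The loading steps just walked through look roughly like this sketch (FashionMNISTModelV2, class_names, MODEL_SAVE_PATH and device are the notebook's variables from earlier):

```python
import torch

# Create a new instance with the same hyperparameters as the saved model
torch.manual_seed(42)
loaded_model_2 = FashionMNISTModelV2(input_shape=1,                   # one colour channel
                                     hidden_units=10,
                                     output_shape=len(class_names))   # one output per class

# Load in the saved state dict (the learned parameters)
loaded_model_2.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

# Send the model to the target device
loaded_model_2 = loaded_model_2.to(device)
```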
12522
19:33:46,520 --> 19:33:54,280
Wonderful. So let's now evaluate the loaded model. So evaluate loaded model. The results
12523
19:33:54,280 --> 19:34:00,520
should be very much the same as our model two results. So model two results.
12524
19:34:00,520 --> 19:34:07,480
So this is what we're looking for. We want to make sure that our saved model saved these results
12525
19:34:07,480 --> 19:34:12,680
pretty closely. Now I say pretty closely because you might find some discrepancies in this lower
12526
19:34:12,680 --> 19:34:18,280
these lower decimals here, just because of the way files get saved and something gets lost,
12527
19:34:18,280 --> 19:34:24,200
et cetera, et cetera. So that's just to do with precision and computing. But as long as the first
12528
19:34:24,200 --> 19:34:32,200
few numbers are quite similar, well, then we're all gravy. So let's go torch manual seed.
12529
19:34:33,960 --> 19:34:39,400
Remember, evaluating a model is almost as well is just as important as training a model. So this
12530
19:34:39,400 --> 19:34:44,280
is what we're doing. We're making sure our model saved correctly before we deploy it. Because if
12531
19:34:44,280 --> 19:34:49,320
we deployed it and it didn't save correctly, well, then we would get less than ideal
12532
19:34:49,320 --> 19:34:54,600
results, wouldn't we? So model equals loaded model two, we're going to use our same
12533
19:34:54,600 --> 19:35:00,040
eval model function, by the way. And of course, we're going to evaluate it on the same test data
12534
19:35:00,040 --> 19:35:05,560
set that we've been using test data loader. And we're going to create a loss function or just
12535
19:35:05,560 --> 19:35:10,440
put in our loss function that we've created before. And our accuracy function is the accuracy
12536
19:35:10,440 --> 19:35:15,320
function we've been using throughout this notebook. So now let's check out loaded model two results.
12537
19:35:15,320 --> 19:35:20,840
They should be quite similar to this one. We're going to make some predictions. And then if we go
12538
19:35:20,840 --> 19:35:28,360
down, do we have the same numbers? Yes, we do. So we have five, six, eight, two, nine, five, six,
12539
19:35:28,360 --> 19:35:33,800
eight, two, nine, wonderful. And three, one, three, five, eight, three, one, three, five, eight,
12540
19:35:33,800 --> 19:35:41,160
beautiful. It looks like our loaded model gets the same results as our previously trained model
12541
19:35:41,160 --> 19:35:47,160
before we even saved it. And if you wanted to check if they were close, you can also use torch
12542
19:35:47,160 --> 19:35:52,200
dot is close, check if model results, if you wanted to check if they were close programmatically,
12543
19:35:52,200 --> 19:35:57,320
that is, because we just looked at these visually, check if model results are close to each other.
12544
19:35:58,760 --> 19:36:04,280
Now we can go torch is close, we're going to pass in torch dot tensor, we have to turn these
12545
19:36:04,280 --> 19:36:12,520
values into a tensor. We're going to go model two results. And we'll compare the model loss.
12546
19:36:13,080 --> 19:36:18,360
How about we do that? We want to make sure the loss values are the same. Or very close,
12547
19:36:18,360 --> 19:36:25,880
that is with torch dot is close. Torch dot tensor model. Or we want this one to be loaded model two
12548
19:36:25,880 --> 19:36:35,400
results. Model loss. Another bracket on the end there. And we'll see how close they are. True,
12549
19:36:35,400 --> 19:36:41,080
wonderful. Now, if this doesn't return true, you can also adjust the tolerance levels in here.
12550
19:36:41,800 --> 19:36:47,960
So we go atol equals, this is going to be the absolute tolerance. So if we do 1e negative
12551
19:36:47,960 --> 19:36:53,560
eight, it's saying like, Hey, we need to make sure our results are basically the same up to eight
12552
19:36:53,560 --> 19:36:59,400
decimal points. That's probably quite low. I would say just make sure they're at least within two.
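As a rough sketch, the programmatic check with a loosened tolerance might look like this (the results dictionaries and their "model_loss" key are assumed from the walkthrough above):

# Check that the original and loaded model losses are close to within ~2 decimal places
torch.isclose(
    torch.tensor(model_2_results["model_loss"]),        # results of the model before saving (assumed name/key)
    torch.tensor(loaded_model_2_results["model_loss"]), # results of the loaded model (assumed name/key)
    atol=1e-02  # absolute tolerance, loosened from the default 1e-08
)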
12553
19:37:00,360 --> 19:37:05,720
But if you're getting discrepancies here between your saved model and your loaded model, or sorry,
12554
19:37:05,720 --> 19:37:10,840
this model here, the original one and your loaded model, if they are quite large, so they're like
12555
19:37:10,840 --> 19:37:15,480
more than a few decimal points off in this column or even here, I'd go back through your code and
12556
19:37:15,480 --> 19:37:19,880
make sure that your model is saving correctly, make sure you've got random seeds set up. But
12557
19:37:19,880 --> 19:37:24,680
if they're pretty close, like in terms of within three or two decimal places of each other,
12558
19:37:24,680 --> 19:37:29,240
well, then I'd say that's that's close enough. But you can also adjust the tolerance level here
12559
19:37:29,240 --> 19:37:37,240
to check if your model results are close enough, programmatically. Wow, we have covered a fair bit
12560
19:37:37,240 --> 19:37:43,400
here. We've gone through this entire workflow for a computer vision problem. I think that's
12561
19:37:43,400 --> 19:37:49,400
enough code for this section, section three, PyTorch computer vision. I've got
12562
19:37:49,400 --> 19:37:53,880
some exercises and some extra curriculum lined up for you. So let's have a look at those in the
12563
19:37:53,880 --> 19:38:03,480
next video. I'll see you there. My goodness. Look how much computer vision pytorch code
12564
19:38:03,480 --> 19:38:08,120
we've written together. We started off right up the top. We looked at the reference notebook and
12565
19:38:08,120 --> 19:38:12,680
the online book. We checked out computer vision libraries in PyTorch, the main one being torch
12566
19:38:12,680 --> 19:38:17,960
vision. Then we got a data set, namely the fashion MNIST data set. There are a bunch more data sets
12567
19:38:17,960 --> 19:38:21,880
that we could have looked at. And in fact, I'd encourage you to try some out in the torch vision
12568
19:38:21,880 --> 19:38:28,360
dot data sets, use all of the steps that we've done here to try it on another data set. We prepared
12569
19:38:28,360 --> 19:38:34,440
our data loaders. So turned our data into batches. We built a baseline model, which is an important
12570
19:38:34,440 --> 19:38:39,800
step in machine learning, because the baseline model is usually relatively simple. And it's going
12571
19:38:39,800 --> 19:38:45,480
to serve as a baseline that you're going to try and improve upon, let's just go back to the keynote,
12572
19:38:45,480 --> 19:38:52,360
through various experiments. We then made predictions with model zero. We evaluated it.
12573
19:38:53,000 --> 19:38:57,880
We timed our predictions to see if running our models on the GPU was faster when we learned that
12574
19:38:57,880 --> 19:39:02,920
sometimes a GPU won't necessarily speed up code if it's a relatively small data set because of the
12575
19:39:02,920 --> 19:39:09,800
overheads between copying data from CPU to GPU. We tried a model with non-linearity and we saw that
12576
19:39:09,800 --> 19:39:15,640
it didn't really improve upon our baseline model. But then we brought in the big guns, a convolutional
12577
19:39:15,640 --> 19:39:21,160
neural network, replicating the CNN explainer website. And by gosh, didn't we spend a lot of time
12578
19:39:21,160 --> 19:39:27,000
here? I'd encourage you as part of your extra curriculum to go through this again and again.
12579
19:39:27,000 --> 19:39:32,440
I still even come back to refer to it too. I referred to it a lot making the materials for this
12580
19:39:32,440 --> 19:39:37,720
video section and this code section. So be sure to go back and check out the CNN explainer website
12581
19:39:37,720 --> 19:39:44,600
for more of what's going on behind the scenes of your CNNs. But we coded one using pure pytorch.
12582
19:39:44,600 --> 19:39:50,120
That is amazing. We compared our model results across different experiments. We found that our
12583
19:39:50,120 --> 19:39:55,800
convolutional neural network did the best, although it took a little bit longer to train. And we also
12584
19:39:55,800 --> 19:40:01,720
learned that the training time values will definitely vary depending on the hardware you're using.
12585
19:40:01,720 --> 19:40:07,240
So that's just something to keep in mind. We made and evaluated random predictions with our best
12586
19:40:07,240 --> 19:40:13,160
model, which is an important step in visualizing, visualizing, visualizing your model's predictions,
12587
19:40:13,160 --> 19:40:18,680
because you could get evaluation metrics. But until you start to actually visualize what's going on,
12588
19:40:18,680 --> 19:40:24,520
well, in my case, that's how I best understand what my model is thinking. We saw a confusion
12589
19:40:24,520 --> 19:40:30,600
matrix using two different libraries, torchmetrics and mlxtend, a great way to evaluate
12590
19:40:30,600 --> 19:40:36,520
your classification models. And we saw how to save and load the best performing model to file
12591
19:40:36,520 --> 19:40:41,880
and made sure that the results of our saved model weren't too different from the model that
12592
19:40:41,880 --> 19:40:49,400
we trained within the notebook. So now it is time I'd love for you to practice what
12593
19:40:49,400 --> 19:40:52,680
you've gone through. This is actually really exciting now because you've gone through an end-to-end
12594
19:40:52,680 --> 19:40:58,840
computer vision problem. I've got some exercises prepared. If you go to the learn pytorch.io website
12595
19:40:58,840 --> 19:41:04,440
in section 03, scroll down. You can read through all of this. This is all the materials that we've
12596
19:41:04,440 --> 19:41:09,080
just covered in pure code. There's a lot of pictures in this notebook too that are helpful to learn
12597
19:41:09,080 --> 19:41:14,360
what's going on. We have some exercises here. So all of the exercises are focused on
12598
19:41:14,360 --> 19:41:19,800
practicing the code in the sections above. We have two resources. We also have some extra
12599
19:41:19,800 --> 19:41:23,880
curriculum that I've put together. If you want an in-depth understanding of what's going on
12600
19:41:23,880 --> 19:41:28,200
behind the scenes in the convolutional neural networks, because we've focused a lot on code,
12601
19:41:28,760 --> 19:41:34,040
I'd highly recommend MIT's introduction to deep computer vision lecture. You can spend 10 minutes
12602
19:41:34,040 --> 19:41:39,000
clicking through the different options in the pytorch vision library, torch vision, look up most
12603
19:41:39,000 --> 19:41:44,040
common convolutional neural networks in the torch vision model library, and then for a larger number
12604
19:41:44,040 --> 19:41:48,440
of pre-trained pytorch computer vision models, and if you get deeper into computer vision,
12605
19:41:48,440 --> 19:41:54,280
you're probably going to run into the torch image models library, otherwise known as timm,
12606
19:41:54,280 --> 19:41:59,080
but I'm going to leave that as extra curriculum. I'm going to just link this exercises section
12607
19:41:59,080 --> 19:42:05,960
here. Again, it's at learn pytorch.io in the exercises section. We come down. There we go.
12608
19:42:07,240 --> 19:42:13,560
But there is also a resource here, an exercise template notebook. So we've got number one: what are
12609
19:42:13,560 --> 19:42:17,960
three areas in industry where computer vision is currently being used? Now this is in the
12610
19:42:17,960 --> 19:42:25,080
pytorch deep learning repo, extras exercises number three. I've put out some template code here
12611
19:42:25,080 --> 19:42:30,120
for you to fill in these different sections. So some of them are code related. Some of them
12612
19:42:30,120 --> 19:42:35,160
are just text based, but they should all be able to be completed by referencing what we've gone
12613
19:42:35,160 --> 19:42:40,920
through in this notebook here. And just as one more, if we go back to pytorch deep learning,
12614
19:42:43,080 --> 19:42:47,080
this will probably be updated by the time you get here, you can always find the exercise in
12615
19:42:47,080 --> 19:42:53,560
extra curriculum by going computer vision, go to exercise in extra curriculum, or if we go into
12616
19:42:53,560 --> 19:43:00,440
the extras file, and then we go to solutions. I've now also started to add video walkthroughs
12617
19:43:00,440 --> 19:43:07,720
of each of the solutions. So this is me going through each of the exercises myself and coding
12618
19:43:07,720 --> 19:43:12,440
them. And so you'll get to see the unedited videos. So they're just one long live stream.
12619
19:43:12,440 --> 19:43:18,120
And I've done some for 02, 03, and 04, and there will be more here by the time you watch this video.
12620
19:43:18,120 --> 19:43:22,840
But if you'd like to see how I figure out the solutions to the exercises, you can watch those
12621
19:43:22,840 --> 19:43:28,840
videos and go through them yourself. But first and foremost, I would highly recommend trying out
12622
19:43:28,840 --> 19:43:34,120
the exercises on your own first. And then if you get stuck, refer to the notebook here,
12623
19:43:34,120 --> 19:43:40,200
refer to the pytorch documentation. And finally, you can check out what I would have coded as a
12624
19:43:40,200 --> 19:43:47,560
potential solution. So there's number three, computer vision, exercise solutions. So congratulations
12625
19:43:47,560 --> 19:43:52,120
on going through the pytorch computer vision section. I'll see you in the next section. We're
12626
19:43:52,120 --> 19:43:58,200
going to look at pytorch custom data sets, but no spoilers. I'll see you soon.
12627
19:44:04,760 --> 19:44:11,960
Hello, hello, hello, and welcome to section number four of the Learn pytorch for deep learning course.
12628
19:44:11,960 --> 19:44:20,600
We have custom data sets with pytorch. Now, before we dive into what we're going to cover,
12629
19:44:20,600 --> 19:44:25,080
let's answer the most important question. Where can you get help? Now, we've been through this
12630
19:44:25,080 --> 19:44:31,000
a few times now, but it's important to reiterate. Follow along with the code as best you can. We're
12631
19:44:31,000 --> 19:44:36,680
going to be writing a bunch of pytorch code. Remember the motto: if in doubt, run the code.
12632
19:44:37,240 --> 19:44:42,120
That's in line with try it for yourself. If you'd like to read the docstring,
12633
19:44:42,120 --> 19:44:47,640
you can press shift command plus space in Google Colab. Or if you're on Windows, command might
12634
19:44:47,640 --> 19:44:52,760
be control. Then if you're still stuck, you can search for it. Two of the resources you will
12635
19:44:52,760 --> 19:44:57,640
probably come across are Stack Overflow and the wonderful pytorch documentation, which we've
12636
19:44:57,640 --> 19:45:03,400
had a lot of experience with so far. Then, of course, try again, go back through your code,
12637
19:45:03,400 --> 19:45:09,240
if in doubt, code it out, or if in doubt, run the code. And then finally, if you're still stuck,
12638
19:45:09,880 --> 19:45:16,600
ask a question on the pytorch deep learning discussions GitHub page. So if I click this link,
12639
19:45:16,600 --> 19:45:22,280
we come to mrdbourke slash pytorch deep learning, the URL is here. We've seen this before. If you
12640
19:45:22,280 --> 19:45:28,840
have trouble or a problem with any of the course, you can start a discussion and you can
12641
19:45:28,840 --> 19:45:34,840
select the category, general ideas, polls, Q and A, and then we can go here, video,
12642
19:45:34,840 --> 19:45:43,400
put the video number in. So 99, for example, my code doesn't do what I'd like it to. So say
12643
19:45:43,400 --> 19:45:52,280
your problem and then come in here, write some code here, code here, and then my question is
12644
19:45:54,840 --> 19:45:59,320
something, something, something, click start discussion, and then we can help out. And then if
12645
19:45:59,320 --> 19:46:03,160
we come back to the discussions, of course, you can search for what's going on. So if you have an
12646
19:46:03,160 --> 19:46:07,080
error and you feel like someone else might have seen this error, you can, of course, search it
12647
19:46:07,080 --> 19:46:12,520
and find out what's happening. Now, I just want to highlight again, the resources for this course
12648
19:46:12,520 --> 19:46:17,960
are at learn pytorch.io. We are up to section four. This is a beautiful online book version of
12649
19:46:17,960 --> 19:46:23,080
all the materials we are going to cover in this section. So spoiler alert, you can use this as a
12650
19:46:23,080 --> 19:46:29,720
reference. And then, of course, in the GitHub, we have the same notebook here, pytorch custom
12651
19:46:29,720 --> 19:46:35,320
data sets. This is the ground truth notebook. So check that out if you get stuck. So I'm just
12652
19:46:35,320 --> 19:46:40,360
going to exit out of this. We've got pytorch custom data sets at learn pytorch.io. And then,
12653
19:46:40,360 --> 19:46:45,800
of course, the discussions tab for the Q&A. Now, if we jump back to the keynote, what do we have?
12654
19:46:46,760 --> 19:46:53,480
We might be asking, what is a custom data set? Now, we've built a fair few pytorch deep learning
12655
19:46:53,480 --> 19:46:59,800
neural networks so far on various data sets, such as fashion MNIST. But you might be wondering,
12656
19:46:59,800 --> 19:47:05,080
hey, I've got my own data set, or I'm working on my own problem. Can I build a model with pytorch
12657
19:47:05,080 --> 19:47:11,720
to predict on that data set? And the answer is yes. However, you do have to go through a few
12658
19:47:11,720 --> 19:47:17,000
pre processing steps to make that data set compatible with pytorch. And that's what we're
12659
19:47:17,000 --> 19:47:23,640
going to be covering in this section. And so I'd like to highlight the pytorch domain libraries.
12660
19:47:23,640 --> 19:47:28,840
Now, we've had a little bit of experience before with torch vision, such as if we wanted to classify
12661
19:47:28,840 --> 19:47:35,240
whether a photo was a pizza, steak, or sushi. So a computer vision image classification problem.
12662
19:47:35,960 --> 19:47:43,320
Now, there's also text, such as if these reviews are positive or negative. And you can use torch
12663
19:47:43,320 --> 19:47:48,120
text for that. But again, these are each just one problem within the vision space and the text
12664
19:47:48,120 --> 19:47:54,760
space. I want you to just understand that if you have any type of vision data, you probably
12665
19:47:54,760 --> 19:47:59,320
want to look into torch vision. And if you have any kind of text data, you probably want to look
12666
19:47:59,320 --> 19:48:05,640
into torch text. And then if you have audio, such as if you wanted to classify what song was playing,
12667
19:48:05,640 --> 19:48:12,760
this is what Shazam does, it uses the input sound of some sort of music, and then runs a neural network
12668
19:48:12,760 --> 19:48:18,200
over it to classify it to a certain song, you can look into torch audio for that. And then if you'd
12669
19:48:18,200 --> 19:48:23,960
like to recommend something such as you have an online store, or if your Netflix or something
12670
19:48:23,960 --> 19:48:29,480
like that, and you'd like to have a homepage that updates for recommendations, you'd like to look
12671
19:48:29,480 --> 19:48:35,320
into torch rec, which stands for recommendation system. And so this is just something to keep in mind.
12672
19:48:36,680 --> 19:48:43,560
Because each of these domain libraries has a data sets module that helps you work with different
12673
19:48:43,560 --> 19:48:49,800
data sets from different domains. And so different domain libraries contain data loading functions
12674
19:48:49,800 --> 19:48:56,600
for different data sources. So torch vision, let's just go into the next slide, we have problem space
12675
19:48:56,600 --> 19:49:02,120
vision for pre built data sets, so existing data sets like we've seen with fashion MNIST,
12676
19:49:02,120 --> 19:49:07,320
as well as functions to load your own vision data sets, you want to look into torch vision
12677
19:49:07,320 --> 19:49:14,200
dot data sets. So if we click on this, we have built in data sets, this is the PyTorch documentation.
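To make that concrete, loading one of those pre-built datasets looks roughly like this. It's a sketch using FashionMNIST, the dataset from the previous section, rather than a cell from this video:

from torchvision import datasets
from torchvision.transforms import ToTensor

# Download a pre-built vision dataset from torchvision.datasets
train_data = datasets.FashionMNIST(
    root="data",           # where to store the downloaded data
    train=True,            # get the training split
    download=True,         # download it if it's not already on disk
    transform=ToTensor()   # convert the images to tensors
)
len(train_data)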
12678
19:49:14,200 --> 19:49:20,520
And if we go here, we have torch audio, torch text, torch vision, torch rec, torch data. Now,
12679
19:49:20,520 --> 19:49:26,600
at the time of recording, which is April 2022, torch data is currently in beta. But it's
12680
19:49:26,600 --> 19:49:32,600
going to be updated over time. So just keep this in mind, updated over time to add even more ways
12681
19:49:32,600 --> 19:49:38,520
to load different data resources. But for now, we're just going to get familiar with torch vision
12682
19:49:38,520 --> 19:49:45,720
data sets. If we went into torch text, there's another torch text dot data sets. And then if we
12683
19:49:45,720 --> 19:49:52,120
went into torch audio, we have torch audio dot data sets. And so you're noticing a trend here
12684
19:49:52,120 --> 19:49:57,960
that depending on the domain you're working in, whether it be vision, text, audio, or your data
12685
19:49:57,960 --> 19:50:03,880
is recommendation data, you'll probably want to look into its custom library within PyTorch.
12686
19:50:03,880 --> 19:50:09,000
And of course, the bonus is torch data. It contains many different helper functions for loading data,
12687
19:50:09,000 --> 19:50:14,600
and is currently in beta as of April 2022. So by the time you watch this, torch data
12688
19:50:14,600 --> 19:50:20,200
may be out of beta. And then that should be something that's extra curriculum on top of what we're
12689
19:50:20,200 --> 19:50:26,680
going to cover in this section. So let's keep going. So this is what we're going to work towards
12690
19:50:26,680 --> 19:50:35,800
building food vision mini. So we're going to load some data, namely some images of pizza,
12691
19:50:35,800 --> 19:50:42,040
sushi, and steak from the food 101 data set, we're going to build an image classification model,
12692
19:50:42,040 --> 19:50:48,040
such as the model that might power a food vision recognition app or a food image recognition app.
12693
19:50:48,760 --> 19:50:55,160
And then we're going to see if it can classify an image of pizza as pizza, an image of sushi as sushi,
12694
19:50:55,160 --> 19:51:00,520
and an image of steak as steak. So this is what we're going to focus on. We want to load,
12695
19:51:00,520 --> 19:51:06,680
say we had images existing already of pizza, sushi, and steak, we want to write some code
12696
19:51:06,680 --> 19:51:13,240
to load these images of food. So our own custom data set for building this food vision mini model,
12697
19:51:13,240 --> 19:51:17,960
which is quite similar to, if you go to this, the project I'm working on personally,
12698
19:51:17,960 --> 19:51:27,800
neutrify.app. This is a food image recognition model. Here we go. So it's still a work in progress as
12699
19:51:27,800 --> 19:51:33,480
I'm going through it, but you can upload an image of food and neutrify will try to classify
12700
19:51:33,480 --> 19:51:41,400
what type of food it is. So do we have steak? There we go. Let's upload that. Beautiful steak.
12701
19:51:41,400 --> 19:51:45,880
So we're going to be building a similar model to what powers neutrify. And then there's the
12702
19:51:45,880 --> 19:51:50,920
macro nutrients for the steak. If you'd like to find out how it works, I've got all the links here,
12703
19:51:50,920 --> 19:51:56,360
but that's at neutrify.app. So let's keep pushing forward. We'll go back to the keynote.
12704
19:51:57,000 --> 19:52:02,360
This is what we're working towards. As I said, we want to load these images into PyTorch so that
12705
19:52:02,360 --> 19:52:07,080
we can build a model. We've already built a computer vision model. So we want to figure out
12706
19:52:07,080 --> 19:52:13,080
how do we get our own data into that computer vision model. And so of course we'll be adhering
12707
19:52:13,080 --> 19:52:20,760
to our PyTorch workflow that we've used a few times now. So we're going to learn how to load a
12708
19:52:20,760 --> 19:52:26,440
data set with our own custom data rather than an existing data set within PyTorch. We'll see how
12709
19:52:26,440 --> 19:52:31,880
we can build a model to fit our own custom data set. We'll go through all the steps that's involved
12710
19:52:31,880 --> 19:52:36,600
in training a model such as picking a loss function and an optimizer. We'll build a training loop.
12711
19:52:36,600 --> 19:52:44,520
We'll evaluate our model. We'll improve through experimentation. And then we'll see saving and reloading
12712
19:52:44,520 --> 19:52:50,760
our model. But we're also going to practice predicting on our own custom data, which is a very,
12713
19:52:50,760 --> 19:52:56,120
very important step whenever training your own models. So what we're going to cover broadly,
12714
19:52:57,400 --> 19:53:01,960
we're going to get a custom data set with PyTorch. As we said, we're going to become one with the
12715
19:53:01,960 --> 19:53:07,880
data. In other words, preparing and visualizing it. We'll learn how to transform data for use with
12716
19:53:07,880 --> 19:53:12,520
a model, very important step. We'll see how we can load custom data with pre-built functions
12717
19:53:12,520 --> 19:53:18,280
and our own custom functions. We'll build a computer vision model, aka food vision mini,
12718
19:53:18,280 --> 19:53:24,920
to classify pizza, steak, and sushi images. So a multi-class classification model. We'll compare
12719
19:53:24,920 --> 19:53:29,480
models with and without data augmentation. We haven't covered that yet, but we will later on.
12720
19:53:29,480 --> 19:53:35,400
And finally, we'll see how we can, as I said, make predictions on custom data. So this means
12721
19:53:35,400 --> 19:53:42,040
data that's not within our training or our test data set. And how are we going to do it? Well,
12722
19:53:42,040 --> 19:53:47,720
we could do it like cooks or like chemists. But I like to treat machine learning as a little bit of an art,
12723
19:53:47,720 --> 19:53:54,520
so we're going to be cooking up lots of code. With that being said, I'll see you in Google Colab.
12724
19:53:54,520 --> 19:54:03,800
Let's code. Welcome back to the PyTorch cooking show. Let's now learn how we can cook up some
12725
19:54:03,800 --> 19:54:11,400
custom data sets. I'm going to jump into Google Colab. So colab.research.google.com.
12726
19:54:12,760 --> 19:54:18,440
And I'm going to click new notebook. I'm just going to make sure this is zoomed in enough for
12727
19:54:18,440 --> 19:54:26,760
the video. Wonderful. So I'm going to rename this notebook 04 because we're up to section 04.
12728
19:54:27,640 --> 19:54:33,800
And I'm going to call it PyTorch custom data sets underscore video because this is going to be one
12729
19:54:33,800 --> 19:54:37,880
of the video notebooks, which has all the code that I write during the videos, which is of course
12730
19:54:37,880 --> 19:54:44,200
contained within the video notebooks folder on the PyTorch deep learning repo. So if you'd like
12731
19:54:44,200 --> 19:54:48,520
the resource or the ground truth notebook for this, I'm going to just put a heading here.
12732
19:54:49,560 --> 19:54:59,880
04 PyTorch custom data sets video notebook, make that bigger, and then put resources.
12733
19:55:01,720 --> 19:55:12,520
So book version of the course materials for 04. We'll go there, and then we'll go ground truth
12734
19:55:12,520 --> 19:55:17,800
version of notebook 04, which will be the reference notebook that we're going to use
12735
19:55:17,800 --> 19:55:24,680
for this section. Come into PyTorch custom data sets. And then we can put that in there.
12736
19:55:25,640 --> 19:55:34,040
Wonderful. So the whole synopsis of this custom data sets section is we've used some data sets
12737
19:55:34,040 --> 19:55:44,840
with PyTorch before, but how do you get your own data into PyTorch? Because that's what you
12738
19:55:44,840 --> 19:55:49,080
want to start working on, right? You want to start working on problems of your own. You want to
12739
19:55:49,080 --> 19:55:53,160
come into any sort of data that you've never worked with before, and you want to figure out how do
12740
19:55:53,160 --> 19:56:03,400
you get that into PyTorch. So one of the ways to do so is via custom data sets. And then I want
12741
19:56:03,400 --> 19:56:09,720
to put a note down here. So we're going to go: section zero is going to be importing
12742
19:56:09,720 --> 19:56:21,080
PyTorch and setting up device agnostic code. But I want to just put a note here about domain libraries.
12743
19:56:23,240 --> 19:56:31,160
So just to reiterate what we went through last video. So depending on what you're working on,
12744
19:56:31,160 --> 19:56:41,800
whether it be vision, text, audio, recommendation, something like that, you'll want to look into
12745
19:56:41,800 --> 19:56:52,200
each of the PyTorch domain libraries for existing data loader or data loading functions and
12746
19:56:52,200 --> 19:57:00,040
customizable data loading functions. So just keep that in mind. We've seen some of them. So if we
12747
19:57:00,040 --> 19:57:07,080
go torch vision, which is what we're going to be looking at, torch vision, we've got data sets,
12748
19:57:07,080 --> 19:57:12,440
and we've got documentation, we've got data sets for each of the other domain libraries here as
12749
19:57:12,440 --> 19:57:18,440
well. So if you're working on a text problem, it's going to be a similar set of steps to what
12750
19:57:18,440 --> 19:57:23,640
we're going to do with our vision problem when we build food vision mini. What we have is a data
12751
19:57:23,640 --> 19:57:28,360
set that exists somewhere. And what we want to do is bring that into PyTorch so we can build a
12752
19:57:28,360 --> 19:57:34,760
model with it. So let's import the libraries that we need. So we're going to import torch and
12753
19:57:35,960 --> 19:57:42,200
we'll probably import nn. So we'll import that from PyTorch. And I'm just going to check the
12754
19:57:42,200 --> 19:57:54,040
torch version here. So note: PyTorch 1.10.0+ is required for this course. So if you're
12755
19:57:54,040 --> 19:57:59,480
using Google Colab at a later date, you may have a later version of PyTorch. I'm just going to
12756
19:57:59,480 --> 19:58:08,520
show you what version I'm using. Just going to let this load. We're going to get this ready.
12757
19:58:08,520 --> 19:58:13,000
We're going to also set up device agnostic code right from the start this time because this is
12758
19:58:13,000 --> 19:58:19,080
best practice with PyTorch. So this way, if we have a CUDA device available, our model is going
12759
19:58:19,080 --> 19:58:25,560
to use that CUDA device. And our data is going to be on that CUDA device. So there we go. Wonderful.
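The setup cell being described is roughly the following (a sketch; the exact version string you see will depend on when and where you run it):

import torch
from torch import nn

# Note: PyTorch 1.10.0+ is required for this course
print(torch.__version__)

# Setup device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device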
12760
19:58:25,560 --> 19:58:34,840
We've got PyTorch 1.10.0 plus cu111. Maybe that's CUDA 11.1. So let's check if cuda dot is available.
12761
19:58:34,840 --> 19:58:40,920
Now, I'm using Google Colab. We haven't set up a GPU yet. So it probably won't be available yet.
12762
19:58:40,920 --> 19:58:49,640
Let's have a look. Wonderful. So because we've started a new Colab instance, it's going to use
12763
19:58:49,640 --> 19:58:56,040
the CPU by default. So how do we change that? We come up to runtime, change runtime type. I'm going
12764
19:58:56,040 --> 19:59:02,520
to go hardware accelerator GPU. We've done this a few times now. I am paying for Google Colab Pro.
12765
19:59:02,520 --> 19:59:09,960
So one of the benefits of that is that Google Colab reserves faster GPUs for you. You
12766
19:59:09,960 --> 19:59:15,160
don't need Google Colab Pro. As I've said to complete this course, you can use the free version,
12767
19:59:15,160 --> 19:59:22,840
but just recall Google Colab Pro tends to give you a better GPU just because GPUs aren't free.
12768
19:59:23,960 --> 19:59:28,600
Wonderful. So now we've got access to a GPU CUDA. What GPU do I have?
12769
19:59:30,360 --> 19:59:36,840
Nvidia SMI. I have a Tesla P100 with 16 gigabytes of memory, which will be more than enough for
12770
19:59:36,840 --> 19:59:43,400
the problem that we're going to work on in this video. So I believe that's enough to cover for
12771
19:59:43,400 --> 19:59:49,240
the first coding video. We are working with custom datasets after all.
12772
19:59:49,240 --> 19:59:51,880
So in the next video, let's get some data, hey.
12773
19:59:55,320 --> 20:00:01,560
Now, as I said in the last video, we can't cover custom datasets without some data. So let's get
12774
20:00:01,560 --> 20:00:07,720
some data and just remind ourselves what we're going to build. And that is food vision mini.
12775
20:00:07,720 --> 20:00:13,320
So we need a way of getting some food images. And if we go back to Google Chrome,
12776
20:00:14,200 --> 20:00:21,240
torch vision datasets has plenty of built-in datasets. And one of them is the food 101 dataset.
12777
20:00:22,200 --> 20:00:30,520
Food 101. So if we go in here, this is going to take us to the original food 101 website.
12778
20:00:30,520 --> 20:00:37,000
So food 101 is 101 different classes of food. It has a challenging dataset of 101 different
12779
20:00:37,000 --> 20:00:44,920
food categories with 101,000 images. So that's quite a beefy dataset. And so for each class,
12780
20:00:44,920 --> 20:00:52,920
250 manually reviewed test images are provided. So we have per class, 101 classes, 250 testing
12781
20:00:52,920 --> 20:01:00,680
images, and we have 750 training images. Now, we could start working on this entire dataset
12782
20:01:00,680 --> 20:01:06,280
straight from the get go. But to practice, I've created a smaller subset of this dataset,
12783
20:01:06,280 --> 20:01:12,280
and I'd encourage you to do the same with your own problems. Start small and upgrade when necessary.
12784
20:01:13,080 --> 20:01:18,680
So I've reduced the number of categories to three and the number of images to 10%.
12785
20:01:18,680 --> 20:01:26,840
Now, you could reduce this to an arbitrary amount, but I've just decided three is enough to begin with
12786
20:01:26,840 --> 20:01:32,280
and 10% of the data. And then if it works, hey, you could upscale that on your own accord.
12787
20:01:32,840 --> 20:01:38,200
And so I just want to show you the notebook that I use to create this dataset and as extra curriculum,
12788
20:01:38,200 --> 20:01:43,800
you could go through this notebook. So if we go into extras, 04 custom data creation,
12789
20:01:43,800 --> 20:01:50,280
this is just how I created the subset of data. So making a dataset to use with notebook number
12790
20:01:50,280 --> 20:01:58,120
four, I created it in custom image data set or image classification style. So we have a top level
12791
20:01:58,120 --> 20:02:03,240
folder of pizza, steak, and sushi. We have a training directory with pizza, steak, and sushi
12792
20:02:03,240 --> 20:02:09,960
images. And we have a test directory with pizza, steak, and sushi images as well. So you can go
12793
20:02:09,960 --> 20:02:16,200
through that to check out how it was made. But now, oh, and also, if you go to learn pytorch.io
12794
20:02:16,200 --> 20:02:22,440
section four, there's more information here about what food 101 is. So get data. Here we go.
12795
20:02:23,080 --> 20:02:28,840
There's all the information about food 101. There's some resources, the original food 101 data set,
12796
20:02:28,840 --> 20:02:35,160
torch vision data sets, food 101, how I created this data set, and actually downloading the data.
12797
20:02:35,160 --> 20:02:40,840
But now we're going to write some code, because this data set, the smaller version that I've created
12798
20:02:40,840 --> 20:02:46,920
is on the pytorch deep learning repo, under data. And then we have pizza, steak, sushi.zip.
12799
20:02:46,920 --> 20:02:53,320
Oh, this one is a little spoiler for one of the exercises for this section. But you'll see that
12800
20:02:53,320 --> 20:03:01,320
later. Let's go in here. Let's now write some code to get this data set from GitHub,
12801
20:03:01,320 --> 20:03:05,000
pizza, steak, sushi.zip. And then we'll explore it, we'll become one with the data.
12802
20:03:05,800 --> 20:03:12,440
So I just want to write down here, our data set is a subset of the food 101 data set.
12803
20:03:14,520 --> 20:03:23,240
Food 101 starts with 101 different classes of food. So we could definitely build computer
12804
20:03:23,240 --> 20:03:29,720
vision models for 101 classes, but we're going to start smaller. Our data set starts with three
12805
20:03:29,720 --> 20:03:41,160
classes of food, and only 10% of the images. So let's write here: instead of 1,000 images per class,
12806
20:03:42,040 --> 20:03:54,360
which is 750 training and 250 testing, we have about 75 training images per class,
12807
20:03:54,360 --> 20:04:03,880
and about 25 testing images per class. So why do this? When starting out ML projects,
12808
20:04:05,000 --> 20:04:13,880
it's important to try things on a small scale and then increase the scale when necessary.
12809
20:04:15,320 --> 20:04:21,800
The whole point is to speed up how fast you can experiment.
12810
20:04:21,800 --> 20:04:27,000
Because there's no point trying to experiment slowly. If we tried to train on 100,000
12811
20:04:27,000 --> 20:04:32,360
images to begin with, our models might take half an hour to train at a time. So at the
12812
20:04:32,360 --> 20:04:39,240
beginning, we want to increase the rate that we experiment at. And so let's get some data.
12813
20:04:39,240 --> 20:04:45,320
We're going to import requests so that we can request something from GitHub to download this
12814
20:04:45,320 --> 20:04:52,040
URL here. Then we're also going to import zip file from Python, because our data is in the form
12815
20:04:52,040 --> 20:04:57,720
of a zip file right now. Then we're going to get pathlib, because I like to use paths whenever
12816
20:04:57,720 --> 20:05:04,360
I'm dealing with file paths or directory paths. So now let's set up a path to a data folder.
12817
20:05:05,080 --> 20:05:10,200
And this, of course, will depend on where your data set lives, what you'd like to do. But I
12818
20:05:10,200 --> 20:05:15,160
typically like to create a folder over here called data. And that's just going to store all of my
12819
20:05:15,160 --> 20:05:24,440
data for whatever project I'm working on. So data path equals path data. And then we're going to go
12820
20:05:24,440 --> 20:05:34,200
image path equals data path slash pizza steak sushi. That's how we're going to have images
12821
20:05:34,200 --> 20:05:40,280
from those three classes. Pizza steak and sushi are three of the classes out of the 101 in food
12822
20:05:40,280 --> 20:05:48,840
101. So if the image folder doesn't exist, so if our data folder already exists, we don't want to
12823
20:05:48,840 --> 20:05:55,800
redownload it. But if it doesn't exist, we want to download it and unzip it. So if image path
12824
20:05:55,800 --> 20:06:08,120
is_dir, so we want to print out the image path directory already exists, skipping download.
12825
20:06:09,880 --> 20:06:19,960
And then if it doesn't exist, we want to print image path does not exist, creating one. Beautiful.
12826
20:06:19,960 --> 20:06:25,960
And so we're going to go image path dot mkdir to make a directory. We want to make its parents
12827
20:06:25,960 --> 20:06:30,920
if we need to. So the parent directories, and we want to pass exist_ok equals true. So we don't
12828
20:06:30,920 --> 20:06:36,920
get any errors if it already exists. And so then we can write some code. I just want to show you
12829
20:06:36,920 --> 20:06:44,920
what this does if we run it. So our target directory data slash pizza steak sushi does not exist.
12830
20:06:44,920 --> 20:06:51,560
It's creating one. So then we have now data and inside pizza steak sushi. Wonderful. But we're
12831
20:06:51,560 --> 20:06:55,640
going to fill this up with some images so that we have some data to work with. And then the whole
12832
20:06:55,640 --> 20:07:02,200
premise of this entire section will be loading this data of just images into PyTorch so that we
12833
20:07:02,200 --> 20:07:06,600
can build a computer vision model on it. But I just want to stress that this step will be very
12834
20:07:06,600 --> 20:07:11,480
similar no matter what data you're working with. You'll have some folder over here or maybe it'll
12835
20:07:11,480 --> 20:07:16,040
live on the cloud somewhere. Who knows wherever your data is, but you'll want to write code to
12836
20:07:16,040 --> 20:07:24,760
load it from here into PyTorch. So let's download pizza steak and sushi data. So I'm going to use
12837
20:07:24,760 --> 20:07:32,280
with. I'll just X over here, so we have more screen space. With open, I'm going to open the data
12838
20:07:32,280 --> 20:07:39,800
path slash the file name that I'm trying to open, which will be pizza steak sushi dot zip. And I'm
12839
20:07:39,800 --> 20:07:47,160
going to write binary as F. So this is essentially saying I'm doing this in advance because I know
12840
20:07:47,160 --> 20:07:54,360
I'm going to download this folder here. So I know the file name of it, pizza steak sushi dot zip.
12841
20:07:54,360 --> 20:08:04,040
I'm going to download that into Google Colab and I want to open it up. So request equals requests
12842
20:08:04,040 --> 20:08:13,320
dot get. And so when I want to get this file, I can click here. And then if I click download,
12843
20:08:13,880 --> 20:08:19,880
it's going to what do you think it's going to do? Well, let's see. If I wanted to download it
12844
20:08:19,880 --> 20:08:25,800
locally, I could do that. And then I could come over here. And then I could click upload if I
12845
20:08:25,800 --> 20:08:30,760
wanted to. So upload to session storage. I could upload it from that. But I prefer to write code
12846
20:08:30,760 --> 20:08:35,240
so that I could just run this cell over again and have the file, instead of it being downloaded to
12847
20:08:35,240 --> 20:08:41,480
my local computer; it just goes straight into Google Colab. So to do that, we need the URL
12848
20:08:42,040 --> 20:08:47,160
from here. And I'm just going to put that in there. It needs to be as a string.
12849
20:08:49,160 --> 20:08:57,080
Excuse me. I'm getting trigger happy on the shift and enter. Wonderful. So now I've got a request
12850
20:08:57,080 --> 20:09:04,040
to get the content that's in here. And GitHub can't really show this because this is a zip file
12851
20:09:04,040 --> 20:09:11,320
of images, spoiler alert. Now let's keep going. We're going to print out that we're downloading
12852
20:09:11,320 --> 20:09:20,680
pizza, steak and sushi data dot dot dot. And then I'm going to write to file the request dot content.
12853
20:09:21,400 --> 20:09:26,760
So the content of the request that I just made to GitHub. So that's request is here.
12854
20:09:26,760 --> 20:09:32,120
Using the Python request library to get the information here from GitHub. This URL could be
12855
20:09:32,120 --> 20:09:38,280
wherever your file has been stored. And then I'm going to write the content of that request
12856
20:09:38,280 --> 20:09:46,040
to my target file, which is this. This here. So if I just copy this, I'm going to write the data
12857
20:09:46,040 --> 20:09:55,400
to here: data path slash pizza steak sushi dot zip. And then because it's a zip file, I want to unzip it.
12858
20:09:55,400 --> 20:10:03,720
So unzip pizza, steak, sushi data. Let's go with zipfile. So we imported zipfile up there,
12859
20:10:03,720 --> 20:10:09,480
which is a Python library to help us deal with zip files. We're going to use zip file dot zip
12860
20:10:09,480 --> 20:10:13,960
file. We're going to pass it in the data path. So just the path that we did below,
12861
20:10:14,680 --> 20:10:23,320
data path slash pizza steak sushi dot zip. And this time, instead of giving it write permissions,
12862
20:10:23,320 --> 20:10:29,080
so that's what wb stands for, it stands for write binary, I'm going to give it read permissions.
12863
20:10:29,080 --> 20:10:35,880
So I want to read this target file instead of writing it. And I'm going to go as zip ref.
12864
20:10:36,600 --> 20:10:40,520
We can call this anything really, but zip ref is kind of, you'll see this a lot in
12865
20:10:41,320 --> 20:10:48,440
different Python examples. So we're going to print out again. So unzipping pizza, steak,
12866
20:10:48,440 --> 20:10:59,560
and sushi data. Then we're going to go zip underscore ref dot extract all. And we're going to go image
12867
20:10:59,560 --> 20:11:06,840
path. So what this means is it's taking the zip ref here. And it's extracting all of the
12868
20:11:06,840 --> 20:11:14,360
information that's within that zip ref. So within this zip file, to the image path,
12869
20:11:14,360 --> 20:11:21,320
which is what we created up here. So if we have a look at image path, let's see that.
12870
20:11:22,600 --> 20:11:29,960
Image path. Wonderful. So that's where all of the contents of that zip file are going to go
12871
20:11:29,960 --> 20:11:37,240
into this file. So let's see it in action. You're ready. Hopefully it works. Three, two, one, run.
12872
20:11:37,240 --> 20:11:45,560
File is not a zip file. Oh, no, what did we get wrong? So did I type this wrong?
12873
20:11:47,560 --> 20:11:58,440
We've got zipfile, data path. Oh, we've got the zip file here: pizza steak sushi dot zip, read, data path.
12874
20:11:59,800 --> 20:12:03,480
Okay, I found the error. So this is another thing that you'll have to keep in mind.
12875
20:12:03,480 --> 20:12:08,600
And I believe we've covered this before, but I like to keep the errors in these videos so that
12876
20:12:08,600 --> 20:12:12,680
you can see where I get things wrong, because you never write code right the first time.
12877
20:12:13,240 --> 20:12:18,600
So we have this link in GitHub. We have to make sure that we have the raw link address. So if I
12878
20:12:18,600 --> 20:12:24,760
come down to here and copy the link address from the download button, you'll notice a slight
12879
20:12:24,760 --> 20:12:29,720
difference if we come back into here. So I'm just going to copy that there. So if we step
12880
20:12:29,720 --> 20:12:35,960
through this GitHub, mrdbourke pytorch deep learning, we have raw instead of blob. So that
12881
20:12:35,960 --> 20:12:41,960
is why we've had an error: our code is correct, it's just downloading the wrong data.
12882
20:12:42,680 --> 20:12:47,080
So let's change this to the raw. So just keep that in mind, you must have raw here.
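Putting the whole download cell together, it ends up looking something like this. It's a sketch: the raw GitHub URL below is how I'd expect the copied download link to look (note "raw" rather than "blob"), so double-check it against the repo before relying on it.

import requests
import zipfile
from pathlib import Path

# Setup a path to a data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, download it and prepare it
if image_path.is_dir():
    print(f"{image_path} directory already exists... skipping download")
else:
    print(f"{image_path} does not exist, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)

    # Download pizza, steak and sushi data (raw link assumed, not blob)
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
        print("Downloading pizza, steak and sushi data...")
        f.write(request.content)

    # Unzip pizza, steak and sushi data into the image folder
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        print("Unzipping pizza, steak and sushi data...")
        zip_ref.extractall(image_path)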
12883
20:12:47,880 --> 20:12:49,320
And so let's see if this works.
12884
20:12:52,600 --> 20:12:56,440
Do we have the correct data? Oh, we might have to delete this. Oh, there we go.
12885
20:12:56,440 --> 20:13:03,720
Test. Beautiful. Train. Pizza steak sushi. Wonderful. So it looks like we've got some data. And if we
12886
20:13:03,720 --> 20:13:09,560
open this up, what do we have? We have various JPEGs. Okay. So this is our testing data. And if
12887
20:13:09,560 --> 20:13:15,640
we click on there, we've got an image of pizza. Beautiful. So we're going to explore this a
12888
20:13:15,640 --> 20:13:21,080
little bit more in the next video. But that is some code that we've written to download data sets
12889
20:13:21,080 --> 20:13:27,720
or download our own custom data set. Now, just recall that we are working specifically on a pizza
12890
20:13:27,720 --> 20:13:33,880
steak and sushi problem for computer vision. However, our whole premise is that we have some
12891
20:13:33,880 --> 20:13:38,760
custom data. And we want to convert these. How do we get these into tensors? That's what we want
12892
20:13:38,760 --> 20:13:45,560
to do. And so the same process will be for your own problems. We'll be loading a target data set
12893
20:13:45,560 --> 20:13:51,000
and then writing code to convert whatever format the data set is in into tensors for PyTorch.
12894
20:13:52,120 --> 20:13:55,560
So I'll see you in the next video. Let's explore the data we've downloaded.
12895
20:14:00,360 --> 20:14:06,040
Welcome back. In the last video, we wrote some code to download a target data set, our own custom
12896
20:14:06,040 --> 20:14:13,000
data set from the PyTorch deep learning data directory. And if you'd like to see how that
12897
20:14:13,000 --> 20:14:18,040
data set was made, you can go to PyTorch deep learning slash extras. It's going to be in the
12898
20:14:18,040 --> 20:14:24,600
custom data creation notebook here for 04. So I've got all the code there. All we've done is take
12899
20:14:24,600 --> 20:14:31,000
data from the food 101 data set, which you can download from this website here, or from torch
12900
20:14:31,000 --> 20:14:40,120
vision. So if we go to torch vision, food 101. We've got the data set built into PyTorch there.
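If you did want the full thing straight from torchvision, the call looks roughly like this. A sketch only: Food101 was only added to torchvision fairly recently, so check your version, and note the full download is several gigabytes.

from torchvision import datasets
from torchvision.transforms import ToTensor

# Download the full Food101 dataset (101 classes, ~101,000 images)
train_data = datasets.Food101(
    root="data",           # where to download the data to
    split="train",         # "train" or "test"
    transform=ToTensor(),  # convert the images to tensors
    download=True
)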
12901
20:14:40,120 --> 20:14:46,680
So I've used that data set from PyTorch and broken it down from 101 classes to three classes so that
12902
20:14:46,680 --> 20:14:52,680
we can start with a small experiment. So there we go. Get the training data, data sets food 101,
12903
20:14:52,680 --> 20:15:00,280
and then I've customized it to be my own style. So if we go back to CoLab, we've now got
12904
20:15:00,280 --> 20:15:04,920
pizza steak sushi, a test folder, which will be our testing images, and a train folder,
12905
20:15:04,920 --> 20:15:10,840
which will be our training images. This data is in standard image classification format. But we'll
12906
20:15:10,840 --> 20:15:16,280
cover that in a second. All we're going to do in this video is kick off section number two,
12907
20:15:16,280 --> 20:15:25,400
which is becoming one with the data, which is one of my favorite ways to refer to data preparation
12908
20:15:25,400 --> 20:15:35,320
and data exploration. So, becoming one with the data. And I'd just like to show you one of my
12909
20:15:35,320 --> 20:15:41,960
favorite quotes from Abraham loss function. So if I had eight hours to build a machine learning model,
12910
20:15:41,960 --> 20:15:48,200
I'd spend the first six hours preparing my data set. And that's what we're going to do. Abraham
12911
20:15:48,200 --> 20:15:53,080
loss function sounds like he knows what is going on. But since we've just downloaded some data,
12912
20:15:53,080 --> 20:16:00,640
let's explore it. Hey, and we'll write some code now to walk through each of the directories. How
12913
20:16:00,640 --> 20:16:07,800
you explore your data will depend on what data you've got. So we've got a fair few different
12914
20:16:07,800 --> 20:16:12,760
directories here with a fair few different folders within them. So how about we walk through each
12915
20:16:12,760 --> 20:16:18,120
of these directories and see what's going on. If you have visual data, you probably want to
12916
20:16:18,120 --> 20:16:22,440
visualize an image. So we're going to do that in the second two, write a little doc string for
12917
20:16:22,440 --> 20:16:33,160
this helper function. So walks through the path, returning its contents. Now, just in case you didn't
12918
20:16:33,160 --> 20:16:39,240
know Abraham loss function does not exist as far as I know. But I did make up that quote. So we're
12919
20:16:39,240 --> 20:16:46,680
going to use the OS dot walk function, os.walk. And we're going to pass it in a dir path. And
12920
20:16:46,680 --> 20:16:55,320
what does walk do? We can get the doc string here. Directory tree generator. For each directory
12921
20:16:55,320 --> 20:17:01,240
in the directory tree rooted at the top, including top itself, but excluding dot and dot dot,
12922
20:17:01,240 --> 20:17:08,440
yields a three tuple: dirpath, dirnames, and filenames. You can step through this in the Python
12923
20:17:08,440 --> 20:17:12,680
documentation, if you'd like. But essentially, it's just going to go through our target directory,
12924
20:17:12,680 --> 20:17:17,560
which in this case will be this one here. And walk through each of these directories printing out
12925
20:17:17,560 --> 20:17:23,560
some information about each one. So let's see that in action. This is one of my favorite things to do
12926
20:17:23,560 --> 20:17:31,080
if we're working with standard image classification format data. So: there are len, length of
12927
20:17:31,080 --> 20:17:41,000
dirnames, directories. And let's go len, len of filenames. We say len, like length but without the
12928
20:17:41,000 --> 20:17:50,200
'gth' on the end, it's just len. Images in, let's put in here, dirpath. So a little bit confusing
12929
20:17:50,200 --> 20:17:54,760
if you've never used walk before, but it's so exciting to see all of the information in all
12930
20:17:54,760 --> 20:18:00,120
of your directories. Oh, we need to run it. Let's check out our function now, walk_through_dir.
12931
20:18:00,120 --> 20:18:05,320
And we're going to pass it in the image path, which is what? Well, it's going to show us.
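Written out, the helper function being typed here looks something like this (a sketch; image_path is the directory path we set up when downloading the data):

import os

def walk_through_dir(dir_path):
    """Walks through dir_path, printing out its contents."""
    for dirpath, dirnames, filenames in os.walk(dir_path):
        print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

walk_through_dir(image_path)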
12932
20:18:05,960 --> 20:18:11,800
How beautiful. So let's compare what we've got in our printout here. There are two directories
12933
20:18:11,800 --> 20:18:17,480
and zero images in data, pizza, steak sushi. So this one here, there's zero images, but there's
12934
20:18:17,480 --> 20:18:24,520
two directories test and train wonderful. And there are three directories in data, pizza, steak, sushi,
12935
20:18:24,520 --> 20:18:31,720
test. Yes, that looks correct. Three directories, pizza, steak, sushi. And then we have zero
12936
20:18:31,720 --> 20:18:38,840
directories and 19 images in pizza, steak, sushi, slash test, steak. We have a look at this. So that
12937
20:18:38,840 --> 20:18:44,760
means there's 19 testing images for steak. Let's have a look at one of them. There we go. Now,
12938
20:18:44,760 --> 20:18:49,880
again, these are from the food 101 data set, the original food 101 data set, which is just a whole
12939
20:18:49,880 --> 20:18:55,400
bunch of images of food, 100,000 of them. There's some steak there. Wonderful. And we're trying to
12940
20:18:55,400 --> 20:19:01,240
build a food vision model to recognize what is in each image. Then if we jump down to here,
12941
20:19:01,240 --> 20:19:07,240
we have three directories in the training directory. So we have pizza, steak, sushi. And then we have
12942
20:19:07,240 --> 20:19:15,880
75 steak images, 72 sushi images and 78 pizza. So slightly different, but very much the same
12943
20:19:15,880 --> 20:19:20,680
numbers. They're not too far off each other. So we've got about 75 or so training images,
12944
20:19:20,680 --> 20:19:26,840
and we've got about 25 or so testing images per class. Now these were just randomly selected
12945
20:19:26,840 --> 20:19:34,840
from the food 101 data set 10% of three different classes. So let's keep pushing forward. And we're
12946
20:19:34,840 --> 20:19:44,440
going to set up our training and test parts. So I just want to show you, we'll just set up this,
12947
20:19:44,440 --> 20:19:52,280
and then I'll just show you the standard image classification setup: image path slash train. And we're
12948
20:19:52,280 --> 20:19:57,480
going to go test dir. So if you're working on an image classification problem, we want to set this up
12949
20:19:57,480 --> 20:20:04,280
as test. And then if we print out the train dir and the test dir, this is what we're going to be
12950
20:20:04,280 --> 20:20:09,880
trying to do. We're going to write some code to go, Hey, look at this path for our training images.
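As a small sketch, the setup being described is just two paths built from image_path:

# Setup train and test paths (standard image classification layout)
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir
# Expected: (PosixPath('data/pizza_steak_sushi/train'), PosixPath('data/pizza_steak_sushi/test'))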
12951
20:20:09,880 --> 20:20:17,080
And look at this path for our testing images. And so this is the standard image classification
12952
20:20:17,080 --> 20:20:22,600
data format is that you have your overall data set folder. And then you have a training folder
12953
20:20:22,600 --> 20:20:27,880
dedicated to all of the training images that you might have. And then you have a testing folder
12954
20:20:27,880 --> 20:20:31,880
dedicated to all of the testing images that you might have. And you could have a validation
12955
20:20:31,880 --> 20:20:39,240
data set here as well if you wanted to. But to label each one of these images, the class name
12956
20:20:39,240 --> 20:20:46,680
is the folder name. So all of the pizza images live in the pizza directory, the same for steak,
12957
20:20:46,680 --> 20:20:52,760
and the same for sushi. So depending on your problem, your own data format will depend on
12958
20:20:52,760 --> 20:20:57,240
whatever you're working on, you might have folders of different text files or folders of
12959
20:20:57,240 --> 20:21:04,440
different audio files. But the premise remains, we're going to be writing code to get our data here
12960
20:21:04,440 --> 20:21:11,000
into tensors for use with PyTorch. And so where does this come from? This image data classification
12961
20:21:11,000 --> 20:21:18,920
format. Well, if we go to the torch vision dot data sets documentation, as you start to work
12962
20:21:18,920 --> 20:21:23,400
with more data sets, you'll start to realize that there are standardized ways of storing
12963
20:21:23,400 --> 20:21:28,440
specific types of data. So if we come down to here, base classes for custom data sets,
12964
20:21:28,440 --> 20:21:33,480
we'll be working towards using this image folder data set. But this is a generic data
12965
20:21:33,480 --> 20:21:40,360
loader where the images are arranged in this way by default. So I've specifically formatted our data
12966
20:21:40,360 --> 20:21:48,280
to mimic the style that this pre built data loading function is for. So we've got a root directory
12967
20:21:48,280 --> 20:21:54,520
here. In the case where we were classifying dog and cat images, we have root, then we have a dog folder,
12968
20:21:54,520 --> 20:22:00,360
then we have various images. And the same thing for cat, this would be dog versus cat. But the only
12969
20:22:00,360 --> 20:22:06,120
difference for us is that we have food images, and we have pizza steak sushi. If we wanted to use the
12970
20:22:06,120 --> 20:22:12,680
entire food 101 data set, we would have 101 different folders of images here, which is totally
12971
20:22:12,680 --> 20:22:18,440
possible. But to begin with, we're keeping things small. So let's keep pushing forward. As I said,
12972
20:22:18,440 --> 20:22:22,760
we're dealing with a computer vision problem. So what's another way to explore our data,
12973
20:22:22,760 --> 20:22:28,440
other than just walking through the directories themselves. Let's visualize an image, hey? But
12974
20:22:28,440 --> 20:22:33,640
we've done that before with just clicking on the file. How about we write some code to do so.
12975
20:22:35,400 --> 20:22:38,840
We'll replicate this but with code. I'll see you in the next video.
12976
20:22:42,840 --> 20:22:49,800
Welcome back. In the last video, we started to become one with the data. And we learned that we
12977
20:22:49,800 --> 20:22:56,680
have about 75 images per training class and about 25 images per testing class. And we also learned
12978
20:22:56,680 --> 20:23:04,440
that the standard image classification data structure is to have the steak images within the steak
12979
20:23:04,440 --> 20:23:09,720
folder of the training data set and the same for test, and the pizza images within the pizza
12980
20:23:09,720 --> 20:23:14,920
folder, and so on for each different image classification class that we might have.
12981
20:23:14,920 --> 20:23:19,720
So if you want to create your own data set, you might format it in such a way that your training
12982
20:23:19,720 --> 20:23:25,000
images are living in a directory with their classification name. So if you wanted to classify
12983
20:23:25,000 --> 20:23:30,360
photos of dogs and cats, you might create a training folder of train slash dog train slash
12984
20:23:30,360 --> 20:23:37,080
cat, put images of dogs in the dog folder, images of cats in the cat folder, and then the same for
12985
20:23:37,080 --> 20:23:42,040
the testing data set. But the premise remains, I'm going to sound like a broken record here.
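Schematically, the layout being described looks like the tree below, and it's the layout torchvision's pre-built ImageFolder loader expects, which is what we'll work towards later in this section. The loader call here is a forward-looking sketch, not something we've written yet.

# Standard image classification format:
# pizza_steak_sushi/
#     train/
#         pizza/    <- folder name = class label
#         steak/
#         sushi/
#     test/
#         pizza/
#         steak/
#         sushi/

from torchvision import datasets, transforms

# Later on, a pre-built loader can read this structure directly,
# using the folder names as the class labels (sketch only):
train_data = datasets.ImageFolder(root="data/pizza_steak_sushi/train",
                                  transform=transforms.ToTensor())
print(train_data.classes)  # expected: ['pizza', 'steak', 'sushi']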
12986
20:23:42,040 --> 20:23:48,120
We want to get our data from these files, whatever files they may be in, whatever data structure
12987
20:23:48,120 --> 20:23:53,080
they might be in, into tensors. But before we do that, let's keep becoming one with the data.
12988
20:23:53,080 --> 20:23:59,880
And we're going to visualize an image. So visualizing an image, and you know how much I love randomness.
12989
20:24:00,520 --> 20:24:07,480
So let's select a random image from all of the files that we have in here. And let's plot it,
12990
20:24:07,480 --> 20:24:11,720
hey, because we could just click through them and visualize them. But I like to do things with
12991
20:24:11,720 --> 20:24:20,760
code. So specifically, let's plan this out. Let's write some code to, number one, get all
12992
20:24:20,760 --> 20:24:28,760
of the image paths. We'll see how we can do that with the pathlib library. We then want to
12993
20:24:28,760 --> 20:24:36,360
pick a random image path; we can use Python's random module for that. Python's random.choice will
12994
20:24:36,360 --> 20:24:45,240
pick a single random image path. Then we want to get the image class name. And this is where
12995
20:24:45,240 --> 20:24:51,480
pathlib comes in handy. Class name: recall that whichever target image we pick, the class name will
12996
20:24:51,480 --> 20:24:56,600
be whichever directory that it's in. So in the case of if we picked a random image from this directory,
12997
20:24:57,320 --> 20:25:04,840
the class name would be pizza. So we can do that using, I think it's going to be, pathlib.Path.
12998
20:25:04,840 --> 20:25:09,240
And then we'll get the parent folder, wherever that image lives. So the parent
12999
20:25:09,240 --> 20:25:15,320
folder, the parent directory, of our target random image. And we're going to get the stem of that.
13000
20:25:15,960 --> 20:25:23,000
So we have stem, stem is the last little bit here. Number four, what should we do? Well,
13001
20:25:23,000 --> 20:25:30,280
we want to open the image. So since we're working with images, let's open the image
13002
20:25:31,160 --> 20:25:38,360
with Python's PIL, which is the Python Imaging Library, though it will actually be Pillow. So if we go Python
13003
20:25:38,360 --> 20:25:45,240
Pillow, which was a little bit confusing when I started to learn about Python image manipulation. So Pillow
13004
20:25:45,240 --> 20:25:53,800
is a friendly PIL fork, but it's still imported as PIL. So just think of Pillow as a way to process
13005
20:25:53,800 --> 20:26:01,160
images with Python. So PIL is the Python Imaging Library by Fredrik Lundh. And Alex Clark and
13006
20:26:01,160 --> 20:26:09,880
contributors have created pillow. So thank you, everyone. And let's go to number five. What do
13007
20:26:09,880 --> 20:26:14,120
we want to do as well? We want to, yeah, let's get some metadata about the image. We'll then show
13008
20:26:14,120 --> 20:26:22,200
the image and print metadata. Wonderful. So let's import random, because machine learning is all
13009
20:26:22,200 --> 20:26:28,120
about harnessing the power of randomness. And I like to use randomness to explore data as well
13010
20:26:28,120 --> 20:26:36,840
as model it. So let's set the seed, so we get the same image on both of our ends. So random.seed.
13011
20:26:38,280 --> 20:26:43,320
I'm going to use 42. You can use whatever you'd like. But if you'd like to get the same image as me,
13012
20:26:43,320 --> 20:26:54,200
I'd suggest using 42 as well. Now let's get all the image paths. So we can do this because our image
13013
20:26:54,200 --> 20:27:00,680
path list, we want to get our image path. So recall that our image path
13014
20:27:02,920 --> 20:27:08,520
is this. So this folder here, I'm just going to close all this. So this is our image path,
13015
20:27:08,520 --> 20:27:13,240
this folder here, you can also go copy path if you wanted to, we're just going to get something
13016
20:27:13,240 --> 20:27:20,440
very similar there. That's going to error out. So I'll just comment that. So it doesn't error.
13017
20:27:20,440 --> 20:27:28,040
That's our path. But we're going to keep it in the POSIX path format. And we can go list. Let's
13018
20:27:28,040 --> 20:27:34,440
create a list of image_path.glob, which stands for, grab? I don't actually know what glob stands
13019
20:27:34,440 --> 20:27:42,520
for. But to me, it's like glob together all of the files that match
13020
20:27:42,520 --> 20:27:48,520
a certain pattern. So glob together for me means stick them all together. And you might be able
13021
20:27:48,520 --> 20:27:54,120
to correct me if I've got the wrong meaning there. I'd appreciate that. And so we're going to pass
13022
20:27:54,120 --> 20:28:02,840
in a certain combination. So we want star slash star, and then star dot jpg, that is, "*/*/*.jpg". Now why are
13023
20:28:02,840 --> 20:28:09,720
we doing this? Well, because we want every image path. So star is going to be this first
13024
20:28:10,680 --> 20:28:16,920
directory here. So any combination, it can be train or test. And then this star means anything for
13025
20:28:16,920 --> 20:28:24,600
what's inside tests. And let's say this first star is equal to test. This second star is equal to
13026
20:28:24,600 --> 20:28:30,280
anything here. So it could be any of pizza, steak or sushi. And then finally, this star,
13027
20:28:30,280 --> 20:28:37,240
let's say it was test pizza. This star is anything in here. And that is before dot jpg.
13028
20:28:37,800 --> 20:28:42,920
So it could be any one of these files here. Now this will make more sense once we print it out.
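A minimal sketch of that glob pattern, assuming the image_path variable points at the pizza_steak_sushi folder:

```python
from pathlib import Path

image_path = Path("data/pizza_steak_sushi")  # hypothetical root folder

# "*/*/*.jpg" -> any dir (train or test) / any class dir (pizza, steak, sushi) / any .jpg file
image_path_list = list(image_path.glob("*/*/*.jpg"))
print(len(image_path_list))
print(image_path_list[:3])
```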
13029
20:28:42,920 --> 20:28:50,920
So image path list, let's have a look. There we go. So now we've got a list of every single image
13030
20:28:50,920 --> 20:28:57,800
that's within pizza steak sushi. And this is just another way that I like to visualize data is to
13031
20:28:57,800 --> 20:29:03,000
just get all of the paths and then randomly visualize it, whether it be an image or text or
13032
20:29:03,000 --> 20:29:08,360
audio, you might want to randomly listen to it. Recall that each of the domain libraries has
13033
20:29:08,360 --> 20:29:13,800
different input and output methods for different data sets. So if we come to torch vision, we have
13034
20:29:13,800 --> 20:29:21,160
utils. So we have different ways to draw on images, reading and writing images and videos. So we
13035
20:29:21,160 --> 20:29:27,640
could load an image via read_image, we could decode it, we could do a whole bunch of things.
13036
20:29:27,640 --> 20:29:33,560
I'll let you explore that as extra curriculum. But now let's select a random image from here
13037
20:29:33,560 --> 20:29:42,360
and plot it. So we'll go number two, which was our step up here, pick a random image. So pick a
13038
20:29:42,360 --> 20:29:51,880
random image path. Let's get rid of this. And so we can go random_image_path = random
13039
20:29:51,880 --> 20:29:58,440
.choice. Harness the power of randomness to explore our data. Let's get a random image from
13040
20:29:58,440 --> 20:30:02,840
image path list, and then we'll print out random image path, which one was our lucky image that
13041
20:30:02,840 --> 20:30:12,360
we selected. Beautiful. So we have a test pizza image as our lucky random image. And
13042
20:30:13,960 --> 20:30:18,200
because we've got a random seed, it's going to be the same one each time. Yes, it is.
13043
20:30:19,080 --> 20:30:22,840
And if we comment out the random seed, we'll get a different one each time. We've got a steak
13044
20:30:22,840 --> 20:30:29,720
image. We've got another steak image. Another steak image. Oh, three in a row, four in a row.
13045
20:30:29,720 --> 20:30:34,600
Oh, pizza. Okay, let's keep going. So we'll get the image class
13046
20:30:36,520 --> 20:30:44,760
from the path name. So the image class is the name of the directory, because our image data is
13047
20:30:44,760 --> 20:30:52,600
in standard image classification format, where the image is stored. So let's do that image class
13048
20:30:52,600 --> 20:31:03,480
equals random_image_path.parent.stem. And then we're going to print image_class. What do we
13049
20:31:03,480 --> 20:31:12,120
get? So we've got pizza. Wonderful. So the parent is this folder here. And then the stem is the end
13050
20:31:12,120 --> 20:31:17,400
of that folder, which is pizza. Beautiful. Well, now what are we up to now? We're working with
13051
20:31:17,400 --> 20:31:22,440
images. Let's open up the image; we can open it using PIL. We could also open up the
13052
20:31:22,440 --> 20:31:29,320
image with PyTorch here, with read_image, but we're going to use PIL to keep things a little
13053
20:31:29,320 --> 20:31:37,000
bit generic for now. So open image: img = Image. So from PIL import Image, and the Image
13054
20:31:37,000 --> 20:31:42,040
class has an open function. And we're just going to pass it in here, the random image path. Note
13055
20:31:42,040 --> 20:31:48,360
if this is corrupt, if your image is corrupt, this may error. So then you could potentially use this
13056
20:31:48,360 --> 20:31:55,400
to clean up your data set. I've imported a lot of images with Image.open of our target data
13057
20:31:55,400 --> 20:32:00,360
set here. I don't believe any of them are corrupt. But if they are, please let me know. And we'll find
13058
20:32:00,360 --> 20:32:06,440
out later on when our model tries to train on it. So let's print some metadata. So when we open our
13059
20:32:06,440 --> 20:32:15,000
image, we get some information from it. So let's go our random image path is what? Random image path.
13060
20:32:15,000 --> 20:32:22,440
We're already printing this out, but we'll do it again anyway. And then we're going to go the image
13061
20:32:22,440 --> 20:32:32,040
class is equal to what? Well, the image class. Wonderful. And then we can print out, we can get
13062
20:32:32,040 --> 20:32:37,320
some metadata about our images. So the image height is going to be img.height. We get that
13063
20:32:37,320 --> 20:32:43,320
metadata from using the PIL library. And then we're going to print out image width. And we'll get
13064
20:32:43,320 --> 20:32:50,680
img.width. And then we'll print the image itself. Wonderful. And we can get rid of this,
13065
20:32:50,680 --> 20:32:55,240
and we can get rid of this. Let's now have a look at some random images from our data set.
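Here is a sketch of the cell just written, following the plan from earlier (it assumes the image_path_list created above):

```python
import random
from PIL import Image

# Set the seed so we both pick the same image
random.seed(42)

# 2. Pick a random image path from all of the image paths
random_image_path = random.choice(image_path_list)

# 3. Get the image class from the parent directory name
image_class = random_image_path.parent.stem

# 4. Open the image with PIL
img = Image.open(random_image_path)

# 5. Print metadata, 6. show the image
print(f"Random image path: {random_image_path}")
print(f"Image class: {image_class}")
print(f"Image height: {img.height}")
print(f"Image width: {img.width}")
img  # the last expression in a notebook cell displays the image
```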
13066
20:32:59,240 --> 20:33:05,000
Lovely. We've got an image of pizza there. Now I will warn you that the downsides of working with
13067
20:33:05,000 --> 20:33:10,680
food data is it does make you a little bit hungry. So there we've got some sushi. And then we've got
13068
20:33:10,680 --> 20:33:22,200
some more sushi. Some steak. And we have a steak, we go one more for good luck. And we finish off
13069
20:33:22,200 --> 20:33:25,880
with some sushi. Oh, that could be a little bit confusing to me. I thought that might be steak
13070
20:33:25,880 --> 20:33:31,320
to begin with. And this is the thing, we'll do one more. This is why it's important to visualize
13071
20:33:31,320 --> 20:33:35,400
your images randomly, because you never know what you're going to come across. And this way,
13072
20:33:35,400 --> 20:33:39,560
once we visualize enough images, you could do this a hundred more times. You could do this
13073
20:33:39,560 --> 20:33:45,000
20 more times until you feel comfortable to go, Hey, I feel like I know enough about the data now.
13074
20:33:45,000 --> 20:33:50,760
Let's see how well our model goes on this sort of data. So I'll finish off on this steak image.
13075
20:33:50,760 --> 20:33:56,200
And now I'll set you a little challenge before the next video: visualize an image like we've
13076
20:33:56,200 --> 20:34:03,960
done here. But this time do it with matplotlib. So try to visualize an image with matplotlib.
13077
20:34:03,960 --> 20:34:09,640
That's your little challenge before the next video. So give that a go. We want to do a random
13078
20:34:09,640 --> 20:34:14,760
image as well. So quite a similar set up to this. But instead of printing out things like this,
13079
20:34:14,760 --> 20:34:20,120
we want to visualize it using matplotlib. So try that out and we'll do it together in the next video.
13080
20:34:24,680 --> 20:34:30,760
Oh, we are well on the way to creating our own PyTorch custom data set. We've started to
13081
20:34:30,760 --> 20:34:37,800
become one with the data. But now let's continue to visualize another image. I set you the challenge
13082
20:34:37,800 --> 20:34:43,320
in the last video to try and replicate what we've done here with the PIL library with matplotlib.
13083
20:34:43,320 --> 20:34:49,240
So now let's give it a go. Hey, and why use matplotlib? Well, because matplotlib and I'm going to
13084
20:34:49,240 --> 20:34:53,800
import numpy as well, because we're going to have to convert this image into an array. That was a
13085
20:34:53,800 --> 20:34:59,560
little trick that I didn't quite elaborate on. But I hope you tried to code it out and figure
13086
20:34:59,560 --> 20:35:06,760
it out from the errors you received. But matplotlib is one of the most fundamental data science
13087
20:35:06,760 --> 20:35:11,000
libraries. So you're going to see it everywhere. So it's just important to be aware of how to plot
13088
20:35:11,000 --> 20:35:21,720
images and data with matplotlib. So turn the image into an array. So we can go image as array. And
13089
20:35:21,720 --> 20:35:29,400
I'm going to use the NumPy method np.asarray. We're going to pass it in the image; recall that
13090
20:35:29,400 --> 20:35:34,920
the image is the same image that we've just set up here. And we've already opened it with PIL.
13091
20:35:36,440 --> 20:35:46,200
And then I'm going to plot the image. So plot the image with matplotlib. plt.figure.
13092
20:35:46,200 --> 20:35:56,440
And then we can go figsize=(10, 7). And then we're going to go plt.imshow with image as
13093
20:35:56,440 --> 20:36:03,480
array, pass it in the array of numbers. I'm going to set the title here as an f string. And then
13094
20:36:03,480 --> 20:36:11,400
I'm going to pass in image class, equals image class. Then I'm going to pass in image shape. So
13095
20:36:11,400 --> 20:36:15,240
we can get the shape here. Now this is another important thing to be aware of of your different
13096
20:36:15,240 --> 20:36:20,840
datasets when you're exploring them is what is the shape of your data? Because what's one of the
13097
20:36:20,840 --> 20:36:25,880
main errors in machine learning and deep learning? It's shape mismatch issues. So if we know the
13098
20:36:25,880 --> 20:36:31,240
shape of our data, we can start to go, okay, I kind of understand what shape I need my model
13099
20:36:31,240 --> 20:36:36,760
layers to be in, and what shape I need my other data to be in. And I'm going to turn the axes off
13100
20:36:36,760 --> 20:36:44,520
here. Beautiful. So look at what we've got. Now I've just thrown this in here without really
13101
20:36:44,520 --> 20:36:49,720
explaining it. But we've seen this before in the computer vision section. As our image shape is
13102
20:36:49,720 --> 20:36:59,800
512, 306, 3. Now the dimensions here are: the height is 512 pixels, the width is 306 pixels, and it has
13103
20:36:59,800 --> 20:37:08,040
three color channels. So what format is this? This is color channels last, which is the default
13104
20:37:08,040 --> 20:37:14,840
for the PIL library. That's also the default for matplotlib. But PyTorch's default, recall, is
13105
20:37:14,840 --> 20:37:20,360
to put the color channels at the start: color channels first. Now there is a lot of debate, as
13106
20:37:20,360 --> 20:37:24,360
I've said, over which is the best order. It looks like things are moving towards channels last. But
13107
20:37:24,360 --> 20:37:30,120
for now pytorch defaults to color channels first. But that's okay. Because we can manipulate these
13108
20:37:30,120 --> 20:37:36,200
dimensions to what we need for whatever code that we're writing. And the three color channels is what
13109
20:37:36,200 --> 20:37:41,400
red, green and blue. So if you combine red, green and blue in some way, shape or form,
13110
20:37:41,400 --> 20:37:47,640
you get the different colors here that represent our image. And so if we have a look at our image
13111
20:37:47,640 --> 20:37:59,480
as an array: our image is in numerical format. Wonderful. So, okay. We've got one way to do this for
13112
20:37:59,480 --> 20:38:07,800
one image. I think we start moving towards scaling this up to do it for every image in our data
13113
20:38:07,800 --> 20:38:13,240
folder. So let's just finish off this video by visualizing one more image. What do we get? Same
13114
20:38:13,240 --> 20:38:19,240
premise. The image is now as an array, different numerical values. We've got a delicious looking
13115
20:38:19,240 --> 20:38:28,440
pizza here of shape 512, 512, with color channels last. And we've got the same thing up here. So
13116
20:38:28,440 --> 20:38:33,640
that is one way to become one with the data is to visualize different images, especially random
13117
20:38:33,640 --> 20:38:38,040
images. You could do the same thing visualizing different text samples that you're working with
13118
20:38:38,040 --> 20:38:43,480
or listening to different audio samples. It depends what domain you're working in. So now in the
13119
20:38:43,480 --> 20:38:49,880
next video, let's start working towards turning all of the images in here. Now that we visualize
13120
20:38:49,880 --> 20:38:54,600
some of them and become one with the data, we've seen that the shapes are varying in terms of
13121
20:38:54,600 --> 20:38:59,000
height and width. But they all look like they have three color channels because we have color images.
13122
20:38:59,640 --> 20:39:04,680
But now we want to write code to turn all of these images into PyTorch tensors.
13123
20:39:05,480 --> 20:39:09,080
So let's start moving towards that. I'll see you in the next video.
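Before the next video picks this up, here is a sketch of the matplotlib version just walked through, assuming the img and image_class variables from the PIL cell above:

```python
import matplotlib.pyplot as plt
import numpy as np

# Turn the PIL image into a NumPy array of shape (height, width, color_channels)
img_as_array = np.asarray(img)

# Plot the image with matplotlib
plt.figure(figsize=(10, 7))
plt.imshow(img_as_array)
plt.title(f"Image class: {image_class} | Image shape: {img_as_array.shape} -> [height, width, color_channels]")
plt.axis(False)
plt.show()
```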
13124
20:39:12,600 --> 20:39:18,920
Hello and welcome back. In the last video, we converted an image to a NumPy array.
13125
20:39:18,920 --> 20:39:25,400
And we saw how an image can be represented as an array. But what if we'd like to get this image
13126
20:39:25,400 --> 20:39:33,160
from our custom data set over here, pizza steak sushi into pytorch? Well, let's cover that in
13127
20:39:33,160 --> 20:39:39,640
this video. So I'm going to create a new heading here. And it's going to be transforming data.
13128
20:39:40,280 --> 20:39:45,160
And so what we'd like to do here, as I've been hinting at the whole time, is we want
13129
20:39:45,160 --> 20:39:50,280
to get our data into tensor format, because that is the data type that pytorch accepts.
13130
20:39:50,920 --> 20:39:59,320
So let's write down here before we can use our image data with pytorch. Now this goes for images,
13131
20:39:59,320 --> 20:40:05,720
other vision data, it goes for text, it goes to audio, basically whatever kind of data set you're
13132
20:40:05,720 --> 20:40:12,760
working with, you need some way to turn it into tensors. So that's step number one: turn your target
13133
20:40:12,760 --> 20:40:23,160
data into tensors. In our case, it's going to be a numerical representation of our images.
13134
20:40:24,600 --> 20:40:35,240
And number two is turn it into a torch.utils.data.Dataset. So recall from a previous
13135
20:40:35,240 --> 20:40:43,880
video that we've used the data set to house all of our data in tensor format. And then subsequently,
13136
20:40:43,880 --> 20:40:54,680
we've turned our data sets, our PyTorch data sets, into torch.utils.data.DataLoader.
13137
20:40:55,240 --> 20:41:02,440
And a data loader creates an iterable or a batched version of our data set. So for short, we're going
13138
20:41:02,440 --> 20:41:11,960
to call these data set and data loader. Now, as I discussed previously, if we go to the pytorch
13139
20:41:11,960 --> 20:41:19,960
documentation for torchvision, this is going to be quite similar for torchaudio,
13140
20:41:19,960 --> 20:41:25,880
torchtext, TorchRec and TorchData, eventually, when it comes out of beta. There are different ways to
13141
20:41:25,880 --> 20:41:33,400
create such data sets. So we can go into the data sets module, and then we can find built-in data
13142
20:41:33,400 --> 20:41:43,400
sets, and then also base classes for custom data sets. But if we go into here, ImageFolder,
13143
20:41:43,400 --> 20:41:47,400
there's another parameter I'd like to show you, and this is going to be universal across many of
13144
20:41:47,400 --> 20:41:54,040
your different data types is the transform parameter. Now, the transform parameter is
13145
20:41:54,040 --> 20:42:01,560
a parameter we can use to pass in some transforms on our data. So when we load our data sets from an
13146
20:42:01,560 --> 20:42:08,600
image folder, it performs a transform on those data samples that we've sent in here as the target
13147
20:42:08,600 --> 20:42:13,960
data folder. Now, this is a lot easier to understand through illustration, rather than just
13148
20:42:13,960 --> 20:42:20,280
talking about it. So let's create a transform. And the main transform we're going to be doing is
13149
20:42:20,280 --> 20:42:24,840
transforming our data and turning it into tensors. So let's see what that looks like. So we're
13150
20:42:24,840 --> 20:42:29,880
just going to re-import all of the main libraries that we're going to use. So from
13151
20:42:29,880 --> 20:42:38,600
torch.utils.data, let's import DataLoader. And we're going to import from torchvision. I'm going to
13152
20:42:38,600 --> 20:42:47,000
import datasets. And I'm also going to import transforms. Beautiful. And I'm going to create
13153
20:42:47,000 --> 20:42:55,160
another little heading here, this is going to be 3.1, transforming data with torchvision dot
13154
20:42:55,160 --> 20:43:01,560
transforms. So the main transform we're looking at here is turning our images from JPEGs.
13155
20:43:04,200 --> 20:43:07,960
If we go into train, and then we go into any folder, we've got JPEG images.
13156
20:43:09,720 --> 20:43:13,320
And we want to turn these into tensor representation. So there's some pizza there.
13157
20:43:13,320 --> 20:43:20,040
We'll get out of this. Let's see what we can do. How about we create a transform here,
13158
20:43:20,760 --> 20:43:27,240
write a transform for image. And let's start off by calling it data transform.
13159
20:43:27,880 --> 20:43:32,840
And I'm going to show you how we can combine a few transforms together. If you want to
13160
20:43:32,840 --> 20:43:38,120
combine transforms together, you can use transforms.Compose. You can also use
13161
20:43:38,120 --> 20:43:45,560
nn.Sequential to combine transforms. But we're going to stick with transforms dot
13162
20:43:45,560 --> 20:43:53,160
Compose for now. And it takes a list. And so let's just write out three transforms to begin with.
13163
20:43:53,160 --> 20:43:57,720
And then we can talk about them after we do so. So we want to resize our images
13164
20:43:59,480 --> 20:44:06,200
to 64 by 64. Now, why might we do this? Well, do you recall in the last section, computer vision,
13165
20:44:06,200 --> 20:44:13,160
we use the tiny VGG architecture. And what size were the images that the tiny VGG architecture took?
13166
20:44:14,600 --> 20:44:19,480
Well, we replicated the CNN website version or the CNN explainer website version, and they took
13167
20:44:19,480 --> 20:44:25,880
images of size 64 by 64. So perhaps we want to leverage that computer vision model later on.
13168
20:44:25,880 --> 20:44:32,280
So we're going to resize our images to 64 by 64. And then we're going to create another transform.
13169
20:44:32,280 --> 20:44:37,960
And so this is, I just want to highlight how transforms can help you manipulate your data in a
13170
20:44:37,960 --> 20:44:42,840
certain way. So if we wanted to flip the images, which is a form of data augmentation, in other
13171
20:44:42,840 --> 20:44:49,560
words, artificially increasing the diversity of our data set, we can flip the images randomly on
13172
20:44:49,560 --> 20:44:59,960
the horizontal. So transforms.RandomHorizontalFlip. And I'm going to put a probability in here
13173
20:44:59,960 --> 20:45:08,360
of p equals 0.5. So that means 50% of the time, if an image goes through this transform pipeline,
13174
20:45:09,000 --> 20:45:13,400
it will get flipped on the horizontal axis. As I said, this makes a lot more sense when we
13175
20:45:13,400 --> 20:45:19,800
visualize it. So we're going to do that very shortly. And finally, we're going to turn the image into
13176
20:45:19,800 --> 20:45:31,240
a torch tensor. So we can do this with transforms.ToTensor. And now where might you find such
13177
20:45:31,240 --> 20:45:37,480
transforms? So this transform here says ToTensor. If we have a look at the docstring,
13178
20:45:37,480 --> 20:45:42,440
we get: convert a PIL Image, which is what we're working with right now, or a NumPy array to a
13179
20:45:42,440 --> 20:45:47,080
tensor. This transform does not support TorchScript. If you'd like to find out what that is,
13180
20:45:47,080 --> 20:45:51,560
I'd encourage you to read the documentation for that. It's essentially a way of turning your PyTorch code into a
13181
20:45:51,560 --> 20:45:59,320
serializable script. It converts a PIL Image or a NumPy array from (height, width, color channels) in the range
13182
20:45:59,320 --> 20:46:06,280
0 to 255, which is what our values are up here. They're from 0 to 255, red, green and blue,
13183
20:46:06,920 --> 20:46:14,200
to a torch float tensor of shape (color channels, height, width) in the range 0 to 1. So it will
13184
20:46:14,200 --> 20:46:21,400
take our tensor values here or our NumPy array values from 0 to 255 and convert them into a torch
13185
20:46:21,400 --> 20:46:27,240
tensor in the range 0 to 1. We're going to see this later on in action. But this is our first
13186
20:46:27,240 --> 20:46:33,560
transform. So we can pass data through that. In fact, I'd encourage you to try that out.
13187
20:46:34,200 --> 20:46:40,760
See what happens when you pass data into data_transform. What happens when we pass in our image as an
13188
20:46:40,760 --> 20:46:52,280
array? Image as array. Let's see what happens. Hey, oh: image should be PIL Image, got class NumPy
13189
20:46:52,280 --> 20:46:58,120
array. What if we just pass in our straight-up image? So this is a PIL image. There we go.
13190
20:46:58,680 --> 20:47:01,880
Beautiful. So if we look at the shape of this, what do we get?
13191
20:47:01,880 --> 20:47:10,200
3, 64, 64. There's our 64. And what if we wanted to change this to 224, which is another common value for
13192
20:47:11,800 --> 20:47:17,400
computer vision models, 224 by 224. Do you see how powerful this is? This little transforms
13193
20:47:17,400 --> 20:47:23,400
module in the torchvision library. We'll change that back to 64, 64. And then if we have a look at what
13194
20:47:23,400 --> 20:47:31,000
dtype of our transformed tensor is, we get torch.float32. Beautiful. So now we've got a way to
13195
20:47:31,000 --> 20:47:36,360
transform our images into tensors. But we're still only doing this with one image.
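As a sketch, the transform pipeline and the single-image check from this video could look like this (assuming the img PIL image from earlier):

```python
from torchvision import transforms

# Write a transform for turning an image into a tensor (with a little augmentation)
data_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),        # resize the image to 64x64
    transforms.RandomHorizontalFlip(p=0.5),  # flip on the horizontal 50% of the time
    transforms.ToTensor()                    # PIL image [0, 255] -> float tensor [0.0, 1.0], channels first
])

# Try it on a single PIL image (this transform expects a PIL image, not a NumPy array)
transformed = data_transform(img)
print(transformed.shape, transformed.dtype)  # torch.Size([3, 64, 64]) torch.float32
```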
13196
20:47:37,160 --> 20:47:43,640
How about we progress towards doing it for every image in our data folder here?
13197
20:47:44,840 --> 20:47:49,560
But before we do that, I'd like to visualize what this looks like. So in the next video,
13198
20:47:49,560 --> 20:47:54,040
let's write some code to visualize what it looks like to transform multiple images at a time.
13199
20:47:54,680 --> 20:47:59,160
And I think it'd be a good idea to compare the transform that we're doing to the original image.
13200
20:47:59,160 --> 20:48:04,040
So I'll see you in the next video. Let's write some visualization code.
13201
20:48:06,840 --> 20:48:13,240
Let's now follow our data explorer's motto of visualizing our transformed images. So we saw what it looks
13202
20:48:13,240 --> 20:48:18,680
like to pass one image through a data transform. And if we wanted to find more documentation on
13203
20:48:18,680 --> 20:48:25,080
torchvision transforms, where could we go? There are a lot of these. So transforming and augmenting
13204
20:48:25,080 --> 20:48:31,000
images, this is actually going to be your extra curriculum for this video. So transforms are
13205
20:48:31,000 --> 20:48:36,440
common image transformations available in the transforms module. They can be chained together
13206
20:48:36,440 --> 20:48:41,400
using compose, which is what we've already done. Beautiful. And so if you'd like to go through all
13207
20:48:41,400 --> 20:48:45,640
of these, there's a whole bunch of different transforms that you can do, including some data
13208
20:48:45,640 --> 20:48:50,200
augmentation transforms. And then if you'd like to see them visually, I'd encourage you to check
13209
20:48:50,200 --> 20:48:55,800
out illustration of transforms. But let's write some code to explore our own transform visually
13210
20:48:55,800 --> 20:49:04,520
first. So I'll leave this as a link. So I'm going up here, right here, transforms
13211
20:49:06,600 --> 20:49:17,720
help you get your images ready to be used with a model slash perform data augmentation.
13212
20:49:17,720 --> 20:49:24,280
Wonderful. So we've got a way to turn images into tensors. That's what we want for our model.
13213
20:49:24,280 --> 20:49:29,560
We want our images as PyTorch tensors. The same goes for any other data type that you're working
13214
20:49:29,560 --> 20:49:35,720
with. But now I'd just like to visualize what it looks like if we plot a number of transformed
13215
20:49:35,720 --> 20:49:41,480
images. So we're going to make a function here that takes in some image paths, a transform,
13216
20:49:41,480 --> 20:49:46,440
a number of images to transform at a time and a random seed here, because we're going to harness
13217
20:49:46,440 --> 20:49:53,400
the power of randomness. And sometimes we want to set the seed. Sometimes we don't. So we have
13218
20:49:53,400 --> 20:49:59,640
an image path list that we've created before, which is just all of the image paths that we have
13219
20:49:59,640 --> 20:50:09,000
of our data set. So data, pizza, steak sushi. Now how about we select some random image paths
13220
20:50:09,000 --> 20:50:15,320
and then take the image from that path, run it through our data transform, and then compare the
13221
20:50:15,320 --> 20:50:21,080
original image of what it looks like and the transformed image and what that looks like.
13222
20:50:22,120 --> 20:50:25,640
Let's give it a try, hey? So I'm going to write a doc string of what this does,
13223
20:50:26,600 --> 20:50:35,880
and then selects random images from a path of images and loads slash transforms them,
13224
20:50:35,880 --> 20:50:45,240
then plots the original versus the transformed version. So that's quite a long docstring,
13225
20:50:45,240 --> 20:50:51,800
but that'll be enough. We can put in some stuff for the image paths, transforms, and seed. We'll
13226
20:50:51,800 --> 20:51:00,280
just code this out. Let's go random seed; we'll set the seed. Maybe we do it as: if seed, random.seed(seed).
13227
20:51:00,280 --> 20:51:08,360
Let's put that, and we'll set seed to equal none by default. That way we can, we'll see if this works,
13228
20:51:08,360 --> 20:51:14,280
hey, if in doubt, code it out. Random image paths: then we're going to go random.sample from the
13229
20:51:14,280 --> 20:51:19,160
image paths and the number of samples that we want. So random.sample takes a population, which in
13230
20:51:19,160 --> 20:51:25,240
our case is the list of image paths, and k, the number to sample. So we're going to randomly sample
13231
20:51:25,240 --> 20:51:34,360
k, which is going to be n. So three images from our image path list. And then we're going to go for
13232
20:51:34,360 --> 20:51:40,280
image path, we're going to loop through the randomly sampled image paths. You know how much I love
13233
20:51:40,280 --> 20:51:46,440
harnessing the power of randomness for visualization. So for image path in random image paths, let's
13234
20:51:46,440 --> 20:51:54,920
open up that image using PIL: with Image.open(image_path) as f. And then we're going to create a
13235
20:51:54,920 --> 20:52:02,360
figure and an axes. And we're going to create a subplot with matplotlib, so plt.subplots. And we
13236
20:52:02,360 --> 20:52:13,320
want it to create one row. So it takes nrows and ncols: one row, and ncols equal to two. And then
13237
20:52:13,320 --> 20:52:20,760
on the first, or the zeroth, axis, we're going to plot the original image. So imshow, we're just
13238
20:52:20,760 --> 20:52:27,640
going to pass f straight in. And then we're going to go ax[0], we're going to set the title. So
13239
20:52:27,640 --> 20:52:35,080
set title, we're going to set it to be the original. So we'll create this as an f string, original,
13240
20:52:35,080 --> 20:52:40,840
and then a new line will show a size value. And this is going to be f.size. So we're just
13241
20:52:40,840 --> 20:52:48,840
getting the size attribute from our file. So we'll keep going, and we'll turn off the axes here.
13242
20:52:48,840 --> 20:52:57,400
So ax[0].axis, and we're going to set that to False. Now, on the next axes plot, we're
13243
20:52:57,400 --> 20:53:03,720
going to transform and plot target image. This is so that our images are going to be side by side,
13244
20:53:03,720 --> 20:53:08,760
the original and the transformed version. So there's one thing that we're going to have to do. I'll
13245
20:53:08,760 --> 20:53:14,200
just, I'll code it out in a wrong way first. I think that'll be a good way to illustrate what's
13246
20:53:14,200 --> 20:53:23,240
going on. So I'm just going to put a note here. Note, we will need to change the shape for
13247
20:53:23,240 --> 20:53:29,080
matplotlib, because we're going to come back here. Because what does this do? What have we
13248
20:53:29,080 --> 20:53:35,880
noticed that our transform does? If we check the shape here, oh, excuse me, it converts our image
13249
20:53:35,880 --> 20:53:45,240
to color channels first. Whereas matplotlib prefers color channels last. So just keep that
13250
20:53:45,240 --> 20:53:51,240
in mind for when we're going forward. This code, I'm writing it, it will error on purpose. So
13251
20:53:51,800 --> 20:53:58,200
transformed image. And then we're going to go ax[1] as well. We're going to set the title,
13252
20:53:58,200 --> 20:54:06,040
which is going to be transformed. And then we'll create a new line and we'll say size is going to be
13253
20:54:07,560 --> 20:54:17,400
transformed_image.shape. Or, yeah, we could probably go shape here. And then
13254
20:54:17,400 --> 20:54:23,560
finally, we're going to go ax[1], we're going to turn the axis off, we're going to set that to False.
13255
20:54:23,560 --> 20:54:28,760
You can also set it to off. So you could write false, or you could write off, you might see that
13256
20:54:28,760 --> 20:54:36,200
different versions of that somewhere. And I'm going to write a super title here, which we'll see what
13257
20:54:36,200 --> 20:54:41,720
this looks like. The class is going to come from the image path. So we're getting the target image path. And we're
13258
20:54:41,720 --> 20:54:46,680
just going to get the attribute or the parent attribute, and then the stem attribute from that,
13259
20:54:46,680 --> 20:54:51,160
just like we did before, to get the class name. And then I'm going to set this to a larger font
13260
20:54:51,160 --> 20:54:57,480
size, so that we make some nice looking plots, right? If we're going to visualize our data,
13261
20:54:57,480 --> 20:55:03,240
we might as well make our plots visually appealing. So let's plot some transformed data or transformed
13262
20:55:03,240 --> 20:55:08,520
images. So image paths, we're going to set this to image_path_list, which is just the variable we
13263
20:55:08,520 --> 20:55:15,000
have down below, which is the path list, a list containing all of our image paths. Our transform,
13264
20:55:15,000 --> 20:55:21,160
we're going to set our transform to be equal to our data transform. So this just means that if
13265
20:55:21,160 --> 20:55:26,920
we pass the transform in, our image is going to go through that transform, and then go through all
13266
20:55:26,920 --> 20:55:31,160
of these is going to be resized, it's going to be randomly horizontally flipped, and it's going to
13267
20:55:31,160 --> 20:55:37,560
be converted to a tensor. And so we're going to set that to our data transform,
13268
20:55:37,560 --> 20:55:43,000
and n is going to be three. So we plot three images, and we'll set the seed to 42 to begin with.
13269
20:55:43,000 --> 20:55:52,600
Let's see if this works. Oh, what did we get wrong? We have invalid shape. As I said, I love seeing
13270
20:55:52,600 --> 20:55:57,960
this error, because we have seen this error many times, and we know what to do with it. We know that
13271
20:55:57,960 --> 20:56:02,920
we have to rearrange the shapes of our data in some way, shape or form. Wow, I said shape a lot
13272
20:56:02,920 --> 20:56:07,640
there. That's all right. Let's go here, permute. This is what we have to do. We have to permute,
13273
20:56:07,640 --> 20:56:12,760
we have to swap the order of the axes. So right now, our color channels is first. So we have to
13274
20:56:12,760 --> 20:56:18,360
bring this color channel axis or dimension to the end. So we need to shuffle these across. So 64
13275
20:56:18,360 --> 20:56:22,680
into here, 64 into here, and three on the end. We need to, in other words, turn it from color
13276
20:56:22,680 --> 20:56:29,400
channels first to color channels last. So we can do that by permuting it to have the first
13277
20:56:29,400 --> 20:56:35,080
axis come now in the zero dimension spot. And then number two was going to be in the first
13278
20:56:35,080 --> 20:56:40,360
dimension spot. And then number zero was going to be at the back end. So this is essentially going
13279
20:56:40,360 --> 20:56:53,080
from CHW, and we're just changing the order to be HWC. So the exact same data is going to be
13280
20:56:53,080 --> 20:56:57,800
within that tensor. We're just changing the order of the dimensions. Let's see if this works.
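Putting the whole function together, here is a sketch of the plotting helper described above (the function name is just what this walkthrough suggests, and it assumes the image_path_list and data_transform from earlier):

```python
import random
import matplotlib.pyplot as plt
from PIL import Image

def plot_transformed_images(image_paths, transform, n=3, seed=None):
    """Selects n random images from image_paths, then plots the original
    versus the transformed version side by side."""
    if seed:
        random.seed(seed)
    random_image_paths = random.sample(image_paths, k=n)
    for image_path in random_image_paths:
        with Image.open(image_path) as f:
            fig, ax = plt.subplots(nrows=1, ncols=2)
            # Original image
            ax[0].imshow(f)
            ax[0].set_title(f"Original\nSize: {f.size}")
            ax[0].axis(False)
            # Transformed image: permute CHW -> HWC so matplotlib can plot it
            transformed_image = transform(f).permute(1, 2, 0)
            ax[1].imshow(transformed_image)
            ax[1].set_title(f"Transformed\nShape: {transformed_image.shape}")
            ax[1].axis(False)
            fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)

plot_transformed_images(image_path_list, transform=data_transform, n=3, seed=42)
```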
13281
20:57:00,200 --> 20:57:07,400
Look at that. Oh, I love seeing some manipulated data. We have a class of pizza and the original
13282
20:57:07,400 --> 20:57:13,160
image is there, and it's 512 by 512. But then we've resized it using our transform. Notice that
13283
20:57:13,160 --> 20:57:18,920
it's a lot more pixelated now, but that makes sense because it's only 64 by 64 pixels. Now, why
13284
20:57:18,920 --> 20:57:25,080
might we do such a thing? Well, one, does this image still look like that? Well, to me, it still
13285
20:57:25,080 --> 20:57:29,400
does. But the most important thing will be does it look like that to our model? Does it still look
13286
20:57:29,400 --> 20:57:36,040
like the original to our model? Now 64 by 64, there is less information encoded in this image.
13287
20:57:36,040 --> 20:57:42,440
So our model will be able to compute faster on images of this size. However, we may lose
13288
20:57:42,440 --> 20:57:48,920
some performance because not as much information is encoded as the original image. Again, the size
13289
20:57:48,920 --> 20:57:53,480
of an image is something that you can control. You can set it to be a hyperparameter. You can
13290
20:57:53,480 --> 20:58:01,320
tune the size to see if it improves your model. But I've just decided to go 64, 64, 3, in line
13291
20:58:01,320 --> 20:58:08,760
with the CNN explainer website. So a little hint, we're going to be re replicating this model that
13292
20:58:08,760 --> 20:58:15,000
we've done before. Now you notice that our images are now the same size, 64, 64, 3, as what the CNN
13293
20:58:15,000 --> 20:58:19,800
explainer model uses. So that's where I've got that from. But again, you could change this to
13294
20:58:19,800 --> 20:58:25,000
whatever size you want. And we see, oh, we've got a steak image here. And you notice that our
13295
20:58:25,000 --> 20:58:30,680
image has been flipped on the horizontal. So on the horizontal axis, our image has just been flipped,
13296
20:58:30,680 --> 20:58:37,080
same with this one here. So this is the power of torch transforms. Now there are a lot more
13297
20:58:37,080 --> 20:58:42,200
transforms, as I said, you can go through them here to have a look at what's going on. Illustrations
13298
20:58:42,200 --> 20:58:48,680
of transforms is a great place. So there's resize, there's center crop, you can crop your
13299
20:58:48,680 --> 20:58:54,600
images, you can crop five different locations, you can do grayscale, you can change the color,
13300
20:58:54,600 --> 20:59:02,280
a whole bunch of different things. I'd encourage you to check this out. That's your extra curriculum
13301
20:59:02,280 --> 20:59:09,720
for this video. But now that we've visualized a transform, this is what I hinted at before that
13302
20:59:09,720 --> 20:59:17,720
we're going to use this transform when we load all of our images into a torch
13303
20:59:17,720 --> 20:59:23,720
data set. So I just wanted to make sure that they had been visualized first. We're going to use our
13304
20:59:23,720 --> 20:59:31,000
data transform in the next video when we load all of our data using a torchvision.datasets
13305
20:59:31,000 --> 20:59:35,080
helper function. So let's give that a go. I'll see you in the next video.
13306
20:59:38,680 --> 20:59:43,160
Have a look at that beautiful plot. We've got some original images and some transformed
13307
20:59:43,160 --> 20:59:48,360
images. And the beautiful thing about our transformed images is that they're in tensor format,
13308
20:59:48,360 --> 20:59:52,120
which is what we need for our model. That's what we've been slowly working towards.
13309
20:59:52,120 --> 20:59:58,520
We've got a data set. And now we've got a way to turn it into tensors ready for a model. So
13310
20:59:58,520 --> 21:00:03,320
let's just visualize what another, I'll turn the seed off here so we can look at some more random
13311
21:00:03,320 --> 21:00:11,000
images. There we go. Okay, so we've got steak, pixelated because we're downsizing to 64, 64, 3.
13312
21:00:11,560 --> 21:00:16,760
Same thing for this one. And it's been flipped on the horizontal. And then same thing for this
13313
21:00:16,760 --> 21:00:25,960
pizza image and we'll do one more to finish off. Wonderful. So that is the premise of transforms
13314
21:00:26,600 --> 21:00:31,880
turning our images into tensors and also manipulating those images if we want to.
13315
21:00:32,760 --> 21:00:38,200
So let's get rid of this. I'm going to make another heading. We're up to section or part four now.
13316
21:00:38,200 --> 21:00:49,800
And this is going to be option one: loading image data using ImageFolder. And now I'm going
13317
21:00:49,800 --> 21:00:59,000
to turn that into markdown. And so let's go torchvision datasets. So recall how each one of the
13318
21:00:59,000 --> 21:01:03,480
torch domain libraries has its own datasets module that has built-in functions for
13319
21:01:03,480 --> 21:01:09,160
helping you load data. In this case, we have ImageFolder. And there's a few others here if
13320
21:01:09,160 --> 21:01:16,120
you'd like to look into those. But ImageFolder, this class is going to help us load in data that
13321
21:01:16,120 --> 21:01:21,800
is in this format, the generic image classification format. So this is a prebuilt data sets function.
13322
21:01:22,360 --> 21:01:29,080
Just like there's prebuilt data sets, we can use prebuilt data set functions. Now option two
13323
21:01:29,080 --> 21:01:35,640
later on, this is a spoiler, is we're going to create our own custom version of a data set loader.
13324
21:01:35,640 --> 21:01:41,640
But we'll see that in a later video. So let's see how we can use image folder to load all of our
13325
21:01:42,440 --> 21:01:48,040
custom data, our custom images into tensors. So this is where the transform is going to come in
13326
21:01:48,040 --> 21:01:57,640
handy. So let's write here: we can load image classification data using, let's write this,
13327
21:01:57,640 --> 21:02:06,520
let's write the full path name, torchvision.datasets.ImageFolder. Put that in there,
13328
21:02:07,800 --> 21:02:16,120
beautiful. And so let's just write it out: use ImageFolder to create data sets. Now in a previous
13329
21:02:16,120 --> 21:02:22,760
video, I hinted at the fact that we can pass a transform to our ImageFolder class. That's going
13330
21:02:22,760 --> 21:02:30,120
to be right here. So let's see what that looks like in practice. So from torchvision, I'm going
13331
21:02:30,120 --> 21:02:35,800
to import datasets, because that's where the ImageFolder class lives. And then we can go train
13332
21:02:35,800 --> 21:02:43,800
data equals datasets.ImageFolder. And we're going to pass in the root, which is our train
13333
21:02:43,800 --> 21:02:48,680
dir, because we're going to do it for the training directory first. And then we're going to pass
13334
21:02:48,680 --> 21:02:54,520
in a transform, which is going to be equal to our data transform. And then we're going to pass in
13335
21:02:54,520 --> 21:02:58,280
a target transform, but we're going to leave this as none, which is the default, I believe,
13336
21:02:59,160 --> 21:03:08,120
we go up to here. Yeah, target transform is optional. So what this means is this is going to be a
13337
21:03:08,120 --> 21:03:17,000
transform for the data. And this is going to be a transform for the label slash target.
13338
21:03:17,000 --> 21:03:23,160
PyTorch likes to use target, I like to use label, but that's okay. So this means that we don't need
13339
21:03:23,160 --> 21:03:29,240
a target transform, because our labels are going to be inferred by the target directory where the
13340
21:03:29,240 --> 21:03:35,400
images live. So our pizza images are in this directory, and they're going to have pizza as the label,
13341
21:03:35,400 --> 21:03:42,840
because our data set is in standard image classification format. Now, if your data set wasn't in a
13342
21:03:42,840 --> 21:03:47,720
standard image classification format, you might use a different data loader here. A lot of them
13343
21:03:47,720 --> 21:03:54,520
will have a transform for the data. So this transform is going to run our images, whatever images are
13344
21:03:54,520 --> 21:04:00,440
loaded from these folders, through this transform that we've created here, it's going to resize them,
13345
21:04:00,440 --> 21:04:04,760
randomly flip them on the horizontal, and then turn them into tensors, which is exactly how we
13346
21:04:04,760 --> 21:04:11,560
want them for our PyTorch models. And if we wanted to transform the labels in some way, shape or form,
13347
21:04:11,560 --> 21:04:16,840
we could pass in a target transform here. But in our case, we don't need to transform the labels.
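A sketch of that call, assuming the train_dir path and the data_transform from above:

```python
from torchvision import datasets

# Use ImageFolder to create a dataset from the training directory
train_data = datasets.ImageFolder(root=train_dir,            # target folder of images
                                  transform=data_transform,  # transform for the data (images)
                                  target_transform=None)     # transform for the labels (targets)
```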
13348
21:04:18,120 --> 21:04:23,480
So let's now do the same thing for the test data. And so that's why I wanted to visualize
13349
21:04:23,480 --> 21:04:30,840
our transforms in the previous videos, because otherwise we're just passing them in as a transform.
13350
21:04:30,840 --> 21:04:35,000
So really, what's going to happen behind the scenes is all of our images are going to go
13351
21:04:35,000 --> 21:04:39,560
through these steps. And so that's what they're going to look like when we turn them into a data
13352
21:04:39,560 --> 21:04:45,480
set. So let's create the test data here or the test data set. The transform, we're going to
13353
21:04:45,480 --> 21:04:50,680
transform the test data set in the same way we've transformed our training data set. And we're
13354
21:04:50,680 --> 21:04:57,400
just going to leave that like that. So let's now print out what our data sets look like,
13355
21:04:57,400 --> 21:05:06,520
train data, and test data. Beautiful. So we have a data set, a torch data set,
13356
21:05:06,520 --> 21:05:10,360
which is an image folder. And we have number of data points. This is going to be for the training
13357
21:05:10,360 --> 21:05:17,640
data set. We have 225. So that means about 75 images per class. And we have the root location,
13358
21:05:17,640 --> 21:05:22,600
which is the folder we've loaded them in from, which is our training directory. We've set these
13359
21:05:22,600 --> 21:05:30,360
two up before, train_dir and test_dir. And then we have a transform here, which is our standard transform,
13360
21:05:30,360 --> 21:05:36,200
a Resize, followed by RandomHorizontalFlip, followed by ToTensor. Then we've got basically
13361
21:05:36,200 --> 21:05:43,560
the same output here for our test directory, except we have fewer samples there. So let's get a few
13362
21:05:43,560 --> 21:05:48,520
little attributes from the ImageFolder. This is one of the benefits of using a PyTorch prebuilt
13363
21:05:49,160 --> 21:05:54,600
data set loader: it comes with a fair few attributes. So we could
13364
21:05:54,600 --> 21:06:00,520
go to the documentation and find this out in here (it inherits from DatasetFolder), keep digging
13365
21:06:00,520 --> 21:06:06,840
into there, or we could just come straight into Google Colab. Let's go get class names as a list.
13366
21:06:07,400 --> 21:06:12,920
Can we go train data dot and then press tab? Beautiful. So we've got a fair few things here
13367
21:06:12,920 --> 21:06:19,320
that are attributes. Let's have a look at classes. This is going to give us a list of the class names,
13368
21:06:20,440 --> 21:06:27,960
class names. This is very helpful later on. So we've got pizza steak sushi. We're trying to
13369
21:06:27,960 --> 21:06:34,200
do everything with code here. So if we have this attribute of train_data.classes,
13370
21:06:34,200 --> 21:06:38,440
we can use this list later on for when we plot images straight from our data set,
13371
21:06:38,440 --> 21:06:45,160
or make predictions on them and we want to label them. You can also get class names as a dictionary,
13372
21:06:45,160 --> 21:06:54,200
mapped to their integer index, that is. So we can go train_data dot and press tab. We've got class_
13373
21:06:54,200 --> 21:07:02,840
to_idx. Let's see what this looks like: class_dict. Wonderful. So then we've got our string
13374
21:07:02,840 --> 21:07:10,040
class names mapped to their integer. So we've got pizza is zero, steak is one, sushi is two. Now,
13375
21:07:10,040 --> 21:07:15,320
this is where the target transform would come into play. If you wanted to transform those
13376
21:07:16,520 --> 21:07:20,680
labels here in some way, shape or form, you could pass a transform in here.
13377
21:07:20,680 --> 21:07:26,760
And then if we keep going, let's check the lengths of what's going on. Check the lengths
13378
21:07:27,320 --> 21:07:32,680
of our data set. So we've seen this before, but this is going to just give us how many samples
13379
21:07:32,680 --> 21:07:39,880
that we have: len(train_data), len(test_data). Beautiful. And then of course, if you'd like
13380
21:07:39,880 --> 21:07:44,520
to explore more attributes, you can go train data dot, and then we've got a few other things,
13381
21:07:44,520 --> 21:07:50,600
functions, images, loader, samples, targets. If you wanted to just see the images, you can go dot
13382
21:07:50,600 --> 21:07:55,800
samples. If you wanted to see just the labels, you can go dot targets. This is going to be all
13383
21:07:55,800 --> 21:07:59,800
of our labels. Look at that. And I believe they're going to be in order. So we're going to have
13384
21:07:59,800 --> 21:08:05,160
zero, zero, zero, one, one, one, two, two, and then if we wanted to have a look, let's say we have a
13385
21:08:05,160 --> 21:08:15,880
look at the first sample, hey, we have data/pizza_steak_sushi/train/pizza. There's the image path,
13386
21:08:15,880 --> 21:08:23,560
and its label is zero, for pizza. Wonderful. So now we've done that. We've been
13387
21:08:23,560 --> 21:08:30,440
visualizing this whole time. So let's keep up that trend. And let's visualize a sample and a label
13388
21:08:30,440 --> 21:08:37,960
from the train data data set. So in this video, we've used image folder to load our images
13389
21:08:37,960 --> 21:08:43,880
into tensors. And because our data is already in standard image classification format,
13390
21:08:43,880 --> 21:08:47,800
we can use one of torchvision.datasets' prebuilt functions.
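To recap this video in one cell, here is a sketch using the same variable names:

```python
# The test dataset goes through the same transform
test_data = datasets.ImageFolder(root=test_dir, transform=data_transform)
print(train_data, test_data)

# Helpful attributes that come with ImageFolder
class_names = train_data.classes         # e.g. ['pizza', 'steak', 'sushi']
class_dict = train_data.class_to_idx     # e.g. {'pizza': 0, 'steak': 1, 'sushi': 2}
print(class_names, class_dict)
print(len(train_data), len(test_data))   # number of samples in each split
print(train_data.samples[0])             # (image path, label) of the first sample
```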
13391
21:08:49,560 --> 21:08:53,800
So let's do some more visualization in the next video. I'll see you there.
13392
21:08:53,800 --> 21:09:03,560
Welcome back. In the last video, we used datasets.ImageFolder to turn all of our
13393
21:09:04,360 --> 21:09:11,480
image data into tensors. And we did that with the help of our data transform, which is a little
13394
21:09:11,480 --> 21:09:18,520
pipeline up here to take in some data, or specifically an image, resize it to a value that we've set in
13395
21:09:18,520 --> 21:09:24,520
our case, 64 by 64, randomly flip it along the horizontal. We don't necessarily need this, but I've just put
13396
21:09:24,520 --> 21:09:29,320
that in there to indicate what happens when you pass an image through a transforms pipeline.
13397
21:09:29,320 --> 21:09:35,880
And then most importantly, we've turned our images into a torch tensor. So that means that our data,
13398
21:09:35,880 --> 21:09:42,040
our custom data set, this is so exciting, is now compatible to be used with a pytorch model.
13399
21:09:42,040 --> 21:09:47,000
So let's keep pushing forward. We're not finished yet. We're going to visualize some samples
13400
21:09:47,000 --> 21:09:55,800
from the train_data dataset. So how can we do this? Well, we can index on the train_data
13401
21:09:56,440 --> 21:10:06,360
dataset to get a single image and a label. So if we go, can we do train_data[0]? What does that
13402
21:10:06,360 --> 21:10:13,720
give us? Okay, so this is going to give us an image tensor and its associated label. In this
13403
21:10:13,720 --> 21:10:22,600
case, it's an image of pizza, because its associated label is pizza. So let's take the zero,
13404
21:10:22,600 --> 21:10:29,720
zero, so train_data[0][0]. This is going to be our image. And the label is going to be train_data[0], and we're
13405
21:10:29,720 --> 21:10:35,640
just going to get the item at index one there, so train_data[0][1]. And then if we have a look
13406
21:10:35,640 --> 21:10:44,440
at them separately, image and label, beautiful. So now one of our target images is in tensor format,
13407
21:10:44,440 --> 21:10:49,400
exactly how we want it. And its label is in numeric format as well, which is also exactly how
13408
21:10:49,400 --> 21:10:55,560
we want it. And then if we wanted to convert this back to a non-numeric label, we can go class_names
13409
21:10:57,160 --> 21:11:03,400
and index on that. And we see pizza. By non-numeric, I mean we can get it back
13410
21:11:03,400 --> 21:11:09,960
to string format, which is human understandable; we can just index on class_names. So let's print
13411
21:11:09,960 --> 21:11:14,920
out some information about what's going on here. Print F, we're going to go image tensor.
13412
21:11:15,720 --> 21:11:20,920
I love F strings if you haven't noticed yet. Image tensor. And we're going to set in
13413
21:11:21,560 --> 21:11:25,560
new line, we're going to pass it in our image, which is just the image that we've got here.
13414
21:11:26,600 --> 21:11:30,520
Then we'll print in some more information about that. This is still all becoming one with the
13415
21:11:30,520 --> 21:11:36,840
data right where we're slowly finding out information about our data set so that if errors arise later
13416
21:11:36,840 --> 21:11:41,960
on, we can go, hmm, our image or we're getting a shape error. And I know our images are of this
13417
21:11:41,960 --> 21:11:47,240
shape, or we're getting a data type error, which is why I've got the .dtype here. And that
13418
21:11:47,240 --> 21:11:53,480
might be why we're getting a data type issue. So let's do one more with the image label,
13419
21:11:53,480 --> 21:12:00,200
label, oh, well, actually, we'll do one more. We'll do print, we'll get the label data type as well.
13420
21:12:01,160 --> 21:12:07,720
Label, this will be important to take note of later on. Type, as I said, three big issues.
13421
21:12:08,360 --> 21:12:15,800
Shape mismatch, device mismatch, and data type mismatch. Can we get the type of our label?
13422
21:12:15,800 --> 21:12:24,840
Beautiful. So we've got our image tensor and we've got its shape. It's of torch.Size([3, 64, 64]).
13423
21:12:25,400 --> 21:12:31,320
That's exactly how we want it. The data type is torch float 32, which is the default data type
13424
21:12:31,320 --> 21:12:39,400
in PyTorch. Our image label is zero and the label data type is of integer. So let's try and plot
13425
21:12:39,400 --> 21:12:45,960
this and see what it looks like, hey, using matplotlib. So first of all, what do we have to do? Well,
13426
21:12:45,960 --> 21:12:53,720
we have to rearrange the order of dimensions. In other words, matplotlib likes color channels
13427
21:12:53,720 --> 21:12:59,240
last. So let's see what this looks like. We'll go image.permute. We've done this before,
13428
21:12:59,240 --> 21:13:06,120
image.permute(1, 2, 0) means we're reordering the dimensions. Zero would usually be here,
13429
21:13:06,120 --> 21:13:10,200
except that we've taken the zero dimension, the color channels and put it on the end
13430
21:13:10,200 --> 21:13:17,720
and shuffled the other two forward. So let's now print out different shapes. I love printing
13431
21:13:17,720 --> 21:13:22,120
out the change in shapes. It helps me really understand what's going on. Because sometimes
13432
21:13:22,120 --> 21:13:26,280
I look at a line like this and it doesn't really help me. But if I print out something of what
13433
21:13:26,280 --> 21:13:31,720
the shapes were originally and what they changed to, well, hey, that's a big help. That's what
13434
21:13:31,720 --> 21:13:37,640
Jupyter notebooks are all about, right? So this is going to be color channels first, height,
13435
21:13:38,360 --> 21:13:45,640
width. And depending on what data you're using, if you're not using images, if you're using text,
13436
21:13:45,640 --> 21:13:51,960
still knowing the shape of your data is a very good thing. We're going to go image_permute.shape
13437
21:13:52,520 --> 21:13:58,120
and this should be, if everything's going right, height, width with color channels on the end here.
13438
21:13:58,120 --> 21:14:03,640
And we're just going to plot the image. You can never get enough plotting practice.
13439
21:14:04,680 --> 21:14:12,520
Plot the image. You're going to go PLT dot figure, we'll pass in fig size equals 10, 7.
13440
21:14:13,080 --> 21:14:19,320
And then we're going to go plt.imshow. We'll pass in the permuted image,
13441
21:14:20,120 --> 21:14:27,480
image_permute, and then we'll turn off the axes. And we will set the title to be
13442
21:14:27,480 --> 21:14:33,240
class names. And we're going to index on the label, just as we did before. And we're going to set
13443
21:14:33,240 --> 21:14:41,000
the font size equal to 14. So it's nice and big. Here we go. Beautiful. There is our image of pizza.
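(A rough sketch of the plotting cell, assuming image, label and class_names come from the cells above:)

import matplotlib.pyplot as plt

# matplotlib wants colour channels last, PyTorch image tensors store them first
image_permute = image.permute(1, 2, 0)       # [3, 64, 64] -> [64, 64, 3]
print(f"Original shape: {image.shape}")
print(f"Permuted shape: {image_permute.shape}")

plt.figure(figsize=(10, 7))
plt.imshow(image_permute)
plt.axis(False)
plt.title(class_names[label], fontsize=14)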
13444
21:14:41,560 --> 21:14:47,960
It is very pixelated because we're going from the original size of about 512 by 512 down to 64 by
13445
21:14:47,960 --> 21:14:54,360
64. I would encourage you to try this out. Potentially, you could use a different image here. So we've
13446
21:14:54,360 --> 21:14:59,960
indexed on sample zero. Maybe you want to change this to just be a random image and go through these
13447
21:14:59,960 --> 21:15:05,320
steps here. And then if you'd like to see different transforms, I'd also encourage you to try
13448
21:15:05,320 --> 21:15:10,600
changing this out, our transform pipeline here, maybe increase the size and see what it looks
13449
21:15:10,600 --> 21:15:16,600
like. And if you're feeling really adventurous, you can go into torch vision and look at the
13450
21:15:16,600 --> 21:15:21,800
transforms library here and then try one of these and see what it does to our images.
13451
21:15:21,800 --> 21:15:28,520
But we're going to keep pushing forward. We are going to look at another way. Or actually,
13452
21:15:28,520 --> 21:15:37,320
I think for completeness, let's now turn, we've got a data set. We want to, we wrote up here before
13453
21:15:37,320 --> 21:15:43,400
that we wanted to turn our images into a data set, and then subsequently a torch utils data
13454
21:15:43,400 --> 21:15:49,960
data loader. So we've done this before, by batching our images, or batching our data that we've
13455
21:15:49,960 --> 21:15:56,200
been working with. So I'd encourage you to give this a shot yourself. Try to go through the next
13456
21:15:56,200 --> 21:16:02,840
video and create a train data loader using our train data, wherever that is train data,
13457
21:16:02,840 --> 21:16:10,040
and a test data loader using our test data. So give that a shot and we'll do it together in the
13458
21:16:10,040 --> 21:16:19,880
next video. We'll turn our data sets into data loaders. Welcome back. How'd you go? In the last
13459
21:16:19,880 --> 21:16:26,200
video, I issued you the challenge to turn our data sets into data loaders. So let's do that
13460
21:16:26,200 --> 21:16:30,840
together now. I hope you gave it a shot. That's the best way to practice. So turn loaded images
13461
21:16:30,840 --> 21:16:38,760
into data loaders. So we're still adhering to our PyTorch workflow here. We've got a custom
13462
21:16:38,760 --> 21:16:43,800
data set. We found a way to turn it into tensors in the form of data sets. And now we're going to
13463
21:16:43,800 --> 21:16:50,680
turn it into a data loader. So we can turn our data sets into iterables or batchify our data.
13464
21:16:50,680 --> 21:17:00,920
So let's write down here, a data loader is going to help us turn our data sets into iterables.
13465
21:17:01,800 --> 21:17:11,160
And we can customize the batch size, write this down. So our model can see batch size
13466
21:17:11,160 --> 21:17:19,160
images at a time. So this is very important. As we touched on in the last section computer vision,
13467
21:17:19,160 --> 21:17:25,320
we create a batch size because if we had 100,000 images, chances are if they were all in one data
13468
21:17:25,320 --> 21:17:30,360
set, there's 100,000 images in the food 101 data set. We're only working with about 200.
13469
21:17:31,080 --> 21:17:37,960
If we try to load all 100,000 in one hit, chances are our hardware may run out of memory. And so
13470
21:17:37,960 --> 21:17:45,640
that's why we batchify our images. So if we have a look at this, nvidia-smi, our GPU only has 16
13471
21:17:45,640 --> 21:17:52,600
gigabytes. I'm using a Tesla T4 right now, which has about 15 gigabytes of memory. So if we tried
13472
21:17:52,600 --> 21:17:58,280
to load 100,000 images into that whilst also computing on them with a PyTorch model,
13473
21:17:58,280 --> 21:18:03,080
potentially we're going to run out of memory and run into issues. So instead, we can turn them
13474
21:18:03,080 --> 21:18:09,720
into a data loader so that our model looks at 32 images at a time and can leverage all of the
13475
21:18:09,720 --> 21:18:18,280
memory that it has rather than running out of memory. So let's turn our train and test data sets
13476
21:18:18,280 --> 21:18:25,880
into data loaders, turn train and test data sets into data loaders. Now, this is not just for image
13477
21:18:25,880 --> 21:18:36,200
data. This is for all kinds of data in PyTorch. Images, text, audio, you name it. So import data
13478
21:18:36,200 --> 21:18:42,440
loader, then we're going to create a train data loader. We're going to set it equal to data loader.
13479
21:18:42,440 --> 21:18:49,160
We're going to pass in a data set. So let's set this to train data. Let's set the batch size.
13480
21:18:49,160 --> 21:18:54,440
What should we set the batch size to? I'm going to come up here and set a nice capitalized variable.
13481
21:18:54,440 --> 21:19:01,880
I'm going to use 32 because 32 is a good batch size. So we'll go 32 or actually,
13482
21:19:01,880 --> 21:19:05,480
let's start small. Let's just start with a batch size of one and see what happens.
13483
21:19:05,480 --> 21:19:11,720
Batch size one, number of workers. So this parameter is going to be, this is an important one. I'm going
13484
21:19:11,720 --> 21:19:15,560
to, I potentially have covered it before, but I'm going to introduce it again. This is going to be
13485
21:19:15,560 --> 21:19:23,720
how many cores, or how many CPU cores, are used to load your data. So the higher the better usually,
13486
21:19:23,720 --> 21:19:32,520
and you can set this via OS CPU count, which will count how many CPUs your compute hardware has.
13487
21:19:32,520 --> 21:19:39,160
So I'll just show you how this works. Import OS and this is a Python OS module. We can do
13488
21:19:39,160 --> 21:19:45,080
CPU count to find out how many CPUs our Google Colab instance has. Mine has two,
13489
21:19:45,080 --> 21:19:51,320
your number may vary, but I believe most Colab instances have two CPUs. If you're running this on
13490
21:19:51,320 --> 21:19:55,480
your local machine, you may have more. If you're running it on dedicated deep learning hardware,
13491
21:19:55,480 --> 21:20:03,240
you may even have even more, right? So generally, if you set this to one, it will use one CPU core,
13492
21:20:03,240 --> 21:20:10,600
but if you set it to OS dot CPU count, it will use as many as possible. So we're just going to
13493
21:20:10,600 --> 21:20:16,040
leave this as one right now. You can customize this to however you want. And I'm going to shuffle
13494
21:20:16,040 --> 21:20:21,240
the training data because I don't want my model to recognize any order in the training data. So I'm
13495
21:20:21,240 --> 21:20:28,440
going to mix it up. And then I'm going to create the test data loader. Data set equals test data.
13496
21:20:29,720 --> 21:20:36,520
And batch size equals one, num workers, I'm going to set this to equal one as well. Again,
13497
21:20:36,520 --> 21:20:41,560
you can customize each of these, they're hyperparameters, to whatever you want. Number of workers,
13498
21:20:41,560 --> 21:20:47,400
generally the more the better. And then I'm going to set shuffle equals false for the test data so
13499
21:20:47,400 --> 21:20:52,760
that if we want to evaluate our models later on, our test data set is always in the same order.
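(A rough sketch of the DataLoader cell described here; BATCH_SIZE and NUM_WORKERS are illustrative values you can tune:)

import os
from torch.utils.data import DataLoader

BATCH_SIZE = 1                 # try 32 later on
NUM_WORKERS = os.cpu_count()   # CPU cores used to load data (the video starts with 1)

train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              num_workers=NUM_WORKERS,
                              shuffle=True)    # mix up the training data

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             num_workers=NUM_WORKERS,
                             shuffle=False)    # keep the test order fixed for evaluation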
13500
21:20:53,640 --> 21:20:58,440
So now let's have a look at train data loader, see what happens. And test data loader.
13501
21:21:03,400 --> 21:21:09,720
Wonderful. So we get two instances of torch utils dot data dot data loader. And now we can
13502
21:21:09,720 --> 21:21:17,240
see if we can visualize something from the train data loader, as well as the test data loader.
13503
21:21:17,240 --> 21:21:21,000
I actually maybe we just visualize something from one of them. So we're not just double
13504
21:21:21,000 --> 21:21:26,680
handling everything. We get a length here. Wonderful. Because we're using a batch size of one,
13505
21:21:26,680 --> 21:21:34,920
our lengths of our data loaders are the same as our data sets. Now, of course, this would change
13506
21:21:34,920 --> 21:21:42,360
if we set, oh, we didn't even set this to the batch size parameter batch size. Let's come down
13507
21:21:42,360 --> 21:21:48,760
here and do the same here batch size. So we'll watch this change. If we wanted to look at 32
13508
21:21:48,760 --> 21:21:54,920
images at a time, we definitely could do that. So now we have eight batches, because 225
13509
21:21:54,920 --> 21:22:02,680
divided by 32 equals roughly eight. And then 75 divided by 32 also equals roughly three. And
13510
21:22:02,680 --> 21:22:07,080
remember, these numbers are going to be rounded if there are some overlaps. So let's get rid of,
13511
21:22:08,280 --> 21:22:13,320
we'll change this back to one. And we'll keep that there. We'll get rid of these two.
13512
21:22:14,040 --> 21:22:20,680
And let's see what it looks like to plot an image from our data loader. Or at least have a look at it.
13513
21:22:22,520 --> 21:22:25,800
Check out the shapes. That's probably the most important point at this time. We've already
13514
21:22:25,800 --> 21:22:32,280
plotted enough things. So let's iterate through our train data loader. And we'll grab the next one.
13515
21:22:32,280 --> 21:22:40,120
We'll grab the image and the label. And we're going to print out here. So batch size will now be one.
13516
21:22:40,840 --> 21:22:46,520
You can change the batch size if you like. This is just again, another way of getting familiar
13517
21:22:46,520 --> 21:22:57,400
with the shapes of our data. So image shape. Let's go image dot shape. And we're going to
13518
21:22:57,400 --> 21:23:03,160
write down here. This shape is going to be batch size. This is what our data loader is going to
13519
21:23:03,160 --> 21:23:12,120
add to our images is going to add a batch dimension, color channels, height, width. And then print.
13520
21:23:12,840 --> 21:23:18,600
Let's check out that label shape. Same thing with the labels. It's going to add a batch
13521
21:23:18,600 --> 21:23:28,120
dimension. Label. And let's see what happens. Oh, we forgot the end of the bracket. Beautiful.
13522
21:23:28,680 --> 21:23:33,400
So we've got image shape. Our label shape is only one because we have a batch size of one.
13523
21:23:34,120 --> 21:23:41,160
And so now we've got batch size one, color channels three, height, width. And if we change this to
13524
21:23:41,160 --> 21:23:48,920
32, what do you think's going to happen? We get a batch size of 32, still three color channels,
13525
21:23:49,640 --> 21:23:55,800
still 64, still 64. And now we have 32 labels. So that means within each batch, we have 32
13526
21:23:55,800 --> 21:24:02,840
images. And we have 32 labels. We could use this with a model. I'm going to change this back to one.
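(A rough sketch of the shape check performed here:)

# Grab one batch from the DataLoader and inspect the shapes it adds
img, label = next(iter(train_dataloader))
print(f"Image shape: {img.shape}")    # [batch_size, color_channels, height, width]
print(f"Label shape: {label.shape}")  # [batch_size]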
13527
21:24:02,840 --> 21:24:11,000
And I think we've covered enough in terms of loading our data sets. How cool is this?
13528
21:24:11,000 --> 21:24:16,840
We've come a long way. We've downloaded a custom data set. We've loaded it into a data set using
13529
21:24:16,840 --> 21:24:23,720
ImageFolder, turned it into tensors using our data transform, and now batchified our custom data set
13530
21:24:23,720 --> 21:24:29,160
in data loaders. We've used these with models before. So if you wanted to, you could go right
13531
21:24:29,160 --> 21:24:33,480
ahead and build a convolutional neural network to try and find patterns in our image tenses.
13532
21:24:34,040 --> 21:24:39,800
But in the next video, let's pretend we didn't have this data loader,
13533
21:24:41,320 --> 21:24:50,200
this image folder class available to us. How could we load our image data set so that it's
13534
21:24:50,200 --> 21:24:58,040
compatible? Like our image data set here, how could we replicate this image folder class?
13535
21:24:58,040 --> 21:25:04,600
So that we could use it with a DataLoader. Because DataLoader is part of torch.utils.data,
13536
21:25:04,600 --> 21:25:10,040
you're going to see these everywhere. Let's pretend we didn't have the torch vision.data sets
13537
21:25:10,040 --> 21:25:15,720
image folder helper function. And we'll see in the next video, how we can replicate that functionality.
13538
21:25:15,720 --> 21:25:25,160
I'll see you there. Welcome back. So over the past few videos, we've been working out how to get
13539
21:25:25,160 --> 21:25:31,320
our data from our data folder, pizza, steak, and sushi. We've got images of different
13540
21:25:31,320 --> 21:25:35,960
food data here. And we're trying to get it into Tensor format. So we've seen how to do that
13541
21:25:35,960 --> 21:25:44,360
with an existing data loader helper function or data set function in image folder. However,
13542
21:25:44,360 --> 21:25:50,040
what if image folder didn't exist? And we need to write our own custom data loading function.
13543
21:25:50,040 --> 21:25:56,280
Now the premise of this is although it does exist, it's going to be good practice because you might
13544
21:25:56,280 --> 21:26:01,080
come across a case where you're trying to use a data set where a prebuilt function doesn't exist.
13545
21:26:01,080 --> 21:26:08,600
So let's replicate the functionality of image folder by creating our own data loading class.
13546
21:26:08,600 --> 21:26:15,800
So we want a few things. We want to be able to get the class names as a list from our loaded data.
13547
21:26:15,800 --> 21:26:22,360
And we want to be able to get our class names as a dictionary as well. So the whole goal of this
13548
21:26:22,360 --> 21:26:29,000
video is to start writing a function or a class that's capable of loading data from here into
13549
21:26:29,000 --> 21:26:36,680
Tensor format, capable of being used with the PyTorch's data loader class, like we've done here. So we
13550
21:26:36,680 --> 21:26:41,720
want to create a data set. Let's start it off. We're going to create another heading here. This is
13551
21:26:41,720 --> 21:26:52,280
going to be number five, option two, loading image data with a custom data set. So we want a few
13552
21:26:53,720 --> 21:27:03,320
functionality steps here. Number one is we want to be able to load images from file. Number
13553
21:27:03,320 --> 21:27:13,480
two, we want to be able to get class names from the data set. And three, we want to be able to get classes
13554
21:27:13,480 --> 21:27:21,720
as a dictionary from the data set. And so let's briefly discuss the pros and cons of creating
13555
21:27:21,720 --> 21:27:31,000
your own custom data set. We saw option one was to use a pre-existing data set loader helper
13556
21:27:31,000 --> 21:27:36,600
function from torch vision. And it's going to be quite similar if we go torch vision data sets.
13557
21:27:39,240 --> 21:27:44,200
Quite similar if you're using other domain libraries here; there are going to be data
13558
21:27:44,200 --> 21:27:51,720
loading utilities. But at the base level of PyTorch is torch.utils.data.Dataset. Now this is
13559
21:27:51,720 --> 21:27:58,920
the base data set class. So we want to build on top of this to create our own image folder loading
13560
21:27:58,920 --> 21:28:05,720
class. So what are the pros and cons of creating your own custom data set? Well, let's discuss some
13561
21:28:05,720 --> 21:28:16,760
pros. So one pro would be you can create a data set out of almost anything as long as you write
13562
21:28:16,760 --> 21:28:25,160
the right code to load it in. And another pro is that you're not limited to PyTorch pre-built
13563
21:28:25,160 --> 21:28:34,200
data set functions. A couple of cons would be that, and this ties to point number one,
13564
21:28:35,240 --> 21:28:44,040
even though you could create a data set out of almost anything, it doesn't mean that it will
13565
21:28:44,040 --> 21:28:50,120
automatically work. It might work, and of course, you can verify this through extensive testing,
13566
21:28:50,120 --> 21:28:56,200
seeing if your model actually works, if it actually loads data in the way that you want it. And another
13567
21:28:56,200 --> 21:29:04,440
con is that using a custom data set requires us to write more code. So often results in us
13568
21:29:05,080 --> 21:29:14,760
writing more code, which could be prone to errors or performance issues. So typically if
13569
21:29:14,760 --> 21:29:20,360
something makes it into the PyTorch standard library or the PyTorch domain libraries,
13570
21:29:22,280 --> 21:29:28,680
if functionality makes it into here, it's generally been tested many, many times. And it can kind of
13571
21:29:28,680 --> 21:29:34,280
be verified that, if you do use it, it works quite well. Whereas if
13572
21:29:34,280 --> 21:29:40,040
we write our own code, sure, we can test it ourselves, but it hasn't got the robustness to begin with,
13573
21:29:40,040 --> 21:29:45,240
though we could fix it over time, of something that's included in, say, the PyTorch standard library.
13574
21:29:45,960 --> 21:29:49,960
Nonetheless, it's important to be aware of how we could create such a custom data set.
13575
21:29:50,600 --> 21:29:55,080
So let's import a few things that we're going to use. We'll import OS, because we're going to be
13576
21:29:55,080 --> 21:30:02,200
working with Python's file system over here. We're going to import pathlib, because we're going to
13577
21:30:02,200 --> 21:30:06,760
be working with file paths. We'll import torch, we don't need to again, but I'm just doing this
13578
21:30:06,760 --> 21:30:14,120
for completeness. We're going to import Image from PIL, the Image class, because we want to be
13579
21:30:14,120 --> 21:30:20,920
opening images. I'm going to import from torch utils dot data. I'm going to import data set,
13580
21:30:20,920 --> 21:30:26,600
which is the base data set. And as I said over here, we can go to data sets, click on torch utils
13581
21:30:26,600 --> 21:30:32,680
data dot data set. This is an abstract class representing a data set. And you'll find that this
13582
21:30:32,680 --> 21:30:38,680
data set links to itself. So this is the base data set class. Many of the data sets in PyTorch,
13583
21:30:38,680 --> 21:30:43,880
the prebuilt functions, subclass this. So this is what we're going to be doing.
13584
21:30:44,520 --> 21:30:49,960
And as a few notes here, all subclasses should overwrite get item. And you should optionally
13585
21:30:49,960 --> 21:30:54,680
overwrite __len__. These two methods, we're going to see them in a future video. For now,
13586
21:30:54,680 --> 21:31:01,880
we're just setting the scene here. So from torch vision, we're going to import transforms, because
13587
21:31:01,880 --> 21:31:08,120
we want to not only import our images, but we want to transform them into tensors. And from
13588
21:31:08,120 --> 21:31:15,240
Python's typing module, I'm going to import tuple dict and list. So we can put type hints
13589
21:31:15,240 --> 21:31:25,080
when we create our class and loading functions. Wonderful. So this is our instance of torch vision
13590
21:31:25,080 --> 21:31:32,280
dot data sets image folder, torch vision dot data sets dot image folder. Let's have a look
13591
21:31:32,280 --> 21:31:38,760
at the train data. So we want to write a function that can replicate getting the classes from a
13592
21:31:38,760 --> 21:31:46,920
particular directory, and also turning them into an index or dictionary that is. So let's build
13593
21:31:46,920 --> 21:31:52,200
a helper function to replicate this functionality here. In other words, I'd like to write a helper
13594
21:31:52,200 --> 21:31:57,640
function that if we pass it in a file path, such as pizza steak sushi or this data folder,
13595
21:31:58,440 --> 21:32:04,680
it's going to go in here. And it's going to return the class names as a list. And it's also going
13596
21:32:04,680 --> 21:32:10,840
to turn them into a dictionary, because it's going to be helpful for later on when we'd like to access
13597
21:32:10,840 --> 21:32:17,320
the classes and the class to ID X. And if we really want to completely recreate image folder,
13598
21:32:17,320 --> 21:32:23,240
well, image folder has this functionality. So we'd like that too. So this is just a little high level
13599
21:32:23,240 --> 21:32:28,520
overview of what we're going to be doing. I might link in here that we're going to subclass this.
13600
21:32:29,640 --> 21:32:39,960
So all custom data sets in PyTorch often subclass this. So here's what we're going to be doing.
13601
21:32:39,960 --> 21:32:44,200
Over the next few videos, we want to be able to load images from a file. Now you could replace
13602
21:32:44,200 --> 21:32:49,560
images with whatever data you're working with; the same premise will apply. You want to be
13603
21:32:49,560 --> 21:32:53,560
able to get the class names from the data set and want to be able to get classes as a dictionary
13604
21:32:53,560 --> 21:32:59,880
from the data set. So we're going to map our samples, our image samples to that class name
13605
21:33:00,760 --> 21:33:09,000
by just passing a file path to a function that we're about to write. And some pros and cons of
13606
21:33:09,000 --> 21:33:14,520
creating a custom data set. We've been through that. Let's in the next video, start coding up a
13607
21:33:14,520 --> 21:33:23,320
helper function to retrieve these two things from our target directory. In the last video,
13608
21:33:23,320 --> 21:33:29,720
we discussed the exciting concept of creating a custom data set. And we wrote down a few things
13609
21:33:29,720 --> 21:33:34,840
that we want to get. We discussed some pros and cons. And we learned that many custom data sets
13610
21:33:34,840 --> 21:33:40,680
inherit from torch dot utils dot data data set. So that's what we'll be doing later on. In this
13611
21:33:40,680 --> 21:33:46,280
video, let's focus on writing a helper function to recreate this functionality. So I'm going to
13612
21:33:46,280 --> 21:33:55,480
title this 5.1, creating a helper function to get class names. I'm going to turn this into
13613
21:33:55,480 --> 21:34:02,280
markdown. And if I go into here, so we want a function to, let's write down some steps and then
13614
21:34:02,280 --> 21:34:09,240
we'll code it out. So to get the class names, we're going to use os.scandir. So it's going
13615
21:34:09,240 --> 21:34:21,560
to scan a directory to traverse a target directory. And ideally, the directory is in standard image
13616
21:34:22,440 --> 21:34:31,320
classification format. So just like the image folder class, our custom data class is going to
13617
21:34:31,320 --> 21:34:37,720
require our data to already be formatted in the standard image classification format, such as
13618
21:34:37,720 --> 21:34:42,920
train and test for training and test images, and then images for a particular class are in a
13619
21:34:42,920 --> 21:34:49,240
particular directory. So let's keep going. And number two, what else do we want it to do? We want
13620
21:34:49,240 --> 21:34:57,800
it to raise an error if the class names aren't found. So if this happens, there might be,
13621
21:34:57,800 --> 21:35:03,240
we want this to indicate the fact that there might be something wrong with the directory structure.
13622
21:35:04,920 --> 21:35:14,120
And number three, we also want to turn the class names into a dict and a list and return them.
13623
21:35:15,720 --> 21:35:19,720
Beautiful. So let's get started. Let's set up the path directory
13624
21:35:19,720 --> 21:35:28,760
for the target directory. So our target directory is going to be, well, the directory we want to load,
13625
21:35:28,760 --> 21:35:33,800
directory, if I could spell, that we want to load our data from. Let's start with the training
13626
21:35:33,800 --> 21:35:41,880
dir, just for an example. So target directory, what do we get? So we're just going to use the
13627
21:35:41,880 --> 21:35:49,640
training folder as an example to begin with. And we'll go print target dir, we'll put in the target
13628
21:35:49,640 --> 21:35:57,080
directory, just want to exemplify what we're doing. And then we're going to get the class names
13629
21:35:57,720 --> 21:36:03,720
from the target directory. So I'll show you the functionality of os.scandir. Of course,
13630
21:36:03,720 --> 21:36:11,400
you could look this up in the Python documentation. So class names found, let's set this to be sorted.
13631
21:36:11,400 --> 21:36:21,320
And then we'll get the entry name, entry.name, for entry in a list. So we're going to list os.scandir
13632
21:36:22,040 --> 21:36:30,760
of the image path slash target directory. Let's see what happens when we do this.
13633
21:36:32,680 --> 21:36:35,240
Target directory have we got the right brackets here.
13634
21:36:35,240 --> 21:36:46,280
Now, is this going to work? Let's find out. Oh, image path slash target directory.
13635
21:36:48,600 --> 21:36:53,240
What do we get wrong? Oh, we don't need the image path there. Let's put, let's just put target
13636
21:36:53,240 --> 21:37:01,880
directory there. There we go. Beautiful. So we set up our target directory as being the training
13637
21:37:01,880 --> 21:37:07,960
dir. And so if we just go, let's just do list. What happens if we just run this function here?
13638
21:37:09,160 --> 21:37:15,720
Oh, os.scandir. Yeah, so there we go. So we have three directory entries. So this is where we're
13639
21:37:15,720 --> 21:37:21,560
getting entry dot name for everything in the training directory. So if we look in the training
13640
21:37:21,560 --> 21:37:27,960
directory, what do we have train? And we have one entry for pizza, one entry for sushi, one entry
13641
21:37:27,960 --> 21:37:33,240
for steak. Wonderful. So now we have a way to get a list of class names. And we could quite easily
13642
21:37:33,240 --> 21:37:38,680
turn this into a dictionary, couldn't we? Which is exactly what we want to do. We want to recreate
13643
21:37:38,680 --> 21:37:44,120
this, which we've done. And we want to recreate this, which is also done. So now let's take this
13644
21:37:44,120 --> 21:37:51,560
functionality here. And let's turn that into a function. All right, what can we do? What do we
13645
21:37:51,560 --> 21:37:58,760
call this? I'm going to call this def find_classes. And I'm going to say that it takes in a directory
13646
21:37:58,760 --> 21:38:05,320
which is a string. And it's going to return, this is why I imported from Python's typing module and
13647
21:38:05,320 --> 21:38:13,400
imported tuple. And I'm going to return a list, which is a list of strings and a dictionary,
13648
21:38:13,400 --> 21:38:24,120
which is strings map to integers. Beautiful. So let's keep going. We want to, we want this
13649
21:38:24,120 --> 21:38:30,040
function to return given a target directory, we want it to return these two things. So we've seen
13650
21:38:30,040 --> 21:38:36,840
how we can get a list of the directories in a target directory by using os.scandir. So let's
13651
21:38:36,840 --> 21:38:47,800
write, finds the class folder names in a target directory. Beautiful. And we know
13652
21:38:47,800 --> 21:38:54,760
that it's going to return a list and a dictionary. So let's do step number one, we want to get the
13653
21:38:54,760 --> 21:39:04,200
class names by scanning the target directory. We'll go classes, just we're going to replicate the
13654
21:39:04,200 --> 21:39:11,880
functionality we've done above, but for any given directory here. So classes equals sorted entry
13655
21:39:11,880 --> 21:39:24,440
dot name for entry in os.scandir. And we're going to pass in the target directory. If entry dot is
13656
21:39:24,440 --> 21:39:31,800
dir, we're just going to make sure it's a directory as well. And so if we just return classes and see
13657
21:39:31,800 --> 21:39:39,080
what happens. So find classes, let's pass it in our target directory, which is our training directory.
13658
21:39:39,640 --> 21:39:49,400
What do we get? Beautiful. So we need to also return class to ID X. So let's keep going. So number
13659
21:39:49,400 --> 21:40:00,840
two is let's go raise an error. If class names could not be found. So if not classes, let's say
13660
21:40:00,840 --> 21:40:08,520
raise file, we're going to raise a file not found error. And then let's just write in here F
13661
21:40:09,400 --> 21:40:16,280
couldn't find any classes in directory. So we're just writing some error checking code here.
13662
21:40:19,240 --> 21:40:23,560
So if we can't find a class list within our target directory, we're going to raise this
13663
21:40:23,560 --> 21:40:32,760
error and say couldn't find any classes in directory, please check file structure. And there's another
13664
21:40:32,760 --> 21:40:38,280
checkup here that's going to help us as well to check if the entry is a directory. So finally,
13665
21:40:38,280 --> 21:40:44,440
let's do number three. What do we want to do? So we want to create a dictionary of index labels.
13666
21:40:44,440 --> 21:40:55,480
So computers, why do we do this? Well, computers prefer numbers rather than strings as labels. So we
13667
21:40:55,480 --> 21:41:03,480
can do this, we've already got a list of classes. So let's just create class_to_idx equals class
13668
21:41:03,480 --> 21:41:15,480
name: i, for i, class name in enumerate classes. Let's see what this looks like.
13669
21:41:18,200 --> 21:41:25,800
So we go class names, and then class_to_idx, or we can just return it actually. Did we spell
13670
21:41:25,800 --> 21:41:31,880
enumerate right? Yes, we did. So what this is going to do is going to map a class name to an integer
13671
21:41:31,880 --> 21:41:37,240
or to I for I class name in enumerate classes. So it's going to go through this, and it's going
13672
21:41:37,240 --> 21:41:42,440
to go for I. So the first one zero is going to be pizza. Ideally, one will be steak,
13673
21:41:42,440 --> 21:41:49,160
two will be sushi. Let's see how this goes. Beautiful. Look at that. We've just replicated
13674
21:41:49,160 --> 21:41:56,440
the functionality of image folder. So now we can use this helper function in our own custom
13675
21:41:56,440 --> 21:42:02,040
data set, find classes to traverse through a target directory, such as train, we could do the
13676
21:42:02,040 --> 21:42:09,160
same for test if we wanted to, too. And that way, we've got a list of classes. And we've also got
13677
21:42:09,160 --> 21:42:17,720
a dictionary mapping those classes to integers. So now let's in the next video move towards sub
13678
21:42:17,720 --> 21:42:26,040
classing torch utils dot data dot data set. And we're going to fully replicate image folder. So I'll see you there.
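(A sketch of the find_classes() helper written across this video, following the three steps above; train_dir is assumed to point at the training folder:)

import os
from typing import Dict, List, Tuple

def find_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
    """Finds the class folder names in a target directory."""
    # 1. Get the class names by scanning the target directory
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    # 2. Raise an error if class names could not be found
    if not classes:
        raise FileNotFoundError(f"Couldn't find any classes in {directory}... please check file structure.")
    # 3. Create a dictionary of index labels (computers prefer numbers over strings)
    class_to_idx = {class_name: i for i, class_name in enumerate(classes)}
    return classes, class_to_idx

# Example: find_classes(train_dir) -> (['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2})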
13679
21:42:30,280 --> 21:42:35,640
In the last video, we wrote a great helper function called find classes that takes in a target
13680
21:42:35,640 --> 21:42:42,840
directory and returns a list of classes and a dictionary mapping those class names to an integer.
13681
21:42:42,840 --> 21:42:50,840
So let's move forward. And this time, we're going to create a custom data set. To replicate
13682
21:42:50,840 --> 21:42:56,520
image folder. Now we don't necessarily have to do this, right, because image folder already exists.
13683
21:42:56,520 --> 21:43:01,800
And if something already exists in the PyTorch library, chances are it's going to be tested well,
13684
21:43:01,800 --> 21:43:08,200
it's going to work efficiently. And we should use it if we can. But if we needed some custom
13685
21:43:08,200 --> 21:43:14,520
functionality, we can always build up our own custom data set by sub classing torch dot utils
13686
21:43:14,520 --> 21:43:20,360
dot data data set. Or if a pre built data set function didn't exist, well, we're probably going
13687
21:43:20,360 --> 21:43:26,520
to want to subclass torch utils data dot data set anyway. And if we go into the documentation here,
13688
21:43:26,520 --> 21:43:30,040
there's a few things that we need to keep in mind when we're creating our own custom data set.
13689
21:43:30,680 --> 21:43:35,800
All data sets that represent a map from keys to data samples. So that's what we want to do.
13690
21:43:35,800 --> 21:43:41,480
We want to map keys, in other words, targets or labels to data samples, which in our case are
13691
21:43:41,480 --> 21:43:50,840
food images. So we should subclass this class here. Now to note, all subclasses should overwrite
13692
21:43:50,840 --> 21:43:57,240
get item. So get item is a method in Python, which is going to get an item or get a sample,
13693
21:43:57,800 --> 21:44:03,320
supporting fetching a data sample for a given key. So for example, if we wanted to get sample
13694
21:44:03,320 --> 21:44:08,520
number 100, this is what get item should support and should return us sample number 100.
13695
21:44:09,400 --> 21:44:15,960
And subclasses could also optionally override __len__, which is the length of a data set. So return
13696
21:44:15,960 --> 21:44:20,920
the size of the data set by many sampler implementations and the default options of data
13697
21:44:20,920 --> 21:44:27,800
loader, because we want to use this custom data set with data loader later on. So we should keep
13698
21:44:27,800 --> 21:44:34,040
this in mind when we're building our own custom subclasses of torch utils data data set. Let's see
13699
21:44:34,040 --> 21:44:37,560
this hands on, we're going to break it down. It's going to be a fair bit of code, but that's all right.
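(Before the full class, a bare-bones sketch of what the documentation asks of a Dataset subclass; MyDataset is a hypothetical name used only for illustration:)

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, samples):
        self.samples = samples
    def __len__(self):                 # optional but recommended
        return len(self.samples)
    def __getitem__(self, index):      # required: fetch a sample for a given key
        return self.samples[index]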
13700
21:44:38,280 --> 21:44:48,840
Nothing that we can't handle. So to create our own custom data set, we want to number one,
13701
21:44:48,840 --> 21:44:55,800
first things first is we're going to subclass torch dot utils dot data dot data set.
13702
21:44:55,800 --> 21:45:07,160
Two, what do we want to do? We want to init our subclass with target directory. So the directory
13703
21:45:07,160 --> 21:45:18,920
we'd like to get data from, as well as a transform, if we'd like to transform our data. So just like
13704
21:45:18,920 --> 21:45:25,400
when we used image folder, we could pass a transform to our data set, so that we could transform the
13705
21:45:25,400 --> 21:45:32,120
data that we were loading. We want to do the same thing. And we want to create several attributes.
13706
21:45:33,560 --> 21:45:41,640
Let's write them down here. We want paths, which will be the paths of our images. What else do
13707
21:45:41,640 --> 21:45:50,040
we want? We want transform, which will be the transform we'd like to use. We want classes,
13708
21:45:50,040 --> 21:46:00,440
which is going to be a list of the target classes. And we want class to ID X, which is going to be
13709
21:46:00,440 --> 21:46:10,520
a dict of the target classes, mapped to integer labels. Now, of course, these attributes will
13710
21:46:10,520 --> 21:46:16,120
differ depending on your data set. But we're replicating image folder here. So these are just
13711
21:46:16,120 --> 21:46:22,840
some of the things that we've seen that come with image folder. But regardless of what data set
13712
21:46:22,840 --> 21:46:26,520
you're working with, there are probably some things that are universal across them.
13713
21:46:26,520 --> 21:46:31,320
You probably want all the paths of where your data is coming from, the transforms you'd like to
13714
21:46:31,320 --> 21:46:37,080
perform on your data, what classes you're working with, and a map of those classes to an index.
13715
21:46:37,640 --> 21:46:44,920
So let's keep pushing forward. We want to create a function to load images, because after all,
13716
21:46:44,920 --> 21:46:51,960
we want to open some images. So this function will open an image. Number five, we want to
13717
21:46:52,600 --> 21:47:03,640
overwrite the __len__ method to return the length of our data set. So just like it said in the documentation,
13718
21:47:05,160 --> 21:47:12,360
if you subclass using torch.utils.data, the data set, you should overwrite get item,
13719
21:47:12,360 --> 21:47:17,240
and you should optionally overwrite __len__. So we're going to, instead of optionally, we are going to
13720
21:47:17,240 --> 21:47:29,000
overwrite length. And number six, we want to overwrite the get item method to return a given sample
13721
21:47:29,640 --> 21:47:39,800
when passed an index. Excellent. So we've got a fair few steps here. But if they don't make
13722
21:47:39,800 --> 21:47:45,240
sense now, it's okay. Let's code it out. Remember our motto, if in doubt, code it out. And if
13723
21:47:45,240 --> 21:47:50,520
in doubt, run the code. So we're going to write a custom data set. This is so exciting, because
13724
21:47:51,800 --> 21:47:57,320
when you work with prebuilt data sets, it's pretty cool in machine learning. But when you can write
13725
21:47:57,880 --> 21:48:05,480
code to create your own data sets, and that's, well, that's magic. So number one is we're going to,
13726
21:48:05,480 --> 21:48:11,960
or number zero is we're going to import torch utils data set, we don't have to rewrite this,
13727
21:48:11,960 --> 21:48:16,680
we've already imported it, but we're going to do it anyway for completeness. Now step number one
13728
21:48:16,680 --> 21:48:24,760
is to subclass it, subclass torch utils data, the data set. So just like when we built a model,
13729
21:48:25,480 --> 21:48:30,520
we subclassed nn.Module, but this time, we're going to call our class
13730
21:48:30,520 --> 21:48:37,400
image folder custom. And we're going to inherit from data set. This means that all the functionality
13731
21:48:37,400 --> 21:48:43,160
that's contained within torch utils data data set, we're going to get for our own custom class.
13732
21:48:44,360 --> 21:48:48,120
Number two, let's initialize. So we're going to initialize
13733
21:48:49,560 --> 21:48:54,840
our custom data set. And there's a few things that we'd like: to init our subclass with the
13734
21:48:54,840 --> 21:49:00,280
target directory, the directory we'd like to get data from, as well as the transform if we'd
13735
21:49:00,280 --> 21:49:06,760
like to transform our data. So let's write an init function, __init__, and we're going to go self,
13736
21:49:07,640 --> 21:49:15,640
target, and target is going to be a string. And we're going to set a transform here,
13737
21:49:15,640 --> 21:49:24,440
we'll set it equal to none. Beautiful. So this way we can pass in a target directory of images
13738
21:49:24,440 --> 21:49:29,400
that we'd like to load. And we can also pass in a transform, just similar to the transforms that
13739
21:49:29,400 --> 21:49:35,960
we've created previously. So now we're up to number three, which is create several attributes. So
13740
21:49:35,960 --> 21:49:43,960
let's see what this looks like, create class attributes. So we'll get all of the image paths.
13741
21:49:44,680 --> 21:49:52,120
So we can do this just like we've done before, self paths equals list, path lib dot path,
13742
21:49:52,120 --> 21:49:56,440
because what's our target directory going to be? Well, I'll give you a spoiler alert,
13743
21:49:56,440 --> 21:50:02,200
it's going to be a path like the test directory, or it's going to be the train directory.
13744
21:50:02,760 --> 21:50:08,680
Because we're going to use this once for our test directory and our train directory,
13745
21:50:08,680 --> 21:50:15,320
just like we use the original image folder. So we're going to go through the target directory
13746
21:50:15,320 --> 21:50:23,160
and find out all of the paths. So this is getting all of the image paths that support
13747
21:50:23,160 --> 21:50:31,400
or that follow the file name convention of star star dot jpg. So if we have a look at this,
13748
21:50:31,400 --> 21:50:38,440
we passed in the test folder. So test is the folder. The star would mean any of these, one, two, three: pizza,
13749
21:50:38,440 --> 21:50:44,520
steak, sushi, that's the first star. Then the slash would go into the pizza directory. The star here
13750
21:50:44,520 --> 21:50:50,760
would mean any of the file combinations here that end in dot jpg. So this is getting us a list of
13751
21:50:50,760 --> 21:50:56,680
all of the image paths within a target directory. In other words, within the test directory and
13752
21:50:56,680 --> 21:51:02,120
within the train directory, when we call these two separately. So let's keep going, we've got all
13753
21:51:02,120 --> 21:51:06,920
of the image paths, what else did we have to do? We want to create transforms. So let's set up
13754
21:51:06,920 --> 21:51:16,840
transforms, self dot transforms equals transform. Oh, we'll just call that transform actually,
13755
21:51:16,840 --> 21:51:25,080
set up transform equals transform. So we're going to get this from here. And I put it as
13756
21:51:25,080 --> 21:51:33,480
None because the transform can be optional. So let's create classes and class to ID X attributes,
13757
21:51:33,480 --> 21:51:39,640
which is the next one on our list, which is here classes and class to ID X. Now, lucky us,
13758
21:51:39,640 --> 21:51:46,600
in the previous video, we created a function to return just those things. So let's go self dot
13759
21:51:46,600 --> 21:51:56,680
classes and self dot class to ID X equals find classes. And we're going to pass in the target
13760
21:51:56,680 --> 21:52:04,360
dir, or the target dir from here. Now, what's next? We've done step number three, we need
13761
21:52:05,960 --> 21:52:10,520
number four is create a function to load images. All right, let's see what this looks like. So
13762
21:52:10,520 --> 21:52:20,200
number four, create a function to load images. So let's call it load image. And we're going to
13763
21:52:20,200 --> 21:52:26,280
pass in self. And we'll also pass in an index. So the index of the image we'd like to load.
13764
21:52:26,920 --> 21:52:33,800
And this is going to return an image dot image. So where does that come from? Well, previously,
13765
21:52:33,800 --> 21:52:39,480
we imported from PIL. So we're going to use the Python Imaging Library, or Pillow, to import our
13766
21:52:39,480 --> 21:52:45,560
images. So given a file path from here, such as pizza, we're going to import
13767
21:52:45,560 --> 21:52:50,840
it with the image class. And we can do that using, I believe it's image dot open. So let's give that
13768
21:52:50,840 --> 21:53:00,360
a try. I'll just write a note in here, opens an image via a path and returns it. So let's write
13769
21:53:00,360 --> 21:53:08,360
image path equals self. This is why we got all of the image paths above. So self dot paths. And
13770
21:53:08,360 --> 21:53:16,920
we're going to index it on the index. Beautiful. And then let's return image dot open image path.
13771
21:53:17,960 --> 21:53:21,560
So we're going to get a particular image path. And then we're just going to open it.
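(A sketch of the two pieces just described, assuming targ_dir points at a folder in standard image classification format:)

import pathlib
from PIL import Image

paths = list(pathlib.Path(targ_dir).glob("*/*.jpg"))  # e.g. test/pizza/xyz.jpg, test/steak/abc.jpg, ...
img = Image.open(paths[0])                            # open one image via its path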
13772
21:53:22,920 --> 21:53:28,120
So now we're up to step number five, override the __len__ method to return the length of our data set.
13773
21:53:29,400 --> 21:53:34,360
This is optional, but we're going to do it anyway. So overwrite.
13774
21:53:34,360 --> 21:53:43,400
Len. So this just wants to return how many samples we have in our data set. So let's write that
13775
21:53:43,400 --> 21:53:51,240
def, Len. So if we call Len on our data set instance, it's going to return just how many numbers there
13776
21:53:51,240 --> 21:53:59,640
are. So let's write this down. Returns the total number of samples. And this is just going to be
13777
21:53:59,640 --> 21:54:08,920
simply return length or Len of self dot paths. So for our target directory, if it was the training
13778
21:54:08,920 --> 21:54:16,360
directory, we'd return the number of image paths that this code has found out here. And same for the
13779
21:54:16,360 --> 21:54:26,360
test directory. So next, I'm going to go number six is we want to overwrite, we put this up here,
13780
21:54:26,360 --> 21:54:32,440
the get item method. So this is required if we want to subclass torch utils data data set. So
13781
21:54:32,440 --> 21:54:38,520
this is in the documentation here. All subclasses should override get item. So we want get item to,
13782
21:54:38,520 --> 21:54:43,960
if we pass it an index to our data set, we want it to return that particular item. So let's see
13783
21:54:43,960 --> 21:54:51,960
what this looks like. Override the get item method to return our particular sample.
13784
21:54:51,960 --> 21:55:00,680
And now this method is going to leverage get item, all of the code that we've created above.
13785
21:55:00,680 --> 21:55:06,760
So this is going to go take in self, which is the class itself. And it's going to take in an index,
13786
21:55:06,760 --> 21:55:15,160
which will be of an integer. And it's going to return a tuple of torch dot tensor and an integer,
13787
21:55:15,160 --> 21:55:21,720
which is the same thing that gets returned when we index on our training data. So if we have a
13788
21:55:21,720 --> 21:55:31,880
look image label equals train data, zero, get item is going to replicate this. We pass it an index here.
13789
21:55:33,720 --> 21:55:39,960
Let's check out the image and the label. This is what we have to replicate. So remember train
13790
21:55:39,960 --> 21:55:45,880
data was created with image folder from torch vision dot data sets. And so we will now get item
13791
21:55:45,880 --> 21:55:51,480
to return an image and a label, which is a tuple of a torch tensor, where the image is of a tensor
13792
21:55:51,480 --> 21:55:59,320
here. And the label is of an integer, which is the label here, the particular index as to which
13793
21:55:59,320 --> 21:56:08,520
this image relates to. So let's keep pushing forward. I'm going to write down here, returns one sample
13794
21:56:08,520 --> 21:56:20,360
of data, data and label, X and y, or we'll just go X, y. So we know that it's a tuple. Beautiful.
13795
21:56:20,360 --> 21:56:28,440
So let's set up the image. What do we want the image to be? Well, this is where we're going to
13796
21:56:28,440 --> 21:56:34,520
call on our self dot load image function, which is what we've created up here. Do you see the
13797
21:56:34,520 --> 21:56:40,120
customization capabilities of creating your own class? So we've got a fair bit of code here,
13798
21:56:40,120 --> 21:56:44,760
right? But essentially, all we're doing is creating functions that are going to help us
13799
21:56:44,760 --> 21:56:50,440
load our images into some way, shape or form. Now, again, I can't stress this enough, regardless
13800
21:56:50,440 --> 21:56:55,960
of the data that you're working on, the pattern here will be quite similar. You'll just have to
13801
21:56:55,960 --> 21:57:02,200
change the different functions you use to load your data. So let's load an image of a particular
13802
21:57:02,200 --> 21:57:08,440
index. So if we pass in an index here, it's going to load in that image. Then what do we do? Well,
13803
21:57:08,440 --> 21:57:14,360
we want to get the class name, which is going to be self dot paths. And we'll get the index here,
13804
21:57:15,000 --> 21:57:23,480
and we can go parent dot name. So this expects path in format data,
13805
21:57:24,760 --> 21:57:33,240
folder slash class name slash image dot JPG. That's just something to be aware of. And the class
13806
21:57:33,240 --> 21:57:40,040
ID X is going to be self dot class to ID X. And we will get the class name here.
13807
21:57:42,840 --> 21:57:50,360
So now we have an image by loading in the image here. We have a class name, because our data
13808
21:57:50,360 --> 21:57:55,080
is, or our data currently is, in standard image classification format. You may have to
13809
21:57:55,080 --> 21:57:59,320
change this depending on the format your data is in, we can get the class name from that,
13810
21:57:59,320 --> 21:58:06,920
and we can get the class ID X by indexing on our attribute up here, our dictionary of class names
13811
21:58:06,920 --> 21:58:15,720
to indexes. Now we have one small little step. This is transform if necessary. So remember our
13812
21:58:16,600 --> 21:58:22,440
transform parameter up here. If we want to transform our target image, well, let's put in if self dot
13813
21:58:22,440 --> 21:58:29,240
transform if the transform exists, let's pass the image through that transform, transform image
13814
21:58:29,240 --> 21:58:37,800
and then we're going to also return the class ID X. So do you notice how we've returned a
13815
21:58:37,800 --> 21:58:44,520
tuple here? This is going to be a torch tensor. If our transform exists and the class ID X is also
13816
21:58:44,520 --> 21:58:51,000
going to be returned, which is what we want here, X and Y, which is what gets returned here,
13817
21:58:51,000 --> 21:59:02,520
image as a tensor label as an integer. So return data label X, Y, and then if the transform doesn't
13818
21:59:02,520 --> 21:59:17,880
exist, let's just return image class ID X, return untransformed image and label. Beautiful. So
13819
21:59:17,880 --> 21:59:24,200
that is a fair bit of code there. So you can see the pro of subclassing torch utils data dot data
13820
21:59:24,200 --> 21:59:29,240
set is that we can customize this in almost any way we wanted to, to load whatever data that we're
13821
21:59:29,240 --> 21:59:34,840
working with, well, almost any data. However, because we've written so much code, this may be
13822
21:59:34,840 --> 21:59:38,200
prone to errors, which we're going to find out in the next video to see if it actually works.
13823
21:59:39,160 --> 21:59:43,720
But essentially, all we've done is we've followed the documentation here torch dot utils data
13824
21:59:43,720 --> 21:59:51,000
dot data set to replicate the functionality of an existing data loader function, namely image folder.
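(A sketch of the ImageFolderCustom class assembled across this video, relying on the find_classes() helper from earlier; targ_dir is the assumed parameter name:)

import pathlib
from typing import Tuple

import torch
from PIL import Image
from torch.utils.data import Dataset

class ImageFolderCustom(Dataset):
    # 2. Init our subclass with a target directory and an optional transform
    def __init__(self, targ_dir: str, transform=None) -> None:
        # 3. Create class attributes
        self.paths = list(pathlib.Path(targ_dir).glob("*/*.jpg"))
        self.transform = transform
        self.classes, self.class_to_idx = find_classes(targ_dir)

    # 4. Create a function to load images
    def load_image(self, index: int) -> Image.Image:
        "Opens an image via a path and returns it."
        image_path = self.paths[index]
        return Image.open(image_path)

    # 5. Overwrite __len__()
    def __len__(self) -> int:
        "Returns the total number of samples."
        return len(self.paths)

    # 6. Overwrite __getitem__() to return a particular sample
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        "Returns one sample of data, data and label (X, y)."
        img = self.load_image(index)
        class_name = self.paths[index].parent.name  # expects data_folder/class_name/image.jpg
        class_idx = self.class_to_idx[class_name]
        # Transform if necessary
        if self.transform:
            return self.transform(img), class_idx   # return data, label (X, y)
        else:
            return img, class_idx                   # return untransformed image and label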
13825
21:59:51,000 --> 21:59:56,920
So if we scroll back up, ideally, if we've done it right, we should be able to write code like this,
13826
21:59:58,120 --> 22:00:02,440
passing in a root directory, such as a training directory, a particular data transform.
13827
22:00:03,080 --> 22:00:10,920
And we should get very similar instances as image folder, but using our own custom data set class.
13828
22:00:10,920 --> 22:00:20,520
So let's try that out in the next video. So now we've got a custom image folder class
13829
22:00:20,520 --> 22:00:26,280
that replicates the functionality of the original image folder, data loader class,
13830
22:00:26,280 --> 22:00:32,360
or data set class, that is, let's test it out. Let's see if it works on our own custom data.
13831
22:00:32,360 --> 22:00:43,400
So we're going to create a transform here so that we can transform our raw JPEG images into tensors,
13832
22:00:43,400 --> 22:00:49,800
because that's the whole goal of importing data into pytorch. So let's set up a train transforms
13833
22:00:49,800 --> 22:00:57,560
compose. We're going to set it to equal to transforms dot compose. And I'm going to pass in a list here,
13834
22:00:57,560 --> 22:01:05,800
that it's going to be transforms, we're going to resize it to 64 by 64. Whatever the image size is, we'll
13835
22:01:05,800 --> 22:01:12,600
reduce it down to 64 by 64. Then we're going to go transforms dot random horizontal flip. We don't
13836
22:01:12,600 --> 22:01:18,680
need to necessarily flip them, but we're going to do it anyway, just to see if it works. And then
13837
22:01:18,680 --> 22:01:25,240
let's put in here transforms dot to tensor, because our images are getting opened as a pill image,
13838
22:01:25,240 --> 22:01:33,320
using Image.open. But now we're using the ToTensor transform from PyTorch, or torch
13839
22:01:33,320 --> 22:01:41,800
vision's transforms. So I'll just put this here, from torchvision.transforms, that way you
13840
22:01:41,800 --> 22:01:47,480
know where we're importing transforms from. And let's create one for the test data set as well, test
13841
22:01:47,480 --> 22:01:56,120
transforms, we'll set this up. Oh, excuse me, I need to just go import transforms. And let's go
13842
22:01:56,120 --> 22:02:01,720
transforms dot compose. And we'll pass in another list, we're going to do the exact same as above,
13843
22:02:01,720 --> 22:02:10,840
we'll set up resize, and we'll set the size equal to 64 by 64. And then transforms, we're going to go
13844
22:02:10,840 --> 22:02:16,440
dot to tensor, we're going to skip the data augmentation for test data. Because typically,
13845
22:02:16,440 --> 22:02:23,160
you don't manipulate your test data in terms of data augmentation, you just convert it into a
13846
22:02:23,160 --> 22:02:29,800
tensor, rather than manipulate its orientation, shape, size, etc, etc. So let's run this.
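(A sketch of the two transform pipelines built in this cell:)

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),  # augmentation for training data only
    transforms.ToTensor(),
])

test_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),                   # no augmentation for test data
])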
13847
22:02:31,720 --> 22:02:41,080
And now let's see how image folder custom class works. Test out image folder custom.
13848
22:02:41,080 --> 22:02:50,280
Let's go, we'll set up the train data custom is equal to image folder custom. And then we'll set up
13849
22:02:50,280 --> 22:02:56,120
the target, which is equal to the training directory. And then we'll pass in the transform,
13850
22:02:56,120 --> 22:03:04,120
which is equal to the train transforms, which we just created above train transforms. And then
13851
22:03:04,120 --> 22:03:09,160
we're going to, I think that's all we need, actually. We only had two parameters. We're not going
13852
22:03:09,160 --> 22:03:13,880
to use a target transform, because for our labels, we've got a helper function to transform our labels.
13853
22:03:13,880 --> 22:03:20,120
So test data custom is going to be image folder custom. And I'm going to set up the target to be
13854
22:03:20,120 --> 22:03:26,840
equal to the test directory. And the transform is going to be the test transforms from the cell
13855
22:03:26,840 --> 22:03:34,840
above there. And what's Colab telling me there? Oh, I'm going to set that up. Did we misspell
13856
22:03:34,840 --> 22:03:40,200
something? Oh, we spelled it wrong train transforms. There we go. Beautiful. Now let's have a look at
13857
22:03:40,200 --> 22:03:49,160
our train data and test data custom. See if it worked. What do we have? Oh, we have an image folder
13858
22:03:49,160 --> 22:03:54,600
custom. Well, it doesn't give us as much rich information as just checking it out as it does
13859
22:03:54,600 --> 22:04:01,320
for the train data. But that's okay. We can still inspect these. So this is our original one made
13860
22:04:01,320 --> 22:04:07,960
with image folder. And we've got now train data custom and test data custom. Let's see if we can
13861
22:04:07,960 --> 22:04:13,320
get some information from there. So let's check the original length of the train data and see if
13862
22:04:13,320 --> 22:04:19,640
we can use the len method on our train data custom. Did that work? Wonderful. Now how about we do it
13863
22:04:19,640 --> 22:04:26,680
for the original test data made with image folder and our custom version, test data custom, made with
13864
22:04:26,680 --> 22:04:32,760
image folder custom. Beautiful. That's exactly what we want. And now let's have a look at the
13865
22:04:32,760 --> 22:04:40,440
train data custom. Let's see if the classes attribute comes up. Dot classes. And we'll just leave that
13866
22:04:40,440 --> 22:04:47,080
there. We'll do the class to IDX as well. Yes, it is there. So these attributes here... I wonder if we get
13867
22:04:49,240 --> 22:04:54,840
information from Google Colab loading. What do we get? Oh, classes to IDX, classes, load image,
13868
22:04:54,840 --> 22:05:02,440
paths, transform. So if we go back up here, all these attributes are from here: paths, transform,
13869
22:05:03,080 --> 22:05:09,800
classes, class to IDX, as well as load image. So this is all coming from the code that we wrote in
13870
22:05:09,800 --> 22:05:16,200
our custom data set class. So let's keep pushing forward. Let's have a look at the class to IDX.
13871
22:05:16,200 --> 22:05:21,880
Do we get the same as what we wanted before? Yes, we do. Beautiful, a dictionary containing our
13872
22:05:21,880 --> 22:05:30,200
string names and the integer associations. So let's now check for equality. We can do this by going
13873
22:05:31,800 --> 22:05:43,720
check for equality between original image folder data set and image folder custom data set. Now
13874
22:05:43,720 --> 22:05:49,960
we've kind of already done that here, but let's just try it out. Let's go print. Let's go train
13875
22:05:49,960 --> 22:05:58,440
data custom dot classes. Is that equal to... oh, I don't want three equals signs... train data, the
13876
22:05:58,440 --> 22:06:07,960
original one's classes? And also print, let's do test data custom dot classes. Is this equal to
13877
22:06:08,840 --> 22:06:16,920
test data, the original one's classes? True and true. Now you could try this out. In fact,
13878
22:06:16,920 --> 22:06:24,520
it's a little exercise to try it out to compare the others. But congratulations to us, we have
13879
22:06:24,520 --> 22:06:30,440
replicated the main functionality of the image folder data set class. And so the takeaways from
13880
22:06:30,440 --> 22:06:37,800
this is that whatever data you have, PyTorch gives you a base data set class to inherit from.
13881
22:06:37,800 --> 22:06:43,720
And then you can write a function or a class that somehow interacts with whatever data you're
13882
22:06:43,720 --> 22:06:49,240
working with. So in our case, we load in an image. And then, as long as you override the len
13883
22:06:49,240 --> 22:06:56,200
method and the get item method and return some sort of values, well, you can create your own
13884
22:06:56,200 --> 22:07:02,200
data set loading function. How beautiful is that? So that's going to help you work with your own
13885
22:07:02,200 --> 22:07:08,280
custom data sets in PyTorch. So let's keep pushing forward. We've seen analytically that
13886
22:07:08,280 --> 22:07:14,440
our custom data set is quite similar to the original PyTorch, torch vision dot data sets
13887
22:07:14,440 --> 22:07:20,840
image folder data set. But you know what I like to do? I like to visualize things. So let's in
13888
22:07:20,840 --> 22:07:27,720
the next video, create a function to display some random images from our train data custom class.
13889
22:07:27,720 --> 22:07:37,640
It's time to follow the data explorer's motto of visualize, visualize, visualize. So let's
13890
22:07:37,640 --> 22:07:46,440
create another section. I'm going to write here a title called create a function to display random
13891
22:07:46,440 --> 22:07:52,200
images. And sure, we've had a look at the different attributes of our custom data set.
13892
22:07:52,200 --> 22:07:57,720
We see that it gives back a list of different class names. We see that the lengths are similar
13893
22:07:57,720 --> 22:08:04,360
to the original, but there's nothing quite like visualizing some data. So let's go in here. We're
13894
22:08:04,360 --> 22:08:11,080
going to write a function, a helper function. So step number one, we need to take in a data set.
13895
22:08:11,080 --> 22:08:15,960
So one of the data sets that we just created, whether it be train data custom or train data.
13896
22:08:15,960 --> 22:08:29,480
And a number of other parameters, such as class names and how many images to visualize. And then
13897
22:08:29,480 --> 22:08:39,000
step number two is to prevent the display getting out of hand. Let's cap the number of
13898
22:08:39,000 --> 22:08:45,800
images to see at 10. Because look, if our data set is going to be thousands of images and we want
13899
22:08:45,800 --> 22:08:50,040
to put in a number of images to look at, let's just make sure it's the maximum is 10. That should
13900
22:08:50,040 --> 22:09:00,280
be enough. So we'll set the random seed for reproducibility. Number four is, let's get a list of random
13901
22:09:00,280 --> 22:09:08,120
samples. So we want random sample indexes... oh, just get rid of this 's'. What do we want it from?
13902
22:09:08,840 --> 22:09:15,560
from the target data set. So we want to take in a data set, and we want to count the number of
13903
22:09:15,560 --> 22:09:21,000
images we're seeing, we want to set a random seed. And do you see how much I use randomness here to
13904
22:09:21,000 --> 22:09:26,280
really get an understanding of our data? I really, really, really love harnessing the power of
13905
22:09:26,280 --> 22:09:32,280
randomness. So we want to get a random sample of indexes from all of our data set. And then we're
13906
22:09:32,280 --> 22:09:41,320
going to set up a matplotlib plot. Then we want to loop through the random sample images.
13907
22:09:41,320 --> 22:09:50,360
And plot them with matplotlib. And then as a side to this one, step seven is we need to make sure
13908
22:09:50,360 --> 22:10:00,360
the dimensions of our images line up with matplotlib. So matplotlib needs height, width, color channels.
13909
22:10:00,360 --> 22:10:10,680
All right, let's take it on, hey? So number one is create a function to take in a data set.
13910
22:10:10,680 --> 22:10:16,280
So we're going to call this def, let's call it def display random images going to be one of our
13911
22:10:16,280 --> 22:10:21,480
helper functions. We've created a few type of functions like this. But let's take in a data set,
13912
22:10:21,480 --> 22:10:27,640
which is of type torch dot utils dot data dot Dataset. Then we're going to take in classes,
13913
22:10:27,640 --> 22:10:32,920
which is going to be a list of different strings. So this is going to be our class names for
13914
22:10:32,920 --> 22:10:38,360
whichever data set we're using. I'm going to set this equal to none. And then we're going to take in
13915
22:10:38,360 --> 22:10:43,400
n, which is the number of images we'd like to plot. And I'm going to set this to 10 by default. So
13916
22:10:43,400 --> 22:10:49,000
we can see 10 images at a time, 10 random images, that is, do we want to display the shape? Let's
13917
22:10:49,000 --> 22:10:54,920
set that equal to true, so that we can display what the shape of the images is, because we're passing
13918
22:10:54,920 --> 22:11:01,400
them through our transform as they go into a data set. So we want to see what the shape of our
13919
22:11:01,400 --> 22:11:07,080
images is, just to make sure that's okay. And also, let's set up a seed, which is
13920
22:11:07,080 --> 22:11:13,560
going to be an integer, and we'll set that to none to begin with as well. Okay, so step number two,
13921
22:11:13,560 --> 22:11:17,960
what do we have above? We have to prevent the display getting out of hand, let's cap the number
13922
22:11:17,960 --> 22:11:23,160
of images to see at 10. So we've got n is by default, it's going to be 10, but let's just make
13923
22:11:23,160 --> 22:11:32,920
sure that it stays there. Adjust display, if n is too high. So if n is greater than 10,
13924
22:11:32,920 --> 22:11:39,640
let's just readjust this, let's set n equal to 10, and display shape, we'll turn off the
13925
22:11:39,640 --> 22:11:45,720
display shape, because if we have 10 images, our display may get out of hand. So just print out
13926
22:11:45,720 --> 22:11:56,600
here: for display purposes, n shouldn't be larger than 10, setting to 10 and removing
13927
22:11:57,160 --> 22:12:01,320
shape display. Now I only know this because I've had experience cooking this dish before.
13928
22:12:01,320 --> 22:12:06,840
In other words, I've written this type of code before. And the beautiful thing
13929
22:12:06,840 --> 22:12:12,040
about Python and PyTorch is you can customize these display functions in any way you see fit.
13930
22:12:12,040 --> 22:12:17,480
So step number three, what are we doing? Set the random seed for reproducibility. Okay,
13931
22:12:17,480 --> 22:12:25,080
set the seed. So if seed, let's set random dot seed equal to that seed value, and then we can keep
13932
22:12:25,080 --> 22:12:32,760
going. So number four is, let's get some random sample indexes. So we can do
13933
22:12:32,760 --> 22:12:39,480
that by going get random sample indexes, which is step number four here. So we've got a target
13934
22:12:39,480 --> 22:12:45,400
data set that we want to inspect. We want to get some random samples from that. So let's create a
13935
22:12:45,400 --> 22:12:54,680
random samples IDX list. And I'm going to randomly sample from a length of our data set, or sorry,
13936
22:12:54,680 --> 22:12:59,080
a range of the length of our data set. And I'll show you what this means in a second.
13937
22:13:00,440 --> 22:13:06,280
And the K, excuse me, have we got enough brackets there? I always get confused with the brackets.
13938
22:13:06,280 --> 22:13:11,960
The K is going to be n. So in this case, I want to randomly sample 10 images from the length of
13939
22:13:11,960 --> 22:13:18,120
our data set or 10 indexes. So let's just have a look at what this looks like. We'll put in here,
13940
22:13:18,120 --> 22:13:24,840
our train data custom here. So this is going to take a range of the length of our train data
13941
22:13:24,840 --> 22:13:32,120
custom, which is what 225. We looked at that before, just up here, length of this. So between zero
13942
22:13:32,120 --> 22:13:37,960
and 225, we're going to get 10 indexes if we've done this correctly. Beautiful. So there's 10
13943
22:13:37,960 --> 22:13:44,840
random samples from our train data custom, or 10 random indexes, that is. So we're up to step number
13944
22:13:44,840 --> 22:13:53,000
five, which was loop through the random sample images or indexes. Let's change this to indexes,
13945
22:13:53,800 --> 22:13:58,440
indexes and plot them with matplotlib. So this is going to give us a list here.
13946
22:13:59,000 --> 22:14:10,040
So let's go loop through random indexes and plot them with matplotlib. Beautiful. So for
13947
22:14:10,040 --> 22:14:21,640
i, targ sample in enumerate... let's enumerate through the random samples idx list.
13948
22:14:22,360 --> 22:14:30,920
And then we're going to go targ image and targ label, because all of the samples in our target
13949
22:14:30,920 --> 22:14:36,440
data set are in the form of tuples. So we're going to get the target image and the target label,
13950
22:14:36,440 --> 22:14:43,320
which is going to be data set at targ sample. We'll take that index. So it might be one of these values
13951
22:14:43,320 --> 22:14:51,000
here. We'll index on that. And the zero index will be the image. And then we'll go on the data set as
13952
22:14:51,000 --> 22:14:58,440
well. We'll take the targ sample index. And then index number one will be the label of our target
13953
22:14:58,440 --> 22:15:06,920
sample. And then number seven, oh, excuse me, we've missed a step. That should be number six.
13954
22:15:08,120 --> 22:15:14,200
Did you catch that? Number five is setup plot. So we can do this quite easily by going plot
13955
22:15:14,200 --> 22:15:20,280
figure. This is so that each time we iterate through another sample, we're going to have
13956
22:15:20,280 --> 22:15:27,560
quite a big figure here. So we set up the plot outside the loop so that we can add a plot to this
13957
22:15:27,560 --> 22:15:34,040
original plot here. And now this is number seven, where we make sure the dimensions of our images
13958
22:15:34,040 --> 22:15:39,560
line up with matplotlib. So if we recall by default, pytorch is going to turn our image dimensions into
13959
22:15:39,560 --> 22:15:47,560
what? Color channels first. However, matplotlib prefers color channels last. So let's go adjust
13960
22:15:47,560 --> 22:15:58,200
tensor dimensions for plotting. So let's go targ image. Let's call this targ image adjust equals
13961
22:15:58,200 --> 22:16:06,600
targ image dot permute. And we're going to alter the order of the dimensions. So this is going to go
13962
22:16:06,600 --> 22:16:14,040
from color channels, height, width, and we're going to change this,
13963
22:16:14,040 --> 22:16:23,480
if I could spell, to height, width, color channels. Beautiful. That one will probably catch you off
13964
22:16:23,480 --> 22:16:28,360
guard a few times. But we've seen it a couple of times now. So we're going to keep going with this
13965
22:16:29,000 --> 22:16:36,760
plot adjusted samples. So now we can add a subplot to our matplotlib plot. And we want to create,
13966
22:16:36,760 --> 22:16:44,840
we want one row of n images, this will make a lot more sense when we visualize it. And then for
13967
22:16:44,840 --> 22:16:52,840
the index, we're going to keep track of i plus one. So let's keep going. So then we're going to go
13968
22:16:52,840 --> 22:17:01,960
plt dot imshow. And I'm going to go targ image adjust. So I'm going to plot this image here. And then
13969
22:17:01,960 --> 22:17:11,800
let's turn off the axis. And we can go if the classes variable exists, which is up here, a list
13970
22:17:11,800 --> 22:17:18,200
of classes, let's adjust the title of the plot to be the particular index in the class list. So
13971
22:17:18,200 --> 22:17:26,520
title equals f class. And then we're going to put in here classes. And we're going to index on that
13972
22:17:26,520 --> 22:17:31,480
with the target label index, which is going to come from here, because that's going to be in
13973
22:17:31,480 --> 22:17:42,520
numerical format. And then if display shape, let's set the title equal to title plus f. We're going
13974
22:17:42,520 --> 22:17:49,320
to go new line shape. This is going to be the shape of the image, targ image adjust dot shape.
13975
22:17:50,840 --> 22:17:57,320
And then we'll set the title with plt dot title. So you see how, if we have display shape, we're
13976
22:17:57,320 --> 22:18:02,120
just adjusting the title variable that we created here. And then we're putting the title onto the
13977
22:18:02,120 --> 22:18:09,240
plot. So let's see how this goes. That is quite a beautiful function. Let's pass in one of our
13978
22:18:09,240 --> 22:18:15,560
data sets and see what it looks like. Let's plot some random images. So which one should we start
13979
22:18:15,560 --> 22:18:23,560
with first? So let's display random images from the image folder created data sets. So this is the
13980
22:18:23,560 --> 22:18:30,200
inbuilt pytorch image folder. Let's go display random images, the function we just created above.
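Pulling the numbered steps together, the helper might look roughly like this (a sketch, assuming matplotlib and the random module, and that each dataset sample is an (image tensor, label) tuple):

import random
from typing import List
import matplotlib.pyplot as plt
from torch.utils.data import Dataset

def display_random_images(dataset: Dataset,
                          classes: List[str] = None,
                          n: int = 10,
                          display_shape: bool = True,
                          seed: int = None):
    # 2. Cap the number of images to display at 10
    if n > 10:
        n = 10
        display_shape = False
        print("For display purposes, n shouldn't be larger than 10, setting to 10 and removing shape display.")
    # 3. Set the random seed for reproducibility
    if seed:
        random.seed(seed)
    # 4. Get a list of random sample indexes from the dataset
    random_samples_idx = random.sample(range(len(dataset)), k=n)
    # 5. Set up the plot
    plt.figure(figsize=(16, 8))
    # 6. Loop through the random indexes and plot them with matplotlib
    for i, targ_sample in enumerate(random_samples_idx):
        targ_image, targ_label = dataset[targ_sample][0], dataset[targ_sample][1]
        # 7. Rearrange dimensions for matplotlib: (C, H, W) -> (H, W, C)
        targ_image_adjust = targ_image.permute(1, 2, 0)
        plt.subplot(1, n, i + 1)
        plt.imshow(targ_image_adjust)
        plt.axis("off")
        if classes:
            title = f"Class: {classes[targ_label]}"
            if display_shape:
                title = title + f"\nshape: {targ_image_adjust.shape}"
            plt.title(title)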
13981
22:18:30,200 --> 22:18:34,440
We're going to pass in the train data. And then we can pass in the number of images. Let's have
13982
22:18:34,440 --> 22:18:41,640
a look at five. And the classes is going to be the class names, which is just a list of our
13983
22:18:41,640 --> 22:18:46,360
different class names. And then we can set the seed, we want it to be random. So we'll just set
13984
22:18:46,360 --> 22:18:55,240
the seed to equal none. Oh, doesn't that look good? So this is from our original train data
13985
22:18:55,240 --> 22:19:02,440
made with image folder. So option number one up here, option one, there we go. And we've
13986
22:19:02,440 --> 22:19:08,360
passed in the class names. So this is sushi, resized to 64, 64, 3, same with all of the others,
13987
22:19:08,360 --> 22:19:14,840
but from different classes. Let's set the seed to 42, see what happens. I get these images,
13988
22:19:14,840 --> 22:19:21,480
we got a sushi, we got a pizza, we got pizza, sushi pizza. And then if we try a different one,
13989
22:19:21,480 --> 22:19:29,480
we just go none. We get random images again, wonderful. Now let's write the same code,
13990
22:19:29,480 --> 22:19:38,600
but this time using our train data custom data set. So display random images from the image folder
13991
22:19:38,600 --> 22:19:46,840
custom data set. So this is the one that we created display random images. I'm going to pass
13992
22:19:46,840 --> 22:19:53,880
in train data custom, our own data set. Oh, this is exciting. Let's set n equal to 10 and just
13993
22:19:53,880 --> 22:19:58,680
see how far we can go with our plot. Or maybe we set it to 20 and just see if our
13994
22:19:58,680 --> 22:20:08,920
code for adjusting the plot makes sense. Class names and seed equals, I'm going to put in 42 this time.
13995
22:20:08,920 --> 22:20:13,800
There we go. For display purposes, n shouldn't be larger than 10, setting to 10 and removing shape
13996
22:20:13,800 --> 22:20:21,160
display. So we have a steak image, a pizza image, pizza, steak, pizza, pizza, pizza, pizza, steak,
13997
22:20:21,160 --> 22:20:26,200
pizza. If we turn off the random seed, we should get another 10 random images here.
13998
22:20:26,200 --> 22:20:34,600
Beautiful. Look at that. Steak, steak, sushi, pizza, steak, sushi class. I'm reading out
13999
22:20:34,600 --> 22:20:41,720
the different things here. Pizza, pizza, pizza, pizza. Okay. So it looks like our custom data set
14000
22:20:41,720 --> 22:20:48,200
is working from both a qualitative standpoint, looking at the different images, and a quantitative one.
14001
22:20:48,200 --> 22:20:52,680
How about we change it to five and see what it looks like? Do we have a different shape? Yes,
14002
22:20:52,680 --> 22:20:59,320
we do the same shape as above. Wonderful. Okay. So we've got train data custom.
14003
22:21:00,120 --> 22:21:05,560
And we've got train data, which is made from image folder. But the premise remains: we've built up
14004
22:21:05,560 --> 22:21:10,600
a lot of different ideas. And we're looking at things from different points of view. We are
14005
22:21:10,600 --> 22:21:17,720
getting our data from the folder structure here into tensor format. So there's still one more
14006
22:21:17,720 --> 22:21:23,800
step that we have to do. And that's go from data set to data loader. So in the next video,
14007
22:21:23,800 --> 22:21:30,120
let's see how we can turn our custom loaded images, train data custom, and test data custom
14008
22:21:30,120 --> 22:21:35,240
into data loaders. So you might want to go ahead and give that a try yourself. We've done it before
14009
22:21:35,240 --> 22:21:40,440
up here. Turn loaded images into data loaders. We're going to replicate the same thing as we did
14010
22:21:40,440 --> 22:21:45,480
in here for our option number two, except this time we'll be using our custom data set.
14011
22:21:45,480 --> 22:21:54,280
I'll see you in the next video. Ah, those are some good looking images, and even better that they're
14012
22:21:54,280 --> 22:22:00,360
from our own custom data set. Now we've got one more step. We're going to turn our data set into a
14013
22:22:00,360 --> 22:22:05,800
data loader. In other words, we're going to batchify all of our images so they can be used with the
14014
22:22:05,800 --> 22:22:12,440
model. And I gave you the challenge of trying this out yourself in the last video. So I hope
14015
22:22:12,440 --> 22:22:17,400
you gave that a go. But let's see what that might look like in here. So I'm going to go 5.4.
14016
22:22:17,960 --> 22:22:26,360
Let's go. What should we call this? So turn custom loaded images into data loaders. So this
14017
22:22:26,360 --> 22:22:33,000
just goes to show that we can write our own custom data set class. And we can still use it
14018
22:22:33,000 --> 22:22:42,760
with PyTorch's data loader. So let's go from torch dot utils, that is, torch dot utils dot data, import
14019
22:22:42,760 --> 22:22:47,080
data loader. We'll get that in here. We don't need to do that again, but I'm just doing it for
14020
22:22:47,080 --> 22:22:52,920
completeness. So we're going to set this to train data loader custom. And I'm going to create an
14021
22:22:52,920 --> 22:22:59,160
instance of data loader here. And then inside I'm going to pass the data set, which is going to be
14022
22:22:59,160 --> 22:23:05,640
train data custom. I'm just going to set a universal parameter here in capitals for batch size equals
14023
22:23:05,640 --> 22:23:11,880
32. Because we can come down here, we can set the batch size, we're going to set this equal to 32.
14024
22:23:12,840 --> 22:23:17,320
Or in other words, the batch size parameter we set up there, we can set the number of workers
14025
22:23:17,320 --> 22:23:26,040
here as well. If we set it to zero... actually, let's go see what the default is: torch utils data loader.
14026
22:23:26,040 --> 22:23:34,680
What's the default for number of workers? Zero. Okay, beautiful. And recall that number of workers
14027
22:23:34,680 --> 22:23:41,000
is going to set how many cores load your data with a data loader. And generally higher is better.
14028
22:23:41,000 --> 22:23:46,280
But you can also experiment with this value and see what value suits your model and your
14029
22:23:46,280 --> 22:23:52,600
hardware the best. So just keep in mind that number of workers is going to alter how much
14030
22:23:52,600 --> 22:23:59,400
compute your hardware that you're running your code on uses to load your data. So by default,
14031
22:23:59,400 --> 22:24:05,880
it's set to zero. And then we're going to shuffle the training data. Wonderful. And let's do the
14032
22:24:05,880 --> 22:24:10,920
same for the test data loader. We'll create test data loader custom. And I'm going to create a
14033
22:24:10,920 --> 22:24:18,120
new instance of data loader. So let me make a few code cells here. And we'll pass
14034
22:24:18,120 --> 22:24:24,520
in the data set parameter as the test data custom. So again, these data sets are what we've created
14035
22:24:24,520 --> 22:24:32,600
using our own custom data set class. I'm going to set the batch size equal to batch size. And
14036
22:24:32,600 --> 22:24:37,720
let's set the num workers equal to zero. In a previous video, we've also set it to CPU count.
14037
22:24:38,680 --> 22:24:46,280
You can also set it to one. You can hard code it to four. It all depends on what hardware you're using.
14038
22:24:46,280 --> 22:24:54,040
I like to use OS dot CPU count. And then we're not going to shuffle the test data.
14039
22:24:56,680 --> 22:25:05,160
False. Beautiful. And let's have a look at what we get here. Train data loader custom and test
14040
22:25:06,120 --> 22:25:10,840
data loader custom. And actually, I'm just going to reset this instead of being OS dot CPU count.
14041
22:25:10,840 --> 22:25:14,360
I'm going to put it back to zero, just so we've got it in line with the one above.
14042
22:25:14,360 --> 22:25:22,120
And of course, num workers, we could also set this num workers equals zero or OS dot CPU count.
14043
22:25:22,920 --> 22:25:29,880
And then we could come down here and set this as num workers, and num workers.
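As a sketch, the DataLoader cells being written here come out to roughly the following (train_data_custom and test_data_custom are the datasets built above; the single-batch check at the end matches what gets run a little further down):

import os
from torch.utils.data import DataLoader

BATCH_SIZE = 32
NUM_WORKERS = 0  # or os.cpu_count() to use every available CPU core

train_dataloader_custom = DataLoader(dataset=train_data_custom,
                                     batch_size=BATCH_SIZE,
                                     num_workers=NUM_WORKERS,
                                     shuffle=True)   # shuffle the training data

test_dataloader_custom = DataLoader(dataset=test_data_custom,
                                    batch_size=BATCH_SIZE,
                                    num_workers=NUM_WORKERS,
                                    shuffle=False)   # don't shuffle the test data

# Grab a single batch to check shapes: images should be [BATCH_SIZE, 3, 64, 64]
img_custom, label_custom = next(iter(train_dataloader_custom))
print(img_custom.shape, label_custom.shape)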
14044
22:25:30,920 --> 22:25:38,040
And let's have a look to see if it works. Beautiful. So we've got two instances of utils.data.data
14045
22:25:38,040 --> 22:25:43,960
loader. Now, let's just get a single sample from the train data loader here, just to make sure the
14046
22:25:43,960 --> 22:25:52,280
image shape and batch size is correct. Get image and label from custom data loader. We want image
14047
22:25:52,280 --> 22:25:59,720
custom. And I'm going to go label custom equals next. And I'm going to iter over the train data
14048
22:25:59,720 --> 22:26:10,280
loader custom. And then let's go print out the shapes. We want image custom dot shape and label
14049
22:26:10,280 --> 22:26:18,120
custom. Do we get a shape here? Beautiful. There we go. So we have shape here of 32,
14050
22:26:18,120 --> 22:26:24,840
because that is our batch size. Then we have three color channels, 64, 64, which is in line with
14051
22:26:24,840 --> 22:26:31,000
what? Which is in line with our transform that we set all the way up here. Transform. We transform
14052
22:26:31,000 --> 22:26:35,240
our image. You may want to change that to something different depending on the model you're using,
14053
22:26:35,240 --> 22:26:41,240
depending on how much data you want to be comprised within your image. Recall, generally a larger
14054
22:26:41,240 --> 22:26:47,640
image size encodes more information. And this is all coming from our original image folder
14055
22:26:47,640 --> 22:26:53,320
custom data set class. So look at us go. And I mean, this is a lot of code here or a fair bit of
14056
22:26:53,320 --> 22:26:59,240
code, right? But you could think of this as like you write it once. And then if your data set continues
14057
22:26:59,240 --> 22:27:05,960
to be in this format, well, you can use this over and over again. So you might put this, this image
14058
22:27:05,960 --> 22:27:11,960
folder custom into a helper function file over here, such as data set dot py or something like
14059
22:27:11,960 --> 22:27:18,040
that. And then you could call it in future code instead of rewriting it all the time. And so that's
14060
22:27:18,040 --> 22:27:23,320
just exactly what PyTorch has done with torch vision dot data sets dot image folder. So we've
14061
22:27:23,320 --> 22:27:27,400
got some shapes here. And if we wanted to change the batch size, what do we do? We just change it
14062
22:27:27,400 --> 22:27:32,840
like that 64. Remember, a good batch size is also a multiple of eight, because that's going to help
14063
22:27:32,840 --> 22:27:43,640
out computing. And if batch size equals one, we get a batch size of one. We've been through a
14064
22:27:43,640 --> 22:27:49,240
fair bit. But we've covered a very important thing. And that is loading your own data with a custom
14065
22:27:49,240 --> 22:27:54,680
data set. So generally, you will be able to load your own data with an existing data loading function
14066
22:27:54,680 --> 22:28:01,160
or data set function from one of the torch domain libraries, such as torch audio, torch text,
14067
22:28:01,160 --> 22:28:07,400
torch vision, TorchRec. And later on, when it's out of beta, torch data. But if you need to create
14068
22:28:07,400 --> 22:28:13,160
your own custom one, while you can subclass torch dot utils dot data, dot data set, and then add
14069
22:28:13,160 --> 22:28:19,000
your own functionality to it. So let's keep pushing forward. Previously, we touched a little bit on
14070
22:28:19,000 --> 22:28:25,880
transforming data. And you may have heard me say that torch vision transforms can be used for data
14071
22:28:25,880 --> 22:28:33,640
augmentation. And if you haven't, that is what the documentation says here. But data augmentation
14072
22:28:33,640 --> 22:28:39,560
is manipulating our images in some way, shape or form, so that we can artificially increase the
14073
22:28:39,560 --> 22:28:46,680
diversity of our training data set. So let's have a look at that more in the next video. I'll see you
14074
22:28:46,680 --> 22:28:56,600
there. Over the last few videos, we've created functions and classes to load in our own custom
14075
22:28:56,600 --> 22:29:03,320
data set. And we learned that one of the biggest steps in loading a custom data set is transforming
14076
22:29:03,320 --> 22:29:10,200
your data, particularly turning your target data into tensors. And we also had a brief look at the
14077
22:29:10,200 --> 22:29:14,840
torch vision transforms module. And we saw that there's a fair few different ways that we can
14078
22:29:14,840 --> 22:29:21,880
transform our data. And that one of the ways that we can transform our image data is through
14079
22:29:21,880 --> 22:29:27,480
augmentation. And so if we went into the illustration of transforms, let's have a look at all the
14080
22:29:27,480 --> 22:29:33,160
different ways we can do it. We've got resize going to change the size of the original image.
14081
22:29:33,160 --> 22:29:39,320
We've got center crop, which will crop. We've got five crop. We've got grayscale. We've got random
14082
22:29:39,320 --> 22:29:46,520
transforms. We've got Gaussian blur. We've got random rotation, random affine, random crop.
14083
22:29:46,520 --> 22:29:50,840
We could keep going. And in fact, I'd encourage you to check out all of the different options here.
14084
22:29:51,640 --> 22:29:58,680
But oh, there's auto augment. Wonderful. There's random augment. This is what I was hinting at.
14085
22:29:58,680 --> 22:30:04,040
Data augmentation. Do you notice how the original image gets augmented in different ways here?
14086
22:30:04,040 --> 22:30:09,560
So it gets artificially changed. So it gets rotated a little here. It gets darkened a little
14087
22:30:09,560 --> 22:30:15,320
here or maybe brightened, depending how you look at it, it gets shifted up here. And then the colors
14088
22:30:15,320 --> 22:30:21,720
kind of change here. And so this process is known as data augmentation, as we've hinted at.
14089
22:30:21,720 --> 22:30:30,440
And we're going to create another section here, which is number six, other forms of transforms.
14090
22:30:31,080 --> 22:30:37,240
And this is data augmentation. So how could you find out about what data augmentation is?
14091
22:30:37,240 --> 22:30:42,120
Well, you could go here. What is data augmentation? And I'm sure there's going to be plenty of
14092
22:30:42,120 --> 22:30:48,680
resources here. Wikipedia. There we go. Data augmentation in data analysis are techniques
14093
22:30:48,680 --> 22:30:55,320
used to increase the amount of data by adding slightly modified copies of already existing data
14094
22:30:55,320 --> 22:31:01,240
or newly created synthetic data from existing data. So I'm going to write down here,
14095
22:31:02,040 --> 22:31:11,880
data augmentation is the process of artificially adding diversity to your training data.
14096
22:31:11,880 --> 22:31:23,960
Now, in the case of image data, this may mean applying various image transformations to the
14097
22:31:23,960 --> 22:31:30,120
training images. And we saw a whole bunch of those in the torch vision transforms package.
14098
22:31:30,120 --> 22:31:35,400
But now let's have a look at one type of data augmentation in particular. And that is trivial
14099
22:31:35,400 --> 22:31:41,160
augment. But just to illustrate this, I've got a slide here ready to go. We've got what is data
14100
22:31:41,160 --> 22:31:47,800
augmentation. And it's looking at the same image, but from different perspectives. And we do this,
14101
22:31:47,800 --> 22:31:54,600
as I said, to artificially increase the diversity of a data set. So if we imagine our original
14102
22:31:54,600 --> 22:31:59,480
images over here on the left, and then if we wanted to rotate it, we could apply a rotation
14103
22:31:59,480 --> 22:32:04,440
transform. And then if we wanted to shift it on the vertical and the horizontal axis,
14104
22:32:04,440 --> 22:32:10,280
we could apply a shift transform. And if we wanted to zoom in on the image, we could apply
14105
22:32:10,280 --> 22:32:16,200
a zoom transform. And there are many different types of transforms. As I've got a note here,
14106
22:32:16,200 --> 22:32:20,200
there are many different kinds of data augmentation, such as cropping, replacing,
14107
22:32:20,200 --> 22:32:26,120
shearing. And this slide only demonstrates a few. But I'd like to highlight another type of data
14108
22:32:26,120 --> 22:32:34,360
augmentation. And that is one used recently to train PyTorch torch vision image models to state
14109
22:32:34,360 --> 22:32:42,680
of the art levels. So let's take a look at one particular type of data augmentation,
14110
22:32:43,880 --> 22:32:51,800
used to train pytorch vision models to state of the art levels.
14111
22:32:54,440 --> 22:32:59,240
Now, just in case you're not sure why we might do this, we would like to increase
14112
22:32:59,240 --> 22:33:06,520
the diversity of our training data so that our images become harder for our model to learn. Or
14113
22:33:06,520 --> 22:33:11,880
it gets a chance to view the same image from different perspectives so that when you use your
14114
22:33:11,880 --> 22:33:18,520
image classification model in practice, it's seen the same sort of images, but from many different
14115
22:33:18,520 --> 22:33:23,880
angles. So hopefully it learns patterns that are generalizable to those different angles.
14116
22:33:23,880 --> 22:33:35,720
So this practice, hopefully, results in a model that's more generalizable to unseen data.
14117
22:33:36,920 --> 22:33:48,280
And so if we go to torch vision, state of the art, here we go. So this is a recent blog post
14118
22:33:48,280 --> 22:33:51,960
by the pytorch team, how to train state of the art models, which is what we want to do,
14119
22:33:51,960 --> 22:33:57,240
state of the art means best in business, otherwise known as SOTA. You might see this acronym quite
14120
22:33:57,240 --> 22:34:03,160
often using torch visions latest primitives. So torch vision is the package that we've been
14121
22:34:03,160 --> 22:34:08,680
using to work with vision data. And torch vision has a bunch of primitives, which are,
14122
22:34:08,680 --> 22:34:16,280
in other words, functions that help us train really good performing models. So blog post here.
14123
22:34:16,280 --> 22:34:23,560
And if we jump into this blog post and if we scroll down, we've got some improvements here.
14124
22:34:23,560 --> 22:34:28,360
So there's an original ResNet 50 model. ResNet 50 is a common computer vision architecture.
14125
22:34:29,000 --> 22:34:35,800
So accuracy at one. So what do we have? Well, let's just say they get a boost in what the previous
14126
22:34:35,800 --> 22:34:43,880
results were. So if we scroll down, there is a type of data augmentation here. So if we add up
14127
22:34:43,880 --> 22:34:48,840
all of the improvements that they used, so there's a whole bunch here. Now, as your extra curriculum,
14128
22:34:48,840 --> 22:34:53,560
I'd encourage you to look at what the improvements are. You're not going to get them all the first
14129
22:34:53,560 --> 22:34:58,520
go, but that's all right. Blog posts like this come out all the time and the recipes are continually
14130
22:34:58,520 --> 22:35:04,840
changing. So even though I'm showing you this now, this may change in the future. So I just
14131
22:35:04,840 --> 22:35:09,960
scroll down to see if this table showed us what the previous results were. Doesn't look like it does.
14132
22:35:09,960 --> 22:35:15,720
Oh, no, there's the baseline. So 76 and with all these little additions, it got right up to nearly
14133
22:35:15,720 --> 22:35:21,320
81. So nearly a boost of 5% accuracy. And that's pretty good. So what we're going to have a look
14134
22:35:21,320 --> 22:35:26,280
at is trivial augment. So there's a bunch of different things such as learning rate optimization,
14135
22:35:26,280 --> 22:35:32,120
training for longer. So these are ways you can improve your model. Random erasing of image data,
14136
22:35:32,120 --> 22:35:37,720
label smoothing, you can add that as a parameter to your loss functions, such as cross entropy loss,
14137
22:35:37,720 --> 22:35:44,600
mix up and cut mix, weight decay tuning, fixed res mitigations, exponential moving average,
14138
22:35:44,600 --> 22:35:49,240
which is EMA, inference resize tuning. So there's a whole bunch of different recipe items here,
14139
22:35:49,240 --> 22:35:52,760
but we're going to focus on one rather than break them all down. Let's have a look at trivial
14140
22:35:52,760 --> 22:36:01,560
augment. So we'll come in here. Let's look at trivial augment. So if we wanted to look at
14141
22:36:01,560 --> 22:36:06,600
trivial augment, can we find it in here? Oh, yes, we can. It's right here. Trivial augment.
14142
22:36:06,600 --> 22:36:12,840
So as you'll see, if you pass an image into trivial augment, it's going to change it in a few
14143
22:36:12,840 --> 22:36:21,080
different ways. So if we go into here, let's write that down. So let's see this in action on some
14144
22:36:21,080 --> 22:36:30,040
of our own data. So we'll import from torch vision, import transforms. And we're going to create a
14145
22:36:30,040 --> 22:36:41,240
train transform, which is equal to transforms dot compose. We'll pass it in there. And this is
14146
22:36:41,240 --> 22:36:45,960
going to be very similar to what we've done before in terms of composing a transform. What do we
14147
22:36:45,960 --> 22:36:51,880
want to do? Well, let's say we wanted to resize one of our images or an image going through this
14148
22:36:51,880 --> 22:36:59,240
transform. Let's change its size to 224, 224, which is a common size in image classification. And
14149
22:36:59,240 --> 22:37:07,720
then it's going to go through transforms. We're going to pass in trivial augment wide. And there's
14150
22:37:07,720 --> 22:37:14,920
a parameter here, which is number of magnitude bins, which is basically a number from 0 to 31,
14151
22:37:14,920 --> 22:37:22,040
31 being the max of how intense you want the augmentation to happen. So say we only put this as
14152
22:37:22,040 --> 22:37:29,720
5, our augmentation would be of intensity from 0 to 5. And so in that case, the maximum wouldn't
14153
22:37:29,720 --> 22:37:35,480
be too intense. So if we put it to 31, it's going to be the max intensity. And what I mean by intensity
14154
22:37:35,480 --> 22:37:43,400
is say this rotation, if we go on a scale of 0 to 31, this may be a 10, whereas 31 would be
14155
22:37:43,400 --> 22:37:50,680
completely rotating. And same with all these others, right? So the lower this number, the less the
14156
22:37:50,680 --> 22:37:58,440
maximum upper bound of the applied transform will be. Then if we go transforms dot to tensor,
14157
22:37:59,800 --> 22:38:06,840
wonderful. So there we've just implemented trivial augment. How beautiful is that? That is from
14158
22:38:07,400 --> 22:38:13,880
the PyTorch torch vision transforms library. We've got trivial augment wide. And it was used
14159
22:38:13,880 --> 22:38:20,920
trivial augment to train the latest state of the art vision models in the PyTorch torch vision
14160
22:38:21,560 --> 22:38:25,960
models library or models repository. And if you wanted to look up trivial augment, how could you
14161
22:38:25,960 --> 22:38:31,320
find that? You could search it. Here is the paper if you'd like to read it. Oh, it's implemented.
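To recap the transform just composed above, here it is as a sketch (TrivialAugmentWide is available in recent torchvision releases; num_magnitude_bins=31 means augmentations can be applied at up to their maximum intensity):

from torchvision import transforms

# Training transform with TrivialAugmentWide data augmentation
train_transform = transforms.Compose([
    transforms.Resize(size=(224, 224)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31),
    transforms.ToTensor()
])

# Test transform: no augmentation
test_transform = transforms.Compose([
    transforms.Resize(size=(224, 224)),
    transforms.ToTensor()
])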
14162
22:38:31,320 --> 22:38:36,120
It's actually a very, very, I would say, let's just say trivial augment. I didn't want to say
14163
22:38:36,120 --> 22:38:40,120
simple because I don't want to downplay it. Trivial augment leverages the power of randomness
14164
22:38:40,120 --> 22:38:45,480
quite beautifully. So I'll let you read more on there. I would rather try it out on our data
14165
22:38:45,480 --> 22:38:52,440
and visualize it first. Test transform. Let's go transforms compose. And you might have the
14166
22:38:52,440 --> 22:38:58,360
question of which transforms should I use with my data? Well, that's the million dollar question,
14167
22:38:58,360 --> 22:39:02,760
right? That's the same thing as asking, which model should I use for my data? There's a fair
14168
22:39:02,760 --> 22:39:09,160
few different answers there. And my best answer will be: try out a few, see what works for other
14169
22:39:09,160 --> 22:39:14,680
people like we've done here by finding that trivial augment worked well for the PyTorch team.
14170
22:39:14,680 --> 22:39:19,400
Try that on your own problems. If it works well, excellent. If it doesn't work well,
14171
22:39:19,400 --> 22:39:24,760
well, you can always excuse me. We've got a spelling mistake. If it doesn't work well,
14172
22:39:24,760 --> 22:39:29,720
well, you can always set up an experiment to try something else. So let's test out our
14173
22:39:29,720 --> 22:39:34,680
augmentation pipeline. So we'll get all the image paths. We've already done this, but we're
14174
22:39:34,680 --> 22:39:39,720
going to do it anyway. Again, just to reiterate, we've covered a fair bit here. So I might just
14175
22:39:39,720 --> 22:39:46,200
rehash on a few things. We're going to get image path list, which is our... let me just show you
14176
22:39:47,080 --> 22:39:51,720
our image path. We just want to get all of the images within this folder.
14177
22:39:52,360 --> 22:39:59,960
So we'll go image path dot glob, glob together all the files and folders that match this pattern.
14178
22:39:59,960 --> 22:40:07,720
And then if we check, what do we get? We'll check the first 10. Beautiful. And then we can
14179
22:40:07,720 --> 22:40:13,160
leverage our function from before to plot some random images, plot random images.
14180
22:40:14,520 --> 22:40:20,280
We'll pass in... oh, plot transformed images, random transformed images. That's what we want.
14181
22:40:20,280 --> 22:40:26,120
Let's see what it looks like when it goes through our trivial augment. So image paths,
14182
22:40:26,120 --> 22:40:34,040
equals image path list. This is a function that we've created before, by the way. Transform equals
14183
22:40:34,040 --> 22:40:38,840
train transform, which is the transform we just created above that contains trivial augment.
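In code, that works out to something like the sketch below; plot_transformed_images and image_path come from earlier in the notebook, and the glob pattern assumes the data/pizza_steak_sushi train-or-test/class/image.jpg layout:

# Get all of the image paths (glob returns a generator, so wrap it in a list)
image_path_list = list(image_path.glob("*/*/*.jpg"))  # pattern assumed from the folder layout

# Plot a few random images after passing them through the augmented transform
plot_transformed_images(image_paths=image_path_list,
                        transform=train_transform,  # contains TrivialAugmentWide
                        n=3,
                        seed=None)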
14184
22:40:40,600 --> 22:40:45,080
And then we're going to put n equals three for five images. And we'll do seed equals none
14185
22:40:45,080 --> 22:40:52,200
to plot. Oh, sorry, n equals three for three images, not five. Beautiful. And we'll set the
14186
22:40:52,200 --> 22:40:58,200
seed equals none, by the way. So look at this. We've got class pizza. Now trivial augment,
14187
22:40:58,200 --> 22:41:03,480
it resized this. Now, I'm not quite sure what it did to transform it per se. Maybe it got a little
14188
22:41:03,480 --> 22:41:09,000
bit darker. This one looks like it's been the colors have been manipulated in some way, shape,
14189
22:41:09,000 --> 22:41:16,200
or form. And this one looks like it's been resized and not too much has happened to that one from
14190
22:41:16,200 --> 22:41:22,280
my perspective. So if we go again, let's have a look at another three images. So trivial augment
14191
22:41:22,280 --> 22:41:28,760
works. And what I said before, it harnesses the power of randomness. It kind of selects randomly
14192
22:41:28,760 --> 22:41:33,640
from all of these other augmentation types, and applies them at some level of intensity.
14193
22:41:34,280 --> 22:41:38,520
So all of these ones here, trivial augment is just going to select some at random, and then
14194
22:41:38,520 --> 22:41:44,280
apply them some random intensity from zero to 31, because that's what we've set on our data.
14195
22:41:44,280 --> 22:41:48,440
And of course, you can read a little bit more in the documentation, or sorry, in the paper here.
14196
22:41:49,080 --> 22:41:53,560
But I like to see it happening. So this one looks like it's been cut off over here a little bit.
14197
22:41:54,200 --> 22:41:58,760
This one again, the colors have been changed in some way, shape, or form. This one's been darkened.
14198
22:41:59,560 --> 22:42:04,440
And so do you see how we're artificially adding diversity to our training data set? So instead
14199
22:42:04,440 --> 22:42:09,800
of all of our images being this one perspective like this, we're adding a bunch of different
14200
22:42:09,800 --> 22:42:14,520
angles and telling our model, hey, you got to try and still learn these patterns, even if they've
14201
22:42:14,520 --> 22:42:20,200
been manipulated. So we'll try one more of these. So look at that one. That's pretty
14202
22:42:20,200 --> 22:42:25,000
manipulated there, isn't it? But it's still an image of steak. So that's what we're trying to
14203
22:42:25,000 --> 22:42:28,920
get our model to do: still recognize this image as an image of steak, even though it's been
14204
22:42:28,920 --> 22:42:34,360
manipulated a bit. Now, will this work or not? Hey, it might, it might not, but that's what the
14205
22:42:34,360 --> 22:42:40,760
nature of experimentation is. So play around. I would encourage you to go in the transforms
14206
22:42:40,760 --> 22:42:46,360
documentation like we've just done, illustrations, change this one out, trivial augment wide,
14207
22:42:46,360 --> 22:42:51,240
for another type of augmentation that you can find in here, and see what it does to some of
14208
22:42:51,240 --> 22:42:57,000
our images randomly. I've just highlighted trivial augment because it's what the PyTorch team have
14209
22:42:57,000 --> 22:43:02,440
used in their most recent blog post for their training recipe to train state-of-the-art vision
14210
22:43:02,440 --> 22:43:09,080
models. So speaking of training models, let's move forward and we've got to build our first model
14211
22:43:09,800 --> 22:43:12,200
for this section. I'll see you in the next video.
14212
22:43:15,960 --> 22:43:21,480
Welcome back. In the last video, we covered how the PyTorch team used trivial augment
14213
22:43:21,480 --> 22:43:26,280
wide, which is the latest state-of-the-art in data augmentation at the time of recording this
14214
22:43:26,280 --> 22:43:31,720
video to train their latest state-of-the-art computer vision models that are within
14215
22:43:31,720 --> 22:43:37,960
torch vision. And we saw how easily we could apply trivial augment thanks to torch vision
14216
22:43:37,960 --> 22:43:43,240
dot transforms. And we'll just see one more of those in action, just to highlight what's going on.
14217
22:43:45,720 --> 22:43:49,800
So it doesn't look like much happened to that image when we augmented, but we see this one has
14218
22:43:49,800 --> 22:43:53,720
been moved over. We've got some black space there. This one has been rotated a little,
14219
22:43:53,720 --> 22:43:59,240
and now we've got some black space there. But now's time for us to build our first
14220
22:43:59,240 --> 22:44:04,920
computer vision model on our own custom data set. So let's get started. We're going to go model zero.
14221
22:44:05,880 --> 22:44:11,080
We're going to reuse the tiny VGG architecture, which we covered in the computer vision section.
14222
22:44:11,080 --> 22:44:15,080
And the first experiment that we're going to do, we're going to build a baseline,
14223
22:44:15,080 --> 22:44:19,880
which is what we do with model zero. We're going to build it without data augmentation.
14224
22:44:19,880 --> 22:44:26,280
So rather than use trivial augment, which we've got up here, which is what the PyTorch team used
14225
22:44:26,280 --> 22:44:30,600
to train their state-of-the-art computer vision models, we're going to start by training our
14226
22:44:30,600 --> 22:44:36,120
computer vision model without data augmentation. And then so later on, we can try one to see
14227
22:44:36,680 --> 22:44:41,800
with data augmentation to see if it helps or doesn't. So let me just put a link in here,
14228
22:44:42,440 --> 22:44:48,840
CNN explainer. This is the model architecture that we covered in depth in the last section.
14229
22:44:48,840 --> 22:44:52,520
So we're not going to go spend too much time here. All you have to know is that we're going
14230
22:44:52,520 --> 22:44:58,760
to have an input of 64, 64, 3 into multiple different layers, such as convolutional layers,
14231
22:44:58,760 --> 22:45:03,480
ReLU layers, max pool layers. And then we're going to have some output layer that suits the
14232
22:45:03,480 --> 22:45:09,240
number of classes that we have. In this case, there's 10 different classes, but in our case,
14233
22:45:09,240 --> 22:45:17,800
we have three different classes, one for pizza, steak, and sushi. So let's replicate the tiny VGG
14234
22:45:17,800 --> 22:45:26,280
architecture from the CNN explainer website. And this is going to be good practice, right?
14235
22:45:26,280 --> 22:45:29,720
We're not going to spend too much time referencing their architecture. We're going to spend more
14236
22:45:29,720 --> 22:45:35,080
time coding here. But of course, before we can train a model, what do we have to do? Well,
14237
22:45:35,080 --> 22:45:43,320
let's go 7.1. We're going to create some transforms and loading data. We're going to load data for
14238
22:45:43,320 --> 22:45:51,000
model zero. Now, we could of course use some of the variables that we already have loaded. But
14239
22:45:51,000 --> 22:45:57,480
we're going to recreate them just to practice. So let's create a simple transform. And what is
14240
22:45:57,480 --> 22:46:04,840
our whole premise of loading data for model zero? We want to get our data from the data folder,
14241
22:46:05,640 --> 22:46:10,600
from pizza, steak sushi, from the training and test folders, from their respective folders,
14242
22:46:10,600 --> 22:46:15,480
we want to load these images and turn them into tenses. Now we've done this a few times now.
14243
22:46:16,120 --> 22:46:23,560
And one of the ways that we can do that is by creating a transform equals transforms dot compose.
14244
22:46:24,520 --> 22:46:32,680
And we're going to pass in, let's resize it. So transforms dot resize, we're going to resize our
14245
22:46:32,680 --> 22:46:40,520
images to be the same size as the tiny VGG architecture on the CNN explainer website. 64
14246
22:46:40,520 --> 22:46:48,360
64 three. And then we're also going to pass in another transform to tensor. So that our
14247
22:46:48,920 --> 22:46:55,480
images get resized to 64 64. And then they get converted into tenses. And particularly,
14248
22:46:55,480 --> 22:47:02,360
these values within that tensor are going to be between zero and one. So there's our transform.
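A sketch of that simple transform:

from torchvision import transforms

# Simple transform: resize to 64x64 and convert to a tensor (values scaled to [0, 1]), no augmentation
simple_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.ToTensor()
])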
14249
22:47:02,360 --> 22:47:07,000
Now we're going to load some data. If you want to pause the video here and try to load it yourself,
14250
22:47:07,000 --> 22:47:12,840
I'd encourage you to try out option one, loading image data using the image folder class,
14251
22:47:12,840 --> 22:47:20,120
and then turn that data set, that image folder data set into a data loader. So batchify it so
14252
22:47:20,120 --> 22:47:26,200
that we can use it with a pytorch model. So give that a shot. Otherwise, let's go ahead and do
14253
22:47:26,200 --> 22:47:33,800
that together. So one, we're going to load and transform data. We've done this before,
14254
22:47:33,800 --> 22:47:39,960
but let's just rehash on it what we're doing. So from torch vision import data sets, then we're
14255
22:47:39,960 --> 22:47:46,600
going to create the train data simple. And I call this simple because we're going to use at first
14256
22:47:46,600 --> 22:47:52,600
a simple transform, one with no data augmentation. And then later on for another modeling experiment,
14257
22:47:52,600 --> 22:47:58,360
we're going to create another transform one with data augmentation. So let's put this here
14258
22:47:58,360 --> 22:48:06,840
data sets image folder. And let's go the root equals the training directory. And then the
14259
22:48:06,840 --> 22:48:11,240
transform is going to be what? It's going to be our simple transform that we've got above.
14260
22:48:11,960 --> 22:48:17,480
And then we can put in test data simple here. And we're going to create data sets dot image
14261
22:48:17,480 --> 22:48:22,120
folder. And then we're going to pass in the root as the test directory. And we'll pass in the
14262
22:48:22,120 --> 22:48:26,760
transform is going to be the simple transform again above. So we're performing the same
14263
22:48:26,760 --> 22:48:33,080
transformation here on our training data, and on our testing data. Then what's the next step
14264
22:48:33,080 --> 22:48:42,600
we can do here? Well, we can turn the data sets into data loaders. So let's try it out.
14265
22:48:42,600 --> 22:48:49,960
First, we're going to import OS, then from torch dot utils dot data, we're going to import data
14266
22:48:49,960 --> 22:48:58,600
loader. And then we're going to set up batch size and number of workers. So let's go batch size.
14267
22:48:58,600 --> 22:49:01,560
We're going to use a batch size of 32 for our first model.
14268
22:49:03,560 --> 22:49:09,800
Num workers, which will be the number of... excuse me, got a typo up here, classic... number of workers,
14269
22:49:09,800 --> 22:49:16,440
which will be, what? The number of CPU cores that we dedicate towards loading our data.
14270
22:49:16,440 --> 22:49:24,200
So let's now create the data loaders. We're going to create train data loader simple,
14271
22:49:24,200 --> 22:49:32,840
which will be equal to data loader. And the data set that goes in here will be train data
14272
22:49:32,840 --> 22:49:37,880
simple. Then we can set the batch size equal to the batch size parameter that we just created,
14273
22:49:37,880 --> 22:49:43,160
or hyper parameter that is, recall a hyper parameter is something that you can set yourself. We
14274
22:49:43,160 --> 22:49:50,520
would like to shuffle the training data. And we're going to set num workers equal to num workers.
14275
22:49:51,240 --> 22:49:58,120
So in our case, how many cores does Google Colab have? Let's just run this. Find out how many
14276
22:49:58,120 --> 22:50:05,160
num workers there are. I think there's going to be two CPUs. Wonderful. And then we're going to do
14277
22:50:05,160 --> 22:50:14,520
the same thing for the test data loader. Test data loader simple. We're going to go data loader.
14278
22:50:14,520 --> 22:50:19,880
We'll pass in the data set here, which is going to be the test data simple. And then we're going
14279
22:50:19,880 --> 22:50:27,720
to go batch size equals batch size. We're not going to shuffle the test data set. And then the
14280
22:50:27,720 --> 22:50:35,800
num workers, we'll just set it to the same thing as we've got above. Beautiful. So I hope you gave
14281
22:50:35,800 --> 22:50:41,000
that a shot, but now do you see how quickly we can get our data loaded if it's in the right format?
14282
22:50:41,640 --> 22:50:46,280
I know we spent a lot of time going through all of these steps over multiple videos and
14283
22:50:46,280 --> 22:50:51,480
writing lots of code, but this is how quickly we can get set up to load our data. We create a
14284
22:50:51,480 --> 22:50:57,000
simple transform, and then we load in and transform our data at the same time. And then we turn the
14285
22:50:57,000 --> 22:51:02,600
data sets into data loaders just like this. Now we're ready to use these data loaders with a model.
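Putting section 7.1 together, the whole loading step is roughly this sketch (train_dir, test_dir and simple_transform are assumed from the cells above):

import os
from torchvision import datasets
from torch.utils.data import DataLoader

# 1. Load and transform the data with ImageFolder
train_data_simple = datasets.ImageFolder(root=train_dir, transform=simple_transform)
test_data_simple = datasets.ImageFolder(root=test_dir, transform=simple_transform)

# 2. Turn the datasets into batched DataLoaders
BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()

train_dataloader_simple = DataLoader(dataset=train_data_simple,
                                     batch_size=BATCH_SIZE,
                                     shuffle=True,
                                     num_workers=NUM_WORKERS)

test_dataloader_simple = DataLoader(dataset=test_data_simple,
                                    batch_size=BATCH_SIZE,
                                    shuffle=False,
                                    num_workers=NUM_WORKERS)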
14286
22:51:03,400 --> 22:51:10,040
So speaking of models, how about we build the tiny VGG architecture in the next video? And in
14287
22:51:10,040 --> 22:51:15,400
fact, we've already done this in notebook number three. So if you want to refer back to the model
14288
22:51:15,400 --> 22:51:21,240
that we built there, right down here, which was model number two, if you want to refer back to
14289
22:51:21,240 --> 22:51:27,480
this section and give it a go yourself, I'd encourage you to do so. Otherwise, we'll build tiny VGG
14290
22:51:27,480 --> 22:51:36,440
architecture in the next video. Welcome back. In the last video, we got set up starting to get
14291
22:51:36,440 --> 22:51:41,560
ready to model our first custom data set. And I issued you the challenge to try and replicate
14292
22:51:41,560 --> 22:51:47,400
the tiny VGG architecture from the CNN explainer website, which we covered in notebook number
14293
22:51:47,400 --> 22:51:53,480
three. But now let's see how fast we can do that together. Hey, I'm going to write down here section
14294
22:51:53,480 --> 22:51:59,160
seven point two. And I know we've already coded this up before, but it's good practice to see what
14295
22:51:59,160 --> 22:52:07,320
it's like to build PyTorch models from scratch: create TinyVGG model class. So the model is going
14296
22:52:07,320 --> 22:52:12,440
to come from here. Previously, we created our model, but there's going to be one big change from
14297
22:52:12,440 --> 22:52:18,600
the model that we created in section number three, which is that our model in section number three
14298
22:52:18,600 --> 22:52:24,760
used black and white images. But now the images that we have are going to be color images. So
14299
22:52:24,760 --> 22:52:30,120
there's going to be three color channels rather than one. And there might be a little bit of a
14300
22:52:30,120 --> 22:52:35,880
trick that we have to do to find out the shape later on in the classifier layer. But let's get
14301
22:52:35,880 --> 22:52:43,160
started. We've got class TinyVGG, we're going to inherit from nn.Module. This is going to be
14302
22:52:44,200 --> 22:52:55,880
the model architecture copying TinyVGG from the CNN explainer. And remember that it's
14303
22:52:55,880 --> 22:53:00,920
quite a common practice in machine learning to find a model that works for a problem similar to
14304
22:53:00,920 --> 22:53:06,440
yours and then copy it and try it on your own problem. So I only want two underscores there.
14305
22:53:06,440 --> 22:53:12,440
We're going to initialize our class. We're going to give it an input shape, which will be an int.
14306
22:53:13,160 --> 22:53:17,960
We're going to say how many hidden units do we want, which will also be an int. And we're going
14307
22:53:17,960 --> 22:53:25,480
to have an output shape, which will be an int as well. And it's going to have a return type
14308
22:53:25,480 --> 22:53:32,840
of None. And if we go down here, we can initialize it with super().__init__().
14309
22:53:34,520 --> 22:53:40,520
Beautiful. And now let's create the first conv block. So conv block one, which we'll recall
14310
22:53:40,520 --> 22:53:48,920
will be this section of layers here. So conv block one, let's do an nn.Sequential to do so.
14311
22:53:48,920 --> 22:53:56,440
Now we need conv, relu, conv, relu, max pool. So let's try this out. And then nn.Conv2d.
14312
22:53:57,080 --> 22:54:04,520
The in channels is going to be the input shape of our model. The input shape parameter.
14313
22:54:04,520 --> 22:54:09,320
The out channels is going to be the number of hidden units we have, which is from
14314
22:54:10,360 --> 22:54:15,080
Oh, I'm just going to put enter down here: input shape, hidden units. We're just putting those
14315
22:54:15,080 --> 22:54:20,840
in there. Let's set the kernel size to three, which will be how big the convolving window will be
14316
22:54:20,840 --> 22:54:27,400
over our image data. There's a stride of one and the padding equals one as well. So these are the
14317
22:54:27,400 --> 22:54:34,040
similar parameters to what the CNN explainer website uses. And we're going to go nn.
14318
22:54:34,840 --> 22:54:43,000
ReLU. And then we're going to go nn.Conv2d. And I want to stress that even if someone
14319
22:54:43,000 --> 22:54:48,600
else uses like certain values for these, you don't have to copy them exactly. So just keep that in
14320
22:54:48,600 --> 22:54:53,960
mind. You can try out various values of these. These are all hyper parameters that you can set
14321
22:54:53,960 --> 22:55:01,880
yourself. In channels equals hidden units, out channels equals hidden units as well. Then we're going to go kernel
14322
22:55:01,880 --> 22:55:09,240
size equals three stride equals one. And we're going to put padding equals one as well.
14323
22:55:09,240 --> 22:55:13,320
Then we're going to have another relu layer. And I believe I forgot my comma up here.
14324
22:55:15,960 --> 22:55:19,960
Another relu layer here. And we're going to finish off
14325
22:55:21,800 --> 22:55:27,880
with an nn.MaxPool2d. And we're going to put in the kernel size.
14326
22:55:27,880 --> 22:55:39,160
That equals two, and the stride here equals two. Wonderful. Oh, by the way, for max
14327
22:55:39,160 --> 22:55:47,240
pool 2D, the default stride value is the same as the kernel size. So let's have a go here.
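As a reference, here's a sketch of what that first block might look like inside the class's __init__ (assuming input_shape and hidden_units are the constructor arguments described above):

self.conv_block_1 = nn.Sequential(
    nn.Conv2d(in_channels=input_shape, out_channels=hidden_units,
              kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units,
              kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2)  # stride defaults to kernel_size if not set
)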
14328
22:55:47,720 --> 22:55:55,240
What can we do now? Well, we could just replicate this block as block two. So how about we copy this
14329
22:55:55,240 --> 22:56:00,920
down here? We've already had enough practice writing this sort of code. So we're going to
14330
22:56:00,920 --> 22:56:05,240
go conv block two, but we need to change the input shape here. The input shape of this block
14331
22:56:05,240 --> 22:56:10,200
two is going to receive the output shape here. So we need to line those up. This is going to be
14332
22:56:10,200 --> 22:56:21,480
hidden units. Hidden units. And I believe that's all we need to change there. Beautiful. So let's
14333
22:56:21,480 --> 22:56:27,240
create the classifier layer. And the classifier layer recall is going to be this output layer
14334
22:56:27,240 --> 22:56:33,240
here. So we need at some point to add a linear layer. That's going to have a number of outputs
14335
22:56:33,240 --> 22:56:37,960
equal to the number of classes that we're working with. And in this case, the number of classes is
14336
22:56:37,960 --> 22:56:45,320
10. But in our case, our custom data set, we have three classes, pizza, steak, sushi. So let's
14337
22:56:45,320 --> 22:56:51,320
create a classifier layer, which will be an nn.Sequential. And then we're going to pass in an
14338
22:56:51,320 --> 22:56:57,880
nn.Flatten to turn the outputs of our convolutional blocks into a feature vector, into a feature
14339
22:56:57,880 --> 22:57:03,880
vector. And then we're going to have an nn.Linear. And the in features, do you remember my
14340
22:57:03,880 --> 22:57:09,400
trick for calculating the in features shape? I'm going to put hidden units here for the time being.
14341
22:57:09,400 --> 22:57:16,520
Out features is going to be output shape. So I put hidden units here for the time being because
14342
22:57:16,520 --> 22:57:22,520
we don't quite yet know what the output shape of all of these operations is going to be. Of course,
14343
22:57:22,520 --> 22:57:28,040
we could calculate them by hand by looking up the formula for input and output shapes of convolutional
14344
22:57:28,040 --> 22:57:34,520
layers. So the input and output shapes are here. But I prefer to just do it programmatically and let
14345
22:57:34,520 --> 22:57:40,920
the errors tell me where I'm wrong. So we can do that by doing a forward pass. And speaking of a
14346
22:57:40,920 --> 22:57:45,880
forward pass, let's create a forward method, because every time we subclass
14347
22:57:45,880 --> 22:57:51,480
nn.Module, we have to override the forward method. We've done this a few times. But as you can see,
14348
22:57:51,480 --> 22:57:57,960
I'm picking up the pace a little bit because you've got this. So let's pass in the conv block one,
14349
22:57:57,960 --> 22:58:03,480
we're going to go X, then we're going to print out x dot shape. And then we're going to reassign
14350
22:58:03,480 --> 22:58:10,120
X to be self.conv_block_2. So we're passing it through our second block of convolutional layers,
14351
22:58:10,120 --> 22:58:15,480
print X dot shape to check the shape here. Now this is where our model will probably error
14352
22:58:15,480 --> 22:58:20,760
is because the input shape here isn't going to line up with in features equals hidden units, because we've
14353
22:58:20,760 --> 22:58:26,600
passed all of the output of what's going through conv block one, conv block two to a flatten layer,
14354
22:58:26,600 --> 22:58:32,040
because we want a feature vector to go into our nn.linear layer, our output layer, which has an
14355
22:58:32,040 --> 22:58:38,520
out features size of output shape. And then we're going to return X. So I'm going to print x dot
14356
22:58:38,520 --> 22:58:43,240
shape here. And I just want to let you in on one little secret as well. We haven't covered this
14357
22:58:43,240 --> 22:58:48,600
before, but we could rewrite this entire forward method, this entire stack of code,
14358
22:58:48,600 --> 22:58:55,320
by going return self.classifier, and then going from the outside in. So we could pass in
14359
22:58:55,320 --> 22:59:03,240
self.conv_block_2 here, and then self.conv_block_1, and then X on the inside.
14360
22:59:03,960 --> 22:59:09,720
So that is essentially the exact same thing as what we've done here, except this is going to
14361
22:59:10,520 --> 22:59:17,560
benefit from operator fusion. Now this topic is beyond the scope of this course,
14362
22:59:17,560 --> 22:59:22,840
essentially, all you need to know is that operator fusion behind the scenes speeds up
14363
22:59:22,840 --> 22:59:27,960
how your GPU performs computations. So all of these are going to happen in one step,
14364
22:59:27,960 --> 22:59:33,640
rather than here, we are reassigning X every time we make a computation through these layers.
14365
22:59:33,640 --> 22:59:40,280
So we're spending time going from computation back to memory, computation back to memory,
14366
22:59:40,280 --> 22:59:44,440
whereas this kind of just chunks it all together in one hit. If you'd like to read
14367
22:59:44,440 --> 22:59:49,880
more about this, I'd encourage you to look up the blog post, how to make your GPUs go
14368
22:59:49,880 --> 22:59:58,600
brrr from first principles, and brrr means fast. That's why I love this post, right?
14369
22:59:58,600 --> 23:00:04,440
Because it's half satire, half legitimate, like, GPU computer science. So if you go in here,
14370
23:00:04,440 --> 23:00:08,520
yeah, here's what we want to avoid. We want to avoid all of this transportation between
14371
23:00:08,520 --> 23:00:14,840
memory and compute. And then if we look in here, we might have operator fusion. There we go.
14372
23:00:14,840 --> 23:00:20,680
This is operator fusion, the most important optimization in deep learning compilers. So
14373
23:00:20,680 --> 23:00:25,640
I will link this, Making Deep Learning Go Brrr From First Principles by Horace He,
14374
23:00:25,640 --> 23:00:31,400
a great blog post that I really like, right here. So if you'd like to read more on that,
14375
23:00:31,400 --> 23:00:35,800
it's also going to be in the extracurricular section of the course. So don't worry, it'll be there.
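Putting the pieces together, here's a sketch of the full TinyVGG class as described so far, plus an example of instantiating it. Here class_names and device are assumed to come from earlier in the notebook, and the in_features value of hidden_units * 16 * 16 assumes 64x64 inputs with padding=1; we'll verify it with a forward pass shortly.

import torch
from torch import nn

class TinyVGG(nn.Module):
    """Model architecture copying TinyVGG from the CNN explainer website."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(input_shape, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # hidden_units * 16 * 16 assumes 64x64 inputs with padding=1 in the conv layers above
            nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x = self.conv_block_1(x)
        # x = self.conv_block_2(x)
        # return self.classifier(x)
        # The one-liner below does the same thing and can benefit from operator fusion
        return self.classifier(self.conv_block_2(self.conv_block_1(x)))

# Example usage: instantiate the model and send it to the target device
model_0 = TinyVGG(input_shape=3,  # number of color channels in our images
                  hidden_units=10,
                  output_shape=len(class_names)).to(device)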
14376
23:00:35,800 --> 23:00:43,080
Now, we've got a model. Oh, where do we, where do we forget a comma? Right here, of course we did.
14377
23:00:47,000 --> 23:00:51,240
And we've got another, we forgot another comma up here. Did you notice these?
14378
23:00:53,080 --> 23:00:59,480
Beautiful. Okay. So now we can create our model by creating an instance of the TinyVGG class
14379
23:00:59,480 --> 23:01:07,000
to see if our model holds up. Let's create model zero equals tiny VGG. And I'm going to pass in
14380
23:01:07,000 --> 23:01:11,000
the input shape. What is the input shape? It's going to be the number of color channels of our
14381
23:01:11,000 --> 23:01:17,640
image. So number of color channels in our image data, which is three, because we have color images.
14382
23:01:19,400 --> 23:01:24,280
And then we're going to put in hidden units, equals 10, which will be the same number of
14383
23:01:24,280 --> 23:01:31,720
hidden units as the tiny VGG architecture. One, two, three, four, five, six, seven, eight, nine,
14384
23:01:31,720 --> 23:01:39,080
10. Again, we could put in 10, we could put in 100, we could put in 64, which is a good multiple
14385
23:01:39,080 --> 23:01:44,120
of eight. So let's just leave it at 10 for now. And then the output shape is going to be what?
14386
23:01:44,680 --> 23:01:49,960
It's going to be the length of our class names, because we want one hidden unit or one output unit
14387
23:01:49,960 --> 23:01:55,800
per class. And then we're going to send it to the target device, which is of course CUDA. And then
14388
23:01:55,800 --> 23:02:05,080
we can check out our model zero here. Beautiful. So that took a few seconds, as you saw there,
14389
23:02:05,080 --> 23:02:09,160
to move to the GPU memory. So that's just something to keep in mind for when you build
14390
23:02:09,160 --> 23:02:14,200
large neural networks and you want to speed up their computation, is to use operator fusion
14391
23:02:14,200 --> 23:02:19,480
where you can, because as you saw, it took a few seconds for our model to just move from the CPU,
14392
23:02:19,480 --> 23:02:26,680
which is the default to the GPU. So we've got our architecture here. But of course, we know that
14393
23:02:26,680 --> 23:02:32,680
this potentially is wrong. And how would we find that out? Well, we could find the right hidden
14394
23:02:32,680 --> 23:02:38,440
unit shape or we could find that it's wrong by passing some dummy data through our model. So
14395
23:02:38,440 --> 23:02:43,560
that's one of my favorite ways to troubleshoot a model. Let's in the next video pass some dummy
14396
23:02:43,560 --> 23:02:49,240
data through our model and see if we've implemented the forward pass correctly. And also check the
14397
23:02:49,240 --> 23:03:00,040
input and output shapes of each of our layers. I'll see you there. In the last video, we replicated
14398
23:03:00,040 --> 23:03:05,960
the tiny VGG architecture from the CNN explainer website, very similar to the model that we built
14399
23:03:05,960 --> 23:03:13,320
in section 03. But this time, we're using color images instead of grayscale images. And we did
14400
23:03:13,320 --> 23:03:18,440
it quite a bit faster than what we previously did, because we've already covered it, right?
14401
23:03:18,440 --> 23:03:21,960
And you've had some experience now building PyTorch models from scratch.
14402
23:03:21,960 --> 23:03:28,680
So we're going to pick up the pace when we build our models. But let's now go and try a dummy
14403
23:03:28,680 --> 23:03:34,680
forward pass to check that our forward method is working correctly and that our input and output
14404
23:03:34,680 --> 23:03:42,440
shapes are correct. So let's create a new heading. Try a forward pass on a single image. And this
14405
23:03:42,440 --> 23:03:51,720
is one of my favorite ways to test the model. So let's first get a single image. Get a single
14406
23:03:51,720 --> 23:03:58,440
image. We want an image batch. Maybe we get an image batch, get a single image batch, because
14407
23:03:58,440 --> 23:04:07,720
we've got images that are in batches already. Image batch. And then we'll get a label batch. And we'll
14408
23:04:07,720 --> 23:04:15,320
go next(iter(train_dataloader_simple)). That's the data loader that we're working with for now.
14409
23:04:16,600 --> 23:04:22,040
And then we'll check image batch dot shape and label batch dot shape.
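Here's a sketch of that check plus the dummy forward pass that follows (variable names assumed from above; note the batch has to be sent to the same device as the model):

# Get a single batch of images and labels from the (batched) DataLoader
img_batch, label_batch = next(iter(train_dataloader_simple))
print(img_batch.shape, label_batch.shape)  # e.g. torch.Size([32, 3, 64, 64]) and torch.Size([32])

# Try a forward pass (send the batch to the same device as the model)
model_0(img_batch.to(device))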
14410
23:04:25,480 --> 23:04:29,720
Wonderful. And now let's see what happens. Try a forward pass.
14411
23:04:29,720 --> 23:04:38,760
Oh, I spelled single wrong up here. Try a forward pass. We could try this on a single image trying
14412
23:04:38,760 --> 23:04:45,640
it on a whole batch will result in similar results. So let's go model zero. And we're just going to
14413
23:04:45,640 --> 23:04:54,360
pass it in the image batch and see what happens. Oh, no. Of course, we get that input type,
14414
23:04:54,360 --> 23:05:00,120
torch.FloatTensor, and weight type, torch.cuda.FloatTensor, should be the same, or input should be.
14415
23:05:00,760 --> 23:05:06,520
So we've got tensors on a different device, right? So this is on the CPU, the image batch,
14416
23:05:06,520 --> 23:05:12,360
whereas our model is, of course, on the target device. So we've seen this error a number of times.
14417
23:05:12,360 --> 23:05:20,360
Let's see if this fixes it. Oh, we get another error. And we kind of expected this type of error.
14418
23:05:20,360 --> 23:05:26,680
We've got runtime error, mat1 and mat2 shapes cannot be multiplied. 32. So that looks
14419
23:05:26,680 --> 23:05:35,560
like the batch size 2560 and 10. Hmm, what is 10? Well, recall that 10 is the number of hidden
14420
23:05:35,560 --> 23:05:41,640
units that we have. So this is the size here. That's 10 there. So it's trying to multiply
14421
23:05:42,280 --> 23:05:48,920
a matrix of this size by this size. So 10 has got something going on with it. We need to get
14422
23:05:48,920 --> 23:05:53,960
these two numbers, the middle numbers, to satisfy the rules of matrix multiplication,
14423
23:05:53,960 --> 23:05:58,120
because that's what happens in our linear layer. We need to get these two numbers the same.
14424
23:06:00,040 --> 23:06:07,080
And so our hint and my trick is to look at the previous layer. So if that's our batch size,
14425
23:06:07,080 --> 23:06:15,240
where does this value come from? Well, could it be the fact that a tensor of this size goes
14426
23:06:15,240 --> 23:06:22,360
through the flatten layer? Recall that we have this layer up here. So we've printed out the shape
14427
23:06:22,360 --> 23:06:29,720
here of the conv block, the output of conv block one. Now this shape here is the output of conv
14428
23:06:29,720 --> 23:06:36,360
block two. So we've got this number, the output of conv block one, and then the output of conv
14429
23:06:36,360 --> 23:06:43,560
block two. So that must be the input to our classifier layer. So if we go 10 times 16 times 16,
14430
23:06:43,560 --> 23:06:55,000
what do we get? 2560. Beautiful. So we can multiply our hitting units 10 by 16 by 16, which is the
14431
23:06:55,000 --> 23:07:04,440
shape here. And we get 2560. Let's see if that works. We'll go up here, times 16 times 16.
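A quick sanity check of that arithmetic:

# With padding=1, the flatten layer receives a [batch, 10, 16, 16] tensor,
# so the linear layer needs in_features = hidden_units * 16 * 16
print(10 * 16 * 16)  # 2560, matching the mat1 shape in the error message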
14432
23:07:05,080 --> 23:07:10,280
And let's see what happens. We'll rerun the model, we'll rerun the image batch, and then we'll pass
14433
23:07:10,280 --> 23:07:16,920
it. Oh, look at that. Our model works. Or the shapes at least line up. We don't know if it works
14434
23:07:16,920 --> 23:07:22,280
yet. We haven't started training yet. But this is the output size. We've got the output. It's on
14435
23:07:22,280 --> 23:07:27,640
the CUDA device, of course. But we've got 32 samples with three numbers in each. Now these are going
14436
23:07:27,640 --> 23:07:32,920
to be as good as random, because we haven't trained our model yet. We've only initialized it here
14437
23:07:32,920 --> 23:07:41,560
with random weights. So we've got 32 or a batch worth of random predictions on 32 images.
14438
23:07:42,360 --> 23:07:46,920
So you see how the output shape here three corresponds to the output shape we set up here.
14439
23:07:47,640 --> 23:07:52,280
Output shape equals length class names, which is exactly the number of classes that we're dealing
14440
23:07:52,280 --> 23:07:59,880
with. But I think our number is a little bit different to what's in the CNN explainer: 16, 16.
14441
23:07:59,880 --> 23:08:07,240
How did they end up with 13, 13? You know what? I think we got one of these numbers wrong,
14442
23:08:07,240 --> 23:08:13,640
kernel size, stride, padding. Let's have a look. Jump into here. If we wanted to truly replicate it,
14443
23:08:14,680 --> 23:08:20,520
is there any padding here? I actually don't think there's any padding here. So what if we go back
14444
23:08:20,520 --> 23:08:27,560
here and see if we can change this to zero and change this to zero? Zero. I'm not sure if this
14445
23:08:27,560 --> 23:08:31,800
will work, by the way. If it doesn't, it's not too bad, but we're just trying to line up the shapes
14446
23:08:31,800 --> 23:08:38,920
with the CNN explainer to truly replicate it. So the output of conv block 1 should be 30, 30, 10.
14447
23:08:38,920 --> 23:08:46,200
What are we working with at the moment? We've got 32, 32, 10. So let's see if removing the padding
14448
23:08:46,200 --> 23:08:51,960
from our convolutional layers lines our shape up with the CNN explainer. So I'm going to rerun
14449
23:08:51,960 --> 23:08:57,320
this, rerun our model. I've set the padding to zero on all of our padding hyper parameters.
14450
23:08:58,040 --> 23:09:02,600
Oh, and we get another error. We get another shape error. Of course we do,
14451
23:09:02,600 --> 23:09:09,080
because we've now got different shapes. Wow, do you see how often that these errors come up?
14452
23:09:10,200 --> 23:09:15,080
Trust me, I spend a lot of time troubleshooting these shape errors. So we now have to line up
14453
23:09:15,080 --> 23:09:23,080
these shapes. So we've got 13, 13, 10. Now does that equal 1690? Let's try it out. 13 times 13 times 10.
14454
23:09:24,120 --> 23:09:30,760
1690. Beautiful. And do our shapes line up with the CNN explainer? So we've got 30, 30, 10.
14455
23:09:30,760 --> 23:09:36,520
Remember, these are in PyTorch. So color channels first, whereas this is color channels last. So
14456
23:09:36,520 --> 23:09:41,640
yeah, the output of our first conv block is lining up here. That's correct.
14457
23:09:41,640 --> 23:09:46,200
And then same with the second block. How good is that? We've officially replicated the CNN explainer
14458
23:09:46,200 --> 23:09:55,240
model. So we can take this value, 13, 13, 10, and bring it back up here. 13 times 13 times 10. Remember,
14459
23:09:55,240 --> 23:09:59,640
hidden units is 10. So we're just going to multiply it by 13 times 13. You could calculate
14460
23:09:59,640 --> 23:10:05,160
these shapes by hand, but my trick is I like to let the error codes give me a hint of where to go.
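If you do want to calculate them by hand, here's a sketch of the standard output-size formula showing where 13 comes from (assuming 64x64 inputs, kernel_size=3, stride=1, padding=0 for the conv layers and 2x2 max pooling):

def conv_out(n, kernel=3, stride=1, padding=0):
    # Output size along one spatial dimension for a conv or pooling layer
    return (n + 2 * padding - kernel) // stride + 1

n = 64
for block in range(2):                   # two conv blocks
    n = conv_out(n)                      # first conv layer
    n = conv_out(n)                      # second conv layer
    n = conv_out(n, kernel=2, stride=2)  # max pool
    print(n)                             # 30 after block 1, 13 after block 2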
14461
23:10:05,160 --> 23:10:15,160
And boom, there we go. We get it working again. Some shape troubleshooting on the fly. So now
14462
23:10:15,160 --> 23:10:20,440
we've done a single forward pass on the model. We can kind of verify that our data at least flows
14463
23:10:20,440 --> 23:10:27,240
through it. What's next? Well, I'd like to show you another little package that I like to use
14464
23:10:27,240 --> 23:10:33,480
to also have a look at the input and output shapes of my model. And that is called Torch Info. So
14465
23:10:33,480 --> 23:10:39,400
you might want to give this a shot before we go into the next video. But in the next video,
14466
23:10:39,400 --> 23:10:44,280
we're going to see how we can use Torch Info to print out a summary of our model. So we're
14467
23:10:44,280 --> 23:10:51,080
going to get something like this. So this is how beautifully easy Torch Info is to use. So
14468
23:10:51,080 --> 23:10:57,160
give that a shot, install it into Google CoLab and run it in a cell here. See if you can get
14469
23:10:57,160 --> 23:11:03,800
something similar to this output for our model zero. And I'll see you in the next video. We'll try
14470
23:11:03,800 --> 23:11:13,880
that together. In the last video, we checked our model by doing a forward pass on a single batch.
14471
23:11:13,880 --> 23:11:18,920
And we learned that our forward method so far looks like it's intact and that we don't get any
14472
23:11:18,920 --> 23:11:24,440
shape errors as our data moves through the model. But I'd like to introduce to you one of my
14473
23:11:24,440 --> 23:11:31,480
favorite packages for finding out information from a PyTorch model. And that is Torch Info.
14474
23:11:31,480 --> 23:11:40,200
So let's use Torch Info to get an idea of the shapes going through our model. So you know how
14475
23:11:40,200 --> 23:11:46,040
much I love doing things in a programmatic way? Well, that's what Torch Info does. Before,
14476
23:11:46,040 --> 23:11:50,200
we used print statements to find out the different shapes going through our model.
14477
23:11:50,200 --> 23:11:54,920
And I'm just going to comment these out in our forward method so that when we run this later on
14478
23:11:54,920 --> 23:12:00,840
during training, we don't get excessive printouts of all the shapes. So let's see what Torch Info
14479
23:12:00,840 --> 23:12:06,520
does. And in the last video, I issued a challenge to give it a go. It's quite straightforward
14480
23:12:06,520 --> 23:12:11,240
to use. But let's see it together. This is the type of output we're looking for from our
14481
23:12:11,240 --> 23:12:16,840
tiny VGG model. And of course, you could get this type of output from almost any PyTorch model.
14482
23:12:16,840 --> 23:12:23,080
But we have to install it first. And as far as I know, Google CoLab doesn't come with Torch Info
14483
23:12:23,080 --> 23:12:29,800
by default. Now, you might as well try this in the future and see if it works. But yeah, I don't
14484
23:12:29,800 --> 23:12:35,800
get this module because my Google Colab instance doesn't have it installed. No problem with that.
14485
23:12:35,800 --> 23:12:45,400
Let's install Torch Info here. Install Torch Info and then we'll import it if it's available.
14486
23:12:45,400 --> 23:12:51,320
So we're going to try and import Torch Info. If it's already installed, we'll import it.
14487
23:12:51,320 --> 23:12:58,840
And then if it doesn't work, if that try block fails, we're going to run pip install Torch Info.
14488
23:12:58,840 --> 23:13:06,440
And then we will import Torch Info. And then we're going to run down here from Torch Info,
14489
23:13:06,440 --> 23:13:12,760
import summary. And then if this all works, we're going to get a summary of our model. We're going
14490
23:13:12,760 --> 23:13:19,240
to pass it in model zero. And we have to put in an input size here. Now that is an example of the
14491
23:13:19,240 --> 23:13:24,600
size of data that will flow through our model. So in our case, let's put in an input size of 1,
14492
23:13:25,160 --> 23:13:32,440
3, 64, 64. So this is an example of putting in a batch of one image. You could potentially
14493
23:13:32,440 --> 23:13:37,560
put in 32 here if you wanted, but let's just put in a batch of a singular image. And of course,
14494
23:13:37,560 --> 23:13:43,560
we could change these values here if we wanted to, 24 to 24. But what you might notice is that if
14495
23:13:43,560 --> 23:13:50,440
it doesn't get the right input size, it produces an error. There we go. So just like we got before
14496
23:13:50,440 --> 23:13:55,400
when we printed out our input sizes manually, we get an error here. Because what Torch Info
14497
23:13:55,400 --> 23:14:00,280
behind the scenes is going to do is it's going to do a forward pass on whichever model you pass
14498
23:14:00,280 --> 23:14:06,360
it with an input size of whichever input size you give it. So let's put in the input size that
14499
23:14:06,360 --> 23:14:15,800
our model was built for. Wonderful. So what Torch Info gives us is, oh, excuse me, we didn't
14500
23:14:15,800 --> 23:14:22,680
comment out the printouts before. So just make sure we've commented out these printouts in the
14501
23:14:22,680 --> 23:14:29,240
forward method of our TinyVGG class. So I'm just going to run this, then we run that, run that,
14502
23:14:29,240 --> 23:14:35,080
just to make sure everything still works. We'll run Torch Info. There we go. So no printouts
14503
23:14:35,080 --> 23:14:40,360
from our model, but this is, look how beautiful this is. I love how this prints out. So we have
14504
23:14:40,360 --> 23:14:46,600
our tiny VGG class, and then we can see it's comprised of three sequential blocks. And then
14505
23:14:46,600 --> 23:14:51,480
inside those sequential blocks, we have different combinations of layers. We have some conv layers,
14506
23:14:52,040 --> 23:14:57,720
some relu layers, some max pool layers. And then the final layer is our classification layer
14507
23:14:57,720 --> 23:15:03,720
with a flatten and a linear layer. And we can see the shapes changing throughout our model.
14508
23:15:03,720 --> 23:15:10,760
As our data goes in and gets manipulated by the various layers. So are these in line with
14509
23:15:10,760 --> 23:15:15,960
the CNN explainer? So if we check this last one, we've already verified this before.
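Here's a sketch of the torchinfo pattern just described: try the import, pip install it in Colab if it's missing, then summarize the model with an example input size.

# Install torchinfo if it's not available, then import it
try:
    import torchinfo
except ImportError:
    !pip install torchinfo
    import torchinfo

from torchinfo import summary
# torchinfo does a forward pass with the given input size behind the scenes
summary(model_0, input_size=(1, 3, 64, 64))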
14510
23:15:17,320 --> 23:15:23,560
And we also get some other helpful information down here, which is total params. So you can see
14511
23:15:23,560 --> 23:15:29,400
that each of these layers has a different amount of parameters to learn. Now, recall that a parameter
14512
23:15:29,400 --> 23:15:36,280
is a value such as a weight or a bias term within each of our layers, which starts off as a random
14513
23:15:36,280 --> 23:15:42,120
number. And the whole goal of deep learning is to adjust those random numbers to better represent
14514
23:15:42,120 --> 23:15:49,000
our data. So in our case, we have just over 8000 total parameters. Now this is actually quite small.
14515
23:15:49,800 --> 23:15:53,880
In the future, you'll probably play around with models that have a million parameters or more.
14516
23:15:53,880 --> 23:16:00,280
And models now are starting to have many billions of parameters. And we also get some
14517
23:16:00,280 --> 23:16:04,840
information here, such as how much the model size would be. Now this would be very helpful,
14518
23:16:05,400 --> 23:16:09,880
depending on where we had to put our model. So what you'll notice is that as a model gets larger,
14519
23:16:10,520 --> 23:16:15,640
has more layers, it will have more parameters, more weights and bias terms that can be adjusted
14520
23:16:15,640 --> 23:16:23,000
to learn patterns in data. But its input size and its estimated total size would definitely get
14521
23:16:23,000 --> 23:16:27,880
bigger as well. So that's just something to keep in mind if you have size constraints in terms of
14522
23:16:27,880 --> 23:16:34,600
storage in your future applications. So ours is under a megabyte, which is quite small. But you
14523
23:16:34,600 --> 23:16:39,720
might find that some models in the future get up to 500 megabytes, maybe even over a gigabyte.
14524
23:16:39,720 --> 23:16:44,600
So just keep that in mind for going forward. And that's the crux of torch info, one of my
14525
23:16:44,600 --> 23:16:49,640
favorite packages, just gives you an idea of the input and output shapes of each of your layers.
14526
23:16:49,640 --> 23:16:54,760
So you can use torch info wherever you need. It should work with most of your PyTorch models.
14527
23:16:54,760 --> 23:17:00,280
Just be sure to pass in the right input size. You can also use it to verify, like we did before,
14528
23:17:00,280 --> 23:17:06,520
if the input and output shapes are correct. So check that out, big shout out to Tyler Yep,
14529
23:17:06,520 --> 23:17:13,640
and everyone who's created the torch info package. Now in the next video, let's move towards training
14530
23:17:13,640 --> 23:17:19,000
our tiny VGG model. We're going to have to create some training and test functions. If you want to
14531
23:17:19,000 --> 23:17:26,920
jump ahead, we've already done this. So I encourage you to go back to section 6.2 in the
14532
23:17:26,920 --> 23:17:31,880
functionalizing training and test loops. And we're going to build functions very similar to this,
14533
23:17:31,880 --> 23:17:38,520
but for our custom data set. So if you want to replicate these functions in this notebook,
14534
23:17:39,160 --> 23:17:43,240
give that a go. Otherwise, I'll see you in the next video and we'll do it together.
14535
23:17:43,240 --> 23:17:53,000
How'd you go? Did you give it a shot? Did you try replicating the train step and the test step
14536
23:17:53,000 --> 23:17:58,760
function? I hope you did. Otherwise, let's do that in this video, but this time we're going to do
14537
23:17:58,760 --> 23:18:04,280
it for our custom data sets. And what you'll find is not much, if anything, changes, because
14538
23:18:04,280 --> 23:18:10,520
we've created our train and test loop functions in such a way that they're generic. So we want
14539
23:18:10,520 --> 23:18:16,360
to create a train step function. And by generic, I mean they can be used with almost any model and
14540
23:18:16,360 --> 23:18:26,520
data loader. So train step takes in a model and a data loader and trains the model on the data
14541
23:18:26,520 --> 23:18:33,320
loader. And we also want to create another function called test step, which takes in
14542
23:18:33,320 --> 23:18:42,040
a model and a data loader and other things and evaluates the model on the data loader. And of course,
14543
23:18:42,040 --> 23:18:47,080
for the train step and for the test step, each of them respectively are going to take a training
14544
23:18:47,080 --> 23:18:53,720
data loader. I just might make this a third heading so that our outline looks nice, beautiful.
14545
23:18:54,360 --> 23:18:58,520
Section seven is turning out to be quite a big section. Of course, we want them to be
14546
23:18:58,520 --> 23:19:04,360
respectively taking in their own data loader. So train takes in the train data loader, test takes in the
14547
23:19:04,360 --> 23:19:09,800
test data loader. Without any further ado, let's create the train step function. Now we've seen
14548
23:19:09,800 --> 23:19:15,880
this one in the computer vision section. So let's see what we can make here. So we need a train
14549
23:19:15,880 --> 23:19:21,880
step, which is going to take in a model, which will be a torch.nn.Module. And we want
14550
23:19:21,880 --> 23:19:28,680
it also to take in a data loader, which will be a torch.utils.data.DataLoader.
14551
23:19:29,960 --> 23:19:34,120
And then it's going to take in a loss function, which is going to be a
14552
23:19:34,120 --> 23:19:41,720
torch.nn.Module as well. And then it's going to take in an optimizer, which is going to be a
14553
23:19:41,720 --> 23:19:49,400
torch.optim.Optimizer. Wonderful. And then what do we do? What's the first thing that we do in
14554
23:19:49,400 --> 23:19:57,560
a training step? Well, we put the model in train mode. So let's go model.train().
14555
23:19:58,840 --> 23:20:04,600
Then what shall we do next? Well, let's set up some evaluation metrics, one of them being loss
14556
23:20:04,600 --> 23:20:13,480
and one of them being accuracy. So set up train loss and train accuracy values. And we're going
14557
23:20:13,480 --> 23:20:18,360
to accumulate these per batch because we're working with batches. So we've got train loss
14558
23:20:18,360 --> 23:20:27,560
and train acc equals zero, zero. Now we can loop through our data loader. So let's write loop through
14559
23:20:28,600 --> 23:20:33,240
data loader. And we'll loop through each of the batches in this because we've batchified our
14560
23:20:33,240 --> 23:20:42,520
data loader. So for batch, (X, y) in enumerate(dataloader), we want to send the data to the target
14561
23:20:42,520 --> 23:20:52,280
device. So we could even put that device parameter up here. Device equals device. We'll set that
14562
23:20:52,280 --> 23:21:04,440
to device by default. And then we can go X, y equals X.to(device) and y.to(device).
14563
23:21:05,880 --> 23:21:11,960
Beautiful. And now what do we do? Well, remember the PyTorch, the unofficial PyTorch optimization
14564
23:21:11,960 --> 23:21:22,040
song, we do the forward pass. So y pred equals model(X). And then number two is we calculate the
14565
23:21:22,040 --> 23:21:32,280
loss. So calculate the loss. Let's go loss equals the loss function. And we're going to pass in
14566
23:21:32,280 --> 23:21:37,560
y pred and y. We've done this a few times now. So that's why we're doing it a little bit faster.
14567
23:21:37,560 --> 23:21:42,520
So I hope you noticed that the things that we've covered before, I'm stepping up the pace a bit.
14568
23:21:42,520 --> 23:21:47,880
So it might be a bit of a challenge, but that's all right, you can handle it. And then, so that's
14569
23:21:47,880 --> 23:21:53,960
accumulating the loss. So we're starting from zero up here. And then each batch, we're doing a forward
14570
23:21:53,960 --> 23:22:00,200
pass, calculating the loss, and then adding it to the overall train loss. And so we're going to
14571
23:22:00,200 --> 23:22:07,640
optimizer.zero_grad(). So zero the gradients of the optimizer for each new batch. And then we're
14572
23:22:07,640 --> 23:22:16,040
going to perform backpropagation. So loss.backward(). And then five, what do we do? Optimizer step,
14573
23:22:16,040 --> 23:22:22,600
step, step. Wonderful. Look at that. Look at us coding a train loop in a minute or so.
14574
23:22:22,600 --> 23:22:31,640
Now, let's calculate the accuracy and accumulate it. Calculate the, you notice that we don't have
14575
23:22:31,640 --> 23:22:39,560
an accuracy function here. That's because accuracy is quite a straightforward metric to calculate.
14576
23:22:39,560 --> 23:22:46,280
So we'll first get the, the y pred class, because this is going to output model logits.
14577
23:22:46,280 --> 23:22:54,600
As we've seen before, the raw output of a model is logits. So to get the class, we're going to take
14578
23:22:54,600 --> 23:23:02,040
the argmax of torch.softmax. So we'll get the prediction probabilities of y pred, which is the
14579
23:23:02,040 --> 23:23:07,480
raw logits, what we've got up here, across dimension one, and then also across dimension one here.
14580
23:23:09,640 --> 23:23:15,080
Beautiful. So that should give us the labels. And then we can find out if this is wrong by
14581
23:23:15,080 --> 23:23:21,640
checking it later on. And then we're going to create the accuracy by taking the y pred class,
14582
23:23:21,640 --> 23:23:28,360
checking for equality with the right labels. So this is going to give us how many of these
14583
23:23:28,360 --> 23:23:34,760
values equal true. And we want to take the sum of that, take the item of that, which is just a
14584
23:23:34,760 --> 23:23:41,640
single integer. And then we want to divide it by the length of y pred. So we're just getting the
14585
23:23:41,640 --> 23:23:48,040
total number that are right, and dividing it by the length of samples. So that's the formula for
14586
23:23:48,040 --> 23:23:53,640
accuracy. Now we can come down here outside of the batch loop, we know that because we've got this
14587
23:23:53,640 --> 23:24:03,160
helpful line drawn here. And we can go adjust metrics to get the average loss and accuracy
14588
23:24:03,160 --> 23:24:10,360
per batch. So we're going to set train loss is equal to train loss, divided by the length of
14589
23:24:10,360 --> 23:24:15,720
the data loader. So the number of batches in total. And the train accuracy is the train
14590
23:24:15,720 --> 23:24:21,720
acc, divided by the length of the data loader as well. So that's going to give us the average
14591
23:24:21,720 --> 23:24:31,720
loss and average accuracy per epoch across all batches. So train acc. Now that's a pretty good
14592
23:24:31,720 --> 23:24:38,760
looking function to me for a train step. Do you want to take on the test step? So pause the video,
14593
23:24:38,760 --> 23:24:44,600
give it a shot, and you'll get great inspiration from this notebook here. Otherwise, we're going
14594
23:24:44,600 --> 23:24:51,800
to do it together in three, two, one, let's do the test step. So create a test step function.
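Before we do, here's a minimal sketch of the train_step function we just talked through (device is assumed to be defined earlier in the notebook, and the accuracy is computed inline as described):

import torch

def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               device=device):
    # Put the model in train mode
    model.train()
    # Set up train loss and train accuracy values
    train_loss, train_acc = 0, 0
    # Loop through the batches in the data loader
    for batch, (X, y) in enumerate(dataloader):
        # Send data to the target device
        X, y = X.to(device), y.to(device)
        # 1. Forward pass
        y_pred = model(X)
        # 2. Calculate and accumulate the loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()
        # 3. Optimizer zero grad
        optimizer.zero_grad()
        # 4. Loss backward (backpropagation)
        loss.backward()
        # 5. Optimizer step
        optimizer.step()
        # Calculate and accumulate accuracy across the batch
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item() / len(y_pred)
    # Adjust metrics to get average loss and accuracy per batch
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc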
14595
23:24:53,080 --> 23:24:59,800
So we want to be able to call these functions in an epoch loop. And that way, instead of writing
14596
23:24:59,800 --> 23:25:03,560
out training and test code for multiple different models, we just write it out once, and we can
14597
23:25:03,560 --> 23:25:10,680
call those functions. So let's create def test step, we're going to do model, which is going to be
14598
23:25:11,960 --> 23:25:16,840
if I could type, a torch.nn.Module. And then we're going to do data loader,
14599
23:25:17,640 --> 23:25:26,680
which is torch.utils.data.DataLoader, capital L there. And then we're going to just
14600
23:25:26,680 --> 23:25:31,960
pass in a loss function here, because we don't need an optimizer for the test function. We're
14601
23:25:31,960 --> 23:25:37,160
not trying to optimize anything, we're just trying to evaluate how our model does on the test
14602
23:25:37,160 --> 23:25:42,600
dataset. And let's put in the device here, why not? That way we can change the device if we need
14603
23:25:42,600 --> 23:25:48,520
to. So put the model in eval mode, because we're going to be evaluating, or we're going to be testing.
14604
23:25:49,480 --> 23:26:00,360
Then we can set up test loss and test accuracy values. So test loss and test acc. We're going
14605
23:26:00,360 --> 23:26:05,480
to make these zero, we're going to accumulate them per batch. But before we go through the batch,
14606
23:26:05,480 --> 23:26:12,120
let's turn on inference mode. So this is behind the scenes going to take care of a lot of PyTorch
14607
23:26:12,120 --> 23:26:16,760
functionality that we don't need, functionality that's very helpful during training, such as tracking gradients.
14608
23:26:16,760 --> 23:26:23,400
But during testing, we don't need that. So loop through data loader or data batches.
14609
23:26:23,400 --> 23:26:33,720
And we're going to go for batch, (X, y) in enumerate(dataloader). You'll notice that above, we didn't
14610
23:26:33,720 --> 23:26:40,600
actually use this batch term here. And we probably won't use it here either. But I just like to go
14611
23:26:40,600 --> 23:26:48,120
through and have that there in case we wanted to use it anyway. So send data to the target device.
14612
23:26:48,120 --> 23:26:59,000
So we're going to go X, y equals X.to(device), and same with y.to(device). Beautiful. And
14613
23:26:59,000 --> 23:27:04,760
then what do we do for an evaluation step or a test step? Well, of course, we do the forward pass,
14614
23:27:05,480 --> 23:27:14,080
forward pass. And we're going to, let's call these test pred logits and get the raw outputs of our
14615
23:27:14,080 --> 23:27:21,680
model. And then we can calculate the loss on those raw outputs, calculate the loss. We get the loss
14616
23:27:21,680 --> 23:27:30,880
is equal to loss function on test pred logits versus y. And then we're going to accumulate the
14617
23:27:30,880 --> 23:27:39,200
loss. So test loss plus equals loss.item(). Remember, item just gets a single value from
14618
23:27:39,200 --> 23:27:46,320
whatever tensor you call it on. And then we're going to calculate the accuracy. Now we can do this
14619
23:27:47,040 --> 23:27:53,600
exactly how we've done for the training data set or the training step. So test pred labels,
14620
23:27:53,600 --> 23:27:59,120
we're going to, you don't, I just want to highlight the fact that you actually don't need to take
14621
23:27:59,120 --> 23:28:04,640
the softmax here, you could just take the argmax directly from this. The reason why we take the
14622
23:28:04,640 --> 23:28:11,040
softmax. So you could do the same here, you could just directly take the argmax of the logits. The
14623
23:28:11,040 --> 23:28:16,000
reason why we get the softmax is just for completeness. So if you wanted the prediction probabilities,
14624
23:28:16,000 --> 23:28:22,240
you could use torch.softmax on the prediction logits. But it's not 100% necessary to get the
14625
23:28:22,240 --> 23:28:27,360
same values. And you can test this out yourself. So try this with and without the softmax and
14626
23:28:27,360 --> 23:28:34,320
see if you get the same results. So we're going to go test accuracy. Plus equals, now we'll just
14627
23:28:34,320 --> 23:28:40,560
create our accuracy calculation on the fly test pred labels. We'll check for equality on the y,
14628
23:28:41,280 --> 23:28:46,080
then we'll get the sum of that, we'll get the item of that, and then we'll divide that by the
14629
23:28:46,080 --> 23:28:53,040
length of the test pred labels. Beautiful. So it's going to give us accuracy per batch. And so now
14630
23:28:53,040 --> 23:29:04,320
we want to adjust the metrics to get average loss and accuracy per batch. So test loss equals
14631
23:29:04,320 --> 23:29:11,120
test loss divided by length of the data loader. And then we're going to go test,
14632
23:29:11,120 --> 23:29:17,360
acc equals test acc divided by the length of the data loader. And then finally, we're going to
14633
23:29:17,360 --> 23:29:28,640
return the test loss, not lost, and test accuracy. Look at us go. Now, in previous videos, that took
14634
23:29:28,640 --> 23:29:33,440
us, or in previous sections, that took us a fairly long time. But now we've done it in about 10
14635
23:29:33,440 --> 23:29:38,400
minutes or so. So give yourself a pat on the back for all the progress you've been making.
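For reference, here's a matching sketch of the test_step function we just wrote:

import torch

def test_step(model: torch.nn.Module,
              dataloader: torch.utils.data.DataLoader,
              loss_fn: torch.nn.Module,
              device=device):
    # Put the model in eval mode
    model.eval()
    # Set up test loss and test accuracy values
    test_loss, test_acc = 0, 0
    # Turn on inference mode (no gradient tracking needed for testing)
    with torch.inference_mode():
        # Loop through the batches in the data loader
        for batch, (X, y) in enumerate(dataloader):
            # Send data to the target device
            X, y = X.to(device), y.to(device)
            # Forward pass (raw model outputs are logits)
            test_pred_logits = model(X)
            # Calculate and accumulate the loss
            loss = loss_fn(test_pred_logits, y)
            test_loss += loss.item()
            # Calculate and accumulate accuracy (argmax on the logits is enough)
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += (test_pred_labels == y).sum().item() / len(test_pred_labels)
    # Adjust metrics to get average loss and accuracy per batch
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    return test_loss, test_acc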
14636
23:29:38,960 --> 23:29:44,720
But now let's in the next video, we did this in the computer vision section as well. We created,
14637
23:29:44,720 --> 23:29:52,560
do we create a train function? Oh, no, we didn't. But we could. So let's create a function to
14638
23:29:53,120 --> 23:29:57,760
functionize this. We want to train our model. I think we did actually. Def train, we've done
14639
23:29:57,760 --> 23:30:05,920
so much. I'm not sure what we've done. Oh, okay. So looks like we might not have. But in the next
14640
23:30:05,920 --> 23:30:11,840
video, give yourself this challenge, create a function called train that combines these two
14641
23:30:11,840 --> 23:30:19,200
functions and loops through them both with an epoch range. So just like we've done here in the
14642
23:30:19,200 --> 23:30:26,080
previous notebook, can you functionize this? So just this step here. So you'll need to take in a
14643
23:30:26,080 --> 23:30:30,800
number of epochs, you'll need to take in a train data loader and a test data loader, a model, a
14644
23:30:30,800 --> 23:30:36,560
loss function, an optimizer, and maybe a device. And I think you should be pretty on your way to
14645
23:30:36,560 --> 23:30:41,200
all the steps we need for train. So give that a shot. But in the next video, we're going to create
14646
23:30:41,200 --> 23:30:47,600
a function that combines train step and test step to train a model. I'll see you there.
14647
23:30:51,360 --> 23:30:56,880
How'd you go? In the last video, I issued you the challenge to combine our train step function,
14648
23:30:56,880 --> 23:31:01,440
as well as our test step function together in their own function so that we could just call
14649
23:31:01,440 --> 23:31:05,840
one function that calls both of these and train a model and evaluate it, of course.
14650
23:31:05,840 --> 23:31:12,560
So let's now do that together. I hope you gave it a shot. That's what it's all about. So we're
14651
23:31:12,560 --> 23:31:18,240
going to create a train function. Now the role of this function is, as I said, to combine
14652
23:31:18,240 --> 23:31:25,040
train step and test step. Now we're doing all of this on purpose, right, because we want to not
14653
23:31:25,040 --> 23:31:30,000
have to rewrite all of our code all the time. So we want to be functionalizing as many things as
14654
23:31:30,000 --> 23:31:35,600
possible, so that we can just import these later on, if we wanted to train more models and just
14655
23:31:35,600 --> 23:31:41,040
leverage the code that we've written before, as long as it works. So let's see if it does,
14656
23:31:41,040 --> 23:31:48,080
we're going to create a train function. I'm going to first import tqdm from tqdm.auto,
14657
23:31:48,080 --> 23:31:52,560
because I'd like to get a progress bar while our model is training. There's nothing quite like
14658
23:31:52,560 --> 23:32:01,440
watching a neural network train. So step number one is we need to create a train function that takes
14659
23:32:01,440 --> 23:32:13,120
in various model parameters, plus optimizer, plus data loaders, plus a loss function. A whole
14660
23:32:13,120 --> 23:32:18,800
bunch of different things. So let's create def train. And I'm going to pass in a model here,
14661
23:32:19,600 --> 23:32:25,600
which is going to be a torch.nn.Module. You'll notice that the inputs of this are going
14662
23:32:25,600 --> 23:32:31,680
to be quite similar to our train step and test step. I don't actually need that there.
14663
23:32:32,880 --> 23:32:39,680
So we also want a train data loader for the training data, a torch.utils.data.DataLoader.
14664
23:32:39,680 --> 23:32:47,440
And we also want a test data loader, which is also going to be a
14665
23:32:47,440 --> 23:32:54,000
torch.utils.data.DataLoader. And then we want an optimizer. So the optimizer will only be used with our
14666
23:32:54,000 --> 23:32:59,360
training data set, but that's okay. We can take the optimizer as an input. And then we want a
14667
23:32:59,360 --> 23:33:04,880
loss function. This will generally be used for both our training and testing step. Because that's
14668
23:33:04,880 --> 23:33:09,520
what we're combining here. Now, since we're working with multi class classification,
14669
23:33:09,520 --> 23:33:14,640
I'm going to set our loss function to a default of nn.CrossEntropyLoss.
14670
23:33:15,920 --> 23:33:21,120
Then I'm going to get epochs. I'm going to set it to five; we'll train for five epochs by default.
14671
23:33:21,120 --> 23:33:27,440
And then finally, I'm going to set the device equal to the device. So what do we get wrong here?
14672
23:33:30,240 --> 23:33:34,000
That's all right. We'll just keep coding. We'll ignore these little red lines. If they
14673
23:33:34,000 --> 23:33:39,680
stay around, we'll come back to them. So step number two, I'm going to create. This is a step
14674
23:33:39,680 --> 23:33:44,960
you might not have seen, but I'm going to create an empty results dictionary. Now, this is going
14675
23:33:44,960 --> 23:33:51,040
to help us track our results. Do you recall in a previous notebook, we outputted a model dictionary
14676
23:33:51,040 --> 23:33:56,880
for how a model went. So if we look at model one results, yeah, we got a dictionary like this.
14677
23:33:57,520 --> 23:34:03,520
So I'd like to create one of these on the fly, but keep track of the result every epoch. So what
14678
23:34:03,520 --> 23:34:09,840
was the loss on epoch number zero? What was the accuracy on epoch number three? So we'll show you
14679
23:34:09,840 --> 23:34:14,160
how I'll do that. We can use a dictionary and just update that while our model trains.
14680
23:34:14,880 --> 23:34:20,000
So results, I want to keep track of the train loss. So we're going to set that equal to an empty
14681
23:34:20,000 --> 23:34:25,840
list and just append to it. I also want to keep track of the train accuracy. We'll set that as
14682
23:34:25,840 --> 23:34:32,240
an empty list as well. I also want to keep track of the test loss. And I also want to keep track
14683
23:34:32,240 --> 23:34:38,080
of the test accuracy. Now, you'll notice over time that these, what you can track is actually
14684
23:34:38,080 --> 23:34:44,640
very flexible. And what your functions can do is also very flexible. So this is not the gold
14685
23:34:44,640 --> 23:34:50,480
standard of doing anything by any means. It's just one way that works. And you'll probably find in
14686
23:34:50,480 --> 23:34:57,280
the future that you need different functionality. And of course, you can code that out. So let's
14687
23:34:57,280 --> 23:35:05,440
now loop through our epochs. So for epoch in tqdm(range(epochs)), using the epochs we set above.
14688
23:35:06,160 --> 23:35:09,440
And then we can set the train loss. Have I missed a comma up here somewhere?
14689
23:35:09,440 --> 23:35:17,840
Type annotation not supported for that type of expression. Okay, that's all right. We'll just leave
14690
23:35:17,840 --> 23:35:24,160
that there. So we're going to go train loss and train acc, recall that our train step function
14691
23:35:24,800 --> 23:35:30,400
that we created in the previous video, train step, returns our train loss and train acc. So as I
14692
23:35:30,400 --> 23:35:36,800
said, I want to keep track of these throughout our training. So I'm going to get them from train
14693
23:35:36,800 --> 23:35:43,200
step. Then for each epoch in our range of epochs, we're going to pass in our model and perform a
14694
23:35:43,200 --> 23:35:48,080
training step. So the data loader here is of course going to be the train data loader. The
14695
23:35:48,080 --> 23:35:52,320
loss function is just going to be the loss function that we pass into the train function.
14696
23:35:52,880 --> 23:35:59,520
And then the optimizer is going to be the optimizer. And then the device is going to be device.
14697
23:36:00,160 --> 23:36:03,920
Beautiful. Look at that. We just performed a training step in five lines of code.
14698
23:36:03,920 --> 23:36:08,560
So let's keep pushing forward. It's telling us we've got a whole bunch of different things here.
14699
23:36:08,560 --> 23:36:14,240
Epochs is not defined. Maybe we just have to get rid of this. We can't have the type annotation here.
14700
23:36:14,240 --> 23:36:22,160
And that'll that'll stop. That'll stop Google Colab getting angry at us. If it does anymore,
14701
23:36:22,160 --> 23:36:28,400
I'm just going to ignore it for now. Epochs. Anyway, we'll leave it at that. We'll find out if there's
14702
23:36:28,400 --> 23:36:33,920
an error later on. Test loss. You might be able to find it before I do. So test step. We're going
14703
23:36:33,920 --> 23:36:39,200
to pass in the model. We're going to pass in a data loader. Now this is going to be the test data
14704
23:36:39,200 --> 23:36:46,080
loader. Look at us go, creating training and test step functions. Loss function. And then we don't
14705
23:36:46,080 --> 23:36:49,280
need an optimizer. We're just going to pass in the device. And then behind the scenes,
14706
23:36:50,160 --> 23:36:56,320
both of these functions are going to train and test our model. How cool is that? So still within
14707
23:36:56,320 --> 23:37:02,640
the loop. This is important. Within the loop, we're going to have number four is we're going to
14708
23:37:02,640 --> 23:37:09,120
print out. Let's print out what's happening. Print out what's happening. We can go print.
14709
23:37:09,120 --> 23:37:14,720
And we'll do a fancy little print statement here. We'll get the epoch. And then we will get
14710
23:37:15,280 --> 23:37:20,800
the train loss, which will be equal to the train loss. We'll get that to, let's go
14711
23:37:20,800 --> 23:37:26,880
four decimal places. How about that? And then we'll get the train accuracy, which is going to be the
14712
23:37:26,880 --> 23:37:34,720
train acc. We'll get that to four, maybe three decimal places, just so it looks nice.
14713
23:37:34,720 --> 23:37:39,840
It looks aesthetic. And then we'll go test loss. We'll get that coming out here. And we'll pass
14714
23:37:39,840 --> 23:37:45,120
in the test loss. We'll get that to four decimal places as well. And then finally, we'll get the
14715
23:37:45,120 --> 23:37:52,480
test accuracy. So a fairly long print statement here. But that's all right. We'd like to see how
14716
23:37:52,480 --> 23:37:59,200
our model is doing while it's training. Beautiful. And so again, still within the epoch, we want to
14717
23:37:59,200 --> 23:38:04,000
update our results dictionary so that we can keep track of how our model performed over time.
14718
23:38:04,640 --> 23:38:11,040
So let's pass in results. We want to update the train loss. And so this is going to be this.
14719
23:38:11,040 --> 23:38:21,840
And then we can append our train loss value. So this is just going to append to the list in here
14720
23:38:21,840 --> 23:38:27,840
with the train loss value, every epoch. And then we'll do the same thing on the train accuracy,
14721
23:38:27,840 --> 23:38:41,280
append train acc. And then we'll do the same thing again with test loss dot append test loss.
14722
23:38:41,280 --> 23:38:51,280
And then we will finally do the same thing with the test accuracy test accuracy. Now,
14723
23:38:51,280 --> 23:38:56,720
this is a pretty big function. But this is why we write the code now so that we can use it
14724
23:38:56,720 --> 23:39:03,360
multiple times later on. So return the filled results at the end of the epochs. So outside the
14725
23:39:03,360 --> 23:39:12,400
epochs loop. So our loop, we're outside it now. Let's return results. Now, I've probably got an
14726
23:39:12,400 --> 23:39:16,640
error somewhere here and you might be able to spot it. Okay, train data loader. Where do we get
14727
23:39:16,640 --> 23:39:23,040
that invalid syntax? Maybe up here, we don't have a comma here. Was that the issue the whole time?
14728
23:39:23,040 --> 23:39:30,720
Wonderful. You might have seen that; I completely missed that. But we now have a train function
14729
23:39:30,720 --> 23:39:34,960
to train our model. And the train function, of course, is going to call out our train step
14730
23:39:34,960 --> 23:39:42,480
function and our test step function. So what's left to do? Well, nothing less than train and
14731
23:39:42,480 --> 23:39:50,480
evaluate model zero. So our model is way back up here. How about in the next video, we leverage
14732
23:39:50,480 --> 23:39:56,080
our functions, namely just the train function, because it's going to call our train step function
14733
23:39:56,080 --> 23:40:01,600
and our test step function and train our model. So I'm going to encourage you to give that a go.
14734
23:40:01,600 --> 23:40:05,040
You're going to have to go back to the workflow. Maybe you'll already know this.
14735
23:40:06,640 --> 23:40:11,520
So what have we done? We've got our data ready and we turned it into tensors using a combination
14736
23:40:11,520 --> 23:40:16,560
of these functions. We've built and picked a model, well, we've built a model, which is the
14737
23:40:16,560 --> 23:40:22,720
tiny VGG architecture. Have we created a loss function yet? I don't think we have or an optimizer.
14738
23:40:23,520 --> 23:40:27,920
I don't think we've done that yet. We've definitely built a training loop though.
14739
23:40:28,960 --> 23:40:32,720
We aren't using torch metrics. We're just using accuracy, but we could use this if we want.
14740
23:40:33,280 --> 23:40:37,520
We haven't improved through experimentation yet, but we're going to try this later on and
14741
23:40:37,520 --> 23:40:43,600
then save and reload the model. We've seen this before. So I think we're up to picking a loss
14742
23:40:43,600 --> 23:40:49,760
function and an optimizer. So give that a shot. In the next video, we're going to create a loss
14743
23:40:49,760 --> 23:40:54,000
function and an optimizer and then leverage the functions we've spent the last two videos
14744
23:40:54,000 --> 23:41:01,520
creating to train our first model model zero on our own custom data set. This is super exciting.
14745
23:41:02,080 --> 23:41:03,120
I'll see you in the next video.
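(For reference, here is a minimal sketch of a train() function along the lines just described. It assumes the train_step() and test_step() functions written in the previous videos, and the tqdm progress bar is an optional extra; names and details are assumptions based on this walkthrough, not the exact course notebook.)

from typing import Dict, List
import torch
from torch.utils.data import DataLoader
from tqdm.auto import tqdm

def train(model: torch.nn.Module,
          train_dataloader: DataLoader,
          test_dataloader: DataLoader,
          optimizer: torch.optim.Optimizer,
          loss_fn: torch.nn.Module,
          epochs: int = 5,
          device: str = "cpu") -> Dict[str, List[float]]:
    # 1. Create an empty results dictionary to fill every epoch
    results = {"train_loss": [], "train_acc": [], "test_loss": [], "test_acc": []}
    # 2. Loop through the epochs
    for epoch in tqdm(range(epochs)):
        # 3. Train and test steps (functions written in the previous videos)
        train_loss, train_acc = train_step(model, train_dataloader, loss_fn, optimizer, device)
        test_loss, test_acc = test_step(model, test_dataloader, loss_fn, device)
        # 4. Print out what's happening
        print(f"Epoch: {epoch} | "
              f"train_loss: {train_loss:.4f} | train_acc: {train_acc:.4f} | "
              f"test_loss: {test_loss:.4f} | test_acc: {test_acc:.4f}")
        # 5. Update the results dictionary
        results["train_loss"].append(train_loss)
        results["train_acc"].append(train_acc)
        results["test_loss"].append(test_loss)
        results["test_acc"].append(test_acc)
    # 6. Return the filled results dictionary outside the epochs loop
    return results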
14746
23:41:06,480 --> 23:41:09,840
Who's ready to train and evaluate model zero? Put your hand up.
14747
23:41:09,840 --> 23:41:17,920
I definitely am. So let's do it together. We're going to start off section 7.7 and we're going
14748
23:41:17,920 --> 23:41:24,480
to put in train and evaluate model zero, our baseline model on our custom data set. Now,
14749
23:41:25,120 --> 23:41:29,680
if we refer back to the PyTorch workflow, I issued you the challenge in the last video to try and
14750
23:41:29,680 --> 23:41:34,640
create a loss function and an optimizer. I hope you gave that a go, but we've already built a
14751
23:41:34,640 --> 23:41:40,480
training loop. So we're going to leverage our training loop functions, namely train, train step
14752
23:41:40,480 --> 23:41:47,440
and test step. All we need to do now is instantiate a model, choose a loss function and an optimizer
14753
23:41:47,440 --> 23:41:54,560
and pass those values to our training function. So let's do that. All right, this is so exciting.
14754
23:41:54,560 --> 23:42:05,280
Let's set the random seeds. I'm going to set torch manual seed 42 and torch cuda manual seed 42.
14755
23:42:05,280 --> 23:42:09,760
Now remember, I just want to highlight something. I read an article the other day about not using
14756
23:42:09,760 --> 23:42:16,320
random seeds. The reason why we are using random seeds is for educational purposes. So to try and
14757
23:42:16,320 --> 23:42:21,200
get our numbers on my screen and your screen as close as possible, but in practice, you quite
14758
23:42:21,200 --> 23:42:27,600
often don't use random seeds all the time. The reason why is because you want your models performance
14759
23:42:27,600 --> 23:42:34,400
to be similar regardless of the random seed that you use. So just keep that in mind going forward.
14760
23:42:34,400 --> 23:42:41,280
We're using random seeds to just exemplify how we can get similar numbers on our page. But
14761
23:42:41,280 --> 23:42:46,720
ideally, no matter what the random seed was, our models would go in the same direction.
14762
23:42:46,720 --> 23:42:53,040
That's where we want our models to eventually go. But we're going to train for five epochs.
14763
23:42:53,680 --> 23:43:00,160
And now let's recreate an instance of tiny VGG. We can do so because we've created the
14764
23:43:00,160 --> 23:43:07,120
tiny VGG class. So tiny VGG, which is our model zero. We don't have to do this, but we're going
14765
23:43:07,120 --> 23:43:13,600
to do it anyway so we've got all the code in one place, tiny VGG. What is our input shape
14766
23:43:13,600 --> 23:43:21,280
going to be? That is the number of color channels of our target images. And because we're dealing
14767
23:43:21,280 --> 23:43:26,080
with color images, we have an input shape of three. Previously, we used an input shape of one to
14768
23:43:26,080 --> 23:43:32,240
deal with grayscale images. I'm going to set hidden units to 10 in line with the CNN explainer website.
14769
23:43:32,880 --> 23:43:38,960
And the output shape is going to be the number of classes in our training data set. And then,
14770
23:43:38,960 --> 23:43:46,400
of course, we're going to send the target model to the target device. So what do we do now?
14771
23:43:47,040 --> 23:43:52,720
Well, we set up a loss function and an optimizer, loss function, and optimizer.
14772
23:43:53,360 --> 23:43:58,800
So our loss function is going to be because we're dealing with multiclass classification,
14773
23:43:58,800 --> 23:44:05,520
and then cross entropy, if I could spell cross entropy loss. And then we're going to have an
14774
23:44:05,520 --> 23:44:10,720
optimizer. This time, how about we mix things up? How about we try the Adam optimizer? Now,
14775
23:44:10,720 --> 23:44:14,880
of course, the optimizer is one of the hyperparameters that you can set for your model,
14776
23:44:14,880 --> 23:44:20,240
and a hyperparameter being a value that you can set yourself. So the parameters that we want to
14777
23:44:20,240 --> 23:44:29,120
optimize are our model zero parameters. And we're going to set a learning rate of 0.001. Now,
14778
23:44:29,120 --> 23:44:34,320
recall that you can tweak this learning rate, if you like, but I believe, did I just see that
14779
23:44:34,320 --> 23:44:43,440
the default learning rate of Adam is 0.001? Yeah, there we go. So Adam's default learning rate is
14780
23:44:43,440 --> 23:44:49,920
one times 10 to the power of negative three, or 0.001. And so that is the default learning rate for Adam.
14781
23:44:49,920 --> 23:44:57,280
And as I said, oftentimes, different variables in the PyTorch library, such as optimizers,
14782
23:44:57,280 --> 23:45:02,240
have good default values that work across a wide range of problems. So we're just going to stick
14783
23:45:02,240 --> 23:45:06,320
with the default. If you want to, you can experiment with different values of this.
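(Putting those steps together, a rough sketch might look like the following. The TinyVGG class comes from earlier in this section, and train_data is assumed to be the training Dataset created earlier, so treat the exact names as assumptions.)

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Random seeds for (rough) reproducibility in this demo
torch.manual_seed(42)
torch.cuda.manual_seed(42)

NUM_EPOCHS = 5

# Recreate an instance of TinyVGG (the class built earlier in this section)
model_0 = TinyVGG(input_shape=3,                        # 3 colour channels (RGB)
                  hidden_units=10,                      # in line with the CNN Explainer website
                  output_shape=len(train_data.classes)  # one output per class
                  ).to(device)

# Multi-class classification -> cross entropy loss
loss_fn = nn.CrossEntropyLoss()

# Adam optimizer; lr=0.001 is Adam's default
optimizer = torch.optim.Adam(params=model_0.parameters(), lr=0.001)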
14784
23:45:07,360 --> 23:45:11,680
But now let's start the timer, because we want to time our models.
14785
23:45:13,440 --> 23:45:20,560
We're going to import from time it. We want to get the default timer class. And I'm going to
14786
23:45:20,560 --> 23:45:26,960
import that as timer, just so we don't have to type out default timer. So the start time is going
14787
23:45:26,960 --> 23:45:32,320
to be timer. This is going to just put a line in the sand of what the start time is at this
14788
23:45:32,320 --> 23:45:37,680
particular line of code. It's going to measure that. And then we're going to train model zero.
14789
23:45:38,240 --> 23:45:44,800
Now this is using, of course, our train function. So let's write model zero results, and then
14790
23:45:44,800 --> 23:45:52,400
later there will be model one results, but we're not up to there yet. So let's go train model equals model zero.
14791
23:45:52,400 --> 23:45:58,400
And this is just the training function that we wrote in a previous video. And the train data
14792
23:45:58,400 --> 23:46:03,840
is going to be our train data loader. And we've got train data loader simple, because we're not
14793
23:46:03,840 --> 23:46:10,240
using data augmentation for model zero. And then our test data loader is going to be our test data
14794
23:46:10,240 --> 23:46:15,440
loader simple. And then we're going to set our optimizer, which is equal to the optimizer we just
14795
23:46:15,440 --> 23:46:21,360
created, the friendly Adam optimizer. And the loss function is going to be the loss function that
14796
23:46:21,360 --> 23:46:28,880
we just created, which is nn cross entropy loss. Finally, we can send in epochs, which is going to be
14797
23:46:28,880 --> 23:46:35,200
num epochs, which is what we set at the start of this video to five. And of course, we could train
14798
23:46:35,200 --> 23:46:39,680
our model for longer if we wanted to. But the whole idea of when you first start training a model
14799
23:46:39,680 --> 23:46:44,560
is to keep your experiments quick. So that's why we're only training for five, maybe later on you
14800
23:46:44,560 --> 23:46:50,640
train for 10, 20, tweak the learning rate, do a whole bunch of different things. But let's go
14801
23:46:50,640 --> 23:46:56,080
down here, let's end the timer to see how long our model took to train, and print out
14802
23:46:58,640 --> 23:47:05,200
how long it took. So in a previous section, we created a helper function for this.
14803
23:47:06,160 --> 23:47:10,000
We're just going to simplify it in this section. And we're just going to print out how long the
14804
23:47:10,000 --> 23:47:19,280
training time was. Total training time. Let's go end time minus start time. And then we're going to go
14805
23:47:19,280 --> 23:47:26,960
point, we'll take it to three decimal places, hey, seconds, you ready to train our first model,
14806
23:47:26,960 --> 23:47:33,920
our first convolutional neural network on our own custom data set on pizza, steak and sushi
14807
23:47:33,920 --> 23:47:43,040
images. Let's do it. You're ready? Three, two, one, no errors. Oh, there we go. Okay,
14808
23:47:43,760 --> 23:47:48,480
should this be train data loader? Did you notice that? What does our train
14809
23:47:48,480 --> 23:47:55,200
function take as input? Oh, we're not getting a doc string. Oh, there we go. We want train data
14810
23:47:55,200 --> 23:48:03,440
loader, data loader, and same with this, I believe. Let's try again. Beautiful. Oh, look at that
14811
23:48:03,440 --> 23:48:10,400
lovely progress bar. Okay, wow, our model is training quite fast. Okay. All right, what do we
14812
23:48:10,400 --> 23:48:17,200
get? So we get an accuracy on the training data set of about 40%. And we get an accuracy on the
14813
23:48:17,200 --> 23:48:25,840
test data set of about 50%. Now, what's that telling us? It's telling us that about 50% of the time
14814
23:48:25,840 --> 23:48:32,000
our model is getting the prediction correct. But we've only got three classes. So even if our model
14815
23:48:32,000 --> 23:48:40,800
was guessing, it would get things right 33% of the time. So even if you just guessed pizza every
14816
23:48:40,800 --> 23:48:46,480
single time, because we only have three classes, if you guessed pizza every single time, you get
14817
23:48:46,480 --> 23:48:53,120
a baseline accuracy of 33%. So our model isn't doing too much better than our baseline accuracy.
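(To make that baseline concrete, a quick back-of-the-envelope check: with three roughly balanced classes, always guessing the same class is right about one time in three.)

num_classes = 3
baseline_accuracy = 1 / num_classes   # ~0.333
print(f"Guess-one-class baseline: {baseline_accuracy:.1%}")  # ~33.3%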
14818
23:48:53,120 --> 23:48:57,600
Of course, we'd like this number to go higher, and maybe it would if it trained for longer.
14819
23:48:58,480 --> 23:49:03,520
So I'll let you experiment with that. But if you'd like to see some different methods of
14820
23:49:03,520 --> 23:49:10,720
improving a model, recall back in section number O two, we had an improving a model section,
14821
23:49:10,720 --> 23:49:16,160
improving a model. Here we go. So here's some things you might want to try.
14822
23:49:18,160 --> 23:49:24,160
We can improve a model by adding more layers. So if we come back to our tiny VGG architecture,
14823
23:49:25,520 --> 23:49:33,840
right up here, we're only using two convolutional blocks. Perhaps you wanted to add in a convolutional
14824
23:49:33,840 --> 23:49:39,920
block three. You can also add more hidden units. Right now we're using 10 hidden units. You might
14825
23:49:39,920 --> 23:49:44,960
want to double that and see what happens. Fitting for longer. This is what we just spoke about.
14826
23:49:44,960 --> 23:49:50,800
So right now we're only fitting for five epochs. So if you maybe wanted to try double that again,
14827
23:49:50,800 --> 23:49:57,120
and then even double that again, changing the activation functions. So maybe relu is not the
14828
23:49:57,120 --> 23:50:02,720
ideal activation function for our specific use case. Change the learning rate. We've spoken
14829
23:50:02,720 --> 23:50:08,880
about that before. So right now our learning rate is 0.001 for Adam, which is the default.
14830
23:50:08,880 --> 23:50:14,000
But perhaps there's a better learning rate out there. Change the loss function. This is probably,
14831
23:50:15,200 --> 23:50:19,680
in our case, not going to help too much because cross entropy loss is a pretty good loss for
14832
23:50:19,680 --> 23:50:24,880
multi class classification. But these are some things that you could try these first three,
14833
23:50:24,880 --> 23:50:29,760
especially. You could try quite quickly. You could try doubling the layers. You could try
14834
23:50:29,760 --> 23:50:34,320
adding more hidden units. And you could try fitting for longer. So I'd give that a shot.
14835
23:50:34,320 --> 23:50:42,960
But in the next video, we're going to take our model zero results, which is a dictionary or at
14836
23:50:42,960 --> 23:50:49,760
least it should be. And we're going to plot some loss curves. So this is a good way to inspect how
14837
23:50:49,760 --> 23:50:55,440
our model is training. Yes, we've got some values here. Let's plot these in the next video. I'll see you there.
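(The timing-and-training cell just described might look roughly like this; the dataloader and function names follow this walkthrough and are assumptions about the notebook.)

from timeit import default_timer as timer

start_time = timer()

# Train model_0 with the train() function written earlier
model_0_results = train(model=model_0,
                        train_dataloader=train_dataloader_simple,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS,
                        device=device)

end_time = timer()
print(f"Total training time: {end_time - start_time:.3f} seconds")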
14838
23:50:55,440 --> 23:51:06,400
In the last video, we trained our first convolutional neural network on custom data. So you should be
14839
23:51:06,400 --> 23:51:12,000
very proud of that. That is no small feat to take our own data set of whatever we want
14840
23:51:12,000 --> 23:51:17,200
and train a PyTorch model on it. However, we did find that it didn't perform as well as we'd
14841
23:51:17,200 --> 23:51:22,800
like it to. We also highlighted a few different things that we could try to do to improve it.
14842
23:51:22,800 --> 23:51:29,360
But now let's plot our models results using a loss curve. So I'm going to write another heading
14843
23:51:29,360 --> 23:51:39,840
down here. We'll go, I believe we're up to 7.8. So plot the loss curves of model zero. So what
14844
23:51:39,840 --> 23:51:49,120
is a loss curve? So I'm going to write down here, a loss curve is a way of tracking your models
14845
23:51:49,120 --> 23:51:55,280
progress over time. So if we just looked up Google and we looked up loss curves,
14846
23:51:56,720 --> 23:52:02,960
oh, there's a great guide by the way. I'm going to link this. But I'd rather, if in doubt, code it
14847
23:52:02,960 --> 23:52:09,440
out than just look at guides. Yeah, loss curves. So yeah, loss over time. So there's our loss value
14848
23:52:09,440 --> 23:52:15,120
on the left. And there's say steps, which is epochs or batches or something like that.
14849
23:52:15,120 --> 23:52:19,680
Then we've got a whole bunch of different loss curves over here. Essentially, what we want it
14850
23:52:19,680 --> 23:52:27,120
to do is go down over time. So that's the ideal loss curve. Let's go back down here.
14851
23:52:28,480 --> 23:52:37,040
And a good guide for different loss curves can be seen here. We're not going to go through that
14852
23:52:37,040 --> 23:52:42,080
just yet. Let's focus on plotting our own models, loss curves, and we can inspect those.
14853
23:52:42,080 --> 23:52:51,920
Let's get the model keys. Get the model zero results keys. I'm going to type in model zero
14854
23:52:51,920 --> 23:52:58,240
results dot keys because it's a dictionary. Let's see if we can write some code to plot these
14855
23:52:59,760 --> 23:53:06,000
values here. So yeah, over time. So we have one value for train loss, train,
14856
23:53:06,000 --> 23:53:12,240
acc, test loss, and test acc for every epoch. And of course, these lists would be longer if we
14857
23:53:12,240 --> 23:53:18,400
trained for more epochs. But how about we create a function called def plot loss curves,
14858
23:53:18,400 --> 23:53:26,880
which will take in a results dictionary, which is of string and a list of floats. So this just
14859
23:53:26,880 --> 23:53:33,600
means that our results parameter here is taking in a dictionary that has a string as a key.
14860
23:53:33,600 --> 23:53:42,000
And it contains a list of floats. That's what this means here. So let's write a doc string
14861
23:53:42,000 --> 23:53:53,200
plots training curves of a results dictionary. Beautiful. And so we're in this section of our
14862
23:53:53,200 --> 23:53:58,880
workflow, which is kind of like a, we're kind of doing something similar to TensorBoard, what it
14863
23:53:58,880 --> 23:54:02,720
does. I'll let you look into that if you want to. Otherwise, we're going to see it later on.
14864
23:54:02,720 --> 23:54:08,160
But we're really evaluating our model here. Let's write some plotting code. We're going to use matplot
14865
23:54:08,160 --> 23:54:17,520
lib. So we want to get the loss values of the results dictionary. So this is training and test.
14866
23:54:19,040 --> 23:54:24,080
Let's set loss equal to results train loss. So this is going to be the loss on the training
14867
23:54:24,080 --> 23:54:30,800
data set. And then we'll create the test loss, which is going to be, well, index on the results
14868
23:54:30,800 --> 23:54:36,640
dictionary and get the test loss. Beautiful. Now we'll do the same and we'll get the accuracy.
14869
23:54:37,280 --> 23:54:43,680
Get the accuracy values of the results dictionary. So training and test.
14870
23:54:44,880 --> 23:54:50,080
Then we're going to go accuracy equals results. This will be the training accuracy train
14871
23:54:50,080 --> 23:55:00,000
acc. And accuracy, oh, we'll call this test accuracy actually. Test accuracy equals results test acc.
14872
23:55:00,800 --> 23:55:05,280
Now let's create a number of epochs. So we want to figure out how many epochs we did. We can do
14873
23:55:05,280 --> 23:55:12,880
that by just counting the length of this value here. So figure out how many epochs there were.
14874
23:55:13,520 --> 23:55:18,640
So we'll set epochs equal to a range because we want to plot it over time. Our models results
14875
23:55:18,640 --> 23:55:25,360
over time. That's the whole idea of a loss curve. So we'll just get the length of
14876
23:55:26,320 --> 23:55:33,360
our results here. And we'll get the range. So now we can set up a plot.
14877
23:55:34,960 --> 23:55:41,200
Let's go PLT dot figure. And we'll set the fig size equal to something nice and big because
14878
23:55:41,200 --> 23:55:47,120
we're going to do a few plots. We want maybe two plots, one for the loss, one for the accuracy.
14879
23:55:47,120 --> 23:55:57,520
And then we'll go plot the loss. PLT dot subplot. We're going to create one row, two columns,
14880
23:55:57,520 --> 23:56:03,920
and index number one. We want to put PLT dot plot. And here's where we're going to plot the
14881
23:56:03,920 --> 23:56:12,480
training loss. So we give that a label of train loss. And then we'll add another plot with epochs
14882
23:56:12,480 --> 23:56:20,560
and test loss. The label here is going to be test loss. And then we'll add a title, which will be
14883
23:56:20,560 --> 23:56:28,000
loss PLT. Let's put a label on the X, which will be epochs. So we know how many steps we've done.
14884
23:56:28,000 --> 23:56:33,920
This plot over here, loss curves, it uses steps. I'm going to use epochs. They mean almost the
14885
23:56:33,920 --> 23:56:41,200
same thing. It depends on what scale you'd like to see your loss curves. We'll get a legend as well
14886
23:56:41,200 --> 23:56:48,560
so that the labels appear. Now we're going to plot the accuracy. So PLT dot subplot.
14887
23:56:49,120 --> 23:56:54,560
Let's go one, two, and then index number two, which this plot's going to be on. PLT dot plot.
14888
23:56:55,120 --> 23:57:02,880
We're going to go epochs accuracy. And the label here is going to be train accuracy.
14889
23:57:04,000 --> 23:57:08,000
And then we'll get on the next plot, which is actually going to be on the same plot.
14890
23:57:08,000 --> 23:57:12,560
We'll put the test accuracy. That way we have the test accuracy and the training accuracy side
14891
23:57:12,560 --> 23:57:20,480
by side, test accuracy same with the train loss and train, sorry, test loss. And then we'll give
14892
23:57:20,480 --> 23:57:25,600
our plot a title. This plot is going to be accuracy. And then we're going to give it an
14893
23:57:25,600 --> 23:57:31,120
X label, which is going to be epochs as well. And then finally, we'll get the PLT dot legend. So,
14894
23:57:31,120 --> 23:57:36,400
a lot of plotting code here. But let's see what this looks like. Hey, if we've done it all right,
14895
23:57:36,400 --> 23:57:42,640
we should be able to pass it in a dictionary just like this and see some nice plots like this.
14896
23:57:44,480 --> 23:57:54,400
Let's give it a go. And I'm going to call plot loss curves. And I'm going to pass in model 0 results.
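(As a sketch, the plotting function described above could look like this, assuming the results dictionary uses the keys train_loss, train_acc, test_loss and test_acc.)

from typing import Dict, List
import matplotlib.pyplot as plt

def plot_loss_curves(results: Dict[str, List[float]]):
    """Plots training curves of a results dictionary."""
    # Get the loss values of the results dictionary (training and test)
    loss = results["train_loss"]
    test_loss = results["test_loss"]
    # Get the accuracy values of the results dictionary (training and test)
    accuracy = results["train_acc"]
    test_accuracy = results["test_acc"]
    # Figure out how many epochs there were
    epochs = range(len(results["train_loss"]))

    plt.figure(figsize=(15, 7))

    # Plot the loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, loss, label="train_loss")
    plt.plot(epochs, test_loss, label="test_loss")
    plt.title("Loss")
    plt.xlabel("Epochs")
    plt.legend()

    # Plot the accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, accuracy, label="train_accuracy")
    plt.plot(epochs, test_accuracy, label="test_accuracy")
    plt.title("Accuracy")
    plt.xlabel("Epochs")
    plt.legend()

plot_loss_curves(model_0_results)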
14897
23:57:58,960 --> 23:58:06,000
All righty then. Okay. So that's not too bad. Now, why do I say that? Well, because we're
14898
23:58:06,000 --> 23:58:12,400
looking here for mainly trends, we haven't trained our model for too long. Quantitatively, we know
14899
23:58:12,400 --> 23:58:18,720
that our model hasn't performed the way we'd like it to. So we'd like the accuracy on both
14900
23:58:18,720 --> 23:58:23,680
the train and test data sets to be higher. And then of course, if the accuracy is going higher,
14901
23:58:23,680 --> 23:58:31,520
then the loss is going to come down. So the ideal trend for a loss curve is to go down from
14902
23:58:31,520 --> 23:58:38,160
the top left to the bottom right. In other words, the loss is going down over time. So that's,
14903
23:58:38,160 --> 23:58:44,000
the trend is all right here. So potentially, if we train for more epochs, which I'd encourage
14904
23:58:44,000 --> 23:58:49,680
you to give it a go, our model's loss might get lower. And the accuracy is also trending in the
14905
23:58:49,680 --> 23:58:56,800
right way. Our accuracy, we want it to go up over time. So if we train for more epochs, these curves
14906
23:58:56,800 --> 23:59:02,880
may continue to go on. Now, they may not, they, you never really know, right? You can guess these
14907
23:59:02,880 --> 23:59:08,640
things. But until you try it, you don't really know. So in the next video, we're going to have a
14908
23:59:08,640 --> 23:59:12,800
look at some different forms of loss curves. But before we do that, I'd encourage you to go through
14909
23:59:12,800 --> 23:59:19,360
this guide here, interpreting loss curves. So I feel like if you just search out loss curves,
14910
23:59:19,360 --> 23:59:23,200
you're going to find Google's guide, or you could just search interpreting loss curves.
14911
23:59:23,200 --> 23:59:29,840
Because as you'll see, there's many different ways that loss curves can be interpreted. But the ideal
14912
23:59:29,840 --> 23:59:37,520
trend is for the loss to go down over time, and metrics like accuracy to go up over time.
14913
23:59:38,240 --> 23:59:43,280
So in the next video, let's cover a few different forms of loss curves, such as the ideal loss
14914
23:59:43,280 --> 23:59:47,440
curve, what it looks like when your model's underfitting, and what it looks like when your
14915
23:59:47,440 --> 23:59:52,400
model's overfitting. And if you'd like to have a primer on those things, I'd read through this
14916
23:59:52,400 --> 23:59:57,120
guide here. Don't worry too much if you're not sure what's happening. We're going to cover a bit
14917
23:59:57,120 --> 24:00:05,440
more about loss curves in the next video. I'll see you there. In the last video, we looked at our
14918
24:00:05,440 --> 24:00:12,560
model's loss curves, and also the accuracy curves. And a loss curve is a way to evaluate a model's
14919
24:00:12,560 --> 24:00:18,000
performance over time, such as how long it was training for. And as you'll see, if you Google
14920
24:00:18,000 --> 24:00:24,240
some images of loss curves, you'll see many different types of loss curves. They come in all
14921
24:00:24,240 --> 24:00:29,200
different shapes and sizes. And there's many different ways to interpret loss curves. So
14922
24:00:29,200 --> 24:00:34,560
this is Google's testing and debugging in machine learning guide. So I'm going to set this as
14923
24:00:34,560 --> 24:00:41,360
extra curriculum for this section. So we're up to number eight. Let's have a look at what should
14924
24:00:41,360 --> 24:00:54,400
an ideal loss curve look like. So we'll just link that in there. Now, loss curve, I'll just
14925
24:00:54,400 --> 24:01:05,520
rewrite here, is a loss curve is, I'll just make some space. A loss curve is one of the most
14926
24:01:05,520 --> 24:01:15,440
helpful ways to troubleshoot a model. So the trend of a loss curve, you want it to go down over time,
14927
24:01:15,440 --> 24:01:20,480
and the trend typically of an evaluation metric, like accuracy, you want it to go up over time.
14928
24:01:20,480 --> 24:01:28,720
So let's go into the keynote, loss curves. So a way to evaluate your model's performance over time.
14929
24:01:28,720 --> 24:01:34,320
These are three of the main different forms of loss curve that you'll face. But again,
14930
24:01:34,320 --> 24:01:39,600
there's many different types as mentioned in here, interpreting loss curves. Sometimes you get it
14931
24:01:39,600 --> 24:01:45,200
going all over the place. Sometimes your loss will explode. Sometimes your metrics will be
14932
24:01:45,200 --> 24:01:50,400
contradictory. Sometimes your testing loss will be higher than your training loss. We'll have a
14933
24:01:50,400 --> 24:01:54,400
look at what that is. Sometimes your model gets stuck. In other words, the loss doesn't reduce.
14934
24:01:55,040 --> 24:02:00,480
Let's have a look at some loss curves here in the case of underfitting, overfitting, and just
14935
24:02:00,480 --> 24:02:07,280
right. So this is the Goldilocks zone. Underfitting is when your model's loss on the training and
14936
24:02:07,280 --> 24:02:13,600
test data sets could be lower. So in our case, if we go back to our loss curves, of course,
14937
24:02:13,600 --> 24:02:19,120
we want this to be lower, and we want our accuracy to be higher. So from our perspective,
14938
24:02:19,120 --> 24:02:24,640
it looks like our model is underfitting. And we would probably want to train it for longer,
14939
24:02:24,640 --> 24:02:30,560
say, 10, 20 epochs to see if this trend continues. If it keeps going down, it may stop underfitting.
14940
24:02:31,200 --> 24:02:39,600
So underfitting is when your loss could be lower. Now, the inverse of underfitting is called
14941
24:02:39,600 --> 24:02:43,760
overfitting. And so two of the biggest problems in machine learning are trying to
14942
24:02:45,200 --> 24:02:50,640
reduce underfitting, so in other words, make your loss lower, and also reduce overfitting. These are
14943
24:02:50,640 --> 24:02:55,520
both active areas of research because you always want your model to perform better,
14944
24:02:55,520 --> 24:03:00,480
but you also want it to perform pretty much the same on the training set as it does the test set.
14945
24:03:01,280 --> 24:03:08,960
And so overfitting would be when your training loss is lower than your testing loss. And why
14946
24:03:08,960 --> 24:03:14,000
would this be overfitting? So it means overfitting because your model is essentially learning the
14947
24:03:14,000 --> 24:03:19,600
training data too well. And that means the loss goes down on the training data set,
14948
24:03:19,600 --> 24:03:26,640
which is typically a good thing. However, this learning is not reflected in the testing data set.
14949
24:03:27,200 --> 24:03:32,320
So your model is essentially memorizing patterns in the training data set that don't
14950
24:03:32,320 --> 24:03:38,320
generalize well to the test data set. So this is where we come to the just right curve is that we
14951
24:03:38,320 --> 24:03:45,760
want, ideally, our training loss to reduce as much as our test loss. And quite often, you'll find
14952
24:03:45,760 --> 24:03:51,040
that the loss is slightly lower on the training set than it is on the test set. And that's just
14953
24:03:51,040 --> 24:03:56,160
because the model is exposed to the training data, and it's never seen the test data before.
14954
24:03:56,800 --> 24:04:01,200
So it might be a little bit lower on the training data set than on the test data set.
14955
24:04:02,000 --> 24:04:08,240
So underfitting, the model's loss could be lower. Overfitting, the model is learning the training
14956
24:04:08,240 --> 24:04:13,760
data too well. Now, this would be equivalent to say you were studying for a final exam,
14957
24:04:13,760 --> 24:04:19,440
and you just memorize the course materials, the training set. And when it came time to the final
14958
24:04:19,440 --> 24:04:25,280
exam, because you'd only memorized the course materials, you couldn't adapt those skills to
14959
24:04:25,280 --> 24:04:31,120
questions you hadn't seen before. So the final exam would be the test set. So that's overfitting.
14960
24:04:31,120 --> 24:04:37,520
The train loss is lower than the test loss. And just right, ideally, you probably won't see
14961
24:04:37,520 --> 24:04:44,160
loss curves exactly this smooth. I mean, they might be a little bit jumpy. Ideally, your training loss
14962
24:04:44,160 --> 24:04:48,960
and test loss go down at a similar rate. And of course, there's more combinations of these. If
14963
24:04:48,960 --> 24:04:53,760
you'd like to see them, check out Google's loss curve guide. You can check that out there.
14964
24:04:53,760 --> 24:04:58,400
That's some extra curriculum. Now, you probably want to know how do you deal with underfitting
14965
24:04:58,400 --> 24:05:02,880
and overfitting? Let's look at a few ways. We'll start with overfitting.
14966
24:05:02,880 --> 24:05:10,960
So we want to reduce overfitting. In other words, we want our model to perform just as
14967
24:05:10,960 --> 24:05:16,320
well on the training data set as it does on the test data set. So one of the best ways to
14968
24:05:16,320 --> 24:05:21,920
reduce overfitting is to get more data. So this means that our training data set will be larger.
14969
24:05:21,920 --> 24:05:28,400
Our model will be exposed to more examples. And this, in theory, helps, though it doesn't always work.
14970
24:05:28,400 --> 24:05:33,440
These all come with a caveat, right? They don't always work as with many things in machine learning.
14971
24:05:33,440 --> 24:05:38,480
So get more data, give your model more chance to learn patterns, generalizable patterns in a
14972
24:05:38,480 --> 24:05:44,480
data set. You can use data augmentation. So make your models training data set harder to learn.
14973
24:05:45,120 --> 24:05:50,880
So we've seen a few examples of data augmentation. You can get better data. So not only more data,
14974
24:05:50,880 --> 24:05:57,360
perhaps the quality of the data that you're using isn't that good. So if you enhance the
14975
24:05:57,360 --> 24:06:02,480
quality of your data set, your model may be able to learn better, more generalizable patterns and
14976
24:06:02,480 --> 24:06:08,720
in turn reduce overfitting. Use transfer learning. So we're going to cover this in a later section
14977
24:06:08,720 --> 24:06:15,680
of the course. But transfer learning is taking one model that works, taking its patterns that it's
14978
24:06:15,680 --> 24:06:21,600
learned and applying it to your own data set. So for example, I'll just go into the Torch Vision
14979
24:06:21,600 --> 24:06:28,880
models library. Many of these models in here in Torch Vision, the models module, have already
14980
24:06:28,880 --> 24:06:35,680
been trained on a certain data set, such as ImageNet. And you can take the weights or the
14981
24:06:35,680 --> 24:06:41,360
patterns that these models have learned. And if they work well on an ImageNet data set, which is
14982
24:06:41,360 --> 24:06:47,520
millions of different images, you can adjust those patterns to your own problem. And oftentimes
14983
24:06:47,520 --> 24:06:52,480
that will help with overfitting. If you're still overfitting, you can try to simplify your model.
14984
24:06:52,480 --> 24:06:58,160
Usually this means taking away things like extra layers, taking away more hidden units. So say you
14985
24:06:58,160 --> 24:07:04,240
had 10 layers, you might reduce it to five layers. Why is this? What's the theory behind this?
14986
24:07:04,240 --> 24:07:10,400
Well, if you simplify your model and take away complexity from your model, you're kind of telling
14987
24:07:10,400 --> 24:07:15,280
your model, hey, use what you've got. And you're going to have to, because you've only got five
14988
24:07:15,280 --> 24:07:20,320
layers now, you're going to have to make sure that those five layers work really well, because
14989
24:07:20,320 --> 24:07:25,200
you've no longer got 10. And the same for hidden units. Say you started with 100 hidden units per
14990
24:07:25,200 --> 24:07:32,160
layer, you might reduce that to 50 and say, hey, you had 100 before. Now use those 50 and make your
14991
24:07:32,160 --> 24:07:41,440
patterns generalizable. Use learning rate decay. So the learning rate is how much your optimizer
14992
24:07:41,440 --> 24:07:52,560
updates your model's weight every step. So learning rate decay is to decay the learning rate
14993
24:07:52,560 --> 24:07:58,720
over time. So you might look this up, you can look this up: go PyTorch, learning rate,
14994
24:07:59,520 --> 24:08:05,120
scheduling. So what this means is you want to decrease your learning rate over time.
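(As one concrete example of learning rate scheduling, here is a minimal sketch using torch.optim.lr_scheduler.StepLR. The placeholder model, the scheduler choice and the step_size/gamma numbers are illustrative assumptions, not something from the course.)

import torch
from torch import nn

model = nn.Linear(10, 3)                                    # placeholder model just for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Multiply the learning rate by 0.1 every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    # ... the usual train_step()/test_step() calls would go here ...
    optimizer.step()        # normally called inside the training step
    scheduler.step()        # decay the learning rate once per epoch
    print(f"Epoch {epoch} | lr: {scheduler.get_last_lr()[0]:.6f}")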
14995
24:08:05,120 --> 24:08:09,280
Now, I know I'm giving you a lot of different things here, but you've got this keynote as a
14996
24:08:09,280 --> 24:08:15,200
reference. So you can come across these over time. So learning rate scheduling. So we might look
14997
24:08:15,200 --> 24:08:23,120
into here, do we have schedule, scheduler, beautiful. So this is going to adjust the learning rate
14998
24:08:23,120 --> 24:08:28,720
over time. So for example, at the start of when a model is training, you might want a higher learning
14999
24:08:28,720 --> 24:08:34,720
rate. And then as the model starts to learn patterns more and more and more, you might want to reduce
15000
24:08:34,720 --> 24:08:39,760
that learning rate over time so that your model doesn't update its patterns too much
15001
24:08:39,760 --> 24:08:45,840
in later epochs. So that's the concept of learning rate scheduling. At the closer you get to convergence,
15002
24:08:46,480 --> 24:08:51,840
the lower you might want to set your learning rate, think of it like this. If you're reaching
15003
24:08:51,840 --> 24:08:57,760
for a coin at the back of a couch, can we get an image of that coin at back of couch?
15004
24:08:57,760 --> 24:09:05,760
Images. So if you're trying to reach a coin in the cushions here, so the closer you get to that coin,
15005
24:09:05,760 --> 24:09:11,760
at the beginning, you might take big steps. But then the closer you get to that coin, the smaller
15006
24:09:11,760 --> 24:09:15,920
the step you might take to pick that coin out. Because if you take a big step when you're really
15007
24:09:15,920 --> 24:09:22,480
close to the coin here, the coin might fall down the couch. The same thing with learning rate decay.
15008
24:09:22,480 --> 24:09:26,640
At the start of your model training, you might take bigger steps as your model works its way
15009
24:09:26,640 --> 24:09:32,800
down the loss curve. But then you get closer and closer to the ideal position on the loss curve.
15010
24:09:32,800 --> 24:09:36,960
You might start to lower and lower that learning rate until you get right very close to the end
15011
24:09:36,960 --> 24:09:43,680
and you can pick up the coin. Or in other words, your model can converge. And then finally, use
15012
24:09:43,680 --> 24:09:50,080
early stopping. So if we go into an image, is there early stopping here? Early stopping.
15013
24:09:50,080 --> 24:10:00,480
Loss curves early stopping. So what this means is that you stop. Yeah, there we go. So there's
15014
24:10:00,480 --> 24:10:04,800
heaps of different guides early stopping with PyTorch. Beautiful. So what this means is before
15015
24:10:04,800 --> 24:10:11,120
your testing error starts to go up, you keep track of your model's testing error. And then you stop
15016
24:10:11,120 --> 24:10:16,400
your model from training or you save the weight or you save the patterns where your model's loss
15017
24:10:16,400 --> 24:10:21,360
was the lowest. So then you could just set your model to train for an infinite amount of training
15018
24:10:21,360 --> 24:10:27,360
steps. And as soon as the testing error starts to increase for say 10 steps in a row, you go back
15019
24:10:27,360 --> 24:10:31,440
to this point here and go, I think that was where our model was the best. And the testing
15020
24:10:31,440 --> 24:10:36,640
error started to increase after that. So we're going to save that model there instead of the model
15021
24:10:36,640 --> 24:10:42,640
here. So that's the concept of early stopping. So that's dealing with overfitting. There are
15022
24:10:42,640 --> 24:10:48,560
other methods to deal with underfitting. So recall underfitting is when we have a loss that isn't as
15023
24:10:48,560 --> 24:10:54,160
low as we'd like it to be. Our model is not fitting the data very well. So it's underfitting.
15024
24:10:54,960 --> 24:10:59,440
So to reduce underfitting, you can add more layers slash units to your model. You're trying to
15025
24:10:59,440 --> 24:11:05,120
increase your model's ability to learn by adding more layers or units. You can again tweak the
15026
24:11:05,120 --> 24:11:09,440
learning rate. Perhaps your learning rate is too high to begin with and your model doesn't learn
15027
24:11:09,440 --> 24:11:14,000
very well. So you can adjust the learning rate again, just like we discussed with reaching for
15028
24:11:14,000 --> 24:11:19,120
that coin at the back of a couch. If your model is still underfitting, you can train for longer. So
15029
24:11:19,120 --> 24:11:24,720
that means giving your model more opportunities to look at the data. So more epochs, that just
15030
24:11:24,720 --> 24:11:29,600
means it's looking at the training set over and over and over again and trying to
15031
24:11:29,600 --> 24:11:35,120
learn those patterns. However, you might find again, if you try to train for too long, your testing
15032
24:11:35,120 --> 24:11:40,960
error will start to go up. Your model might start overfitting if you train too long. So machine
15033
24:11:40,960 --> 24:11:45,760
learning is all about a balance between underfitting and overfitting. You want your model to fit quite
15034
24:11:45,760 --> 24:11:52,720
well. And so this is a great one. So you want your model to start fitting quite well. But then if you
15035
24:11:52,720 --> 24:11:58,320
try to reduce underfitting too much, you might start to overfit and then vice versa, right? If
15036
24:11:58,320 --> 24:12:03,520
you try to reduce overfitting too much, your model might underfit. So this is one of the
15037
24:12:03,520 --> 24:12:07,760
most fun dances in machine learning, the balance between overfitting and underfitting.
15038
24:12:08,640 --> 24:12:13,600
Finally, you might use transfer learning. So transfer learning helps with overfitting and
15039
24:12:13,600 --> 24:12:18,560
underfitting. Recall transfer learning is using a model's learned patterns from one problem and
15040
24:12:18,560 --> 24:12:23,280
adjusting them to your own. We're going to see this later on in the course. And then finally,
15041
24:12:23,280 --> 24:12:29,920
use less regularization. So regularization is holding your model back. So it's trying
15042
24:12:29,920 --> 24:12:34,480
to prevent overfitting. So if you do too much preventing of overfitting, in other words,
15043
24:12:34,480 --> 24:12:40,720
regularizing your model, you might end up underfitting. So if we go back, we have a look at the ideal
15044
24:12:40,720 --> 24:12:47,040
curves, underfitting. If you try to prevent underfitting too much, so increasing your model's
15045
24:12:47,040 --> 24:12:51,920
capability to learn, you might end up overfitting. And if you try to prevent overfitting too much,
15046
24:12:51,920 --> 24:12:57,760
you might end up underfitting. We are going for the just right section. And this is going to be a
15047
24:12:57,760 --> 24:13:03,600
balance between these two throughout your entire machine learning career. In fact, it's probably
15048
24:13:03,600 --> 24:13:11,040
the most prevalent area of research is trying to get models not to underfit, but also not to
15049
24:13:11,040 --> 24:13:17,120
overfit. So keep that in mind. A loss curve is a great way to evaluate your model's performance
15050
24:13:17,120 --> 24:13:23,040
over time. And a lot of what we do with the loss curves is try to work out whether our model is
15051
24:13:23,040 --> 24:13:28,480
underfitting or overfitting, and we're trying to get to this just right curve. We might not get
15052
24:13:28,480 --> 24:13:34,240
exactly there, but we want to keep trying getting as close as we can. So with that being said,
15053
24:13:34,800 --> 24:13:41,120
let's now build another model in the next video. And we're going to try a method to try and see if
15054
24:13:41,120 --> 24:13:46,480
we can use data augmentation to prevent our model from overfitting. Although that experiment
15055
24:13:46,480 --> 24:13:50,800
doesn't sound like the most ideal one we could do right now, because it looks like our model is
15056
24:13:50,800 --> 24:13:54,960
underfitting. So with your knowledge of what you've just learned in the previous video,
15057
24:13:54,960 --> 24:14:01,840
how to prevent underfitting, what would you do to increase this model's capability of learning
15058
24:14:01,840 --> 24:14:07,040
patterns in the training data set? Would you train it for longer? Would you add more layers?
15059
24:14:07,040 --> 24:14:12,960
Would you add more hidden units? Have a think and we'll start building another model in the next video.
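(If you want to experiment with the early stopping idea from this video, PyTorch has no built-in version, but a minimal hand-rolled sketch might look like this. The placeholder model, the fake test loss and the patience value are all illustrative assumptions.)

import copy
import torch
from torch import nn

model = nn.Linear(10, 3)                      # placeholder model just for illustration
patience = 10                                 # epochs the test loss may fail to improve
best_test_loss = float("inf")
best_state_dict = copy.deepcopy(model.state_dict())
epochs_without_improvement = 0

for epoch in range(1000):                     # "train forever"; early stopping decides when to stop
    # In real training you'd call train_step(...) and test_step(...) here;
    # this fake loss just makes the example self-contained.
    test_loss = 1.0 / (epoch + 1) + 0.01 * (epoch > 50)

    if test_loss < best_test_loss:
        best_test_loss = test_loss
        best_state_dict = copy.deepcopy(model.state_dict())   # remember the best weights
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch} (best test loss {best_test_loss:.4f})")
            break

model.load_state_dict(best_state_dict)        # roll back to the best checkpoint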
15060
24:14:12,960 --> 24:14:23,840
Welcome back. In the last video, we covered the important concept of a loss curve and how it can
15061
24:14:23,840 --> 24:14:28,800
give us information about whether our model is underfitting. In other words, our model's loss
15062
24:14:28,800 --> 24:14:35,680
could be lower or whether it's overfitting. In other words, the training loss is lower than the test
15063
24:14:35,680 --> 24:14:41,360
loss, or far lower than the validation loss. Another thing to note here is that I put training
15064
24:14:41,360 --> 24:14:46,960
and test sets here. You could also do this with a validation data set and that the just right,
15065
24:14:46,960 --> 24:14:52,240
the Goldilocks zone, is when our training and test loss are quite similar over time.
15066
24:14:53,600 --> 24:14:57,360
Now, there was a fair bit of information in that last video, so I just wanted to highlight
15067
24:14:57,360 --> 24:15:02,720
that you can get this all in section 04, which is the notebook that we're working on. And then
15068
24:15:02,720 --> 24:15:08,240
if you come down over here, if we come to section 8, what should an ideal loss curve look like, we've got
15069
24:15:08,240 --> 24:15:14,000
underfitting, overfitting, just right, how to deal with overfitting. We've got a few options here.
15070
24:15:14,000 --> 24:15:18,160
We've got how to deal with underfitting and then we've got a few options there. And then if we
15071
24:15:18,160 --> 24:15:26,560
wanted to look for more, how to deal with overfitting. You could find a bunch of resources here and then
15072
24:15:26,560 --> 24:15:35,520
how to deal with underfitting. You could find a bunch of resources here as well. So that is a
15073
24:15:35,520 --> 24:15:39,600
very fine line, very fine balance that you're going to experience throughout all of your
15074
24:15:39,600 --> 24:15:45,520
machine learning career. But it's time now to move on. We're going to move on to creating
15075
24:15:45,520 --> 24:15:53,920
another model, which is tiny VGG, with data augmentation this time. So if we go back to the slide,
15076
24:15:54,480 --> 24:15:59,200
data augmentation is one way of dealing with overfitting. Now, it's probably not the most
15077
24:15:59,200 --> 24:16:04,160
ideal experiment that we could take because our model zero, our baseline model, looks like it's
15078
24:16:04,160 --> 24:16:11,520
underfitting. But data augmentation, as we've seen before, is a way of manipulating images
15079
24:16:11,520 --> 24:16:17,520
to artificially increase the diversity of your training data set without collecting more data.
15080
24:16:18,160 --> 24:16:23,440
So we could take our photos of pizza, sushi, and steak and randomly rotate them 30 degrees
15081
24:16:24,080 --> 24:16:29,920
and increase diversity, which forces a model to learn, or hopefully learn, more generalizable
15082
24:16:29,920 --> 24:16:36,640
patterns. Again, all of these come with a caveat of not always being the silver bullet. Now,
15083
24:16:36,640 --> 24:16:40,800
I should have spelled generalizable here rather than generalization, but similar thing.
15084
24:16:42,000 --> 24:16:46,400
Let's go here. To start off with, we'll just write down.
15085
24:16:48,160 --> 24:16:55,040
Now let's try another modeling experiment. So this is in line with our PyTorch workflow,
15086
24:16:55,040 --> 24:17:01,040
trying a model and trying another one and trying another one, and so on. This time,
15087
24:17:01,760 --> 24:17:10,160
using the same model as before, but with some slight data augmentation.
15088
24:17:10,880 --> 24:17:15,040
Oh, maybe not slight. That's probably not the best word. We'll just say with some data
15089
24:17:15,040 --> 24:17:20,720
augmentation. And if we come down here, we're going to write section 9.1. We need to first
15090
24:17:20,720 --> 24:17:28,320
create a transform with data augmentation. So we've seen what this looks like before.
15091
24:17:28,320 --> 24:17:34,400
We're going to use the trivial augment data augmentation, create training transform,
15092
24:17:34,400 --> 24:17:40,800
which is, as we saw in a previous video, what PyTorch the PyTorch team have recently used
15093
24:17:40,800 --> 24:17:46,080
to train their state-of-the-art computer vision models. So train transform trivial.
15094
24:17:46,080 --> 24:17:53,920
This is what I'm going to call my transform. And I'm just going to from Torch Vision import
15095
24:17:53,920 --> 24:17:58,400
transforms. We've done this before. We don't have to re-import it, but I'm going to do it anyway,
15096
24:17:58,400 --> 24:18:03,280
just to show you that we're re-importing or we're using transforms. And we're going to compose
15097
24:18:03,280 --> 24:18:09,600
a transform here. Recall that transforms help us manipulate our data. So we're going to transform
15098
24:18:09,600 --> 24:18:15,760
our images into size 64 by 64. Then we're going to set up a trivial augment transforms,
15099
24:18:15,760 --> 24:18:24,400
just like we did before, trivial augment wide. And we're going to set the number of magnitude
15100
24:18:24,400 --> 24:18:31,120
bins here to be 31, which is the default here, which means we'll randomly use some data augmentation
15101
24:18:31,120 --> 24:18:38,080
on each one of our images. And it will be applied at a magnitude of 0 to 31, also randomly selected.
15102
24:18:38,640 --> 24:18:44,720
So if we lower this to five, the upper bound of intensity of how much that data augmentation is
15103
24:18:44,720 --> 24:18:51,760
applied to a certain image will be less than if we set it to say 31. Now, our final transform
15104
24:18:51,760 --> 24:18:56,160
here is going to be to tensor, because we want our images in tensor format for our model.
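(As a sketch, the augmented training transform just described looks roughly like this; 31 is TrivialAugmentWide's default number of magnitude bins.)

from torchvision import transforms

# Training transform: resize, randomly augment, then convert to a tensor
train_transform_trivial = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31),  # random augmentation per image
    transforms.ToTensor()
])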
15105
24:18:57,120 --> 24:19:03,200
And then I'm going to create a test transform. I'm going to call this simple, which is just going
15106
24:19:03,200 --> 24:19:09,360
to be transforms dot compose. And all that it's going to have, oh, I should put a list here,
15107
24:19:09,360 --> 24:19:17,200
all that it's going to have, we'll just make some space over here, is going to be transforms.
15108
24:19:17,200 --> 24:19:27,600
All we want to do is resize the image size equals 64 64. Now we don't apply data augmentation
15109
24:19:27,600 --> 24:19:33,200
to the test data set, because we only just want to evaluate our models on the test data set.
15110
24:19:33,200 --> 24:19:37,840
Our models aren't going to be learning any generalizable patterns on the test data set,
15111
24:19:37,840 --> 24:19:43,200
which is why we focus our data augmentations on the training data set. And I've just readjusted
15112
24:19:43,200 --> 24:19:51,040
that. I don't want to do that. Beautiful. So we've got a transform ready. Now let's load some data
15113
24:19:51,040 --> 24:20:01,360
using those transforms. So we'll create train and test data sets and data loaders
15114
24:20:01,360 --> 24:20:12,480
with data augmentation. So we've done this before. You might want to try it out on your own. So
15115
24:20:12,480 --> 24:20:18,640
pause the video if you'd like to test it out. Create a data set and a data loader using these
15116
24:20:18,640 --> 24:20:25,360
transforms here. And recall that our data set is going to be creating a data set from pizza,
15117
24:20:25,360 --> 24:20:31,760
steak and sushi for the train and test folders. And that our data loader is going to be batchifying
15118
24:20:31,760 --> 24:20:41,520
our data set. So let's turn our image folders into data sets. Data sets, beautiful. And I'm going
15119
24:20:41,520 --> 24:20:48,000
to write here train data augmented just so we know that it's been augmented. We've got a few
15120
24:20:48,000 --> 24:20:52,960
similar variable names throughout this notebook. So I just want to be as clear as possible. And
15121
24:20:52,960 --> 24:20:59,680
I'm going to use, I'll just re import torch vision data sets. We've seen this before, the image
15122
24:20:59,680 --> 24:21:05,280
folder. So rather than use our own custom class, we're going to use the existing image folder
15123
24:21:05,280 --> 24:21:12,320
class that's within torch vision data sets. And we have to pass in here a root. So I'll just get
15124
24:21:12,320 --> 24:21:19,920
the doc string there, root, which is going to be equal to our train dir, which recall is the path
15125
24:21:19,920 --> 24:21:28,000
to our training directory. Got that saved. And then I'm going to pass in here, the transform is going
15126
24:21:28,000 --> 24:21:35,920
to be train transform trivial. So our training data is going to be augmented. Thanks to this
15127
24:21:35,920 --> 24:21:41,520
transform here, and transforms trivial augment wide. You know where you can find more about
15128
24:21:41,520 --> 24:21:47,200
trivial augment wide, of course, in the PyTorch documentation, or just searching transforms
15129
24:21:47,200 --> 24:21:59,600
trivial augment wide. And did I spell this wrong? trivial. Oh, train train transform. I spelled
15130
24:21:59,600 --> 24:22:06,640
that wrong. Of course I did. So test data, let's create this as test data simple, equals data sets
15131
24:22:06,640 --> 24:22:12,640
dot image folder. And the root dir is going to be the test directory here. And the transform is
15132
24:22:12,640 --> 24:22:24,240
just going to be what the test transform simple. Beautiful. So now let's turn these data sets
15133
24:22:24,240 --> 24:22:35,280
into data loaders. So turn our data sets into data loaders. We're going to import os,
15134
24:22:36,080 --> 24:22:41,200
I'm going to set the batch size here to equal to 32. The number of workers that are going to
15135
24:22:41,200 --> 24:22:46,320
load our data loaders, I'm going to set this to os dot CPU count. So there'll be one worker
15136
24:22:46,320 --> 24:22:55,200
per CPU on our machine. I'm going to set here the torch manual seed to 42, because we're going to
15137
24:22:55,200 --> 24:23:03,840
shuffle our training data. Train data loader, I'm going to call this augmented equals data loader.
15138
24:23:03,840 --> 24:23:08,480
Now, I don't need to re import this, but I just want to show you again from
15139
24:23:08,480 --> 24:23:15,600
torch dot utils dot data (you can never have enough practice, right?). Let's import data loader.
15140
24:23:16,320 --> 24:23:22,960
So that's where we got the data loader class from. Now let's go train data augmented. We'll
15141
24:23:22,960 --> 24:23:28,000
pass in that as the data set. And I'll just put in here the parameter name for completeness.
15142
24:23:28,640 --> 24:23:33,760
That's our data set. And then we want to set the batch size, which is equal to batch size.
15143
24:23:33,760 --> 24:23:42,960
I'm going to set shuffle equal to true. And I'm going to set num workers equal to num workers.
15144
24:23:45,040 --> 24:23:52,080
Beautiful. And now let's do that again with the test data loader. This time, test data
15145
24:23:52,080 --> 24:23:56,000
loader. I'm going to call this test data loader simple. We're not using any data
15146
24:23:56,000 --> 24:24:01,760
augmentation on the test data set, just turning our images, our test images, into tensors.
15147
24:24:01,760 --> 24:24:08,480
The data set here is going to be test data simple. Going to pass in the batch size equal to batch
15148
24:24:08,480 --> 24:24:14,320
size. So both our data loaders will have a batch size of 32. Going to keep shuffle on false.
15149
24:24:14,320 --> 24:24:26,480
And num workers, I'm going to set to num workers. Look at us go. We've already got a data set
15150
24:24:26,480 --> 24:24:32,640
and a data loader. This time, our data loader is going to be augmented for the training data set.
15151
24:24:32,640 --> 24:24:36,480
And it's going to be nice and simple for the test data set. So this is really similar,
15152
24:24:36,480 --> 24:24:41,440
this data loader to the previous one we made. The only difference in this modeling experiment
15153
24:24:41,440 --> 24:24:46,960
is that we're going to be adding data augmentation, namely trivial augment wide.
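(Putting that together, a sketch of the Datasets and DataLoaders for this experiment might look like this. train_dir, test_dir and the two transforms come from earlier in the notebook, so their exact names are assumptions.)

import os
import torch
from torchvision import datasets
from torch.utils.data import DataLoader

# Turn the image folders into Datasets
train_data_augmented = datasets.ImageFolder(root=train_dir,
                                            transform=train_transform_trivial)
test_data_simple = datasets.ImageFolder(root=test_dir,
                                        transform=test_transform_simple)

# Turn the Datasets into batched DataLoaders
BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()
torch.manual_seed(42)

train_dataloader_augmented = DataLoader(dataset=train_data_augmented,
                                        batch_size=BATCH_SIZE,
                                        shuffle=True,
                                        num_workers=NUM_WORKERS)
test_dataloader_simple = DataLoader(dataset=test_data_simple,
                                    batch_size=BATCH_SIZE,
                                    shuffle=False,
                                    num_workers=NUM_WORKERS)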
15154
24:24:47,680 --> 24:24:52,480
So with that being said, we've got a data set, we've got a data loader. In the next video,
15155
24:24:52,480 --> 24:24:58,240
let's construct and train model one. In fact, you might want to give that a go. So you can use
15156
24:24:58,240 --> 24:25:05,920
our tiny VGG class to make model one. And then you can use our train function to train a new
15157
24:25:05,920 --> 24:25:12,880
tiny VGG instance with our training data loader augmented and our test data loader simple.
15158
24:25:13,520 --> 24:25:17,600
So give that a go and we'll do it together in the next video. I'll see you there.
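(If you want to attempt it before the next video, a sketch along the lines described might be the following; again, the class, function and variable names follow this walkthrough and are assumptions.)

import torch
from torch import nn
from timeit import default_timer as timer

torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Same TinyVGG architecture, new instance, this time trained on augmented data
model_1 = TinyVGG(input_shape=3,
                  hidden_units=10,
                  output_shape=len(train_data_augmented.classes)).to(device)

NUM_EPOCHS = 5
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_1.parameters(), lr=0.001)

start_time = timer()
model_1_results = train(model=model_1,
                        train_dataloader=train_dataloader_augmented,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS,
                        device=device)
end_time = timer()
print(f"Total training time: {end_time - start_time:.3f} seconds")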
15159
24:25:17,600 --> 24:25:26,160
Now that we've got our data sets and data loaders with data augmentation ready,
15160
24:25:26,160 --> 24:25:34,080
let's now create another model. So 9.3, we're going to construct and train model one.
15161
24:25:34,880 --> 24:25:40,240
And this time, I'm just going to write what we're going to be doing.
15162
24:25:40,240 --> 24:25:46,000
This time, we'll be using the same model architecture, but we're changing the data here.
15163
24:25:46,000 --> 24:25:56,720
Except this time, we've augmented the training data. So we'd like to see how this performs
15164
24:25:56,720 --> 24:26:03,200
compared to a model with no data augmentation. So that was our baseline up here. And that's what
15165
24:26:03,200 --> 24:26:08,160
you'll generally do with your experiments. You'll start as simple as possible and introduce
15166
24:26:08,160 --> 24:26:14,000
complexity when required. So create model one and send it to the target device, that is,
15167
24:26:14,000 --> 24:26:21,120
to the target device. And because of our helpful selves previously, we can create a manual seed
15168
24:26:21,120 --> 24:26:28,320
here, torch.manual_seed. And we can create model one, leveraging the class that we created before.
15169
24:26:29,760 --> 24:26:36,720
So although we built tiny VGG from scratch in this video, in this section, sorry, in subsequent
15170
24:26:37,600 --> 24:26:42,000
coding sessions, because we've built it from scratch once and we know that it works, we can
15171
24:26:42,000 --> 24:26:48,800
just recreate it by calling the class and passing in different variables here. So let's get the
15172
24:26:48,800 --> 24:26:54,480
number of classes that we have in our train data augmented classes. And we're going to send it
15173
24:26:54,480 --> 24:27:03,280
to device. And then if we inspect model one, let's have a look. Wonderful. Now let's keep going.
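A minimal sketch of the model creation step, assuming the TinyVGG class built earlier in the section takes input_shape, hidden_units and output_shape arguments:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(42)
model_1 = TinyVGG(
    input_shape=3,                                  # number of colour channels
    hidden_units=10,                                # same as the baseline model (assumed)
    output_shape=len(train_data_augmented.classes)  # one output unit per class
).to(device)
model_1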
15174
24:27:03,280 --> 24:27:08,160
We can also leverage our training function that we did. You might have tried this before.
15175
24:27:08,160 --> 24:27:17,040
So let's now train our model. I'm just going to put that here. Wonderful. Now we've got a model and
15176
24:27:17,040 --> 24:27:24,960
data loaders. Let's create what do we have to do? We have to create a loss function and an optimizer
15177
24:27:24,960 --> 24:27:37,360
and call upon our train function that we created earlier to train and evaluate our model. Beautiful.
15178
24:27:37,360 --> 24:27:48,720
So I'm going to set the random seeds, torch dot manual seeds, and torch dot CUDA, because we're
15179
24:27:48,720 --> 24:27:54,320
going to be using CUDA. Let's set the manual seed here 42. I'm going to set the number of epochs.
15180
24:27:54,320 --> 24:28:01,520
We're going to keep many of the parameters the same. Set the number of epochs, num epochs equals
15181
24:28:01,520 --> 24:28:06,880
five. We could of course train this model for longer if we really wanted to by increasing the
15182
24:28:06,880 --> 24:28:16,400
number of epochs. But now let's set up the loss function. So loss FN equals NN cross entropy loss.
15183
24:28:16,960 --> 24:28:21,920
Don't forget, this just came to mind: the loss function in PyTorch is often also called the
15184
24:28:21,920 --> 24:28:27,200
criterion. So the criterion you're trying to reduce. But I just like to call it loss function.
15185
24:28:28,720 --> 24:28:34,240
And then we're going to have optimizer. Let's use the same optimizer we use before torch dot
15186
24:28:34,240 --> 24:28:41,840
optim dot Adam. Recall SGD and Adam are two of the most popular optimizers. So model one dot
15187
24:28:41,840 --> 24:28:46,640
parameters. Then the parameters we're going to optimize. We're going to set the learning rate to
15188
24:28:46,640 --> 24:28:54,000
0.001, which is the default for the Adam optimizer in PyTorch. Then we're going to start
15189
24:28:54,000 --> 24:29:02,640
the timer. So from timeit, let's import the default timer as timer. And we'll go start time
15190
24:29:02,640 --> 24:29:12,320
equals timer. And then let's go train model one. How can we do this? Well, we're going to get a
15191
24:29:12,320 --> 24:29:17,920
results dictionary as model one results. We're going to call upon our train function. Inside our
15192
24:29:17,920 --> 24:29:23,920
train function, we'll pass the model parameter as model one. For the train data loader parameter,
15193
24:29:23,920 --> 24:29:29,520
we're going to pass in train data loader augmented. So our augmented training data loader.
15194
24:29:29,520 --> 24:29:39,520
And for the test data loader, we can pass in here test data loader. Simple. Then we can write our
15195
24:29:39,520 --> 24:29:45,520
optimizer, which will be the Adam optimizer. Our loss function is going to be nn cross entropy
15196
24:29:45,520 --> 24:29:50,480
loss, what we've created above. And then we can set the number of epochs is going to be equal to
15197
24:29:50,480 --> 24:29:56,240
num epochs. And then if we really wanted to, we could set the device equal to device, which will
15198
24:29:56,240 --> 24:30:03,600
be our target device. And now let's end the timer and print out how long it took.
15199
24:30:06,240 --> 24:30:18,960
End time equals timer. And we'll go print total training time for model one is going to be
15200
24:30:18,960 --> 24:30:28,960
end time minus start time. And oh, it would help if I could spell, we'll get that to three decimal
15201
24:30:28,960 --> 24:30:35,120
places. And that'll be seconds. So are you ready? Look how quickly we built a training pipeline
15202
24:30:35,120 --> 24:30:41,600
for model one. And look how easily we created it, thanks to coding all of that stuff up
15203
24:30:41,600 --> 24:30:50,000
before. Let's train our second model, our first model using data augmentation. You're ready? Three,
15204
24:30:50,000 --> 24:30:56,640
two, one, let's go. No errors. Beautiful. We're going nice and quick here.
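A minimal sketch of the training setup just described, assuming the train() helper function written earlier in the section returns a dictionary of results:

import torch
import torch.nn as nn
from timeit import default_timer as timer

torch.manual_seed(42)
torch.cuda.manual_seed(42)

NUM_EPOCHS = 5
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_1.parameters(), lr=0.001)  # 0.001 is Adam's default lr

start_time = timer()
model_1_results = train(model=model_1,                              # train() is the helper built earlier (assumed signature)
                        train_dataloader=train_dataloader_augmented,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS)
end_time = timer()
print(f"Total training time for model_1: {end_time - start_time:.3f} seconds")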
15205
24:30:59,440 --> 24:31:05,520
So, oh, about just over seven seconds. So what GPU do I have currently?
15206
24:31:05,520 --> 24:31:11,520
Just keep this in mind that I'm using Google Colab Pro. So I get preference in terms of
15207
24:31:11,520 --> 24:31:16,960
allocating a faster GPU. Your model training time may be longer than what I've got, depending on the
15208
24:31:16,960 --> 24:31:22,720
GPU. It also may be faster, again, depending on the GPU. But we get about seven seconds, but it looks
15209
24:31:22,720 --> 24:31:29,440
like our model with data augmentation didn't perform as well as our model without data augmentation.
15210
24:31:29,440 --> 24:31:37,600
Hmm. So how long did our model before, without data augmentation, take to train? Oh, just over seven
15211
24:31:37,600 --> 24:31:43,440
seconds as well. But we got better results in terms of accuracy on the training and test data sets
15212
24:31:43,440 --> 24:31:48,880
for model zero. So maybe data augmentation doesn't help in our case. And we kind of hinted at that
15213
24:31:48,880 --> 24:31:56,320
because the loss here was already going down. We weren't really overfitting yet. So recall that data
15214
24:31:56,320 --> 24:32:03,280
augmentation is a way to help with overfitting generally. So maybe that wasn't the best step to
15215
24:32:03,280 --> 24:32:09,360
try and improve our model. But let's nonetheless keep evaluating our model. In the next video,
15216
24:32:09,360 --> 24:32:14,320
we're going to plot the loss curves of model one. So in fact, you might want to give that a go.
15217
24:32:14,960 --> 24:32:20,960
So we've got a function plot loss curves, and we've got some results in a dictionary format.
15218
24:32:20,960 --> 24:32:27,200
So try that out, plot the loss curves, and see what you see. Let's do it together in the next video.
15219
24:32:28,000 --> 24:32:28,640
I'll see you there.
15220
24:32:31,920 --> 24:32:36,720
In the last video, we did the really exciting thing of training our first model with data
15221
24:32:36,720 --> 24:32:43,680
augmentation. But we also saw that, quantitatively, it looks like it didn't give us much improvement.
15222
24:32:43,680 --> 24:32:48,320
So let's keep evaluating our model here. I'm going to make a section. Recall that one of my
15223
24:32:48,320 --> 24:32:52,960
favorite ways or one of the best ways, not just my favorite, to evaluate the performance of a
15224
24:32:52,960 --> 24:33:03,040
model over time is to plot the loss curves. So a loss curve helps you evaluate your model's performance
15225
24:33:04,000 --> 24:33:10,880
over time. And it will also give you a great visual representation or a visual way to see if
15226
24:33:10,880 --> 24:33:18,880
your model is underfitting or overfitting. So let's plot the loss curves of model one results and see
15227
24:33:18,880 --> 24:33:24,640
what happens. We're using this function we created before. And oh my goodness, is that going in the
15228
24:33:24,640 --> 24:33:32,400
right direction? It looks like our test loss is going up here. Now, is that where we want it to go?
15229
24:33:33,040 --> 24:33:38,320
Remember the ideal direction for a loss curve is to go down over time because loss is measuring
15230
24:33:38,320 --> 24:33:44,160
what? It's measuring how wrong our model is. And the accuracy curve looks like it's all over the
15231
24:33:44,160 --> 24:33:49,280
place as well. I mean, it's going up kind of, but maybe we don't have enough time to measure these
15232
24:33:49,280 --> 24:33:56,160
things. So an experiment that you could do is train both of our models model zero and model one
15233
24:33:56,720 --> 24:34:02,640
for more epochs and see if these loss curves flatten out. So I'll pose you the question,
15234
24:34:02,640 --> 24:34:09,440
is our model underfitting or overfitting right now or both? So if we want to have a look at the
15235
24:34:09,440 --> 24:34:15,280
loss curves, the 'just right' curve for the loss, that is, this is not for accuracy, this is for loss over
15236
24:34:15,280 --> 24:34:23,520
time, we want it to go down. So for me, our model is underfitting because our loss could be lower,
15237
24:34:23,520 --> 24:34:28,960
but it also looks like it's overfitting as well. So it's not doing a very good job because our test
15238
24:34:28,960 --> 24:34:38,080
loss is far higher than our training loss. So if we go back to section four of the LearnPyTorch.io
15239
24:34:38,080 --> 24:34:43,280
book, what should an ideal loss curve look like? I'd like you to start thinking of some ways
15240
24:34:43,840 --> 24:34:49,920
that we could deal with overfitting of our model. So could we get more data? Could we simplify it?
15241
24:34:49,920 --> 24:34:53,360
Could we use transfer learning? We're going to see that later on, but you might want to jump
15242
24:34:53,360 --> 24:34:58,320
ahead and have a look. And if we're dealing with underfitting, what are some other things that we
15243
24:34:58,320 --> 24:35:03,280
could try with our model? Could we add some more layers, potentially another convolutional block?
15244
24:35:03,280 --> 24:35:08,000
Could we increase the number of hidden units per layer? So if we've got currently 10 hidden units
15245
24:35:08,000 --> 24:35:13,040
per layer, maybe you want to increase that to 64 or something like that? Could we train it for
15246
24:35:13,040 --> 24:35:17,120
longer? That's probably one of the easiest things to try with our current training functions. We
15247
24:35:17,120 --> 24:35:24,720
could train for 20 epochs. So have a go at this, reference this, try out some experiments, and
15248
24:35:24,720 --> 24:35:30,960
see if you can get these loss curves more towards the ideal shape. And in the next video, we're going
15249
24:35:30,960 --> 24:35:35,920
to keep pushing forward. We're going to compare our model results. So we've done two experiments.
15250
24:35:35,920 --> 24:35:39,760
Let's now see them side by side. We've looked at our model results individually,
15251
24:35:39,760 --> 24:35:44,480
and we know that they could be improved. But a good way to compare all of your experiments
15252
24:35:44,480 --> 24:35:49,520
is to compare your model's results side by side. So that's what we're going to do in the next video.
15253
24:35:49,520 --> 24:35:59,120
I'll see you there. Now that we've compared our models loss curves on their own individually,
15254
24:35:59,120 --> 24:36:05,840
how about we compare our model results to each other? So let's have a look at comparing our model
15255
24:36:05,840 --> 24:36:13,520
results. And so I'm going to write a little note here that after evaluating our modeling
15256
24:36:13,520 --> 24:36:24,080
experiments on their own, it's important to compare them to each other. And there's a few
15257
24:36:24,080 --> 24:36:31,040
different ways to do this. Number one is hard coding.
15258
24:36:32,000 --> 24:36:37,600
So like we've done, we've written functions, we've written helper functions and whatnot,
15259
24:36:37,600 --> 24:36:42,080
and manually plotted things. So I'm just going to write in here, this is what we're doing.
15260
24:36:42,080 --> 24:36:48,880
Then, of course, there are tools to do this, such as PyTorch plus TensorBoard. So I'll link to this,
15261
24:36:48,880 --> 24:36:54,720
PyTorch TensorBoard. We're going to see this in a later section of the course. TensorBoard is a
15262
24:36:54,720 --> 24:36:59,760
great resource for tracking your experiments. If you'd like to jump forward and have a look at what
15263
24:36:59,760 --> 24:37:05,040
that is in the PyTorch documentation, I'd encourage you to do so. Then another one of my favorite
15264
24:37:05,040 --> 24:37:13,840
tools is weights and biases. So these are all going to involve some code as well, but they help out
15265
24:37:13,840 --> 24:37:19,840
with automatically tracking different experiments. So weights and biases is one of my favorite,
15266
24:37:20,400 --> 24:37:25,040
and you've got a platform for experiments. That's what you'll be looking at. So if you run multiple
15267
24:37:25,040 --> 24:37:30,720
experiments, you can set up weights and biases pretty easily to track your different model hyper
15268
24:37:30,720 --> 24:37:37,040
parameters. So PyTorch, there we go. Import weights and biases, start a new run on weights and biases.
15269
24:37:37,760 --> 24:37:42,800
You can save the learning rate value and whatnot, go through your data and just log everything there.
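As a rough illustration only (it isn't used in this course), experiment tracking with Weights & Biases follows a pattern roughly like the one below; the project name and logged values are placeholders.

import wandb

# Start a run and record the hyperparameters (hypothetical project name)
wandb.init(project="pytorch-experiments",
           config={"learning_rate": 0.001, "epochs": 5, "batch_size": 32})

for epoch in range(5):
    # ... training and evaluation code would go here ...
    train_loss, test_loss = 0.0, 0.0  # placeholder values
    wandb.log({"epoch": epoch, "train_loss": train_loss, "test_loss": test_loss})

wandb.finish()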
15270
24:37:43,360 --> 24:37:48,240
So this is not a course about different tools. We're going to focus on just pure PyTorch,
15271
24:37:48,240 --> 24:37:51,360
but I thought I'd leave these here anyway, because you're going to come across them
15272
24:37:51,360 --> 24:37:58,000
eventually, and MLflow is another one of my favorites as well. It has experiment tracking,
15273
24:37:58,000 --> 24:38:03,200
projects, models, a model registry, all that sort of stuff. If you'd like to look into
15274
24:38:03,200 --> 24:38:08,400
more ways to track your experiments, there are some extensions. But for now, we're going to stick
15275
24:38:08,400 --> 24:38:13,840
with hard coding. We're just going to do it as simple as possible to begin with. And if we wanted
15276
24:38:13,840 --> 24:38:20,080
to add other tools later on, we can sure do that. So let's create a data frame for each of our model
15277
24:38:20,080 --> 24:38:26,480
results. We can do this because our model results recall are in the form of dictionaries. So model
15278
24:38:26,480 --> 24:38:33,200
zero results. But you can see what we're doing now by hard coding this, it's quite cumbersome.
15279
24:38:33,200 --> 24:38:38,720
Can you imagine if we had say 10 models or even just five models, we'd have to really
15280
24:38:38,720 --> 24:38:45,040
write a fair bit of code here for all of our dictionaries and whatnot, whereas these tools
15281
24:38:45,040 --> 24:38:51,520
here help you to track everything automatically. So we've got a data frame here. Model zero results
15282
24:38:51,520 --> 24:38:56,720
over time. These are our number of epochs. We can notice that the training loss starts to go down.
15283
24:38:56,720 --> 24:39:02,240
The testing loss also starts to go down. And the accuracy on the training and test data set starts
15284
24:39:02,240 --> 24:39:06,800
to go up. Now, those are the trends that we're looking for. So an experiment you could try would
15285
24:39:06,800 --> 24:39:12,400
be to train this model zero for longer to see if it improved. But we're currently just interested
15286
24:39:12,400 --> 24:39:18,800
in comparing results. So let's set up a plot. I want to plot model zero results and model one
15287
24:39:18,800 --> 24:39:24,880
results on the same plot. So we'll need a plot for training loss. We'll need a plot for training
15288
24:39:24,880 --> 24:39:31,360
accuracy, test loss and test accuracy. And then we want two separate lines on each of them. One
15289
24:39:31,360 --> 24:39:37,040
for model zero and one for model one. And this particular pattern would be similar regardless if
15290
24:39:37,040 --> 24:39:42,640
we had 10 different experiments, or if we had 10 different metrics we wanted to compare,
15291
24:39:42,640 --> 24:39:47,040
you generally want to plot them all against each other to make them visual. And that's what tools
15292
24:39:47,040 --> 24:39:52,800
such as Weights & Biases, TensorBoard, and MLflow can help you to do. I'm just
15293
24:39:52,800 --> 24:39:59,920
going to get out of that, clean up our browser. So let's set up a plot here. I'm going to use
15294
24:39:59,920 --> 24:40:04,320
matplotlib. I'm going to put in a figure. I'm going to make it quite large because we want four
15295
24:40:04,320 --> 24:40:10,000
subplots, one for each of the metrics we want to compare across our different models. Now,
15296
24:40:10,000 --> 24:40:18,560
let's get number of epochs. So epochs is going to be length, or we'll turn it into a range, actually,
15297
24:40:19,360 --> 24:40:28,720
range of Len model zero DF. So that's going to give us five. Beautiful range between zero and five.
15298
24:40:29,280 --> 24:40:36,560
Now, let's create a plot for the train loss. We want to compare the train loss across model zero
15299
24:40:36,560 --> 24:40:45,440
and the train loss across model one. So we can go PLT dot subplot. Let's create a plot with two
15300
24:40:45,440 --> 24:40:50,240
rows and two columns. And this is going to be index number one will be the training loss.
15301
24:40:50,240 --> 24:40:57,280
We'll go PLT dot plot. I'm going to put in here epochs and then model zero DF. Inside here,
15302
24:40:57,280 --> 24:41:04,400
I'm going to put train loss for our first metric. And then I'm going to label it with model zero.
15303
24:41:04,400 --> 24:41:12,000
So we're comparing the train loss on each of our modeling experiments. Recall that model zero was
15304
24:41:12,000 --> 24:41:19,280
our baseline model. And that was tiny VGG without data augmentation. And then we tried out model one,
15305
24:41:19,280 --> 24:41:25,760
which was the same model. But all we did was we added a data augmentation transform to our training
15306
24:41:25,760 --> 24:41:35,360
data. So PLT will go x label. They both used the same test data set and PLT dot legend. Let's see
15307
24:41:35,360 --> 24:41:42,560
what this looks like. Wonderful. So there's our training loss across two different models.
15308
24:41:43,520 --> 24:41:49,200
So we notice that model zero is trending in the right way. Model one kind of exploded on epoch
15309
24:41:49,200 --> 24:41:56,080
number that would be zero, one, two, or one, depending how you're counting. Let's just say epoch number
15310
24:41:56,080 --> 24:42:00,720
two, because that's easier. The loss went up. But then it started to go back down. So again,
15311
24:42:00,720 --> 24:42:05,360
if we continued training these models, we might notice that the overall trend of the loss is
15312
24:42:05,360 --> 24:42:11,440
going down on the training data set, which is exactly what we'd like. So let's now plot,
15313
24:42:12,080 --> 24:42:18,160
we'll go the test loss. So I'm going to go test loss here. And then I'm going to change this.
15314
24:42:18,160 --> 24:42:25,520
I believe if I hold control, or command, maybe, nope, or option on my Mac keyboard,
15315
24:42:25,520 --> 24:42:29,680
yeah, so it might be a different key on Windows. But for me, I can press option and I can get a
15316
24:42:29,680 --> 24:42:35,520
multi cursor here. So I'm just going to come back in here. And that way I can backspace there
15317
24:42:35,520 --> 24:42:42,320
and just turn this into test loss. Wonderful. So I'm going to put this as test loss as the title.
15318
24:42:42,320 --> 24:42:48,960
And I need to change the index. So this will be index one, index two, index three, index four.
15319
24:42:49,600 --> 24:42:55,360
Let's see what this looks like. Do we get the test loss? Beautiful. That's what we get.
15320
24:42:55,360 --> 24:43:00,000
However, we noticed that model one is probably overfitting at this stage. So maybe the data
15321
24:43:00,000 --> 24:43:05,920
augmentation wasn't the best change to make to our model. Recall that even if you make a change
15322
24:43:05,920 --> 24:43:11,120
to your model, such as preventing overfitting or underfitting, it won't always guarantee that
15323
24:43:11,120 --> 24:43:17,680
the change takes your model's evaluation metrics in the right direction. Ideally, loss is going
15324
24:43:17,680 --> 24:43:25,280
from top left to bottom right over time. So looks like model zero is winning out here at the moment
15325
24:43:25,280 --> 24:43:32,640
on the loss front. So now let's plot the accuracy for both training and test. So I'm going to change
15326
24:43:32,640 --> 24:43:38,960
this to train. I'm going to put this as accuracy. And this is going to be index number three on the
15327
24:43:38,960 --> 24:43:47,440
plot. And do we save it as, yeah, just acc? Wonderful. So I'm going to option click here on my Mac.
15328
24:43:48,080 --> 24:43:54,080
This is going to be train. And this is going to be accuracy here. And then I'll change this one to
15329
24:43:54,800 --> 24:44:00,560
accuracy. And then I'm going to change this to accuracy. And this is going to be plot number four,
15330
24:44:01,200 --> 24:44:07,680
two rows, two columns, index number four. And I'm going to option click here to have two cursors,
15331
24:44:07,680 --> 24:44:15,360
test acc. And then I'll change this to test acc. And I'm going to get rid of the legend here.
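A minimal sketch of the comparison code built across the last few steps, assuming each results dictionary has train_loss, train_acc, test_loss and test_acc keys:

import pandas as pd
import matplotlib.pyplot as plt

# Turn the results dictionaries into DataFrames
model_0_df = pd.DataFrame(model_0_results)
model_1_df = pd.DataFrame(model_1_results)

plt.figure(figsize=(15, 10))
epochs = range(len(model_0_df))

# One subplot per metric, one line per modelling experiment
for i, metric in enumerate(["train_loss", "test_loss", "train_acc", "test_acc"], start=1):
    plt.subplot(2, 2, i)
    plt.plot(epochs, model_0_df[metric], label="Model 0")
    plt.plot(epochs, model_1_df[metric], label="Model 1")
    plt.title(metric)
    plt.xlabel("Epochs")
    plt.legend()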
15332
24:44:15,360 --> 24:44:20,960
It takes a little bit to plot because we're doing four graphs in one hit. Wonderful. So that's
15333
24:44:20,960 --> 24:44:26,800
comparing our models. But do you see how we could potentially functionalize this to plot, however,
15334
24:44:26,800 --> 24:44:32,880
many model results that we have? But if we had say another five models, we did another five
15335
24:44:32,880 --> 24:44:37,360
experiments, which is actually not too many experiments on a problem, you might find that
15336
24:44:37,360 --> 24:44:41,840
sometimes you do over a dozen experiments for a single modeling problem, maybe even more.
15337
24:44:42,560 --> 24:44:47,280
These graphs can get pretty outlandish with all the little lines going through. So that's
15338
24:44:47,280 --> 24:44:54,400
again what tools like TensorBoard, weights and biases and MLflow will help with. But if we have
15339
24:44:54,400 --> 24:44:59,440
a look at the accuracy, it seems that both of our models are heading in the right direction.
15340
24:44:59,440 --> 24:45:05,360
We want to go from the bottom left up in the case of accuracy. But the test accuracy that's training,
15341
24:45:05,360 --> 24:45:10,000
oh, excuse me, is this not training accuracy? I messed up that. Did you catch that one?
15342
24:45:12,000 --> 24:45:16,880
So training accuracy, we're heading in the right direction, but it looks like model one is
15343
24:45:16,880 --> 24:45:20,960
yeah, still overfitting. So the results we're getting on the training data set
15344
24:45:20,960 --> 24:45:26,160
aren't coming over to the testing data set. And that's what we really want our models to shine
15345
24:45:26,160 --> 24:45:32,880
is on the test data set. So metrics on the training data set are good. But ideally,
15346
24:45:32,880 --> 24:45:37,840
we want our models to perform well on the test data set data it hasn't seen before.
15347
24:45:38,480 --> 24:45:42,560
So that's just something to keep in mind. Whenever you do a series of modeling experiments,
15348
24:45:42,560 --> 24:45:47,920
it's always good to not only evaluate them individually, but to evaluate them against each other.
15349
24:45:47,920 --> 24:45:52,320
So that way you can go back through your experiments, see what worked and what didn't.
15350
24:45:52,320 --> 24:45:56,320
If you were to ask me what I would do for both of these models, I would probably train them for
15351
24:45:56,320 --> 24:46:02,400
longer and maybe add some more hidden units to each of the layers and see where the results go from
15352
24:46:02,400 --> 24:46:08,560
there. So give that a shot. In the next video, let's see how we can use our trained models to
15353
24:46:08,560 --> 24:46:14,880
make a prediction on our own custom image of food. So yes, we used a custom data set of
15354
24:46:14,880 --> 24:46:21,040
pizza steak and sushi images. But what if we had our own, what if we finished this model training
15355
24:46:21,040 --> 24:46:25,840
and we decided, you know what, this is a good enough model. And then we deployed it to an app like
15356
24:46:25,840 --> 24:46:32,320
Nutrify dot app, which is a food recognition app that I'm personally working on. Then we wanted to
15357
24:46:32,320 --> 24:46:38,080
upload an image and have it be classified by our pytorch model. So let's give that a shot, see how
15358
24:46:38,080 --> 24:46:44,640
we can use our trained model to predict on an image that's not in our training data and not in our
15359
24:46:44,640 --> 24:46:54,560
testing data. I'll see you in the next video. Welcome back. In the last video, we compared our
15360
24:46:54,560 --> 24:47:00,640
modeling experiments. Now we're going to move on to one of the most exciting parts of deep learning.
15361
24:47:00,640 --> 24:47:13,280
And that is making a prediction on a custom image. So although we've trained a model on custom data,
15362
24:47:14,560 --> 24:47:23,200
how do you make a prediction on a sample, slash an image in our case, that's not in either
15363
24:47:23,200 --> 24:47:30,640
the training or testing data set. So let's say you were building a food recognition app,
15364
24:47:30,640 --> 24:47:35,360
such as Nutrify, take a photo of food and learn about it. You want to use computer vision to
15365
24:47:35,360 --> 24:47:41,280
essentially turn foods into QR codes. So I'll just show you the workflow here. If we were to upload
15366
24:47:41,280 --> 24:47:48,000
this image of my dad giving two thumbs up for a delicious pizza. And what does Nutrify predict it
15367
24:47:48,000 --> 24:47:54,160
as? Pizza. Beautiful. So macronutrients, you get some nutrition information, and then the time
15368
24:47:54,160 --> 24:48:00,400
taken. So we could replicate a similar process to this using our trained PyTorch model, albeit
15369
24:48:00,400 --> 24:48:05,360
it's not going to give too great results or performance, because we've seen that we could
15370
24:48:05,360 --> 24:48:11,360
improve our models, but based on the accuracy here and based on the loss and whatnot. But let's just
15371
24:48:11,360 --> 24:48:16,720
see what it's like, the workflow. So the first thing we're going to do is get a custom image.
15372
24:48:16,720 --> 24:48:23,440
Now we could upload one here, such as clicking the upload button in Google Colab, choosing an image
15373
24:48:23,440 --> 24:48:29,200
and then importing it like that. But I'm going to do so programmatically, as you've seen before.
15374
24:48:29,200 --> 24:48:35,360
So let's write some code in this video to download a custom image. I'm going to do so using requests
15375
24:48:36,320 --> 24:48:43,040
and like all good cooking shows, I've prepared a custom image for us. So custom image path. But
15376
24:48:43,040 --> 24:48:48,240
again, you could use this process that we're going to go through with any of your own images
15377
24:48:48,240 --> 24:48:54,000
of pizza, steak or sushi. And if you wanted to train your own model on another set of custom data,
15378
24:48:54,000 --> 24:49:00,160
the workflow will be quite similar. So I'm going to download a photo called pizza dad,
15379
24:49:00,880 --> 24:49:07,280
which is my dad, two big thumbs up. And so I'm going to download it from github. So this image is
15380
24:49:07,280 --> 24:49:12,800
on the course github. And let's write some code to download the image. If it doesn't already exist
15381
24:49:13,440 --> 24:49:20,720
in our Colab instance. So if you wanted to upload a single image, you could click with this button.
15382
24:49:20,720 --> 24:49:25,120
Just be aware that like all of our other data, it's going to disappear if Colab disconnects.
15383
24:49:25,120 --> 24:49:28,480
So that's why I like to write code. So we don't have to re upload it every time.
15384
24:49:28,480 --> 24:49:38,720
So if not custom image path dot is file, let's open a file. We're going to open up
15385
24:49:38,720 --> 24:49:46,640
the custom image path with write binary permissions as f, short for file. And then when downloading,
15386
24:49:47,360 --> 24:49:54,000
this is because our image is stored on github. When downloading an image or when downloading
15387
24:49:54,000 --> 24:50:00,960
from github in general, you typically need to use the raw file link.
15388
24:50:01,760 --> 24:50:08,320
So let's write a request here, equals requests dot get. So if we go to the pytorch deep learning
15389
24:50:08,320 --> 24:50:15,120
repo, then if we go into, I believe it might be extras, not in extras, it's going to be in images,
15390
24:50:15,120 --> 24:50:19,200
that would make a lot more sense, wouldn't it, Daniel? Let's get 04-pizza-dad.
15391
24:50:19,200 --> 24:50:26,560
So if we have a look, this is pytorch deep learning, images, 04-pizza-dad. There's a big version
15392
24:50:26,560 --> 24:50:32,160
of the image there. And then if we click download, just going to give us the raw link. Yeah, there we
15393
24:50:32,160 --> 24:50:36,880
go. So that's the image. Hey dad, how you doing? Is that pizza delicious? It looks like it.
15394
24:50:36,880 --> 24:50:43,920
Let's see if our model can get this right. What do you think? Will it? So of course, we want
15395
24:50:43,920 --> 24:50:50,400
our model to predict pizza for this image because it's got a pizza in it. So custom image path,
15396
24:50:51,040 --> 24:50:56,800
we're going to download that. I've just put in the raw URL above. So notice the raw
15397
24:50:57,360 --> 24:51:04,080
github user content. That's from the course github. Then I'm going to go f dot write. So file,
15398
24:51:05,040 --> 24:51:13,360
write the request content. So the content from the request, in other words, the raw file from
15399
24:51:13,360 --> 24:51:18,880
github here. A similar workflow applies if you were getting another image from somewhere else on
15400
24:51:18,880 --> 24:51:26,560
the internet and else if it is already downloaded, let's just not download it. So print f custom image
15401
24:51:26,560 --> 24:51:36,240
path already exists, skipping download. And let's see if this works and run the code. So downloading
15402
24:51:36,240 --> 24:51:46,080
data slash 04-pizza-dad dot jpeg. And if we go in here, we refresh. There we go. Beautiful. So our
15403
24:51:46,080 --> 24:51:52,640
data or our custom image, sorry, is now in our data folder. So if we click on this, this is inside
15404
24:51:52,640 --> 24:52:01,760
Google Colab now. Beautiful. We've got a nice big image there. And there's a nice big pizza there.
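A minimal sketch of the download step, assuming the data folder from earlier and a raw file link into the course GitHub repo (the exact URL below is an assumption):

import requests
from pathlib import Path

data_path = Path("data")
data_path.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
custom_image_path = data_path / "04-pizza-dad.jpeg"

if not custom_image_path.is_file():
    with open(custom_image_path, "wb") as f:
        # When downloading from GitHub, use the "raw" file link (assumed URL)
        request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/04-pizza-dad.jpeg")
        print(f"Downloading {custom_image_path}...")
        f.write(request.content)
else:
    print(f"{custom_image_path} already exists, skipping download.")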
15405
24:52:01,760 --> 24:52:07,680
So we're going to be writing some code over the next few videos to do the exact same process as
15406
24:52:07,680 --> 24:52:13,360
what we've been doing to import our custom data set for our custom image. What do we still have to
15407
24:52:13,360 --> 24:52:18,560
do? We still have to turn it into tensors. And then we have to pass it through our model. So let's see
15408
24:52:18,560 --> 24:52:28,000
what that looks like over the next few videos. We are up to one of the most exciting parts of
15409
24:52:28,000 --> 24:52:34,800
building dev learning models. And that is predicting on custom data in our case, a custom image of
15410
24:52:35,760 --> 24:52:40,640
a photo of my dad eating pizza. So of course, we're training a computer vision model on here on
15411
24:52:40,640 --> 24:52:45,840
pizza steak and sushi. So hopefully the ideal result for our model to predict on this image
15412
24:52:45,840 --> 24:52:53,200
will be pizza. So let's keep going. Let's figure out how we can get our image, our custom image,
15413
24:52:53,200 --> 24:52:59,760
our singular image into Tensor form, loading in a custom image with pytorch, creating another
15414
24:52:59,760 --> 24:53:06,160
section here. So I'm just going to write down here, we have to make sure our custom image is in the
15415
24:53:06,160 --> 24:53:18,240
same format as the data our model was trained on. So namely, that was in Tensor form with data type
15416
24:53:18,240 --> 24:53:29,120
torch float 32. And then of shape 64 by 64 by three. So we might need to change the shape of our
15417
24:53:29,120 --> 24:53:37,760
image. And then we need to make sure that it's on the right device. Command MM, beautiful. So let's
15418
24:53:37,760 --> 24:53:44,800
see what this looks like. Hey, so I'm going to import torchvision. Now, the package you use to
15419
24:53:44,800 --> 24:53:50,720
load your data will depend on the domain you're in. So let's open up the torch vision documentation.
15420
24:53:51,760 --> 24:53:56,240
We can go to models. That's okay. So if we're working with text, you might want to look in
15421
24:53:56,240 --> 24:54:01,760
here for some input and output functions, so some loading functions, torch audio, same thing.
15422
24:54:02,320 --> 24:54:07,120
Torch vision is what we're working with. Let's click into torch vision. Now we want to look into
15423
24:54:07,120 --> 24:54:12,480
reading and writing images and videos because we want to read in an image, right? We've got a
15424
24:54:12,480 --> 24:54:17,520
custom image. We want to read it in. So this is part of your extracurricular, by the way, to go
15425
24:54:17,520 --> 24:54:22,080
through these for at least 10 minutes each. So spend an hour if you're going through torch vision.
15426
24:54:22,080 --> 24:54:26,960
You could do the same across these other ones. It will just really help you familiarize yourself
15427
24:54:26,960 --> 24:54:33,120
with all the functions of PyTorch domain libraries. So we want to look here's some options for video.
15428
24:54:33,120 --> 24:54:38,480
We're not working with video. Here's some options for images. Now what do we want to do? We want
15429
24:54:38,480 --> 24:54:44,400
to read in an image. So we've got a few things here. Decode image. Oh, I've skipped over one.
15430
24:54:44,960 --> 24:54:51,520
We can write a JPEG if we wanted to. We can encode a PNG. Let's jump into this one. Read image.
15431
24:54:51,520 --> 24:54:58,560
What does it do? Read the JPEG or PNG into a three-dimensional RGB or grayscale tensor.
15432
24:54:58,560 --> 24:55:03,040
That is what we want. And then optionally converts the image to the desired format.
15433
24:55:03,040 --> 24:55:10,640
The values of the output tensor are uint8. Okay. Beautiful. So let's see what this looks like.
15434
24:55:10,640 --> 24:55:16,400
Okay. Mode. The read mode used optionally for converting the image. Let's see what we can do
15435
24:55:16,400 --> 24:55:25,920
with this. I'm going to copy this in. So I'll write this down. We can read an image into PyTorch using
15436
24:55:25,920 --> 24:55:34,560
and go with that. So let's see what this looks like in practice. Read in custom image. I can't
15437
24:55:34,560 --> 24:55:40,640
explain to you how much I love using deep learning models to predict on custom data. So custom image.
15438
24:55:41,200 --> 24:55:45,680
We're going to call it uint8 because, as we read from the documentation here,
15439
24:55:46,240 --> 24:55:52,560
it reads it in uint8 format. So let's have a look at what that looks like rather than
15440
24:55:52,560 --> 24:55:59,040
just talking about it. Torch vision.io. Read image. What's our target image path?
15441
24:55:59,760 --> 24:56:04,240
Well, we've got custom image path up here. This is why I like to do things programmatically.
15442
24:56:04,800 --> 24:56:08,800
So if our collab notebook reset, we could just run this cell again,
15443
24:56:08,800 --> 24:56:14,960
get our custom image, and then we've got it here. So custom image uint8. Let's see what this
15444
24:56:14,960 --> 24:56:23,840
looks like. Oh, what did we get wrong? Unable to cast Python instance. Oh, does it need to be a
15445
24:56:23,840 --> 24:56:30,720
string? Expected a value of type string but found PosixPath. So the path needs to be a
15446
24:56:30,720 --> 24:56:38,800
string. Okay. If we have a look at our custom image path, what did we get wrong? Oh, we've got a
15447
24:56:38,800 --> 24:56:46,400
POSIX path. So let's convert this custom image path into a string and see what happens. Look at that.
15448
24:56:47,520 --> 24:56:55,520
That's our image in integer form. I wonder if this is plottable. Let's go plt dot imshow custom image
15449
24:56:55,520 --> 24:57:00,880
uint8. Maybe we get a dimensionality problem here, invalid shape. Okay. Let's
15450
24:57:00,880 --> 24:57:11,120
some permute it, permute, and we'll go one, two, zero. Is this going to plot? It's a fairly big image.
15451
24:57:11,680 --> 24:57:18,640
There we go. Two thumbs up. Look at us. So that is the power of torchvision.io. IO stands for
15452
24:57:18,640 --> 24:57:24,160
input output. We were just able to read in our custom image. Now, how about we get some metadata
15453
24:57:24,160 --> 24:57:29,040
about this? Let's go. We'll print it up here, actually. I'll keep that there because that's
15454
24:57:29,040 --> 24:57:35,840
fun to plot it. Let's find the shape of our data, the data type. And yeah, we've got it in Tensor
15455
24:57:35,840 --> 24:57:41,360
format, but it's uint8 right now. So we might have to convert that to float 32. We want
15456
24:57:41,360 --> 24:57:46,720
to find out its shape. And we need to make sure that if we're predicting on a custom image,
15457
24:57:46,720 --> 24:57:52,080
the data that we're predicting on the custom image needs to be on the same device as our model.
15458
24:57:52,080 --> 24:58:00,240
So let's print out some info. Print. Let's go custom image Tensor. And this is going to be a new line.
15459
24:58:00,240 --> 24:58:08,160
And then we will go custom image uint8. Wonderful. And then let's go custom image
15460
24:58:08,160 --> 24:58:16,240
shape. We will get the shape parameter custom image shape or attribute. Sorry. And then we also
15461
24:58:16,240 --> 24:58:21,600
want to know the data type custom image data type. But we have a kind of an inkling because the
15462
24:58:21,600 --> 24:58:29,440
documentation said it would be uint8, uint8, and we'll go dtype. Let's have a look.
15463
24:58:30,160 --> 24:58:36,400
What do we have? So there's our image Tensor. And it's quite a big image. So custom image shape.
15464
24:58:36,880 --> 24:58:44,560
So what was our model trained on? Our model was trained on images of 64 by 64. So this image
15465
24:58:44,560 --> 24:58:49,600
encodes a lot more information than what our model was trained on. So we're going to have to
15466
24:58:49,600 --> 24:58:56,240
change that shape to pass it through our model. And then we've got an image data type here or
15467
24:58:56,240 --> 24:59:01,280
Tensor data type of torch uint8. So maybe that's going to cause some errors for us later on.
15468
24:59:01,280 --> 24:59:07,520
So if you want to go ahead and see if you can resize this Tensor to 64 64 using a torch transform
15469
24:59:07,520 --> 24:59:12,080
or torch vision transform, I'd encourage you to try that out. And if you know how to change a
15470
24:59:12,080 --> 24:59:18,640
torch tensor from uint8 to torch float 32, give that a shot as well. So let's try to
15471
24:59:18,640 --> 24:59:22,800
make a prediction on our image in the next video. I'll see you there.
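A minimal sketch of reading the image in and checking its metadata, following the torchvision.io documentation discussed above:

import torchvision
import matplotlib.pyplot as plt

# read_image expects a string path and returns a uint8 tensor of shape [colour_channels, height, width]
custom_image_uint8 = torchvision.io.read_image(str(custom_image_path))

print(f"Custom image tensor:\n{custom_image_uint8}")
print(f"Custom image shape: {custom_image_uint8.shape}")
print(f"Custom image dtype: {custom_image_uint8.dtype}")  # torch.uint8

# matplotlib expects [height, width, colour_channels], so permute before plotting
plt.imshow(custom_image_uint8.permute(1, 2, 0))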
15472
24:59:26,000 --> 24:59:31,440
In the last video, we loaded in our own custom image and got two big thumbs up from my dad,
15473
24:59:31,440 --> 24:59:36,720
and we turned it into a tensor. So we've got a custom image tensor here. It's quite big though,
15474
24:59:36,720 --> 24:59:40,560
and we looked at a few things of what we have to do before we pass it through our model.
15475
24:59:40,560 --> 24:59:46,720
So we need to make sure it's in the data type torch float 32, shape 64, 64, 3, and on the right
15476
24:59:46,720 --> 24:59:56,320
device. So let's make another section here. We'll go 11.2 and we'll call it making a prediction on a
15477
24:59:56,320 --> 25:00:04,240
custom image with a PyTorch model, with a trained PyTorch model. And albeit our models aren't
15478
25:00:04,240 --> 25:00:09,040
quite at the level we would like them to be yet, I think it's important just to see what it's like to
15479
25:00:09,040 --> 25:00:17,600
make a prediction end to end on some custom data, because that's the fun part, right? So try to make
15480
25:00:17,600 --> 25:00:22,800
a prediction on an image. Now, I want to just highlight something about the importance of different
15481
25:00:22,800 --> 25:00:28,240
data types and shapes and whatnot and devices, three of the biggest errors in deep learning.
15482
25:00:28,800 --> 25:00:36,240
Now let's see what happens if we try to predict on the uint8 format. So we'll go model one
15483
25:00:36,240 --> 25:00:44,320
dot eval and with torch dot inference mode. Let's make a prediction. We'll pass it through our model
15484
25:00:44,320 --> 25:00:49,680
one. We could use model zero if we wanted to here. They're both performing pretty poorly anyway.
15485
25:00:50,320 --> 25:00:56,080
Let's send it to the device and see what happens. Oh, no. What did we get wrong here?
15486
25:00:56,800 --> 25:01:04,080
Runtime error, input type. Ah, so we've got uint8. So this is one of our first errors
15487
25:01:04,080 --> 25:01:10,480
that we talked about. We need to make sure that our custom data is of the same data type that
15488
25:01:10,480 --> 25:01:17,040
our model was originally trained on. So we've got torch CUDA float tensor. So we've got an issue
15489
25:01:17,040 --> 25:01:24,480
here. We've got a uint8 image, or image tensor, trying to be predicted on by a model
15490
25:01:24,480 --> 25:01:33,920
with its data type of torch CUDA float tensor. So let's try to fix this by loading the custom image
15491
25:01:33,920 --> 25:01:42,720
and convert to torch dot float 32. So one of the ways we can do this is we'll just recreate the
15492
25:01:42,720 --> 25:01:49,520
custom image tensor. And I'm going to use torch vision dot IO dot read image. We don't have to
15493
25:01:49,520 --> 25:01:53,600
fully reload our image, but I'm going to do it anyway for completeness and a little bit of practice.
15494
25:01:54,400 --> 25:02:01,760
And then I'm going to set the type here with the type method to torch float 32. And then
15495
25:02:01,760 --> 25:02:10,880
let's just see what happens. We'll go custom image. Let's see what this looks like. I wonder if our
15496
25:02:10,880 --> 25:02:17,280
model will work on this. Let's just try again, we'll bring this up, copy this down to make a
15497
25:02:17,280 --> 25:02:24,960
prediction and custom image dot to device. Our image is in torch float 32 now. Let's see what
15498
25:02:24,960 --> 25:02:32,400
happens. Oh, we get an issue. Oh my goodness, that's a big matrix. Now I have a feeling that
15499
25:02:32,400 --> 25:02:39,040
that might be because our image, our custom image is of a shape that's far too large. Custom image
15500
25:02:39,040 --> 25:02:47,200
dot shape. What do we get? Oh my gosh, 4000 and 3,024. And do you notice as well that our values
15501
25:02:47,200 --> 25:02:54,320
here aren't between zero and one, whereas our previous images, do we have an image? There we go. The images
15502
25:02:54,320 --> 25:02:59,760
our model was trained on were between zero and one. So how could we get these values to be between
15503
25:02:59,760 --> 25:03:08,720
zero and one? Well, one of the ways to do so is by dividing by 255. Now, why would we divide by 255?
15504
25:03:09,840 --> 25:03:17,120
Well, because a standard image format is to store the image tensor values as values from
15505
25:03:17,120 --> 25:03:24,560
zero to 255 for red, green and blue color channels. So if we want to scale them, so this is what I
15506
25:03:24,560 --> 25:03:31,280
meant by zero to 255, if we wanted to scale these values to be between zero and one, we can divide
15507
25:03:31,280 --> 25:03:38,320
them by 255. Because that is the maximum value that they can be. So let's see what happens if we do
15508
25:03:38,320 --> 25:03:47,120
that. Okay, we get our image values between zero and one. Can we plot this image? So plt dot m
15509
25:03:47,120 --> 25:03:52,800
show, let's plot our custom image. We've got to permute it so it works nicely with matplotlib.
15510
25:03:53,760 --> 25:03:54,640
What do we get here?
15511
25:04:00,720 --> 25:04:05,680
Beautiful. We get the same image, right? But it's still quite big. Look at that. We've got a pixel
15512
25:04:05,680 --> 25:04:11,600
height of or image height of almost 4000 pixels and a width of over 3000 pixels. So we need to do
15513
25:04:11,600 --> 25:04:17,280
some adjustments further on. So let's keep going. We've got custom image to device. We've got an
15514
25:04:17,280 --> 25:04:23,200
error here. So this is a shape error. So what can we do to transform our image shape? And you
15515
25:04:23,200 --> 25:04:29,280
might have already tried this. Well, let's create a transform pipeline to transform our image shape.
15516
25:04:29,280 --> 25:04:37,760
So create transform pipeline or composition to resize the image. Because remember, what are we
15517
25:04:37,760 --> 25:04:42,560
trying to do? We're trying to get our model to predict on the same type of data it was trained on.
15518
25:04:42,560 --> 25:04:51,280
So let's go custom image transform is transforms dot compose. And we're just going to, since our
15519
25:04:51,280 --> 25:04:59,600
image is already of a tensor, let's do transforms dot resize, and we'll set the size to the same shape
15520
25:04:59,600 --> 25:05:06,560
that our model was trained on, or the same size that is. So let's go from torch vision. We don't
15521
25:05:06,560 --> 25:05:10,400
have to rewrite this. It's already imported. But I just want to highlight that we're using the
15522
25:05:10,400 --> 25:05:18,000
transforms package. We'll run that. There we go. We've got a transform pipeline. Now let's see what
15523
25:05:18,000 --> 25:05:25,680
happens when we transform our target image, transform target image. What happens? Custom image
15524
25:05:25,680 --> 25:05:32,240
transformed. I love printing the inputs and outputs of our different pipelines here. So let's pass
15525
25:05:32,240 --> 25:05:39,840
our custom image that we've just imported. So custom image transform, our custom image is recall
15526
25:05:39,840 --> 25:05:48,640
of shape. Quite large. We're going to pass it through our transformation pipeline. And let's
15527
25:05:48,640 --> 25:05:58,960
print out the shapes. Let's go original shape. And then we'll go custom image dot shape. And then
15528
25:05:58,960 --> 25:06:10,080
we'll go print transformed shape is going to be custom image underscore transformed dot shape.
15529
25:06:11,040 --> 25:06:18,560
Let's see the transformation. Oh, would you look at that? How good we've gone from quite a large image
15530
25:06:18,560 --> 25:06:24,160
to a transformed image here. So it's going to be squished and squashed a little. So that's what
15531
25:06:24,160 --> 25:06:30,400
happens. Let's see what happens when we plot our transformed image. We've gone from 4000 pixels
15532
25:06:30,400 --> 25:06:36,400
on the height to 64. And we've gone from 3000 pixels on the width to 64. So this is what our
15533
25:06:36,400 --> 25:06:45,440
model is going to see. Let's go custom image transformed. And we're going to permute it to be 1, 2, 0.
15534
25:06:47,520 --> 25:06:52,480
Okay, so quite pixelated. Do you see how this might affect the accuracy of our model?
15535
25:06:52,480 --> 25:06:58,800
Because we've gone from custom image, is this going to, oh yeah, we need plt dot imshow.
15536
25:06:58,800 --> 25:07:06,640
So we've gone from this high definition image to an image that's of far lower quality here.
15537
25:07:06,640 --> 25:07:11,600
And I can kind of see myself that this is still a pizza, but I know that it's a pizza. So just
15538
25:07:11,600 --> 25:07:15,760
keep this in mind going forward is that another way that we could potentially improve our model's
15539
25:07:15,760 --> 25:07:23,680
performance if we increased the size of the training image data. So instead of 64 64, we might want
15540
25:07:23,680 --> 25:07:30,240
to upgrade our model's capability to deal with images that are 224 by 224. So if we have a look
15541
25:07:30,240 --> 25:07:40,800
at what this looks like 224 224. Wow, that looks a lot better than 64 64. So that's something that
15542
25:07:40,800 --> 25:07:46,160
you might want to try out later on. But we're going to stick in line with the CNN explainer model.
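A minimal sketch of getting the custom image into the format the model was trained on: float32, values scaled into the 0-1 range, and resized to 64x64:

import torch
import torchvision
from torchvision import transforms

# Load as float32 and scale the 0-255 pixel values into the 0-1 range
custom_image = torchvision.io.read_image(str(custom_image_path)).type(torch.float32) / 255.

# Transform pipeline to resize the image to the training size
custom_image_transform = transforms.Compose([
    transforms.Resize((64, 64))
])
custom_image_transformed = custom_image_transform(custom_image)

print(f"Original shape: {custom_image.shape}")                 # quite large, [3, height, width]
print(f"Transformed shape: {custom_image_transformed.shape}")  # [3, 64, 64]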
15543
25:07:47,760 --> 25:07:52,240
How about we try to make another prediction? So since we transformed our
15544
25:07:53,760 --> 25:08:00,400
image to be the same size as the data our model was trained on. So with torch inference mode,
15545
25:08:00,400 --> 25:08:08,480
let's go custom image pred equals model one on custom image underscore transformed.
15546
25:08:08,480 --> 25:08:15,360
Does it work now? Oh my goodness, still not working: expected all tensors to be on the same device. Of course,
15547
25:08:15,360 --> 25:08:21,760
that's what we forgot here. Let's go to device. Or actually, let's leave that error there. And
15548
25:08:21,760 --> 25:08:27,440
we'll just copy this code down here. And let's put this custom image transform back on the right
15549
25:08:27,440 --> 25:08:35,520
device and see if we finally get a prediction to happen with our model. Oh, we still get an error.
15550
25:08:35,520 --> 25:08:42,560
Oh my goodness, what's going on here? Oh, we need to add a batch size to it. So I'm just gonna write
15551
25:08:42,560 --> 25:09:00,240
up here. This will error. No batch size. And this will error. Image not on right device. And then
15552
25:09:00,240 --> 25:09:07,120
let's try again, we need to add a batch size to our image. So if we look at custom image transformed
15553
25:09:08,880 --> 25:09:16,080
dot shape, recall that our images that passed through our model had a batch dimension. So this
15554
25:09:16,080 --> 25:09:22,320
is another place where we get shape mismatch issues, because what's going on
15555
25:09:22,320 --> 25:09:28,160
in a neural network is a lot of tensor manipulation. If the dimensions don't line up when we want to perform
15556
25:09:28,160 --> 25:09:33,920
matrix multiplication, and we don't play by its rules, the matrix multiplication will
15557
25:09:33,920 --> 25:09:43,040
fail. So let's fix this by adding a batch dimension. So we can do this by going a custom image transformed.
15558
25:09:43,040 --> 25:09:50,160
Let's unsqueeze it on the zeroth dimension and then check the shape. There we go. We add a single batch.
15559
25:09:50,160 --> 25:09:54,960
So that's what we want to do when we make a prediction on a single custom image. We want to pass it to
15560
25:09:54,960 --> 25:10:02,000
our model as an image or a batch of one sample. So let's finally see if this will work.
15561
25:10:03,520 --> 25:10:09,040
Let's just comment what we'll do here. Or maybe we'll try it anyway; this should work.
15562
25:10:11,120 --> 25:10:17,440
Added a batch size. So do you see the steps we've been through so far? And we're just going to
15563
25:10:17,440 --> 25:10:26,640
unsqueeze this. Unsqueeze on the zero dimension to add a batch size. Oh, it didn't error. Oh my
15564
25:10:26,640 --> 25:10:32,960
goodness. It didn't error. Have a look at that. Yes, that's what we want. We get a prediction
15565
25:10:32,960 --> 25:10:39,680
logit. Because these are the raw outputs of our model, we get a logit value for each of our custom classes.
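A minimal sketch of the prediction that finally worked: evaluation mode, inference mode, a batch dimension added with unsqueeze, and the image sent to the same device as the model:

import torch

model_1.eval()
with torch.inference_mode():
    # unsqueeze(dim=0) turns [3, 64, 64] into [1, 3, 64, 64] (a batch of one)
    custom_image_pred = model_1(custom_image_transformed.unsqueeze(dim=0).to(device))

custom_image_pred  # raw logits, one value per class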
15566
25:10:39,680 --> 25:10:44,720
So this could be pizza. This could be steak. And this could be sushi, depending on the order of
15567
25:10:44,720 --> 25:10:54,160
our classes. Let's just have a look. Class to IDX. Did we not get that? Class names.
15568
25:10:56,240 --> 25:11:01,760
Beautiful. So pizza steak sushi. We've still got a ways to go to convert this into that.
15569
25:11:01,760 --> 25:11:08,400
But I just want to highlight what we've done. So note, to make a prediction on a custom image,
15570
25:11:08,400 --> 25:11:16,000
we had to. And this is something you'll have to keep in mind for almost all of your custom data.
15571
25:11:16,000 --> 25:11:22,600
It needs to be formatted in the same way that your model was trained on. So we had to load the image
15572
25:11:22,600 --> 25:11:35,160
and turn it into a tensor. We had to make sure the image was the same data type as the model.
15573
25:11:35,160 --> 25:11:43,560
So that was torch float 32. And then we had to make sure the image was the same shape as the data
15574
25:11:43,560 --> 25:11:54,760
the model was trained on, which was 64, 64, three with a batch size. So that was one,
15575
25:11:54,760 --> 25:12:02,720
three, 64, 64. And excuse me, this should actually be the other way around. This should be color
15576
25:12:02,720 --> 25:12:09,600
channels first, because we're dealing with PyTorch here. 64. And then finally, we had to make
15577
25:12:09,600 --> 25:12:21,120
sure the image was on the same device as our model. So they are three of the big ones that we've
15578
25:12:21,120 --> 25:12:26,160
talked about so much: a data type mismatch will result in a bunch of issues.
15579
25:12:26,160 --> 25:12:33,520
Shape mismatch will result in a bunch of issues. And device mismatch will also result in a bunch
15580
25:12:33,520 --> 25:12:41,760
of issues. If you want these to be highlighted, they are in the learnpytorch.io resource. We have
15581
25:12:41,760 --> 25:12:48,080
putting things together. Where do we have it? Oh, yeah, no, it's in the main takeaway section,
15582
25:12:48,080 --> 25:12:53,160
sorry, predicting on your own custom data with a trained model is possible, as long as you format
15583
25:12:53,160 --> 25:12:58,200
the data into a similar format to what the model was trained on. So make sure you take care of the
15584
25:12:58,200 --> 25:13:03,800
three big PyTorch and deep learning errors: wrong data types, wrong data shapes, and wrong
15585
25:13:03,800 --> 25:13:10,520
devices, regardless of whether that's images or audio or text, these three will follow you around.
15586
25:13:10,520 --> 25:13:17,880
So just keep them in mind. But now we've got some code to predict on custom images, but it's kind
15587
25:13:17,880 --> 25:13:22,440
of all over the place. We've got about 10 coding cells here just to make a prediction on a custom
15588
25:13:22,440 --> 25:13:29,640
image. How about we functionize this and see if it works on our pizza dad image. I'll see you in the
15589
25:13:29,640 --> 25:13:38,840
next video. Welcome back. We're now well on our way to making custom predictions on our own custom
15590
25:13:38,840 --> 25:13:44,840
image data. Let's keep pushing forward. In the last video, we finished off getting some raw model
15591
25:13:44,840 --> 25:13:50,920
logits. So the raw outputs from our model. Now, let's see how we can convert these logits into
15592
25:13:50,920 --> 25:13:59,400
prediction labels. Let's write some code. So convert logits to prediction labels. Or let's go
15593
25:14:00,040 --> 25:14:06,040
convert logits. Let's first convert them to prediction probabilities. Probabilities.
15594
25:14:07,000 --> 25:14:14,840
So how do we do that? Let's go custom image pred probs equals torch dot softmax
15595
25:14:14,840 --> 25:14:22,520
to convert our custom image pred across the first dimension. So the first dimension of this tensor
15596
25:14:22,520 --> 25:14:28,040
will be the inner brackets, of course. So just this little section here. Let's see what these
15597
25:14:28,040 --> 25:14:36,360
look like. This will be prediction probabilities. Wonderful. So you'll notice that these are quite
15598
25:14:36,360 --> 25:14:42,600
spread out. Now, this is not ideal. Ideally, we'd like our model to assign a fairly large
15599
25:14:42,600 --> 25:14:49,240
prediction probability to the target class, the right target class that is. However, since our model
15600
25:14:49,240 --> 25:14:53,720
when we trained it isn't actually performing all that well, the prediction probabilities
15601
25:14:53,720 --> 25:14:58,680
are quite spread out across all of the classes. But nonetheless, we're just highlighting what
15602
25:14:58,680 --> 25:15:03,320
it's like to predict on custom data. So now let's convert the prediction probabilities
15603
25:15:03,320 --> 25:15:12,680
to prediction labels. Now, you'll notice that we used softmax because, why? We are working with
15604
25:15:12,680 --> 25:15:19,480
multi class classification data. And so we can get the custom image pred labels, the integers,
15605
25:15:20,040 --> 25:15:28,120
by taking the argmax of the prediction probabilities, custom image pred probs across the first
15606
25:15:28,120 --> 25:15:35,000
dimension as well. So let's go custom image pred labels. Let's see what they look like.
15607
25:15:35,960 --> 25:15:42,680
Zero. So the index here with the highest value is index number zero. And you'll notice that it's
15608
25:15:42,680 --> 25:15:49,240
still on the CUDA device. So what would happen if we try to index on our class names with
15609
25:15:49,240 --> 25:16:00,760
the custom image pred labels? Or maybe that doesn't need to be a plural. Oh, there we go. We get pizza.
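Pulled out on its own, that logits to prediction probabilities to prediction label to class name chain looks something like the sketch below. The logits here are random stand-ins; in the notebook they come from the trained model's output on the custom image.

import torch

class_names = ["pizza", "steak", "sushi"]

# Stand-in for the raw model output (logits) on one image: shape [1, num_classes]
custom_image_pred = torch.randn(1, len(class_names))

# Logits -> prediction probabilities (softmax across dim=1 for multi-class classification)
custom_image_pred_probs = torch.softmax(custom_image_pred, dim=1)

# Prediction probabilities -> prediction label (index of the highest probability)
custom_image_pred_label = torch.argmax(custom_image_pred_probs, dim=1)

# Prediction label -> class name (move to the CPU first if the tensor lives on a GPU)
print(class_names[custom_image_pred_label.cpu()])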
15610
25:16:00,760 --> 25:16:06,440
But you might also have to change this to the CPU later on. Otherwise, you might run into some
15611
25:16:06,440 --> 25:16:11,960
errors. So just be aware of that. So you notice how we just put it to the CPU. So we get pizza. We
15612
25:16:11,960 --> 25:16:16,120
got a correct prediction. But this is as good as guessing in my opinion, because these are kind
15613
25:16:16,120 --> 25:16:22,760
of spread out. Ideally, this value would be higher, maybe something like 0.8 or above for our pizza
15614
25:16:22,760 --> 25:16:31,320
dad image. But nonetheless, our model is getting two thumbs up even on this 64 by 64 image. But
15615
25:16:31,320 --> 25:16:37,000
that's a lot of code that we've written. Let's functionize it. So we can just pass in a file path
15616
25:16:37,000 --> 25:16:42,360
and get a custom prediction from it. So putting custom image prediction together.
15617
25:16:42,360 --> 25:16:52,440
Let's go building a function. So we want the ideal outcome is, let's plot our image as well.
15618
25:16:52,440 --> 25:17:08,760
Ideal outcome is a function where we pass an image path to and have our model predict
15619
25:17:08,760 --> 25:17:17,880
on that image and plot the image plus the prediction. So this is our ideal outcome. And I think I'm
15620
25:17:17,880 --> 25:17:24,120
going to issue this as a challenge. So give that a go, put all of our code above together. And you'll
15621
25:17:24,120 --> 25:17:27,880
just have to import the image, you'll have to process it and whatnot. I know I said we were going
15622
25:17:27,880 --> 25:17:31,720
to build a function in this video, but we're going to save that for the next video. I'd like
15623
25:17:31,720 --> 25:17:39,160
you to give that a go. So start from way back up here, import the image via torchvision.io's read
15624
25:17:39,160 --> 25:17:45,400
image, format it using what we've done, change the data type, change the shape, change the device,
15625
25:17:46,040 --> 25:17:54,200
and then plot the image with its prediction as the title. So give that a go and we'll do it
15626
25:17:54,200 --> 25:18:02,760
together in the next video. How'd you go? I just realized I had a typo in the previous cell,
15627
25:18:02,760 --> 25:18:07,640
but that's all right. Did you give it a shot? Did you put together the custom image prediction
15628
25:18:07,640 --> 25:18:13,720
in a function format? I'd love it if you did. But if not, that's okay. Let's keep going. Let's see
15629
25:18:13,720 --> 25:18:17,640
what that might look like. And there are many different ways that you could do this. But
15630
25:18:17,640 --> 25:18:21,720
here's one of the ways that I've thought of. So we want a function that's going to
15631
25:18:21,720 --> 25:18:28,440
pred and plot a target image. We want it to take in a torch model. And so that's going to be ideally
15632
25:18:28,440 --> 25:18:33,960
a trained model. We want it to also take in an image path, which will be a string. It can
15633
25:18:33,960 --> 25:18:40,680
take in a class names list so that we can index it and get the prediction label in string format.
15634
25:18:41,400 --> 25:18:46,680
So let's put this as a list of strings. And by default, this can equal none. Just in case we
15635
25:18:46,680 --> 25:18:52,280
just wanted the prediction. It will also take in a transform so that we can pass in some form of
15636
25:18:52,280 --> 25:18:58,280
transform to transform the image. And then it's going to take in a device, which will be by default
15637
25:18:58,280 --> 25:19:05,000
the target device. So let's write a little doc string here, makes a prediction on a target image
15638
25:19:05,000 --> 25:19:17,160
with a trained model and plots the image and prediction. Beautiful. Now what do we have to do
15639
25:19:17,160 --> 25:19:25,160
first? Let's load in the image. Load in the image just like we did before with torch vision. So
15640
25:19:25,720 --> 25:19:34,680
target image equals torchvision.io.read_image. And we'll go string on the image path,
15641
25:19:34,680 --> 25:19:40,200
which will be the image path here. And we convert it to a string just in case it doesn't get passed
15642
25:19:40,200 --> 25:19:49,880
in as a string. And then let's change it into type torch float 32. Because we want to make sure that
15643
25:19:49,880 --> 25:19:57,240
our custom image or our custom data is in the same type as what we trained our model on. So now
15644
25:19:57,240 --> 25:20:09,480
let's divide the image pixel values by 255 to get them between zero and
15645
25:20:10,120 --> 25:20:17,960
one as a range. So we can just do this by target image equals target image divided by 255. And we
15646
25:20:17,960 --> 25:20:22,920
could also just do this in one step up here 255. But I've just put it out there just to let you know
15647
25:20:22,920 --> 25:20:30,920
that, hey, read image imports image data with values between zero and 255. So our model prefers numbers
15648
25:20:30,920 --> 25:20:37,960
between zero and one. So let's just scale it there. Now we want to transform our data if necessary.
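In isolation, that load, convert and rescale step might look like the short sketch below; the image path is a placeholder for whatever custom image you want to predict on.

import torch
import torchvision

image_path = "data/04-pizza-dad.jpeg"  # placeholder path to a custom image

# read_image loads the image as a uint8 tensor with values 0-255 in [color_channels, height, width] format
target_image = torchvision.io.read_image(str(image_path)).type(torch.float32)

# Scale the pixel values to be between 0 and 1, the range the model was trained on
target_image = target_image / 255.0

print(target_image.dtype, target_image.min(), target_image.max())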
15649
25:20:37,960 --> 25:20:43,400
In our case, it is, but it won't always be. So we want this function to be pretty generic
15650
25:20:43,400 --> 25:20:51,160
pred and plot image. So if the transform exists, let's set the target image to the transform,
15651
25:20:51,160 --> 25:20:56,280
or we'll pass it through the transform that is wonderful. And the transform we're going to get
15652
25:20:56,280 --> 25:21:04,840
from here. Now what's left to do? Well, let's make sure the model is on the target device.
15653
25:21:05,960 --> 25:21:10,920
It might be by default, but if we're passing in a device parameter, we may as well make sure the
15654
25:21:10,920 --> 25:21:18,680
model is there too. And now we can make a prediction. So let's turn on eval slash inference mode
15655
25:21:18,680 --> 25:21:25,800
and make a prediction with our model. So on the model, we call eval mode, and then with torch dot
15656
25:21:25,800 --> 25:21:30,520
inference mode, because we're making a prediction, we want to turn our model into inference mode,
15657
25:21:30,520 --> 25:21:39,160
or put it in inference mode context. Let's add an extra dimension to the image. Let's go target
15658
25:21:39,160 --> 25:21:43,880
image. We could do this step above, actually, but we're just going to do it here. From kind of
15659
25:21:43,880 --> 25:21:49,080
remembering things on the fly here of what we need to do, we're adding a, this is, let's write
15660
25:21:49,080 --> 25:21:59,720
this down, this is the batch dimension. E.g. our model will predict on batches of 1x image.
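Here is that unsqueeze step on its own, using a random tensor in place of the real image just to show the shape change.

import torch

# Stand-in for a single transformed image: [color_channels, height, width]
target_image = torch.rand(3, 64, 64)

# Add a batch dimension at dim=0: [batch_size, color_channels, height, width]
batched_image = target_image.unsqueeze(dim=0)

print(f"Before: {target_image.shape}")  # torch.Size([3, 64, 64])
print(f"After: {batched_image.shape}")  # torch.Size([1, 3, 64, 64])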
15661
25:22:00,520 --> 25:22:05,400
So we're just unsqueezing it to add an extra dimension at the zero dimension space,
15662
25:22:05,400 --> 25:22:09,400
just like we did in a previous video. Now let's make a prediction
15663
25:22:09,400 --> 25:22:16,040
on the image with an extra dimension. Otherwise, if we don't have that extra dimension, we saw
15664
25:22:16,040 --> 25:22:21,160
that we get a shape issue. So right down here, target image pred. And remember, this is going
15665
25:22:21,160 --> 25:22:29,560
to be the raw model outputs, raw logit outputs. We're going to target image pred. And yeah,
15666
25:22:30,120 --> 25:22:35,000
I believe that's all we need for the prediction. Oh wait, there was one more thing, to device.
15667
25:22:35,000 --> 25:22:44,200
Also make sure the target image is on the right device. Beautiful. So fair
15668
25:22:44,200 --> 25:22:49,160
few steps here, but nothing we can't handle. All we're really doing is replicating what we've done
15669
25:22:49,160 --> 25:22:55,000
for batches of images. But we want to make sure that if someone passed any image to our
15670
25:22:55,640 --> 25:23:00,440
pred and plot image function, that we've got functionality in here to handle that image.
15671
25:23:00,440 --> 25:23:06,360
And do we get this? Oh, we want just target image to device. Did you catch that error?
15672
25:23:06,920 --> 25:23:16,120
So let's keep going. Now let's convert the logits. Our models raw logits. Let's convert those
15673
25:23:16,120 --> 25:23:22,200
to prediction probabilities. This is so exciting. We're getting so close to making a function
15674
25:23:22,200 --> 25:23:27,160
to predict on custom data. So we'll set this to target image pred probs, which is going to be
15675
25:23:27,160 --> 25:23:33,640
torch dot softmax. And we will pass in the target image pred here. We want to get the softmax of
15676
25:23:33,640 --> 25:23:39,480
the first dimension. Now let's convert our prediction probabilities, which is what we get in the line
15677
25:23:39,480 --> 25:23:49,320
above. We want to convert those to prediction labels. So let's get the target image pred labels
15678
25:23:49,320 --> 25:23:56,200
equals torch.argmax. We want to get the argmax of, or in other words, the index,
15679
25:23:56,200 --> 25:24:03,000
which is the maximum value from the pred probs of the first dimension as well. Now what should we
15680
25:24:03,000 --> 25:24:09,240
return here? Well, we don't really need to return anything. We want to create a plot. So let's plot
15681
25:24:09,240 --> 25:24:21,640
the image alongside the prediction and prediction probability. Beautiful. So plt.imshow,
15682
25:24:21,640 --> 25:24:27,240
what are we going to pass in here? We're going to pass in here our target image. Now we have to
15683
25:24:27,240 --> 25:24:33,000
squeeze this, I believe, because we've added an extra dimension up here. So we'll squeeze it to
15684
25:24:33,000 --> 25:24:40,120
remove that batch size. And then we still have to permute it because matplotlib likes images
15685
25:24:40,120 --> 25:24:47,560
in the format color channels last one, two, zero. So remove batch dimension.
15686
25:24:50,040 --> 25:24:59,960
And rearrange shape to be HWC. That is color channels last. Now if the class names parameter
15687
25:24:59,960 --> 25:25:05,720
exists, so we've passed in a list of class names, this function is really just replicating
15688
25:25:05,720 --> 25:25:11,000
everything we've done in the past 10 cells, by the way. So right back up here, we're replicating
15689
25:25:11,000 --> 25:25:15,480
all of this stuff in one function. So pretty large function, but once we've written it,
15690
25:25:15,480 --> 25:25:22,920
we can pass in our images as much as we like. So if class names exist, let's set the title
15691
25:25:22,920 --> 25:25:29,960
to showcase that class name. So the pred is going to be class names. Let's index on that
15692
25:25:29,960 --> 25:25:36,600
pred image, or target image pred label. And this is where we'll have to put it to the CPU,
15693
25:25:36,600 --> 25:25:42,840
because if we're using a title with matplotlib, matplotlib cannot handle things that are on
15694
25:25:42,840 --> 25:25:48,920
the GPU. This is why we have to put it to the CPU. And then I believe that should be enough for
15695
25:25:48,920 --> 25:25:55,640
that. Let's add a little line in here, so that we can have it. Oh, I've missed something.
15696
25:25:56,520 --> 25:26:03,240
An outside bracket there. Wonderful. Let's add the prediction probability, because that's always
15697
25:26:03,240 --> 25:26:09,960
fun to see. So we want target image pred probs. And we want to get the maximum pred prob from
15698
25:26:09,960 --> 25:26:16,280
that. And we'll also put that on the CPU. And I think we might get this to three decimal places.
15699
25:26:16,280 --> 25:26:24,600
Now this is saying, oh, pred labels, we don't need that. We just need the non-plural one, beautiful. Now,
15700
25:26:24,600 --> 25:26:32,760
if the class names don't exist, let's just set the title equal to an f-string, we'll go pred,
15701
25:26:34,040 --> 25:26:39,800
target image pred label. Is Google Colab still telling me this is wrong?
15702
25:26:39,800 --> 25:26:45,480
Target image pred label. Oh, no, we've still got the same thing. It just hasn't caught up with me,
15703
25:26:45,480 --> 25:26:51,160
and I'm coding a bit fast here. And then we'll pass in the prob, which will be just the same as
15704
25:26:51,160 --> 25:27:03,880
above. I could even copy this in. Beautiful. And let's now set the title to the title.
15705
25:27:03,880 --> 25:27:11,080
And we will turn the axes off, plt.axis(False). Fair bit of code there. But this is going to be a
15706
25:27:11,080 --> 25:27:16,840
super exciting moment. Let's see what this looks like. When we pass it in a target image and a
15707
25:27:16,840 --> 25:27:23,000
target model, some class names, and a transform. Are you ready? We've got our transform ready,
15708
25:27:23,000 --> 25:27:29,160
by the way, it's back up here. Custom image transform. It's just going to resize our image.
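Before we run it, here is a rough end-to-end sketch of the kind of function built across this video. Treat it as one possible implementation rather than the exact notebook code; the parameter names mirror what was described above.

from typing import List, Optional

import matplotlib.pyplot as plt
import torch
import torchvision

def pred_and_plot_image(model: torch.nn.Module,
                        image_path: str,
                        class_names: Optional[List[str]] = None,
                        transform=None,
                        device="cpu"):
    """Makes a prediction on a target image with a trained model and plots the image and prediction."""
    # 1. Load in the image and convert it to torch.float32
    target_image = torchvision.io.read_image(str(image_path)).type(torch.float32)

    # 2. Scale the pixel values to be between 0 and 1
    target_image = target_image / 255.0

    # 3. Transform the image if a transform is passed in (e.g. a resize)
    if transform:
        target_image = transform(target_image)

    # 4. Make sure the model is on the target device
    model.to(device)

    # 5. Turn on eval/inference mode and make a prediction
    model.eval()
    with torch.inference_mode():
        # Add a batch dimension and put the image on the target device
        target_image = target_image.unsqueeze(dim=0)
        target_image_pred = model(target_image.to(device))

    # 6. Raw logits -> prediction probabilities -> prediction label
    target_image_pred_probs = torch.softmax(target_image_pred, dim=1)
    target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)

    # 7. Plot the image alongside the prediction and prediction probability
    plt.imshow(target_image.squeeze().permute(1, 2, 0))  # [C, H, W] -> [H, W, C] for matplotlib
    if class_names:
        title = (f"Pred: {class_names[target_image_pred_label.cpu()]} | "
                 f"Prob: {target_image_pred_probs.max().cpu():.3f}")
    else:
        title = f"Pred: {target_image_pred_label} | Prob: {target_image_pred_probs.max().cpu():.3f}"
    plt.title(title)
    plt.axis(False)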
15709
25:27:29,160 --> 25:27:36,040
So let's see. Oh, this file was updated remotely or in another tab. Sometimes this happens, and
15710
25:27:36,040 --> 25:27:39,960
usually Google Colab sorts itself out, but that's all right. It doesn't affect our code for now.
15711
25:27:39,960 --> 25:27:46,040
Pred on our custom image. Are you ready? Save failed. Would you like to override? Yes, I would.
15712
25:27:47,000 --> 25:27:52,440
So you might see that in Google Colab. Usually it fixes itself. There we go. Saved successfully.
15713
25:27:52,440 --> 25:27:58,040
Pred and plot image. I was going to say, Google Colab, don't fail me now. We're about to predict
15714
25:27:58,040 --> 25:28:07,080
on our own custom data. Using a model trained on our own custom data. Image path. Let's pass in
15715
25:28:07,080 --> 25:28:13,080
custom image path, which is going to be the path to our pizza dad image. Let's go class names,
15716
25:28:13,080 --> 25:28:19,160
equals class names, which is pizza, steak, and sushi. We'll pass in our transform to convert our
15717
25:28:19,160 --> 25:28:28,520
image to the right shape and size custom image transform. And then finally, the target device is
15718
25:28:28,520 --> 25:28:33,320
going to be device. Are you ready? Let's make a prediction on custom data. One of my favorite
15719
25:28:33,320 --> 25:28:38,840
things. One of the most fun things to do when building deep learning models. Three, two, one.
15720
25:28:38,840 --> 25:28:49,880
How did it go? Oh, no. What did we get wrong? CPU. Okay. Ah, so close, but yet so far.
15721
25:28:50,760 --> 25:28:57,960
Has no attribute CPU. Oh, maybe we need to put this to CPU. That's where I got the square bracket
15722
25:28:57,960 --> 25:29:03,800
wrong. So that's what we needed to change. We needed to because this is going to be potentially
15723
25:29:03,800 --> 25:29:09,640
on the GPU. Target image pred label. We need to put it on the CPU. We need to do that. Why?
15724
25:29:09,640 --> 25:29:14,920
Because this is going to be the title of our matplotlib plot. And matplotlib doesn't interface
15725
25:29:14,920 --> 25:29:25,240
too well with data on a GPU. Let's try it again. Three, two, one, running. Oh, look at that.
15726
25:29:25,240 --> 25:29:30,520
Prediction on a custom image. And it gets it right. Two thumbs up. I didn't plan this. Our model is
15727
25:29:30,520 --> 25:29:36,040
performing actually quite poorly. So this is as good as a guess to me. You might want to try this
15728
25:29:36,040 --> 25:29:41,160
on your own image. And in fact, if you do, please share it with me. I would love to see it. But
15729
25:29:41,800 --> 25:29:47,640
you could potentially try this with another model. See what happens? Steak. Okay, there we go. So
15730
25:29:47,640 --> 25:29:55,880
even though model one performs worse quantitatively, it performs better qualitatively. So that's the
15731
25:29:55,880 --> 25:30:01,640
power of visualize, visualize, visualize. And if we use model zero, also, which isn't performing
15732
25:30:01,640 --> 25:30:08,600
too well, it gets it wrong with a prediction probability of 0.368, which isn't too high either.
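If you want to reproduce that comparison yourself, a usage sketch along these lines loops the hypothetical pred_and_plot_image function over different trained models; it assumes model_0, model_1, custom_image_path, class_names, custom_image_transform and device already exist from earlier in the notebook.

import matplotlib.pyplot as plt

# Compare how different trained models handle the same custom image
# (model_0, model_1 and the other names are assumed to exist already)
for trained_model in [model_0, model_1]:
    pred_and_plot_image(model=trained_model,
                        image_path=custom_image_path,
                        class_names=class_names,
                        transform=custom_image_transform,
                        device=device)
    plt.show()  # display each prediction plot before moving to the next model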
15733
25:30:09,160 --> 25:30:13,400
So we've talked about a couple of different ways to improve our models. Now we've even
15734
25:30:13,400 --> 25:30:19,240
got a way to make predictions on our own custom images. So give that a shot. I'd love to see
15735
25:30:19,240 --> 25:30:25,080
your custom predictions, upload an image here if you want, or download it into Google Colab using
15736
25:30:25,080 --> 25:30:32,840
code that we've used before. But we've come a fairly long way. I feel like we've covered enough
15737
25:30:32,840 --> 25:30:38,360
for custom data sets. Let's summarize what we've covered in the next video. And I've got a bunch
15738
25:30:38,360 --> 25:30:43,800
of exercises and extra curriculum for you. So this is exciting stuff. I'll see you in the next video.
15739
25:30:47,160 --> 25:30:52,200
In the last video, we did the very exciting thing of making a prediction on our own custom
15740
25:30:52,200 --> 25:30:57,400
image, although it's quite pixelated. And although our model's performance quantitatively didn't
15741
25:30:57,400 --> 25:31:03,160
turn out to be too good, qualitatively it happened to work out. But of course, there are a fair few
15742
25:31:03,160 --> 25:31:08,760
ways that we could improve our model's performance. But the main takeaway here is that we had to do
15743
25:31:08,760 --> 25:31:16,200
a bunch of pre processing to make sure our custom image was in the same format as what our model
15744
25:31:16,200 --> 25:31:21,400
expected. And this is quite a lot of what I do behind the scenes for Nutrify. If you upload an
15745
25:31:21,400 --> 25:31:27,400
image here, it gets pre processed in a similar way to go through our image classification model
15746
25:31:27,400 --> 25:31:34,760
to output a label like this. So let's get out of this. To summarize, I've got a colorful slide here,
15747
25:31:34,760 --> 25:31:40,120
but we've already covered this predicting on custom data. These are three things to make sure of,
15748
25:31:40,120 --> 25:31:46,040
regardless of whether you're using images, text or audio, make sure your data is in the right
15749
25:31:46,040 --> 25:31:53,080
data type. In our case, it was torch float 32. Make sure your data is on the same device as the model.
15750
25:31:53,080 --> 25:31:59,880
So we had to put our custom image to the GPU, which was where our model also lived. And then we had
15751
25:31:59,880 --> 25:32:05,480
to make sure our data was in the correct shape. So the original shape was 64, 64, 3. Actually,
15752
25:32:05,480 --> 25:32:10,120
this should be reversed, because it was color channels first. But the same principle remains here.
15753
25:32:10,120 --> 25:32:17,480
We had to add a batch dimension and rearrange if we needed. So in our case, we used images of this
15754
25:32:17,480 --> 25:32:24,680
shape: batch first, color channels first, height, width. But your problem will determine
15755
25:32:24,680 --> 25:32:29,240
your shape, the device you're using will determine where your data and your
15756
25:32:29,240 --> 25:32:34,120
model live. And your data will determine whether you're using torch
15757
25:32:34,120 --> 25:32:43,560
float 32 or something else as the data type. So let's summarize. If we go to the main takeaways here, you can read through
15758
25:32:43,560 --> 25:32:49,320
these, but some of the big ones are: PyTorch has many built-in functions to deal with all kinds
15759
25:32:49,320 --> 25:32:55,400
of data from vision to text to audio to recommendation systems. So if we look at the PyTorch docs,
15760
25:32:57,480 --> 25:33:01,480
you're going to become very familiar with these over time. We've got torchaudio,
15761
25:33:01,480 --> 25:33:06,280
torchtext, and torchvision is what we practiced with. And we've got a whole bunch of things here for
15762
25:33:06,280 --> 25:33:13,320
transforming and augmenting images, datasets, utilities, operators, and torchdata is currently
15763
25:33:13,320 --> 25:33:19,160
in beta. But this is just something to be aware of later on. So it's a prototype library right now,
15764
25:33:19,160 --> 25:33:24,680
but by the time you watch this, it might be available. But it's another way of loading data.
15765
25:33:24,680 --> 25:33:31,640
So just be aware of this for later on. And if we come back up here: if PyTorch's built
15766
25:33:31,640 --> 25:33:36,440
in data loading functions, don't suit your requirements, you can write your own custom
15767
25:33:36,440 --> 25:33:42,600
data set classes by subclassing torch.utils.data.Dataset. And we saw that way back
15768
25:33:42,600 --> 25:33:50,360
up here in option number two. Option two, here we go, loading image data with a custom data set. We wrote plenty of code to do that (a minimal skeleton of the idea is sketched below).
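As a reminder of the shape of that option, a bare-bones custom Dataset subclass looks roughly like the skeleton below; it is a generic sketch, not the notebook's full image-folder class.

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """A minimal custom Dataset: store the samples, then tell PyTorch
    how many there are and how to fetch one by index."""

    def __init__(self, samples, labels, transform=None):
        self.samples = samples      # e.g. a list of image paths
        self.labels = labels        # e.g. a list of integer class labels
        self.transform = transform  # optional transform applied per sample

    def __len__(self):
        # Required: the total number of samples
        return len(self.samples)

    def __getitem__(self, index):
        # Required: return one (sample, label) pair
        sample = self.samples[index]
        if self.transform:
            sample = self.transform(sample)
        return sample, self.labels[index]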
15769
25:33:50,360 --> 25:33:56,040
And then a lot of machine learning is dealing with the
15770
25:33:56,040 --> 25:34:00,280
balance between overfitting and underfitting. We've got a whole section in the book here to
15771
25:34:00,280 --> 25:34:04,120
check out what an ideal loss curve should look like and how to deal with overfitting,
15772
25:34:04,120 --> 25:34:09,960
how to deal with underfitting. It is a fine line. So much of the research in machine
15773
25:34:09,960 --> 25:34:16,600
learning is actually dedicated towards this balance. And then three big things for being aware of
15774
25:34:16,600 --> 25:34:21,960
when you're predicting on your own custom data, wrong data types, wrong data shapes,
15775
25:34:21,960 --> 25:34:27,000
and wrong devices. This will follow you around, as I said, and we saw that in practice to get our
15776
25:34:27,000 --> 25:34:32,760
own custom image ready for a trained model. Now, we have some exercises here. If you'd like
15777
25:34:32,760 --> 25:34:38,200
the link to it, you can go to learnpytorch.io section number four exercises, and of course,
15778
25:34:38,200 --> 25:34:41,960
extra curriculum. A lot of the things I've mentioned throughout the course that would be a good
15779
25:34:41,960 --> 25:34:47,560
resource to check out are contained in here. But the exercises, this is your time to shine,
15780
25:34:47,560 --> 25:34:52,120
your time to practice. Let's go back to this notebook, scroll right down to the bottom.
15781
25:34:52,120 --> 25:35:00,760
Look how much code we've written. Goodness me. Exercises: for all exercises and extra curriculum,
15782
25:35:02,760 --> 25:35:08,920
See here, turn that into markdown. Wonderful. And so if we go in here, you've got a couple of
15783
25:35:08,920 --> 25:35:15,000
resources. There's an exercise template notebook for number four, and example solutions for notebook
15784
25:35:15,000 --> 25:35:21,080
number four, which is what we're working on now. So of course, I'd encourage you to go through the
15785
25:35:21,080 --> 25:35:27,880
pytorch custom data sets exercises template first. Try to fill out all of the code here on your own.
15786
25:35:27,880 --> 25:35:32,440
So we've got some questions here. We've got some dummy code. We've got some comments.
15787
25:35:32,440 --> 25:35:38,200
So give that a go. Go through this. Use this book resource to reference. Use all the code
15788
25:35:38,200 --> 25:35:43,080
we've written. Use the documentation, whatever you want. But try to go through this on your own.
15789
25:35:43,080 --> 25:35:47,560
And then if you get stuck somewhere, you can look at an example solution that I created,
15790
25:35:47,560 --> 25:35:53,160
which is here, pytorch custom data sets exercise solutions. And just be aware that this is just
15791
25:35:53,160 --> 25:35:57,480
one way of doing things. It's not necessarily the best. It's just a way to reference what
15792
25:35:57,480 --> 25:36:03,400
you're writing to what I would do. And there's actually now live walkthroughs of the solutions,
15793
25:36:03,400 --> 25:36:08,680
errors and all on YouTube. So if you go to this video, which I'm going to mute. So this is me
15794
25:36:08,680 --> 25:36:13,960
live streaming the whole thing, writing a bunch of pytorch code. If you just keep going through all
15795
25:36:13,960 --> 25:36:19,800
of that, you'll see me writing all of the solutions, running into errors, trying different things,
15796
25:36:19,800 --> 25:36:25,160
et cetera, et cetera. But that's on YouTube. You can check that out on your own time. But I feel
15797
25:36:25,160 --> 25:36:32,360
like we've covered enough exercises. Oh, by the way, this is in the extras exercises tab
15798
25:36:32,360 --> 25:36:37,640
of the pytorch deep learning repo. So extras exercises and solutions that are contained in there.
15799
25:36:39,160 --> 25:36:45,960
Far out. We've covered a lot. Look at all that. So that has been pytorch custom data sets.
15800
25:36:46,600 --> 25:36:55,240
I will see you in the next section. Holy smokes. That was a lot of pytorch code.
15801
25:36:56,360 --> 25:37:01,640
But if you're still hungry for more, there is five more chapters available at learnpytorch.io,
15802
25:37:01,640 --> 25:37:06,840
which cover transfer learning, my favorite topic, pytorch model experiment tracking,
15803
25:37:06,840 --> 25:37:11,880
pytorch paper replicating, and pytorch model deployment. How do you get your model into the
15804
25:37:11,880 --> 25:37:17,400
hands of others? And if you'd like to learn in this video style, the videos for those chapters
15805
25:37:17,400 --> 25:37:32,520
are available at zero to mastery.io. But otherwise, happy machine learning. And I'll see you next time.