PyTorch for Deep Learning & Machine Learning – Full Course [English] – Transcript

This comprehensive course will teach you the foundations of machine learning and deep learning using PyTorch. PyTorch is a machine learning framework written in Python. You'll learn machine learning by writing PyTorch code, so when in doubt, run the provided code and experiment. Your teacher for this course is Daniel Bourke. Daniel is a machine learning engineer and popular course creator. So enjoy the course, and don't watch the whole thing in one sitting.

Hello, welcome to the video. It's quite a big one. But if you've come here to learn machine learning and deep learning and PyTorch code, well, you're in the right place. Now, this video and tutorial is aimed at beginners who have about three to six months of Python coding experience. We're going to cover a whole bunch of important machine learning concepts by writing PyTorch code. If you get stuck, you can leave a comment below or post on the course GitHub Discussions page. GitHub is where you'll be able to find all the materials that we cover, as well as learnpytorch.io, where there's an online, readable book version of this course. And if you finish this video and find that, hey, I would still like to learn more PyTorch (I mean, you can't really cover all of PyTorch in a day; that video title is just a play on words about the length of the video, but that's an aside), there are five more chapters available at learnpytorch.io, covering everything from transfer learning to model deployment to experiment tracking. All the videos to go with those are available at zerotomastery.io. But that's enough from me. Happy machine learning, and I'll see you inside.

Hello, my name is Daniel and welcome to the deep learning with PyTorch course. Now, that was too good not to watch twice. Welcome to the deep learning with (cue the fire) PyTorch course. So this is very exciting. You're going to see that animation quite a bit because, I mean, it's fun, and PyTorch's symbol is a flame because of torch. But let's get into it. Naturally, if you've come to this course, you might have already researched what deep learning is, but we're going to cover it quite briefly, and just in the sense of how much you need to know for this course, because rather than just definitions, we're going to be focused on getting practical and seeing things happen. So let's define what machine learning is, because as we'll see in a second, deep learning is a subset of machine learning.
Machine learning is turning data, which can be almost anything (images, text, tables of numbers, video, audio files; almost anything can be classified as data), into numbers, because computers love numbers, and then finding patterns in those numbers. Now, how do we find those patterns? Well, the computer does this part, specifically a machine learning algorithm or a deep learning algorithm, the kind of thing we're going to be building in this course. How? Code and math. Now, this course is code-focused. I want to stress that before you get into it: we're focused on writing code. Behind the scenes, that code is going to trigger some math to find patterns in those numbers. If you would like to deep dive into the math behind the code, I'm going to be linking extra resources for that. However, we're going to be getting hands-on and writing lots of code to do lots of this.

And so if we keep breaking things down a little bit more, machine learning versus deep learning: we have this giant bubble here of artificial intelligence. You might have seen something similar to this on the internet; I've just copied that and put it into pretty colors for this course. So you've got this overarching big bubble of the topic of artificial intelligence, which you could define as, again, almost anything you want. Then, typically, there's a subset within artificial intelligence known as machine learning, which is quite a broad topic. And then within machine learning, you have another topic called deep learning. And that's what we're going to be focused on: working with PyTorch, writing deep learning code. But again, you could use PyTorch for a lot of different machine learning things. And truth be told, I kind of use these two terms interchangeably. Yes, ML is the broader topic and deep learning is a bit more nuanced, but if you want to form your own definitions of these, I'd highly encourage you to do so. This course is focused less on defining what things are and more on seeing how they work.

So just to break things down, if you're familiar with the fundamentals of machine learning, you probably understand this paradigm, but we're going to rehash it anyway. If we consider traditional programming, let's say you'd like to write a computer program that has the ability to reproduce your grandmother's favorite or famous roast chicken dish. And so we might have some inputs here, which are some beautiful vegetables and a chicken that you've raised on the farm. You might write down some rules.
This could be your program: cut the vegetables, season the chicken, preheat the oven, cook the chicken for 30 minutes, and add vegetables. Now, it might not be this simple, or it might actually be, because your Sicilian grandmother is a great cook, so she's turned things into an art now and can just do it step by step. And then those inputs combined with those rules make this beautiful roast chicken dish. So that's traditional programming. Now, a machine learning algorithm typically takes some inputs and some desired outputs and then figures out the rules, so the patterns between the inputs and the outputs. Where in traditional programming we had to handwrite all of these rules, the ideal machine learning algorithm will figure out this bridge between our inputs and our idealized output. Now, in the machine learning sense, this is typically described as supervised learning, because you have some kind of input, also known as features, and some kind of output, also known as labels. And the machine learning algorithm's job is to figure out the relationships between the inputs (the features) and the outputs (the labels). So if we wanted to write a machine learning algorithm to figure out our Sicilian grandmother's famous roast chicken dish, we would probably gather a bunch of inputs of ingredients, such as these delicious vegetables and chicken, and then have a whole bunch of outputs of the finished product, and see if our algorithm can figure out what we should do to go from these inputs to that output. So that's about enough to cover the difference between traditional programming and machine learning as far as definitions go. We're going to get hands-on coding these sorts of algorithms throughout the course. For now, let's go to the next video and ask the question: why use machine learning or deep learning? And actually, before we get there, I'd like you to think about that. Going back to what we just saw, the paradigm between traditional programming and machine learning, why would you want to use machine learning algorithms rather than traditional programming? If you had to write all these rules, could that get cumbersome? So have a think about it and we'll cover it in the next video.

Welcome back. In the last video, we covered briefly the difference between traditional programming and machine learning. And again, I don't want to spend too much time on definitions; I'd rather you see this in practice. I left you with the question: why would you want to use machine learning or deep learning? Well, let's think of a good reason. Why not?
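As a minimal sketch of the paradigm just described: traditional programming hand-writes the rules, while a supervised machine learning model is given example inputs and outputs and adjusts its own parameters to approximate the rule. The numbers and the "rule" below are made up purely for illustration, and the snippet assumes PyTorch is installed.

```python
import torch
from torch import nn

# Traditional programming: we hand-write the rules that map inputs to outputs.
def rule_based(x):
    # made-up rule: output is roughly three times the input plus one
    return 3 * x + 1

# Machine learning (supervised): we supply inputs (features) and desired
# outputs (labels) and let the algorithm figure out the rules/patterns.
X = torch.arange(0, 10, dtype=torch.float32).unsqueeze(dim=1)  # inputs
y = 3 * X + 1                                                  # known outputs

model = nn.Linear(in_features=1, out_features=1)  # one learnable rule: y = w*x + b
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # how wrong are the current "rules"?
    loss.backward()               # work out how to adjust them
    optimizer.step()              # adjust them

print(model.weight.item(), model.bias.item())  # should approach 3 and 1
```

The point is only the shape of the workflow: the "times three plus one" rule was never written into the model; it was recovered from the example inputs and outputs.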
I mean, if we had to write all those handwritten rules to reproduce our Sicilian grandmother's roast chicken dish all the time, that would be quite cumbersome, right? Well, let's draw a line through that one. Why not? What's a better reason? Kind of what we just said, right? For a complex problem, can you think of all the rules? Let's imagine we're trying to build a self-driving car. Now, if you've learned to drive, you've probably done so in maybe 20 hours, 100 hours. But now I'll give you the task of writing down every single rule about driving. How do you back out of your driveway? How do you turn left and go down the street? How do you reverse park? How do you stop at an intersection? How do you know how fast to go somewhere? So we just listed half a dozen rules, but you could probably go a fair few more; you might get into the thousands. And so for a complex problem such as driving, can you think of all the rules? Well, probably not. So that's where machine learning and deep learning come in to help.

And here's a beautiful comment I'd like to share with you from one of my YouTube videos, my 2020 machine learning roadmap. This is from Yashawing (I'm probably going to mispronounce that if I even try to). But Yashawing says: I think you can use ML (ML is machine learning; I'm going to use that a lot throughout the course, by the way, just so you know) for literally anything, as long as you can convert it into numbers. Ah, that's what we said before: machine learning is turning something into computer-readable numbers and then programming it to find patterns, except with a machine learning algorithm, typically we write the algorithm and it finds the patterns, not us. And so literally it could be anything, any input or output from the universe. That's pretty darn cool about machine learning, right? But should you always use it just because it could be used for anything? Well, I'd like to also introduce you to Google's number one rule of machine learning. If you can build a simple rule-based system, such as the set of five rules we had that map the ingredients to our Sicilian grandmother's roast chicken dish, if you can write just five steps to do that and it's going to work every time, well, you should probably do that. So if you can build a simple rule-based system that doesn't require machine learning, do that. And of course, maybe it's not so simple, but maybe you can just write some rules to solve the problem that you're working on.
That advice comes from a wise software engineer and, as I kind of hinted at before, it's rule one of Google's machine learning handbook. I highly recommend you read through that, but we're not going to go through it in this video, so check it out. You can Google it, otherwise the link will be with the rest of the course links. So just keep that in mind: although machine learning is very powerful and very fun and very exciting, it doesn't mean that you should always use it. I know this is quite the thing to be saying at the start of a deep learning and machine learning course, but I just want you to keep in mind that simple rule-based systems are still good. Machine learning isn't a solve-all for everything. Now, let's have a look at what deep learning is good for, but I'm going to leave you on a cliffhanger because we're going to check this out in the next video. See you soon.

In the last video, we familiarized ourselves with Google's number one rule of machine learning, which is basically: if you don't need it, don't use it. And with that in mind, what should we actually be looking to use machine learning or deep learning for? Well, problems with long lists of rules, so when the traditional approach fails. Remember, the traditional approach is: you have some sort of data input, you write a list of rules for that data to be manipulated in some way, shape, or form, and then you have the outputs that you know. But if you have a long, long list of rules, like the rules of driving a car, which could be hundreds, could be thousands, could be millions, who knows, that's where machine learning and deep learning may help. And it kind of is helping at the moment: in the world of self-driving cars, machine learning and deep learning are the state-of-the-art approach. Then, continually changing environments. One of the benefits of deep learning is that it can keep learning if it needs to, and so it can adapt and learn to handle new scenarios. If you update the data that your model was trained on, it can adjust to new and different kinds of data in the future. Similarly, if you're driving a car, you might know your own neighborhood very well, but when you go somewhere you haven't been before, sure, you can draw on the foundations of what you know, but you're going to have to adapt: how fast should you go, where should you stop, where should you park, these kinds of things. So problems with long lists of rules, continually changing environments, or large, large datasets: this is where deep learning is flourishing in the world of technology.
So let's give an example. One of my favorites is the Food101 dataset, which you can search for online; it's images of 101 different kinds of food. Now, we briefly looked at what a rule list might look like for cooking your grandmother's famous Sicilian roast chicken dish. But can you imagine, if you wanted to build an app that could take photos of different food, how long your list of rules would be to differentiate 101 different foods? It'd be so long. You'd need rule sets for every single one. Let's just take one food, for example: how do you write a program to tell what a banana looks like? I mean, you'd have to code what a banana looks like, and not only a banana, but what everything that isn't a banana looks like. So keep this in mind: what deep learning is good for is problems with long lists of rules, continually changing environments, or discovering insights within large collections of data.

Now, what is deep learning not good for? And I'm going to write 'typically' here because, again, this is problem-specific. Deep learning is quite powerful these days and things might change in the future, so keep an open mind. If there's anything about this course, it's not for me to tell you exactly what's what; it's for me to spark curiosity in you to figure out what's what, or even better yet, what's not what. So: when you need explainability. As we'll see, the patterns learned by a deep learning model, which are lots of numbers called weights and biases (we'll have a look at those later on), are typically uninterpretable by a human. Sometimes deep learning models can have a million, 10 million, 100 million, a billion parameters; some models are getting into the trillions of parameters. When I say parameters, I mean numbers, or patterns in data. Remember, machine learning is turning things into numbers and then writing a machine learning model to find patterns in those numbers. So sometimes those patterns themselves can be lists of numbers in the millions. Can you imagine looking at a list of numbers with a million different things going on? That's going to be quite hard; I find it hard to understand three or four numbers, let alone a million. Next, when the traditional approach is a better option. Again, this is Google's rule number one of machine learning: if you can do what you need to do with a simple rule-based system, well, maybe you don't need to use machine learning or deep learning. Again, I'm going to use the deep learning and machine learning terms interchangeably; I'm not too concerned with definitions.
You can form your own definitions, but just so you know, from my perspective, ML and deep learning are quite similar. Next, when errors are unacceptable. The outputs of a deep learning model aren't always predictable; we'll see that deep learning models are probabilistic, meaning that when they predict something, they're making a probabilistic bet on it, whereas in a rule-based system you kind of know what the outputs are going to be every single time. So if you can't afford probabilistic errors, you probably shouldn't use deep learning and you'd want to go back to a simple rule-based system. And then finally, when you don't have much data. Deep learning models usually require a fairly large amount of data to produce great results. However, there's a caveat here: you know how at the start I said 'typically'? We're going to see some techniques for how to get great results without huge amounts of data. And I wrote 'typically' here because there are techniques for each of these: you can research deep learning explainability and find a whole bunch of stuff, you can look up examples of when to use machine learning versus deep learning, and for when errors are unacceptable, there are ways to make your model more reproducible so you know what's going to come out, and we do a lot of testing to verify this as well. And so, what's next? Ah, we've got machine learning versus deep learning, and we're going to have a look at some different problem spaces in a second, mainly breaking things down in terms of what kind of data you have. I'm not going to do that now, to prevent this video from getting too long; we'll cover all these colorful, beautiful pictures in the next video.

Welcome back. In the last video, we covered a few things about what deep learning is good for and what deep learning is typically not good for. So let's dive into a little more of a comparison of machine learning versus deep learning. Again, I'm going to be using these terms quite interchangeably, but there are some specific cases where you typically want traditional-style machine learning techniques versus deep learning. However, this is constantly changing, so again, I'm not talking in absolutes here; I'm more just talking in general, and I'll leave it to you to use your own curiosity to research the specific differences between the two.
Typically, for machine learning, by which I mean the traditional style of algorithms (although they're still machine learning algorithms, which is kind of confusing), where it differs from deep learning is that you want to use traditional machine learning algorithms on structured data. So if you have tables of numbers, that's what I mean by structured: rows and columns of data. And possibly one of the best algorithms for this type of data is a gradient boosted machine, such as XGBoost. This is an algorithm that you'll see in a lot of data science competitions and also used in production settings. When I say production settings, I mean applications that you may interact with on the internet or use day to day; that's production. XGBoost is typically the favorite algorithm for these kinds of situations. So again, if you have structured data, you might look into XGBoost rather than building a deep learning algorithm. But the rules aren't set in stone. That's where deep learning and machine learning are kind of an art, kind of a science: sometimes XGBoost is the best for structured data, but there might be exceptions to the rule.

Deep learning, on the other hand, is typically better for unstructured data. What I mean by that is data that's kind of all over the place; it's not in your nice, standardized rows and columns. Say you had natural language, such as this tweet by this person whose name is quite similar to mine and who has the same Twitter account as me. Oh, maybe I wrote that. How do I learn machine learning? What you need to hear: learn Python, learn math, stats, probability, software engineering, build. What you need to do: Google it, go down the rabbit hole, resurface in six to nine months, and reassess. I like that. Or you might have a whole bunch of text, such as the definition of deep learning on Wikipedia. Again, this is the reason why I'm not covering many definitions in this course: look how simply you can look these things up. Wikipedia is going to be able to define deep learning far better than I can; I'm more focused on getting involved and working hands-on with this stuff than on defining what it is. And then we have images. If we wanted to build a 'take a photo of a burger' kind of app, you would work with image data, which doesn't really have much of a structure, although we'll see that with deep learning there are ways we can give this kind of data some sort of structure, through the beauty of a tensor. And then we might have audio files, such as if you were talking to your voice assistant.
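As a rough sketch of the distinction above (the shapes and numbers below are made up purely for illustration, and the snippet assumes PyTorch is installed): structured data already comes as rows and columns, while unstructured data such as images and text gets turned into tensors of numbers without that neat row-and-column meaning.

```python
import torch

# Structured data: rows and columns, e.g. one row per house, with columns for
# bedrooms, bathrooms and garage spots (made-up numbers).
house_table = torch.tensor([[3., 2., 1.],
                            [4., 3., 2.],
                            [2., 1., 1.]])
print(house_table.shape)  # torch.Size([3, 3]) -> 3 rows, 3 columns

# Unstructured data still gets turned into numbers, just without a neat
# row/column meaning. A colour image becomes a [channels, height, width] tensor...
fake_image = torch.rand(size=(3, 224, 224))
print(fake_image.shape)   # torch.Size([3, 224, 224])

# ...and a sentence becomes a sequence of token IDs (the mapping here is made up).
fake_tokens = torch.tensor([101, 2129, 2079, 1045, 4553, 3698, 4083, 102])
print(fake_tokens.shape)  # torch.Size([8])
```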
(I'm not going to say which one, because a whole bunch of my devices might go crazy if I say the name of my voice assistant, which rhymes with... I'm not even going to say that out loud.) And so, typically, for unstructured data you'll want to use a neural network of some kind. Structured data: a gradient boosted machine, a random forest, or a tree-based algorithm such as XGBoost. Unstructured data: neural networks.

So let's keep going and have a look at some of the common algorithms that you might use for structured data (machine learning) versus unstructured data (deep learning). Random forest is one of my favorites, gradient boosted models, naive Bayes, nearest neighbour, support vector machines (SVMs), and many more. Since the advent of deep learning, these are often referred to as shallow algorithms. So deep learning, why is it called deep learning? Well, as we'll see, it can have many different layers in the algorithm: you might have an input layer, 100 layers in the middle, and then an output layer. But we'll get hands-on with this later on. Common algorithms for deep learning are neural networks: fully connected neural networks, convolutional neural networks, recurrent neural networks, transformers (which have taken over over the past couple of years), and of course many more. And the beautiful thing about deep learning and neural networks is that there are almost as many different ways you can construct them as there are problems they can be applied to. That's why I'm putting all these dot points on the page, and I can understand that if you haven't had much experience with machine learning or deep learning, this can be a whole bunch of information overload. But the good news is that what we're going to be focused on building with PyTorch is neural networks, fully connected neural networks and convolutional neural networks, the foundations of deep learning. And the exciting thing is that if we learn these foundational building blocks, we can get into these other styles of things here. Again, part of the art, part of the science of machine learning and deep learning is that, depending on how you represent your problem and what your problem is, many of the algorithms in both columns can be used for both. So I know I've just kind of bedazzled you by saying, oh well, you kind of use these ones for deep learning and these ones for machine learning, but depending on what your problem is, you can also use both. So that's a little bit of the confusion in machine learning.
But that's a fun part about it too: use your curiosity to figure out what's best for whatever you're working on. And with all this talk about neural networks, how about in the next video we cover what neural networks are? I'd like you to Google this before we watch the next video, because there are going to be hundreds of definitions of what they are, and I'd like you to start forming your own definition of what a neural network is. I'll see you in the next video.

Welcome back. In the last video, I left you with the cliffhanger of a question: what are neural networks? And I gave you the challenge of Googling that, but you might have already done that by the time you've gotten here. Let's just do it together. If I type in 'what are neural networks' (I've already done this): what are neural networks, explain neural networks, neural network definition. There are hundreds of definitions of things like this online: neural networks in five minutes, 3Blue1Brown (I'd highly recommend that channel's series on neural networks, and it's going to be in the extracurricular; StatQuest is also amazing). So there are hundreds of different definitions out there, and you can read ten of them, five of them, three of them, and make your own definition. But for the sake of this course, here's how I'm going to define neural networks.

We have some data, whatever it is. We might have images of food, we might have tweets or natural language, and we might have speech. These are some examples of inputs of unstructured data, because they're not rows and columns. So these are the input data that we have. And then how do we use them with a neural network? Well, before data can be used with a neural network, it needs to be turned into numbers, because we humans like looking at images of ramen and spaghetti (we know that that's ramen and that that's spaghetti after we've seen them one or two times), we like reading good tweets, and we like listening to amazing music or hearing our friend talk on the phone in an audio file. However, before a computer understands what's going on in these inputs, it needs to turn them into numbers. This is what I call a numerical encoding or a representation. And in this numerical encoding, the square brackets indicate that it's part of a matrix or a tensor, which we're going to get very hands-on with throughout this course. So we have our inputs, we've turned them into numbers, and then we pass them through a neural network. Now, this is a graphic for a neural network; however, the graphics for neural networks, as we'll see, can get quite involved.
But they all represent the same fundamentals. If we go to this one, for example, we have an input layer, then we have multiple hidden layers (however you define this, you can design these however you want), then we have an output layer. Our inputs go in as some kind of data, the hidden layers perform mathematical operations on the inputs, so on the numbers, and then we have an output. Oh, there's 3Blue1Brown's neural networks from the ground up, great video; highly recommend you check that out. But if we come back to this: we've got our inputs, we've turned them into numbers, and we've got our neural network that we put the input into. This is typically the input layer, then the hidden layer. There can be as many different layers as you want, and each of these little dots is called a node. There's a lot of information here, but we're going to get hands-on with seeing what this looks like. And then we have some kind of output.

Now, which neural network should you use? Well, you can choose the appropriate neural network for your problem, which could involve you hand-coding each one of these steps, or you could find one that has worked on problems similar to your own. For images, you might use a CNN, which is a convolutional neural network. For natural language, you might use a transformer. For speech, you might also use a transformer. But fundamentally, they all follow the same principle of inputs, manipulation, outputs. And so the neural network will learn a representation on its own; we don't define what it learns. It's going to manipulate these patterns in some way, shape, or form. When I say it learns a representation, I'll also refer to that as learning patterns in the data. A lot of people refer to them as features. A feature may be something like the fact that the word 'do' often appears after the word 'how'; a feature can be almost anything you want. And again, we don't define this: the neural network learns these representations, patterns, features, also called weights, on its own. And then where do we go from there? Well, we've got some sort of numbers: a numerical encoding turned our data into numbers, our neural network has learned a representation that it thinks best represents the patterns in our data, and then it outputs those representations, outputs which we can use. Often you'll hear this referred to as features, or a weight matrix, or a weight tensor; learned representation is another common one. There are a lot of different terms for these things. And then it will output.
We can convert these outputs into human-understandable outputs. If we were to look at these, again, as I said, the representations or patterns that our neural network learns can be millions of numbers; this is only nine. So imagine if these were millions of different numbers. I can barely understand the nine numbers going on here, so we need a way to convert them into human-understandable terms. For this example, we might have some input data, which are images of food, and we want our neural network to learn the representations that distinguish an image of ramen from an image of spaghetti. Eventually we'll take those patterns that it's learned and convert them into whether it thinks this is an image of ramen or spaghetti. Or, in the case of this tweet, is this a tweet about a natural disaster or not a natural disaster? So we've written code to turn this into numbers, passed it through our neural network, our neural network has learned some kind of patterns, and then we ideally want it to represent this tweet as 'not a disaster'. And we can write code to do each of these steps here. The same goes for these inputs coming in as speech, turning into something that you might say to your smart speaker, which I'm not going to say because a whole bunch of my devices might go off.

And so let's cover the anatomy of neural networks. We've hinted at this a little bit already, but this is like neural network anatomy 101. Again, what this thing actually is is highly customizable; we're going to see it in PyTorch code later on. The data goes into the input layer, and in this case the number of units (slash neurons, slash nodes) is two. Then hidden layers: I put an 's' there because you can have one hidden layer or many; the 'deep' in deep learning comes from having lots of layers. So this is only showing three or four layers, but you might have very deep neural networks, such as ResNet152, which has 152 different layers. Or this one is 34, because it's only ResNet34, but ResNet152 has 152 different layers. That's a popular computer vision algorithm, by the way. Lots of terms we're throwing out here, but with time you'll start to become familiar with them. So you can have almost as many hidden layers as you want; we've only got one pictured here, and in this case it has three hidden units, slash neurons. And then we have an output layer.
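A minimal PyTorch sketch of the anatomy just described: two input units, a single hidden layer with three units, and one output unit, with a nonlinear function between the linear layers. The layer sizes only mirror the example being discussed and are not a recommendation; the input is random placeholder data, and the snippet assumes PyTorch is installed.

```python
import torch
from torch import nn

# Input layer -> hidden layer -> output layer, mirroring the 2-3-1 example.
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=3),  # linear (straight-line) function
    nn.ReLU(),                                 # nonlinear (non-straight) function
    nn.Linear(in_features=3, out_features=1),  # output layer with one unit
)

x = torch.rand(size=(1, 2))            # one sample with two input features
raw_output = model(x)                  # the learned representation / raw output
pred_prob = torch.sigmoid(raw_output)  # converted into a prediction probability
print(raw_output, pred_prob)
```

Swapping the layer sizes, stacking more hidden layers, or changing the nonlinearity is exactly the kind of customization described next.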
The output layer outputs the learned representation or prediction probabilities, depending on how we set it up (again, we'll see what these are later on). In this case, it has one unit. So: two input units, three hidden units, one output unit. You can customize the number of these, you can customize how many layers there are, you can customize what goes into the network and what comes out of it. Now, if we talk about the overall architecture, that describes all of the layers combined: when you hear 'neural network architecture', it's talking about the input layer, the hidden layers (which may be more than one), and the output layer. So that's the terminology for the overall architecture. And I say 'patterns', but that's an arbitrary term; you'll also hear embedding (embeddings might come from hidden layers), weights, feature representation, feature vectors, all referring to similar things. So again: how do we turn our data into some numerical form, and build a neural network to figure out patterns, to output some desired output that we want?

Now, to get more technical, each layer is usually a combination of linear and nonlinear functions. What I mean by that is that a linear function is a straight line and a nonlinear function is a non-straight line. If I asked you to draw whatever you want with unlimited straight lines and non-straight lines, so you can use straight lines or curved lines, what kind of patterns could you draw? At a fundamental level, that is basically what a neural network is doing: it's using a combination of linear, straight lines and non-straight lines to draw patterns in our data. We'll see what this looks like later on. From the next video, let's dive briefly into different kinds of learning. We've looked at what a neural network is, the overall algorithm, but there are also different paradigms of how a neural network learns. I'll see you in the next video.

Welcome back. We've discussed a brief overview of the anatomy of a neural network, but let's now discuss some learning paradigms. The first one is supervised learning, and then we have unsupervised and self-supervised learning, and transfer learning. Supervised learning is when you have data and labels, such as in the example we gave at the start, which was how you would build a neural network or a machine learning algorithm to figure out the rules to cook your Sicilian grandmother's famous roast chicken dish.
In the case of supervised learning, you'd have a lot of data, so inputs such as raw ingredients like vegetables and chicken, and a lot of examples of what those inputs should ideally turn into. Or, in the case of discerning between photos of cats and dogs, you might have a thousand photos of cats and a thousand photos of dogs, where you know which photos are cats and which are dogs, and you pass those photos to a machine learning algorithm to discern between them. In that case, you have data (the photos) and labels (cat and dog) for each of those photos. So that's supervised learning: data and labels.

With unsupervised and self-supervised learning, you just have the data itself; you don't have any labels. In the case of cat and dog photos, you only have the photos, not the labels of cat and dog. With self-supervised learning, you could get a machine learning algorithm to learn an inherent representation of the data (and when I say representation, I mean patterns in numbers, I mean weights, I mean features, a whole bunch of different names describing the same thing). You could get a self-supervised learning algorithm to figure out the fundamental patterns between dog and cat images, but it wouldn't necessarily know the difference between the two. That's where you could come in later and say, show me the patterns you've learned, and it might show you them, and you could go, okay, the patterns that look like this are a dog and the patterns that look like that are a cat. So self-supervised and unsupervised learning learn solely from the data itself.

And then finally, transfer learning is a very, very important paradigm in deep learning. It's taking the patterns that one model has learned from a dataset and transferring them to another model. Say we were trying to build a supervised learning algorithm for discerning between cat and dog photos: we might start with a model that has already learned patterns in images and transfer those foundational patterns to our own model, so that our model gets a head start. Transfer learning is a very, very powerful technique. As for this course, we're going to be writing code focused on these two, supervised learning and transfer learning, which are two of the most common paradigms or types of learning in machine learning and deep learning. However, this style of code can be adapted across different learning paradigms.
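To put 'data and labels' into code, here is a tiny, entirely made-up supervised dataset (assuming PyTorch is installed): each sample of features is paired with a label, which is exactly the pairing a supervised learning algorithm gets to see.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Made-up features: 8 samples, each described by 4 numbers.
features = torch.rand(size=(8, 4))
# Matching labels: 0 = cat, 1 = dog (purely hypothetical).
labels = torch.tensor([0, 1, 0, 0, 1, 1, 0, 1])

dataset = TensorDataset(features, labels)   # pairs each sample with its label
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for X_batch, y_batch in loader:
    print(X_batch.shape, y_batch)           # supervised learning sees both
```

An unsupervised or self-supervised setup would keep only `features`; the labels simply would not exist.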
Now, I just want to let you know there is one paradigm I haven't mentioned here, which is kind of in its own bucket, and that is reinforcement learning. I'll leave this as an extension if you want to look it up, but essentially (this is a good one, that's a good photo actually, so shout out to Katie Nuggets) the whole idea of reinforcement learning is that you have some kind of environment and an agent that performs actions in that environment, and you give rewards and observations back to that agent. Say, for example, you wanted to teach your dog to urinate outside: well, you would reward its actions of urinating outside and possibly not reward its actions of urinating all over your couch. So reinforcement learning, again, is kind of its own paradigm. This picture gives a good explanation, with unsupervised learning and supervised learning as two separate things, and then reinforcement learning off doing its own thing. But I will let you research the different learning paradigms a little more in your own time. As I said, we're going to be focused on writing code for supervised learning and transfer learning, specifically PyTorch code.

Now, with that covered, let's look at a few examples of what deep learning is actually used for. And before we get into the next video, I'm going to issue you a challenge to search this question yourself and come up with some of your own ideas for what deep learning is currently used for. So give that a shot and I'll see you in the next video.

How'd you go? Did you do some research? Did you find out what deep learning is actually used for? I bet you found a treasure trove of things. And hey, if you're taking this course, chances are you probably already know some use cases for deep learning, and you're like, Daniel, hurry up and get to the code. Well, we're going to get there, don't you worry. But let's have a look at some things that deep learning can be used for. First, I just want to remind you of this comment from Yasha Sway on the 2020 machine learning roadmap video: I think you can use ML for literally anything, as long as you can convert it into numbers and program it to find patterns. (And remember, ML is machine learning, and deep learning is a part of ML.) Literally, it could be anything, any input or output from the universe. So that's a beautiful thing about machine learning: if you can encode something into numbers, chances are you can build a machine learning algorithm to find patterns in those numbers. Will it work?
Well, again, that's the reason machine learning and deep learning are part art, part science: a scientist would love to know that their experiments will work, but an artist is kind of excited about the fact that, I don't know, this might work, it might not. So that's something to keep in mind, along with rule number one of machine learning: if you don't need it, don't use it. But if you do use it, it can be used for almost anything.

So let's get a little bit specific and find out some deep learning use cases. And I've put 'some' up there for a reason, because there are lots. These are just some that I interact with in my day-to-day life. Such as recommendation: we've got a programming video, we've got a programming podcast, we've got some jiu-jitsu videos, we've got some RuneScape videos, a soundtrack from my favorite movie. Have you noticed that whenever you go to YouTube, you don't really search for things anymore? Well, sometimes you might, but the recommendation page is pretty darn good. That's all powered by deep learning. And in the last 10 years, have you noticed that translation has gotten pretty good too? Well, that's powered by deep learning as well. Now, I don't have much hands-on experience with this. I did use it when I was in Japan; I speak a very small amount of Japanese and an even smaller amount of Mandarin. But if I wanted to translate 'deep learning is epic' into Spanish, it might come out as 'el aprendizaje profundo es épico'. Now, all of the native Spanish speakers watching this video can laugh at me, because that was a very Australian version of saying 'deep learning is epic' in Spanish. But that's so cool: all of Google Translate is now powered by deep learning. And the beautiful thing is, if I couldn't say it myself, I could click this speaker icon and it would say it for me. So that speech recognition, that's powered by deep learning too. If you were to ask your voice assistant, 'who's the biggest big dog of them all?', of course they're going to say 'you', which is what I've set my voice assistant up to say. That's part of speech recognition.

And then computer vision. Oh, look at this. You see this? Where is this photo from? This photo is from when this person driving this car did a hit-and-run on my car at the front of my house, my apartment building. My car was parked on the street; the trailer came off this car, ran into the back of my car, basically destroyed it, and then they drove off. However, my next-door neighbor's security camera picked up this car.
Now, I became a detective 434 00:40:14,240 --> 00:40:19,040 for a week, and I thought, hmm, if there was a computer vision algorithm built into that camera, 435 00:40:19,040 --> 00:40:24,080 it could have detected when the car hit. I mean, it took a lot of searching to find it, 436 00:40:24,080 --> 00:40:28,000 it turns out the car hit at about 3:30am in the morning. So it's pitch black. And of course, 437 00:40:28,000 --> 00:40:32,080 we didn't get the license plate. So this person is out there somewhere in the world after doing 438 00:40:32,080 --> 00:40:37,760 a hit and run. So if you're watching this video, just remember computer vision might catch you one 439 00:40:37,760 --> 00:40:42,560 day. So this is called object detection, where you would place a box around the area where the 440 00:40:42,560 --> 00:40:46,960 pixels most represent the object that you're looking for. So for computer vision, we could 441 00:40:46,960 --> 00:40:52,400 train an object detector to capture cars that drive past a certain camera. And then if someone 442 00:40:52,400 --> 00:40:55,680 does a hit and run on you, you could capture it. And then fingers crossed, it's not too dark 443 00:40:55,680 --> 00:40:59,680 that you can read the license plate and go, hey, excuse me, please, this person has hit my car 444 00:40:59,680 --> 00:41:03,840 and wrecked it. So that's a very close to home story of where computer vision could be used. 445 00:41:03,840 --> 00:41:09,520 And then finally, natural language processing. Have you noticed as well, your spam detector on 446 00:41:09,520 --> 00:41:14,640 your email inbox is pretty darn good? Well, some are powered by deep learning, some not, 447 00:41:14,640 --> 00:41:19,360 it's hard to tell these days what is powered by deep learning, what isn't. But natural language 448 00:41:19,360 --> 00:41:25,520 processing is the process of looking at natural language text. So unstructured text. So whatever 449 00:41:25,520 --> 00:41:31,360 you'd write in an email, in a story, in a Wikipedia document, and deciding, or getting your algorithm 450 00:41:31,360 --> 00:41:36,560 to find patterns in that. So for this example, you would find that this email is not spam. 451 00:41:36,560 --> 00:41:40,400 This deep learning course is incredible. I can't wait to use what I've learned. Thank you so much. 452 00:41:40,400 --> 00:41:45,200 And by the way, that is my real email. So if you want to email me, you can. And then this is spam. 453 00:41:45,200 --> 00:41:52,080 Hey, Daniel, congratulations, you win a lot of money. Wow, I really like that, a lot of money. 454 00:41:52,080 --> 00:41:56,400 But somebody said, I don't think that this is real. So that would probably go to my spam inbox. 455 00:41:57,120 --> 00:42:05,040 Now, with that being said, if we wanted to put these problems in a little bit more of a 456 00:42:05,040 --> 00:42:09,520 classification, this is known as sequence to sequence because you put one sequence in 457 00:42:09,520 --> 00:42:15,280 and get one sequence out. Same as this, you have a sequence of audio waves and you get some 458 00:42:15,280 --> 00:42:22,880 text out. So sequence to sequence, seq to seq. This is classification slash regression. In this 459 00:42:22,880 --> 00:42:28,160 case, the regression is predicting a number. That's what a regression problem is. You would predict 460 00:42:28,160 --> 00:42:33,840 the coordinates of where these box corners should be.
So say this should be at however many pixels 461 00:42:33,840 --> 00:42:38,560 in from the x-axis and however many pixels down from the y-axis, that's that corner. 462 00:42:38,560 --> 00:42:43,280 And then you would draw in between the corners. And then the classification part would go, 463 00:42:43,280 --> 00:42:48,320 Hey, this is that car that did a hit and run on us. And in this case, this is classification. 464 00:42:48,320 --> 00:42:52,960 Classification is predicting whether something is one thing or another, or perhaps more than one 465 00:42:52,960 --> 00:42:58,160 thing or another in the case of multi-class classification. So this email is not spam. That's 466 00:42:58,160 --> 00:43:06,560 a class and this email is spam. So that's also a class. So I think we've only got one direction 467 00:43:06,560 --> 00:43:09,840 to go now that we've sort of laid the foundation for the course. And that is 468 00:43:12,720 --> 00:43:16,560 Well, let's start talking about PyTorch. I'll see you in the next video. 469 00:43:18,320 --> 00:43:21,120 Well, let's now cover some of the foundations of 470 00:43:24,080 --> 00:43:31,840 PyTorch. But first, you might be asking, what is PyTorch? Well, of course, we could just go to 471 00:43:31,840 --> 00:43:38,880 our friend, the internet, and look up PyTorch.org. This is the homepage for PyTorch. 472 00:43:38,880 --> 00:43:43,680 This course is not a replacement for everything on this homepage. This should be your ground truth 473 00:43:43,680 --> 00:43:49,840 for everything PyTorch. So you can get started. You've got a big ecosystem. You've got a way to 474 00:43:49,840 --> 00:43:55,360 set up on your local computer. You've got resources. You've got the PyTorch docs. You've got the GitHub. 475 00:43:55,360 --> 00:44:00,800 You've got search. You've got blog, everything here. This website should be the place you're 476 00:44:00,800 --> 00:44:06,160 visiting most throughout this course as we're writing PyTorch code. You're coming here. 477 00:44:06,160 --> 00:44:09,840 You're reading about it. You're checking things out. You're looking at examples. 478 00:44:10,800 --> 00:44:18,080 But for the sake of this course, let's break PyTorch down. Oh, there's a little flame animation 479 00:44:18,080 --> 00:44:28,000 I just forgot about. What is PyTorch? I didn't sync up the animations. That's all right. So 480 00:44:28,000 --> 00:44:36,320 PyTorch is the most popular research deep learning framework. I'll get to that in a second. 481 00:44:36,320 --> 00:44:42,400 It allows you to write fast deep learning code in Python. If you know Python, it's a very user-friendly 482 00:44:42,400 --> 00:44:47,920 programming language. PyTorch allows us to write state-of-the-art deep learning code 483 00:44:47,920 --> 00:44:55,200 accelerated by GPUs with Python. It gives you access to many pre-built deep learning models 484 00:44:55,200 --> 00:45:01,280 from Torch Hub, which is a website that has lots of, if you remember, I said transfer learning is 485 00:45:01,280 --> 00:45:07,280 a way that we can use other deep learning models to power our own. Torch Hub is a resource for that. 486 00:45:07,280 --> 00:45:10,800 Same as torchvision.models. We'll be looking at this throughout the course. 487 00:45:10,800 --> 00:45:16,000 It provides an ecosystem for the whole stack of machine learning. From pre-processing data, 488 00:45:16,000 --> 00:45:20,720 getting your data into tensors, what if you started with some images?
How do you represent them as 489 00:45:20,720 --> 00:45:25,520 numbers? Then you can build models such as neural networks to model that data. Then you can even 490 00:45:25,520 --> 00:45:31,600 deploy your model in your application slash cloud, well, deploy your PyTorch model. Application slash 491 00:45:31,600 --> 00:45:37,360 cloud will depend on what sort of application slash cloud that you're using, but generally it 492 00:45:37,360 --> 00:45:43,520 will run some kind of PyTorch model. And it was originally designed and used in-house by Facebook 493 00:45:43,520 --> 00:45:48,400 slash meta. I'm pretty sure Facebook have renamed themselves meta now, but it is now open source 494 00:45:48,400 --> 00:45:53,600 and used by companies such as Tesla, Microsoft and OpenAI. And when I say it is the most popular 495 00:45:53,600 --> 00:45:58,560 deep learning research framework, don't take my word for it. Let's have a look at papers with code 496 00:45:58,560 --> 00:46:03,840 dot com slash trends. If you're not sure what papers with code is, it is a website that tracks 497 00:46:03,840 --> 00:46:08,320 the latest and greatest machine learning papers and whether or not they have code. So we have some 498 00:46:08,320 --> 00:46:13,840 other languages here, other deep learning frameworks, PyTorch, TensorFlow, JAX is another one, MXNet, 499 00:46:13,840 --> 00:46:18,960 PaddlePaddle, the original torch. So PyTorch is an evolution of torch written in Python, 500 00:46:18,960 --> 00:46:25,760 Caffe2, MindSpore. But if we look at this, when is this? Last date is December 2021. We have, 501 00:46:26,960 --> 00:46:33,520 oh, this is going to move every time I move it. No. So I'll highlight PyTorch at 58% there. 502 00:46:33,520 --> 00:46:40,560 So by far, the most popular research machine learning framework used to write the code 503 00:46:40,560 --> 00:46:45,680 for state of the art machine learning algorithms. So this is Browse State of the Art on paperswithcode.com, an amazing website. We have semantic segmentation, image classification, object detection, image 505 00:46:51,120 --> 00:46:56,000 generation, computer vision, natural language processing, medical, I'll let you explore this. 506 00:46:56,000 --> 00:47:00,960 It's one of my favorite resources for staying up to date on the field. But as you see, out of the 507 00:47:00,960 --> 00:47:08,560 65,000 papers with code that this website has tracked, 58% of them are implemented with PyTorch. 508 00:47:08,560 --> 00:47:14,000 How cool is that? And this is what we're learning. So let's jump into there. Why PyTorch? Well, 509 00:47:14,000 --> 00:47:18,800 other than the reasons that we just spoke about, it's a research favorite. This is highlighting. 510 00:47:20,080 --> 00:47:27,280 There we go. So there we go. I've highlighted it here. PyTorch, 58%, nearly 2,500 repos. If 511 00:47:27,280 --> 00:47:31,920 you're not sure what a repo is, a repo is a place where you store all of your code online. 512 00:47:31,920 --> 00:47:37,680 And generally, if a paper gets published in machine learning, if it's fantastic research, 513 00:47:37,680 --> 00:47:44,080 it will come with code, code that you can access and use for your own applications or your own 514 00:47:44,080 --> 00:47:51,840 research. Again, why PyTorch? Well, this is a tweet from François Chollet, who's the author of 515 00:47:51,840 --> 00:47:57,440 Keras, which is another popular deep learning framework.
But with tools like Colab, we're going 516 00:47:57,440 --> 00:48:02,480 to see what Colab is in a second, Keras and TensorFlow (and I've added in here, and PyTorch), 517 00:48:02,480 --> 00:48:06,480 virtually anyone can solve in a day, with no initial investment, problems that would have 518 00:48:06,480 --> 00:48:12,880 required an engineering team working for a quarter and $20,000 in hardware in 2014. So this is just 519 00:48:12,880 --> 00:48:18,800 to highlight how good the space of deep learning and machine learning tooling has become. Colab, 520 00:48:18,800 --> 00:48:24,880 Keras and TensorFlow are all fantastic. And now PyTorch is added to this list. If you want to 521 00:48:24,880 --> 00:48:31,600 check that out, there's François Chollet on Twitter. Very, very prominent voice in the machine learning 522 00:48:31,600 --> 00:48:37,440 field. Why PyTorch? If you want some more reasons, well, have a look at this. Look at all the 523 00:48:37,440 --> 00:48:42,560 places that are using PyTorch. It's just coming up everywhere. We've got Andrej Karpathy here, 524 00:48:42,560 --> 00:48:49,360 who's the director of AI at Tesla. So if we go, we could search this, PyTorch 525 00:48:51,200 --> 00:48:59,760 at Tesla. We've got a YouTube talk there, Andrej Karpathy, director of AI at Tesla. 526 00:48:59,760 --> 00:49:08,640 And so Tesla are using PyTorch for the computer vision models of autopilot. So if we go to videos 527 00:49:08,640 --> 00:49:16,160 or maybe images, does it come up there? Things like this, a car detecting what's going on in the scene. 528 00:49:16,960 --> 00:49:21,360 Of course, there'll be some other code for planning, but I'll let you research that. 529 00:49:22,080 --> 00:49:28,640 When we come back here, OpenAI, which is one of the biggest open artificial intelligence 530 00:49:28,640 --> 00:49:34,480 research firms, open in the sense that they publish a lot of their research methodologies, 531 00:49:34,480 --> 00:49:40,960 however, recently there's been some debate about that. But if you go to openai.com, 532 00:49:40,960 --> 00:49:45,280 let's just say that they're one of the biggest AI research entities in the world, 533 00:49:45,280 --> 00:49:50,800 and they've standardized on PyTorch. So they've got a great blog, they've got great research, 534 00:49:50,800 --> 00:49:56,320 and now they've got OpenAI API, which is, you can use their API to access some of the models 535 00:49:56,320 --> 00:50:02,080 that they've trained. Presumably with PyTorch, because this blog post from January 2020 says 536 00:50:02,080 --> 00:50:07,280 that OpenAI is now standardized on PyTorch. There's a repo called the incredible PyTorch, 537 00:50:07,280 --> 00:50:11,040 which collects a whole bunch of different projects that are built on top of PyTorch. 538 00:50:11,040 --> 00:50:15,040 That's the beauty of PyTorch: you can build on top of it, you can build with it. 539 00:50:15,040 --> 00:50:22,960 AI for AG, for agriculture, PyTorch has been used. Let's have a look. PyTorch in agriculture. 540 00:50:22,960 --> 00:50:29,520 There we go. Agricultural robots use PyTorch. This is a Medium article. 541 00:50:31,920 --> 00:50:37,120 It's everywhere. So if we go down here, this is using object detection. Beautiful. 542 00:50:38,560 --> 00:50:43,280 Object detection to detect what kind of weeds should be sprayed with fertilizer. This is just 543 00:50:43,280 --> 00:50:49,200 one of many different things, so PyTorch on a big tractor like this.
It can be used almost 544 00:50:49,200 --> 00:50:53,920 anywhere. If we come back, PyTorch builds the future of AI and machine learning at Facebook, 545 00:50:53,920 --> 00:50:58,480 so Facebook, which is also Meta AI, a little bit confusing, even though it says Meta AI, 546 00:50:58,480 --> 00:51:03,520 it's on AI.facebook.com. That may change by the time you watch this. They use PyTorch in-house 547 00:51:03,520 --> 00:51:09,200 for all of their machine learning applications. Microsoft is huge in the PyTorch game. 548 00:51:09,200 --> 00:51:14,560 It's absolutely everywhere. So if that's not enough reason to use PyTorch, 549 00:51:14,560 --> 00:51:19,920 well, then maybe you're in the wrong course. So you've seen enough reasons of why to use PyTorch. 550 00:51:19,920 --> 00:51:25,280 I'm going to give you one more. That is that it helps you run your code, your machine learning code 551 00:51:25,280 --> 00:51:30,720 accelerated on a GPU. We've covered this briefly, but what is a GPU, slash a TPU, 552 00:51:30,720 --> 00:51:36,320 because the TPU is more of a newer chip these days. A GPU is a graphics processing unit, 553 00:51:36,320 --> 00:51:41,040 which is essentially very fast at crunching numbers. Originally designed for video games, 554 00:51:41,040 --> 00:51:45,440 if you've ever designed or played a video game, you know that the graphics are quite intense, 555 00:51:45,440 --> 00:51:50,800 especially these days. And so to render those graphics, you need to do a lot of numerical calculations. 556 00:51:50,800 --> 00:51:56,720 And so the beautiful thing about PyTorch is that it enables you to leverage a GPU through an 557 00:51:56,720 --> 00:52:01,840 interface called CUDA, which is a lot of words I'm going to throw at you here, a lot of acronyms 558 00:52:01,840 --> 00:52:08,080 in the deep learning space, CUDA. Let's just search CUDA. CUDA toolkit. So CUDA is a parallel 559 00:52:08,080 --> 00:52:12,880 computing platform and application programming interface, which is an API that allows software 560 00:52:12,880 --> 00:52:18,400 to use certain types of graphics processing units for general purpose computing. That's what 561 00:52:18,400 --> 00:52:25,840 we want. So PyTorch leverages CUDA to enable you to run your machine learning code on NVIDIA 562 00:52:25,840 --> 00:52:32,960 GPUs. Now, there is also an ability to run your PyTorch code on TPUs, which are tensor processing 563 00:52:32,960 --> 00:52:39,440 units. However, GPUs are far more popular when running various types of PyTorch code. So we're 564 00:52:39,440 --> 00:52:45,440 going to focus on running our PyTorch code on the GPU. And to just give you a quick example, 565 00:52:45,440 --> 00:52:51,520 PyTorch on TPU, let's see that. Getting started with PyTorch on cloud TPUs, there's plenty of 566 00:52:51,520 --> 00:52:57,120 guides for that. But as I said, GPUs are going to be far more common in practice. So that's what 567 00:52:57,120 --> 00:53:04,320 we're going to focus on. And with that said, we've said tensor processing unit. Now, the reason 568 00:53:04,320 --> 00:53:08,640 why these are called tensor processing units is because machine learning and deep learning 569 00:53:08,640 --> 00:53:16,160 deals a lot with tensors. And so in the next video, let's answer the question, what is a tensor? 570 00:53:16,160 --> 00:53:21,440 But before I go through and answer that from my perspective, I'd like you to research this
So open up Google or your favorite search engine and type in what is a tensor and 572 00:53:27,200 --> 00:53:34,400 see what you find. I'll see you in the next video. Welcome back. In the last video, I left you on 573 00:53:34,400 --> 00:53:40,880 the cliffhanger question of what is a tensor? And I also issued you the challenge to research 574 00:53:40,880 --> 00:53:45,760 what is a tensor. Because as I said, this course isn't all about telling you exactly what things 575 00:53:45,760 --> 00:53:50,960 are. It's more so sparking a curiosity in you so that you can stumble upon the answers to these 576 00:53:50,960 --> 00:53:56,160 things yourself. But let's have a look. What is a tensor? Now, if you remember this graphic, 577 00:53:56,160 --> 00:54:00,320 there's a lot going on here. But this is our neural network. We have some kind of input, 578 00:54:00,320 --> 00:54:05,360 some kind of numerical encoding. Now, we start with this data. In our case, it's unstructured data 579 00:54:05,360 --> 00:54:11,920 because we have some images here, some text here, and an audio file here. Now, these don't necessarily 580 00:54:11,920 --> 00:54:17,600 all go in at the same time. This image could just focus on a neural network specifically 581 00:54:17,600 --> 00:54:23,280 for images. This text could focus on a neural network specifically for text. And this sound bite 582 00:54:23,280 --> 00:54:29,680 or speech could focus on a neural network specifically for speech. However, the field is sort of also 583 00:54:29,680 --> 00:54:34,240 moving towards building neural networks that are capable of handling all three types of inputs. 584 00:54:34,960 --> 00:54:39,520 For now, we're going to start small and then build up. The algorithms that we're going to focus on 585 00:54:39,520 --> 00:54:45,440 are neural networks that focus on one type of data. But the premise is still the same. You have 586 00:54:45,440 --> 00:54:50,480 some kind of input. You have to numerically encode it in some form, pass it to a neural network 587 00:54:50,480 --> 00:54:55,520 to learn representations or patterns within that numerical encoding, output some form of 588 00:54:55,520 --> 00:55:00,800 representation. And then we can convert that representation into things that humans understand. 589 00:55:01,760 --> 00:55:06,800 And you might have already seen these, and I might have already referenced the fact that 590 00:55:07,360 --> 00:55:13,200 these are tensors. So when the question comes up, what are tensors? A tensor could be almost 591 00:55:13,200 --> 00:55:18,400 anything. It could be almost any representation of numbers. We're going to get very hands on with 592 00:55:18,400 --> 00:55:23,840 tensors. And that's actually the fundamental building block of PyTorch, aside from neural network 593 00:55:23,840 --> 00:55:31,040 components: the torch dot tensor. We're going to see that very shortly. But this is a very 594 00:55:31,040 --> 00:55:36,400 important takeaway: you have some sort of input data. You're going to numerically encode 595 00:55:36,400 --> 00:55:41,840 that data, turn it into a tensor of some kind. Whatever that kind is will depend on the problem 596 00:55:41,840 --> 00:55:47,920 you're working with. Then you're going to pass it to a neural network, which will perform mathematical 597 00:55:47,920 --> 00:55:54,000 operations on that tensor. Now, a lot of those mathematical operations are taken care of by 598 00:55:54,800 --> 00:55:59,760 PyTorch behind the scenes.
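To make that idea concrete, here's a minimal sketch (an editor's illustration, not code from the video at this point) of a made-up tensor going through a single PyTorch layer, with the maths handled behind the scenes:

```python
import torch
from torch import nn

# A made-up numerical encoding: one sample with three features
x = torch.tensor([[1.0, 2.0, 3.0]])

# A single linear layer stands in for "a neural network"
layer = nn.Linear(in_features=3, out_features=2)

# PyTorch does the matrix multiply and bias add behind the scenes
output = layer(x)
print(output.shape)  # torch.Size([1, 2]), another tensor comes out
```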
So we'll be writing code to execute some kind of mathematical 599 00:55:59,760 --> 00:56:06,400 operations on these tensors. And then the neural network that we create, or the one that's already 600 00:56:06,400 --> 00:56:12,400 been created, but we just use for our problem, we'll output another tensor, similar to the input, 601 00:56:12,400 --> 00:56:17,920 but that has been manipulated in a certain way that we've sort of programmed it to. And then we can take 602 00:56:17,920 --> 00:56:25,600 this output tensor and change it into something that a human can understand. So to remove a lot 603 00:56:25,600 --> 00:56:30,720 of the text around it, make it a little bit clearer. If we were focusing on building an image 604 00:56:30,720 --> 00:56:35,200 classification model, so we want to classify whether this was a photo of ramen or spaghetti, 605 00:56:35,200 --> 00:56:40,240 we would have images as input. We would turn those images into numbers, which are represented 606 00:56:40,240 --> 00:56:45,440 by a tensor. We would pass that tensor of numbers to a neural network, or there might be lots of 607 00:56:45,440 --> 00:56:50,240 tensors here. We might have 10,000 images. We might have a million images. Or in some cases, 608 00:56:50,240 --> 00:56:55,600 if you're Google or Facebook, you might be working with 300 million or a billion images at a time. 609 00:56:56,800 --> 00:57:02,880 The principle still stands that you encode your data in some form of numerical representation, 610 00:57:02,880 --> 00:57:08,720 which is a tensor, pass that tensor, or lots of tensors to a neural network. The neural network 611 00:57:08,720 --> 00:57:14,080 performs mathematical operations on those tensors, outputs a tensor, we convert that tensor into 612 00:57:14,080 --> 00:57:20,000 something that we can understand as humans. And so with that being said, we've covered a lot of 613 00:57:20,000 --> 00:57:24,480 the fundamentals. What is machine learning? What is deep learning? What is a neural network? Well, 614 00:57:24,480 --> 00:57:28,720 we've touched the surface of these things. You can get as deep as you like. We've covered 615 00:57:28,720 --> 00:57:33,760 why use PyTorch. What is PyTorch? Now, the fundamental building block of deep learning 616 00:57:33,760 --> 00:57:37,840 is tensors. We've covered that. Let's get a bit more specific in the next video 617 00:57:37,840 --> 00:57:43,840 of what we're going to cover code-wise in this first module. I'm so excited we're going to start 618 00:57:43,840 --> 00:57:50,720 coding soon. I'll see you in the next video. Now it's time to get specific about what we're going to 619 00:57:50,720 --> 00:57:56,560 cover code-wise in this fundamentals module. But I just want to reiterate the fact that 620 00:57:56,560 --> 00:58:01,680 going back to the last video where I challenged you to look up what is a tensor, here's exactly 621 00:58:01,680 --> 00:58:06,720 what I would do. I would come to Google. I would type in the question, what is a tensor? There we go. 622 00:58:06,720 --> 00:58:11,680 What is a tensor in PyTorch? It knows. Google knows, using that deep learning data, that we want 623 00:58:11,680 --> 00:58:15,920 to know what a tensor is in PyTorch. But a tensor is a very general thing. It's not 624 00:58:15,920 --> 00:58:21,360 associated with just PyTorch. Now we've got tensor on Wikipedia. We've got tensor. This is probably 625 00:58:21,360 --> 00:58:26,960 my favorite video on what is a tensor. By Dan Fleisch.
Fleisch, I'm probably saying that wrong, 626 00:58:26,960 --> 00:58:33,440 but good first name. Your extra curriculum for this video and the previous 627 00:58:33,440 --> 00:58:38,960 video is to watch this on what is a tensor. Now you might be saying, well, what gives? I've come to 628 00:58:38,960 --> 00:58:43,360 this course to learn PyTorch and all this guy's doing, all you're doing, Daniel, is just Googling 629 00:58:43,360 --> 00:58:49,120 things when a question comes up. Why don't you just tell me what it is? Well, if I was to tell you 630 00:58:49,120 --> 00:58:53,680 everything about deep learning and machine learning and PyTorch and what it is and what it's not, 631 00:58:53,680 --> 00:58:58,800 that course would be far too long. I'm doing this on purpose. I'm searching questions like this on 632 00:58:58,800 --> 00:59:04,240 purpose because that's exactly what I do day to day as a machine learning engineer. I write code 633 00:59:04,240 --> 00:59:09,440 like we're about to do. And then if I don't know something, I literally go to whatever search engine 634 00:59:09,440 --> 00:59:14,800 I'm using, Google most of the time, and type in whatever error I'm getting or PyTorch, what is 635 00:59:14,800 --> 00:59:20,800 a tensor, something like that. So I want to not only tell you that it's okay to search questions 636 00:59:20,800 --> 00:59:25,360 like that, but it's encouraged. So just keep that in mind as we go through the whole course, 637 00:59:25,360 --> 00:59:30,160 you're going to see me do it a lot. Let's get into what we're going to cover. Here we go. 638 00:59:31,200 --> 00:59:36,320 Now, this tweet is from Elon Musk. And so I've decided, you know what, let's base the whole 639 00:59:36,320 --> 00:59:41,920 course on this tweet. We have learning ML/DL from university, you have a little bit of a small brain. 640 00:59:41,920 --> 00:59:47,040 Online courses, well, like this one, that brain's starting to explode and you get some little fireworks 641 00:59:47,040 --> 00:59:53,520 from YouTube. Oh, you're watching this on YouTube. Look at that shiny brain from articles. My goodness. 642 00:59:54,720 --> 01:00:03,040 Lucky that this course comes in article format. If you go to learn pytorch.io, all of the course 643 01:00:03,040 --> 01:00:08,800 materials are in online book format. So we're going to get into this fundamental section very 644 01:00:08,800 --> 01:00:13,360 shortly. But if you want a reference, the course materials are built off this book. And by the 645 01:00:13,360 --> 01:00:17,680 time you watch this, there's going to be more chapters here. So we're covering all the bases 646 01:00:17,680 --> 01:00:23,280 here. And then finally, from memes, you would ascend to some godlike creature. I think that's 647 01:00:23,280 --> 01:00:27,680 hovering underwater. So that is the best way to learn machine learning. So this is what we're 648 01:00:27,680 --> 01:00:33,680 going to start with: ML/DL from university, online courses, YouTube, from articles, from memes. No, 649 01:00:33,680 --> 01:00:41,520 no, no, no. But kind of here is what we're going to cover broadly. So now in this module, 650 01:00:42,480 --> 01:00:47,520 we are going to cover the pytorch basics and fundamentals, mainly dealing with tensors and 651 01:00:47,520 --> 01:00:52,880 tensor operations. Remember, a neural network is all about input tensors, performing operations on 652 01:00:52,880 --> 01:00:59,920 those tensors, creating output tensors.
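As a tiny taste of that (again, just an illustrative sketch from the editor, we'll write this properly in the coding videos), operations on an input tensor give you back another tensor:

```python
import torch

# An input tensor
x = torch.tensor([1, 2, 3])

# Performing operations on the tensor creates output tensors
print(x + 10)   # tensor([11, 12, 13])
print(x * 10)   # tensor([10, 20, 30])
print(x.sum())  # tensor(6)
```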
Later, we're going to be focused on pre-processing data, 653 01:00:59,920 --> 01:01:06,000 getting it into tensors, so turning data from raw form, images, whatever, into a numerical 654 01:01:06,000 --> 01:01:09,840 encoding, which is a tensor. Then we're going to look at building and using pre-trained deep 655 01:01:09,840 --> 01:01:14,240 learning models, specifically neural networks. We're going to fit a model to the data. So we're 656 01:01:14,240 --> 01:01:18,960 going to show our model or write code for our model to learn patterns in the data that we've 657 01:01:18,960 --> 01:01:22,400 pre-processed. We're going to see how we can make predictions with our model, because that's 658 01:01:22,400 --> 01:01:26,160 what deep learning and machine learning is all about, right, using patterns from the past to 659 01:01:26,160 --> 01:01:30,240 predict the future. And then we're going to evaluate our model's predictions. We're going to learn 660 01:01:30,240 --> 01:01:34,720 how to save and load our models. For example, if you wanted to export your model from where we're 661 01:01:34,720 --> 01:01:39,360 working to an application or something like that. And then finally, we're going to see how we can 662 01:01:39,360 --> 01:01:46,000 use a trained model to make predictions on our own data, on custom data, which is very fun. And 663 01:01:46,000 --> 01:01:51,280 how? Well, you can see that the scientist has faded out a little bit, but that's not really that true. 664 01:01:51,280 --> 01:01:56,320 We're going to do it like cooks, not chemists. So chemists are quite precise. Everything has to be 665 01:01:56,320 --> 01:02:01,520 exactly how it is. But cooks are more like, oh, you know what, a little bit of salt, a little bit of 666 01:02:01,520 --> 01:02:06,080 butter. Does it taste good? Okay, well, then we're on. But machine learning is a little bit of both. 667 01:02:06,080 --> 01:02:11,280 It's a little bit of science, a little bit of art. That's how we're going to do it. But 668 01:02:11,280 --> 01:02:15,760 I like the idea of this being a machine learning cooking show. So welcome to cooking with machine 669 01:02:15,760 --> 01:02:25,120 learning, cooking with PyTorch with Daniel. And finally, we've got a workflow here. We have 670 01:02:25,120 --> 01:02:29,840 a PyTorch workflow, which is one of many. We're going to kind of use this throughout the entire 671 01:02:29,840 --> 01:02:34,480 course. Step one, we're going to get our data ready. Step two, we're going to build or 672 01:02:34,480 --> 01:02:38,720 pick a pre-trained model to suit whatever problem we're working on. Step two point one, 673 01:02:38,720 --> 01:02:42,320 pick a loss function and optimizer. Don't worry about what they are. We're going to cover them 674 01:02:42,320 --> 01:02:47,440 soon. Step two point two, build a training loop. Now this is kind of all part and parcel of 675 01:02:47,440 --> 01:02:51,680 step two, hence why we've got two point one and two point two. You'll see what that means later on. 676 01:02:52,240 --> 01:02:55,520 Number three, we're going to fit the model to the data and make a prediction. So say we're working 677 01:02:55,520 --> 01:03:01,120 on image classification for ramen or spaghetti. How do we build a neural network or put our 678 01:03:01,120 --> 01:03:06,000 images through that neural network to get some sort of idea of what's in an image? We'll see 679 01:03:06,000 --> 01:03:11,360 how to do that.
Number four, we'll evaluate our model to see if it's predicting BS or if it's actually 680 01:03:11,360 --> 01:03:16,640 going all right. Number five, we're going to improve through experimentation. That's another 681 01:03:16,640 --> 01:03:20,720 big thing that you'll notice throughout machine learning throughout this course is that it's 682 01:03:20,720 --> 01:03:27,200 very experimental part art, part science. Number six, save and reload your trained model. Again, 683 01:03:27,200 --> 01:03:33,040 I put these in numerical order, but they can kind of be mixed and matched depending on where 684 01:03:33,040 --> 01:03:40,560 you are in the journey. But numerical order is just easy to understand for now. Now we've got 685 01:03:40,560 --> 01:03:45,840 one more video, maybe another one before we get into code. But in the next video, I'm going to 686 01:03:45,840 --> 01:03:52,320 cover some very, very important points on how you should approach this course. I'll see you there. 687 01:03:54,400 --> 01:03:58,400 Now you might be asking, how should I approach this course? You might not be asking, but we're 688 01:03:58,400 --> 01:04:03,440 going to answer it anyway. How to approach this course? This is how I would recommend approaching 689 01:04:03,440 --> 01:04:10,000 this course. So I'm a machine learning engineer day to day, and learning machine learning and 690 01:04:10,000 --> 01:04:14,560 coding machine learning are kind of two different things. I remember when I first learned it was 691 01:04:14,560 --> 01:04:18,960 kind of, you learned a lot of theory rather than writing code. So not to take away from the theory 692 01:04:18,960 --> 01:04:24,000 being important, this course is going to be focusing on writing machine learning code, specifically 693 01:04:24,000 --> 01:04:31,200 PyTorch code. So the number one step to approaching this course is to code along. Now because this 694 01:04:31,200 --> 01:04:37,840 course is focused on purely writing code, I will be linking extracurricular resources for you to 695 01:04:37,840 --> 01:04:43,360 learn more about what's going on behind the scenes of the code. My idea of teaching is that if we 696 01:04:43,360 --> 01:04:48,080 can code together, write some code, see how it's working, that's going to spark your curiosity to 697 01:04:48,080 --> 01:04:55,040 figure out what's going on behind the scenes. So motto number one is if in doubt, run the code. 698 01:04:55,040 --> 01:05:05,120 Write it, run the code, see what happens. Number two, I love that. Explore and experiment. Again, 699 01:05:05,120 --> 01:05:13,600 approach this with the idea, the mind of a scientist and a chef, or science and art. Experiment, 700 01:05:13,600 --> 01:05:18,880 experiment, experiment. Try things with rigor like a scientist would, and then just try things 701 01:05:18,880 --> 01:05:25,200 for the fun of it like a chef would. Number three, visualize what you don't understand. I can't 702 01:05:25,200 --> 01:05:29,200 emphasize this one enough. We have three mottos so far. If in doubt, run the code. You're going to 703 01:05:29,200 --> 01:05:34,640 hear me say this a lot. Experiment, experiment, experiment. And number three, visualize, visualize, 704 01:05:34,640 --> 01:05:39,600 visualize. Why is this? Well, because we've spoken about machine learning and deep learning 705 01:05:39,600 --> 01:05:45,200 deals with a lot of data, a lot of numbers.
And so I find it that if I visualize some numbers in 706 01:05:45,760 --> 01:05:51,840 whatever form that isn't just numbers all over a page, I tend to understand it better. 707 01:05:51,840 --> 01:05:57,120 And there are some great extracurricular resources that I'm going to link that also turn what we're 708 01:05:57,120 --> 01:06:07,040 doing. So writing code into fantastic visualizations. Number four, ask questions, including the dumb 709 01:06:07,040 --> 01:06:11,120 questions. Really, there's no such thing as a dumb question. Everyone is just on a different 710 01:06:11,120 --> 01:06:15,520 part of their learning journey. And in fact, if you do have a quote unquote dumb question, 711 01:06:15,520 --> 01:06:20,080 it turns out that a lot of people probably have that one as well. So be sure to ask questions. 712 01:06:20,080 --> 01:06:24,000 I'm going to link a resource in a minute of where you can ask those questions, but 713 01:06:24,000 --> 01:06:29,120 please, please, please ask questions, not only to the community, but to Google to the internet 714 01:06:29,120 --> 01:06:33,440 to wherever you can, or just yourself. Ask questions of the code and write code to figure 715 01:06:33,440 --> 01:06:38,160 out the answer to those questions. Number five, do the exercises. There are some 716 01:06:38,720 --> 01:06:44,640 great exercises that I've created for each of the modules. If we go, have we got the book version 717 01:06:44,640 --> 01:06:51,280 of the course up here? We do. Within all of these chapters here, down the bottom is going to be 718 01:06:51,280 --> 01:06:56,960 exercises and extra curriculum. So we've got some exercises. I'm not going to jump into them, 719 01:06:56,960 --> 01:07:02,720 but I would highly recommend don't just follow along with the course and code after I code. 720 01:07:02,720 --> 01:07:08,160 Please, please, please give the exercises a go because that's going to stretch your knowledge. 721 01:07:09,520 --> 01:07:13,200 We're going to have a lot of practice writing code together, doing all of this stuff here. 722 01:07:13,200 --> 01:07:16,800 But then the exercises are going to give you a chance to practice what you've learned. 723 01:07:16,800 --> 01:07:22,160 And then of course, extra curriculum. Well, hey, if you want to learn more, there's plenty of 724 01:07:22,160 --> 01:07:30,160 opportunities to do so there. And then finally, number six, share your work. I can't emphasize 725 01:07:30,160 --> 01:07:36,320 enough how much writing about learning deep learning or sharing my work through GitHub or 726 01:07:36,320 --> 01:07:42,160 different code resources or with the community has helped with my learning. So if you learn 727 01:07:42,160 --> 01:07:47,920 something cool about PyTorch, I'd love to see it. Link it to me somehow in the Discord chat 728 01:07:47,920 --> 01:07:53,360 or on GitHub or whatever. There'll be links of where you can find me. I'd love to see it. Please 729 01:07:53,360 --> 01:07:58,560 do share your work. It's a great way to not only learn something because when you share it, when 730 01:07:58,560 --> 01:08:02,960 you write about it, it's like, how would someone else understand it? But it's also a great way to 731 01:08:02,960 --> 01:08:10,160 help others learn too. And so we said how to approach this course. Now, let's go how not to 732 01:08:10,160 --> 01:08:16,800 approach this course. I would love for you to avoid overthinking the process. And this is your brain, 733 01:08:16,800 --> 01:08:21,920 and this is your brain on fire. 
So avoid having your brain on fire. That's not a good place to be. 734 01:08:21,920 --> 01:08:27,280 We are working with PyTorch, so it's going to be quite hot. Just playing on words with the name 735 01:08:27,280 --> 01:08:32,400 torch. But avoid your brain catching on fire. And avoid saying, I can't learn, 736 01:08:33,920 --> 01:08:38,000 I've said this to myself lots of times, and then I've practiced it and it turns out I can 737 01:08:38,000 --> 01:08:42,240 actually learn those things. So let's just draw a red line on there. Oh, I think a red line. 738 01:08:42,240 --> 01:08:46,480 Yeah, there we go. Nice and thick red line. We'll get that out there. It doesn't really make sense 739 01:08:46,480 --> 01:08:52,400 now that this says avoid and crossed out. But don't say I can't learn and prevent your brain from 740 01:08:52,400 --> 01:08:58,480 catching on fire. Finally, we've got one more video that I'm going to cover before this one 741 01:08:58,480 --> 01:09:03,120 gets too long of the resources for the course before we get into coding. I'll see you there. 742 01:09:03,120 --> 01:09:10,000 Now, there are some fundamental resources that I would like you to be aware of before we 743 01:09:10,000 --> 01:09:14,640 go any further in this course. These are going to be paramount to what we're working with. 744 01:09:14,640 --> 01:09:21,600 So for this course, there are three things. There is the GitHub repo. So if we click this link, 745 01:09:22,480 --> 01:09:25,840 I've got a pinned on my browser. So you might want to do the same while you're going through 746 01:09:25,840 --> 01:09:32,080 the course. But this is Mr. D. Burks in my GitHub slash PyTorch deep learning. It is still a work 747 01:09:32,080 --> 01:09:35,760 in progress at the time of recording this video. But by the time you go through it, it won't look 748 01:09:35,760 --> 01:09:40,080 too much different, but there just be more materials. You'll have materials outline, 749 01:09:40,080 --> 01:09:44,560 section, what does it cover? As you can see, some more are coming soon at the time of recording 750 01:09:44,560 --> 01:09:49,200 this. So these will probably be done by the time you watch this exercise in extra curriculum. 751 01:09:49,200 --> 01:09:54,240 There'll be links here. Basically, everything you need for the course will be in the GitHub repo. 752 01:09:54,240 --> 01:10:02,240 And then if we come back, also on the GitHub repo, the same repo. So Mr. D. Burks slash PyTorch 753 01:10:02,240 --> 01:10:08,240 deep learning. If you click on discussions, this is going to be the Q and A. This is just the same 754 01:10:08,240 --> 01:10:13,760 link here, the Q and A for the course. So if you have a question here, you can click new discussion, 755 01:10:13,760 --> 01:10:25,520 you can go Q and A, and then type in video, and then the title PyTorch Fundamentals, and then go 756 01:10:25,520 --> 01:10:35,200 in here. Or you could type in your error as well. What is N-DIM for a tensor? And then in here, 757 01:10:35,200 --> 01:10:42,720 you can type in some stuff here. Hello. I'm having trouble on video X, Y, Z. Put in the name of the 758 01:10:42,720 --> 01:10:48,640 video. So that way I can, or someone else can help you out. And then code, you can go three 759 01:10:48,640 --> 01:10:54,880 back ticks, write Python, and then you can go import torch, torch dot rand n, which is going to 760 01:10:54,880 --> 01:10:59,920 create a tensor. We're going to see this in a second. Yeah, yeah, yeah. 
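For example, the code part of a hypothetical question like that, pasted between the three backticks, might look like this (the torch.randn call just creates a random tensor, it's only here to show the formatting):

```python
import torch

# The code I'm having trouble with (hypothetical example)
x = torch.randn(3, 4)  # creates a random tensor
print(x.ndim)          # what exactly does ndim mean here?
```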
And then if you post that 761 01:10:59,920 --> 01:11:05,840 question, the formatting of the code is very helpful that we can understand what's going on, 762 01:11:05,840 --> 01:11:11,040 and what's going on here. So this is basically the outline of how I would ask a question video. 763 01:11:11,040 --> 01:11:16,880 This is going on. What is such and such for whatever's going on? Hello. This is what I'm having 764 01:11:16,880 --> 01:11:21,600 trouble with. Here's the code, and here's what's happening. You could even include the error message, 765 01:11:21,600 --> 01:11:26,400 and then you can just click start discussion, and then someone, either myself or someone else from 766 01:11:26,400 --> 01:11:30,320 the course will be able to help out there. And the beautiful thing about this is that it's all in 767 01:11:30,320 --> 01:11:34,640 one place. You can start to search it. There's nothing here yet because the course isn't out yet, 768 01:11:34,640 --> 01:11:38,880 but as you go through it, there will probably be more and more stuff here. Then if you have any 769 01:11:38,880 --> 01:11:44,000 issues with the code that you think needs fixed, you can also open a new issue there. I'll let you 770 01:11:44,000 --> 01:11:48,800 read more into what's going on. I've just got some issues here already about the fact that I 771 01:11:48,800 --> 01:11:52,480 need to record videos for the course. I need to create some stuff. But if you think there's 772 01:11:52,480 --> 01:11:56,560 something that could be improved, make an issue. If you have a question about the course, 773 01:11:57,200 --> 01:12:02,000 ask a discussion. And then if we come back to the keynote, we have one more resource. So that 774 01:12:02,000 --> 01:12:06,000 was the course materials all live in the GitHub. The course Q&A is on the course 775 01:12:06,000 --> 01:12:13,120 GitHub's discussions tab, and then the course online book. Now, this is a work of art. 776 01:12:13,680 --> 01:12:18,400 This is quite beautiful. It is some code to automatically turn all of the materials from the 777 01:12:18,400 --> 01:12:24,720 GitHub. So if we come into here code, if we click on notebook zero zero, this is going to sometimes 778 01:12:24,720 --> 01:12:29,280 if you've ever worked with Jupiter notebooks on GitHub, they can take a while to load. 779 01:12:29,920 --> 01:12:35,200 So all of the materials here automatically get converted into this book. So the beautiful 780 01:12:35,200 --> 01:12:40,000 thing about the book is that it's got different headings here. It's all readable. It's all online. 781 01:12:40,000 --> 01:12:43,600 It's going to have all the images there. And you can also search some stuff here, 782 01:12:43,600 --> 01:12:50,880 PyTorch training steps, creating a training loop in PyTorch. Beautiful. We're going to see 783 01:12:50,880 --> 01:12:56,800 this later on. So they're the three big materials that you need to be aware of, the three big resources 784 01:12:56,800 --> 01:13:03,200 for this specific course materials on GitHub course Q&A course online book, which is 785 01:13:03,200 --> 01:13:08,160 learn pytorch.io, simple URL to remember, all the materials will be there. And then 786 01:13:09,120 --> 01:13:15,680 specifically for PyTorch or things PyTorch, the PyTorch website and the PyTorch forums. 
787 01:13:15,680 --> 01:13:20,240 So if you have a question that's not course related, but more PyTorch related, I'd highly 788 01:13:20,240 --> 01:13:25,200 recommend you go to the PyTorch forums, which is available at discuss.pytorch.org. We've got a link 789 01:13:25,200 --> 01:13:30,560 there. Then the PyTorch website, PyTorch.org, this is going to be your home ground for everything 790 01:13:30,560 --> 01:13:36,080 PyTorch of course. We have the documentation here. And as I said, this course is not a replacement 791 01:13:36,080 --> 01:13:42,320 for getting familiar with the PyTorch documentation. This, the course actually is built off all of 792 01:13:42,320 --> 01:13:47,440 the PyTorch documentation. It's just organized in a slightly different way. So there's plenty of 793 01:13:47,440 --> 01:13:52,720 amazing resources here on everything to do with PyTorch. This is your home ground. And you're 794 01:13:52,720 --> 01:13:58,400 going to see me referring to this a lot throughout the course. So just keep these in mind, course 795 01:13:58,400 --> 01:14:05,760 materials on GitHub, course discussions, learnpytorch.io. This is all for the course. And all things 796 01:14:05,760 --> 01:14:11,440 PyTorch specific, so not necessarily this course, but just PyTorch in general, the PyTorch website 797 01:14:11,440 --> 01:14:18,240 and the PyTorch forums. With that all being said, we've come so far. We've covered a lot already, 798 01:14:18,240 --> 01:14:25,600 but guess what time it is? Let's write some code. I'll see you in the next video. 799 01:14:25,600 --> 01:14:32,240 We've covered enough of the fundamentals so far. Well, from a theory point of view, 800 01:14:32,240 --> 01:14:36,640 let's get into coding. So I'm going to go over to Google Chrome. I'm going to introduce you to 801 01:14:36,640 --> 01:14:41,360 the tool. One of the main tools we're going to be using for the entire course. And that is Google 802 01:14:41,360 --> 01:14:47,040 Colab. So the way I would suggest following along with this course is remember, one of the major 803 01:14:47,040 --> 01:14:52,880 ones is to code along. So we're going to go to colab.research.google. I've got a typo here. 804 01:14:52,880 --> 01:14:58,000 Classic. You're going to see me do lots of typos throughout this course. Colab.research.google.com. 805 01:14:58,000 --> 01:15:03,200 This is going to load up Google Colab. Now, you can follow along with what I'm going to do, 806 01:15:03,200 --> 01:15:08,800 but if you'd like to find out how to use Google Colab from a top-down perspective, 807 01:15:09,520 --> 01:15:13,040 you can go through some of these. I'd probably recommend going through overview of 808 01:15:13,040 --> 01:15:18,640 Collaboratory Features. But essentially, what Google Colab is going to enable us to do is 809 01:15:18,640 --> 01:15:23,840 create a new notebook. And this is how we're going to practice writing PyTorch code. 810 01:15:23,840 --> 01:15:30,640 So if you refer to the reference document of learnpytorch.io, these are actually 811 01:15:30,640 --> 01:15:37,760 Colab notebooks just in book format, so online book format. So these are the basis materials 812 01:15:37,760 --> 01:15:42,480 for what the course is going to be. There's going to be more here, but every new module, 813 01:15:42,480 --> 01:15:46,240 we're going to start a new notebook. And I'm going to just zoom in here. 
814 01:15:46,240 --> 01:15:51,520 So this one, the first module is going to be zero, zero, because Python code starts at zero, 815 01:15:51,520 --> 01:15:57,520 zero. And we're going to call this PyTorch Fundamentals. I'm going to call mine video, 816 01:15:57,520 --> 01:16:02,640 just so we know that this is the notebook that I wrote through the video. And what this is going 817 01:16:02,640 --> 01:16:08,560 to do is if we click Connect, it's going to give us a space to write Python code. So here we can go 818 01:16:08,560 --> 01:16:16,960 print. Hello, I'm excited to learn PyTorch. And then if we hit shift and enter, it comes out like 819 01:16:16,960 --> 01:16:22,560 that. But another beautiful benefit of Google Colab... PS, I'm using the pro version, which 820 01:16:22,560 --> 01:16:26,880 costs about $10 a month or so. That price may be different depending on where you're from. 821 01:16:26,880 --> 01:16:31,200 The reason I'm doing that is because I use Colab all the time. However, you do not have to use 822 01:16:31,200 --> 01:16:36,080 the paid version for this course. Google Colab comes with a free version, which you'll be able 823 01:16:36,080 --> 01:16:41,680 to use to complete this course. If you see it worthwhile, I find the pro version is worthwhile. 824 01:16:42,240 --> 01:16:47,360 Another benefit of Google Colab is if we go here, we can go to runtime. Let me just show you that 825 01:16:47,360 --> 01:16:55,280 again. Runtime, change runtime type, hardware accelerator. And we can choose to run our code 826 01:16:55,280 --> 01:17:00,720 on an accelerator here. Now we've got GPU and TPU. We're going to be focused on using 827 01:17:00,720 --> 01:17:06,640 GPU. If you'd like to look into TPU, I'll leave that to you. But we can click GPU, click save. 828 01:17:06,640 --> 01:17:13,280 And now our code, if we write it in such a way, will run on the GPU. Now, we're going to see this 829 01:17:13,280 --> 01:17:20,000 later on: code that runs on the GPU is a lot faster in terms of compute time, especially for deep 830 01:17:20,000 --> 01:17:27,440 learning. So if we write here !nvidia-smi, we now have access to a GPU. In my case, I have a 831 01:17:27,440 --> 01:17:34,320 Tesla P100. It's quite a good GPU. You tend to get the better GPUs if you pay for Google Colab; 832 01:17:34,320 --> 01:17:39,040 if you don't pay for it, you get the free version, you get a free GPU. It just won't be as fast as 833 01:17:39,040 --> 01:17:44,080 the GPUs you typically get with the paid version. So just keep that in mind. A whole bunch of stuff 834 01:17:44,080 --> 01:17:50,320 that we can do here. I'm not going to go through it all because there's too much. But we've covered 835 01:17:50,320 --> 01:17:56,800 basically what we need to cover. So if we just come up here, I'm going to write a text cell. So, 836 01:17:56,800 --> 01:18:07,200 00. PyTorch Fundamentals. And I'm going to link in here resource notebook. Now you can come 837 01:18:07,200 --> 01:18:14,240 to learn pytorch.io and all the notebooks are going to be in sync. So 00, we can put this in here. 838 01:18:14,240 --> 01:18:20,160 Resource notebook is there. That's what this notebook is going to be based off. This one here. 839 01:18:20,160 --> 01:18:24,480 And then if you have a question about what's going on in this notebook, 840 01:18:24,480 --> 01:18:31,520 you can come to the course GitHub. And then we go back, back. This is where you can see what's
This is pytorch deep learning projects as you can see what's happening. At the moment, 842 01:18:35,840 --> 01:18:40,240 I've got pytorch course creation because I'm in the middle of creating it. But if you have a question, 843 01:18:40,240 --> 01:18:45,920 you can come to Mr. D Burke slash pytorch deep learning slash discussions, which is this tab here, 844 01:18:45,920 --> 01:18:51,440 and then ask a question by clicking new discussion. So any discussions related to this notebook, 845 01:18:51,440 --> 01:18:55,360 you can ask it there. And I'm going to turn this right now. This is a code cell. 846 01:18:56,000 --> 01:19:01,360 CoLab is basically comprised of code and text cells. I'm going to turn this into a text cell 847 01:19:01,360 --> 01:19:07,120 by pressing command mm, shift and enter. Now we have a text cell. And then if we wanted another 848 01:19:07,120 --> 01:19:13,120 code cell, we could go like that text code text code, yada, yada, yada. But I'm going to delete this. 849 01:19:14,160 --> 01:19:20,480 And to finish off this video, we're going to import pytorch. So we're going to import torch. 850 01:19:20,480 --> 01:19:27,680 And then we're going to print torch dot dot version. So that's another beautiful thing about Google 851 01:19:27,680 --> 01:19:33,840 Colab is that it comes with pytorch pre installed and a lot of other common Python data science 852 01:19:33,840 --> 01:19:43,120 packages, such as we could also go import pandas as PD, import NumPy as MP import mapplot lib 853 01:19:43,120 --> 01:19:53,840 lib dot pyplot as PLT. This is Google Colab is by far the easiest way to get started with this 854 01:19:53,840 --> 01:20:00,240 course. You can run things locally. If you'd like to do that, I'd refer to you to pytorch deep 855 01:20:00,240 --> 01:20:06,400 learning is going to be set up dot MD, getting set up to code pytorch. We've just gone through 856 01:20:06,400 --> 01:20:12,000 number one setting up with Google Colab. There is also another option for getting started locally. 857 01:20:12,000 --> 01:20:15,760 Right now, this document's a work in progress, but it'll be finished by the time you watch this 858 01:20:15,760 --> 01:20:20,880 video. This is not a replacement, though, for the pytorch documentation for getting set up 859 01:20:20,880 --> 01:20:25,920 locally. So if you'd like to run locally on your machine, rather than going on Google Colab, 860 01:20:25,920 --> 01:20:31,840 please refer to this documentation or set up dot MD here. But if you'd like to get started 861 01:20:31,840 --> 01:20:36,480 as soon as possible, I'd highly recommend you using Google Colab. In fact, the entire course 862 01:20:36,480 --> 01:20:40,960 is going to be able to be run through Google Colab. So let's finish off this video, make sure 863 01:20:40,960 --> 01:20:46,880 we've got pytorch ready to go. And of course, some fundamental data science packages here. 864 01:20:47,520 --> 01:20:55,040 Wonderful. This means that we have pytorch 1.10.0. So if your version number is far greater than this, 865 01:20:55,040 --> 01:21:00,320 maybe you're watching this video a couple of years in the future, and pytorch is up to 2.11, 866 01:21:00,320 --> 01:21:05,600 maybe some of the code in this notebook won't work. But 1.10.0 should be more than enough for 867 01:21:05,600 --> 01:21:15,920 what we're going to do. And plus Q111, CU111, stands for CUDA version 11.1, I believe. 
And what 867 01:21:15,920 --> 01:21:20,480 that would mean is if we came in here, and we wanted to install it on Linux, which is what 868 01:21:20,480 --> 01:21:27,280 Colab runs on, there's Mac and Windows as well. We've got CUDA. Yeah. So right now, as of recording 869 01:21:27,280 --> 01:21:33,680 this video, the latest pytorch build is 1.10.2. So you'll need at least pytorch 1.10 to complete 870 01:21:33,680 --> 01:21:43,280 this course and CUDA 11.3. So that's CUDA toolkit. If you remember, CUDA toolkit is NVIDIA's 871 01:21:44,400 --> 01:21:51,600 programming toolkit. There we go. NVIDIA developer. CUDA is what enables us to run our pytorch code on 872 01:21:51,600 --> 01:22:00,320 NVIDIA GPUs, which we have access to in Google Colab. Beautiful. So we're set up ready to write code. 873 01:22:00,320 --> 01:22:05,840 Let's get started in the next video writing some pytorch code. This is so exciting. I'll see you 874 01:22:05,840 --> 01:22:13,120 there. So we've got set up. We've got access to pytorch. We've got a Google Colab instance running 875 01:22:13,120 --> 01:22:18,640 here. We've got a GPU because we've gone up to runtime, change runtime type, hardware accelerator. 876 01:22:18,640 --> 01:22:23,920 You won't necessarily need a GPU for this entire notebook, but I just wanted to show you how to 877 01:22:23,920 --> 01:22:30,320 get access to a GPU because we're going to be using them later on. So let's get rid of this. 878 01:22:30,320 --> 01:22:36,080 And one last thing, how I'd recommend going through this course is in a split window fashion. 879 01:22:36,080 --> 01:22:40,640 So for example, you might have the video where I'm talking right now and writing code on the 880 01:22:40,640 --> 01:22:46,720 left side, and then you might have another window over the other side with your own Colab 881 01:22:46,720 --> 01:22:55,040 window. And you can go new notebook, call it whatever you want, my notebook. You could call it very 882 01:22:55,040 --> 01:23:00,240 similar to what we're writing here. And then if I write code over on this side, on this video, 883 01:23:01,280 --> 01:23:05,840 you can't copy it, of course, but you'll write the same code here and then go on and go on and 884 01:23:05,840 --> 01:23:10,000 go on. And if you get stuck, of course, you have the reference notebook and you have an 885 01:23:10,000 --> 01:23:15,680 opportunity to ask a question here. So with that being said, let's get started. The first thing 886 01:23:15,680 --> 01:23:23,200 we're going to have a look at in PyTorch is an introduction to tensors. So tensors are the main 887 01:23:23,200 --> 01:23:28,960 building block of deep learning in general, or of data. And so you may have watched the video, 888 01:23:29,520 --> 01:23:37,840 what is a tensor? For the sake of this course, tensors are a way to represent data, especially 889 01:23:37,840 --> 01:23:43,520 multi-dimensional data, numeric data that is, but that numeric data represents something else. 890 01:23:43,520 --> 01:23:49,920 So let's go in here, creating tensors. So the first kind of tensor we're going to create is 891 01:23:49,920 --> 01:23:53,760 actually called a scalar. I know I'm going to throw a lot of different names of things at you, 892 01:23:53,760 --> 01:23:58,640 but it's important that you're aware of such nomenclature. Even though in PyTorch, almost 893 01:23:58,640 --> 01:24:03,680 everything is referred to as a tensor, there are different kinds of tensors.
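As a quick illustration of the multi-dimensional idea (an editor's sketch; the video itself builds up from scalars first), here's a random tensor shaped like an RGB image:

```python
import torch

# A random tensor shaped like an RGB image: [colour_channels, height, width]
image_like_tensor = torch.rand(3, 224, 224)
print(image_like_tensor.shape)  # torch.Size([3, 224, 224])
print(image_like_tensor.ndim)   # 3
```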
And just to 895 01:24:03,680 --> 01:24:10,880 exemplify the fact that we're using a reference notebook, if we go up here, we can see we have 896 01:24:10,880 --> 01:24:16,080 importing PyTorch. We've done that. Now we're up to introduction to tenses. We've got creating 897 01:24:16,080 --> 01:24:22,320 tenses, and we've got scalar, etc, etc, etc. So this is what we're going to be working through. 898 01:24:22,320 --> 01:24:29,200 Let's do it together. So scalar, the way to, oops, what have I done there? The way to create a 899 01:24:29,200 --> 01:24:36,480 tensor in PyTorch, we're going to call this scalar equals torch dot tensor. And we're going to fill 900 01:24:36,480 --> 01:24:42,880 it with the number seven. And then if we press or retype in scalar, what do we get back? Seven, 901 01:24:42,880 --> 01:24:48,560 wonderful. And it's got the tensor data type here. So how would we find out about what torch dot 902 01:24:48,560 --> 01:24:55,200 tensor actually is? Well, let me show you how I would. We go to torch dot tensor. There we go. 903 01:24:55,200 --> 01:25:00,800 We've got the documentation. So this is possibly the most common class in PyTorch other than 904 01:25:00,800 --> 01:25:06,560 one we're going to see later on that you'll use, which is torch dot nn. Basically, everything in 905 01:25:06,560 --> 01:25:11,440 PyTorch works off torch dot tensor. And if you'd like to learn more, you can read through here. 906 01:25:11,440 --> 01:25:15,360 In fact, I would encourage you to read through this documentation for at least 10 minutes 907 01:25:15,360 --> 01:25:20,080 after you finish some videos here. So with that being said, I'm going to link that in here. 908 01:25:20,080 --> 01:25:31,040 So PyTorch tensors are created using torch dot tensor. And then we've got that link there. 909 01:25:32,800 --> 01:25:38,320 Oops, typos got law Daniel. Come on. They're better than this. No, I'm kidding. There's going to be 910 01:25:38,320 --> 01:25:44,320 typos got law through the whole course. Okay. Now, what are some attributes of a scalar? So 911 01:25:44,320 --> 01:25:49,040 some details about scalars. Let's find out how many dimensions there are. Oh, and by the way, 912 01:25:49,040 --> 01:25:55,520 this warning, perfect timing. Google Colab will give you some warnings here, depending on whether 913 01:25:55,520 --> 01:26:01,600 you're using a GPU or not. Now, the reason being is because Google Colab provides GPUs to you and 914 01:26:01,600 --> 01:26:08,560 I for free. However, GPUs aren't free for Google to provide. So if we're not using a GPU, we can 915 01:26:08,560 --> 01:26:14,880 save some resources, allow someone else to use a GPU by going to none. And of course, we can 916 01:26:14,880 --> 01:26:19,680 always switch this back. So I'm going to turn my GPU off so that someone else out there, 917 01:26:20,320 --> 01:26:25,920 I'm not using the GPU at the moment, they can use it. So what you're also going to see is if 918 01:26:25,920 --> 01:26:33,600 your Google Colab instance ever restarts up here, we're going to have to rerun these cells. So if 919 01:26:33,600 --> 01:26:38,560 you stop coding for a while, go have a break and then come back and you start your notebook again, 920 01:26:38,560 --> 01:26:46,080 that's one downside of Google Colab is that it resets after a few hours. How many hours? I don't 921 01:26:46,080 --> 01:26:51,680 know exactly. 
The reset time is longer if you have the pro subscription, but because it's a free 922 01:26:51,680 --> 01:26:56,560 service and the way Google calculate usage and all that sort of stuff, I can't give a conclusive 923 01:26:56,560 --> 01:27:02,640 evidence or conclusive answer on how long until it resets. But just know, if you come back, you might 924 01:27:02,640 --> 01:27:07,600 have to rerun some of your cells and you can do that with shift and enter. So a scalar has no 925 01:27:07,600 --> 01:27:13,760 dimensions. All right, it's just a single number. But then we move on to the next thing. Or actually, 926 01:27:13,760 --> 01:27:19,120 if we wanted to get this number out of a tensor type, we can use scalar dot item, this is going 927 01:27:19,120 --> 01:27:26,480 to give it back as just a regular Python integer. Wonderful, there we go, the number seven back, 928 01:27:26,480 --> 01:27:37,360 get tensor back as Python int. Now, the next thing that we have is a vector. So let's write 929 01:27:37,360 --> 01:27:44,160 in here vector, which again is going to be created with torch dot tensor. But you will also hear 930 01:27:44,720 --> 01:27:52,240 the word vector used a lot too. Now, what is the deal? Oops, seven dot seven. Google Colab's auto 931 01:27:52,240 --> 01:27:57,040 complete is a bit funny. It doesn't always do the thing you want it to. So if we see a vector, 932 01:27:57,840 --> 01:28:02,000 we've got two numbers here. And then if we really wanted to find out what is a vector. 933 01:28:02,000 --> 01:28:10,800 So a vector usually has magnitude and direction. So what we're going to see later on is, there we 934 01:28:10,800 --> 01:28:15,760 go, magnitude, how far it's going and which way it's going. And then if we plotted it, we've got, 935 01:28:15,760 --> 01:28:20,720 yeah, a vector equals the magnitude would be the length here and the direction would be where it's 936 01:28:20,720 --> 01:28:26,640 pointing. And oh, here we go, scalar vector matrix tensor. This is what we're working on as well. 937 01:28:26,640 --> 01:28:33,920 So the thing about vectors, how they differ with scalars is how I just remember them is 938 01:28:33,920 --> 01:28:38,000 rather than magnitude and direction is a vector typically has more than one number. 939 01:28:38,560 --> 01:28:42,800 So if we go vector and dim, how many dimensions does it have? 940 01:28:44,960 --> 01:28:50,240 It has one dimension, which is kind of confusing. But when we see tensors with more than one 941 01:28:50,240 --> 01:28:55,280 dimension, it'll make sense. And another way that I remember how many dimensions something 942 01:28:55,280 --> 01:29:02,640 has is by the number of square brackets. So let's check out something else. Maybe we go vector 943 01:29:03,440 --> 01:29:12,160 dot shape shape is two. So the difference between dimension. So dimension is like number of square 944 01:29:12,160 --> 01:29:18,080 brackets. And when I say, even though there's two here, I mean number of pairs of closing square 945 01:29:18,080 --> 01:29:24,640 brackets. So there's one pair of closing square brackets here. But the shape of the vector is two. 946 01:29:24,640 --> 01:29:31,680 So we have two by one elements. So that means a total of two elements. Now if we wanted to step 947 01:29:31,680 --> 01:29:37,440 things up a notch, let's create a matrix. So this is another term you're going to hear. 948 01:29:37,440 --> 01:29:42,640 And you might be wondering why I'm capitalizing matrix. 
Well, I'll explain that in the second 949 01:29:42,640 --> 01:29:50,560 matrix equals torch dot tensor. And we're going to put two square brackets here. You might be 950 01:29:50,560 --> 01:29:55,600 thinking, what could the two square brackets mean? Or actually, that's a little bit of a challenge. 951 01:29:55,600 --> 01:30:02,880 If one pair of square brackets had an endem of one, what will the endem be number of dimensions 952 01:30:02,880 --> 01:30:12,960 of two square brackets? So let's create this matrix. Beautiful. So we've got another tensor here. 953 01:30:12,960 --> 01:30:18,400 Again, as I said, these things have different names, like the traditional name of scalar, 954 01:30:18,400 --> 01:30:24,000 vector matrix, but they're all still a torch dot tensor. That's a little bit confusing, 955 01:30:24,000 --> 01:30:29,840 but the thing you should remember in PyTorch is basically anytime you encode data into numbers, 956 01:30:29,840 --> 01:30:37,040 it's of a tensor data type. And so now how many n number of dimensions do you think a matrix has? 957 01:30:38,160 --> 01:30:43,920 It has two. So there we go. We have two square brackets. So if we wanted to get matrix, 958 01:30:43,920 --> 01:30:50,640 let's index on the zeroth axis. Let's see what happens there. Ah, so we get seven and eight. 959 01:30:50,640 --> 01:30:58,000 And then we get off the first dimension. Ah, nine and 10. So this is where the square brackets, 960 01:30:58,000 --> 01:31:02,960 the pairings come into play. We've got two square bracket pairings on the outside here. 961 01:31:02,960 --> 01:31:08,800 So we have an endem of two. Now, if we get the shape of the matrix, what do you think the shape will be? 962 01:31:08,800 --> 01:31:21,280 Ah, two by two. So we've got two numbers here by two. So we have a total of four elements in there. 963 01:31:22,320 --> 01:31:25,920 So we're covering a fair bit of ground here, nice and quick, but that's going to be the 964 01:31:25,920 --> 01:31:30,880 teaching style of this course is we're going to get quite hands on and writing a lot of code and 965 01:31:30,880 --> 01:31:36,080 just interacting with it rather than continually going back over and discussing what's going on 966 01:31:36,080 --> 01:31:42,000 here. The best way to find out what's happening within a matrix is to write more code that's similar 967 01:31:42,000 --> 01:31:48,960 to these matrices here. But let's not stop at matrix. Let's upgrade to a tensor now. So I might 968 01:31:48,960 --> 01:31:54,400 put this in capitals as well. And I haven't explained what the capitals mean yet, but we'll see that 969 01:31:54,400 --> 01:32:01,840 in a second. So let's go torch dot tensor. And what we're going to do is this time, 970 01:32:01,840 --> 01:32:07,360 we've done one square bracket pairing. We've done two square bracket pairings. Let's do three 971 01:32:07,360 --> 01:32:11,840 square bracket pairings and just get a little bit adventurous. All right. And so you might be thinking 972 01:32:11,840 --> 01:32:16,480 at the moment, this is quite tedious. I'm just going to write a bunch of random numbers here. One, 973 01:32:16,480 --> 01:32:23,920 two, three, three, six, nine, two, five, four. Now you might be thinking, Daniel, you've said 974 01:32:23,920 --> 01:32:28,480 tensors could have millions of numbers. If we had to write them all by hand, that would be 975 01:32:28,480 --> 01:32:35,520 quite tedious. And yes, you're completely right. 
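Here are the hand-crafted tensors so far (scalar, vector, matrix) gathered into one runnable sketch, with their `ndim` and `shape` printed side by side:

```python
import torch

# Scalar: a single number, zero dimensions.
scalar = torch.tensor(7)
print(scalar.ndim)    # 0
print(scalar.item())  # 7, back as a plain Python int

# Vector: one pair of square brackets -> one dimension.
vector = torch.tensor([7, 7])
print(vector.ndim)    # 1
print(vector.shape)   # torch.Size([2])

# MATRIX: two pairs of square brackets -> two dimensions.
MATRIX = torch.tensor([[7, 8],
                       [9, 10]])
print(MATRIX.ndim)    # 2
print(MATRIX[0])      # tensor([7, 8]) - indexing on the zeroth dimension
print(MATRIX.shape)   # torch.Size([2, 2])
```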
The fact is, though, that most of the time, 976 01:32:35,520 --> 01:32:41,200 you won't be crafting tensors by hand. PyTorch will do a lot of that behind the scenes. However, 977 01:32:41,200 --> 01:32:45,600 it's important to know that these are the fundamental building blocks of the models 978 01:32:45,600 --> 01:32:51,680 and the deep learning neural networks that we're going to be building. So tensor capitals as well, 979 01:32:51,680 --> 01:32:57,920 we have three square brackets. So, or three square bracket pairings. I'm just going to refer to three 980 01:32:57,920 --> 01:33:04,400 square brackets at the very start because they're going to be paired down here. How many n dim or 981 01:33:04,400 --> 01:33:11,520 number of dimensions do you think our tensor will have? Three, wonderful. And what do you think the 982 01:33:11,520 --> 01:33:17,360 shape of our tensor is? We have three elements here. We have three elements here, three elements 983 01:33:17,360 --> 01:33:29,840 here. And we have one, two, three. So maybe our tensor has a shape of one by three by three. 984 01:33:29,840 --> 01:33:38,800 Hmm. What does that mean? Well, we've got three by one, two, three. That's the second square 985 01:33:38,800 --> 01:33:44,960 bracket there by one. Ah, so that's the first dimension there or the zeroth dimension because 986 01:33:44,960 --> 01:33:49,760 we remember PyTorch is zero indexed. We have, well, let's just instead of talking about it, 987 01:33:49,760 --> 01:33:54,800 let's just get on the zeroth axis and see what happens with the zeroth dimension. There we go. 988 01:33:54,800 --> 01:34:01,280 Okay. So there's, this is the far left one, zero, which is very confusing because we've got a one 989 01:34:01,280 --> 01:34:12,160 here, but so we've got, oops, don't mean that. What this is saying is we've got one three by three 990 01:34:12,160 --> 01:34:19,920 shape tensor. So very outer bracket matches up with this number one here. And then this three 991 01:34:20,480 --> 01:34:28,320 matches up with the next one here, which is one, two, three. And then this three matches up with 992 01:34:28,320 --> 01:34:36,400 this one, one, two, three. Now, if you'd like to see this with a pretty picture, we can see it here. 993 01:34:36,400 --> 01:34:44,960 So dim zero lines up. So the blue bracket, the very outer one, lines up with the one. Then dim 994 01:34:44,960 --> 01:34:52,400 equals one, this one here, the middle bracket, lines up with the middle dimension here. And then 995 01:34:52,400 --> 01:35:01,680 dim equals two, the very inner lines up with these three here. So again, this is going to take a lot 996 01:35:01,680 --> 01:35:06,720 of practice. It's taken me a lot of practice to understand the dimensions of tensors. But 997 01:35:07,600 --> 01:35:14,000 to practice, I would like you to write out your own tensor of, you can put however many square 998 01:35:14,000 --> 01:35:20,560 brackets you want. And then just interact with the end dim shape and indexing, just as I've done 999 01:35:20,560 --> 01:35:25,120 here, but you can put any combination of numbers inside this tensor. That's a little bit of practice 1000 01:35:25,120 --> 01:35:30,400 before the next video. So give that a shot and then we'll move on to the next topic. 1001 01:35:30,400 --> 01:35:39,840 I'll see you there. Welcome back. In the last video, we covered the basic building blocks of data 1002 01:35:39,840 --> 01:35:45,760 representation in deep learning, which is the tensor, or in PyTorch, specifically torch.tensor. 
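And as a recap of that challenge, the three-dimensional TENSOR from the video as one runnable cell, with the shape worked through in the comments:

```python
import torch

# TENSOR: three pairs of square brackets -> three dimensions.
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 5, 4]]])
print(TENSOR.ndim)   # 3
print(TENSOR.shape)  # torch.Size([1, 3, 3]) - one 3x3 block
print(TENSOR[0])     # indexing the zeroth dimension returns the inner 3x3 matrix
```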
1003 01:35:45,760 --> 01:35:51,520 But within that, we had to look at what a scalar is. We had to look at what a vector is. We had to 1004 01:35:51,520 --> 01:35:57,360 look at a matrix. We had to look at what a tensor is. And I issued you the challenge to get as 1005 01:35:57,360 --> 01:36:01,840 creative as you like with creating your own tensor. So I hope you gave that a shot because as you'll 1006 01:36:01,840 --> 01:36:07,200 see throughout the course and your deep learning journey, a tensor can represent or can be of almost 1007 01:36:07,200 --> 01:36:13,120 any shape and size and have almost any combination of numbers within it. And so this is very important 1008 01:36:13,120 --> 01:36:18,320 to be able to interact with different tensors to be able to understand what the different names of 1009 01:36:18,320 --> 01:36:24,000 things are. So when you hear matrix, you go, oh, maybe that's a two dimensional tensor. When you 1010 01:36:24,000 --> 01:36:28,960 hear a vector, maybe that's a one dimensional tensor. When you hear a tensor, that could be any 1011 01:36:28,960 --> 01:36:33,120 amount of dimensions. And just for reference for that, if we come back to the course reference, 1012 01:36:33,120 --> 01:36:38,400 we've got a scalar. What is it? A single number, number of dimensions, zero. We've got a vector, 1013 01:36:38,400 --> 01:36:45,920 a number with direction, number of dimensions, one, a matrix, a tensor. And now here's another little 1014 01:36:45,920 --> 01:36:52,160 tidbit of the nomenclature of things, the naming of things. Typically, you'll see a variable name 1015 01:36:52,160 --> 01:36:58,880 for a scalar or a vector as a lowercase. So a vector, you might have a lowercase y storing that 1016 01:36:58,880 --> 01:37:06,480 data. But for a matrix or a tensor, you'll often see an uppercase letter or variable in Python in 1017 01:37:06,480 --> 01:37:12,080 our case, because we're writing code. And so I am not exactly sure why this is, but this is just 1018 01:37:12,080 --> 01:37:16,800 what you're going to see in machine learning and deep learning code and research papers 1019 01:37:16,800 --> 01:37:22,720 across the board. This is a typical nomenclature. Scalars and vectors, lowercase, matrix and tensors, 1020 01:37:22,720 --> 01:37:28,000 uppercase, that's where that naming comes from. And that's why I've given the tensor uppercase here. 1021 01:37:28,880 --> 01:37:34,240 Now, with that being said, let's jump in to another very important concept with tensors. 1022 01:37:34,240 --> 01:37:40,000 And that is random tensors. Why random tensors? I'm just writing this in a code cell now. 1023 01:37:40,000 --> 01:37:48,160 I could go here. This is a comment in Python, random tensors. But we'll get rid of that. We could 1024 01:37:48,160 --> 01:37:54,320 just start another text cell here. And then three hashes is going to give us a heading, random tensors 1025 01:37:54,320 --> 01:38:02,560 there. Or I could turn this again into a markdown cell with command mm when I'm using Google Colab. 1026 01:38:02,560 --> 01:38:10,080 So random tensors. Let's write down here. Why random tensors? So we've done the tedious thing 1027 01:38:10,080 --> 01:38:15,120 of creating our own tensors with some numbers that we've defined, whatever these are. Again, 1028 01:38:15,120 --> 01:38:21,680 you could define these as almost anything. But random tensors is a big part in pytorch because 1029 01:38:21,680 --> 01:38:34,240 let's write this down. 
Random tensors are important because the way many neural networks learn is 1030 01:38:34,240 --> 01:38:42,800 that they start with tensors full of random numbers and then adjust those random numbers 1031 01:38:42,800 --> 01:38:52,720 to better represent the data. So seriously, this is one of the big concepts of neural networks. 1032 01:38:52,720 --> 01:38:58,480 I'm going to write in code here, which is this is what the tick is for. Start with random numbers. 1033 01:38:58,480 --> 01:39:19,200 Look at data, update random numbers. Look at data, update random numbers. That is the crux 1034 01:39:19,200 --> 01:39:25,280 of neural networks. So let's create a random tensor with pytorch. Remember how I said that 1035 01:39:25,280 --> 01:39:30,960 pytorch is going to create tensors for you behind the scenes? Well, this is one of the ways that 1036 01:39:30,960 --> 01:39:40,880 it does so. So we create a random tensor and we give it a size of random tensor of size or shape. 1037 01:39:40,880 --> 01:39:47,600 Pytorch use these independently. So size, shape, they mean the different versions of the same thing. 1038 01:39:47,600 --> 01:39:58,240 So random tensor equals torch dot rand. And we're going to type in here three, four. And the beautiful 1039 01:39:58,240 --> 01:40:02,960 thing about Google Colab as well is that if we wait long enough, it's going to pop up with the doc 1040 01:40:02,960 --> 01:40:07,680 string of what's going on. I personally find this a little hard to read in Google Colab, 1041 01:40:07,680 --> 01:40:13,360 because you see you can keep going down there. You might be able to read that. But what can we do? 1042 01:40:13,360 --> 01:40:19,920 Well, we can go to torch dot rand. Then we go to the documentation. Beautiful. Now there's a whole 1043 01:40:19,920 --> 01:40:24,240 bunch of stuff here that you're more than welcome to read. We're not going to go through all that. 1044 01:40:24,240 --> 01:40:30,400 We're just going to see what happens hands on. So we'll copy that in here. And write this in notes, 1045 01:40:31,120 --> 01:40:37,440 torch random tensors. Done. Just going to make some code cells down here. So I've got some space. 1046 01:40:37,440 --> 01:40:46,240 I can get this a bit up here. Let's see what our random tensor looks like. There we go. Beautiful 1047 01:40:46,240 --> 01:40:53,120 of size three, four. So we've got three or four elements here. And then we've got three deep 1048 01:40:53,120 --> 01:40:59,040 here. So again, there's the two pairs. So what do you think the number of dimensions will be 1049 01:40:59,040 --> 01:41:09,600 for random tensor? And dim. Two beautiful. And so we have some random numbers here. Now the 1050 01:41:09,600 --> 01:41:14,640 beautiful thing about pie torch again is that it's going to do a lot of this behind the scenes. So 1051 01:41:14,640 --> 01:41:20,320 if we wanted to create a size of 10 10, in some cases, we won't want one dimension here. And then 1052 01:41:20,320 --> 01:41:24,400 it's going to go 10 10. And then if we check the number of dimensions, how many do you think it 1053 01:41:24,400 --> 01:41:31,120 will be now three? Why is that? Because we've got one 10 10. And then if we wanted to create 10 10 10. 1054 01:41:32,560 --> 01:41:35,680 What's the number of dimensions going to be? It's not going to change. Why is that? 1055 01:41:36,720 --> 01:41:39,200 We haven't run that cell yet, but we've got a lot of numbers here. 1056 01:41:42,240 --> 01:41:47,280 We can find out what 10 times 10 times 10 is. 
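Here's a small sketch of the torch.rand examples just described. `numel()` isn't used in the video, but it's a standard tensor method that counts elements, which does the 10 times 10 times 10 for us:

```python
import torch

random_tensor = torch.rand(3, 4)
print(random_tensor.ndim)       # 2

random_tensor = torch.rand(1, 10, 10)
print(random_tensor.ndim)       # 3

random_tensor = torch.rand(10, 10, 10)
print(random_tensor.ndim)       # 3 - still three sizes passed in
print(random_tensor.numel())    # 1000 elements in total
```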
And I know we can do that in our heads, but 1057 01:41:47,280 --> 01:41:51,520 the beauty of collab is we've got a calculator right here. 10 times 10 times 10. We've got a 1058 01:41:51,520 --> 01:41:57,120 thousand elements in there. But sometimes tenses can be hundreds of thousands of elements or 1059 01:41:57,120 --> 01:42:01,360 millions of elements. But pie torch is going to take care of a lot of this behind the scenes. So 1060 01:42:01,360 --> 01:42:10,720 let's clean up a bit of space here. This is a random tensor. Random numbers beautiful of now 1061 01:42:10,720 --> 01:42:15,200 it's got two dimensions because we've got three by four. And if we put another one in the front 1062 01:42:15,200 --> 01:42:20,480 there, we're going to have how many dimensions three dimensions there. But again, this number 1063 01:42:20,480 --> 01:42:26,560 of dimensions could be any number. And what's inside here could be any number. Let's get rid of that. 1064 01:42:26,560 --> 01:42:31,840 And let's get a bit specific because right now this is just a random tensor of whatever dimension. 1065 01:42:31,840 --> 01:42:44,160 How about we create a random tensor with similar shape to an image tensor. So a lot of the time 1066 01:42:44,160 --> 01:42:50,480 when we turn images, image size tensor, when we turn images into tenses, they're going to have, 1067 01:42:51,120 --> 01:42:58,800 let me just write it in code for you first, size equals a height, a width, and a number of color 1068 01:42:58,800 --> 01:43:06,640 channels. And so in this case, it's going to be height with color channels. And the color channels 1069 01:43:06,640 --> 01:43:15,520 are red, green, blue. And so let's create a random image tensor. Let's view the size of it or the 1070 01:43:15,520 --> 01:43:28,400 shape. And then random image size tensor will view the end dim. Beautiful. Okay, so we've got 1071 01:43:28,400 --> 01:43:36,080 torch size, the same size two, two, four, two, four, three, height, width, color channels. And we've got 1072 01:43:36,080 --> 01:43:42,080 three dimensions, one, four, height, width, color channels. Let's go and see an example of this. This 1073 01:43:42,080 --> 01:43:50,640 is the PyTorch Fundamentals notebook. If we go up to here, so say we wanted to encode this image 1074 01:43:50,640 --> 01:43:56,240 of my dad eating pizza with thumbs up of a square image of two, two, four by two, two, four. 1075 01:43:56,240 --> 01:44:02,560 This is an input. And if we wanted to encode this into tensor format, well, one of the ways of 1076 01:44:02,560 --> 01:44:07,360 representing an image tensor, very common ways is to split it into color channels because with 1077 01:44:07,360 --> 01:44:12,960 red, green, and blue, you can create almost any color you want. And then we have a tensor 1078 01:44:12,960 --> 01:44:17,600 representation. So sometimes you're going to see color channels come first. We can switch this 1079 01:44:17,600 --> 01:44:24,000 around and our code quite easily by going color channels here. But you'll also see color channels 1080 01:44:24,000 --> 01:44:28,880 come at the end. I know I'm saying a lot that we kind of haven't covered yet. The main takeaway 1081 01:44:28,880 --> 01:44:35,760 from here is that almost any data can be represented as a tensor. And one of the common ways to represent 1082 01:44:35,760 --> 01:44:42,720 images is in the format color channels, height, width, and how these values are will depend on 1083 01:44:42,720 --> 01:44:48,960 what's in the image. 
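Here's the image-shaped random tensor in one sketch, plus the channels-first variant that was mentioned (channels-first is the layout many PyTorch image operations expect):

```python
import torch

# Random tensor with a similar shape to an image tensor:
# height, width, colour channels (red, green, blue).
random_image_size_tensor = torch.rand(size=(224, 224, 3))
print(random_image_size_tensor.shape)  # torch.Size([224, 224, 3])
print(random_image_size_tensor.ndim)   # 3

# Colour channels first - the other common layout you'll come across.
channels_first = torch.rand(size=(3, 224, 224))
print(channels_first.shape)            # torch.Size([3, 224, 224])
```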
But we've done this in a random way. So the takeaway from this video is 1084 01:44:48,960 --> 01:44:55,680 that PyTorch enables you to create tensors quite easily with the random method. However, it is 1085 01:44:55,680 --> 01:45:02,000 going to do a lot of this creating tensors for you behind the scenes and why a random tensor is so 1086 01:45:02,000 --> 01:45:08,800 valuable because neural networks start with random numbers, look at data such as image tensors, 1087 01:45:08,800 --> 01:45:15,200 and then adjust those random numbers to better represent that data. And they repeat those steps 1088 01:45:15,200 --> 01:45:20,880 onwards and onwards and onwards. Let's finish this video here. I'm going to challenge for you 1089 01:45:20,880 --> 01:45:26,320 just to create your own random tensor of whatever size and shape you want. So you could have 5, 10, 1090 01:45:26,320 --> 01:45:30,640 10 here and see what that looks like. And then we'll keep coding in the next video. 1091 01:45:33,200 --> 01:45:37,920 I hope you took on the challenge of creating random tensor of your own size. And just a little 1092 01:45:37,920 --> 01:45:42,720 tidbit here. You might have seen me in the previous video. I didn't use the size parameter. But in 1093 01:45:42,720 --> 01:45:49,280 this case, I did here, you can go either way. So if we go torch dot rand size equals, we put in a 1094 01:45:49,280 --> 01:45:55,680 tuple here of three three, we've got that tensor there three three. But then also if we don't put 1095 01:45:55,680 --> 01:46:01,360 the size in there, it's the default. So it's going to create a very similar tensor. So whether you 1096 01:46:01,360 --> 01:46:07,360 have this size or not, it's going to have quite a similar output depending on the shape that you 1097 01:46:07,360 --> 01:46:14,480 put in there. But now let's get started to another kind of tensor that you might see zeros and ones. 1098 01:46:16,240 --> 01:46:21,920 So say you wanted to create a tensor, but that wasn't just full of random numbers, 1099 01:46:21,920 --> 01:46:30,400 you wanted to create a tensor of all zeros. This is helpful for if you're creating some form of 1100 01:46:30,400 --> 01:46:39,520 mask. Now, we haven't covered what a mask is. But essentially, if we create a tensor of all zeros, 1101 01:46:41,280 --> 01:46:48,320 what happens when you multiply a number by zero? All zeros. So if we wanted to multiply 1102 01:46:48,320 --> 01:46:53,520 these two together, let's do zeros times random tensor. 1103 01:46:53,520 --> 01:47:04,160 There we go, all zeros. So maybe if you're working with this random tensor and you wanted to mask 1104 01:47:04,160 --> 01:47:10,160 out, say all of the numbers in this column for some reason, you could create a tensor of zeros in 1105 01:47:10,160 --> 01:47:15,680 that column, multiply it by your target tensor, and you would zero all those numbers. That's telling 1106 01:47:15,680 --> 01:47:20,480 your model, hey, ignore all of the numbers that are in here because I've zeroed them out. And then 1107 01:47:20,480 --> 01:47:28,960 if you wanted to create a tensor of all ones, create a tensor of all ones, we can go ones equals 1108 01:47:28,960 --> 01:47:37,440 torch dot ones, size equals three, four. And then if we have a look, there's another parameter I 1109 01:47:37,440 --> 01:47:43,920 haven't showed you yet, but this is another important one is the D type. So the default data type, 1110 01:47:43,920 --> 01:47:49,520 so that's what D type stands for, is torch dot float. 
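A quick sketch of zeros and ones as just described, including the masking idea and the default data type that the next part picks up on:

```python
import torch

random_tensor = torch.rand(3, 4)

# A tensor of all zeros - handy as a mask, since anything times zero is zero.
zeros = torch.zeros(size=(3, 4))
print(zeros * random_tensor)   # all zeros

# A tensor of all ones.
ones = torch.ones(size=(3, 4))
print(ones.dtype)              # torch.float32 - the default data type
```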
We've actually been using torch dot float 1111 01:47:49,520 --> 01:47:54,240 the whole time, because that's whenever you create a tensor with pytorch, we're using a pytorch 1112 01:47:54,240 --> 01:47:58,720 method, unless you explicitly define what the data type is, we'll see that later on, defining 1113 01:47:58,720 --> 01:48:06,160 what the data type is, it starts off as torch float 32. So these are float numbers. So that 1114 01:48:06,160 --> 01:48:13,440 is how you create zeros and ones zeros is probably I've seen more common than ones in use, but just 1115 01:48:13,440 --> 01:48:17,760 keep these in mind, you might come across them. There are lots of different methods to creating 1116 01:48:17,760 --> 01:48:25,200 tensors. And truth be told, like random is probably one of the most common, but you might see zeros 1117 01:48:25,200 --> 01:48:31,200 and ones out in the field. So now we've covered that. Let's move on into the next video, where 1118 01:48:31,200 --> 01:48:36,080 we're going to create a range. So have a go at creating a tensor full of zeros and whatever size 1119 01:48:36,080 --> 01:48:40,240 you want, and a tensor full of ones and whatever size you want. And I'll see you in the next video. 1120 01:48:42,480 --> 01:48:47,120 Welcome back. I hope you took on the challenge of creating a torch tensor of zeros of your 1121 01:48:47,120 --> 01:48:55,840 own size and ones of your own size. But now let's investigate how we might create a range of 1122 01:48:55,840 --> 01:49:04,400 tensors and tensors like. So these are two other very common methods of creating tensors. 1123 01:49:05,120 --> 01:49:12,480 So let's start by creating a range. So we'll first use torch dot range, because depending on 1124 01:49:12,480 --> 01:49:19,120 when you're watching this video, torch dot range may be still in play or it may be deprecated. 1125 01:49:19,680 --> 01:49:24,880 If we write in torch dot range right now with the pie torch version that I'm using, which is 1126 01:49:25,440 --> 01:49:33,920 torch dot version, which is torch or pie torch 1.10 point zero torch range is deprecated and 1127 01:49:33,920 --> 01:49:37,920 will be removed in a future release. So just keep that in mind. If you come across some code that's 1128 01:49:37,920 --> 01:49:44,240 using torch dot range, maybe out of whack. So the way to get around that is to fix that is to use 1129 01:49:44,240 --> 01:49:52,240 a range instead. And if we just write in torch dot a range, we've got tensors of zero to nine, 1130 01:49:52,240 --> 01:49:57,200 because it of course starts at zero index. If we wanted one to 10, we could go like this. 1131 01:49:57,200 --> 01:50:07,680 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. And we can go zero, or we go 1, 2, 10, equals torch a range. 1132 01:50:10,960 --> 01:50:17,280 Wonderful. And we can also define the step. So let's let's type in some start and where can we 1133 01:50:17,280 --> 01:50:22,000 find the documentation on a range? Sometimes in Google Colab, you can press shift tab, 1134 01:50:22,000 --> 01:50:28,800 but I find that it doesn't always work for me. Yeah, you could hover over it, but we can also just 1135 01:50:28,800 --> 01:50:36,640 go torch a range and look for the documentation torch a range. So we've got start and step. Let's 1136 01:50:36,640 --> 01:50:44,560 see what all of these three do. Maybe we start at zero, and maybe we want it to go to a thousand, 1137 01:50:44,560 --> 01:50:53,120 and then we want a step of what should our step be? What's a fun number? 77. 
So it's not one to 10 1138 01:50:53,120 --> 01:51:01,200 anymore, but here we go. We've got start at zero, 77 plus 77 plus 77, all the way up to it finishes 1139 01:51:01,200 --> 01:51:09,840 at a thousand. So if we wanted to take it back to one to 10, we can go up here. 110, and the default 1140 01:51:09,840 --> 01:51:16,560 step is going to be one. Oops, we needed the end to be that it's going to finish at end minus one. 1141 01:51:17,200 --> 01:51:27,280 There we go. Beautiful. Now we can also create tensors like. So creating tensors like. So tensors 1142 01:51:27,280 --> 01:51:32,720 like is say you had a particular shape of a tensor you wanted to replicate somewhere else, but you 1143 01:51:32,720 --> 01:51:38,560 didn't want to explicitly define what that shape should be. So what's the shape of one to 10? 1144 01:51:42,160 --> 01:51:47,840 One to 10. Now if we wanted to create a tensor full of zeros that had the same shape as this, 1145 01:51:47,840 --> 01:51:57,120 we can use tensor like or zeros like. So 10 zeros, zeros equals, I'm not even sure if I'm 1146 01:51:57,120 --> 01:52:03,280 spelling zeros right then, zeros. Well, I might have a typo spelling zeros here, but you get what 1147 01:52:03,280 --> 01:52:10,800 I'm saying is torch zeros. Oh, torch spell it like that. That's why I'm spelling it like that. 1148 01:52:10,800 --> 01:52:19,040 Zeros like one to 10. And then the input is going to be one to 10. And we have a look at 10 zeros. 1149 01:52:19,040 --> 01:52:28,080 My goodness, this is taking quite the while to run. This is troubleshooting on the fly. 1150 01:52:28,080 --> 01:52:33,360 If something's happening like this, you can try to stop. If something was happening like that, 1151 01:52:33,360 --> 01:52:38,800 you can click run and then stop. Well, it's running so fast that I can't click stop. If you do also 1152 01:52:38,800 --> 01:52:43,760 run into trouble, you can go runtime, restart runtime. We might just do that now just to show you. 1153 01:52:44,320 --> 01:52:48,960 Restart and run all is going to restart the compute engine behind the collab notebook. 1154 01:52:48,960 --> 01:52:53,920 And run all the cells to where we are. So let's just see that we restart and run runtime. If you're 1155 01:52:53,920 --> 01:53:00,640 getting errors, sometimes this helps. There is no set in stone way to troubleshoot errors. It's 1156 01:53:00,640 --> 01:53:06,880 guess and check with this. So there we go. We've created 10 zeros, which is torch zeros like 1157 01:53:07,760 --> 01:53:14,320 our one to 10 tensor. So we've got zeros in the same shape as one to 10. So if you'd like to create 1158 01:53:14,320 --> 01:53:26,400 tensors, use torch arrange and get deprecated message. Use torch arrange instead for creating 1159 01:53:26,400 --> 01:53:31,680 a range of tensors with a start and end in a step. And then if you wanted to create tensors 1160 01:53:31,680 --> 01:53:38,080 or a tensor like something else, you want to look for the like method. And then you put an input, 1161 01:53:38,080 --> 01:53:43,200 which is another tensor. And then it'll create a similar tensor with whatever this method here 1162 01:53:43,200 --> 01:53:49,760 is like in that fashion or in the same shape as your input. So with that being said, 1163 01:53:49,760 --> 01:53:54,880 give that a try, create a range of tensors, and then try to replicate that range shape that you've 1164 01:53:54,880 --> 01:54:04,400 made with zeros. I'll see you in the next video. Welcome back. 
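As a recap of that challenge, here's torch.arange and torch.zeros_like in one minimal sketch:

```python
import torch

# torch.range is deprecated - use torch.arange(start, end, step) instead.
one_to_ten = torch.arange(start=1, end=11, step=1)
print(one_to_ten)            # 1 to 10 - the end value is exclusive

print(torch.arange(start=0, end=1000, step=77))  # 0, 77, 154, ... up to just under 1000

# Tensors "like" another tensor: same shape, different values.
ten_zeros = torch.zeros_like(input=one_to_ten)
print(ten_zeros)             # ten zeros, same shape as one_to_ten
```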
Let's now get into a very important 1165 01:54:04,400 --> 01:54:12,160 topic of tensor data types. So we've briefly hinted on this before. And I said that let's create 1166 01:54:12,160 --> 01:54:20,720 a tensor to begin with float 32 tensor. And we're going to go float 32 tensor equals torch 1167 01:54:20,720 --> 01:54:29,680 dot tensor. And let's just put in the numbers three, six, nine. If you've ever played need for 1168 01:54:29,680 --> 01:54:34,800 speed underground, you'll know where three, six, nine comes from. And then we're going to go 1169 01:54:34,800 --> 01:54:45,040 D type equals, let's just put none and see what happens, hey, float 32 tensor. Oh, what is the 1170 01:54:45,040 --> 01:54:54,640 data type? float 32, tensor dot D type. float 32, even though we put none, this is because 1171 01:54:54,640 --> 01:55:00,640 the default data type in pytorch, even if it's specified as none is going to come out as float 32. 1172 01:55:00,640 --> 01:55:06,480 What if we wanted to change that to something else? Well, let's type in here float 16. 1173 01:55:07,920 --> 01:55:14,480 And now we've got float 32 tensor. This variable name is a lie now because it's a float 16 tensor. 1174 01:55:14,480 --> 01:55:20,000 So we'll leave that as none. Let's go there. There's another parameter when creating tensors. 1175 01:55:20,000 --> 01:55:26,000 It's very important, which is device. So we'll see what that is later on. And then there's a 1176 01:55:26,000 --> 01:55:32,400 final one, which is also very important, which is requires grad equals false. Now this could be 1177 01:55:32,400 --> 01:55:38,080 true, of course, we're going to set this as false. So these are three of the most important parameters 1178 01:55:38,080 --> 01:55:43,920 when you're creating tensors. Now, again, you won't necessarily always have to enter these when 1179 01:55:43,920 --> 01:55:49,040 you're creating tensors, because pytorch does a lot of tensor creation behind the scenes for you. 1180 01:55:49,040 --> 01:55:58,080 So let's just write out what these are. Data type is what data type is the tensor, e.g. float 32, 1181 01:55:58,080 --> 01:56:04,400 or float 16. Now, if you'd like to look at what data types are available for pytorch tensors, 1182 01:56:04,400 --> 01:56:11,280 we can go torch tensor and write up the top unless the documentation changes. We have data types. 1183 01:56:11,280 --> 01:56:17,360 It's so important that data types is the first thing that comes up when you're creating a tensor. 1184 01:56:17,360 --> 01:56:24,000 So we have 32-bit floating point, 64-bit floating point, 16, 16, 32-bit complex. Now, 1185 01:56:24,000 --> 01:56:30,640 the most common ones that you will likely interact with are 32-bit floating point and 16-bit floating 1186 01:56:30,640 --> 01:56:36,000 point. Now, what does this mean? What do these numbers actually mean? Well, they have to do with 1187 01:56:36,000 --> 01:56:44,000 precision in computing. So let's look up that. Precision in computing. Precision computer science. 1188 01:56:44,000 --> 01:56:50,000 So in computer science, the precision of a numerical quantity, we're dealing with numbers, right? 1189 01:56:50,000 --> 01:56:55,280 As a measure of the detail in which the quantity is expressed. This is usually measured in bits, 1190 01:56:55,280 --> 01:57:01,280 but sometimes in decimal digits. It is related to precision in mathematics, which describes the 1191 01:57:01,280 --> 01:57:08,320 number of digits that are used to express a value. 
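Before going further with precision, here's the creation code with those three parameters written out in full, as a minimal sketch:

```python
import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None,           # defaults to torch.float32
                               device=None,          # defaults to the CPU
                               requires_grad=False)  # whether to track gradients
print(float_32_tensor.dtype)  # torch.float32

# Explicitly asking for half precision instead.
float_16_tensor = torch.tensor([3.0, 6.0, 9.0], dtype=torch.float16)
print(float_16_tensor.dtype)  # torch.float16
```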
So, for us, precision is the numerical quantity, 1192 01:57:08,320 --> 01:57:14,560 is a measure of the detail, how much detail in which the quantity is expressed. So, I'm not going 1193 01:57:14,560 --> 01:57:19,600 to dive into the background of computer science and how computers represent numbers. The important 1194 01:57:19,600 --> 01:57:25,280 takeaway for you from this will be that single precision floating point is usually called float 1195 01:57:25,280 --> 01:57:33,280 32, which means, yeah, a number contains 32 bits in computer memory. So if you imagine, if we have 1196 01:57:33,280 --> 01:57:39,680 a tensor that is using 32 bit floating point, the computer memory stores the number as 32 bits. 1197 01:57:40,240 --> 01:57:46,880 Or if it has 16 bit floating point, it stores it as 16 bits or 16 numbers representing or 16. 1198 01:57:46,880 --> 01:57:52,480 I'm not sure if a bit equates to a single number in computer memory. But what this means is that 1199 01:57:52,480 --> 01:57:59,680 a 32 bit tensor is single precision. This is half precision. Now, this means that it's the default 1200 01:57:59,680 --> 01:58:05,520 of 32, float 32, torch dot float 32, as we've seen in code, which means it's going to take up 1201 01:58:05,520 --> 01:58:10,560 a certain amount of space in computer memory. Now, you might be thinking, why would I do anything 1202 01:58:10,560 --> 01:58:16,880 other than the default? Well, if you'd like to sacrifice some detail in how your number is 1203 01:58:16,880 --> 01:58:25,840 represented. So instead of 32 bits, it's represented by 16 bits, you can calculate faster on numbers 1204 01:58:25,840 --> 01:58:32,720 that take up less memory. So that is the main differentiator between 32 bit and 16 bit. But if 1205 01:58:32,720 --> 01:58:38,720 you need more precision, you might go up to 64 bit. So just keep that in mind as you go forward. 1206 01:58:38,720 --> 01:58:45,120 Single precision is 32. Half precision is 16. What do these numbers represent? They represent 1207 01:58:45,120 --> 01:58:53,360 how much detail a single number is stored in memory. That was a lot to take in. But we're talking 1208 01:58:53,360 --> 01:58:58,640 about 10 to data types. I'm spending a lot of time here, because I'm going to put a note here, 1209 01:58:58,640 --> 01:59:11,360 note, tensor data types is one of the three big issues with pytorch and deep learning or 1210 01:59:11,360 --> 01:59:17,200 not not issues, they're going to be errors that you run into and deep learning. Three big 1211 01:59:17,200 --> 01:59:29,840 errors, you'll run into with pytorch and deep learning. So one is tensors, not right data type. 1212 01:59:29,840 --> 01:59:39,360 Two tensors, not right shape. We've seen a few shapes of four and three tensors, not on the right 1213 01:59:39,360 --> 01:59:48,080 device. And so in this case, if we had a tensor that was float 16 and we were trying to do computations 1214 01:59:48,080 --> 01:59:53,040 with a tensor that was float 32, we might run into some errors. And so that's the tensors not 1215 01:59:53,040 --> 01:59:58,400 being in the right data type. So it's important to know about the D type parameter here. And then 1216 01:59:58,400 --> 02:00:03,600 tensors not being the right shape. Well, that's once we get onto matrix multiplication, we'll see 1217 02:00:03,600 --> 02:00:08,320 that if one tensor is a certain shape and another tensor is another shape and those shapes don't 1218 02:00:08,320 --> 02:00:13,360 line up, we're going to run into shape errors. 
And this is a perfect segue to the device. 1219 02:00:13,360 --> 02:00:19,840 Device equals none. By default, this is going to be CPU. This is why we are using Google Colab 1220 02:00:19,840 --> 02:00:25,440 because it enables us to have access to, oh, we don't want to restart, enables us to have access 1221 02:00:25,440 --> 02:00:32,480 to a GPU. As I've said before, a GPU enables us. So we could change this to CUDA. That would be, 1222 02:00:32,480 --> 02:00:39,760 we'll see how to write device agnostic code later on. But this device, if you try to do 1223 02:00:39,760 --> 02:00:46,240 operations between two tensors that are not on the same device. So for example, you have one tensor 1224 02:00:46,240 --> 02:00:51,040 that lives on a GPU for fast computing, and you have another tensor that lives on a CPU and you 1225 02:00:51,040 --> 02:00:56,400 try to do something with them, while pytorch is going to throw you an error. And then finally, 1226 02:00:56,400 --> 02:01:01,520 this last requirement is grad is if you want pytorch to track the gradients, we haven't covered 1227 02:01:01,520 --> 02:01:06,640 what that is of a tensor when it goes through certain numerical calculations. This is a bit of 1228 02:01:06,640 --> 02:01:12,960 a bombardment, but I thought I'd throw these in as important parameters to be aware of since 1229 02:01:12,960 --> 02:01:17,760 we're discussing data type. And really, it would be reminiscent of me to discuss data type without 1230 02:01:17,760 --> 02:01:24,960 discussing not the right shape or not the right device. So with that being said, let's write down 1231 02:01:24,960 --> 02:01:36,960 here what device is your tensor on, and whether or not to track gradients with this tensor's 1232 02:01:36,960 --> 02:01:44,880 operations. So we have a float 32 tensor. Now, how might we change the tensor data type of this? 1233 02:01:44,880 --> 02:01:52,000 Let's create float 16 tensor. And we saw that we could explicitly write in float 16 tensor. 1234 02:01:52,000 --> 02:02:00,560 Or we can just type in here, float 16 tensor equals float 32 tensor dot type. And we're going to type 1235 02:02:00,560 --> 02:02:07,920 in torch dot float 16, why float 16, because well, that's how we define float 16, or we could use 1236 02:02:07,920 --> 02:02:14,560 half. So the same thing, these things are the same, let's just do half, or float 16 is more 1237 02:02:14,560 --> 02:02:25,600 explicit for me. And then let's check out float 16 tensor. Beautiful, we've converted our float 1238 02:02:25,600 --> 02:02:31,600 32 tensor into float 16. So that is one of the ways that you'll be able to tackle the tensors 1239 02:02:31,600 --> 02:02:37,280 not in the right data type issue that you run into. And just a little note on the precision 1240 02:02:37,280 --> 02:02:43,520 and computing, if you'd like to read more on that, I'm going to link this in here. And this is all 1241 02:02:43,520 --> 02:02:52,640 about how computers store numbers. So precision in computing. There we go. I'll just get rid of that. 1242 02:02:53,520 --> 02:02:59,520 Wonderful. So give that a try, create some tensors, research, or go to the documentation of torch 1243 02:02:59,520 --> 02:03:04,640 dot tensor and see if you can find out a little bit more about D type device and requires grad, 1244 02:03:04,640 --> 02:03:09,920 and create some tensors of different data types. 
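If you'd like a starting point for that exercise, here's the conversion from the video as a sketch; torch.half is just another name for torch.float16:

```python
import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0])       # torch.float32 by default

# Change the data type with .type() - torch.half is an alias for torch.float16.
float_16_tensor = float_32_tensor.type(torch.float16)
print(float_16_tensor)        # tensor([3., 6., 9.], dtype=torch.float16)
print(float_16_tensor.dtype)  # torch.float16
```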
Play around with whatever the ones you want here, 1245 02:03:09,920 --> 02:03:14,960 and see if you can run into some errors, maybe try to multiply two tensors together. So if you go 1246 02:03:15,760 --> 02:03:23,680 float 16 tensor times float 32 tensor, give that a try and see what happens. I'll see you in the next 1247 02:03:23,680 --> 02:03:30,560 video. Welcome back. In the last video, we covered a little bit about tensor data types, 1248 02:03:30,560 --> 02:03:36,080 as well as some of the most common parameters you'll see past to the torch dot tensor method. 1249 02:03:36,080 --> 02:03:40,480 And so I should do the challenge at the end of the last video to create some of your own tensors 1250 02:03:40,480 --> 02:03:45,200 of different data types, and then to see what happens when you multiply a float 16 tensor by a 1251 02:03:45,200 --> 02:03:52,800 float 32 tensor. Oh, it works. And but you've like Daniel, you said that you're going to have tensors 1252 02:03:52,800 --> 02:03:58,880 not the right data type. Well, this is another kind of gotcha or caveat of pie torch and deep 1253 02:03:58,880 --> 02:04:03,760 learning in general, is that sometimes you'll find that even if you think something may error 1254 02:04:03,760 --> 02:04:08,800 because these two tensors are different data types, it actually results in no error. But then 1255 02:04:08,800 --> 02:04:13,200 sometimes you'll have other operations that you do, especially training large neural networks, 1256 02:04:13,200 --> 02:04:18,400 where you'll get data type issues. The important thing is to just be aware of the fact that some 1257 02:04:18,400 --> 02:04:23,520 operations will run an error when your tensors are not in the right data type. So let's try another 1258 02:04:23,520 --> 02:04:32,960 type. Maybe we try a 32 bit integer. So torch dot in 32. And we try to multiply that by a float. 1259 02:04:32,960 --> 02:04:45,200 Wonder what will happen then? So let's go into 32 in 32 tensor equals torch dot tensor. And we'll 1260 02:04:45,200 --> 02:04:52,720 just make it three. Notice that there's no floats there or no dot points to make it a float. 1261 02:04:52,720 --> 02:05:04,320 Three, six, nine and D type can be torch in 32. And then in 32 tensor, what does this look like? 1262 02:05:04,320 --> 02:05:12,080 Typo, of course, one of many in 32 tensor. So now let's go float 32 tensor and see what happens. 1263 02:05:12,080 --> 02:05:17,120 Can we get pie torch to throw an error in 32 tensor? 1264 02:05:17,120 --> 02:05:26,080 Huh, it worked as well. Or maybe we go into 64. What happens here? 1265 02:05:28,000 --> 02:05:34,640 Still works. Now, see, this is again one of the confusing parts of doing tensor operations. 1266 02:05:34,640 --> 02:05:40,160 What if we do a long tensor? Torch to long. Is this going to still work? 1267 02:05:41,760 --> 02:05:45,520 Ah, torch has no attribute called long. That's not a data type issue. 1268 02:05:45,520 --> 02:05:57,520 I think it's long tensor. Long tensor. Does this work? D type must be torch D type. 1269 02:05:58,560 --> 02:06:04,000 Torch long tensor. I could have sworn that this was torch dot tensor. 1270 02:06:08,000 --> 02:06:12,960 Oh, there we go. Torch dot long tensor. That's another word for 64 bit. 1271 02:06:12,960 --> 02:06:19,200 So what is this saying? CPU tensor. Okay, let's see. This is some troubleshooting on the fly here. 1272 02:06:23,040 --> 02:06:29,360 Then we multiply it. This is a float 32 times a long. It works. 
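For reference, here's that experiment as a sketch. The result data types in the comments are what PyTorch's type promotion rules give on recent versions (the integer tensor gets promoted to the float type), which is why no error is raised; that explanation isn't spelled out in the video, so treat it as my reading of the behaviour.

```python
import torch

float_32_tensor = torch.tensor([3.0, 6.0, 9.0])
int_32_tensor = torch.tensor([3, 6, 9], dtype=torch.int32)
long_tensor = torch.tensor([3, 6, 9], dtype=torch.long)   # torch.long == torch.int64

# Both run without error - the integer tensor is promoted to float.
print((float_32_tensor * int_32_tensor).dtype)  # torch.float32
print((float_32_tensor * long_tensor).dtype)    # torch.float32
```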
Okay, so it's actually a bit 1273 02:06:29,360 --> 02:06:33,360 more robust than what I thought it was. But just keep this in mind when we're training models, 1274 02:06:33,360 --> 02:06:36,320 we're probably going to run into some errors at some point of our tensor's not being the 1275 02:06:36,320 --> 02:06:40,480 right data type. And if pie torch throws us an error saying your tensors are in the wrong data 1276 02:06:40,480 --> 02:06:46,960 type, well, at least we know now how to change that data type or how to set the data type if we 1277 02:06:46,960 --> 02:06:53,920 need to. And so with that being said, let's just formalize what we've been doing a fair bit already. 1278 02:06:53,920 --> 02:06:59,600 And that's getting information from tensors. So the three big things that we'll want to get 1279 02:06:59,600 --> 02:07:04,320 from our tensors in line with the three big errors that we're going to face in neural networks and 1280 02:07:04,320 --> 02:07:13,680 deep lining is let's copy these down. Just going to get this, copy this down below. So if we want 1281 02:07:13,680 --> 02:07:18,400 to get some information from tensors, how do we check the shape? How do we check the data type? 1282 02:07:18,400 --> 02:07:24,320 How do we check the device? Let's write that down. So to get information from this, to get 1283 02:07:24,320 --> 02:07:39,680 D type or let's write data type from a tensor can use tensor dot D type. And let's go here to get 1284 02:07:39,680 --> 02:07:52,320 shape from a tensor can use tensor dot shape. And to get device from a tensor, which devices it on 1285 02:07:52,320 --> 02:08:02,720 CPU or GPU can use tensor dot device. Let's see these three in action. So if we run into one of 1286 02:08:02,720 --> 02:08:07,760 the three big problems in deep learning and neural networks in general, especially with PyTorch, 1287 02:08:07,760 --> 02:08:12,080 tensor's not the right data type, tensor's not the right shape or tensor's not on the right device. 1288 02:08:12,640 --> 02:08:19,440 Let's create a tensor and try these three out. We've got some tensor equals torch dot 1289 02:08:19,440 --> 02:08:23,440 rand and we'll create it a three four. Let's have a look at what it looks like. 1290 02:08:25,600 --> 02:08:31,440 There we go. Random numbers of shape three and four. Now let's find out some details about it. 1291 02:08:32,960 --> 02:08:39,840 Find out details about some tensor. So print or print some tensor. 1292 02:08:39,840 --> 02:08:49,360 And oops, didn't want that print. And let's format it or make an F string of shape of tensor. 1293 02:08:50,640 --> 02:08:52,720 Oh, let's do data type first. We'll follow that order. 1294 02:08:56,560 --> 02:09:02,000 Data type of tensor. And we're going to go, how do we do this? Some tensor dot what? 1295 02:09:02,000 --> 02:09:11,360 Dot d type. Beautiful. And then we're going to print tensors not in the right shape. So let's go 1296 02:09:13,120 --> 02:09:22,400 shape of tensor equals some tensor dot shape. Oh, I went a bit too fast, but we could also use 1297 02:09:22,400 --> 02:09:28,400 size. Let's just confirm that actually. We'll code that out together. From my experience, 1298 02:09:28,400 --> 02:09:40,640 some tensor dot size, and some tensor dot shape result in the same thing. Is that true? Oh, function. 1299 02:09:40,640 --> 02:09:46,320 Oh, that's what it is. Some tensor dot size is a function, not an attribute. 1300 02:09:49,040 --> 02:09:54,480 There we go. Which one should you use? 
For me, I'm probably more used to using shape. You may come 1301 02:09:54,480 --> 02:09:59,680 across dot size as well, but just realize that they do quite the same thing except one's a function 1302 02:09:59,680 --> 02:10:06,720 and one's an attribute. An attribute is written dot shape without the curly brackets. A function 1303 02:10:06,720 --> 02:10:12,560 or a method is with the brackets at the end. So that's the difference between these are attributes 1304 02:10:12,560 --> 02:10:18,400 here. D type size. We're going to change this to shape. Tensor attributes. This is what we're 1305 02:10:18,400 --> 02:10:27,200 getting. I should probably write that down. This is tensor attributes. That's the formal name for 1306 02:10:27,200 --> 02:10:31,760 these things. And then finally, what else do we want? Tensors, what device are we looking for? 1307 02:10:32,640 --> 02:10:41,200 Let's get rid of this, get rid of this. And then print f device tensor is on. By default, 1308 02:10:41,200 --> 02:10:52,560 our tensor is on the CPU. So some tensor dot device. There we go. So now we've got our tensor 1309 02:10:52,560 --> 02:10:57,760 here, some tensor. The data type is a torch float 32 because we didn't change it to anything else. 1310 02:10:57,760 --> 02:11:02,000 And torch float 32 is the default. The shape is three four, which makes a lot of sense because 1311 02:11:02,000 --> 02:11:07,360 we passed in three four here. And the device tensor is on is the CPU, which is, of course, 1312 02:11:07,360 --> 02:11:12,960 the default, unless we explicitly say to put it on another device, all of the tensors that we 1313 02:11:12,960 --> 02:11:18,720 create will default to being on the CPU, rather than the GPU. And we'll see later on how to put 1314 02:11:18,720 --> 02:11:25,040 tensors and other things in torch onto a GPU. But with that being said, give it a shot, 1315 02:11:25,040 --> 02:11:29,040 create your own tensor, get some information from that tensor, and see if you can change 1316 02:11:29,040 --> 02:11:36,640 these around. So see if you could create a random tensor, but instead of float 32, it's a float 16. 1317 02:11:36,640 --> 02:11:41,920 And then probably another extracurricular, we haven't covered this yet. But see how to change 1318 02:11:41,920 --> 02:11:47,920 the device a pytorch tensor is on. Give that a crack. And I'll see you in the next video. 1319 02:11:49,840 --> 02:11:55,360 Welcome back. So in the last video, we had a look at a few tensor attributes, namely the data 1320 02:11:55,360 --> 02:12:01,120 type of a tensor, the shape of a tensor, and the device that a tensor lives on. And I alluded to 1321 02:12:01,120 --> 02:12:08,160 the fact that these will help resolve three of the most common issues in building neural networks, 1322 02:12:08,160 --> 02:12:13,360 deep learning models, specifically with pytorch. So tensor has not been the right data type, 1323 02:12:13,360 --> 02:12:19,360 tensor has not been the right shape, and tensor has not been on the right device. So now let's 1324 02:12:19,360 --> 02:12:26,000 get into manipulating tensors. And what I mean by that, so let's just write here the title, 1325 02:12:26,000 --> 02:12:32,560 manipulating tensors. And this is going to be tensor operations. So when we're building neural 1326 02:12:32,560 --> 02:12:39,120 networks, neural networks are comprised of lots of mathematical functions that pytorch code is going 1327 02:12:39,120 --> 02:12:51,280 to run behind the scenes for us. 
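Before moving on to those operations, here's the attribute-check cell from the last section gathered into one runnable sketch:

```python
import torch

some_tensor = torch.rand(3, 4)

print(some_tensor)
print(f"Datatype of tensor: {some_tensor.dtype}")    # torch.float32 by default
print(f"Shape of tensor: {some_tensor.shape}")       # same result as some_tensor.size()
print(f"Device tensor is on: {some_tensor.device}")  # cpu unless we move it
```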
So let's go here, tensor operations include addition, 1328 02:12:51,280 --> 02:13:02,560 subtraction, and these are the regular addition, subtraction, multiplication. There's two types 1329 02:13:02,560 --> 02:13:08,960 of multiplication in that you'll typically see referenced in deep learning and neural networks, 1330 02:13:09,600 --> 02:13:17,920 division, and matrix multiplication. And these, the ones here, so addition, subtraction, 1331 02:13:17,920 --> 02:13:25,600 multiplication, division, your typical operations that you're probably familiar with matrix multiplication. 1332 02:13:25,600 --> 02:13:29,680 The only different one here is matrix multiplication. We're going to have a look at that in a minute. 1333 02:13:30,320 --> 02:13:37,520 But to find patterns in numbers of a data set, a neural network will combine these functions 1334 02:13:37,520 --> 02:13:43,040 in some way, shape or form. So it takes a tensor full of random numbers, performs some kind of 1335 02:13:43,040 --> 02:13:48,400 combination of addition, subtraction, multiplication, division, matrix multiplication. It doesn't have 1336 02:13:48,400 --> 02:13:53,520 to be all of these. It could be any combination of these to manipulate these numbers in some way 1337 02:13:53,520 --> 02:13:59,040 to represent a data set. So that's how a neural network learns is it will just comprise these 1338 02:13:59,040 --> 02:14:05,920 functions, look at some data to adjust the numbers of a random tensor, and then go from there. But 1339 02:14:05,920 --> 02:14:11,040 with that being said, let's look at a few of these. So we'll begin with addition. First thing we need 1340 02:14:11,040 --> 02:14:22,720 to do is create a tensor. And to add something to a tensor, we'll just go torch tensor. Let's go one, 1341 02:14:22,720 --> 02:14:31,680 two, three, add something to a tensor is tensor plus, we can use plus as the addition operator, 1342 02:14:31,680 --> 02:14:42,000 just like in Python, tensor plus 10 is going to be tensor 11, 12, 13, tensor plus 100 is going to be 1343 02:14:42,000 --> 02:14:49,200 as you'd expect plus 100. Let's leave that as plus 10 and add 10 to it. And so you might be 1344 02:14:49,200 --> 02:14:59,920 able to guess how we would multiply it by 10. So let's go multiply tensor by 10. We can go tensor, 1345 02:14:59,920 --> 02:15:08,800 star, which are my keyboard shift eight, 10. We get 10, 10, 10. And because we didn't reassign it, 1346 02:15:10,800 --> 02:15:19,040 our tensor is still 123. So if we go, if we reassign it here, tensor equals tensor by 10, 1347 02:15:19,040 --> 02:15:26,080 and then check out tensor, we've now got 10 2030. And the same thing here, we'll have 10 2030. But 1348 02:15:26,080 --> 02:15:36,240 then if we go back from the top, if we delete this reassignment, oh, what do we get there, tensor 1349 02:15:36,240 --> 02:15:47,680 by 10. Oh, what's happened here? Oh, because we've got, yeah, okay, I see, tensor by 10, tensor, 1350 02:15:47,680 --> 02:15:55,760 still 123. What should we try now? How about subtract subtract 10 equals tensor minus 10. 1351 02:15:58,880 --> 02:16:04,880 And you can also use, well, there we go, one minus 10, eight minus 10, three minus 10. 1352 02:16:05,600 --> 02:16:12,640 You can also use like torch has inbuilt functions or pytorch. So try out pytorch 1353 02:16:12,640 --> 02:16:24,400 inbuilt functions. So torch dot mall is short for multiply. We can pass in our tensor here, 1354 02:16:24,400 --> 02:16:30,960 and we can add in 10. 
That's going to multiply each element of tensor by 10. So just taking 1355 02:16:30,960 --> 02:16:35,920 the original tensor that we created, which is 1, 2, 3. And it's performing the same thing as this. 1356 02:16:35,920 --> 02:16:42,800 I would recommend, where you can, to use the operators from Python. If for some reason, you see torch 1357 02:16:42,800 --> 02:16:48,080 dot mul, maybe there's a reason for that. But generally, these are more understandable if you 1358 02:16:48,080 --> 02:16:53,200 just use the operators, if you need to do a straight up multiplication, straight up addition, or straight 1359 02:16:53,200 --> 02:17:00,560 up subtraction, because torch also has torch dot add, torch dot add, is it torch dot add? It might 1360 02:17:00,560 --> 02:17:07,280 be torch dot add. I'm not sure. Oh, there we go. Yeah, torch dot add. So as I alluded to before, 1361 02:17:07,280 --> 02:17:12,800 there's two different types of multiplication that you'll hear about: element wise and matrix 1362 02:17:12,800 --> 02:17:18,000 multiplication. We're going to cover matrix multiplication in the next video. As a challenge, 1363 02:17:18,000 --> 02:17:26,320 though, I would like you to search what is matrix multiplication. And I think the first website that 1364 02:17:26,320 --> 02:17:32,400 comes up, matrix multiplication, Wikipedia, yeah, math is fun. It has a great guide. So before we 1365 02:17:32,400 --> 02:17:38,400 get into matrix multiplication, jump into math is fun to have a look at matrix multiplying, 1366 02:17:38,400 --> 02:17:43,520 and have a think about how we might be able to replicate that in PyTorch. Even if you're not 1367 02:17:43,520 --> 02:17:52,000 sure, just have a think about it. I'll see you in the next video. Welcome back. In the last video, 1368 02:17:52,000 --> 02:17:58,880 we discussed some basic tensor operations, such as addition, subtraction, multiplication, 1369 02:17:58,880 --> 02:18:04,560 element wise, division, and matrix multiplication. But we didn't actually go through what matrix 1370 02:18:04,560 --> 02:18:09,920 multiplication is. So now let's start on that, more particularly discussing the difference between 1371 02:18:09,920 --> 02:18:15,680 element wise and matrix multiplication. So we'll come down here, let's write another heading, 1372 02:18:15,680 --> 02:18:23,440 matrix multiplication. So there's two ways, or two main ways. Yeah, let's write that: two main 1373 02:18:23,440 --> 02:18:33,920 ways of performing multiplication in neural networks and deep learning. So one is the simple 1374 02:18:33,920 --> 02:18:42,640 version, which is what we've seen, which is element wise multiplication. And number two is matrix 1375 02:18:42,640 --> 02:18:49,600 multiplication. So matrix multiplication is actually possibly the most common tensor operation you 1376 02:18:49,600 --> 02:18:57,280 will find inside neural networks. And in the last video, I issued the extra curriculum of having a 1377 02:18:57,280 --> 02:19:04,480 look at the math is fun dot com page for how to multiply matrices. So the first example they go 1378 02:19:04,480 --> 02:19:11,760 through is element wise multiplication, which just means multiplying each element by a specific 1379 02:19:11,760 --> 02:19:18,080 number. In this case, we have two times four equals eight, two times zero equals zero, two times one 1380 02:19:18,080 --> 02:19:23,840 equals two, two times negative nine equals negative 18.
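(As a quick sketch of that operator versus inbuilt function point, using the same 1, 2, 3 tensor as above:)

import torch

tensor = torch.tensor([1, 2, 3])

# Python operators, generally the most readable option
print(tensor + 10)   # tensor([11, 12, 13])
print(tensor - 10)   # tensor([-9, -8, -7])
print(tensor * 10)   # tensor([10, 20, 30])

# Equivalent inbuilt functions
print(torch.add(tensor, 10))  # tensor([11, 12, 13])
print(torch.mul(tensor, 10))  # tensor([10, 20, 30])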
But then if we move on to matrix 1381 02:19:23,840 --> 02:19:30,160 multiplication, which is multiplying a matrix by another matrix, we need to do the dot product. 1382 02:19:30,160 --> 02:19:35,600 So that's something that you'll also hear matrix multiplication referred to as the dot product. 1383 02:19:35,600 --> 02:19:42,720 So these two are used interchangeably: matrix multiplication or dot product. And if we just 1384 02:19:42,720 --> 02:19:50,960 look up the symbol for dot product, you'll find that it's just a dot. There we go, a heavy dot, 1385 02:19:50,960 --> 02:20:00,560 images. There we go, a dot B. So this is vector a dot product B. A few different options there, 1386 02:20:00,560 --> 02:20:05,440 but let's look at what it looks like in PyTorch code. But first, there's a little bit of a 1387 02:20:05,440 --> 02:20:10,640 difference here. So how did we get from multiplying this matrix here of one, two, three, four, five, 1388 02:20:10,640 --> 02:20:17,680 six, times seven, eight, nine, 10, 11, 12? How did we get 58 there? Well, we start by going, 1389 02:20:17,680 --> 02:20:22,240 this is the difference between element wise and dot product, by the way, one times seven. 1390 02:20:23,120 --> 02:20:29,200 We'll record that down there. So that's seven. And then two times nine. So this is first row, 1391 02:20:29,200 --> 02:20:36,800 first column, two times nine is 18. And then three times 11 is 33. And if we add those up, 1392 02:20:36,800 --> 02:20:44,880 seven plus 18, plus 33, we get 58. And then if we were to do that for each other element that's 1393 02:20:44,880 --> 02:20:50,960 throughout these two matrices, we end up with something like this. So that's what I'd encourage 1394 02:20:50,960 --> 02:20:56,000 you to go through step by step and reproduce this. A good challenge would be to reproduce this by 1395 02:20:56,000 --> 02:21:02,720 hand with PyTorch code. But now let's go back and write some PyTorch code to do both of these. So 1396 02:21:04,400 --> 02:21:12,720 I just want to link here as well, more information on multiplying matrices. So I'm going to turn 1397 02:21:12,720 --> 02:21:18,560 this into markdown. Let's first see element wise, element wise multiplication. We're going to start 1398 02:21:18,560 --> 02:21:27,440 with just a rudimentary example. So if we have our tensor, what is it at the moment? It's 1, 2, 3. 1399 02:21:27,440 --> 02:21:33,440 And then if we multiply that by itself, we get 1, 4, 9. But let's print something out so it looks a bit 1400 02:21:33,440 --> 02:21:42,640 prettier than that. So print, I'm going to turn this into a string. And then we do that. So if we 1401 02:21:42,640 --> 02:21:51,120 print tensor times tensor, element wise multiplication is going to give us print equals. And then 1402 02:21:51,120 --> 02:22:00,800 let's do in here tensor times tensor. We go like that. Wonderful. So we get one times one 1403 02:22:00,800 --> 02:22:08,560 equals one, two times two equals four, three times three equals nine. Now for matrix multiplication, 1404 02:22:08,560 --> 02:22:18,720 PyTorch stores matrix multiplication, similar to torch dot mul, in the torch dot matmul space, 1405 02:22:19,440 --> 02:22:25,360 which stands for matrix multiplication. So let's just test it out. Let's just do the exact 1406 02:22:25,360 --> 02:22:30,640 same thing that we did here; instead of element wise, we'll do matrix multiplication on our 1, 2, 3 1407 02:22:30,640 --> 02:22:41,200 tensor. What happens here? Oh my goodness, 14. Now why did we get 14 instead of 1, 4, 9?
Can you guess 1408 02:22:41,200 --> 02:22:48,880 how we got to 14 or think about how we got to 14 from these numbers? So if we recall back, 1409 02:22:49,600 --> 02:22:58,160 we saw that before. We're only multiplying two smaller tensors, by the way, 1, 2, 3. This example is with 1410 02:22:58,160 --> 02:23:03,600 a larger one, but the same principle applies across different sizes of tensors or matrices. 1411 02:23:04,320 --> 02:23:09,280 And when I say matrix multiplication, you can also do matrix multiplication between tensors. 1412 02:23:10,000 --> 02:23:16,400 And in our case, we're using vectors just to add to the confusion. But what is the difference 1413 02:23:16,400 --> 02:23:22,960 here between element wise and dot product? Well, we've got one main difference. And that is addition. 1414 02:23:22,960 --> 02:23:32,560 So if we were to code this out by hand, matrix multiplication by hand, we'd have, recall, that 1415 02:23:32,560 --> 02:23:40,640 the elements of our tensor are 1, 2, 3. So if we wanted to matrix multiply that by itself, 1416 02:23:40,640 --> 02:23:49,600 we'd have one times one, which is the equivalent of doing one times seven in this visual example. 1417 02:23:49,600 --> 02:23:58,720 And then we'd have plus, it's going to be two times two. What does that give us? Four. 1418 02:23:58,720 --> 02:24:08,080 Plus three times three. What does that give us? Nine. One plus four plus nine gives us 14. So that's how 1419 02:24:08,080 --> 02:24:14,720 we got to that number there. Now we could do this with a for loop. So let's have a gaze at it. When I 1420 02:24:14,720 --> 02:24:20,880 say gaze, it means have a look. That's an Australian colloquialism for having a look. But I want to 1421 02:24:20,880 --> 02:24:27,360 show you the time difference, and it might not actually be that big a difference if we do it by hand 1422 02:24:27,360 --> 02:24:32,160 versus using something like matmul. And that's another thing to note is that if PyTorch has a 1423 02:24:32,160 --> 02:24:40,800 method already implemented, chances are it's a fast calculating version of that method. So I know 1424 02:24:40,800 --> 02:24:45,680 for basic operators, I said it's usually best to just use this straight up basic operator. 1425 02:24:45,680 --> 02:24:50,880 But for something like matrix multiplication or other advanced operators instead of the basic 1426 02:24:50,880 --> 02:24:55,520 operators, you probably want to use the torch version rather than writing a for loop, which is 1427 02:24:55,520 --> 02:25:02,480 what we're about to do. So let's go value equals zero. This is matrix multiplication by hand. So 1428 02:25:02,480 --> 02:25:11,760 for I in range, len tensor, so for each element in the length of our tensor, which is 1, 2, 3, we want to 1429 02:25:11,760 --> 02:25:19,680 update our value to be plus equal, which is doing this plus reassignment here. The ith element in 1430 02:25:19,680 --> 02:25:29,040 each tensor times the ith element. So times itself. And then how long is this going to take? 1431 02:25:29,040 --> 02:25:42,400 Let's now return the value. We should get 14, print 14. There we go. So 1.9 milliseconds on 1432 02:25:42,400 --> 02:25:49,440 whatever CPU that Google Colab is using behind the scenes. But now if we time it and use the torch 1433 02:25:49,440 --> 02:25:55,120 method torch dot matmul, with tensor and tensor. And again, we're using a very small tensor. So 1434 02:25:55,120 --> 02:26:02,640 okay, there we go. It actually showed how much quicker it is, even with such a small tensor.
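(For reference, a rough sketch of the two cells being compared; in Colab you would put the %%timeit magic at the top of each cell, and the timings quoted next are from the video, so they will differ machine to machine:)

import torch

tensor = torch.tensor([1, 2, 3])

# Matrix multiplication (dot product) by hand with a Python loop
value = 0
for i in range(len(tensor)):
    value += tensor[i] * tensor[i]  # multiply matching elements, then add them up
print(value)  # tensor(14)

# The inbuilt, vectorised version does the same thing
print(torch.matmul(tensor, tensor))  # tensor(14)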
1435 02:26:02,640 --> 02:26:13,440 So this is 1.9 milliseconds. This is 252 microseconds. So this is 10 times slower using a for loop, 1436 02:26:13,440 --> 02:26:18,480 then pie torches vectorized version. I'll let you look into that if you want to find out what 1437 02:26:18,480 --> 02:26:24,560 vectorization means. It's just a type of programming that rather than writing for loops, because as 1438 02:26:24,560 --> 02:26:30,640 you could imagine, if this tensor was, let's say, had a million elements instead of just three, 1439 02:26:30,640 --> 02:26:36,000 if you have to loop through each of those elements one by one, that's going to be quite cumbersome. 1440 02:26:36,000 --> 02:26:44,080 So a lot of pie torches functions behind the scenes implement optimized functions to perform 1441 02:26:45,120 --> 02:26:49,600 mathematical operations, such as matrix multiplication, like the one we did by hand, 1442 02:26:49,600 --> 02:26:56,080 in a far faster manner, as we can see here. And that's only with a tensor of three elements. 1443 02:26:56,080 --> 02:27:00,480 So you can imagine the speedups on something like a tensor with a million elements. 1444 02:27:01,120 --> 02:27:06,960 But with that being said, that is the crux of matrix multiplication. For a little bit more, 1445 02:27:06,960 --> 02:27:12,080 I encourage you to read through this documentation here by mathisfun.com. Otherwise, 1446 02:27:12,080 --> 02:27:17,920 let's look at a couple of rules that we have to satisfy for larger versions of matrix multiplication. 1447 02:27:17,920 --> 02:27:22,640 Because right now, we've done it with a simple tensor, only 123. Let's step things up a notch 1448 02:27:22,640 --> 02:27:30,960 in the next video. Welcome back. In the last video, we were introduced to matrix multiplication, 1449 02:27:30,960 --> 02:27:38,640 which although we haven't seen it yet, is one of the most common operations in neural networks. 1450 02:27:38,640 --> 02:27:46,800 And we saw that you should always try to use torches implementation of certain operations, 1451 02:27:46,800 --> 02:27:51,360 except if they're basic operations, like plus multiplication and whatnot, 1452 02:27:51,360 --> 02:27:57,840 because chances are it's a lot faster version than if you would do things by hand. And also, 1453 02:27:57,840 --> 02:28:04,240 it's a lot less code. Like compared to this, this is pretty verbose code compared to just a matrix 1454 02:28:04,240 --> 02:28:10,000 multiply these two tensors. But there's something that we didn't allude to in the last video. 1455 02:28:10,000 --> 02:28:14,720 There's a couple of rules that need to be satisfied when performing matrix multiplication. 1456 02:28:14,720 --> 02:28:20,960 It worked for us because we have a rather simple tensor. But once you start to build larger tensors, 1457 02:28:20,960 --> 02:28:25,120 you might run into one of the most common errors in deep learning. I'm going to write this down 1458 02:28:25,120 --> 02:28:32,560 actually here. This is one to be very familiar with. One of the most common errors in deep 1459 02:28:32,560 --> 02:28:39,760 learning, we've already alluded to this as well, is shape errors. So let's jump back to this in a 1460 02:28:39,760 --> 02:28:50,400 minute. I just want to write up here. So there are two rules that performing or two main rules 1461 02:28:50,400 --> 02:28:56,560 that performing matrix multiplication needs to satisfy. Otherwise, we're going to get an error. 
So number one is the inner dimensions must match. Let's see what this means. 1463 02:29:06,640 --> 02:29:14,880 So if we want to have two tensors of shape, three by two, and then we're going to use the at symbol. 1464 02:29:15,920 --> 02:29:21,360 Now, we might be asking why the at symbol. Well, the at symbol is another, like an operator 1465 02:29:21,360 --> 02:29:27,120 symbol for matrix multiplication. So I just want to give you an example. If we go tensor at tensor, 1466 02:29:29,200 --> 02:29:34,400 at stands for matrix multiplication, we get tensor 14, which is exactly the same as what we got there. 1467 02:29:34,400 --> 02:29:39,280 Should you use at or should you use matmul? I would personally recommend to use matmul. 1468 02:29:39,280 --> 02:29:43,920 It's a little bit clearer; at sometimes can get confusing because it's not as common as seeing 1469 02:29:43,920 --> 02:29:49,680 something like matmul. So we'll get rid of that, but I'm just using it up here for brevity. 1470 02:29:50,320 --> 02:29:56,800 And then we're going to go three, two. Now, this won't work. We'll see why in a second. 1471 02:29:56,800 --> 02:30:06,800 But if we go two, three, at, and then we have three, two, this will work. Or, and then if we go 1472 02:30:06,800 --> 02:30:13,120 the reverse, say threes on the outside, twos here. And then we have twos on the inside and threes 1473 02:30:13,120 --> 02:30:19,520 on the outside, this will work. Now, why is this? Well, this is the rule number one. The inner 1474 02:30:19,520 --> 02:30:26,960 dimensions must match. So the inner dimensions are, well, what I mean by this is, let's create torch 1475 02:30:26,960 --> 02:30:36,880 dot rand of size three, two. And then we'll get its shape. So we have, so if we created a tensor 1476 02:30:36,880 --> 02:30:42,960 like this, three, two, and then if we created another tensor, well, let me just show you straight 1477 02:30:42,960 --> 02:30:53,760 up: torch dot matmul, torch dot rand three, two, torch dot rand three, two. Watch, this won't work. We'll get an error. There we go. So 1478 02:30:53,760 --> 02:30:59,600 this is one of the most common errors that you're going to face in deep learning is that matrix 1479 02:30:59,600 --> 02:31:04,480 one and matrix two shapes cannot be multiplied because it doesn't satisfy rule number one. 1480 02:31:04,480 --> 02:31:11,600 The inner dimensions must match. And so what I mean by inner dimensions is this dimension multiplied 1481 02:31:11,600 --> 02:31:19,120 by this dimension. So say we were trying to multiply three, two by three, two, these are the inner 1482 02:31:19,120 --> 02:31:29,040 dimensions. Now this will work because, why? The inner dimensions match. Two, three by three, two, 1483 02:31:29,040 --> 02:31:40,640 two, three by three, two. Now notice how the inner dimensions, inner, inner match. Let's see what 1484 02:31:40,640 --> 02:31:48,480 comes out here. Look at that. And now this is where rule two comes into play. Two. The resulting 1485 02:31:48,480 --> 02:32:03,520 matrix has the shape of the outer dimensions. So we've just seen this one: two, three at three, two, 1486 02:32:04,560 --> 02:32:12,960 which, remember, at is matrix multiply. So we have a matrix of shape, two, three, 1487 02:32:12,960 --> 02:32:20,400 matrix multiply a matrix of three, two, the inner dimensions match. So it works. The resulting shape 1488 02:32:20,400 --> 02:32:30,160 is what? Two, two. Just as we've seen here, we've got a shape of two, two.
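(Putting that rule one demonstration into one small sketch, using random tensors like the ones above, so the values will differ run to run:)

import torch

# Rule 1: the inner dimensions must match.
# torch.matmul(torch.rand(3, 2), torch.rand(3, 2))  # error: inner dims 2 and 3 don't match

# Inner dimensions match here, so it works, and the result
# takes the shape of the outer dimensions (rule 2)
print(torch.matmul(torch.rand(2, 3), torch.rand(3, 2)).shape)  # torch.Size([2, 2])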
Now what if we did 1489 02:32:31,120 --> 02:32:38,080 the reverse? What if we did this one that also will work? Three on the outside. What do you think 1490 02:32:38,080 --> 02:32:44,000 is going to happen here? In fact, I encourage you to pause the video and give it a go. So this 1491 02:32:44,000 --> 02:32:49,680 is going to result in a three three matrix. But don't take my word for it. Let's have a look. Three, 1492 02:32:50,560 --> 02:32:57,120 put two on the inside and we'll put two on the inside here and then three on the outside. What 1493 02:32:57,120 --> 02:33:05,920 does it give us? Oh, look at that. A three three. One, two, three. One, two, three. Now what if we 1494 02:33:05,920 --> 02:33:11,600 were to change this? Two and two. This can be almost any number you want. Let's change them both 1495 02:33:11,600 --> 02:33:18,320 to 10. What's going to happen? Will this work? What's the resulting shape going to be? So the 1496 02:33:18,320 --> 02:33:24,160 inner dimensions match? What's rule number two? The resulting matrix has the shape of the outer 1497 02:33:24,160 --> 02:33:29,200 dimension. So what do you think is going to be the shape of this resulting matrix multiplication? 1498 02:33:29,200 --> 02:33:39,360 Well, let's have a look. It's still three three. Wow. Now what if we go 10? 10 on the outside 1499 02:33:40,640 --> 02:33:46,400 and 10 and 10 on the inside? What do we get? Well, we get, I'm not going to count all of those, 1500 02:33:46,400 --> 02:33:54,480 but if we just go shape, we get 10 by 10. Because these are the two main rules of matrix multiplication 1501 02:33:54,480 --> 02:33:58,880 is if you're running into an error that the matrix multiplication can't work. So let's say this was 1502 02:33:58,880 --> 02:34:05,440 10 and this was seven. Watch what's going to happen? We can't multiply them because the inner 1503 02:34:05,440 --> 02:34:10,160 dimensions do not match. We don't have 10 and 10. We have 10 and seven. But then when we change 1504 02:34:10,160 --> 02:34:17,120 this so that they match, we get 10 and 10. Beautiful. So now let's create a little bit more of a 1505 02:34:17,120 --> 02:34:23,840 specific example. We'll create two tenses. We'll come down. Actually, to prevent this video from 1506 02:34:23,840 --> 02:34:28,240 being too long, I've got an error in the word error. That's funny. We'll go on with one of those 1507 02:34:28,240 --> 02:34:32,000 common errors in deep learning shape errors. We've just seen it, but I'm going to get a little bit 1508 02:34:32,000 --> 02:34:38,080 more specific with that shape error in the next video. Before we do that, have a look at matrix 1509 02:34:38,080 --> 02:34:43,680 multiplication. There's a website, my other favorite website. I told you I've got two. This is my 1510 02:34:43,680 --> 02:34:49,120 other one. Matrix multiplication dot XYZ. This is your challenge before the next video. Put in 1511 02:34:49,120 --> 02:34:57,600 some random numbers here, whatever you want, two, 10, five, six, seven, eight, whatever you want. Change 1512 02:34:57,600 --> 02:35:05,280 these around a bit, three, four. Well, that's a five, not a four. And then multiply and just watch 1513 02:35:05,280 --> 02:35:10,560 what happens. That's all I'd like you to do. Just watch what happens and we're going to replicate 1514 02:35:10,560 --> 02:35:16,080 something like this in PyTorch code in the next video. I'll see you there. 1515 02:35:16,080 --> 02:35:22,960 Welcome back. 
In the last video, we discussed a little bit more about matrix multiplication, 1516 02:35:22,960 --> 02:35:29,280 but we're not done there. We looked at two of the main rules of matrix multiplication, 1517 02:35:29,280 --> 02:35:34,960 and we saw a few errors of what happens if those rules aren't satisfied, particularly if the 1518 02:35:34,960 --> 02:35:41,040 inner dimensions don't match. So this is what I've been alluding to as one of the most common 1519 02:35:41,040 --> 02:35:46,560 errors in deep learning, and that is shape errors. Because neural networks are comprised of lots of 1520 02:35:46,560 --> 02:35:52,720 matrix multiplication operations, if you have some sort of tensor shape error somewhere 1521 02:35:52,720 --> 02:35:59,200 in your neural network, chances are you're going to get a shape error. So now let's investigate 1522 02:35:59,200 --> 02:36:06,560 how we can deal with those. So let's create some tensors, shapes for matrix multiplication. 1523 02:36:06,560 --> 02:36:12,000 And I also showed you the website, sorry, matrix multiplication dot xyz. I hope you had a go at 1524 02:36:12,000 --> 02:36:15,520 typing in some numbers here and visualizing what happens, because we're going to reproduce 1525 02:36:15,520 --> 02:36:22,240 something very similar to what happens here, but with PyTorch code. Shapes for matrix multiplication, 1526 02:36:22,240 --> 02:36:29,440 we have tensor a, let's create this as torch dot tensor. We're going to create a tensor with 1527 02:36:29,440 --> 02:36:36,720 just the elements one, two, all the way up to, let's just go to six, hey, that'll be enough. Six, 1528 02:36:36,720 --> 02:36:44,560 wonderful. And then tensor b can be equal to a torch tensor 1529 02:36:47,440 --> 02:36:53,520 of, where are we going to go for this one? Let's go seven, 10, this will be a little bit confusing 1530 02:36:53,520 --> 02:37:03,040 this one, but then we'll go eight, 11, and this will go up to 12, nine, 12. So it's the same 1531 02:37:04,240 --> 02:37:09,040 sort of sequence as what's going on here, but they've been swapped around. So we've got the 1532 02:37:09,040 --> 02:37:14,400 vertical axis here, instead of one, two, three, four, this is just seven, eight, nine, 10, 11, 12. 1533 02:37:14,960 --> 02:37:19,360 But let's now try and perform a matrix multiplication. How do we do that? 1534 02:37:19,360 --> 02:37:25,760 Torch dot matmul for matrix multiplication. P.S. torch also has torch dot mm, which stands 1535 02:37:25,760 --> 02:37:30,320 for matrix multiplication, which is a shorter version. So I'll just write down here so that you know 1536 02:37:30,960 --> 02:37:42,080 tensor a, tensor b. I'm going to write torch dot mm is the same as torch dot matmul. It's an alias 1537 02:37:42,080 --> 02:37:50,320 for writing less code. This is literally how common matrix multiplications are in PyTorch 1538 02:37:50,320 --> 02:37:56,720 is that they've made torch dot mm as an alias for matmul. So you have to type four fewer characters 1539 02:37:56,720 --> 02:38:02,720 using torch dot mm instead of matmul. But I like to write matmul because it's a little bit 1540 02:38:02,720 --> 02:38:09,280 clearer; it explains what it does a little bit more than mm. So what do you think's going to happen 1541 02:38:09,280 --> 02:38:14,880 here? It's okay if you're not sure. But what you could probably do to find out is check the 1542 02:38:14,880 --> 02:38:20,320 shapes of these.
Does this operation matrix multiplication satisfy the rules that we just 1543 02:38:20,320 --> 02:38:24,800 discussed? Especially this one. This is the main one. The inner dimensions must match. 1544 02:38:25,600 --> 02:38:34,160 Well, let's have a look, hey? Oh, no, mat one and mat two shapes cannot be multiplied. 1545 02:38:34,160 --> 02:38:39,360 Three by two and three by two. This is very similar to what we went through in the last video. 1546 02:38:39,360 --> 02:38:42,640 But now we've got some actual numbers there. Let's check the shape. 1547 02:38:44,400 --> 02:38:51,360 Oh, torch size three two. Torch size three two now. In the last video we created a random tensor 1548 02:38:51,360 --> 02:38:57,120 and we could adjust the shape on the fly. But these tensors already exist. How might we adjust 1549 02:38:57,120 --> 02:39:05,280 the shape of these? Well, now I'm going to introduce you to another very common operation or tensor 1550 02:39:05,280 --> 02:39:13,520 manipulation that you'll see. And that is the transpose. To fix our tensor shape issues, 1551 02:39:13,520 --> 02:39:31,440 we can manipulate the shape of one of our tensors using a transpose. And so, all right here, 1552 02:39:32,000 --> 02:39:38,000 we're going to see this anyway, but I'm going to define it in words. A transpose switches the 1553 02:39:38,000 --> 02:39:48,800 axes or dimensions of a given tensor. So let's see this in action. If we go, and the way to do it, 1554 02:39:48,800 --> 02:40:00,880 is you can go tensor b dot t. Let's see what happens. Let's look at the original tensor b as well. 1555 02:40:01,600 --> 02:40:06,480 So dot t stands for transpose. And that's a little bit hard to read, so we might do these on 1556 02:40:06,480 --> 02:40:18,560 different lines, tensor b. We'll get rid of that. So you see what's happened here. Instead of 1557 02:40:18,560 --> 02:40:24,880 tensor b, this is the original one. We might put the original on top. Instead of the original one 1558 02:40:24,880 --> 02:40:30,960 having seven, eight, nine, 10, 11, 12 down the vertical, the transpose has transposed it to seven, 1559 02:40:30,960 --> 02:40:35,920 eight, nine across the horizontal and 10, 11, 12 down here. Now, if we get the shape of this, 1560 02:40:36,720 --> 02:40:43,200 tensor b dot shape, let's have a look at that. Let's have a look at the original shape, tensor b dot 1561 02:40:43,200 --> 02:40:55,280 shape. What's happened? Oh, no, we've still got three, two. Oh, that's what I've missed out here. 1562 02:40:55,280 --> 02:41:00,960 I've got a typo. Excuse me. I thought I was, you think code that you've written is working, 1563 02:41:00,960 --> 02:41:06,080 but then you realize you've got something as small as just a dot t missing, and it throws off your 1564 02:41:06,080 --> 02:41:13,920 whole train of thought. So you're seeing these arrows on the fly here. Now, tensor b is this, 1565 02:41:13,920 --> 02:41:19,600 but its shape is torch dot size three, two. And if we try to matrix multiply three, two, and three, 1566 02:41:19,600 --> 02:41:26,560 two, tensor a and tensor b, we get an error. Why? Because the inner dimensions do not match. 1567 02:41:26,560 --> 02:41:35,520 But if we perform a transpose on tensor b, we switch the dimensions around. So now, 1568 02:41:35,520 --> 02:41:43,680 we perform a transpose with tensor b dot t, t's for transpose. We have, this is the important 1569 02:41:43,680 --> 02:41:49,360 point as well. We still have the same elements. It's just that they've been rearranged. 
They've 1570 02:41:49,360 --> 02:41:56,480 been transposed. So now, tensor b still has the same information encoded, but rearranged. 1571 02:41:56,480 --> 02:42:01,840 So now we have torch size two, three. And so when we try to matrix multiply these, 1572 02:42:02,400 --> 02:42:09,360 we satisfy the first criteria. And now look at the output of the matrix multiplication of tensor a 1573 02:42:09,360 --> 02:42:17,440 and tensor b dot t transposed is three, three. And that is because of the second rule of matrix 1574 02:42:17,440 --> 02:42:23,760 multiplication. The resulting matrix has the shape of the outer dimensions. So we've got three, 1575 02:42:23,760 --> 02:42:31,280 two matrix multiply two, three results in a shape of three, three. So let's prettify some of this, 1576 02:42:31,280 --> 02:42:36,240 and we'll print out what's going on here. Just so we know, we can step through it, 1577 02:42:36,240 --> 02:42:41,360 because right now we've just got code all over the place a bit. Let's see here, the matrix 1578 02:42:41,360 --> 02:42:54,880 multiplication operation works when tensor b is transposed. And in a second, I'm going to 1579 02:42:54,880 --> 02:42:58,880 show you what this looks like visually. But right now we've done it with pytorch code, 1580 02:42:58,880 --> 02:43:03,040 which might be a little confusing. And that's perfectly fine. Matrix multiplication takes a 1581 02:43:03,040 --> 02:43:10,400 little while and a little practice. So original shapes is going to be tensor a dot shape. Let's 1582 02:43:10,400 --> 02:43:20,640 see what this is. And tensor b equals tensor b dot shape. But the reason why we're spending so 1583 02:43:20,640 --> 02:43:26,240 much time on this is because as you'll see, as you get more and more into neural networks and 1584 02:43:26,240 --> 02:43:32,800 deep learning, the matrix multiplication operation is one of the most or if not the most common. 1585 02:43:32,800 --> 02:43:40,080 Same shape as above, because we haven't changed tensor a shape, we've only changed tensor b shape, 1586 02:43:40,080 --> 02:43:50,880 or we've transposed it. And then in tensor b dot transpose equals, we want tensor b dot 1587 02:43:50,880 --> 02:43:57,600 t dot shape. Wonderful. And then if we print, let's just print out, oops, 1588 02:43:57,600 --> 02:44:06,160 print, I spelled the wrong word there, print. We want, what are we multiplying here? This is 1589 02:44:06,160 --> 02:44:11,120 one of the ways, remember our motto of visualize, visualize, visualize, well, this is how I visualize, 1590 02:44:11,120 --> 02:44:19,680 visualize, visualize things, shape, let's do the at symbol for brevity, tensor, and let's get b dot 1591 02:44:19,680 --> 02:44:27,840 t dot shape. We'll put down our little rule here, inner dimensions must match. And then print, 1592 02:44:29,120 --> 02:44:37,600 let's get the output output, I'll put that on a new line. The output is going to equal 1593 02:44:37,600 --> 02:44:43,040 torch dot, or our outputs already here, but we're going to rewrite it for a little bit of practice, 1594 02:44:43,040 --> 02:44:55,120 tensor a, tensor b dot t. And then we can go print output. And then finally, print, let's get it on a 1595 02:44:55,120 --> 02:45:00,800 new line as well, the output shape, a fair bit going on here. But we're going to step through it, 1596 02:45:00,800 --> 02:45:06,960 and it's going to help us understand a little bit about what's going on. That's the visualize, visualize, visualize 1597 02:45:06,960 --> 02:45:16,880 motto. There we go.
Okay, so the original shapes are what torch size three two, and torch size three 1598 02:45:16,880 --> 02:45:23,120 two, the new shapes tensor a stays the same, we haven't changed tensor a, and then we have tensor 1599 02:45:23,120 --> 02:45:31,360 b dot t is torch size two three, then we multiply a three by two by a two by three. So the inner 1600 02:45:31,360 --> 02:45:37,520 dimensions must match, which is correct, they do match two and two. Then we have an output of tensor 1601 02:45:37,520 --> 02:45:45,680 at 27, 30, 33, 61, 68, 75, etc. And the output shape is what the output shape is the outer 1602 02:45:45,680 --> 02:45:52,800 dimensions three three. Now, of course, you could rearrange this maybe transpose tensor a instead of 1603 02:45:52,800 --> 02:45:58,160 tensor b, have a play around with it. See if you can create some more errors trying to multiply these 1604 02:45:58,160 --> 02:46:03,600 two, and see what happens if you transpose tensor a instead of tensor b, that's my challenge. But 1605 02:46:03,600 --> 02:46:11,040 before we finish this video, how about we just recreate what we've done here with this cool website 1606 02:46:11,040 --> 02:46:17,440 matrix multiplication. So what did we have? We had tensor a, which is one to six, let's recreate 1607 02:46:17,440 --> 02:46:28,080 this, remove that, this is going to be one, two, three, four, five, six, and then we want to increase 1608 02:46:28,080 --> 02:46:37,200 this, and this is going to be seven, eight, nine, 10, 11, 12. Is that the right way of doing things? 1609 02:46:38,160 --> 02:46:43,440 So this is already transposed, just to let you know. So this is the equivalent of tensor b 1610 02:46:43,440 --> 02:46:54,880 on the right here, tensor b dot t. So let me just show you, if we go tensor b dot transpose, 1611 02:46:55,920 --> 02:47:01,440 which original version was that, but we're just passing in the transpose version to our matrix 1612 02:47:01,440 --> 02:47:06,960 multiplication website. And then if we click multiply, this is what's happening behind the 1613 02:47:06,960 --> 02:47:12,160 scenes with our pytorch code of matmore. We have one times seven plus two times 10. Did you see 1614 02:47:12,160 --> 02:47:16,960 that little flippy thing that it did? That's where the 27 comes from. And then if we come down here, 1615 02:47:17,600 --> 02:47:22,640 what's our first element? 27 when we matrix multiply them. Then if we do the same thing, 1616 02:47:22,640 --> 02:47:28,480 the next step, we get 30 and 61, from a combination of these numbers, do it again, 1617 02:47:29,360 --> 02:47:36,240 33, 68, 95, from a combination of these numbers, again, and again, and finally we end up with 1618 02:47:36,240 --> 02:47:44,000 exactly what we have here. So that's a little bit of practice for you to go through is to create 1619 02:47:44,000 --> 02:47:49,520 some of your own tensors can be almost whatever you want. And then try to matrix multiply them 1620 02:47:49,520 --> 02:47:54,000 with different shapes. See what happens when you transpose and what different values you get. 1621 02:47:54,000 --> 02:47:57,680 And if you'd like to visualize it, you could write out something like this. That really 1622 02:47:57,680 --> 02:48:01,760 helps me understand matrix multiplication. And then if you really want to visualize it, 1623 02:48:01,760 --> 02:48:07,520 you can go through this website and recreate your target tensors in something like this. 1624 02:48:07,520 --> 02:48:12,080 I'm not sure how long you can go. 
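(Putting the last couple of videos together, a condensed sketch of the transpose fix looks something like this:)

import torch

tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]])
tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]])

# torch.matmul(tensor_A, tensor_B)  # error: (3, 2) @ (3, 2), inner dimensions don't match

# Transposing tensor_B gives it shape (2, 3), so the inner dimensions line up
output = torch.matmul(tensor_A, tensor_B.T)
print(output)        # tensor([[ 27,  30,  33],
                     #         [ 61,  68,  75],
                     #         [ 95, 106, 117]])
print(output.shape)  # torch.Size([3, 3]), the outer dimensions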
But yeah, that should be enough to get started. 1625 02:48:12,080 --> 02:48:14,480 So give that a try and I'll see you in the next video. 1626 02:48:17,600 --> 02:48:22,560 Welcome back. In the last few videos, we've covered one of the most fundamental operations 1627 02:48:22,560 --> 02:48:27,760 in neural networks. And that is matrix multiplication. But now it's time to move on. 1628 02:48:27,760 --> 02:48:36,400 And let's cover tensor aggregation. And what I mean by that is finding the min, max, mean, 1629 02:48:37,040 --> 02:48:44,000 sum, et cetera, tensor aggregation of certain tensor values. So for whatever reason, you may 1630 02:48:44,000 --> 02:48:49,040 want to find the minimum value of a tensor, the maximum value, the mean, the sum, what's going on 1631 02:48:49,040 --> 02:48:54,160 there. So let's have a look at a few PyTorch methods that are inbuilt to do all of these. 1632 02:48:54,160 --> 02:48:59,520 And again, if you're finding one of these values, it's called tensor aggregation because you're 1633 02:48:59,520 --> 02:49:04,640 going from what's typically a large amount of numbers to a small amount of numbers. So the min 1634 02:49:04,640 --> 02:49:12,160 of this tensor would be 27. So you're turning it from nine elements to one element, hence 1635 02:49:12,160 --> 02:49:20,560 aggregation. So let's create a tensor, create a tensor, x equals torch dot, let's use arange. 1636 02:49:20,560 --> 02:49:27,760 We'll create maybe a zero to 100 with a step of 10. Sounds good to me. And we can find the min 1637 02:49:30,640 --> 02:49:36,640 by going, can we do torch dot min? Maybe we can. Or we could also go 1638 02:49:38,880 --> 02:49:39,520 x dot min. 1639 02:49:39,520 --> 02:49:52,400 And then we can do the same, find the max torch dot max and x dot max. Now how do you think we 1640 02:49:52,400 --> 02:50:03,520 might get the average? So let's try it out. Or find the mean, find the mean torch dot mean 1641 02:50:03,520 --> 02:50:13,440 x. Oops, we don't have an x. Is this going to work? What's happened? Mean: input data type 1642 02:50:13,440 --> 02:50:19,360 should be either floating point or complex dtypes, got Long instead. Ha ha. Finally, 1643 02:50:19,360 --> 02:50:23,840 I knew the error would show its face eventually. Remember how I said it right up here that 1644 02:50:24,640 --> 02:50:29,360 we've covered a fair bit already. But right up here, some of the most common errors that 1645 02:50:29,360 --> 02:50:33,040 you're going to run into is tensor is not the right data type, not the right shape. We've seen 1646 02:50:33,040 --> 02:50:36,880 that with matrix multiplication, not the right device. We haven't seen that yet. But not the 1647 02:50:36,880 --> 02:50:42,960 right data type. This is one of those times. So it turns out that the tensor that we created, 1648 02:50:42,960 --> 02:50:46,640 x is of the data type, x dot dtype. 1649 02:50:48,720 --> 02:50:53,360 Int64, which is long. So if we go to, let's look up torch dot tensor. 1650 02:50:53,360 --> 02:51:04,000 This is where they're getting long from. We've seen long before, it's int64. Where's that, or long? 1651 02:51:04,000 --> 02:51:08,560 Yeah. So LongTensor. That's what it's saying. And it turns out that the torch mean function 1652 02:51:08,560 --> 02:51:15,200 can't work on tensors with data type long. So what can we do here? Well, we can change 1653 02:51:15,200 --> 02:51:24,480 the data type of x. So let's go torch mean x type and change it to float 32.
Or before we do that, 1654 02:51:25,360 --> 02:51:32,720 if we go to torch dot mean, is this going to tell us that it needs a D type? Oh, D type. 1655 02:51:32,720 --> 02:51:48,560 One option on the desired data type. Does it have float 32? It doesn't tell us. Ah, so this is 1656 02:51:48,560 --> 02:51:52,640 another one of those little hidden things that you're going to come across. And you only really 1657 02:51:52,640 --> 02:51:58,880 come across this by writing code is that sometimes the documentation doesn't really tell you explicitly 1658 02:51:58,880 --> 02:52:06,000 what D type the input should be, the input tensor. However, we find out that with this error message 1659 02:52:06,000 --> 02:52:12,080 that it should either be a floating point or a complex D type, not along. So we can convert it 1660 02:52:12,080 --> 02:52:19,840 to torch float 32. So all we've done is gone x type as type float 32. Let's see what happens here. 1661 02:52:19,840 --> 02:52:27,200 45 beautiful. And then the same thing, if we went, can we do x dot mean? Is that going to work as well? 1662 02:52:29,520 --> 02:52:38,400 Oh, same thing. So if we go x dot type torch dot float 32, get the mean of that. There we go. 1663 02:52:38,960 --> 02:52:44,480 So that is, I knew it would come up eventually. A beautiful example of finding the right data 1664 02:52:44,480 --> 02:52:56,560 type. Let me just put a note here. Note the torch dot mean function requires a tensor of float 32. 1665 02:52:57,440 --> 02:53:05,760 So so far, we've seen two of the major errors in PyTorch is data type and shape issues. What's 1666 02:53:05,760 --> 02:53:12,320 another one that we said? Oh, some. So find the sum. Find the sum we want x dot sum or maybe we 1667 02:53:12,320 --> 02:53:17,280 just do torch dot sum first. Keep it in line with what's going on above and x dot sum. 1668 02:53:18,960 --> 02:53:24,720 Which one of these should you use like torch dot something x or x dot sum? Personally, 1669 02:53:24,720 --> 02:53:30,080 I prefer torch dot max, but you'll also probably see me at points right this. It really depends 1670 02:53:30,080 --> 02:53:35,840 on what's going on. I would say pick whichever style you prefer. And because behind the scenes, 1671 02:53:35,840 --> 02:53:40,960 they're calling the same methodology. Picture whichever style you prefer and stick with that 1672 02:53:40,960 --> 02:53:47,040 throughout your code. For now, let's leave it at that tensor aggregation. There's some 1673 02:53:47,600 --> 02:53:53,040 finding min max mean sum. In the next video, we're going to look at finding the positional 1674 02:53:53,040 --> 02:54:00,000 min and max, which is also known as arg max and arg min or vice versa. So actually, that's a 1675 02:54:00,000 --> 02:54:05,280 little bit of a challenge for the next video is see how you can find out what the positional 1676 02:54:05,280 --> 02:54:12,240 min and max is of this. And what I mean by that is which index does the max value occur at and 1677 02:54:12,240 --> 02:54:17,600 which index of this tensor does the min occur at? You'll probably want to look into the methods 1678 02:54:17,600 --> 02:54:23,520 arg min torch dot arg min for that one and torch dot arg max for that. But we'll cover that in the 1679 02:54:23,520 --> 02:54:32,320 next video. I'll see you there. Welcome back. In the last video, we learned all about tensor 1680 02:54:32,320 --> 02:54:37,280 aggregation. And we found the min the max the mean and the sum. 
And we also ran into one of the most 1681 02:54:37,280 --> 02:54:44,160 common issues in pie torch and deep learning and neural networks in general. And that was wrong 1682 02:54:44,160 --> 02:54:50,240 data types. And so we solved that issue by converting because some functions such as torch dot mean 1683 02:54:50,240 --> 02:54:56,800 require a specific type of data type as input. And we created our tensor here, which was of by 1684 02:54:56,800 --> 02:55:03,360 default torch in 64. However, torch dot mean requires torch dot float 32. We saw that in an error. 1685 02:55:03,360 --> 02:55:08,480 We fix that by changing the type of the inputs. I also issued you the challenge of finding 1686 02:55:10,560 --> 02:55:17,760 finding the positional min and max. And you might have found that you can use the 1687 02:55:17,760 --> 02:55:30,000 arg min for the minimum. Let's remind ourselves of what x is x. So this means at tensor index of 1688 02:55:30,000 --> 02:55:36,800 tensor x. If we find the argument, that is the minimum value, which is zero. So at index zero, 1689 02:55:36,800 --> 02:55:44,960 we get the value zero. So that's at zero there. Zero there. This is an index value. So this is 1690 02:55:44,960 --> 02:55:56,240 what arg min stands for find the position in tensor that has the minimum value with arg min. 1691 02:55:57,840 --> 02:56:05,520 And then returns index position of target tensor 1692 02:56:05,520 --> 02:56:15,520 where the minimum value occurs. Now, let's just change x to start from one, 1693 02:56:18,240 --> 02:56:27,680 just so there we go. So the arg min is still position zero, position zero. So this is an index 1694 02:56:27,680 --> 02:56:34,240 value. And then if we index on x at the zeroth index, we get one. So the minimum value in 1695 02:56:34,240 --> 02:56:43,920 x is one. And then the maximum, you might guess, is find the position in tensor that has the maximum 1696 02:56:43,920 --> 02:56:51,200 value with arg max. And it's going to be the same thing, except it'll be the maximum, which is, 1697 02:56:51,200 --> 02:57:00,960 which position index nine. So if we go zero, one, two, three, four, five, six, seven, eight, 1698 02:57:00,960 --> 02:57:09,920 nine. And then if we index on x for the ninth element, we get 91 beautiful. Now these two are 1699 02:57:09,920 --> 02:57:17,760 useful for if yes, you want to define the minimum of a tensor, you can just use min. But if you 1700 02:57:17,760 --> 02:57:22,720 sometimes you don't want the actual minimum value, you just want to know where it appears, 1701 02:57:22,720 --> 02:57:28,080 particularly with the arg max value. This is helpful for when we use the soft max activation 1702 02:57:28,080 --> 02:57:32,640 function later on. Now we haven't covered that yet. So I'm not going to allude too much to it. 1703 02:57:32,640 --> 02:57:38,560 But just remember to find the positional min and max, you can use arg min and arg max. 1704 02:57:39,280 --> 02:57:44,240 So that's all we need to cover with that. Let's keep going in the next video. I'll see you then. 1705 02:57:47,680 --> 02:57:53,280 Welcome back. So we've covered a fair bit of ground. And just to let you know, I took a little break 1706 02:57:53,280 --> 02:57:57,840 after going through all of these. 
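(While we're paused, here is the aggregation and positional min/max material from the last couple of videos condensed into one small sketch, using the version of x that starts at one:)

import torch

x = torch.arange(1, 100, 10)  # tensor([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])

print(x.min(), x.max())              # tensor(1) tensor(91)
print(x.type(torch.float32).mean())  # tensor(46.) because torch.mean needs a float dtype
print(x.sum())                       # tensor(460)

print(x.argmin(), x.argmax())        # tensor(0) tensor(9), the index positions
print(x[x.argmin()], x[x.argmax()])  # tensor(1) tensor(91), the values at those positions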
And I'd just like to show you how I get back to where I'm at, 1707 02:57:57,840 --> 02:58:04,080 because if we tried to just write x here and press shift and enter, because our collab 1708 02:58:04,080 --> 02:58:08,240 was disconnected, it's now connecting because as soon as you press any button in collab, it's 1709 02:58:08,240 --> 02:58:15,440 going to reconnect. It's going to try to connect, initialize, and then x is probably not going to 1710 02:58:15,440 --> 02:58:21,520 be stored in memory anymore. So there we go. Name x is not defined. That's because the collab 1711 02:58:21,520 --> 02:58:26,880 state gets reset if you take a break for a couple of hours. This is to ensure Google can keep 1712 02:58:26,880 --> 02:58:31,760 providing resources for free. And it deletes everything to ensure that there's no compute 1713 02:58:31,760 --> 02:58:38,560 resources that are being wasted. So to get back to here, I'm just going to go restart and run all. 1714 02:58:38,560 --> 02:58:44,160 You don't necessarily have to restart the notebook. You could also go, do we have run all? Yeah, 1715 02:58:44,160 --> 02:58:48,480 we could do run before. That'll run every cell before this. We could run after we could run the 1716 02:58:48,480 --> 02:58:53,360 selection, which is this cell here. I'm going to click run all, which is just going to go through 1717 02:58:53,360 --> 02:59:01,360 every single cell that we've coded above and run them all. However, it will also stop at the errors 1718 02:59:01,360 --> 02:59:06,720 where I've left in on purpose. So remember when we ran into a shape error? Well, because this error, 1719 02:59:06,720 --> 02:59:11,280 we didn't fix it. I left it there on purpose so that we could keep seeing a shape error. It's 1720 02:59:11,280 --> 02:59:17,680 going to stop at this cell. So we're going to have to run every cell after the error cell. 1721 02:59:17,680 --> 02:59:22,560 So see how it's going to run these now. They run fine. And then we get right back to where we were, 1722 02:59:22,560 --> 02:59:30,240 which was X. So that's just a little tidbit of how I get back into coding. Let's now cover reshaping, 1723 02:59:32,320 --> 02:59:39,280 stacking, squeezing, and unsqueezing. You might be thinking, squeezing and unsqueezing. What are 1724 02:59:39,280 --> 02:59:45,440 you talking about, Daniel? Well, it's all to do with tenses. And you're like, are we going to 1725 02:59:45,440 --> 02:59:50,000 squeeze our tenses? Give them a hug. Are we going to let them go by unsqueezing them? 1726 02:59:50,000 --> 02:59:56,960 Well, let's quickly define what these are. So reshaping is we saw before one of the most common 1727 02:59:56,960 --> 03:00:01,600 errors in machine learning and deep learning is shape mismatches with matrices because they 1728 03:00:01,600 --> 03:00:10,240 have to satisfy certain rules. So reshape reshapes an input tensor to a defined shape. 1729 03:00:10,880 --> 03:00:15,360 Now, we're just defining these things in words right now, but we're going to see it in code in 1730 03:00:15,360 --> 03:00:26,000 just a minute. There's also view, which is return a view of an input tensor of certain shape, 1731 03:00:26,560 --> 03:00:34,800 but keep the same memory as the original tensor. So we'll see what view is in a second. 1732 03:00:34,800 --> 03:00:40,000 Reshaping and view are quite similar, but a view always shares the same memory as the original 1733 03:00:40,000 --> 03:00:45,760 tensor. 
It just shows you the same tensor, but from a different perspective, a different shape. 1734 03:00:46,320 --> 03:00:55,680 And then we have stacking, which is combine multiple tensors on top of each other. This is a V stack 1735 03:00:55,680 --> 03:01:05,520 for vertical stack or side by side. H stack. Let's see what different types of torch stacks there are. 1736 03:01:05,520 --> 03:01:09,680 Again, this is how I research different things. If I wanted to learn something new, I would search 1737 03:01:09,680 --> 03:01:16,400 torch something stack concatenate a sequence of tensors along a new dimension. Okay. So maybe we 1738 03:01:16,400 --> 03:01:21,360 not H stack or V stack, we can just define what dimension we'd like to combine them on. 1739 03:01:21,360 --> 03:01:28,240 I wonder if there is a torch V stack. Torch V stack. Oh, there it is. And is there a torch H stack for 1740 03:01:28,240 --> 03:01:34,640 horizontal stack? There is a H stack. Beautiful. So we'll focus on just the plain stack. If you 1741 03:01:34,640 --> 03:01:39,040 want to have a look at V stack, it'll be quite similar to what we're going to do with stack 1742 03:01:39,040 --> 03:01:42,800 and same with H stack. Again, this is just words for now. We're going to see the code in a minute. 1743 03:01:43,680 --> 03:01:52,080 So there's also squeeze, which removes all one dimensions. I'm going to put one in code, 1744 03:01:52,960 --> 03:01:58,080 dimensions from a tensor. We'll see what that looks like. And then there's unsqueeze, 1745 03:01:58,080 --> 03:02:11,920 which adds a one dimension to our target tensor. And then finally, there's permute, which is return 1746 03:02:11,920 --> 03:02:25,360 a view of the input with dimensions permuted. So swapped in a certain way. So a fair few methods 1747 03:02:25,360 --> 03:02:32,640 here. But essentially the crust of all of these, the main point of all of these is to manipulate 1748 03:02:32,640 --> 03:02:39,920 our tensors in some way to change their shape or change their dimension. Because again, one of the 1749 03:02:39,920 --> 03:02:45,200 number one issues in machine learning and deep learning is tensor shape issues. So let's start 1750 03:02:45,200 --> 03:02:51,440 off by creating a tensor and have a look at each of these. Let's create a tensor. And then we're 1751 03:02:51,440 --> 03:02:56,480 going to just import torch. We don't have to, but this will just enable us to run the notebook 1752 03:02:56,480 --> 03:03:02,320 directly from this cell if we wanted to, instead of having to run everything above here. So let's 1753 03:03:02,320 --> 03:03:09,440 create another X torch dot a range because range is deprecated. I'm just going to add a few code 1754 03:03:09,440 --> 03:03:15,440 cells here so that I can scroll and that's in the middle of the screen there. Beautiful. So let's 1755 03:03:15,440 --> 03:03:22,560 just make it between one and 10 nice and simple. And then let's have a look at X and X dot shape. 1756 03:03:22,560 --> 03:03:30,160 What does this give us? Okay, beautiful. So we've got the numbers from one to nine. Our tensor is 1757 03:03:30,160 --> 03:03:40,960 of shape torch size nine. Let's start with reshape. So how about we add an extra dimension. So then 1758 03:03:40,960 --> 03:03:48,800 we have X reshaped equals X dot reshape. Now a key thing to keep in mind about the reshape 1759 03:03:48,800 --> 03:03:54,080 is that the dimensions have to be compatible with the original dimensions. 
So we're going to 1760 03:03:54,080 --> 03:03:59,680 change the shape of our original tensor with a reshape. And we try to change it into the shape 1761 03:03:59,680 --> 03:04:06,640 one seven. Does that work with the number nine? Well, let's find out, hey, let's check out X reshaped. 1762 03:04:06,640 --> 03:04:16,480 And then we'll look at X reshaped dot shape. What's this going to do? Oh, why do we get an error there? 1763 03:04:16,480 --> 03:04:21,280 Well, it's telling us here, this is what pie torch is actually really good at is giving us 1764 03:04:21,280 --> 03:04:26,720 errors for what's going wrong. We have one seven is invalid for input size of nine. 1765 03:04:26,720 --> 03:04:32,720 Well, why is that? Well, we're trying to squeeze nine elements into a tensor of one 1766 03:04:32,720 --> 03:04:40,000 times seven into seven elements. But if we change this to nine, what do we get? Ah, so do you notice 1767 03:04:40,000 --> 03:04:45,280 what just happened here? We just added a single dimension. See the single square bracket with 1768 03:04:45,280 --> 03:04:51,360 the extra shape here. What if we wanted to add two? Can we do that? No, we can't. Why is that? 1769 03:04:51,920 --> 03:04:57,680 Well, because two nine is invalid for input size nine, because two times nine is what? 1770 03:04:57,680 --> 03:05:02,880 18. So we're trying to double the amount of elements without having double the amount of elements. 1771 03:05:02,880 --> 03:05:07,920 So if we change this back to one, what happens if we change these around nine one? What does this 1772 03:05:07,920 --> 03:05:14,480 do? Oh, a little bit different there. So now instead of adding one on the first dimension or 1773 03:05:14,480 --> 03:05:21,920 the zeroth dimension, because Python is zero indexed, we added it on the first dimension, 1774 03:05:21,920 --> 03:05:27,200 which is giving us a square bracket here if we go back. So we add it to the outside here, 1775 03:05:27,200 --> 03:05:30,960 because we've put the one there. And then if we wanted to add it on the inside, 1776 03:05:32,080 --> 03:05:37,920 we put the one on the outside there. So then we've got the torch size nine one. Now, let's try 1777 03:05:37,920 --> 03:05:45,120 change the view, change the view. So just to reiterate, the reshape has to be compatible 1778 03:05:45,120 --> 03:05:50,880 with the original size. So how about we change this to one to 10? So we have a size of 10, 1779 03:05:50,880 --> 03:05:57,440 and then we can go five, two, what happens there? Oh, it's compatible because five times two equals 1780 03:05:57,440 --> 03:06:05,680 10. And then what's another way we could do this? How about we make it up to 12? So we've got 12 1781 03:06:05,680 --> 03:06:12,960 elements, and then we can go three, four, a code cells taking a little while run here. 1782 03:06:12,960 --> 03:06:20,320 Then we'll go back to nine, just so we've got the original there. 1783 03:06:22,400 --> 03:06:31,040 Whoops, they're going to be incompatible. Oh, so this is another thing. This is good. 1784 03:06:31,040 --> 03:06:35,280 We're getting some errors on the fly here. Sometimes you'll get saved failed with Google 1785 03:06:35,280 --> 03:06:40,960 CoLab, and automatic saving failed. What you can do to fix this is just either keep coding, 1786 03:06:40,960 --> 03:06:46,960 keep running some cells, and CoLab will fix itself in the background, or restart the notebook, 1787 03:06:46,960 --> 03:06:52,400 close it, and open again. 
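(As a compact sketch of that compatibility rule, where the product of the new shape has to equal the number of elements in the original:)

import torch

x = torch.arange(1., 10.)        # 9 elements
print(x.reshape(1, 9).shape)     # torch.Size([1, 9])
print(x.reshape(9, 1).shape)     # torch.Size([9, 1])
# x.reshape(1, 7)                # error: 1 * 7 != 9
# x.reshape(2, 9)                # error: 2 * 9 != 9

print(torch.arange(1., 11.).reshape(5, 2).shape)  # 10 elements -> torch.Size([5, 2])
print(torch.arange(1., 13.).reshape(3, 4).shape)  # 12 elements -> torch.Size([3, 4])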
So we've got size nine, or size eight, sorry, incompatible. 1788 03:06:54,000 --> 03:06:59,920 But this is good. You're seeing the errors that come up on the fly, rather than me sort of just 1789 03:06:59,920 --> 03:07:03,440 telling you what the errors are, you're seeing them as they come up for me. I'm trying to live 1790 03:07:03,440 --> 03:07:09,120 code this, and this is what's going to happen when you start to use Google CoLab, and subsequently 1791 03:07:09,120 --> 03:07:16,400 other forms of Jupyter Notebooks. But now let's get into the view, so we can go z equals, 1792 03:07:16,400 --> 03:07:25,840 let's change the view of x. View will change it to one nine, and then we'll go z, and then z dot shape. 1793 03:07:29,680 --> 03:07:36,640 Ah, we get the same thing here. So view is quite similar to reshape. Remember, though, that a 1794 03:07:36,640 --> 03:07:44,960 view shares the memory with the original tensor. So z is just a different view of x. So z shares 1795 03:07:44,960 --> 03:07:54,640 the same memory as what x does. So let's exemplify this. So changing z changes x, because a view of 1796 03:07:54,640 --> 03:08:04,640 a tensor shares the same memory as the original input. So let's just change z, change the first 1797 03:08:04,640 --> 03:08:11,360 element by using indexing here. So we're targeting one, we'll set this to equal five, and then we'll 1798 03:08:11,360 --> 03:08:19,440 see what z and x equal. Yeah, so see, we've got z, the first one here, we change the first element, 1799 03:08:19,440 --> 03:08:25,360 the zero element to five. And the same thing happens with x, we change the first element of z. 1800 03:08:25,360 --> 03:08:32,160 So because z is a view of x, the first element of x changes as well. But let's keep going. How 1801 03:08:32,160 --> 03:08:37,040 about we stack some tenses on top of each other? And we'll see what the stack function does in 1802 03:08:37,040 --> 03:08:48,080 torch. So stack tenses on top of each other. And I'll just see if I press command S to save, 1803 03:08:48,080 --> 03:08:55,200 maybe we'll get this fixed. Or maybe it just will fix itself. Oh, notebook is saved. 1804 03:08:56,960 --> 03:09:01,280 Unless you've made some extensive changes that you're worried about losing, you could just 1805 03:09:01,280 --> 03:09:07,200 download this notebook, so file download, and upload it to collab. But usually if you click yes, 1806 03:09:08,880 --> 03:09:13,200 it sort of resolves itself. Yeah, there we go. All changes saved. So that's beautiful 1807 03:09:13,200 --> 03:09:18,720 troubleshooting on the fly. I like that. So x stack, let's stack some tenses together, 1808 03:09:18,720 --> 03:09:25,200 equals torch stack. Let's go x x x, because if we look at what the doc string of stack is, 1809 03:09:25,200 --> 03:09:32,160 will we get this in collab? Or we just go to the documentations? Yeah. So list, it takes a list of 1810 03:09:32,160 --> 03:09:37,200 tenses and concatenates a sequence of tenses along a new dimension. And we define the dimension, 1811 03:09:37,200 --> 03:09:42,480 the dimension by default is zero. That's a little bit hard to read for me. So tenses, 1812 03:09:42,480 --> 03:09:46,560 dim equals zero. If we come into here, the default dimension is zero. Let's see what happens when 1813 03:09:46,560 --> 03:09:52,640 we play around with the dimension here. So we've got four x's. And the first one, we'll just do it 1814 03:09:52,640 --> 03:10:01,120 by default, x stack. Okay, wonderful. So they're stacked vertically. 
Let's see what happens if we 1815 03:10:01,120 --> 03:10:07,600 change this to one. Oh, they rearranged a little and stack like that. What happens if we change it 1816 03:10:07,600 --> 03:10:13,120 to two? Does it have a dimension to? Oh, we can't do that. Well, that's because the original shape 1817 03:10:13,120 --> 03:10:19,680 of x is incompatible with using dimension two. So the only real way to get used to what happens 1818 03:10:19,680 --> 03:10:23,520 here by stacking them on top of each other is to play around with the different values for the 1819 03:10:23,520 --> 03:10:30,000 dimension. So dim zero, dim one, they look a little bit different there. Now they're on top of each 1820 03:10:30,000 --> 03:10:37,840 other. And so the first zero index is now the zeroth tensor. And then same with two being there, 1821 03:10:37,840 --> 03:10:44,240 three and so on. But we'll leave it at the default. And there's also v stack and h stack. I'll leave 1822 03:10:44,240 --> 03:10:52,000 that to you to to practice those. But I think from memory v stack is using dimension equals zero. 1823 03:10:52,000 --> 03:10:57,680 Or h stack is like using dimension equals one. I may have those back the front. You can correct me 1824 03:10:57,680 --> 03:11:04,640 if I'm wrong there. Now let's move on. We're going to now have a look at squeeze and unsqueeze. 1825 03:11:05,600 --> 03:11:10,880 So actually, I'm going to get you to practice this. So see if you can look up torch squeeze 1826 03:11:10,880 --> 03:11:16,720 and torch unsqueeze. And see if you can try them out. We've created a tensor here. We've used 1827 03:11:16,720 --> 03:11:22,720 reshape and view and we've used stack. The usage of squeeze and unsqueeze is quite similar. So give 1828 03:11:22,720 --> 03:11:27,040 that a go. And to prevent this video from getting too long, we'll do them together in the next video. 1829 03:11:29,760 --> 03:11:36,080 Welcome back. In the last video, I issued the challenge of trying out torch dot squeeze, 1830 03:11:36,080 --> 03:11:45,680 which removes all single dimensions from a target tensor. And how would you try that out? Well, 1831 03:11:45,680 --> 03:11:51,760 here's what I would have done. I'd go to torch dot squeeze and see what happens. Open up the 1832 03:11:51,760 --> 03:11:58,480 documentation. Squeeze input dimension returns a tensor with all the dimensions of input size 1833 03:11:58,480 --> 03:12:04,720 one removed. And does it have some demonstrations? Yes, it does. Wow. Okay. So you could copy this in 1834 03:12:04,720 --> 03:12:11,840 straight into a notebook, copy it here. But what I'd actually encourage you to do quite often is 1835 03:12:11,840 --> 03:12:17,600 if you're looking up a new torch method you haven't used, code all of the example by hand. And then 1836 03:12:17,600 --> 03:12:22,800 just practice what the inputs and outputs look like. So x is the input here. Check the size of x, 1837 03:12:23,360 --> 03:12:30,080 squeeze x, well, set the squeeze of x to y, check the size of y. So let's replicate something 1838 03:12:30,080 --> 03:12:38,000 similar to this. We'll go into here, we'll look at x reshaped and we'll remind ourselves of x reshaped 1839 03:12:38,000 --> 03:12:49,600 dot shape. And then how about we see what x reshaped dot squeeze looks like. Okay. What happened here? 1840 03:12:50,400 --> 03:12:55,360 Well, we started with two square brackets. And we started with a shape of one nine 1841 03:12:55,360 --> 03:13:02,560 and removes all single dimensions from a target tensor. 
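The video leaves torch.vstack and torch.hstack as an exercise and the speaker isn't sure which dimension each corresponds to; a quick check like the sketch below settles it for 1-D inputs: vstack behaves like stacking rows (the same result as torch.stack with dim=0 here), while hstack joins the values end to end:

import torch

x = torch.arange(1., 10.)                       # shape: torch.Size([9])

print(torch.stack([x, x, x, x], dim=0).shape)   # torch.Size([4, 9])
print(torch.stack([x, x, x, x], dim=1).shape)   # torch.Size([9, 4])
# torch.stack([x, x, x, x], dim=2)              # errors: a stack of 1-D tensors only has dims 0 and 1

print(torch.vstack([x, x]).shape)               # torch.Size([2, 9]) - rows stacked on top of each other
print(torch.hstack([x, x]).shape)               # torch.Size([18])   - 1-D tensors joined end to end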
And now if we call the squeeze method on 1842 03:13:02,560 --> 03:13:09,120 x reshaped, we only have one square bracket here. So what do you think the shape of x reshaped dot 1843 03:13:09,120 --> 03:13:17,120 squeeze is going to be? We'll check the shape here. It's just nine. So that's the squeeze method, 1844 03:13:17,120 --> 03:13:24,640 removes all single dimensions. If we had one one nine, it would remove all of the ones. So it would 1845 03:13:24,640 --> 03:13:31,360 just end up being nine as well. Now, let's write some print statements so we can have a little 1846 03:13:31,360 --> 03:13:39,440 pretty output. So previous tensor, this is what I like to do. This is a form of visualize, visualize, 1847 03:13:39,440 --> 03:13:46,080 visualize. If I'm trying to get my head around something, I print out each successive change 1848 03:13:46,080 --> 03:13:51,280 to see what's happening. That way, I can go, Oh, okay. So that's what it was there. And then I 1849 03:13:51,280 --> 03:13:57,360 called that line of code there. Yes, it's a bit tedious. But you do this half a dozen times, a 1850 03:13:57,360 --> 03:14:02,480 fair few times. I mean, I still do it a lot of the time, even though I've written thousands of lines 1851 03:14:02,480 --> 03:14:07,680 of machine learning code. But it starts to become instinct after a while, you start to go, Oh, okay, 1852 03:14:07,680 --> 03:14:13,920 I've got a dimension mismatch on my tensors. So I need to squeeze them before I put them into a 1853 03:14:13,920 --> 03:14:23,040 certain function. For a little while, but with practice, just like riding a bike, right? But that 1854 03:14:23,040 --> 03:14:27,600 try saying is like when you first start, you're all wobbly all over the place having to look up 1855 03:14:27,600 --> 03:14:32,720 the documentation, not that there's much documentation for riding a bike, you just kind of keep trying. 1856 03:14:32,720 --> 03:14:38,480 But that's the style of coding. I'd like you to adopt is to just try it first. Then if you're stuck, 1857 03:14:38,480 --> 03:14:42,640 go to the documentation, look something up, print it out like this, what we're doing, 1858 03:14:42,640 --> 03:14:47,440 quite cumbersome. But this is going to give us a good explanation for what's happening. Here's our 1859 03:14:47,440 --> 03:14:53,120 previous tensor x reshaped. And then if we look at the shape of x reshaped, it's one nine. And then 1860 03:14:53,120 --> 03:14:57,600 if we call the squeeze method, which removes all single dimensions from a target tensor, 1861 03:14:57,600 --> 03:15:04,160 we have the new tensor, which is has one square bracket removed. And the new shape is all single 1862 03:15:04,160 --> 03:15:09,840 dimensions removed. So it's still the original values, but just a different dimension. Now, 1863 03:15:09,840 --> 03:15:14,800 let's do the same as what we've done here with unsqueeze. So we've given our tensors a hug and 1864 03:15:14,800 --> 03:15:18,480 squeezed out all the single dimensions of them. Now we're going to unsqueeze them. We're going to 1865 03:15:18,480 --> 03:15:25,040 take a step back and let them grow a bit. So torch unsqueeze adds a single dimension 1866 03:15:26,480 --> 03:15:34,720 to a target tensor at a specific dim dimension. Now that's another thing to note in PyTorch whenever 1867 03:15:34,720 --> 03:15:40,000 it says dim, that's dimension as in this is a zeroth dimension, first dimension. 
And if there 1868 03:15:40,000 --> 03:15:45,680 was more here, we'd go two, three, four, five, six, et cetera. Because why tensors can have 1869 03:15:45,680 --> 03:15:56,320 unlimited dimensions. So let's go previous target can be excused. So we'll get this squeezed version 1870 03:15:56,320 --> 03:16:02,720 of our tensor, which is x squeezed up here. And then we'll go print. The previous shape 1871 03:16:02,720 --> 03:16:14,400 is going to be x squeezed dot shape. And then we're going to add an extra dimension with unsqueeze. 1872 03:16:17,360 --> 03:16:24,400 There we go, x unsqueezed equals x squeezed. So our tensor before that we remove the single 1873 03:16:24,400 --> 03:16:32,160 dimension. And we're going to put in unsqueeze, dim, we'll do it on the zeroth dimension. And I 1874 03:16:32,160 --> 03:16:35,840 want you to have a think about what this is going to output even before we run the code. 1875 03:16:35,840 --> 03:16:39,680 Just think about, because we've added an extra dimension on the zeroth dimension, 1876 03:16:39,680 --> 03:16:45,440 what's the new shape of the unsqueeze tensor going to be? So we're going to go x unsqueezed. 1877 03:16:47,120 --> 03:16:56,320 And then we're going to go print, we'll get our new tensor shape, which is going to be x unsqueezed 1878 03:16:56,320 --> 03:17:04,080 dot shape. All right, let's have a look. There we go. So there's our previous tensor, 1879 03:17:04,080 --> 03:17:10,560 which is the squeezed version, just as a single dimension here. And then we have our new tensor, 1880 03:17:10,560 --> 03:17:16,240 which with the unsqueeze method on dimension zero, we've added a square bracket on the zeroth 1881 03:17:16,240 --> 03:17:20,160 dimension, which is this one here. Now what do you think's going to happen if I change this to one? 1882 03:17:20,160 --> 03:17:28,880 Where's the single dimension going to be added? Let's have a look. Ah, so instead of adding the 1883 03:17:28,880 --> 03:17:34,480 single dimension on the zeroth dimension, we've added it on the first dimension here. It's quite 1884 03:17:34,480 --> 03:17:40,640 confusing because Python is zero index. So I kind of want to my brain's telling me to say first, 1885 03:17:40,640 --> 03:17:45,920 but it's really the zeroth index here or the zeroth dimension. Now let's change this back to 1886 03:17:45,920 --> 03:17:52,320 zero. But that's just another way of exploring things. Every time there's like a parameter that 1887 03:17:52,320 --> 03:17:58,320 we have here, dim equals something like that could be shape, could be size, whatever, try 1888 03:17:58,320 --> 03:18:02,640 changing the values. That's what I'd encourage you to do. And even write some print code like 1889 03:18:02,640 --> 03:18:09,600 we've done here. Now there's one more we want to try out. And that's permute. So torch dot permute 1890 03:18:09,600 --> 03:18:23,840 rearranges the dimensions of a target tensor in a specified order. So if we wanted to check out, 1891 03:18:23,840 --> 03:18:30,080 let's get rid of some of these extra tabs. Torch dot permute. Let's have a look. This one took me 1892 03:18:30,080 --> 03:18:36,080 a little bit of practice to get used to. Because again, working with zeroth dimensions, even though 1893 03:18:36,080 --> 03:18:41,760 it seems like the first one. So returns a view. Okay. So we know that a view shares the memory of 1894 03:18:41,760 --> 03:18:47,280 the original input tensor with its dimensions permuted. 
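The squeeze and unsqueeze walkthrough as one runnable sketch, using the same "previous shape / new shape" print pattern described above:

import torch

x_reshaped = torch.arange(1., 10.).reshape(1, 9)

x_squeezed = x_reshaped.squeeze()               # removes all dimensions of size 1
print(f"Previous shape: {x_reshaped.shape}")    # torch.Size([1, 9])
print(f"New shape: {x_squeezed.shape}")         # torch.Size([9])

x_unsqueezed = x_squeezed.unsqueeze(dim=0)      # add a size-1 dimension at position 0
print(x_unsqueezed.shape)                       # torch.Size([1, 9])
print(x_squeezed.unsqueeze(dim=1).shape)        # torch.Size([9, 1]) - added on the first dimension instead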
So permuted for me, I didn't really know 1895 03:18:47,280 --> 03:18:53,040 what that word meant. I just have mapped in my own memory that permute means rearrange dimensions. 1896 03:18:53,600 --> 03:18:58,480 So the example here is we start with a random tensor, we check the size, and then we'd have 1897 03:18:58,480 --> 03:19:04,320 torch permute. We're going to swap the order of the dimensions. So the second dimension is first, 1898 03:19:04,320 --> 03:19:10,480 the zeroth dimension is in the middle, and the first dimension is here. So these are dimension 1899 03:19:10,480 --> 03:19:17,600 values. So if we have torch random two, three, five, two, zero, one has changed this one to be 1900 03:19:17,600 --> 03:19:24,480 over here. And then zero, one is two, three, and now two, three there. So let's try something similar 1901 03:19:24,480 --> 03:19:30,800 to this. So one of the common places you'll be using permute, or you might see permute being 1902 03:19:30,800 --> 03:19:37,040 used is with images. So there's a data specific data format. We've kind of seen a little bit 1903 03:19:37,040 --> 03:19:44,320 before, not too much. Original equals torch dot rand size equals. So an image tensor, 1904 03:19:44,880 --> 03:19:50,800 we go height width color channels on the end. So I'll just write this down. So this is height 1905 03:19:50,800 --> 03:19:57,040 width color channels. Remember, much of, and I'm going to spell color Australian style, 1906 03:19:57,040 --> 03:20:04,080 much of deep learning is turning your data into numerical representations. And this is quite common 1907 03:20:04,080 --> 03:20:10,000 numerical representation of image data. You have a tensor dimension for the height, a tensor dimension 1908 03:20:10,000 --> 03:20:14,240 for the width, and a tensor dimension for the color channels, which is red, green, and blue, 1909 03:20:14,240 --> 03:20:20,080 because a certain number of red, green, and blue creates almost any color. Now, if we want to 1910 03:20:20,080 --> 03:20:31,840 permute this, so permute the original tensor to rearrange the axis or dimension, axis or dimension, 1911 03:20:31,840 --> 03:20:40,080 are kind of used in the same light for tensors or dim order. So let's switch the color channels 1912 03:20:40,080 --> 03:20:45,680 to be the first or the zeroth dimension. So instead of height width color channels, 1913 03:20:45,680 --> 03:20:51,200 it'll be color channels height width. How would we do that with permute? Let's give it a shot. 1914 03:20:51,840 --> 03:21:01,200 X permuted equals X original dot permute. And we're going to take the second dimension, 1915 03:21:01,200 --> 03:21:06,640 because this takes a series of dims here. So the second dimension is color channels. Remember, 1916 03:21:06,640 --> 03:21:13,200 zero, one, two. So two, we want two first, then we want the height, which is a zero. And then we 1917 03:21:13,200 --> 03:21:24,160 want the width, which is one. And now let's do this shifts, axis, zero to one, one to two, 1918 03:21:24,800 --> 03:21:35,360 and two to zero. So this is the order as well. This two maps to zero. This zero maps to the first 1919 03:21:35,360 --> 03:21:41,360 index. This one maps to this index. But that's enough talk about it. Let's see what it looks like. 1920 03:21:41,360 --> 03:21:51,840 So print, previous shape, X original dot shape. And then we go here, print new shape. This will 1921 03:21:51,840 --> 03:22:01,120 be the permuted version. We want X permuted dot shape. Let's see what this looks like. Wonderful. 
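The image-tensor permute as a sketch; the 224 by 224 by 3 size is an assumed example, the transcript only specifies the height, width, colour-channels ordering:

import torch

x_original = torch.rand(size=(224, 224, 3))     # [height, width, colour_channels]

# rearrange the dimension order so colour channels come first: [colour_channels, height, width]
x_permuted = x_original.permute(2, 0, 1)

print(f"Previous shape: {x_original.shape}")    # torch.Size([224, 224, 3])
print(f"New shape: {x_permuted.shape}")         # torch.Size([3, 224, 224])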
1922 03:22:01,120 --> 03:22:06,000 That's exactly what we wanted. So you see, let's just write a little note here. Now this is 1923 03:22:06,000 --> 03:22:14,960 color channels, height, width. So the same data is going to be in both of these tenses. So X 1924 03:22:14,960 --> 03:22:20,320 original X permuted, it's just viewed from a different point of view. Because remember, a 1925 03:22:20,320 --> 03:22:26,480 permute is a view. And what did we discuss? A view shares the same memory as the original tensor. 1926 03:22:26,480 --> 03:22:32,880 So X permuted will share the same place in memory as X original, even though it's from a different 1927 03:22:32,880 --> 03:22:37,920 shape. So a little challenge before we move on to the next video for you, or before you move 1928 03:22:37,920 --> 03:22:46,080 on to the next video, try change one of the values in X original. Have a look at X original. 1929 03:22:46,080 --> 03:22:54,560 And see if that same value, it could be, let's get one of this zero, zero, get all of the dimensions 1930 03:22:54,560 --> 03:23:07,360 here, zero. See what that is? Or can we get a single value maybe? Oops. Oh, no, we'll need a zero 1931 03:23:07,360 --> 03:23:14,560 here, getting some practice on indexing here. Oh, zero, zero, zero. There we go. Okay, so maybe 1932 03:23:14,560 --> 03:23:22,000 we set that to some value, whatever you choose, and see if that changes in X permuted. So give 1933 03:23:22,000 --> 03:23:29,840 that a shot, and I'll see you in the next video. Welcome back. In the last video, we covered 1934 03:23:29,840 --> 03:23:36,480 squeezing, unsqueezing, and permuting, which I'm not going to lie, these concepts are quite a 1935 03:23:36,480 --> 03:23:41,600 lot to take in, but just so you're aware of them. Remember, what are they working towards? They're 1936 03:23:41,600 --> 03:23:46,640 helping us fix shape and dimension issues with our tensors, which is one of the most common 1937 03:23:46,640 --> 03:23:51,680 issues in deep learning and neural networks. And I usually do the little challenge of changing a 1938 03:23:51,680 --> 03:23:58,480 value of X original to highlight the fact that permute returns a different view of the original 1939 03:23:58,480 --> 03:24:04,960 tensor. And a view in PyTorch shares memory with that original tensor. So if we change the value 1940 03:24:04,960 --> 03:24:12,880 at zero, zero, zero of X original to, in my case, 728218, it happens the same value gets copied across 1941 03:24:12,880 --> 03:24:20,240 to X permuted. So with that being said, we looked at selecting data from tensors here, and this is 1942 03:24:20,240 --> 03:24:25,520 using a technique called indexing. So let's just rehash that, because this is another thing that 1943 03:24:25,520 --> 03:24:30,560 can be a little bit of a hurdle when first working with multi dimensional tensors. So let's see how 1944 03:24:30,560 --> 03:24:38,480 we can select data from tensors with indexing. So if you've ever done indexing, indexing, 1945 03:24:39,840 --> 03:24:46,400 with PyTorch is similar to indexing with NumPy. If you've ever worked with NumPy, 1946 03:24:46,400 --> 03:24:51,760 and you've done indexing, selecting data from arrays, NumPy uses an array as its main data type, 1947 03:24:51,760 --> 03:24:57,680 PyTorch uses tensors. It's very similar. So let's again start by creating a tensor. 1948 03:24:58,560 --> 03:25:04,960 And again, I'm just going to add a few code cells here, so I can make my screen right in the middle. 
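One possible way to attempt the challenge, as a sketch reusing the assumed image tensor from the previous block (728218 is the value mentioned in the recap; any number works):

import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)        # a view: shares memory with x_original

x_original[0, 0, 0] = 728218.                   # change one value in the original...
print(x_original[0, 0, 0])                      # tensor(728218.)
print(x_permuted[0, 0, 0])                      # ...and the permuted view shows the same value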
1949 03:25:04,960 --> 03:25:10,720 Now we're going to import torch. Again, we don't need to import torch all the time, 1950 03:25:10,720 --> 03:25:18,720 just so you can run the notebook from here later on. X equals torch dot. Let's create a range again, 1951 03:25:18,720 --> 03:25:24,320 just nice and simple. This is how I like to work out the fundamentals too, is just create the small 1952 03:25:24,320 --> 03:25:30,000 range, reshape it, and the reshape has to be compatible with the original dimension. So we go 1953 03:25:30,000 --> 03:25:35,840 one, three, three, and why is this because torch a range is going to return us nine values, because 1954 03:25:35,840 --> 03:25:43,120 it's from the start here to the end minus one, and then one times three times three is what is 1955 03:25:43,120 --> 03:25:53,200 nine. So let's have a look x x dot shape. Beautiful. So we have one, two, three, four, five, six, 1956 03:25:53,200 --> 03:25:59,760 seven, eight, nine of size one. So we have this is the outer bracket here, which is going to contain 1957 03:25:59,760 --> 03:26:09,920 all of this. And then we have three, which is this one here, one, two, three. And then we have three, 1958 03:26:09,920 --> 03:26:19,600 which is one, two, three. Now let's work with this. Let's index on our new tensor. So let's see what 1959 03:26:19,600 --> 03:26:29,760 happens when we get x zero, this is going to index on the first bracket. So we get this one here. So 1960 03:26:29,760 --> 03:26:35,280 we've indexed on the first dimension here, the zero dimension on this one here, which is why we get 1961 03:26:35,280 --> 03:26:47,120 what's inside here. And then let's try again, let's index on the middle bracket. So dimension 1962 03:26:47,120 --> 03:26:56,400 one. So we got to go x, and then zero, and then zero. Let's see what happens there. Now is this the 1963 03:26:56,400 --> 03:27:04,960 same as going x zero, zero? It is, there we go. So it depends on what you want to use. Sometimes 1964 03:27:04,960 --> 03:27:10,880 I prefer to go like this. So I know that I'm getting the first bracket, and then the zeroth 1965 03:27:10,880 --> 03:27:15,440 version of that first bracket. So then we have these three values here. Now what do you think 1966 03:27:15,440 --> 03:27:20,960 what's going to happen if we index on third dimension or the second dimension here? Well, 1967 03:27:20,960 --> 03:27:29,760 let's find out. So let's index on the most in our bracket, which is last dimension. 1968 03:27:31,120 --> 03:27:38,480 So we have x zero, zero, zero. What numbers is going to give us back of x zero, 1969 03:27:39,280 --> 03:27:44,160 on the zero dimension gives us back this middle tensor. And then if x zero, zero gives us back 1970 03:27:44,160 --> 03:27:51,040 the zeroth index of the middle tensor. If we go x zero, zero, zero is going to give us the zeroth 1971 03:27:52,000 --> 03:27:59,840 tensor, the zeroth index, and the zeroth element. A lot to take in there. But what we've done is 1972 03:27:59,840 --> 03:28:06,880 we've just broken it down step by step. We've got this first zero targets this outer bracket 1973 03:28:06,880 --> 03:28:14,800 and returns us all of this. And then zero, zero targets this first because of this first zero, 1974 03:28:14,800 --> 03:28:21,520 and then the zero here targets this. And then if we go zero, zero, zero, we target this, 1975 03:28:22,080 --> 03:28:27,760 then we target this, and then we get this back because we are getting the zeroth index here. 
1976 03:28:27,760 --> 03:28:34,400 So if we change this to one, what do we get back? Two. And if we change these all to one, 1977 03:28:34,400 --> 03:28:42,640 what will we get? This is a bit of trivia here, or a challenge. So we're going one, one, one. 1978 03:28:45,520 --> 03:28:50,320 Let's see what happens. Oh, no, did you catch that before I ran the code? I did that one quite 1979 03:28:50,320 --> 03:28:56,080 quickly. We have index one is out of bounds. Why is that? Well, because this dimension is only one 1980 03:28:56,080 --> 03:29:00,640 here. So we can only index on the zero. That's where it gets a little bit confusing because this 1981 03:29:00,640 --> 03:29:05,200 says one, but because it's only got zero dimension, we can only index on the zero if to mention. But 1982 03:29:05,200 --> 03:29:13,760 what if we do 011? What does that give us? Five. Beautiful. So I'd like to issue you the challenge 1983 03:29:13,760 --> 03:29:20,240 of how about getting number nine? How would you get number nine? So rearrange this code to get 1984 03:29:20,240 --> 03:29:24,880 number nine. That's your challenge. Now, I just want to show you as well, is you can use, 1985 03:29:24,880 --> 03:29:37,600 you can also use, you might see this, the semicolon to select all of a target dimension. So let's say 1986 03:29:37,600 --> 03:29:45,360 we wanted to get all of the zeroth dimension, but the zero element from that. We can get 123. 1987 03:29:46,000 --> 03:29:51,040 And then let's say we want to say get all values of the zeroth and first dimensions, 1988 03:29:51,040 --> 03:29:58,080 but only index one of the second dimension. Oh, that was a mouthful. But get all values of 1989 03:29:58,080 --> 03:30:06,720 zeroth and first dimensions, but only index one of second dimension. So let's break this 1990 03:30:06,720 --> 03:30:14,880 down step by step. We want all values of zeroth and first dimensions, but only index one of the 1991 03:30:14,880 --> 03:30:22,720 second dimension. We press enter, shift enter, 258. So what did we get there? 258. Okay. So we've 1992 03:30:22,720 --> 03:30:30,160 got all elements of the zeroth and first dimension, but then so which will return us this thing here. 1993 03:30:30,160 --> 03:30:37,920 But then we only want 258, which is the first element here of the second dimension, which is 1994 03:30:37,920 --> 03:30:43,840 this three there. So quite confusing. But with some practice, you can figure out how to select 1995 03:30:43,840 --> 03:30:49,280 almost any numbers you want from any kind of tensor that you have. So now let's try again, 1996 03:30:49,280 --> 03:30:59,520 get all values of the zero dimension, but only the one index value of the first and second 1997 03:30:59,520 --> 03:31:04,560 dimension. So what might this look like? Let's break it down again. So we come down here x, 1998 03:31:05,120 --> 03:31:09,520 and we're going to go all values of the zero dimension because zero comes first. And then we 1999 03:31:09,520 --> 03:31:15,040 want only the one index value of the first and only the one index value of the second. 2000 03:31:15,680 --> 03:31:20,560 What is this going to give us five? Oh, we selected the middle tensor. So really, 2001 03:31:20,560 --> 03:31:27,520 this line of code is exactly the same as this line of code here, except we've got the square 2002 03:31:27,520 --> 03:31:33,040 brackets on the outside here, because we've got this semicolon there. So if we change this to a zero, 2003 03:31:34,560 --> 03:31:38,720 we remove that. 
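The indexing examples gathered into one sketch; note that although the transcript says "semicolon", the character being typed for "select all of this dimension" is the Python slicing colon, ":":

import torch

x = torch.arange(1, 10).reshape(1, 3, 3)
print(x, x.shape)               # torch.Size([1, 3, 3])

print(x[0])                     # the inner 3x3 block
print(x[0][0])                  # tensor([1, 2, 3]) - same as x[0, 0]
print(x[0][0][0])               # tensor(1)
print(x[0][1][1])               # tensor(5)

print(x[:, 0])                  # all of dim 0, index 0 of dim 1 -> tensor([[1, 2, 3]])
print(x[:, :, 1])               # all of dims 0 and 1, index 1 of dim 2 -> tensor([[2, 5, 8]])
print(x[:, 1, 1])               # tensor([5])
print(x[0, 0, :])               # tensor([1, 2, 3])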
But because we've got the semicolon there, we've selected all the 2004 03:31:38,720 --> 03:31:45,040 dimensions. So we get back the square bracket there, something to keep in mind. Finally, 2005 03:31:45,040 --> 03:31:57,760 let's just go one more. So get index zero of zero and first dimension, and all values of second 2006 03:31:57,760 --> 03:32:06,640 dimension. So x zero, zero. So zero, the index of zero and first dimension, zero, zero, 2007 03:32:06,640 --> 03:32:11,520 and all values of the second dimension. What have we just done here? We've got tensor one, 2008 03:32:11,520 --> 03:32:19,680 two, three, lovely. This code again is equivalent to what we've done up here. This has a semicolon 2009 03:32:19,680 --> 03:32:24,960 on the end. But what this line explicitly says without the semicolon is, hey, give us all the 2010 03:32:24,960 --> 03:32:30,480 values on the remaining dimension there. So my challenge for you is to take this tensor that we 2011 03:32:30,480 --> 03:32:42,000 have got here and index on it to return nine. So I'll write down here, index on x to return nine. 2012 03:32:42,000 --> 03:32:54,160 So if you have a look at x, as well as index on x to return three, six, nine. So these values 2013 03:32:54,160 --> 03:33:02,160 here. So give those both a go and I'll see you in the next video. Welcome back. How'd you go? 2014 03:33:02,160 --> 03:33:07,360 Did you give the challenge ago? I finished the last video with issuing the challenge to index on 2015 03:33:07,360 --> 03:33:13,600 x to return nine and index on x to return three, six, nine. Now here's what I came up with. Again, 2016 03:33:13,600 --> 03:33:16,960 there's a few different ways that you could approach both of these. But this is just what 2017 03:33:16,960 --> 03:33:25,760 I've found. So because x is one, three, three of size, well, that's his dimensions. If we want to 2018 03:33:25,760 --> 03:33:31,760 select nine, we need zero, which is this first outer bracket to get all of these elements. And 2019 03:33:31,760 --> 03:33:37,520 then we need two to select this bottom one here. And then we need this final two to select the 2020 03:33:37,520 --> 03:33:43,520 second dimension of this bottom one here. And then for three, six, nine, we need all of the 2021 03:33:43,520 --> 03:33:47,840 elements in the first dimension, all of the in the zeroth dimension, all of the elements in the 2022 03:33:47,840 --> 03:33:56,080 first dimension. And then we get two, which is this three, six, nine set up here. So that's how I 2023 03:33:56,080 --> 03:34:00,560 would practice indexing, start with whatever shape tensor you like, create it something like this, 2024 03:34:00,560 --> 03:34:05,920 and then see how you can write different indexing to select whatever number you pick. 2025 03:34:05,920 --> 03:34:18,160 So now let's move on to the next part, which is PyTorch tensors and NumPy. So NumPy is a 2026 03:34:18,160 --> 03:34:25,440 popular scientific, very popular. PyTorch actually requires NumPy when you install PyTorch. Popular 2027 03:34:25,440 --> 03:34:37,120 scientific Python numerical computing library, that's a bit of a mouthful. And because of this, 2028 03:34:37,120 --> 03:34:46,880 PyTorch has functionality to interact with it. So quite often, you might start off with, 2029 03:34:46,880 --> 03:34:52,320 let's change this into Markdown, you might start off with your data, because it's numerical format, 2030 03:34:52,320 --> 03:35:03,600 you might start off with data in NumPy, NumPy array, want in PyTorch tensor. 
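And the two challenge answers as described in words above, written out (other index combinations can return the same values):

import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

print(x[0][2][2])      # tensor(9) - could also be written x[0, 2, 2]
print(x[:, :, 2])      # tensor([[3, 6, 9]])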
Because your 2031 03:35:03,600 --> 03:35:07,680 data might be represented by NumPy because it started in NumPy, but say you want to do 2032 03:35:07,680 --> 03:35:12,320 some deep learning on it and you want to leverage PyTorch's deep learning capabilities, 2033 03:35:12,320 --> 03:35:17,360 well, you might want to change your data from NumPy to a PyTorch tensor. And PyTorch has a 2034 03:35:17,360 --> 03:35:26,320 method to do this, which is torch from NumPy, which will take in an ND array, which is NumPy's 2035 03:35:26,320 --> 03:35:31,840 main data type, and change it into a torch tensor. We'll see this in a second. And then if you want 2036 03:35:31,840 --> 03:35:38,560 to go from PyTorch tensor to NumPy because you want to use some sort of NumPy method, 2037 03:35:38,560 --> 03:35:47,200 well, the method to do this is torch dot tensor, and you can call dot NumPy on it. But this is all 2038 03:35:47,200 --> 03:35:55,600 just talking about in words, let's see it in action. So NumPy array to tensor. Let's try this out 2039 03:35:55,600 --> 03:36:04,560 first. So we'll import torch so we can run this cell on its own, and then import NumPy as np, 2040 03:36:04,560 --> 03:36:10,400 the common naming convention for NumPy, we're going to create an array in NumPy. And we're 2041 03:36:10,400 --> 03:36:18,960 going to just put one to eight, a range. And then we're going to go tensor equals torch from NumPy 2042 03:36:20,240 --> 03:36:26,320 because we want to go from NumPy array to a torch tensor. So we use from NumPy, and then we pass 2043 03:36:26,320 --> 03:36:35,040 in array, and then we have array and tensor. Wonderful. So there's our NumPy array, and our torch 2044 03:36:35,040 --> 03:36:41,600 tensor with the same data. But what you might notice here is that the D type for the tensor is 2045 03:36:41,600 --> 03:36:49,280 torch dot float 64. Now why is this? It's because NumPy's default data type. Oh, D type 2046 03:36:49,280 --> 03:36:57,840 is float 64. Whereas tensor, what have we discussed before? What's pytorch's default data type? 2047 03:36:58,560 --> 03:37:05,440 float 64. Well, that's not pytorch's default data type. If we were to create torch, a range, 2048 03:37:06,000 --> 03:37:10,560 1.0 to 8.0, by default, pytorch is going to create it in 2049 03:37:10,560 --> 03:37:21,520 float 32. So just be aware of that. If you are going from NumPy to pytorch, the default NumPy 2050 03:37:21,520 --> 03:37:28,720 data type is float 64. And pytorch reflects that data type when you use the from NumPy method. 2051 03:37:28,720 --> 03:37:36,240 I wonder if there's a D type. Can we go D type equals torch dot float 32? Takes no keyword. 2052 03:37:36,240 --> 03:37:43,040 Okay. But how could we change the data type here? Well, we could go type torch float 32. 2053 03:37:44,800 --> 03:37:52,400 Yeah, that will give us a tensor D type of float 32 instead of float 64. Beautiful. I'll just keep 2054 03:37:52,400 --> 03:38:06,320 that there so you know, warning when converting from NumPy pytorch, pytorch reflects NumPy's 2055 03:38:06,320 --> 03:38:17,920 default data type of float 64, unless specified. Otherwise, because what have we discussed, 2056 03:38:17,920 --> 03:38:24,560 when you're trying to perform certain calculations, you might run into a data type issue. So you might 2057 03:38:24,560 --> 03:38:32,720 need to convert the type from float 64 to float 32. Now, let's see what happens. What do you think 2058 03:38:32,720 --> 03:38:40,240 will happen if we change the array? 
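The NumPy-to-PyTorch direction as a runnable sketch, showing the float64 default being carried across and one way to cast it if float32 is wanted:

import torch
import numpy as np

array = np.arange(1.0, 8.0)                     # NumPy's default dtype is float64
tensor = torch.from_numpy(array)                # from_numpy keeps that dtype
print(array.dtype, tensor.dtype)                # float64 torch.float64

tensor_float32 = torch.from_numpy(array).type(torch.float32)   # cast if float32 is needed
print(tensor_float32.dtype)                     # torch.float32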
We change the value of an array. Well, let's find out. 2059 03:38:40,240 --> 03:38:52,080 So change the value of array. The question is, what will this do to tensor? Because we've used 2060 03:38:52,080 --> 03:38:58,000 the from NumPy method, do you think if we change the array, the tensor will change? So let's try 2061 03:38:58,000 --> 03:39:06,800 this array equals array plus one. So we're just adding one to every value in the array. Now, 2062 03:39:06,800 --> 03:39:15,520 what is the array and the tensor going to look like? Uh huh. So array, we only change the first 2063 03:39:15,520 --> 03:39:21,520 value there. Oh, sorry, we change every value because we have one to seven. Now it's two, three, 2064 03:39:21,520 --> 03:39:26,000 four, five, six, seven, eight. We change the value from the array. It doesn't change the 2065 03:39:26,000 --> 03:39:32,240 value of the tensor. So that's just something to keep in mind. If you use from NumPy, we get 2066 03:39:32,240 --> 03:39:37,120 a new tensor in memory here. So the original, the new tensor doesn't change if you change the 2067 03:39:37,120 --> 03:39:43,360 original array. So now let's go from tensor to NumPy. If you wanted to go back to NumPy, 2068 03:39:43,360 --> 03:39:49,440 tensor to NumPy array. So we'll start with a tensor. We could use the one we have right now, 2069 03:39:49,440 --> 03:39:52,880 but we're going to create another one, but we'll create one of ones just for fun. 2070 03:39:53,680 --> 03:40:01,600 One rhymes with fun. NumPy tensor equals. How do we go to NumPy? Well, we have 2071 03:40:01,600 --> 03:40:08,480 torch dot tensor dot NumPy. So we just simply call NumPy on here. And then we have tensor 2072 03:40:08,480 --> 03:40:14,080 and NumPy tensor. What data type do you think the NumPy tensor is going to have? 2073 03:40:14,080 --> 03:40:19,040 Because we've returned it to NumPy. Pi torches, default data type is 2074 03:40:21,360 --> 03:40:26,560 Flight 32. So if we change that to NumPy, what's going to be the D type of the NumPy tensor? 2075 03:40:26,560 --> 03:40:36,800 NumPy tensor dot D type. It reflects the original D type of what you set the tensor as. So just 2076 03:40:36,800 --> 03:40:41,360 keep that in mind. If you're going between PyTorch and NumPy, default data type of NumPy is 2077 03:40:41,360 --> 03:40:47,120 float 64, whereas the default data type of PyTorch is float 32. So that may cause some errors if 2078 03:40:47,120 --> 03:40:51,600 you're doing different kinds of calculations. Now, what do you think is going to happen if we 2079 03:40:51,600 --> 03:40:58,800 went from our tensor to an array, if we change the tensor, change the tensor, what happens to 2080 03:41:01,760 --> 03:41:11,280 NumPy tensor? So we get tensor equals tensor plus one. And then we go NumPy tensor. 2081 03:41:11,920 --> 03:41:19,280 Oh, we'll get tensor as well. So our tensor is now all twos because we added one to the ones. 2082 03:41:19,280 --> 03:41:24,960 But our NumPy tensor remains the same. Remains unchanged. So this means they don't share memory. 2083 03:41:24,960 --> 03:41:31,600 So that's how we go in between PyTorch and NumPy. If you'd like to look up more, I'd encourage 2084 03:41:31,600 --> 03:41:40,160 you to go PyTorch and NumPy. So warm up NumPy, beginner. There's a fair few tutorials here on 2085 03:41:40,160 --> 03:41:45,840 PyTorch because NumPy is so prevalent, they work pretty well together. So have a look at that. 2086 03:41:45,840 --> 03:41:50,080 There's a lot going on there. 
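And the reverse direction plus both independence checks as a sketch; torch.ones(7) is just an assumed example tensor of ones:

import torch
import numpy as np

array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array = array + 1                       # builds a new NumPy array; the tensor keeps the old values
print(array, tensor)

tensor = torch.ones(7)                  # PyTorch's default dtype: float32
numpy_tensor = tensor.numpy()           # back to NumPy; dtype stays float32
print(numpy_tensor.dtype)               # float32

tensor = tensor + 1                     # builds a new tensor; the NumPy array keeps the old values
print(tensor, numpy_tensor)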
There's a few more links, I'd encourage you to check out, 2087 03:41:50,080 --> 03:41:54,800 but we've covered some of the main ones that you'll see in practice. With that being said, 2088 03:41:54,800 --> 03:42:00,800 let's now jump into the next video where we're going to have a look at the concept of reproducibility. 2089 03:42:00,800 --> 03:42:05,200 If you'd like to look that up, I'd encourage you to search PyTorch's reproducibility and see 2090 03:42:05,200 --> 03:42:12,880 what you can find. Otherwise, I'll see you in the next video. Welcome back. It's now time for us 2091 03:42:12,880 --> 03:42:19,600 to cover the topic of reproducibility. If I could even spell it, that would be fantastic. 2092 03:42:19,600 --> 03:42:30,480 Reproducibility. Trying to take the random out of random. So we've touched upon the concept of 2093 03:42:30,480 --> 03:42:35,040 neural networks harnessing the power of randomness. And what I mean by that is we haven't actually 2094 03:42:35,040 --> 03:42:40,320 built our own neural network yet, but we will be doing that. And we've created tenses full of random 2095 03:42:40,320 --> 03:42:53,600 values. And so in short, how our neural network learns is start with random numbers, perform tensor 2096 03:42:53,600 --> 03:43:05,760 operations, update random numbers to try and make them better representations of the data. Again, 2097 03:43:05,760 --> 03:43:18,400 again, again, again, again. However, if you're trying to do reproducible experiments, sometimes 2098 03:43:18,400 --> 03:43:22,880 you don't want so much randomness. And what I mean by this is if we were creating random tensors, 2099 03:43:23,440 --> 03:43:28,080 from what we've seen so far is that every time we create a random tensor, let's create one here, 2100 03:43:28,080 --> 03:43:36,320 torch dot rand, and we'll create it of three three. Every time we run this cell, it gives us new numbers. 2101 03:43:36,320 --> 03:43:43,920 So 7 7 5 2. There we go. Rand again. Right. So we get a whole bunch of random numbers here. 2102 03:43:45,040 --> 03:43:50,160 Every single time. But what if you were trying to share this notebook with a friend, 2103 03:43:50,160 --> 03:43:55,840 so say you went up share and you clicked the share link and you sent that to someone and you're like, 2104 03:43:55,840 --> 03:44:00,400 hey, try out this machine learning experiment I did. And you wanted a little less randomness 2105 03:44:00,400 --> 03:44:06,640 because neural networks start with random numbers. How might you do that? Well, let's 2106 03:44:06,640 --> 03:44:20,560 this write down to reduce the randomness in neural networks. And pytorch comes the concept of a 2107 03:44:20,560 --> 03:44:27,840 random seed. So we're going to see this in action. But essentially, let's write this down, 2108 03:44:27,840 --> 03:44:41,840 essentially what the random seed does is flavor the randomness. So because of how computers work, 2109 03:44:41,840 --> 03:44:46,720 they're actually not true randomness. And actually, there's arguments against this, 2110 03:44:46,720 --> 03:44:50,640 and it's quite a big debate in the computer science topic, whatnot, but I am not a computer 2111 03:44:50,640 --> 03:44:56,160 scientist, I am a machine learning engineer. So computers are fundamentally deterministic. 2112 03:44:56,160 --> 03:45:01,360 It means they run the same steps over and over again. So what the randomness we're doing here 2113 03:45:01,360 --> 03:45:06,160 is referred to as pseudo randomness or generated randomness. 
And the random seed, 2114 03:45:06,160 --> 03:45:11,760 which is what you see a lot in machine learning experiments, flavors that randomness. So let's 2115 03:45:11,760 --> 03:45:16,000 see it in practice. And at the end of this video, I'll give you two resources that I'd recommend 2116 03:45:16,000 --> 03:45:21,680 to learn a little bit more about the concept of pseudo randomness and reproducibility in pytorch. 2117 03:45:22,240 --> 03:45:28,240 Let's start by importing torch so you could start this notebook right from here. Create two random 2118 03:45:28,240 --> 03:45:38,960 tensors. We'll just call this random tensor a equals torch dot rand and we'll go three four 2119 03:45:38,960 --> 03:45:48,480 and we'll go random tensor b equals torch dot rand same size three four. And then if we have a 2120 03:45:48,480 --> 03:45:59,600 look at let's go print random tensor a print random tensor b. And then let's print to see if 2121 03:45:59,600 --> 03:46:08,480 they're equal anywhere random tensor a equals equals equals random tensor b. Now what do you 2122 03:46:08,480 --> 03:46:15,520 think this is going to do? If we have a look at one equals one, what does it return? True. 2123 03:46:16,240 --> 03:46:21,200 So this is comparison operator to compare two different tensors. We're creating two random 2124 03:46:21,200 --> 03:46:25,280 tensors here. We're going to have a look at them. We'd expect them to be full of random values. 2125 03:46:25,280 --> 03:46:29,680 Do you think any of the values in each of these random tensors is going to be equal to each other? 2126 03:46:31,280 --> 03:46:36,320 Well, there is a chance that they are, but it's highly unlikely. I'll be quite surprised if they are. 2127 03:46:36,320 --> 03:46:43,600 Oh, again, my connection might be a little bit. Oh, there we go. Beautiful. So we have tensor a 2128 03:46:44,240 --> 03:46:51,440 tensor of three four with random numbers. And we have tensor b of three four with random numbers. 2129 03:46:51,440 --> 03:46:55,920 So if we were, if I was to share this notebook with my friend or my colleague or even you, 2130 03:46:56,480 --> 03:47:00,880 if you ran this cell, you are going to get random numbers as well. And you have every chance of 2131 03:47:00,880 --> 03:47:05,760 replicating one of these numbers. But again, it's highly unlikely. So again, I'm getting that 2132 03:47:05,760 --> 03:47:10,480 automatic save failed. You might get that if your internet connection is dropping out, maybe that's 2133 03:47:10,480 --> 03:47:15,360 something going on with my internet connection. But again, as we've seen, usually this resolves 2134 03:47:15,360 --> 03:47:20,640 itself. If you try a few times, I'll just keep coding. If it really doesn't resolve itself, 2135 03:47:20,640 --> 03:47:26,800 you can go file is a download notebook or save a copy and drive download. You can download the 2136 03:47:26,800 --> 03:47:32,800 notebook, save it to your local machine, re upload it to upload notebook and start again in another 2137 03:47:32,800 --> 03:47:38,080 Google Colab instance. But there we go. It fixed itself. Wonderful troubleshooting on the fly. 2138 03:47:38,880 --> 03:47:45,920 So the way we make these reproducible is through the concept of a random seed. So let's have a 2139 03:47:45,920 --> 03:47:58,560 look at that. Let's make some random, but reproducible tenses. So import torch. And we're going to 2140 03:47:58,560 --> 03:48:13,840 set the random seed by going torch dot manual seed random. Oh, we don't have random set yet. 
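The two unseeded random tensors as a sketch; run it twice and the numbers differ, and the element-wise comparison is almost certainly all False:

import torch

random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(random_tensor_A)
print(random_tensor_B)
print(random_tensor_A == random_tensor_B)   # element-wise comparison, very likely all False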
2141 03:48:14,480 --> 03:48:20,720 I'm going to set my random seed. You set the random seed to some numerical value. 42 is a common 2142 03:48:20,720 --> 03:48:26,320 one. You might see zero. You might see one, two, three, four. Essentially, you can set it to whatever 2143 03:48:26,320 --> 03:48:33,280 you want. And each of these, you can think of 77, 100, as different flavors of randomness. So 2144 03:48:33,280 --> 03:48:39,680 I like to use 42, because it's the answer to the universe. And then we go random seed. And now 2145 03:48:39,680 --> 03:48:50,720 let's create some random tenses. Random tensor C with the flavor of our random seed. Three, 2146 03:48:50,720 --> 03:48:59,360 four. And then we're going to go torch tensor D equals torch dot rand three, four. Now, let's 2147 03:48:59,360 --> 03:49:10,800 see what happens. We'll print out random tensor C. And we'll print out random tensor D. And then 2148 03:49:10,800 --> 03:49:20,880 we'll print out to see if they're equal anywhere. Random tensor C equals random tensor D. So let's 2149 03:49:20,880 --> 03:49:32,000 find out what happens. Huh, what gives? Well, we've got randomness. We set the random seed. We're 2150 03:49:32,000 --> 03:49:42,640 telling pytorch a flavor our randomness with 42 torch manual seed. Hmm, let's try set the manual 2151 03:49:42,640 --> 03:49:52,240 seed each time we call a random method. We go there. Ah, much better. So now we've got some 2152 03:49:52,240 --> 03:49:59,600 flavored randomness. So a thing to keep in mind is that if you want to use the torch manual seed, 2153 03:49:59,600 --> 03:50:06,320 generally it only works for one block of code if you're using a notebook. So that's just 2154 03:50:06,320 --> 03:50:10,160 something to keep in mind. If you're creating random tensors, one after the other, we're using 2155 03:50:10,160 --> 03:50:15,280 assignment like this, you should use torch dot manual seed every time you want to call the rand 2156 03:50:15,280 --> 03:50:20,800 method or some sort of randomness. However, if we're using other torch processes, usually what 2157 03:50:20,800 --> 03:50:25,520 you might see is torch manual seed is set right at the start of a cell. And then a whole bunch 2158 03:50:25,520 --> 03:50:31,200 of code is done down here. But because we're calling subsequent methods here, we have to reset 2159 03:50:31,200 --> 03:50:36,720 the random seed. Otherwise, if we don't do this, we comment this line, it's going to flavor the 2160 03:50:36,720 --> 03:50:42,640 randomness of torch random tensor C with torch manual seed. But then random tensor D is just 2161 03:50:42,640 --> 03:50:48,800 going to have no flavor. It's not going to use a random seed. So we reset it there. Wonderful. 2162 03:50:48,800 --> 03:50:56,400 So I wonder, does this have a seed method? Let's go torch dot rand. Does this have seed? 2163 03:50:57,040 --> 03:51:02,480 Sometimes they have a seed method. Seed, no, it doesn't. Okay, that's all right. 2164 03:51:03,440 --> 03:51:08,000 The more you learn, but there's documentation for torch dot rand. And I said that I was going to 2165 03:51:08,000 --> 03:51:14,080 link at the end of this video. So the manual seed is a way to, or the random seed, but in 2166 03:51:14,080 --> 03:51:19,680 torch, it's called a manual seed is a way to flavor the randomness. So these numbers, as you see, 2167 03:51:19,680 --> 03:51:24,880 are still quite random. But the random seed just makes them reproducible. 
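The seeded version as a sketch; as discussed, torch.manual_seed() only flavours the next block of random calls, so it is set again before the second torch.rand():

import torch

RANDOM_SEED = 42                      # any integer works; 42 is just the one used here

torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

torch.manual_seed(RANDOM_SEED)        # reset the seed so D is generated from the same starting point
random_tensor_D = torch.rand(3, 4)

print(random_tensor_C)
print(random_tensor_D)
print(random_tensor_C == random_tensor_D)   # all True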
So if I was to share this 2168 03:51:24,880 --> 03:51:28,720 with you, if you had to run this block of code, ideally, you're going to get the same numerical 2169 03:51:28,720 --> 03:51:35,360 output here. So with that being said, I'd like to refer to you to the pie torch reproducibility 2170 03:51:35,360 --> 03:51:40,240 document, because we've only quite scratched the surface of this of reproducibility. We've covered 2171 03:51:40,240 --> 03:51:48,400 one of the main ones. But this is a great document on how to go through reproducibility in pie torch. 2172 03:51:48,400 --> 03:51:53,040 So this is your extra curriculum for this, even if you don't understand what's going on in a lot 2173 03:51:53,040 --> 03:51:58,320 of the code here, just be aware of reproducibility, because it's an important topic in machine 2174 03:51:58,320 --> 03:52:06,240 learning and deep learning. So I'll put this here, extra resources for reproducibility. 2175 03:52:06,240 --> 03:52:14,640 As we go pie torch randomness, we'll change this into markdown. And then finally, the concept 2176 03:52:14,640 --> 03:52:22,320 of a random seed is Wikipedia random seed. So random seeds quite a universal concept, 2177 03:52:22,320 --> 03:52:27,280 not just for pie torch, there's a random seed and NumPy as well. So if you'd like to see what 2178 03:52:27,280 --> 03:52:33,120 this means, yeah, initialize a pseudo random number generator. So that's a big word, pseudo random 2179 03:52:33,120 --> 03:52:38,160 number generator. But if you'd like to learn about more random number generation computing, 2180 03:52:38,160 --> 03:52:43,440 and what a random seed does is I'd refer to you to check out this documentation here. 2181 03:52:44,720 --> 03:52:50,400 Whoo, far out, we have covered a lot. But there's a couple more topics you should really be aware 2182 03:52:50,400 --> 03:52:55,600 of to finish off the pie torch fundamentals. You got this. I'll see you in the next video. 2183 03:52:55,600 --> 03:53:04,400 Welcome back. Now, let's talk about the important concept of running tenses or pie 2184 03:53:04,400 --> 03:53:17,440 torch objects. So running tenses and pie torch objects on GPUs and making faster computations. 2185 03:53:17,440 --> 03:53:26,480 So we've discussed that GPUs, let me just scroll down a little bit here, GPUs equal faster 2186 03:53:26,480 --> 03:53:40,560 computation on numbers. Thanks to CUDA plus NVIDIA hardware plus pie torch working behind the 2187 03:53:40,560 --> 03:53:50,400 scenes to make everything hunky dory. Good. That's what hunky dory means, by the way, 2188 03:53:50,400 --> 03:53:55,920 if you never heard that before. So let's have a look at how we do this. Now, we first need to 2189 03:53:55,920 --> 03:54:02,960 talk about, let's go here one getting a GPU. There's a few different ways we've seen one before. 2190 03:54:02,960 --> 03:54:12,000 Number one easiest is to use what we're using right now. Use Google Colab for a free GPU. 2191 03:54:13,360 --> 03:54:18,880 But there's also Google Colab Pro. And I think there might even be, let's look up Google Colab 2192 03:54:19,520 --> 03:54:25,840 Pro. Choose the best that's right for you. I use Google Colab Pro because I use it almost every day. 2193 03:54:25,840 --> 03:54:32,320 So yeah, I pay for Colab Pro. You can use Colab for free, which is might be what you're using. 2194 03:54:32,320 --> 03:54:38,800 There's also Colab Pro Plus, which has a lot more advantages as well. 
But Colab Pro is giving me 2195 03:54:38,800 --> 03:54:45,040 faster GPUs, so access to faster GPUs, which means you spend less time waiting while your code is running. 2196 03:54:45,040 --> 03:54:50,400 More memory, longer run time, so it'll last a bit longer if you leave it running idle. 2197 03:54:50,400 --> 03:54:55,760 And then Colab Pro again is a step up from that. I personally haven't had a need yet to use 2198 03:54:55,760 --> 03:55:01,760 Google Colab Pro Plus. You can complete this whole course on the free tier as well. But as you start 2199 03:55:01,760 --> 03:55:06,160 to code more, as you start to run bigger models, as you start to want to compute more, you might 2200 03:55:06,160 --> 03:55:14,400 want to look into something like Google Colab Pro. Or let's go here. Options to upgrade as well. 2201 03:55:14,400 --> 03:55:26,800 And then another way is use your own GPU. Now this takes a little bit of setup and requires 2202 03:55:26,800 --> 03:55:39,920 the investment of purchasing a GPU. There's lots of options. So one of my favorite posts for 2203 03:55:39,920 --> 03:55:50,640 getting a GPU is, yeah, the best GPUs for deep learning in 2020, or something like this. 2204 03:55:51,200 --> 03:56:00,240 What do we got? Deep learning? Tim Detmos. This is, yeah, which GPUs to get for deep learning? 2205 03:56:00,240 --> 03:56:07,120 Now, I believe at the time of this video, I think it's been updated since this date. So don't take 2206 03:56:07,120 --> 03:56:14,640 my word for it. But this is a fantastic blog post for figuring out what GPUs see this post 2207 03:56:14,640 --> 03:56:27,840 for what option to get. And then number three is use cloud computing. So such as 2208 03:56:28,800 --> 03:56:33,840 GCP, which is Google Cloud Platform AWS, which is Amazon Web Services or Azure. 2209 03:56:33,840 --> 03:56:45,120 These services, which is Azure is by Microsoft, allow you to rent computers on the cloud and access 2210 03:56:45,120 --> 03:56:51,600 them. So the first option using Google Colab, which is what we're using is by far the easiest 2211 03:56:51,600 --> 03:56:57,440 and free. So there's big advantages there. However, the downside is that you have to use a website 2212 03:56:57,440 --> 03:57:01,680 here, Google Colab, you can't run it locally. You don't get the benefit of using cloud computing, 2213 03:57:01,680 --> 03:57:07,680 but my personal workflow is I run basically all of my small scale experiments and things like 2214 03:57:07,680 --> 03:57:13,280 learning new stuff in Google Colab. And then if I want to upgrade things, run video experiments, 2215 03:57:13,280 --> 03:57:18,800 I have my own dedicated deep learning PC, which I have built with a big powerful GPU. And then 2216 03:57:18,800 --> 03:57:25,200 also I use cloud computing if necessary. So that's my workflow. Start with Google Colab. 2217 03:57:25,200 --> 03:57:30,160 And then these two, if I need to do some larger experiments. But because this is the beginning 2218 03:57:30,160 --> 03:57:34,880 of course, we can just stick with Google Colab for the time being. But I thought I'd make you aware 2219 03:57:34,880 --> 03:57:44,640 of these other two options. And if you'd like to set up a GPU, so four, two, three, PyTorch plus 2220 03:57:44,640 --> 03:57:56,000 GPU drivers, which is CUDA takes a little bit of setting up to do this, refer to PyTorch 2221 03:57:56,000 --> 03:58:07,200 setup documentation. 
So if we go to pytorch.org, they have some great setup guides here, 2222 03:58:07,200 --> 03:58:12,720 get started. And we have start locally. This is if you want to run on your local machine, 2223 03:58:12,720 --> 03:58:18,400 such as a Linux setup. This is what I have Linux CUDA 11.3. It's going to give you a 2224 03:58:18,400 --> 03:58:26,080 conda install command to use conda. And then if you want to use cloud partners, which is Alibaba 2225 03:58:26,080 --> 03:58:31,360 Cloud, Amazon Web Services, Google Cloud Platform, this is where you'll want to go. So I'll just link 2226 03:58:31,360 --> 03:58:38,560 this in here. But for this course, we're going to be focusing on using Google Colab. So now, 2227 03:58:38,560 --> 03:58:43,920 let's see how we might get a GPU in Google Colab. And we've already covered this, but I'm going to 2228 03:58:43,920 --> 03:58:50,640 recover it just so you know. We're going to change the runtime type. You can go in any notebook and 2229 03:58:50,640 --> 03:58:58,880 do this, runtime type, hardware accelerator, we can select GPU, click save. Now this is going to 2230 03:58:58,880 --> 03:59:07,680 restart our runtime and connect us to our runtime, aka a Google compute instance with a GPU. And so 2231 03:59:07,680 --> 03:59:18,400 now if we run NVIDIA SMI, I have a Tesla P100 GPU. So let's look at this Tesla P100 2232 03:59:21,360 --> 03:59:28,240 GPU. Do we have an image? Yeah, so this is the GPU that I've got running, not the Tesla car, 2233 03:59:28,240 --> 03:59:35,120 the GPU. So this is quite a powerful GPU. That is because I have upgraded to Colab Pro. Now, 2234 03:59:35,120 --> 03:59:40,560 if you're not using Colab Pro, you might get something like a Tesla K80, which is a slightly 2235 03:59:40,560 --> 03:59:48,240 less powerful GPU than a Tesla P100, but still a GPU nonetheless and will still work faster than 2236 03:59:48,240 --> 03:59:53,840 just running PyTorch code on the pure CPU, which is the default in Google Colab and the default 2237 03:59:53,840 --> 04:00:02,880 in PyTorch. And so now we can also check to see if we have GPU access with PyTorch. So let's go 2238 04:00:02,880 --> 04:00:11,600 here. This is number two now. Check for GPU access with PyTorch. So this is a little command that's 2239 04:00:11,600 --> 04:00:20,480 going to allow us or tell us if PyTorch, just having the GPU here, this is by the way, another 2240 04:00:20,480 --> 04:00:27,440 thing that Colab has a good setup with, is that all the connections between PyTorch and the NVIDIA 2241 04:00:27,440 --> 04:00:34,640 GPU are set up for us. Whereas when you set it up on your own GPU or using cloud computing, 2242 04:00:34,640 --> 04:00:38,560 there are a few steps you have to go through, which we're not going to cover in this course. 2243 04:00:38,560 --> 04:00:42,480 I'd highly recommend you go through the getting started locally set up if you want to do that, 2244 04:00:43,040 --> 04:00:49,440 to connect PyTorch to your own GPU. So let's check for the GPU access with PyTorch. 2245 04:00:49,440 --> 04:00:58,800 This is another advantage of using Google Colab. Almost zero set up to get started. So import 2246 04:00:58,800 --> 04:01:08,960 torch and then we're going to go torch dot cuda dot is available. And remember, cuda is 2247 04:01:08,960 --> 04:01:16,800 NVIDIA's programming interface that allows us to use GPUs for numerical computing. There we go, 2248 04:01:16,800 --> 04:01:22,240 beautiful. So big advantage of Google Colab is we get access to a free GPU. 
In my case, I'm paying 2249 04:01:22,240 --> 04:01:26,800 for the faster GPU, but in your case, you're more than welcome to use the free version. 2250 04:01:26,800 --> 04:01:34,400 All that means it'll be slightly slower than a faster GPU here. And we now have access to GPUs 2251 04:01:34,400 --> 04:01:44,560 with PyTorch. So there is one more thing known as device agnostic code. So set up device agnostic 2252 04:01:44,560 --> 04:01:50,320 code. Now, this is an important concept in PyTorch because wherever you run PyTorch, you might not 2253 04:01:50,320 --> 04:01:57,760 always have access to a GPU. But if there was access to a GPU, you'd like it to use it if it's 2254 04:01:57,760 --> 04:02:05,280 available. So one of the ways that this is done in PyTorch is to set the device variable. Now, 2255 04:02:05,280 --> 04:02:09,520 really, you could set this to any variable you want, but you're going to see it used as device 2256 04:02:09,520 --> 04:02:21,280 quite often. So cuda if torch dot cuda is available. Else CPU. So all this is going to say, and we'll 2257 04:02:21,280 --> 04:02:29,280 see where we use the device variable later on is set the device to use cuda if it's available. So 2258 04:02:29,280 --> 04:02:35,040 it is so true. If it's not available, if we don't have access to a GPU that PyTorch can use, 2259 04:02:35,040 --> 04:02:41,120 just default to the CPU. So with that being said, there's one more thing. You can also count the 2260 04:02:41,120 --> 04:02:45,520 number of GPUs. So this won't really apply to us for now because we're just going to stick with 2261 04:02:45,520 --> 04:02:51,040 using one GPU. But as you upgrade your PyTorch experiments and machine learning experiments, 2262 04:02:51,040 --> 04:02:55,360 you might have access to more than one GPU. So you can also count the devices here. 2263 04:02:57,200 --> 04:03:02,640 We have access to one GPU, which is this here. So the reason why you might want to count the number 2264 04:03:02,640 --> 04:03:08,960 of devices is because if you're running huge models on large data sets, you might want to run one 2265 04:03:08,960 --> 04:03:16,080 model on a certain GPU, another model on another GPU, and so on and so on. But final thing before 2266 04:03:16,080 --> 04:03:24,720 we finish this video is if we go PyTorch device agnostic code, cuda semantics, there's a little 2267 04:03:24,720 --> 04:03:31,120 section in here called best practices. This is basically what we just covered there is setting 2268 04:03:31,120 --> 04:03:37,600 the device argument. Now this is using the arg pass, but so yeah, there we go. args.device, 2269 04:03:37,600 --> 04:03:44,320 torch.device, cuda, args.device, torch.device, CPU. So this is one way to set it from the Python 2270 04:03:45,120 --> 04:03:50,720 arguments when you're running scripts, but we're using the version of running it through a notebook. 2271 04:03:51,760 --> 04:03:57,040 So check this out. I'll just link this here, device agnostic code. It's okay if you're not sure 2272 04:03:57,040 --> 04:04:01,040 of what's going on here. We're going to cover it a little bit more later on throughout the course, 2273 04:04:01,680 --> 04:04:12,640 but right here for PyTorch, since it's capable of running compute on the GPU or CPU, 2274 04:04:12,640 --> 04:04:27,840 it's best practice to set up device agnostic code, e.g. run on GPU if available, 2275 04:04:29,360 --> 04:04:37,440 else default to CPU. 
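The device checks and the device-agnostic setup described above, as a short sketch:

import torch

print(torch.cuda.is_available())      # True if PyTorch can see a CUDA GPU

# Use the GPU if it's there, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

print(torch.cuda.device_count())      # how many GPUs PyTorch can see (0 on a CPU-only machine)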
So check out the best practices for using cuda, which is namely setting up 2276 04:04:37,440 --> 04:04:44,000 device agnostic code. And let's in the next video, see what I mean about setting our PyTorch tensors 2277 04:04:44,000 --> 04:04:51,840 and objects to the target device. Welcome back. In the last video, we checked out a few different 2278 04:04:51,840 --> 04:04:58,480 options for getting a GPU, and then getting PyTorch to run on the GPU. And for now we're using 2279 04:04:58,480 --> 04:05:04,240 Google Colab, which is the easiest way to get set up because it gives us free access to a GPU, 2280 04:05:04,240 --> 04:05:11,040 faster ones if you set up with Colab Pro, and it comes with PyTorch automatically set up to 2281 04:05:11,040 --> 04:05:20,000 use the GPU if it's available. So now let's see how we can actually use the GPU. So to do so, 2282 04:05:20,000 --> 04:05:33,120 we'll look at putting tensors and models on the GPU. So the reason we want our tensors slash models 2283 04:05:33,120 --> 04:05:43,040 on the GPU is because using GPU results in faster computations. And if we're getting our machine 2284 04:05:43,040 --> 04:05:48,080 learning models to find patterns and numbers, GPUs are great at doing numerical calculations. 2285 04:05:48,080 --> 04:05:52,800 And the numerical calculations we're going to be doing are tensor operations like we saw above. 2286 04:05:53,520 --> 04:05:59,360 So the tensor operations, well, we've covered a lot. Somewhere here, tensor operations, 2287 04:05:59,360 --> 04:06:04,160 there we go, manipulating tensor operations. So if we can run these computations faster, 2288 04:06:04,160 --> 04:06:10,080 we can discover patterns in our data faster, we can do more experiments, and we can work towards 2289 04:06:10,080 --> 04:06:15,280 finding the best possible model for whatever problem that we're working on. So let's see, 2290 04:06:15,920 --> 04:06:21,840 we'll create a tensor, as usual, create a tensor. Now the default is on the CPU. 2291 04:06:21,840 --> 04:06:30,160 So tensor equals torch dot tensor. And we'll just make it a nice simple one, one, two, three. 2292 04:06:30,720 --> 04:06:38,480 And let's write here, tensor not on GPU will print out tensor. And this is where we can use, 2293 04:06:39,440 --> 04:06:47,040 we saw this parameter before device. Can we pass it in here? Device equals CPU. 2294 04:06:47,040 --> 04:06:54,880 Let's see what this comes out with. There we go. So if we print it out, tensor 123 is on the CPU. 2295 04:06:54,880 --> 04:07:02,560 But even if we got rid of that device parameter, by default, it's going to be on the CPU. Wonderful. 2296 04:07:02,560 --> 04:07:08,880 So now PyTorch makes it quite easy to move things to, and I'm saying to for a reason, 2297 04:07:08,880 --> 04:07:18,320 to the GPU, or to, even better, the target device. So if the GPU is available, we use CUDA. 2298 04:07:18,320 --> 04:07:23,840 If it's not, it uses CPU. This is why we set up the device variable. So let's see, 2299 04:07:24,560 --> 04:07:28,080 move tensor to GPU. If available, 2300 04:07:28,080 --> 04:07:40,240 tensor on GPU equals tensor dot two device. Now let's have a look at this, tensor on GPU. 2301 04:07:43,520 --> 04:07:48,960 So this is going to shift the tensor that we created up here to the target device. 2302 04:07:50,720 --> 04:07:57,040 Wonderful. Look at that. So now our tensor 123 is on device CUDA zero. 
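(A sketch of the two cells just run, assuming the device variable from the previous section:)

```python
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"  # set up earlier

# Create a tensor (it lives on the CPU by default, even without the device argument)
tensor = torch.tensor([1, 2, 3])
print(tensor, tensor.device)  # tensor([1, 2, 3]) cpu

# Move the tensor to the target device (GPU if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu  # tensor([1, 2, 3], device='cuda:0') when a GPU is present
```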
Now this is the index of 2303 04:07:57,040 --> 04:08:02,240 the GPU that we're using, because we only have one, it's going to be at index zero. So later on, 2304 04:08:02,240 --> 04:08:06,960 when you start to do bigger experiments and work with multiple GPUs, you might have different tensors 2305 04:08:06,960 --> 04:08:12,640 that are stored on different GPUs. But for now, we're just sticking with one GPU, keeping it nice 2306 04:08:12,640 --> 04:08:18,960 and simple. And so you might have a case where you want to move, oh, actually, the reason why we 2307 04:08:18,960 --> 04:08:25,680 set up device agnostic code is again, this code would work if we run this, regardless if we had, 2308 04:08:25,680 --> 04:08:32,240 so it won't error out. But regardless if we had a GPU or not, this code will work. So whatever device 2309 04:08:32,240 --> 04:08:38,400 we have access to, whether it's only a CPU or whether it's a GPU, this tensor will move to whatever 2310 04:08:38,400 --> 04:08:44,480 target device. But since we have a GPU available, it goes there. You'll see this a lot. This two 2311 04:08:44,480 --> 04:08:50,240 method moves tensors and it can be also used for models. We're going to see that later on. So just 2312 04:08:50,240 --> 04:08:58,080 keep two device in mind. And then you might want to, for some computations, such as using NumPy, 2313 04:08:58,080 --> 04:09:06,320 NumPy only works with the CPU. So you might want to move tensors back to the CPU, moving tensors back 2314 04:09:06,320 --> 04:09:13,280 to the CPU. So can you guess how we might do that? It's okay if you don't know. We haven't covered a 2315 04:09:13,280 --> 04:09:17,760 lot of things, but I'm going to challenge you anyway, because that's the fun part of thinking 2316 04:09:17,760 --> 04:09:27,200 about something. So let's see how we can do it. Let's write down if tensor is on GPU, can't transform 2317 04:09:27,200 --> 04:09:33,840 it to NumPy. So let's see what happens if we take our tensor on the GPU and try to go NumPy. 2318 04:09:34,640 --> 04:09:39,760 What happens? Well, we get an error. So this is another huge error. Remember the top three 2319 04:09:39,760 --> 04:09:44,640 errors in deep learning or pytorch? There's lots of them, but number one, shape errors, 2320 04:09:44,640 --> 04:09:51,520 number two, data type issues. And with pytorch, number three is device issues. So can't convert 2321 04:09:51,520 --> 04:09:58,400 CUDA zero device type tensor to NumPy. So NumPy doesn't work with the GPU. Use tensor dot CPU 2322 04:09:58,400 --> 04:10:04,320 to copy the tensor to host memory first. So if we call tensor dot CPU, it's going to bring our 2323 04:10:04,320 --> 04:10:10,320 target tensor back to the CPU. And then we should be able to use it with NumPy. So 2324 04:10:10,320 --> 04:10:26,480 to fix the GPU tensor with NumPy issue, we can first set it to the CPU. So tensor back on CPU 2325 04:10:27,680 --> 04:10:34,480 equals tensor on GPU dot CPU. We're just taking what this said here. That's a beautiful thing 2326 04:10:34,480 --> 04:10:39,280 about pytorch is very helpful error messages. And then we're going to go NumPy. 2327 04:10:39,280 --> 04:10:45,520 And then if we go tensor back on CPU, is this going to work? Let's have a look. Oh, of course, 2328 04:10:45,520 --> 04:10:53,280 it's not because I typed it wrong. And I've typed it again twice. Third time, third time's a charm. 2329 04:10:54,320 --> 04:11:00,880 There we go. 
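(Typos aside, the cell that finally ran looks roughly like this; tensor_on_gpu is the tensor moved across in the previous cell:)

```python
# NumPy can't read GPU memory, so tensor_on_gpu.numpy() raises an error.
# Copy the tensor back to the CPU first, then convert it.
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu  # array([1, 2, 3])

# The original GPU tensor is left unchanged
tensor_on_gpu       # tensor([1, 2, 3], device='cuda:0')
```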
Okay, so that works because we've put it back to the CPU first before calling NumPy. 2330 04:11:00,880 --> 04:11:07,520 And then if we refer back to our tensor on the GPU, because we've reassociated this, again, 2331 04:11:07,520 --> 04:11:14,640 we've got typos galore classic, because we've reassigned tensor back on CPU, our tensor on 2332 04:11:14,640 --> 04:11:22,080 GPU remains unchanged. So that's the four main things about working with pytorch on the GPU. 2333 04:11:22,080 --> 04:11:26,560 There are a few more tidbits such as multiple GPUs, but now you've got the fundamentals. We're 2334 04:11:26,560 --> 04:11:30,400 going to stick with using one GPU. And if you'd like to later on once you've learned a bit more 2335 04:11:30,400 --> 04:11:36,080 research into multiple GPUs, well, as you might have guessed, pytorch has functionality for that too. 2336 04:11:36,080 --> 04:11:42,320 So have a go at getting access to a GPU using colab, check to see if it's available, set up device 2337 04:11:42,320 --> 04:11:48,000 agnostic code, create a few dummy tensors and just set them to different devices, see what happens 2338 04:11:48,000 --> 04:11:53,360 if you change the device parameter, run a few errors by trying to do some NumPy calculations 2339 04:11:53,360 --> 04:11:58,480 with tensors on the GPU, and then bring those tensors on the GPU back to NumPy and see what happens 2340 04:11:58,480 --> 04:12:05,920 there. So I think we've covered, I think we've reached the end of the fundamentals. We've covered 2341 04:12:05,920 --> 04:12:10,960 a fair bit. Introduction to tensors, the minmax, a whole bunch of stuff inside the introduction 2342 04:12:10,960 --> 04:12:16,880 to tensors, finding the positional minmax, reshaping, indexing, working with tensors and NumPy, 2343 04:12:16,880 --> 04:12:24,720 reproducibility, using a GPU and moving stuff back to the GPU far out. Now you're probably wondering, 2344 04:12:24,720 --> 04:12:29,680 Daniel, we've covered a whole bunch. What should I do to practice all this? Well, I'm glad you asked. 2345 04:12:29,680 --> 04:12:35,840 Let's cover that in the next video. Welcome back. And you should be very proud of your 2346 04:12:35,840 --> 04:12:41,120 self right now. We've been through a lot, but we've covered a whole bunch of PyTorch fundamentals. 2347 04:12:41,120 --> 04:12:45,040 These are going to be the building blocks that we use throughout the rest of the course. 2348 04:12:45,680 --> 04:12:51,760 But before moving on to the next section, I'd encourage you to try out what you've learned 2349 04:12:51,760 --> 04:12:59,680 through the exercises and extra curriculum. Now, I've set up a few exercises here based off 2350 04:12:59,680 --> 04:13:05,120 everything that we've covered. If you go into learn pytorch.io, go to the section that we're 2351 04:13:05,120 --> 04:13:09,680 currently on. This is going to be the case for every section, by the way. So just keep this in mind, 2352 04:13:10,240 --> 04:13:15,360 is we're working on PyTorch fundamentals. Now, if you go to the PyTorch fundamentals notebook, 2353 04:13:15,360 --> 04:13:20,000 this is going to refresh, but that if you scroll down to the table of contents at the bottom of 2354 04:13:20,000 --> 04:13:26,000 each one is going to be some exercises and extra curriculum. 
So these exercises here, 2355 04:13:26,560 --> 04:13:31,440 such as documentation reading, because a lot you've seen me refer to the PyTorch documentation 2356 04:13:31,440 --> 04:13:36,720 for almost everything we've covered a lot, but it's important to become familiar with that. 2357 04:13:36,720 --> 04:13:42,000 So exercise number one is read some of the documentation. Exercise number two is create a 2358 04:13:42,000 --> 04:13:48,160 random tensor with shape, seven, seven. Three, perform a matrix multiplication on the tensor from two 2359 04:13:48,160 --> 04:13:53,520 with another random tensor. So these exercises are all based off what we've covered here. 2360 04:13:53,520 --> 04:13:59,680 So I'd encourage you to reference what we've covered in whichever notebook you choose, 2361 04:13:59,680 --> 04:14:04,240 could be this learn pytorch.io, could be going back through the one we've just coded together 2362 04:14:04,240 --> 04:14:15,440 in the video. So I'm going to link this here, exercises, see exercises for this notebook here. 2363 04:14:16,880 --> 04:14:23,680 So then how should you approach these exercises? So one way would be to just read them here, 2364 04:14:23,680 --> 04:14:32,240 and then in collab we'll go file new notebook, wait for the notebook to load. Then you could call this 2365 04:14:32,240 --> 04:14:40,240 zero zero pytorch exercises or something like that, and then you could start off by importing 2366 04:14:40,240 --> 04:14:46,080 torch, and then away you go. For me, I'd probably set this up on one side of the screen, this one 2367 04:14:46,080 --> 04:14:51,760 up on the other side of the screen, and then I just have the exercises here. So number one, 2368 04:14:51,760 --> 04:14:56,080 I'm not going to really write much code for that, but you could have documentation reading here. 2369 04:14:57,440 --> 04:15:02,800 And then so this encourages you to read through torch.tensor and go through there 2370 04:15:04,000 --> 04:15:08,720 for 10 minutes or so. And then for the other ones, we've got create a random tensor with shape 2371 04:15:08,720 --> 04:15:17,120 seven seven. So we just comment that out. So torch, round seven seven, and there we go. 2372 04:15:17,120 --> 04:15:22,160 Some are as easy as that. Some are a little bit more complex. As we go throughout the course, 2373 04:15:22,160 --> 04:15:25,520 these exercises are going to get a little bit more in depth as we've learned more. 2374 04:15:26,560 --> 04:15:32,480 But if you'd like an exercise template, you can come back to the GitHub. This is the home for all 2375 04:15:32,480 --> 04:15:38,880 of the course materials. You can go into extras and then exercises. I've created templates for 2376 04:15:38,880 --> 04:15:46,480 each of the exercises. So pytorch fundamentals exercises. If you open this up, this is a template 2377 04:15:46,480 --> 04:15:51,600 for all of the exercises. So you see there, create a random tensor with shape seven seven. 2378 04:15:51,600 --> 04:15:55,840 These are all just headings. And if you'd like to open this in CoLab and work on it, 2379 04:15:55,840 --> 04:16:02,400 how can you do that? Well, you can copy this link here. Come to Google CoLab. We'll go file, 2380 04:16:03,040 --> 04:16:11,600 open notebook, GitHub. You can type in the link there. Click search. What's this going to do? 2381 04:16:11,600 --> 04:16:17,360 Boom. Pytorch fundamentals exercises. So now you can go through all of the exercises. 
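(If you'd like a head start, the first couple of code exercises come out to one-liners along these lines; the shape of the second random tensor below is just an illustrative choice, so check the exercise template for the exact spec:)

```python
import torch

# 2. Create a random tensor with shape (7, 7)
tensor_a = torch.rand(7, 7)

# 3. Perform a matrix multiplication on it with another random tensor
tensor_b = torch.rand(7, 7)   # illustrative shape, not necessarily the one in the template
result = torch.matmul(tensor_a, tensor_b)
result.shape                  # torch.Size([7, 7])
```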
This 2382 04:16:17,360 --> 04:16:23,920 will be the same for every module on the course and test your knowledge. Now it is open book. You 2383 04:16:23,920 --> 04:16:30,800 can use the notebook here, the ones that we've coded together. But I would encourage you to try 2384 04:16:30,800 --> 04:16:35,520 to do these things on your own first. If you get stuck, you can always reference back. And then 2385 04:16:35,520 --> 04:16:41,280 if you'd like to see an example solutions, you can go back to the extras. There's a solutions folder 2386 04:16:41,280 --> 04:16:46,320 as well. And that's where the solutions live. So the fundamental exercise solutions. But again, 2387 04:16:46,320 --> 04:16:52,480 I would encourage you to try these out, at least give them a go before having a look at the solutions. 2388 04:16:53,360 --> 04:16:58,240 So just keep that in mind at the end of every module, there's exercises and extra curriculum. 2389 04:16:58,240 --> 04:17:03,360 The exercises will be code based. The extra curriculum is usually like reading based. 2390 04:17:03,360 --> 04:17:07,520 So spend one hour going through the Pytorch basics tutorial. I recommend the quick start 2391 04:17:07,520 --> 04:17:12,160 in tensor sections. And then finally to learn more on how a tensor can represent data, 2392 04:17:12,160 --> 04:17:17,520 watch the video what's a tensor which we referred to throughout this. But massive effort on finishing 2393 04:17:17,520 --> 04:17:29,760 the Pytorch fundamentals section. I'll see you in the next section. Friends, welcome back to 2394 04:17:31,760 --> 04:17:36,240 the Pytorch workflow module. Now let's have a look at what we're going to get into. 2395 04:17:36,240 --> 04:17:43,920 So this is a Pytorch workflow. And I say a because it's one of many. When you get into 2396 04:17:43,920 --> 04:17:47,360 deep learning machine learning, you'll find that there's a fair few ways to do things. But here's 2397 04:17:47,360 --> 04:17:51,760 the rough outline of what we're going to do. We're going to get our data ready and turn it into 2398 04:17:51,760 --> 04:17:56,800 tensors because remember a tensor can represent almost any kind of data. We're going to pick or 2399 04:17:56,800 --> 04:18:00,880 build or pick a pre-trained model. We'll pick a loss function and optimize it. Don't worry if 2400 04:18:00,880 --> 04:18:03,920 you don't know what they are. We're going to cover this. We're going to build a training loop, 2401 04:18:03,920 --> 04:18:09,200 fit the model to make a prediction. So fit the model to the data that we have. We'll learn how 2402 04:18:09,200 --> 04:18:14,400 to evaluate our models. We'll see how we can improve through experimentation and we'll save 2403 04:18:14,400 --> 04:18:19,280 and reload our trained model. So if you wanted to export your model from a notebook and use it 2404 04:18:19,280 --> 04:18:25,520 somewhere else, this is what you want to be doing. And so where can you get help? Probably the most 2405 04:18:25,520 --> 04:18:29,520 important thing is to follow along with the code. We'll be coding all of this together. 2406 04:18:29,520 --> 04:18:35,840 Remember model number one. If and out, run the code. Try it for yourself. That's how I learn best. 2407 04:18:35,840 --> 04:18:40,800 Is I write code? I try it. I get it wrong. I try again and keep going until I get it right. 2408 04:18:41,760 --> 04:18:46,080 Read the doc string because that's going to show you some documentation about the functions that 2409 04:18:46,080 --> 04:18:51,200 we're using. 
So on a Mac, you can use shift command and space in Google Colab or if you're on a Windows 2410 04:18:51,200 --> 04:18:56,800 PC, it might be control here. If you're still stuck, try searching for it. You'll probably come 2411 04:18:56,800 --> 04:19:01,440 across resources such as stack overflow or the PyTorch documentation. We've already seen this 2412 04:19:01,440 --> 04:19:05,760 a whole bunch and we're probably going to see it a lot more throughout this entire course actually 2413 04:19:05,760 --> 04:19:11,120 because that's going to be the ground truth of everything PyTorch. Try again. And finally, 2414 04:19:11,120 --> 04:19:15,760 if you're still stuck, ask a question. So the best place to ask a question will be 2415 04:19:15,760 --> 04:19:20,000 at the PyTorch deep learning slash discussions tab. And then if we go to GitHub, 2416 04:19:20,640 --> 04:19:25,280 that's just under here. So Mr. Deeburg PyTorch deep learning. This is all the course materials. 2417 04:19:25,280 --> 04:19:30,960 We see here, this is your ground truth for the entire course. And then if you have a question, 2418 04:19:30,960 --> 04:19:36,160 go to the discussions tab, new discussion, you can ask a question there. And don't forget to 2419 04:19:36,160 --> 04:19:41,120 please put the video and the code that you're trying to run. That way we can reference 2420 04:19:41,120 --> 04:19:47,440 what's going on and help you out there. And also, don't forget, there is the book version of the 2421 04:19:47,440 --> 04:19:52,480 course. So learn pytorch.io. By the time you watch this video, it'll probably have all the chapters 2422 04:19:52,480 --> 04:19:56,960 here. But here's what we're working through. This is what the videos are based on. All of this, 2423 04:19:56,960 --> 04:20:00,720 we're going to go through all of this. How fun is that? But this is just reference material. 2424 04:20:00,720 --> 04:20:06,880 So you can read this at your own time. We're going to focus on coding together. And speaking of coding. 2425 04:20:09,840 --> 04:20:12,160 Let's code. I'll see you over at Google Colab. 2426 04:20:14,080 --> 04:20:21,840 Oh, right. Well, let's get hands on with some code. I'm going to come over to colab.research.google.com. 2427 04:20:21,840 --> 04:20:28,000 You may already have that bookmark. And I'm going to start a new notebook. So we're going to do 2428 04:20:28,000 --> 04:20:33,760 everything from scratch here. We'll let this load up. I'm just going to zoom in a little bit. 2429 04:20:35,360 --> 04:20:44,160 Beautiful. And now I'm going to title this 01 pytorch workflow. And I'm going to put the video 2430 04:20:45,040 --> 04:20:50,080 ending on here so that you know that this notebook's from the video. Why is that? Because in the 2431 04:20:50,080 --> 04:20:54,480 course resources, we have the original notebook here, which is what this video notebook is going 2432 04:20:54,480 --> 04:20:59,440 to be based off. You can refer to this notebook as reference for what we're going to go through. 2433 04:20:59,440 --> 04:21:03,520 It's got a lot of pictures and beautiful text annotations. We're going to be focused on the 2434 04:21:03,520 --> 04:21:08,640 code in the videos. And then of course, you've got the book version of the notebook as well, 2435 04:21:08,640 --> 04:21:14,240 which is just a different formatted version of this exact same notebook. So I'm going to link 2436 04:21:14,240 --> 04:21:24,720 both of these up here. So let's write in here, pytorch workflow. 
And let's explore an example, 2437 04:21:25,520 --> 04:21:36,720 pytorch end to end workflow. And then I'm going to put the resources. So ground truth notebook. 2438 04:21:36,720 --> 04:21:44,400 We go here. And I'm also going to put the book version. 2439 04:21:44,400 --> 04:21:58,080 Book version of notebook. And finally, ask a question, which will be where at the discussions 2440 04:21:58,080 --> 04:22:04,560 page. Then we'll go there. Beautiful. Let's turn this into markdown. So let's get started. Let's 2441 04:22:04,560 --> 04:22:10,160 just jump right in and start what we're covering. So this is the trend I want to start getting 2442 04:22:10,160 --> 04:22:14,880 towards is rather than spending a whole bunch of time going through keynotes and slides, 2443 04:22:14,880 --> 04:22:19,920 I'd rather we just code together. And then we explain different things as they need to be 2444 04:22:19,920 --> 04:22:23,840 explained because that's what you're going to be doing if you end up writing a lot of pytorch is 2445 04:22:23,840 --> 04:22:29,440 you're going to be writing code and then looking things up as you go. So I'll get out of these 2446 04:22:29,440 --> 04:22:34,640 extra tabs. I don't think we need them. Just these two will be the most important. So what we're 2447 04:22:34,640 --> 04:22:38,560 covering, let's create a little dictionary so we can check this if we wanted to later on. 2448 04:22:39,200 --> 04:22:44,640 So referring to our pytorch workflows, at least the example one that we're going to go through, 2449 04:22:45,280 --> 04:22:51,840 which is just here. So we're going to go through all six of these steps, maybe a little bit of 2450 04:22:51,840 --> 04:22:57,360 each one, but just to see it going from this to this, that's what we're really focused on. And then 2451 04:22:57,360 --> 04:23:04,240 we're going to go through through rest the course like really dig deep into all of these. So what 2452 04:23:04,240 --> 04:23:10,320 we're covering number one is data preparing and loading. Number two is we're going to see how we 2453 04:23:10,320 --> 04:23:16,080 can build a machine learning model in pytorch or a deep learning model. And then we're going 2454 04:23:16,080 --> 04:23:23,120 to see how we're going to fit our model to the data. So this is called training. So fit is another 2455 04:23:23,120 --> 04:23:27,520 word. As I said in machine learning, there's a lot of different names for similar things, 2456 04:23:27,520 --> 04:23:32,880 kind of confusing, but you'll pick it up with time. So we're going to once we've trained a model, 2457 04:23:32,880 --> 04:23:37,440 we're going to see how we can make predictions and evaluate those predictions, 2458 04:23:37,440 --> 04:23:43,200 evaluating a model. If you make predictions, it's often referred to as inference. I typically 2459 04:23:43,200 --> 04:23:47,920 say making predictions, but inference is another very common term. And then we're going to look 2460 04:23:47,920 --> 04:23:54,480 at how we can save and load a model. And then we're going to put it all together. So a little bit 2461 04:23:54,480 --> 04:24:01,120 different from the visual version we have of the pytorch workflow. So if we go back to here, 2462 04:24:02,080 --> 04:24:08,480 I might zoom in a little. There we go. So we're going to focus on this one later on, 2463 04:24:08,480 --> 04:24:12,560 improve through experimentation. 
We're just going to focus on the getting data ready, 2464 04:24:12,560 --> 04:24:17,600 building a model, fitting the model, evaluating model, save and reload. So we'll see this one more, 2465 04:24:18,400 --> 04:24:21,680 like in depth later on, but I'll hint at different things that you can do 2466 04:24:21,680 --> 04:24:26,400 for this while we're working through this workflow. And so let's put that in here. 2467 04:24:26,960 --> 04:24:31,840 And then if we wanted to refer to this later, we can just go what we're covering. 2468 04:24:34,400 --> 04:24:39,360 Oh, this is going to connect, of course. Beautiful. So we can refer to this later on, 2469 04:24:39,360 --> 04:24:45,920 if we wanted to. And we're going to start by import torch. We're going to get pytorch ready 2470 04:24:45,920 --> 04:24:52,160 to go import nn. So I'll write a note here. And then we haven't seen this one before, but 2471 04:24:52,160 --> 04:24:56,240 we're going to see a few things that we haven't seen, but that's okay. We'll explain it as we go. 2472 04:24:56,240 --> 04:25:03,360 So nn contains all of pytorch's building blocks for neural networks. 2473 04:25:03,360 --> 04:25:10,160 And how would we learn more about torch nn? Well, if we just go torch.nn, here's how I'd 2474 04:25:10,160 --> 04:25:15,760 learn about it, pytorch documentation. Beautiful. Look at all these. These are the basic building 2475 04:25:15,760 --> 04:25:20,560 blocks for graphs. Now, when you see the word graph, it's referring to a computational graph, 2476 04:25:20,560 --> 04:25:24,320 which is in the case of neural networks, let's look up a photo of a neural network. 2477 04:25:24,320 --> 04:25:33,680 Images, this is a graph. So if you start from here, you're going to go towards the right. 2478 04:25:33,680 --> 04:25:38,560 There's going to be many different pictures. So yeah, this is a good one. Input layer. You have 2479 04:25:38,560 --> 04:25:45,360 a hidden layer, hidden layer to output layer. So torch and n comprises of a whole bunch of 2480 04:25:45,360 --> 04:25:50,720 different layers. So you can see layers, layers, layers. And each one of these, you can see input 2481 04:25:50,720 --> 04:25:57,040 layer, hidden layer one, hidden layer two. So it's our job as data scientists and machine 2482 04:25:57,040 --> 04:26:03,040 learning engineers to combine these torch dot nn building blocks to build things such as these. 2483 04:26:03,040 --> 04:26:08,800 Now, it might not be exactly like this, but that's the beauty of pytorch is that you can 2484 04:26:08,800 --> 04:26:13,440 combine these in almost any different way to build any kind of neural network you can imagine. 2485 04:26:14,640 --> 04:26:19,840 And so let's keep going. That's torch nn. We're going to get hands on with it, 2486 04:26:19,840 --> 04:26:24,880 rather than just talk about it. And we're going to need map plot lib because what's our other 2487 04:26:24,880 --> 04:26:31,200 motto? Our data explorers motto is visualize, visualize, visualize. And let's check our pytorch 2488 04:26:31,200 --> 04:26:37,840 version. Pytorch version torch dot version. So this is just to show you you'll need 2489 04:26:39,360 --> 04:26:46,240 at least this version. So 1.10 plus CUDA 111. That means that we've got CU stands for CUDA. 2490 04:26:46,240 --> 04:26:50,080 That means we've got access to CUDA. We don't have a GPU on this runtime yet, 2491 04:26:50,080 --> 04:26:54,640 because we haven't gone to GPU. We might do that later. 
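(As a sketch, the two cells being set up here look something like the following; the exact wording inside the dictionary is just one way to phrase the six steps:)

```python
# What we're covering in this module
what_were_covering = {
    1: "data (prepare and load)",
    2: "build model",
    3: "fitting the model to data (training)",
    4: "making predictions and evaluating a model (inference)",
    5: "saving and loading a model",
    6: "putting it all together",
}

import torch
from torch import nn          # nn contains all of PyTorch's building blocks for neural networks
import matplotlib.pyplot as plt

# Check the PyTorch version (anything from 1.10 upwards should work for this section)
torch.__version__             # e.g. '1.10.0+cu111' on the Colab runtime shown here
```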
2492 04:26:56,320 --> 04:27:02,880 So if you have a version that's lower than this, say 1.8, 0.0, you'll want pytorch 1.10 at least. 2493 04:27:02,880 --> 04:27:08,000 If you have a version higher than this, your code should still work. But that's about enough 2494 04:27:08,000 --> 04:27:13,280 for this video. We've got our workflow ready to set up our notebook, our video notebook. 2495 04:27:13,280 --> 04:27:17,360 We've got the resources. We've got what we're covering. We've got our dependencies. 2496 04:27:17,360 --> 04:27:24,240 Let's in the next one get started on one data, preparing and loading. 2497 04:27:26,320 --> 04:27:27,360 I'll see you in the next video. 2498 04:27:29,760 --> 04:27:36,880 Let's now get on to the first step of our pytorch workflow. And that is data, preparing and loading. 2499 04:27:36,880 --> 04:27:43,680 Now, I want to stress data can be almost anything in machine learning. 2500 04:27:44,640 --> 04:27:50,640 I mean, you could have an Excel spreadsheet, which is rows and columns, 2501 04:27:51,440 --> 04:27:58,800 nice and formatted data. You could have images of any kind. You could have videos. I mean, 2502 04:27:58,800 --> 04:28:09,360 YouTube has lots of data. You could have audio like songs or podcasts. You could have even DNA 2503 04:28:09,360 --> 04:28:14,640 these days. Patents and DNA are starting to get discovered by machine learning. And then, of course, 2504 04:28:14,640 --> 04:28:20,720 you could have text like what we're writing here. And so what we're going to be focusing on 2505 04:28:20,720 --> 04:28:26,640 throughout this entire course is the fact that machine learning is a game of two parts. 2506 04:28:26,640 --> 04:28:41,920 So one, get data into a numerical representation to build a model to learn patterns in that 2507 04:28:41,920 --> 04:28:47,200 numerical representation. Of course, there's more around it. Yes, yes, yes. I understand you can 2508 04:28:47,200 --> 04:28:52,560 get as complex as you like, but these are the main two concepts. And machine learning, when I say 2509 04:28:52,560 --> 04:28:59,280 machine learning, saying goes for deep learning, you need some kind of, oh, number a call. Number 2510 04:28:59,280 --> 04:29:04,000 a call. I like that word, number a call representation. Then you want to build a model to learn patterns 2511 04:29:04,000 --> 04:29:11,520 in that numerical representation. And if you want, I've got a nice pretty picture that describes that 2512 04:29:11,520 --> 04:29:16,400 machine learning a game of two parts. Let's refer to our data. Remember, data can be almost 2513 04:29:16,400 --> 04:29:22,000 anything. These are our inputs. So the first step that we want to do is create some form 2514 04:29:22,000 --> 04:29:28,240 of numerical encoding in the form of tenses to represent these inputs, how this looks will be 2515 04:29:28,880 --> 04:29:33,840 dependent on the data, depending on the numerical encoding you choose to use. Then we're going to 2516 04:29:33,840 --> 04:29:38,880 build some sort of neural network to learn a representation, which is also referred to as 2517 04:29:38,880 --> 04:29:45,200 patterns features or weights within that numerical encoding. It's going to output that 2518 04:29:45,200 --> 04:29:50,560 representation. And then we want to do something without representation, such as in the case of 2519 04:29:50,560 --> 04:29:55,760 this, we're doing image recognition, image classification, is it a photo of Raman or spaghetti? 
2520 04:29:55,760 --> 04:30:02,560 Is this tweet spam or not spam? Is this audio file saying what it says here? I'm not going to say 2521 04:30:02,560 --> 04:30:08,320 this because my audio assistant that's also named to this word here is close by and I don't want it 2522 04:30:08,320 --> 04:30:16,880 to go off. So this is our game of two parts. One here is convert our data into a numerical 2523 04:30:16,880 --> 04:30:23,040 representation. And two here is build a model or use a pre trained model to find patterns in 2524 04:30:23,040 --> 04:30:29,280 that numerical representation. And so we've got a little stationary picture here, turn data into 2525 04:30:29,280 --> 04:30:34,880 numbers, part two, build a model to learn patterns in numbers. So with that being said, 2526 04:30:34,880 --> 04:30:46,720 now let's create some data to showcase this. So to showcase this, let's create some known 2527 04:30:49,280 --> 04:30:56,000 data using the linear regression formula. Now, if you're not sure what linear regression is, 2528 04:30:56,000 --> 04:31:03,120 or the formula is, let's have a look linear regression formula. This is how I'd find it. 2529 04:31:03,120 --> 04:31:09,920 Okay, we have some fancy Greek letters here. But essentially, we have y equals a function of x 2530 04:31:09,920 --> 04:31:16,320 and b plus epsilon. Okay. Well, there we go. A linear regression line has the equation in the 2531 04:31:16,320 --> 04:31:20,960 form of y equals a plus bx. Oh, I like this one better. This is nice and simple. We're going to 2532 04:31:20,960 --> 04:31:27,760 start from as simple as possible and work up from there. So y equals a plus bx, where x is the 2533 04:31:27,760 --> 04:31:34,480 explanatory variable, and y is the dependent variable. The slope of the line is b. And the 2534 04:31:34,480 --> 04:31:41,440 slope is also known as the gradient. And a is the intercept. Okay, the value of when y 2535 04:31:42,160 --> 04:31:48,720 when x equals zero. Now, this is just text on a page. This is formula on a page. You know how I 2536 04:31:48,720 --> 04:31:59,520 like to learn things? Let's code it out. So let's write it here. We'll use a linear regression formula 2537 04:31:59,520 --> 04:32:08,400 to make a straight line with known parameters. I'm going to write this down because parameter 2538 04:32:10,160 --> 04:32:16,880 is a common word that you're going to hear in machine learning as well. So a parameter is 2539 04:32:16,880 --> 04:32:22,800 something that a model learns. So for our data set, if machine learning is a game of two parts, 2540 04:32:22,800 --> 04:32:27,760 we're going to start with this. Number one is going to be done for us, because we're going to 2541 04:32:27,760 --> 04:32:35,280 start with a known representation, a known data set. And then we want our model to learn that 2542 04:32:35,280 --> 04:32:40,000 representation. This is all just talk, Daniel, let's get into coding. Yes, you're right. You're 2543 04:32:40,000 --> 04:32:46,560 right. Let's do it. So create known parameters. So I'm going to use a little bit different 2544 04:32:46,560 --> 04:32:54,640 names to what that Google definition did. So weight is going to be 0.7 and bias is going to be 0.3. 2545 04:32:55,280 --> 04:33:00,720 Now weight and bias are another common two terms that you're going to hear in neural networks. 2546 04:33:01,440 --> 04:33:07,680 So just keep that in mind. But for us, this is going to be the equivalent of our weight will be B 2547 04:33:08,640 --> 04:33:15,120 and our bias will be A. 
But forget about this for the time being. Let's just focus on the code. 2548 04:33:15,120 --> 04:33:22,400 So we know these numbers. But we want to build a model that is able to estimate these numbers. 2549 04:33:23,600 --> 04:33:28,800 How? By looking at different examples. So let's create some data here. We're going to create a 2550 04:33:28,800 --> 04:33:34,640 range of numbers. Start equals zero and equals one. We're going to create some numbers between 2551 04:33:34,640 --> 04:33:41,040 zero and one. And they're going to have a gap. So the step the gap is going to be 0.02. 2552 04:33:41,040 --> 04:33:45,280 Now we're going to create an X variable. Why is X a capital here? 2553 04:33:47,200 --> 04:33:52,480 Well, it's because typically X in machine learning you'll find is a matrix or a tensor. 2554 04:33:52,480 --> 04:33:58,320 And if we remember back to the fundamentals, a capital represents a matrix or a tensor 2555 04:33:58,320 --> 04:34:03,040 and a lowercase represents a vector. But now case it's going to be a little confusing because 2556 04:34:03,040 --> 04:34:09,360 X is a vector. But later on, X will start to be a tensor and a matrix. So for now, 2557 04:34:09,360 --> 04:34:12,720 we'll just keep the capital, not capital notation. 2558 04:34:15,600 --> 04:34:24,160 We're going to create the formula here, which is remember how I said our weight is in this case, 2559 04:34:24,880 --> 04:34:31,920 the B and our bias is the A. So we've got the same formula here. Y equals weight times X plus 2560 04:34:31,920 --> 04:34:38,320 bias. Now let's have a look at these different numbers. So we'll view the first 10 of X and we'll 2561 04:34:38,320 --> 04:34:43,840 view the first 10 of Y. We'll have a look at the length of X and we'll have a look at the length of 2562 04:34:43,840 --> 04:34:55,280 Y. Wonderful. So we've got some values here. We've got 50 numbers of each. This is a little 2563 04:34:55,280 --> 04:34:59,600 confusing. Let's just view the first 10 of X and Y first. And then we can have a look at the 2564 04:34:59,600 --> 04:35:11,520 length here. So what we're going to be doing is building a model to learn some values, 2565 04:35:12,960 --> 04:35:20,640 to look at the X values here and learn what the associated Y value is and the relationship 2566 04:35:20,640 --> 04:35:25,760 between those. Of course, we know what the relationship is between X and Y because we've 2567 04:35:25,760 --> 04:35:32,400 coded this formula here. But you won't always know that in the wild. That is the whole premise of 2568 04:35:32,400 --> 04:35:38,160 machine learning. This is our ideal output and this is our input. The whole premise of machine 2569 04:35:38,160 --> 04:35:44,640 learning is to learn a representation of the input and how it maps to the output. So here are our 2570 04:35:44,640 --> 04:35:51,040 input numbers and these are our output numbers. And we know that the parameters of the weight and 2571 04:35:51,040 --> 04:35:55,760 bias are 0.7 and 0.3. We could have set these to whatever we want, by the way. I just like the 2572 04:35:55,760 --> 04:36:01,520 number 7 and 3. You could set these to 0.9, whatever, whatever. The premise would be the same. 2573 04:36:02,160 --> 04:36:06,080 So, oh, and what I've just done here, I kind of just coded this without talking. 2574 04:36:06,880 --> 04:36:14,160 But I just did torch a range and it starts at 0 and it ends at 1 and the step is 0.02. So there 2575 04:36:14,160 --> 04:36:22,640 we go, 000 by 0.02, 04. And I've unsqueezed it. 
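(Putting that cell together, it reads roughly as below; the unsqueeze dimension is my guess at what's typed, since only "unsqueezed" is said out loud:)

```python
import torch

# Create known parameters (the values a model should eventually estimate)
weight = 0.7
bias = 0.3

# Create a range of numbers between 0 and 1 with a gap of 0.02
start, end, step = 0, 1, 0.02
X = torch.arange(start, end, step).unsqueeze(dim=1)  # unsqueeze adds an extra dimension
y = weight * X + bias                                # the linear regression formula

X[:10], y[:10]     # view the first 10 values of each
len(X), len(y)     # 50 and 50
```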
So what does unsqueezed do? Removes the extra 2576 04:36:22,640 --> 04:36:29,120 dimensions. Oh, sorry, ads are extra dimension. You're getting confused here. So if we remove that, 2577 04:36:31,920 --> 04:36:37,120 we get no extra square bracket. But if we add unsqueeze, you'll see that we need this later on 2578 04:36:37,120 --> 04:36:42,960 for when we're doing models. Wonderful. So let's just leave it at that. That's enough for this 2579 04:36:42,960 --> 04:36:47,040 video, we've got some data to work with. Don't worry if this is a little bit confusing for now, 2580 04:36:47,040 --> 04:36:52,800 we're going to keep coding on and see what we can do to build a model to infer patterns in this 2581 04:36:52,800 --> 04:36:58,720 data. But right now, I want you to have a think, this is tensor data, but it's just numbers on a 2582 04:36:58,720 --> 04:37:05,440 page. What might be a better way to hint, this is a hint by the way, visualize it. What's our 2583 04:37:05,440 --> 04:37:12,800 data explorer's motto? Let's have a look at that in the next video. Welcome back. In the last 2584 04:37:12,800 --> 04:37:18,560 video, we created some numbers on a page using the linear regression formula with some known 2585 04:37:18,560 --> 04:37:22,880 parameters. Now, there's a lot going on here, but that's all right. We're going to keep building 2586 04:37:22,880 --> 04:37:28,160 upon what we've done and learn by doing. So in this video, we're going to cover one of the most 2587 04:37:28,160 --> 04:37:35,840 important concepts in machine learning in general. So splitting data into training and test sets. 2588 04:37:35,840 --> 04:37:45,200 One of the most important concepts in machine learning in general. Now, I know I've said this 2589 04:37:45,200 --> 04:37:52,880 already a few times. One of the most important concepts, but truly, this is possibly, in terms 2590 04:37:52,880 --> 04:37:58,240 of data, this is probably the number one thing that you need to be aware of. And if you've come 2591 04:37:58,240 --> 04:38:02,480 from a little bit of a machine learning background, you probably well and truly know all about this. 2592 04:38:02,480 --> 04:38:08,240 But we're going to recover it anyway. So let's jump in to some pretty pictures. Oh, look at that 2593 04:38:08,240 --> 04:38:13,120 one speaking of pretty pictures. But that's not what we're focused on now. We're looking at the 2594 04:38:13,120 --> 04:38:18,160 three data sets. And I've written down here possibly the most important concept in machine 2595 04:38:18,160 --> 04:38:23,840 learning, because it definitely is from a data perspective. So the course materials, 2596 04:38:24,560 --> 04:38:29,680 imagine you're at university. So this is going to be the training set. And then you have the 2597 04:38:29,680 --> 04:38:34,000 practice exam, which is the validation set. Then you have the final exam, which is the test set. 2598 04:38:34,640 --> 04:38:41,200 And the goal of all of this is for generalization. So let's step back. So say you're trying to learn 2599 04:38:41,200 --> 04:38:46,080 something at university or through this course, you might have all of the materials, which is your 2600 04:38:46,080 --> 04:38:54,640 training set. So this is where our model learns patterns from. And then to practice what you've 2601 04:38:54,640 --> 04:39:01,360 done, you might have a practice exam. So the mid semester exam or something like that. Now, 2602 04:39:01,360 --> 04:39:06,400 let's just see if you're learning the course materials well. 
So in the case of our model, 2603 04:39:06,400 --> 04:39:13,440 we might tune our model on this plastic exam. So we might find that on the validation set, 2604 04:39:14,000 --> 04:39:20,480 our model doesn't do too well. And we adjusted a bit, and then we retrain it, and then it does 2605 04:39:20,480 --> 04:39:27,120 better. Before finally, at the end of semester, the most important exam is your final exam. And 2606 04:39:27,120 --> 04:39:32,000 this is to see if you've gone through the entire course materials, and you've learned some things. 2607 04:39:32,000 --> 04:39:36,800 Now you can adapt to unseen material. And that's a big point here. We're going to see this in 2608 04:39:36,800 --> 04:39:44,000 practice is that when the model learns something on the course materials, it never sees the validation 2609 04:39:44,000 --> 04:39:51,520 set or the test set. So say we started with 100 data points, you might use 70 of those data points 2610 04:39:51,520 --> 04:39:57,600 for the training material. You might use 15% of those data points, so 15 for the practice. 2611 04:39:57,600 --> 04:40:03,440 And you might use 15 for the final exam. So this final exam is just like if you're at university 2612 04:40:03,440 --> 04:40:08,480 learning something is to see if, hey, have you learned any skills from this material at all? 2613 04:40:08,480 --> 04:40:15,120 Are you ready to go into the wild into the quote unquote real world? And so this final exam is to 2614 04:40:15,120 --> 04:40:22,640 test your model's generalization, because it's never seen this data is, let's define generalization 2615 04:40:22,640 --> 04:40:27,840 is the ability for a machine learning model or a deep learning model to perform well on data it 2616 04:40:27,840 --> 04:40:32,240 hasn't seen before, because that's our whole goal, right? We want to build a machine learning model 2617 04:40:32,240 --> 04:40:38,800 on some training data that we can deploy in our application or production setting. And then 2618 04:40:38,800 --> 04:40:44,320 more data comes in that it hasn't seen before. And it can make decisions based on that new data 2619 04:40:44,320 --> 04:40:48,480 because of the patterns it's learned in the training set. So just keep this in mind, 2620 04:40:48,480 --> 04:40:54,880 three data sets training validation test. And if we jump in to the learn pytorch book, 2621 04:40:54,880 --> 04:41:04,800 we've got split data. So we're going to create three sets. Or in our case, we're only going to 2622 04:41:04,800 --> 04:41:10,080 create two or training in a test. Why is that? Because you don't always need a validation set. 2623 04:41:10,720 --> 04:41:18,160 There is often a use case for a validation set. But the main two that are always used is the training 2624 04:41:18,160 --> 04:41:23,920 set and the testing set. And how much should you split? Well, usually for the training set, 2625 04:41:23,920 --> 04:41:27,920 you'll have 60 to 80% of your data. If you do create a validation set, you'll have somewhere 2626 04:41:27,920 --> 04:41:33,280 between 10 and 20. And if you do create a testing set, it's a similar split to the validation set, 2627 04:41:33,280 --> 04:41:40,080 you'll have between 10 and 20%. So training, always testing always validation often, but 2628 04:41:40,800 --> 04:41:46,320 not always. So with that being said, I'll let you refer to those materials if you want. But now 2629 04:41:46,320 --> 04:41:55,680 let's create a training and test set with our data. 
So we saw before that we have 50 points, 2630 04:41:55,680 --> 04:42:01,920 we have X and Y, we have one to one ratio. So one value of X relates to one value of Y. 2631 04:42:01,920 --> 04:42:08,880 And we know that the split now for the training set is 60 to 80%. And the test set is 10 to 20%. 2632 04:42:09,600 --> 04:42:14,640 So let's go with the upper bounds of each of these, 80% and 20%, which is a very common split, 2633 04:42:14,640 --> 04:42:25,120 actually 80, 20. So let's go create a train test split. And we're going to go train split. 2634 04:42:25,760 --> 04:42:32,880 We'll create a number here so we can see how much. So we want an integer of 0.8, which is 80% 2635 04:42:32,880 --> 04:42:39,280 of the length of X. What does that give us? Train split should be about 40 samples. Wonderful. 2636 04:42:39,280 --> 04:42:46,240 So we're going to create 40 samples of X and 40 samples of Y. Our model will train on those 40 2637 04:42:46,240 --> 04:42:54,080 samples to predict what? The other 10 samples. So let's see this in practice. So X train, 2638 04:42:55,280 --> 04:43:03,680 Y train equals X. And we're going to use indexing to get all of the samples up until the train 2639 04:43:03,680 --> 04:43:10,560 split. That's what this colon does here. So hey, X up until the train split, Y up until the train 2640 04:43:10,560 --> 04:43:17,120 split, and then for the testing. Oh, thanks for that. Auto correct cola, but didn't actually need that 2641 04:43:17,120 --> 04:43:25,520 one. X test. Y test equals X. And then we're going to get everything from the train split onwards. 2642 04:43:25,520 --> 04:43:36,280 So the index onwards, that's what this notation means here. And Y from the train split onwards as 2643 04:43:36,280 --> 04:43:43,520 well. Now, there are many different ways to create a train and test split. Ours is quite simple here, 2644 04:43:43,520 --> 04:43:48,640 but that's because we're working with quite a simple data set. One of the most popular methods 2645 04:43:48,640 --> 04:43:53,880 that I like is scikit learns train test split. We're going to see this one later on. It adds a 2646 04:43:53,880 --> 04:43:59,960 little bit of randomness into splitting your data. But that's for another video, just to make you 2647 04:43:59,960 --> 04:44:09,400 aware of it. So let's go length X train. We should have 40 training samples to 2648 04:44:09,400 --> 04:44:24,280 how many testing samples length X test and length Y test. Wonderful 40 40 10 10 because we have 2649 04:44:24,280 --> 04:44:31,160 training features, training labels, testing features, testing labels. So essentially what we've 2650 04:44:31,160 --> 04:44:37,560 created here is now a training set. We've split our data. Training set could also be referred to 2651 04:44:37,560 --> 04:44:43,160 as training split yet another example of where machine learning has different names for different 2652 04:44:43,160 --> 04:44:48,920 things. So set split same thing training split test split. This is what we've created. Remember, 2653 04:44:48,920 --> 04:44:53,880 the validation set is used often, but not always because our data set is quite simple. We're just 2654 04:44:53,880 --> 04:44:59,960 sticking with the necessities training and test. But keep this in mind. One of your biggest, 2655 04:45:00,520 --> 04:45:06,360 biggest, biggest hurdles in machine learning will be creating proper training and test sets. So 2656 04:45:06,360 --> 04:45:11,560 it's a very important concept. 
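(The split as dictated, in one cell:)

```python
# Create a train/test split: 80% of the samples for training, 20% for testing
train_split = int(0.8 * len(X))   # 40 of the 50 samples

X_train, y_train = X[:train_split], y[:train_split]   # everything up to the split index
X_test, y_test = X[train_split:], y[train_split:]     # everything from the split index onwards

len(X_train), len(y_train), len(X_test), len(y_test)  # (40, 40, 10, 10)
```

(scikit-learn's train_test_split, mentioned above, adds shuffling on top of this, but for ordered straight-line data a plain slice does the job.)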
With that being said, I did issue the challenge in the last video 2657 04:45:11,560 --> 04:45:17,000 to visualize these numbers on a page. We haven't done that in this video. So let's move towards 2658 04:45:17,000 --> 04:45:22,920 that next. I'd like you to think of how could you make these more visual? Right. These are just 2659 04:45:22,920 --> 04:45:33,880 numbers on a page right now. Maybe that plot lib can help. Let's find out. Hey, hey, hey, welcome 2660 04:45:33,880 --> 04:45:40,360 back. In the last video, we split our data into training and test sets. And now later on, 2661 04:45:40,360 --> 04:45:44,680 we're going to be building a model to learn patterns in the training data to relate to the 2662 04:45:44,680 --> 04:45:50,120 testing data. But as I said, right now, our data is just numbers on a page. It's kind of 2663 04:45:50,120 --> 04:45:54,760 hard to understand. You might be able to understand this, but I prefer to get visual. So let's write 2664 04:45:54,760 --> 04:46:04,280 this down. How might we better visualize our data? And I'm put a capital here. So we're grammatically 2665 04:46:04,280 --> 04:46:17,960 correct. And this is where the data Explorers motto comes in. Visualize, visualize, visualize. 2666 04:46:18,680 --> 04:46:23,640 Ha ha. Right. So if ever you don't understand a concept, one of the best ways to start 2667 04:46:23,640 --> 04:46:29,400 understanding it more for me is to visualize it. So let's write a function to do just that. 2668 04:46:29,400 --> 04:46:34,600 We're going to call this plot predictions. We'll see why we call it this later on. That's the 2669 04:46:34,600 --> 04:46:39,080 benefit of making these videos is that I've got a plan for the future. Although it might seem 2670 04:46:39,080 --> 04:46:43,000 like I'm winging it, there is a little bit of behind the scenes happening here. So we'll have 2671 04:46:43,000 --> 04:46:51,000 the train data, which is our X train. And then we'll have the train labels, which is our Y train. 2672 04:46:51,000 --> 04:46:58,200 And we'll also have the test data. Yeah, that's a good idea. X test. And we'll also have the test 2673 04:46:58,200 --> 04:47:05,960 labels, equals Y test. Excuse me. I was looking at too many X's there. And then the predictions. 2674 04:47:05,960 --> 04:47:11,400 And we'll set this to none, because we don't have any predictions yet. But as you might have guessed, 2675 04:47:11,400 --> 04:47:16,440 we might have some later on. So we'll put a little doc string here, so that we're being nice and 2676 04:47:16,440 --> 04:47:26,120 Pythonic. So plots training data, test data, and compares predictions. Nice and simple. 2677 04:47:28,120 --> 04:47:33,880 Nothing too outlandish. And then we're going to create a figure. This is where map plot lib comes 2678 04:47:33,880 --> 04:47:41,640 in. Plot figure. And we'll go fig size equals 10, seven, which is my favorite hand in poker. 2679 04:47:41,640 --> 04:47:46,920 And we'll plot the training data in blue also happens to be a good dimension for a map plot. 2680 04:47:47,880 --> 04:47:54,760 Plot dot scatter. Train data. Creating a scatter plot here. We'll see what it does in a second. 2681 04:47:55,560 --> 04:48:00,840 Color. We're going to give this a color of B for blue. That's what C stands for in map plot lib 2682 04:48:00,840 --> 04:48:09,480 scatter. We'll go size equals four and label equals training data. Now, where could you find 2683 04:48:09,480 --> 04:48:14,440 information about this scatter function here? 
We've got command shift space. Is that going to 2684 04:48:14,440 --> 04:48:19,160 give us a little bit of a doc string? Or sometimes if command not space is not working, 2685 04:48:19,720 --> 04:48:24,040 you can also hover over this bracket. I think you can even hover over this. 2686 04:48:26,280 --> 04:48:32,760 There we go. But this is a little hard for me to read. Like it's there, but it's got a lot going 2687 04:48:32,760 --> 04:48:46,840 on. X, Y, S, C, C map. I just like to go map plot lib scatter. There we go. We've got a whole 2688 04:48:46,840 --> 04:48:52,040 bunch of information there. A little bit easier to read for me here. And then you can see some 2689 04:48:52,040 --> 04:48:58,680 examples. Beautiful. So now let's jump back into here. So in our function plot predictions, 2690 04:48:58,680 --> 04:49:03,720 we've taken some training data, test data. We've got the training data plotting in blue. What 2691 04:49:03,720 --> 04:49:10,200 color should we use for the testing data? How about green? I like that idea. Plot.scatter. 2692 04:49:10,840 --> 04:49:17,720 Test data. Green's my favorite color. What's your favorite color? C equals G. You might be 2693 04:49:17,720 --> 04:49:22,200 able to just plot it in your favorite color here. Just remember though, it'll be a little bit 2694 04:49:22,200 --> 04:49:26,840 different from the videos. And then we're going to call this testing data. So just the exact same 2695 04:49:26,840 --> 04:49:33,560 line is above, but with a different set of data. Now, let's check if there are predictions. So 2696 04:49:33,560 --> 04:49:44,120 are there predictions? So if predictions is not none, let's plot the predictions, plot the 2697 04:49:44,120 --> 04:49:58,840 predictions, if they exist. So plot scatter test data. And why are we plotting the test data? 2698 04:49:58,840 --> 04:50:03,960 Remember, what is our scatter function? Let's go back up to here. It takes in x and y. So 2699 04:50:04,920 --> 04:50:10,200 our predictions are going to be compared to the testing data labels. So that's the whole 2700 04:50:10,200 --> 04:50:14,680 game that we're playing here. We're going to train our model on the training data. 2701 04:50:15,320 --> 04:50:19,400 And then to evaluate it, we're going to get our model to predict the y values 2702 04:50:20,280 --> 04:50:28,120 as with the input of x test. And then to evaluate our model, we compare how good our models 2703 04:50:28,120 --> 04:50:35,320 predictions are. In other words, predictions versus the actual values of the test data set. 2704 04:50:35,320 --> 04:50:42,280 But we're going to see this in practice. Rather than just talk about it. So let's do our predictions 2705 04:50:42,280 --> 04:50:55,320 in red. And label equals predictions. Wonderful. So let's also show the legend, because, I mean, 2706 04:50:55,320 --> 04:51:01,320 we're legends. So we could just put in a mirror here. Now I'm kidding. Legend is going to show 2707 04:51:01,320 --> 04:51:10,760 our labels on the map plot. So prop equals size and prop stands for properties. Well, 2708 04:51:11,640 --> 04:51:16,040 it may or may not. I just like to think it does. That's how I remember it. So we have a beautiful 2709 04:51:16,040 --> 04:51:24,040 function here to plot our data. Should we try it out? Remember, we've got hard coded inputs here, 2710 04:51:24,040 --> 04:51:28,360 so we don't actually need to input anything to our function. We've got our train and test data 2711 04:51:28,360 --> 04:51:32,840 ready to go. 
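(For reference, the function as dictated comes out to roughly the following; the legend font size is a guess, since only prop and size are mentioned, and the defaults assume the X_train/y_train/X_test/y_test tensors from the split above:)

```python
import matplotlib.pyplot as plt

def plot_predictions(train_data=X_train, train_labels=y_train,
                     test_data=X_test, test_labels=y_test,
                     predictions=None):
    """Plots training data, test data and compares predictions."""
    plt.figure(figsize=(10, 7))

    # Plot training data in blue
    plt.scatter(train_data, train_labels, c="b", s=4, label="Training data")

    # Plot test data in green
    plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data")

    if predictions is not None:
        # Plot the predictions in red (predictions are made on the test data)
        plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")

    # Show the legend
    plt.legend(prop={"size": 14})
```

(Calling plot_predictions() with no arguments then uses those hard-coded defaults.)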
If in doubt, run the code, let's check it out. Did we make a mistake in our plot 2712 04:51:32,840 --> 04:51:40,840 predictions function? You might have caught it. Hey, there we go. Beautiful. So because we don't 2713 04:51:40,840 --> 04:51:46,120 have any predictions, we get no red dots. But this is what we're trying to do. We've got a simple 2714 04:51:46,120 --> 04:51:51,000 straight line. You can't get a much more simple data set than that. So we've got our training data 2715 04:51:51,000 --> 04:51:56,440 in blue, and we've got our testing data in green. So the whole idea of what we're going to be doing 2716 04:51:56,440 --> 04:52:00,520 with our machine learning model is we don't actually really need to build a machine learning 2717 04:52:00,520 --> 04:52:05,960 model for this. We could do other things, but machine learning is fun. So we're going to take 2718 04:52:05,960 --> 04:52:11,160 in the blue dots. There's quite a pattern here, right? This is the relationship we have an x value 2719 04:52:11,160 --> 04:52:17,720 here, and we have a y value. So we're going to build a model to try and learn the pattern 2720 04:52:17,720 --> 04:52:25,160 of these blue dots, so that if we fed our model, our model, the x values of the green dots, 2721 04:52:25,160 --> 04:52:29,560 could it predict the appropriate y values for that? Because remember, these are the test data set. 2722 04:52:29,560 --> 04:52:37,400 So pass our model x test to predict y test. So blue dots as input, green dots as the ideal output. 2723 04:52:37,400 --> 04:52:42,360 This is the ideal output, a perfect model would have red dots over the top of the green dots. So 2724 04:52:42,360 --> 04:52:47,640 that's what we will try to work towards. Now, we know the relationship between x and y. 2725 04:52:48,200 --> 04:52:53,160 How do we know that? Well, we set that up above here. This is our weight and bias. 2726 04:52:53,160 --> 04:52:59,560 We created that line y equals weight times x plus bias, which is the simple version of the 2727 04:52:59,560 --> 04:53:05,080 linear regression formula. So mx plus c, you might have heard that in high school algebra, 2728 04:53:05,080 --> 04:53:11,720 so gradient plus intercept. That's what we've got. With that being said, 2729 04:53:11,720 --> 04:53:16,920 let's move on to the next video and build a model. Well, this is exciting. I'll see you there. 2730 04:53:16,920 --> 04:53:24,760 Welcome back. In the last video, we saw how to get visual with our data. We followed the data 2731 04:53:24,760 --> 04:53:31,080 explorer's motto of visualize, visualize, visualize. And we've got an idea of the training data that 2732 04:53:31,080 --> 04:53:36,760 we're working with and the testing data that we're trying to build a model to learn the patterns 2733 04:53:36,760 --> 04:53:44,200 in the training data, essentially this upwards trend here, to be able to predict the testing data. 2734 04:53:44,200 --> 04:53:49,560 So I just want to give you another heads up. I took a little break after the recording last 2735 04:53:49,560 --> 04:53:54,760 video. And so now my colab notebook has disconnected. So I'm going to click reconnect. 2736 04:53:55,480 --> 04:54:02,920 And my variables here may not work. So this is what might happen on your end. If you take a break 2737 04:54:02,920 --> 04:54:08,200 from using Google Colab and come back, if I try to run this function, they might have been saved, 2738 04:54:08,200 --> 04:54:14,600 it looks like they have. But if not, you can go restart and run all. 
This is typically one of the 2739 04:54:14,600 --> 04:54:23,240 most helpful troubleshooting steps of using Google Colab. If a cell, say down here isn't working, 2740 04:54:23,240 --> 04:54:32,520 you can always rerun the cells above. And that may help with a lower cell here, such as if this 2741 04:54:32,520 --> 04:54:38,600 function wasn't instantiated because this cell wasn't run, and we couldn't run this cell here, 2742 04:54:38,600 --> 04:54:43,320 which calls this function here, we just have to rerun this cell above so that we can run this one. 2743 04:54:43,960 --> 04:54:51,960 But now let's get into building our first PyTorch model. We're going to jump straight into the code. 2744 04:54:51,960 --> 04:54:58,760 So our first PyTorch model. Now this is very exciting. 2745 04:54:58,760 --> 04:55:09,480 Let's do it. So we'll turn this into Markdown. Now we're going to create a linear regression model. 2746 04:55:09,480 --> 04:55:15,720 So look at linear regression formula again, we're going to create a model that's essentially going 2747 04:55:15,720 --> 04:55:23,480 to run this computation. So we need to create a model that has a parameter for A, a parameter for B, 2748 04:55:23,480 --> 04:55:29,640 and in our case it's going to be weight and bias, and a way to do this forward computation. 2749 04:55:29,640 --> 04:55:36,040 What I mean by that, we're going to see with code. So let's do it. We'll do it with pure PyTorch. 2750 04:55:36,040 --> 04:55:44,040 So create a linear regression model class. Now if you're not experienced with using Python classes, 2751 04:55:44,040 --> 04:55:49,240 I'm going to be using them throughout the course, and I'm going to call this one linear regression 2752 04:55:49,240 --> 04:55:56,440 model. If you haven't dealt with Python classes before, that's okay. I'm going to be explaining 2753 04:55:56,440 --> 04:56:03,000 what we're doing as we're doing it. But if you'd like a deeper dive, I'd recommend you to real Python 2754 04:56:04,120 --> 04:56:12,520 classes. OOP in Python three. That's a good rhyming. So I'm just going to link this here. 2755 04:56:12,520 --> 04:56:23,400 Because we're going to be building classes throughout the course, 2756 04:56:23,400 --> 04:56:31,480 I'd recommend getting familiar with OOP, which is object oriented programming, a little bit of a 2757 04:56:31,480 --> 04:56:43,880 mouthful, hence the OOP in Python. To do so, you can use the following resource from real Python. 2758 04:56:43,880 --> 04:56:48,280 But when I'm going to go through that now, I'd rather just code it out and talk it out while we 2759 04:56:48,280 --> 04:56:53,080 do it. So we've got a class here. Now the first thing you might notice is that the class inherits 2760 04:56:53,080 --> 04:57:00,040 from nn.module. And you might be wondering, well, what's nn.module? Well, let's write down here, 2761 04:57:00,040 --> 04:57:12,840 almost everything in PyTorch inherits from nn.module. So you can imagine nn.module as the 2762 04:57:12,840 --> 04:57:20,520 Lego building bricks of PyTorch model. And so nn.module has a lot of helpful inbuilt things that's 2763 04:57:20,520 --> 04:57:25,000 going to help us build our PyTorch models. And of course, how could you learn more about it? 2764 04:57:25,000 --> 04:57:33,080 Well, you could go nn.module, PyTorch. Module. Here we go. Base class for all neural network 2765 04:57:33,080 --> 04:57:38,760 modules. Wonderful. Your models should also subclass this class. So that's what we're building. 
We're 2766 04:57:38,760 --> 04:57:44,120 building our own PyTorch model. And so the documentation here says that your models should 2767 04:57:44,120 --> 04:57:49,800 also subclass this class. And another thing with PyTorch, this is what makes it, it might seem very 2768 04:57:49,800 --> 04:57:56,360 confusing when you first begin. But modules can contain other modules. So what I mean by being a 2769 04:57:56,360 --> 04:58:01,880 Lego brick is that you can stack these modules on top of each other and make progressively more 2770 04:58:01,880 --> 04:58:08,360 complex neural networks as you go. But we'll leave that for later on. For now, we're going to start 2771 04:58:08,360 --> 04:58:15,640 with something nice and simple. And let's clean up our web browser. So we're going to create a 2772 04:58:15,640 --> 04:58:23,720 constructor here, which is with the init function. It's going to take self as a parameter. If you're 2773 04:58:23,720 --> 04:58:29,000 not sure of what's going on here, just follow along with the code for now. And I'd encourage you 2774 04:58:29,000 --> 04:58:37,640 to read this documentation here after the video. So then we have super dot init. I know when I 2775 04:58:37,640 --> 04:58:40,920 first started learning this, I was like, why do we have to write a knit twice? And then what's 2776 04:58:40,920 --> 04:58:47,640 super and all that jazz. But just for now, just take this as being some required Python syntax. 2777 04:58:48,280 --> 04:58:54,040 And then we have self dot weights. So that means we're going to create a weights parameter. We'll 2778 04:58:54,040 --> 04:58:59,720 see why we do this in a second. And to create that parameter, we're going to use nn dot parameter. 2779 04:59:00,280 --> 04:59:08,280 And just a quick reminder that we imported nn from torch before. And if you remember, 2780 04:59:08,280 --> 04:59:15,880 nn is the building block layer for neural networks. And within nn, so nn stands for neural network 2781 04:59:15,880 --> 04:59:24,520 is module. So we've got nn dot parameter. Now, we're going to start with random parameters. 2782 04:59:25,240 --> 04:59:32,120 So torch dot rand n. One, we're going to talk through each of these in a second. So I'm also 2783 04:59:32,120 --> 04:59:39,560 going to put requires, requires grad equals true. We haven't touched any of these, but that's okay. 2784 04:59:40,120 --> 04:59:50,360 D type equals torch dot float. So let's see what nn parameter tells us. What do we have here? 2785 04:59:53,080 --> 04:59:58,440 A kind of tensor that is to be considered a module parameter. So we've just created a module 2786 04:59:58,440 --> 05:00:04,280 using nn module. Parameters are torch tensor subclasses. So this is a tensor in itself 2787 05:00:05,000 --> 05:00:09,480 that have a very special property when used with modules. When they're assigned as a module 2788 05:00:09,480 --> 05:00:14,760 attribute, they are automatically added to the list of its parameters. And we'll appear e g 2789 05:00:14,760 --> 05:00:20,440 in module dot parameters iterator. Oh, we're going to see that later on. Assigning a tensor 2790 05:00:20,440 --> 05:00:28,040 doesn't have such effect. So we're creating a parameter here. Now requires grad. What does that 2791 05:00:28,040 --> 05:00:32,680 mean? Well, let's just rather than just try to read the doc string collab, let's look it up. 2792 05:00:32,680 --> 05:00:42,600 nn dot parameter. What does it say requires grad optional. If the parameter requires gradient. 2793 05:00:43,400 --> 05:00:51,160 Hmm. 
What does requires gradient mean? Well, let's come back to that in a second. And then 2794 05:00:51,160 --> 05:00:56,680 for now, I just want you to think about it. D type equals torch dot float. Now, 2795 05:00:56,680 --> 05:01:02,920 the data type here torch dot float is, as we've discussed before, is the default 2796 05:01:02,920 --> 05:01:08,360 for pytorch to watch dot float. This could also be torch dot float 32. So we're just going to 2797 05:01:08,360 --> 05:01:14,920 leave it as torch float 32, because pytorch likes to work with flight 32. Now, do we have 2798 05:01:17,160 --> 05:01:24,280 this by default? We do. So we don't necessarily have to set requires grad equals true. So just 2799 05:01:24,280 --> 05:01:33,000 keep that in mind. So now we've created a parameter for the weights. We also have to create a parameter 2800 05:01:33,000 --> 05:01:41,080 for the bias. Let's finish creating this. And then we'll write the code, then we'll talk about it. 2801 05:01:41,080 --> 05:01:52,120 So rand n. Now requires grad equals true. And d type equals torch dot float. There we go. 2802 05:01:52,120 --> 05:02:01,000 And now we're going to write a forward method. So forward method to define the computation 2803 05:02:02,040 --> 05:02:14,520 in the model. So let's go def forward, which self takes in a parameter x, which is data, 2804 05:02:14,520 --> 05:02:23,720 which X is expected to be of type torch tensor. And it returns a torch dot tensor. And then we go 2805 05:02:23,720 --> 05:02:28,760 here. And so we say X, we don't necessarily need this comment. I'm just going to write it anyway. 2806 05:02:28,760 --> 05:02:36,280 X is the input data. So in our case, it might be the training data. And then from here, we want 2807 05:02:36,280 --> 05:02:46,440 it to return self dot weights times X plus self dot bias. Now, where have we seen this before? 2808 05:02:47,480 --> 05:02:56,440 Well, this is the linear regression formula. Now, let's take a step back into how we created our data. 2809 05:02:56,440 --> 05:02:59,560 And then we'll go back through and talk a little bit more about what's going on here. 2810 05:02:59,560 --> 05:03:08,680 So if we go back up to our data, where did we create that? We created it here. So you see how 2811 05:03:08,680 --> 05:03:16,520 we've created known parameters, weight and bias. And then we created our y variable, our target, 2812 05:03:16,520 --> 05:03:23,320 using the linear regression formula, wait times X plus bias, and X were a range of numbers. 2813 05:03:23,320 --> 05:03:29,560 So what we've done with our linear regression model that we've created from scratch, 2814 05:03:29,560 --> 05:03:37,880 if we go down here, we've created a parameter, weights. This could just be weight, if we wanted to. 2815 05:03:38,440 --> 05:03:44,840 We've created a parameter here. So when we created our data, we knew what the parameters weight and 2816 05:03:44,840 --> 05:03:52,200 bias were. The whole goal of our model is to start with random numbers. So these are going to be 2817 05:03:52,200 --> 05:03:58,440 random parameters. And to look at the data, which in our case will be the training samples, 2818 05:03:59,160 --> 05:04:07,400 and update those random numbers to represent the pattern here. So ideally, our model, if it's 2819 05:04:07,400 --> 05:04:13,800 learning correctly, will take our weight, which is going to be a random value, and our bias, 2820 05:04:13,800 --> 05:04:18,120 which is going to be a random value. 
And it will run it through this forward calculation, 2821 05:04:18,120 --> 05:04:25,720 which is the same formula that we use to create our data. And it will adjust the weight and bias 2822 05:04:25,720 --> 05:04:34,520 to represent as close as possible, if not perfect, the known parameters. So that's the premise of 2823 05:04:34,520 --> 05:04:41,960 machine learning. And how does it do this? Through an algorithm called gradient descent. So I'm just 2824 05:04:41,960 --> 05:04:46,360 going to write this down because we've talked a lot about this, but I'd like to just tie it together 2825 05:04:46,360 --> 05:05:01,880 here. So what our model does, so start with random values, weight and bias, look at training data, 2826 05:05:01,880 --> 05:05:21,480 and adjust the random values to better represent the, or get closer to the ideal values. So the 2827 05:05:21,480 --> 05:05:33,240 weight and bias values we use to create the data. So that's what it's going to do. It's going to 2828 05:05:33,240 --> 05:05:39,000 start with random values, and then continually look at our training data to see if it can adjust 2829 05:05:39,000 --> 05:05:46,120 those random values to be what would represent this straight line here. Now, how does it do so? 2830 05:05:46,120 --> 05:06:01,240 How does it do so? Through two main algorithms. So one is gradient descent, and two is back 2831 05:06:01,240 --> 05:06:12,120 propagation. So I'm going to leave it here for the time being, but we're going to continue talking 2832 05:06:12,120 --> 05:06:21,480 about this gradient descent is why we have requires grad equals true. And so what this is going to 2833 05:06:21,480 --> 05:06:28,680 do is when we run computations using this model here, pytorch is going to keep track of the gradients 2834 05:06:28,680 --> 05:06:36,280 of our weights parameter and our bias parameter. And then it's going to update them through a 2835 05:06:36,280 --> 05:06:43,000 combination of gradient descent and back propagation. Now, I'm going to leave this as extracurricular 2836 05:06:43,000 --> 05:06:46,120 for you to look through and gradient descent and back propagation. I'm going to add some 2837 05:06:46,120 --> 05:06:51,240 resources here. There will also be plenty of resources in the pytorch workflow fundamentals 2838 05:06:51,240 --> 05:06:57,000 book chapter on how these algorithms work behind the scenes. We're going to be focused on the code, 2839 05:06:57,000 --> 05:07:02,280 the pytorch code, to trigger these algorithms behind the scenes. So pytorch, lucky for us, 2840 05:07:02,280 --> 05:07:09,080 has implemented gradient descent and back propagation for us. So we're writing the higher level code 2841 05:07:09,080 --> 05:07:14,040 here to trigger these two algorithms. So in the next video, we're going to step through this a 2842 05:07:14,040 --> 05:07:21,560 little bit more, and then further discuss some of the most useful and required modules of pytorch, 2843 05:07:21,560 --> 05:07:27,560 particularly an N and a couple of others. So let's leave it there, and I'll see you in the next video. 2844 05:07:27,560 --> 05:07:35,240 Welcome back. In the last video, we covered a whole bunch in creating our first pytorch model 2845 05:07:35,240 --> 05:07:40,040 that inherits from nn.module. We talked about object oriented programming and how a lot of 2846 05:07:40,040 --> 05:07:45,400 pytorch uses object oriented programming. I can't say that. I might just say OOP for now. 
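Put together, the class dictated over the last few videos looks roughly like this (a sketch of what ends up in the notebook cell; the comments simply restate the reasoning above):

```python
import torch
from torch import nn

# Create a linear regression model class
class LinearRegressionModel(nn.Module):  # almost everything in PyTorch inherits from nn.Module
    def __init__(self):
        super().__init__()
        # Start with random values for the weight and bias, and ask PyTorch to track
        # their gradients (requires_grad=True) so they can be updated later via
        # gradient descent + backpropagation
        self.weights = nn.Parameter(torch.randn(1,
                                                requires_grad=True,
                                                dtype=torch.float))
        self.bias = nn.Parameter(torch.randn(1,
                                             requires_grad=True,
                                             dtype=torch.float))

    # Forward method defines the computation in the model
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is the input data (e.g. the training features)
        return self.weights * x + self.bias  # the linear regression formula
```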
2847 05:07:45,400 --> 05:07:51,480 What I've done since last video, though, is I've added two resources here for gradient descent 2848 05:07:51,480 --> 05:07:57,880 and back propagation. These are two of my favorite videos on YouTube by the channel three blue 2849 05:07:57,880 --> 05:08:02,280 one brown. So this is on gradient descent. I would highly recommend watching this entire series, 2850 05:08:02,280 --> 05:08:08,360 by the way. So that's your extra curriculum for this video, in particular, and for this course overall 2851 05:08:08,360 --> 05:08:13,240 is to go through these two videos. Even if you're not sure entirely what's happening, 2852 05:08:13,240 --> 05:08:17,960 you will gain an intuition for the code that we're going to be writing with pytorch. 2853 05:08:17,960 --> 05:08:23,480 So just keep that in mind as we go forward, a lot of what pytorch is doing behind the scenes for us 2854 05:08:23,480 --> 05:08:32,520 is taking care of these two algorithms for us. And we also created two parameters here in our model 2855 05:08:32,520 --> 05:08:39,720 where we've instantiated them as random values. So one parameter for each of the ones that we use, 2856 05:08:39,720 --> 05:08:44,680 the weight and bias for our data set. And now I want you to keep in mind that we're working 2857 05:08:44,680 --> 05:08:50,440 with a simple data set here. So we've created our known parameters. But in a data set that you 2858 05:08:50,440 --> 05:08:54,760 haven't created by yourself, you've maybe gathered that from the internet, such as images, 2859 05:08:55,560 --> 05:09:02,840 you won't be necessarily defining these parameters. Instead, another module from nn will define the 2860 05:09:02,840 --> 05:09:10,760 parameters for you. And we'll work out what those parameters should end up being. But since we're 2861 05:09:10,760 --> 05:09:16,760 working with a simple data set, we can define our two parameters that we're trying to estimate. 2862 05:09:16,760 --> 05:09:21,720 A key point here is that our model is going to start with random values. That's the 2863 05:09:21,720 --> 05:09:27,240 annotation I've added here. Start with a random weight value using torch.randn. And then we've 2864 05:09:27,240 --> 05:09:32,920 told it that it can update via gradient descent. So pytorch is going to track the gradients of 2865 05:09:32,920 --> 05:09:37,720 this parameter for us. And then we've told it that the d type we want is float 32. We don't 2866 05:09:37,720 --> 05:09:43,080 necessarily need these two set explicitly, because a lot of the time the default in pytorch is to 2867 05:09:43,080 --> 05:09:49,080 set these two, requires grad equals true and d type equals torch dot float. It does that for us 2868 05:09:49,080 --> 05:09:54,280 behind the scenes. But just to keep things as fundamental and as straightforward as possible, 2869 05:09:54,280 --> 05:10:01,000 we've set all of this explicitly. So let's jump into the keynote. I'd just like to explain 2870 05:10:01,000 --> 05:10:06,840 what's going on one more time in a visual sense. So here's the exact code that we've 2871 05:10:06,840 --> 05:10:12,760 just written. I've just copied it from here. And I've just made it a little bit more colorful. 2872 05:10:13,480 --> 05:10:21,160 But here's what's going on. So when you build a model in PyTorch, it subclasses the nn.Module 2873 05:10:21,160 --> 05:10:27,560 class. This contains all the building blocks for neural networks. So our class of model subclasses 2874 05:10:27,560 --> 05:10:36,680 nn.Module.
Now, inside the constructor, we initialize the model parameters. Now, as we'll see, 2875 05:10:36,680 --> 05:10:44,600 later on with bigger models, we won't necessarily always explicitly create the weights and biases. 2876 05:10:45,160 --> 05:10:49,880 We might initialize whole layers. Now, this is a concept we haven't touched on yet, but 2877 05:10:50,440 --> 05:10:57,480 we might initialize a list of layers or whatever we need. So basically, what happens in here is that 2878 05:10:57,480 --> 05:11:04,760 we create whatever variables that we need for our model to use. And so these could be different 2879 05:11:04,760 --> 05:11:10,200 layers from torch.nn, single parameters, which is what we've done in our case, hard coded values, 2880 05:11:10,200 --> 05:11:18,760 or even functions. Now, we've explicitly set requires grad equals true for our model parameters. 2881 05:11:19,320 --> 05:11:24,200 So this, in turn, means that PyTorch behind the scenes will track all of the gradients 2882 05:11:24,200 --> 05:11:31,960 for these parameters here for use with torch.autograd. So the torch.autograd module of PyTorch is what 2883 05:11:31,960 --> 05:11:36,840 implements the gradient calculations for gradient descent. Now, a lot of this will happen behind the scenes for when we write 2884 05:11:36,840 --> 05:11:41,320 our PyTorch training code. So if you'd like to know what's happening behind the scenes, 2885 05:11:41,320 --> 05:11:45,000 I'd highly recommend you checking out these two videos, hence why I've linked them here. 2886 05:11:46,920 --> 05:11:52,280 Oh, and for many torch.nn modules, requires grad equals true is set by default. 2887 05:11:53,720 --> 05:12:00,200 Finally, we've got a forward method. Now, any subclass of nn.Module, which is what we've done, 2888 05:12:00,200 --> 05:12:05,800 requires a forward method. Now, we can see this in the documentation. If we go torch 2889 05:12:06,520 --> 05:12:07,800 dot nn dot Module. 2890 05:12:10,440 --> 05:12:13,160 Click on module. Do we have forward? 2891 05:12:16,680 --> 05:12:22,040 Yeah, there we go. So forward, we've got a lot of things built into an nn.Module. 2892 05:12:22,760 --> 05:12:28,680 So you see here, this is a subclass of an nn.Module. And then we have forward. 2893 05:12:28,680 --> 05:12:34,280 So forward is what defines the computation performed at every call. So if we were 2894 05:12:34,280 --> 05:12:39,800 to call linear regression model and put some data through it, the forward method is the 2895 05:12:39,800 --> 05:12:46,360 operation that this module, this model, does. And in our case, our forward method is 2896 05:12:46,360 --> 05:12:52,840 the linear regression function. So keep this in mind, any subclass of nn.Module needs to 2897 05:12:52,840 --> 05:12:56,920 override the forward method. So you need to define a forward method if you're going to subclass 2898 05:12:56,920 --> 05:13:03,480 nn.Module. We'll see this very hands on. But for now, I believe that's enough coverage of what 2899 05:13:03,480 --> 05:13:10,120 we've done. If you have any questions, remember, you can ask them in the discussions. We've got a 2900 05:13:10,120 --> 05:13:17,560 fair bit going on here. But I think we've broken it down a fair bit. The next step for us, 2901 05:13:17,560 --> 05:13:22,280 as I mentioned in a previous video, is to cover some PyTorch model building essentials. 2902 05:13:22,280 --> 05:13:27,560 But we're going to cover a few more of them. We've seen some already.
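As a small illustration of the "initialize whole layers" point above, here's what the same model might look like if a torch.nn layer created the parameters for us instead (a sketch; the class name LinearRegressionModelV2 is just a made-up label, not something from the video):

```python
import torch
from torch import nn

class LinearRegressionModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        # Let nn.Linear create (and randomly initialize) the weight and bias for us
        self.linear_layer = nn.Linear(in_features=1, out_features=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_layer(x)

# The layer's parameters have requires_grad=True set by default,
# so torch.autograd can track their gradients
model = LinearRegressionModelV2()
for name, param in model.named_parameters():
    print(name, param.requires_grad, param.dtype)
```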
But the next way to really 2903 05:13:27,560 --> 05:13:33,000 start to understand what's going on is to check the contents of our model, train one, and make 2904 05:13:33,000 --> 05:13:38,520 some predictions with it. So let's get hands on with that in the next few videos. I'll see you there. 2905 05:13:42,040 --> 05:13:47,640 Welcome back. In the last couple of videos, we stepped through creating our first PyTorch model. 2906 05:13:47,640 --> 05:13:52,520 And it looks like there's a fair bit going on here. But some of the main takeaways are that almost 2907 05:13:52,520 --> 05:14:00,040 every model in PyTorch inherits from nn.Module. And if you are going to inherit from nn.Module, 2908 05:14:00,040 --> 05:14:04,360 you should override the forward method to define what computation is happening in your model. 2909 05:14:05,160 --> 05:14:10,680 And for later on, when our model is learning things, in other words, updating its weights and 2910 05:14:10,680 --> 05:14:17,880 bias values from random values to values that better fit the data, it's going to do so via 2911 05:14:17,880 --> 05:14:22,840 gradient descent and back propagation. And so these two videos are some extra curriculum 2912 05:14:22,840 --> 05:14:27,880 for what's happening behind the scenes. But we haven't actually written any code yet to trigger 2913 05:14:27,880 --> 05:14:33,000 these two. So I'll refer back to these when we actually do write code to do that. For now, 2914 05:14:33,000 --> 05:14:41,240 we've just got a model that defines some forward computation. But speaking of models, let's have 2915 05:14:41,240 --> 05:14:45,880 a look at a couple of PyTorch model building essentials. So we're not going to write too much 2916 05:14:45,880 --> 05:14:50,680 code for this video, and it's going to be relatively short. But I just want to introduce you to some 2917 05:14:50,680 --> 05:14:54,920 of the main classes that you're going to be interacting with in PyTorch. And we've seen 2918 05:14:54,920 --> 05:15:02,040 some of these already. So one of the first is torch.nn. It contains all of the building blocks 2919 05:15:02,040 --> 05:15:08,200 for computational graphs. Computational graphs is another word for neural networks. 2920 05:15:09,320 --> 05:15:15,240 Well, actually computational graphs is quite general. I'll just write here, a neural network 2921 05:15:15,960 --> 05:15:28,360 can be considered a computational graph. So then we have torch.nn.Parameter. We've seen this. 2922 05:15:28,360 --> 05:15:38,680 So what parameters should our model try and learn? And then we can write here, often a PyTorch 2923 05:15:38,680 --> 05:15:50,040 layer from torch.nn will set these for us. And then we've got torch.nn.Module, which is 2924 05:15:50,040 --> 05:16:00,440 what we've seen here. And so torch.nn.Module is the base class for all neural network modules. 2925 05:16:03,240 --> 05:16:13,640 If you subclass it, you should override forward, which is what we've done here. We've created our 2926 05:16:13,640 --> 05:16:19,960 own forward method. So what else should we cover here? We're going to see these later 2927 05:16:19,960 --> 05:16:28,600 on, but I'm going to put it here, torch.optim. This is where the optimizers in PyTorch live. 2928 05:16:29,320 --> 05:16:39,160 They will help with gradient descent. So, an optimizer: as we've said before, 2929 05:16:39,720 --> 05:16:44,760 our model starts with random values.
And it looks at training data and adjusts the random 2930 05:16:44,760 --> 05:16:51,080 values to better represent the ideal values. The optimizer contains algorithm that's going to 2931 05:16:51,640 --> 05:16:58,840 optimize these values, instead of being random, to being values that better represent our data. 2932 05:16:59,400 --> 05:17:08,680 So those algorithms live in torch.optim. And then one more for now, I'll link to extra resources. 2933 05:17:08,680 --> 05:17:13,240 And we're going to cover them as we go. That's how I like to do things, cover them as we need them. 2934 05:17:13,240 --> 05:17:19,480 So all nn.module. So this is the forward method. I'm just going to explicitly say here that all 2935 05:17:19,480 --> 05:17:30,840 nn.module subclasses require you to overwrite forward. This method defines what happens 2936 05:17:31,640 --> 05:17:39,560 in the forward computation. So in our case, if we were to pass some data to our linear regression 2937 05:17:39,560 --> 05:17:45,400 model, the forward method would take that data and perform this computation here. 2938 05:17:45,960 --> 05:17:49,640 And as your models get bigger and bigger, ours is quite straightforward here. 2939 05:17:49,640 --> 05:17:54,680 This forward computation can be as simple or as complex as you like, depending on what you'd 2940 05:17:54,680 --> 05:18:02,280 like your model to do. And so I've got a nice and fancy slide here, which basically reiterates 2941 05:18:02,280 --> 05:18:06,040 what we've just discussed. PyTorch is central neural network building modules. 2942 05:18:06,040 --> 05:18:17,320 So the module torch.nn, torch.nn.module, torch.optim, torch.utils.dataset. We haven't actually talked 2943 05:18:17,320 --> 05:18:22,440 about this yet. And I believe there's one more data loader. We're going to see these two later on. 2944 05:18:22,440 --> 05:18:27,400 But these are very helpful when you've got a bit more of a complicated data set. In our case, 2945 05:18:27,400 --> 05:18:32,360 we've got just 50 integers for our data set. We've got a simple straight line. But when we need 2946 05:18:32,360 --> 05:18:38,280 to create more complex data sets, we're going to use these. So this will help us build models. 2947 05:18:39,160 --> 05:18:45,640 This will help us optimize our models parameters. And this will help us load data. And if you'd 2948 05:18:45,640 --> 05:18:50,920 like more, one of my favorite resources is the PyTorch cheat sheet. Again, we're referring 2949 05:18:50,920 --> 05:18:56,440 back to the documentation. See, all of this documentation, right? As I said, this course is 2950 05:18:56,440 --> 05:19:01,560 not a replacement for the documentation. It's just my interpretation of how one should best 2951 05:19:01,560 --> 05:19:08,760 become familiar with PyTorch. So we've got imports, the general import torch from torch.utils.dataset 2952 05:19:08,760 --> 05:19:13,960 data loader. Oh, did you look at that? We've got that mentioned here, data, data set data loader. 2953 05:19:14,520 --> 05:19:20,840 And torch, script and jit, neural network API. I want an X. I'll let you go through here. 2954 05:19:21,720 --> 05:19:26,360 We're covering some of the most fundamental ones here. But there's, of course, PyTorch is 2955 05:19:26,360 --> 05:19:32,680 quite a big library. So some extra curricula for this video would be to go through this for 2956 05:19:32,680 --> 05:19:36,520 five to 10 minutes and just read. You don't have to understand them all. 
We're going to start to 2957 05:19:36,520 --> 05:19:40,520 get more familiar with all of these. We're not all of them because, I mean, that would require 2958 05:19:40,520 --> 05:19:46,840 making videos for the whole documentation. But a lot of these through writing them via code. 2959 05:19:47,880 --> 05:19:54,040 So that's enough for this video. I'll link this PyTorch cheat sheet in the video here. 2960 05:19:54,040 --> 05:20:01,160 And in the next video, how about we, we haven't actually checked out what happens if we do 2961 05:20:01,160 --> 05:20:06,840 create an instance of our linear regression model. I think we should do that. I'll see you there. 2962 05:20:09,720 --> 05:20:16,680 Welcome back. In the last video, we covered some of the PyTorch model building essentials. And look, 2963 05:20:16,680 --> 05:20:21,560 I linked a cheat sheet here. There's a lot going on. There's a lot of text going on in the page. 2964 05:20:21,560 --> 05:20:27,560 Of course, the reference material for here is in the Learn PyTorch book. PyTorch model building 2965 05:20:27,560 --> 05:20:32,520 essentials under 0.1, which is the notebook we're working on here. But I couldn't help myself. 2966 05:20:32,520 --> 05:20:37,320 I wanted to add some color to this. So before we inspect our model, let's just add a little bit 2967 05:20:37,320 --> 05:20:43,560 of color to our text on the page. We go to whoa. Here's our workflow. This is what we're covering 2968 05:20:43,560 --> 05:20:50,280 in this video, right? Or in this module, 0.1. But to get data ready, here are some of the most 2969 05:20:50,280 --> 05:20:55,560 important PyTorch modules. Torchvision.transforms. We'll see that when we cover computer vision later 2970 05:20:55,560 --> 05:21:00,520 on. Torch.utils.data.data set. So that's if we want to create a data set that's a little bit 2971 05:21:00,520 --> 05:21:05,000 more complicated than because our data set is so simple, we haven't used either of these 2972 05:21:05,000 --> 05:21:12,040 data set creator or data loader. And if we go build a picker model, well, we can use torch.nn. 2973 05:21:12,040 --> 05:21:19,240 We've seen that one. We've seen torch.nn.module. So in our case, we're building a model. But if we 2974 05:21:19,240 --> 05:21:22,840 wanted a pre-trained model, well, there's some computer vision models that have already been 2975 05:21:22,840 --> 05:21:28,920 built for us in torchvision.models. Now torchvision stands for PyTorch's computer vision 2976 05:21:28,920 --> 05:21:34,040 module. So we haven't covered that either. But this is just a spoiler for what's coming on 2977 05:21:34,040 --> 05:21:39,400 later on. Then if the optimizer, if we wanted to optimize our model's parameters to better 2978 05:21:39,400 --> 05:21:45,640 represent a data set, we can go to torch.optim. Then if we wanted to evaluate the model, 2979 05:21:45,640 --> 05:21:49,320 well, we've got torch metrics for that. We haven't seen that, but we're going to be 2980 05:21:49,320 --> 05:21:53,640 hands-on with all of these later on. Then if we wanted to improve through experimentation, 2981 05:21:53,640 --> 05:22:00,280 we've got torch.utils.tensorboard. Hmm. What's this? But again, if you want more, 2982 05:22:00,280 --> 05:22:04,360 there's some at the PyTorch cheat sheet. But now this is just adding a little bit of color 2983 05:22:04,360 --> 05:22:09,000 and a little bit of code to our PyTorch workflow. 
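For a compact code-side view of the modules name-checked in the last couple of minutes, the typical imports look something like this (a sketch; torchvision and torchmetrics are separate packages that need to be installed alongside PyTorch):

```python
import torch
from torch import nn                              # building blocks for neural networks (computational graphs)
from torch import optim                           # optimizers that help with gradient descent, e.g. torch.optim.SGD
from torch.utils.data import Dataset, DataLoader  # creating and loading more complex datasets

# Separate installs (assumed available):
# import torchvision                                  # PyTorch's computer vision module
# from torchvision import models, transforms          # pre-built/pre-trained models and data transforms
# import torchmetrics                                 # evaluation metrics
# from torch.utils.tensorboard import SummaryWriter   # experiment tracking (needs the tensorboard package)
```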
And with that being said, let's get a little bit 2984 05:22:09,000 --> 05:22:18,520 deeper into what we've built, which is our first PyTorch model. So checking the contents of our 2985 05:22:18,520 --> 05:22:29,720 PyTorch model. So now we've created a model. Let's see what's inside. You might already be able 2986 05:22:30,760 --> 05:22:36,680 to guess this by the fact of what we've created in the constructor here in the init function. 2987 05:22:36,680 --> 05:22:42,680 So what do you think we have inside our model? And how do you think we'd look in that? Now, 2988 05:22:42,680 --> 05:22:45,720 of course, these are questions you might not have the answer to because you've just, you're like, 2989 05:22:45,720 --> 05:22:49,800 Daniel, I'm just starting to learn PyTorch. I don't know these, but I'm asking you just to start 2990 05:22:49,800 --> 05:22:59,400 thinking about these different things, you know? So we can check out our model parameters or what's 2991 05:22:59,400 --> 05:23:11,080 inside our model using, wait for it, dot parameters. Oh, don't you love it when things are nice and 2992 05:23:11,080 --> 05:23:16,760 simple? Well, let's check it out. Hey, well, first things we're going to do is let's create a random 2993 05:23:16,760 --> 05:23:25,560 seed. Now, why are we creating a random seed? Well, because recall, we're creating these parameters 2994 05:23:25,560 --> 05:23:32,360 with random values. And if we were to create them with outer random seed, we would get different 2995 05:23:32,360 --> 05:23:38,680 values every time. So for the sake of the educational sense, for the sake of this video, 2996 05:23:38,680 --> 05:23:44,600 we're going to create a manual seed here, torch dot manual seed. I'm going to use 42 or maybe 43, 2997 05:23:44,600 --> 05:23:52,360 I could use 43 now 42 because I love 42. It's the answer to the universe. And we're going to create 2998 05:23:52,360 --> 05:24:01,400 an instance of the model that we created. So this is a subclass of an end up module. 2999 05:24:02,840 --> 05:24:07,640 So let's do it. Model zero, because it's going to be the zeroth model, the first model that 3000 05:24:07,640 --> 05:24:15,080 we've ever created in this whole course, how amazing linear regression model, which is what 3001 05:24:15,080 --> 05:24:21,160 our class is called. So we can just call it like that. That's all I'm doing, just calling this class. 3002 05:24:21,160 --> 05:24:27,160 And so let's just see what happens there. And then if we go model zero, what does it give us? Oh, 3003 05:24:27,160 --> 05:24:32,440 linear regression. Okay, it doesn't give us much. But we want to find out what's going on in here. 3004 05:24:32,440 --> 05:24:45,880 So check out the parameters. So model zero dot parameters. What do we get from this? Oh, a generator. 3005 05:24:45,880 --> 05:24:53,880 Well, let's turn this into a list that'll be better to look at. There we go. Oh, how exciting is that? 3006 05:24:53,880 --> 05:25:01,880 So parameter containing. Look at the values tensor requires grad equals true parameter containing 3007 05:25:01,880 --> 05:25:11,880 wonderful. So these are our model parameters. So why are they the values that they are? Well, 3008 05:25:11,880 --> 05:25:20,040 it's because we've used torch rand n. Let's see what happens if we go, let's just create torch dot 3009 05:25:20,040 --> 05:25:26,600 rand n one, what happens? We get a value like that. And now if we run this again, 3010 05:25:28,520 --> 05:25:32,440 we get the same values. 
But if we run this again, so keep this in one two, three, four, 3011 05:25:32,440 --> 05:25:38,040 five, actually, that's, wow, that's pretty cool that we got a random value that was all in order, 3012 05:25:38,040 --> 05:25:44,040 four in a row. Can we do it twice in a row? Probably not. Oh, we get it the same one. Now, 3013 05:25:44,040 --> 05:25:49,560 why is that? Oh, we get a different one. Did we just get the same one twice? Oh, my gosh, 3014 05:25:49,560 --> 05:25:55,960 we got the same value twice in a row. You saw that. You saw that. That's incredible. Now, 3015 05:25:55,960 --> 05:26:01,640 the reason why we get this is because this one is different every time because there's no random 3016 05:26:01,640 --> 05:26:12,600 seed. Watch if we put the random seed here, torch dot manual seed, 42, 3, 3, 6, 7, what happens? 3017 05:26:13,640 --> 05:26:20,680 3, 3, 6, 7, what happens? 3, 3, 6, 7. Okay. And what if we commented out the random seed 3018 05:26:20,680 --> 05:26:27,080 here, initialized our model, different values, two, three, five, two, three, four, five, it must 3019 05:26:27,080 --> 05:26:34,280 like that value. Oh, my goodness. Let me know if you get that value, right? So if we keep going, 3020 05:26:34,280 --> 05:26:39,080 we get different values every single time. Why is this? Why are we getting different values 3021 05:26:39,080 --> 05:26:43,320 every single time? You might be, Daniel, you sound like a broken record, but I'm trying to 3022 05:26:43,320 --> 05:26:49,640 really drive home the fact that we initialize our models with random parameters. So this is the 3023 05:26:49,640 --> 05:26:53,560 essence of what our machine learning models and deep learning models are going to do. Start with 3024 05:26:53,560 --> 05:26:59,640 random values, weights and bias. Maybe we've only got two parameters here, but the future models 3025 05:26:59,640 --> 05:27:03,960 that we build might have thousands. And so of course, we're not going to do them all by hand. 3026 05:27:03,960 --> 05:27:08,920 We'll see how we do that later on. But for now, we start with random values. And our ideal model 3027 05:27:08,920 --> 05:27:13,640 will look at the training data and adjust these random values. But just so that we can get 3028 05:27:13,640 --> 05:27:20,520 reproducible results, I'll get rid of this cell. I've set the random seed here. So you should be 3029 05:27:20,520 --> 05:27:24,760 getting similar values to this. If you're not, because there's maybe some sort of pytorch update 3030 05:27:24,760 --> 05:27:29,160 and how the random seeds calculated, you might get slightly different values. But for now, 3031 05:27:29,160 --> 05:27:36,360 we'll use torch.manualc.42. And I want you to just be aware of this can be a little bit confusing. 3032 05:27:37,320 --> 05:27:45,080 If you just do the list of parameters, for me, I understand it better if I list the name parameters. 3033 05:27:45,080 --> 05:27:54,040 So the way we do that is with model zero, and we call state dict on it. This is going to give us 3034 05:27:54,040 --> 05:27:59,960 our dictionary of the parameters of our model. So as you can see here, we've got weights, 3035 05:27:59,960 --> 05:28:06,680 and we've got bias, and they are random values. So where did weights and bias come from? Well, 3036 05:28:06,680 --> 05:28:12,040 of course, they came from here, weights, bias. But of course, as well up here, 3037 05:28:12,040 --> 05:28:22,280 we've got known parameters. So now our whole goal is what? 
Our whole goal is to build code, 3038 05:28:22,280 --> 05:28:27,960 or write code, that is going to allow our model to look at these blue dots here, 3039 05:28:28,600 --> 05:28:40,040 and adjust this weight and bias value to be weights as close as possible to weight and bias. 3040 05:28:40,040 --> 05:28:49,160 Now, how do we go from here and here to here and here? Well, we're going to see that in future 3041 05:28:49,160 --> 05:28:57,800 videos, but the closer we get these values to these two, the better we're going to be able to 3042 05:28:58,360 --> 05:29:05,880 predict and model our data. Now, this principle, I cannot stress enough, is the fundamental 3043 05:29:05,880 --> 05:29:10,440 entire foundation, the fundamental foundation. Well, good description, Daniel. The entire 3044 05:29:10,440 --> 05:29:16,120 foundation of deep learning, we start with some random values, and we use gradient descent and 3045 05:29:16,120 --> 05:29:22,280 back propagation, plus whatever data that we're working with to move these random values as close 3046 05:29:22,280 --> 05:29:29,640 as possible to the ideal values. And in most cases, you won't know what the ideal values are. 3047 05:29:30,280 --> 05:29:33,640 But in our simple case, we already know what the ideal values are. 3048 05:29:33,640 --> 05:29:38,920 So just keep that in mind going forward. The premise of deep learning is to start with random 3049 05:29:38,920 --> 05:29:46,040 values and make them more representative closer to the ideal values. With that being said, 3050 05:29:46,040 --> 05:29:50,920 let's try and make some predictions with our model as it is. I mean, it's got random values. 3051 05:29:50,920 --> 05:29:55,480 How do you think the predictions will go? So I think in the next video, we'll make some predictions 3052 05:29:55,480 --> 05:30:03,960 on this test data and see what they look like. I'll see you there. Welcome back. In the last 3053 05:30:03,960 --> 05:30:10,520 video, we checked out the internals of our first PyTorch model. And we found out that because we're 3054 05:30:10,520 --> 05:30:17,000 creating our model with torch dot or the parameters of our model, with torch dot rand, they begin as 3055 05:30:17,000 --> 05:30:22,360 random variables. And we also discussed the entire premise of deep learning is to start with random 3056 05:30:22,360 --> 05:30:28,200 numbers and slowly progress those towards more ideal numbers, slightly less random numbers based 3057 05:30:28,200 --> 05:30:36,280 on the data. So let's see, before we start to improve these numbers, let's see what their predictive 3058 05:30:36,280 --> 05:30:41,880 power is like right now. Now you might be able to guess how well these random numbers will be 3059 05:30:41,880 --> 05:30:47,960 able to predict on our data. You're not sure what that predicting means? Let's have a look. So making 3060 05:30:47,960 --> 05:30:56,280 predictions using torch dot inference mode, something we haven't seen. But as always, we're going to 3061 05:30:56,280 --> 05:31:07,160 discuss it while we use it. So to check our models predictive power, let's see how well 3062 05:31:07,160 --> 05:31:18,760 it predicts Y test based on X test. Because remember again, another premise of a machine 3063 05:31:18,760 --> 05:31:24,120 learning model is to take some features as input and make some predictions close to some sort of 3064 05:31:24,120 --> 05:31:39,240 labels. So when we pass data through our model, it's going to run it through the forward method. 
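Before the prediction walkthrough continues, here's the instantiate-and-inspect flow from the last few minutes gathered into one snippet (a sketch, assuming the LinearRegressionModel class defined earlier):

```python
import torch

# Set a manual seed so the "random" starting parameters are reproducible
torch.manual_seed(42)

# Create an instance of the model (a subclass of nn.Module)
model_0 = LinearRegressionModel()

# parameters() returns a generator, so wrap it in list() to look inside
print(list(model_0.parameters()))

# state_dict() shows the named parameters (weights and bias), currently random values
print(model_0.state_dict())
```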
3065 05:31:41,400 --> 05:31:47,480 So here's where it's a little bit confusing. We defined a forward method and it takes X as input. 3066 05:31:47,480 --> 05:31:52,280 Now I've done a little X, but we're going to pass it in a large X as its input. But the reason why I've 3067 05:31:52,280 --> 05:31:57,480 done a little X is because oftentimes in pytorch code, you're going to find all over the internet 3068 05:31:57,480 --> 05:32:03,240 is that X is quite common, commonly used in the forward method here, like this as the input data. 3069 05:32:03,240 --> 05:32:06,200 So I've just left it there because that's what you're going to find quite often. 3070 05:32:07,080 --> 05:32:13,000 So let's test it out. We haven't discussed what inference mode does yet, but we will make predictions 3071 05:32:13,000 --> 05:32:19,480 with our model. So with torch dot inference mode, let's use it. And then we will discuss what's going 3072 05:32:19,480 --> 05:32:32,920 on. Y preds equals model zero X test. So that's all we're doing. We're passing the X test data 3073 05:32:34,520 --> 05:32:41,400 through our model. Now, when we pass this X test in here, let's remind ourselves of what X test is. 3074 05:32:41,400 --> 05:32:51,320 X test, 10 values here. And our ideal model will predict the exact values of Y test. 3075 05:32:51,320 --> 05:32:58,200 So this is what our model will do if it's a perfect model. It will take these X test values as input, 3076 05:32:58,200 --> 05:33:06,360 and it will return these Y test values as output. That's an ideal model. So the predictions are the 3077 05:33:06,360 --> 05:33:11,720 exact same as the test data set. How do you think our model will go considering it's starting with 3078 05:33:11,720 --> 05:33:19,160 random values as its parameters? Well, let's find out, hey. So what's in that Y preds? 3079 05:33:20,760 --> 05:33:27,320 Oh, what's happened here? Not implemented error. Ah, this is an error I get quite often in Google 3080 05:33:27,320 --> 05:33:33,880 Colab when I'm creating a PyTorch model. Now, it usually happens. I'm glad we've stumbled upon 3081 05:33:33,880 --> 05:33:38,600 this. And I think I know the fix. But if not, we might see a little bit of troubleshooting in this 3082 05:33:38,600 --> 05:33:46,760 video. It's that when we create this, if you see this not implemented error, right, it's saying that 3083 05:33:46,760 --> 05:33:52,280 the forward method. Here we go. Forward not implemented. There we go. It's a little bit of a rabbit hole, 3084 05:33:52,280 --> 05:33:57,240 this not implemented error. I've come across it a fair few times and it took me a while to figure 3085 05:33:57,240 --> 05:34:04,760 out that for some reason the spacing. So in Python, you know how you have space space and that defines 3086 05:34:04,760 --> 05:34:09,480 a function space space. There's another thing there and another line there. For some reason, 3087 05:34:09,480 --> 05:34:13,240 if you look at this line in my notebook, and by the way, if you don't have these lines or if you 3088 05:34:13,240 --> 05:34:19,720 don't have these numbers, you can go into tools, settings, editor, and then you can define them here. 3089 05:34:19,720 --> 05:34:25,240 So show line numbers, show notation guides, all that sort of jazz there. You can customize what's 3090 05:34:25,240 --> 05:34:31,720 going on. But I just have these two on because I've run into this error a fair few times.
And so 3091 05:34:31,720 --> 05:34:39,080 it's because this forward method is not in line with this bracket here. So we need to highlight 3092 05:34:39,080 --> 05:34:45,560 this and click shift tab, move it over. So now you see that it's in line here. And then if we run 3093 05:34:45,560 --> 05:34:51,640 this, it won't change any output there. See, that's the hidden gotcha. Is that when we ran this before, 3094 05:34:51,640 --> 05:35:01,320 it found no error. But then when we run it down here, it works. So just keep that in mind. I'm 3095 05:35:01,320 --> 05:35:07,720 really glad we stumbled upon that because indentation errors, not implemented errors, 3096 05:35:07,720 --> 05:35:12,440 are one of the most common errors you'll find in PyTorch, or in, well, when you're writing PyTorch 3097 05:35:12,440 --> 05:35:18,840 code in Google Colab. I'm not sure why, but it just happens. So these are our model's predictions 3098 05:35:18,840 --> 05:35:24,040 so far, by running the test data through our model's forward method that we defined. And so if 3099 05:35:24,040 --> 05:35:31,720 we look at Y test, are these close? Oh my gosh, they are shocking. So why don't we visualize them? 3100 05:35:33,240 --> 05:35:38,920 Plot predictions. And we're going to put in predictions equals Y preds. 3101 05:35:40,920 --> 05:35:46,840 Let's have a look. Oh my goodness. All the way over here. Remember how we discussed before 3102 05:35:46,840 --> 05:35:52,520 that an ideal model will have, what, red dots on top of the green dots, because our ideal model 3103 05:35:52,520 --> 05:35:57,880 will be perfectly predicting the test data. So right now, because our model is initialized with 3104 05:35:57,880 --> 05:36:04,680 random parameters, it's basically making random predictions. So they're extremely far from where 3105 05:36:04,680 --> 05:36:09,880 our ideal predictions are. The idea is that we'll have some training data, and our model's predictions, 3106 05:36:09,880 --> 05:36:14,600 when we first create our model, will be quite bad. But we want to write some code that will 3107 05:36:14,600 --> 05:36:19,640 hopefully move these red dots closer to these green dots. We're going to see how we can do that in 3108 05:36:20,280 --> 05:36:26,760 later videos. But we did one thing up here, which we haven't discussed, which is with torch dot 3109 05:36:26,760 --> 05:36:33,800 inference mode. Now this is a context manager, which is what we use when we're making predictions. 3110 05:36:33,800 --> 05:36:38,760 So making predictions, another word for predictions is inference; PyTorch uses inference. So I'll try 3111 05:36:38,760 --> 05:36:43,320 to use that a bit more, but I like to use predictions as well. We could also just go 3112 05:36:43,320 --> 05:36:52,120 Y preds equals model zero X test. And we're going to get quite a similar output. 3113 05:36:56,120 --> 05:37:02,120 Right. But I've put on inference mode because I want to start making that a habit for later on, 3114 05:37:02,120 --> 05:37:06,360 when we make predictions, put on inference mode. Now why do this? You might notice something different. 3115 05:37:07,320 --> 05:37:12,440 What's the difference here between the outputs? Y preds equals model. There's no inference mode 3116 05:37:12,440 --> 05:37:19,160 here, no context manager. Do you notice that there's a grad function here? And we don't need to go 3117 05:37:19,160 --> 05:37:24,920 into discussing what exactly this is doing here. But do you notice that this one is lacking that 3118 05:37:24,920 --> 05:37:30,680 grad function?
So do you remember how behind the scenes I said that PyTorch does a few things 3119 05:37:31,880 --> 05:37:36,680 with requires grad equals true, it keeps track of the gradients of different parameters so that 3120 05:37:36,680 --> 05:37:44,360 they can be used in gradient descent and back propagation. Now what inference mode does is it 3121 05:37:44,360 --> 05:37:53,080 turns off that gradient tracking. So it essentially removes all of the, because when we're doing 3122 05:37:53,080 --> 05:37:56,920 inference, we're not doing training. So we don't need to keep track of the gradient. So we don't 3123 05:37:56,920 --> 05:38:03,800 need to keep track of how we should update our models. So inference mode disables all of the 3124 05:38:03,800 --> 05:38:09,960 useful things that are available during training. What's the benefit of this? Well, it means that 3125 05:38:09,960 --> 05:38:15,800 PyTorch behind the scenes is keeping track of less data. So in turn, it will, with our small 3126 05:38:15,800 --> 05:38:20,520 data set, it probably won't be too dramatic. But with a larger data set, it means that your 3127 05:38:20,520 --> 05:38:26,680 predictions will potentially be a lot faster, because a whole bunch of numbers aren't being 3128 05:38:26,680 --> 05:38:31,800 kept track of, or a whole bunch of things that you don't need during prediction mode or inference 3129 05:38:31,800 --> 05:38:37,560 mode. That's why it's called inference mode. They're not being saved to memory. If you'd like to 3130 05:38:37,560 --> 05:38:44,760 learn more about this, you can go PyTorch inference mode Twitter. I just remember to search for Twitter 3131 05:38:44,760 --> 05:38:53,800 because they did a big tweet storm about it. Here we go. So oh, this is another thing that we can 3132 05:38:53,800 --> 05:38:57,960 cover. I'm going to copy this in here. But there's also a blog post about what's going on behind 3133 05:38:57,960 --> 05:39:03,320 the scenes. Long story short, it makes your code faster. Want to make your inference code in 3134 05:39:03,320 --> 05:39:08,360 PyTorch run faster? Here's a quick thread on doing exactly that. And that's what we're doing. So 3135 05:39:09,160 --> 05:39:14,840 I'm going to write down here. See more on inference mode here. 3136 05:39:17,400 --> 05:39:24,440 And I just want to highlight something as well is that they referenced torch no grad with the 3137 05:39:24,440 --> 05:39:29,000 torch inference mode context manager. Inference mode is fairly new in PyTorch. So you might 3138 05:39:29,000 --> 05:39:34,600 see a lot of existing PyTorch code with torch dot no grad. You can use this as well. 3139 05:39:35,320 --> 05:39:40,360 Y preds equals model zero. And this will do much of the same as what inference mode is doing. 3140 05:39:40,360 --> 05:39:45,640 But inference mode has a few things that are advantages over no grad, which are discussed in 3141 05:39:45,640 --> 05:39:51,960 this thread here. But if we do this, we get very similar output to what we got before. 3142 05:39:51,960 --> 05:39:58,360 Grad function. But as you'll read in here and in the PyTorch documentation, inference mode is 3143 05:39:58,360 --> 05:40:05,800 the favored way of doing inference for now. I just wanted to highlight this. So you can also do 3144 05:40:05,800 --> 05:40:26,440 something similar with torch dot no grad. However, inference mode is preferred. Alrighty. So I'm 3145 05:40:26,440 --> 05:40:32,920 just going to comment this out. So we just have one thing going on there.
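Here's the prediction step just discussed as one snippet (a sketch, assuming model_0 and X_test from earlier):

```python
import torch

# Make predictions in inference mode: gradient tracking is turned off,
# so PyTorch keeps track of less data and inference can be faster
with torch.inference_mode():
    y_preds = model_0(X_test)

print(y_preds)  # note: no grad_fn attached, unlike predictions made without the context manager

# Older code often uses torch.no_grad(), which behaves similarly,
# but inference_mode() is the preferred option for inference
with torch.no_grad():
    y_preds_no_grad = model_0(X_test)
```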
The main takeaway 3146 05:40:32,920 --> 05:40:38,680 from this video is that when we're making predictions, we use the context manager torch 3147 05:40:38,680 --> 05:40:44,040 dot inference mode. And right now, because our model's variables or internal parameters are 3148 05:40:44,040 --> 05:40:51,640 randomly initialized, our model's predictions are as good as random. So they're actually not too far 3149 05:40:51,640 --> 05:40:58,600 off where our values are. At least the red dots aren't like scattered all over here. But in the 3150 05:40:58,600 --> 05:41:04,840 upcoming videos, we're going to be writing some PyTorch training code to move these values 3151 05:41:04,840 --> 05:41:12,440 closer to the green dots by looking at the training data here. So with that being said, 3152 05:41:12,440 --> 05:41:20,120 I'll see you in the next video. Friends, welcome back. In the last video, we saw that our model 3153 05:41:20,120 --> 05:41:26,120 performs pretty poorly. Like, ideally, these red dots should be in line with these green dots. 3154 05:41:26,120 --> 05:41:33,080 And we know that because why? Well, it's because our model is initialized with random parameters. 3155 05:41:33,080 --> 05:41:38,120 And I just want to put a little note here. You don't necessarily have to initialize your model 3156 05:41:38,120 --> 05:41:43,480 with random parameters. You could initialize it with, say, zero. Yeah, these two values, 3157 05:41:43,480 --> 05:41:49,400 weights and bias, could be zero and you could go from there. Or you could also use the parameters 3158 05:41:49,400 --> 05:41:53,640 from another model. But we're going to see that later on. That's something called transfer learning. 3159 05:41:53,640 --> 05:42:00,680 That's just a little spoiler for what's to come. And so we've also discussed that an ideal model 3160 05:42:00,680 --> 05:42:09,720 will replicate these known parameters. So in other words, start with random unknown parameters, 3161 05:42:09,720 --> 05:42:17,080 these two values here. And then we want to write some code for our model to move towards estimating 3162 05:42:17,080 --> 05:42:22,920 the ideal parameters here. Now, I just want to be explicit here and write down some intuition 3163 05:42:22,920 --> 05:42:27,400 before we jump into the training code. But this is very exciting. We're about to get into 3164 05:42:27,400 --> 05:42:33,240 training our very first machine learning model. So let's write here: the whole idea of training 3165 05:42:34,120 --> 05:42:49,160 is for a model to move from some unknown parameters, these may be random, to some known parameters. 3166 05:42:49,160 --> 05:43:02,040 Or in other words, from a poor representation of the data to a better representation 3167 05:43:02,840 --> 05:43:09,480 of the data. And so in our case, would you say that our model's representation of the green dots 3168 05:43:09,480 --> 05:43:14,920 here with these red dots, is that a good representation? Or is that a poor representation? 3169 05:43:14,920 --> 05:43:21,640 I mean, I don't know about you, but I would say that to me, this is a fairly poor representation. 3170 05:43:21,640 --> 05:43:28,600 And one way to measure the difference between your model's outputs, in our case, the red dots, 3171 05:43:28,600 --> 05:43:36,040 the predictions, and the testing data, is to use a loss function. So I'm going to write 3172 05:43:36,040 --> 05:43:40,520 this down here. This is what we're moving towards.
We're moving towards training, but we need a 3173 05:43:40,520 --> 05:43:49,560 way to measure how poorly our models predictions are doing. So one way to measure how poor or how 3174 05:43:49,560 --> 05:44:00,040 wrong your models predictions are, is to use a loss function. And so if we go pytorch loss 3175 05:44:00,040 --> 05:44:06,200 functions, we're going to see that pytorch has a fair few loss functions built in. But the essence 3176 05:44:06,200 --> 05:44:11,960 of all of them is quite similar. So just wait for this to load my internet's going a little bit 3177 05:44:11,960 --> 05:44:15,320 slow today, but that's okay. We're not in a rush here. We're learning something fun. 3178 05:44:16,040 --> 05:44:20,840 If I search here for loss, loss functions, here we go. So yeah, this is torch in N. These are the 3179 05:44:20,840 --> 05:44:25,240 basic building blocks for graphs, whole bunch of good stuff in here, including loss functions. 3180 05:44:25,240 --> 05:44:29,960 Beautiful. And this is another thing to note as well, another one of those scenarios where 3181 05:44:29,960 --> 05:44:35,960 there's more words for the same thing. You might also see a loss function referred to as a criterion. 3182 05:44:36,520 --> 05:44:42,200 There's another word called cost function. So I might just write this down so you're aware of it. 3183 05:44:42,200 --> 05:44:47,160 Yeah, cost function versus loss function. And maybe some formal definitions about what all of these 3184 05:44:47,160 --> 05:44:51,480 are. Maybe they're used in different fields. But in the case of we're focused on machine learning, 3185 05:44:51,480 --> 05:45:02,280 right? So I'm just going to go note, loss function may also be called cost function or criterion in 3186 05:45:03,080 --> 05:45:13,320 different areas. For our case, we're going to refer to it as a loss function. And let's 3187 05:45:13,320 --> 05:45:17,400 just formally define a loss function here, because we're going to go through a fair few steps in 3188 05:45:17,400 --> 05:45:22,680 the upcoming videos. So this is a warning, nothing we can't handle. But I want to put some formal 3189 05:45:22,680 --> 05:45:26,280 definitions on things. We're going to see them in practice. That's what I prefer to do, 3190 05:45:26,280 --> 05:45:30,760 rather than just sit here defining stuff. This lecture has already had enough text on the page. 3191 05:45:30,760 --> 05:45:38,520 So hurry up and get into coding Daniel. A loss function is a function to measure how wrong your 3192 05:45:38,520 --> 05:45:50,840 models predictions are to the ideal outputs. So lower is better. So ideally, think of a measurement, 3193 05:45:50,840 --> 05:45:55,400 how could we measure the difference between the red dots and the green dots? One of the 3194 05:45:55,400 --> 05:46:00,760 simplest ways to do so would be just measure the distance here, right? So if we go, let's just 3195 05:46:00,760 --> 05:46:09,560 estimate this is 035 to 0.8. They're abouts. So what's the difference there? About 0.45. 3196 05:46:09,560 --> 05:46:14,120 Then we could do the same again for all of these other dots, and then maybe take the average of that. 3197 05:46:15,320 --> 05:46:19,640 Now, if you've worked with loss functions before, you might have realized that I've just 3198 05:46:19,640 --> 05:46:25,320 reproduced mean absolute error. But we're going to get to that in a minute. So we need a loss 3199 05:46:25,320 --> 05:46:30,520 function. I'm going to write down another little dot point here. 
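(The measure-the-distance-and-average idea above is exactly mean absolute error; here it is written out as a quick sketch before the dot points that follow. The helper name and the single pair of values are illustrative, taken from the rough on-screen estimate above.)

```python
import torch

# Mean absolute error: the average of |prediction - truth| over all points
def mean_absolute_error(y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    return torch.mean(torch.abs(y_pred - y_true))

# Using the rough single-dot estimate from above: |0.35 - 0.8| = 0.45
print(mean_absolute_error(torch.tensor([0.35]), torch.tensor([0.8])))  # tensor(0.4500)
```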
This is just setting up intuition. 3200 05:46:30,520 --> 05:46:37,240 Things we need to train. We need a loss function. This is PyTorch. And this is machine learning 3201 05:46:37,240 --> 05:46:42,440 in general, actually. But we're focused on PyTorch. We need an optimizer. What does the optimizer do? 3202 05:46:43,000 --> 05:46:52,520 Takes into account the loss of a model and adjusts the model's parameters. So the parameters recall 3203 05:46:52,520 --> 05:47:01,880 our weight and bias values. Weight and biases. We can check those or bias. We can check those by 3204 05:47:01,880 --> 05:47:10,600 going model dot parameter or parameters. But I also like, oh, that's going to give us a generator, 3205 05:47:10,600 --> 05:47:17,640 isn't it? Why do we not define the model yet? What do we call our model? Oh, model zero. Excuse me. 3206 05:47:17,640 --> 05:47:22,840 I forgot where. I'm going to build a lot of models in this course. So we're giving them numbers. 3207 05:47:24,120 --> 05:47:27,800 Modeled up parameters. Yeah, we've got a generator. So we'll turn that into a list. 3208 05:47:28,360 --> 05:47:32,600 But model zero, if we want to get them labeled, we want state dict here. 3209 05:47:35,480 --> 05:47:39,720 There we go. So our weight is this value. That's a random value we've set. And there's the bias. 3210 05:47:39,720 --> 05:47:45,080 And now we've only got two parameters for our model. So it's quite simple. However, the principles 3211 05:47:45,080 --> 05:47:50,040 that we're learning here are going to be the same principles, taking a loss function, 3212 05:47:50,040 --> 05:47:55,320 trying to minimize it, so getting it to lower. So the ideal model will predict exactly what our 3213 05:47:55,320 --> 05:48:02,760 test data is. And an optimizer will take into account the loss and will adjust a model's parameter. 3214 05:48:02,760 --> 05:48:08,440 And our case weights and bias to be, let's finish this definition takes into account the 3215 05:48:08,440 --> 05:48:15,880 loss of a model and adjust the model's parameters, e.g. weight and bias, in our case, to improve the 3216 05:48:15,880 --> 05:48:32,760 loss function. And specifically, for PyTorch, we need a training loop and a testing loop. 3217 05:48:32,760 --> 05:48:40,040 Now, this is what we're going to work towards building throughout the next couple of videos. 3218 05:48:40,040 --> 05:48:44,120 We're going to focus on these two first, the loss function and optimizer. There's the formal 3219 05:48:44,120 --> 05:48:47,320 definition of those. You're going to find many different definitions. That's how I'm going to 3220 05:48:47,320 --> 05:48:52,040 find them. Loss function measures how wrong your model's predictions are, lower is better, 3221 05:48:52,040 --> 05:48:57,560 optimizer takes into account the loss of your model. So how wrong it is, and starts to move 3222 05:48:57,560 --> 05:49:04,440 these two values into a way that improves where these red dots end up. But these, again, these 3223 05:49:04,440 --> 05:49:11,160 principles of a loss function and an optimizer can be for models with two parameters or models 3224 05:49:11,160 --> 05:49:17,080 with millions of parameters, can be for computer vision models, or could be for simple models like 3225 05:49:17,080 --> 05:49:22,760 ours that predict the dots on a straight line. So with that being said, let's jump into the next 3226 05:49:22,760 --> 05:49:28,120 video. We'll start to look a little deeper into loss function, row problem, and an optimizer. 
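For reference, the two ways of checking the parameters mentioned here look like this (model_0 being the model instance created earlier in the notebook):

    # parameters() returns a generator, so wrap it in list() to see the values
    list(model_0.parameters())

    # state_dict() returns the parameters with their names (e.g. weights and bias)
    model_0.state_dict()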
3227 05:49:28,840 --> 05:49:36,280 I'll see you there. Welcome back. We're in the exciting streak of videos coming up here. I mean, 3228 05:49:36,280 --> 05:49:40,680 the whole course is fun. Trust me. But this is really exciting because training your first machine 3229 05:49:40,680 --> 05:49:45,320 learning model seems a little bit like magic, but it's even more fun when you're writing the code 3230 05:49:45,320 --> 05:49:50,360 yourself what's going on behind the scenes. So we discussed that the whole concept of training 3231 05:49:50,360 --> 05:49:54,760 is from going unknown parameters, random parameters, such as what we've got so far 3232 05:49:54,760 --> 05:49:59,800 to parameters that better represent the data. And we spoke of the concept of a loss function. 3233 05:49:59,800 --> 05:50:04,440 We want to minimize the loss function. That is the whole idea of a training loop in PyTorch, 3234 05:50:04,440 --> 05:50:10,600 or an optimization loop in PyTorch. And an optimizer is one of those ways that can 3235 05:50:10,600 --> 05:50:18,440 nudge the parameters of our model. In our case, weights or bias towards values rather than just 3236 05:50:18,440 --> 05:50:24,680 being random values like they are now towards values that lower the loss function. And if we 3237 05:50:24,680 --> 05:50:29,000 lower the loss function, what does a loss function do? It measures how wrong our models 3238 05:50:29,000 --> 05:50:34,040 predictions are compared to the ideal outputs. So if we lower that, well, hopefully we move 3239 05:50:34,040 --> 05:50:40,680 these red dots towards the green dots. And so as you might have guessed, PyTorch has some built 3240 05:50:40,680 --> 05:50:47,400 in functionality for implementing loss functions and optimizers. And by the way, what we're covering 3241 05:50:47,400 --> 05:50:52,920 so far is in the train model section of the PyTorch workflow fundamentals, I've got a little 3242 05:50:52,920 --> 05:50:57,480 nice table here, which describes a loss function. What does it do? Where does it live in PyTorch? 3243 05:50:57,480 --> 05:51:02,120 Common values, we're going to see some of these hands on. If you'd like to read about it, 3244 05:51:02,120 --> 05:51:07,160 of course, you have the book version of the course here. So loss functions in PyTorch, 3245 05:51:07,160 --> 05:51:11,800 I'm just in docstorch.nn. Look at this. Look at all these loss functions. There's far too many 3246 05:51:11,800 --> 05:51:16,200 for us to go through all in one hit. So we're just going to focus on some of the most common ones. 3247 05:51:16,200 --> 05:51:22,280 Look at that. We've got about what's our 15 loss functions, something like that? Well, truth be 3248 05:51:22,280 --> 05:51:28,680 told is that which one should use? You're not really going to know unless you start to work hands 3249 05:51:28,680 --> 05:51:34,360 on with different problems. And so in our case, we're going to be looking at L1 loss. And this is 3250 05:51:34,360 --> 05:51:39,480 an again, once more another instance where different machine learning libraries have different names 3251 05:51:39,480 --> 05:51:46,200 for the same thing, this is mean absolute error, which we kind of discussed in the last video, 3252 05:51:46,200 --> 05:51:51,800 which is if we took the distance from this red dot to this green dot and say at 0.4, they're about 3253 05:51:51,800 --> 05:51:58,200 0.4, 0.4, and then took the mean, well, we've got the mean absolute error. 
But in PyTorch, 3254 05:51:58,200 --> 05:52:03,480 they call it L1 loss, which is a little bit confusing because then we go to MSE loss, 3255 05:52:03,480 --> 05:52:09,880 which is mean squared error, which is L2. So naming conventions just takes a little bit of getting 3256 05:52:09,880 --> 05:52:16,040 used to this is a warning for you. So let's have a look at the L1 loss function. Again, 3257 05:52:16,040 --> 05:52:19,960 I'm just making you aware of where the other loss functions are. We'll do with some binary 3258 05:52:19,960 --> 05:52:25,160 cross entropy loss later in the course. And maybe even is that categorical cross entropy? 3259 05:52:26,120 --> 05:52:31,640 We'll see that later on. But all the others will be problem specific. For now, a couple of loss 3260 05:52:31,640 --> 05:52:37,240 functions like this, L1 loss, MSE loss, we use for regression problems. So that's predicting a number. 3261 05:52:38,040 --> 05:52:43,800 Cross entropy loss is a loss that you use with classification problems. But we'll see those hands 3262 05:52:43,800 --> 05:52:49,960 on later on. Let's have a look at L1 loss. So L1 loss creates a criterion. As I said, you might 3263 05:52:49,960 --> 05:52:55,000 hear the word criterion used in PyTorch for a loss function. I typically call them loss functions. 3264 05:52:55,000 --> 05:52:59,480 The literature typically calls it loss functions. That measures the mean absolute error. There we 3265 05:52:59,480 --> 05:53:07,000 go. L1 loss is the mean absolute error between each element in the input X and target Y. Now, 3266 05:53:07,000 --> 05:53:11,240 your extracurricular measure might have guessed is to read through the documentation for the 3267 05:53:11,240 --> 05:53:16,440 different loss functions, especially L1 loss. But for the sake of this video, let's just implement 3268 05:53:16,440 --> 05:53:22,840 it for ourselves. Oh, and if you want a little bit of a graphic, I've got one here. This is where 3269 05:53:22,840 --> 05:53:28,680 we're up to, by the way, picking a loss function optimizer for step two. This is a fun part, right? 3270 05:53:28,680 --> 05:53:33,560 We're getting into training a model. So we've got mean absolute error. Here's that graph we've 3271 05:53:33,560 --> 05:53:38,440 seen before. Oh, look at this. Okay. So we've got the difference here. I've actually measured 3272 05:53:38,440 --> 05:53:44,200 this before in the past. So I kind of knew what it was. Mean absolute error is if we repeat for 3273 05:53:44,200 --> 05:53:50,520 all samples in our set that we're working with. And if we take the absolute difference between 3274 05:53:50,520 --> 05:53:56,920 these two dots, well, then we take the mean, we've got mean absolute error. So MAE loss equals 3275 05:53:56,920 --> 05:54:01,000 torch mean we could write it out. That's the beauty of pine torch, right? We could write this out. 3276 05:54:01,000 --> 05:54:08,520 Or we could use the torch and N version, which is recommended. So let's jump in. There's a colorful 3277 05:54:08,520 --> 05:54:14,760 slide describing what we're about to do. So let's go set up a loss function. And then we're also 3278 05:54:14,760 --> 05:54:27,960 going to put in here, set up an optimizer. So let's call it loss FN equals NN dot L1 loss. 3279 05:54:29,400 --> 05:54:32,520 Simple as that. And then if we have a look at what's our loss function, what does this say? 3280 05:54:34,280 --> 05:54:37,000 Oh my goodness. My internet is going quite slow today. 3281 05:54:38,600 --> 05:54:42,360 It's raining outside. 
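While that cell runs, here is the naming quirk mentioned a moment ago, side by side; the tensors are made up purely to show the two loss functions in action:

    import torch
    from torch import nn

    preds   = torch.tensor([0.3, 0.6])
    targets = torch.tensor([0.5, 0.5])

    nn.L1Loss()(preds, targets)    # mean absolute error (MAE) -> tensor(0.1500)
    nn.MSELoss()(preds, targets)   # mean squared error (MSE, also known as L2) -> tensor(0.0250)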
So there might be some delays somewhere. But that's right. Gives us a 3282 05:54:42,360 --> 05:54:48,360 chance to sit here and be mindful about what we're doing. Look at that. Okay. Loss function. 3283 05:54:48,360 --> 05:54:53,560 L1 loss. Beautiful. So we've got a loss function. Our objective for training a machine learning 3284 05:54:53,560 --> 05:54:58,600 model will be two. Let's go back. Look at the colorful graphic will be to minimize these 3285 05:54:58,600 --> 05:55:05,720 distances here. And in turn, minimize the overall value of MAE. That is our goal. 3286 05:55:05,720 --> 05:55:12,200 If our red dots line up with our green dots, we will have a loss value of zero, the ideal point 3287 05:55:12,200 --> 05:55:18,920 for a model to be. And so let's go here. We now need an optimizer. As we discussed before, 3288 05:55:18,920 --> 05:55:23,960 the optimizer takes into account the loss of a model. So these two work in tandem. 3289 05:55:23,960 --> 05:55:27,400 That's why I've put them as similar steps if we go back a few slides. 3290 05:55:28,760 --> 05:55:34,680 So this is why I put these as 2.1. Often picking a loss function and optimizer and pytorch 3291 05:55:34,680 --> 05:55:40,200 come as part of the same package because they work together. The optimizer's objective is to 3292 05:55:40,200 --> 05:55:45,880 give the model values. So parameters like a weight and a bias that minimize the loss function. 3293 05:55:45,880 --> 05:55:53,000 They work in tandem. And so let's see what an optimizer optimizes. Where might that be? 3294 05:55:53,000 --> 05:55:59,480 What if we search here? I typically don't use this search because I prefer just using Google 3295 05:55:59,480 --> 05:56:08,520 search. But does this give us optimizer? Hey, there we go. So again, pytorch has torch.optim 3296 05:56:09,640 --> 05:56:16,280 which is where the optimizers are. Torch.optim. Let me put this link in here. 3297 05:56:17,800 --> 05:56:21,800 This is another bit of your extracurricular. If you want to read more about different optimizers 3298 05:56:21,800 --> 05:56:26,920 in pytorch, as you might have guessed, they have a few. Torch.optim is a package implementing 3299 05:56:26,920 --> 05:56:32,520 various optimization algorithms. Most commonly used methods are already supported and the interface 3300 05:56:32,520 --> 05:56:38,120 is general enough so that more sophisticated ones can also be easily integrated into the future. 3301 05:56:38,120 --> 05:56:42,840 So if we have a look at what algorithms exist here, again, we're going to throw a lot of names 3302 05:56:42,840 --> 05:56:50,600 at you. But in the literature, a lot of them that have made it into here are already good working 3303 05:56:50,600 --> 05:56:55,880 algorithms. So it's a matter of picking whichever one's best for your problem. How do you find that 3304 05:56:55,880 --> 05:57:04,120 out? Well, SGD, stochastic gradient descent, is possibly the most popular. However, there are 3305 05:57:04,120 --> 05:57:11,160 some iterations on SGD, such as Adam, which is another one that's really popular. So again, 3306 05:57:11,160 --> 05:57:16,200 this is one of those other machine learning is part art, part science is trial and error of 3307 05:57:16,200 --> 05:57:20,280 figuring out what works best for your problem for us. We're going to start with SGD because 3308 05:57:20,280 --> 05:57:25,400 it's the most popular. 
And if you were paying attention to a previous video, you might have 3309 05:57:25,400 --> 05:57:31,800 seen that I said, look up gradient descent, wherever we got this gradient descent. There we go. 3310 05:57:32,360 --> 05:57:38,360 So this is one of the main algorithms that improves our models. So gradient descent and back 3311 05:57:38,360 --> 05:57:43,720 propagation. So if we have a look at this stochastic gradient descent, bit of a tongue twister, 3312 05:57:43,720 --> 05:57:49,400 is random gradient descent. So that's what stochastic means. So basically, our model 3313 05:57:49,400 --> 05:57:58,360 improves by taking random numbers, let's go down here, here, and randomly adjusting them 3314 05:57:58,360 --> 05:58:04,440 so that they minimize the loss. And once how optimizer, that's right here, once how optimizer 3315 05:58:04,440 --> 05:58:11,560 torch dot opt in, let's implement SGD, SGD stochastic gradient descent. We're going to write this here, 3316 05:58:11,560 --> 05:58:20,840 stochastic gradient descent. It starts by randomly adjusting these values. And once it's found 3317 05:58:20,840 --> 05:58:26,280 some random values or random steps that have minimized the loss value, we're going to see 3318 05:58:26,280 --> 05:58:32,600 this in action later on, it's going to continue adjusting them in that direction. So say it says, 3319 05:58:32,600 --> 05:58:37,880 oh, weights, if I increase the weights, it reduces the loss. So it's going to keep increasing the 3320 05:58:37,880 --> 05:58:44,760 weights until the weights no longer reduce the loss. Maybe it gets to a point at say 0.65. 3321 05:58:44,760 --> 05:58:48,760 If you increase the weights anymore, the loss is going to go up. So the optimizer is like, 3322 05:58:48,760 --> 05:58:53,320 well, I'm going to stop there. And then for the bias, the same thing happens. If it decreases the 3323 05:58:53,320 --> 05:58:57,640 bias and finds that the loss increases, well, it's going to go, well, I'm going to try increasing 3324 05:58:57,640 --> 05:59:04,600 the bias instead. So again, one last summary of what's going on here, a loss function measures 3325 05:59:04,600 --> 05:59:09,720 how wrong our model is. And the optimizer adjust our model parameters, no matter whether there's 3326 05:59:09,720 --> 05:59:15,320 two parameters or millions of them to reduce the loss. There are a couple of things that 3327 05:59:15,320 --> 05:59:23,160 an optimizer needs to take in. It needs to take in as an argument, params. So this is if we go to 3328 05:59:23,160 --> 05:59:30,600 SGD, I'm just going to link this as well. SGD, there's the formula of what SGD does. I look at this 3329 05:59:30,600 --> 05:59:35,240 and I go, hmm, there's a lot going on here. And take me a while to understand that. So I like to 3330 05:59:35,240 --> 05:59:43,640 see it in code. So we need params. This is short for what parameters should I optimize as an optimizer. 3331 05:59:43,640 --> 05:59:49,880 And then we also need an LR, which stands for, I'm going to write this in a comment, LR equals 3332 05:59:49,880 --> 05:59:55,320 learning rate, possibly the most, oh, I didn't even type rate, did I possibly the most important 3333 05:59:55,320 --> 06:00:02,360 hyper parameter you can set? So let me just remind you, I'm throwing lots of words out here, but I'm 3334 06:00:02,360 --> 06:00:07,240 kind of like trying to write notes about what we're doing. Again, we're going to see these in action 3335 06:00:07,240 --> 06:00:21,160 in a second. 
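Seeing it in action ahead of time, the optimizer setup being described ends up looking something like this; model_0 is assumed from earlier in the notebook, and 0.01 is the learning rate value chosen shortly:

    import torch

    optimizer = torch.optim.SGD(params=model_0.parameters(),  # the parameters to optimize (weights and bias)
                                lr=0.01)                       # learning rate: how big each adjustment step is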
So check out our models and parameters. So a parameter is a value that the model sets 3336 06:00:21,160 --> 06:00:32,920 itself. So learning rate equals possibly the most important learning hyper parameter. I don't 3337 06:00:32,920 --> 06:00:39,160 need learning there, do I? Hyper parameter. And a hyper parameter is a value that us as a data scientist 3338 06:00:39,160 --> 06:00:46,840 or a machine learning engineer set ourselves, you can set. So the learning rate is, in our case, 3339 06:00:46,840 --> 06:00:52,440 let's go 0.01. You're like, Daniel, where did I get this value from? Well, again, these type of 3340 06:00:52,440 --> 06:01:00,360 values come with experience. I think it actually says it in here, LR, LR 0.1. Yeah, okay, so the 3341 06:01:00,360 --> 06:01:07,080 default is 0.1. But then if we go back to Optim, I think I saw it somewhere. Did I see it somewhere? 3342 06:01:07,080 --> 06:01:16,280 0.0? Yeah, there we go. Yeah, so a lot of the default settings are pretty good in torch optimizers. 3343 06:01:16,280 --> 06:01:22,520 However, the learning rate, what does it actually do? We could go 0.01. These are all common values 3344 06:01:22,520 --> 06:01:30,680 here. Triple zero one. I'm not sure exactly why. Oh, model, it's model zero. The learning rate says 3345 06:01:30,680 --> 06:01:36,600 to our optimizer, yes, it's going to optimize our parameters here. But the higher the learning 3346 06:01:36,600 --> 06:01:43,400 rate, the more it adjusts each of these parameters in one hit. So let's say it's 0.01. And it's going 3347 06:01:43,400 --> 06:01:49,560 to optimize this value here. So it's going to take that big of a step. If we changed it to here, 3348 06:01:49,560 --> 06:01:56,280 it's going to take a big step on this three. And if we changed it to all the way to the end 0.01, 3349 06:01:56,280 --> 06:02:01,640 it's only going to change this value. So the smaller the learning rate, the smaller the change 3350 06:02:01,640 --> 06:02:06,200 in the parameter, the larger the learning rate, the larger the change in the parameter. 3351 06:02:06,200 --> 06:02:13,320 So we've set up a loss function. We've set up an optimizer. Let's now move on to the next step 3352 06:02:13,320 --> 06:02:20,840 in our training workflow. And that's by building a training loop. Far out. This is exciting. I'll 3353 06:02:20,840 --> 06:02:29,400 see you in the next video. Welcome back. In the last video, we set up a loss function. And we set 3354 06:02:29,400 --> 06:02:35,240 up an optimizer. And we discussed the roles of each. So loss function measures how wrong our model 3355 06:02:35,240 --> 06:02:41,640 is. The optimizer talks to the loss function and goes, well, if I change these parameters a certain 3356 06:02:41,640 --> 06:02:47,480 way, does that reduce the loss function at all? And if it does, yes, let's keep adjusting them in 3357 06:02:47,480 --> 06:02:53,800 that direction. If it doesn't, let's adjust them in the opposite direction. And I just want to show 3358 06:02:53,800 --> 06:02:58,920 you I added a little bit of text here just to concretely put down what we were discussing. 3359 06:02:58,920 --> 06:03:05,320 Inside the optimizer, you'll often have to set two parameters, params and lr, where params is 3360 06:03:05,320 --> 06:03:10,840 the model parameters you'd like to optimize for an example, in our case, params equals our model 3361 06:03:10,840 --> 06:03:16,760 zero parameters, which were, of course, a weight and a bias. 
And the learning rate, which is lr 3362 06:03:16,760 --> 06:03:22,440 in the optimizer, lr stands for learning rate. And the learning rate is a hyper parameter. Remember, 3363 06:03:22,440 --> 06:03:27,560 a hyper parameter is a value that we the data scientist or machine learning engineer sets, 3364 06:03:27,560 --> 06:03:35,000 whereas a parameter is what the model sets itself. The learning rate defines how big or small the changes are that the optimizer makes to 3365 06:03:35,000 --> 06:03:41,080 the model parameters. So a small learning rate, so the smaller this value, results in small 3366 06:03:41,080 --> 06:03:47,240 changes, a large learning rate results in large changes. So another question might be, 3367 06:03:47,880 --> 06:03:53,320 well, very valid question. Hey, I put this here already, is which loss function and which optimizer 3368 06:03:53,320 --> 06:03:58,760 should I use? So this is another tough one, because it's problem specific. But with experience 3369 06:03:58,760 --> 06:04:02,920 in machine learning, I'm showing you one example here, you'll get an idea of what works for your 3370 06:04:02,920 --> 06:04:08,040 particular problem. For a regression problem, like ours, a loss function of L1 loss, which is MAE 3371 06:04:08,040 --> 06:04:14,200 in PyTorch, and an optimizer like torch.optim's SGD, stochastic gradient descent, 3372 06:04:14,200 --> 06:04:18,600 will suffice. But for a classification problem, we're going to see this later on. 3373 06:04:18,600 --> 06:04:23,240 Not this one specifically, whether a photo is a cat or a dog, that's just an example of a binary 3374 06:04:23,240 --> 06:04:28,840 classification problem, you might want to use a binary classification loss (see the short sketch below). But with that being 3375 06:04:28,840 --> 06:04:35,640 said, we now are moving on to, well, here's our whole goal, is to reduce the MAE of our model. 3376 06:04:35,640 --> 06:04:40,840 Let's get the workflow up. We've done these two steps. Now we want to build a training loop. So 3377 06:04:40,840 --> 06:04:44,760 let's get back into here. There's going to be a fair few steps going on. We've already covered 3378 06:04:44,760 --> 06:04:54,520 a few, but hey, nothing we can't handle together. So building a training loop in PyTorch. 3379 06:04:56,040 --> 06:05:01,320 So I thought about just talking about what's going on in the training loop, but we can talk 3380 06:05:01,320 --> 06:05:06,600 about the steps after we've coded them. How about we do that? So we want to build a training loop 3381 06:05:06,600 --> 06:05:16,200 and a testing loop. How about we do that? So a couple of things we need in a training loop. 3382 06:05:16,840 --> 06:05:20,440 So there's going to be a fair few steps here if you've never written a training loop before, 3383 06:05:20,440 --> 06:05:25,240 but that is completely fine, because you'll find that the first couple of times that you write this, 3384 06:05:25,240 --> 06:05:28,920 you'll be like, oh my gosh, there's too much going on here. But then when you have practice, 3385 06:05:28,920 --> 06:05:33,480 you'll go, okay, I see what's going on here. And then eventually you'll write them with your 3386 06:05:33,480 --> 06:05:38,280 eyes closed. I've got a fun song for you to help you out remembering things. It's called the 3387 06:05:38,280 --> 06:05:43,400 unofficial PyTorch optimization loop song.
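Here is that loss-function rule of thumb as a quick sketch; the classification line is only a preview, since binary cross entropy is covered properly later in the course:

    from torch import nn

    # regression (predicting a number, like our straight line):
    loss_fn = nn.L1Loss()                  # MAE; nn.MSELoss() is another common choice

    # binary classification (e.g. is this photo a cat or a dog?):
    # loss_fn = nn.BCEWithLogitsLoss()     # a binary cross entropy loss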
We'll see that later on, or actually, I'll probably leave 3388 06:05:43,400 --> 06:05:48,600 that as an extension, but you'll see that you can also functionize these things, which we will do 3389 06:05:48,600 --> 06:05:53,160 later in the course so that you can just write them once and then forget about them. But we're 3390 06:05:53,160 --> 06:05:57,640 going to write it all from scratch to begin with so we know what's happening. So we want to, 3391 06:05:57,640 --> 06:06:06,280 or actually step zero, is loop through the data. So we want to look at the data multiple times 3392 06:06:06,280 --> 06:06:11,240 because our model is going to, at first, start with random predictions on the data, make some 3393 06:06:11,240 --> 06:06:15,400 predictions. We're trying to improve those. We're trying to minimize the loss to make those 3394 06:06:15,400 --> 06:06:23,400 predictions. We do a forward pass. So forward pass. Why is it called a forward pass? So this 3395 06:06:23,400 --> 06:06:34,360 involves data moving through our model's forward functions. Now that I say functions because there 3396 06:06:34,360 --> 06:06:39,240 might be plural, there might be more than one. And the forward method recall, we wrote in our model 3397 06:06:39,240 --> 06:06:46,840 up here. Ford. A forward pass is our data going through this function here. And if you want to 3398 06:06:46,840 --> 06:06:54,600 look at it visually, let's look up a neural network graphic. Images, a forward pass is just 3399 06:06:55,400 --> 06:07:01,640 data moving from the inputs to the output layer. So starting here input layer moving through the 3400 06:07:01,640 --> 06:07:07,560 model. So that's a forward pass, also called forward propagation. Another time we'll have 3401 06:07:07,560 --> 06:07:14,040 more than one name is used for the same thing. So we'll go back down here, forward pass. And 3402 06:07:14,040 --> 06:07:22,760 I'll just write here also called forward propagation, propagation. Wonderful. And then we need to 3403 06:07:22,760 --> 06:07:33,480 calculate the loss. So forward pass. Let me write this. To calculate or to make predictions, 3404 06:07:33,480 --> 06:07:46,200 make predictions on data. So calculate the loss, compare forward pass predictions. Oh, there's 3405 06:07:46,200 --> 06:07:50,760 an undergoing in the background here of my place. We might be in for a storm. Perfect time to write 3406 06:07:50,760 --> 06:07:55,960 code, compare forward pass predictions to ground truth labels. We're going to see all this in code 3407 06:07:56,520 --> 06:08:01,160 in a second, calculate the loss. And then we're going to go optimise a zero grad. We haven't 3408 06:08:01,160 --> 06:08:04,760 spoken about what this is, but that's okay. We're going to see that in a second. I'm not going to 3409 06:08:04,760 --> 06:08:09,960 put too much there. Loss backward. We haven't discussed this one either. There's probably three 3410 06:08:09,960 --> 06:08:15,080 steps that we haven't really discussed. We've discussed the idea behind them, but not too much 3411 06:08:15,080 --> 06:08:23,800 in depth. Optimise our step. So this one is loss backwards is move backwards. If the forward pass 3412 06:08:23,800 --> 06:08:29,880 is forwards, like through the network, the forward pass is data goes into out. The backward pass 3413 06:08:29,880 --> 06:08:35,960 data goes, our calculations happen backwards. So we'll see what that is in a second. Where were 3414 06:08:35,960 --> 06:08:40,120 we over here? We've got too much going on. 
I'm getting rid of these moves backwards through 3415 06:08:41,480 --> 06:08:51,880 the network to calculate the gradients. Oh, oh, the gradients of each of the parameters 3416 06:08:53,320 --> 06:08:58,840 of our model with respect to the loss. Oh my gosh, that is an absolute mouthful, 3417 06:08:58,840 --> 06:09:06,600 but that'll do for now. Optimise a step. This is going to use the optimiser to adjust our 3418 06:09:06,600 --> 06:09:16,200 model's parameters to try and improve the loss. So remember how I said in a previous video 3419 06:09:16,200 --> 06:09:21,720 that I'd love you to watch the two videos I linked above. One on gradient descent and one 3420 06:09:21,720 --> 06:09:25,960 on back propagation. If you did, you might have seen like there's a fair bit of math going on in 3421 06:09:25,960 --> 06:09:33,000 there. Well, that's essentially how our model goes from random parameters to better parameters, 3422 06:09:33,000 --> 06:09:37,720 using math. Many people, one of the main things I get asked from machine learning is how do I 3423 06:09:37,720 --> 06:09:42,600 learn machine learning if I didn't do math? Well, the beautiful thing about PyTorch is that it 3424 06:09:42,600 --> 06:09:47,880 implements a lot of the math of back propagation. So this is back propagation. I'm going to write 3425 06:09:47,880 --> 06:09:53,160 this down here. This is an algorithm called back, back propagation, hence the loss backward. We're 3426 06:09:53,160 --> 06:10:00,280 going to see this in code in a second, don't you worry? And this is gradient descent. So these 3427 06:10:00,280 --> 06:10:06,760 two algorithms drive the majority of our learning. So back propagation, calculate the gradients of 3428 06:10:06,760 --> 06:10:11,400 the parameters of our model with respect to the loss function and optimise our step, 3429 06:10:11,400 --> 06:10:16,600 we'll trigger code to run gradient descent, which is to minimise the gradients because what is a 3430 06:10:16,600 --> 06:10:24,360 gradient? Let's look this up. What is a gradient? I know we haven't written a code yet, but we're 3431 06:10:24,360 --> 06:10:32,280 going to do that. Images. Gradient, there we go. Changing y, changing x. Gradient is from high 3432 06:10:32,280 --> 06:10:38,360 school math. Gradient is a slope. So if you were on a hill, let's find a picture of a hill. 3433 06:10:38,360 --> 06:10:50,440 Picture of a hill. There we go. This is a great big hill. So if you were on the top of this hill, 3434 06:10:50,440 --> 06:10:56,520 and you wanted to get to the bottom, how would you get to the bottom? Well, of course, you just 3435 06:10:56,520 --> 06:11:00,920 walked down the hill. But if you're a machine learning model, what are you trying to do? Let's 3436 06:11:00,920 --> 06:11:05,560 imagine your loss is the height of this hill. You start off with your losses really high, and you 3437 06:11:05,560 --> 06:11:10,200 want to take your loss down to zero, which is the bottom, right? Well, if you measure the gradient 3438 06:11:10,200 --> 06:11:17,800 of the hill, the bottom of the hill is in the opposite direction to where the gradient is steep. 3439 06:11:18,360 --> 06:11:23,560 Does that make sense? So the gradient here is an incline. We want our model to move towards the 3440 06:11:23,560 --> 06:11:27,800 gradient being nothing, which is down here. 
And you could argue, yeah, the gradient's probably 3441 06:11:27,800 --> 06:11:31,160 nothing up the top here, but let's just for argument's sake say that we want to get to the 3442 06:11:31,160 --> 06:11:35,480 bottom of the hill. So we're measuring the gradient, and one of the ways an optimisation algorithm 3443 06:11:35,480 --> 06:11:42,520 works is it moves our model parameters so that the gradient equals zero, and then if the gradient 3444 06:11:43,080 --> 06:11:48,520 of the loss equals zero, while the loss equals zero two. So now let's write some code. So we're 3445 06:11:48,520 --> 06:11:53,720 going to set up a parameter called or a variable called epochs. And we're going to start with one, 3446 06:11:53,720 --> 06:11:59,160 even though this could be any value, let me define these as we go. So we're going to write code to 3447 06:11:59,160 --> 06:12:10,440 do all of this. So epochs, an epoch is one loop through the data dot dot dot. So epochs, we're 3448 06:12:10,440 --> 06:12:15,320 going to start with one. So one time through all of the data, we don't have much data. And so 3449 06:12:15,880 --> 06:12:25,720 for epoch, let's go this, this is step zero, zero, loop through the data. By the way, when I say 3450 06:12:25,720 --> 06:12:32,520 loop through the data, I want you to do all of these steps within the loop. And do dot dot dot 3451 06:12:33,480 --> 06:12:39,080 loop through the data. So for epoch in range epochs, even though it's only going to be one, 3452 06:12:39,080 --> 06:12:44,120 we can adjust this later. And because epochs, we've set this ourselves, it is a, 3453 06:12:45,880 --> 06:12:55,080 this is a hyper parameter, because we've set it ourselves. And I know you could argue that, 3454 06:12:55,080 --> 06:13:01,640 hey, our machine learning parameters of model zero, or our model parameters, model zero aren't 3455 06:13:01,640 --> 06:13:06,440 actually parameters, because we've set them. But in the models that you build in the future, 3456 06:13:06,440 --> 06:13:12,600 they will likely be set automatically rather than you setting them explicitly like we've done when 3457 06:13:12,600 --> 06:13:17,080 we created model zero. And oh my gosh, this is taking quite a while to run. That's all right. 3458 06:13:17,080 --> 06:13:20,920 We don't need it to run fast. We just, we need to write some more code, then you'll come on. 3459 06:13:20,920 --> 06:13:27,480 There's a step here I haven't discussed either. Set the model to training mode. So pytorch models 3460 06:13:27,480 --> 06:13:32,520 have a couple of different modes. The default is training mode. So we can set it to training 3461 06:13:32,520 --> 06:13:40,200 mode by going like this. Train. So what does train mode do in a pytorch model? My goodness. 3462 06:13:40,200 --> 06:13:44,600 Is there a reason my engineer is going this slide? That's all right. I'm just going to 3463 06:13:44,600 --> 06:13:56,920 discuss this with talking again list. Train mode. Train mode in pytorch sets. Oh, there we go. 3464 06:13:57,560 --> 06:14:06,680 Requires grad equals true. Now I wonder if we do with torch dot no grad member no grad is similar 3465 06:14:06,680 --> 06:14:14,360 to inference mode. Will this adjust? See, I just wanted to take note of requires grad equals 3466 06:14:14,360 --> 06:14:19,560 true. Actually, what I might do is we do this in a different cell. Watch this. This is just going 3467 06:14:19,560 --> 06:14:24,600 to be rather than me just spit words at you. 
I reckon we might be able to get it work in doing 3468 06:14:24,600 --> 06:14:31,560 this. Oh, that didn't list the model parameters. Why did that not come out? Model zero dot eval. 3469 06:14:33,320 --> 06:14:40,120 So there's two modes of our mode and train mode model dot eval parameters. Hey, we're experimenting 3470 06:14:40,120 --> 06:14:44,680 together on the fly here. And actually, this is what I want you to do is I want you to experiment 3471 06:14:44,680 --> 06:14:53,000 with different things. It's not going to say requires grad equals false. Hmm. With torch dot no 3472 06:14:53,000 --> 06:15:03,000 grad. Model zero dot parameters. I don't know if this will work, but it definitely works behind 3473 06:15:03,000 --> 06:15:07,240 the scenes. And what I mean by works behind the scenes are not here. It works behind the scenes 3474 06:15:07,240 --> 06:15:11,400 when calculations have been made, but not if we're trying to explicitly print things out. 3475 06:15:12,760 --> 06:15:16,680 Well, that's an experiment that I thought was going to work and it didn't work. So train 3476 06:15:16,680 --> 06:15:26,120 mode in pytorch sets all parameters that require gradients to require gradients. 3477 06:15:27,240 --> 06:15:32,120 So do you remember with the picture of the hill? I spoke about how we're trying to minimize the 3478 06:15:32,120 --> 06:15:37,480 gradient. So the gradient is the steepness of the hill. If the height of the hill is a loss function 3479 06:15:37,480 --> 06:15:42,760 and we want to take that down to zero, we want to take the gradient down to zero. So same thing 3480 06:15:42,760 --> 06:15:49,880 with the gradients of our model parameters, which are here with respect to the loss function, 3481 06:15:49,880 --> 06:15:54,840 we want to try and minimize that gradient. So that's gradient descent is take that gradient down to 3482 06:15:54,840 --> 06:16:05,160 zero. So model dot train. And then there's also model zero dot a vowel. So turns off gradient 3483 06:16:05,160 --> 06:16:12,840 tracking. So we're going to see that later on. But for now, I feel like this video is getting far 3484 06:16:12,840 --> 06:16:17,560 too long. Let's finish the training loop in the next video. I'll see you there. 3485 06:16:19,960 --> 06:16:24,520 Friends, welcome back. In the last video, I promised a lot of code, but we didn't get there. We 3486 06:16:24,520 --> 06:16:29,320 discussed some important steps. I forgot how much behind the scenes there is to apply towards training 3487 06:16:29,320 --> 06:16:34,280 loop. And I think it's important to spend the time that we did discussing what's going on, 3488 06:16:34,280 --> 06:16:38,920 because there's a fair few steps. But once you know what's going on, I mean, later on, we don't 3489 06:16:38,920 --> 06:16:42,920 have to write all the code that we're going to write in this video, you can functionize it. We're 3490 06:16:42,920 --> 06:16:47,320 going to see that later on in the course, and it's going to run behind the scenes for us. But we're 3491 06:16:47,320 --> 06:16:52,440 spending a fair bit of time here, because this is literally the crux of how our model learns. So 3492 06:16:52,440 --> 06:16:58,920 let's get into it. So now we're going to implement the forward pass, which involves our model's 3493 06:16:58,920 --> 06:17:05,320 forward function, which we defined up here. When we built our model, the forward pass runs through 3494 06:17:05,320 --> 06:17:12,120 this code here. So let's just write that. 
So in our case, because we're training, I'm just 3495 06:17:12,120 --> 06:17:19,800 going to write here. This is training. We're going to see dot of our later on. We'll talk 3496 06:17:19,800 --> 06:17:25,240 about that when it comes. Let's do the forward pass. So the forward pass, we want to pass data 3497 06:17:25,240 --> 06:17:31,480 through our model's forward method. We can do this quite simply by going y pred. So y predictions, 3498 06:17:31,480 --> 06:17:37,880 because remember, we're trying to use our ideal model is using x test to predict y test 3499 06:17:38,440 --> 06:17:44,520 on our test data set. We make predictions on our test data set. We learn on our training data set. 3500 06:17:44,520 --> 06:17:49,080 So we're passing, which is going to get rid of that because we don't need that. So we're 3501 06:17:49,080 --> 06:17:56,840 passing our model x train and model zero is going to be our current model. There we go. So we learn 3502 06:17:56,840 --> 06:18:03,000 patterns on the training data to evaluate our model on the test data. Number two, where we are. 3503 06:18:05,160 --> 06:18:10,200 So we have to calculate the loss. Now, in a previous video, we set up a loss function. 3504 06:18:10,200 --> 06:18:15,000 So this is going to help us calculate the what what kind of loss are we using? We want to calculate 3505 06:18:15,000 --> 06:18:21,880 the MAE. So the difference or the distance between our red dot and a green dot. And the formula would 3506 06:18:21,880 --> 06:18:27,400 be the same if we had 10,000 red dots and 10,000 green dots, we're calculating how far they are 3507 06:18:27,400 --> 06:18:37,320 apart. And then we're taking the mean of that value. So let's go back here. So calculate the loss. 3508 06:18:38,040 --> 06:18:42,760 And in our case, we're going to set loss equal to our loss function, which is L one loss in 3509 06:18:42,760 --> 06:18:53,240 PyTorch, but it is MAE. Y-pred and Y-train. So we're calculating the difference between our models 3510 06:18:53,240 --> 06:18:59,800 predictions on the training data set and the ideal training values. And if you want to go into 3511 06:19:00,680 --> 06:19:08,360 torch dot NN loss functions, that's going to show you the order because sometimes this confuses me 3512 06:19:08,360 --> 06:19:14,920 to what order the values go in here, but it goes prediction first, then labels and I may be wrong 3513 06:19:14,920 --> 06:19:20,600 there because I get confused here. My dyslexia kicks in, but I'm pretty sure it's predictions first, 3514 06:19:20,600 --> 06:19:29,720 then actual labels. Do we have an example of where it's used? Yeah, import first, target next. 3515 06:19:30,360 --> 06:19:35,240 So there we go. And truth be told, because it's mean absolute error, it shouldn't actually matter 3516 06:19:35,240 --> 06:19:40,920 too much. But in the case of staying true to the documentation, let's do inputs first and then 3517 06:19:40,920 --> 06:19:48,200 targets next for the rest of the course. Then we're going to go optimizer zero grad. Hmm, 3518 06:19:48,760 --> 06:19:52,520 haven't discussed this one, but that's okay. I'm going to write the code and then I'm going to 3519 06:19:52,520 --> 06:19:59,080 discuss what it does. So what does this do? Actually, before we discuss this, I'm going to write 3520 06:19:59,080 --> 06:20:04,680 these two steps because they kind of all work together. 
And it's a lot easier to discuss what 3521 06:20:04,680 --> 06:20:13,560 optimizer zero grad does in the context of having everything else perform back propagation 3522 06:20:14,680 --> 06:20:24,440 on the loss with respect to the parameters of the model. Back propagation is going to take 3523 06:20:24,440 --> 06:20:29,160 the loss value. So lost backward, I always say backwards, but it's just backward. That's the code 3524 06:20:29,160 --> 06:20:41,320 there. And then number five is step the optimizer. So perform gradient descent. So optimizer dot 3525 06:20:41,320 --> 06:20:47,720 step. Oh, look at us. We just wrote the five major steps of a training loop. Now let's discuss 3526 06:20:47,720 --> 06:20:54,040 how all of these work together. So it's kind of strange, like the ordering of these, you might 3527 06:20:54,040 --> 06:20:59,720 think, Oh, what should I do the order? Typically the forward pass and the loss come straight up. 3528 06:20:59,720 --> 06:21:05,160 Then there's a little bit of ambiguity around what order these have to come in. But the optimizer 3529 06:21:05,160 --> 06:21:12,520 step should come after the back propagation. So I just like to keep this order how it is because 3530 06:21:12,520 --> 06:21:17,720 this works. Let's just keep it that way. But what happens here? Well, it also is a little bit 3531 06:21:17,720 --> 06:21:23,320 confusing in the first iteration of the loop because we've got zero grad. But what happens here is 3532 06:21:23,320 --> 06:21:29,480 that the optimizer makes some calculations in how it should adjust model parameters with regards to 3533 06:21:29,480 --> 06:21:38,520 the back propagation of the loss. And so by default, these will by default, how the optimizer 3534 06:21:38,520 --> 06:21:54,600 changes will accumulate through the loop. So we have to zero them above in step three 3535 06:21:54,600 --> 06:22:01,240 for the next iteration of the loop. So a big long comment there. But what this is saying is, 3536 06:22:01,240 --> 06:22:06,760 let's say we go through the loop and the optimizer chooses a value of one, change it by one. And 3537 06:22:06,760 --> 06:22:10,360 then it goes through a loop again, if we didn't zero it, if we didn't take it to zero, because 3538 06:22:10,360 --> 06:22:16,120 that's what it is doing, it's going one to zero, it would go, okay, next one, two, three, four, 3539 06:22:16,120 --> 06:22:21,240 five, six, seven, eight, all through the loop, right? Because we're looping here. If this was 3540 06:22:21,800 --> 06:22:27,240 10, it would accumulate the value that it's supposed to change 10 times. But we want it to start 3541 06:22:27,240 --> 06:22:32,840 again, start fresh each iteration of the loop. And now the reason why it accumulates, that's 3542 06:22:32,840 --> 06:22:37,000 pretty deep in the pytorch documentation. From my understanding, there's something to do with 3543 06:22:37,000 --> 06:22:41,480 like efficiency of computing. If you find out what the exact reason is, I'd love to know. 3544 06:22:42,040 --> 06:22:49,320 So we have to zero it, then we perform back propagation. If you recall, back propagation is 3545 06:22:49,320 --> 06:22:57,000 discussed in here. And then with optimizer step, we form gradient descent. So the beauty of pytorch, 3546 06:22:57,000 --> 06:23:02,520 this is the beauty of pytorch, is that it will perform back propagation, we're going to have a 3547 06:23:02,520 --> 06:23:10,040 look at this in second, and gradient descent for us. 
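Collected in one place, the five steps written over the last couple of videos look something like this; model_0, loss_fn, optimizer, X_train and y_train are all assumed from earlier in the notebook:

    epochs = 1   # an epoch is one loop through the data (a hyperparameter we set ourselves)

    for epoch in range(epochs):
        model_0.train()                 # put the model in training mode

        # 1. Forward pass (data goes through the model's forward() method)
        y_pred = model_0(X_train)

        # 2. Calculate the loss (predictions first, then the true labels)
        loss = loss_fn(y_pred, y_train)

        # 3. Zero the optimizer's gradients (they accumulate by default)
        optimizer.zero_grad()

        # 4. Backpropagation: compute the gradient of the loss with respect to
        #    every parameter that has requires_grad=True
        loss.backward()

        # 5. Step the optimizer: perform gradient descent to adjust the parameters
        optimizer.step()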
So to prevent this video from getting too long, 3548 06:23:10,040 --> 06:23:15,160 I know we've just written code, but I would like you to practice writing a training loop 3549 06:23:15,160 --> 06:23:19,160 yourself, just write this code, and then run it and see what happens. Actually, you can comment 3550 06:23:19,160 --> 06:23:23,480 this out, we're going to write the testing loop in a second. So your extra curriculum for this 3551 06:23:23,480 --> 06:23:31,720 video is to, one, rewrite this training loop, is to, two, sing the pytorch optimization loop 3552 06:23:31,720 --> 06:23:37,320 song, let's go into here. If you want to remember the steps, well, I've got a song for you. This is 3553 06:23:37,320 --> 06:23:41,720 the training loop song, we haven't discussed the test step, but maybe you could try this yourself. 3554 06:23:42,520 --> 06:23:47,640 So this is an old version of the song, actually, I've got a new one for you. But let's sing this 3555 06:23:47,640 --> 06:23:54,840 together. It's training time. So we do the forward pass, calculate the loss, optimise a zero grad, 3556 06:23:54,840 --> 06:24:01,160 loss backwards, optimise a step, step, step. Now you only have to call optimise a step once, 3557 06:24:01,160 --> 06:24:08,280 this is just for jingle purposes. But for test time, let's test with torch no grad, do the forward 3558 06:24:08,280 --> 06:24:15,640 pass, calculate the loss, watch it go down, down, down. That's from my Twitter, but this is a way 3559 06:24:15,640 --> 06:24:22,200 that I help myself remember the steps that are going on in the code here. And if you want the 3560 06:24:22,200 --> 06:24:31,400 video version of it, well, you're just going to have to search unofficial pytorch optimisation loop 3561 06:24:31,400 --> 06:24:38,440 song. Oh, look at that, who's that guy? Well, he looks pretty cool. So I'll let you check that 3562 06:24:38,440 --> 06:24:46,440 out in your own time. But for now, go back through the training loop steps. I've got a colorful 3563 06:24:46,440 --> 06:24:50,280 graphic coming up in the next video, we're going to write the testing steps. And then we're going 3564 06:24:50,280 --> 06:24:55,000 to go back one more time and talk about what's happening in each of them. And again, if you'd 3565 06:24:55,000 --> 06:24:59,720 like some even more extra curriculum, don't forget the videos I've shown you on back propagation 3566 06:24:59,720 --> 06:25:05,800 and gradient descent. But for now, let's leave this video here. I'll see you in the next one. 3567 06:25:05,800 --> 06:25:12,680 Friends, welcome back. In the last few videos, we've been discussing the steps in a training 3568 06:25:12,680 --> 06:25:17,880 loop in pytorch. And there's a fair bit going on. So in this video, we're going to step back 3569 06:25:17,880 --> 06:25:23,400 through what we've done just to recap. And then we're going to get into testing. And it's nice 3570 06:25:23,400 --> 06:25:28,520 and early where I am right now. The sun's about to come up. It's a very, very beautiful morning 3571 06:25:28,520 --> 06:25:34,040 to be writing code. So let's jump in. We've got a little song here for what we're doing in the 3572 06:25:34,040 --> 06:25:42,360 training steps. For an epoch in a range, comodel.train, do the forward pass, calculate the loss of the 3573 06:25:42,360 --> 06:25:51,160 measure zero grad, last backward of the measure step step step. 
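And since the song above also hints at the test-time steps before they're written out in the course, here is a rough sketch of what that testing step might look like (X_test and y_test assumed from earlier; the course builds this properly in the following videos):

    model_0.eval()                        # evaluation mode
    with torch.inference_mode():          # no gradient tracking needed for testing
        # 1. Forward pass on the test data
        test_pred = model_0(X_test)
        # 2. Calculate the test loss ("watch it go down, down, down" as training progresses)
        test_loss = loss_fn(test_pred, y_test)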
That little jingle above is what I use to 3574 06:25:51,160 --> 06:25:55,720 remember the steps in here, because the first time you write it, there's a fair bit going on. 3575 06:25:55,720 --> 06:26:01,480 But subsequent times that you do write it, you'll start to memorize this. 3576 06:26:01,480 --> 06:26:06,840 And even better, later on we're going to put it into a function so that we can just call it 3577 06:26:06,840 --> 06:26:12,600 over and over and over and over again. With that being said, let's jump in to a colorful slide, 3578 06:26:13,160 --> 06:26:18,280 because that's a lot of code on the page. Let's add some color to it, understand what's happening. 3579 06:26:18,280 --> 06:26:24,840 That way you can refer to this and go, hmm, I see what's going on now. So for the loop, this is why 3580 06:26:24,840 --> 06:26:30,760 it's called a training loop. We step through a number of epochs. One epoch is a single 3581 06:26:30,760 --> 06:26:36,760 pass through the data. So pass the data through the model for a number of epochs. Epochs is a 3582 06:26:36,760 --> 06:26:42,600 hyper parameter, which means you could set it to 100, you could set it to 1000, you could set it 3583 06:26:42,600 --> 06:26:49,720 to one, as we're going to see later on in this video. We skipped this step with the colors, but 3584 06:26:49,720 --> 06:26:55,720 we put the model in training mode: we call model.train. This is the default mode that the model is in. 3585 06:26:55,720 --> 06:27:01,400 Essentially, it sets up a whole bunch of settings behind the scenes in our model parameters so that 3586 06:27:01,400 --> 06:27:07,400 they can track the gradients and do a whole bunch of learning behind the scenes with these 3587 06:27:07,400 --> 06:27:13,640 functions down here. PyTorch does a lot of this for us. So the next step is the forward pass. 3588 06:27:14,280 --> 06:27:18,920 We perform a forward pass on the training data in the training loop. This is an important note. 3589 06:27:18,920 --> 06:27:24,760 The training loop is where the model learns patterns on the training data. Whereas the 3590 06:27:24,760 --> 06:27:30,440 testing loop, we haven't got to that yet, is where we evaluate the patterns that our model has learned, 3591 06:27:30,440 --> 06:27:36,120 or the parameters that our model has learned, on unseen data. So we pass the data through the model, 3592 06:27:36,120 --> 06:27:41,160 and this will perform the forward method located within the model object. So because we created 3593 06:27:41,160 --> 06:27:46,600 a model object, you can actually call your models whatever you want, but as good practice, 3594 06:27:46,600 --> 06:27:51,080 you'll often see it just called model. And if you remember, we'll go back to the code. 3595 06:27:51,080 --> 06:27:59,240 We created a forward method in our model up here, which is this, because our linear regression model 3596 06:27:59,240 --> 06:28:06,120 class subclasses nn.Module, we need to create our own custom forward method. So that's why it's 3597 06:28:06,120 --> 06:28:11,320 called a forward pass. Well, the technical term is forward propagation.
3598 06:28:11,320 --> 06:28:20,920 So if we have a look at a neural network picture, forward propagation just means going through 3599 06:28:20,920 --> 06:28:26,680 the network from the input to the output, there's a thing called back propagation, which we're going 3600 06:28:26,680 --> 06:28:31,960 to discuss in a second, which happens when we call loss.backward, which is going backward through 3601 06:28:31,960 --> 06:28:40,440 the model. But let's return to our colorful slide. We've done the forward pass, call a forward method, 3602 06:28:40,440 --> 06:28:48,360 which performs some calculation on the data we pass it. Next is we calculate the loss value, 3603 06:28:48,360 --> 06:28:53,720 how wrong the model's predictions are. And this will depend on what loss function you use, 3604 06:28:53,720 --> 06:28:58,200 what kind of predictions your model is outputting, and what kind of true values you have. 3605 06:28:58,760 --> 06:29:03,880 But that's what we're doing here. We're comparing our model's predictions on the training data 3606 06:29:03,880 --> 06:29:11,320 to what they should ideally be. And these will be the training labels. The next step, we zero 3607 06:29:11,320 --> 06:29:17,320 the optimizer gradients. So why do we do this? Well, it's a little confusing for the first epoch in 3608 06:29:17,320 --> 06:29:23,240 the loop. But as we get down to optimizer dot step here, the gradients that the optimizer 3609 06:29:23,240 --> 06:29:30,840 calculates accumulate over time so that for each epoch for each loop step, we want them to go back 3610 06:29:30,840 --> 06:29:37,480 to zero. And now the exact reason behind why the optimizer accumulates gradients is buried somewhere 3611 06:29:37,480 --> 06:29:43,240 within the pie torch documentation. I'm not sure of the exact reason from memory. It's because of 3612 06:29:43,240 --> 06:29:48,680 compute optimization. It just adds them up in case you wanted to know what they were. But if 3613 06:29:48,680 --> 06:29:57,160 you find out exactly, I'd love to know. Next step is to perform back propagation on the loss function. 3614 06:29:57,160 --> 06:30:02,760 That's what we're calling loss. backward. Now back propagation is we compute the gradient of 3615 06:30:02,760 --> 06:30:08,680 every parameter with requires grad equals true. And if you recall, we go back to our code. 3616 06:30:08,680 --> 06:30:16,360 We've set requires grad equals true for our parameters. Now the reason we've set requires 3617 06:30:16,360 --> 06:30:23,400 grad equals true is not only so back propagation can be performed on it. But let me show you what 3618 06:30:23,400 --> 06:30:30,600 the gradients look like. So let's go loss function curve. That's a good idea. So we're looking for 3619 06:30:30,600 --> 06:30:38,600 so we're looking for some sort of convex curve here. There we go. L two loss. We're using L one loss 3620 06:30:38,600 --> 06:30:43,800 at the moment. Is there a better one here? All we need is just a nice looking curve. Here we go. 3621 06:30:44,760 --> 06:30:51,160 So this is why we keep track of the gradients behind the scenes. Pie torch is going to create 3622 06:30:51,160 --> 06:30:56,440 some sort of curve for all of our parameters that looks like this. Now this is just a 2d plot. 3623 06:30:56,440 --> 06:31:02,120 So the reason why we're just using an example from Google images is one, because you're going to 3624 06:31:02,120 --> 06:31:08,200 spend a lot of your time Googling different things. 
And two, in practice, when you have your own 3625 06:31:08,200 --> 06:31:14,760 custom neural networks, right now we only have two parameters. So it's quite easy to visualize a 3626 06:31:14,760 --> 06:31:21,800 loss function curve like this. But when you have say 10 million parameters, you basically can't 3627 06:31:21,800 --> 06:31:27,000 visualize what's going on. And so pie torch again will take care of these things behind the scenes. 3628 06:31:27,000 --> 06:31:33,400 But what it's doing is when we say requires grad pie torch is going to track the gradients 3629 06:31:33,400 --> 06:31:39,480 of each of our parameters. And so what we're trying to do here with back propagation and 3630 06:31:39,480 --> 06:31:46,840 subsequently gradient descent is calculate where the lowest point is. Because this is a loss function, 3631 06:31:46,840 --> 06:31:53,240 this is MSC loss, we could trade this out to be MAE loss in our case or L1 loss for our specific 3632 06:31:53,240 --> 06:31:59,800 problem. But this is some sort of parameter. And we calculate the gradients because what is the 3633 06:31:59,800 --> 06:32:11,800 gradient? Let's have a look. What is a gradient? A gradient is an inclined part of a road or railway. 3634 06:32:11,800 --> 06:32:17,160 Now we want it in machine learning. What's it going to give us in machine learning, a gradient 3635 06:32:17,160 --> 06:32:23,000 is a derivative of a function that has more than one input variable. Okay, let's dive in a little 3636 06:32:23,000 --> 06:32:28,280 deeper. See, here's some beautiful loss landscapes. We're trying to get to the bottom of here. This 3637 06:32:28,280 --> 06:32:35,800 is what gradient descent is all about. So oh, there we go. So this is a cost function, which is also a 3638 06:32:35,800 --> 06:32:40,920 loss function. We start with a random initial variable. What have we done? We started with a 3639 06:32:40,920 --> 06:32:47,080 random initial variable. Right? Okay. And then we take a learning step. Beautiful. This is W. So 3640 06:32:47,080 --> 06:32:52,120 this could be our weight parameter. Okay, we're connecting the dots here. This is exciting. 3641 06:32:52,840 --> 06:32:57,000 We've got a lot of tabs here, but that's all right. We'll bring this all together in a second. 3642 06:32:57,000 --> 06:33:02,040 And what we're trying to do is come to the minimum. Now, why do we need to calculate the gradients? 3643 06:33:02,040 --> 06:33:08,120 Well, the gradient is what? Oh, value of weight. Here we go. This is even better. 3644 06:33:08,120 --> 06:33:14,280 I love Google images. So this is our loss. And this is a value of a weight. So we calculate the 3645 06:33:14,280 --> 06:33:23,240 gradients. Why? Because the gradient is the slope of a line or the steepness. And so if we 3646 06:33:23,240 --> 06:33:28,440 calculate the gradient here, and we find that it's really steep right up the top of this, 3647 06:33:29,160 --> 06:33:34,520 this incline, we might head in the opposite direction to that gradient. That's what gradient 3648 06:33:34,520 --> 06:33:40,360 descent is. And so if we go down here, now, what are these step points? There's a little thing that 3649 06:33:40,360 --> 06:33:44,120 I wrote down in the last video at the end of the last video I haven't told you about yet, 3650 06:33:44,120 --> 06:33:50,520 but I was waiting for a moment like this. And if you recall, I said kind of all of these three steps 3651 06:33:50,520 --> 06:33:55,080 optimizes zero grad loss backward, optimizes step are all together. 
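If "calculating gradients" still feels abstract, here is about the smallest autograd example there is; it isn't our model, just one made-up value, but it is the same machinery that loss.backward() uses:

    import torch

    x = torch.tensor(2.0, requires_grad=True)   # ask PyTorch to track gradients for this value
    y = x ** 2                                   # some function of x (think of y as a tiny "loss")
    y.backward()                                 # backpropagation: compute dy/dx
    x.grad                                       # tensor(4.) -> the gradient (slope) at x = 2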
So we calculate the 3652 06:33:55,080 --> 06:33:59,960 gradients because we want to head in the opposite direction of that gradient to get to a gradient 3653 06:33:59,960 --> 06:34:05,080 value of zero. And if we get to a gradient value of zero with a loss function, well, then the loss 3654 06:34:05,080 --> 06:34:10,920 is also zero. So that's why we keep track of a gradient with requires grad equals true. 3655 06:34:10,920 --> 06:34:15,960 And again, PyTorch does a lot of this behind the scenes. And if you want to dig more into 3656 06:34:15,960 --> 06:34:20,920 what's going on here, I'm going to show you some extra resources for back propagation, 3657 06:34:20,920 --> 06:34:25,800 which is calculating this gradient curve here, and gradient descent, which is finding the bottom 3658 06:34:25,800 --> 06:34:30,680 of it towards the end of this video. And again, if we started over this side, we would just go 3659 06:34:30,680 --> 06:34:36,520 in the opposite direction of this. So maybe this is a positive gradient here, and we just go in the 3660 06:34:36,520 --> 06:34:41,640 opposite direction here. We want to get to the bottom. That is the main point of gradient descent. 3661 06:34:42,680 --> 06:34:50,600 And so if we come back, I said, just keep this step size in mind here. If we come back to where 3662 06:34:50,600 --> 06:34:56,600 we created our loss function and optimizer, I put a little tidbit here for the optimizer. 3663 06:34:57,480 --> 06:35:00,920 Because we've written a lot of code, and we haven't really discussed what's going on, but 3664 06:35:00,920 --> 06:35:06,360 I like to do things on the fly as we need them. So inside our optimizer, we'll have main two 3665 06:35:06,920 --> 06:35:11,560 parameters, which is params. So the model parameters you'd like to optimize, 3666 06:35:11,560 --> 06:35:17,160 params equals model zero dot parameters in our case. And then PyTorch is going to create 3667 06:35:17,160 --> 06:35:22,360 something similar to this curve, not visually, but just mathematically behind the scenes for 3668 06:35:22,360 --> 06:35:27,240 every parameter. Now, this is a value of weight. So this would just be potentially the weight 3669 06:35:27,240 --> 06:35:32,040 parameter of our network. But again, if you have 10 million parameters, there's no way you could 3670 06:35:32,040 --> 06:35:36,920 just create all of these curves yourself. That's the beauty of PyTorch. It's doing this behind the 3671 06:35:36,920 --> 06:35:43,960 scenes through a mechanism called torch autograd, which is auto gradient calculation. And there's 3672 06:35:43,960 --> 06:35:48,360 beautiful documentation on this. If you'd like to read more on how it works, please go through 3673 06:35:48,360 --> 06:35:53,000 that. But essentially behind the scenes, it's doing a lot of this for us for each parameter. 3674 06:35:53,000 --> 06:35:58,760 That's the optimizer. Then within the optimizer, once we've told it what parameters to optimize, 3675 06:35:58,760 --> 06:36:04,680 we have the learning rate. So the learning rate is another hyper parameter that defines how big or 3676 06:36:04,680 --> 06:36:11,080 small the optimizer changes the parameters with each step. So a small learning rate results in 3677 06:36:11,080 --> 06:36:16,520 small changes, whereas a large learning rate is in large changes. And so if we look at this 3678 06:36:16,520 --> 06:36:22,760 curve here, we might at the beginning start with large steps, so we can get closer and closer to 3679 06:36:22,760 --> 06:36:27,960 the bottom. 
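In code, that "step opposite the gradient" idea looks roughly like this for a single toy parameter (a hedged sketch of plain SGD, which is approximately what optimizer.step() does for every parameter):

import torch

lr = 0.01
weight = torch.tensor(0.3, requires_grad=True)

loss = (weight - 0.7) ** 2       # toy loss whose minimum sits at weight = 0.7
loss.backward()                  # fills weight.grad

with torch.no_grad():            # update the parameter without tracking this op
    weight -= lr * weight.grad   # gradient descent: move opposite to the gradient
weight.grad.zero_()
print(weight)                    # nudged from 0.3 towards 0.7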
But then as we get closer and closer to the bottom, to prevent stepping over to this 3680 06:36:27,960 --> 06:36:34,360 side of the curve, we might do smaller and smaller steps. And the optimizer in PyTorch, 3681 06:36:34,360 --> 06:36:39,400 there are optimizers that do that for us. But there is also another concept called learning 3682 06:36:39,400 --> 06:36:46,040 rate scheduling, which is, again, something if you would like to look up and do more. But 3683 06:36:46,040 --> 06:36:51,400 learning rate scheduling essentially says, hey, maybe start with some big steps. And then as we 3684 06:36:51,400 --> 06:36:57,480 get closer and closer to the bottom, reduce how big the steps are that we take. Because if you've 3685 06:36:57,480 --> 06:37:04,600 ever seen a coin, coin at the back of couch. This is my favorite analogy for this. If you've ever 3686 06:37:04,600 --> 06:37:11,080 tried to reach a coin at the back of a couch, like this excited young chap, if you're reaching 3687 06:37:11,080 --> 06:37:17,000 towards the back of a couch, you take quite big steps as you say your arm was over here, 3688 06:37:17,000 --> 06:37:22,520 you would take quite big steps until you get to about here. And in the closer you get to the coin, 3689 06:37:22,520 --> 06:37:27,880 the smaller and smaller your steps are. Otherwise, what's going to happen? The coin is going to be 3690 06:37:27,880 --> 06:37:34,440 lost. Or if you took two small steps, you'd never get to the coin. It would take forever to get there. 3691 06:37:34,440 --> 06:37:40,440 So that's the concept of learning rate. If you take two big steps, you're going to just end up 3692 06:37:40,440 --> 06:37:46,040 over here. If you take two small steps, it's going to take you forever to get to the bottom here. 3693 06:37:46,040 --> 06:37:49,640 And this bottom point is called convergence. That's another term you're going to come across. I 3694 06:37:49,640 --> 06:37:53,560 know I'm throwing a lot of different terms at you, but that's the whole concept of the learning 3695 06:37:53,560 --> 06:37:59,480 rate. How big is your step down here? In gradient descent. Gradient descent is this. Back propagation 3696 06:37:59,480 --> 06:38:05,640 is calculating these derivative curves or the gradient curves for each of the parameters in our 3697 06:38:05,640 --> 06:38:12,360 model. So let's get out of here. We'll go back to our training steps. Where were we? I think we're 3698 06:38:12,360 --> 06:38:19,640 up to back propagation. Have we done backward? Yes. So the back propagation is where we do the 3699 06:38:19,640 --> 06:38:25,080 backward steps. So the forward pass, forward propagation, go from input to output. Back propagation, 3700 06:38:25,080 --> 06:38:30,040 we take the gradients of the loss function with respect to each parameter in our model 3701 06:38:30,040 --> 06:38:35,320 by going backwards. That's what happens when we call loss.backward. PyTorch does that for us 3702 06:38:35,320 --> 06:38:43,720 behind the scenes. And then finally, step number five is step the optimizer. We've kind of discussed 3703 06:38:43,720 --> 06:38:51,560 that. As I said, if we take a step, let's get our loss curve back up. Loss function curve. 3704 06:38:51,560 --> 06:38:59,080 Doesn't really matter what curve we use. The optimizer step is taking a step this way to try 3705 06:38:59,080 --> 06:39:06,360 and optimize the parameters so that we can get down to the bottom here. 
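If you do look up learning rate scheduling, a minimal sketch looks something like this (the scheduler type and the numbers are placeholder choices for illustration, not what the course uses): big steps early, smaller steps as we get closer to the bottom.

import torch
from torch import nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    # ...forward pass, loss and loss.backward() would go here...
    optimizer.step()
    scheduler.step()              # update the learning rate after each epoch
    if epoch % 10 == 0:
        print(epoch, scheduler.get_last_lr())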
And I also just noted 3706 06:39:06,360 --> 06:39:09,960 here that you can turn all of this into a function so we don't necessarily have to remember to 3707 06:39:09,960 --> 06:39:15,240 write these every single time. The ordering of this, you'll want to do the forward pass first. 3708 06:39:15,240 --> 06:39:19,560 And then calculate the loss because you can't calculate the loss unless you do the forward pass. 3709 06:39:19,560 --> 06:39:24,680 I like this ordering here of these three as well. But you also want to do the optimizer step 3710 06:39:24,680 --> 06:39:29,880 after the loss backward. So this is my favorite ordering. It works. If you like this ordering, 3711 06:39:29,880 --> 06:39:35,080 you can take that as well. With that being said, I think this video has gotten long enough. 3712 06:39:35,080 --> 06:39:42,680 In the next video, I'd like to step through this training loop one epoch at a time so that we can 3713 06:39:42,680 --> 06:39:47,080 see, I know I've just thrown a lot of words at you that this optimizer is going to try and 3714 06:39:47,080 --> 06:39:53,400 optimize our parameters each step. But let's see that in action how our parameters of our model 3715 06:39:53,400 --> 06:40:00,760 actually change every time we go through each one of these steps. So I'll see you in the next video. 3716 06:40:00,760 --> 06:40:09,160 Let's step through our model. Welcome back. And we've spent a fair bit of time on the training loop 3717 06:40:09,160 --> 06:40:13,160 and the testing loop. Well, we haven't even got to that yet, but there's a reason behind this, 3718 06:40:13,160 --> 06:40:17,400 because this is possibly one of the most important things aside from getting your data ready, 3719 06:40:17,400 --> 06:40:22,920 which we're going to see later on in PyTorch deep learning is writing the training loop, 3720 06:40:22,920 --> 06:40:27,000 because this is literally like how your model learns patterns and data. So that's why we're 3721 06:40:27,000 --> 06:40:31,320 spending a fair bit of time on here. And we'll get to the testing loop, because that's how you 3722 06:40:31,320 --> 06:40:35,960 evaluate the patterns that your model has learned from data, which is just as important as learning 3723 06:40:35,960 --> 06:40:41,640 the patterns themselves. And following on from the last couple of videos, I've just linked some 3724 06:40:42,440 --> 06:40:46,760 YouTube videos that I would recommend for extra curriculum for back propagation, 3725 06:40:46,760 --> 06:40:54,440 which is what happens when we call loss stop backward down here. And for the optimizer step, 3726 06:40:54,440 --> 06:40:59,560 gradient descent is what's happening there. So I've linked some extra resources for what's going 3727 06:40:59,560 --> 06:41:04,120 on behind the scenes there from a mathematical point of view. Remember, this course focuses on 3728 06:41:04,120 --> 06:41:09,320 writing PyTorch code. But if you'd like to dive into what math PyTorch is triggering behind the 3729 06:41:09,320 --> 06:41:15,640 scenes, I'd highly recommend these two videos. And I've also added a note here as to which 3730 06:41:15,640 --> 06:41:20,280 loss function and optimizer should I use, which is a very valid question. And again, 3731 06:41:20,280 --> 06:41:26,120 it's another one of those things that's going to be problem specific. 
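To make the regression pairing concrete, here's a small self-contained sketch (the linear layer is just a stand-in for our two-parameter model):

import torch
from torch import nn

model_0 = nn.Linear(in_features=1, out_features=1)   # stand-in for our weight-and-bias model

loss_fn = nn.L1Loss()      # MAE, which PyTorch calls L1Loss
# loss_fn = nn.MSELoss()   # another common choice for regression
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)  # stochastic gradient descent

# For a binary classification problem later on, something like nn.BCEWithLogitsLoss()
# would be a typical starting point instead.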
But with experience over time, 3732 06:41:26,120 --> 06:41:30,440 you work with machine learning problems, you write a lot of code, you get an idea of what works 3733 06:41:30,440 --> 06:41:35,480 and what doesn't with your particular problem set. For example, like a regression problem, 3734 06:41:35,480 --> 06:41:41,400 like ours, regression is again predicting a number. We use MAE loss, which PyTorch causes 3735 06:41:41,400 --> 06:41:48,200 L1 loss. You could also use MSE loss and an optimizer like torch opt-in stochastic gradient 3736 06:41:48,200 --> 06:41:53,960 descent will suffice. But for classification, you might want to look into a binary classification, 3737 06:41:53,960 --> 06:41:58,520 a binary cross entropy loss, but we'll look at a classification problem later on in the course. 3738 06:41:59,240 --> 06:42:06,200 For now, I'd like to demonstrate what's going on in the steps here. So let's go model zero. 3739 06:42:07,240 --> 06:42:10,280 Let's look up the state dict and see what the parameters are for now. 3740 06:42:10,280 --> 06:42:17,880 Now they aren't the original ones I don't think. Let's re-instantiate our model so we get 3741 06:42:17,880 --> 06:42:27,720 re new parameters. Yeah, we recreated it here. I might just get rid of that. So we'll rerun our 3742 06:42:27,720 --> 06:42:35,800 model code, rerun model state dict. And we will create an instance of our model and just make 3743 06:42:35,800 --> 06:42:40,520 sure your parameters should be something similar to this. If it's not exactly like that, it doesn't 3744 06:42:40,520 --> 06:42:45,400 matter. But yeah, I'm just going to showcase you'll see on my screen what's going on anyway. 3745 06:42:46,280 --> 06:42:55,240 State dict 3367 for the weight and 012888 for the bias. And again, I can't stress enough. We've 3746 06:42:55,240 --> 06:43:00,360 only got two parameters for our model and we've set them ourselves future models that you build 3747 06:43:00,360 --> 06:43:05,160 and later ones in the course will have much, much more. And we won't actually explicitly set any 3748 06:43:05,160 --> 06:43:11,480 of them ourselves. We'll check out some predictions. They're going to be terrible because we're using 3749 06:43:11,480 --> 06:43:17,640 random parameters to begin with. But we'll set up a new loss function and an optimizer. Optimizer 3750 06:43:17,640 --> 06:43:24,200 is going to optimize our model zero parameters, the weight and bias. The learning rate is 0.01, 3751 06:43:24,200 --> 06:43:30,120 which is relatively large step. That would be a bit smaller. Remember, the larger the learning rate, 3752 06:43:30,120 --> 06:43:34,920 the bigger the step, the more the optimizer will try to change these parameters every step. 3753 06:43:34,920 --> 06:43:40,440 But let's stop talking about it. Let's see it in action. I've set a manual seed here too, by the way, 3754 06:43:40,440 --> 06:43:45,080 because the optimizer steps are going to be quite random as well, depending on how the models 3755 06:43:45,080 --> 06:43:50,680 predictions go. But this is just to try and make it as reproduces possible. So keep this in mind, 3756 06:43:50,680 --> 06:43:55,640 if you get different values to what we're going to output here from my screen to your screen, 3757 06:43:55,640 --> 06:44:02,520 don't worry too much. What's more important is the direction they're going. So ideally, 3758 06:44:02,520 --> 06:44:08,760 we're moving these values here. This is from we did one epoch before. 
We're moving these values 3759 06:44:08,760 --> 06:44:15,240 closer to the true values. And in practice, you won't necessarily know what the true values are. 3760 06:44:15,240 --> 06:44:19,080 But that's where evaluation of your model comes in. We're going to cover that when we write a 3761 06:44:19,080 --> 06:44:26,680 testing loop. So let's run one epoch. Now I'm going to keep that down there. Watch what happens. 3762 06:44:26,680 --> 06:44:33,480 We've done one epoch, just a single epoch. We've done the forward pass. We've calculated the loss. 3763 06:44:33,480 --> 06:44:39,160 We've done optimizer zero grad. We've performed back propagation. And we've stepped the optimizer. 3764 06:44:39,160 --> 06:44:45,240 What is stepping the optimizer do? It updates our model parameters to try and get them further 3765 06:44:45,240 --> 06:44:51,160 closer towards the weight and bias. If it does that, the loss will be closer to zero. That's what 3766 06:44:51,160 --> 06:44:58,920 it's trying to do. How about we print out the loss at the same time. Print loss and the loss. 3767 06:45:00,040 --> 06:45:08,280 Let's take another step. So the loss is 0301. Now we check the weights and the bias. We've changed 3768 06:45:08,280 --> 06:45:17,480 again three, three, four, four, five, one, four, eight, eight. We go again. The loss is going down. 3769 06:45:17,480 --> 06:45:24,200 Check it. Hey, look at that. The values are getting closer to where they should be if over so slightly. 3770 06:45:25,720 --> 06:45:31,960 Loss went down again. Oh my goodness, this is so amazing. Look, we're training our, 3771 06:45:31,960 --> 06:45:39,400 let's print this out in the same cell. Print our model. State dict. We're training our first 3772 06:45:39,400 --> 06:45:44,680 machine learning model here, people. This is very exciting, even if it's only step by step and it's 3773 06:45:44,680 --> 06:45:50,200 only a small model. This is very important. Loss is going down again. Values are getting closer to 3774 06:45:50,200 --> 06:45:55,720 where they should be. Again, we won't really know where they should be in real problems, but for 3775 06:45:55,720 --> 06:46:00,920 now we do. So let's just get excited. The real way to sort of measure your model's progress and 3776 06:46:00,920 --> 06:46:06,600 practice is a lower loss value. Remember, lower is better. A loss value measures how wrong your 3777 06:46:06,600 --> 06:46:11,480 model is. We're going down. We're going in the right direction. So that's what I meant by, 3778 06:46:11,480 --> 06:46:16,760 as long as your values are going in the similar direction. So down, we're writing similar code 3779 06:46:16,760 --> 06:46:21,960 here, but if your values are slightly different in terms of the exact numbers, don't worry too 3780 06:46:21,960 --> 06:46:26,680 much because that's inherent to the randomness of machine learning, because the steps that the 3781 06:46:26,680 --> 06:46:31,480 optimizer are taking are inherently random, but they're sort of pushed in a direction. 3782 06:46:32,120 --> 06:46:37,400 So we're doing gradient descent here. This is beautiful. How low can we get the loss? How about 3783 06:46:37,400 --> 06:46:43,960 we try to get to 0.1? Look at that. We're getting close to 0.1. And then, I mean, we don't have to 3784 06:46:43,960 --> 06:46:51,320 do this hand by hand. The bias is getting close to where it exactly should be. We're below 0.1. 3785 06:46:51,320 --> 06:46:56,920 Beautiful. 
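For reference, the cell being re-run each time looks roughly like this (model_0, loss_fn, optimizer, X_train and y_train are assumed from the earlier cells): one full pass of the five steps, then a peek at the state dict.

torch.manual_seed(42)

model_0.train()
y_pred = model_0(X_train)          # 1. forward pass on the training data
loss = loss_fn(y_pred, y_train)    # 2. how wrong are the predictions?
optimizer.zero_grad()              # 3. zero the accumulated gradients
loss.backward()                    # 4. backpropagation
optimizer.step()                   # 5. update the weight and bias

print(f"Loss: {loss}")
print(model_0.state_dict())        # weight and bias should creep towards 0.7 and 0.3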
So that was only about, say, 10 passes through the data, but now you're seeing it in 3786 06:46:56,920 --> 06:47:02,840 practice. You're seeing it happen. You're seeing gradient descent. Let's go gradient descent work 3787 06:47:02,840 --> 06:47:08,760 in action. We've got images. This is what's happening. We've got our cost function. J is 3788 06:47:08,760 --> 06:47:13,960 another term for cost function, which is also our loss function. We start with an initial weight. 3789 06:47:13,960 --> 06:47:20,360 What have we done? We started with an initial weight, this value here. And what are we doing? 3790 06:47:20,360 --> 06:47:24,840 We've measured the gradient pytorch has done that behind the scenes for us. Thank you pytorch. 3791 06:47:24,840 --> 06:47:29,800 And we're taking steps towards the minimum. That's what we're trying to do. If we minimize the 3792 06:47:29,800 --> 06:47:36,360 gradient of our weight, we minimize the cost function, which is also a loss function. We could 3793 06:47:36,360 --> 06:47:43,400 keep going here for hours and get as long as we want. But my challenge for you, or actually, 3794 06:47:43,400 --> 06:47:47,880 how about we make some predictions with our model we've got right now? Let's make some predictions. 3795 06:47:47,880 --> 06:47:52,280 So with torch dot inference mode, we'll make some predictions together. And then I'm going 3796 06:47:52,280 --> 06:47:58,040 to set you a challenge. How about you run this code here for 100 epochs after this video, 3797 06:47:58,040 --> 06:48:02,200 and then you make some predictions and see how that goes. So why preds? Remember how 3798 06:48:02,200 --> 06:48:09,480 poor our predictions are? Why preds new equals, we just do the forward pass here. Model zero 3799 06:48:09,480 --> 06:48:16,200 on the test data. Let's just remind ourselves quickly of how poor our previous predictions were. 3800 06:48:16,760 --> 06:48:23,160 Plot predictions, predictions equals y. Do we still have this saved? Why preds? 3801 06:48:23,160 --> 06:48:30,440 Hopefully, this is still saved. There we go. Shocking predictions, but we've just done 10 or so 3802 06:48:30,440 --> 06:48:35,960 epochs. So 10 or so training steps have our predictions. Do they look any better? Let's run 3803 06:48:35,960 --> 06:48:41,720 this. We'll copy this code. You know my rule. I don't really like to copy code, but in this case, 3804 06:48:41,720 --> 06:48:47,480 I just want to exemplify a point. I like to write all the code myself. What do we got? Why preds 3805 06:48:47,480 --> 06:48:54,920 new? Look at that. We are moving our predictions close at the red dots closer to the green dots. 3806 06:48:54,920 --> 06:48:59,720 This is what's happening. We're reducing the loss. In other words, we're reducing the difference 3807 06:48:59,720 --> 06:49:05,640 between our models predictions and our ideal outcomes through the power of back propagation 3808 06:49:05,640 --> 06:49:10,840 and gradient descent. So this is super exciting. We're training our first machine learning model. 3809 06:49:10,840 --> 06:49:17,080 My challenge to you is to run this code here. Change epochs to 100. See how low you can get this 3810 06:49:17,080 --> 06:49:24,600 loss value and run some predictions, plot them. And I think it's time to start testing. So give 3811 06:49:24,600 --> 06:49:30,360 that a go yourself, and then we'll write some testing code in the next video. I'll see you there. 3812 06:49:32,280 --> 06:49:38,760 Welcome back. In the last video, we did something super excited. 
We saw our loss go down. So the 3813 06:49:38,760 --> 06:49:44,200 loss is, remember, how different our model's predictions are to what we'd ideally like them to be. And we saw 3814 06:49:44,200 --> 06:49:50,920 our model update its parameters through the power of back propagation and gradient descent, all 3815 06:49:50,920 --> 06:49:57,640 taken care of behind the scenes for us by PyTorch. So thank you, PyTorch. And again, if you'd like 3816 06:49:57,640 --> 06:50:03,000 some extra resources on what's actually happening from a math perspective for back propagation and 3817 06:50:03,000 --> 06:50:08,520 gradient descent, I would refer you to these. Otherwise, this is also how I learn about things. 3818 06:50:08,520 --> 06:50:14,600 Gradient descent. There we go. How does gradient descent work? And then we've got back propagation. 3819 06:50:15,640 --> 06:50:21,640 And just to reiterate, I am doing this and just Googling these things because that's what you're 3820 06:50:21,640 --> 06:50:25,560 going to do in practice. You're going to come across a lot of different things that aren't 3821 06:50:25,560 --> 06:50:31,320 covered in this course. And this is seriously what I do day to day as a machine learning engineer 3822 06:50:31,320 --> 06:50:37,160 if I don't know what's going on. Just go to Google, read, watch a video, write some code, 3823 06:50:37,160 --> 06:50:43,000 and then I build my own intuition for it. But with that being said, I also issued you the challenge 3824 06:50:43,000 --> 06:50:50,440 of trying to run this training code for 100 epochs. Did you give that a go? I hope you did. And 3825 06:50:50,440 --> 06:50:55,560 how low did your loss value go? Did the weights and bias get anywhere close to where they should have 3826 06:50:55,560 --> 06:51:01,160 been? How do the predictions look? Now, I'm going to save that for later on, running this code for 3827 06:51:01,160 --> 06:51:07,000 100 epochs. For now, let's write some testing code. And just a note, you don't necessarily have to 3828 06:51:07,000 --> 06:51:11,880 write the training and testing loop together. You can functionize them, which we will be doing later 3829 06:51:11,880 --> 06:51:17,240 on. But for the sake of intuition building and code practice, and since it's the first time we're 3830 06:51:17,240 --> 06:51:23,560 writing this code together, I'm going to write them together. So for the testing code, we call model.eval(). 3831 06:51:23,560 --> 06:51:33,480 What does this do? So this turns off different settings in the model not needed for evaluation 3832 06:51:33,480 --> 06:51:39,560 slash testing. This can be a little confusing to remember when you're writing testing code. But 3833 06:51:39,560 --> 06:51:44,920 we're going to do it a few times until it's habit. So just make it a habit. If you're training your 3834 06:51:44,920 --> 06:51:50,120 model, call model.train() to make sure it's in training mode. If you're testing or evaluating 3835 06:51:50,120 --> 06:51:55,640 your model, well, that's what eval stands for, evaluate, so call model.eval(). It turns off 3836 06:51:55,640 --> 06:52:00,280 different settings in the model not needed for evaluation, so testing. This is things like dropout 3837 06:52:00,280 --> 06:52:06,840 layers. We haven't seen what dropout is, slash batch norm layers. But if we go into torch.nn, 3838 06:52:06,840 --> 06:52:12,920 I'm sure you'll come across these things in your future machine learning endeavors. So dropout 3839 06:52:12,920 --> 06:52:21,400 layers. There we go.
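Here's a tiny, self-contained sketch of what that train/eval toggle actually changes, using a dropout layer on its own (our current model has no dropout, this is just to see the behaviour):

import torch
from torch import nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(8)

dropout.train()        # what model.train() sets for every layer
print(dropout(x))      # roughly half the values zeroed, the rest scaled up

dropout.eval()         # what model.eval() sets for every layer
print(dropout(x))      # passes straight through, unchanged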
And batch norm. Do we have batch norm? There we go. If you'd 3840 06:52:21,400 --> 06:52:27,400 like to work out what they are, feel free to check out the documentation. Just take it from me for 3841 06:52:27,400 --> 06:52:34,520 now that model.eval() turns off different settings not needed for evaluation and testing. Then we 3842 06:52:34,520 --> 06:52:41,240 set up with torch.inference_mode(), inference mode. So what does this do? Let's write it down here. 3843 06:52:42,600 --> 06:52:49,480 So this turns off gradient tracking. So as we discussed, if we have parameters in our model, 3844 06:52:49,480 --> 06:52:56,600 it turns that off, and actually a couple more things behind the scenes, 3845 06:52:57,640 --> 06:53:03,400 things that again are not needed for testing. So we discussed that if parameters in our model 3846 06:53:03,400 --> 06:53:07,960 have requires_grad equals True, which is the default for many different parameters in PyTorch, 3847 06:53:08,840 --> 06:53:14,600 PyTorch will behind the scenes keep track of the gradients of our model and use them in 3848 06:53:14,600 --> 06:53:20,520 loss.backward and optimizer.step for back propagation and gradient descent. However, 3849 06:53:21,560 --> 06:53:26,040 we only need those two, back propagation and gradient descent, during training, because that 3850 06:53:26,040 --> 06:53:31,800 is when our model is learning. When we are testing, we are just evaluating the parameters, the patterns, 3851 06:53:31,800 --> 06:53:36,920 that our model has learned on the training data set. So we don't need to do any learning 3852 06:53:36,920 --> 06:53:41,960 when we're testing. So we turn off the things that we don't need. And is this going to have 3853 06:53:41,960 --> 06:53:47,400 the correct spacing for me? I'm not sure, we'll find out. So we still do the forward pass 3854 06:53:49,080 --> 06:53:53,960 in testing mode, do the forward pass. And if you want to look up torch inference mode, 3855 06:53:53,960 --> 06:53:58,280 just Google torch inference mode. There's a great tweet about it that PyTorch did, which explains 3856 06:53:58,280 --> 06:54:05,400 what's happening. I think we've covered this before, but yeah, want to make your inference 3857 06:54:05,400 --> 06:54:11,240 code in PyTorch run faster? Here's a quick thread on doing exactly that. So inference 3858 06:54:11,240 --> 06:54:16,200 mode is similar to torch.no_grad. You might see torch.no_grad around. I think I'll write that down just to 3859 06:54:17,560 --> 06:54:22,360 let you know. But here's what's happening behind the scenes: a lot of optimization code, 3860 06:54:22,360 --> 06:54:26,200 which is beautiful. This is why we're using PyTorch, so that our code runs nice and fast. 3861 06:54:26,920 --> 06:54:33,480 Let me go there. You may also see with torch.no_grad in older PyTorch code. It does 3862 06:54:33,480 --> 06:54:39,880 similar things, but inference mode is the faster way of doing things, according to the thread. 3863 06:54:39,880 --> 06:54:43,320 And there's a blog post attached there as well, I believe. 3864 06:54:44,120 --> 06:54:55,400 So you may also see torch.no_grad in older PyTorch code, which would be valid. But again, 3865 06:54:55,400 --> 06:55:01,320 inference mode is the preferred way of doing things. So, do the forward pass. Let's get our model. We 3866 06:55:01,320 --> 06:55:06,520 want to create test predictions here. So we're going to go model zero.
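And a quick self-contained check of what inference mode buys us (a stand-in layer, not our model): outputs created inside it carry no gradient history.

import torch
from torch import nn

layer = nn.Linear(in_features=1, out_features=1)
X = torch.randn(5, 1)

with torch.inference_mode():
    preds = layer(X)
print(preds.requires_grad)   # False: nothing here is tracked for backpropagation

# Older code often uses the similar (but slightly slower) context manager instead:
with torch.no_grad():
    preds = layer(X)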
There's a lot of code 3867 06:55:06,520 --> 06:55:12,040 going on here, but I'm going to just step by step it in a second. We'll go back through it all. 3868 06:55:12,840 --> 06:55:17,960 And then number two is calculate the loss. Now we're doing the test predictions here, 3869 06:55:17,960 --> 06:55:24,680 calculate the loss test predictions with model zero. So now we want to calculate the what we want 3870 06:55:24,680 --> 06:55:31,480 to calculate the test loss. So this will be our loss function, the difference between the test 3871 06:55:31,480 --> 06:55:37,640 pred and the test labels. That's important. So for testing, we're working with test data, 3872 06:55:37,640 --> 06:55:43,080 for training, we're working with training data. Model learns patterns on the training data, 3873 06:55:43,080 --> 06:55:48,520 and it evaluates those patterns that it's learned, the different parameters on the testing data. It 3874 06:55:48,520 --> 06:55:53,960 has never seen before, just like in a university course, you'd study the course materials, which 3875 06:55:53,960 --> 06:55:58,680 is the training data, and you'd evaluate your knowledge on materials you'd hopefully never 3876 06:55:58,680 --> 06:56:03,800 seen before, unless you sort of were friends with your professor, and they gave you the exam before 3877 06:56:03,800 --> 06:56:08,760 the actual exam that would be cheating right. So that's a very important point for the test data 3878 06:56:08,760 --> 06:56:15,160 set. Don't let your model see the test data set before you evaluate it. Otherwise, you'll get 3879 06:56:15,160 --> 06:56:21,400 poor results. And that's putting it out what's happening. Epoch, we're going to go Epoch, 3880 06:56:21,400 --> 06:56:25,320 and then I will introduce you to my little jingle to remember all of these steps because 3881 06:56:25,320 --> 06:56:31,800 there's a lot going on. Don't you worry. I know there's a lot going on, but again, with practice, 3882 06:56:31,800 --> 06:56:40,520 we're going to know what's happening here. Like it's the back of our hand. All right. 3883 06:56:41,400 --> 06:56:49,000 So do we need this? Oh, yeah, we could say that. Oh, no, we don't need test here. Loss. This is 3884 06:56:49,000 --> 06:56:59,160 loss, not test. Print out what's happening. Okay. And we don't actually need to do this 3885 06:56:59,160 --> 06:57:06,760 every epoch. We could just go say if epoch divided by 10 equals zero, print out what's happening. 3886 06:57:06,760 --> 06:57:11,640 Let's do that rather than clutter everything up, print it out, and we'll print out this. 3887 06:57:12,920 --> 06:57:17,720 So let's just step through what's happening. We've got 100 epochs. That's what we're about to run, 3888 06:57:17,720 --> 06:57:22,760 100 epochs. Our model is trained for about 10 so far. So it's got a good base. Maybe we'll just 3889 06:57:22,760 --> 06:57:31,480 get rid of that base. Start a new instance of our model. So we'll come right back down. 3890 06:57:33,080 --> 06:57:38,040 So our model is back to randomly initialized parameters, but of course, randomly initialized 3891 06:57:38,040 --> 06:57:44,360 flavored with a random seed of 42. Lovely, lovely. And so we've got our training code here. We've 3892 06:57:44,360 --> 06:57:49,560 discussed what's happening there. Now, we've got our testing code. We call model dot eval, 3893 06:57:49,560 --> 06:57:54,680 which turns off different settings in the model, not needed for evaluation slash testing. 
We call 3894 06:57:54,680 --> 06:57:59,560 with torch inference mode context manager, which turns off gradient tracking and a couple more 3895 06:57:59,560 --> 06:58:05,960 things behind the scenes to make our code faster. We do the forward pass. We do the test predictions. 3896 06:58:05,960 --> 06:58:11,160 We pass our model, the test data, the test features to calculate the test predictions. 3897 06:58:11,160 --> 06:58:15,880 Then we calculate the loss using our loss function. We can use the same loss function that we used 3898 06:58:15,880 --> 06:58:21,240 for the training data. And it's called the test loss, because it's on the test data set. 3899 06:58:21,240 --> 06:58:25,720 And then we print out what's happening, because we want to know what's happening while our 3900 06:58:25,720 --> 06:58:30,280 model's training, we don't necessarily have to do this. But the beauty of PyTorch is you can 3901 06:58:30,280 --> 06:58:35,080 use basic Python printing statements to see what's happening with your model. And so, 3902 06:58:35,080 --> 06:58:38,920 because we're doing 100 epochs, we don't want to clutter up everything here. So we'll just 3903 06:58:38,920 --> 06:58:44,680 print out what's happening every 10th epoch. Again, you can customize this as much as you like 3904 06:58:44,680 --> 06:58:49,720 what's printing out here. This is just one example. If you had other metrics here, such as calculating 3905 06:58:49,720 --> 06:58:55,000 model accuracy, we might see that later on, hint hint. We might print out our model accuracy. 3906 06:58:55,640 --> 06:59:00,280 So this is very exciting. Are you ready to run 100 epochs? How low do you think our loss can go? 3907 06:59:02,200 --> 06:59:07,720 This loss was after about 10. So let's just save this here. Let's give it a go. Ready? 3908 06:59:07,720 --> 06:59:17,000 Three, two, one. Let's run. Oh my goodness. Look at that. Waits. Here we go. Every 10 epochs 3909 06:59:17,000 --> 06:59:22,520 were printing out what's happening. So the zero epoch, we started with losses 312. Look at it go 3910 06:59:22,520 --> 06:59:28,920 down. Yes, that's what we want. And our weights and bias, are they moving towards our ideal weight 3911 06:59:28,920 --> 06:59:34,520 and bias values of 0.7 and 0.3? Yes, they're moving in the right direction here. The loss is 3912 06:59:34,520 --> 06:59:43,080 going down. Epoch 20, wonderful. Epoch 30, even better. 40, 50, going down, down, down. Yes, 3913 06:59:43,080 --> 06:59:48,200 this is what we want. This is what we want. Now, we're predicting a straight line here. Look how 3914 06:59:48,200 --> 06:59:54,840 low the loss gets. After 100 epochs, we've got about three times less than what we had before. 3915 06:59:55,880 --> 07:00:03,640 And then we've got these values are quite close to where they should be, 0.5629, 0.3573. We'll make 3916 07:00:03,640 --> 07:00:08,680 some predictions. What do they look like? Why preds new? This is the original predictions 3917 07:00:08,680 --> 07:00:15,320 with random values. And if we make why preds new, look how close it is after 100 epochs. 3918 07:00:15,960 --> 07:00:20,920 Now, what's our, do we print out the test loss? Oh no, we're printing out loss as well. 3919 07:00:21,480 --> 07:00:25,160 Let's get rid of that. I think this is this. Yeah, that's this statement here. Our code would have 3920 07:00:25,160 --> 07:00:30,680 been a much cleaner if we didn't have that, but that's all right. Life goes on. 
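Pulling the pieces together, the loop we just ran boils down to something like this (model_0, loss_fn, optimizer and the train/test splits come from the earlier cells; the print format is just one option):

torch.manual_seed(42)
epochs = 100

for epoch in range(epochs):
    ### Training
    model_0.train()
    y_pred = model_0(X_train)            # 1. forward pass
    loss = loss_fn(y_pred, y_train)      # 2. training loss
    optimizer.zero_grad()                # 3. zero gradients
    loss.backward()                      # 4. backpropagation
    optimizer.step()                     # 5. gradient descent

    ### Testing
    model_0.eval()
    with torch.inference_mode():
        test_pred = model_0(X_test)              # forward pass on the test features
        test_loss = loss_fn(test_pred, y_test)   # loss on data the model never trains on

    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")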
So our test loss, 3921 07:00:30,680 --> 07:00:35,400 because this is the test predictions that we're making, is not as low as our training loss. 3922 07:00:36,520 --> 07:00:42,200 I wonder how we could get that lower. What do you think we could do? We just trained it for 3923 07:00:42,200 --> 07:00:46,440 longer. And what happened? How do you think you could get these red dots to line up with these 3924 07:00:46,440 --> 07:00:51,960 green dots? Do you think you could? So that's my challenge to you for the next video. 3925 07:00:51,960 --> 07:00:56,360 Think of something that you could do to get these red dots to match up with these green dots, 3926 07:00:56,360 --> 07:01:01,880 maybe train for longer. How do you think you could do that? So give that a shot. And I'll see 3927 07:01:01,880 --> 07:01:08,440 in the next video, we'll review what our testing code is doing. I'll see you there. 3928 07:01:10,280 --> 07:01:15,960 Welcome back. In the last video, we did something super exciting. We trained our model for 100 epochs 3929 07:01:15,960 --> 07:01:21,880 and look how good the predictions got. But I finished it off with challenging you to see if you could 3930 07:01:21,880 --> 07:01:27,800 align the red dots with the green dots. And it's okay if you're not sure how the best way to do 3931 07:01:27,800 --> 07:01:31,320 that. That's what we're here for. We're here to learn what are the best way to do these things 3932 07:01:31,320 --> 07:01:36,520 together. But you might have had the idea of potentially training the model for a little bit 3933 07:01:36,520 --> 07:01:43,800 longer. So how could we do that? Well, we could just rerun this code. So the model is going to 3934 07:01:43,800 --> 07:01:49,400 remember the parameters that it has from what we've done here. And if we rerun it, well, it's going 3935 07:01:49,400 --> 07:01:55,000 to start from where it finished off, which is already pretty good for our data set. And then 3936 07:01:55,000 --> 07:01:59,880 it's going to try and improve them even more. This is, I can't stress enough, like what we are 3937 07:01:59,880 --> 07:02:04,680 doing here is going to be very similar throughout the entire rest of the course for training more 3938 07:02:04,680 --> 07:02:10,200 and more models. So this step that we've done here for training our model and evaluating it 3939 07:02:10,760 --> 07:02:18,280 is seriously like the fundamental steps of deep learning with PyTorch is training and evaluating 3940 07:02:18,280 --> 07:02:23,800 a model. And we've just done it. Although I'll be it to predict some red dots and green dots. 3941 07:02:25,080 --> 07:02:29,960 That's all right. So let's try to line them up, hey, red dots onto green dots. I reckon if we 3942 07:02:29,960 --> 07:02:36,520 train it for another 100 epochs, we should get pretty darn close. Ready? Three, two, one. I'm 3943 07:02:36,520 --> 07:02:41,480 going to run this cell again. Runs really quick because our data's nice and simple. But 3944 07:02:41,480 --> 07:02:49,640 look at this, lastly, we started 0244. Where do we get down to? 008. Oh my goodness. So we've 3945 07:02:49,640 --> 07:02:55,720 improved it by another three X or so. And now this is where our model has got really good. 3946 07:02:55,720 --> 07:03:03,720 On the test loss, we've gone from 00564. We've gone down to 005. So almost 10X improvement there. 3947 07:03:04,360 --> 07:03:10,680 And so we make some more predictions. What are our model parameters? Remember the ideal ones here. 
3948 07:03:10,680 --> 07:03:15,400 We won't necessarily know them in practice, but because we're working with a simple data set, 3949 07:03:15,400 --> 07:03:21,080 we know what the ideal parameters are. Model zero state dig weights. These are what they 3950 07:03:21,080 --> 07:03:27,880 previously were. What are they going to change to? Oh, would you look at that? Oh, 3951 07:03:27,880 --> 07:03:34,440 06990. Now, again, if yours are very slightly different to mine, don't worry too much. That is 3952 07:03:34,440 --> 07:03:39,880 the inherent randomness of machine learning and deep learning. Even though we set a manual seed, 3953 07:03:39,880 --> 07:03:46,680 it may be slightly different. The direction is more important. So if your number here is not 3954 07:03:46,680 --> 07:03:52,760 exactly what mine is, it should still be quite close to 0.7. And the same thing with this one. 3955 07:03:52,760 --> 07:03:57,880 If it's not exactly what mine is, don't worry too much. The same with all of these loss values 3956 07:03:57,880 --> 07:04:03,320 as well. The direction is more important. So we're pretty darn close. How do these predictions 3957 07:04:03,320 --> 07:04:10,440 look? Remember, these are the original ones. We started with random. And now we've trained a model. 3958 07:04:10,440 --> 07:04:16,520 So close. So close to being exactly that. So a little bit off. But that's all right. We could 3959 07:04:16,520 --> 07:04:21,880 tweak a few things to improve this. But I think that's well and truly enough for this example 3960 07:04:21,880 --> 07:04:26,680 purpose. You see what's happened. Of course, we could just create a model and set the parameters 3961 07:04:26,680 --> 07:04:30,840 ourselves manually. But where would be the fun in that? We just wrote some machine learning code 3962 07:04:30,840 --> 07:04:38,040 to do it for us with the power of back propagation and gradient descent. Now in the last video, 3963 07:04:38,040 --> 07:04:43,000 we wrote the testing loop. We discussed a few other steps here. But now let's go over it with 3964 07:04:43,000 --> 07:04:49,320 a colorful slide. Hey, because I mean, code on a page is nice, but colors are even nicer. Oh, 3965 07:04:49,880 --> 07:04:55,640 we haven't done this. We might set up this in this video too. But let's just discuss what's going on. 3966 07:04:56,280 --> 07:05:00,600 Create an empty list for storing useful value. So this is helpful for tracking model progress. 3967 07:05:00,600 --> 07:05:05,560 How can we just do this right now? Hey, we'll go here and we'll go. 3968 07:05:07,960 --> 07:05:13,480 So what did we have? Epoch count equals that. And then we'll go 3969 07:05:14,920 --> 07:05:19,320 lost values. So why do we keep track of these? It's because 3970 07:05:21,560 --> 07:05:27,240 if we want to monitor our models progress, this is called tracking experiments. So track 3971 07:05:27,240 --> 07:05:33,160 different values. If we wanted to try and improve upon our current model with a future model. So 3972 07:05:33,160 --> 07:05:39,160 our current results, such as this, if we wanted to try and improve upon it, we might build an 3973 07:05:39,160 --> 07:05:43,720 entire other model. And we might train it in a different setup. We might use a different learning 3974 07:05:43,720 --> 07:05:48,600 rate. We might use a whole bunch of different settings, but we track the values so that we 3975 07:05:48,600 --> 07:05:54,440 can compare future experiments to past experiments, like the brilliant scientists that we are. 
3976 07:05:54,440 --> 07:06:02,920 And so where could we use these lists? Well, we're calculating the loss here. And we're calculating 3977 07:06:02,920 --> 07:06:13,000 the test loss here. So maybe we each time append what's going on here as we do a status update. 3978 07:06:13,000 --> 07:06:24,360 So epoch count dot append, and we're going to go a current epoch. And then we'll go loss values 3979 07:06:24,360 --> 07:06:34,520 dot append, a current loss value. And then we'll do test loss values dot append, the current test 3980 07:06:34,520 --> 07:06:41,960 loss values. Wonderful. And now let's re-instantiate our model so that it starts from fresh. So this 3981 07:06:41,960 --> 07:06:46,120 is just create another instance. So we're just going to re-initialize our model parameters to 3982 07:06:46,120 --> 07:06:50,360 start from zero. If we wanted to, we could functionize all of this so we don't have to 3983 07:06:50,360 --> 07:06:55,320 go right back up to the top of the code. But just for demo purposes, we're doing it how we're doing 3984 07:06:55,320 --> 07:07:00,440 it. And I'm going to run this for let's say 200 epochs, because that's what we ended up doing, 3985 07:07:00,440 --> 07:07:06,520 right? We ran it for 200 epochs, because we did 100 epochs twice. And I want to show you something 3986 07:07:06,520 --> 07:07:10,520 beautiful, one of the most beautiful sites in machine learning. So there we go, we run it for 3987 07:07:10,520 --> 07:07:17,000 200 epochs, we start with a fairly high training loss value and a fairly high test loss value. So 3988 07:07:17,000 --> 07:07:23,000 remember, what is our loss value? It's ma e. So if we go back, yeah, this is what we're measuring 3989 07:07:23,000 --> 07:07:30,120 for loss. So this means for the test loss on average, each of our dot points here, the red 3990 07:07:30,120 --> 07:07:38,920 predictions are 0.481. That's the average distance between each dot point. And then ideally, what 3991 07:07:38,920 --> 07:07:45,320 are we doing? We're trying to minimize this distance. That's the ma e. So the mean absolute error. 3992 07:07:45,320 --> 07:07:51,640 And we get it right down to 0.05. And if we make predictions, what do we have here, we get very 3993 07:07:51,640 --> 07:07:58,520 close to the ideal weight and bias, make our predictions, have a look at the new predictions. 3994 07:07:58,520 --> 07:08:02,200 Yeah, very small distance here. Beautiful. That's a low loss value. 3995 07:08:03,000 --> 07:08:08,680 Ideally, they'd line up, but we've got as close as we can for now. So this is one of the most 3996 07:08:08,680 --> 07:08:15,400 beautiful sites in machine learning. So plot the loss curves. So let's make a plot, because what 3997 07:08:15,400 --> 07:08:23,560 we're doing, we were tracking the value of epoch count, loss values and test loss values. 3998 07:08:24,440 --> 07:08:31,000 Let's have a look at what these all look like. So epoch count goes up, loss values ideally go down. 3999 07:08:31,560 --> 07:08:36,920 So we'll get rid of that. We're going to create a plot p l t dot plot. We're going to step back 4000 07:08:36,920 --> 07:08:45,640 through the test loop in a second with some colorful slides, label equals train loss. 4001 07:08:48,120 --> 07:08:53,560 And then we're going to go plot. You might be able to tell what's going on here. Test loss 4002 07:08:53,560 --> 07:09:00,040 values. 
We're going to visualize it, because that's the data explorer's motto, right, is visualize, 4003 07:09:00,040 --> 07:09:06,680 visualize, visualize. This is equals. See, collab does this auto correct. That doesn't really work 4004 07:09:06,680 --> 07:09:13,800 very well. And I don't know when it does it and why it doesn't. And we got, I know, we didn't, 4005 07:09:13,800 --> 07:09:17,000 we didn't say loss value. So that's a good auto correct. Thank you, collab. 4006 07:09:18,920 --> 07:09:24,920 So training and loss and test loss curves. So this is another term you're going to come across 4007 07:09:24,920 --> 07:09:30,280 often is a loss curve. Now you might be able to think about a loss curve. If we're doing a loss 4008 07:09:30,280 --> 07:09:34,760 curve, and it's starting at the start of training, what do we want that curve to do? 4009 07:09:36,360 --> 07:09:42,760 What do we want our loss value to do? We want it to go down. So what should an ideal loss 4010 07:09:42,760 --> 07:09:47,160 curve look like? Well, we're about to see a couple. Let's have a look. Oh, what do we got wrong? 4011 07:09:47,160 --> 07:09:57,160 Well, we need to, I'll turn it into NumPy. Is this what we're getting wrong? So why is this wrong? 4012 07:09:58,920 --> 07:10:02,840 Loss values. Why are we getting an issue? Test loss values. 4013 07:10:05,480 --> 07:10:13,240 Ah, it's because they're all tens of values. So I think we should, let's, 4014 07:10:13,240 --> 07:10:22,200 I might change this to NumPy. Oh, can I just do that? If I just call this as a NumPy array, 4015 07:10:22,200 --> 07:10:28,280 we're going to try and fix this on the fly. People, NumPy array, we'll just turn this into a NumPy 4016 07:10:28,280 --> 07:10:37,320 array. Let's see if we get NumPy. I'm figuring these things out together. NumPy as NumPy, 4017 07:10:37,320 --> 07:10:44,280 because mapplotlib works with NumPy. Yeah, there we go. So can we do loss values? Maybe 4018 07:10:45,720 --> 07:10:51,160 I'm going to try one thing, torch dot tensor, loss values, and then call 4019 07:10:53,720 --> 07:10:58,520 CPU dot NumPy. See what happens here. 4020 07:10:58,520 --> 07:11:06,440 There we go. Okay, so let's just copy this. So what we're doing here is 4021 07:11:06,440 --> 07:11:13,240 our loss values are still on PyTorch, and they can't be because mapplotlib works with 4022 07:11:14,360 --> 07:11:19,560 NumPy. And so what we're doing here is we're converting our loss values of the training loss 4023 07:11:19,560 --> 07:11:25,240 to NumPy. And if you call from the fundamental section, we call CPU and NumPy, I wonder if we 4024 07:11:25,240 --> 07:11:31,240 can just do straight up NumPy, because we're not working on there. Yeah, okay, we don't need 4025 07:11:31,240 --> 07:11:36,120 CPU because we're not working on the GPU yet, but we might need that later on. Well, this work. 4026 07:11:36,120 --> 07:11:40,920 Beautiful. There we go. One of the most beautiful sides in machine learning is a declining loss 4027 07:11:40,920 --> 07:11:47,640 curve. So this is how we keep track of our experiments, or one way, quite rudimentary. We'd like to 4028 07:11:47,640 --> 07:11:53,320 automate this later on. But I'm just showing you one way to keep track of what's happening. 4029 07:11:53,320 --> 07:11:58,280 So the training loss curve is going down here. The training loss starts at 0.3, and then it goes 4030 07:11:58,280 --> 07:12:03,480 right down. The beautiful thing is they match up. 
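Tidied up, the plotting cell ends up as something like this (epoch_count, loss_values and test_loss_values are the lists appended to above; converting the tensor values to NumPy first keeps matplotlib happy):

import torch
import matplotlib.pyplot as plt

plt.plot(epoch_count, torch.tensor(loss_values).cpu().numpy(), label="Train loss")
plt.plot(epoch_count, torch.tensor(test_loss_values).cpu().numpy(), label="Test loss")
plt.title("Training and test loss curves")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()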
If there was a two bigger distance behind the 4031 07:12:03,480 --> 07:12:09,080 train loss and the test loss, or sorry, between, then we're running into some problems. But if they 4032 07:12:09,080 --> 07:12:14,440 match up closely at some point, that means our model is converging and the loss is getting as 4033 07:12:14,440 --> 07:12:20,040 close to zero as it possibly can. If we trained for longer, maybe the loss will go almost basically 4034 07:12:20,040 --> 07:12:25,480 to zero. But that's an experiment I'll leave you to try to train that model for longer. 4035 07:12:25,480 --> 07:12:32,680 Let's just step back through our testing loop to finish off this video. So we did that. We created 4036 07:12:32,680 --> 07:12:37,800 empty lists for strong useful values, storing useful values, strong useful values. Told the 4037 07:12:37,800 --> 07:12:43,080 model what we want to evaluate or that we want to evaluate. So we put it in an evaluation mode. 4038 07:12:43,080 --> 07:12:47,400 It turns off functionality used for training, but not evaluations, such as drop out and batch 4039 07:12:47,400 --> 07:12:51,720 normalization layers. If you want to learn more about them, you can look them up in the documentation. 4040 07:12:52,680 --> 07:12:58,520 Turn on torch inference mode. So this is for faster performance. So we don't necessarily need this, 4041 07:12:58,520 --> 07:13:03,320 but it's good practice. So I'm going to say that yes, turn on torch inference mode. So this 4042 07:13:03,320 --> 07:13:08,120 disables functionality such as gradient tracking for inference. Gradient tracking is not needed 4043 07:13:08,120 --> 07:13:14,440 for inference only for training. Now we pass the test data through the model. So this will call 4044 07:13:14,440 --> 07:13:19,080 the models implemented forward method. The forward pass is the exact same as what we did in the 4045 07:13:19,080 --> 07:13:25,560 training loop, except we're doing it on the test data. So big notion there, training loop, 4046 07:13:25,560 --> 07:13:32,280 training data, testing loop, testing data. Then we calculate the test loss value, 4047 07:13:32,280 --> 07:13:37,240 how wrong the models predictions are on the test data set. And of course, lower is better. 4048 07:13:37,800 --> 07:13:43,320 And finally, we print out what's happening. So we can keep track of what's going on during 4049 07:13:43,320 --> 07:13:47,720 training. We don't necessarily have to do this. You can customize this print value to print out 4050 07:13:47,720 --> 07:13:54,520 almost whatever you want, because it's pie torches, basically very beautifully interactive with pure 4051 07:13:54,520 --> 07:14:00,600 Python. And then we keep track of the values of what's going on on epochs and train loss and test 4052 07:14:00,600 --> 07:14:06,120 loss. We could keep track of other values here. But for now, we're just going, okay, what's the loss 4053 07:14:06,120 --> 07:14:11,800 value at a particular epoch for the training set? And for the test set. And of course, all of this 4054 07:14:11,800 --> 07:14:16,200 could be put into a function. And that way we won't have to remember these steps off by heart. 4055 07:14:16,200 --> 07:14:21,640 But the reason why we've spent so much time on this is because we're going to be using this 4056 07:14:21,640 --> 07:14:26,040 training and test functionality for all of the models that we build throughout this course. 4057 07:14:26,600 --> 07:14:31,880 So give yourself a pat in the back for getting through all of these videos. 
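As mentioned, all of this could be put into functions so we don't have to remember the steps off by heart. A hedged sketch of what that might look like (we'll build proper versions later in the course; the names here are placeholders):

import torch

def train_step(model, X, y, loss_fn, optimizer):
    model.train()
    y_pred = model(X)             # forward pass
    loss = loss_fn(y_pred, y)     # training loss
    optimizer.zero_grad()         # zero accumulated gradients
    loss.backward()               # backpropagation
    optimizer.step()              # gradient descent step
    return loss.item()

def test_step(model, X, y, loss_fn):
    model.eval()
    with torch.inference_mode():  # no gradient tracking needed for evaluation
        test_pred = model(X)
        test_loss = loss_fn(test_pred, y)
    return test_loss.item()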
We've written a lot 4058 07:14:31,880 --> 07:14:37,000 of code. We've discussed a lot of steps. But if you'd like a song to remember what's happening, 4059 07:14:37,000 --> 07:14:43,160 let's finish this video off with my unofficial PyTorch optimization loop song. 4060 07:14:43,160 --> 07:14:51,320 So for an epoch in a range, go model dot train, do the forward pass, calculate the loss, optimize 4061 07:14:51,320 --> 07:14:59,000 a zero grad, loss backward, optimize a step, step, step. No, you only have to call this once. 4062 07:14:59,000 --> 07:15:05,720 But now let's test, go model dot eval with torch inference mode, do the forward pass, 4063 07:15:05,720 --> 07:15:11,160 calculate the loss. And then the real song goes for another epoch because you keep going back 4064 07:15:11,160 --> 07:15:18,840 through. But we finish off with print out what's happening. And then of course, we evaluate what's 4065 07:15:18,840 --> 07:15:23,880 going on. With that being said, it's time to move on to another thing. But if you'd like to review 4066 07:15:23,880 --> 07:15:29,640 what's happening, please, please, please try to run this code for yourself again and check out the 4067 07:15:29,640 --> 07:15:35,640 slides and also check out the extra curriculum. Oh, by the way, if you want to link to all 4068 07:15:35,640 --> 07:15:41,320 of the extra curriculum, just go to the book version of the course. And it's all going to be in here. 4069 07:15:41,880 --> 07:15:47,960 So that's there ready to go. Everything I link is extra curriculum will be in the extra curriculum 4070 07:15:47,960 --> 07:15:57,400 of each chapter. I'll see you in the next video. Welcome back. In the last video, we saw how to 4071 07:15:57,400 --> 07:16:03,560 train our model and evaluate it by not only looking at the loss metrics and the loss curves, 4072 07:16:03,560 --> 07:16:07,640 but we also plotted our predictions and we compared them. Hey, have a go at these random 4073 07:16:07,640 --> 07:16:12,520 predictions. Quite terrible. But then we trained a model using the power of back propagation and 4074 07:16:12,520 --> 07:16:17,880 gradient descent. And now look at our predictions. They're almost exactly where we want them to be. 4075 07:16:18,520 --> 07:16:22,040 And so you might be thinking, well, we've trained this model and it took us a while to 4076 07:16:22,040 --> 07:16:28,040 write all this code to get some good predictions. How might we run that model again? So I've took 4077 07:16:28,040 --> 07:16:33,160 in a little break after the last video, but now I've come back and you might notice that my Google 4078 07:16:33,160 --> 07:16:40,520 Colab notebook has disconnected. So what does this mean if I was to run this? Is it going to work? 4079 07:16:41,080 --> 07:16:47,480 I'm going to connect to a new Google Colab instance. But will we have all of the code that we've run 4080 07:16:47,480 --> 07:16:53,160 above? You might have already experienced this if you took a break before and came back to the 4081 07:16:53,160 --> 07:16:59,000 videos. Ah, so plot predictions is no longer defined. And do you know what that means? That 4082 07:16:59,000 --> 07:17:04,280 means that our model is also no longer defined. So we would have lost our model. We would have 4083 07:17:04,280 --> 07:17:10,040 lost all of that effort of training. Now, luckily, we didn't train the model for too long. So we can 4084 07:17:10,040 --> 07:17:16,120 just go run time, run all. And it's going to rerun all of the previous cells and be quite quick. 
4085 07:17:17,160 --> 07:17:21,800 Because we're working with a small data set and using a small model. But we've been through all 4086 07:17:21,800 --> 07:17:26,680 of this code. Oh, what have we got wrong here? Model zero state dict. Well, that's all right. 4087 07:17:26,680 --> 07:17:31,720 This is good. We're finding errors. So if you want to as well, you can just go run after. It's going 4088 07:17:31,720 --> 07:17:38,600 to run all of the cells after. Beautiful. And we come back down. There's our model training. 4089 07:17:38,600 --> 07:17:42,840 We're getting very similar values to what we got before. There's the lost curves. Beautiful. 4090 07:17:42,840 --> 07:17:47,480 Still going. Okay. Now our predictions are back because we've rerun all the cells and we've got 4091 07:17:47,480 --> 07:17:58,200 our model here. So what we might cover in this video is saving a model in PyTorch. Because if 4092 07:17:58,200 --> 07:18:03,320 we're training a model and you get to a certain point, especially when you have a larger model, 4093 07:18:03,320 --> 07:18:09,000 you probably want to save it and then reuse it in this particular notebook itself. Or you might 4094 07:18:09,000 --> 07:18:13,400 want to save it somewhere and send it to your friend so that your friend can try it out. Or you 4095 07:18:13,400 --> 07:18:18,680 might want to use it in a week's time. And if Google Colab is disconnected, you might want to 4096 07:18:18,680 --> 07:18:24,440 be able to load it back in somehow. So now let's see how we can save our models in PyTorch. So 4097 07:18:25,960 --> 07:18:32,520 I'm going to write down here. There are three main methods you should know about 4098 07:18:34,360 --> 07:18:40,520 for saving and loading models in PyTorch because of course with saving comes loading. So we're 4099 07:18:40,520 --> 07:18:47,800 going to over the next two videos discuss saving and loading. So one is torch.save. And as you might 4100 07:18:47,800 --> 07:19:01,560 guess, this allows you to save a PyTorch object in Python's pickle format. So you may or may not 4101 07:19:01,560 --> 07:19:09,880 be aware of Python pickle. There we go. Python object serialization. There we go. So we've got 4102 07:19:09,880 --> 07:19:15,480 the pickle module implements a binary protocols or implements binary protocols for serializing 4103 07:19:15,480 --> 07:19:21,800 and deserializing a Python object. So serializing means I understand it is saving and deserializing 4104 07:19:21,800 --> 07:19:28,200 means that it's loading. So this is what PyTorch uses behind the scenes, which is from pure Python. 4105 07:19:28,920 --> 07:19:35,080 So if we go back here in Python's pickle format, number two is torch.load, which you might be able 4106 07:19:35,080 --> 07:19:44,040 to guess what that does as well, allows you to load a saved PyTorch object. And number three is 4107 07:19:44,040 --> 07:19:53,640 also very important is torch.nn.module.loadStatedict. Now what does this allow you to do? Well, 4108 07:19:54,200 --> 07:20:02,760 this allows you to load a model's saved dictionary or save state dictionary. Yeah, that's what we'll 4109 07:20:02,760 --> 07:20:08,360 call it. Save state dictionary. Beautiful. And what's the model state dict? Well, let's have a look, 4110 07:20:08,360 --> 07:20:14,600 model zero dot state dict. The beauty of PyTorch is that it stores a lot of your model's important 4111 07:20:14,600 --> 07:20:21,080 parameters in just a simple Python dictionary. 
Now it might not be that simple because our model, 4112 07:20:21,080 --> 07:20:25,960 again, only has two parameters. In the future, you may be working with models with millions of 4113 07:20:25,960 --> 07:20:32,520 parameters. So looking directly at the state dict may not be as simple as what we've got here. 4114 07:20:32,520 --> 07:20:39,800 But the principle is still the same. It's still a dictionary that holds the state of your model. 4115 07:20:39,800 --> 07:20:43,960 And so I've got these three methods I want to show you where from because this is going to be 4116 07:20:43,960 --> 07:20:49,800 your extra curriculum, save and load models, your extra curriculum for this video. 4117 07:20:50,920 --> 07:20:56,760 If we go into here, this is a very, very, very important piece of PyTorch documentation, 4118 07:20:56,760 --> 07:21:02,200 or maybe even a tutorial. So your extra curriculum for this video is to go through it. 4119 07:21:02,200 --> 07:21:07,080 Here we go. We've got torch.save, torch.load, torch.nn.Module.load_state_dict. That's where 4120 07:21:07,080 --> 07:21:12,200 I've got the three things that we've just written down. And there's a fair 4121 07:21:12,200 --> 07:21:17,160 few different pieces of information. So what is a state dict? So in PyTorch, the learnable 4122 07:21:17,160 --> 07:21:22,280 parameters, i.e. the weights and biases of a torch.nn.Module, which is our model. 4123 07:21:22,280 --> 07:21:27,800 Remember, our model subclasses nn.Module. They are contained in the model's parameters. Accessed 4124 07:21:27,800 --> 07:21:33,960 with model.parameters, a state dict is simply a Python dictionary object that maps each layer 4125 07:21:33,960 --> 07:21:39,320 to its parameter tensor. That's what we've seen. And so then if we define a model, 4126 07:21:39,320 --> 07:21:44,040 we can initialize the model. And if we wanted to print the state dict, we can use that. 4127 07:21:44,040 --> 07:21:49,000 The optimizer also has a state dict. So that's something to be aware of. You can go optimizer.state 4128 07:21:49,000 --> 07:21:55,800 dict. And then you get an output here. And this is our saving and loading the model for inference. So 4129 07:21:55,800 --> 07:22:00,040 inference, again, is making a prediction. That's probably what we want to do in the future at some 4130 07:22:00,040 --> 07:22:05,160 point. For now, we've made predictions right within our notebook. But if we wanted to use our model 4131 07:22:05,160 --> 07:22:11,320 outside of our notebook, say in an application, or in another notebook that's not this one, 4132 07:22:11,320 --> 07:22:16,200 you'll want to know how to save and load it. So the recommended way of saving and loading a 4133 07:22:16,200 --> 07:22:21,480 PyTorch model is by saving its state dict. Now, there is another method down here, 4134 07:22:21,480 --> 07:22:27,320 which is saving and loading the entire model. So your extracurricular for this lesson, 4135 07:22:29,320 --> 07:22:33,800 we're going to go through the code to do this. But your extracurricular is to read all of the 4136 07:22:34,840 --> 07:22:40,040 sections in here, and then figure out what the pros and cons are of saving and loading the entire 4137 07:22:40,040 --> 07:22:46,280 model versus saving and loading just the state dict. So that's a challenge for you for this video. 4138 07:22:46,280 --> 07:22:50,120 I'm going to link this in here. And now let's write some code to save our model.
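Before we write it out in the notebook, here's a rough sketch of how those two saving options compare (illustrative file names, with model_0 assumed from earlier):

# Option 1 (recommended): save only the learned parameters (the state dict)
torch.save(obj=model_0.state_dict(), f="model_0.pth")

# Option 2: pickle the entire model object
torch.save(obj=model_0, f="model_0_full.pth")
# Simpler to reload, but the saved file is bound to the exact class definition
# and directory structure used at saving time.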
4139 07:22:50,120 --> 07:23:07,320 So PyTorch save and load code. Code tutorial plus extracurricular. So if we go 4140 07:23:10,040 --> 07:23:16,840 saving our PyTorch model. So what might we want? What do you think the save parameter takes? 4141 07:23:16,840 --> 07:23:24,440 If we have torch.save, what do you think it takes inside it? Well, let's find out together. 4142 07:23:24,440 --> 07:23:29,480 Hey, so let's import pathlib. We're going to see why in a second. This is Python's 4143 07:23:29,480 --> 07:23:35,000 module for dealing with file paths. So if we wanted to save something to, this is Google 4144 07:23:35,000 --> 07:23:41,160 Colab's file section over here. But just remember, if we do save this from within Google Colab, 4145 07:23:41,160 --> 07:23:48,040 the model will disappear if our Google Colab notebook instance disconnects. So I'll show you 4146 07:23:48,040 --> 07:23:55,880 how to download it from Google Colab if you want. Google Colab also has a way to save from Google Colab 4147 07:23:57,080 --> 07:24:02,760 to Google Drive, to save it to your Google Drive if you wanted to. But I'll leave you 4148 07:24:02,760 --> 07:24:09,000 to look at that on your own if you like. So we're first going to create a model directory. 4149 07:24:09,000 --> 07:24:15,640 So create models directory. So this is going to help us create a folder over here called models. 4150 07:24:15,640 --> 07:24:21,480 And of course, we could create this by hand by adding a new folder here somewhere. But I like 4151 07:24:21,480 --> 07:24:28,440 to do it with code. So model path, we're going to set this to path, which is using the path library 4152 07:24:28,440 --> 07:24:34,920 here to create us a path called models. Simple. We're just going to save all of our models 4153 07:24:34,920 --> 07:24:41,400 to the models folder. And then we're going to create model path, we're going to make that 4154 07:24:41,400 --> 07:24:47,880 directory: model path dot mkdir, for make directory. We're going to set parents equals true. 4155 07:24:49,000 --> 07:24:53,880 And we're also going to set exist ok equals true. That means if it already existed, 4156 07:24:53,880 --> 07:24:59,160 it won't throw us an error. It will try to create it. But if it already exists, it'll just recreate 4157 07:24:59,160 --> 07:25:04,040 the parents directory or it'll leave it there. It won't error out on us. We're also going to 4158 07:25:04,040 --> 07:25:10,600 create a model save path. This way, we can give our model a name. Right now, it's just model zero. 4159 07:25:12,520 --> 07:25:17,560 We want to save it under some name to the models directory. So let's create the model name. 4160 07:25:18,600 --> 07:25:25,720 Model name equals 01. I'm going to call it 01 for the section. That way, if we have more models 4161 07:25:25,720 --> 07:25:30,120 later on in the course, we know which ones come from where. You might create your own naming 4162 07:25:30,120 --> 07:25:37,320 convention: 01 pytorch workflow model zero dot pth. And now this is another 4163 07:25:37,320 --> 07:25:46,280 important point. PyTorch objects usually have the extension dot pt, for PyTorch, or dot pth. 4164 07:25:46,280 --> 07:25:52,360 So if we go in here, and if we look up dot pth, yeah, a common convention is to save models 4165 07:25:52,360 --> 07:25:58,600 using either a dot pt or dot pth file extension. I'll let you choose which one you like. I like 4166 07:25:58,600 --> 07:26:05,480 dot pth.
So if we go down here, dot pt or dot pth, they both result in the same thing. You just have to remember 4167 07:26:05,480 --> 07:26:11,960 to make sure you write the right loading path and right saving path. So now we're going to create 4168 07:26:11,960 --> 07:26:17,240 our model save path, which is going to be our model path. And because we're using pathlib, 4169 07:26:17,240 --> 07:26:23,880 we can use this syntax that we've got here, model path slash model name. And then if we just print out 4170 07:26:23,880 --> 07:26:32,760 model save path, what does this look like? There we go. So it creates a PosixPath 4171 07:26:32,760 --> 07:26:40,760 using the pathlib library of models slash 01 pytorch workflow model zero dot pth. We haven't 4172 07:26:40,760 --> 07:26:45,560 saved our model there yet. It's just got the path that we want to save our model to, ready. So if we 4173 07:26:45,560 --> 07:26:52,520 refresh this, we've got models over here. Do we have anything in there? No, we don't yet. So now 4174 07:26:52,520 --> 07:26:59,240 is our step to save the model. So three is save the model state dict. Why are we saving the state 4175 07:26:59,240 --> 07:27:04,440 dict? Because that's the recommended way of doing things. If we come up here, saving and loading the 4176 07:27:04,440 --> 07:27:09,400 model for inference, save and load the state dict, which is recommended. We could also save the entire 4177 07:27:09,400 --> 07:27:15,240 model. But that's part of your extra curriculum to look into that. So let's use some syntax. It's 4178 07:27:15,240 --> 07:27:20,200 quite like this: torch dot save. And then we pass it an object. And we pass it a path of where to 4179 07:27:20,200 --> 07:27:24,760 save it. We already have a path. And the good thing is we already have a model. So we just have to call 4180 07:27:24,760 --> 07:27:36,360 this. Let's try it out. So let's go print f saving model to and we'll put in the path here. 4181 07:27:37,640 --> 07:27:44,040 Model save path. I like to print out some things here and there; that way, we know what's going on. 4182 07:27:44,040 --> 07:27:51,880 And I don't need that capital there, do I? Getting a little bit trigger happy here with the typing. 4183 07:27:51,880 --> 07:27:57,480 So torch dot save. And we're going to pass in the object parameter here. And if we looked up torch 4184 07:27:57,480 --> 07:28:07,480 save, we can go. What does this code take? So torch save object, f. What is f? A file-like object. 4185 07:28:07,480 --> 07:28:13,800 Okay. Or a string or an OS path-like object. Beautiful. That's what we've got. A path-like 4186 07:28:13,800 --> 07:28:21,480 object containing a file name. So let's jump back into here. The object is what? It's our model zero 4187 07:28:21,480 --> 07:28:29,320 dot state dict. That's what we're saving. And then the file path is model save path. You ready? 4188 07:28:29,320 --> 07:28:35,800 Let's run this and see what happens. Beautiful. Saving model to models. So it's our model path. 4189 07:28:35,800 --> 07:28:39,480 And there's our model there. So if we refresh this, what do we have over here? 4190 07:28:39,480 --> 07:28:44,680 Wonderful. We've saved our trained model. So that means we could potentially, if we wanted to, 4191 07:28:44,680 --> 07:28:49,480 you could download this file here. That's going to download it from Google Colab to your local 4192 07:28:49,480 --> 07:28:56,440 machine. That's one way to do it.
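Put together, the saving code we just wrote looks roughly like this (a sketch, assuming torch and model_0 from the cells above; the variable names are my shorthand for what's typed in the notebook):

from pathlib import Path

# 1. Create a models directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create a model save path
MODEL_NAME = "01_pytorch_workflow_model_0.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# 3. Save the model's state dict
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_0.state_dict(), f=MODEL_SAVE_PATH)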
But there's also a guide here to save from Google Colaboratory 4193 07:28:56,440 --> 07:29:01,160 to Google Drive. That way you could use it later on. So there's many different ways. 4194 07:29:01,160 --> 07:29:06,440 The beauty of PyTorch is its flexibility. So now we've got a saved model. But let's just check 4195 07:29:06,440 --> 07:29:15,080 using our LS command. We're going to check models. Yeah, let's just check models. This is going to 4196 07:29:15,080 --> 07:29:23,800 check here. So this is list. Wonderful. There's our 01 pytorch workflow model zero dot pth. Now, 4197 07:29:23,800 --> 07:29:29,480 of course, we've saved a model. How about we try loading it back in and seeing how it works. So if 4198 07:29:29,480 --> 07:29:35,160 you want a challenge, read ahead on the documentation and try to use torch dot load to bring our model 4199 07:29:35,160 --> 07:29:42,520 back in. See what happens. I'll see you in the next video. Welcome back. In the last video, we wrote 4200 07:29:42,520 --> 07:29:47,320 some code here to save our PyTorch model. I'm just going to exit out of a couple of things 4201 07:29:47,320 --> 07:29:53,080 that we don't need just to clear up the screen. And now we've got our dot pth file, because remember, 4202 07:29:53,080 --> 07:29:58,840 dot pt or dot pth is a common convention for saving a PyTorch model. We've got it saved there, 4203 07:29:58,840 --> 07:30:03,640 and we didn't necessarily have to write all of this path-style code. But this is just handy for 4204 07:30:03,640 --> 07:30:11,000 later on if we wanted to functionize this and create it in, say, a save dot py file over here, 4205 07:30:11,000 --> 07:30:16,280 so that we could just call our save function and pass it in a file path where we wanted to save, 4206 07:30:16,280 --> 07:30:21,480 like a directory and a name, and then it'll save it exactly how we want it for later on. 4207 07:30:22,280 --> 07:30:27,960 But now we've got a saved model. I issued a challenge of trying to load that model in. 4208 07:30:27,960 --> 07:30:33,880 So do we have torch dot load in here? Did you try that out? We've got, oh, we've got a few options 4209 07:30:33,880 --> 07:30:39,720 here. Wonderful. But we're using one of the first ones. So let's go back up here. If we wanted to 4210 07:30:39,720 --> 07:30:46,440 check the documentation for torch dot load, we've got this option here, load. What happens? Loads 4211 07:30:46,440 --> 07:30:52,600 an object saved with torch dot save from a file. Torch dot load uses Python's unpickling 4212 07:30:52,600 --> 07:30:59,080 facilities, but treats storages, which underlie tensors, specially. They are first deserialized 4213 07:30:59,080 --> 07:31:05,720 on the CPU, and are then moved to the device they were saved from. Wonderful. So this is moved to the 4214 07:31:05,720 --> 07:31:11,160 device. For later on, when we're using a GPU, this is just something to keep in mind. We'll see that 4215 07:31:11,160 --> 07:31:17,240 when we start to use a CPU and a GPU. But for now, let's practice using the torch dot load method 4216 07:31:17,240 --> 07:31:24,200 and see how we can do it. So we'll come back here and we'll go loading a PyTorch model.
4217 07:31:24,760 --> 07:31:31,560 And so we're going to start writing here: since we saved our model's state dict, 4218 07:31:32,360 --> 07:31:36,200 so just the dictionary of parameters from a model, rather than 4219 07:31:36,200 --> 07:31:47,320 the entire model, we'll create a new instance of our model class and load the state dict, 4220 07:31:49,480 --> 07:31:54,040 load the saved state dict. That's better: load the saved state dict into that. 4221 07:31:55,480 --> 07:32:01,560 Now, this is just words on a page. Let's see this in action. So to load in a state dict, 4222 07:32:01,560 --> 07:32:05,160 which is what we saved, we didn't save the entire model itself, which is one option. 4223 07:32:05,160 --> 07:32:11,480 That's extra curriculum, but we saved just the model state dict. So if we remind ourselves what 4224 07:32:11,480 --> 07:32:17,160 model zero dot state dict looks like, we saved just this. So to load this in, we have to 4225 07:32:20,360 --> 07:32:27,800 instantiate a new class, or a new instance, of our linear regression model class. So to load in a 4226 07:32:27,800 --> 07:32:40,120 saved state dict, we have to instantiate a new instance of our model class. So let's call this 4227 07:32:40,120 --> 07:32:46,120 loaded model zero. I like that. That way we can differentiate, because it's still going to be the 4228 07:32:46,120 --> 07:32:52,120 same parameters as model zero, but this way we know that this instance is the loaded version, 4229 07:32:52,120 --> 07:32:57,320 not just the version we've been training before. So we'll create a new version of it here, 4230 07:32:57,320 --> 07:33:02,920 linear regression model. This is just the code that we wrote above, linear regression model. 4231 07:33:03,480 --> 07:33:14,040 And then we're going to load the saved state dict of model zero. And so this will update the new 4232 07:33:14,040 --> 07:33:23,080 instance with updated parameters. So let's just check before we load it, we haven't written any 4233 07:33:23,080 --> 07:33:27,480 code to actually load anything. What does loaded model zero's state dict look like here? 4234 07:33:28,520 --> 07:33:31,080 It won't have anything. It'll be initialized with what? 4235 07:33:31,960 --> 07:33:38,600 Oh, loaded. That's what I called it: loaded. See how it's initialized with random parameters. 4236 07:33:38,600 --> 07:33:43,960 So essentially all we're doing when we load a state dictionary into our new instance of our 4237 07:33:43,960 --> 07:33:49,880 model is that we're going, hey, take the saved state dict from this model and plug it into this. 4238 07:33:49,880 --> 07:33:56,200 So let's see what happens when we do that. So loaded model zero. Remember how I said there's 4239 07:33:56,200 --> 07:34:03,240 a method to also be aware of up here, which is torch dot nn dot Module dot load state dict. And because 4240 07:34:03,240 --> 07:34:09,800 our model is a what? It's a subclass of torch dot nn dot Module. So we can call load state dict 4241 07:34:09,800 --> 07:34:16,920 on our model directly, or on our instance. So recall, linear regression model is a subclass 4242 07:34:16,920 --> 07:34:24,280 of nn dot Module. So let's call load state dict. And this is where we call the torch dot load 4243 07:34:24,280 --> 07:34:31,720 method. And then we pass it the model save path. Is that what we called it? Because torch dot load, 4244 07:34:31,720 --> 07:34:39,560 it takes in f. So what's f? A file-like object, or a string or an OS path-like object.
So that's 4245 07:34:39,560 --> 07:34:45,720 why we created this path-like object up here. Model save path. So all we're doing here, 4246 07:34:45,720 --> 07:34:51,000 we're creating a new instance, linear regression model, which is a subclass of nn dot Module. 4247 07:34:51,000 --> 07:34:58,520 And then on that instance, we're calling load state dict of torch dot load, model save path. 4248 07:34:58,520 --> 07:35:04,120 Because what's saved at the model save path? Our previous model's state dict, which is here. 4249 07:35:04,120 --> 07:35:09,960 So if we run this, let's see what happens. All keys matched successfully. That is beautiful. 4250 07:35:09,960 --> 07:35:15,800 And so see the values here, the loaded state dict of model zero. Well, let's check the loaded version 4251 07:35:15,800 --> 07:35:22,760 of that. We now have, wonderful, we have the exact same values as above. But there's a little 4252 07:35:22,760 --> 07:35:30,040 way that we can test this. So how about we go make some predictions. So make some predictions, 4253 07:35:30,040 --> 07:35:39,960 just to make sure, with our loaded model. So let's put it in eval mode. Because when you make 4254 07:35:39,960 --> 07:35:45,320 predictions, you want it in evaluation mode. So it goes a little bit faster. And we want to 4255 07:35:45,320 --> 07:35:53,240 also use inference mode. So with torch dot inference mode, for making predictions, we want to write 4256 07:35:53,240 --> 07:35:58,120 this: loaded model preds, we're going to make some predictions on the test data as well. So loaded 4257 07:35:58,120 --> 07:36:04,040 model zero, we're going to forward pass on the X test data. And then we can have a look at the 4258 07:36:04,040 --> 07:36:12,920 loaded model preds. Wonderful. And then to see if the two models are the same, we can compare 4259 07:36:14,040 --> 07:36:23,320 loaded model preds with the original model preds. So y preds, these should be equivalent, equals 4260 07:36:23,320 --> 07:36:31,880 equals loaded model preds. Do we have the same thing? False, false, false, what's going on here? 4261 07:36:32,840 --> 07:36:42,440 Y preds? How different are they? Oh, where has that happened? Have we made some 4262 07:36:42,440 --> 07:36:50,600 model preds with this yet? So how about we make some model preds? This is troubleshooting on 4263 07:36:50,600 --> 07:37:00,520 the fly, team. So let's go model zero dot eval. And then with torch dot inference mode, 4264 07:37:00,520 --> 07:37:06,040 this is how we can check to see that our two models are actually equivalent. Y preds equals, 4265 07:37:06,040 --> 07:37:13,160 I have a feeling y preds is actually saved somewhere else, equals model zero. And then we pass it the 4266 07:37:13,160 --> 07:37:22,040 X test data. And then we might move this above here. And then have a look at what y preds equals. 4267 07:37:23,000 --> 07:37:30,760 Do we get the same output? Yes, we should. Wonderful. Okay, beautiful. So now we've covered 4268 07:37:30,760 --> 07:37:36,360 saving and loading models, or specifically saving the model's state dict. So we saved it here with 4269 07:37:36,360 --> 07:37:42,920 this code. And then we loaded it back in with load state dict plus torch load. And then we 4270 07:37:42,920 --> 07:37:48,360 checked it by testing the equivalence of the predictions of each of our models. So the original 4271 07:37:48,360 --> 07:37:53,720 one that we trained here, model zero, and the loaded version of it here.
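In one place, the loading-and-checking code from this video looks roughly like this (a sketch, assuming LinearRegressionModel, MODEL_SAVE_PATH, X_test and model_0 from the cells above):

# Create a new instance of the model class (starts with random parameters)
loaded_model_0 = LinearRegressionModel()

# Load the saved state dict into the new instance
loaded_model_0.load_state_dict(torch.load(f=MODEL_SAVE_PATH))

# Make predictions with both models and check they're equivalent
loaded_model_0.eval()
with torch.inference_mode():
    loaded_model_preds = loaded_model_0(X_test)

model_0.eval()
with torch.inference_mode():
    y_preds = model_0(X_test)

y_preds == loaded_model_preds  # should be all True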
So that's saving and 4272 07:37:53,720 --> 07:37:58,680 loading a model in pytorch. There are a few more things that we could cover. But I'm going to leave 4273 07:37:58,680 --> 07:38:04,760 that for extra curriculum. We've covered the two main things or three main things. One, two, three. 4274 07:38:04,760 --> 07:38:09,240 If you'd like to read more, I'd highly encourage you to go through and read this tutorial here. 4275 07:38:09,240 --> 07:38:14,680 But with that being said, we've covered a fair bit of ground over the last few videos. How about 4276 07:38:14,680 --> 07:38:20,280 we do a few videos where we put everything together just to reiterate what we've done. 4277 07:38:20,280 --> 07:38:23,240 I think that'll be good practice. I'll see you in the next video. 4278 07:38:25,400 --> 07:38:30,360 Welcome back. Over the past few videos, we've covered a whole bunch of ground in a pytorch 4279 07:38:30,360 --> 07:38:35,800 workflow, starting with data, then building a model. Well, we split the data, then we built a 4280 07:38:35,800 --> 07:38:41,000 model. We looked at the model building essentials. We checked the contents of our model. We made 4281 07:38:41,000 --> 07:38:46,680 some predictions with a very poor model because it's based off random numbers. We spent a whole 4282 07:38:46,680 --> 07:38:50,760 bunch of time figuring out how we could train a model. We figured out what the loss function is. 4283 07:38:50,760 --> 07:38:57,320 We saw an optimizer. We wrote a training and test loop. We then learned how to save and load a 4284 07:38:57,320 --> 07:39:03,000 model in pytorch. So now I'd like to spend the next few videos putting all this together. We're 4285 07:39:03,000 --> 07:39:07,400 not going to spend as much time on each step, but we're just going to have some practice together 4286 07:39:07,400 --> 07:39:13,320 so that we can reiterate all the things that we've done. So putting it all together, 4287 07:39:14,280 --> 07:39:24,840 let's go back through the steps above and see it all in one place. Wonderful. 4288 07:39:24,840 --> 07:39:33,240 So we're going to start off with 6.1 and we'll go have a look at our workflow. So 6.1 is data, 4289 07:39:35,080 --> 07:39:39,800 but we're going to do one step before that. And I'm just going to get rid of this so we have a bit 4290 07:39:39,800 --> 07:39:46,200 more space. So we've got our data ready. We've turned it into tenses way back at the start. 4291 07:39:46,200 --> 07:39:50,600 Then we built a model and then we picked a loss function and an optimizer. We built a training 4292 07:39:50,600 --> 07:39:55,400 loop. We trained our model. We made some predictions. We saw that they were better. We evaluated our 4293 07:39:55,400 --> 07:40:00,280 model. We didn't use torch metrics, but we got visual. We saw our red dots starting to line up 4294 07:40:00,280 --> 07:40:04,680 with the green dots. We haven't really improved through experimentation. We did a little bit of 4295 07:40:04,680 --> 07:40:10,120 it though, as in we saw that if we trained our model for more epochs, we got better results. 4296 07:40:10,120 --> 07:40:14,360 So you could argue that we have done a little bit of this, but there are other ways to experiment. 4297 07:40:14,360 --> 07:40:19,240 We're going to cover those throughout the course. And then we saw how to save and reload a trained 4298 07:40:19,240 --> 07:40:24,760 model. So we've been through this entire workflow, which is quite exciting, actually. 
4299 07:40:24,760 --> 07:40:29,480 So now let's go back through it, but we're going to do it a bit quicker than what we've done before, 4300 07:40:29,480 --> 07:40:35,800 because I believe you've got the skills to do so now. So let's start by importing pytorch. 4301 07:40:36,840 --> 07:40:41,000 So you could start the code from here if you wanted to. And that plot live. And actually, 4302 07:40:41,000 --> 07:40:46,680 if you want, you can pause this video and try to recode all of the steps that we've done 4303 07:40:46,680 --> 07:40:50,760 by putting some headers here, like data, and then build a model and then train the model, 4304 07:40:50,760 --> 07:40:57,800 save and load a model, whatever, and try to code it out yourself. If not, feel free to follow along 4305 07:40:57,800 --> 07:41:05,000 with me and we'll do it together. So import torch from torch import. Oh, would help if I could spell 4306 07:41:05,000 --> 07:41:10,920 torch import and n because we've seen that we use an n quite a bit. And we're going to also 4307 07:41:10,920 --> 07:41:15,160 import map plot live because we like to make some plots because we like to get visual. 4308 07:41:15,160 --> 07:41:22,440 Visualize visualize visualize as PLT. And we're going to check out pytorch version. 4309 07:41:24,200 --> 07:41:28,200 That way we know if you're on an older version, some of the code might not work here. But if you're 4310 07:41:28,200 --> 07:41:34,680 on a newer version, it should work. If it doesn't, let me know. There we go. 1.10. I'm using 1.10 4311 07:41:34,680 --> 07:41:39,640 for this. By the time you watch this video, there may be a later version out. And we're also going 4312 07:41:39,640 --> 07:41:46,280 to let's create some device agnostic code. So create device agnostic code, because I think we're 4313 07:41:46,280 --> 07:41:59,640 up to this step now. This means if we've got access to a GPU, our code will use it for potentially 4314 07:41:59,640 --> 07:42:15,320 faster computing. If no GPU is available, the code will default to using CPU. We don't necessarily 4315 07:42:15,320 --> 07:42:19,640 need to use a GPU for our particular problem that we're working on right now because it's a small 4316 07:42:19,640 --> 07:42:25,320 model and it's a small data set, but it's good practice to write device agnostic code. So that 4317 07:42:25,320 --> 07:42:31,880 means our code will use a GPU if it's available, or a CPU by default, if a GPU is not available. 4318 07:42:31,880 --> 07:42:38,520 So set up device agnostic code. We're going to be using a similar setup to this throughout the 4319 07:42:38,520 --> 07:42:43,640 entire course from now on. So that's why we're bringing it back. CUDA is available. So remember 4320 07:42:43,640 --> 07:42:50,760 CUDA is NVIDIA's programming framework for their GPUs, else use CPU. And we're going to print 4321 07:42:50,760 --> 07:43:00,200 what device are we using? Device. So what we might do is if we ran this, it should be just a CPU 4322 07:43:00,200 --> 07:43:06,760 for now, right? Yours might be different to this if you've enabled a GPU, but let's change this 4323 07:43:06,760 --> 07:43:12,360 over to use CUDA. And we can do that if you're using Google Colab, we can change the runtime type 4324 07:43:12,360 --> 07:43:17,240 by selecting GPU here. And then I'm going to save this, but what's going to happen is it's 4325 07:43:17,240 --> 07:43:21,960 going to restart the runtime. So we're going to lose all of the code that we've written above. 
How can we get it all back? Well, we can go. Run all. This is going to run all of the cells 4327 07:43:30,760 --> 07:43:35,320 above here. They should all work and it should be quite quick because our model and data aren't 4328 07:43:35,320 --> 07:43:42,920 too big. And if it all worked, we should have CUDA as our device that we can use here. Wonderful. 4329 07:43:42,920 --> 07:43:48,360 So the beauty of Google Colab is that they've given us access to an NVIDIA GPU. So thank you, 4330 07:43:48,360 --> 07:43:54,760 Google Colab. Just once again, I'm paying for the paid version of Google Colab. You don't have to. 4331 07:43:54,760 --> 07:44:00,200 The free version should give you access to a GPU, albeit it might not be as recent a 4332 07:44:00,200 --> 07:44:06,600 GPU as the pro versions give access to. But this will be more than enough for what we're about to 4333 07:44:06,600 --> 07:44:12,280 recreate. So I feel like that's enough for this video. We've got some device agnostic code ready 4334 07:44:12,280 --> 07:44:18,120 to go. And for the next few videos, we're going to be rebuilding this, except using device agnostic 4335 07:44:18,120 --> 07:44:24,920 code. So give it a shot yourself. There's nothing in here that we haven't covered before. So I'll 4336 07:44:24,920 --> 07:44:32,040 see you in the next video. Let's create some data. Welcome back. In the last video, we set up some 4337 07:44:32,040 --> 07:44:36,520 device agnostic code and we got ready to start putting everything we've learned together. 4338 07:44:36,520 --> 07:44:41,560 So now let's continue with that. We're going to recreate some data. Now we could just copy this 4339 07:44:41,560 --> 07:44:46,680 code, but we're going to write it out together so we can have some practice creating a dummy data 4340 07:44:46,680 --> 07:44:51,720 set. And we want to get to about this stage in this video. So we want to have some data that we can 4341 07:44:51,720 --> 07:44:57,560 plot so that we can build a model to once again, learn on the blue dots to predict the green dots. 4342 07:44:58,280 --> 07:45:03,080 So we'll come down here, data. I'm going to get out of this as well so that we have a bit more room. 4343 07:45:03,080 --> 07:45:19,400 Let's now create some data using the linear regression formula of y equals weight times 4344 07:45:19,400 --> 07:45:29,240 features plus bias. And you may have heard this as y equals mx plus c or mx plus b or something like 4345 07:45:29,240 --> 07:45:34,840 that, or you can substitute these for different names. I mean, when I learned this in high school, 4346 07:45:34,840 --> 07:45:41,160 it was y equals mx plus c. Yours might be slightly different. Yeah, bx plus a. That's what they use 4347 07:45:41,160 --> 07:45:45,080 here. A whole bunch of different ways to name things, but they're all describing the same thing. 4348 07:45:45,720 --> 07:45:51,800 So let's see this in code rather than formulaic examples. So we're going to create our weight, 4349 07:45:51,800 --> 07:45:58,200 which is 0.7, and a bias, which is 0.3. These are the values we previously used. For a challenge, you 4350 07:45:58,200 --> 07:46:04,280 could change these to 0.1 maybe and 0.2. These could be whatever values you'd like to set them as. 4351 07:46:05,240 --> 07:46:11,880 So weight and bias, the principle is going to be the same thing. We're going to try and build a 4352 07:46:11,880 --> 07:46:18,840 model to estimate these values.
So we're going to start at 0 and we're going to end at 1. 4353 07:46:19,640 --> 07:46:24,200 So we can just create a straight line and we're going to fill in those between 0 and 1 with a 4354 07:46:24,200 --> 07:46:32,680 step of 0.02. And now we'll create the x and y, which is features and labels 4355 07:46:32,680 --> 07:46:40,360 actually. So X is our features and y are our labels. X equals torch dot arange. And X is a 4356 07:46:40,360 --> 07:46:47,320 capital. Why is that? Because typically X is a feature matrix. Even though ours is just a vector now, 4357 07:46:47,320 --> 07:46:50,920 we're going to unsqueeze this so we don't run into dimensionality issues later on. 4358 07:46:50,920 --> 07:47:00,360 You can check this for yourself: without unsqueeze, errors will pop up. And y equals weight times 4359 07:47:00,360 --> 07:47:05,800 x plus bias. You see how we're going a little bit faster now? This is sort of the pace that we're 4360 07:47:05,800 --> 07:47:11,240 going to start going for things that we've already covered. If we haven't covered something, we'll 4361 07:47:11,240 --> 07:47:15,640 slow down, but if we have covered something, I'm going to step through it. We're going to start 4362 07:47:15,640 --> 07:47:22,440 speeding things up a little. So if we get some values here, wonderful. We've got some x values 4363 07:47:22,440 --> 07:47:28,120 and they correlate to some y values. We're going to try and use the training values of x to predict 4364 07:47:28,120 --> 07:47:34,200 the training values of y, and subsequently for the test values. Oh, and speaking of training and test 4365 07:47:34,200 --> 07:47:41,320 values, how about we split the data? So let's split the data. Split data. So we'll create the 4366 07:47:41,320 --> 07:47:47,240 train split equals int of 0.8. We're going to use 80%, which is where 0.8 comes from, 4367 07:47:47,800 --> 07:47:52,680 times the length of x. So we use 80% of our samples for the training, which is a typical 4368 07:47:52,680 --> 07:47:59,880 training and test split, 80/20, thereabouts. You could use like 70/30. You could use 90/10. 4369 07:47:59,880 --> 07:48:04,200 It all depends on how much data you have. There's a lot of things in machine learning that are 4370 07:48:04,200 --> 07:48:10,280 quite flexible. Train split, we're going to index on our data here so that we can create our splits. 4371 07:48:10,280 --> 07:48:19,080 Google Colab auto corrected my code in a non-helpful way just then. And we're going to do the 4372 07:48:19,080 --> 07:48:27,080 opposite split for the testing data. Now let's have a look at the lengths of these. If my calculations 4373 07:48:27,080 --> 07:48:37,560 are correct, we should have about 40 training samples and 10 testing samples. And again, this 4374 07:48:37,560 --> 07:48:42,760 may change in the future. When you work with larger data sets, you might have 100,000 training 4375 07:48:42,760 --> 07:48:50,360 samples and 20,000 testing samples. The ratio will often be quite similar. And then let's plot 4376 07:48:50,360 --> 07:48:59,480 what's going on here. So plot the data, and note, if you don't have the plot predictions 4377 07:48:59,480 --> 07:49:08,360 function loaded, this will error. So we can just run plot predictions here if we wanted to. And 4378 07:49:08,360 --> 07:49:16,360 we'll pass it in X train, Y train, X test, Y test. And this should come up with our
We've got 4380 07:49:22,120 --> 07:49:26,920 blue dots to predict green dots. But if this function errors out because you've started the notebook 4381 07:49:26,920 --> 07:49:33,640 from here, right from this cell, and you've gone down from there, just remember, you'll just have 4382 07:49:33,640 --> 07:49:40,040 to go up here and copy this function. We don't have to do it because we've run all the cells, 4383 07:49:40,040 --> 07:49:45,960 but if you haven't run that cell previously, you could put it here and then run it, run it, 4384 07:49:46,520 --> 07:49:54,360 and we'll get the same outcome here. Wonderful. So what's next? Well, if we go back to our workflow, 4385 07:49:54,360 --> 07:50:00,600 we've just created some data. And have we turned it into tenses yet? I think it's just still, oh, 4386 07:50:00,600 --> 07:50:07,000 yeah, it is. It's tenses because we use PyTorch to create it. But now we're up to building or 4387 07:50:07,000 --> 07:50:12,600 picking a model. So we've built a model previously. We did that back in build model. So you could 4388 07:50:12,600 --> 07:50:16,200 refer to that code and try to build a model to fit the data that's going on here. So that's 4389 07:50:16,200 --> 07:50:23,400 your challenge for the next video. So building a PyTorch linear model. And why do we call it linear? 4390 07:50:23,400 --> 07:50:29,960 Because linear refers to a straight line. What's nonlinear? Non-straight. So I'll see you in the 4391 07:50:29,960 --> 07:50:35,400 next video. Give it a shot before we get there. But we're going to build a PyTorch linear model. 4392 07:50:39,160 --> 07:50:44,520 Welcome back. We're going through some steps to recreate everything that we've done. In the last 4393 07:50:44,520 --> 07:50:51,080 video, we created some dummy data. And we've got a straight line here. So now by the workflow, 4394 07:50:51,080 --> 07:50:54,840 we're up to building a model or picking a model. In our case, we're going to build one 4395 07:50:54,840 --> 07:51:00,360 to suit our problem. So we've got some linear data. And I've put building a PyTorch linear model 4396 07:51:00,360 --> 07:51:04,520 here. I issued you the challenge of giving it a go. You could do exactly the same steps that 4397 07:51:04,520 --> 07:51:08,760 we've done in build model. But I'm going to be a little bit cheeky and introduce something 4398 07:51:09,400 --> 07:51:17,160 new here. And that is the power of torch.nn. So let's see it. What we're going to do is we're 4399 07:51:17,160 --> 07:51:28,040 going to create a linear model by subclassingnn.module because why a lot of PyTorch models, 4400 07:51:28,040 --> 07:51:33,880 subclass, and then module. So class linear regression, what should we call this one? 4401 07:51:34,520 --> 07:51:41,080 Linear regression model v2. How about that? And we'll subclassnn.module. So much similar code to 4402 07:51:41,080 --> 07:51:46,120 what we've been writing so far. Or when we first created our linear regression model. 4403 07:51:46,120 --> 07:51:53,160 And then we're going to put the standard constructor code here, def init underscore underscore. 4404 07:51:53,160 --> 07:51:59,720 And it's going to take as an argument self. And then we're going to call super dot another 4405 07:51:59,720 --> 07:52:07,880 underscore init underscore underscore brackets. But we're going to instead of if you recall above 4406 07:52:08,440 --> 07:52:15,160 back in the build model section, we initialized these parameters ourselves. 
And I've been hinting 4407 07:52:15,160 --> 07:52:23,320 at in the past in videos we've seen before that oftentimes you won't necessarily initialize the 4408 07:52:23,320 --> 07:52:31,160 parameters yourself. You'll instead initialize layers that have the parameters in built in those 4409 07:52:31,160 --> 07:52:37,800 layers. We still have to create a forward method. But what we're going to see is how we can use our 4410 07:52:37,800 --> 07:52:43,960 torch linear layer to do these steps for us. So let's write the code and then we'll step through it. 4411 07:52:43,960 --> 07:52:53,320 So we'll go usenn.linear because why we're building linear regression model and our data is linear. 4412 07:52:53,320 --> 07:53:00,760 And in the past, our previous model has implemented linear regression formula. So for creating the 4413 07:53:00,760 --> 07:53:11,640 model parameters. So we can go self dot linear layer equals. So this is constructing a variable 4414 07:53:11,640 --> 07:53:20,600 that this class can use self linear layer equals nn dot linear. Remember, nn in PyTorch stands for 4415 07:53:20,600 --> 07:53:27,880 neural network. And we have in features as one of the parameters and out features as another 4416 07:53:27,880 --> 07:53:35,240 parameter. This means we want to take as input of size one and output of size one. Where does that 4417 07:53:35,240 --> 07:53:46,200 come from? Well, if we have a look at x train and y train, we have one value of x. Maybe there's 4418 07:53:46,200 --> 07:53:58,440 too many here. x five will be the first five five and five. So recall, we have one value of x 4419 07:53:58,440 --> 07:54:04,120 equates to one value of y. So that means within this linear layer, we want to take as one feature 4420 07:54:04,120 --> 07:54:11,480 x to output one feature y. And we're using just one layer here. So the input and the output shapes 4421 07:54:11,480 --> 07:54:18,520 of your model in features, out features, what data goes in and what data comes out. These values 4422 07:54:18,520 --> 07:54:23,160 will be highly dependent on the data that you're working with. And we're going to see different 4423 07:54:23,160 --> 07:54:29,000 data or different examples of input features and output features all throughout this course. So 4424 07:54:29,000 --> 07:54:34,680 but that is what's happening. We have one in feature to one out feature. Now what's happening 4425 07:54:34,680 --> 07:54:42,120 inside nn.linear. Let's have a look torch and then linear. We go the documentation 4426 07:54:43,720 --> 07:54:48,920 applies a linear transformation to the incoming data. Where have we seen this before? 4427 07:54:49,720 --> 07:54:55,640 y equals x a t plus b. Now they're using different letters, but we've got the same formula as 4428 07:54:55,640 --> 07:55:03,080 what's happening up here. Look at the same formula as our data. Wait times x plus bias. And then if 4429 07:55:03,080 --> 07:55:11,320 we look up linear regression formula once again, linear regression formula. We've got this formula 4430 07:55:11,320 --> 07:55:19,240 here. Now again, these letters can be replaced by whatever letters you like. But this linear layer 4431 07:55:19,240 --> 07:55:27,000 is implementing the linear regression formula that we created in our model before. So it's 4432 07:55:27,000 --> 07:55:34,040 essentially doing this part for us. And behind the scenes, the layer creates these parameters for us. 
4433 07:55:34,600 --> 07:55:39,880 So that's a big piece of the puzzle of pie torch is that as I've said, you won't always be 4434 07:55:39,880 --> 07:55:45,960 initializing the parameters your model yourself. You'll generally initialize layers. And then you'll 4435 07:55:45,960 --> 07:55:52,280 use those layers in some Ford computation. So let's see how we could do that. So we've got a linear 4436 07:55:52,280 --> 07:55:58,120 layer which takes us in features one and out features one. What should we do now? Well, because 4437 07:55:58,120 --> 07:56:05,880 we've subclassed nn.module we need to override the Ford method. So we need to tell our model 4438 07:56:05,880 --> 07:56:10,600 what should it do as the Ford computation. And in here it's going to take itself as input, 4439 07:56:10,600 --> 07:56:15,720 as well as x, which is conventional for the input data. And then we're just going to return 4440 07:56:15,720 --> 07:56:24,920 here, self dot linear layer x. Right. And actually, we might use some typing here to say that this 4441 07:56:24,920 --> 07:56:31,960 should be a torch tensor. And it's also going to return a torch dot tensor. That's using Python's 4442 07:56:31,960 --> 07:56:37,320 type ins. So this is just saying, hey, X should be a torch tensor. And I'm going to return you a 4443 07:56:37,320 --> 07:56:43,000 torch tensor, because I'm going to pass x through the linear layer, which is expecting one in feature 4444 07:56:43,000 --> 07:56:48,520 and one out feature. And it's going to this linear transform. That's another word for it. Again, 4445 07:56:48,520 --> 07:56:53,720 pytorch and machine learning in general has many different names of the same thing. I would call 4446 07:56:53,720 --> 07:57:03,720 this linear layer. I'm going to write here, also called linear transform, probing layer, 4447 07:57:03,720 --> 07:57:12,920 fully connected layer, dense layer, intensive flow. So a whole bunch of different names for 4448 07:57:12,920 --> 07:57:17,800 the same thing, but they're all implementing a linear transform. They're all implementing a 4449 07:57:17,800 --> 07:57:24,760 version of linear regression y equals x, a ranspose plus b, in features, out features, 4450 07:57:24,760 --> 07:57:32,280 wonderful. So let's see this in action. So we're going to go set the manual seed so we can 4451 07:57:32,280 --> 07:57:42,200 get reproducibility as well, torch dot manual seed. And we're going to set model one equals 4452 07:57:42,920 --> 07:57:48,280 linear regression. This is model one, because we've already got model zero, linear regression 4453 07:57:48,280 --> 07:57:54,440 V two, and we're going to check model one, and we're going to check its state dictionary, 4454 07:57:55,080 --> 07:58:01,160 state dict. There we go. What do we have inside this ordered dict? Has that not created anything 4455 07:58:01,160 --> 07:58:14,840 for us? Model one, dot state dinked, ordered dink. We haven't got anything here in the regression 4456 07:58:14,840 --> 07:58:24,680 model V two. Ideally, this should be outputting a weight and a bias. Yeah, variables, weight, 4457 07:58:24,680 --> 07:58:29,640 and bias. Let's dig through our code line by line and see what we've got wrong. Ah, did you notice 4458 07:58:29,640 --> 07:58:34,840 this? The init function so the constructor had the wrong amount of underscores. So it was never 4459 07:58:34,840 --> 07:58:42,840 actually constructing this linear layer troubleshooting on the fly team. There we go. Beautiful. 
So we 4460 07:58:42,840 --> 07:58:50,440 have a linear layer, and we have it is created for us inside a weight and a bias. So effectively, 4461 07:58:50,440 --> 07:58:55,800 we've replaced the code we wrote above for build model, initializing a weight and bias parameter 4462 07:58:55,800 --> 07:59:01,240 with the linear layer. And you might be wondering why the values are slightly different, even though 4463 07:59:01,240 --> 07:59:07,160 we've used the manual seed. This goes behind the scenes of how PyTorch creates its different 4464 07:59:07,160 --> 07:59:11,560 layers. It's probably using a different form of randomness to create different types of 4465 07:59:11,560 --> 07:59:17,720 variables. So just keep that in mind. And to see this in action, we have a conversion here. 4466 07:59:18,760 --> 07:59:24,600 So this is what's going on. We've converted, this is our original model class, linear regression. 4467 07:59:24,600 --> 07:59:29,880 We initialize our model parameters here. We've got a weight and a bias. But instead, we've 4468 07:59:29,880 --> 07:59:36,040 swapped this in our linear regression model V2. This should be V2 to use linear layer. And then 4469 07:59:36,040 --> 07:59:42,200 in the forward method, we had to write the formula manually here when we initialize the parameters 4470 07:59:42,200 --> 07:59:48,600 manually. But because of the power of torch.nn, we have just passed it through the linear layer, 4471 07:59:48,600 --> 07:59:54,280 which is going to perform some predefined forward computation in this layer. So this 4472 07:59:54,280 --> 07:59:59,640 style of what's going on here is how you're going to see the majority of your PyTorch 4473 07:59:59,640 --> 08:00:07,000 deep learning models created using pre-existing layers from the torch.nn module. So if we go back 4474 08:00:07,000 --> 08:00:14,840 into torch.nn, torch.nn, we have a lot of different layers here. So we have convolutional layers, 4475 08:00:14,840 --> 08:00:19,480 pooling layers, padding layers, normalization, recurrent, transformer, linear, we're using a 4476 08:00:19,480 --> 08:00:24,200 linear layer, dropout, et cetera, et cetera. So for all of the common layers in deep learning, 4477 08:00:24,200 --> 08:00:28,600 because that's what neural networks are, they're layers of different mathematical transformations, 4478 08:00:29,160 --> 08:00:34,840 PyTorch has a lot of pre-built implementations. So that's a little bit of a sneaky trick that 4479 08:00:34,840 --> 08:00:39,960 I've done to alter our model. But we've still got basically the exact same model as we had before. 4480 08:00:39,960 --> 08:00:44,840 So what's next? Well, it's to train this model. So let's do that in the next video. 4481 08:00:44,840 --> 08:00:54,360 Welcome back. So in the last video, we built a PyTorch linear model, nice and simple using a 4482 08:00:54,360 --> 08:01:01,800 single nn.linear layer with one in feature, one out feature. And we over read the forward method 4483 08:01:01,800 --> 08:01:09,160 of nn.module using the linear layer that we created up here. So what's going to happen is when we do 4484 08:01:09,160 --> 08:01:13,960 the forward parser on our model, we're going to put some data in and it's going to go through 4485 08:01:13,960 --> 08:01:18,840 this linear layer, which behind the scenes, as we saw with torch and n linear, 4486 08:01:20,120 --> 08:01:27,320 behind the scenes, it's going to perform the linear regression formula here. So y equals x, 4487 08:01:27,320 --> 08:01:35,080 a t plus b. 
But now case, we've got weight and bias. So let's go back. It's now time to write 4488 08:01:35,080 --> 08:01:43,640 some training code. But before we do, let's set the model to use the target device. And so in 4489 08:01:43,640 --> 08:01:50,440 our case, we've got a device of CUDA. But because we've written device agnostic code, if we didn't 4490 08:01:50,440 --> 08:01:57,720 have access to a CUDA device, a GPU, our default device would be a CPU. So let's check the model 4491 08:01:57,720 --> 08:02:04,840 device. We can do that first up here, check the model current device, because we're going to use 4492 08:02:04,840 --> 08:02:12,120 the GPU here, or we're going to write device agnostic code. That's better to say device agnostic code. 4493 08:02:12,120 --> 08:02:19,400 That's the proper terminology device. What device are we currently using? This is the CPU, right? 4494 08:02:19,400 --> 08:02:27,800 So by default, the model will end up on the CPU. But if we set it to model one call dot two device, 4495 08:02:27,800 --> 08:02:31,960 what do you think it's going to do now? If our current target device is CUDA, we've seen what 4496 08:02:31,960 --> 08:02:38,200 two does in the fundamental section, two is going to send the model to the GPU memory. So now let's 4497 08:02:38,200 --> 08:02:45,960 check whether parameters of our model live dot device. If we send them to the device previously, 4498 08:02:45,960 --> 08:02:50,440 it was the CPU, it's going to take a little bit longer while the GPU gets fired up and goes, 4499 08:02:50,440 --> 08:02:56,360 PyTorch goes, Hey, I'm about to send you this model. You ready for it? Boom, there we go. Wonderful. 4500 08:02:56,360 --> 08:03:02,600 So now our model is on the device or the target device, which is CUDA. And if CUDA wasn't available, 4501 08:03:02,600 --> 08:03:07,480 the target device would be CPU. So this would just come out just exactly how we've got it here. 4502 08:03:07,480 --> 08:03:13,160 But with that being said, now let's get on to some training code. And this is the fun part. 4503 08:03:13,160 --> 08:03:18,760 What do we have to do? We've already seen this for training. I'm just going to clear up our 4504 08:03:18,760 --> 08:03:25,000 workspace a little bit here. For training, we need, this is part of the PyTorch workflow, 4505 08:03:25,000 --> 08:03:29,960 we need a loss function. What does a loss function do? Measures how wrong our model is, 4506 08:03:29,960 --> 08:03:37,800 we need an optimizer, we need a training loop and a testing loop. And the optimizer, what does that 4507 08:03:37,800 --> 08:03:44,600 do? Well, it optimizes the parameters of our model. So in our case, model one dot state dig, 4508 08:03:44,600 --> 08:03:50,200 what do we have? So we have some parameters here within the linear layer, we have a weight, 4509 08:03:50,200 --> 08:03:56,520 and we have a bias. The optimizer is going to optimize these random parameters so that they 4510 08:03:56,520 --> 08:04:01,880 hopefully reduce the loss function, which remember the loss function measures how wrong our model 4511 08:04:01,880 --> 08:04:06,120 is. So in our case, because we're working with the regression problem, let's set up the loss 4512 08:04:06,120 --> 08:04:12,520 function. And by the way, all of these steps are part of the workflow. We've got data ready, 4513 08:04:12,520 --> 08:04:17,800 we've built or picked a model, we're using a linear model. 
Now we're up to here: 2.1, pick a loss 4514 08:04:17,800 --> 08:04:22,040 function and an optimizer. We're also going to build a training loop in the same session, 4515 08:04:22,040 --> 08:04:26,680 because you know what, we're getting pretty darn good at this. Loss function equals what? 4516 08:04:27,240 --> 08:04:32,600 Well, we're going to use L1 loss. So let's set that up: nn dot L1Loss, which is the 4517 08:04:32,600 --> 08:04:42,360 same as MAE. And if we wanted to set up our optimizer, what optimizer could we use? Well, PyTorch offers 4518 08:04:42,360 --> 08:04:49,240 a lot of optimizers in torch dot optim. SGD, that's stochastic gradient descent, because remember, 4519 08:04:49,240 --> 08:04:56,040 gradient descent is the algorithm that optimizes our model parameters. Adam is another popular option. 4520 08:04:56,040 --> 08:05:01,080 For now, we're going to stick with SGD. LR, which stands for learning rate. In other words, 4521 08:05:01,080 --> 08:05:07,960 how big of a step will our optimizer change our parameters with every iteration? A smaller 4522 08:05:07,960 --> 08:05:15,800 learning rate, such as 0.0001, will be a small step. And then a large learning rate, such as 0.1, 4523 08:05:15,800 --> 08:05:22,840 will be a larger step. Too big of a step, and our model learns too much and it explodes. Too small of a 4524 08:05:22,840 --> 08:05:28,680 step, and our model never learns anything. But oh, we actually have to pass params first. I forgot 4525 08:05:28,680 --> 08:05:33,720 about that. I got ahead of myself with the learning rate. Params is the parameters we'd like our 4526 08:05:33,720 --> 08:05:39,880 optimizer to optimize. So in our case, it's model one dot parameters, because model one is our current 4527 08:05:39,880 --> 08:05:47,560 target model. Beautiful. So we've got a loss function and an optimizer. Now, let's write a training 4528 08:05:47,560 --> 08:05:54,040 loop. So I'm going to set torch manual seed so we can try and get results as reproducible as 4529 08:05:54,040 --> 08:05:59,320 possible. Remember, if you get different numbers to what I'm getting, don't worry too much if they're 4530 08:05:59,320 --> 08:06:05,160 not exactly the same, the direction is more important. So that means if my loss function is 4531 08:06:05,160 --> 08:06:10,280 getting smaller, yours should be getting smaller too. Don't worry too much if your fourth decimal 4532 08:06:10,280 --> 08:06:17,240 place isn't the same as what my values are. So we have a training loop ready to be written here. 4533 08:06:17,240 --> 08:06:22,120 Epochs, how many should we do? Well, we did 200 last time and that worked pretty well. So let's do 200 4534 08:06:22,120 --> 08:06:28,360 again. Did you go through the extra curriculum yet? Did you watch the video for the unofficial 4535 08:06:28,360 --> 08:06:36,440 PyTorch optimization loop song yet? This one here, listen to the unofficial PyTorch optimization 4536 08:06:36,440 --> 08:06:45,960 loop song. If not, it's okay. Let's sing it together. So for an epoch in range, epochs, we're going to 4537 08:06:45,960 --> 08:06:51,000 go through the song in a second. We're going to set the model to train. In our case, it's model one: 4538 08:06:51,880 --> 08:06:57,400 model one dot train. Now, step number one is what? Do the forward pass. This is where we calculate 4539 08:06:57,400 --> 08:07:02,760 the predictions. So we calculate the predictions by passing the training data through our model.
4540 08:07:03,320 --> 08:07:09,960 And in our case, because the forward method in model one implements the linear layer, 4541 08:07:09,960 --> 08:07:15,720 this data is going to go through the linear layer, which is torch.nn.linear and go through 4542 08:07:15,720 --> 08:07:21,880 the linear regression formula. And then we calculate the loss, which is how wrong our models predictions 4543 08:07:21,880 --> 08:07:30,120 are. So the loss value equals loss fn. And here we're going to pass in y-pred and y-train. 4544 08:07:31,160 --> 08:07:37,640 Then what do we do? We zero the optimizer, optimizer zero grad, which because by default, 4545 08:07:37,640 --> 08:07:44,120 the optimizer is going to accumulate gradients behind the scenes. So every epoch, we want to 4546 08:07:44,120 --> 08:07:50,920 reduce those back to zero. So it starts from fresh. We're going to perform back propagation here, 4547 08:07:50,920 --> 08:07:57,720 back propagation, by calling loss, stop backwards. If the forward pass goes forward through the 4548 08:07:57,720 --> 08:08:03,400 network, the backward pass goes backwards through the network, calculating the gradients for the 4549 08:08:03,400 --> 08:08:09,640 loss function with respect to each parameter in the model. So optimizer step, this next part, 4550 08:08:09,640 --> 08:08:15,480 is going to look at those gradients and go, you know what? Which way should I optimize the parameters? 4551 08:08:15,480 --> 08:08:20,840 So because the optimizer is optimizing the model parameters, it's going to look at the 4552 08:08:20,840 --> 08:08:26,280 loss and go, you know what? I'm going to adjust the weight to be increased. And I'm going to lower 4553 08:08:26,280 --> 08:08:32,440 the bias and see if that reduces the loss. And then we can do testing. We can do both of these in 4554 08:08:32,440 --> 08:08:36,680 the same hit. Now we are moving quite fast through this because we spent a whole bunch of time 4555 08:08:36,680 --> 08:08:42,360 discussing what's going on here. So for testing, what do we do? We set the model into evaluation 4556 08:08:42,360 --> 08:08:47,160 mode. That's going to turn off things like dropout and batch normalization layers. We don't have any 4557 08:08:47,160 --> 08:08:53,080 of that in our model for now, but just it's good practice to always call a vowel whenever you're 4558 08:08:53,080 --> 08:08:58,200 doing testing. And same with inference mode. We don't need to track gradients and a whole bunch of 4559 08:08:58,200 --> 08:09:02,840 other things PyTorch does behind the scenes when we're testing or making predictions. So we use 4560 08:09:02,840 --> 08:09:08,040 the inference mode context manager. This is where we're going to create test pred, which is going 4561 08:09:08,040 --> 08:09:13,720 to be our test predictions, because here we're going to pass the test data features, forward 4562 08:09:13,720 --> 08:09:20,280 pass through our model. And then we can calculate the test loss, which is our loss function. And we're 4563 08:09:20,280 --> 08:09:28,760 going to compare the test pred to Y test. Wonderful. And then we can print out what's happening. 4564 08:09:28,760 --> 08:09:40,040 So what should we print out? How about if epoch divided by 10 equals zero. So every 10 epochs, 4565 08:09:40,040 --> 08:09:51,560 let's print something out, print. We'll do an F string here, epoch is epoch. And then we'll go 4566 08:09:51,560 --> 08:09:58,680 loss, which is the training loss, and just be equal to the loss. 
And then we'll go test loss is 4567 08:09:58,680 --> 08:10:08,360 equal to test loss. So do you think this will work? It's okay if you're not sure. But let's find 4568 08:10:08,360 --> 08:10:15,000 out together, hey, oh, we've got a, we need a bracket there. Oh my goodness, what's going on? 4569 08:10:15,000 --> 08:10:22,600 Run time error. Expected all tenses to be on the same device. Oh, of course. Do you know what's 4570 08:10:22,600 --> 08:10:28,760 happening here? But we found at least two devices, CUDA and CPU. Yes, of course, that's what's happened. 4571 08:10:28,760 --> 08:10:36,280 So what have we done? Up here, we put our model on the GPU. But what's going on here? Our data? 4572 08:10:36,280 --> 08:10:43,880 Has our data on the GPU? No, it's not. By default, it's on the CPU. So we haven't written device 4573 08:10:43,880 --> 08:10:50,360 agnostic code for our data. So let's write it here, put data on the target device. 4574 08:10:52,040 --> 08:10:59,960 Device agnostic code for data. So remember, one of the biggest issues with pytorch aside from 4575 08:10:59,960 --> 08:11:06,200 shape errors is that you should have your data or all of the things that you're computing with 4576 08:11:06,200 --> 08:11:11,880 on the same device. So that's why if we set up device agnostic code for our model, 4577 08:11:11,880 --> 08:11:20,200 we have to do the same for our data. So now let's put X train to device. Y train equals Y train 4578 08:11:20,200 --> 08:11:25,320 to device. This is going to create device agnostic code. In our case, it's going to use CUDA because 4579 08:11:25,320 --> 08:11:32,840 we have access to a CUDA device. But if we don't, this code will still work. It will still default 4580 08:11:32,840 --> 08:11:38,600 to CPU. So this is good. I like that we got that error because that's the sum of the things you're 4581 08:11:38,600 --> 08:11:42,520 going to come across in practice, right? So now let's run this. What's happening here? 4582 08:11:43,240 --> 08:11:50,680 Hey, look at that. Wonderful. So our loss starts up here nice and high. And then it starts to go 4583 08:11:50,680 --> 08:11:56,200 right down here for the training data. And then the same for the testing data. Beautiful. 4584 08:11:56,840 --> 08:12:02,520 Right up here. And then all the way down. Okay. So this looks pretty good on the test data set. So 4585 08:12:02,520 --> 08:12:08,200 how can we check this? How can we evaluate our model? Well, one way is to check its state 4586 08:12:08,200 --> 08:12:16,280 deck. So state decked. What do we got here? What are our weight and bias? Oh my gosh, so close. 4587 08:12:16,840 --> 08:12:24,040 So we just set weight and bias before to be 0.7 and 0.3. So this is what our model has estimated 4588 08:12:24,040 --> 08:12:31,000 our parameters to be based on the training data. 0.6968. That's pretty close to 0.7, 4589 08:12:31,000 --> 08:12:38,920 nearly perfect. And the same thing with the bias 0.3025 versus the perfect value is 0.93. But remember, 4590 08:12:38,920 --> 08:12:44,680 in practice, you won't necessarily know what the ideal parameters are. This is just to exemplify 4591 08:12:44,680 --> 08:12:50,120 what our model is doing behind the scenes. It's moving towards some ideal representative 4592 08:12:50,120 --> 08:12:56,520 parameters of whatever data we're working with. 
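Putting the narration above into one place, here is a sketch of the training and testing loop, including the device-agnostic data fix. Variable names such as `X_train`, `y_train`, `X_test`, `y_test` and `device` follow the conventions used earlier in the course and are assumed here:

```python
torch.manual_seed(42)  # assumed seed value, purely for reproducibility

epochs = 200

# Put data on the target device (fixes the "expected all tensors to be on the same device" error)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

for epoch in range(epochs):
    model_1.train()

    # 1. Forward pass
    y_pred = model_1(X_train)

    # 2. Calculate the loss
    loss = loss_fn(y_pred, y_train)

    # 3. Zero the optimizer's gradients (they accumulate by default)
    optimizer.zero_grad()

    # 4. Backpropagation
    loss.backward()

    # 5. Step the optimizer (update the parameters)
    optimizer.step()

    ### Testing
    model_1.eval()
    with torch.inference_mode():
        test_pred = model_1(X_test)
        test_loss = loss_fn(test_pred, y_test)

    # Print out what's happening every 10 epochs
    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss} | Test loss: {test_loss}")
```

Afterwards, `model_1.state_dict()` can be inspected to see how close the learned weight and bias have moved towards the values used to create the data.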
So in the next video, I'd like you to give it a go 4593 08:12:56,520 --> 08:13:02,120 of before we get to the next video, make some predictions with our model and plot them on the 4594 08:13:02,120 --> 08:13:08,600 original data. How close to the green dots match up with the red dots? And you can use this plot 4595 08:13:08,600 --> 08:13:14,040 predictions formula or function that we've been using in the past. So give that a go and I'll 4596 08:13:14,040 --> 08:13:19,240 see you in the next video. But congratulations. Look how quickly we just trained a model using 4597 08:13:19,240 --> 08:13:25,080 the steps that we've covered in a bunch of videos so far and device agnostic code. So good. 4598 08:13:25,080 --> 08:13:33,000 I'll see you soon. In the last video, we did something very, very exciting. We worked through 4599 08:13:33,000 --> 08:13:38,600 training an entire neural network. Some of these steps took us an hour or so worth of videos to 4600 08:13:38,600 --> 08:13:43,480 go back through before. But we coded that in one video. So you're ready listening the song just to 4601 08:13:43,480 --> 08:13:49,720 remind ourselves of what's going on. For an epoch in a range, call model dot train, do the forward 4602 08:13:49,720 --> 08:13:59,320 pass, calculate the loss, optimizer zero grad, loss backward, optimizer step, step, step, let's 4603 08:13:59,320 --> 08:14:06,600 test, come on a dot eval with torch inference mode, do the forward pass, calculate the loss, 4604 08:14:06,600 --> 08:14:15,240 print out what's happening. And then we do it again, again, again, for another epoch in a range. 4605 08:14:15,240 --> 08:14:18,440 Now I'm kidding. We'll just leave it there. We'll just leave it there. But that's the 4606 08:14:18,440 --> 08:14:24,680 unofficial pytorch optimization loop song. We created some device agnostic code so that we could 4607 08:14:24,680 --> 08:14:29,880 make the calculations on the same device as what our model is because the models also using device 4608 08:14:29,880 --> 08:14:36,200 agnostic code. And so now we've got to evaluate our models. We've looked at the loss and the test 4609 08:14:36,200 --> 08:14:41,560 lost here. And we know that our models loss is going down. But what does this actually equate to 4610 08:14:41,560 --> 08:14:45,880 when it makes predictions? That's what we're most interested in, right? And we've looked at the 4611 08:14:45,880 --> 08:14:51,960 parameters. They're pretty close to the ideal parameters. So at the end of last video, I issued 4612 08:14:51,960 --> 08:15:00,760 you the challenge to making and evaluating predictions to make some predictions and plot them. I hope 4613 08:15:00,760 --> 08:15:08,120 you gave it a shot. Let's see what it looks like together. Hey, so turn the model into evaluation 4614 08:15:08,120 --> 08:15:13,080 mode. Why? Because every time we're making predictions or inference, we want our model to be in a 4615 08:15:13,080 --> 08:15:18,680 vowel mode. And every time we're training, we want our model to be in training mode. And then we're 4616 08:15:18,680 --> 08:15:26,280 going to make predictions on the test data, because we train on the train data, and we evaluate our 4617 08:15:26,280 --> 08:15:31,320 model on the test data data that our model has never actually seen, except for when it makes 4618 08:15:31,320 --> 08:15:37,720 predictions. With torch inference mode, we turn on inference mode whenever we make inference or 4619 08:15:37,720 --> 08:15:43,960 predictions. 
So we're going to set Y threads equal to model one, and the test data goes in here. 4620 08:15:43,960 --> 08:15:50,840 Let's have a look at what the Y threads look like. Wonderful. So we've got a tensor here. It shows 4621 08:15:50,840 --> 08:15:56,040 us that they're still on the device CUDA. Why is that? Well, that's because previously we set the 4622 08:15:56,040 --> 08:16:03,000 model one to the device, the target device, the same with the test data. So subsequently, 4623 08:16:03,000 --> 08:16:09,240 our predictions are also on the CUDA device. Now, let's bring in the plot predictions function here. 4624 08:16:09,880 --> 08:16:17,800 So check out our model predictions visually. We're going to adhere to the data explorer's motto 4625 08:16:17,800 --> 08:16:26,200 of visualize visualize visualize plot predictions. And predictions are going to be set to 4626 08:16:26,200 --> 08:16:33,160 equals Y threads. And let's have a look. How good do these look? Oh, no. 4627 08:16:35,640 --> 08:16:41,640 Oh, we've got another error type error. Can't convert CUDA device type tensor to NumPy. 4628 08:16:41,640 --> 08:16:48,280 Oh, of course. Look what we've done. So our plot predictions function, if we go back up, 4629 08:16:48,280 --> 08:16:53,320 where did we define that? What does our plot predictions function use? It uses matplotlib, 4630 08:16:53,320 --> 08:17:01,800 of course, and matplotlib works with NumPy, not pytorch. And NumPy is CPU based. So of course, 4631 08:17:01,800 --> 08:17:07,320 we're running into another error down here, because we just said that our predictions are on the CUDA 4632 08:17:07,320 --> 08:17:13,640 device. They're not on the CPU. They're on a GPU. So it's giving us this helpful information here. 4633 08:17:13,640 --> 08:17:19,400 Use tensor dot CPU to copy the tensor to host memory first. So this is our tensor. Let's call 4634 08:17:19,400 --> 08:17:26,200 dot CPU and see what happens then. Is that going to go to CPU? Oh, my goodness. Look at that. 4635 08:17:27,160 --> 08:17:34,200 Look at that. Go the linear layer. The red dots, the predictions are basically on top of the testing 4636 08:17:34,200 --> 08:17:38,680 data. That is very exciting. Now again, you may not get the exact same numbers here, and that is 4637 08:17:38,680 --> 08:17:43,880 perfectly fine. But the direction should be quite similar. So your red dots should be basically on 4638 08:17:43,880 --> 08:17:50,040 top of the green dots, if not very slightly off. But that's okay. That's okay. We just want to focus 4639 08:17:50,040 --> 08:17:57,080 on the direction here. So thanks to the power of back propagation here and gradient descent, 4640 08:17:57,080 --> 08:18:05,080 our models random parameters have updated themselves to be as close as possible to the ideal parameters. 4641 08:18:05,080 --> 08:18:09,240 And now the predictions are looking pretty darn good for what we're trying to predict. 4642 08:18:09,240 --> 08:18:13,080 But we're not finished there. We've just finished training this model. What would happen if our 4643 08:18:13,080 --> 08:18:18,120 notebook disconnected right now? Well, that wouldn't be ideal, would it? So in the next part, 4644 08:18:18,120 --> 08:18:27,640 we're going to move on to 6.5, saving, and loading a trained model. 
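For reference before moving on to saving, here is a sketch of the prediction-and-plot code just walked through; `plot_predictions` is the helper function defined earlier in the course, so its exact signature here is assumed:

```python
# Put the model into evaluation mode and make predictions on the test data
model_1.eval()
with torch.inference_mode():
    y_preds = model_1(X_test)

# matplotlib works with NumPy, which is CPU-based, so copy the GPU tensor back to the CPU first
plot_predictions(predictions=y_preds.cpu())
```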
So I'm going to give you a 4645 08:18:27,640 --> 08:18:33,480 challenge here as well, is to go ahead and go back and refer to this code here, saving model 4646 08:18:33,480 --> 08:18:39,800 in PyTorch, loading a PyTorch model, and see if you can save model one, the state dictionary of 4647 08:18:39,800 --> 08:18:45,000 model one, and load it back in and get something similar to this. Give that a shot, and I'll see you 4648 08:18:45,000 --> 08:18:53,080 in the next video. Welcome back. In the last video, we saw the power of the torch.nn.linear layer, 4649 08:18:53,080 --> 08:18:58,520 and back propagation and gradient descent. And we've got some pretty darn good predictions 4650 08:18:58,520 --> 08:19:03,800 out of our model. So that's very exciting. Congratulations. You've now trained two machine 4651 08:19:03,800 --> 08:19:10,280 learning models. But it's not over yet. We've got to save and load our trained model. So I 4652 08:19:10,280 --> 08:19:14,440 issued you the challenge in the last video to try and save and load the model yourself. I hope 4653 08:19:14,440 --> 08:19:19,240 you gave that a go. But we're going to do that together in this video. So we're going to start 4654 08:19:19,240 --> 08:19:26,360 by importing path because we would like a file path to save our model to. And the first step we're 4655 08:19:26,360 --> 08:19:32,360 going to do is create models directory. We don't have to recreate this because I believe we already 4656 08:19:32,360 --> 08:19:37,880 have one. But I'm going to put the code here just for completeness. And this is just so if you 4657 08:19:37,880 --> 08:19:46,040 didn't have a models directory, this would create one. So model path is going to go to path 4658 08:19:48,200 --> 08:19:56,440 models. And then we'd like to model path dot maker, we're going to call maker for make directory. 4659 08:19:56,440 --> 08:20:03,080 We'll set parents equal to true. And if it exists, okay, that'll also be true. So we won't get an error. 4660 08:20:03,080 --> 08:20:07,480 Oh my gosh, Google collab. I didn't want that. We won't get an error if it already exists. 4661 08:20:08,040 --> 08:20:15,480 And two, we're going to create a model save path. So if you recall that pytorch objects in general 4662 08:20:15,480 --> 08:20:21,960 have the extension of what? There's a little pop quiz before we get to the end of this sentence. 4663 08:20:21,960 --> 08:20:27,640 So this is going to be pytorch workflow for this module that we're going through. This one here, 4664 08:20:27,640 --> 08:20:35,480 chapter 01 pytorch workflow model one. And they usually have the extension dot PT for pytorch or 4665 08:20:35,480 --> 08:20:43,320 PT H for pytorch as well. I like PT H. But just remember, sometimes you might come across slightly 4666 08:20:43,320 --> 08:20:49,160 different versions of that PT or PT H. And we're going to create the model save name or the save 4667 08:20:49,160 --> 08:20:55,640 path. It's probably a better way to do it is going to be model path. And then we can use because we're 4668 08:20:55,640 --> 08:21:02,280 using the path lib module from Python, we can save it under model name. And so if we look at this, 4669 08:21:02,280 --> 08:21:11,240 what do we get model save path? We should get Oh, path is not defined. Oh, too many capitals here, 4670 08:21:11,240 --> 08:21:18,440 Daniel. 
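Here is roughly what that directory and save-path setup looks like once the NameError is fixed (the upper-case names are just a naming convention, as explained next):

```python
from pathlib import Path

# 1. Create a models directory (no error if it already exists)
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create a model save path (.pt and .pth are the common PyTorch extensions)
MODEL_NAME = "01_pytorch_workflow_model_1.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME
```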
The reason why I'm doing these in capitals is because oftentimes hyper parameters such as epochs 4671 08:21:18,440 --> 08:21:24,920 in machine learning are set as hyper parameters LR could be learning rate. And then you could have 4672 08:21:25,720 --> 08:21:33,240 as well model name equals Yeah, yeah, yeah. But that's just a little bit of nomenclature trivia for 4673 08:21:33,240 --> 08:21:39,560 later on. And model save path, we've done that. Now we're going to save the model state dictionary 4674 08:21:39,560 --> 08:21:46,280 rather than the whole model, save the model state deck, which you will find the pros and cons of 4675 08:21:46,280 --> 08:21:53,000 in where in the pytorch documentation for saving and loading model, which was a little bit of extra 4676 08:21:53,000 --> 08:21:57,720 curriculum for a previous video. But let's have a look at our model save path will print it out. 4677 08:21:58,280 --> 08:22:06,120 And we'll go torch save, we'll set the object that we're trying to save to equal model one dot state 4678 08:22:06,120 --> 08:22:12,600 deck, which is going to contain our trained model parameters. We can inspect what's going on in here, 4679 08:22:12,600 --> 08:22:18,840 state deck. They'll show us our model parameters. Remember, because we're only using a single linear 4680 08:22:18,840 --> 08:22:24,840 layer, we only have two parameters. But in practice, when you use a model with maybe hundreds of layers 4681 08:22:24,840 --> 08:22:30,360 or tens of millions of parameters, viewing the state deck explicitly, like we are now, 4682 08:22:30,360 --> 08:22:35,960 might not be too viable of an option. But the principle still remains a state deck contains 4683 08:22:35,960 --> 08:22:43,240 all of the models trained or associated parameters, and what state they're in. And the file path we're 4684 08:22:43,240 --> 08:22:49,880 going to use is, of course, the model save path, which we've seen here is a POSIX path. Let's save 4685 08:22:49,880 --> 08:22:57,080 our model. Wonderful saving model to this file path here. And if we have a look at our folder, 4686 08:22:57,080 --> 08:23:02,520 we should have two saved models now, beautiful to save models. This one for us from the workflow 4687 08:23:02,520 --> 08:23:08,680 we did before up here, saving a model in PyTorch, loading a PyTorch model. And now the one we've got, 4688 08:23:08,680 --> 08:23:15,400 of course, model one is the one that we've just saved. Beautiful. So now let's load a model. We're 4689 08:23:15,400 --> 08:23:19,640 going to do both of these in one video. Load a PyTorch model. You know what, because we've had a 4690 08:23:19,640 --> 08:23:25,400 little bit of practice so far, and we're going to pick up the pace. So let's go loaded, let's call 4691 08:23:25,400 --> 08:23:31,560 it, we'll create a new instance of loaded model one, which is, of course, our linear regression model 4692 08:23:31,560 --> 08:23:37,080 V2, which is the version two of our linear regression model class, which subclasses, what? 4693 08:23:37,640 --> 08:23:43,960 Subclasses and n.module. So if we go back here up here to where we created it. So linear regression 4694 08:23:43,960 --> 08:23:50,120 model V2 uses a linear layer rather than the previous iteration of linear regression model, 4695 08:23:50,120 --> 08:23:59,240 which we created right up here. 
If we go up to here, which explicitly defined the parameters, 4696 08:23:59,240 --> 08:24:03,880 and then implemented a linear regression formula in the forward method, the difference between 4697 08:24:03,880 --> 08:24:10,520 what we've got now is we use PyTorch's pre-built linear layer, and then we call that linear layer 4698 08:24:10,520 --> 08:24:15,000 in the forward method, which is probably the far more popular way of building PyTorch models, 4699 08:24:15,000 --> 08:24:21,160 is stacking together pre-built NN layers, and then calling them in some way in the forward method. 4700 08:24:21,160 --> 08:24:32,680 So let's load it in. So we'll create a new instance of linear regression model V2, and now what do 4701 08:24:32,680 --> 08:24:37,480 we do? We've created a new instance, I'm just going to get out of this, make some space for us. 4702 08:24:38,520 --> 08:24:45,640 We want to load the model state deck, the saved model one state deck, which is the state deck that 4703 08:24:45,640 --> 08:24:52,200 we just saved beforehand. So we can do this by going loaded model one, calling the load state 4704 08:24:52,200 --> 08:24:58,680 decked method, and then passing it torch dot load, and then the file path of where we saved that 4705 08:24:58,680 --> 08:25:05,320 PyTorch object before. But the reason why we use the path lib is so that we can just call model 4706 08:25:05,320 --> 08:25:13,560 save path in here. Wonderful. And then let's check out what's going on. Or actually, we need to 4707 08:25:13,560 --> 08:25:22,200 put the target model or the loaded model to the device. The reason being is because we're doing all 4708 08:25:22,200 --> 08:25:30,840 our computing with device agnostic code. So let's send it to the device. And I think that'll be about 4709 08:25:30,840 --> 08:25:36,600 it. Let's see if this works. Oh, there we go. Linear regression model V2 in features one, 4710 08:25:36,600 --> 08:25:43,480 out features one, bias equals true. Wonderful. Let's check those parameters. Hey, next loaded model 4711 08:25:43,480 --> 08:25:51,560 one dot parameters. Are they on the right device? Let's have a look. Beautiful. And let's just check 4712 08:25:51,560 --> 08:25:57,960 the loaded state dictionary of loaded model one. Do we have the same values as we had previously? 4713 08:25:57,960 --> 08:26:04,840 Yes, we do. Okay. So to conclusively make sure what's going on, let's evaluate the loaded model. 4714 08:26:04,840 --> 08:26:11,480 Evaluate loaded model, loaded model one. What do we do for making predictions? Or what do we do to 4715 08:26:11,480 --> 08:26:17,560 evaluate? We call dot a vowel. And then if we're going to make some predictions, we use torch 4716 08:26:17,560 --> 08:26:23,000 inference mode with torch inference mode. And then let's create loaded model one, threads 4717 08:26:24,760 --> 08:26:32,680 equals loaded model one. And we'll pass it the test data. And now let's check for a quality 4718 08:26:32,680 --> 08:26:39,000 between Y threads, which is our previous model one preds that we made up here, Y threads. 4719 08:26:39,800 --> 08:26:45,080 And we're going to compare them to the fresh loaded model one preds. And should they be the same? 4720 08:26:50,280 --> 08:26:57,400 Yes, they are beautiful. And we can see that they're both on the device CUDA. How amazing is that? So 4721 08:26:57,400 --> 08:27:02,680 I want to give you a big congratulations, because you've come such a long way. 
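For completeness, here is a sketch of the save-and-load code covered across the last few videos. It assumes `model_1`, `MODEL_SAVE_PATH`, `device`, `X_test` and `y_preds` from earlier, and that `LinearRegressionModelV2` is the nn.Module subclass built with `nn.Linear`:

```python
# Save only the state_dict (the trained parameters), rather than the whole model
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_1.state_dict(), f=MODEL_SAVE_PATH)

# Load: create a new instance of the model class (it starts with random parameters)...
loaded_model_1 = LinearRegressionModelV2()

# ...then load the saved state_dict into it and put it on the target device
loaded_model_1.load_state_dict(torch.load(MODEL_SAVE_PATH))
loaded_model_1.to(device)

# Evaluate the loaded model: its predictions should match the original model's
loaded_model_1.eval()
with torch.inference_mode():
    loaded_model_1_preds = loaded_model_1(X_test)

print(y_preds == loaded_model_1_preds)  # expect a tensor full of True values
```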
We've gone through 4722 08:27:02,680 --> 08:27:08,200 the entire PyTorch workflow from making data, preparing and loading it to building a model. 4723 08:27:08,200 --> 08:27:13,160 All of the steps that come in building a model, there's a whole bunch there, making predictions, 4724 08:27:13,160 --> 08:27:18,040 training a model, we spent a lot of time going through the training steps. But trust me, it's 4725 08:27:18,040 --> 08:27:23,000 worth it, because we're going to be using these exact steps all throughout the course. And in fact, 4726 08:27:23,000 --> 08:27:27,640 you're going to be using these exact steps when you build PyTorch models after this course. And 4727 08:27:27,640 --> 08:27:32,040 then we looked at how to save a model so we don't lose all our work, we looked at loading a model, 4728 08:27:32,040 --> 08:27:37,720 and then we put it all together using the exact same problem, but in far less time. And as you'll 4729 08:27:37,720 --> 08:27:42,680 see later on, we can actually make this even quicker by functionalizing some of the code we've already 4730 08:27:42,680 --> 08:27:47,240 written. But I'm going to save that for later. I'll see you in the next video, where I'm just 4731 08:27:47,240 --> 08:27:51,880 going to show you where you can find some exercises and all of the extra curriculum I've been talking 4732 08:27:51,880 --> 08:28:00,760 about throughout this section 01 PyTorch workflow. I'll see you there. Welcome back. In the last 4733 08:28:00,760 --> 08:28:06,760 video, we finished up putting things together by saving and loading our trained model, which is 4734 08:28:06,760 --> 08:28:12,520 super exciting, because let's come to the end of the PyTorch workflow section. So now, this section 4735 08:28:12,520 --> 08:28:18,760 is going to be exercises and extra curriculum, or better yet, where you can find them. So I'm 4736 08:28:18,760 --> 08:28:25,640 going to turn this into markdown. And I'm going to write here for exercises and extra curriculum. 4737 08:28:27,320 --> 08:28:35,240 Refer to. So within the book version of the course materials, which is at learnpytorch.io, 4738 08:28:35,240 --> 08:28:40,520 we're in the 01 section PyTorch workflow fundamentals. There'll be more here by the time you watch 4739 08:28:40,520 --> 08:28:45,480 this video likely. And then if we go down here, at the end of each of these sections, we've got 4740 08:28:45,480 --> 08:28:51,880 the table of contents over here. We've got exercises and extra curriculum. I listed a bunch of things 4741 08:28:51,880 --> 08:28:58,760 throughout this series of 01 videos, like what's gradient descent and what's back propagation. So 4742 08:28:58,760 --> 08:29:04,120 I've got plenty of resources to learn more on that. There's the loading and saving PyTorch 4743 08:29:04,120 --> 08:29:09,640 documentation. There's the PyTorch cheat sheet. There's a great article by Jeremy Howard for a 4744 08:29:09,640 --> 08:29:14,520 deeper understanding of what's going on in torch.nn. And there's, of course, the unofficial PyTorch 4745 08:29:14,520 --> 08:29:21,160 optimization loop song by yours truly, which is a bit of fun. And here's some exercises. So 4746 08:29:21,160 --> 08:29:27,640 the exercises here are all based on the code that we wrote throughout section 01. So there's 4747 08:29:27,640 --> 08:29:32,760 nothing in the exercises that we haven't exactly covered. And if so, I'll be sure to put a note 4748 08:29:32,760 --> 08:29:37,880 in the exercise itself. 
But we've got create a straight line data set using the linear regression 4749 08:29:37,880 --> 08:29:44,760 formula. And then build a model by subclassing and end up module. So for these exercises, there's an 4750 08:29:44,760 --> 08:29:50,040 exercise notebook template, which is, of course, linked here. And in the PyTorch deep learning 4751 08:29:50,040 --> 08:29:56,040 GitHub, if we go into here, and then if we go into extras, and if we go into exercises, you'll 4752 08:29:56,040 --> 08:30:01,400 find all of these templates here. They're numbered by the same section that we're in. This is PyTorch 4753 08:30:01,400 --> 08:30:07,640 workflow exercises. So if you wanted to complete these exercises, you could click this notebook 4754 08:30:07,640 --> 08:30:16,280 here, open in Google CoLab. I'll just wait for this to load. There we go. And you can start to 4755 08:30:16,280 --> 08:30:21,800 write some code here. You could save a copy of this in your own Google Drive and go through this. 4756 08:30:21,800 --> 08:30:26,840 It's got some notes here on what you should be doing. You can, of course, refer to the text-based 4757 08:30:26,840 --> 08:30:31,480 version of them. They're all here. And then if you want an example of what some solutions look 4758 08:30:31,480 --> 08:30:36,920 like, now, please, I can't stress enough that I would highly, highly recommend trying the exercises 4759 08:30:36,920 --> 08:30:43,080 yourself. You can use the book that we've got here. This is just all the code from the videos. 4760 08:30:43,080 --> 08:30:47,800 You can use this. You can use, I've got so many notebooks here now, you can use all of the code 4761 08:30:47,800 --> 08:30:52,760 that we've written here to try and complete the exercises. But please give them a go yourself. 4762 08:30:52,760 --> 08:30:57,960 And then if you go back into the extras folder, you'll also find solutions. And this is just one 4763 08:30:57,960 --> 08:31:03,400 example solutions for section 01. But I'm going to get out of that so you can't cheat and look 4764 08:31:03,400 --> 08:31:08,680 at the solutions first. But there's a whole bunch of extra resources all contained within 4765 08:31:09,400 --> 08:31:16,600 the PyTorch deep loaning repo, extras, exercises, solutions, and they're also in the book version 4766 08:31:16,600 --> 08:31:21,880 of the course. So I'm just going to link this in here. I'm going to put this right at the bottom 4767 08:31:21,880 --> 08:31:28,520 here. Wonderful. But that is it. That is the end of the section 01 PyTorch workflow. 4768 08:31:28,520 --> 08:31:33,560 So exciting. We went through basically all of the steps in a PyTorch workflow, 4769 08:31:33,560 --> 08:31:38,600 getting data ready, turning into tenses, build or pick a model, picking a loss function on an 4770 08:31:38,600 --> 08:31:42,200 optimizer. We built a training loop. We fit the model to the data. We made a prediction. 4771 08:31:42,200 --> 08:31:47,080 We evaluated our model. We improved through experimentation by training for more epochs. 4772 08:31:47,080 --> 08:31:51,240 We'll do more of this later on. And we saved and reload our trained model. 4773 08:31:51,240 --> 08:32:01,480 But that's going to finish 01. I will see you in the next section. Friends, welcome back. 4774 08:32:02,040 --> 08:32:07,400 We've got another very exciting module. You ready? Neural network classification with 4775 08:32:10,440 --> 08:32:15,720 PyTorch. 
Now combining this module once we get to the end with the last one, which was 4776 08:32:15,720 --> 08:32:20,520 regression. So remember classification is predicting a thing, but we're going to see this in a second. 4777 08:32:20,520 --> 08:32:25,080 And regression is predicting a number. Once we've covered this, we've covered two of the 4778 08:32:25,080 --> 08:32:30,280 the biggest problems in machine learning, predicting a number or predicting a thing. 4779 08:32:30,280 --> 08:32:36,680 So let's start off with before we get into any ideas or code, where can you get help? 4780 08:32:38,120 --> 08:32:43,240 First things first is follow along with the code. If you can, if in doubt, run the code. 4781 08:32:44,120 --> 08:32:48,280 Try it for yourself. Write the code. I can't stress how important this is. 4782 08:32:48,280 --> 08:32:53,560 If you're still stuck, press shift, command, and space to read the doc string of any of the 4783 08:32:53,560 --> 08:32:59,160 functions that we're running. If you are on Windows, it might be control. I'm on a Mac, so I put command 4784 08:32:59,160 --> 08:33:04,280 here. If you're still stuck, search for your problem. If an error comes up, just copy and paste 4785 08:33:04,280 --> 08:33:09,160 that into Google. That's what I do. You might come across resources like Stack Overflow or, 4786 08:33:09,160 --> 08:33:14,600 of course, the PyTorch documentation. We'll be referring to this a lot again throughout this 4787 08:33:14,600 --> 08:33:21,480 section. And then finally, oh wait, if you're still stuck, try again. If in doubt, run the code. 4788 08:33:21,480 --> 08:33:25,960 And then finally, if you're still stuck, don't forget, you can ask a question. The best place to 4789 08:33:25,960 --> 08:33:31,160 do so will be on the course GitHub, which will be at the discussions page, which is linked here. 4790 08:33:32,120 --> 08:33:36,440 If we load this up, there's nothing here yet, because as I record these videos, the course 4791 08:33:36,440 --> 08:33:42,920 hasn't launched yet, but press new discussion. Talk about what you've got. Problem with XYZ. 4792 08:33:42,920 --> 08:33:48,600 Let's go ahead. Leave a video number here and a timestamp, and that way, we'll be able to help 4793 08:33:48,600 --> 08:33:54,440 you out as best as possible. So video number, timestamp, and then your question here, and you 4794 08:33:54,440 --> 08:34:01,080 can select Q&A. Finally, don't forget that this notebook that we're about to go through is based 4795 08:34:01,080 --> 08:34:05,960 on chapter two of the Zero to Mastery Learn PyTorch for deep learning, which is neural network 4796 08:34:05,960 --> 08:34:11,720 classification with PyTorch. All of the text-based code that we're about to write is here. That 4797 08:34:11,720 --> 08:34:17,720 was a little spoiler. And don't forget, this is the home page. So my GitHub repo slash PyTorch 4798 08:34:17,720 --> 08:34:23,320 deep learning for all of the course materials, everything you need will be here. So that's very 4799 08:34:23,320 --> 08:34:28,200 important. How can you get help? But this is the number one. Follow along with the code and try 4800 08:34:28,200 --> 08:34:32,680 to write it yourself. Well, with that being said, when we're talking about classification, 4801 08:34:32,680 --> 08:34:38,040 what is a classification problem? Now, as I said, classification is one of the main problems of 4802 08:34:38,040 --> 08:34:43,000 machine learning. 
So you probably already deal with classification problems or machine learning 4803 08:34:43,000 --> 08:34:50,280 powered classification problems every day. So let's have a look at some examples. Is this email 4804 08:34:50,280 --> 08:34:57,400 spam or not spam? Did you check your emails this morning or last night or whenever? So chances are 4805 08:34:57,400 --> 08:35:00,920 that there was some sort of machine learning model behind the scenes. It may have been a neural 4806 08:35:00,920 --> 08:35:07,160 network. It may have not that decided that some of your emails won't spam. So to Daniel, 4807 08:35:07,160 --> 08:35:11,240 at mrdberg.com, hey, Daniel, this steep learning course is incredible. I can't wait to use what 4808 08:35:11,240 --> 08:35:15,800 I've learned. Oh, that's such a nice message. If you want to send that email directly to me, 4809 08:35:15,800 --> 08:35:21,640 you can. That's my actual email address. But if you want to send me this email, well, hopefully 4810 08:35:21,640 --> 08:35:27,400 my email, which is hosted by some email service detects this as spam because although that is a 4811 08:35:27,400 --> 08:35:32,200 lot of money and it would be very nice, I think if someone can't spell too well, are they really 4812 08:35:32,200 --> 08:35:37,640 going to pay me this much money? So thank you email provider for classifying this as spam. And now 4813 08:35:37,640 --> 08:35:44,680 because this is one thing or another, not spam or spam, this is binary classification. So in this 4814 08:35:44,680 --> 08:35:51,960 case, it might be one here and this is a zero or zero or one. So one thing or another, that's binary 4815 08:35:51,960 --> 08:35:57,960 classification. If you can split it into one thing or another, binary classification. And then we 4816 08:35:57,960 --> 08:36:04,840 have an example of say we had the question, we asked our photos app on our smartphone or whatever 4817 08:36:04,840 --> 08:36:10,440 device you're using. Is this photo of sushi steak or pizza? We wanted to search our photos for every 4818 08:36:10,440 --> 08:36:15,880 time we've eaten sushi or every time we've eaten steak or every time we've eaten pizza far out 4819 08:36:15,880 --> 08:36:21,080 and this looks delicious. But this is multi class classification. Now, why is this? Because we've 4820 08:36:21,080 --> 08:36:27,000 got more than two things. We've got 123. And now this could be 10 different foods. It could be 100 4821 08:36:27,000 --> 08:36:33,400 different foods. It could be 1000 different categories. So the image net data set, which is a popular 4822 08:36:33,400 --> 08:36:42,680 data set for computer vision, image net, we go to here, does it say 1000 anywhere, 1k or 1000? 4823 08:36:43,720 --> 08:36:50,600 No, it doesn't. But if we go image net 1k, download image net data, maybe it's here. 4824 08:36:50,600 --> 08:36:58,840 It won't say it, but you just, oh, there we go, 1000 object classes. So this is multi class 4825 08:36:58,840 --> 08:37:05,400 classification because it has 1000 classes, that's a lot, right? So that's multi class classification, 4826 08:37:05,400 --> 08:37:12,440 more than one thing or another. And finally, we might have multi label classification, 4827 08:37:13,000 --> 08:37:17,240 which is what tags should this article have when I first got into machine learning, I got these 4828 08:37:17,240 --> 08:37:22,600 two mixed up a whole bunch of times. 
Multi class classification has multiple classes such as sushi 4829 08:37:22,600 --> 08:37:28,920 steak pizza, but assigns one label to each. So this photo would be sushi in an ideal world. This is 4830 08:37:28,920 --> 08:37:35,160 steak and this is pizza. So one label to each. Whereas multi label classification means you could 4831 08:37:35,160 --> 08:37:41,720 have multiple different classes. But each of your target samples such as this Wikipedia article, 4832 08:37:41,720 --> 08:37:47,000 what tags should this article have? It may have more than one label. It might have three labels, 4833 08:37:47,000 --> 08:37:54,200 it might have 10 labels. In fact, what if we went to the Wikipedia page for deep learning Wikipedia 4834 08:37:54,840 --> 08:38:01,320 and does it have any labels? Oh, there we go. Where was that? I mean, you can try this yourself. 4835 08:38:01,320 --> 08:38:05,160 This is just the Wikipedia page for deep learning. There is a lot, there we go categories deep 4836 08:38:05,160 --> 08:38:10,760 learning, artificial neural networks, artificial intelligence and emerging technologies. So that 4837 08:38:10,760 --> 08:38:15,640 is an example. If we wanted to build a machine learning model to say, read all of the text in 4838 08:38:15,640 --> 08:38:21,320 here and then go tell me what are the most relevant categories to this article? It might come up 4839 08:38:21,320 --> 08:38:26,760 with something like these. In this case, because it has one, two, three, four, it has multiple labels 4840 08:38:26,760 --> 08:38:33,320 rather than just one label of deep learning, it could be multi label classification. So we'll go 4841 08:38:33,320 --> 08:38:38,520 back. But there's a few more. These will get you quite far in the world of classification. 4842 08:38:38,520 --> 08:38:45,480 So let's dig a little deeper on binary versus multi class classification. You may have already 4843 08:38:45,480 --> 08:38:52,200 experienced this. So in my case, if I search on my phone in the photos app for photos of a dog, 4844 08:38:52,200 --> 08:38:56,680 it might come here. If I search for photos of a cat, it might come up with this. But if I wanted 4845 08:38:56,680 --> 08:39:01,400 to train an algorithm to detect the difference between photos of these are my two dogs. 4846 08:39:01,400 --> 08:39:06,040 Aren't they cute? They're nice and tired and they're sleeping like a person. This is seven. 4847 08:39:06,040 --> 08:39:11,720 Number seven, that's her name. And this is Bella. This is a cat that me and my partner rescued. 4848 08:39:11,720 --> 08:39:16,600 And so I'm not sure what this cat's name is actually. So I'd love to give it a name, but I can't. 4849 08:39:16,600 --> 08:39:22,280 So binary classification, if we wanted to build an algorithm, we wanted to feed it, say, 10,000 4850 08:39:22,280 --> 08:39:27,640 photos of dogs and 10,000 photos of cats. And then we wanted to find a random image on the 4851 08:39:27,640 --> 08:39:32,600 internet and pass it through to our model and say, hey, is this a dog or is this a cat? It would 4852 08:39:32,600 --> 08:39:39,080 be binary classification because the options are one thing or another dog or cat. But then for 4853 08:39:39,080 --> 08:39:44,040 multi-class classification, let's say we've been working on a farm and we've been taking some photos 4854 08:39:44,040 --> 08:39:49,080 of chickens because they groovy, right? Well, we updated our model and added some chicken photos 4855 08:39:49,080 --> 08:39:54,360 in there. 
We would now be working with a multi-class classification problem because we've got more 4856 08:39:54,360 --> 08:40:01,640 than one thing or another. So let's jump in to what we're going to cover. This is broadly, 4857 08:40:01,640 --> 08:40:06,200 by the way, because this is just text on a page. You know, I like to just write code of what we're 4858 08:40:06,200 --> 08:40:10,840 actually doing. So we're going to look at the architecture of a neural network classification 4859 08:40:10,840 --> 08:40:15,480 model. We're going to check what the input shapes and output shapes of a classification model are 4860 08:40:15,480 --> 08:40:20,840 features and labels. In other words, because remember, machine learning models, neural networks 4861 08:40:20,840 --> 08:40:27,320 love to have numerical inputs. And those numerical inputs often come in tenses. Tenses have different 4862 08:40:27,320 --> 08:40:32,120 shapes, depending on what data you're working with. We're going to see all of this in code, creating 4863 08:40:32,120 --> 08:40:36,200 custom data to view, fit and predict on. We're going to go back through our steps in modeling. 4864 08:40:36,200 --> 08:40:41,720 We covered this a fair bit in the previous section, but creating a model for neural network classification. 4865 08:40:41,720 --> 08:40:45,480 It's a little bit different to what we've done, but not too out landishly different. We're going to 4866 08:40:45,480 --> 08:40:50,280 see how we can set up a loss function and an optimizer for a classification model. We'll 4867 08:40:50,280 --> 08:40:56,040 recreate a training loop and a evaluating loop or a testing loop. We'll see how we can save and 4868 08:40:56,040 --> 08:41:01,480 load our models. We'll harness the power of nonlinearity. Well, what does that even mean? Well, 4869 08:41:01,480 --> 08:41:07,000 if you think of what a linear line is, what is that? It's a straight line. So you might be 4870 08:41:07,000 --> 08:41:12,040 able to guess what a nonlinear line looks like. And then we'll look at different classification 4871 08:41:12,040 --> 08:41:17,640 evaluation methods. So ways that we can evaluate our classification models. And how are we going 4872 08:41:17,640 --> 08:41:24,360 to do all of this? Well, of course, we're going to be part cook, part chemist, part artist, part 4873 08:41:24,360 --> 08:41:30,600 science. But for me, I personally prefer the cook side of things because we're going to be cooking 4874 08:41:30,600 --> 08:41:36,600 up lots of code. So in the next video, before we get into coding, let's do a little bit more on 4875 08:41:36,600 --> 08:41:44,120 what are some classification inputs and outputs. I'll see you there. Welcome back. In the last 4876 08:41:44,120 --> 08:41:48,600 video, we had a little bit of a brief overview of what a classification problem is. But now, 4877 08:41:48,600 --> 08:41:53,400 let's start to get more hands on by discussing what the actual inputs to a classification problem 4878 08:41:53,400 --> 08:41:59,880 look like and the outputs look like. And so let's say we had our beautiful food photos from before, 4879 08:41:59,880 --> 08:42:04,840 and we were trying to build this app here called maybe food vision to understand what 4880 08:42:04,840 --> 08:42:12,600 foods are in the photos that we take. And so what might this look like? Well, let's break it down 4881 08:42:13,160 --> 08:42:18,600 to inputs, some kind of machine learning algorithm, and then outputs. 
In this case, 4882 08:42:18,600 --> 08:42:24,600 the inputs we want to numerically represent these images in some way, shape or form. Then we want 4883 08:42:24,600 --> 08:42:29,320 to build a machine learning algorithm. Hey, one might actually exist. We're going to see this later 4884 08:42:29,320 --> 08:42:33,800 on in the transfer learning section for our problem. And then we want some sort of outputs. And in 4885 08:42:33,800 --> 08:42:39,080 the case of food vision, we want to know, okay, this is a photo of sushi. And this is a photo of 4886 08:42:39,080 --> 08:42:46,280 steak. And this is a photo of pizza. You could get more hands on and technical and complicated, but 4887 08:42:46,280 --> 08:42:52,840 we're just going to stick with single label multi class classification. So it could be a sushi photo, 4888 08:42:52,840 --> 08:42:58,760 it could be a steak photo, or it could be a pizza photo. So how might we numerically represent 4889 08:42:58,760 --> 08:43:05,080 these photos? Well, let's just say we had a function in our app that every photo that gets taken 4890 08:43:05,080 --> 08:43:11,720 automatically gets resized into a square into 224 width and 224 height. This is actually quite a 4891 08:43:11,720 --> 08:43:18,520 common dimensionality for computer vision problems. And so we've got the width dimension, we've got 4892 08:43:18,520 --> 08:43:23,800 the height, and then we've got this C here, which isn't immediately recognizable. But in the case 4893 08:43:23,800 --> 08:43:29,720 of pictures, they often get represented by width, height color channels. And the color channels is 4894 08:43:29,720 --> 08:43:37,000 red, green and blue, which is each pixel in this image has some value of red, green or blue, that 4895 08:43:37,000 --> 08:43:43,000 makes whatever color is displayed here. And this is one way that we can numerically represent an 4896 08:43:43,000 --> 08:43:50,280 image by taking its width, its height and color channels, and whatever number makes up this 4897 08:43:50,280 --> 08:43:54,440 particular image. We're going to see this later on when we work with computer vision problems. 4898 08:43:55,080 --> 08:44:01,480 So we create a numerical encoding, which is the pixel values here. Then we import the pixel values 4899 08:44:01,480 --> 08:44:07,160 of each of these images into a machine learning algorithm, which is often already exists. And if 4900 08:44:07,160 --> 08:44:12,040 it doesn't exist for our particular problem, hey, well, we're learning the skills to build them now, 4901 08:44:12,040 --> 08:44:17,720 we could use pytorch to build a machine learning algorithm for this. And then outputs, what might 4902 08:44:17,720 --> 08:44:23,560 these look like? Well, in this case, these are prediction probabilities, which the outputs of 4903 08:44:23,560 --> 08:44:28,440 machine learning models are never actually discrete, which means it is definitely pizza. 4904 08:44:28,440 --> 08:44:35,240 It will give some sort of probability value between zero and one for say the closer to one, 4905 08:44:35,240 --> 08:44:42,520 the more confident our model is that it's going to be pizza. And the closer to zero is means that, 4906 08:44:42,520 --> 08:44:47,880 hey, this photo of pizza, let's say this one, and we're trying to predict sushi. Well, 4907 08:44:48,680 --> 08:44:53,080 it doesn't think that it's sushi. So it's giving it quite a low value here. And then the same for 4908 08:44:53,080 --> 08:44:58,600 steak, but it's really high, the value here for pizza. 
We're going to see this hands on. And then 4909 08:44:58,600 --> 08:45:03,880 it's the opposite here. So it might have got this one wrong. But with more training and more data, 4910 08:45:03,880 --> 08:45:07,320 we could probably improve this prediction. That's the whole idea of machine learning, 4911 08:45:07,320 --> 08:45:14,280 is that if you adjust the algorithm, if you adjust the data, you can improve your predictions. And so 4912 08:45:16,360 --> 08:45:22,040 the ideal outputs that we have here, this is what our models going to output. But for our case of 4913 08:45:22,040 --> 08:45:28,200 building out food vision, we want to bring them back to. So we could just put all of these numbers 4914 08:45:28,200 --> 08:45:33,720 on the screen here, but that's not really going to help people. We want to put out labels of what's 4915 08:45:33,720 --> 08:45:39,640 going on here. So we can write code to transfer these prediction probabilities into these labels 4916 08:45:39,640 --> 08:45:44,600 too. And so how did these labels come about? How do these predictions come about? Well, 4917 08:45:44,600 --> 08:45:48,920 it comes from looking at lots of different samples. So this loop, we could keep going, 4918 08:45:48,920 --> 08:45:54,120 improve these, find the ones where it's wrong, add more images here, train the model again, 4919 08:45:54,120 --> 08:46:00,200 and then make our app better. And so if we want to look at this from a shape perspective, 4920 08:46:01,240 --> 08:46:06,840 we want to create some tenses for an image classification example. So we're building food vision. 4921 08:46:08,200 --> 08:46:12,040 We've got an image again, this is just reiterating on some of the things that we've discussed. 4922 08:46:12,040 --> 08:46:17,720 We've got a width of 224 and a height of 224. This could be different. This could be 300, 300. 4923 08:46:17,720 --> 08:46:23,160 This could be whatever values that you decide to use. Then we numerically encoded in some way, 4924 08:46:23,160 --> 08:46:27,720 shape or form. We use this as the inputs to our machine learning algorithm, because of what? 4925 08:46:27,720 --> 08:46:31,800 Computers and machine learning algorithms, they love numbers. They can find patterns in here 4926 08:46:31,800 --> 08:46:35,400 that we couldn't necessarily find. Or maybe we could, if you had a long enough time, 4927 08:46:35,400 --> 08:46:39,560 but I'd rather write an algorithm to do it for me. Then it has some outputs, 4928 08:46:39,560 --> 08:46:44,280 which comes in the formal prediction probabilities, the closer to one, the more confident model is 4929 08:46:44,280 --> 08:46:49,400 and saying, hey, I'm pretty damn confident that this is a photo of sushi. I don't think it's a 4930 08:46:49,400 --> 08:46:55,320 photo of steak. So I'm giving that zero. It might be a photo of pizza, but I don't really think so. 4931 08:46:55,320 --> 08:47:01,560 So I'm giving it quite a low prediction probability. And so if we have a look at what the shapes are 4932 08:47:01,560 --> 08:47:05,640 for our tenses here, if this doesn't make sense, don't worry. We're going to see the code to do 4933 08:47:05,640 --> 08:47:11,000 all of this later on. But for now, we're just focusing on a classification input and output. 4934 08:47:11,000 --> 08:47:17,080 The big takeaway from here is numerical encoding, outputs and numerical encoding. But we want to 4935 08:47:17,080 --> 08:47:22,200 change these numerical codings from the outputs to something that we understand, say the word sushi. 
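As a rough sketch of what turning prediction probabilities into a human-readable label can look like (the raw output values below are made up purely for illustration):

```python
import torch

class_names = ["sushi", "steak", "pizza"]

# Hypothetical raw model outputs (logits) for a single image
logits = torch.tensor([2.1, 0.3, -1.0])

# Convert the logits into prediction probabilities (values between 0 and 1 that sum to 1)
pred_probs = torch.softmax(logits, dim=0)

# Take the index of the highest probability and map it back to a label
pred_label = class_names[torch.argmax(pred_probs).item()]
print(pred_probs, pred_label)  # the highest probability maps to "sushi" here
```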
4936 08:47:22,840 --> 08:47:28,200 But this tensor may be batch size. We haven't seen what batch size is. That's all right. We're 4937 08:47:28,200 --> 08:47:33,960 going to cover it. Color channels with height. So this is represented as a tensor of dimensions. 4938 08:47:33,960 --> 08:47:38,760 It could be none here. None is a typical value for a batch size, which means it's blank. So when 4939 08:47:38,760 --> 08:47:43,880 we use our model and we train it, all the code that we write with pytorch will fill in this behind 4940 08:47:43,880 --> 08:47:50,120 the scenes. And then we have three here, which is color channels. And we have 224, which is the width. 4941 08:47:50,120 --> 08:47:55,880 And we have 224 as well, which is the height. Now there is some debate in the field on the ordering. 4942 08:47:55,880 --> 08:48:01,080 We're using an image as our particular example here on the ordering of these shapes. So say, 4943 08:48:01,080 --> 08:48:06,040 for example, you might have height width color channels, typically width and height come together 4944 08:48:06,040 --> 08:48:10,840 in this order. Or they're just side by side in the tensor in terms of their whether dimension 4945 08:48:10,840 --> 08:48:17,000 appears. But color channels sometimes comes first. That means after the batch size or at the end here. 4946 08:48:17,000 --> 08:48:22,920 But pytorch, the default for now is color channels with height, though you can write code to change 4947 08:48:22,920 --> 08:48:30,120 this order because tenses are quite flexible. And so or the shape could be 32 for the batch size, 4948 08:48:30,120 --> 08:48:35,880 three, two, two, four, two, two, four, because 32 is a very common batch size. And you don't believe me? 4949 08:48:35,880 --> 08:48:46,280 Well, let's go here. Yarn LeCoon 32 batch size. Now what is a batch size? Great tweet. Just keep 4950 08:48:46,280 --> 08:48:51,320 this in mind for later on. Training with large mini batches is bad for your health. More importantly, 4951 08:48:51,320 --> 08:48:55,880 it's bad for your test error. Friends don't let friends use mini batches larger than 32. So this 4952 08:48:55,880 --> 08:49:03,240 is quite an old tweet. However, it still stands quite true. Because like today, it's 2022 when 4953 08:49:03,240 --> 08:49:08,760 I'm recording these videos, there are batch sizes a lot larger than 32. But 32 works pretty darn 4954 08:49:08,760 --> 08:49:16,280 well for a lot of problems. And so this means that if we go back to our slide, that if we use 4955 08:49:16,280 --> 08:49:22,440 a batch size of 32, our machine learning algorithm looks at 32 images at a time. Now why does it do 4956 08:49:22,440 --> 08:49:27,960 this? Well, because sadly, our computers don't have infinite compute power. In an ideal world, 4957 08:49:27,960 --> 08:49:32,920 we look at thousands of images at a time, but it turns out that using a multiple of eight here 4958 08:49:32,920 --> 08:49:39,240 is actually quite efficient. And so if we have a look at the output shape here, why is it three? 4959 08:49:39,800 --> 08:49:45,240 Well, because we're working with three different classes, one, two, three. So we've got shape equals 4960 08:49:45,240 --> 08:49:52,840 three. Now, of course, as you could imagine, these might change depending on the problem you're working 4961 08:49:52,840 --> 08:49:58,680 with. 
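A small sketch of the shapes being described, using PyTorch's default ordering of [batch_size, colour_channels, height, width]; the random values are placeholders for real image data:

```python
import torch

# A batch of 32 RGB images, each 224 pixels high and 224 pixels wide
image_batch = torch.rand(size=(32, 3, 224, 224))

# For a 3-class problem (e.g. sushi, steak, pizza) the model outputs one value per class per image
output_logits = torch.rand(size=(32, 3))

print(image_batch.shape)    # torch.Size([32, 3, 224, 224])
print(output_logits.shape)  # torch.Size([32, 3])
```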
So say if we just wanted to predict if a photo was a cat or a dog, we still might have this 4962 08:49:58,680 --> 08:50:03,880 same representation here because this is the image representation. However, the shape here 4963 08:50:03,880 --> 08:50:09,400 may be two, or will be two because it's cat or dog, rather than three classes here, but a little 4964 08:50:09,400 --> 08:50:13,880 bit confusing as well with binary classification, you could have the shape just being one here. 4965 08:50:13,880 --> 08:50:19,880 But we're going to see this all hands on. Just remember, the shapes vary with whatever problem 4966 08:50:19,880 --> 08:50:26,920 you're working on. The principle of encoding your data as a numerical representation stays the same 4967 08:50:26,920 --> 08:50:33,080 for the inputs. And the outputs will often be some form of prediction probability based on whatever 4968 08:50:33,080 --> 08:50:39,880 class you're working with. So in the next video, right before we get into coding, let's just discuss 4969 08:50:39,880 --> 08:50:45,000 the high level architecture of a classification model. And remember, architecture is just like 4970 08:50:45,000 --> 08:50:52,200 the schematic of what a neural network is. I'll see you there. Welcome back. In the last video, 4971 08:50:52,200 --> 08:50:58,200 we saw some example classification inputs and outputs. The main takeaway that the inputs to a 4972 08:50:58,200 --> 08:51:02,760 classification model, particularly a neural network, want to be some form of numerical 4973 08:51:02,760 --> 08:51:09,160 representation. And the outputs are often some form of prediction probability. So let's discuss 4974 08:51:09,160 --> 08:51:13,720 the typical architecture of a classification model. And hey, this is just going to be text 4975 08:51:13,720 --> 08:51:19,240 on a page, but we're going to be building a fair few of these. So we've got some hyper parameters 4976 08:51:19,240 --> 08:51:26,280 over here. We've got binary classification. And we've got multi class classification. Now, 4977 08:51:26,280 --> 08:51:30,840 there are some similarities between the two in terms of what problem we're working with. 4978 08:51:30,840 --> 08:51:36,600 But there also are some differences here. And by the way, this has all come from, if we go 4979 08:51:36,600 --> 08:51:41,080 to the book version of the course, we've got what is a classification problem. And we've got 4980 08:51:41,080 --> 08:51:47,640 architecture of a classification neural network. So all of this text is available at learnpytorch.io 4981 08:51:47,640 --> 08:51:53,960 and in section two. So we come back. So the input layer shape, which is typically 4982 08:51:53,960 --> 08:51:59,960 decided by the parameter in features, as you can see here, is the same of number of features. 4983 08:51:59,960 --> 08:52:04,200 So if we were working on a problem, such as we brought it to predict whether someone had 4984 08:52:04,200 --> 08:52:09,560 heart disease or not, we might have five input features, such as one for age, a number for age, 4985 08:52:09,560 --> 08:52:16,360 it might be in my case, 28, sex could be male, height, 180 centimeters. If I've been growing 4986 08:52:16,360 --> 08:52:21,720 overnight, it's really close to 177. Wait, well, it depends on how much I've eaten, but it's around 4987 08:52:21,720 --> 08:52:27,480 about 75 kilos and smoking status, which is zero. So it could be zero or one, because remember, 4988 08:52:27,480 --> 08:52:33,160 we want numerical representation. 
So for sex, it could be zero for males, one for female, 4989 08:52:33,160 --> 08:52:37,240 height could be its number, weight could be its number as well. All of these numbers could be 4990 08:52:37,240 --> 08:52:43,480 more, could be less as well. So this is really flexible. And it's a hyper parameter. Why? Because 4991 08:52:43,480 --> 08:52:48,840 we decide the values for each of these. So in the case of our image prediction problem, 4992 08:52:48,840 --> 08:52:53,160 we could have in features equals three for number of color channels. And then we go 4993 08:52:54,520 --> 08:52:59,720 hidden layers. So there's the blue circle here. I forgot that this was all timed and colorful. 4994 08:52:59,720 --> 08:53:05,720 But let's just discuss hidden layers. Each of these is a layer and n dot linear and n dot linear 4995 08:53:05,720 --> 08:53:11,160 and n dot relu and n dot linear. So that's the kind of the syntax you'll see in PyTorch for a 4996 08:53:11,160 --> 08:53:15,960 layer is nn dot something. Now, there are many different types of layers in this in PyTorch. 4997 08:53:15,960 --> 08:53:22,840 If we go torch and n, basically everything in here is a layer in a neural network. And then if we 4998 08:53:22,840 --> 08:53:30,040 look up what a neural network looks like, neural network, recall that all of these are different 4999 08:53:30,040 --> 08:53:37,160 layers of some kind of mathematical operation. Input layer, hidden layer, you could have as 5000 08:53:37,160 --> 08:53:44,600 many hidden layers as you want. Do we have ResNet architecture? The ResNet architecture, 5001 08:53:44,600 --> 08:53:50,440 some of them have 50 layers. Look at this. Each one of these is a layer. And this is only the 5002 08:53:50,440 --> 08:53:56,120 34 layer version. I mean, there's ResNet 152, which is 152 layers. We're not at that yet. 5003 08:53:56,840 --> 08:54:02,600 But we're working up the tools to get to that stage. Let's come back to here. The neurons per 5004 08:54:02,600 --> 08:54:09,800 hidden layer. So we've got these, out features, the green circle, the green square. Now, this is, 5005 08:54:09,800 --> 08:54:17,080 if we go back to our neural network picture, this is these. Each one of these little things 5006 08:54:17,080 --> 08:54:24,360 is a neuron, some sort of parameter. So if we had 100, what would that look like? Well, 5007 08:54:24,360 --> 08:54:30,120 we'd have a fairly big graphic. So this is why I like to teach with code because you could customize 5008 08:54:30,120 --> 08:54:35,640 this as flexible as you want. So behind the scenes, PyTorch is going to create 100 of these little 5009 08:54:35,640 --> 08:54:40,600 circles for us. And within each circle is what? Some sort of mathematical operation. 5010 08:54:41,320 --> 08:54:45,960 So if we come back, what do we got next? Output layer shape. So this is how many output features 5011 08:54:45,960 --> 08:54:50,920 we have. So in the case of binary classification is one, one class or the other. We're going to 5012 08:54:50,920 --> 08:54:56,200 see this later on. Multi-class classification is you might have three output features, 5013 08:54:56,200 --> 08:55:02,520 one per class, e.g., one for food, person or dog, if you're building a food, person or dog, 5014 08:55:02,520 --> 08:55:08,680 image classification model. Hidden layer activation, which is, we haven't seen these yet. 5015 08:55:08,680 --> 08:55:15,000 Relu, which is a rectified linear unit, but can be many others because PyTorch, of course, has what? 
5016 08:55:15,000 --> 08:55:19,960 Has a lot of non-linear activations. We're going to see this later on. Remember, I'm kind of planting 5017 08:55:19,960 --> 08:55:25,560 the seed here. We've seen what a linear line is, but I want you to imagine what a non-linear line is. 5018 08:55:25,560 --> 08:55:30,280 It's going to be a bit of a superpower for our classification problem. What else do we have? 5019 08:55:30,280 --> 08:55:35,320 Output activation. We haven't got that here, but we'll also see this later on, which could be 5020 08:55:35,320 --> 08:55:41,000 sigmoid for, which is generally sigmoid for binary classification, but softmax for multi-class 5021 08:55:41,000 --> 08:55:45,560 classification. A lot of these things are just names on a page. We haven't seen them yet. 5022 08:55:45,560 --> 08:55:49,880 I like to teach them as we see them, but this is just a general overview of what we're going to 5023 08:55:49,880 --> 08:55:55,720 cover. Loss function. What loss function or what does a loss function do? It measures how 5024 08:55:55,720 --> 08:56:00,680 wrong our model's predictions are compared to what the ideal predictions are. So for binary 5025 08:56:00,680 --> 08:56:06,440 classification, we might use binary cross entropy loss in PyTorch, and for multi-class 5026 08:56:06,440 --> 08:56:12,520 classification, we might just use cross entropy rather than binary cross entropy. Get it? 5027 08:56:12,520 --> 08:56:18,840 Binary classification? Binary cross entropy? And then optimizer. SGD is stochastic gradient descent. 5028 08:56:18,840 --> 08:56:24,280 We've seen that one before. Another common option is the atom optimizer, and of course, 5029 08:56:24,280 --> 08:56:32,440 the torch.optim package has plenty more options. So this is an example multi-class classification 5030 08:56:32,440 --> 08:56:37,160 problem. This network here. Why is that? And we haven't actually seen an end up sequential, 5031 08:56:37,160 --> 08:56:41,400 but as you could imagine, sequential stands for it just goes through each of these steps. 5032 08:56:42,120 --> 08:56:47,080 So multi-class classification, because it has three output features, more than one thing or 5033 08:56:47,080 --> 08:56:53,240 another. So three for food, person or dog, but going back to our food vision problem, 5034 08:56:53,240 --> 08:57:00,040 we could have the input as sushi, steak, or pizza. So we've got three output features, 5035 08:57:00,040 --> 08:57:06,600 which would be one prediction probability per class of image. We have three classes, sushi, 5036 08:57:06,600 --> 08:57:13,320 steak, or pizza. Now, I think we've done enough talking here, and enough just pointing to text 5037 08:57:13,320 --> 08:57:20,760 on slides. How about in the next video? Let's code. I'll see you in Google CoLab. 5038 08:57:22,440 --> 08:57:28,200 Welcome back. Now, we've done enough theory of what a classification problem is, what the inputs 5039 08:57:28,200 --> 08:57:33,160 and outputs are and the typical architecture. Let's get in and write some code. So I'm going to 5040 08:57:33,960 --> 08:57:42,280 get out of this, and going to go to colab.research.google.com, so we can start writing some PyTorch code. 5041 08:57:42,280 --> 08:57:48,760 I'm going to click new notebook. We're going to start exactly from scratch. I'm going to name this 5042 08:57:48,760 --> 08:57:58,840 section two, and let's call it neural network classification with PyTorch. 
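Pulling those pieces together, a typical setup might be sketched like this (the layer sizes, learning rates and loss choices below are illustrative examples of what's being described, not fixed values from the course):

    import torch
    from torch import nn

    # Example multi-class setup: 3 output features -> one value per class (e.g. food, person, dog)
    model = nn.Sequential(
        nn.Linear(in_features=3, out_features=8),
        nn.ReLU(),                                  # a non-linear activation between layers
        nn.Linear(in_features=8, out_features=3),
    )

    loss_fn = nn.CrossEntropyLoss()                 # multi-class classification
    # a binary problem would use a binary cross entropy variant, e.g. nn.BCEWithLogitsLoss()
    optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)
    # optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)  # another common choice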
I'm going to put 5043 08:57:58,840 --> 08:58:04,840 underscore video, because I'll just show you, you'll see this in the GitHub repo. But for all the 5044 08:58:04,840 --> 08:58:09,560 video notebooks, the ones that I write code during these videos that you're watching, the exact code 5045 08:58:09,560 --> 08:58:14,760 is going to be saved on the GitHub repo under video notebooks. So there's 00, which is the 5046 08:58:14,760 --> 08:58:19,160 fundamentals, and there's the workflow underscore video. But the reference notebook with all the 5047 08:58:19,160 --> 08:58:26,520 pretty pictures and stuff is in the main folder here. So PyTorch classification that I pi and b 5048 08:58:26,520 --> 08:58:32,280 are actually, maybe we'll just rename it that PyTorch classification. But we know it's with 5049 08:58:32,280 --> 08:58:40,120 neural networks. PyTorch classification. Okay, and let's go here. We'll add a nice title. So O2, 5050 08:58:40,120 --> 08:58:49,480 neural network classification with PyTorch. And so we'll remind ourselves, classification is a 5051 08:58:49,480 --> 08:59:00,520 problem of predicting whether something is one thing or another. And there can be multiple 5052 08:59:02,040 --> 08:59:09,800 things as the options, such as email, spam or not spam, photos of dogs or cats or pizza or 5053 08:59:09,800 --> 08:59:19,720 sushi or steak. Lots of talk about food. And then I'm just going to link in here, this resource, 5054 08:59:19,720 --> 08:59:25,560 because this is the book version of the course. These are what the videos are based off. So book 5055 08:59:25,560 --> 08:59:34,680 version of this notebook. And then all the resources are in here. All other resources 5056 08:59:34,680 --> 08:59:46,520 in the GitHub, and then stuck. Ask a question here, which is under the discussions tab. We'll 5057 08:59:46,520 --> 08:59:51,960 copy that in here. That way we've got everything linked and ready to go. But as always, what's our 5058 08:59:51,960 --> 08:59:58,280 first step in our workflow? This is a little test. See if you remember. Well, it's data, of course, 5059 08:59:58,280 --> 09:00:03,240 because all machine learning problems start with some form of data. We can't write a machine 5060 09:00:03,240 --> 09:00:09,160 learning algorithm to learn patterns and data that doesn't exist. So let's do this video. We're 5061 09:00:09,160 --> 09:00:14,760 going to make some data. Of course, you might start with some of your own that exists. But for now, 5062 09:00:14,760 --> 09:00:18,840 we're going to focus on just the concepts around the workflow. So we're going to make our own 5063 09:00:18,840 --> 09:00:24,600 custom data set. And to do so, I'll write the code first, and then I'll show you where I get it from. 5064 09:00:24,600 --> 09:00:29,960 We're going to import the scikit loan library. One of the beautiful things about Google Colab 5065 09:00:29,960 --> 09:00:36,760 is that it has scikit loan available. You're not sure what scikit loan is. It's a very popular 5066 09:00:36,760 --> 09:00:42,120 machine learning library. PyTorch is mainly focused on deep learning, but scikit loan is 5067 09:00:42,120 --> 09:00:47,400 focused on a lot of things around machine learning. So Google Colab, thank you for having scikit 5068 09:00:47,400 --> 09:00:53,320 loan already installed for us. But we're going to import the make circles data set. And rather 5069 09:00:53,320 --> 09:01:00,920 than talk about what it does, let's see what it does. So make 1000 samples. 
We're going to go N 5070 09:01:00,920 --> 09:01:10,040 samples equals 1000. And we're going to create circles. You might be wondering why circles. Well, 5071 09:01:10,040 --> 09:01:16,040 we're going to see exactly why circles later on. So X and Y, we're going to use this variable. 5072 09:01:16,040 --> 09:01:23,000 How would you say nomenclature as capital X and Y. Why is that? Because X is typically a matrix 5073 09:01:23,000 --> 09:01:32,040 features and labels. So let's go here. Mate circles. And we're going to make N samples. So 1000 different 5074 09:01:32,040 --> 09:01:36,600 samples. We're going to add some noise in there. Just put a little bit of randomness. Why not? 5075 09:01:36,600 --> 09:01:42,520 You can increase this as you want. I found that 0.03 is fairly good for what we're doing. And 5076 09:01:42,520 --> 09:01:46,680 then we're going to also pass in the random state variable, which is equivalent to sitting a random 5077 09:01:46,680 --> 09:01:53,400 or setting a random seed. So we're flavoring the randomness here. Wonderful. So now let's 5078 09:01:53,400 --> 09:02:00,040 have a look at the length of X, which should be what? And length of Y. Oh, we don't have Y 5079 09:02:00,040 --> 09:02:07,320 underscore getting a bit trigger happy with this keyboard here. 1000. So we have 1000 samples of 5080 09:02:07,320 --> 09:02:14,200 X caught with 1000 or paired with 1000 samples of Y features labels. So let's have a look at the 5081 09:02:14,200 --> 09:02:24,360 first five of X. So print first five samples of X. And then we'll put in here X. And we can index 5082 09:02:24,360 --> 09:02:33,240 on this five because we're adhering to the data, explorer's motto of visualize visualize visualize 5083 09:02:34,360 --> 09:02:39,640 first five samples of Y. And then we're going to go why same thing here. 5084 09:02:39,640 --> 09:02:47,480 Wonderful. Let's have a look. Maybe we'll get a new line in here. Just so 5085 09:02:50,600 --> 09:02:57,160 looks a bit better. Wonderful. So numerical. Our samples are already numerical. This is one of 5086 09:02:57,160 --> 09:03:01,640 the reasons why we're creating our own data set. We'll see later on how we get non numerical data 5087 09:03:01,640 --> 09:03:07,800 into numbers. But for now, our data is numerical, which means we can learn it with our model or 5088 09:03:07,800 --> 09:03:14,440 we can build a model to learn patterns in here. So this sample has the label of one. And this 5089 09:03:14,440 --> 09:03:19,880 sample has the label of one as well. Now, how many features do we have per sample? If I highlight 5090 09:03:19,880 --> 09:03:26,520 this line, how many features is this? It would make it a bit easier if there was a comma here, 5091 09:03:26,520 --> 09:03:33,960 but we have two features of X, which relates to one label of Y. And so far, we've only seen, 5092 09:03:33,960 --> 09:03:39,800 let's have a look at all of Y. We've got zero on one. So we've got two classes. What does this 5093 09:03:39,800 --> 09:03:46,760 mean? Zero or one? One thing or another? Well, it looks like binary classification to me, 5094 09:03:46,760 --> 09:03:52,120 because we've got only zero or only one. If there was zero, one, two, it would be 5095 09:03:53,160 --> 09:03:57,640 multi class classification, because we have more than two things. So let's X out of this. 5096 09:03:57,640 --> 09:04:03,800 Let's keep going and do a little bit more data exploration. So how about we make a data frame? 5097 09:04:03,800 --> 09:04:11,960 With pandas of circle data. 
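Before moving on to the DataFrame, here is roughly what the data creation so far looks like as code (the random_state value is just a placeholder seed):

    from sklearn.datasets import make_circles

    n_samples = 1000
    X, y = make_circles(n_samples=n_samples,
                        noise=0.03,        # a little bit of randomness
                        random_state=42)   # placeholder seed so the randomness is repeatable

    print(len(X), len(y))                  # 1000 1000
    print(f"First 5 samples of X:\n{X[:5]}")
    print(f"First 5 samples of y:\n{y[:5]}")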
There is truly no real definite way of how to explore data. 5098 09:04:11,960 --> 09:04:17,800 For me, I like to visualize it multiple different ways, or even look at random samples. In the case 5099 09:04:17,800 --> 09:04:26,040 of large data sets, such as images or text or whatnot. If you have 10 million samples, perhaps 5100 09:04:26,040 --> 09:04:34,200 visualizing them one by one is not the best way to do so. So random can help you out there. 5101 09:04:34,200 --> 09:04:39,800 So we're going to create a data frame, and we can insert a dictionary here. So I'm going to call 5102 09:04:39,800 --> 09:04:47,960 the features in this part of X, X1, and these are going to be X2. So let's say I'll write some code 5103 09:04:47,960 --> 09:04:58,360 to index on this. So everything in the zero index will be X1. And everything in the first index, 5104 09:04:59,000 --> 09:05:04,280 there we go, will be X2. Let me clean up this code. This should be on different lines, 5105 09:05:05,160 --> 09:05:15,720 enter. And then we've got, let's put in the label as Y. So this is just a dictionary here. 5106 09:05:15,720 --> 09:05:22,600 So X1 key to X0. X2, a little bit confusing because of zero indexing, but X feature one, 5107 09:05:22,600 --> 09:05:28,760 X feature two, and the label is Y. Let's see what this looks like. We'll look at the first 10 samples. 5108 09:05:29,800 --> 09:05:36,920 Okay, beautiful. So we've got X1, some numerical value, X2, another numerical value, correlates 5109 09:05:36,920 --> 09:05:46,600 to or matches up with label zero. But then this one, 0442208, and negative that number matches up 5110 09:05:46,600 --> 09:05:52,840 with label zero. So I can't tell what the patterns are just looking at these numbers. You might be 5111 09:05:52,840 --> 09:05:57,240 able to, but I definitely can't. We've got some ones. All these numbers look the same to me. So 5112 09:05:57,880 --> 09:06:04,040 what can we do next? Well, how about we visualize, visualize, visualize, and instead of just numbers 5113 09:06:04,040 --> 09:06:11,880 in a table, let's get graphical this time, visualize, visualize, visualize. So we're going to bring in 5114 09:06:11,880 --> 09:06:19,560 our friendly mapplotlib, import mapplotlib, which is a very powerful plotting library. I'm just 5115 09:06:19,560 --> 09:06:28,440 going to add some cells here. So we've got some space, mapplotlib.pyplot as PLT. That's right. 5116 09:06:28,440 --> 09:06:35,240 We've got this plot.scatter. We're going to do a scatterplot equals X. And we want the first index. 5117 09:06:35,880 --> 09:06:43,160 And then Y is going to be X as well. So that's going to appear on the Y axis. And then we want to 5118 09:06:43,160 --> 09:06:48,520 color it with labels. We're going to see what this looks like in a second. And then the color map, 5119 09:06:49,720 --> 09:06:56,760 C map stands for color map is going to be plot dot color map PLT. And then red, yellow, blue, 5120 09:06:56,760 --> 09:07:01,480 one of my favorite color outputs. So let's see what this looks like. You ready? 5121 09:07:03,720 --> 09:07:10,680 Ah, there we go. There's our circles. That's a lot better for me. So what do you think we're 5122 09:07:10,680 --> 09:07:15,320 going to try and do here? If this is our data and we're working on classification, 5123 09:07:16,360 --> 09:07:22,200 we're trying to predict if something is one thing or another. So our problem is we want to 5124 09:07:22,200 --> 09:07:29,080 try and separate these two circles. 
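For reference, a sketch of the DataFrame and scatter plot just built (assuming X and y from make_circles above):

    import pandas as pd
    import matplotlib.pyplot as plt

    circles = pd.DataFrame({"X1": X[:, 0],
                            "X2": X[:, 1],
                            "label": y})
    print(circles.head(10))

    plt.scatter(x=X[:, 0], y=X[:, 1], c=y, cmap=plt.cm.RdYlBu)
    plt.show()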
So say given a number here or given two numbers and X one 5125 09:07:29,080 --> 09:07:34,520 and an X two, which are coordinates here, we want to predict the label. Is it going to be a blue 5126 09:07:34,520 --> 09:07:40,920 dot or is it going to be a red dot? So we're working with binary classification. So we have 5127 09:07:40,920 --> 09:07:46,920 one thing or another. Do we have a blue dot or a red dot? So this is going to be our toy data here. 5128 09:07:46,920 --> 09:07:50,840 And a toy problem is, let me just write this down. This is a common thing that you'll also 5129 09:07:50,840 --> 09:08:01,400 hear in machine learning. Note, the data we're working with is often referred to as a toy data set, 5130 09:08:02,520 --> 09:08:15,960 a data set that is small enough to experiment on, but still sizable enough to practice the 5131 09:08:15,960 --> 09:08:20,760 fundamentals. And that's what we're really after in this notebook is to practice the fundamentals 5132 09:08:20,760 --> 09:08:27,000 of neural network classification. So we've got a perfect data set to do this. And by the way, 5133 09:08:27,000 --> 09:08:32,200 we've got this from scikit-learn. So this little function here made all of these samples for us. 5134 09:08:32,760 --> 09:08:38,520 And how could you find out more about this function here? Well, you could go scikit-learn 5135 09:08:39,080 --> 09:08:43,720 classification data sets. There are actually a few more in here that we could have done. 5136 09:08:43,720 --> 09:08:49,160 I just like the circle one. Toy data sets, we saw that. So this is like a toy box of different 5137 09:08:49,160 --> 09:08:54,280 data sets. So if you'd like to learn more about some data sets that you can have a look in here 5138 09:08:54,280 --> 09:08:59,400 and potentially practice on with neural networks or other forms of machine learning models from 5139 09:08:59,400 --> 09:09:04,360 scikit-learn, check out this scikit-learn. I can't speak highly enough. I know this is a pie-torch 5140 09:09:04,360 --> 09:09:08,840 course. We're not focused on this, but they kind of all come together in terms of the machine 5141 09:09:08,840 --> 09:09:12,840 learning and deep learning world. You might use something from scikit-learn, like we've done here, 5142 09:09:12,840 --> 09:09:17,160 to practice something. And then you might use pie-torch for something else, like what we're 5143 09:09:17,160 --> 09:09:23,160 doing here. Now, with that being said, what are the input and output shapes of our problem? 5144 09:09:25,480 --> 09:09:30,280 Have a think about that. And also have a think about how we'd split this into training and test. 5145 09:09:31,800 --> 09:09:36,280 So give those a go. We covered those concepts in some previous videos, 5146 09:09:36,280 --> 09:09:39,560 but we'll do them together in the next video. I'll see you there. 5147 09:09:39,560 --> 09:09:46,760 Welcome back. In the last video, we made some classification data so that we can 5148 09:09:46,760 --> 09:09:52,920 practice building a neural network in pie-torch to separate the blue dots from the red dots. 5149 09:09:52,920 --> 09:09:56,920 So let's keep pushing forward on that. And I'll just clean up here a little bit, 5150 09:09:56,920 --> 09:10:02,360 but where are we in our workflow? What have we done so far? Well, we've got our data ready a 5151 09:10:02,360 --> 09:10:06,840 little bit. We haven't turned it into tenses. So let's do that in this video, and then we'll 5152 09:10:06,840 --> 09:10:14,920 keep pushing through all of these. 
So in here, I'm going to make this heading 1.1. Check input 5153 09:10:14,920 --> 09:10:19,880 and output shapes. The reason we're focused a lot on input and output shapes is why, 5154 09:10:20,520 --> 09:10:27,480 because machine learning deals a lot with numerical representations as tenses. And input and output 5155 09:10:27,480 --> 09:10:32,360 shapes are some of the most common errors, like if you have a mismatch between your input and 5156 09:10:32,360 --> 09:10:36,760 output shapes of a certain layer of an output layer, you're going to run into a lot of errors 5157 09:10:36,760 --> 09:10:42,360 there. So that's why it's good to get acquainted with whatever data you're using, what are the 5158 09:10:42,360 --> 09:10:50,760 input shapes and what are the output shapes you'd like. So in our case, we can go x dot shape 5159 09:10:50,760 --> 09:10:56,840 and y dot shape. So we're working with NumPy arrays here if we just look at x. That's what the 5160 09:10:56,840 --> 09:11:02,040 make circles function is created for us. We've got an array, but as our workflow says, 5161 09:11:02,040 --> 09:11:06,760 we'd like it in tenses. If we're working with PyTorch, we want our data to be represented as 5162 09:11:06,760 --> 09:11:13,080 PyTorch tenses of that data type. And so we've got a shape here, we've got a thousand samples, 5163 09:11:13,080 --> 09:11:19,000 and x has two features, and y has no features. It's just a single number. It's a scalar. So it 5164 09:11:19,000 --> 09:11:23,960 doesn't have a shape here. So there's a thousand samples of y, thousand samples of x, two samples 5165 09:11:23,960 --> 09:11:30,520 of x equals one y label. Now, if you're working with a larger problem, you might have a thousand 5166 09:11:30,520 --> 09:11:38,840 samples of x, but x is represented by 128 different numbers, or 200 numbers, or as high as you want, 5167 09:11:38,840 --> 09:11:44,760 or just 10 or something like that. So just keep in mind that this number is quite flexible of how 5168 09:11:44,760 --> 09:11:52,440 many features represent a label. Why is the label here? But let's keep going. So view the first 5169 09:11:52,440 --> 09:12:01,000 example of features and labels. So let's make it explicit with what we've just been discussing. 5170 09:12:01,000 --> 09:12:06,680 We'll write some code to do so. We'll get the first sample of x, which is the zero index, 5171 09:12:06,680 --> 09:12:13,480 and we'll get the first sample of y, which is also the zero index. We could get really anyone 5172 09:12:13,480 --> 09:12:22,200 because they're all of the same shape. But print values for one sample of x. What does this equal? 5173 09:12:22,200 --> 09:12:34,840 X sample, and the same for y, which is y sample. And then we want to go print f string for one 5174 09:12:34,840 --> 09:12:45,560 sample of x. We'll get the shape here. X sample dot shape, and the same for y, and then we'll get 5175 09:12:45,560 --> 09:12:53,480 y sample dot shape. Beautiful. What's this going to do? Well, we've got one sample of x. So this 5176 09:12:53,480 --> 09:13:02,520 sample here of these numbers, we've got a lot going on here. 75424625 and 0231 48074. I mean, 5177 09:13:02,520 --> 09:13:07,880 you can try to find some patterns in those. If you do, all the best here, and the same for y. So this 5178 09:13:07,880 --> 09:13:14,280 is, we have the y sample, this correlates to a number one, a label of one. And then we have 5179 09:13:14,280 --> 09:13:20,680 shapes for one sample of x, which is two. So we have two features for y. 
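In code, that shape check looks something like this (still on the NumPy arrays from make_circles):

    print(X.shape, y.shape)   # (1000, 2) (1000,) -> two features of X per single y label

    X_sample = X[0]
    y_sample = y[0]
    print(f"Values for one sample of X: {X_sample} and the same for y: {y_sample}")
    print(f"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}")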
It's a little bit confusing 5180 09:13:20,680 --> 09:13:26,040 here because y is a scalar, which doesn't actually have a shape. It's just one value. So for me, 5181 09:13:26,040 --> 09:13:31,320 in terms of speaking this, teaching it out loud, we'll be two features of x trying to predict 5182 09:13:31,320 --> 09:13:39,640 one number for y. And so let's now create another heading, which is 1.2. Let's get our data into 5183 09:13:39,640 --> 09:13:46,040 tenses, turn data into tenses. We have to convert them from NumPy. And we also want to create 5184 09:13:46,040 --> 09:13:51,000 train and test splits. Now, even though we're working with a toy data set here, the principle 5185 09:13:51,000 --> 09:13:57,480 of turning data into tenses and creating train and test splits will stay around for almost any 5186 09:13:57,480 --> 09:14:02,520 data set that you're working with. So let's see how we can do that. So we want to turn data 5187 09:14:02,520 --> 09:14:11,240 into tenses. And for this, we need to import torch, get pytorch and we'll check the torch version. 5188 09:14:11,240 --> 09:14:20,600 It has to be at least 1.10. And I might just put this down in the next cell. Just make sure we can 5189 09:14:20,600 --> 09:14:27,160 import pytorch. There we go, 1.10 plus CUDA 111. If your version is higher than that, that is okay. 5190 09:14:27,160 --> 09:14:33,640 The code below should still work. And if it doesn't, let me know. So x equals torch dot 5191 09:14:34,760 --> 09:14:42,600 from NumPy. Why are we doing this? Well, it's because x is a NumPy array. And if we go x dot, 5192 09:14:42,600 --> 09:14:52,760 does it have a d type attribute float 64? Can we just go type or maybe type? Oh, there we go. 5193 09:14:52,760 --> 09:15:01,160 NumPy and DRA. We can just go type x. NumPy and DRA. So we want it in a torch tensor. So we're 5194 09:15:01,160 --> 09:15:05,880 going to go from NumPy. We saw this in the fundamental section. And then we're going to change it into 5195 09:15:05,880 --> 09:15:12,760 type torch dot float. A float is an alias for float 32. We could type the same thing. These two are 5196 09:15:12,760 --> 09:15:19,000 equivalent. I just going to type torch float for writing less code. And then we're going to go 5197 09:15:19,000 --> 09:15:24,120 the same with why torch from NumPy. Now, why do we turn it into a torch float? Well, that's 5198 09:15:24,120 --> 09:15:33,240 because if you recall, the default type of NumPy arrays is, if we go might just put out this in 5199 09:15:33,240 --> 09:15:41,560 a comma x dot D type is float 64. There we go. However, pytorch, the default type is float 32. 5200 09:15:41,560 --> 09:15:46,360 So we're changing it into pytorch's default type. Otherwise, if we didn't have this little 5201 09:15:46,360 --> 09:15:51,880 section of code here dot type torch dot float, our tensors would be of float 64 as well. And that 5202 09:15:51,880 --> 09:15:58,200 may cause errors later on. So we're just going for the default data type within pytorch. And so 5203 09:15:58,200 --> 09:16:04,920 now let's have a look at the first five values of x and the first five values of y. What do we 5204 09:16:04,920 --> 09:16:11,800 have? Beautiful. We have tensor data types here. And now if we check the data type of x and we 5205 09:16:11,800 --> 09:16:19,640 check the data type of y, what do we have? And then one more, we'll just go type x. So we have 5206 09:16:19,640 --> 09:16:27,000 our data into tensors. Wonderful. But now so it's torch dot tensor. Beautiful. 
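The conversion just described, sketched out:

    import torch

    # NumPy arrays default to float64; PyTorch defaults to float32, so cast on the way in
    X = torch.from_numpy(X).type(torch.float)
    y = torch.from_numpy(y).type(torch.float)

    print(X[:5], y[:5])
    print(X.dtype, y.dtype)   # torch.float32 torch.float32
    print(type(X))            # <class 'torch.Tensor'>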
But now we would like 5207 09:16:27,000 --> 09:16:38,120 training and test sets. So let's go split data into training and test sets. And a very, very popular 5208 09:16:38,120 --> 09:16:45,160 way to split data is a random split. So before I issued the challenge of how you would split this 5209 09:16:45,160 --> 09:16:50,760 into a training and test set. So because these data points are kind of scattered all over the 5210 09:16:50,760 --> 09:16:58,680 place, we could split them randomly. So let's see what that looks like. To do so, I'm going to 5211 09:16:58,680 --> 09:17:05,000 use our faithful scikit learn again. Remember how I said scikit learn has a lot of beautiful methods 5212 09:17:05,000 --> 09:17:09,400 and functions for a whole bunch of different machine learning purposes. Well, one of them is 5213 09:17:09,400 --> 09:17:16,040 for a train test split. Oh my goodness, pytorch I didn't want auto correct there. Train test split. 5214 09:17:16,040 --> 09:17:20,520 Now you might be able to guess what this does. These videos are going to be a battle between me and 5215 09:17:20,520 --> 09:17:26,200 code labs auto correct. Sometimes it's good. Other times it's not. So we're going to set this code 5216 09:17:26,200 --> 09:17:31,080 up. I'm going to write it or we're going to write it together. So we've got x train for our training 5217 09:17:31,080 --> 09:17:36,360 features and X tests for our testing features. And then we also want our training labels and 5218 09:17:36,360 --> 09:17:43,080 our testing labels. That order is the order that train test split works in. And then we have train 5219 09:17:43,080 --> 09:17:48,200 test split. Now if we wrote this function and we wanted to find out more, I can press command 5220 09:17:48,200 --> 09:17:53,800 ship space, which is what I just did to have this. But truly, I don't have a great time reading all 5221 09:17:53,800 --> 09:18:01,160 of this. You might. But for me, I just like going train test split. And possibly one of the first 5222 09:18:01,160 --> 09:18:06,760 functions that appears, yes, is scikit learn. How good is that? So scikit learn dot model selection 5223 09:18:06,760 --> 09:18:13,800 dot train test split. Now split arrays or matrices into random train and test subsets. Beautiful. 5224 09:18:13,800 --> 09:18:19,000 We've got a code example of what's going on here. You can read what the different parameters do. 5225 09:18:19,000 --> 09:18:23,880 But we're going to see them in action. This is just another example of where machine learning 5226 09:18:23,880 --> 09:18:29,080 libraries such as scikit learn, we've used matplotlib, we've used pandas, they all interact 5227 09:18:29,080 --> 09:18:34,360 together to serve a great purpose. But now let's pass in our features and our labels. 5228 09:18:35,320 --> 09:18:41,560 This is the order that they come in, by the way. Oh, and we have the returns splitting. So the 5229 09:18:41,560 --> 09:18:47,320 order here, I've got the order goes x train x test y train y test took me a little while to 5230 09:18:47,320 --> 09:18:51,560 remember this order. But once you've created enough training test splits with this function, 5231 09:18:51,560 --> 09:18:56,280 you kind of know this off by heart. So just remember features first train first and then labels. 5232 09:18:57,320 --> 09:19:02,680 And we jump back in here. So I'm going to put in the test size parameter of 0.2. 5233 09:19:02,680 --> 09:19:10,200 This is percentage wise. So let me just write here 0.2 equals 20% of data will be test. 
5234 09:19:10,200 --> 09:19:19,240 And 80% will be train. If we wanted to do a 50 50 split, that kind of split doesn't usually 5235 09:19:19,240 --> 09:19:27,080 happen, but you could go 0.5. But the test size says, hey, how big and percentage wise do you want 5236 09:19:27,080 --> 09:19:33,880 your test data to be? And so behind the scenes train test split will calculate what's 20% of 5237 09:19:33,880 --> 09:19:39,400 our x and y samples. So we'll see how many there is in a second. But let's also put a random state 5238 09:19:39,400 --> 09:19:45,640 in here. Because if you recall back in the documentation, train test split splits data 5239 09:19:45,640 --> 09:19:52,040 randomly into random train and test subsets. And random state, what does that do for us? Well, 5240 09:19:52,040 --> 09:20:00,040 this is a random seed equivalent of very similar to torch dot manual seed. However, because we are 5241 09:20:00,040 --> 09:20:07,000 using scikit learn, setting torch dot manual seed will only affect pytorch code rather than 5242 09:20:07,000 --> 09:20:14,040 scikit learn code. So we do this so that we get similar random splits. As in, I get a similar 5243 09:20:14,040 --> 09:20:20,680 random split to what your random split is. And in fact, they should be exactly the same. So let's 5244 09:20:20,680 --> 09:20:30,360 run this. And then we'll check the length of x train. And length of x test. So if we have 1000 5245 09:20:30,360 --> 09:20:35,800 total samples, and I know that because above in our make circles function, we said we want 5246 09:20:35,800 --> 09:20:40,440 1000 samples, that could be 10,000, that could be 100. That's the beauty of creating your own 5247 09:20:40,440 --> 09:20:47,480 data set. And we have length y train. If we have 20% testing values, how many samples are going 5248 09:20:47,480 --> 09:20:57,080 to be dedicated to the test sample, 20% of 1000 years, 200, and 80%, which is because training is 5249 09:20:57,080 --> 09:21:06,280 going to be training here. So 100 minus 20% is 80%. So 80% of 1000 years, let's find out. 5250 09:21:07,480 --> 09:21:15,480 Run all beautiful. So we have 800 training samples, 200 testing samples. This is going to be the 5251 09:21:15,480 --> 09:21:20,600 data set that we're going to be working with. So in the next video, we've now got training and 5252 09:21:20,600 --> 09:21:26,520 test sets, we've started to move through our beautiful pytorch workflow here. We've got our 5253 09:21:26,520 --> 09:21:30,840 data ready, we've turned it into tenses, we've created a training and test split. Now it's time 5254 09:21:30,840 --> 09:21:36,120 to build or pick a model. So I think we're still in the building phase. Let's do that in the next 5255 09:21:36,120 --> 09:21:45,080 video. Welcome back. In the last video, we split our data into training and test sets. And because 5256 09:21:45,080 --> 09:21:51,960 we did 80 20 split, we've got about 800 samples to train on, and 200 samples to test on. Remember, 5257 09:21:51,960 --> 09:21:59,160 the training set is so that the model can learn patterns, patterns that represent this data set 5258 09:21:59,160 --> 09:22:04,680 here, the circles data set, red dots or blue dots. And the test data set is so that we can 5259 09:22:04,680 --> 09:22:10,200 evaluate those patterns. And I took a little break before, but you can tell that because my 5260 09:22:10,200 --> 09:22:15,080 notebook is disconnected. But if I wanted to reconnect it, what could I do? 
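Putting that split into code (the random_state value below is an example seed, used so the split is reproducible):

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                         test_size=0.2,    # 20% test, 80% train
                                                         random_state=42)  # example seed

    print(len(X_train), len(X_test), len(y_train), len(y_test))            # 800 200 800 200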
We can go here, 5261 09:22:15,080 --> 09:22:19,800 run time, run before that's going to run all of the cells before. It shouldn't take too long 5262 09:22:19,800 --> 09:22:25,560 because we haven't done any large computations. But this is good timing because we're up to part 5263 09:22:25,560 --> 09:22:32,680 two, building a model. And so there's a fair few steps here, but nothing that we haven't covered 5264 09:22:32,680 --> 09:22:43,160 before, we're going to break it down. So let's build a model to classify our blue and red dots. 5265 09:22:43,160 --> 09:22:55,720 And to do so, we want to tenses. I want to not tenses. That's all right. So let me just make 5266 09:22:55,720 --> 09:23:01,560 some space here. There we go. So number one, let's set up device agnostic code. So we get in the 5267 09:23:01,560 --> 09:23:11,160 habit of creating that. So our code will run on an accelerator. I can't even spell accelerator. 5268 09:23:11,160 --> 09:23:19,320 It doesn't matter. You know what I mean? GPU. If there is one. Two. What should we do next? 5269 09:23:20,040 --> 09:23:23,720 Well, we should construct a model. Because if we want to build a model, we need a model. 5270 09:23:24,280 --> 09:23:30,280 Construct a model. And we're going to go by subclassing and then dot module. 5271 09:23:31,160 --> 09:23:36,200 Now we saw this in the previous section, we subclassed and then module. In fact, all models 5272 09:23:36,200 --> 09:23:45,560 in PyTorch subclass and end up module. And let's go define loss function and optimizer. 5273 09:23:47,480 --> 09:23:55,240 And finally, good collabs auto correct is not ideal. And then we'll create a training 5274 09:23:56,200 --> 09:24:00,600 and test loop. Though this will probably be in the next section. We'll focus on building a model 5275 09:24:00,600 --> 09:24:05,400 here. And of course, all of these steps are in line with what they're in line with this. 5276 09:24:05,400 --> 09:24:09,320 So we don't have device agnostic code here, but we're just going to do it enough so that we have 5277 09:24:09,320 --> 09:24:13,640 a habit. These are the main steps. Pick or build a pre-trained model, suit your problem, pick a 5278 09:24:13,640 --> 09:24:19,640 loss function and optimizer, build a training loop. So let's have a look. How can we start this off? 5279 09:24:19,640 --> 09:24:26,680 So we will import PyTorch. And and then we've already done this, but we're going to do it anyway 5280 09:24:26,680 --> 09:24:34,280 for completeness, just in case you wanted to run your code from here, import and then. And we're 5281 09:24:34,280 --> 09:24:44,280 going to make device agnostic code. So we'll set the device equal to CUDA if torch dot CUDA 5282 09:24:45,480 --> 09:24:53,400 is available else CPU, which will be the default. The CPU is the default. If there's no GPU, 5283 09:24:53,400 --> 09:24:58,840 which means that CUDA is available, all of our PyTorch code will default to using the CPU 5284 09:24:58,840 --> 09:25:05,560 device. Now we haven't set up a GPU yet so far. You may have, but as you see, my target device is 5285 09:25:05,560 --> 09:25:11,400 currently CPU. How about we set up a GPU? We can go into here runtime change runtime type 5286 09:25:11,960 --> 09:25:17,320 GPU. And I'm going to click save. Now this is going to restart the runtime and reconnect. 5287 09:25:18,360 --> 09:25:23,960 So once it reconnects beautiful, we could actually just run this code cell here. 
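The device-agnostic setup being described is only a couple of lines, roughly:

    import torch
    from torch import nn

    # Use the GPU if one is available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(device)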
5288 09:25:23,960 --> 09:25:29,640 This is going to set up the GPU device, but because we're only running this cell, if we were to just 5289 09:25:29,640 --> 09:25:37,080 set up X train, we've not been defined. So because we restarted our runtime, let's run all or we can 5290 09:25:37,080 --> 09:25:45,320 just run before. So this is going to rerun all of these cells here. And do we have X train now? 5291 09:25:45,320 --> 09:25:51,720 Let's have a look. Wonderful. Yes, we do. Okay, beautiful. So we've got device agnostic code. 5292 09:25:51,720 --> 09:25:56,680 In the next video, let's get on to constructing a model. I'll see you there. 5293 09:25:58,680 --> 09:26:04,040 Welcome back. In the last video, we set up some device agnostic code. So this is going to come in 5294 09:26:04,040 --> 09:26:09,480 later on when we send our model to the target device, and also our data to the target device. 5295 09:26:09,480 --> 09:26:13,160 This is an important step because that way, if someone else was able to run your code or you 5296 09:26:13,160 --> 09:26:17,560 were to run your code in the future, because we've set it up to be device agnostic, 5297 09:26:17,560 --> 09:26:21,720 quite a fault it will run on the CPU. But if there's an accelerator present, 5298 09:26:22,360 --> 09:26:27,480 well, that means that it might go faster because it's using a GPU rather than just using a CPU. 5299 09:26:28,040 --> 09:26:32,920 So we're up to step two here construct a model by subclassing and in module. I think we're going to 5300 09:26:33,880 --> 09:26:38,600 write a little bit of text here just to plan out the steps that we're doing. Now we've 5301 09:26:38,600 --> 09:26:48,920 set up device agnostic code. Let's create a model that we're going to break it down. We've got some 5302 09:26:48,920 --> 09:26:55,480 sub steps up here. We're going to break it down even this one down into some sub-sub steps. So number 5303 09:26:55,480 --> 09:27:04,440 one is we're going to subclass and then got module. And a reminder here, I want to make some space, 5304 09:27:04,440 --> 09:27:11,160 just so we're coding in about the middle of the page. So almost all models in pytorch, 5305 09:27:11,720 --> 09:27:17,240 subclass, and then got module because there's some great things that it does for us behind the 5306 09:27:17,240 --> 09:27:29,400 scenes. And step two is we're going to create two and then dot linear layers. And we want these 5307 09:27:29,400 --> 09:27:37,760 that are capable to handle our data. So that are capable of handling the shapes of our data. 5308 09:27:37,760 --> 09:27:46,120 Step three, we want to define or defines a forward method. Why do we want to define a forward method? 5309 09:27:46,120 --> 09:27:52,640 Well, because we're subclassing an end dot module, right? And so the forward method defines a forward 5310 09:27:52,640 --> 09:28:07,520 method that outlines the forward pass or forward computation of the model. And number four, we want 5311 09:28:07,520 --> 09:28:12,960 to instantiate, well, this doesn't really have to be the part of creating it, but we're going to do 5312 09:28:12,960 --> 09:28:27,280 anyway, and instantiate an instance of our model class and send it to the target device. So I'm 5313 09:28:27,280 --> 09:28:31,440 going to be a couple of little different steps here, but nothing too dramatic that we haven't really 5314 09:28:31,440 --> 09:28:39,360 covered before. So let's go number one, construct a model that subclasses an end dot module. 
5315 09:28:39,360 --> 09:28:44,720 So I'm going to code this all out. Well, we're going to code this all out together. And then we'll 5316 09:28:44,720 --> 09:28:49,200 go back through and discuss it, and then maybe draw a few pictures or something to check out 5317 09:28:49,200 --> 09:28:56,000 what's actually happening. So circle model V zero, because we're going to try and split some circles, 5318 09:28:56,000 --> 09:29:02,800 red and blue circles. This is our data up here. This is why it's called circle model, because we're 5319 09:29:02,800 --> 09:29:09,200 trying to separate the blue and red circle using a neural network. So we've subclassed nn.Module. 5320 09:29:09,200 --> 09:29:14,880 And when we create a class in Python, we'll create a constructor here, __init__, and then 5321 09:29:15,520 --> 09:29:22,160 put in super().__init__(). And then inside the constructor, we're going to create our 5322 09:29:22,160 --> 09:29:27,920 layers. So this is number two, create two nn.Linear layers, capable of handling the shapes 5323 09:29:27,920 --> 09:29:39,120 of our data. So I'm going to write this down here: create two nn.Linear layers, capable 5324 09:29:39,120 --> 09:29:47,680 of handling the shapes of our data. And so if we have a look at X train, what are the shapes here? 5325 09:29:47,680 --> 09:29:52,400 What's the input shape? Because X train is our features, right? Now features are going to go 5326 09:29:52,400 --> 09:30:00,800 into our model. So we have 800 training samples. This is the first number here, of size two each. 5327 09:30:01,360 --> 09:30:07,600 So 800 of these, and inside each is two numbers. Again, depending on the data set you're working 5328 09:30:07,600 --> 09:30:13,920 with, your features may be 100 in length, a vector of 100, or maybe a different size tensor all 5329 09:30:13,920 --> 09:30:18,080 together, or there may be millions. It really depends on what data set you're working with. 5330 09:30:18,080 --> 09:30:21,920 Because we're working with a simple data set, we're going to focus on that. But the principle 5331 09:30:21,920 --> 09:30:27,440 is still the same. You need to define a neural network layer that is capable of handling your 5332 09:30:27,440 --> 09:30:33,920 input features. So we're going to make layer_1 equals nn.Linear. And then if we wanted 5333 09:30:33,920 --> 09:30:38,960 to find out what's going on in nn.Linear, we could run shift command space on my computer, 5334 09:30:38,960 --> 09:30:43,760 because it's a Mac, maybe shift control space if you're on Windows. So we're going to define the 5335 09:30:43,760 --> 09:30:50,240 in features. What would in features be here? Well, we just decided that our X has two features. 5336 09:30:50,240 --> 09:30:55,440 So in features is going to be two. And now what is the out features? This one is a little bit tricky. 5337 09:30:58,560 --> 09:31:04,480 So in our case, we could have out features equal to one if we wanted to just pass a single linear 5338 09:31:04,480 --> 09:31:10,400 layer, but we want to create two linear layers here. So why would out features be one? Well, 5339 09:31:10,400 --> 09:31:16,960 that's because if we have a look at the first sample of Y train, we would want it as the output, 5340 09:31:16,960 --> 09:31:25,040 or maybe we'll look at the first five. We want to map one sample of X to one sample of Y, and Y 5341 09:31:25,040 --> 09:31:29,600 has a shape of one.
Oh, well, really, it's nothing because it's a scalar, but we would still put 5342 09:31:29,600 --> 09:31:34,240 one here so that it outputs just one number. But we're going to change this up. We're going to put 5343 09:31:34,240 --> 09:31:40,320 it into five and we're going to create a second layer. Now, this is an important point of joining 5344 09:31:40,320 --> 09:31:47,040 together neural networks in features here. What do you think the in features of our second layer is 5345 09:31:47,040 --> 09:31:52,640 going to be? If we've produced an out feature of five here, now this number is arbitrary. We could 5346 09:31:52,640 --> 09:32:00,480 do 128. We could do 256. Generally, it's multiples of 8, 64. We're just doing five now because we're 5347 09:32:00,480 --> 09:32:04,880 keeping it nice and simple. We could do eight multiples of eight is because of the efficiency 5348 09:32:04,880 --> 09:32:09,360 of computing. I don't know enough about computer hardware to know exactly why that's the case, 5349 09:32:09,360 --> 09:32:14,800 but that's just a rule of thumb in machine learning. So the in features here has to match up with the 5350 09:32:14,800 --> 09:32:20,960 out features of a previous layer. Otherwise, we'll get shape mismatch errors. And so let's go here 5351 09:32:20,960 --> 09:32:26,000 out features. So we're going to treat this as the output layer. So this is the out features equals 5352 09:32:26,000 --> 09:32:35,840 one. So takes in two features and upscales to five features. So five numbers. So what this does, 5353 09:32:35,840 --> 09:32:41,760 what this layer is going to do is take in these two numbers of X, perform an end up linear. Let's 5354 09:32:41,760 --> 09:32:47,520 have a look at what equation it does. An end up linear is going to perform this function here 5355 09:32:48,240 --> 09:32:56,160 on the inputs. And it's going to upscale it to five features. Now, why would we do that? Well, 5356 09:32:56,160 --> 09:33:01,440 the rule of thumb here, because this is denoted as a hidden unit, or how many hidden neurons there 5357 09:33:01,440 --> 09:33:06,160 are. The rule of thumb is that the more hidden features there are, the more opportunity our model 5358 09:33:06,160 --> 09:33:11,360 has to learn patterns in the data. So to begin with, it only has two numbers to learn patterns on, 5359 09:33:11,360 --> 09:33:18,000 but at when we upscale it to five, it has five numbers to learn patterns on. Now, you might think, 5360 09:33:18,000 --> 09:33:22,400 why don't we just go straight to like 10,000 or something? But there is like an upper limit here 5361 09:33:22,400 --> 09:33:27,360 to sort of where the benefits start to trail off. We're just using five because it keeps it nice 5362 09:33:27,360 --> 09:33:32,960 and simple. And then the in features of the next layer is five, so that these two line up. We're 5363 09:33:32,960 --> 09:33:37,680 going to map this out visually in a moment, but let's keep coding. We've got in features two for 5364 09:33:37,680 --> 09:33:46,320 X. And now this is the output layer. So takes in five features from previous layer and outputs 5365 09:33:46,320 --> 09:33:55,600 a single feature. And now this is same shape. Same shape as why. So what is our next step? We 5366 09:33:55,600 --> 09:34:01,360 want to define a Ford method, a Ford computation of Ford pass. So the Ford method is going to 5367 09:34:01,360 --> 09:34:07,200 define the Ford computation. And as an input, it's going to take X, which is some form of data. 
5368 09:34:07,200 --> 09:34:13,440 And now here's where we can use layer one and layer two. So now let's just go return. 5369 09:34:14,720 --> 09:34:19,520 Or we'll put a note here of what we're doing. Three, we're going to go define a forward method 5370 09:34:19,520 --> 09:34:28,960 that outlines the forward pass. So forward, and we're going to return. And here's some notation we 5371 09:34:28,960 --> 09:34:34,000 haven't quite seen yet. And then we're going to go self.layer_2. And inside the brackets we'll 5372 09:34:34,000 --> 09:34:41,360 have self.layer_1, and inside those brackets we're going to have X. So the way this goes is X goes 5373 09:34:41,360 --> 09:34:50,480 into layer one. And then the output of layer one goes into layer two. So whatever data we have, 5374 09:34:50,480 --> 09:34:58,000 so our training data, X train, goes into layer one, performs the linear calculation here. And then 5375 09:34:58,000 --> 09:35:07,280 it goes into layer two. And then layer two is going to go to the output. So X is the input, 5376 09:35:07,280 --> 09:35:15,600 layer one computation, layer two output. So we've done that. Now let's do step four, which is 5377 09:35:16,400 --> 09:35:27,960 instantiate an instance of our model class, and send it to the target device. So this is our model 5378 09:35:27,960 --> 09:35:33,920 class, circle model V zero. We're just going to create a model because it's the first model we've 5379 09:35:33,920 --> 09:35:41,920 created in this section. Let's call it model zero. And we're going to go circle model V zero. And then 5380 09:35:41,920 --> 09:35:47,600 we're going to call .to. And we're going to pass in device, because that's our target device. 5381 09:35:47,600 --> 09:35:54,560 Let's now have a look at model zero. And then, oh, typo. Yeah, classic. 5382 09:35:54,560 --> 09:36:02,720 What did we get wrong here? Oh, did we not pass in self? Oh, there we go. 5383 09:36:06,480 --> 09:36:11,520 Little typo, classic. But the beautiful thing about creating a class here is that we could put 5384 09:36:11,520 --> 09:36:16,160 this into a Python file, such as model.py. And then we wouldn't necessarily have to rewrite 5385 09:36:16,160 --> 09:36:21,200 this all the time, we could just call it. And so let's just check what device it's on. 5386 09:36:21,200 --> 09:36:27,120 So target device is CUDA, because we've got a GPU, thank you, Google Colab. And then if we wanted 5387 09:36:27,120 --> 09:36:36,480 to, let's go next model zero dot parameters, we'll call the parameters, and then we'll go dot device. 5388 09:36:38,720 --> 09:36:44,320 CUDA, beautiful. So that means our model's parameters are on the CUDA device. Now we've covered enough 5389 09:36:44,320 --> 09:36:49,520 code in here for this video. So if you want to understand it a little bit more, go back through 5390 09:36:49,520 --> 09:36:54,320 it. But we're going to come back in the next video and make it a little bit more visual. So I'll see 5391 09:36:54,320 --> 09:37:02,080 you there. Welcome back. In the last video, we did something very, very exciting. We created our 5392 09:37:02,080 --> 09:37:07,680 first multi-layer neural network. But right now, this is just code on a page. But truly, this is 5393 09:37:07,680 --> 09:37:12,160 what the majority of building machine learning models in PyTorch is going to look like. You're 5394 09:37:12,160 --> 09:37:18,880 going to create some layers, as simple or as complex as you like.
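For reference, putting the description from the last few videos together, the model ends up looking something like this (assuming the device variable set up earlier):

    from torch import nn

    class CircleModelV0(nn.Module):
        def __init__(self):
            super().__init__()
            # 1. Two nn.Linear layers capable of handling the shapes of our data
            self.layer_1 = nn.Linear(in_features=2, out_features=5)  # 2 X features -> 5 hidden units
            self.layer_2 = nn.Linear(in_features=5, out_features=1)  # 5 hidden units -> 1 output (same shape as y)

        # 2. A forward method that outlines the forward pass
        def forward(self, x):
            return self.layer_2(self.layer_1(x))  # x -> layer_1 -> layer_2 -> output

    # 3. Instantiate an instance of the model class and send it to the target device
    model_0 = CircleModelV0().to(device)
    print(model_0)
    print(next(model_0.parameters()).device)  # should show the target device, e.g. cuda:0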
And then you're going to 5395 09:37:18,880 --> 09:37:26,160 use those layers in some form of Ford computation to create the forward pass. So let's make this a 5396 09:37:26,160 --> 09:37:32,240 little bit more visual. If we go over to the TensorFlow playground, and now TensorFlow is another 5397 09:37:32,240 --> 09:37:37,440 deep learning framework similar to PyTorch, it just allows you to write code such as this, 5398 09:37:38,240 --> 09:37:43,040 to build neural networks, fit them to some sort of data to find patterns and data, 5399 09:37:43,040 --> 09:37:50,320 and then use those machine learning models in your applications. But let's create this. Oh, 5400 09:37:50,320 --> 09:37:55,280 by the way, this is playground.tensorFlow.org. This is a neural network that we can train in 5401 09:37:55,280 --> 09:38:01,200 the browser if we really wanted to. So that's pretty darn cool. But we've got a data set here, 5402 09:38:01,200 --> 09:38:07,520 which is kind of similar to the data set that we're working with. We have a look at our circles one. 5403 09:38:07,520 --> 09:38:12,240 Let's just say it's close enough. It's circular. That's what we're after. But if we increase this, 5404 09:38:12,240 --> 09:38:17,280 we've got five neurons now. We've got two features here, X1 and X2. Where is this 5405 09:38:17,280 --> 09:38:21,120 reminding you of what's happening? There's a lot of things going on here that we haven't covered 5406 09:38:21,120 --> 09:38:25,920 yet, but don't worry too much. We're just focused on this neural network here. So we've got some 5407 09:38:25,920 --> 09:38:31,360 features as the input. We've got five hidden units. This is exactly what's going on with the model 5408 09:38:31,360 --> 09:38:38,000 that we just built. We pass in X1 and X2, our values. So if we go back to our data set, 5409 09:38:38,000 --> 09:38:46,720 these are X1 and X2. We pass those in. So we've got two input features. And then we pass them to a 5410 09:38:46,720 --> 09:38:52,240 hidden layer, a single hidden layer, with five neurons. What have we just built? If we come down 5411 09:38:52,240 --> 09:38:59,440 into here to our model, we've got in features two, out features five. And then that feeds into 5412 09:38:59,440 --> 09:39:05,360 another layer, which has in features five and out features one. So this is the exact same model 5413 09:39:05,360 --> 09:39:10,160 that we've built here. Now, if we just turn this back to linear activation, because we're sticking 5414 09:39:10,160 --> 09:39:15,280 with linear for now, we'll have a look at different forms of activation functions later on. And maybe 5415 09:39:15,280 --> 09:39:21,280 we put the learning rate, we've seen the learning rate to 0.01. We've got epochs here, got classification. 5416 09:39:21,920 --> 09:39:25,760 And we're going to try and fit this neural network to this data. Let's see what happens. 5417 09:39:25,760 --> 09:39:37,040 Oh, the test loss, it's sitting about halfway 0.5. So about 50% loss. So if we only have two 5418 09:39:37,040 --> 09:39:44,160 classes and we've got a loss of 50%, what does that mean? Well, the perfect loss was zero. 5419 09:39:44,800 --> 09:39:50,400 And the worst loss was one. Then we just divide one by two and get 50%. But we've only got two 5420 09:39:50,400 --> 09:39:57,280 classes. So that means if our model was just randomly guessing, it would get a loss of about 0.5, 5421 09:39:57,280 --> 09:40:01,520 because you could just randomly guess whatever data point belongs to blue or orange in this case. 
5422 09:40:02,080 --> 09:40:07,440 So in a binary classification problem, if you have the same number of samples in each class, 5423 09:40:07,440 --> 09:40:12,720 in this case, blue dots and orange dots, randomly guessing will get you about 50%. Just like tossing 5424 09:40:12,720 --> 09:40:18,080 a coin, toss a coin 100 times and you get about 50 50 might be a little bit different, but it's 5425 09:40:18,080 --> 09:40:24,240 around about that over the long term. So we've just fit for 3000 epochs. And we're still not getting 5426 09:40:24,240 --> 09:40:30,080 any better loss. Hmm. I wonder if that's going to be the case for our neural network. And so to 5427 09:40:30,080 --> 09:40:35,440 draw this in a different way, I'm going to come to a little tool called fig jam, which is just a 5428 09:40:35,440 --> 09:40:40,400 whiteboard that we can put shapes on and it's based on the browser. So this is going to be nothing 5429 09:40:40,400 --> 09:40:49,280 fancy. It's going to be a simple diagram. Say this is our input. And I'm going to make this green 5430 09:40:49,280 --> 09:40:54,240 because my favorite color is green. And then we're going to have, let's make some different 5431 09:40:54,240 --> 09:41:03,360 colored dots. I want a blue dot here. So this can be dot one, and dot two, I'll put another dot 5432 09:41:03,360 --> 09:41:11,120 here. I'll zoom out a little so we have a bit more space. Well, maybe that was too much. 50% 5433 09:41:11,120 --> 09:41:17,280 looks all right. So let me just move this around, move these up a little. So we're building a neural 5434 09:41:17,280 --> 09:41:24,240 network here. This is exactly what we just built. And so we'll go here. Well, maybe we'll put this 5435 09:41:24,240 --> 09:41:29,920 as input X one. So this will make a little bit more sense. And then we'll maybe we can copy this. 5436 09:41:29,920 --> 09:41:39,680 Now this is X two. And then we have some form of output. Let's make this one. And we're going to 5437 09:41:39,680 --> 09:41:48,400 color this orange. So output. Right. So you can imagine how we got connected dots here. 5438 09:41:48,400 --> 09:41:54,320 They will connect these. So our inputs are going to go through all of these. I wonder if I can 5439 09:41:54,320 --> 09:41:59,360 draw here. Okay, this is going to be a little bit more complex, but that's all right. So this 5440 09:41:59,360 --> 09:42:04,560 is what we've done. We've got two input features here. And if we wanted to keep drawing these, 5441 09:42:04,560 --> 09:42:09,280 we could all of these input features are going to go through all of these hidden units that we 5442 09:42:09,280 --> 09:42:14,480 have. I just drew the same arrow twice. That's okay. But this is what's happening in the forward 5443 09:42:14,480 --> 09:42:20,640 computation method. It can be a little bit confusing for when we coded it out. Why is that? Well, 5444 09:42:20,640 --> 09:42:27,040 from here, it looks like we've only got an input layer into a single hidden layer in the blue. 5445 09:42:27,040 --> 09:42:34,320 And an output layer. But truly, this is the same exact shape. You get the point. And then all of 5446 09:42:34,320 --> 09:42:40,480 these go to the output. But we're going to see this computationally later on. So whatever data set 5447 09:42:40,480 --> 09:42:45,040 you're working with, you're going to have to manufacture some form of input layer. Now this 5448 09:42:45,040 --> 09:42:51,040 may be you might have 10 of these if you have 10 features. 
Or four of them if you have four 5449 09:42:51,040 --> 09:42:58,320 features. And then if you wanted to adjust these, well, you could increase the number of hidden 5450 09:42:58,320 --> 09:43:03,520 units or the number of out features of a layer. What just has to match up is that the layer it's 5451 09:43:03,520 --> 09:43:09,360 going into has to have a similar shape as the what's coming out of here. So just keep that in mind 5452 09:43:09,360 --> 09:43:15,040 as you're going on. And in our case, we only have one output. So we have the output here, 5453 09:43:15,040 --> 09:43:20,880 which is why. So this is a visual version. We've got the TensorFlow playground. You could play 5454 09:43:20,880 --> 09:43:26,960 around with that. You can change this to increase. Maybe you want five hidden layers with five neurons 5455 09:43:26,960 --> 09:43:34,160 in each. This is a fun way to explore. This is a challenge, actually, go to playground.tensorflow.org, 5456 09:43:34,160 --> 09:43:38,960 replicate this network and see if it fits on this type of data. What do you think, will it? 5457 09:43:39,920 --> 09:43:43,840 Well, we're going to have to find out in the next few videos. So I'm going to show you in the 5458 09:43:43,840 --> 09:43:50,720 next video another way to create the network that we just created. This one here with even less 5459 09:43:50,720 --> 09:43:58,080 code than what we've done before. I'll see you there. Welcome back. In the last video, what we 5460 09:43:58,080 --> 09:44:02,240 discussed, well, actually, in the previous video to last, we coded up this neural network here, 5461 09:44:02,240 --> 09:44:08,400 circle model V zero. By subclassing an end or module, we created two linear layers, which are 5462 09:44:08,400 --> 09:44:14,160 capable of handling the shape of our data in features two because why we have two X features. 5463 09:44:14,160 --> 09:44:19,120 Out features were upscaling the two features to five so that it gives our network more of a 5464 09:44:19,120 --> 09:44:24,560 chance to learn. And then because we've upscaled it to five features, the next subsequent layer 5465 09:44:24,560 --> 09:44:29,600 has to be able to handle five features as input. And then we have one output feature because that's 5466 09:44:29,600 --> 09:44:34,320 the same shape as our Y here. Then we got a little bit visual by using the TensorFlow 5467 09:44:34,320 --> 09:44:39,680 playground. Did you try out that challenge, make five in layers with five neurons? Did it work? 5468 09:44:41,200 --> 09:44:45,440 And then we also got a little bit visual in Figma as well. This is just another way of 5469 09:44:45,440 --> 09:44:49,440 visualizing different things. You might have to do this a fair few times when you first start 5470 09:44:49,440 --> 09:44:55,440 with neural networks. But once you get a bit of practice, you can start to infer what's going on 5471 09:44:55,440 --> 09:45:01,440 through just pure code. So now let's keep pushing forward. How about we replicate this 5472 09:45:01,440 --> 09:45:07,280 with a simpler way? Because our network is quite simple, that means it only has two layers. 5473 09:45:07,280 --> 09:45:17,280 That means we can use. Let's replicate the model above using nn.sequential. And I'm going to code 5474 09:45:17,280 --> 09:45:22,480 this out. And then we can look up what nn.sequential is. But I think you'll be able to comprehend what's 5475 09:45:22,480 --> 09:45:29,440 happening by just looking at it. So nn, which is torch.nn. 
We can do torch.nn, but we've already 5476 09:45:29,440 --> 09:45:37,840 imported nn. We're going to call nn.sequential. And then we're going to go nn.linear. And what 5477 09:45:37,840 --> 09:45:45,920 was the in features of our nn.linear? Well, it was two because we have two in features. And then 5478 09:45:45,920 --> 09:45:50,080 we're going to replicate the same out features. Remember, we could customize this to whatever we 5479 09:45:50,080 --> 09:45:59,040 want 10, 100, 128. I'm going to keep it at five, nice and simple. And then we go nn.linear. And 5480 09:45:59,040 --> 09:46:03,920 the in features of this next layer is going to be five because the out features of the previous 5481 09:46:03,920 --> 09:46:09,840 layer was five. And then finally, the out features here is going to be one because we want one y 5482 09:46:09,840 --> 09:46:15,200 value to our two x features. And then I'm going to send that to the device. And then I'm going to 5483 09:46:15,200 --> 09:46:24,000 have a look at model zero. So this is, of course, going to override our previous model zero. But 5484 09:46:24,000 --> 09:46:28,640 have a look. The only thing different is that this is from the circle model V zero class. We 5485 09:46:28,640 --> 09:46:35,840 subclassed an n dot module. And the only difference is the name here. This is just from sequential. 5486 09:46:36,640 --> 09:46:42,480 And so can you see what's going on here? So as you might have guessed sequential, 5487 09:46:43,280 --> 09:46:49,600 it implements most of this code for us behind the scenes. Because we've told it that it's going 5488 09:46:49,600 --> 09:46:53,840 to be sequential, it's just going to go, hey, step the code through this layer, and then step 5489 09:46:53,840 --> 09:47:00,640 the code through this layer. And outputs basically the same model, rather than us creating our own 5490 09:47:00,640 --> 09:47:04,400 forward method, you might be thinking, Daniel, why don't you show us this earlier? That looks 5491 09:47:04,400 --> 09:47:10,640 like such an easy way to create a neural network compared to this. Well, yes, you're 100% right. 5492 09:47:10,640 --> 09:47:17,520 That is an easier way to create a neural network. However, the benefit of subclassing, and that's 5493 09:47:17,520 --> 09:47:22,720 why I started from here, is that when you have more complex operations, such as things you'd 5494 09:47:22,720 --> 09:47:29,040 like to construct in here, and a more complex forward pass, it's important to know how to 5495 09:47:29,040 --> 09:47:34,080 build your own subclasses of nn dot module. But for simple straightforward stepping through 5496 09:47:34,080 --> 09:47:39,840 each layer one by one, so this layer first, and then this layer, you can use nn dot sequential. 5497 09:47:39,840 --> 09:47:49,360 In fact, we could move this code up into here. So we could do this self dot, we'll call this 5498 09:47:49,360 --> 09:48:00,480 two linear layers equals nn dot sequential. And we could have layer one, we could go self, 5499 09:48:01,760 --> 09:48:10,720 self dot layer one. And or actually, we'll just recode it, we'll go nn dot linear. So it's so 5500 09:48:10,720 --> 09:48:16,320 it's the same code is what we've got below in features. If I could type that'll be great, 5501 09:48:16,320 --> 09:48:24,800 n features equals two, out features equals five. And then we go nn dot linear. And then we go 5502 09:48:24,800 --> 09:48:30,560 n features equals what equals five, because it has to line up out features equals one. 
5503 09:48:32,080 --> 09:48:36,480 And then we've got two linear layers. And then if we wanted to get rid of this, return 5504 09:48:36,480 --> 09:48:48,160 to linear layers, and we'll pass it X remake it. There we go. Well, because we've created these as 5505 09:48:48,160 --> 09:48:55,600 well, let's get rid of that. Beautiful. So that's the exact same model, but just using nn dot 5506 09:48:55,600 --> 09:49:00,880 sequential. Now I'm going to get rid of this so that our code is not too verbose. That means a lot 5507 09:49:00,880 --> 09:49:08,080 going on. But this is the flexibility of PyTorch. So just keep in mind that there's a fair few ways 5508 09:49:08,080 --> 09:49:16,080 to make a model. The simplest is probably sequential. And then subclassing is this is a little bit 5509 09:49:16,080 --> 09:49:22,960 more complicated than what we've got. But this can extend to handle lot more complex neural networks, 5510 09:49:22,960 --> 09:49:28,320 which you'll likely have to be building later on. So let's keep pushing forward. Let's see what 5511 09:49:28,320 --> 09:49:32,000 happens if we pass some data through here. So we'll just rerun this cell to make sure we've got 5512 09:49:32,000 --> 09:49:38,480 our model zero instantiated. We'll make some predictions with the model. So of course, if we 5513 09:49:38,480 --> 09:49:46,080 have a look at our model zero dot state deck, oh, this will be a good experiment. So look at this. 5514 09:49:46,080 --> 09:49:53,440 So we have weight, a weight tensor, a bias tensor, a weight tensor, and a bias tensor. So this is 5515 09:49:53,440 --> 09:49:59,040 for the first of the zeroth layer, these two here with the zero dot, and then the one dot weight is 5516 09:49:59,040 --> 09:50:06,320 four, of course, the first index layer. Now have a look inside here. Now you see how out features 5517 09:50:06,320 --> 09:50:14,800 is five. Well, that's why our bias parameter has five values here. And the same thing for this weight 5518 09:50:14,800 --> 09:50:23,200 value here. And the weight value here, why is this have 10 samples? One, two, three, four, five, six, 5519 09:50:23,200 --> 09:50:31,280 seven, eight, nine, 10, because two times five equals 10. So this is just with a simple two layer 5520 09:50:31,280 --> 09:50:36,800 network, look at all the numbers that are going on behind the scenes. Imagine coding all of these 5521 09:50:36,800 --> 09:50:43,600 by hand. Like there's something like 20 numbers or something here. Now we've only done two layers 5522 09:50:43,600 --> 09:50:48,160 here. Now the beauty of this is that in the previous section, we created all of the weight 5523 09:50:48,160 --> 09:50:53,200 and biases using an end dot parameter and random values. You'll notice that these are all random 5524 09:50:53,200 --> 09:50:59,680 two. Again, if yours are different to mine, don't worry too much, because they're going to be started 5525 09:50:59,680 --> 09:51:05,120 randomly and we haven't set a random seed. But the thing to note here is that PyTorch is creating 5526 09:51:05,120 --> 09:51:10,400 all of these parameters for us behind the scenes. And now when we do back propagation and gradient 5527 09:51:10,400 --> 09:51:15,440 descent, when we code our training loop, the optimizer is going to change all of these values 5528 09:51:15,440 --> 09:51:20,720 ever so slightly to try and better fit or better represent the data so that we can split our two 5529 09:51:20,720 --> 09:51:31,440 circles. 
And so you can imagine how verbose this could get if we had say 50 layers with 128 different 5530 09:51:31,440 --> 09:51:36,320 features of each. So let's change this up, see what happens. Watch how quickly the numbers get 5531 09:51:36,320 --> 09:51:41,680 out of hand. Look at that. We just changed one value and look how many parameters our model has. 5532 09:51:41,680 --> 09:51:48,240 So you might be able to calculate that by hand, but I personally don't want to. So we're going to 5533 09:51:48,240 --> 09:51:53,680 let PyTorch take care of a lot of that for us behind the scenes. So for now we're keeping it simple, 5534 09:51:53,680 --> 09:51:59,360 but that's how we can crack our models open and have a look at what's going on. Now that was a 5535 09:51:59,360 --> 09:52:03,600 little detour. It's time to make some predictions with random numbers. I just wanted to highlight 5536 09:52:03,600 --> 09:52:09,840 the fact that our model is in fact instantiated with random numbers here. So the untrained preds 5537 09:52:09,840 --> 09:52:15,840 model zero, we're going to pass in X test. And of course, we have to send the test data to the 5538 09:52:15,840 --> 09:52:21,200 device. Otherwise, if it's on a different device, we'll get errors because PyTorch likes to make 5539 09:52:21,200 --> 09:52:28,320 calculations on the same device. So we'll go print. Let's do a nice print statement of length of 5540 09:52:28,320 --> 09:52:35,360 predictions. We're going to go length, then untrained preds, we'll pass that in there. 5541 09:52:36,080 --> 09:52:43,040 And then we'll go, oh no, we need a squiggle. And then we'll go shape. Shape is going to be 5542 09:52:43,600 --> 09:52:51,360 untrained preds dot shape. So this is again, following the data explorer's motto of visualize, 5543 09:52:51,360 --> 09:52:57,680 visualize, visualize. And sometimes print is one of the best ways to do so. So length of test samples, 5544 09:52:58,880 --> 09:53:02,640 you might already know this, or we've already checked this together, haven't we? X test. 5545 09:53:04,160 --> 09:53:11,600 And then we're going to get the shape, which is going to be X test dot shape. Wonderful. And then 5546 09:53:11,600 --> 09:53:19,520 we're going to print. What's our little error here? Oh no, Colab's tricking me. So let's go first 5547 09:53:19,520 --> 09:53:28,080 10 predictions. And we're going to go untrained preds. So how do you think these predictions will 5548 09:53:28,080 --> 09:53:34,960 fare? They're doing it with random numbers. And what are we trying to predict again? Well, 5549 09:53:34,960 --> 09:53:41,360 we're trying to predict whether a dot is a red dot or a blue dot, or zero or one. And then we'll go 5550 09:53:41,360 --> 09:53:49,680 first 10 labels is going to be, we'll get this on the next line. And we'll go Y test. 5551 09:53:52,560 --> 09:53:59,280 Beautiful. So let's have a look at these untrained predictions. So we have length of predictions 5552 09:53:59,280 --> 09:54:05,760 is 200. Length of test samples is 200. But the shapes are different. What's going on here? 5553 09:54:05,760 --> 09:54:16,400 Y test. And let's have a look at X test. Oh, well, I better just have a look at Y test. 5554 09:54:18,720 --> 09:54:26,560 Why don't we have a two there? Oh, I've done X test dot shape. Oh, that's test samples. That's 5555 09:54:26,560 --> 09:54:32,240 okay. And then the predictions are one. Oh, yes. So Y test. Let's just check the first 10 X test.
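Pulled together, the model construction and the untrained-prediction check being dictated here look roughly like this (a sketch; it assumes the X_test and y_test tensors from the earlier make_circles split, and variable names may differ slightly from the notebook):

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# same architecture as CircleModelV0, just stepped through layer by layer
model_0 = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1)
).to(device)

print(model_0.state_dict())  # two weight tensors and two bias tensors, all random

# count the parameters rather than doing it by hand:
# layer 0: 2*5 weights + 5 biases, layer 1: 5*1 weights + 1 bias = 21 in total
print(sum(p.numel() for p in model_0.parameters()))

# untrained predictions on the test data (random weights, so roughly coin-toss quality)
with torch.inference_mode():
    untrained_preds = model_0(X_test.to(device))
print(f"Length of predictions: {len(untrained_preds)}, shape: {untrained_preds.shape}")
print(f"Length of test samples: {len(X_test)}, shape: {X_test.shape}")
print(f"First 10 predictions (rounded): {torch.round(untrained_preds[:10])}")
print(f"First 10 labels: {y_test[:10]}")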
5556 09:54:32,240 --> 09:54:38,240 So a little bit of clarification needed here with your shapes. So maybe we'll get this over here 5557 09:54:38,240 --> 09:54:45,440 because I like to do features first and then labels. What did we miss here? Oh, X test 10 5558 09:54:46,080 --> 09:54:50,720 and Y test. See, we're troubleshooting on the fly here. This is what you're going to do with 5559 09:54:50,720 --> 09:54:54,880 a lot of your code. So there's our test values. There's the ideal labels. But our predictions, 5560 09:54:54,880 --> 09:54:58,960 they don't look like our labels. What's going on here? We can see that they're on the CUDA device, 5561 09:54:58,960 --> 09:55:04,480 which is good. We said that. We can see that they got gradient tracking. Oh, we didn't with 5562 09:55:05,040 --> 09:55:10,640 touch. We didn't do inference mode here. That's a poor habit of us. Excuse me. Let's inference 5563 09:55:10,640 --> 09:55:16,320 mode this. There we go. So you notice that the gradient tracking goes away there. And so our 5564 09:55:16,320 --> 09:55:22,640 predictions are nowhere near what our test labels are. But also, they're not even in the same like 5565 09:55:22,640 --> 09:55:29,200 ball park. Like these are whole numbers, one or zero. And these are all floats between one and 5566 09:55:29,200 --> 09:55:36,240 zero. Hmm. So maybe rounding them. Will that do something? So where's our threads here? So 5567 09:55:36,800 --> 09:55:45,600 we go torch dot round. What happens there? Oh, they're all zero. Well, our model is probably 5568 09:55:45,600 --> 09:55:49,680 going to get about 50% accuracy. Why is that? Because all the predictions look like they're 5569 09:55:49,680 --> 09:55:57,280 going to be zero. And they've only got two options, basically head or tails. So when we create our 5570 09:55:57,280 --> 09:56:02,640 model and when we evaluate it, we want our predictions to be in the same format as our labels. But 5571 09:56:02,640 --> 09:56:06,960 we're going to cover some steps that we can take to do that in a second. What's important to take 5572 09:56:06,960 --> 09:56:10,720 away from this is that there's another way to replicate the model we've made above using 5573 09:56:10,720 --> 09:56:15,680 nn dot sequential. We've just replicated the same model as what we've got here. And n dot 5574 09:56:15,680 --> 09:56:21,280 sequential is a simpler way of creating a pytorch model. But it's limited because it literally 5575 09:56:21,280 --> 09:56:26,960 just sequentially steps through each layer in order. Whereas in here, you can get as creative as you 5576 09:56:26,960 --> 09:56:32,480 want with the forward computation. And then inside our model, pytorch has behind the scenes 5577 09:56:32,480 --> 09:56:38,960 created us some weight and bias tensors for each of our layers with regards to the shapes that 5578 09:56:38,960 --> 09:56:46,800 we've set. And so the handy thing about this is that if we got quite ridiculous with our layers, 5579 09:56:46,800 --> 09:56:50,640 pytorch would still do the same thing behind the scenes, create a whole bunch of random numbers for 5580 09:56:50,640 --> 09:56:56,480 us. And because our numbers are random, it looks like our model isn't making very good predictions. 5581 09:56:56,480 --> 09:57:00,560 But we're going to fix this in the next few videos when we move on to 5582 09:57:02,720 --> 09:57:06,560 fitting the model to the data and making a prediction. 
But before we do that, we need to 5583 09:57:06,560 --> 09:57:11,200 pick up a loss function and an optimizer and build a training loop. So let's get on to these two things. 5584 09:57:13,680 --> 09:57:19,120 Welcome back. So over the past few videos, we've been setting up a classification model to deal 5585 09:57:19,120 --> 09:57:23,600 with our specific shape of data. Now recall, depending on the data set that you're working 5586 09:57:23,600 --> 09:57:28,400 with will depend on what layers you use for now we're keeping it simple and n dot linear is one 5587 09:57:28,400 --> 09:57:33,680 of the most simple layers in pytorch. We've got two input features, we're upscaling that to five 5588 09:57:33,680 --> 09:57:38,960 output features. So we have five hidden units, and then we have one output feature. And that's in line 5589 09:57:38,960 --> 09:57:46,240 with the shape of our data. So two features of x equals one number for y. So now let's continue 5590 09:57:46,240 --> 09:57:52,720 on modeling with where we're up to. We have build or pick a model. So we've built a model. Now we 5591 09:57:52,720 --> 09:57:59,280 need to pick a loss function and optimizer. We're getting good at this. So let's go here, 5592 09:57:59,280 --> 09:58:06,880 set up loss function and optimizer. Now here comes the question. If we're working on classification 5593 09:58:06,880 --> 09:58:13,520 previously, we used, let's go to the next one, and an dot L one loss for regression, which is 5594 09:58:13,520 --> 09:58:19,120 MAE mean absolute error, just a heads up that won't necessarily work with a classification problem. 5595 09:58:19,120 --> 09:58:31,040 So which loss function or optimizer should you use? So again, this is problem specific. But with 5596 09:58:31,040 --> 09:58:37,680 a little bit of practice, you'll get used to using different ones. So for example, for regression, 5597 09:58:38,720 --> 09:58:43,040 you might want, which is regressions predicting a number. And I know it can get fusing because 5598 09:58:43,040 --> 09:58:47,360 it looks like we're predicting a number here, we are essentially predicting a number. But this 5599 09:58:47,360 --> 09:58:54,560 relates to a class. So for regression, you might want MAE or MSE, which is mean absolute 5600 09:58:56,000 --> 09:59:07,920 absolute error, or mean squared error. And for classification, you might want binary cross entropy 5601 09:59:09,120 --> 09:59:16,880 or categorical cross entropy, which is sometimes just referred to as cross entropy. Now, where would 5602 09:59:16,880 --> 09:59:24,000 you find these things out? Well, through the internet, of course. So you could go, what is 5603 09:59:24,000 --> 09:59:30,240 binary cross entropy? I'm going to leave you this for your extra curriculum to read through this. 5604 09:59:30,240 --> 09:59:35,120 We've got a fair few resources here. Understanding binary cross entropy slash log loss 5605 09:59:36,800 --> 09:59:42,880 by Daniel Godoy. Oh, yes. Great first name, my friend. This is actually the article that I 5606 09:59:42,880 --> 09:59:46,480 would recommend to if you want to learn what's going on behind the scenes through binary cross 5607 09:59:46,480 --> 09:59:51,520 entropy. For now, there's a lot of math there. We're going to be writing code to implement this. So 5608 09:59:51,520 --> 09:59:56,720 PyTorch has done this for us. Essentially, what does a loss function do? Let's remind ourselves. 5609 09:59:58,000 --> 10:00:09,280 Go down here. 
As a reminder, the loss function measures how wrong your models' predictions are. 5610 10:00:09,280 --> 10:00:17,120 So I also going to leave a reference here to I've got a little table here in the book version of 5611 10:00:17,120 --> 10:00:23,120 this course. So 0.2 neural network classification with PyTorch set up loss function and optimizer. 5612 10:00:23,120 --> 10:00:27,280 So we've got some example loss functions and optimizers here, such as stochastic gradient 5613 10:00:27,280 --> 10:00:32,880 descent or SGD optimizer, atom optimizer is also very popular. So I've got problem type here, 5614 10:00:32,880 --> 10:00:37,920 and then the PyTorch code that we can implement this with. We've got binary cross entropy loss. 5615 10:00:37,920 --> 10:00:44,000 We've got cross entropy loss, mean absolute error, MAE, mean squared error, MSE. So you want to use 5616 10:00:44,000 --> 10:00:48,080 these two for regression. There are other different loss functions you could use, but these are some 5617 10:00:48,080 --> 10:00:52,240 of the most common. That's what I'm focusing on, the most common ones that are going to get you 5618 10:00:52,240 --> 10:00:57,440 through a fair few problems. We've got binary classification, multi-class classification. What 5619 10:00:57,440 --> 10:01:03,600 are we working with? We're working with binary classification. So we're going to look at torch.nn 5620 10:01:03,600 --> 10:01:10,000 BCE, which stands for binary cross entropy, loss with logits. What the hell is a logit? 5621 10:01:10,640 --> 10:01:14,880 And BCE loss. Now this is confusing. Then trust me, when I first started using PyTorch, 5622 10:01:14,880 --> 10:01:18,880 I got a little bit confused about why they have two here, but we're going to explore that anyway. 5623 10:01:18,880 --> 10:01:26,160 So what is a logit? So if you search what is a logit, you'll get this and you'll get statistics 5624 10:01:26,160 --> 10:01:30,240 and you'll get the log odds formula. In fact, if you want to read more, I would highly encourage it. 5625 10:01:30,240 --> 10:01:35,120 So you could go through all of this. We're going to practice writing code for it instead. 5626 10:01:35,680 --> 10:01:42,880 Luckily PyTorch does this for us, but logit is kind of confusing in deep learning. So if we go 5627 10:01:42,880 --> 10:01:48,960 what is a logit in deep learning, it kind of means a different thing. It's kind of just a name of what 5628 10:01:49,680 --> 10:01:55,200 yeah, there we go. What is the word logits in TensorFlow? As I said, TensorFlow is another 5629 10:01:55,200 --> 10:02:00,800 deep learning framework. So let's close this. What do we got? We've got a whole bunch of 5630 10:02:00,800 --> 10:02:08,080 definitions here. Logits layer. Yeah. This is one of my favorites. In context of deep learning, 5631 10:02:08,080 --> 10:02:13,920 the logits layer means the layer that feeds into the softmax. So softmax is a form of activation. 5632 10:02:13,920 --> 10:02:17,840 We're going to see all of this later on because this is just words on a page right now. Softmax 5633 10:02:17,840 --> 10:02:22,560 or other such normalization. So the output of the softmax are the probabilities for the 5634 10:02:22,560 --> 10:02:29,200 classification task and its input is the logit's layer. Whoa, there's a lot going on here. So let's 5635 10:02:29,200 --> 10:02:35,360 just take a step back and get into writing some code. And for optimizers, I'm just going to complete 5636 10:02:35,360 --> 10:02:48,720 this. 
And for optimizers, two of the most common and useful are SGD and Adam. However, PyTorch 5637 10:02:48,720 --> 10:02:55,760 has many built in options. And as you start to learn more about the world of machine learning, 5638 10:02:55,760 --> 10:03:04,320 you'll find that if you go to torch.optim or torch.nn. So if we have.nn, what do we have in here? 5639 10:03:04,320 --> 10:03:09,120 Loss functions. There we go. Beautiful. That's what we're after. L1 loss, which is MAE, 5640 10:03:09,120 --> 10:03:14,560 MSC loss, cross entropy loss, CTC loss, all of these different types of loss here will depend 5641 10:03:14,560 --> 10:03:18,800 on the problem you're working on. But I'm here to tell you that for regression and classification, 5642 10:03:18,800 --> 10:03:24,000 two of the most common of these. See, this is that confusion again. BCE loss, BCE with 5643 10:03:24,000 --> 10:03:30,000 logit's loss. What the hell is a logit? My goodness. Okay, that's enough. And Optim, 5644 10:03:30,000 --> 10:03:34,480 these are different optimizers. We've got probably a dozen or so here. Algorithms. 5645 10:03:35,680 --> 10:03:40,800 Add a delta, add a grad. Adam, this can be pretty full on when you first get started. But for now, 5646 10:03:40,800 --> 10:03:46,560 just stick with SGD and the atom optimizer. They're two of the most common. Again, they may not 5647 10:03:46,560 --> 10:03:51,840 perform the best on every single problem, but they will get you fairly far just knowing those. 5648 10:03:51,840 --> 10:03:57,520 And then you'll pick up some of these extra ones as you go. But let's just get rid of all of, 5649 10:03:57,520 --> 10:04:05,440 maybe we'll, so I'll put this in here, this link. So we'll create our loss function. For the loss 5650 10:04:05,440 --> 10:04:20,240 function, we're going to use torch.nn.bce with logit's loss. This is exciting. For more on what 5651 10:04:21,440 --> 10:04:27,600 binary cross entropy, which is BCE, a lot of abbreviations in machine learning and deep learning 5652 10:04:27,600 --> 10:04:39,280 is check out this article. And then for a definition on what a logit is, we're going to see a 5653 10:04:39,280 --> 10:04:44,320 logit in a second in deep learning. Because again, deep learning is one of those fields, 5654 10:04:44,320 --> 10:04:48,560 a machine learning, which likes to be a bit rebellious, you know, likes to be a bit different 5655 10:04:48,560 --> 10:04:53,600 from the pure mathematics type of fields and statistics in general. It's this beautiful 5656 10:04:53,600 --> 10:05:03,280 gestaltism and for different optimizers, see torch dot opt in. But we've covered a few of these 5657 10:05:03,280 --> 10:05:11,920 things before. And finally, I'm going to put up here, and then for some common choices of loss 5658 10:05:11,920 --> 10:05:17,920 functions and optimizers. Now, don't worry too much. This is why I'm linking all of these extra 5659 10:05:17,920 --> 10:05:23,280 resources. A lot of this is covered in the book. So as we just said, set up loss function, 5660 10:05:23,280 --> 10:05:27,760 optimizer, we just talked about these things. But I mean, you can just go to this book website 5661 10:05:27,760 --> 10:05:33,200 and reference it. Oh, we don't want that. We want this link. Come on, I knew you can't even 5662 10:05:33,200 --> 10:05:37,600 copy and paste. How are you supposed to code? I know I've been promising code this whole time, 5663 10:05:37,600 --> 10:05:43,440 so let's write some. So let's set up the loss function. What did we say it was? 
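For reference, the loss functions and optimizers being weighed up here all live under torch.nn and torch.optim (a quick sketch, not exhaustive):

from torch import nn
import torch

# regression losses
nn.L1Loss()              # MAE (mean absolute error)
nn.MSELoss()             # MSE (mean squared error)

# classification losses
nn.BCELoss()             # binary, expects prediction probabilities (after sigmoid)
nn.BCEWithLogitsLoss()   # binary, expects raw logits (sigmoid built in)
nn.CrossEntropyLoss()    # multi-class, expects raw logits

# two of the most common optimizers
# torch.optim.SGD(params=model_0.parameters(), lr=0.1)
# torch.optim.Adam(params=model_0.parameters(), lr=0.001)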
We're going to 5664 10:05:43,440 --> 10:05:54,160 call it L O double S F N for loss function. And we're going to call B C E with logit's loss. So B 5665 10:05:54,160 --> 10:06:01,040 C E with logit's loss. This has the sigmoid activation function built in. And we haven't covered what 5666 10:06:01,040 --> 10:06:06,240 the sigmoid activation function is, but we are going to don't you worry about that built in. 5667 10:06:07,120 --> 10:06:11,360 In fact, if you wanted to learn what the sigmoid activation function is, how could you find out 5668 10:06:11,360 --> 10:06:17,520 sigmoid activation function? But we're going to see it in action. Activation functions in neural 5669 10:06:17,520 --> 10:06:21,760 networks. This is the beautiful thing about machine learning. There's so much stuff out there. 5670 10:06:21,760 --> 10:06:26,080 People have written some great articles. You've got formulas here. PyTorch has implemented that 5671 10:06:26,080 --> 10:06:33,120 behind the scenes for us. So thank you, PyTorch. But if you recall, sigmoid activation function 5672 10:06:33,120 --> 10:06:38,640 built in, where did we discuss the architecture of a classification network? What do we have here? 5673 10:06:38,640 --> 10:06:44,720 Right back in the zeroth chapter of this little online book thing that we heard here. Binary 5674 10:06:44,720 --> 10:06:52,560 classification. We have output activation. Oh, oh, look at that. So sigmoid torch dot sigmoid and 5675 10:06:52,560 --> 10:06:59,040 pytorch. All right. And then for multi class classification, we need the softmax. Okay. Names 5676 10:06:59,040 --> 10:07:04,880 on a page again, but this is just a reference table so we can keep coming back to. So let's just 5677 10:07:04,880 --> 10:07:11,520 keep going with this. I just want to highlight the fact that nn dot BCE loss also exists. So 5678 10:07:12,400 --> 10:07:23,760 this requires BCE loss equals requires inputs to have gone through the sigmoid activation function 5679 10:07:24,960 --> 10:07:33,200 prior to input to BCE loss. And so let's look up the documentation. I'm going to comment that 5680 10:07:33,200 --> 10:07:37,040 out because we're going to stick with using this one. Now, why would we stick with using this one? 5681 10:07:37,040 --> 10:07:41,920 Let's check out the documentation, hey, torch dot nn. And I realized this video is all over the 5682 10:07:41,920 --> 10:07:48,240 place, but we're going to step back through BCE loss with logits. Did I even say this right? 5683 10:07:49,760 --> 10:07:55,520 With logits loss. So with I got the width around the wrong way. So let's check this out. So this 5684 10:07:55,520 --> 10:08:02,400 loss combines a sigmoid layer with the BCE loss in one single class. So if we go back to the code, 5685 10:08:02,400 --> 10:08:11,760 BCE loss is this. So if we combined an n dot sequential, and then we passed in an n dot sigmoid, 5686 10:08:11,760 --> 10:08:22,240 and then we went and then dot BCE loss, we'd get something similar to this. But if we keep reading 5687 10:08:22,240 --> 10:08:27,360 in the documentation, because that's just I just literally read that it combines sigmoid with BCE 5688 10:08:27,360 --> 10:08:33,520 loss. But if we go back to the documentation, why do we want to use it? So this version is more 5689 10:08:33,520 --> 10:08:41,680 numerically stable than using a plain sigmoid by a BCE loss, followed by a BCE loss. 
As by 5690 10:08:41,680 --> 10:08:47,520 combining the operations into one layer, we take advantage of the log sum x trick for numerical 5691 10:08:47,520 --> 10:08:53,120 stability, beautiful. So if we use this log function, loss function for our binary cross entropy, 5692 10:08:53,120 --> 10:08:59,600 we get some numeric stability. Wonderful. So there's our loss function. We've got the sigmoid 5693 10:08:59,600 --> 10:09:05,360 activation function built in. And so we're going to see the difference between them later on, 5694 10:09:05,360 --> 10:09:11,840 like in the flesh, optimizer, we're going to choose, hmm, let's stick with SGD, hey, 5695 10:09:11,840 --> 10:09:17,040 old faithful stochastic gradient descent. And we have to set the parameters here, the parameters 5696 10:09:17,040 --> 10:09:23,280 parameter params equal to our model parameters would be like, hey, stochastic gradient descent, 5697 10:09:23,280 --> 10:09:30,720 please update. If we get another code cell behind here, please update our model parameters model 5698 10:09:32,240 --> 10:09:37,440 with respect to the loss, because we'd like our loss function to go down. So these two are going 5699 10:09:37,440 --> 10:09:42,480 to work in tandem again, when we write our training loop, and we'll set our learning rate to 0.1. 5700 10:09:42,480 --> 10:09:46,960 We'll see where that gets us. So that's what the optimizer is going to do. It's going to optimize 5701 10:09:46,960 --> 10:09:52,400 all of these parameters for us, which is amazing. And the principal would be the same, even if there 5702 10:09:52,400 --> 10:09:59,280 was 100 layers here, and 10,000, a million different parameters here. So we've got a loss function, 5703 10:09:59,280 --> 10:10:05,440 we've got an optimizer. And how about we create an evaluation metric. So let's calculate 5704 10:10:06,240 --> 10:10:11,680 accuracy at the same time. Because that's very helpful with classification problems is accuracy. 5705 10:10:11,680 --> 10:10:21,120 Now, what is accuracy? Well, we could look up formula for accuracy. So true positive over true 5706 10:10:21,120 --> 10:10:25,920 positive plus true negative times 100. Okay, let's see if we can implement something similar to that 5707 10:10:25,920 --> 10:10:32,800 just using pure pytorch. Now, why would we want accuracy? Because the accuracy is out of 100 5708 10:10:32,800 --> 10:10:42,960 examples. What percentage does our model get right? So for example, if we had a coin toss, 5709 10:10:42,960 --> 10:10:50,240 and we did 100 coin tosses in our model predicted heads 99 out of 100 times, and it was right 5710 10:10:50,240 --> 10:10:56,320 every single time, it might have an accuracy of 99%, because it got one wrong. So 99 out of 100, 5711 10:10:56,320 --> 10:11:07,360 it gets it right. So dev accuracy FN accuracy function, we're going to pass it y true. So 5712 10:11:07,360 --> 10:11:12,480 remember, any type of evaluation function or loss function is comparing the predictions to 5713 10:11:12,480 --> 10:11:19,280 the ground truth labels. So correct equals, this is going to see how many of our y true 5714 10:11:19,280 --> 10:11:27,440 or y threads are correct. So torch equal stands for, hey, how many of these samples y true are 5715 10:11:27,440 --> 10:11:32,560 equal to y pred? And then we're going to get the sum of that, and we need to get the item from 5716 10:11:32,560 --> 10:11:38,080 that because we want it as a single value in Python. 
And then we're going to calculate the 5717 10:11:38,080 --> 10:11:45,520 accuracy, ACC stands for accuracy, equals correct, divided by the length of samples that we have 5718 10:11:45,520 --> 10:11:52,720 as input. And then we're going to times that by 100, and then return the accuracy. Wonderful. 5719 10:11:52,720 --> 10:11:57,520 So we now have an accuracy function, we're going to see how all the three of these come into play 5720 10:11:57,520 --> 10:12:02,160 when we write a training loop, which we might as we get started on the next few videos, hey, 5721 10:12:02,960 --> 10:12:09,600 I'll see you there. Welcome back. In the last video, we discussed some different loss function 5722 10:12:09,600 --> 10:12:15,200 options for our classification models. So we learned that if we're working with binary cross 5723 10:12:15,200 --> 10:12:21,120 entropy or binary classification problems, we want to use binary cross entropy. And pie torch 5724 10:12:21,120 --> 10:12:27,040 has two different times of binary cross entropy, except one is a bit more numerically stable. 5725 10:12:27,040 --> 10:12:31,840 That's the BCE with logit's loss, because it has a sigmoid activation function built in. 5726 10:12:31,840 --> 10:12:37,520 So that's straight from the pie to its documentation. And that for optimizer wise, we have a few 5727 10:12:37,520 --> 10:12:42,560 different choices as well. So if we check out this section here on the pie torch book, we have a 5728 10:12:42,560 --> 10:12:47,360 few different loss functions and optimizers for different problems and the pie torch code that 5729 10:12:47,360 --> 10:12:52,400 we can implement. But the premise is still the same across the board of different problems. 5730 10:12:52,400 --> 10:12:58,800 The loss function measures how wrong our model is. And the goal of the optimizer is to optimize 5731 10:12:58,800 --> 10:13:05,680 the model parameters in such a way that the loss function goes down. And we also implemented our 5732 10:13:05,680 --> 10:13:13,360 own accuracy function metric, which is going to evaluate our models predictions using accuracy 5733 10:13:13,360 --> 10:13:21,040 as an evaluation metric, rather than just loss. So let's now work on training a model. 5734 10:13:22,000 --> 10:13:28,640 So what should we do first? Well, do you remember the steps in a pie torch training loop? 5735 10:13:28,640 --> 10:13:39,680 So to train our model, we're going to need to build a training loop. So if you watch the video 5736 10:13:39,680 --> 10:13:47,680 on the pie torch, so if you can Google unofficial pie torch song, you should find my, there we go, 5737 10:13:47,680 --> 10:13:51,920 the unofficial pie torch optimization loop song. We're not going to watch that. That's going to 5738 10:13:51,920 --> 10:13:56,640 be a little tidbit for the steps that we're going to code out. But that's just a fun little jingle 5739 10:13:56,640 --> 10:14:01,840 to remember these steps here. So if we go into the book section, this is number three train model, 5740 10:14:01,840 --> 10:14:07,920 exactly where we're up to here. But we have pie torch training loop steps. Remember, for an 5741 10:14:07,920 --> 10:14:16,160 epoch in a range, do the forward pass, calculate the loss, optimizer zero grand, loss backward, 5742 10:14:16,160 --> 10:14:22,560 optimizer step, step, step. We keep singing this all day. You could keep reading those steps all 5743 10:14:22,560 --> 10:14:30,640 day, but it's better to code them. But let's write this out. 
So forward pass to calculate the loss, 5744 10:14:31,520 --> 10:14:40,320 three, optimizer zero grad, four. What do we do? Loss backward. So back propagation, 5745 10:14:40,320 --> 10:14:45,920 I'll just write that up in here back propagation. We've linked to some extra resources. If you'd 5746 10:14:45,920 --> 10:14:51,920 like to find out what's going on in back propagation, we're focused on code here, and then gradient 5747 10:14:51,920 --> 10:15:06,160 descent. So optimizer step. So build a training loop with the following steps. However, I've kind 5748 10:15:06,160 --> 10:15:10,400 of mentioned a few things that need to be taken care of before we talk about the forward pass. 5749 10:15:10,400 --> 10:15:16,880 So we've talked about logits. We looked up what the hell is a logit. So if we go into this stack 5750 10:15:16,880 --> 10:15:21,760 overflow answer, we saw machine learning, what is a logit? How about we see that? We need to 5751 10:15:21,760 --> 10:15:27,280 do a few steps. So I'm going to write this down. Let's get a bit of clarity about us, Daniel. 5752 10:15:27,280 --> 10:15:30,400 We're kind of all over the place at the moment, but that's all right. That's the exciting part 5753 10:15:30,400 --> 10:15:39,600 of machine learning. So let's go from going from raw logits to prediction probabilities 5754 10:15:40,480 --> 10:15:47,200 to prediction labels. That's what we want. Because to truly evaluate our model, we want to 5755 10:15:47,200 --> 10:15:56,400 so let's write in here our model outputs going to be raw logit. So that's the definition of a 5756 10:15:56,400 --> 10:16:00,720 logit in machine learning and deep learning. You might read some few other definitions, but for us, 5757 10:16:00,720 --> 10:16:07,920 the raw outputs of our model, model zero are going to be referred to as logits. So then model zero, 5758 10:16:07,920 --> 10:16:18,960 so whatever comes out of here are logits. So we can convert these logits into prediction probabilities 5759 10:16:20,560 --> 10:16:33,440 by passing them to some kind of activation function, e.g. sigmoid for binary cross entropy 5760 10:16:33,440 --> 10:16:45,680 and softmax for multi-class classification. I've got binary class e-fication. I have to 5761 10:16:45,680 --> 10:16:51,920 sound it out every time I spell it for binary classification. So class e-fication. So we're 5762 10:16:51,920 --> 10:16:57,520 going to see multi-class classification later on, but we want some prediction probabilities. 5763 10:16:57,520 --> 10:17:02,160 We're going to see what they look like in a minute. So we want to go from logits to prediction 5764 10:17:02,160 --> 10:17:13,920 probabilities to prediction labels. Then we can convert our model's prediction probabilities to 5765 10:17:15,120 --> 10:17:24,480 prediction labels by either rounding them or taking the argmax. 5766 10:17:24,480 --> 10:17:34,560 So round is for binary classification and argmax will be for the outputs of the softmax activation 5767 10:17:34,560 --> 10:17:41,040 function, but let's see it in action first. So I've called the logits are the raw outputs of our 5768 10:17:41,040 --> 10:17:49,360 model with no activation function. So view the first five outputs of the forward pass 5769 10:17:49,360 --> 10:17:56,960 on the test data. So of course, our model is still instantiated with random values. So we're 5770 10:17:56,960 --> 10:18:02,960 going to set up a variable here, y logits, and model zero, we're going to pass at the test data. 
5771 10:18:02,960 --> 10:18:09,760 So x test, not text, two device, because our model is currently on our CUDA device and we need 5772 10:18:09,760 --> 10:18:15,680 our test data on the same device or target device. Remember, that's why we're writing device 5773 10:18:15,680 --> 10:18:20,960 agnostic codes. So this would work regardless of whether there's a GPU active or not. Let's have 5774 10:18:20,960 --> 10:18:27,840 a look at the logits. Oh, okay. Right now, we've got some positive values here. And we can see that 5775 10:18:27,840 --> 10:18:33,760 they're on the CUDA device. And we can see that they're tracking gradients. Now, ideally, 5776 10:18:34,480 --> 10:18:40,080 we would have run torch dot inference mode here, because we're making predictions. And the rule 5777 10:18:40,080 --> 10:18:44,080 of thumb is whenever you make predictions with your model, you turn it into a vowel mode. 5778 10:18:44,080 --> 10:18:48,880 We just have to remember to turn it back to train when we want to train and you run torch dot 5779 10:18:48,880 --> 10:18:53,280 inference mode. So we get a very similar set up here. We just don't have the gradients being 5780 10:18:53,280 --> 10:19:00,480 tracked anymore. Okay. So these are called logits. The logits are the raw outputs of our model, 5781 10:19:00,480 --> 10:19:06,800 without being passed to any activation function. So an activation function is something a little 5782 10:19:06,800 --> 10:19:13,840 separate from a layer. So if we come up here, we've used layer. So in the neural networks that we 5783 10:19:13,840 --> 10:19:19,360 start to build and the ones that you'll subsequently build are comprised of layers and activation 5784 10:19:19,360 --> 10:19:24,000 functions, we're going to make the concept of an activation function a little bit more clear later 5785 10:19:24,000 --> 10:19:30,080 on. But for now, just treat it all as some form of mathematical operation. So if we were to pass 5786 10:19:30,080 --> 10:19:35,680 data through this model, what is happening? Well, it's going through the linear layer. Now recall, 5787 10:19:35,680 --> 10:19:40,960 we've seen this a few times now torch and then linear. If we pass data through a linear layer, 5788 10:19:40,960 --> 10:19:47,040 it's applying the linear transformation on the incoming data. So it's performing this 5789 10:19:47,040 --> 10:19:53,600 mathematical operation behind the scenes. So why the output equals the input x multiplied by a 5790 10:19:53,600 --> 10:19:59,200 weight tensor a this could really be w which is transposed so that this is doing a dot product 5791 10:19:59,200 --> 10:20:05,120 plus a bias term here. And then if we jump into our model state deck, we've got weight 5792 10:20:05,120 --> 10:20:09,600 and we've got bias. So that's the formula that's happening in these two layers. It will be different 5793 10:20:09,600 --> 10:20:14,240 depending on the layer that we choose. But for now, we're sticking with linear. And so that the 5794 10:20:14,240 --> 10:20:21,120 raw output of our data going through our two layers, the logits is going to be this information 5795 10:20:21,120 --> 10:20:30,640 here. However, it's not in the same format as our test data. And so if we want to make a comparison 5796 10:20:30,640 --> 10:20:36,640 to how good our model is performing, we need apples to apples. So we need this in the same format 5797 10:20:36,640 --> 10:20:44,000 as this, which is not of course. So we need to go to a next step. Let's use the sigmoid. 
So use the 5798 10:20:44,000 --> 10:20:54,720 sigmoid activation function on our model logits. So why are we using sigmoid? Well, recall in a 5799 10:20:54,720 --> 10:21:02,320 binary classification architecture, the output activation is the sigmoid function here. So now 5800 10:21:02,320 --> 10:21:07,840 let's jump back into here. And we're going to create some predprobs. And what this stands for 5801 10:21:07,840 --> 10:21:16,000 on our model logits to turn them into prediction probabilities, probabilities. So why predprobs 5802 10:21:16,000 --> 10:21:23,200 equals torch sigmoid, why logits? And now let's have a look at why predprobs. What do we get from 5803 10:21:23,200 --> 10:21:29,760 this? Oh, when we still get numbers on a page, goodness gracious me. But the important point 5804 10:21:29,760 --> 10:21:36,160 now is that they've gone through the sigmoid activation function, which is now we can pass these 5805 10:21:37,200 --> 10:21:42,480 to a torch dot round function. Let's have a look at this torch dot round. And what do we get? 5806 10:21:42,480 --> 10:21:52,560 Predprobs. Oh, the same format as what we've got here. Now you might be asking like, why don't we 5807 10:21:52,560 --> 10:21:59,360 just put torch dot round here? Well, that's a little, this step is required to, we can't just do it on 5808 10:21:59,360 --> 10:22:04,880 the raw logits. We need to use the sigmoid activation function here to turn it into prediction 5809 10:22:04,880 --> 10:22:10,960 probabilities. And now what is a prediction probability? Well, that's a value usually between 0 and 1 5810 10:22:10,960 --> 10:22:16,960 for how likely our model thinks it's a certain class. And in the case of binary cross entropy, 5811 10:22:16,960 --> 10:22:24,240 these prediction probability values, let me just write this out in text. So for our prediction 5812 10:22:24,240 --> 10:22:39,440 probability values, we need to perform a range style rounding on them. So this is a decision 5813 10:22:39,440 --> 10:22:48,400 boundary. So this will make more sense when we go why predprobs, if it's equal to 0.5 or greater 5814 10:22:48,400 --> 10:22:59,360 than 0.5, we set y equal to one. So y equal one. So class one, whatever that is, a red dot or a 5815 10:22:59,360 --> 10:23:08,800 blue dot, and then why predprobs, if it is less than 0.5, we set y equal to zero. So this is class 5816 10:23:08,800 --> 10:23:18,080 zero. You can also adjust this decision boundary. So say, if you wanted to increase this value, 5817 10:23:18,080 --> 10:23:28,560 so anything over 0.7 is one. And below that is zero. But generally, most commonly, you'll find 5818 10:23:28,560 --> 10:23:35,920 it split at 0.5. So let's keep going. Let's actually see this in action. So how about we 5819 10:23:35,920 --> 10:23:47,520 recode this? So find the predicted probabilities. And so we want no, sorry, we want the predicted 5820 10:23:47,520 --> 10:23:52,960 labels, that's what we want. So when we're evaluating our model, we want to convert the outputs of 5821 10:23:52,960 --> 10:23:58,640 our model, the outputs of our model are here, the logits, the raw outputs of our model are 5822 10:23:58,640 --> 10:24:06,320 logits. And then we can convert those logits to prediction probabilities using the sigmoid function 5823 10:24:06,320 --> 10:24:14,240 on the logits. And then we want to find the predicted labels. 
So we go raw logits output of our model, 5824 10:24:14,240 --> 10:24:19,600 prediction probabilities after passing them through an activation function, and then prediction labels. 5825 10:24:19,600 --> 10:24:26,080 This is the steps we want to take with the outputs of our model. So find the predicted labels. 5826 10:24:26,080 --> 10:24:31,120 Let's go in here a little bit different to our regression problem previously, but nothing we can't 5827 10:24:31,120 --> 10:24:39,120 handle. Torch round, we're going to go y-pred-probs. So I like to name it y-pred-probs for prediction 5828 10:24:39,120 --> 10:24:47,040 probabilities and y-preds for prediction labels. Now let's go in full if we wanted to. So y-pred 5829 10:24:47,040 --> 10:24:54,480 labels equals torch dot round torch dot sigmoid. So sigmoid activation function for binary cross 5830 10:24:54,480 --> 10:25:01,680 entropy and model zero x test dot two device. Truly this should be within inference mode code, 5831 10:25:01,680 --> 10:25:08,560 but for now we'll just leave it like this to have a single example of what's going on here. 5832 10:25:08,560 --> 10:25:13,840 Now I just need to count one, two, there we want. That's where we want the index. We just want it 5833 10:25:13,840 --> 10:25:22,880 on five examples. So check for equality. And we want print torch equal. We're going to check 5834 10:25:22,880 --> 10:25:34,240 y-pred's dot squeeze is equal to y-pred labels. So just we're doing the exact same thing. And we 5835 10:25:34,240 --> 10:25:38,720 need squeeze here to get rid of the extra dimension that comes out. You can try doing this without 5836 10:25:38,720 --> 10:25:49,600 squeeze. So get rid of extra dimension once again. We want y-pred's dot squeeze. Fair bit of code 5837 10:25:49,600 --> 10:25:56,640 there, but this is what's happened here. We create y-pred's. So we turn the y-pred 5838 10:25:56,640 --> 10:26:03,280 probes into y-pred's. And then we just do the full step over again. So we make predictions with 5839 10:26:03,280 --> 10:26:13,440 our model, we get the raw logits. So this is logits to pred probes to pred labels. So the raw 5840 10:26:13,440 --> 10:26:18,720 outputs of our model are logits. We turn the logits into prediction probabilities using torch 5841 10:26:18,720 --> 10:26:25,600 sigmoid. And we turn the prediction probabilities into prediction labels using torch dot round. 5842 10:26:25,600 --> 10:26:31,200 And we fulfill this criteria here. So everything above 0.5. This is what torch dot round does. 5843 10:26:31,200 --> 10:26:37,280 Turns it into a 1. Everything below 0.5 turns it into a 0. The predictions right now are going to 5844 10:26:37,280 --> 10:26:44,640 be quite terrible because our model is using random numbers. But y-pred's found with the steps above 5845 10:26:44,640 --> 10:26:50,400 is the same as y-pred labels doing the more than one hit. Thanks to this check for equality using 5846 10:26:50,400 --> 10:26:56,400 torch equal y-pred's dot squeeze. And we just do the squeeze to get rid of the extra dimensions. 5847 10:26:56,400 --> 10:27:03,280 And we have out here some labels that look like our actual y-test labels. They're in the same format, 5848 10:27:03,280 --> 10:27:10,000 but of course they're not the same values because this model is using random weights to make predictions. 5849 10:27:10,000 --> 10:27:17,280 So we've done a fair few steps here, but I believe we are now in the right space to start building 5850 10:27:17,280 --> 10:27:24,480 a training a test loop. 
So let's write that down here 3.2 building a training and testing loop. 5851 10:27:24,480 --> 10:27:29,280 You might want to have a go at this yourself. So we've got all the steps that we need to do the 5852 10:27:29,280 --> 10:27:35,200 forward pass. But the reason we've done this step here, the logits, then the pred probes and the 5853 10:27:35,200 --> 10:27:43,680 pred labels, is because the inputs to our loss function up here, this requires, so BCE with 5854 10:27:43,680 --> 10:27:50,560 logits loss, requires what? Well, we're going to see that in the next video, but I'd encourage 5855 10:27:50,560 --> 10:27:55,840 you to give it a go at implementing these steps here. Remember the jingle for an epoch in a range, 5856 10:27:55,840 --> 10:28:02,720 do the forward pass, calculate the loss, which is BC with logits loss, optimise a zero grad, 5857 10:28:02,720 --> 10:28:10,560 which is this one here, last backward, optimise a step, step, step. Let's do that together in the 5858 10:28:10,560 --> 10:28:17,440 next video. Welcome back. In the last few videos, we've been working through creating a model for 5859 10:28:17,440 --> 10:28:22,240 a classification problem. And now we're up to training a model. And we've got some steps here, 5860 10:28:22,240 --> 10:28:29,520 but we started off by discussing the concept of logits. Logits are the raw output of the model, 5861 10:28:29,520 --> 10:28:33,920 whatever comes out of the forward functions of the layers in our model. And then we discussed how 5862 10:28:33,920 --> 10:28:38,160 we can turn those logits into prediction probabilities using an activation function, 5863 10:28:38,160 --> 10:28:44,720 such as sigmoid for binary classification, and softmax for multi class classification. 5864 10:28:44,720 --> 10:28:48,720 We haven't seen softmax yet, but we're going to stick with sigmoid for now because we have 5865 10:28:48,720 --> 10:28:54,080 binary classification data. And then we can convert that from prediction probabilities 5866 10:28:54,080 --> 10:28:58,960 to prediction labels. Because remember, when we want to evaluate our model, we want to compare 5867 10:28:58,960 --> 10:29:06,640 apples to apples. We want our models predictions to be in the same format as our test labels. 5868 10:29:06,640 --> 10:29:12,000 And so I took a little break after the previous video. So my collab notebook has once again 5869 10:29:12,000 --> 10:29:17,200 disconnected. So I'm just going to run all of the cells before here. It's going to reconnect up 5870 10:29:17,200 --> 10:29:22,800 here. We should still have a GPU present. That's a good thing about Google collab is that if you 5871 10:29:22,800 --> 10:29:30,880 change the runtime type to GPU, it'll save that wherever it saves the Google collab notebook, 5872 10:29:30,880 --> 10:29:36,000 so that when you restart it, it should still have a GPU present. And how can we check that, 5873 10:29:36,000 --> 10:29:41,280 of course, while we can type in device, we can run that cell. And we can also check 5874 10:29:42,000 --> 10:29:48,720 Nvidia SMI. It'll tell us if we have an Nvidia GPU with CUDA enabled ready to go. 5875 10:29:48,720 --> 10:29:58,080 So what's our device? CUDA. Wonderful. And Nvidia SMI. Excellent. I have a Tesla P100 GPU. 5876 10:29:58,080 --> 10:30:03,840 Ready to go. So with that being said, let's start to write a training loop. Now we've done this before, 5877 10:30:03,840 --> 10:30:09,840 and we've got the steps up here. Do the forward pass, calculate the loss. 
We've spent enough on 5878 10:30:09,840 --> 10:30:14,000 this. So we're just going to start jumping into write code. There is a little tidbit in this one, 5879 10:30:14,000 --> 10:30:19,040 though, but we'll conquer that when we get to it. So I'm going to set a manual seed, 5880 10:30:20,240 --> 10:30:26,160 torch top manual seed. And I'm going to use my favorite number 42. This is just to ensure 5881 10:30:26,160 --> 10:30:32,000 reproducibility, if possible. Now I also want you to be aware of there is also another 5882 10:30:32,000 --> 10:30:39,280 form of random seed manual seed, which is a CUDA random seed. Do we have the PyTorch? 5883 10:30:39,280 --> 10:30:51,680 Yeah, reproducibility. So torch dot CUDA dot manual seed dot seed. Hmm. There is a CUDA 5884 10:30:51,680 --> 10:31:02,240 seed somewhere. Let's try and find out. CUDA. I think PyTorch have just had an upgrade to 5885 10:31:02,240 --> 10:31:09,600 their documentation. Seed. Yeah, there we go. Okay. I knew it was there. So torch dot CUDA 5886 10:31:09,600 --> 10:31:15,040 dot manual seed. So if we're using CUDA, we have a CUDA manual seed as well. So let's see what 5887 10:31:15,040 --> 10:31:21,280 happens if we put that to watch that CUDA dot manual seed 42. We don't necessarily have to put 5888 10:31:21,280 --> 10:31:26,320 these. It's just to try and get as reproducible as numbers as possible on your screen and my screen. 5889 10:31:26,880 --> 10:31:31,520 Again, what is more important is not necessarily the numbers exactly being the same lining up 5890 10:31:31,520 --> 10:31:36,800 between our screens. It's more so the direction of which way they're going. So let's set the number 5891 10:31:36,800 --> 10:31:43,200 of epochs. We're going to train for 100 epochs. epochs equals 100. But again, as you might have 5892 10:31:43,200 --> 10:31:48,560 guessed, the CUDA manual seed is for if you're doing operations on a CUDA device, which in our 5893 10:31:48,560 --> 10:31:54,880 case, we are. Well, then perhaps we'd want them to be as reproducible as possible. So speaking of 5894 10:31:54,880 --> 10:32:00,800 CUDA devices, let's put the data to the target device because we're working with or we're writing 5895 10:32:00,800 --> 10:32:06,400 data agnostic code here. So I'm going to write x train y train equals x train two device, 5896 10:32:06,400 --> 10:32:12,400 comma y train dot two device, that'll take care of the training data. And I'm going to do the 5897 10:32:12,400 --> 10:32:19,120 same for the testing data equals x test two device. Because if we're going to run our model 5898 10:32:20,400 --> 10:32:25,840 on the CUDA device, we want our data to be there too. And the way we're writing our code, 5899 10:32:25,840 --> 10:32:31,280 our code is going to be device agnostic. Have I said that enough yet? So let's also build our 5900 10:32:31,280 --> 10:32:35,840 training and evaluation loop. Because we've covered the steps in here before, we're going to start 5901 10:32:35,840 --> 10:32:41,360 working a little bit faster through here. And don't worry, I think you're up to it. So for an epoch 5902 10:32:41,360 --> 10:32:46,800 in a range of epochs, what do we do? We start with training. So let me just write this. 5903 10:32:48,320 --> 10:32:53,200 Training model zero dot train. That's the model we're working with. We call the train, 5904 10:32:53,200 --> 10:32:57,440 which is the default, but we're going to do that anyway. 
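In code, that setup amounts to something like this (a sketch; the seed values, epoch count and device calls follow the dictation above):

torch.manual_seed(42)
torch.cuda.manual_seed(42)

# number of times to loop through the data
epochs = 100

# put the data on the target device (device-agnostic code)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)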
And as you might have guessed, 5905 10:32:57,440 --> 10:33:02,880 the code that we're writing here is, you can functionize this. So we're going to do this later 5906 10:33:02,880 --> 10:33:07,440 on. But just for now, the next couple of videos, the next module or two, we're going to keep 5907 10:33:07,440 --> 10:33:14,080 writing out the training loop in full. So this is the part, the forward pass, where there's a 5908 10:33:14,080 --> 10:33:18,880 little bit of a tidbit here compared to what we've done previously. And that is because we're 5909 10:33:18,880 --> 10:33:24,400 outputting raw logits here, if we just pass our data straight to the model. So model zero 5910 10:33:24,400 --> 10:33:30,080 x train. And we're going to squeeze them here to get rid of an extra dimension. You can try to 5911 10:33:30,080 --> 10:33:34,400 see what the output sizes look like without squeeze. But we're just going to call squeeze 5912 10:33:34,400 --> 10:33:39,760 here. Remember, squeeze removes an extra one dimension from a tensor. And then to convert it 5913 10:33:39,760 --> 10:33:47,520 into prediction labels, we go torch dot round. And torch dot sigmoid, because torch dot sigmoid 5914 10:33:47,520 --> 10:33:53,040 is an activation function, which is going to convert logits into what convert the logits 5915 10:33:53,040 --> 10:33:57,680 into prediction probabilities. So why logits? And I'm just going to put a note here. So this 5916 10:33:57,680 --> 10:34:08,880 is going to go turn logits into pred probes into pred labels. So we've done the forward pass. 5917 10:34:08,880 --> 10:34:12,560 So that's a little tidbit there. We could have done all of this in one step, but I'll show you 5918 10:34:12,560 --> 10:34:19,760 why we broke this apart. So now we're going to calculate loss slash accuracy. We don't necessarily 5919 10:34:19,760 --> 10:34:26,320 have to calculate the accuracy. But we did make an accuracy function up here. So that we can 5920 10:34:26,320 --> 10:34:31,360 calculate the accuracy during training, we could just stick with only calculating the loss. But 5921 10:34:31,360 --> 10:34:37,520 sometimes it's cool to visualize different metrics loss plus a few others while your model is training. 5922 10:34:37,520 --> 10:34:44,640 So let's write some code to do that here. So we'll start off by going loss equals loss 5923 10:34:44,640 --> 10:34:53,600 f n and y logits. Ah, here's the difference of what we've done before. Previously in the notebook 5924 10:34:53,600 --> 10:34:58,880 zero one, up to zero two now, we passed in the prediction right here. But because what's our 5925 10:34:58,880 --> 10:35:04,880 loss function? Let's have a look at our loss function. Let's just call that see what it returns. 5926 10:35:04,880 --> 10:35:15,280 BCE with logits loss. So the BCE with logits expects logits as input. So as you might have guessed, 5927 10:35:15,280 --> 10:35:24,720 loss function without logits. If we had nn dot BCE loss, notice how we haven't got with logits. 5928 10:35:25,280 --> 10:35:32,640 And then we called loss f n, f n stands for function, by the way, without logits. What do we get? 5929 10:35:32,640 --> 10:35:40,480 So BCE loss. So this loss expects prediction probabilities as input. So let's write some code 5930 10:35:40,480 --> 10:35:45,040 to differentiate between these two. As I said, we're going to be sticking with this one. 
Why is that? Because if we look up torch BCEWithLogitsLoss, the documentation states that it's more numerically stable: this loss combines a sigmoid layer and the BCELoss into one single class, and is more numerically stable. So let's come back here and keep writing code. The accuracy is going to be accuracy_fn — our accuracy function — and there's a little difference here: y_true=y_train for the training data (so this will be the training accuracy) and y_pred=y_pred. This is our own custom accuracy function that we wrote ourselves, which is a testament to the Pythonic nature of PyTorch: we've slotted a pure Python function straight into our training loop, and that's essentially what a loss function is behind the scenes anyway. Now, let me write a note here: nn.BCEWithLogitsLoss expects raw logits — the raw output of our model — as input. What if we were using plain BCELoss on its own here? Let's write some code for that: we'd call the loss function and pass in torch.sigmoid(y_logits) instead of the logits, along with y_train. Why would we pass in torch.sigmoid on the logits? Because, remember, calling torch.sigmoid on our logits turns them into prediction probabilities, and BCELoss expects prediction probabilities as input. Does that make sense? That's the difference: our loss function, BCEWithLogitsLoss, requires logits as input, whereas straight-up BCELoss needs torch.sigmoid called on the logits first because it expects prediction probabilities. I'm going to comment that line out, because our loss function is BCEWithLogitsLoss, but keep it in mind: if for some reason you stumble across PyTorch code that uses BCELoss rather than BCEWithLogitsLoss, you'll find torch.sigmoid being called here — or you may come across errors, because the inputs to your loss function aren't what it expects. With that being said, we can keep going with our other steps.
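To make that difference concrete, here's a small standalone sketch with made-up logits and labels, showing the two loss functions expecting different inputs (BCEWithLogitsLoss is the one we keep using):

```python
import torch
from torch import nn

logits = torch.randn(5)                     # stand-in raw model outputs
labels = torch.randint(0, 2, (5,)).float()  # stand-in binary targets

loss_with_logits = nn.BCEWithLogitsLoss()(logits, labels)        # takes raw logits directly
loss_plain_bce = nn.BCELoss()(torch.sigmoid(logits), labels)     # needs sigmoid applied first

# Effectively the same value, but the "with logits" version is more numerically stable
print(loss_with_logits, loss_plain_bce)
```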
So we're up to step three: optimizer.zero_grad() — zero the gradients of the optimizer. And what's after that? Once we've zeroed the gradients, we do number four, loss.backward(), and then — what's next? — optimizer.step(), step, step, step (I'm singing the unofficial PyTorch optimization loop song there). loss.backward() is backpropagation: it calculates the gradients of the loss with respect to all of the parameters in the model. optimizer.step() then updates the parameters to reduce the loss by following those gradients — gradient descent, hence the "descent". Now, for testing, we know what to do here: we call model_0.eval() when we're testing, and because we're making predictions — that's what testing is, making predictions on the test dataset using the patterns our model has learned on the training dataset — we turn on inference mode, because we're doing inference. Then we do the forward pass and compute the test logits, because logits are the raw output of our model with no modifications: model_0(X_test).squeeze(), getting rid of an extra dimension of one. Then we create test_pred with a similar calculation to what we did during training: torch.round(torch.sigmoid(test_logits)). For our binary classification, we want prediction probabilities, which we create by calling sigmoid on the test logits; probabilities of 0.5 or above go to one, and probabilities under 0.5 go to label zero. Step two of testing is to calculate the test loss and accuracy. How? Just as before: test_loss = loss_fn(test_logits, y_test), because our loss function, BCEWithLogitsLoss, expects logits as input (where do we find that out? the documentation, of course), compared against the y_test labels. And for the test accuracy: accuracy_fn(y_true=y_test, y_pred=test_pred). Now, you might be wondering why I switched up the argument order between these two. By the way, this is important to know: with these loss functions, the order you pass the arguments in matters — predictions come first, then true labels. And why have I done the reverse for our accuracy function, y_true then y_pred? That's just because I like to be confusing... well, not really. It's because I base a lot of how I structure code on scikit-learn, and scikit-learn's metrics accuracy_score goes y_true, y_pred, so I've based our evaluation metric function on that order — the scikit-learn metrics package is very helpful. PyTorch's loss functions do it the other way around, and it's important to get these in the right order; exactly why they chose that order, I couldn't tell you.
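Putting the steps so far together, the training and testing parts of the loop look roughly like this — a sketch assuming model_0, loss_fn, optimizer, accuracy_fn and the data tensors from earlier (the periodic printout discussed next is left out):

```python
for epoch in range(epochs):
    ### Training
    model_0.train()
    y_logits = model_0(X_train).squeeze()               # 1. forward pass -> raw logits
    y_pred = torch.round(torch.sigmoid(y_logits))       #    logits -> pred probs -> pred labels
    loss = loss_fn(y_logits, y_train)                   # 2. loss (BCEWithLogitsLoss takes logits)
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)    #    training accuracy (optional)
    optimizer.zero_grad()                               # 3. zero accumulated gradients
    loss.backward()                                     # 4. backpropagation
    optimizer.step()                                    # 5. gradient descent update

    ### Testing
    model_0.eval()
    with torch.inference_mode():
        test_logits = model_0(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)
```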
And we've got one final step, which is to print out what's happening. We're doing a lot of epochs here — 100 — so we'll use the epoch modulo 10 to print out every 10th epoch. We've got a couple of different metrics to print this time: the epoch number; the loss, to five decimal places (this is the training loss); the accuracy (the training accuracy — we could name these train_loss and train_acc to make them a bit more understandable, but we'll leave them as loss and acc for now, because the test values have "test" at the front); then the test loss, also to five decimal places; and the test accuracy, to two decimal places. And because it's accuracy, we want a percentage: out of 100 guesses, what percentage does our model get right on the training data and the testing data — as long as we've coded all the functions correctly. Now, we've got a fair few steps here. My challenge to you is to run this, and if there are any errors, try to fix them. No doubt there's probably one or two, or maybe more, that we'll have to fix in the next video. But speaking of next videos, I'll see you there — let's train our first classification model. This is very exciting. I'll see you soon.

Welcome back. In the last video, we wrote a mammoth amount of code, but nothing we can't handle — we've been through a lot of these steps. We did have to talk about a few tidbits around different loss functions, namely BCELoss, which is binary cross entropy loss, and BCEWithLogitsLoss. We discussed that BCELoss in PyTorch expects prediction probabilities as input, so we'd have to convert our model's logits — the raw outputs of the model — into prediction probabilities using the torch.sigmoid activation function. BCEWithLogitsLoss, on the other hand, expects raw logits as input, as the name hints, so we pass it the raw logits straight away. And our own custom accuracy function compares labels to labels.
And that's kind of what we've been stepping through over the last few videos: going from logits to pred probs to pred labels, because the ideal output of our model is some kind of label that we as humans can interpret. So let's keep pushing forward. You may have already tried to run this training loop — I don't know if it works; we wrote all this code in the last video and there's probably an error somewhere. So, are you ready? We're going to train our first classification model together, for 100 epochs. If it all goes to plan, in three, two, one... let's run. Oh my gosh, it actually worked the first time. I promise you, I didn't change anything in here from the last video. So let's inspect what's going on. It trains pretty fast. Why? Because we're using a GPU, so it's accelerated as much as it can be, and our dataset is quite small and our network is quite small — you won't always get networks training this fast; these did 100 epochs in about a second. But the loss... oh, 0.69973. It doesn't go down very much. The accuracy even starts high and then goes down. What's going on here? Our model doesn't seem to be learning anything. What would an ideal accuracy be? 100. And an ideal loss value? Zero, because lower is better for loss. Hmm, this is confusing. Now have a look at our blue and red dots — where's our data? Do we still have a DataFrame here? How many samples do we have of each class? Let's inspect; let's do some data analysis. Where did we create the DataFrame? Do we still have circles instantiated? We'll call pandas' value_counts on circles.label — is this going to output how many of each? Okay, wow: we've got 500 of class 1 and 500 of class 0. So we have 500 red dots and 500 blue dots, which means we have a balanced dataset. We're basically trying to predict heads or tails here, so if we're getting an accuracy of around (or under) 50%, our model is doing about as well as guessing.
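That class-balance check might look like this (a sketch assuming the circles DataFrame created earlier in the notebook is still around):

```python
# How many samples of each class do we have?
circles.label.value_counts()
# 1    500
# 0    500
```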
Well, what gives? I think we should get visual with this. Let's make some predictions with our model, because these are just numbers on a page and it's hard to interpret what's going on. Our intuition now is: we have 500 samples of each class overall, or in the case of the training set, 400 of each (because there are 800 training samples), and in the testing set 100 of each (200 test samples in total). So our model is basically doing a coin flip — it's as good as guessing. Time to investigate why our model is not learning, and one of the ways we can do that is by visualizing our predictions. So let's write down here: from the metrics, it looks like our model isn't learning anything, so to inspect it, let's make some predictions and make them visual. In other words: visualize, visualize, visualize. All right. So we've trained a model — we've at least got the structure for the training code here, and it is the right training code; we've written this code before, so we know this setup does allow a model to train. So there must be something wrong with either how we've built our model or with the dataset. Let's keep going and investigate together. To do so, I've got a function that I pre-built earlier — did I mention that we're learning side by side on a machine learning cooking show? This is an ingredient I prepared earlier, part of a dish. We're going to import a function called plot_decision_boundary. Welcome to the cooking show: cooking with machine learning — what model will we cook up today? If we go to the pytorch-deep-learning repo — it's already open over here; this is the home repo for the course, and the link to it will be scattered everywhere — there's a little file called helper_functions.py, which I'm going to fill up with helper functions throughout the course, and this is the one I'm talking about: plot_decision_boundary. Now, we could just copy it into our notebook, or I could write some code to import it programmatically, so we can use the other functions in there too. That file also has the plot_predictions function we made in the last section, 01. plot_decision_boundary is a function I was inspired to create by madewithml.com — that's another resource, a little bit of an aside, that I highly recommend going through, by Goku Mohandas. It gives you the foundations of neural networks and also of MLOps, a field focused on getting your neural networks and machine learning models into applications that other people can use. I can't recommend that resource enough, so please, please, please check it out if you'd like another machine learning resource. But that's where this helper function came from — thank you, Goku Mohandas. I've made a few modifications for this course, but not too many.
So we could either copy that and paste it in here, or we could write some code to import it for us, magically, using the power of the internet — because that's what we are: programmers, machine learning engineers, data scientists. So, from pathlib... the requests module in Python is a module that allows you to make requests. A request is like going to a website and saying "hey, I'd like to get this code (or this information) from you, can you please send it to me?" — that's what it allows us to do. And pathlib, which we've seen before, lets us work with file paths. Why? Because we want to save this helper_functions.py script to our Google Colab files. We can do that with a little bit of code: download the helper functions from the Learn PyTorch repo, if they're not already downloaded. We'll write some if/else code to check whether the path helper_functions.py already exists, because if it does, we don't want to download it again. At the moment it doesn't exist, so the if statement will return False; if it returned True, we'd just print "helper_functions.py already exists" (we could probably use a try/except, but if/else will help us out for now). Otherwise, we print "Downloading helper_functions.py" and set up our request: requests.get(), with a URL passed in as a string — and we need the raw version of the file. If we go back to pytorch-deep-learning, the repo for this course, open helper_functions.py and click Raw, that's the raw URL; I'm going to copy that and paste it into requests.get(). Then we go with open("helper_functions.py", "wb") as f: — opening a file called helper_functions.py with the context set to write-binary, "wb"; f is a common short name for the file — and call f.write(request.content). So this code is basically saying: hey requests, get the information at this link (which is, of course, all of the code in that Python script), then create a file called helper_functions.py with write permissions and write the content of the request into it. But instead of talking through it, how about we see it in action?
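That download cell looks something like the sketch below; note the exact raw URL is my assumption of GitHub's usual raw-file pattern for the course repo rather than something copied from the video:

```python
import requests
from pathlib import Path

# Download helper_functions.py from the course repo if it's not already downloaded
if Path("helper_functions.py").is_file():
    print("helper_functions.py already exists, skipping download")
else:
    print("Downloading helper_functions.py")
    # Assumed raw URL for the file in the pytorch-deep-learning repo
    request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
    with open("helper_functions.py", "wb") as f:
        f.write(request.content)
```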
We'll know it worked if we can run from helper_functions import plot_predictions, plot_decision_boundary — we'll use plot_predictions, which we wrote in the last section, later on, as well as plot_decision_boundary. Wonderful: "Downloading helper_functions.py". Did it work? We have helper_functions.py — look at that, we've done it programmatically. Can we view it in Google Colab? Oh my goodness, yes we can, and look at that. This file may evolve by the time you do the course, but these are just some general helper functions so we don't have to write all of this out every time. If you'd like to know what's going on inside plot_decision_boundary, I encourage you to read through it and step through it yourself — there's nothing in there you can't tackle; it's all just Python code, no secrets. It makes predictions with a PyTorch model and then tests whether the problem is multi-class or binary. But now the ultimate test is whether the plot_decision_boundary function works. We could discuss what it does behind the scenes till the cows come home, but we're going to see it in real life here — I like to get visual. So: plt.figure(figsize=(12, 6)) to create a plot, because we're adhering to the data explorer's motto of visualize, visualize, visualize. We want subplots, because we're going to compare our training and test sets: plt.subplot(1, 2, 1) and plt.title("Train"), then plot the first one with plot_decision_boundary(model_0, X_train, y_train) — because this is the training plot, we pass in model_0 plus X_train and y_train. That's the order the parameters go in; if we press Command+Shift+Space, Google Colab (if it's working with me) will bring up the docstring — there we go: plot_decision_boundary takes a model, which is a torch.nn.Module, X, which is a torch tensor, and y, also a torch tensor. That's the training data; now the same for the testing data: plt.subplot(1, 2, 2) — one row, two columns, index two, so this plot appears in the second slot (anything below this call goes into the second slot, while the earlier call put the training plot in the first slot) — plt.title("Test"), and then we call plot_decision_boundary again.
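Put together, the plotting cell looks roughly like this (assuming model_0 and the train/test tensors from earlier, plus the import above):

```python
import matplotlib.pyplot as plt

# Plot decision boundaries for the training and test sets side by side
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_0, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_0, X_test, y_test)
```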
If this works, it's going to be some serious magic — I love visualization functions in machine learning. Okay, ready? Three, two, one, let's check it out. How's our model doing? Oh, look at that. Now it's clear. Behind the scenes, these are the plots that plot_decision_boundary makes: the training data on one side, the testing data on the other (not as many points there, but the same sort of thing going on). This is the line our model is trying to draw through the data. No wonder it's getting about 50% accuracy and the loss isn't going down: it's trying to split the data straight through the middle. It's drawing a straight line — but our data is circular. Why do you think it's drawing a straight line? Do you think it has anything to do with the fact that our model is made purely of linear layers? Let's go back to our model: what's it comprised of? Just a couple of linear layers. And what's a linear line? If we look up "linear line" — is this going to work with me? There we go — all straight lines. So I want you to have a think about this. Even if you're completely new to deep learning, you can answer this question: can we ever separate this circular data with straight lines? Maybe we could if we drew enough straight lines and tried to curve them around, but there's an easier way, and we'll see it later on. For now, how about we try to improve our model? The model we built trained for 100 epochs; I wonder if it would improve if we trained it for longer. So that's a little challenge before the next video: see if you can train the model for 1,000 epochs. Does that improve the results? And if it doesn't, have a think about why that might be. I'll see you in the next video.

Welcome back. In the last video, we wrote some code to download a series of helper functions in helper_functions.py. Later on you'll see why this is quite standard practice as you write more and more code: you store it somewhere, such as in a Python script like this, and then instead of rewriting everything, you import the functions and use them. That's similar to what we've been doing with PyTorch — PyTorch is essentially just a collection of Python scripts that we're using to build neural networks. Well, there's a lot more to it than what we've just done.
I mean, we've only got one script here, but PyTorch is a collection of probably hundreds of different Python scripts. That's beside the point, though. We're trying to train a model to separate blue dots from red dots, and our current model only draws straight lines. I got you to have a think about whether our straight-line model — our linear model — could ever separate this data (maybe it could), and I issued the challenge of training it for 1,000 epochs to see. So, did it improve at all? Is the accuracy any higher? Speaking of training for more epochs, we're up to section number five: improving a model, from a model perspective. So let's discuss some ways you could improve things if you trained a machine learning or deep learning model — whatever kind of model you're working with — got some results, and weren't happy with them. This is a bit of an overview of what we're about to get into. One way is to add more layers: give the model more chances to learn patterns in the data. Why would that help? Well, our model currently has two layers — look at model_0.state_dict(): there are only twenty or so numbers across layer 0 and layer 1. If we had ten layers, we'd have many times the number of parameters with which to learn the patterns in this data, a representation of this data. Another way is to add more hidden units. What I mean by that is that each of the layers in the model we created has five hidden units: the first one has out_features=5 and the next one takes in_features=5. We could go from five hidden units to ten. The same principle as above applies: the more parameters our model has to represent our data, the better it can potentially do. I say potentially, because some of these things won't necessarily work — our dataset is quite simple, so if we added too many layers, our model would be trying to learn something too complex, adjusting too many numbers for the dataset we have; the same goes for too many hidden units. What other options do we have? We could fit for longer: give the model more chances to learn, because every epoch is one pass through the data. Maybe 100 looks at this dataset wasn't enough — maybe you could fit for 1,000, which was the challenge.
Then there's changing the activation functions. We're using sigmoid at the moment, which is generally the activation function you use for a binary classification problem, but there are also activation functions you can put within your model — hmm, there's a little hint that we'll get to later. Then there's changing the learning rate. The learning rate is the amount the optimizer adjusts the parameters each step. If it's too small, our model might not learn anything, because it takes forever to change the numbers; on the other side of things, if the learning rate is too high, the updates might be too large and our model might just blow up — there's an actual problem in machine learning called the exploding gradient problem, where the numbers get too large, and on the other side there's the vanishing gradient problem, where the gradients shrink to basically zero too quickly. And then there's changing the loss function, but for now sigmoid plus binary cross entropy is pretty good, pretty standard. So we're going to look at a few of these options: adding more layers, fitting for longer, and maybe changing the learning rate. But first let's add a little bit of colour to what we've been talking about. Right now, we've fit the model to the data and made a prediction. Stepping through the workflow: we've prepared the data, built a model, picked a loss function and an optimizer, built a training loop, fit the model to the data and made predictions, and evaluated the model visually — and we're not happy with it. So we're up to number five: improve through experimentation. We don't need TensorBoard just yet — TensorBoard is a tool that PyTorch integrates with to help you monitor experiments, and we'll see it later on; for now this is just the high-level view. And we won't save our model until we've got one we're happy with. So, looking again at improving a model from a model's perspective, let's go through the same ideas with some colour this time. Say we've got a model here. It isn't the exact model we're working with, but it has a similar structure: one, two, three, four layers; a loss function, BCEWithLogitsLoss; an optimizer, stochastic gradient descent; and some training code for 10 epochs, with the testing code cut out because it wouldn't fit on the slide.
Then, if we wanted to go to a larger model, let's add some colour so we can highlight what's happening. Adding layers: this one has one, two, three, four, five, six layers. In another colour — a sort of greeny blue — increase the number of hidden units: the hidden units are the feature dimensions here, and we've gone from 100 to 128 to 128 (remember, the out_features of a previous layer have to line up with the in_features of the next layer) and then to 256. Wow. And remember how I said multiples of eight are generally pretty good in deep learning? This is where those numbers come from. What else do we have? Change or add activation functions: we haven't seen nn.ReLU before — if you want to jump ahead and look at what nn.ReLU is, how would you find out? I'd just Google "nn.ReLU" — but we'll have a look at it properly later on. The smaller model has one activation function, but this larger model has ReLUs scattered between the linear layers. Hmm, maybe that's a hint: what happens if we combine a linear layer with a ReLU? What is a ReLU layer, anyway? I'm not going to spoil it; we'll find out later. Change the optimization function: we've got SGD, and do you recall how I said Adam is another popular optimizer that works fairly well across a lot of problems? So Adam might be a better option for us here. The learning rate, too: maybe this learning rate was a little too high, so we've divided it by 10. And finally, fitting for longer: instead of 10 epochs, we've gone to 100. So how about we try to implement some of these with our own model, to see if it improves what we've got going on? Because, frankly, this isn't satisfactory. We're trying to build a neural network here — neural networks are supposed to be models that can learn almost anything — and we can't even separate some blue dots from some red dots. So in the next video, we'll run through writing code for some of these steps. In fact, if you want to try it yourself first, I'd highly encourage that: start with adding some more layers, adding some more hidden units, and fitting for longer; you can keep all of the other settings the same for now. I'll see you in the next video.
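For reference, a "larger model" along the lines of that slide might look something like the sketch below — the exact layer sizes and learning rate here are illustrative assumptions, not the slide's actual code:

```python
import torch
from torch import nn

# More layers, more hidden units, ReLU between the linear layers (sizes are illustrative)
larger_model = nn.Sequential(
    nn.Linear(in_features=2, out_features=128),
    nn.ReLU(),
    nn.Linear(in_features=128, out_features=128),
    nn.ReLU(),
    nn.Linear(in_features=128, out_features=256),
    nn.ReLU(),
    nn.Linear(in_features=256, out_features=1),
)

larger_loss_fn = nn.BCEWithLogitsLoss()
# Adam instead of SGD, with the learning rate divided by 10 (0.1 -> 0.01)
larger_optimizer = torch.optim.Adam(params=larger_model.parameters(), lr=0.01)
larger_epochs = 100  # fit for longer than the slide's original 10 epochs
```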
Welcome back. In the last video, we discussed some options for improving our model from a model perspective — namely, we want the predictions to be better, so that the patterns the model learns better represent the data and we can separate blue dots from red dots. And you might be wondering why I keep saying "from a model perspective". Let me write that down: these options are all from a model's perspective, because they deal directly with the model rather than the data. There's another way to improve a model's results if the model is already sound: in machine learning and deep learning, you may be aware that, generally, having more data samples helps, because the model has more opportunity to learn. There are a few other ways to improve a model from a data perspective too, but we're going to focus on improving a model from a model's perspective. And because these options are all values that we, as machine learning engineers and data scientists, can change, they're referred to as hyperparameters. So here's an important distinction: parameters are the numbers within a model — the weights and biases, the values the model updates by itself. Hyperparameters are what we can change: adding more layers, more hidden units, fitting for more epochs, activation functions, learning rate, loss functions. So let's change some of the hyperparameters of our model. We'll create CircleModelV1, subclassing nn.Module again. We could write this model using nn.Sequential, because our model isn't too complicated, but we'll subclass nn.Module for practice — in fact, nn.Sequential is itself a version of nn.Module. We subclass nn.Module here partly for practice and partly because, later on, if we (or you) want to make more complex models, you'll see nn.Module subclassed a lot in the wild. The first change is the number of hidden units — let me write the plan down before we do it. We're going to try to improve our model by: adding more hidden units, so out_features goes from 5 to 10; increasing the number of layers from two to three, adding an extra layer; and increasing the number of epochs from 100 to 1,000. Now, let's put on our scientist hats for a second: what would be the problem with the way we're running this experiment?
If we're doing all three things in one hit, why might that be problematic? Because if there's an improvement — or a degradation — we might not know which change caused it. So keep that in mind going forward: I'm doing all three here just as an example of how we can change these values, but generally, when you're running machine learning experiments, you want to change only one value at a time and track the results. That's called experiment tracking, and we'll look at it a little later in the course. A scientist likes to change one variable at a time so they can control what's happening. Okay, let's build the layers. The first layer takes in_features=2 — why two? Because our X_train (look at just the first five samples) has two features — and now out_features=10. Then we create the next layer, self.layer_2 = nn.Linear(in_features=10, out_features=10), which of course takes the same number of in_features as the previous layer's out_features. Then self.layer_3 = nn.Linear(in_features=10, out_features=1) — the in_features here is 10 because the layer above has out_features=10. So what have we changed so far? The hidden units, which were 5 in CircleModelV0, are now 10, and we've added a third layer where previously there were two — those are our two main changes. And out_features of the final layer is 1. Why? Have a look at our y: it's just one number per sample. Remember, the input and output shapes of a model are among the most important things in deep learning; we'll see different values for these shapes later on, but because we're working with this dataset, we're focused on two in-features and one out-feature. Now that we've got our layers prepared, what's next? We have to override the forward method, because every subclass of nn.Module has to implement forward(). So what do we do there? Let me show you one option. We could write z = self.layer_1(x) — z is often used to represent logits, fun fact, but you could use any variable name here; it could be x_1, or you could reassign x. I like using a different name because it's a little less confusing for me. So this is saying: hey, take x, put it through layer_1, and assign the result to z.
And then 6280 11:12:33,200 --> 11:12:39,520 create a new variable z or override z with self layer two with z from before as the input. And 6281 11:12:39,520 --> 11:12:44,800 then we've got z again, the output of layer two has the input for layer three. And then we could 6282 11:12:44,800 --> 11:12:51,920 return z. So that's just passing our data through each one of these layers here. But a way that 6283 11:12:51,920 --> 11:13:00,080 you can leverage speedups in PyTorch is to call them all at once. So layer three, and we're going 6284 11:13:00,080 --> 11:13:07,920 to put self dot layer two. And this is generally how I'm going to write them. But it also behind 6285 11:13:07,920 --> 11:13:13,280 the scenes, because it's performing all the operations at once, you leverage whatever speed 6286 11:13:13,280 --> 11:13:20,320 ups you can get. Oh, this should be layer one. So it goes in order here. So what's happening? 6287 11:13:20,320 --> 11:13:26,000 Well, it's computing the inside of the brackets first. So layer one, x is going through layer one. 6288 11:13:26,000 --> 11:13:33,200 And then the output of x into layer one is going into layer two. And then the same again, 6289 11:13:33,200 --> 11:13:44,800 for layer three. So this way, this way of writing operations, leverages, speed ups, where possible 6290 11:13:47,360 --> 11:13:54,560 behind the scenes. And so we've done our Ford method there. We're just passing our data through 6291 11:13:54,560 --> 11:14:01,360 layers with an extra hidden units, and an extra layer overall. So now let's create an instance of 6292 11:14:01,360 --> 11:14:07,200 circle model v one, which we're going to set to model one. And we're going to write circle model 6293 11:14:07,200 --> 11:14:13,280 v one. And we're going to send it to the target device, because we like writing device agnostic code. 6294 11:14:14,400 --> 11:14:18,560 And then we're going to check out model one. So let's have a look at what's going on there. 6295 11:14:18,560 --> 11:14:25,760 Beautiful. So now we have a three layered model with more hidden units. So I wonder if we trained 6296 11:14:25,760 --> 11:14:31,680 this model for longer, are we going to get improvements here? So my challenge to you is we've already 6297 11:14:31,680 --> 11:14:36,320 done these steps before. We're going to do them over the next couple of videos for completeness. 6298 11:14:38,320 --> 11:14:45,360 But we need to what create a loss function. So I'll give you a hint. It's very similar to the one 6299 11:14:45,360 --> 11:14:52,560 we've already used. And we need to create an optimizer. And then once we've done that, we need to 6300 11:14:52,560 --> 11:15:01,520 write a training and evaluation loop for model one. So give that a shot. Otherwise, I'll see you 6301 11:15:01,520 --> 11:15:09,280 in the next video. We'll do this all together. Welcome back. In the last video, we subclassed 6302 11:15:09,280 --> 11:15:15,840 nn.module to create circle model V one, which is an upgrade on circle model V zero. In the 6303 11:15:15,840 --> 11:15:22,480 fact that we added more hidden units. So from five to 10. And we added a whole extra layer. 6304 11:15:23,120 --> 11:15:28,960 And we've got an instance of it ready to go. So we're up to in the workflow. We've got our data. 6305 11:15:28,960 --> 11:15:33,360 Well, we haven't changed the data. So we've built our new model. We now need to pick a loss function. 6306 11:15:33,360 --> 11:15:36,880 And I hinted at before that we're going to use the same loss function as before. 
Welcome back. In the last video, we subclassed nn.Module to create CircleModelV1, which is an upgrade on CircleModelV0 in that we added more hidden units — from 5 to 10 — and a whole extra layer, and we've got an instance of it ready to go. So where are we up to in the workflow? We've got our data (well, we haven't changed the data), and we've built our new model. We now need to pick a loss function, and I hinted before that we're going to use the same loss function as before, and the same optimizer. You might have already done all of these steps, so you may already know whether this model works on our dataset or not, but that's what we're going to work towards finding out in this video. So: we've built our new model, now let's pick a loss function and optimizer — we could almost do all of this with our eyes closed now — build a training loop, fit the model to the data, make predictions, and evaluate the model. And by the way, if you're wondering why adding more hidden units and an extra layer would improve our model — we've hinted at this before — it's back to the fact that more neurons, more hidden units, and more layers just give our model more numbers to adjust. Look at what's going on in layer_1 and layer_2, and compare it with model_0.state_dict(): look how many more parameters we have from just adding an extra layer and more hidden units. So now our optimizer can change all of those values to hopefully create a better representation of the data we're trying to fit — we simply have more opportunity to learn patterns in our target dataset. That's the theory behind it. So let's create the loss function: we're going to use nn.BCEWithLogitsLoss(). And our optimizer? We're going to keep that the same as before, torch.optim.SGD — but because we're using a new model, we pass in params=model_1.parameters(), the parameters we want to optimize, and lr=0.1. Is that the same learning rate we used before? Let's scroll back to where we created the previous optimizer... there we go, 0.1. All right — potentially our learning rate may be too big, but we'll keep it at 0.1 to keep as many things the same as possible. Then we set torch.manual_seed(42), and torch.cuda.manual_seed(42), to make training as reproducible as possible. As I said before, don't worry too much if your numbers aren't exactly the same as mine — the direction is more important, whether it's a good or bad direction. Now let's set the epochs: we want to train for longer this time as well, so epochs = 1000. That's one of the three improvements we're trying: adding more hidden units, increasing the number of layers, and increasing the number of epochs.
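In code, that setup looks roughly like this (assuming model_1 from the previous video):

```python
# Loss function and optimizer for model_1
loss_fn = nn.BCEWithLogitsLoss()                    # still works on raw logits
optimizer = torch.optim.SGD(params=model_1.parameters(),
                            lr=0.1)                 # same learning rate as before

# Reproducibility and training length
torch.manual_seed(42)
torch.cuda.manual_seed(42)
epochs = 1000                                       # train for longer this time
```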
So we're going to give our model 1,000 looks at the data to try to improve the patterns it learns. Next, put the data on the target device — we want to write device-agnostic code. Yes, we've already done this, but we're going to write it out again for practice: even though we could functionize a lot of this, while we're still in the foundation stages it's good to practice what's going on here, because I want you to be able to do this with your eyes closed before we start to functionize it. So put the training data and the testing data on the target device, whatever it is, CPU or GPU. And then — what's our song? — for epoch in range(epochs): let's loop through the epochs. We start with training: what do we do? We set model_1.train(). And what's our first step? The forward pass. The raw outputs of a model are logits, so y_logits = model_1(X_train).squeeze() — we squeeze it so we get rid of an extra dimension of one (if you don't believe me that we want to get rid of it, try running the code without .squeeze()). And y_pred = torch.round(torch.sigmoid(y_logits)) — why are we calling sigmoid on our logits? To go from logits to prediction probabilities to prediction labels. Then what do we do next? We calculate the loss (and the accuracy — remember, accuracy is optional, but the loss is not). I wonder if the loss would work with y_pred straight up? I don't think it would, because we need logits in here: loss = loss_fn(y_logits, y_train) — why logits? (Oh, Google Colab is autocorrecting the wrong thing.) Because we're using BCEWithLogitsLoss here. So let's keep pushing forward. Now the accuracy: acc = accuracy_fn(y_true=y_train, y_pred=y_pred) — the argument order here is the reverse of the loss function's, which is a little confusing, but I've kept the evaluation function in the same order as scikit-learn. Step three: zero the gradients of the optimizer with optimizer.zero_grad(). And you might notice that we've started to pick up the pace a little. That's perfectly fine: if I'm typing too fast, you can always slow down the video, or just watch what we're doing and code it out yourself afterwards — the code resources will always be available.
Next we call loss.backward() and perform backpropagation. (The only reason we're going faster is that anything we're skimming over here we've covered in a previous video.) Then optimizer.step() — this is where the adjustments to all of our model's parameters take place, to hopefully create a better representation of the data. And then we've got testing. What's the first step we do in testing? We call model_1.eval() to put the model in evaluation mode, and because we're making predictions, we turn on torch.inference_mode() — I call them predictions, some other places call it inference; remember, machine learning has a lot of different names for the same thing. Forward pass: test_logits = model_1(X_test).squeeze(), squeezed because we don't want the extra dimension of one. (I'm just going to add some code cells here so that we have more space and I'm typing in the middle of the screen.) Then I'm going to put in test_pred here — how do we get from logits to predictions? We go test_pred = torch.round(torch.sigmoid(test_logits)). Why sigmoid? Because we're working with a binary classification problem, and to convert logits from a binary classification problem to prediction probabilities, we use the sigmoid activation function. Then we calculate the loss — how wrong is our model on the test data? test_loss = loss_fn(test_logits, y_test): we pass in the test logits and then y_test for the ideal labels. And we're also going to calculate the test accuracy: test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred) — the test labels and the test predictions. And our final step is to print out what's happening. Oh — every tutorial needs a song; if I could, I'd teach everything with song and dance.
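Altogether, the loop for model_1 mirrors the one we wrote for model_0 — a sketch, assuming the data is already on the device and accuracy_fn is the custom function from earlier (the print formatting is approximate and is described next):

```python
for epoch in range(epochs):
    ### Training
    model_1.train()
    y_logits = model_1(X_train).squeeze()
    y_pred = torch.round(torch.sigmoid(y_logits))     # logits -> pred probs -> pred labels
    loss = loss_fn(y_logits, y_train)                 # BCEWithLogitsLoss takes raw logits
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ### Testing
    model_1.eval()
    with torch.inference_mode():
        test_logits = model_1(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    ### Print out what's happening (every 100 epochs)
    if epoch % 100 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.5f}, Acc: {acc:.2f}% | "
              f"Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%")
```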
We're going to take this to five 6388 11:24:11,760 --> 11:24:17,120 decimal places. Again, when we see the printouts of the different values, do not worry too much 6389 11:24:17,120 --> 11:24:24,080 about the exact numbers on my screen appearing on your screen, because that is inherent to the 6390 11:24:24,080 --> 11:24:31,280 randomness of machine learning. So have we got the direction is more important? Have we got, 6391 11:24:31,280 --> 11:24:35,360 we need a percentage sign here, because that's going to be a bit more complete for accuracy. 6392 11:24:35,360 --> 11:24:40,000 Have we got any errors here? I don't know. I'm just, we've just all coded this free hand, 6393 11:24:40,000 --> 11:24:44,880 right? There's a lot of code going on here. So we're about to train our next model, 6394 11:24:44,880 --> 11:24:49,440 which is the biggest model we've built so far in this course, three layers, 10 hidden units on 6395 11:24:49,440 --> 11:25:02,000 each layer. Let's see what we've got. Three, two, one, run. Oh, what? What? A thousand epochs, 6396 11:25:02,000 --> 11:25:08,560 an extra hidden layer, more hidden units. And we still, our model is still basically a coin toss. 6397 11:25:08,560 --> 11:25:12,320 50%. Now, this can't be for real. Let's plot the decision boundary. 6398 11:25:12,320 --> 11:25:22,080 Plot the decision boundary. To find out, let's get a bit visual. Plot figure, actually, to prevent us 6399 11:25:22,080 --> 11:25:28,480 from writing out all of the plot code, let's just go up here, and we'll copy this. Now, you know, 6400 11:25:28,480 --> 11:25:32,960 I'm not the biggest fan of copying code. But for this case, we've already written it. So there's 6401 11:25:32,960 --> 11:25:38,080 nothing really new here to cover. And we're going to just change this from model zero to model one, 6402 11:25:38,080 --> 11:25:42,480 because why it's our new model that we just trained. And so behind the scenes, plot decision 6403 11:25:42,480 --> 11:25:48,720 boundary is going to make predictions with the target model on the target data set and put it 6404 11:25:48,720 --> 11:25:56,000 into a nice visual representation for us. Oh, I said nice visual representation. What does this 6405 11:25:56,000 --> 11:26:01,600 look like? We've just got a coin toss on our data set. Our model is just again, it's trying 6406 11:26:01,600 --> 11:26:08,800 to draw a straight line to separate circular data. Now, why is this? Our model is based on linear, 6407 11:26:08,800 --> 11:26:18,080 is our data nonlinear? Hmm, maybe I've revealed a few of my tricks. I've done a couple of reveals 6408 11:26:18,080 --> 11:26:24,080 over the past few videos. But this is still quite annoying. And it can be fairly annoying 6409 11:26:24,080 --> 11:26:30,320 when you're training models and they're not working. So how about we verify that this model 6410 11:26:30,320 --> 11:26:36,080 can learn anything? Because right now it's just basically guessing for our data set. 6411 11:26:36,080 --> 11:26:42,000 So this model looks a lot like the model we built in section 01. Let's go back to this. 6412 11:26:42,000 --> 11:26:48,320 This is the learn pytorch.io book pytorch workflow fundamentals. Where did we create a model model 6413 11:26:48,320 --> 11:26:55,840 building essentials? Where did we build a model? Linear regression model? Yeah, here. And then 6414 11:26:55,840 --> 11:27:05,120 dot linear. But we built this model down here. 
So all we've changed from 01 to here is we've added 6415 11:27:05,120 --> 11:27:11,440 a couple of layers. The forward computation is quite similar. If this model can learn something 6416 11:27:11,440 --> 11:27:18,000 on a straight line, can this model learn something on a straight line? So that's my challenge to you 6417 11:27:18,000 --> 11:27:25,200 is grab the data set that we created in this previous notebook. So data, you could just 6418 11:27:25,200 --> 11:27:31,120 reproduce this in exact data set. And see if you can write some code to fit the model that we built 6419 11:27:31,120 --> 11:27:40,080 here. This one here on the data set that we created in here. Because I want to verify that 6420 11:27:40,080 --> 11:27:46,080 this model can learn anything. Because right now it seems like it's not learning anything at all. 6421 11:27:46,080 --> 11:27:50,720 And that's quite frustrating. So give that a shot. And I'll see you in the next video. 6422 11:27:50,720 --> 11:27:58,320 Welcome back. In the past few videos, we've tried to build a model to separate the blue from red 6423 11:27:58,320 --> 11:28:04,320 dots yet. Our previous efforts have proven futile, but don't worry. We're going to get there. I promise 6424 11:28:04,320 --> 11:28:08,320 you we're going to get there. And I may have a little bit of inside information here. But we're 6425 11:28:08,320 --> 11:28:13,440 going to build a model to separate these blue dots from red dots, a fundamental classification model. 6426 11:28:14,080 --> 11:28:20,480 And we tried a few things in the last couple of videos, such as training for longer, so more epochs. 6427 11:28:20,480 --> 11:28:26,320 We added another layer. We increased the hidden units because we learned of a few methods to 6428 11:28:26,320 --> 11:28:30,880 improve a model from a model perspective, such as upgrading the hyperparameters, such as number 6429 11:28:30,880 --> 11:28:36,240 of layers, more hidden units, fitting for longer, changing the activation functions, 6430 11:28:36,240 --> 11:28:41,040 changing the learning rate, we haven't quite done that one yet, and changing the loss function. 6431 11:28:41,920 --> 11:28:47,520 One way that I like to troubleshoot problems is I'm going to put a subheading here, 5.1. 6432 11:28:47,520 --> 11:28:55,040 We're going to prepare or preparing data to see if our model can fit a straight line. 6433 11:28:56,560 --> 11:29:06,640 So one way to troubleshoot, this is my trick for troubleshooting problems, especially neural 6434 11:29:06,640 --> 11:29:13,440 networks, but just machine learning in general, to troubleshoot a larger problem is to test out 6435 11:29:13,440 --> 11:29:21,920 a smaller problem. And so why is this? Well, because we know that we had something working 6436 11:29:21,920 --> 11:29:29,120 in a previous section, so 01, PyTorch, workflow fundamentals, we built a model here that worked. 6437 11:29:29,680 --> 11:29:36,400 And if we go right down, we know that this linear model can fit a straight line. So we're going 6438 11:29:36,400 --> 11:29:40,640 to replicate a data set to fit a straight line to see if the model that we're building here 6439 11:29:40,640 --> 11:29:46,560 can learn anything at all, because right now it seems like it can't. It's just tossing a coin 6440 11:29:46,560 --> 11:29:53,920 displayed between our data here, which is not ideal. So let's make some data. 
But yeah, this is the, 6441 11:29:54,480 --> 11:30:00,000 let's create a smaller problem, one that we know that works, and then add more complexity to try 6442 11:30:00,000 --> 11:30:05,120 and solve our larger problem. So create some data. This is going to be the same as notebook 01. 6443 11:30:05,120 --> 11:30:13,040 And I'm going to set up weight equals 0.7 bias equals 0.3. We're going to move quite quickly 6444 11:30:13,040 --> 11:30:19,840 through this because we've seen this in module one, but the overall takeaway from this is we're 6445 11:30:19,840 --> 11:30:25,440 going to see if our model works on any kind of problem at all, or do we have something fundamentally 6446 11:30:25,440 --> 11:30:33,600 wrong, create data. We're going to call it x regression, because it's a straight line, and we 6447 11:30:33,600 --> 11:30:38,480 want it to predict a number rather than a class. So you might be thinking, oh, we might have to change 6448 11:30:38,480 --> 11:30:47,280 a few things of our model architecture. Well, we'll see that in a second dot unsqueeze. And we're 6449 11:30:47,280 --> 11:30:53,840 going to go on the first dimension here or dim equals one. And why regression, we're going to use 6450 11:30:53,840 --> 11:31:00,160 the linear regression formula as well, wait times x, x regression, that is, because we're working 6451 11:31:00,160 --> 11:31:29,160 with a new data set here, plus the bias. So this is linear regression formula. Without epsilon. So it's a simplified version of linear regression, but the same formula that we've seen in a previous section. So now let's check the data. Nothing we really haven't covered here, but we're going to do a sanity check on it to make sure that we're dealing with what we're dealing with. 6452 11:31:29,160 --> 11:31:58,160 What we're dealing with is not just a load of garbage. Because it's all about the data and machine learning. I can't stress to you enough. That's the data explorer's motto is to visualize, visualize, visualize. Oh, what did we get wrong here? Unsqueeze. Did you notice that typo? Why didn't you say something? I'm kidding. There we go. Okay, so we've got 100 samples of x. We've got a different step size here, but that's all right. Let's have a little bit of fun with this. And we've got one x-value, which is, you know, a little bit more. 6453 11:31:58,160 --> 11:32:14,160 One x value per y value is a very similar data set to what we use before. Now, what do we do once we have a data set? Well, if we haven't already got training and test splits, we better make them. So create train and test splits. 6454 11:32:14,160 --> 11:32:27,160 And then we're going to go train split. We're going to use 80% equals int 0.8 times the length of, or we could just put 100 in there. 6455 11:32:27,160 --> 11:32:40,160 But we're going to be specific here. And then we're going to go x train regression, y train regression equals. What are these equal? Well, we're going to go on x regression. 6456 11:32:40,160 --> 11:32:55,160 And we're going to index up to the train split on the x. And then for the y, y regression, we're going to index up to the train split. 6457 11:32:55,160 --> 11:33:09,160 Wonderful. And then we can do the same on the test or creating the test data. Nothing really new here that we need to discuss. We're creating training and test sets. What do they do for each of them? 6458 11:33:09,160 --> 11:33:21,160 Well, the model is going to hopefully learn patterns in the training data set that is able to model the testing data set. 
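As a rough sketch of the straight-line dataset and 80/20 split being created here (the exact `start`/`end`/`step` values for `torch.arange` are an assumption; the transcript only notes that there are 100 samples with a different step size than notebook 01):

```python
# Recreate a simple straight-line (linear regression) dataset, as in notebook 01
weight = 0.7
bias = 0.3

# start/end/step chosen here to give 100 samples; the notebook's exact step may differ
start, end, step = 0, 1, 0.01
X_regression = torch.arange(start, end, step).unsqueeze(dim=1)  # shape: [100, 1]
y_regression = weight * X_regression + bias                     # linear regression formula (no noise term)

# Create an 80/20 train/test split by slicing
train_split = int(0.8 * len(X_regression))
X_train_regression, y_train_regression = X_regression[:train_split], y_regression[:train_split]
X_test_regression, y_test_regression = X_regression[train_split:], y_regression[train_split:]

print(len(X_train_regression), len(X_test_regression))  # 80 20
```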
And we're going to see that in a second. 6459 11:33:21,160 --> 11:33:37,160 So if we check the length of each, what do we have? Length x train regression. We might just check x train x test regression. What do we have here? 6460 11:33:37,160 --> 11:33:52,160 And then we're going to go length y train regression. Long variable names here. Excuse me for that. But we want to keep it separate from our already existing x and y data. What values do we have here? 6461 11:33:52,160 --> 11:34:12,160 80, 20, 80, 20, beautiful. So 80 training samples to 100 testing samples. That should be enough. Now, because we've got our helper functions file here. And if you don't have this, remember, we wrote some code up here before to where is it? 6462 11:34:12,160 --> 11:34:30,160 To download it from the course GitHub, and we imported plot predictions from it. Now, if we have a look at helper functions.py, it contains the plot predictions function that we created in the last section, section 0.1. There we go. Plot predictions. 6463 11:34:30,160 --> 11:34:41,160 So we're just running this exact same function here, or we're about to run it. It's going to save us from re typing out all of this. That's the beauty of having a helper functions.py file. 6464 11:34:41,160 --> 11:34:52,160 So if we come down here, let's plot our data to visually inspected. Right now, it's just numbers on a page. And we're not going to plot really any predictions because we don't have any predictions yet. 6465 11:34:52,160 --> 11:35:06,160 But we'll pass in the train data is equal to X train regression. And then the next one is the train labels, which is equal to Y train regression. 6466 11:35:06,160 --> 11:35:27,160 And then we have the test data, which is equal to X test regression. And then we have the test labels. Now, I think this should be labels too. Yeah, there we go. Y test progression might be proven wrong as we try to run this function. 6467 11:35:27,160 --> 11:35:42,160 Okay, there we go. So we have some training data and we have some testing data. Now, do you think that our model model one, we have a look what's model one could fit this data. 6468 11:35:42,160 --> 11:35:53,160 Does it have the right amount of in and out features? We may have to adjust these slightly. So I'd like you to think about that. Do we have to change the input features to our model for this data set? 6469 11:35:53,160 --> 11:36:00,160 And do we have to change the out features of our model for this data set? We'll find out in the next video. 6470 11:36:00,160 --> 11:36:16,160 Welcome back. We're currently working through a little side project here, but really the philosophy of what we're doing. We just created a straight line data set because we know that we've built a model in the past back in section 01 to fit a straight line data set. 6471 11:36:16,160 --> 11:36:26,160 And why are we doing this? Well, because the model that we've built so far is not fitting or not working on our circular data set here on our classification data set. 6472 11:36:26,160 --> 11:36:38,160 And so one way to troubleshoot a larger problem is to test out a smaller problem first. So later on, if you're working with a big machine learning data set, you'd probably start with a smaller portion of that data set first. 6473 11:36:38,160 --> 11:36:46,160 Likewise, with a larger machine learning model, instead of starting with a huge model, you'll start with a small model. 
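For reference, the plotting call described a moment ago looks roughly like this, using the `plot_predictions` helper imported from `helper_functions.py`; the parameter names follow the narration, and no predictions are passed yet, so only the training and test data appear:

```python
# Visualize the regression splits (no predictions yet)
plot_predictions(train_data=X_train_regression,
                 train_labels=y_train_regression,
                 test_data=X_test_regression,
                 test_labels=y_test_regression)
```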
6474 11:36:46,160 --> 11:36:55,160 So we're taking a step back here to see if our model is going to learn anything at all on a straight line data set so that we can improve it for a non-straight line data set. 6475 11:36:55,160 --> 11:37:07,160 And there's another hint. Oh, we're going to cover it in a second. I promise you. But let's see how now we can adjust model one to fit a straight line. 6476 11:37:07,160 --> 11:37:16,160 And I should do the question at the end of last video. Do we have to adjust the parameters of model one in any way shape or form to fit this straight line data? 6477 11:37:16,160 --> 11:37:26,160 And you may have realized or you may not have that our model one is set up for our classification data, which has two X input features. 6478 11:37:26,160 --> 11:37:37,160 Whereas this data, if we go X train regression, how many input features do we have? We just get the first sample. 6479 11:37:37,160 --> 11:37:52,160 There's only one value. Or maybe we get the first 10. There's only one value per, let's remind ourselves, this is input and output shapes, one of the most fundamental things in machine learning and deep learning. 6480 11:37:52,160 --> 11:38:01,160 And trust me, I still get this wrong all the time. So that's why I'm harping on about it. We have one feature per one label. So we have to adjust our model slightly. 6481 11:38:01,160 --> 11:38:08,160 We have to change the end features to be one instead of two. The out features can stay the same because we want one number to come out. 6482 11:38:08,160 --> 11:38:23,160 So what we're going to do is code up a little bit different version of model one. So same architecture as model one. But using NN dot sequential, we're going to do the faster way of coding a model here. 6483 11:38:23,160 --> 11:38:30,160 Let's create model two and NN dot sequential. The only thing that's going to change is the number of input features. 6484 11:38:30,160 --> 11:38:42,160 So this will be the exact same code as model one. And the only difference, as I said, will be features or in features is one. And then we'll go out features equals 10. 6485 11:38:42,160 --> 11:38:51,160 So 10 hidden units in the first layer. And of course, the second layer, the number of features here has to line up with the out features of the previous layer. 6486 11:38:51,160 --> 11:39:01,160 This one's going to output 10 features as well. So we're scaling things up from one feature to 10 to try and give our model as much of a chance or as many parameters as possible. 6487 11:39:01,160 --> 11:39:08,160 Of course, we could make this number quite large. We could make it a thousand features if we want. But there is an upper bound on these things. 6488 11:39:08,160 --> 11:39:14,160 And I'm going to let you find those in your experience as a machine learning engineer and a data scientist. 6489 11:39:14,160 --> 11:39:23,160 But for now, we're keeping it nice and small. So we can run as many experiments as possible. Beautiful. Look at that. We've created a sequential model. What happens with NN dot sequential? 6490 11:39:23,160 --> 11:39:31,160 Data goes in here, passes through this layer. Then it passes through this layer. Then it passes through this layer. And what happens when it goes through the layer? 6491 11:39:31,160 --> 11:39:39,160 It triggers the layers forward method, the internal forward method. In the case of NN dot linear, we've seen it. It's got the linear regression formula. 
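A minimal sketch of the `nn.Sequential` model just described, assuming the `device` variable set up earlier; the only change from `model_1` is `in_features=1`, since the straight-line data has one X feature per y value:

```python
from torch import nn

# Same architecture as model_1, but built with nn.Sequential and in_features=1
model_2 = nn.Sequential(
    nn.Linear(in_features=1, out_features=10),
    nn.Linear(in_features=10, out_features=10),
    nn.Linear(in_features=10, out_features=1)
).to(device)
```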
6492 11:39:39,160 --> 11:39:50,160 So if we go NN dot linear, it performs this mathematical operation, the linear transformation. But we've seen that before. Let's keep pushing forward. 6493 11:39:50,160 --> 11:40:00,160 Let's create a loss and an optimizer loss and optimize. We're going to work through our workflow. So loss function, we have to adjust this slightly. 6494 11:40:00,160 --> 11:40:10,160 We're going to use the L1 loss because why we're dealing with a regression problem here rather than a classification problem. And our optimizer, what can we use for our optimizer? 6495 11:40:10,160 --> 11:40:21,160 How about we bring in just the exact same optimizer SGD that we've been using for our classification data. So model two dot params or parameters. 6496 11:40:21,160 --> 11:40:30,160 Always get a little bit confused. And we'll give it an LR of 0.1 because that's what we've been using so far. This is the params here. 6497 11:40:30,160 --> 11:40:38,160 So we want our optimizer to optimize our model two parameters here with a learning rate of 0.1. The learning rate is what? 6498 11:40:38,160 --> 11:40:47,160 The amount each parameter will be or the multiplier that will be applied to each parameter each epoch. 6499 11:40:47,160 --> 11:41:00,160 So now let's train the model. Do you think we could do that in this video? I think we can. So we might just train it on the training data set and then we can evaluate it on the test data set separately. 6500 11:41:00,160 --> 11:41:13,160 So we'll set up both manual seeds, CUDA and because we've set our model to the device up here. So it should be on the GPU or whatever device you have active. 6501 11:41:13,160 --> 11:41:21,160 So set the number of epochs. How many epochs should we set? Well, we set a thousand before, so we'll keep it at that. 6502 11:41:21,160 --> 11:41:30,160 epochs equals a thousand. And now we're getting really good at this sort of stuff here. Let's put our data. Put the data on the target device. 6503 11:41:30,160 --> 11:41:42,160 And I know we've done a lot of the similar steps before, but there's a reason for that. I've kept all these in here because I'd like you to buy the end of this course is to sort of know all of this stuff off by heart. 6504 11:41:42,160 --> 11:41:47,160 And even if you don't know it all off my heart, because trust me, I don't, you know where to look. 6505 11:41:47,160 --> 11:42:00,160 So X train regression, we're going to send this to device. And then we're going to go Y train regression, just a reminder or something to get you to think while we're writing this code. 6506 11:42:00,160 --> 11:42:09,160 What would happen if we didn't put our data on the same device as a model? We've seen that error come up before, but what happens? 6507 11:42:09,160 --> 11:42:16,160 Well, I've just kind of given away, haven't you Daniel? Well, that was a great question. Our code will air off. 6508 11:42:16,160 --> 11:42:22,160 Oh, well, don't worry. There's plenty of questions I've been giving you that I haven't given the answer to yet. 6509 11:42:22,160 --> 11:42:30,160 Device a beautiful. We've got a device agnostic code for the model and for the data. And now let's loop through epochs. 6510 11:42:30,160 --> 11:42:39,160 So train. We're going to for epoch in range epochs for an epoch in a range. Do the forward pass. 6511 11:42:39,160 --> 11:42:49,160 Calculate the loss. So Y pred equals model two. This is the forward pass. X train regression. 
6512 11:42:49,160 --> 11:42:58,160 It's all going to work out hunky Dory because our model and our data are on the same device loss equals what we're going to bring in our loss function. 6513 11:42:58,160 --> 11:43:09,160 Then we're going to compare the predictions to Y train regression to the Y labels. What do we do next? 6514 11:43:09,160 --> 11:43:16,160 Optimize a zero grad. Optimize a dot zero grad. We're doing all of this with our comments. Look at us go. 6515 11:43:16,160 --> 11:43:24,160 Loss backward and what's next? Optimize a step, step, step. And of course, we could do some testing here. 6516 11:43:24,160 --> 11:43:33,160 Testing. We'll go model two dot a vowel. And then we'll go with torch dot inference mode. 6517 11:43:33,160 --> 11:43:41,160 We'll do the forward pass. We'll create the test predictions equals model two dot X test regression. 6518 11:43:41,160 --> 11:43:51,160 And then we'll go the test loss equals loss FN on the test predictions and versus the Y test labels. 6519 11:43:51,160 --> 11:44:00,160 Beautiful. Look at that. We've just done an optimization loop, something we spent a whole hour on before, maybe even longer, in about ten lines of code. 6520 11:44:00,160 --> 11:44:05,160 And of course, we could shorten this by making these a function. But we're going to see that later on. 6521 11:44:05,160 --> 11:44:13,160 I'd rather us give a little bit of practice while this is still a bit fresh. Print out what's happening. 6522 11:44:13,160 --> 11:44:21,160 Let's print out what's happening. What should we do? So because we're training for a thousand epochs, I like the idea of printing out something every 100 epochs. 6523 11:44:21,160 --> 11:44:33,160 That should be about enough of a step. Epoch. What do we got? We'll put in the epoch here with the F string and then we'll go to loss, which will be loss. 6524 11:44:33,160 --> 11:44:42,160 And maybe we'll get the first five of those five decimal places that is. We don't have an accuracy, do we? 6525 11:44:42,160 --> 11:44:50,160 Because we're working with regression. And we'll get the test loss out here. And that's going to be.5F as well. 6526 11:44:50,160 --> 11:44:58,160 Beautiful. Have we got any mistakes? I don't think we do. We didn't even run this code cell before. We'll just run these three again, see if we got... 6527 11:44:58,160 --> 11:45:04,160 Look at that. Oh my goodness. Is our loss... Our loss is going down. 6528 11:45:04,160 --> 11:45:09,160 So that means our model must be learning something. 6529 11:45:09,160 --> 11:45:17,160 Now, what if we adjusted the learning rate here? I think if we went 0.01 or something, will that do anything? 6530 11:45:17,160 --> 11:45:25,160 Oh, yes. Look how low our loss gets on the test data set. But let's confirm that. We've got to make some predictions. 6531 11:45:25,160 --> 11:45:30,160 Well, maybe we should do that in the next video. Yeah, this one's getting too long. But how good's that? 6532 11:45:30,160 --> 11:45:37,160 We created a straight line data set and we've created a model to fit it. We set up a loss and an optimizer already. 6533 11:45:37,160 --> 11:45:43,160 And we put the data on the target device. We trained and we tested so our model must be learning something. 6534 11:45:43,160 --> 11:45:48,160 But I'd like you to give a shot at confirming that by using our plot predictions function. 6535 11:45:48,160 --> 11:45:58,160 So make some predictions with our trained model. Don't forget to turn on inference mode. 
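Here's a compact sketch of the loss, optimizer and loop just written for the regression problem, assuming `model_2` and the regression tensors above; the learning rate is shown as 0.01, the value that gave the lower test loss at the end of the video:

```python
# Loss and optimizer for the regression problem (L1/MAE loss rather than BCE)
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.01)

torch.manual_seed(42)
epochs = 1000

# Put the data on the same device as the model
X_train_regression, y_train_regression = X_train_regression.to(device), y_train_regression.to(device)
X_test_regression, y_test_regression = X_test_regression.to(device), y_test_regression.to(device)

for epoch in range(epochs):
    ### Training
    model_2.train()
    y_pred = model_2(X_train_regression)          # 1. Forward pass
    loss = loss_fn(y_pred, y_train_regression)    # 2. Calculate the loss
    optimizer.zero_grad()                         # 3. Zero the gradients
    loss.backward()                               # 4. Backpropagation
    optimizer.step()                              # 5. Step the optimizer

    ### Testing
    model_2.eval()
    with torch.inference_mode():
        test_pred = model_2(X_test_regression)
        test_loss = loss_fn(test_pred, y_test_regression)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.5f} | Test loss: {test_loss:.5f}")
```

For the challenge, keep in mind that `plot_predictions` goes through matplotlib and NumPy, which run on the CPU, so GPU tensors will need a `.cpu()` call before plotting, as comes up shortly.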
And we should see some red dots here fairly close to the green dots on the next plot. 6536 11:45:58,160 --> 11:46:02,160 Give that a shot and I'll see you in the next video. 6537 11:46:02,160 --> 11:46:11,160 Welcome back. In the last video, we did something very exciting. We solved a smaller problem that's giving us a hint towards our larger problem. 6538 11:46:11,160 --> 11:46:17,160 So we know that the model that we've previously been building, model two, has the capacity to learn something. 6539 11:46:17,160 --> 11:46:25,160 Now, how did we know that? Well, it's because we created this straight line data set. We replicated the architecture that we used for model one. 6540 11:46:25,160 --> 11:46:35,160 Recall that model one didn't work very well on our classification data. But with a little bit of an adjustment such as changing the number of in features. 6541 11:46:35,160 --> 11:46:44,160 And not too much different training code except for a different loss function because, well, we use MAE loss with regression data. 6542 11:46:44,160 --> 11:46:50,160 And we changed the learning rate slightly because we found that maybe our model could learn a bit better. 6543 11:46:50,160 --> 11:46:59,160 And again, I'd encourage you to play around with different values of the learning rate. In fact, anything that we've changed, try and change it yourself and just see what happens. 6544 11:46:59,160 --> 11:47:04,160 That's one of the best ways to learn what goes on with machine learning models. 6545 11:47:04,160 --> 11:47:10,160 But we trained for the same number of epochs. We set up device agnostic code. We did a training and testing loop. 6546 11:47:10,160 --> 11:47:15,160 Look at this looks. Oh, my goodness. Well done. And our loss went down. 6547 11:47:15,160 --> 11:47:23,160 So, hmm. What does that tell us? Well, it tells us that model two or the specific architecture has some capacity to learn something. 6548 11:47:23,160 --> 11:47:28,160 So we must be missing something. And we're going to get to that in a minute, I promise you. 6549 11:47:28,160 --> 11:47:35,160 But we're just going to confirm that our model has learned something and it's not just numbers on a page going down by getting visual. 6550 11:47:35,160 --> 11:47:43,160 So turn on. We're going to make some predictions and plot them. And you may have already done this because I issued that challenge at the last of at the end of the last video. 6551 11:47:43,160 --> 11:47:53,160 So turn on evaluation mode. Let's go model two dot eval. And let's make predictions, which are also known as inference. 6552 11:47:53,160 --> 11:48:02,160 And we're going to go with torch dot inference mode inference mode with torch dot inference mode. 6553 11:48:02,160 --> 11:48:10,160 Make some predictions. We're going to save them as why preds and we're going to use model two and we're going to pass it through ex test regression. 6554 11:48:10,160 --> 11:48:16,160 This should all work because we've set up device agnostic code, plot data and predictions. 6555 11:48:16,160 --> 11:48:23,160 To do this, we can of course use our plot predictions function that we imported via our helper functions dot pi function. 6556 11:48:23,160 --> 11:48:27,160 The code for that is just a few cells above if you'd like to check that out. 6557 11:48:27,160 --> 11:48:33,160 But let's set up the train data here. Train data parameter, which is x train regression. 6558 11:48:33,160 --> 11:48:48,160 And my goodness. Google collab. I'm already typing fast enough. 
You don't have to slow me down by giving me the wrong autocorrects. Train labels equals y train regression. 6559 11:46:48,160 --> 11:46:53,160 And then we're going to pass in our test data equals X test regression. 6560 11:46:53,160 --> 11:47:03,160 And then we're going to pass in test labels, which is y test regression. Got too many variables going on here. My goodness gracious. 6561 11:47:03,160 --> 11:47:08,160 We could have done better with naming, but this will do for now. And predictions is y preds. 6562 11:47:08,160 --> 11:47:13,160 And then if we plot this, what does it look like? Oh, no, we got an error. 6563 11:47:13,160 --> 11:47:23,160 Now secretly, I kind of knew that that was coming ahead of time. That's the advantage of being the host of this machine learning cooking show. So type error. How do we fix this? 6564 11:47:23,160 --> 11:47:34,160 Remember how I asked you in one of the last videos what would happen if our data wasn't on the same device as our model? Well, we get an error, right? But this is a little bit different as well. 6565 11:47:34,160 --> 11:47:45,160 We've seen this one before. We've got can't convert CUDA device type tensor to NumPy. Where is this coming from? Well, because our plot predictions function uses matplotlib. 6566 11:47:45,160 --> 11:47:55,160 And behind the scenes, matplotlib references NumPy, which is another numerical computing library. However, NumPy uses the CPU rather than the GPU. 6567 11:47:55,160 --> 11:48:11,160 So we have to call dot CPU, as this helpful message is telling us: call tensor dot CPU before we use our tensors with NumPy. So let's just call dot CPU on all of our tensor inputs here and see if this solves our problem. 6568 11:48:11,160 --> 11:48:22,160 Wonderful. Looks like it does. Oh my goodness. Look at those red dots, so close. Well, okay. So this just confirms our suspicions. What we kind of already knew is that our model did have some capacity to learn. 6569 11:48:22,160 --> 11:48:34,160 It's just the data set; when we changed the data set, it worked. So, hmm. Is it our data that our model can't learn on, like this circular data, or is it the model itself? 6570 11:48:34,160 --> 11:48:44,160 Remember, our model is only comprised of linear functions. What is linear? Linear is a straight line. But is our data made of just straight lines? 6571 11:48:44,160 --> 11:48:58,160 I think it's got some nonlinearities in there. So the big secret I've been holding back will reveal itself starting from the next video. So if you want a head start on it, I'd go to torch dot nn. 6572 11:48:58,160 --> 11:49:15,160 And if we have a look at the documentation, we've been speaking a lot about linear functions. What are these nonlinear activations? And I'll give you another spoiler. We've actually seen one of these nonlinear activations throughout this notebook. 6573 11:49:15,160 --> 11:49:27,160 So go and check that out. See what you can infer from that. And I'll see you in the next video. Let's get started with nonlinearities. Welcome back. 6574 11:49:27,160 --> 11:49:38,160 In the last video, we saw that the model that we've been building has some potential to learn. I mean, look at these predictions. You could get a little bit better, of course, get the red dots on top of the green dots. 6575 11:49:38,160 --> 11:49:47,160 But we're just going to leave that; the trend is what we're after. Our model has some capacity to learn, except this is straight line data. 6576 11:49:47,160 --> 11:49:57,160 And we've been hinting at it a fair bit: we're using linear functions.
And if we look up linear data, what does it look like? 6577 11:51:57,160 --> 11:52:05,160 Well, it has a quite a straight line. If we go linear and just search linear, what does this give us? Linear means straight. There we go, straight. 6578 11:52:05,160 --> 11:52:14,160 And then what happens if we search for nonlinear? I kind of hinted at this as well. Nonlinear. Oh, we get some curves. We get curved lines. 6579 11:52:14,160 --> 11:52:20,160 So linear functions. Straight. Nonlinear functions. Hmm. 6580 11:52:20,160 --> 11:52:34,160 Now, this is one of the beautiful things about machine learning. And I'm not sure about you, but when I was in high school, I kind of learned a concept called line of best fit, or y equals mx plus c, or 6581 11:52:34,160 --> 11:52:41,160 y equals mx plus b. And it looks something like this. And then if you wanted to go over these, you use quadratic functions and a whole bunch of other stuff. 6582 11:52:41,160 --> 11:52:51,160 But one of the most fundamental things about machine learning is that we build neural networks and deep down neural networks are just a combination. 6583 11:52:51,160 --> 11:52:56,160 It could be a large combination of linear functions and nonlinear functions. 6584 11:52:56,160 --> 11:53:11,160 So that's why in torch.nn, we have nonlinear activations and we have all these other different types of layers. But essentially, what they're doing deep down is combining straight lines with, if we go back up to our data, non straight lines. 6585 11:53:11,160 --> 11:53:21,160 So, of course, our model didn't work before because we've only given it the power to use linear lines. We've only given it the power to use straight lines. 6586 11:53:21,160 --> 11:53:29,160 But our data is what? It's curved. Although it's simple, we need nonlinearity to be able to model this data set. 6587 11:53:29,160 --> 11:53:38,160 And now, let's say we were building a pizza detection model. So let's look up some images of pizza, one of my favorite foods, images. 6588 11:53:38,160 --> 11:53:43,160 Pizza, right? So could you model pizza with just straight lines? 6589 11:53:43,160 --> 11:53:53,160 You're thinking, Daniel, you can't be serious. A computer vision model doesn't look for just straight lines in this. And I'd argue that, yes, it does, except we also add some curved lines in here. 6590 11:53:53,160 --> 11:54:02,160 That's the beauty of machine learning. Could you imagine trying to write the rules of an algorithm to detect that this is a pizza? Maybe you could put in, oh, it's a curve here. 6591 11:54:02,160 --> 11:54:14,160 And if you see red, no, no, no, no. Imagine if you're trying to do a hundred different foods. Your program would get really large. Instead, we give our machine learning models, if we come down to the model that we created. 6592 11:54:14,160 --> 11:54:22,160 We give our deep learning models the capacity to use linear and nonlinear functions. We haven't seen any nonlinear layers just yet. 6593 11:54:22,160 --> 11:54:27,160 Or maybe we've hinted at some, but that's all right. So we stack these on top of each other, these layers. 6594 11:54:27,160 --> 11:54:38,160 And then the model figures out what patterns in the data it should use, what lines it should draw to draw patterns to not only pizza, but another food such as sushi. 6595 11:54:38,160 --> 11:54:50,160 If we wanted to build a food image classification model, it would do this. The principle remains the same. 
So the question I'm going to pose to you, we'll get out of this, is, we'll come down here. 6596 11:54:50,160 --> 11:54:58,160 We've unlocked the missing piece or about to. We're going to cover it over the next couple of videos, the missing piece of our model. 6597 11:54:58,160 --> 11:55:05,160 And this is a big one. This is going to follow you out throughout all of machine learning and deep learning, nonlinearity. 6598 11:55:05,160 --> 11:55:25,160 So the question here is, what patterns could you draw if you were given an infinite amount of straight and non straight lines? 6599 11:55:25,160 --> 11:55:39,160 Or in machine learning terms, an infinite amount, but really it is finite. By infinite in machine learning terms, this is a technicality. 6600 11:55:39,160 --> 11:55:45,160 It could be a million parameters. It could be as we've got probably a hundred parameters in our model. 6601 11:55:45,160 --> 11:55:56,160 So just imagine a large amount of straight and non straight lines, an infinite amount of linear and nonlinear functions. 6602 11:55:56,160 --> 11:56:10,160 You could draw some pretty intricate patterns, couldn't you? And that's what gives machine learning and especially neural networks the capacity to not only fit a straight line here, but to separate two different circles. 6603 11:56:10,160 --> 11:56:19,160 But also to do crazy things like drive a self-driving car, or at least power the vision system of a self-driving car. 6604 11:56:19,160 --> 11:56:24,160 Of course, after that, you need some programming to plan what to actually do with what you see in an image. 6605 11:56:24,160 --> 11:56:29,160 But we're getting ahead of ourselves here. Let's now start diving into nonlinearity. 6606 11:56:29,160 --> 11:56:35,160 And the whole idea here is combining the power of linear and nonlinear functions. 6607 11:56:35,160 --> 11:56:44,160 Straight lines and non straight lines. Our classification data is not comprised of just straight lines. It's circles, so we need nonlinearity here. 6608 11:56:44,160 --> 11:56:54,160 So recreating nonlinear data, red and blue circles. We don't need to recreate this, but we're going to do it anyway for completeness. 6609 11:56:54,160 --> 11:57:01,160 So let's get a little bit of a practice. Make and plot data. This is so that you can practice the use of nonlinearity on your own. 6610 11:57:01,160 --> 11:57:09,160 And that plot little bit dot pie plot as PLT. We're going to go a bit faster here because we've covered this code above. 6611 11:57:09,160 --> 11:57:15,160 So import make circles. We're just going to recreate the exact same circle data set that we've created above. 6612 11:57:15,160 --> 11:57:21,160 Number of samples. We'll create a thousand. And we're going to create x and y equals what? 6613 11:57:21,160 --> 11:57:25,160 Make circles. Pass it in number of samples. Beautiful. 6614 11:57:25,160 --> 11:57:33,160 Colab, please. I wonder if I can turn off autocorrect and colab. I'm happy to just see all of my errors in the flesh. See? Look at that. I don't want that. 6615 11:57:33,160 --> 11:57:40,160 I want noise like that. Maybe I'll do that in the next video. We're not going to spend time here looking around how to do it. 6616 11:57:40,160 --> 11:57:46,160 We can work that out on the fly later. For now, I'm too excited to share with you the power of nonlinearity. 6617 11:57:46,160 --> 11:57:55,160 So here, x, we're just going to plot what's going on. 
We've got two x features and we're going to color it with the flavor of y because we're doing a binary classification. 6618 11:57:55,160 --> 11:58:08,160 And we're going to use one of my favorite C maps, which is color map. And we're going to go PLT dot CM for C map and red blue. 6619 11:58:08,160 --> 11:58:10,160 What do we get? 6620 11:58:10,160 --> 11:58:19,160 Okay, red circle, blue circle. Hey, is it the same color as what's above? I like this color better. 6621 11:58:19,160 --> 11:58:23,160 Did we get that right up here? 6622 11:58:23,160 --> 11:58:28,160 Oh, my goodness. Look how much code we've written. Yeah, I like the other blue. I'm going to bring this down here. 6623 11:58:28,160 --> 11:58:37,160 It's all about aesthetics and machine learning. It's not just numbers on a page, don't you? How could you be so crass? Let's go there. 6624 11:58:37,160 --> 11:58:43,160 Okay, that's better color red and blue. That's small lively, isn't it? So now let's convert to train and test. 6625 11:58:43,160 --> 11:58:47,160 And then we can start to build a model with nonlinearity. Oh, this is so good. 6626 11:58:47,160 --> 11:58:58,160 Okay, convert data to tenses and then to train and test splits. Nothing we haven't covered here before. 6627 11:58:58,160 --> 11:59:08,160 So import torch, but it never hurts to practice code, right? Import torch from sklearn dot model selection. 6628 11:59:08,160 --> 11:59:18,160 Import train test split so that we can split our red and blue dots randomly. And we're going to turn data into tenses. 6629 11:59:18,160 --> 11:59:28,160 And we'll go X equals torch from NumPy and we'll pass in X here. And then we'll change it into type torch dot float. 6630 11:59:28,160 --> 11:59:33,160 Why do we do this? Well, because, oh, my goodness, autocorrect. It's getting the best of me here. 6631 11:59:33,160 --> 11:59:38,160 You know, watching me live code this stuff and battle with autocorrect. That's what this whole course is. 6632 11:59:38,160 --> 11:59:44,160 And we're really teaching pie torch. Am I just battling with Google collab's autocorrect? 6633 11:59:44,160 --> 11:59:52,160 We are turning it into torch dot float with a type here because why NumPy's default, which is what makes circles users behind the scenes. 6634 11:59:52,160 --> 11:59:59,160 NumPy is actually using a lot of other machine learning libraries, pandas, built on NumPy, scikit learn, does a lot of NumPy. 6635 11:59:59,160 --> 12:00:06,160 Matplotlib, NumPy. That's just showing there. What's the word? Is it ubiquitous, ubiquity? I'm not sure, maybe. 6636 12:00:06,160 --> 12:00:10,160 If not, you can correct me. The ubiquity of NumPy. 6637 12:00:10,160 --> 12:00:17,160 And test sets, but we're using pie torch to leverage the power of autograd, which is what powers our gradient descent. 6638 12:00:17,160 --> 12:00:21,160 And the fact that it can use GPUs. 6639 12:00:21,160 --> 12:00:30,160 So we're creating training test splits here with train test split X Y. 6640 12:00:30,160 --> 12:00:36,160 And we're going to go test size equals 0.2. And we're going to set random. 6641 12:00:36,160 --> 12:00:46,160 Random state equals 42. And then we'll view our first five samples. Are these going to be? 6642 12:00:46,160 --> 12:00:52,160 Tenses. Fingers crossed. We haven't got an error. Beautiful. We have tenses here. 6643 12:00:52,160 --> 12:00:56,160 Okay. Now we're up to the exciting part. We've got our data set back. 6644 12:00:56,160 --> 12:00:59,160 I think it's time to build a model with nonlinearity. 
6645 12:00:59,160 --> 12:01:06,160 So if you'd like to peek ahead, check out TorchNN again. This is a little bit of a spoiler. 6646 12:01:06,160 --> 12:01:12,160 Go into the nonlinear activation. See if you can find the one that we've already used. That's your challenge. 6647 12:01:12,160 --> 12:01:21,160 Can you find the one we've already used? And go into here and search what is our nonlinear function. 6648 12:01:21,160 --> 12:01:28,160 So give that a go and see what comes up. I'll see you in the next video. 6649 12:01:28,160 --> 12:01:34,160 Welcome back. Now put your hand up if you're ready to learn about nonlinearity. 6650 12:01:34,160 --> 12:01:38,160 And I know I can't see your hands up, but I better see some hands up or I better feel some hands up 6651 12:01:38,160 --> 12:01:44,160 because my hands up because nonlinearity is a magic piece of the puzzle that we're about to learn about. 6652 12:01:44,160 --> 12:01:50,160 So let's title this section building a model with nonlinearity. 6653 12:01:50,160 --> 12:02:03,160 So just to re-emphasize linear equals straight lines and in turn nonlinear equals non-straight lines. 6654 12:02:03,160 --> 12:02:09,160 And I left off the end of the last video, giving you the challenge of checking out the TorchNN module, 6655 12:02:09,160 --> 12:02:13,160 looking for the nonlinear function that we've already used. 6656 12:02:13,160 --> 12:02:19,160 Now where would you go to find such a thing and oh, what do we have here? Nonlinear activations. 6657 12:02:19,160 --> 12:02:27,160 And there's going to be a fair few things here, but essentially all of the modules within TorchNN 6658 12:02:27,160 --> 12:02:31,160 are either some form of layer in a neural network if we recall. 6659 12:02:31,160 --> 12:02:35,160 Let's go to a neural network. We've seen the anatomy of a neural network. 6660 12:02:35,160 --> 12:02:41,160 Generally you'll have an input layer and then multiple hidden layers and some form of output layer. 6661 12:02:41,160 --> 12:02:49,160 Well, these multiple hidden layers can be almost any combination of what's in TorchNN. 6662 12:02:49,160 --> 12:02:53,160 And in fact, they can almost be any combination of function you could imagine. 6663 12:02:53,160 --> 12:02:57,160 Whether they work or not is another question. 6664 12:02:57,160 --> 12:03:02,160 But PyTorch implements some of the most common layers that you would have as hidden layers. 6665 12:03:02,160 --> 12:03:07,160 And they might be pooling layers, padding layers, activation functions. 6666 12:03:07,160 --> 12:03:14,160 And they all have the same premise. They perform some sort of mathematical operation on an input. 6667 12:03:14,160 --> 12:03:22,160 And so if we look into the nonlinear activation functions, you might have find an n dot sigmoid. 6668 12:03:22,160 --> 12:03:27,160 Where have we used this before? There's a sigmoid activation function in math terminology. 6669 12:03:27,160 --> 12:03:31,160 It takes some input x, performs this operation on it. 6670 12:03:31,160 --> 12:03:36,160 And here's what it looks like if we did it on a straight line, but I think we should put this in practice. 6671 12:03:36,160 --> 12:03:39,160 And if you want an example, well, there's an example there. 6672 12:03:39,160 --> 12:03:43,160 All of the other nonlinear activations have examples as well. 6673 12:03:43,160 --> 12:03:46,160 But I'll let you go through all of these in your own time. 6674 12:03:46,160 --> 12:03:48,160 Otherwise we're going to be here forever. 
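The sigmoid activation mentioned here applies sigmoid(x) = 1 / (1 + exp(-x)) element-wise, squashing any real number into the range (0, 1). A tiny demo:

```python
import torch
from torch import nn

x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.sigmoid(x))   # tensor([0.1192, 0.5000, 0.8808])
print(nn.Sigmoid()(x))    # the nn.Sigmoid module gives the same result
```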
6675 12:03:48,160 --> 12:03:50,160 And then dot relu is another common function. 6676 12:03:50,160 --> 12:03:54,160 We saw that when we looked at the architecture of a classification network. 6677 12:03:54,160 --> 12:04:02,160 So with that being said, how about we start to code a classification model with nonlinearity. 6678 12:04:02,160 --> 12:04:08,160 And of course, if you wanted to, you could look up what is a nonlinear function. 6679 12:04:08,160 --> 12:04:12,160 If you wanted to learn more, nonlinear means the graph is not a straight line. 6680 12:04:12,160 --> 12:04:16,160 Oh, beautiful. So that's how I'd learn about nonlinear functions. 6681 12:04:16,160 --> 12:04:20,160 But while we're here together, how about we write some code. 6682 12:04:20,160 --> 12:04:28,160 So let's go build a model with nonlinear activation functions. 6683 12:04:28,160 --> 12:04:33,160 And just one more thing before, just to re-emphasize what we're doing here. 6684 12:04:33,160 --> 12:04:38,160 Before we write this code, I've got, I just remembered, I've got a nice slide, 6685 12:04:38,160 --> 12:04:43,160 which is the question we posed in the previous video, the missing piece, nonlinearity. 6686 12:04:43,160 --> 12:04:50,160 But the question I want you to think about is what could you draw if you had an unlimited amount of straight, 6687 12:04:50,160 --> 12:04:54,160 in other words, linear, and non-straight, nonlinear line. 6688 12:04:54,160 --> 12:05:02,160 So we've seen previously that we can build a model, a linear model to fit some data that's in a straight line, linear data. 6689 12:05:02,160 --> 12:05:09,160 But when we're working with nonlinear data, well, we need the power of nonlinear functions. 6690 12:05:09,160 --> 12:05:14,160 So this is circular data. And now, this is only a 2D plot, keep in mind there. 6691 12:05:14,160 --> 12:05:19,160 Whereas neural networks and machine learning models can work with numbers that are in hundreds of dimensions, 6692 12:05:19,160 --> 12:05:25,160 impossible for us humans to visualize, but since computers love numbers, it's a piece of cake to them. 6693 12:05:25,160 --> 12:05:32,160 So from torch, import, and then we're going to create our first neural network with nonlinear activations. 6694 12:05:32,160 --> 12:05:36,160 This is so exciting. So let's create a class here. 6695 12:05:36,160 --> 12:05:42,160 We'll create circle model. We've got circle model V1 already. We're going to create circle model V2. 6696 12:05:42,160 --> 12:05:50,160 And we'll inherit from an end dot module. And then we'll write the constructor, which is the init function, 6697 12:05:50,160 --> 12:05:57,160 and we'll pass in self here. And then we'll go self, or super sorry, too many S words. 6698 12:05:57,160 --> 12:06:04,160 Dot underscore underscore init, underscore. There we go. So we've got the constructor here. 6699 12:06:04,160 --> 12:06:13,160 And now let's create a layer one, self dot layer one equals just the same as what we've used before. And then dot linear. 6700 12:06:13,160 --> 12:06:20,160 We're going to create this quite similar to the model that we've built before, except with one added feature. 6701 12:06:20,160 --> 12:06:26,160 And we're going to create in features, which is akin to the number of X features that we have here. 6702 12:06:26,160 --> 12:06:31,160 Again, if this was different, if we had three X features, we might change this to three. 6703 12:06:31,160 --> 12:06:38,160 But because we're working with two, we'll leave it as that. 
We'll keep out features as 10, so that we have 10 hidden units. 6704 12:06:38,160 --> 12:06:47,160 And then we'll go layer two, and then dot linear. Again, these values here are very customizable because why, because they're hyper parameters. 6705 12:06:47,160 --> 12:06:53,160 So let's line up the out features of layer two, and we'll do the same with layer three. 6706 12:06:53,160 --> 12:06:59,160 Because layer three is going to take the outputs of layer two. So it needs in features of 10. 6707 12:06:59,160 --> 12:07:06,160 And we want layer three to be the output layer, and we want one number as output, so we'll set one here. 6708 12:07:06,160 --> 12:07:13,160 Now, here's the fun part. We're going to introduce a nonlinear function. We're going to introduce the relu function. 6709 12:07:13,160 --> 12:07:18,160 Now, we've seen sigmoid. Relu is another very common one. It's actually quite simple. 6710 12:07:18,160 --> 12:07:22,160 But let's write it out first, and then dot relu. 6711 12:07:22,160 --> 12:07:32,160 So remember, torch dot nn stores a lot of existing nonlinear activation functions, so that we don't necessarily have to code them ourselves. 6712 12:07:32,160 --> 12:07:37,160 However, if we did want to code a relu function, let me show you. It's actually quite simple. 6713 12:07:37,160 --> 12:07:48,160 If we dive into nn dot relu, or relu, however you want to say it, I usually say relu, applies the rectified linear unit function element wise. 6714 12:07:48,160 --> 12:07:52,160 So that means element wise on every element in our input tensor. 6715 12:07:52,160 --> 12:07:58,160 And so it stands for rectified linear unit, and here's what it does. Basically, it takes an input. 6716 12:07:58,160 --> 12:08:06,160 If the input is negative, it turns the input to zero, and it leaves the positive inputs how they are. 6717 12:08:06,160 --> 12:08:08,160 And so this line is not straight. 6718 12:08:08,160 --> 12:08:14,160 Now, you could argue, yeah, well, it's straight here and then straight there, but this is a form of a nonlinear activation function. 6719 12:08:14,160 --> 12:08:19,160 So it goes boom, if it was linear, it would just stay straight there like that. 6720 12:08:19,160 --> 12:08:23,160 But let's see it in practice. Do you think this is going to improve our model? 6721 12:08:23,160 --> 12:08:28,160 Well, let's find out together, hey, forward, we need to implement the forward method. 6722 12:08:28,160 --> 12:08:38,160 And here's what we're going to do. Where should we put our nonlinear activation functions? 6723 12:08:38,160 --> 12:08:48,160 So I'm just going to put a node here. Relu is a nonlinear activation function. 6724 12:08:48,160 --> 12:08:55,160 And remember, wherever I say function, it's just performing some sort of operation on a numerical input. 6725 12:08:55,160 --> 12:09:01,160 So we're going to put a nonlinear activation function in between each of our layers. 6726 12:09:01,160 --> 12:09:05,160 So let me show you what this looks like, self dot layer three. 6727 12:09:05,160 --> 12:09:12,160 We're going to start from the outside in self dot relu, and then we're going to go self dot layer two. 6728 12:09:12,160 --> 12:09:16,160 And then we're going to go self dot relu. 6729 12:09:16,160 --> 12:09:21,160 And then there's a fair bit going on here, but nothing we can't handle layer one. And then here's the X. 6730 12:09:21,160 --> 12:09:30,160 So what happens is our data goes into layer one, performs a linear operation with an end up linear. 
6731 12:09:30,160 --> 12:09:34,160 Then we pass the output of layer one to a relu function. 6732 12:09:34,160 --> 12:09:42,160 So we, where's relu up here, we turn all of the negative outputs of our model of our of layer one to zero, 6733 12:09:42,160 --> 12:09:45,160 and we keep the positives how they are. 6734 12:09:45,160 --> 12:09:48,160 And then we do the same here with layer two. 6735 12:09:48,160 --> 12:09:53,160 And then finally, the outputs of layer three stay as they are. We've got out features there. 6736 12:09:53,160 --> 12:09:58,160 We don't have a relu on the end here, because we're going to pass the outputs to the sigmoid function later on. 6737 12:09:58,160 --> 12:10:04,160 And if we really wanted to, we could put self dot sigmoid here equals an end dot sigmoid. 6738 12:10:04,160 --> 12:10:08,160 But I'm going to, that's just one way of constructing it. 6739 12:10:08,160 --> 12:10:16,160 We're just going to apply the sigmoid function to the logits of our model, because what are the logits, the raw output of our model. 6740 12:10:16,160 --> 12:10:22,160 And so let's instantiate our model. This is going to be called model three, which is a little bit confusing, but we're up to model three, 6741 12:10:22,160 --> 12:10:28,160 which is circle model V two, and we're going to send that to the target device. 6742 12:10:28,160 --> 12:10:32,160 And then let's check model three. What does this look like? 6743 12:10:35,160 --> 12:10:45,160 Wonderful. So it doesn't actually show us where the relu's appear, but it just shows us what are the parameters of our circle model V two. 6744 12:10:45,160 --> 12:10:56,160 Now, I'd like you to have a think about this. And my challenge to you is to go ahead and see if this model is capable of working on our data, on our circular data. 6745 12:10:56,160 --> 12:11:01,160 So we've got the data sets ready. You need to set up some training code. 6746 12:11:01,160 --> 12:11:06,160 My challenge to you is write that training code and see if this model works. 6747 12:11:06,160 --> 12:11:09,160 But we're going to go through that over the next few videos. 6748 12:11:09,160 --> 12:11:17,160 And also, my other challenge to you is to go to the TensorFlow Playground and recreate our neural network here. 6749 12:11:17,160 --> 12:11:23,160 You can have two hidden layers. Does this go to 10? Well, it only goes to eight. We'll keep this at five. 6750 12:11:23,160 --> 12:11:29,160 So build something like this. So we've got two layers with five. It's a little bit different to ours because we've got two layers with 10. 6751 12:11:29,160 --> 12:11:36,160 And then put the learning rate to 0.01. What do we have? 0.1 with stochastic gradient descent. 6752 12:11:36,160 --> 12:11:40,160 We've been using 0.1, so we'll leave that. So this is the TensorFlow Playground. 6753 12:11:40,160 --> 12:11:49,160 And then change the activation here. Instead of linear, which we've used before, change it to relu, which is what we're using. 6754 12:11:49,160 --> 12:11:54,160 And press play here and see what happens. I'll see you in the next video. 6755 12:11:54,160 --> 12:12:00,160 Welcome back. In the last video, I left off leaving the challenge of recreating this model here. 6756 12:12:00,160 --> 12:12:06,160 It's not too difficult to do. We've got two hidden layers and five neurons. We've got our data set, which looks kind of like ours. 6757 12:12:06,160 --> 12:12:11,160 But the main points here are have to learning rate of 0.1, which is what we've been using. 
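For reference, a sketch of the nonlinear model narrated above, with a ReLU between each pair of linear layers (assuming the `device` variable from earlier):

```python
from torch import nn

# Build a model with nonlinear activation functions
class CircleModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=10)
        self.layer_2 = nn.Linear(in_features=10, out_features=10)
        self.layer_3 = nn.Linear(in_features=10, out_features=1)
        self.relu = nn.ReLU()  # ReLU(x) = max(0, x), applied element-wise

    def forward(self, x):
        # layer_1 -> ReLU -> layer_2 -> ReLU -> layer_3
        # (no ReLU on the output; sigmoid is applied to the logits later on)
        return self.layer_3(self.relu(self.layer_2(self.relu(self.layer_1(x)))))

model_3 = CircleModelV2().to(device)
```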
6758 12:12:11,160 --> 12:12:20,160 But to change it from, we've previously used a linear activation to change it from linear to relu, which is what we've got set up here in the code. 6759 12:12:20,160 --> 12:12:25,160 Now, remember, relu is a popular and effective nonlinear activation function. 6760 12:12:25,160 --> 12:12:31,160 And we've been discussing that we need nonlinearity to model nonlinear data. 6761 12:12:31,160 --> 12:12:36,160 And so that's the crux to what neural networks are. 6762 12:12:36,160 --> 12:12:40,160 Artificial neural networks, not to get confused with the brain neural networks, but who knows? 6763 12:12:40,160 --> 12:12:44,160 This might be how they work, too. I don't know. I'm not a neurosurgeon or a neuroscientist. 6764 12:12:44,160 --> 12:12:51,160 Artificial neural networks are a large combination of linear. 6765 12:12:51,160 --> 12:13:05,160 So this is straight and non-straight nonlinear functions, which are potentially able to find patterns in data. 6766 12:13:05,160 --> 12:13:09,160 And so for our data set, it's quite small. It's just a blue and a red circle. 6767 12:13:09,160 --> 12:13:17,160 But this same principle applies for larger data sets and larger models combined linear and nonlinear functions. 6768 12:13:17,160 --> 12:13:21,160 So we've got a few tabs going on here. Let's get rid of some. Let's come back to here. 6769 12:13:21,160 --> 12:13:24,160 Did you try this out? Does it work? Do you think it'll work? 6770 12:13:24,160 --> 12:13:29,160 I don't know. Let's find out together. Ready? Three, two, one. 6771 12:13:29,160 --> 12:13:31,160 Look at that. 6772 12:13:31,160 --> 12:13:37,160 Almost instantly the training loss goes down to zero and the test loss is basically zero as well. Look at that. 6773 12:13:37,160 --> 12:13:45,160 That's amazing. We can stop that there. And if we change the learning rate, maybe a little lower, let's see what happens. 6774 12:13:45,160 --> 12:13:49,160 It takes a little bit longer to get to where it wants to go to. 6775 12:13:49,160 --> 12:13:53,160 See, that's the power of changing the learning rate. Let's make it really small. What happens here? 6776 12:13:53,160 --> 12:13:57,160 So that was about 300 epochs. The loss started to go down. 6777 12:13:57,160 --> 12:14:01,160 If we change it to be really small, oh, we're getting a little bit of a trend. 6778 12:14:01,160 --> 12:14:06,160 Is it starting to go down? We're already surpassed the epochs that we had. 6779 12:14:06,160 --> 12:14:11,160 So see how the learning rate is much smaller? That means our model is learning much slower. 6780 12:14:11,160 --> 12:14:17,160 So this is just a beautiful visual way of demonstrating different values of the learning rate. 6781 12:14:17,160 --> 12:14:21,160 We could sit here all day and that might not get to lower, but let's increase it by 10x. 6782 12:14:21,160 --> 12:14:27,160 And that was over 1,000 epochs and it's still at about 0.5, let's say. 6783 12:14:27,160 --> 12:14:31,160 Oh, we got a better. Oh, we're going faster already. 6784 12:14:31,160 --> 12:14:37,160 So not even at 500 or so epochs, we're about 0.4. 6785 12:14:37,160 --> 12:14:41,160 That's the power of the learning rate. We'll increase it by another 10x. 6786 12:14:41,160 --> 12:14:45,160 We'll reset. Start again. Oh, would you look at that much faster this time. 6787 12:14:45,160 --> 12:14:50,160 That is beautiful. Oh, there's nothing better than watching a loss curve go down. 6788 12:14:50,160 --> 12:14:55,160 In the world of machine learning, that is. 
And then we reset that again. 6789 12:14:55,160 --> 12:15:01,160 And let's change it right back to what we had. And we get to 0 in basically under 100 epochs. 6790 12:15:01,160 --> 12:15:05,160 So that's the power of the learning rate, little visual representation. 6791 12:15:05,160 --> 12:15:10,160 Working on learning rates, it's time for us to build an optimizer and a loss function. 6792 12:15:10,160 --> 12:15:15,160 So that's right here. We've got our nonlinear model set up loss and optimizer. 6793 12:15:15,160 --> 12:15:19,160 You might have already done this because the code, this is code that we've written before, 6794 12:15:19,160 --> 12:15:23,160 but we're going to redo it for completeness and practice. 6795 12:15:23,160 --> 12:15:28,160 So we want a loss function. We're working with logits here and we're working with binary cross entropy. 6796 12:15:28,160 --> 12:15:30,160 So what loss do we use? 6797 12:15:30,160 --> 12:15:34,160 Binary cross entropy. Sorry, we're working with a binary classification problem. 6798 12:15:34,160 --> 12:15:38,160 Blue dots or red dots, torch dot opt in. 6799 12:15:38,160 --> 12:15:42,160 What are some other binary classification problems that you can think of? 6800 12:15:42,160 --> 12:15:46,160 We want model three dot parameters. 6801 12:15:46,160 --> 12:15:49,160 They're the parameters that we want to optimize this model here. 6802 12:15:49,160 --> 12:15:57,160 And we're going to set our LR to 0.1, just like we had in the TensorFlow playground. 6803 12:15:57,160 --> 12:16:03,160 Beautiful. So some other binary classification problems I can think of would be email. 6804 12:16:03,160 --> 12:16:07,160 Spam or not spam credit cards. 6805 12:16:07,160 --> 12:16:12,160 So equals fraud or not fraud. 6806 12:16:12,160 --> 12:16:15,160 What else? You might have insurance claims. 6807 12:16:15,160 --> 12:16:19,160 Equals who's at fault or not at fault. 6808 12:16:19,160 --> 12:16:23,160 If someone puts in a claim speaking about a car crash, whose fault was it? 6809 12:16:23,160 --> 12:16:26,160 Was the person submitting the claim? Were they at fault? 6810 12:16:26,160 --> 12:16:30,160 Or was the person who was also mentioned in the claim? Are they not at fault? 6811 12:16:30,160 --> 12:16:34,160 So there's many more, but they're just some I can think of up the top of my head. 6812 12:16:34,160 --> 12:16:38,160 But now let's train our model with nonlinearity. 6813 12:16:38,160 --> 12:16:42,160 Oh, we're on a roll here. 6814 12:16:42,160 --> 12:16:45,160 Training a model with nonlinearity. 6815 12:16:45,160 --> 12:16:50,160 So we've seen that if we introduce a nonlinear activation function within a model, 6816 12:16:50,160 --> 12:16:56,160 remember this is a linear activation function, and if we train this, the loss doesn't go down. 6817 12:16:56,160 --> 12:17:01,160 But if we just adjust this to add a relu in here, we get the loss going down. 6818 12:17:01,160 --> 12:17:06,160 So hopefully this replicates with our pure PyTorch code. 6819 12:17:06,160 --> 12:17:09,160 So let's do it, hey? 6820 12:17:09,160 --> 12:17:12,160 So we're going to create random seeds. 6821 12:17:12,160 --> 12:17:16,160 Because we're working with CUDA, we'll introduce the CUDA random seed as well. 6822 12:17:16,160 --> 12:17:22,160 Torch.manual seed. Again, don't worry too much if your numbers on your screen aren't exactly what mine are. 6823 12:17:22,160 --> 12:17:26,160 That's due to the inherent randomness of machine learning. 
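A short sketch of the loss function, optimizer and random seeds being set up here, assuming model_3 from before (BCEWithLogitsLoss takes raw logits, and the extra CUDA seed is there because we're running on a GPU):

# Loss and optimizer for a binary classification problem
loss_fn = nn.BCEWithLogitsLoss()                      # expects raw logits, applies sigmoid internally
optimizer = torch.optim.SGD(params=model_3.parameters(),
                            lr=0.1)                   # same learning rate as in the TensorFlow Playground demo

# Random seeds so your numbers can be as close as possible to the video's
torch.manual_seed(42)
torch.cuda.manual_seed(42)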
6824 12:17:26,160 --> 12:17:31,160 In fact, stochastic gradient descent stochastic again stands for random. 6825 12:17:31,160 --> 12:17:35,160 And we're just setting up the seeds here so that they can be as close as possible. 6826 12:17:35,160 --> 12:17:38,160 But the direction is more important. 6827 12:17:38,160 --> 12:17:45,160 So if my loss goes down, your loss should also go down on target device. 6828 12:17:45,160 --> 12:17:48,160 And then we're going to go Xtrain. 6829 12:17:48,160 --> 12:17:52,160 So this is setting up device agnostic code. We've done this before. 6830 12:17:52,160 --> 12:17:55,160 But we're going to do it again for completeness. 6831 12:17:55,160 --> 12:17:59,160 Just to practice every step of the puzzle. That's what we want to do. 6832 12:17:59,160 --> 12:18:03,160 We want to have experience. That's what this course is. It's a momentum builder. 6833 12:18:03,160 --> 12:18:13,160 So that when you go to other repos and machine learning projects that use PyTorch, you can go, oh, does this code set device agnostic code? 6834 12:18:13,160 --> 12:18:18,160 What problem are we working on? Is it binary or multi-class classification? 6835 12:18:18,160 --> 12:18:22,160 So let's go loop through data. 6836 12:18:22,160 --> 12:18:26,160 Again, we've done this before, but we're going to set up the epochs. 6837 12:18:26,160 --> 12:18:29,160 Let's do 1000 epochs. Why not? 6838 12:18:29,160 --> 12:18:33,160 So we can go for epoch in range epochs. 6839 12:18:33,160 --> 12:18:37,160 What do we do here? Well, we want to train. So this is training code. 6840 12:18:37,160 --> 12:18:40,160 We set our model model three dot train. 6841 12:18:40,160 --> 12:18:44,160 And I want you to start to think about how could we functionalize this training code? 6842 12:18:44,160 --> 12:18:47,160 We're going to start to move towards that in a future video. 6843 12:18:47,160 --> 12:18:51,160 So one is forward pass. We've got the logits. Why the logits? 6844 12:18:51,160 --> 12:18:57,160 Well, because the raw output of our model without any activation functions towards the final layer. 6845 12:18:57,160 --> 12:19:00,160 Classified as logits or called logits. 6846 12:19:00,160 --> 12:19:10,160 And then we create y-pred as in prediction labels by rounding the output of torch dot sigmoid of the logits. 6847 12:19:10,160 --> 12:19:20,160 So this is going to take us from logits to prediction probabilities to prediction labels. 6848 12:19:20,160 --> 12:19:26,160 And then we can go to, which is calculate the loss. 6849 12:19:26,160 --> 12:19:30,160 That's from my unofficial pytorch song. Calculate the last. 6850 12:19:30,160 --> 12:19:35,160 We go loss equals loss FN y logits. 6851 12:19:35,160 --> 12:19:48,160 Because remember, we've got BCE with logits loss and takes in logits as first input. 6852 12:19:48,160 --> 12:19:53,160 And that's going to calculate the loss between our models, logits and the y training labels. 6853 12:19:53,160 --> 12:19:58,160 And we will go here, we'll calculate accuracy using our accuracy function. 6854 12:19:58,160 --> 12:20:06,160 And this one is a little bit backwards compared to pytorch, but we pass in the y training labels first. 6855 12:20:06,160 --> 12:20:12,160 But it's constructed this way because it's in the same style as scikit line. 6856 12:20:12,160 --> 12:20:21,160 Three, we go optimizer zero grad. We zero the gradients of the optimizer so that it can start from fresh. 6857 12:20:21,160 --> 12:20:25,160 Calculating the ideal gradients every epoch. 
6858 12:20:25,160 --> 12:20:28,160 So it's going to reset every epoch, which is fine. 6859 12:20:28,160 --> 12:20:34,160 Then we're going to perform backpropagation. PyTorch is going to take care of that for us when we call loss dot backward. 6860 12:20:34,160 --> 12:20:42,160 And then we will perform gradient descent. So step the optimizer to see how we should improve our model parameters. 6861 12:20:42,160 --> 12:20:45,160 So optimizer dot step. 6862 12:20:45,160 --> 12:20:50,160 Oh, and speaking of model parameters, I want to show you something. Let's check our model three dot state dict. 6863 12:20:50,160 --> 12:20:56,160 So the relu activation function actually doesn't have any parameters. 6864 12:20:56,160 --> 12:21:02,160 So you'll notice here, we've got a weight and a bias for layer one, layer two, and layer three. 6865 12:21:02,160 --> 12:21:09,160 So the relu function here doesn't have any parameters to optimize. If we go nn dot relu. 6866 12:21:09,160 --> 12:21:12,160 Does it say what it implements? There we go. 6867 12:21:12,160 --> 12:21:19,160 So it's just the maximum of zero or x. So it takes the input and takes the max of zero or x. 6868 12:21:19,160 --> 12:21:26,160 And so when it takes the max of zero or x, if it's a negative number, zero is going to be higher than a negative number. 6869 12:21:26,160 --> 12:21:29,160 So that's why it zeroes all of the negative inputs. 6870 12:21:29,160 --> 12:21:38,160 And then it leaves the positive inputs how they are, because the max of a positive input versus zero is the positive input. 6871 12:21:38,160 --> 12:21:43,160 So this has no parameters to optimize. That's part of why it's so effective, because think about it. 6872 12:21:43,160 --> 12:21:47,160 Every parameter in our model needs a little bit of computation to adjust. 6873 12:21:47,160 --> 12:21:51,160 And so the more parameters we add to our model, the more compute that is required. 6874 12:21:51,160 --> 12:22:00,160 So generally, the kind of trade-off in machine learning is that, yes, more parameters have more of an ability to learn, but you need more compute. 6875 12:22:00,160 --> 12:22:07,160 So let's go model three dot eval. And we're going to go with torch dot inference mode. 6876 12:22:07,160 --> 12:22:13,160 If I could spell inference, that'd be fantastic. We're going to do what? We're going to do the forward pass. 6877 12:22:13,160 --> 12:22:17,160 So test logits equals model three on the test data. 6878 12:22:17,160 --> 12:22:29,160 And then we're going to calculate the test pred labels by calling torch dot round on torch dot sigmoid on the test logits. 6879 12:22:29,160 --> 12:22:33,160 And then we can calculate the test loss. How do we do that? 6880 12:22:33,160 --> 12:22:43,160 And then we can also calculate the test accuracy. I'm just going to give myself some more space here, 6881 12:22:43,160 --> 12:22:54,160 so I can code in the middle of the screen. That equals our accuracy function, and we're going to pass in y true equals y test. 6882 12:22:54,160 --> 12:23:04,160 And then we will pass in y pred equals test pred. 6883 12:23:04,160 --> 12:23:08,160 Beautiful. A final step here is to print out what's happening. 6884 12:23:08,160 --> 12:23:13,160 Now, this will be very important because one, it's fun to know what your model is doing. 6885 12:23:13,160 --> 12:23:20,160 And two, if our model does actually learn, I'd like to see the loss values go down and the accuracy values go up.
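As a quick check of the point that ReLU adds no learnable parameters, something like this (a sketch, assuming model_3 from before) shows only the linear layers' weights and biases:

# ReLU is just max(0, x), so it contributes nothing to the state dict
print(model_3.state_dict().keys())
# expected: weight and bias entries for layer_1, layer_2 and layer_3 only, nothing for relu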
6886 12:23:20,160 --> 12:23:30,160 As I said, there's not much more beautiful in the world of machine learning than watching a loss value go down and watching a loss curve go down. 6887 12:23:30,160 --> 12:23:36,160 So let's print out the current epoch, and then we'll print out the loss, which will just be the training loss. 6888 12:23:36,160 --> 12:23:41,160 And we'll take that to four decimal places. And then we'll go accuracy here. 6889 12:23:41,160 --> 12:23:54,160 And this will be acc, and we'll take this to two decimal places, and we'll put a little percentage sign there, and then we'll break it up by putting in the test loss here. 6890 12:23:54,160 --> 12:24:03,160 Because remember, our model learns patterns on the training data set and then evaluates those patterns on the test data set. 6891 12:24:03,160 --> 12:24:14,160 So we'll pass in test acc here, and no doubt there might be an error or two within all of this code, but we're going to try and run this because we've seen this code before, and I think we're ready. 6892 12:24:14,160 --> 12:24:19,160 We're training our first model here with non-linearities built into the model. 6893 12:24:19,160 --> 12:24:24,160 You ready? Three, two, one, let's go. 6894 12:24:24,160 --> 12:24:36,160 Oh, of course. Module torch CUDA has no attribute... that's just a typo, it should be the standard manual seed. 6895 12:24:36,160 --> 12:24:38,160 There we go. Had to sound that out. 6896 12:24:38,160 --> 12:24:46,160 Another one. What did we get wrong here? Oh, target size must be the same as input size. Where did it mess up here? 6897 12:24:46,160 --> 12:24:55,160 What did we get wrong? Test loss, test logits on y test. Hmm. 6898 12:24:55,160 --> 12:25:05,160 So these two aren't matching up. Model three on X test, and y test. What's the size of this? 6899 12:25:05,160 --> 12:25:10,160 So let's do some troubleshooting on the fly, hey? Not everything always works out as you want. 6900 12:25:10,160 --> 12:25:18,160 So, length of X test. We've got a shape issue here. Remember how I said one of the most common issues in deep learning is a shape issue? 6901 12:25:18,160 --> 12:25:21,160 Have we got the same shape here? 6902 12:25:21,160 --> 12:25:33,160 Let's check test logits dot shape and y test dot shape. We'll print this out. 6903 12:25:33,160 --> 12:25:42,160 So 200. Oh, here's what we have to do. That's what we missed: dot squeeze. See how I've been hinting at the fact that we needed to call dot squeeze? 6904 12:25:42,160 --> 12:25:48,160 So this is where the discrepancy is. Our test logits dot shape. We've got an extra dimension here. 6905 12:25:48,160 --> 12:25:53,160 And what are we getting here? A value error on the target size, which is a shape mismatch. 6906 12:25:53,160 --> 12:25:59,160 So we've got target size torch size 200, which must be the same as the input size, torch size 200, 1. 6907 12:25:59,160 --> 12:26:04,160 So did we squeeze this? Oh, that's why the training worked. Okay, so we've missed this one. 6908 12:26:04,160 --> 12:26:12,160 Let's just get rid of this. So we're getting rid of the extra one dimension by using squeeze. 6909 12:26:12,160 --> 12:26:19,160 We should have everything lined up. There we go. Okay. Look at that. Yes. 6910 12:26:19,160 --> 12:26:24,160 Now accuracy has gone up, albeit not by too much. It's still not perfect. 6911 12:26:24,160 --> 12:26:32,160 So really we'd like this to be towards 100%, and the loss to be lower. But I feel like we've got a better performing model, don't you?
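Putting the last couple of videos together, a sketch of the full training and testing loop written here might look like the following. It assumes accuracy_fn (the helper function from earlier in the course), loss_fn, optimizer and the train/test splits already exist; note the .squeeze() calls that resolve the shape error we just debugged, and the print interval of 100 epochs is just a choice:

torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Put the data on the target device (device-agnostic code)
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

epochs = 1000
for epoch in range(epochs):
    ### Training
    model_3.train()
    y_logits = model_3(X_train).squeeze()              # raw model outputs (logits)
    y_pred = torch.round(torch.sigmoid(y_logits))      # logits -> probabilities -> labels
    loss = loss_fn(y_logits, y_train)                  # BCEWithLogitsLoss takes logits first
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ### Testing
    model_3.eval()
    with torch.inference_mode():
        test_logits = model_3(X_test).squeeze()        # .squeeze() removes the extra trailing dimension
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.4f}, Acc: {acc:.2f}% | Test loss: {test_loss:.4f}, Test acc: {test_acc:.2f}%")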
6912 12:26:32,160 --> 12:26:38,160 Now that is the power of non linearity. All we did was we added in a relu layer or just two of them. 6913 12:26:38,160 --> 12:26:51,160 Relu here, relu here. But what did we do? We gave our model the power of straight lines. Oh, straight linear of straight lines and non straight lines. 6914 12:26:51,160 --> 12:26:56,160 So it can potentially draw a line to separate these circles. 6915 12:26:56,160 --> 12:27:05,160 So in the next video, let's draw a line, plot our model decision boundary using our function and see if it really did learn anything. 6916 12:27:05,160 --> 12:27:08,160 I'll see you there. 6917 12:27:08,160 --> 12:27:14,160 Welcome back. In the last video, we trained our first model, and as you can tell, I've got the biggest smile on my face, 6918 12:27:14,160 --> 12:27:22,160 but we trained our first model that harnesses both the power of straight lines and non straight lines or linear functions and non linear functions. 6919 12:27:22,160 --> 12:27:29,160 And by the 1000th epoch, we look like we're getting a bit better results than just pure guessing, which is 50%. 6920 12:27:29,160 --> 12:27:36,160 Because we have 500 samples of red dots and 500 samples of blue dots. So we have evenly balanced classes. 6921 12:27:36,160 --> 12:27:46,160 Now, we've seen that if we added a relu activation function with a data set similar to ours with a TensorFlow playground, the model starts to fit. 6922 12:27:46,160 --> 12:27:51,160 But it doesn't work with just linear. There's a few other activation functions that you could play around with here. 6923 12:27:51,160 --> 12:27:57,160 You could play around with the learning rate, regularization. If you're not sure what that is, I'll leave that as extra curriculum to look up. 6924 12:27:57,160 --> 12:28:03,160 But we're going to retire the TensorFlow program for now because we're going to go back to writing code. 6925 12:28:03,160 --> 12:28:09,160 So let's get out of that. Let's get out of that. We now have to evaluate our model because right now it's just numbers on a page. 6926 12:28:09,160 --> 12:28:18,160 So let's write down here 6.4. What do we like to do to evaluate things? It's visualize, visualize, visualize. 6927 12:28:18,160 --> 12:28:25,160 So evaluating a model trained with nonlinear activation functions. 6928 12:28:25,160 --> 12:28:34,160 And we also discussed the point that neural networks are really just a big combination of linear and nonlinear functions trying to draw patterns in data. 6929 12:28:34,160 --> 12:28:41,160 So with that being said, let's make some predictions with our Model 3, our most recently trained model. 6930 12:28:41,160 --> 12:28:46,160 We'll put it into a Val mode and then we'll set up inference mode. 6931 12:28:46,160 --> 12:28:55,160 And then we'll go yprads equals torch dot round and then torch dot sigmoid. 6932 12:28:55,160 --> 12:29:00,160 We could functionalize this, of course, Model 3 and then pass in X test. 6933 12:29:00,160 --> 12:29:06,160 And you know what? We're going to squeeze these here because we ran into some troubles in the previous video. 6934 12:29:06,160 --> 12:29:14,160 I actually really liked that we did because then we got to troubleshoot a shape error on the fly because that's one of the most common issues you're going to come across in deep learning. 6935 12:29:14,160 --> 12:29:21,160 So yprads, let's check them out and then let's check out y test. 6936 12:29:21,160 --> 12:29:23,160 You want y test 10. 
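A sketch of the prediction step being written here, assuming the same model_3, X_test and y_test as before:

# Make predictions with the trained non-linear model
model_3.eval()
with torch.inference_mode():
    y_preds = torch.round(torch.sigmoid(model_3(X_test))).squeeze()

print(y_preds[:10])  # prediction labels
print(y_test[:10])   # original labels, should be in the same format (same device, same dtype)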
6937 12:29:23,160 --> 12:29:31,160 So remember, when we're evaluating predictions, we want them to be in the same format as our original labels. 6938 12:29:31,160 --> 12:29:33,160 We want to compare apples to apples. 6939 12:29:33,160 --> 12:29:36,160 And if we compare the format here, do these two things look the same? 6940 12:29:36,160 --> 12:29:39,160 Yes, they do. They're both on CUDA and they're both floats. 6941 12:29:39,160 --> 12:29:42,160 We can see that it's got this one wrong. 6942 12:29:42,160 --> 12:29:47,160 Whereas the other ones look pretty good. Hmm, this might look pretty good if we visualize it. 6943 12:29:47,160 --> 12:29:53,160 So now let's, you might have already done this because I issued the challenge of plotting the decision boundaries. 6944 12:29:53,160 --> 12:30:08,160 Plot decision boundaries and let's go PLT dot figure and we're going to set up the fig size to equal 12.6 because, again, one of the advantages of hosting a machine learning cooking show is that you can code ahead of time. 6945 12:30:08,160 --> 12:30:13,160 And then we can go PLT dot title is train. 6946 12:30:13,160 --> 12:30:18,160 And then we're going to call our plot decision boundary function, which we've seen before. 6947 12:30:18,160 --> 12:30:20,160 Plot decision boundary. 6948 12:30:20,160 --> 12:30:22,160 And we're going to pass this one in. 6949 12:30:22,160 --> 12:30:28,160 We could do model three, but we could also pass it in our older models to model one that doesn't use it on the reality. 6950 12:30:28,160 --> 12:30:31,160 In fact, I reckon that'll be a great comparison. 6951 12:30:31,160 --> 12:30:39,160 So we'll also create another plot here for the test data and this will be on index number two. 6952 12:30:39,160 --> 12:30:46,160 So remember, subplot is a number of rows, number of columns, index where the plot appears. 6953 12:30:46,160 --> 12:30:48,160 We'll give this one a title. 6954 12:30:48,160 --> 12:30:50,160 Plot dot title. 6955 12:30:50,160 --> 12:30:53,160 This will be test and Google Colab. 6956 12:30:53,160 --> 12:30:54,160 I didn't want that. 6957 12:30:54,160 --> 12:31:00,160 As I said, this course is also a battle between me and Google Colab's autocorrect. 6958 12:31:00,160 --> 12:31:04,160 So we're going model three and we'll pass in the test data here. 6959 12:31:04,160 --> 12:31:10,160 And behind the scenes, our plot decision boundary function will create a beautiful graphic for us, 6960 12:31:10,160 --> 12:31:17,160 perform some predictions on the X, the features input, and then we'll compare them with the Y values. 6961 12:31:17,160 --> 12:31:19,160 Let's see what's going on here. 6962 12:31:19,160 --> 12:31:21,160 Oh, look at that. 6963 12:31:21,160 --> 12:31:25,160 Yes, our first nonlinear model. 6964 12:31:25,160 --> 12:31:29,160 Okay, it's not perfect, but it is certainly much better than the models that we had before. 6965 12:31:29,160 --> 12:31:30,160 Look at this. 6966 12:31:30,160 --> 12:31:32,160 Model one has no linearity. 6967 12:31:32,160 --> 12:31:35,160 Model one equals no nonlinearity. 6968 12:31:35,160 --> 12:31:38,160 I've got double negative there. 6969 12:31:38,160 --> 12:31:44,160 Whereas model three equals has nonlinearity. 6970 12:31:44,160 --> 12:31:51,160 So do you see the power of nonlinearity or better yet the power of linearity or linear straight lines with non straight lines? 6971 12:31:51,160 --> 12:31:55,160 So I feel like we could do better than this, though. 
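The plotting code described here, roughly; plot_decision_boundary is the helper function downloaded earlier in the course, so treat this as a sketch that assumes it, along with model_1 and model_3, already exist:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)                                   # rows, columns, index
plt.title("Train")
plot_decision_boundary(model_1, X_train, y_train)      # model_1: no non-linearity
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_3, X_test, y_test)        # model_3: with ReLU non-linearity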
6972 12:31:55,160 --> 12:32:02,160 So here's your challenge: can you improve model three to do better? 6973 12:32:02,160 --> 12:32:05,160 What did we get? 6974 12:32:05,160 --> 12:32:13,160 79% accuracy. Try to do better than 80% accuracy on the test data. 6975 12:32:13,160 --> 12:32:15,160 I think you can. 6976 12:32:15,160 --> 12:32:16,160 So that's the challenge. 6977 12:32:16,160 --> 12:32:20,160 And if you're looking for hints on how to do so, where can you look? 6978 12:32:20,160 --> 12:32:22,160 Well, we've covered this: improving a model. 6979 12:32:22,160 --> 12:32:26,160 So maybe you add some more layers, maybe you add more hidden units. 6980 12:32:26,160 --> 12:32:27,160 Maybe you fit for longer. 6981 12:32:27,160 --> 12:32:32,160 Maybe, if you add more layers, you put a relu activation function on top of those as well. 6982 12:32:32,160 --> 12:32:36,160 Maybe you lower the learning rate, because right now we've got 0.1. 6983 12:32:36,160 --> 12:32:39,160 So give this a shot, try and improve it. 6984 12:32:39,160 --> 12:32:40,160 I think you can do it. 6985 12:32:40,160 --> 12:32:42,160 But we're going to push forward. 6986 12:32:42,160 --> 12:32:45,160 That's going to be your challenge for some extra curriculum. 6987 12:32:45,160 --> 12:32:51,160 I think in the next section, well, we've seen our nonlinear activation functions in action. 6988 12:32:51,160 --> 12:32:54,160 Let's write some code to replicate them. 6989 12:32:54,160 --> 12:32:57,160 I'll see you there. 6990 12:32:57,160 --> 12:32:58,160 Welcome back. 6991 12:32:58,160 --> 12:33:02,160 In the last video, I left off with the challenge of improving model three to do better than 6992 12:33:02,160 --> 12:33:04,160 80% accuracy on the test data. 6993 12:33:04,160 --> 12:33:06,160 I hope you gave it a shot. 6994 12:33:06,160 --> 12:33:08,160 But here are some of the things I would have done. 6995 12:33:08,160 --> 12:33:12,160 I'd potentially add more layers, maybe increase the number of hidden units, 6996 12:33:12,160 --> 12:33:17,160 and then, if we needed to, fit for longer and maybe lower the learning rate to 0.01. 6997 12:33:17,160 --> 12:33:22,160 But I'll leave that for you to explore, because that's the motto of the data scientist, right? 6998 12:33:22,160 --> 12:33:25,160 It's to experiment, experiment, experiment. 6999 12:33:25,160 --> 12:33:27,160 So let's go in here. 7000 12:33:27,160 --> 12:33:31,160 We've seen our nonlinear activation functions in practice. 7001 12:33:31,160 --> 12:33:33,160 Let's replicate them. 7002 12:33:33,160 --> 12:33:38,160 So, replicating nonlinear activation functions. 7003 12:33:38,160 --> 12:33:46,160 And remember, with neural networks, rather than us telling the model what to learn, 7004 12:33:46,160 --> 12:33:52,160 we give it the tools to discover patterns in data, 7005 12:33:52,160 --> 12:34:01,160 and it tries to figure out the best patterns on its own. 7006 12:34:01,160 --> 12:34:04,160 And what are these tools? 7007 12:34:04,160 --> 12:34:06,160 That's right down here. 7008 12:34:06,160 --> 12:34:07,160 We've seen this in action. 7009 12:34:07,160 --> 12:34:13,160 And these tools are linear and nonlinear functions. 7010 12:34:13,160 --> 12:34:18,160 So a neural network is a big stack of linear and nonlinear functions. 7011 12:34:18,160 --> 12:34:21,160 For us, we've only got about four layers or so, four or five layers. 7012 12:34:21,160 --> 12:34:24,160 But as I said, other networks can get much larger. 7013 12:34:24,160 --> 12:34:26,160 But the premise remains.
7014 12:34:26,160 --> 12:34:30,160 Some form of linear and nonlinear manipulation of the data. 7015 12:34:30,160 --> 12:34:32,160 So let's get out of this. 7016 12:34:32,160 --> 12:34:36,160 Let's make our workspace a little bit more cleaner. 7017 12:34:36,160 --> 12:34:38,160 Replicating nonlinear activation functions. 7018 12:34:38,160 --> 12:34:40,160 So let's create a tensor to start with. 7019 12:34:40,160 --> 12:34:43,160 Everything starts from the tensor. 7020 12:34:43,160 --> 12:34:47,160 And we'll go A equals torch A range. 7021 12:34:47,160 --> 12:34:52,160 And we're going to create a range from negative 10 to 10 with a step of one. 7022 12:34:52,160 --> 12:34:57,160 And we can set the D type here to equal torch dot float 32. 7023 12:34:57,160 --> 12:34:59,160 But we don't actually need to. 7024 12:34:59,160 --> 12:35:00,160 That's going to be the default. 7025 12:35:00,160 --> 12:35:06,160 So if we set A here, A dot D type. 7026 12:35:06,160 --> 12:35:10,160 Then we've got torch float 32 and I'm pretty sure if we've got rid of that. 7027 12:35:10,160 --> 12:35:13,160 Oh, we've got torch in 64. 7028 12:35:13,160 --> 12:35:15,160 Why is that happening? 7029 12:35:15,160 --> 12:35:18,160 Well, let's check out A. 7030 12:35:18,160 --> 12:35:23,160 Oh, it's because we've got integers as our values because we have a step as one. 7031 12:35:23,160 --> 12:35:26,160 If we turn this into a float, what's going to happen? 7032 12:35:26,160 --> 12:35:28,160 We get float 32. 7033 12:35:28,160 --> 12:35:29,160 But we'll keep it. 7034 12:35:29,160 --> 12:35:31,160 Otherwise, this is going to be what? 7035 12:35:31,160 --> 12:35:32,160 About a hundred numbers? 7036 12:35:32,160 --> 12:35:33,160 Yeah, no, that's too many. 7037 12:35:33,160 --> 12:35:41,160 Let's keep it at negative 10 to 10 and we'll set the D type here to torch float 32. 7038 12:35:41,160 --> 12:35:42,160 Beautiful. 7039 12:35:42,160 --> 12:35:46,160 So it looks like PyTorch's default data type for integers is in 64. 7040 12:35:46,160 --> 12:35:52,160 But we're going to work with float 32 because float 32, if our data wasn't float 32 with 7041 12:35:52,160 --> 12:35:56,160 the functions we're about to create, we might run into some errors. 7042 12:35:56,160 --> 12:35:59,160 So let's visualize this data. 7043 12:35:59,160 --> 12:36:04,160 I want you to guess, is this a straight line or non-straight line? 7044 12:36:04,160 --> 12:36:05,160 You've got three seconds. 7045 12:36:05,160 --> 12:36:09,160 One, two, three. 7046 12:36:09,160 --> 12:36:11,160 Straight line. 7047 12:36:11,160 --> 12:36:12,160 There we go. 7048 12:36:12,160 --> 12:36:16,160 We've got negative 10 to positive 10 up here or nine. 7049 12:36:16,160 --> 12:36:17,160 Close enough. 7050 12:36:17,160 --> 12:36:19,160 And so how would we turn this straight line? 7051 12:36:19,160 --> 12:36:22,160 If it's a straight line, it's linear. 7052 12:36:22,160 --> 12:36:26,160 How would we perform the relu activation function on this? 7053 12:36:26,160 --> 12:36:31,160 Now, we could of course call torch relu on A. 7054 12:36:31,160 --> 12:36:34,160 Actually, let's in fact just plot this. 7055 12:36:34,160 --> 12:36:37,160 PLT dot plot on torch relu. 7056 12:36:37,160 --> 12:36:39,160 What does this look like? 7057 12:36:39,160 --> 12:36:40,160 Boom, there we go. 7058 12:36:40,160 --> 12:36:42,160 But we want to replicate the relu function. 7059 12:36:42,160 --> 12:36:44,160 So let's go nn dot relu. 7060 12:36:44,160 --> 12:36:46,160 What does it do? 
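Before looking at what nn.ReLU implements, here is a sketch of the toy tensor and the built-in ReLU plot from this part:

import torch
import matplotlib.pyplot as plt

# A small toy tensor to visualize activation functions on
A = torch.arange(-10, 10, 1, dtype=torch.float32)
print(A.dtype)            # torch.float32 (without the dtype argument this range would default to int64)

plt.plot(A)               # a straight (linear) line
plt.plot(torch.relu(A))   # negatives clipped to zero, positives left unchanged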
7061 12:36:46,160 --> 12:36:48,160 We've seen this before. 7062 12:36:48,160 --> 12:36:49,160 So we need the max. 7063 12:36:49,160 --> 12:36:51,160 We need to return based on an input. 7064 12:36:51,160 --> 12:36:54,160 We need the max of zero and x. 7065 12:36:54,160 --> 12:36:56,160 So let's give it a shot. 7066 12:36:56,160 --> 12:36:58,160 We'll come here. 7067 12:36:58,160 --> 12:37:00,160 Again, we need more space. 7068 12:37:00,160 --> 12:37:02,160 There can never be enough code space here. 7069 12:37:02,160 --> 12:37:03,160 I like writing lots of code. 7070 12:37:03,160 --> 12:37:04,160 I don't know about you. 7071 12:37:04,160 --> 12:37:06,160 But let's go relu. 7072 12:37:06,160 --> 12:37:09,160 We'll take an input x, which will be some form of tensor. 7073 12:37:09,160 --> 12:37:13,160 And we'll go return torch dot maximum. 7074 12:37:13,160 --> 12:37:15,160 I think you could just do torch dot max. 7075 12:37:15,160 --> 12:37:17,160 But we'll try maximum. 7076 12:37:17,160 --> 12:37:21,160 Torch dot tensor zero. 7077 12:37:21,160 --> 12:37:26,160 So the maximum is going to return the max between whatever this is. 7078 12:37:26,160 --> 12:37:29,160 One option and whatever the other option is. 7079 12:37:29,160 --> 12:37:34,160 So inputs must be tensors. 7080 12:37:34,160 --> 12:37:40,160 So maybe we could just give a type hint here that this is torch dot tensor. 7081 12:37:40,160 --> 12:37:42,160 And this should return a tensor too. 7082 12:37:42,160 --> 12:37:44,160 Return torch dot tensor. 7083 12:37:44,160 --> 12:37:45,160 Beautiful. 7084 12:37:45,160 --> 12:37:47,160 You're ready to try it out. 7085 12:37:47,160 --> 12:37:49,160 Let's see what our relu function does. 7086 12:37:49,160 --> 12:37:51,160 Relu A. 7087 12:37:51,160 --> 12:37:52,160 Wonderful. 7088 12:37:52,160 --> 12:37:55,160 It looks like we got quite a similar output to before. 7089 12:37:55,160 --> 12:37:57,160 Here's our original A. 7090 12:37:57,160 --> 12:37:59,160 So we've got negative numbers. 7091 12:37:59,160 --> 12:38:00,160 There we go. 7092 12:38:00,160 --> 12:38:05,160 So recall that the relu activation function turns all negative numbers into zero 7093 12:38:05,160 --> 12:38:08,160 because it takes the maximum between zero and the input. 7094 12:38:08,160 --> 12:38:11,160 And if the input's negative, well then zero is bigger than it. 7095 12:38:11,160 --> 12:38:15,160 And it leaves all of the positive values as they are. 7096 12:38:15,160 --> 12:38:17,160 So that's the beauty of relu. 7097 12:38:17,160 --> 12:38:20,160 Quite simple, but very effective. 7098 12:38:20,160 --> 12:38:25,160 So let's plot relu activation function. 7099 12:38:25,160 --> 12:38:26,160 Our custom one. 7100 12:38:26,160 --> 12:38:29,160 We will go PLT dot plot. 7101 12:38:29,160 --> 12:38:33,160 We'll call our relu function on A. 7102 12:38:33,160 --> 12:38:36,160 Let's see what this looks like. 7103 12:38:36,160 --> 12:38:38,160 Oh, look at us go. 7104 12:38:38,160 --> 12:38:39,160 Well done. 7105 12:38:39,160 --> 12:38:42,160 Just the exact same as the torch relu function. 7106 12:38:42,160 --> 12:38:43,160 Easy as that. 7107 12:38:43,160 --> 12:38:48,160 And what's another nonlinear activation function that we've used before? 7108 12:38:48,160 --> 12:38:54,160 Well, I believe one of them is if we go down to here, what did we say before? 7109 12:38:54,160 --> 12:38:55,160 Sigmoid. 7110 12:38:55,160 --> 12:38:56,160 Where is that? 7111 12:38:56,160 --> 12:38:57,160 Where are you, Sigmoid? 
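The hand-rolled ReLU written in this part, as a sketch (assuming the tensor A and matplotlib from above):

def relu(x: torch.Tensor) -> torch.Tensor:
    """A hand-rolled version of the ReLU activation: element-wise max(0, x)."""
    return torch.maximum(torch.tensor(0), x)

print(relu(A))      # negative values become 0, positives stay the same
plt.plot(relu(A))   # should match the torch.relu(A) plot above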
7112 12:38:57,160 --> 12:38:58,160 Here we go. 7113 12:38:58,160 --> 12:38:59,160 Hello, Sigmoid. 7114 12:38:59,160 --> 12:39:02,160 Oh, this has got a little bit more going on here. 7115 12:39:02,160 --> 12:39:06,160 One over one plus exponential of negative x. 7116 12:39:06,160 --> 12:39:12,160 So Sigmoid or this little symbol for Sigmoid of x, which is an input. 7117 12:39:12,160 --> 12:39:13,160 We get this. 7118 12:39:13,160 --> 12:39:15,160 So let's try and replicate this. 7119 12:39:15,160 --> 12:39:18,160 I might just bring this one in here. 7120 12:39:18,160 --> 12:39:23,160 Right now, let's do the same for Sigmoid. 7121 12:39:23,160 --> 12:39:26,160 So what do we have here? 7122 12:39:26,160 --> 12:39:29,160 Well, we want to create a custom Sigmoid. 7123 12:39:29,160 --> 12:39:32,160 And we want to have some sort of input, x. 7124 12:39:32,160 --> 12:39:39,160 And we want to return one divided by, do we have the function in Sigmoid? 7125 12:39:39,160 --> 12:39:43,160 One divided by one plus exponential. 7126 12:39:43,160 --> 12:39:50,160 One plus torch dot exp for exponential on negative x. 7127 12:39:50,160 --> 12:39:55,160 And we might put the bottom side in brackets so that it does that operation. 7128 12:39:55,160 --> 12:39:58,160 I reckon that looks all right to me. 7129 12:39:58,160 --> 12:40:04,160 So one divided by one plus torch exponential of negative x. 7130 12:40:04,160 --> 12:40:05,160 Do we have that? 7131 12:40:05,160 --> 12:40:06,160 Yes, we do. 7132 12:40:06,160 --> 12:40:08,160 Well, there's only one real way to find out. 7133 12:40:08,160 --> 12:40:11,160 Let's plot the torch version of Sigmoid. 7134 12:40:11,160 --> 12:40:14,160 Torch dot Sigmoid and we'll pass in x. 7135 12:40:14,160 --> 12:40:16,160 See what happens. 7136 12:40:16,160 --> 12:40:19,160 And then, oh, we have a. 7137 12:40:19,160 --> 12:40:20,160 My bad. 7138 12:40:20,160 --> 12:40:21,160 A is our tensor. 7139 12:40:21,160 --> 12:40:22,160 What do we get? 7140 12:40:22,160 --> 12:40:24,160 We get a curved line. 7141 12:40:24,160 --> 12:40:25,160 Wonderful. 7142 12:40:25,160 --> 12:40:27,160 And then we go plt dot plot. 7143 12:40:27,160 --> 12:40:30,160 And we're going to use our Sigmoid function on a. 7144 12:40:30,160 --> 12:40:33,160 Did we replicate torch's Sigmoid function? 7145 12:40:33,160 --> 12:40:35,160 Yes, we did. 7146 12:40:35,160 --> 12:40:38,160 Ooh, now. 7147 12:40:38,160 --> 12:40:41,160 See, this is what's happening behind the scenes with our neural networks. 7148 12:40:41,160 --> 12:40:45,160 Of course, you could do more complicated activation functions or layers and whatnot. 7149 12:40:45,160 --> 12:40:47,160 And you can try to replicate them. 7150 12:40:47,160 --> 12:40:49,160 In fact, that's a great exercise to try and do. 7151 12:40:49,160 --> 12:40:54,160 But we've essentially across the videos and the sections that we've done, we've replicated our linear layer. 7152 12:40:54,160 --> 12:40:56,160 And we've replicated the relu. 7153 12:40:56,160 --> 12:41:00,160 So we've actually built this model from scratch, or we could if we really wanted to. 7154 12:41:00,160 --> 12:41:06,160 But it's a lot easier to use PyTorch's layers because we're building neural networks here like Lego bricks, 7155 12:41:06,160 --> 12:41:08,160 stacking together these layers in some way, shape, or form. 
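And the hand-rolled sigmoid from this part, again as a sketch on the same tensor A:

def sigmoid(x: torch.Tensor) -> torch.Tensor:
    """A hand-rolled sigmoid: 1 / (1 + e^(-x))."""
    return 1 / (1 + torch.exp(-x))

plt.plot(torch.sigmoid(A))   # PyTorch's version: an S-shaped curve
plt.plot(sigmoid(A))         # ours should look the same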
7156 12:41:08,160 --> 12:41:14,160 And because they're a part of PyTorch, we know that they've been error-tested and they compute as fast as possible 7157 12:41:14,160 --> 12:41:18,160 behind the scenes and use GPU and get a whole bunch of benefits. 7158 12:41:18,160 --> 12:41:23,160 PyTorch offers a lot of benefits by using these layers rather than writing them ourselves. 7159 12:41:23,160 --> 12:41:25,160 And so this is what our model is doing. 7160 12:41:25,160 --> 12:41:31,160 It's literally like to learn these values and decrease the loss function and increase the accuracy. 7161 12:41:31,160 --> 12:41:37,160 It's combining linear layers and nonlinear layers or nonlinear functions. 7162 12:41:37,160 --> 12:41:39,160 Where's our relu function here? 7163 12:41:39,160 --> 12:41:41,160 A relu function like this behind the scenes. 7164 12:41:41,160 --> 12:41:47,160 So just combining linear and nonlinear functions to fit a data set. 7165 12:41:47,160 --> 12:41:54,160 And that premise remains even on our small data set and on very large data sets and very large models. 7166 12:41:54,160 --> 12:41:58,160 So with that being said, I think it's time for us to push on. 7167 12:41:58,160 --> 12:42:00,160 We've covered a fair bit of code here. 7168 12:42:00,160 --> 12:42:04,160 But we've worked on a binary classification problem. 7169 12:42:04,160 --> 12:42:08,160 Have we worked on a multi-class classification problem yet? 7170 12:42:08,160 --> 12:42:11,160 Do we have that here? Where's my fun graphic? 7171 12:42:11,160 --> 12:42:15,160 We have multi-class classification. 7172 12:42:15,160 --> 12:42:17,160 I think that's what we cover next. 7173 12:42:17,160 --> 12:42:22,160 We're going to put together all of the steps in our workflow that we've covered for binary classification. 7174 12:42:22,160 --> 12:42:26,160 But now let's move on to a multi-class classification problem. 7175 12:42:26,160 --> 12:42:32,160 If you're with me, I'll see you in the next video. 7176 12:42:32,160 --> 12:42:33,160 Welcome back. 7177 12:42:33,160 --> 12:42:37,160 In the last few videos we've been harnessing the power of nonlinearity. 7178 12:42:37,160 --> 12:42:41,160 Specifically non-straight line functions and we replicated some here. 7179 12:42:41,160 --> 12:42:49,160 And we learned that a neural network combines linear and nonlinear functions to find patterns in data. 7180 12:42:49,160 --> 12:42:53,160 And for our simple red versus blue dots, once we added a little bit of nonlinearity, 7181 12:42:53,160 --> 12:42:58,160 we found the secret source of to start separating our blue and red dots. 7182 12:42:58,160 --> 12:43:02,160 And I also issued you the challenge to try and improve this and I think you can do it. 7183 12:43:02,160 --> 12:43:04,160 So hopefully you've given that a go. 7184 12:43:04,160 --> 12:43:06,160 But now let's keep pushing forward. 7185 12:43:06,160 --> 12:43:09,160 We're going to reiterate over basically everything that we've done, 7186 12:43:09,160 --> 12:43:15,160 except this time from the point of view of a multi-class classification problem. 7187 12:43:15,160 --> 12:43:26,160 So I believe we're up to section eight, putting it all together with a multi-class classification problem. 7188 12:43:26,160 --> 12:43:27,160 Beautiful. 7189 12:43:27,160 --> 12:43:37,160 And recall the difference between binary classification equals one thing or another such as cat versus dog. 
7190 12:43:37,160 --> 12:43:47,160 If you were building a cat versus dog image classifier, spam versus not spam for say emails that were spam or not spam or 7191 12:43:47,160 --> 12:43:52,160 even internet posts on Facebook or Twitter or one of the other internet services. 7192 12:43:52,160 --> 12:43:57,160 And then fraud or not fraud for credit card transactions. 7193 12:43:57,160 --> 12:44:05,160 And then multi-class classification is more than one thing or another. 7194 12:44:05,160 --> 12:44:11,160 So we could have cat versus dog versus chicken. 7195 12:44:11,160 --> 12:44:14,160 So I think we've got all the skills to do this. 7196 12:44:14,160 --> 12:44:18,160 Our architecture might be a little bit different for a multi-class classification problem. 7197 12:44:18,160 --> 12:44:20,160 But we've got so many building blocks now. 7198 12:44:20,160 --> 12:44:21,160 It's not funny. 7199 12:44:21,160 --> 12:44:27,160 Let's clean up this and we'll add some more code cells and just to reiterate. 7200 12:44:27,160 --> 12:44:29,160 So we've gone over nonlinearity. 7201 12:44:29,160 --> 12:44:34,160 The question is what could you draw if you had an unlimited amount of straight linear and non-straight, 7202 12:44:34,160 --> 12:44:38,160 nonlinear lines, I believe you could draw some pretty intricate patterns. 7203 12:44:38,160 --> 12:44:41,160 And that is what our neural networks are doing behind the scenes. 7204 12:44:41,160 --> 12:44:47,160 And so we also learned that if we wanted to just replicate some of these nonlinear functions, 7205 12:44:47,160 --> 12:44:51,160 some of the ones that we've used before, we could create a range. 7206 12:44:51,160 --> 12:44:54,160 Linear activation is just the line itself. 7207 12:44:54,160 --> 12:45:00,160 And then if we wanted to do sigmoid, we get this curl here. 7208 12:45:00,160 --> 12:45:07,160 And then if we wanted to do relu, well, we saw how to replicate the relu function as one. 7209 12:45:07,160 --> 12:45:09,160 These both are nonlinear. 7210 12:45:09,160 --> 12:45:18,160 And of course, torch.nn has far more nonlinear activations where they came from just as it has far more different layers. 7211 12:45:18,160 --> 12:45:20,160 And you'll get used to these with practice. 7212 12:45:20,160 --> 12:45:22,160 And that's what we're doing here. 7213 12:45:22,160 --> 12:45:24,160 So let's go back to the keynote. 7214 12:45:24,160 --> 12:45:26,160 So this is what we're going to be working on. 7215 12:45:26,160 --> 12:45:27,160 Multi-class classification. 7216 12:45:27,160 --> 12:45:29,160 So there's one of the big differences here. 7217 12:45:29,160 --> 12:45:33,160 We use the softmax activation function versus sigmoid. 7218 12:45:33,160 --> 12:45:35,160 There's another big difference here. 7219 12:45:35,160 --> 12:45:39,160 Instead of binary cross entropy, we use just cross entropy. 7220 12:45:39,160 --> 12:45:42,160 But I think most of it's going to stay the same. 7221 12:45:42,160 --> 12:45:44,160 We're going to see this in action in a second. 7222 12:45:44,160 --> 12:45:48,160 But let's just describe our problem space. 7223 12:45:48,160 --> 12:45:52,160 Just to go visual, we've covered a fair bit here. 7224 12:45:52,160 --> 12:45:53,160 Well done, everyone. 7225 12:45:53,160 --> 12:45:56,160 So binary versus multi-class classification. 7226 12:45:56,160 --> 12:45:59,160 Binary one thing or another. 7227 12:45:59,160 --> 12:46:00,160 Zero or one. 7228 12:46:00,160 --> 12:46:02,160 Multi-class could be three things. 
7229 12:46:02,160 --> 12:46:04,160 Could be a thousand things. 7230 12:46:04,160 --> 12:46:05,160 Could be 5,000 things. 7231 12:46:05,160 --> 12:46:07,160 Could be 25 things. 7232 12:46:07,160 --> 12:46:09,160 So, more than one thing or another. 7233 12:46:09,160 --> 12:46:12,160 But that's the basic premise we're going to go with. 7234 12:46:12,160 --> 12:46:14,160 Let's create some data, hey? 7235 12:46:14,160 --> 12:46:15,160 8.1. 7236 12:46:15,160 --> 12:46:21,160 Creating a toy multi-class data set. 7237 12:46:21,160 --> 12:46:26,160 And so to create our data set, we're going to import our dependencies. 7238 12:46:26,160 --> 12:46:29,160 We're going to re-import torch, even though we already have it. 7239 12:46:29,160 --> 12:46:31,160 Just for a little bit of completeness. 7240 12:46:31,160 --> 12:46:33,160 And we're going to go matplotlib. 7241 12:46:33,160 --> 12:46:37,160 So we can plot, as always, we like to get visual where we can. 7242 12:46:37,160 --> 12:46:40,160 Visualize, visualize, visualize. 7243 12:46:40,160 --> 12:46:44,160 We're going to import from sklearn.datasets. 7244 12:46:44,160 --> 12:46:47,160 Let's get make blobs. 7245 12:46:47,160 --> 12:46:49,160 Now, where would I get this from? 7246 12:46:49,160 --> 12:46:51,160 sklearn.datasets. 7247 12:46:51,160 --> 12:46:53,160 What do we get? 7248 12:46:53,160 --> 12:46:54,160 Toy data sets. 7249 12:46:54,160 --> 12:46:57,160 Do we have classification? 7250 12:46:57,160 --> 12:46:58,160 Toy data sets. 7251 12:46:58,160 --> 12:47:02,160 Do we have blobs? 7252 12:47:02,160 --> 12:47:06,160 If we just go make... scikit-learn 7253 12:47:06,160 --> 12:47:10,160 classification data sets. 7254 12:47:10,160 --> 12:47:12,160 What do we get? 7255 12:47:12,160 --> 12:47:15,160 Here's one option. 7256 12:47:15,160 --> 12:47:17,160 There's also make blobs. 7257 12:47:17,160 --> 12:47:18,160 Beautiful. 7258 12:47:18,160 --> 12:47:19,160 Make blobs. 7259 12:47:19,160 --> 12:47:20,160 Here's the code for that. 7260 12:47:20,160 --> 12:47:22,160 So let's just copy this in here. 7261 12:47:22,160 --> 12:47:23,160 And make blobs. 7262 12:47:23,160 --> 12:47:25,160 We're going to see this in action anyway. 7263 12:47:25,160 --> 12:47:26,160 Make blobs. 7264 12:47:26,160 --> 12:47:29,160 As you might have guessed, it makes some blobs for us. 7265 12:47:29,160 --> 12:47:31,160 I like blobs. 7266 12:47:31,160 --> 12:47:33,160 It's a fun word to say. 7267 12:47:33,160 --> 12:47:34,160 Blobs. 7268 12:47:34,160 --> 12:47:39,160 So we want train test split, because we want to make a data set and then we want to split 7269 12:47:39,160 --> 12:47:40,160 it into train and test. 7270 12:47:40,160 --> 12:47:42,160 Let's set a number of hyper parameters. 7271 12:47:42,160 --> 12:47:48,160 So, set the hyper parameters for data creation. 7272 12:47:48,160 --> 12:47:51,160 Now, I got these from the documentation here. 7273 12:47:51,160 --> 12:47:52,160 Number of samples. 7274 12:47:52,160 --> 12:47:53,160 How many blobs do we want? 7275 12:47:53,160 --> 12:47:55,160 How many features do we want? 7276 12:47:55,160 --> 12:47:58,160 So say, for example, we wanted two different classes. 7277 12:47:58,160 --> 12:48:00,160 That would be binary classification. 7278 12:48:00,160 --> 12:48:02,160 Say, for example, you wanted 10 classes. 7279 12:48:02,160 --> 12:48:03,160 You could set this to 10. 7280 12:48:03,160 --> 12:48:05,160 And we're going to see what the others are in practice.
7281 12:48:05,160 --> 12:48:09,160 But if you want to read through them, you can well and truly do that. 7282 12:48:09,160 --> 12:48:12,160 So let's set up. 7283 12:48:12,160 --> 12:48:13,160 We want num classes. 7284 12:48:13,160 --> 12:48:15,160 Let's double what we've been working with. 7285 12:48:15,160 --> 12:48:18,160 We've been working with two classes, red dots or blue dots. 7286 12:48:18,160 --> 12:48:19,160 Let's step it up a notch. 7287 12:48:19,160 --> 12:48:20,160 We'll go to four classes. 7288 12:48:20,160 --> 12:48:21,160 Watch out, everyone. 7289 12:48:21,160 --> 12:48:24,160 And we're going to go number of features will be two. 7290 12:48:24,160 --> 12:48:26,160 So we have the same number of features. 7291 12:48:26,160 --> 12:48:28,160 And then the random seed is going to be 42. 7292 12:48:28,160 --> 12:48:31,160 You might be wondering why these are capitalized. 7293 12:48:31,160 --> 12:48:38,160 Well, generally, if we do have some hyper parameters that we say set at the start of a notebook, 7294 12:48:38,160 --> 12:48:43,160 you'll find it's quite common for people to write them as capital letters just to say 7295 12:48:43,160 --> 12:48:46,160 that, hey, these are some settings that you can change. 7296 12:48:46,160 --> 12:48:51,160 You don't have to, but I'm just going to introduce that anyway because you might stumble upon it yourself. 7297 12:48:51,160 --> 12:48:54,160 So create multi-class data. 7298 12:48:54,160 --> 12:48:58,160 We're going to use the make blobs function here. 7299 12:48:58,160 --> 12:49:02,160 So we're going to create some x blobs, some feature blobs and some label blobs. 7300 12:49:02,160 --> 12:49:04,160 Let's see what these look like in a second. 7301 12:49:04,160 --> 12:49:08,160 I know I'm just saying blobs a lot. 7302 12:49:08,160 --> 12:49:11,160 But we pass in here, none samples. 7303 12:49:11,160 --> 12:49:12,160 How many do we want? 7304 12:49:12,160 --> 12:49:14,160 Let's create a thousand as well. 7305 12:49:14,160 --> 12:49:17,160 That could really be a hyper parameter, but we'll just leave that how it is for now. 7306 12:49:17,160 --> 12:49:23,160 Number of features is going to be num features. 7307 12:49:23,160 --> 12:49:29,160 Centres equals num classes. 7308 12:49:29,160 --> 12:49:33,160 So we're going to create four classes because we've set up num classes equal to four. 7309 12:49:33,160 --> 12:49:36,160 And then we're going to go center standard deviation. 7310 12:49:36,160 --> 12:49:41,160 We'll give them a little shake up, add a little bit of randomness in here. 7311 12:49:41,160 --> 12:49:45,160 Give the clusters a little shake up. 7312 12:49:45,160 --> 12:49:46,160 We'll mix them up a bit. 7313 12:49:46,160 --> 12:49:48,160 Make it a bit hard for our model. 7314 12:49:48,160 --> 12:49:50,160 But we'll see what this does in a second. 7315 12:49:50,160 --> 12:49:55,160 Random state equals random seed, which is our favorite random seed 42. 7316 12:49:55,160 --> 12:49:59,160 Of course, you can set it whatever number you want, but I like 42. 7317 12:49:59,160 --> 12:50:02,160 Oh, and we need a comma here, of course. 7318 12:50:02,160 --> 12:50:03,160 Beautiful. 7319 12:50:03,160 --> 12:50:05,160 Now, what do we have to do here? 7320 12:50:05,160 --> 12:50:09,160 Well, because we're using scikit-learn and scikit-learn leverages NumPy. 7321 12:50:09,160 --> 12:50:12,160 So let's turn our data into tenses. 7322 12:50:12,160 --> 12:50:14,160 Turn data into tenses. 7323 12:50:14,160 --> 12:50:16,160 And how do we do that? 
7324 12:50:16,160 --> 12:50:22,160 Well, we grab x blob and we call torch from NumPy from NumPy. 7325 12:50:22,160 --> 12:50:24,160 If I could type, that would be fantastic. 7326 12:50:24,160 --> 12:50:25,160 That's all right. 7327 12:50:25,160 --> 12:50:26,160 We're doing pretty well today. 7328 12:50:26,160 --> 12:50:28,160 Haven't made too many typos. 7329 12:50:28,160 --> 12:50:31,160 We did make a few in a couple of videos before, but hey. 7330 12:50:31,160 --> 12:50:33,160 I'm only human. 7331 12:50:33,160 --> 12:50:39,160 So we're going to torch from NumPy and we're going to pass in the y blob. 7332 12:50:39,160 --> 12:50:45,160 And we'll turn it into torch dot float because remember NumPy defaults as float 64, whereas 7333 12:50:45,160 --> 12:50:47,160 PyTorch likes float 32. 7334 12:50:47,160 --> 12:50:53,160 So split into training and test. 7335 12:50:53,160 --> 12:50:58,160 And we're going to create x blob train y or x test. 7336 12:50:58,160 --> 12:51:01,160 x blob test. 7337 12:51:01,160 --> 12:51:04,160 We'll keep the blob nomenclature here. 7338 12:51:04,160 --> 12:51:08,160 y blob train and y blob test. 7339 12:51:08,160 --> 12:51:13,160 And here's again where we're going to leverage the train test split function from scikit-learn. 7340 12:51:13,160 --> 12:51:15,160 So thank you for that scikit-learn. 7341 12:51:15,160 --> 12:51:18,160 x blob and we're going to pass the y blob. 7342 12:51:18,160 --> 12:51:22,160 So features, labels, x is the features, y are the labels. 7343 12:51:22,160 --> 12:51:25,160 And a test size, we've been using a test size of 20%. 7344 12:51:25,160 --> 12:51:29,160 That means 80% of the data will be for the training data. 7345 12:51:29,160 --> 12:51:31,160 That's a fair enough split with our data set. 7346 12:51:31,160 --> 12:51:37,160 And we're going to set the random seed to random seed because generally normally train test split is random, 7347 12:51:37,160 --> 12:51:42,160 but because we want some reproducibility here, we're passing random seeds. 7348 12:51:42,160 --> 12:51:45,160 Finally, we need to get visual. 7349 12:51:45,160 --> 12:51:46,160 So let's plot the data. 7350 12:51:46,160 --> 12:51:51,160 Right now we've got a whole bunch of code and a whole bunch of talking, but not too much visuals going on. 7351 12:51:51,160 --> 12:51:56,160 So we'll write down here, visualize, visualize, visualize. 7352 12:51:56,160 --> 12:52:00,160 And we can call in plot.figure. 7353 12:52:00,160 --> 12:52:02,160 What size do we want? 7354 12:52:02,160 --> 12:52:09,160 I'm going to use my favorite hand in poker, which is 10-7, because it's generally worked out to be a good plot size. 7355 12:52:09,160 --> 12:52:16,160 In my experience, anyway, we'll go x blob. 7356 12:52:16,160 --> 12:52:22,160 And we want the zero index here, and then we'll grab x blob as well. 7357 12:52:22,160 --> 12:52:25,160 And you might notice that we're visualizing the whole data set here. 7358 12:52:25,160 --> 12:52:27,160 That's perfectly fine. 7359 12:52:27,160 --> 12:52:33,160 We could visualize, train and test separately if we really wanted to, but I'll leave that as a level challenge to you. 7360 12:52:33,160 --> 12:52:38,160 And we're going to go red, yellow, blue. 7361 12:52:38,160 --> 12:52:40,160 Wonderful. 7362 12:52:40,160 --> 12:52:41,160 What do we get wrong? 7363 12:52:41,160 --> 12:52:43,160 Oh, of course we got something wrong. 7364 12:52:43,160 --> 12:52:46,160 Santa STD, did we spell center wrong? 7365 12:52:46,160 --> 12:52:47,160 Cluster STD. 
7366 12:52:47,160 --> 12:52:49,160 That's what I missed. 7367 12:52:49,160 --> 12:52:52,160 So, cluster STD. 7368 12:52:52,160 --> 12:52:53,160 Standard deviation. 7369 12:52:53,160 --> 12:52:54,160 What do we get wrong? 7370 12:52:54,160 --> 12:52:55,160 Random seed. 7371 12:52:55,160 --> 12:52:57,160 Oh, this needs to be random state. 7372 12:52:57,160 --> 12:52:59,160 Oh, another typo. 7373 12:52:59,160 --> 12:53:00,160 You know what? 7374 12:53:00,160 --> 12:53:02,160 Just as I said, I wasn't getting too many typos. 7375 12:53:02,160 --> 12:53:03,160 I'll get three. 7376 12:53:03,160 --> 12:53:04,160 There we go. 7377 12:53:04,160 --> 12:53:05,160 Look at that. 7378 12:53:05,160 --> 12:53:08,160 Our first multi-class classification data set. 7379 12:53:08,160 --> 12:53:11,160 So if we set this to zero, what does it do to our clusters? 7380 12:53:11,160 --> 12:53:15,160 Let's take note of what's going on here, particularly the space between all of the dots. 7381 12:53:15,160 --> 12:53:20,160 Now, if we set this cluster STD to zero, what happens? 7382 12:53:20,160 --> 12:53:23,160 We get dots that are really just, look at that. 7383 12:53:23,160 --> 12:53:24,160 That's too easy. 7384 12:53:24,160 --> 12:53:26,160 Let's mix it up, all right? 7385 12:53:26,160 --> 12:53:28,160 Now, you can pick whatever value you want here. 7386 12:53:28,160 --> 12:53:34,160 I'm going to use 1.5, because now we need to build a model that's going to draw some lines between these four colors. 7387 12:53:34,160 --> 12:53:36,160 Two axes, four different classes. 7388 12:53:36,160 --> 12:53:42,160 But it's not going to be perfect because we've got some red dots that are basically in the blue dots. 7389 12:53:42,160 --> 12:53:45,160 And so, what's our next step? 7390 12:53:45,160 --> 12:53:47,160 Well, we've got some data ready. 7391 12:53:47,160 --> 12:53:49,160 It's now time to build a model. 7392 12:53:49,160 --> 12:53:51,160 So, I'll see you in the next video. 7393 12:53:51,160 --> 12:53:55,160 Let's build our first multi-class classification model. 7394 12:53:57,160 --> 12:53:58,160 Welcome back. 7395 12:53:58,160 --> 12:54:03,160 In the last video, we created our multi-class classification data set, 7396 12:54:03,160 --> 12:54:06,160 using scikit-learn's make-blobs function. 7397 12:54:06,160 --> 12:54:08,160 And now, why are we doing this? 7398 12:54:08,160 --> 12:54:12,160 Well, because we're going to put all of what we've covered so far together. 7399 12:54:12,160 --> 12:54:17,160 But instead of using binary classification or working with binary classification data, 7400 12:54:17,160 --> 12:54:20,160 we're going to do it with multi-class classification data. 7401 12:54:20,160 --> 12:54:26,160 So, with that being said, let's get into building our multi-class classification model. 7402 12:54:26,160 --> 12:54:29,160 So, we'll create a little heading here. 7403 12:54:29,160 --> 12:54:36,160 Building a multi-class classification model in PyTorch. 7404 12:54:36,160 --> 12:54:39,160 And now, I want you to have a think about this. 7405 12:54:39,160 --> 12:54:42,160 We spent the last few videos covering non-linearity. 7406 12:54:42,160 --> 12:54:46,160 Does this data set need non-linearity? 7407 12:54:46,160 --> 12:54:51,160 As in, could we separate this data set with pure straight lines? 7408 12:54:51,160 --> 12:54:54,160 Or do we need some non-straight lines as well? 7409 12:54:54,160 --> 12:54:56,160 Have a think about that. 
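For reference, here is a consolidated sketch of the data set creation steps from the last video: hyperparameters in capitals, make_blobs from scikit-learn, conversion to tensors, an 80/20 split, and a quick scatter plot. The labels are converted to float here to match what was written in the video; note that nn.CrossEntropyLoss will later want integer class labels.

import torch
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Hyperparameters for data creation (capitals = settings you can change)
NUM_CLASSES = 4
NUM_FEATURES = 2
RANDOM_SEED = 42

# Create the blobs
X_blob, y_blob = make_blobs(n_samples=1000,
                            n_features=NUM_FEATURES,
                            centers=NUM_CLASSES,
                            cluster_std=1.5,          # give the clusters a little shake up
                            random_state=RANDOM_SEED)

# Turn data into tensors (NumPy defaults to float64, PyTorch prefers float32)
X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.float)   # as in the video; cross entropy will later need long labels

# Split into training and test sets (80/20)
X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(
    X_blob, y_blob, test_size=0.2, random_state=RANDOM_SEED)

# Visualize, visualize, visualize
plt.figure(figsize=(10, 7))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu)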
7410 12:54:56,160 --> 12:55:01,160 It's okay if you're not sure, we're going to be building a model to fit this data anyway, 7411 12:55:01,160 --> 12:55:03,160 or draw patterns in this data anyway. 7412 12:55:03,160 --> 12:55:06,160 And now, before we get into coding a model, 7413 12:55:06,160 --> 12:55:10,160 so for multi-class classification, we've got this. 7414 12:55:10,160 --> 12:55:14,160 For the input layer shape, we need to define the in features. 7415 12:55:14,160 --> 12:55:18,160 So, how many in features do we have for the hidden layers? 7416 12:55:18,160 --> 12:55:23,160 Well, we could set this to whatever we want, but we're going to keep it nice and simple for now. 7417 12:55:23,160 --> 12:55:28,160 For the number of neurons per hidden layer, again, this could be almost whatever we want, 7418 12:55:28,160 --> 12:55:32,160 but because we're working with a relatively small data set, 7419 12:55:32,160 --> 12:55:36,160 we've only got four different classes, we've only got a thousand data points, 7420 12:55:36,160 --> 12:55:39,160 we'll keep it small as well, but you could change this. 7421 12:55:39,160 --> 12:55:43,160 Remember, you can change any of these because they're hyper parameters. 7422 12:55:43,160 --> 12:55:49,160 For the output layer shape, well, how many output features do we want? 7423 12:55:49,160 --> 12:55:53,160 We need one per class, how many classes do we have? 7424 12:55:53,160 --> 12:55:59,160 We have four clusters of different dots here, so we'll need four output features. 7425 12:55:59,160 --> 12:56:03,160 And then if we go back, we have an output activation of softmax, we haven't seen that yet, 7426 12:56:03,160 --> 12:56:09,160 and then we have a loss function, rather than binary cross entropy, we have cross entropy. 7427 12:56:09,160 --> 12:56:15,160 And then optimizer as well is the same as binary classification, two of the most common 7428 12:56:15,160 --> 12:56:19,160 are SGDs, stochastic gradient descent, or the atom optimizer, 7429 12:56:19,160 --> 12:56:23,160 but of course, the torch.optim package has many different options as well. 7430 12:56:23,160 --> 12:56:28,160 So let's push forward and create our first multi-class classification model. 7431 12:56:28,160 --> 12:56:33,160 First, we're going to create, we're going to get into the habit of creating 7432 12:56:33,160 --> 12:56:39,160 device agnostic code, and we'll set the device here, equals CUDA, 7433 12:56:39,160 --> 12:56:44,160 nothing we haven't seen before, but again, we're doing this to put it all together, 7434 12:56:44,160 --> 12:56:47,160 so that we have a lot of practice. 7435 12:56:47,160 --> 12:56:54,160 Is available, else CPU, and let's go device. 7436 12:56:54,160 --> 12:56:58,160 So we should have a GPU available, beautiful CUDA. 7437 12:56:58,160 --> 12:57:03,160 Now, of course, if you don't, you can go change runtime type, select GPU here, 7438 12:57:03,160 --> 12:57:08,160 that will restart the runtime, you'll have to run all of the code that's before this cell as well, 7439 12:57:08,160 --> 12:57:11,160 but I'm going to be using a GPU. 7440 12:57:11,160 --> 12:57:14,160 You don't necessarily need one because our data set's quite small, 7441 12:57:14,160 --> 12:57:20,160 and our models aren't going to be very large, but we set this up so we have device agnostic code. 7442 12:57:20,160 --> 12:57:25,160 And so let's build a multi-class classification model. 
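The device-agnostic setup being written here, as a one-line sketch:

# Device-agnostic code: use the GPU if it's available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)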
7443 12:57:25,160 --> 12:57:31,160 Look at us go, just covering all of the foundations of classification in general here, 7444 12:57:31,160 --> 12:57:38,160 and we now know that we can combine linear and non-linear functions to create 7445 12:57:38,160 --> 12:57:42,160 neural networks that can find patterns in almost any kind of data. 7446 12:57:42,160 --> 12:57:46,160 So I'm going to call my class here blob model, and it's going to, of course, 7447 12:57:46,160 --> 12:57:51,160 inherit from nn.module, and we're going to upgrade our class here. 7448 12:57:51,160 --> 12:57:54,160 We're going to take some inputs here, and I'll show you how to do this. 7449 12:57:54,160 --> 12:57:58,160 If you're familiar with Python classes, you would have already done stuff like this, 7450 12:57:58,160 --> 12:58:01,160 but we're going to set some parameters for our models, 7451 12:58:01,160 --> 12:58:05,160 because as you write more and more complex classes, you'll want to take inputs here. 7452 12:58:05,160 --> 12:58:11,160 And I'm going to pre-build the, or pre-set the hidden units parameter to eight. 7453 12:58:11,160 --> 12:58:15,160 Because I've decided, you know what, I'm going to start off with eight hidden units, 7454 12:58:15,160 --> 12:58:19,160 and if I wanted to change this to 128, I could. 7455 12:58:19,160 --> 12:58:22,160 But in the constructor here, we've got some options. 7456 12:58:22,160 --> 12:58:24,160 So we have input features. 7457 12:58:24,160 --> 12:58:28,160 We're going to set these programmatically as inputs to our class when we instantiate it. 7458 12:58:28,160 --> 12:58:31,160 The same with output features as well. 7459 12:58:31,160 --> 12:58:35,160 And so here, we're going to call self. 7460 12:58:35,160 --> 12:58:36,160 Oh, no, super. 7461 12:58:36,160 --> 12:58:37,160 Sorry. 7462 12:58:37,160 --> 12:58:40,160 I always get this mixed up dot init. 7463 12:58:40,160 --> 12:58:42,160 And underscore underscore. 7464 12:58:42,160 --> 12:58:43,160 Beautiful. 7465 12:58:43,160 --> 12:58:46,160 So we could do a doc string here as well. 7466 12:58:46,160 --> 12:58:48,160 So let's write in this. 7467 12:58:48,160 --> 12:58:55,160 Initializes multi-class classification. 7468 12:58:55,160 --> 12:59:01,160 If I could spell class e-fication model. 7469 12:59:01,160 --> 12:59:03,160 Oh, this is great. 7470 12:59:03,160 --> 12:59:05,160 And then we have some arcs here. 7471 12:59:05,160 --> 12:59:08,160 This is just a standard way of writing doc strings. 7472 12:59:08,160 --> 12:59:13,160 If you want to find out, this is Google Python doc string guide. 7473 12:59:13,160 --> 12:59:15,160 There we go. 7474 12:59:15,160 --> 12:59:16,160 Google Python style guide. 7475 12:59:16,160 --> 12:59:19,160 This is where I get mine from. 7476 12:59:19,160 --> 12:59:20,160 You can scroll through this. 7477 12:59:20,160 --> 12:59:22,160 This is just a way to write Python code. 7478 12:59:22,160 --> 12:59:23,160 Yeah, there we go. 7479 12:59:23,160 --> 12:59:26,160 So we've got a little sentence saying what's going on. 7480 12:59:26,160 --> 12:59:27,160 We've got arcs. 7481 12:59:27,160 --> 12:59:31,160 We've got returns and we've got errors if something's going on. 7482 12:59:31,160 --> 12:59:33,160 So I highly recommend checking that out. 7483 12:59:33,160 --> 12:59:34,160 Just a little tidbit. 7484 12:59:34,160 --> 12:59:37,160 So this is if someone was to use our class later on. 7485 12:59:37,160 --> 12:59:39,160 They know what the input features are. 
7486 12:59:39,160 --> 12:59:48,160 Input features, which is an int, which is number of input features to the model. 7487 12:59:48,160 --> 12:59:52,160 And then, of course, we've got output features, which is also an int. 7488 12:59:52,160 --> 12:59:56,160 Which is number of output features of the model. 7489 12:59:56,160 --> 13:00:00,160 And we've got the red line here is telling us we've got something wrong, but that's okay. 7490 13:00:00,160 --> 13:00:01,160 And then the hidden features. 7491 13:00:01,160 --> 13:00:07,160 Oh, well, this is number of output classes for the case of multi-class classification. 7492 13:00:07,160 --> 13:00:13,160 And then the hidden units. 7493 13:00:13,160 --> 13:00:23,160 Int and then number of hidden units between layers and then the default is eight. 7494 13:00:23,160 --> 13:00:24,160 Beautiful. 7495 13:00:24,160 --> 13:00:26,160 And then under that, we'll just do that. 7496 13:00:26,160 --> 13:00:29,160 Is that going to fix itself? 7497 13:00:29,160 --> 13:00:30,160 Yeah, there we go. 7498 13:00:30,160 --> 13:00:32,160 We could put in what it returns. 7499 13:00:32,160 --> 13:00:34,160 Returns, whatever it returns. 7500 13:00:34,160 --> 13:00:40,160 And then an example use case, but I'll leave that for you to fill out. 7501 13:00:40,160 --> 13:00:41,160 If you like. 7502 13:00:41,160 --> 13:00:44,160 So let's instantiate some things here. 7503 13:00:44,160 --> 13:00:50,160 What we might do is write self dot linear layer stack. 7504 13:00:50,160 --> 13:00:52,160 Self dot linear layer stack. 7505 13:00:52,160 --> 13:00:56,160 And we will set this as nn dot sequential. 7506 13:00:56,160 --> 13:00:58,160 Ooh, we haven't seen this before. 7507 13:00:58,160 --> 13:01:01,160 But we're just going to look at a different way of writing a model here. 7508 13:01:01,160 --> 13:01:04,160 Previously, when we created a model, what did we do? 7509 13:01:04,160 --> 13:01:10,160 Well, we instantiated each layer as its own parameter here. 7510 13:01:10,160 --> 13:01:15,160 And then we called on them one by one, but we did it in a straightforward fashion. 7511 13:01:15,160 --> 13:01:19,160 So that's why we're going to use sequential here to just step through our layers. 7512 13:01:19,160 --> 13:01:24,160 We're not doing anything too fancy, so we'll just set up a sequential stack of layers here. 7513 13:01:24,160 --> 13:01:30,160 And recall that sequential just steps through, passes the data through each one of these layers one by one. 7514 13:01:30,160 --> 13:01:38,160 And because we've set up the parameters up here, input features can equal to input features. 7515 13:01:38,160 --> 13:01:41,160 And output features, what is this going to be? 7516 13:01:41,160 --> 13:01:45,160 Is this going to be output features or is this going to be hidden units? 7517 13:01:45,160 --> 13:01:48,160 It's going to be hidden units because it's not the final layer. 7518 13:01:48,160 --> 13:01:53,160 We want the final layer to output our output features. 7519 13:01:53,160 --> 13:01:59,160 So input features, this will be hidden units because remember the subsequent layer needs to line up with the previous layer. 7520 13:01:59,160 --> 13:02:04,160 Output features, we're going to create another one that outputs hidden units. 7521 13:02:04,160 --> 13:02:16,160 And then we'll go in n.linear in features equals hidden units because it takes the output features of the previous layer. 7522 13:02:16,160 --> 13:02:19,160 So as you see here, the output features of this feeds into here. 
7523 13:02:19,160 --> 13:02:22,160 The output features of this feeds into here. 7524 13:02:22,160 --> 13:02:25,160 And then finally, this is going to be our final layer. 7525 13:02:25,160 --> 13:02:26,160 We'll do three layers. 7526 13:02:26,160 --> 13:02:31,160 Output features equals output features. 7527 13:02:31,160 --> 13:02:35,160 Wonderful. So how do we know the values of each of these? 7528 13:02:35,160 --> 13:02:43,160 Well, let's have a look at xtrain.shape and ytrain.shape. 7529 13:02:43,160 --> 13:02:46,160 So in the case of x, we have two input features. 7530 13:02:46,160 --> 13:02:51,160 And in the case of y, well, this is a little confusing as well because y is a scalar. 7531 13:02:51,160 --> 13:02:55,160 But what do you think the values for y are going to be? 7532 13:02:55,160 --> 13:03:01,160 Well, let's go NP. Or is there torch.unique? I'm not sure. Let's find out together, hey? 7533 13:03:01,160 --> 13:03:03,160 Torch unique. 7534 13:03:03,160 --> 13:03:07,160 Zero on one, ytrain. Oh, we need y blob train. That's right, blob. 7535 13:03:07,160 --> 13:03:11,160 I'm too used to writing blob. 7536 13:03:11,160 --> 13:03:15,160 And we need blob train, but I believe it's the same here. 7537 13:03:15,160 --> 13:03:18,160 And then blob. 7538 13:03:18,160 --> 13:03:22,160 There we go. So we have four classes. 7539 13:03:22,160 --> 13:03:26,160 So we need an output features value of four. 7540 13:03:26,160 --> 13:03:34,160 And now if we wanted to add nonlinearity here, we could put it in between our layers here like this. 7541 13:03:34,160 --> 13:03:41,160 But I asked the question before, do you think that this data set needs nonlinearity? 7542 13:03:41,160 --> 13:03:43,160 Well, let's leave it in there to begin with. 7543 13:03:43,160 --> 13:03:46,160 And one of the challenges for you, oh, do we need commas here? 7544 13:03:46,160 --> 13:03:48,160 I think we need commas here. 7545 13:03:48,160 --> 13:03:54,160 One of the challenges for you will be to test the model with nonlinearity 7546 13:03:54,160 --> 13:03:56,160 and without nonlinearity. 7547 13:03:56,160 --> 13:03:59,160 So let's just leave it in there for the time being. 7548 13:03:59,160 --> 13:04:01,160 What's missing from this? 7549 13:04:01,160 --> 13:04:03,160 Well, we need a forward method. 7550 13:04:03,160 --> 13:04:07,160 So def forward self X. What can we do here? 7551 13:04:07,160 --> 13:04:12,160 Well, because we've created this as a linear layer stack using nn.sequential, 7552 13:04:12,160 --> 13:04:18,160 we can just go return linear layer stack and pass it X. 7553 13:04:18,160 --> 13:04:20,160 And what's going to happen? 7554 13:04:20,160 --> 13:04:25,160 Whatever input goes into the forward method is just going to go through these layers sequentially. 7555 13:04:25,160 --> 13:04:30,160 Oh, we need to put self here because we've initialized it in the constructor. 7556 13:04:30,160 --> 13:04:31,160 Beautiful. 7557 13:04:31,160 --> 13:04:41,160 And now let's create an instance of blob model and send it to the target device. 7558 13:04:41,160 --> 13:04:45,160 We'll go model four equals blob model. 7559 13:04:45,160 --> 13:04:51,160 And then we can use our input features parameter, which is this one here. 7560 13:04:51,160 --> 13:04:54,160 And we're going to pass it a value of what? 7561 13:04:54,160 --> 13:04:55,160 Two. 7562 13:04:55,160 --> 13:04:58,160 And then output features. Why? Because we have two X features.
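Here is roughly where this class ends up once it's finished, with the ReLU non-linearities left in between the linear layers as discussed. A sketch rather than an exact copy of the notebook:

    from torch import nn

    class BlobModel(nn.Module):
        def __init__(self, input_features, output_features, hidden_units=8):
            """Initializes a multi-class classification model.

            Args:
                input_features (int): Number of input features to the model.
                output_features (int): Number of output features (i.e. number of classes).
                hidden_units (int): Number of hidden units between layers, default 8.
            """
            super().__init__()
            self.linear_layer_stack = nn.Sequential(
                nn.Linear(in_features=input_features, out_features=hidden_units),
                nn.ReLU(),  # exercise: try the model with and without these non-linearities
                nn.Linear(in_features=hidden_units, out_features=hidden_units),
                nn.ReLU(),
                nn.Linear(in_features=hidden_units, out_features=output_features),
            )

        def forward(self, x):
            # nn.Sequential passes x through each of the layers above, one by one
            return self.linear_layer_stack(x)

    # 2 input features (X has 2 columns), 4 output features (one per class), 8 hidden units
    model_4 = BlobModel(input_features=2,
                        output_features=4,
                        hidden_units=8).to(device)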
7563 13:04:58,160 --> 13:05:02,160 Now, the output feature is going to be the same as the number of classes that we have for. 7564 13:05:02,160 --> 13:05:05,160 If we had 10 classes, we'd set it to 10. 7565 13:05:05,160 --> 13:05:06,160 So we'll go four. 7566 13:05:06,160 --> 13:05:09,160 And then the hidden units is going to be eight by default. 7567 13:05:09,160 --> 13:05:12,160 So we don't have to put this here, but we're going to put it there anyway. 7568 13:05:12,160 --> 13:05:17,160 And then, of course, we're going to send this to device. 7569 13:05:17,160 --> 13:05:24,160 And then we're going to go model four. 7570 13:05:24,160 --> 13:05:26,160 What do we get wrong here? 7571 13:05:26,160 --> 13:05:30,160 Unexpected keyword argument output features. 7572 13:05:30,160 --> 13:05:31,160 Do we spell something wrong? 7573 13:05:31,160 --> 13:05:33,160 No doubt. We've got a spelling mistake. 7574 13:05:33,160 --> 13:05:40,160 Output features. Output features. 7575 13:05:40,160 --> 13:05:42,160 Oh, out features. 7576 13:05:42,160 --> 13:05:48,160 Ah, that's what we needed. Out features, not output. 7577 13:05:48,160 --> 13:05:50,160 I've got a little confused there. 7578 13:05:50,160 --> 13:05:51,160 Okay. 7579 13:05:51,160 --> 13:05:53,160 There we go. Okay, beautiful. 7580 13:05:53,160 --> 13:05:56,160 So just recall that the parameter here for an end up linear. 7581 13:05:56,160 --> 13:05:57,160 Did you pick up on that? 7582 13:05:57,160 --> 13:05:59,160 Is out features not output features. 7583 13:05:59,160 --> 13:06:05,160 Output features, a little confusing here, is our final layout output layers number of features there. 7584 13:06:05,160 --> 13:06:11,160 So we've now got a multi-class classification model that lines up with the data that we're using. 7585 13:06:11,160 --> 13:06:13,160 So the shapes line up. Beautiful. 7586 13:06:13,160 --> 13:06:15,160 Well, what's next? 7587 13:06:15,160 --> 13:06:20,160 Well, we have to create a loss function. And, of course, a training loop. 7588 13:06:20,160 --> 13:06:25,160 So I'll see you in the next few videos. And let's do that together. 7589 13:06:25,160 --> 13:06:31,160 Welcome back. In the last video, we created our multi-class classification model. 7590 13:06:31,160 --> 13:06:35,160 And we did so by subclassing an end up module. 7591 13:06:35,160 --> 13:06:39,160 And we set up a few parameters for our class constructor here. 7592 13:06:39,160 --> 13:06:44,160 So that when we made an instance of the blob model, we could customize the input features. 7593 13:06:44,160 --> 13:06:49,160 The output features. Remember, this lines up with how many features X has. 7594 13:06:49,160 --> 13:06:54,160 And the output features here lines up with how many classes are in our data. 7595 13:06:54,160 --> 13:06:58,160 So if we had 10 classes, we could change this to 10. And it would line up. 7596 13:06:58,160 --> 13:07:02,160 And then if we wanted 128 hidden units, well, we could change that. 7597 13:07:02,160 --> 13:07:07,160 So we're getting a little bit more programmatic with how we create models here. 7598 13:07:07,160 --> 13:07:14,160 And as you'll see later on, a lot of the things that we've built in here can also be functionalized in a similar matter. 7599 13:07:14,160 --> 13:07:16,160 But let's keep pushing forward. What's our next step? 7600 13:07:16,160 --> 13:07:23,160 If we build a model, if we refer to the workflow, you'd see that we have to create a loss function. 
7601 13:07:23,160 --> 13:07:32,160 And an optimizer for a multi-class classification model. 7602 13:07:32,160 --> 13:07:36,160 And so what's our option here for creating a loss function? 7603 13:07:36,160 --> 13:07:39,160 Where do we find loss functions in PyTorch? I'm just going to get out of this. 7604 13:07:39,160 --> 13:07:44,160 And I'll make a new tab here. And if we search torch.nn 7605 13:07:44,160 --> 13:07:50,160 Because torch.nn is the basic building box for graphs. In other words, neural networks. 7606 13:07:50,160 --> 13:07:55,160 Where do we find loss functions? Hmm, here we go. Beautiful. 7607 13:07:55,160 --> 13:08:00,160 So we've seen that L1 loss or MSE loss could be used for regression, predicting a number. 7608 13:08:00,160 --> 13:08:07,160 And I'm here to tell you as well that for classification, we're going to be looking at cross entropy loss. 7609 13:08:07,160 --> 13:08:14,160 Now, this is for multi-class classification. For binary classification, we work with BCE loss. 7610 13:08:14,160 --> 13:08:20,160 And of course, there's a few more here, but I'm going to leave that as something that you can explore on your own. 7611 13:08:20,160 --> 13:08:24,160 Let's jump in to cross entropy loss. 7612 13:08:24,160 --> 13:08:30,160 So what do we have here? This criterion computes. Remember, a loss function in PyTorch is also referred to as a criterion. 7613 13:08:30,160 --> 13:08:36,160 You might also see loss function referred to as cost function, C-O-S-T. 7614 13:08:36,160 --> 13:08:43,160 But I call them loss functions. So this criterion computes the cross entropy loss between input and target. 7615 13:08:43,160 --> 13:08:49,160 Okay, so the input is something, and the target is our target labels. 7616 13:08:49,160 --> 13:08:54,160 It is useful when training a classification problem with C classes. There we go. 7617 13:08:54,160 --> 13:09:00,160 So that's what we're doing. We're training a classification problem with C classes, C is a number of classes. 7618 13:09:00,160 --> 13:09:06,160 If provided the optional argument, weight should be a 1D tensor assigning a weight to each of the classes. 7619 13:09:06,160 --> 13:09:15,160 So we don't have to apply a weight here, but why would you apply a weight? Well, it says, if we look at weight here, 7620 13:09:15,160 --> 13:09:20,160 this is particularly useful when you have an unbalanced training set. So just keep this in mind as you're going forward. 7621 13:09:20,160 --> 13:09:29,160 If you wanted to train a dataset that has imbalanced samples, in our case we have the same number of samples for each class, 7622 13:09:29,160 --> 13:09:33,160 but sometimes you might come across a dataset with maybe you only have 10 yellow dots. 7623 13:09:33,160 --> 13:09:39,160 And maybe you have 500 blue dots and only 100 red and 100 light blue dots. 7624 13:09:39,160 --> 13:09:44,160 So you have an unbalanced dataset. So that's where you can come in and have a look at the weight parameter here. 7625 13:09:44,160 --> 13:09:51,160 But for now, we're just going to keep things simple. We have a balanced dataset, and we're going to focus on using this loss function. 7626 13:09:51,160 --> 13:09:59,160 If you'd like to read more, please, you can read on here. And if you wanted to find out more, you could go, what is cross entropy loss? 7627 13:09:59,160 --> 13:10:05,160 And I'm sure you'll find a whole bunch of loss functions. There we go. There's the ML cheat sheet. I love that. 
7628 13:10:05,160 --> 13:10:11,160 The ML glossary, that's one of my favorite websites. Towards data science, you'll find that website, Wikipedia. 7629 13:10:11,160 --> 13:10:17,160 Machine learning mastery is also another fantastic website. But you can do that all in your own time. 7630 13:10:17,160 --> 13:10:26,160 Let's code together, hey. We'll set up a loss function. Oh, and one more resource before we get into code is that we've got the architecture, 7631 13:10:26,160 --> 13:10:37,160 well, the typical architecture of a classification model. The loss function for multi-class classification is cross entropy or torch.nn.cross entropy loss. 7632 13:10:37,160 --> 13:10:51,160 Let's code it out. If in doubt, code it out. So create a loss function for multi-class classification. 7633 13:10:51,160 --> 13:11:03,160 And then we go, loss fn equals, and then dot cross entropy loss. Beautiful. And then we want to create an optimizer. 7634 13:11:03,160 --> 13:11:12,160 Create an optimizer for multi-class classification. And then the beautiful thing about optimizers is they're quite flexible. 7635 13:11:12,160 --> 13:11:20,160 They can go across a wide range of different problems. So the optimizer. So two of the most common, and I say most common because they work quite well. 7636 13:11:20,160 --> 13:11:30,160 Across a wide range of problems. So that's why I've only listed two here. But of course, within the torch dot opt in module, you will find a lot more different optimizers. 7637 13:11:30,160 --> 13:11:43,160 But let's stick with SGD for now. And we'll go back and go optimizer equals torch dot opt in for optimizer SGD for stochastic gradient descent. 7638 13:11:43,160 --> 13:11:51,160 The parameters we want our optimizer to optimize model four, we're up to our fourth model already. Oh my goodness. 7639 13:11:51,160 --> 13:12:00,160 Model four dot parameters. And we'll set the learning rate to 0.1. Of course, you could change the learning rate if you wanted to. 7640 13:12:00,160 --> 13:12:09,160 In fact, I'd encourage you to see what happens if you do because why the learning rate is a hyper parameter. 7641 13:12:09,160 --> 13:12:22,160 I'm better at writing code than I am at spelling. You can change. Wonderful. So we've now got a loss function and an optimizer for a multi class classification problem. 7642 13:12:22,160 --> 13:12:26,160 What's next? Well, we could start to build. 7643 13:12:26,160 --> 13:12:35,160 Building a training loop. We could start to do that, but I think we have a look at what the outputs of our model are. 7644 13:12:35,160 --> 13:12:47,160 So more specifically, so getting prediction probabilities for a multi class pie torch model. 7645 13:12:47,160 --> 13:12:56,160 So my challenge to you before the next video is to have a look at what happens if you pass x blob test through a model. 7646 13:12:56,160 --> 13:13:01,160 And remember, what is a model's raw output? What is that referred to as? 7647 13:13:01,160 --> 13:13:06,160 Oh, I'll let you have a think about that before the next video. I'll see you there. 7648 13:13:06,160 --> 13:13:13,160 Welcome back. In the last video, we created a loss function and an optimizer for our multi class classification model. 7649 13:13:13,160 --> 13:13:21,160 And recall the loss function measures how wrong our model's predictions are. 7650 13:13:21,160 --> 13:13:35,160 And the optimizer optimizer updates our model parameters to try and reduce the loss. 7651 13:13:35,160 --> 13:13:44,160 So that's what that does. 
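In code, those two pieces are only a few lines (the commented-out weight example uses made-up values, purely to show where the argument goes):

    import torch
    from torch import nn

    # Loss function for multi-class classification (BCE loss was the binary equivalent)
    loss_fn = nn.CrossEntropyLoss()

    # If the classes were imbalanced you could weight them, e.g. (hypothetical values):
    # loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 5.0, 5.0, 50.0]).to(device))

    # Optimizer: SGD here, but torch.optim has plenty of others (e.g. torch.optim.Adam)
    optimizer = torch.optim.SGD(params=model_4.parameters(),
                                lr=0.1)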
And I also issued the challenge of doing a forward pass with model four, which is the most recent model that we created. 7652 13:13:44,160 --> 13:13:53,160 And oh, did I just give you some code that wouldn't work? Did I do that on purpose? Maybe, maybe not, you'll never know. 7653 13:13:53,160 --> 13:14:00,160 So if this did work, what are the raw outputs of our model? Let's get some raw outputs of our model. 7654 13:14:00,160 --> 13:14:04,160 And if you recall, the raw outputs of a model are called logits. 7655 13:14:04,160 --> 13:14:11,160 So we got a runtime error: expected all tensors to be on the same device. Ah, of course. Why did this come up? 7656 13:14:11,160 --> 13:14:20,160 Well, because if we go next model four dot parameters, and if we check device, what happens here? 7657 13:14:20,160 --> 13:14:29,160 Oh, we need to bring this in. Our model is on the CUDA device, whereas our data is on the CPU still. 7658 13:14:29,160 --> 13:14:35,160 Can we go X? Is our data a tensor? Can we check the device parameter of that? I think we can. 7659 13:14:35,160 --> 13:14:41,160 I might be proven wrong here. Oh, it's on the CPU. Of course, we're getting a runtime error. 7660 13:14:41,160 --> 13:14:45,160 Did you catch that one? If you did, well done. So let's see what happens. 7661 13:14:45,160 --> 13:14:53,160 But before we do a forward pass, how about we turn our model into eval mode to make some predictions with torch dot inference mode? 7662 13:14:53,160 --> 13:14:59,160 We'll make some predictions. We don't necessarily have to do this because it's just tests, but it's a good habit. 7663 13:14:59,160 --> 13:15:08,160 Oh, y preds equals, what do we get? y preds. And maybe we'll just view the first 10. 7664 13:15:08,160 --> 13:15:17,160 What do we get here? Oh, my goodness. So many numbers on a page. Is this the same format as our data or our test labels? 7665 13:15:17,160 --> 13:15:25,160 Let's have a look. No, it's not. Okay. Oh, we need y blob test. Excuse me. 7666 13:15:25,160 --> 13:15:33,160 We're going to make that mistake a fair few times here. So we need to get this into the format of this. Hmm.
But for multi-class classification, these are the two main differences between multi-class classification and binary classification. 7676 13:17:00,160 --> 13:17:06,160 One uses softmax, one uses cross entropy. And it's going to take a little bit of practice to know this off by heart. 7677 13:17:06,160 --> 13:17:11,160 It took me a while, but that's why we have nice tables like this. And that's why we write a lot of code together. 7678 13:17:11,160 --> 13:17:20,160 So we're going to use a softmax function here to convert out logits. Our models raw outputs, which is this here, to prediction probabilities. 7679 13:17:20,160 --> 13:17:30,160 And let's see that. So convert our models, logit outputs to prediction probabilities. 7680 13:17:30,160 --> 13:17:39,160 So let's create why predprobs. So I like to call prediction probabilities predprobs for short. 7681 13:17:39,160 --> 13:17:47,160 So torch dot softmax. And then we go why logits. And we want it across the first dimension. 7682 13:17:47,160 --> 13:17:55,160 So let's have a look. If we print why logits, we'll get the first five values there. And then look at the conversion here. 7683 13:17:55,160 --> 13:18:06,160 Why logits? Oh, why predprobs? That's what we want to compare. Predprobs. Five. Let's check this out. 7684 13:18:06,160 --> 13:18:16,160 Oh, what did we get wrong here? Why logits? Do we have why logits? Oh, no. We should change this to why logits, because really that's the raw output of our model here. 7685 13:18:16,160 --> 13:18:26,160 Why logits? Let's rerun that. Check that. We know that these are different to these, but we ideally want these to be in the same format as these, our test labels. 7686 13:18:26,160 --> 13:18:36,160 These are our models predictions. And now we should be able to convert. There we go. Okay, beautiful. What's happening here? Let's just get out of this. 7687 13:18:36,160 --> 13:18:45,160 And we will add a few code cells here. So we have some space. Now, if you wanted to find out what's happening with torch dot softmax, what could you do? 7688 13:18:45,160 --> 13:18:57,160 We could go torch softmax. See what's happening. Softmax. Okay, so here's the function that's happening. We replicated some nonlinear activation functions before. 7689 13:18:57,160 --> 13:19:04,160 So if you wanted to replicate this, what could you do? Well, if in doubt, code it out. You could code this out. You've got the tools to do so. 7690 13:19:04,160 --> 13:19:16,160 We've got softmax to some X input takes the exponential of X. So torch exponential over the sum of torch exponential of X. So I think you could code that out if you wanted to. 7691 13:19:16,160 --> 13:19:27,160 But let's for now just stick with what we've got. We've got some logits here, and we've got some softmax, some logits that have been passed through the softmax function. 7692 13:19:27,160 --> 13:19:35,160 So that's what's happened here. We've passed our logits as the input here, and it's gone through this activation function. 7693 13:19:35,160 --> 13:19:43,160 These are prediction probabilities. And you might be like, Daniel, these are still just numbers on a page. But you also notice that none of them are negative. 7694 13:19:43,160 --> 13:19:50,160 Okay, and there's another little tidbit about what's going on here. If we sum one of them up, let's get the first one. 7695 13:19:50,160 --> 13:19:58,160 Will this work? And if we go torch dot sum, what happens? 7696 13:19:58,160 --> 13:20:09,160 Ooh, they all sum up to one. 
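If you do want to replicate torch.softmax by hand, as suggested above, it's only a line or two. A sketch (ignoring the numerical-stability tricks a real implementation would use):

    # Softmax: exponentiate each value, then divide by the sum of exponentials per sample (row)
    def manual_softmax(x: torch.Tensor, dim: int = 1) -> torch.Tensor:
        return torch.exp(x) / torch.exp(x).sum(dim=dim, keepdim=True)

    y_pred_probs = torch.softmax(y_logits, dim=1)
    print(torch.allclose(manual_softmax(y_logits), y_pred_probs))  # True (within floating point tolerance)
    print(y_pred_probs[0].sum())  # each row of prediction probabilities sums to 1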
So that's one of the effects of the softmax function. And then if we go torch dot max of Y-pred probes. 7697 13:20:09,160 --> 13:20:12,160 So this is a prediction probability. 7698 13:20:12,160 --> 13:20:25,160 For multi class, you'll find that for this particular sample here, the 0th sample, this is the maximum number. And so our model, what this is saying is our model is saying, this is the prediction probability. 7699 13:20:25,160 --> 13:20:34,160 This is how much I think it is class 0. This number here, it's in order. This is how much I think it is class 1. This is how much I think it is class 2. 7700 13:20:34,160 --> 13:20:43,160 This is how much I think it is class 3. And so we have one value for each of our four classes, a little bit confusing because it's 0th indexed. 7701 13:20:43,160 --> 13:20:55,160 But the maximum value here is this index. And so how would we get the particular index value of whatever the maximum number is across these values? 7702 13:20:55,160 --> 13:21:08,160 Well, we can take the argmax and we get tensor 1. So for this particular sample, this one here, our model, and these guesses or these predictions aren't very good. 7703 13:21:08,160 --> 13:21:17,160 Why is that? Well, because our model is still just predicting with random numbers, we haven't trained it yet. So this is just random output here, basically. 7704 13:21:17,160 --> 13:21:30,160 But for now, the premise still remains that our model thinks that for this sample using random numbers, it thinks that index 1 is the right class or class number 1 for this particular sample. 7705 13:21:30,160 --> 13:21:39,160 And then for this next one, what's the maximum number here? I think it would be the 0th index and the same for the next one. What's the maximum number here? 7706 13:21:39,160 --> 13:21:46,160 Well, it would be the 0th index as well. But of course, these numbers are going to change once we've trained our model. 7707 13:21:46,160 --> 13:22:02,160 So how do we get the maximum index value of all of these? So this is where we can go, convert our model's prediction probabilities to prediction labels. 7708 13:22:02,160 --> 13:22:18,160 So let's do that. We can go ypreds equals torch dot argmax on ypredprobs. And if we go across the first dimension as well. So now let's have a look at ypreds. 7709 13:22:18,160 --> 13:22:36,160 Do we have prediction labels in the same format as our ylob test? Beautiful. Yes, we do. Although many of them are wrong, as you can see, ideally they would line up with each other. 7710 13:22:36,160 --> 13:22:44,160 But because our model is predicting or making predictions with random numbers, so they haven't been our model hasn't been trained. All of these are basically random outputs. 7711 13:22:44,160 --> 13:22:52,160 So hopefully once we train our model, it's going to line up the values of the predictions are going to line up with the values of the test labels. 7712 13:22:52,160 --> 13:23:03,160 But that is how we go from our model's raw outputs to prediction probabilities to prediction labels for a multi-class classification problem. 7713 13:23:03,160 --> 13:23:25,160 So let's just add the steps here, logits, raw output of the model, predprobs, to get the prediction probabilities, use torch dot softmax or the softmax activation function, pred labels, take the argmax of the prediction probabilities. 
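Those three steps, in one place (a sketch using the same variable names):

    # logits -> prediction probabilities -> prediction labels
    with torch.inference_mode():
        y_logits = model_4(X_blob_test.to(device))     # raw outputs of the model (logits)
    y_pred_probs = torch.softmax(y_logits, dim=1)      # prediction probabilities
    y_preds = torch.argmax(y_pred_probs, dim=1)        # prediction labels (index of the max probability)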
7714 13:23:25,160 --> 13:23:37,160 So we're going to see this in action later on when we evaluate our model, but I feel like now that we know how to go from logits to prediction probabilities to pred labels, we can write a training loop. 7715 13:23:37,160 --> 13:23:51,160 So let's set that up. 8.5, create a training loop, and testing loop for a multi-class pytorch model. This is so exciting. 7716 13:23:51,160 --> 13:23:59,160 I'll see you in the next video. Let's build our first training and testing loop for a multi-class pytorch model, and I'll give you a little hint. 7717 13:23:59,160 --> 13:24:05,160 It's quite similar to the training and testing loops we've built before, so you might want to give it a shot. I think you can. 7718 13:24:05,160 --> 13:24:09,160 Otherwise, we'll do it together in the next video. 7719 13:24:09,160 --> 13:24:20,160 Welcome back. In the last video, we covered how to go from raw logits, which is the output of the model, the raw output of the model for a multi-class pytorch model. 7720 13:24:20,160 --> 13:24:38,160 Then we turned our logits into prediction probabilities using torch.softmax, and then we turn those prediction probabilities into prediction labels by taking the argmax, which returns the index of where the maximum value occurs in the prediction probability. 7721 13:24:38,160 --> 13:24:51,160 So for this particular sample, with these four values, because it outputs four values, because we're working with four classes, if we were working with 10 classes, it would have 10 values, the principle of these steps would still be the same. 7722 13:24:51,160 --> 13:25:03,160 So for this particular sample, this is the value that's the maximum, so we would take that index, which is 1. For this one, the index 0 has the maximum value. 7723 13:25:03,160 --> 13:25:11,160 For this sample, same again, and then same again, I mean, these prediction labels are just random, right? So they're quite terrible. 7724 13:25:11,160 --> 13:25:17,160 But now we're going to change that, because we're going to build a training and testing loop for our multi-class model. 7725 13:25:17,160 --> 13:25:24,160 Let's do that. So fit the multi-class model to the data. 7726 13:25:24,160 --> 13:25:29,160 Let's go set up some manual seeds. 7727 13:25:29,160 --> 13:25:38,160 Torch dot manual seed, again, don't worry too much if our numbers on the page are not exactly the same. That's inherent to the randomness of machine learning. 7728 13:25:38,160 --> 13:25:47,160 We're setting up the manual seeds to try and get them as close as possible, but these do not guarantee complete determinism, which means the same output. 7729 13:25:47,160 --> 13:25:51,160 But we're going to try. The direction is more important. 7730 13:25:51,160 --> 13:26:01,160 Set number of epochs. We're going to go epochs. How about we just do 100? I reckon we'll start with that. We can bump it up to 1000 if we really wanted to. 7731 13:26:01,160 --> 13:26:10,160 Let's put the data to the target device. What's our target device? Well, it doesn't really matter because we've set device agnostic code. 7732 13:26:10,160 --> 13:26:18,160 So whether we're working with a CPU or a GPU, our code will use whatever device is available. I'm typing blog again. 7733 13:26:18,160 --> 13:26:27,160 So we've got x blob train, y blob train. This is going to go where? It's going to go to the device. 7734 13:26:27,160 --> 13:26:43,160 And y blob train to device. And we're going to go x blob test. And then y blob test equals x blob test to device. 
7735 13:26:43,160 --> 13:26:54,160 Otherwise, we'll get device issues later on, and we'll send this to device as well. Beautiful. Now, what do we do now? Well, we loop through data. 7736 13:26:54,160 --> 13:27:01,160 Loop through data. So for an epoch in range epochs for an epoch in a range. 7737 13:27:01,160 --> 13:27:05,160 Epox. I don't want that auto correct. Come on, Google Colab. Work with me here. 7738 13:27:05,160 --> 13:27:12,160 We're training our first multi-class classification model. This is serious business. No, I'm joking. It's actually quite fun. 7739 13:27:12,160 --> 13:27:21,160 So model four dot train. And let's do the forward pass. I'm not going to put much commentary here because we've been through this before. 7740 13:27:21,160 --> 13:27:29,160 But what are the logits? The logits are raw outputs of our model. So we'll just go x blob train. 7741 13:27:29,160 --> 13:27:41,160 And x test. I didn't want that. X blob train. Why did that do that? I need to turn off auto correct in Google Colab. I've been saying it for a long time. 7742 13:27:41,160 --> 13:27:50,160 Y pred equals torch dot softmax. So what are we doing here? We're going from logits to prediction probabilities here. 7743 13:27:50,160 --> 13:28:03,160 So torch softmax. Y logits. Across the first dimension. And then we can take the argmax of this and dim equals one. 7744 13:28:03,160 --> 13:28:08,160 In fact, I'm going to show you a little bit of, oh, I've written blog here. Maybe auto correct would have been helpful for that. 7745 13:28:08,160 --> 13:28:16,160 A little trick. You don't actually have to do the torch softmax. The logits. If you just took the argmax of the logits is a little test for you. 7746 13:28:16,160 --> 13:28:23,160 Just take the argmax of the logits. And see, do you get the same similar outputs as what you get here? 7747 13:28:23,160 --> 13:28:31,160 So I've seen that done before, but for completeness, we're going to use the softmax activation function because you'll often see this in practice. 7748 13:28:31,160 --> 13:28:41,160 And now what do we do? We calculate the loss. So the loss FM. We're going to use categorical cross entropy here or just cross entropy loss. 7749 13:28:41,160 --> 13:28:52,160 So if we check our loss function, what do we have? We have cross entropy loss. We're going to compare our models, logits to y blob train. 7750 13:28:52,160 --> 13:28:58,160 And then what are we going to do? We're going to calculate the accuracy because we're working with the classification problem. 7751 13:28:58,160 --> 13:29:06,160 It'd be nice if we had accuracy as well as loss. Accuracy is one of the main classification evaluation metrics. 7752 13:29:06,160 --> 13:29:18,160 y pred equals y pred. y pred. And now what do we do? Well, we have to zero grab the optimizer. Optimizer zero grad. 7753 13:29:18,160 --> 13:29:25,160 Then we go loss backward. And then we step the optimizer. Optimizer step, step, step. 7754 13:29:25,160 --> 13:29:33,160 So none of these steps we haven't covered before. We do the forward pass. We calculate the loss and any evaluation metric we choose to do so. 7755 13:29:33,160 --> 13:29:39,160 We zero the optimizer. We perform back propagation on the loss. And we step the optimizer. 7756 13:29:39,160 --> 13:29:47,160 The optimizer will hopefully behind the scenes update the parameters of our model to better represent the patterns in our training data. 7757 13:29:47,160 --> 13:29:53,160 And so we're going to go testing code here. What do we do for testing code? 
Well, or inference code. 7758 13:29:53,160 --> 13:29:58,160 We set our model to eval mode. 7759 13:29:58,160 --> 13:30:04,160 That's going to turn off a few things behind the scenes that our model doesn't need such as dropout layers, which we haven't covered. 7760 13:30:04,160 --> 13:30:10,160 But you're more than welcome to check them out if you go torch.nn. 7761 13:30:10,160 --> 13:30:18,160 Dropout layers. Do we have dropout? Dropout layers. Beautiful. And another one that it turns off is batch norm. 7762 13:30:18,160 --> 13:30:24,160 Beautiful. And also you could search this. What does model dot eval do? 7763 13:30:24,160 --> 13:30:29,160 And you might come across a Stack Overflow question. One of my favorite resources. 7764 13:30:29,160 --> 13:30:34,160 So there's a little bit of extra curriculum. But I prefer to see things in action. 7765 13:30:34,160 --> 13:30:40,160 So with torch inference mode, again, this turns off things like gradient tracking and a few more things. 7766 13:30:40,160 --> 13:30:45,160 So we get code that's as fast as possible because we don't need to track gradients when we're making predictions. 7767 13:30:45,160 --> 13:30:49,160 We just need to use the parameters that our model has learned. 7768 13:30:49,160 --> 13:30:57,160 We want X blob test to go to our model here for the test logits. And then for the test preds, we're going to do the same step as what we've done here. 7769 13:30:57,160 --> 13:31:04,160 We're going to go torch dot softmax on the test logits across the first dimension. 7770 13:31:04,160 --> 13:31:12,160 And we're going to call the argmax on that to get the index value of where the maximum prediction probability value occurs. 7771 13:31:12,160 --> 13:31:19,160 We're going to calculate the test loss with our loss function. We're going to pass in what? The test logits here. 7772 13:31:19,160 --> 13:31:24,160 Then we're going to pass in y blob test to compare the test logits to. Behind the scenes, 7773 13:31:24,160 --> 13:31:32,160 our loss function is going to do some things that convert the test logits into the same format as our test labels and then return us a value for that. 7774 13:31:32,160 --> 13:31:40,160 Then we'll also calculate the test accuracy here by passing in the y true as y blob test. 7775 13:31:40,160 --> 13:31:43,160 And we have the y pred equals y pred. 7776 13:31:43,160 --> 13:31:44,160 Wonderful. 7777 13:31:44,160 --> 13:31:46,160 And then what's our final step? 7778 13:31:46,160 --> 13:31:53,160 Well, we want to print out what's happening because I love seeing metrics as our model trains. 7779 13:31:53,160 --> 13:31:55,160 It's one of my favorite things to watch. 7780 13:31:55,160 --> 13:32:00,160 If we go if epoch, let's do it every 10 epochs because we've got 100 so far. 7781 13:32:00,160 --> 13:32:02,160 It equals zero. 7782 13:32:02,160 --> 13:32:08,160 Let's print out a nice f string with epoch. 7783 13:32:08,160 --> 13:32:11,160 And then we're going to go loss. 7784 13:32:11,160 --> 13:32:12,160 What do we put in here? 7785 13:32:12,160 --> 13:32:20,160 We'll get our loss value, but we'll take it to four decimal places and we'll get the training accuracy, which will be acc. 7786 13:32:20,160 --> 13:32:26,160 And we'll take this to two decimal places and we'll get a nice percentage sign there. 7787 13:32:26,160 --> 13:32:33,160 And we'll go test loss equals test loss and we'll go there. 7788 13:32:33,160 --> 13:32:38,160 And finally, we'll go test acc at the end here, test acc.
7789 13:32:38,160 --> 13:32:41,160 Now, I'm sure by now we've written a fair few of these. 7790 13:32:41,160 --> 13:32:46,160 You're either getting sick of them or you're like, wow, I can actually do the steps through here. 7791 13:32:46,160 --> 13:32:49,160 And so don't worry, we're going to be functionalizing all of this later on, 7792 13:32:49,160 --> 13:32:54,160 but I thought I'm going to include them as much as possible so that we can practice as much as possible together. 7793 13:32:54,160 --> 13:32:55,160 So you ready? 7794 13:32:55,160 --> 13:32:59,160 We're about to train our first multi-class classification model. 7795 13:32:59,160 --> 13:33:03,160 In three, two, one, let's go. 7796 13:33:03,160 --> 13:33:04,160 No typos. 7797 13:33:04,160 --> 13:33:05,160 Of course. 7798 13:33:05,160 --> 13:33:07,160 What do we get wrong here? 7799 13:33:07,160 --> 13:33:09,160 Oh, this is a fun error. 7800 13:33:09,160 --> 13:33:11,160 Runtime error. 7801 13:33:11,160 --> 13:33:16,160 NLL loss for reduced CUDA kernel to the index not implemented for float. 7802 13:33:16,160 --> 13:33:19,160 Okay, that's a pretty full on bunch of words there. 7803 13:33:19,160 --> 13:33:21,160 I don't really know how to describe that. 7804 13:33:21,160 --> 13:33:22,160 But here's a little hint. 7805 13:33:22,160 --> 13:33:23,160 We've got float there. 7806 13:33:23,160 --> 13:33:25,160 So we know that float is what? 7807 13:33:25,160 --> 13:33:26,160 Float is a form of data. 7808 13:33:26,160 --> 13:33:28,160 It's a data type. 7809 13:33:28,160 --> 13:33:31,160 So potentially because that's our hint. 7810 13:33:31,160 --> 13:33:33,160 We said not implemented for float. 7811 13:33:33,160 --> 13:33:35,160 So maybe we've got something wrong up here. 7812 13:33:35,160 --> 13:33:38,160 Our data is of the wrong type. 7813 13:33:38,160 --> 13:33:43,160 Can you see anywhere where our data might be the wrong type? 7814 13:33:43,160 --> 13:33:45,160 Well, let's print it out. 7815 13:33:45,160 --> 13:33:47,160 Where's our issue here? 7816 13:33:47,160 --> 13:33:48,160 Why logits? 7817 13:33:48,160 --> 13:33:50,160 Why blob train? 7818 13:33:50,160 --> 13:33:51,160 Okay. 7819 13:33:51,160 --> 13:33:53,160 Why blob train? 7820 13:33:53,160 --> 13:33:54,160 And why logits? 7821 13:33:54,160 --> 13:33:57,160 What does why blob train look like? 7822 13:33:57,160 --> 13:34:01,160 Why blob train? 7823 13:34:01,160 --> 13:34:02,160 Okay. 7824 13:34:02,160 --> 13:34:08,160 And what's the D type here? 7825 13:34:08,160 --> 13:34:09,160 Float. 7826 13:34:09,160 --> 13:34:10,160 Okay. 7827 13:34:10,160 --> 13:34:13,160 So it's not implemented for float. 7828 13:34:13,160 --> 13:34:14,160 Hmm. 7829 13:34:14,160 --> 13:34:16,160 Maybe we have to turn them into a different data type. 7830 13:34:16,160 --> 13:34:26,160 What if we went type torch long tensor? 7831 13:34:26,160 --> 13:34:28,160 What happens here? 7832 13:34:28,160 --> 13:34:31,160 Expected all tensors to be on the same device but found at least two devices. 7833 13:34:31,160 --> 13:34:32,160 Oh, my goodness. 7834 13:34:32,160 --> 13:34:34,160 What do we got wrong here? 7835 13:34:34,160 --> 13:34:37,160 Type torch long tensor. 7836 13:34:37,160 --> 13:34:38,160 Friends. 7837 13:34:38,160 --> 13:34:39,160 Guess what? 7838 13:34:39,160 --> 13:34:40,160 I found it. 7839 13:34:40,160 --> 13:34:44,160 And so it was to do with this pesky little data type issue here. 
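Concretely, the fix is back where the tensor data was created: nn.CrossEntropyLoss expects the target to contain class indices as integers (torch.long, also known as LongTensor), not floats. A sketch of the corrected creation cell, assuming the arrays were converted with torch.from_numpy as earlier:

    # Features stay as float, labels become a LongTensor (64-bit integer class indices)
    X_blob = torch.from_numpy(X_blob).type(torch.float)
    y_blob = torch.from_numpy(y_blob).type(torch.LongTensor)  # <- this was torch.float before, hence the error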
7840 13:34:44,160 --> 13:34:49,160 So if we run this again, and now this one took me a while to find and I want you to know that, 7841 13:34:49,160 --> 13:34:53,160 that behind the scenes, even though, again, this is a machine learning cooking show, 7842 13:34:53,160 --> 13:34:56,160 it still takes a while to troubleshoot code and you're going to come across this. 7843 13:34:56,160 --> 13:35:00,160 But I thought rather than spend 10 minutes doing it in a video, I'll show you what I did. 7844 13:35:00,160 --> 13:35:04,160 So we went through this and we found that, hmm, there's something going on here. 7845 13:35:04,160 --> 13:35:06,160 I don't quite know what this is. 7846 13:35:06,160 --> 13:35:11,160 And that seems quite like a long string of words, not implemented for float. 7847 13:35:11,160 --> 13:35:14,160 And then we looked back at the line where it went wrong. 7848 13:35:14,160 --> 13:35:21,160 And so we know that maybe the float is hinting that one of these two tensors is of the wrong data type. 7849 13:35:21,160 --> 13:35:24,160 Now, why would we think that it's the wrong data type? 7850 13:35:24,160 --> 13:35:32,160 Well, because anytime you see float or int or something like that, it generally hints at one of your data types being wrong. 7851 13:35:32,160 --> 13:35:40,160 And so the error is actually right back up here where we created our tensor data. 7852 13:35:40,160 --> 13:35:45,160 So we turned our labels here into float, which generally is okay in PyTorch. 7853 13:35:45,160 --> 13:35:51,160 However, this one should be of type torch dot long tensor, which we haven't seen before. 7854 13:35:51,160 --> 13:35:58,160 But if we go into torch long tensor, let's have a look, torch dot tensor. 7855 13:35:58,160 --> 13:36:01,160 Do we have long tensor? 7856 13:36:01,160 --> 13:36:02,160 Here we go. 7857 13:36:02,160 --> 13:36:04,160 64 bit integer, signed. 7858 13:36:04,160 --> 13:36:08,160 So why do we need torch dot long tensor? 7859 13:36:08,160 --> 13:36:10,160 And again, this took me a while to find. 7860 13:36:10,160 --> 13:36:20,160 And so I want to stress that in your own code, you probably will butt your head up against some issues and errors that do take you a while to find. 7861 13:36:20,160 --> 13:36:22,160 And data types is one of the main ones. 7862 13:36:22,160 --> 13:36:29,160 But if we look in the documentation for cross entropy loss, the way I kind of found this out was this little hint here. 7863 13:36:29,160 --> 13:36:36,160 The performance of this criterion is generally better when the target contains class indices, as this allows for optimized computation. 7864 13:36:36,160 --> 13:36:40,160 But I read this and it says target contains class indices. 7865 13:36:40,160 --> 13:36:46,160 I'm like, hmm, ours are indices already, but maybe they should be integers and not floats. 7866 13:36:46,160 --> 13:36:54,160 But then if you actually just look at the sample code, you would find that they use dtype equals torch dot long. 7867 13:36:54,160 --> 13:37:03,160 Now, that's the thing with a lot of code around the internet is that sometimes the answer you're looking for is a little bit buried. 7868 13:37:03,160 --> 13:37:10,160 But if in doubt, run the code and butt your head up against a wall for a bit and keep going. 7869 13:37:10,160 --> 13:37:14,160 So let's just rerun all of this and see, do we have an error here? 7870 13:37:14,160 --> 13:37:19,160 Let's train our first multi-class classification model together.
7871 13:37:19,160 --> 13:37:20,160 No errors, fingers crossed. 7872 13:37:20,160 --> 13:37:22,160 But what did we get wrong here? 7873 13:37:22,160 --> 13:37:24,160 OK, so we've got different size. 7874 13:37:24,160 --> 13:37:27,160 We're slowly working through all of the errors in deep learning here. 7875 13:37:27,160 --> 13:37:31,160 Value error, input batch size 200 to match target size 200. 7876 13:37:31,160 --> 13:37:40,160 So this is telling me maybe our test data, which is of size 200, is getting mixed up with our training data, which is of size 800. 7877 13:37:40,160 --> 13:37:49,160 So we've got test loss, the test logits, model four. 7878 13:37:49,160 --> 13:37:50,160 What's the size? 7879 13:37:50,160 --> 13:37:57,160 Let's print out print test logits dot shape and y blob test. 7880 13:37:57,160 --> 13:38:01,160 So troubleshooting on the fly here, everyone. 7881 13:38:01,160 --> 13:38:03,160 What do we got? 7882 13:38:03,160 --> 13:38:06,160 Torch size 800. 7883 13:38:06,160 --> 13:38:09,160 Where are our test labels coming from? 7884 13:38:09,160 --> 13:38:13,160 Y blob test equals, oh, there we go. 7885 13:38:13,160 --> 13:38:15,160 Ah, did you catch that before? 7886 13:38:15,160 --> 13:38:17,160 Maybe you did, maybe you didn't. 7887 13:38:17,160 --> 13:38:19,160 But I think we should be right here. 7888 13:38:19,160 --> 13:38:24,160 Now if we just comment out this line, so we've had a data type issue and we've had a shape issue. 7889 13:38:24,160 --> 13:38:28,160 Two of the main ones in machine learning. Oh, and again, we've had some issues. 7890 13:38:28,160 --> 13:38:29,160 Y blob test. 7891 13:38:29,160 --> 13:38:30,160 What's going on here? 7892 13:38:30,160 --> 13:38:33,160 I thought we just changed the shape. 7893 13:38:33,160 --> 13:38:41,160 Oh, no, we have to go up and reassign it again because now this is definitely y blob test, yes. 7894 13:38:41,160 --> 13:38:49,160 Let's rerun all of this, reassign our data. 7895 13:38:49,160 --> 13:38:53,160 We are running into every single error here, but I'm glad we're doing this because otherwise you might not see how to 7896 13:38:53,160 --> 13:38:56,160 troubleshoot these type of things. 7897 13:38:56,160 --> 13:38:59,160 So the size of a tensor must match the size. 7898 13:38:59,160 --> 13:39:01,160 Oh, we're getting the issue here. 7899 13:39:01,160 --> 13:39:03,160 Test preds. 7900 13:39:03,160 --> 13:39:04,160 Oh, my goodness. 7901 13:39:04,160 --> 13:39:06,160 We have written so much code here. 7902 13:39:06,160 --> 13:39:07,160 Test preds. 7903 13:39:07,160 --> 13:39:12,160 So instead of y pred, this should be test preds. 7904 13:39:12,160 --> 13:39:13,160 Fingers crossed. 7905 13:39:13,160 --> 13:39:15,160 Are we training our first model yet or what? 7906 13:39:15,160 --> 13:39:16,160 There we go. 7907 13:39:16,160 --> 13:39:18,160 Okay, it's printing out some stuff. 7908 13:39:18,160 --> 13:39:20,160 I don't really want to print out that stuff. 7909 13:39:20,160 --> 13:39:23,160 I want to see the loss go down, so I'm going to. 7910 13:39:23,160 --> 13:39:29,160 So friends, I hope you know that we've just been through some of the most fundamental troubleshooting steps. 7911 13:39:29,160 --> 13:39:32,160 And you might say, oh, Daniel, there's a cop out because you're just coding wrong. 7912 13:39:32,160 --> 13:39:36,160 And in fact, I code wrong all the time. 7913 13:39:36,160 --> 13:39:42,160 But we've now worked out how to troubleshoot them, shape errors and data type errors.
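Once the data type and shape issues are sorted out, the finished training and testing loop looks roughly like this. A sketch: the seed value is an assumption, and accuracy_fn is re-stated here so the block stands on its own (in the course it's the little helper defined back in the binary classification section):

    torch.manual_seed(42)        # assumed seed value
    torch.cuda.manual_seed(42)

    epochs = 100

    # Put the data on the target device
    X_blob_train, y_blob_train = X_blob_train.to(device), y_blob_train.to(device)
    X_blob_test, y_blob_test = X_blob_test.to(device), y_blob_test.to(device)

    def accuracy_fn(y_true, y_pred):
        correct = torch.eq(y_true, y_pred).sum().item()
        return (correct / len(y_pred)) * 100

    for epoch in range(epochs):
        ### Training
        model_4.train()

        y_logits = model_4(X_blob_train)                       # 1. forward pass (raw logits)
        y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1)  #    logits -> pred probs -> pred labels

        loss = loss_fn(y_logits, y_blob_train)                 # 2. loss (cross entropy takes logits + long labels)
        acc = accuracy_fn(y_true=y_blob_train, y_pred=y_pred)

        optimizer.zero_grad()                                  # 3. zero the gradients
        loss.backward()                                        # 4. backpropagation
        optimizer.step()                                       # 5. gradient descent step

        ### Testing
        model_4.eval()
        with torch.inference_mode():
            test_logits = model_4(X_blob_test)
            test_preds = torch.softmax(test_logits, dim=1).argmax(dim=1)
            test_loss = loss_fn(test_logits, y_blob_test)
            test_acc = accuracy_fn(y_true=y_blob_test, y_pred=test_preds)

        # Print out what's happening every 10 epochs
        if epoch % 10 == 0:
            print(f"Epoch: {epoch} | Loss: {loss:.4f}, Acc: {acc:.2f}% | Test loss: {test_loss:.4f}, Test acc: {test_acc:.2f}%")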
7914 13:39:42,160 --> 13:39:43,160 But look at this. 7915 13:39:43,160 --> 13:39:46,160 After all of that, thank goodness. 7916 13:39:46,160 --> 13:39:51,160 Our loss and accuracy go in the directions that we want them to go. 7917 13:39:51,160 --> 13:39:55,160 So our loss goes down and our accuracy goes up. 7918 13:39:55,160 --> 13:39:56,160 Beautiful. 7919 13:39:56,160 --> 13:40:01,160 So it looks like that our model is working on a multi-class classification data set. 7920 13:40:01,160 --> 13:40:03,160 So how do we check that? 7921 13:40:03,160 --> 13:40:08,160 Well, we're going to evaluate it in the next step by visualize, visualize, visualize. 7922 13:40:08,160 --> 13:40:10,160 So you might want to give that a shot. 7923 13:40:10,160 --> 13:40:14,160 See if you can use our plot decision boundary function. 7924 13:40:14,160 --> 13:40:17,160 We'll use our model to separate the data here. 7925 13:40:17,160 --> 13:40:21,160 So it's going to be much the same as what we did for binary classification. 7926 13:40:21,160 --> 13:40:25,160 But this time we're using a different model and a different data set. 7927 13:40:25,160 --> 13:40:28,160 I'll see you there. 7928 13:40:28,160 --> 13:40:29,160 Welcome back. 7929 13:40:29,160 --> 13:40:33,160 In the last video, we went through some of the steps that we've been through before 7930 13:40:33,160 --> 13:40:36,160 in terms of training and testing a model. 7931 13:40:36,160 --> 13:40:42,160 But we also butted our heads up against two of the most common issues in machine learning and deep learning in general. 7932 13:40:42,160 --> 13:40:45,160 And that is data type issues and shape issues. 7933 13:40:45,160 --> 13:40:48,160 But luckily we were able to resolve them. 7934 13:40:48,160 --> 13:40:54,160 And trust me, you're going to run across many of them in your own deep learning and machine learning endeavors. 7935 13:40:54,160 --> 13:40:59,160 So I'm glad that we got to have a look at them and sort of I could show you what I do to troubleshoot them. 7936 13:40:59,160 --> 13:41:02,160 But in reality, it's a lot of experimentation. 7937 13:41:02,160 --> 13:41:08,160 Run the code, see what errors come out, Google the errors, read the documentation, try again. 7938 13:41:08,160 --> 13:41:16,160 But with that being said, it looks like that our model, our multi-class classification model has learned something. 7939 13:41:16,160 --> 13:41:19,160 The loss is going down, the accuracy is going up. 7940 13:41:19,160 --> 13:41:31,160 But we can further evaluate this by making and evaluating predictions with a PyTorch multi-class model. 7941 13:41:31,160 --> 13:41:33,160 So how do we make predictions? 7942 13:41:33,160 --> 13:41:36,160 We've seen this step before, but let's reiterate. 7943 13:41:36,160 --> 13:41:40,160 Make predictions, we're going to set our model to what mode, a vowel mode. 7944 13:41:40,160 --> 13:41:44,160 And then we're going to turn on what context manager, inference mode. 7945 13:41:44,160 --> 13:41:47,160 Because we want to make inference, we want to make predictions. 7946 13:41:47,160 --> 13:41:49,160 Now what do we make predictions on? 7947 13:41:49,160 --> 13:41:52,160 Or what are the predictions? They're going to be logits because why? 7948 13:41:52,160 --> 13:41:55,160 They are the raw outputs of our model. 7949 13:41:55,160 --> 13:41:59,160 So we'll take model four, which we just trained and we'll pass it the test data. 7950 13:41:59,160 --> 13:42:02,160 Well, it needs to be blob test, by the way. 
7951 13:42:02,160 --> 13:42:04,160 I keep getting that variable mixed up. 7952 13:42:04,160 --> 13:42:06,160 We just had enough problems with the data, Daniel. 7953 13:42:06,160 --> 13:42:09,160 We don't need any more. You're completely right. 7954 13:42:09,160 --> 13:42:10,160 I agree with you. 7955 13:42:10,160 --> 13:42:13,160 But we're probably going to come across some more problems in the future. 7956 13:42:13,160 --> 13:42:14,160 Don't you worry about that. 7957 13:42:14,160 --> 13:42:17,160 So let's view the first 10 predictions. 7958 13:42:17,160 --> 13:42:21,160 Why logits? What do they look like? 7959 13:42:21,160 --> 13:42:23,160 All right, just numbers on the page. They're raw logits. 7960 13:42:23,160 --> 13:42:31,160 Now how do we go from go from logits to prediction probabilities? 7961 13:42:31,160 --> 13:42:32,160 How do we do that? 7962 13:42:32,160 --> 13:42:40,160 With a multi-class model, we go y-pred-probs equals torch.softmax on the y-logits. 7963 13:42:40,160 --> 13:42:43,160 And we want to do it across the first dimension. 7964 13:42:43,160 --> 13:42:46,160 And what do we have when we go pred-probs? 7965 13:42:46,160 --> 13:42:50,160 Let's go up to the first 10. 7966 13:42:50,160 --> 13:42:52,160 Are we apples to apples yet? 7967 13:42:52,160 --> 13:42:58,160 What does our y-blog test look like? 7968 13:42:58,160 --> 13:43:00,160 We're not apples to apples yet, but we're close. 7969 13:43:00,160 --> 13:43:02,160 So these are prediction probabilities. 7970 13:43:02,160 --> 13:43:04,160 You'll notice that we get some fairly different values here. 7971 13:43:04,160 --> 13:43:08,160 And remember, the one closest to one here, the value closest to one, 7972 13:43:08,160 --> 13:43:12,160 which looks like it's this, the index of the maximum value 7973 13:43:12,160 --> 13:43:15,160 is going to be our model's predicted class. 7974 13:43:15,160 --> 13:43:17,160 So this index is index one. 7975 13:43:17,160 --> 13:43:19,160 And does it correlate here? Yes. 7976 13:43:19,160 --> 13:43:20,160 One, beautiful. 7977 13:43:20,160 --> 13:43:23,160 Then we have index three, which is the maximum value here. 7978 13:43:23,160 --> 13:43:25,160 Three, beautiful. 7979 13:43:25,160 --> 13:43:27,160 And then we have, what do we have here? 7980 13:43:27,160 --> 13:43:30,160 Index two, yes. 7981 13:43:30,160 --> 13:43:31,160 Okay, wonderful. 7982 13:43:31,160 --> 13:43:32,160 But let's not step through that. 7983 13:43:32,160 --> 13:43:33,160 We're programmers. 7984 13:43:33,160 --> 13:43:34,160 We can do this with code. 7985 13:43:34,160 --> 13:43:40,160 So now let's go from pred-probs to pred-labels. 7986 13:43:40,160 --> 13:43:44,160 So y-pred-equals, how do we do that? 7987 13:43:44,160 --> 13:43:50,160 Well, we can do torch.argmax on the y-pred-probs. 7988 13:43:50,160 --> 13:43:52,160 And then we can pass dim equals one. 7989 13:43:52,160 --> 13:43:54,160 We could also do it this way. 7990 13:43:54,160 --> 13:43:57,160 So y-pred-probs call dot-argmax. 7991 13:43:57,160 --> 13:43:59,160 There's no real difference between these two. 7992 13:43:59,160 --> 13:44:03,160 But we're just going to do it this way, called torch.argmax. 7993 13:44:03,160 --> 13:44:05,160 y-pred-es, let's view the first 10. 7994 13:44:05,160 --> 13:44:12,160 Are we now comparing apples to apples when we go y-blob test? 7995 13:44:12,160 --> 13:44:14,160 Yes, we are. 7996 13:44:14,160 --> 13:44:15,160 Have a go at that. 
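And the prediction steps from this video, in one place (a sketch using the same variable names):

    # Make predictions with the trained model
    model_4.eval()
    with torch.inference_mode():
        y_logits = model_4(X_blob_test)

    y_pred_probs = torch.softmax(y_logits, dim=1)   # logits -> prediction probabilities
    y_preds = torch.argmax(y_pred_probs, dim=1)     # prediction probabilities -> prediction labels

    print(y_preds[:10])
    print(y_blob_test[:10])  # should now mostly line up with the predictions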
7997 13:44:15,160 --> 13:44:20,160 Look, one, three, two, one, zero, three, one, three, two, one, zero, three. 7998 13:44:20,160 --> 13:44:21,160 Beautiful. 7999 13:44:21,160 --> 13:44:24,160 Now, we could line these up and look at and compare them all day. 8000 13:44:24,160 --> 13:44:25,160 I mean, that would be fun. 8001 13:44:25,160 --> 13:44:29,160 But I know what something that would be even more fun. 8002 13:44:29,160 --> 13:44:30,160 Let's get visual. 8003 13:44:30,160 --> 13:44:33,160 So plot dot figure. 8004 13:44:33,160 --> 13:44:39,160 And we're going to go fig size equals 12.6, just because the beauty of this 8005 13:44:39,160 --> 13:44:43,160 being a cooking show is I kind of know what ingredients work from ahead of time. 8006 13:44:43,160 --> 13:44:46,160 Despite what you saw in the last video with all of that trouble shooting. 8007 13:44:46,160 --> 13:44:50,160 But I'm actually glad that we did that because seriously. 8008 13:44:50,160 --> 13:44:53,160 Shape issues and data type issues. 8009 13:44:53,160 --> 13:44:55,160 You're going to come across a lot of them. 8010 13:44:55,160 --> 13:44:59,160 The two are the main issues I troubleshoot, aside from device issues. 8011 13:44:59,160 --> 13:45:05,160 So let's go x-blob train and y-blob train. 8012 13:45:05,160 --> 13:45:08,160 And we're going to do another plot here. 8013 13:45:08,160 --> 13:45:11,160 We're going to get subplot one, two, two. 8014 13:45:11,160 --> 13:45:14,160 And we're going to do this for the test data. 8015 13:45:14,160 --> 13:45:17,160 Test and then plot decision boundary. 8016 13:45:17,160 --> 13:45:30,160 Plot decision boundary with model four on x-blob test and y-blob test as well. 8017 13:45:30,160 --> 13:45:31,160 Let's see this. 8018 13:45:31,160 --> 13:45:32,160 Did we train a multi-class? 8019 13:45:32,160 --> 13:45:33,160 Oh my goodness. 8020 13:45:33,160 --> 13:45:34,160 Yes, we did. 8021 13:45:34,160 --> 13:45:37,160 Our code worked faster than I can speak. 8022 13:45:37,160 --> 13:45:39,160 Look at that beautiful looking plot. 8023 13:45:39,160 --> 13:45:42,160 We've separated our data almost as best as what we could. 8024 13:45:42,160 --> 13:45:45,160 Like there's some here that are quite inconspicuous. 8025 13:45:45,160 --> 13:45:48,160 And now what's the thing about these lines? 8026 13:45:48,160 --> 13:45:52,160 With this model have worked, I posed the question a fair few videos ago, 8027 13:45:52,160 --> 13:45:56,160 whenever we created our multi-class model that could we separate this data 8028 13:45:56,160 --> 13:45:59,160 without nonlinear functions. 8029 13:45:59,160 --> 13:46:01,160 So how about we just test that? 8030 13:46:01,160 --> 13:46:04,160 Since we've got the code ready, let's go back up. 8031 13:46:04,160 --> 13:46:06,160 We've got nonlinear functions here. 8032 13:46:06,160 --> 13:46:07,160 We've got relu here. 8033 13:46:07,160 --> 13:46:10,160 So I'm just going to recreate our model there. 8034 13:46:10,160 --> 13:46:11,160 So I just took relu out. 8035 13:46:11,160 --> 13:46:12,160 That's all I did. 8036 13:46:12,160 --> 13:46:15,160 Commented it out, this code will still all work. 8037 13:46:15,160 --> 13:46:16,160 Or fingers crossed it will. 8038 13:46:16,160 --> 13:46:18,160 Don't count your chickens before they hatch. 8039 13:46:18,160 --> 13:46:19,160 Daniel, come on. 8040 13:46:19,160 --> 13:46:21,160 We're just going to rerun all of these cells. 8041 13:46:21,160 --> 13:46:23,160 All the code's going to stay the same. 
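A sketch of the two-panel plot being built here, assuming the plot_decision_boundary helper from the course's helper_functions.py and the blob train/test splits created earlier:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))

# Left: decision boundary on the training data
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_4, X_blob_train, y_blob_train)

# Right: decision boundary on the test data
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_4, X_blob_test, y_blob_test)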
8042 13:46:23,160 --> 13:46:26,160 All we did was we took the nonlinearity out of our model. 8043 13:46:26,160 --> 13:46:28,160 Is it still going to work? 8044 13:46:28,160 --> 13:46:29,160 Oh my goodness. 8045 13:46:29,160 --> 13:46:31,160 It still works. 8046 13:46:31,160 --> 13:46:33,160 Now why is that? 8047 13:46:33,160 --> 13:46:36,160 Well, you'll notice that the lines are a lot more straighter here. 8048 13:46:36,160 --> 13:46:38,160 Did we get different metrics? 8049 13:46:38,160 --> 13:46:39,160 I'll leave that for you to compare. 8050 13:46:39,160 --> 13:46:41,160 Maybe these will be a little bit different. 8051 13:46:41,160 --> 13:46:43,160 I don't think they're too far different. 8052 13:46:43,160 --> 13:46:48,160 But that is because our data is linearly separable. 8053 13:46:48,160 --> 13:46:51,160 So we can draw straight lines only to separate our data. 8054 13:46:51,160 --> 13:46:54,160 However, a lot of the data that you deal with in practice 8055 13:46:54,160 --> 13:46:57,160 will require linear and nonlinear. 8056 13:46:57,160 --> 13:46:59,160 Hence why we spent a lot of time on that. 8057 13:46:59,160 --> 13:47:01,160 Like the circle data that we covered before. 8058 13:47:01,160 --> 13:47:03,160 And let's look up an image of a pizza. 8059 13:47:03,160 --> 13:47:08,160 If you're building a food vision model to take photos of food 8060 13:47:08,160 --> 13:47:11,160 and separate different classes of food, 8061 13:47:11,160 --> 13:47:14,160 could you do this with just straight lines? 8062 13:47:14,160 --> 13:47:17,160 You might be able to, but I personally don't think 8063 13:47:17,160 --> 13:47:19,160 that I could build a model to do such a thing. 8064 13:47:19,160 --> 13:47:22,160 And in fact, PyTorch makes it so easy to add nonlinearities 8065 13:47:22,160 --> 13:47:24,160 to our model, we might as well have them in 8066 13:47:24,160 --> 13:47:27,160 so that our model can use it if it needs it 8067 13:47:27,160 --> 13:47:29,160 and if it doesn't need it, well, hey, 8068 13:47:29,160 --> 13:47:32,160 it's going to build a pretty good model as we saw before 8069 13:47:32,160 --> 13:47:35,160 if we included the nonlinearities in our model. 8070 13:47:35,160 --> 13:47:37,160 So we could uncomment these and our model is still 8071 13:47:37,160 --> 13:47:38,160 going to perform quite well. 8072 13:47:38,160 --> 13:47:40,160 That is the beauty of neural networks, 8073 13:47:40,160 --> 13:47:43,160 is that they decide the numbers that should 8074 13:47:43,160 --> 13:47:45,160 represent outdated the best. 8075 13:47:45,160 --> 13:47:49,160 And so, with that being said, we've evaluated our model, 8076 13:47:49,160 --> 13:47:51,160 we've trained our multi-class classification model, 8077 13:47:51,160 --> 13:47:54,160 we've put everything together, we've gone from binary 8078 13:47:54,160 --> 13:47:57,160 classification to multi-class classification. 8079 13:47:57,160 --> 13:48:00,160 I think there's just one more thing that we should cover 8080 13:48:00,160 --> 13:48:04,160 and that is, let's go here, section number nine, 8081 13:48:04,160 --> 13:48:08,160 a few more classification metrics. 8082 13:48:08,160 --> 13:48:12,160 So, as I said before, evaluating a model, 8083 13:48:12,160 --> 13:48:15,160 let's just put it here, to evaluate our model, 8084 13:48:15,160 --> 13:48:18,160 our classification models, that is, 8085 13:48:18,160 --> 13:48:22,160 evaluating a model is just as important as training a model. 8086 13:48:22,160 --> 13:48:24,160 So, I'll see you in the next video. 
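The ReLU experiment described above looks roughly like this; the layer sizes are illustrative and may not match the notebook exactly:

from torch import nn

class BlobModel(nn.Module):
    def __init__(self, input_features=2, output_features=4, hidden_units=8):
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(input_features, hidden_units),
            nn.ReLU(),  # comment the ReLU layers out to test a purely linear model
            nn.Linear(hidden_units, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, output_features),
        )

    def forward(self, x):
        return self.linear_layer_stack(x)

Because the blob data is linearly separable, both versions reach a similar result; on data like the circles from earlier, only the version with the non-linearities works well.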
8087 13:48:24,160 --> 13:48:28,160 Let's cover a few more classification metrics. 8088 13:48:28,160 --> 13:48:29,160 Welcome back. 8089 13:48:29,160 --> 13:48:31,160 In the last video, we evaluated our 8090 13:48:31,160 --> 13:48:34,160 multi-class classification model visually. 8091 13:48:34,160 --> 13:48:36,160 And we saw that it did pretty darn well, 8092 13:48:36,160 --> 13:48:39,160 because our data turned out to be linearly separable. 8093 13:48:39,160 --> 13:48:41,160 So, our model, even without non-linear functions, 8094 13:48:41,160 --> 13:48:43,160 could separate the data here. 8095 13:48:43,160 --> 13:48:46,160 However, as I said before, most of the data that you deal with 8096 13:48:46,160 --> 13:48:50,160 will require some form of linear and non-linear function. 8097 13:48:50,160 --> 13:48:53,160 So, just keep that in mind, and the beauty of PyTorch is 8098 13:48:53,160 --> 13:48:56,160 that it allows us to create models with linear 8099 13:48:56,160 --> 13:48:59,160 and non-linear functions quite flexibly. 8100 13:48:59,160 --> 13:49:01,160 So, let's write down here. 8101 13:49:01,160 --> 13:49:04,160 If we wanted to further evaluate our classification models, 8102 13:49:04,160 --> 13:49:06,160 we've seen accuracy. 8103 13:49:06,160 --> 13:49:08,160 So, accuracy is one of the main methods 8104 13:49:08,160 --> 13:49:10,160 of evaluating classification models. 8105 13:49:10,160 --> 13:49:14,160 So, this is like saying, out of 100 samples, 8106 13:49:14,160 --> 13:49:18,160 how many does our model get right? 8107 13:49:18,160 --> 13:49:21,160 And so, we've seen our model right now 8108 13:49:21,160 --> 13:49:23,160 is that testing accuracy of nearly 100%. 8109 13:49:23,160 --> 13:49:25,160 So, it's nearly perfect. 8110 13:49:25,160 --> 13:49:27,160 But, of course, there were a few tough samples, 8111 13:49:27,160 --> 13:49:29,160 which I mean a little bit hard. 8112 13:49:29,160 --> 13:49:31,160 Some of them are even within the other samples, 8113 13:49:31,160 --> 13:49:33,160 so you can forgive it a little bit here 8114 13:49:33,160 --> 13:49:36,160 for not being exactly perfect. 8115 13:49:36,160 --> 13:49:38,160 What are some other metrics here? 8116 13:49:38,160 --> 13:49:41,160 Well, we've also got precision, 8117 13:49:41,160 --> 13:49:44,160 and we've also got recall. 8118 13:49:44,160 --> 13:49:46,160 Both of these will be pretty important 8119 13:49:46,160 --> 13:49:50,160 when you have classes with different amounts of values in them. 8120 13:49:50,160 --> 13:49:52,160 So, precision and recall. 8121 13:49:52,160 --> 13:49:57,160 So, accuracy is pretty good to use when you have balanced classes. 8122 13:49:57,160 --> 13:50:00,160 So, this is just text on a page for now. 8123 13:50:00,160 --> 13:50:03,160 F1 score, which combines precision and recall. 8124 13:50:03,160 --> 13:50:05,160 There's also a confusion matrix, 8125 13:50:05,160 --> 13:50:09,160 and there's also a classification report. 8126 13:50:09,160 --> 13:50:12,160 So, I'm going to show you a few code examples 8127 13:50:12,160 --> 13:50:14,160 of where you can access these, 8128 13:50:14,160 --> 13:50:17,160 and I'm going to leave it to you as extra curriculum 8129 13:50:17,160 --> 13:50:20,160 to try each one of these out. 8130 13:50:20,160 --> 13:50:23,160 So, let's go into the keynote. 
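For reference, all of the metrics just listed are available in scikit-learn; a minimal sketch with made-up labels (swap in your own true and predicted labels, moved to the CPU first):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, classification_report)

y_true = [0, 1, 2, 3, 1, 0]  # made-up ground truth labels
y_pred = [0, 1, 2, 2, 1, 0]  # made-up model predictions

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred, average="macro"))  # "macro" averages across classes
print(recall_score(y_true, y_pred, average="macro"))
print(f1_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))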
8131 13:50:23,160 --> 13:50:25,160 And by the way, you should pay yourself on the back here 8132 13:50:25,160 --> 13:50:28,160 because we've just gone through all of the PyTorch workflow 8133 13:50:28,160 --> 13:50:30,160 for a classification problem. 8134 13:50:30,160 --> 13:50:32,160 Not only just binary classification, 8135 13:50:32,160 --> 13:50:35,160 we've done multi-class classification as well. 8136 13:50:35,160 --> 13:50:38,160 So, let's not stop there, though. 8137 13:50:38,160 --> 13:50:40,160 Remember, building a model, 8138 13:50:40,160 --> 13:50:43,160 evaluating a model is just as important as building a model. 8139 13:50:43,160 --> 13:50:46,160 So, we've been through non-linearity. 8140 13:50:46,160 --> 13:50:49,160 We've seen how we could replicate non-linear functions. 8141 13:50:49,160 --> 13:50:52,160 We've talked about the machine learning explorer's motto, 8142 13:50:52,160 --> 13:50:55,160 visualize, visualize, visualize. 8143 13:50:55,160 --> 13:50:58,160 Machine learning practitioners motto is experiment, experiment, experiment. 8144 13:50:58,160 --> 13:51:01,160 I think I called that the machine learning or data scientist motto. 8145 13:51:01,160 --> 13:51:03,160 Same thing, you know? 8146 13:51:03,160 --> 13:51:05,160 And steps in modeling with PyTorch. 8147 13:51:05,160 --> 13:51:06,160 We've seen this in practice, 8148 13:51:06,160 --> 13:51:08,160 so we don't need to look at these slides. 8149 13:51:08,160 --> 13:51:10,160 I mean, they'll be available on the GitHub if you want them, 8150 13:51:10,160 --> 13:51:11,160 but here we are. 8151 13:51:11,160 --> 13:51:14,160 Some common classification evaluation methods. 8152 13:51:14,160 --> 13:51:15,160 So, we have accuracy. 8153 13:51:15,160 --> 13:51:17,160 There's the formal formula if you want, 8154 13:51:17,160 --> 13:51:20,160 but there's also code, which is what we've been focusing on. 8155 13:51:20,160 --> 13:51:23,160 So, we wrote our own accuracy function, which replicates this. 8156 13:51:23,160 --> 13:51:26,160 By the way, Tp stands for not toilet paper, 8157 13:51:26,160 --> 13:51:28,160 it stands for true positive, 8158 13:51:28,160 --> 13:51:33,160 Tn is true negative, false positive, Fp, false negative, Fn. 8159 13:51:33,160 --> 13:51:36,160 And so, the code, we could do torch metrics. 8160 13:51:36,160 --> 13:51:37,160 Oh, what's that? 8161 13:51:37,160 --> 13:51:38,160 But when should you use it? 8162 13:51:38,160 --> 13:51:40,160 The default metric for classification problems. 8163 13:51:40,160 --> 13:51:43,160 Note, it is not the best for imbalanced classes. 8164 13:51:43,160 --> 13:51:45,160 So, if you had, for example, 8165 13:51:45,160 --> 13:51:48,160 1,000 samples of one class, 8166 13:51:48,160 --> 13:51:50,160 so, number one, label number one, 8167 13:51:50,160 --> 13:51:54,160 but you had only 10 samples of class zero. 8168 13:51:54,160 --> 13:51:58,160 So, accuracy might not be the best to use for then. 8169 13:51:58,160 --> 13:52:00,160 For imbalanced data sets, 8170 13:52:00,160 --> 13:52:03,160 you might want to look into precision and recall. 8171 13:52:03,160 --> 13:52:05,160 So, there's a great article called, 8172 13:52:05,160 --> 13:52:09,160 I think it's beyond accuracy, precision and recall, 8173 13:52:09,160 --> 13:52:12,160 which gives a fantastic overview of, there we go. 8174 13:52:12,160 --> 13:52:14,160 This is what I'd recommend. 8175 13:52:14,160 --> 13:52:17,160 There we go, by Will Coestron. 
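That accuracy formula, (TP + TN) / (TP + TN + FP + FN), is what the accuracy function written earlier in the course boils down to; roughly:

import torch

def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()  # how many predictions match the labels
    return (correct / len(y_pred)) * 100             # expressed as a percentage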
8176 13:52:17,160 --> 13:52:22,160 So, I'd highly recommend this article as some extra curriculum here. 8177 13:52:22,160 --> 13:52:29,160 See this article for when to use precision recall. 8178 13:52:29,160 --> 13:52:31,160 We'll go there. 8179 13:52:31,160 --> 13:52:32,160 Now, if we look back, 8180 13:52:32,160 --> 13:52:35,160 there is the formal formula for precision, 8181 13:52:35,160 --> 13:52:38,160 true positive over true positive plus false positive. 8182 13:52:38,160 --> 13:52:41,160 So, higher precision leads to less false positives. 8183 13:52:41,160 --> 13:52:44,160 So, if false positives are not ideal, 8184 13:52:44,160 --> 13:52:46,160 you probably want to increase precision. 8185 13:52:46,160 --> 13:52:49,160 If false negatives are not ideal, 8186 13:52:49,160 --> 13:52:51,160 you want to increase your recall metric. 8187 13:52:51,160 --> 13:52:55,160 However, you should be aware that there is such thing as a precision recall trade-off. 8188 13:52:55,160 --> 13:52:58,160 And you're going to find this in your experimentation. 8189 13:52:58,160 --> 13:53:01,160 Precision recall trade-off. 8190 13:53:01,160 --> 13:53:05,160 So, that means that, generally, if you increase precision, 8191 13:53:05,160 --> 13:53:07,160 you lower recall. 8192 13:53:07,160 --> 13:53:11,160 And, inversely, if you increase precision, you lower recall. 8193 13:53:11,160 --> 13:53:14,160 So, check out that, just to be aware of that. 8194 13:53:14,160 --> 13:53:18,160 But, again, you're going to learn this through practice of evaluating your models. 8195 13:53:18,160 --> 13:53:21,160 If you'd like some code to do precision and recall, 8196 13:53:21,160 --> 13:53:24,160 you've got torchmetrics.precision, or torchmetrics.recall, 8197 13:53:24,160 --> 13:53:26,160 as well as scikit-learn. 8198 13:53:26,160 --> 13:53:30,160 So scikit-learn has implementations for many different classification metrics. 8199 13:53:30,160 --> 13:53:34,160 Torchmetrics is a PyTorch-like library. 8200 13:53:34,160 --> 13:53:38,160 And then we have F1 score, which combines precision and recall. 8201 13:53:38,160 --> 13:53:42,160 So, it's a good combination if you want something in between these two. 8202 13:53:42,160 --> 13:53:45,160 And then, finally, there's a confusion matrix. 8203 13:53:45,160 --> 13:53:49,160 I haven't listed here a classification report, but I've listed it up here. 8204 13:53:49,160 --> 13:53:53,160 And we can see a classification report in scikit-learn. 8205 13:53:53,160 --> 13:53:55,160 Classification report. 8206 13:53:55,160 --> 13:53:59,160 Classification report kind of just puts together all of the metrics that we've talked about. 8207 13:53:59,160 --> 13:54:03,160 And we can go there. 8208 13:54:03,160 --> 13:54:06,160 But I've been talking a lot about torchmetrics. 8209 13:54:06,160 --> 13:54:09,160 So let's look up torchmetrics' accuracy. 8210 13:54:09,160 --> 13:54:11,160 Torchmetrics. 8211 13:54:11,160 --> 13:54:13,160 So this is a library. 8212 13:54:13,160 --> 13:54:16,160 I don't think it comes with Google Colab at the moment, 8213 13:54:16,160 --> 13:54:19,160 but you can import torchmetrics, and you can initialize a metric. 8214 13:54:19,160 --> 13:54:24,160 So we've built our own accuracy function, but the beauty of using torchmetrics 8215 13:54:24,160 --> 13:54:27,160 is that it uses PyTorch-like code. 8216 13:54:27,160 --> 13:54:31,160 So we've got metric, preds, and target. 8217 13:54:31,160 --> 13:54:36,160 And then we can find out what the value of the accuracy is. 
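As plain arithmetic, the precision, recall and F1 formulas mentioned above look like this (the counts here are made up for illustration):

tp, fp, fn = 90, 10, 20  # hypothetical true positives, false positives, false negatives

precision = tp / (tp + fp)  # higher precision = fewer false positives
recall = tp / (tp + fn)     # higher recall = fewer false negatives
f1 = 2 * (precision * recall) / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)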
8218 13:54:36,160 --> 13:54:42,160 And if you wanted to implement your own metrics, you could subclass the metric class here. 8219 13:54:42,160 --> 13:54:44,160 But let's just practice this. 8220 13:54:44,160 --> 13:54:51,160 So let's check to see if I'm going to grab this and copy this in here. 8221 13:54:51,160 --> 13:55:00,160 If you want access to a lot of PyTorch metrics, see torchmetrics. 8222 13:55:00,160 --> 13:55:03,160 So can we import torchmetrics? 8223 13:55:03,160 --> 13:55:08,160 Maybe it's already in Google Colab. 8224 13:55:08,160 --> 13:55:09,160 No, not here. 8225 13:55:09,160 --> 13:55:10,160 But that's all right. 8226 13:55:10,160 --> 13:55:13,160 We'll go pip install torchmetrics. 8227 13:55:13,160 --> 13:55:16,160 So Google Colab has access to torchmetrics. 8228 13:55:16,160 --> 13:55:19,160 And that's going to download from torchmetrics. 8229 13:55:19,160 --> 13:55:20,160 It shouldn't take too long. 8230 13:55:20,160 --> 13:55:21,160 It's quite a small package. 8231 13:55:21,160 --> 13:55:22,160 Beautiful. 8232 13:55:22,160 --> 13:55:29,160 And now we're going to go from torchmetrics import accuracy. 8233 13:55:29,160 --> 13:55:30,160 Wonderful. 8234 13:55:30,160 --> 13:55:33,160 And let's see how we can use this. 8235 13:55:33,160 --> 13:55:34,160 So setup metric. 8236 13:55:34,160 --> 13:55:38,160 So we're going to go torchmetric underscore accuracy. 8237 13:55:38,160 --> 13:55:40,160 We could call it whatever we want, really. 8238 13:55:40,160 --> 13:55:42,160 But we need accuracy here. 8239 13:55:42,160 --> 13:55:44,160 We're just going to set up the class. 8240 13:55:44,160 --> 13:55:53,160 And then we're going to calculate the accuracy of our multi-class model by calling torchmetric accuracy. 8241 13:55:53,160 --> 13:55:58,160 And we're going to pass it Y threads and Y blob test. 8242 13:55:58,160 --> 13:56:01,160 Let's see what happens here. 8243 13:56:01,160 --> 13:56:04,160 Oh, what did we get wrong? 8244 13:56:04,160 --> 13:56:05,160 Runtime error. 8245 13:56:05,160 --> 13:56:08,160 Expected all tensors to be on the same device, but found at least two devices. 8246 13:56:08,160 --> 13:56:11,160 Oh, of course. 8247 13:56:11,160 --> 13:56:16,160 Now, remember how I said torchmetrics implements PyTorch like code? 8248 13:56:16,160 --> 13:56:20,160 Well, let's check what device this is on. 8249 13:56:20,160 --> 13:56:22,160 Oh, it's on the CPU. 8250 13:56:22,160 --> 13:56:27,160 So something to be aware of that if you use torchmetrics, you have to make sure your metrics 8251 13:56:27,160 --> 13:56:32,160 are on the same device by using device agnostic code as your data. 8252 13:56:32,160 --> 13:56:34,160 So if we run this, what do we get? 8253 13:56:34,160 --> 13:56:35,160 Beautiful. 8254 13:56:35,160 --> 13:56:43,160 We get an accuracy of 99.5%, which is in line with the accuracy function that we coded ourselves. 8255 13:56:43,160 --> 13:56:47,160 So if you'd like a lot of pre-built metrics functions, be sure to see either 8256 13:56:47,160 --> 13:56:53,160 scikit-learn for any of these or torchmetrics for any PyTorch like metrics. 8257 13:56:53,160 --> 13:56:56,160 But just be aware, if you use the PyTorch version, they have to be on the same 8258 13:56:56,160 --> 13:56:57,160 device. 8259 13:56:57,160 --> 13:57:01,160 And if you'd like to install it, what do we have? 8260 13:57:01,160 --> 13:57:02,160 Where's the metrics? 8261 13:57:02,160 --> 13:57:03,160 Module metrics? 8262 13:57:03,160 --> 13:57:05,160 Do we have classification? 
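Putting that together, here is a sketch of the torchmetrics usage shown above. Note that newer torchmetrics versions require extra arguments such as task and num_classes, while the version used in the video takes none; y_preds and y_blob_test are assumed from the notebook:

# !pip install torchmetrics
import torch
from torchmetrics import Accuracy

device = "cuda" if torch.cuda.is_available() else "cpu"

# The metric has to live on the same device as the tensors it compares
torchmetrics_accuracy = Accuracy(task="multiclass", num_classes=4).to(device)

torchmetrics_accuracy(y_preds.to(device), y_blob_test.to(device))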
8263 13:57:05,160 --> 13:57:06,160 There we go. 8264 13:57:06,160 --> 13:57:11,160 So look how many different types of classification metrics there are from torchmetrics. 8265 13:57:11,160 --> 13:57:13,160 So I'll leave that for you to explore. 8266 13:57:13,160 --> 13:57:16,160 The resources for this will be here. 8267 13:57:16,160 --> 13:57:20,160 This is an extracurricular article for when to use precision recall. 8268 13:57:20,160 --> 13:57:26,160 And another extracurricular would be to go through the torchmetrics module for 10 minutes 8269 13:57:26,160 --> 13:57:30,160 and have a look at the different metrics for classification. 8270 13:57:30,160 --> 13:57:36,160 So with that being said, I think we've covered a fair bit. 8271 13:57:36,160 --> 13:57:40,160 But I think it's also time for you to practice what you've learned. 8272 13:57:40,160 --> 13:57:43,160 So let's cover some exercises in the next video. 8273 13:57:43,160 --> 13:57:46,160 I'll see you there. 8274 13:57:46,160 --> 13:57:47,160 Welcome back. 8275 13:57:47,160 --> 13:57:52,160 In the last video, we looked at a few more classification metrics, a little bit of a high 8276 13:57:52,160 --> 13:57:57,160 level overview for some more ways to evaluate your classification models. 8277 13:57:57,160 --> 13:58:01,160 And I linked some extracurricular here that you might want to look into as well. 8278 13:58:01,160 --> 13:58:04,160 But we have covered a whole bunch of code together. 8279 13:58:04,160 --> 13:58:07,160 But now it's time for you to practice some of this stuff on your own. 8280 13:58:07,160 --> 13:58:10,160 And so I have some exercises prepared. 8281 13:58:10,160 --> 13:58:12,160 Now, where do you go for the exercises? 8282 13:58:12,160 --> 13:58:16,160 Well, remember on the learnpytorch.io book, for each one of these chapters, there's 8283 13:58:16,160 --> 13:58:17,160 a section. 8284 13:58:17,160 --> 13:58:19,160 Now, just have a look at how much we've covered. 8285 13:58:19,160 --> 13:58:21,160 If I scroll, just keep scrolling. 8286 13:58:21,160 --> 13:58:22,160 Look at that. 8287 13:58:22,160 --> 13:58:23,160 We've covered all of that in this module. 8288 13:58:23,160 --> 13:58:24,160 That's a fair bit of stuff. 8289 13:58:24,160 --> 13:58:28,160 But down the bottom of each one is an exercises section. 8290 13:58:28,160 --> 13:58:32,160 So all exercises are focusing on practicing the code in the sections above, all of these 8291 13:58:32,160 --> 13:58:33,160 sections here. 8292 13:58:33,160 --> 13:58:39,160 I've got number one, two, three, four, five, six, seven. 8293 13:58:39,160 --> 13:58:42,160 Yeah, seven exercises, nice, writing plenty of code. 8294 13:58:42,160 --> 13:58:44,160 And then, of course, extracurricular. 8295 13:58:44,160 --> 13:58:50,160 So these are some challenges that I've mentioned throughout the entire section zero two. 8296 13:58:50,160 --> 13:58:52,160 But I'm going to link this in here. 8297 13:58:52,160 --> 13:58:53,160 Exercises. 8298 13:58:53,160 --> 13:58:57,160 But, of course, you can just find it on the learnpytorch.io book. 8299 13:58:57,160 --> 13:59:01,160 So if we come in here and we just create another heading. 8300 13:59:01,160 --> 13:59:02,160 Exercises. 8301 13:59:02,160 --> 13:59:10,160 And extracurricular. 8302 13:59:10,160 --> 13:59:14,160 And then we just write in here. 8303 13:59:14,160 --> 13:59:18,160 See exercises and extracurricular. 8304 13:59:18,160 --> 13:59:20,160 Here. 
8305 13:59:20,160 --> 13:59:27,160 And so if you'd like a template of the exercise code, you can go to the PyTorch deep learning 8306 13:59:27,160 --> 13:59:28,160 repo. 8307 13:59:28,160 --> 13:59:32,160 And then within the extras folder, we have exercises and solutions. 8308 13:59:32,160 --> 13:59:35,160 You might be able to guess what's in each of these exercises. 8309 13:59:35,160 --> 13:59:39,160 We have O2 PyTorch classification exercises. 8310 13:59:39,160 --> 13:59:41,160 This is going to be some skeleton code. 8311 13:59:41,160 --> 13:59:44,160 And then, of course, we have the solutions as well. 8312 13:59:44,160 --> 13:59:46,160 Now, this is just one form of solutions. 8313 13:59:46,160 --> 13:59:51,160 But I'm not going to look at those because I would recommend you looking at the exercises 8314 13:59:51,160 --> 13:59:54,160 first before you go into the solutions. 8315 13:59:54,160 --> 13:59:57,160 So we have things like import torch. 8316 13:59:57,160 --> 13:59:59,160 Set up device agnostic code. 8317 13:59:59,160 --> 14:00:01,160 Create a data set. 8318 14:00:01,160 --> 14:00:03,160 Turn data into a data frame. 8319 14:00:03,160 --> 14:00:05,160 And then et cetera, et cetera. 8320 14:00:05,160 --> 14:00:08,160 For the things that we've done throughout this section. 8321 14:00:08,160 --> 14:00:10,160 So give that a go. 8322 14:00:10,160 --> 14:00:11,160 Try it on your own. 8323 14:00:11,160 --> 14:00:15,160 And if you get stuck, you can refer to the notebook that we've coded together. 8324 14:00:15,160 --> 14:00:16,160 All of this code here. 8325 14:00:16,160 --> 14:00:21,160 You can refer to the documentation, of course. 8326 14:00:21,160 --> 14:00:26,160 And then you can refer to as a last resort, the solutions notebooks. 8327 14:00:26,160 --> 14:00:28,160 So give that a shot. 8328 14:00:28,160 --> 14:00:31,160 And congratulations on finishing. 8329 14:00:31,160 --> 14:00:35,160 Section 02 PyTorch classification. 8330 14:00:35,160 --> 14:00:38,160 Now, if you're still there, you're still with me. 8331 14:00:38,160 --> 14:00:39,160 Let's move on to the next section. 8332 14:00:39,160 --> 14:00:43,160 We're going to cover a few more things of deep learning with PyTorch. 8333 14:00:43,160 --> 14:00:48,160 I'll see you there. 8334 14:00:48,160 --> 14:00:50,160 Hello, and welcome back. 8335 14:00:50,160 --> 14:00:52,160 We've got another section. 8336 14:00:52,160 --> 14:00:56,160 We've got computer vision and convolutional neural networks with. 8337 14:00:56,160 --> 14:00:58,160 PyTorch. 8338 14:00:58,160 --> 14:01:03,160 Now, computer vision is one of my favorite, favorite deep learning topics. 8339 14:01:03,160 --> 14:01:06,160 But before we get into the materials, let's answer a very important question. 8340 14:01:06,160 --> 14:01:09,160 And that is, where can you get help? 8341 14:01:09,160 --> 14:01:13,160 So, first and foremost, is to follow along with the code as best you can. 8342 14:01:13,160 --> 14:01:16,160 We're going to be writing a whole bunch of PyTorch computer vision code. 8343 14:01:16,160 --> 14:01:17,160 And remember our motto. 8344 14:01:17,160 --> 14:01:19,160 If and out, run the code. 8345 14:01:19,160 --> 14:01:22,160 See what the inputs and outputs are of your code. 8346 14:01:22,160 --> 14:01:24,160 And that's try it yourself. 8347 14:01:24,160 --> 14:01:28,160 If you need the doc string to read about what the function you're using does, 8348 14:01:28,160 --> 14:01:31,160 you can press shift command and space in Google CoLab. 
8349 14:01:31,160 --> 14:01:33,160 Or it might be control if you're on Windows. 8350 14:01:33,160 --> 14:01:37,160 Otherwise, if you're still stuck, you can search for the code that you're running. 8351 14:01:37,160 --> 14:01:40,160 You might come across stack overflow or the PyTorch documentation. 8352 14:01:40,160 --> 14:01:43,160 We've spent a bunch of time in the PyTorch documentation already. 8353 14:01:43,160 --> 14:01:48,160 And we're going to be referencing a whole bunch in the next module in section three. 8354 14:01:48,160 --> 14:01:49,160 We're up to now. 8355 14:01:49,160 --> 14:01:53,160 If you go through all of these four steps, the next step is to try it again. 8356 14:01:53,160 --> 14:01:55,160 If and out, run the code. 8357 14:01:55,160 --> 14:02:00,160 And then, of course, if you're still stuck, you can ask a question on the PyTorch deep learning repo. 8358 14:02:00,160 --> 14:02:02,160 Discussions tab. 8359 14:02:02,160 --> 14:02:06,160 Now, if we open this up, we can go new discussion. 8360 14:02:06,160 --> 14:02:09,160 And you can write here section 03 for the computer vision. 8361 14:02:09,160 --> 14:02:15,160 My problem is, and then in here, you can write some code. 8362 14:02:15,160 --> 14:02:17,160 Be sure to format it as best you can. 8363 14:02:17,160 --> 14:02:19,160 That way it'll help us answer it. 8364 14:02:19,160 --> 14:02:23,160 And then go, what's happening here? 8365 14:02:23,160 --> 14:02:27,160 Now, why do I format the code in these back ticks here? 8366 14:02:27,160 --> 14:02:32,160 It's so that it looks like code and that it's easier to read when it's formatted on the GitHub discussion. 8367 14:02:32,160 --> 14:02:33,160 Then you can select a category. 8368 14:02:33,160 --> 14:02:39,160 If you have a general chat, an idea, a poll, a Q&A, or a show and tell of something you've made, 8369 14:02:39,160 --> 14:02:41,160 or what you've learned from the course. 8370 14:02:41,160 --> 14:02:43,160 For question and answering, you want to put it as Q&A. 8371 14:02:43,160 --> 14:02:45,160 Then you can click start discussion. 8372 14:02:45,160 --> 14:02:47,160 And it'll appear here. 8373 14:02:47,160 --> 14:02:50,160 And that way, they'll be searchable and we'll be able to help you out. 8374 14:02:50,160 --> 14:02:52,160 So I'm going to get out of this. 8375 14:02:52,160 --> 14:02:56,160 And oh, speaking of resources, we've got the PyTorch deep learning repo. 8376 14:02:56,160 --> 14:02:58,160 The links will be where you need the links. 8377 14:02:58,160 --> 14:03:04,160 All of the code that we're going to write in this section is contained within the section 3 notebook. 8378 14:03:04,160 --> 14:03:06,160 PyTorch computer vision. 8379 14:03:06,160 --> 14:03:09,160 Now, this is a beautiful notebook annotated with heaps of text and images. 8380 14:03:09,160 --> 14:03:13,160 You can go through this on your own time and use it as a reference to help out. 8381 14:03:13,160 --> 14:03:19,160 If you get stuck on any of the code we write through the videos, check it out in this notebook because it's probably here somewhere. 8382 14:03:19,160 --> 14:03:22,160 And then finally, let's get out of these. 8383 14:03:22,160 --> 14:03:24,160 If we come to the book version of the course, 8384 14:03:24,160 --> 14:03:26,160 this is learnpytorch.io. 8385 14:03:26,160 --> 14:03:27,160 We've got home. 8386 14:03:27,160 --> 14:03:30,160 This will probably be updated by the time you look at that. 
8387 14:03:30,160 --> 14:03:34,160 But we have section 03, which is PyTorch computer vision. 8388 14:03:34,160 --> 14:03:39,160 It's got all of the information about what we're going to cover in a book format. 8389 14:03:39,160 --> 14:03:42,160 And you can, of course, skip ahead to different subtitles. 8390 14:03:42,160 --> 14:03:44,160 See what we're going to write here. 8391 14:03:44,160 --> 14:03:49,160 All of the links you need and extra resources will be at learnpytorch.io. 8392 14:03:49,160 --> 14:03:52,160 And for this section, it's PyTorch computer vision. 8393 14:03:52,160 --> 14:03:57,160 With that being said, speaking of computer vision, you might have the question, 8394 14:03:57,160 --> 14:04:00,160 what is a computer vision problem? 8395 14:04:00,160 --> 14:04:06,160 Well, if you can see it, it could probably be phrased at some sort of computer vision problem. 8396 14:04:06,160 --> 14:04:08,160 That's how broad computer vision is. 8397 14:04:08,160 --> 14:04:11,160 So let's have a few concrete examples. 8398 14:04:11,160 --> 14:04:14,160 We might have a binary classification problem, 8399 14:04:14,160 --> 14:04:17,160 such as if we wanted to have two different images. 8400 14:04:17,160 --> 14:04:19,160 Is this photo of steak or pizza? 8401 14:04:19,160 --> 14:04:22,160 We might build a model that understands what steak looks like in an image. 8402 14:04:22,160 --> 14:04:24,160 This is a beautiful dish that I cooked, by the way. 8403 14:04:24,160 --> 14:04:27,160 This is me eating pizza at a cafe with my dad. 8404 14:04:27,160 --> 14:04:31,160 And so we could have binary classification, one thing or another. 8405 14:04:31,160 --> 14:04:35,160 And so our machine learning model may take in the pixels of an image 8406 14:04:35,160 --> 14:04:39,160 and understand the different patterns that go into what a steak looks like 8407 14:04:39,160 --> 14:04:41,160 and the same thing with a pizza. 8408 14:04:41,160 --> 14:04:46,160 Now, the important thing to note is that we won't actually be telling our model what to learn. 8409 14:04:46,160 --> 14:04:50,160 It will learn those patterns itself from different examples of images. 8410 14:04:50,160 --> 14:04:55,160 Then we could step things up and have a multi-class classification problem. 8411 14:04:55,160 --> 14:04:56,160 You're noticing a trend here. 8412 14:04:56,160 --> 14:05:00,160 We've covered classification before, but classification can be quite broad. 8413 14:05:00,160 --> 14:05:06,160 It can be across different domains, such as vision or text or audio. 8414 14:05:06,160 --> 14:05:09,160 But if we were working with multi-class classification for an image problem, 8415 14:05:09,160 --> 14:05:13,160 we might have, is this a photo of sushi, steak or pizza? 8416 14:05:13,160 --> 14:05:15,160 And then we have three classes instead of two. 8417 14:05:15,160 --> 14:05:19,160 But again, this could be 100 classes, such as what Nutrify uses, 8418 14:05:19,160 --> 14:05:21,160 which is an app that I'm working on. 8419 14:05:21,160 --> 14:05:23,160 We go to Nutrify.app. 8420 14:05:23,160 --> 14:05:25,160 This is bare bones at the moment. 8421 14:05:25,160 --> 14:05:29,160 But right now, Nutrify can classify up to 100 different foods. 8422 14:05:29,160 --> 14:05:33,160 So if you were to upload an image of food, let's give it a try. 8423 14:05:33,160 --> 14:05:39,160 Nutrify, we'll go into images, and we'll go into sample food images. 8424 14:05:39,160 --> 14:05:41,160 And how about some chicken wings? 
8425 14:05:41,160 --> 14:05:43,160 What does it classify this as? 8426 14:05:43,160 --> 14:05:45,160 Chicken wings. Beautiful. 8427 14:05:45,160 --> 14:05:49,160 And then if we upload an image of not food, maybe. 8428 14:05:49,160 --> 14:05:50,160 Let's go to Nutrify. 8429 14:05:50,160 --> 14:05:52,160 This is on my computer, by the way. 8430 14:05:52,160 --> 14:05:54,160 You might not have a sample folder set up like this. 8431 14:05:54,160 --> 14:05:57,160 And then if we upload a photo of a Cybertruck, what does it say? 8432 14:05:57,160 --> 14:05:58,160 No food found. 8433 14:05:58,160 --> 14:06:00,160 Please try another image. 8434 14:06:00,160 --> 14:06:04,160 So behind the scenes, Nutrify is using the pixels of an image 8435 14:06:04,160 --> 14:06:06,160 and then running them through a machine learning model 8436 14:06:06,160 --> 14:06:09,160 and classifying it first, whether it's food or not food. 8437 14:06:09,160 --> 14:06:13,160 And then if it is food, classifying it as what food it is. 8438 14:06:13,160 --> 14:06:16,160 So right now it works for 100 different foods. 8439 14:06:16,160 --> 14:06:18,160 So if we have a look at all these, it can classify apples, 8440 14:06:18,160 --> 14:06:21,160 artichokes, avocados, barbecue sauce. 8441 14:06:21,160 --> 14:06:24,160 Each of these work at different levels of performance, 8442 14:06:24,160 --> 14:06:27,160 but that's just something to keep in mind of what you can do. 8443 14:06:27,160 --> 14:06:30,160 So the whole premise of Nutrify is to upload a photo of food 8444 14:06:30,160 --> 14:06:33,160 and then learn about the nutrition about it. 8445 14:06:33,160 --> 14:06:35,160 So let's go back to our keynote. 8446 14:06:35,160 --> 14:06:37,160 What's another example? 8447 14:06:37,160 --> 14:06:40,160 Well, we could use computer vision for object detection, 8448 14:06:40,160 --> 14:06:42,160 where you might answer the question is, 8449 14:06:42,160 --> 14:06:44,160 where's the thing we're looking for? 8450 14:06:44,160 --> 14:06:48,160 So for example, this car here, I caught them on security camera, 8451 14:06:48,160 --> 14:06:51,160 actually did a hit and run on my new car, 8452 14:06:51,160 --> 14:06:54,160 wasn't that much of an expensive car, but I parked it on the street 8453 14:06:54,160 --> 14:06:57,160 and this person, the trailer came off the back of their car 8454 14:06:57,160 --> 14:07:00,160 and hit my car and then they just picked the trailer up 8455 14:07:00,160 --> 14:07:02,160 and drove away. 8456 14:07:02,160 --> 14:07:06,160 But I went to my neighbor's house and had to look at their security footage 8457 14:07:06,160 --> 14:07:08,160 and they found this car. 8458 14:07:08,160 --> 14:07:11,160 So potentially, you could design a machine learning model 8459 14:07:11,160 --> 14:07:13,160 to find this certain type of car. 8460 14:07:13,160 --> 14:07:16,160 It was an orange jute, by the way, but the images were in black and white 8461 14:07:16,160 --> 14:07:19,160 to detect to see if it ever recognizes a similar car 8462 14:07:19,160 --> 14:07:21,160 that comes across the street and you could go, 8463 14:07:21,160 --> 14:07:23,160 hey, did you crash into my car the other day? 8464 14:07:23,160 --> 14:07:25,160 I didn't actually find who it was. 8465 14:07:25,160 --> 14:07:27,160 So sadly, it was a hit and run. 8466 14:07:27,160 --> 14:07:30,160 But that's object detection, finding something in an image. 
8467 14:07:30,160 --> 14:07:32,160 And then you might want to find out 8468 14:07:32,160 --> 14:07:34,160 whether the different sections in this image. 8469 14:07:34,160 --> 14:07:38,160 So this is a great example at what Apple uses on their devices, 8470 14:07:38,160 --> 14:07:43,160 iPhones and iPads and whatnot, to segregate or segment 8471 14:07:43,160 --> 14:07:46,160 the different sections of an image, so person one, person two, 8472 14:07:46,160 --> 14:07:49,160 skin tones, hair, sky, original. 8473 14:07:49,160 --> 14:07:53,160 And then it enhances each of these sections in different ways. 8474 14:07:53,160 --> 14:07:56,160 So that's a practice known as computational photography. 8475 14:07:56,160 --> 14:08:00,160 But the whole premise is how do you segment different portions of an image? 8476 14:08:00,160 --> 14:08:02,160 And then there's a great blog post here 8477 14:08:02,160 --> 14:08:04,160 that talks about how it works and what it does 8478 14:08:04,160 --> 14:08:06,160 and what kind of model that's used. 8479 14:08:06,160 --> 14:08:10,160 I'll leave that as extra curriculum if you'd like to look into it. 8480 14:08:10,160 --> 14:08:13,160 So if you have these images, how do you enhance the sky? 8481 14:08:13,160 --> 14:08:16,160 How do you make the skin tones look how they should? 8482 14:08:16,160 --> 14:08:19,160 How do you remove the background if you really wanted to? 8483 14:08:19,160 --> 14:08:21,160 So all of this happens on device. 8484 14:08:21,160 --> 14:08:24,160 So that's where I got that image from, by the way. 8485 14:08:24,160 --> 14:08:26,160 Semantic Mars. 8486 14:08:26,160 --> 14:08:29,160 And this is another great blog, Apple Machine Learning Research. 8487 14:08:29,160 --> 14:08:33,160 So to keep this in mind, we're about to see another example for computer vision, 8488 14:08:33,160 --> 14:08:35,160 which is Tesla Computer Vision. 8489 14:08:35,160 --> 14:08:39,160 A lot of companies have websites such as Apple Machine Learning Research 8490 14:08:39,160 --> 14:08:44,160 where they share a whole bunch of what they're up to in the world of machine learning. 8491 14:08:44,160 --> 14:08:48,160 So in Tesla's case, they have eight cameras on each of their self-driving cars 8492 14:08:48,160 --> 14:08:52,160 that fuels their full self-driving beta software. 8493 14:08:52,160 --> 14:08:56,160 And so they use computer vision to understand what's going on in an image 8494 14:08:56,160 --> 14:08:58,160 and then plan what's going on. 8495 14:08:58,160 --> 14:09:00,160 So this is three-dimensional vector space. 8496 14:09:00,160 --> 14:09:04,160 And what this means is they're basically taking these different viewpoints 8497 14:09:04,160 --> 14:09:08,160 from the eight different cameras, feeding them through some form of neural network, 8498 14:09:08,160 --> 14:09:13,160 and turning the representation of the environment around the car into a vector. 8499 14:09:13,160 --> 14:09:15,160 So a long string of numbers. 8500 14:09:15,160 --> 14:09:17,160 And why will it do that? 8501 14:09:17,160 --> 14:09:20,160 Well, because computers understand numbers far more than they understand images. 8502 14:09:20,160 --> 14:09:23,160 So we might be able to recognize what's happening here. 8503 14:09:23,160 --> 14:09:27,160 But for a computer to understand it, we have to turn it into vector space. 8504 14:09:27,160 --> 14:09:30,160 And so if you want to have a look at how Tesla uses computer vision, 8505 14:09:30,160 --> 14:09:33,160 so this is from Tesla's AI Day video. 
8506 14:09:33,160 --> 14:09:35,160 I'm not going to play it all because it's three hours long, 8507 14:09:35,160 --> 14:09:38,160 but I watched it and I really enjoyed it. 8508 14:09:38,160 --> 14:09:40,160 So there's some information there. 8509 14:09:40,160 --> 14:09:42,160 And there's a little tidbit there. 8510 14:09:42,160 --> 14:09:45,160 If you go to two hours and one minute and 31 seconds on the same video, 8511 14:09:45,160 --> 14:09:48,160 have a look at what Tesla do. 8512 14:09:48,160 --> 14:09:52,160 Well, would you look at that? Where have we seen that before? 8513 14:09:52,160 --> 14:09:56,160 That's some device-agnostic code, but with Tesla's custom dojo chip. 8514 14:09:56,160 --> 14:09:58,160 So Tesla uses PyTorch. 8515 14:09:58,160 --> 14:10:00,160 So the exact same code that we're writing, 8516 14:10:00,160 --> 14:10:03,160 Tesla uses similar PyTorch code to, of course, 8517 14:10:03,160 --> 14:10:05,160 they write PyTorch code to suit their problem. 8518 14:10:05,160 --> 14:10:09,160 But nonetheless, they use PyTorch code to train their machine learning models 8519 14:10:09,160 --> 14:10:12,160 that power their self-driving software. 8520 14:10:12,160 --> 14:10:14,160 So how cool is that? 8521 14:10:14,160 --> 14:10:16,160 And if you want to have a look at another example, 8522 14:10:16,160 --> 14:10:19,160 there's plenty of different Tesla self-driving videos. 8523 14:10:19,160 --> 14:10:21,160 So, oh, we can just play it right here. 8524 14:10:21,160 --> 14:10:22,160 I was going to click the link. 8525 14:10:22,160 --> 14:10:24,160 So look, this is what happens. 8526 14:10:24,160 --> 14:10:26,160 If we have a look in the environment, 8527 14:10:26,160 --> 14:10:29,160 Tesla, the cameras, understand what's going on here. 8528 14:10:29,160 --> 14:10:31,160 And then it computes it into this little graphic here 8529 14:10:31,160 --> 14:10:33,160 on your heads-up display in the car. 8530 14:10:33,160 --> 14:10:35,160 And it kind of understands, well, I'm getting pretty close to this car. 8531 14:10:35,160 --> 14:10:37,160 I'm getting pretty close to that car. 8532 14:10:37,160 --> 14:10:40,160 And then it uses this information about what's happening, 8533 14:10:40,160 --> 14:10:43,160 this perception, to plan where it should drive next. 8534 14:10:43,160 --> 14:10:49,160 And I believe here it ends up going into it. 8535 14:10:49,160 --> 14:10:51,160 It has to stop. 8536 14:10:51,160 --> 14:10:53,160 Yeah, there we go. 8537 14:10:53,160 --> 14:10:54,160 Because we've got a stop sign. 8538 14:10:54,160 --> 14:10:55,160 Look at that. 8539 14:10:55,160 --> 14:10:56,160 It's perceiving the stop sign. 8540 14:10:56,160 --> 14:10:57,160 It's got two people here. 8541 14:10:57,160 --> 14:10:59,160 It just saw a car drive pass across this street. 8542 14:10:59,160 --> 14:11:00,160 So that is pretty darn cool. 8543 14:11:00,160 --> 14:11:03,160 That's just one example of computer vision, one of many. 8544 14:11:03,160 --> 14:11:07,160 And how would you find out what computer vision can be used for? 8545 14:11:07,160 --> 14:11:09,160 Here's what I do. 8546 14:11:09,160 --> 14:11:12,160 What can computer vision be used for? 8547 14:11:12,160 --> 14:11:14,160 Plenty more resources. 8548 14:11:14,160 --> 14:11:15,160 So, oh, there we go. 8549 14:11:15,160 --> 14:11:19,160 27 most popular computer vision applications in 2022. 8550 14:11:19,160 --> 14:11:22,160 So we've covered a fair bit there. 
8551 14:11:22,160 --> 14:11:25,160 But what are we going to cover specifically with PyTorch code? 8552 14:11:25,160 --> 14:11:28,160 Well, broadly, like that. 8553 14:11:28,160 --> 14:11:32,160 We're going to get a vision data set to work with using torch vision. 8554 14:11:32,160 --> 14:11:35,160 So PyTorch has a lot of different domain libraries. 8555 14:11:35,160 --> 14:11:38,160 Torch vision helps us deal with computer vision problems. 8556 14:11:38,160 --> 14:11:42,160 And there's existing data sets that we can leverage to play around with computer vision. 8557 14:11:42,160 --> 14:11:45,160 We're going to have a look at the architecture of a convolutional neural network, 8558 14:11:45,160 --> 14:11:47,160 also known as a CNN with PyTorch. 8559 14:11:47,160 --> 14:11:51,160 We're going to look at an end-to-end multi-class image classification problem. 8560 14:11:51,160 --> 14:11:53,160 So multi-class is what? 8561 14:11:53,160 --> 14:11:54,160 More than one thing or another? 8562 14:11:54,160 --> 14:11:56,160 Could be three classes, could be a hundred. 8563 14:11:56,160 --> 14:11:59,160 We're going to look at steps at modeling with CNNs in PyTorch. 8564 14:11:59,160 --> 14:12:02,160 So we're going to create a convolutional network with PyTorch. 8565 14:12:02,160 --> 14:12:05,160 We're going to pick a last function and optimize it to suit our problem. 8566 14:12:05,160 --> 14:12:08,160 We're going to train a model, training a model a model. 8567 14:12:08,160 --> 14:12:10,160 A little bit of a typo there. 8568 14:12:10,160 --> 14:12:12,160 And then we're going to evaluate a model, right? 8569 14:12:12,160 --> 14:12:15,160 So we might have typos with our text, but we'll have less typos in the code. 8570 14:12:15,160 --> 14:12:17,160 And how are we going to do this? 8571 14:12:17,160 --> 14:12:20,160 Well, we could do it cook, so we could do it chemis. 8572 14:12:20,160 --> 14:12:22,160 Well, we're going to do it a little bit of both. 8573 14:12:22,160 --> 14:12:24,160 Part art, part science. 8574 14:12:24,160 --> 14:12:29,160 But since this is a machine learning cooking show, we're going to be cooking up lots of code. 8575 14:12:29,160 --> 14:12:34,160 So in the next video, we're going to cover the inputs and outputs of a computer vision problem. 8576 14:12:34,160 --> 14:12:36,160 I'll see you there. 8577 14:12:36,160 --> 14:12:40,160 So in the last video, we covered what we're going to cover, broadly. 8578 14:12:40,160 --> 14:12:43,160 And we saw some examples of what computer vision problems are. 8579 14:12:43,160 --> 14:12:48,160 Essentially, anything that you're able to see, you can potentially turn into a computer vision problem. 8580 14:12:48,160 --> 14:12:54,160 And we're going to be cooking up lots of machine learning, or specifically pie torch, computer vision code. 8581 14:12:54,160 --> 14:12:56,160 You see I fixed that typo. 8582 14:12:56,160 --> 14:13:00,160 Now let's talk about what the inputs and outputs are of a typical computer vision problem. 8583 14:13:00,160 --> 14:13:04,160 So let's start with a multi-classification example. 8584 14:13:04,160 --> 14:13:09,160 And so we wanted to take photos of different images of food and recognize what they were. 8585 14:13:09,160 --> 14:13:13,160 So we're replicating the functionality of Nutrify. 8586 14:13:13,160 --> 14:13:16,160 So take a photo of food and learn about it. 8587 14:13:16,160 --> 14:13:22,160 So we might start with a bunch of food images that have a height and width of some sort. 
8588 14:13:22,160 --> 14:13:27,160 So we have width equals 224, height equals 224, and then they have three color channels. 8589 14:13:27,160 --> 14:13:28,160 Why three? 8590 14:13:28,160 --> 14:13:32,160 Well, that's because we have a value for red, green and blue. 8591 14:13:32,160 --> 14:13:40,160 So if we look at this up, if we go red, green, blue image format. 8592 14:13:40,160 --> 14:13:43,160 So 24-bit RGB images. 8593 14:13:43,160 --> 14:13:51,160 So a lot of images or digital images have some value for a red pixel, a green pixel and a blue pixel. 8594 14:13:51,160 --> 14:13:58,160 And if you were to convert images into numbers, they get represented by some value of red, some value of green and some value of blue. 8595 14:13:58,160 --> 14:14:02,160 That is exactly the same as how we'd represent these images. 8596 14:14:02,160 --> 14:14:09,160 So for example, this pixel here might be a little bit more red, a little less blue, and a little less green because it's close to orange. 8597 14:14:09,160 --> 14:14:11,160 And then we convert that into numbers. 8598 14:14:11,160 --> 14:14:18,160 So what we're trying to do here is essentially what we're trying to do with all of the data that we have with machine learning is represented as numbers. 8599 14:14:18,160 --> 14:14:23,160 So the typical image format to represent an image because we're using computer vision. 8600 14:14:23,160 --> 14:14:25,160 So we're trying to figure out what's in an image. 8601 14:14:25,160 --> 14:14:31,160 The typical way to represent that is in a tensor that has a value for the height, width and color channels. 8602 14:14:31,160 --> 14:14:34,160 And so we might numerically encode these. 8603 14:14:34,160 --> 14:14:37,160 In other words, represent our images as a tensor. 8604 14:14:37,160 --> 14:14:40,160 And this would be the inputs to our machine learning algorithm. 8605 14:14:40,160 --> 14:14:49,160 And in many cases, depending on what problem you're working on, an existing algorithm already exists for many of the most popular computer vision problems. 8606 14:14:49,160 --> 14:14:51,160 And if it doesn't, you can build one. 8607 14:14:51,160 --> 14:14:57,160 And then you might fashion this machine learning algorithm to output the exact shapes that you want. 8608 14:14:57,160 --> 14:14:59,160 In our case, we want three outputs. 8609 14:14:59,160 --> 14:15:02,160 We want one output for each class that we have. 8610 14:15:02,160 --> 14:15:05,160 We want a prediction probability for sushi. 8611 14:15:05,160 --> 14:15:07,160 We want a prediction probability for steak. 8612 14:15:07,160 --> 14:15:09,160 And we want a prediction probability for pizza. 8613 14:15:09,160 --> 14:15:17,160 Now in our case, in this iteration, looks like our model got one of them wrong because the highest value was assigned to the wrong class here. 8614 14:15:17,160 --> 14:15:21,160 So for the second image, it assigned a prediction probability of 0.81 for sushi. 8615 14:15:21,160 --> 14:15:26,160 Now, keep in mind that you could change these classes to whatever your particular problem is. 8616 14:15:26,160 --> 14:15:29,160 I'm just simplifying this and making it three. 8617 14:15:29,160 --> 14:15:31,160 You could have a hundred. 8618 14:15:31,160 --> 14:15:32,160 You could have a thousand. 8619 14:15:32,160 --> 14:15:34,160 You could have five. 8620 14:15:34,160 --> 14:15:37,160 It's just, it depends on what you're working with. 8621 14:15:37,160 --> 14:15:43,160 And so we might use these predicted outputs to enhance our app. 
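A tiny sketch of that numerical encoding: a fake RGB image as a tensor of shape [height, width, colour_channels], and fake raw outputs for three classes (sushi, steak, pizza) turned into prediction probabilities:

import torch

fake_image = torch.rand(size=(224, 224, 3))  # a value for red, green and blue at each pixel
print(fake_image.shape)                      # torch.Size([224, 224, 3])

fake_logits = torch.tensor([1.2, 0.3, 2.5])     # one raw output per class
pred_probs = torch.softmax(fake_logits, dim=0)  # probabilities that sum to 1
print(pred_probs, pred_probs.argmax())          # index of the highest probability = predicted class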
8622 14:15:43,160 --> 14:15:46,160 So if someone wants to take a photo of their plate of sushi, our app might say, 8623 14:15:46,160 --> 14:15:48,160 hey, this is a photo of sushi. 8624 14:15:48,160 --> 14:15:53,160 Here's some information about those, the sushi rolls or the same for steak, the same for pizza. 8625 14:15:53,160 --> 14:15:57,160 Now it might not always get it right because after all, that's what machine learning is. 8626 14:15:57,160 --> 14:15:59,160 It's probabilistic. 8627 14:15:59,160 --> 14:16:02,160 So how would we improve these results here? 8628 14:16:02,160 --> 14:16:08,160 Well, we could show our model more and more images of sushi steak and pizza 8629 14:16:08,160 --> 14:16:12,160 so that it builds up a better internal representation of said images. 8630 14:16:12,160 --> 14:16:17,160 So when it looks at images it's never seen before or images outside its training data set, 8631 14:16:17,160 --> 14:16:19,160 it's able to get better results. 8632 14:16:19,160 --> 14:16:24,160 But just keep in mind this whole process is similar no matter what computer vision problem you're working with. 8633 14:16:24,160 --> 14:16:27,160 You need a way to numerically encode your information. 8634 14:16:27,160 --> 14:16:30,160 You need a machine learning model that's capable of fitting the data 8635 14:16:30,160 --> 14:16:34,160 in the way that you would like it to be fit in our case classification. 8636 14:16:34,160 --> 14:16:37,160 You might have a different type of model if you're working with object detection, 8637 14:16:37,160 --> 14:16:40,160 a different type of model if you're working with segmentation. 8638 14:16:40,160 --> 14:16:44,160 And then you need to fashion the outputs in a way that best suit your problem as well. 8639 14:16:44,160 --> 14:16:47,160 So let's push forward. 8640 14:16:47,160 --> 14:16:52,160 Oh, by the way, the model that often does this is a convolutional neural network. 8641 14:16:52,160 --> 14:16:54,160 In other words, a CNN. 8642 14:16:54,160 --> 14:16:58,160 However, you can use many other different types of machine learning algorithms here. 8643 14:16:58,160 --> 14:17:03,160 It's just that convolutional neural networks typically perform the best with image data. 8644 14:17:03,160 --> 14:17:09,160 Although with recent research, there is the transformer architecture or deep learning model 8645 14:17:09,160 --> 14:17:13,160 that also performs fairly well or very well with image data. 8646 14:17:13,160 --> 14:17:15,160 So just keep that in mind going forward. 8647 14:17:15,160 --> 14:17:18,160 But for now we're going to focus on convolutional neural networks. 8648 14:17:18,160 --> 14:17:23,160 And so we might have input and output shapes because remember one of the chief machine learning problems 8649 14:17:23,160 --> 14:17:30,160 is making sure that your tensor shapes line up with each other, the input and output shapes. 8650 14:17:30,160 --> 14:17:35,160 So if we encoded this image of stake here, we might have a dimensionality of batch size 8651 14:17:35,160 --> 14:17:37,160 with height color channels. 8652 14:17:37,160 --> 14:17:39,160 And now the ordering here could be improved. 8653 14:17:39,160 --> 14:17:41,160 It's usually height then width. 8654 14:17:41,160 --> 14:17:42,160 So alphabetical order. 8655 14:17:42,160 --> 14:17:44,160 And then color channels last. 8656 14:17:44,160 --> 14:17:47,160 So we might have the shape of none, two, two, four, two, four, three. 8657 14:17:47,160 --> 14:17:48,160 Now where does this come from? 
8658 14:17:48,160 --> 14:17:50,160 So none could be the batch size. 8659 14:17:50,160 --> 14:17:56,160 Now it's none because we can set the batch size to whatever we want, say for example 32. 8660 14:17:56,160 --> 14:18:02,160 Then we might have a height of two to four and a width of two to four and three color channels. 8661 14:18:02,160 --> 14:18:05,160 Now height and width are also customizable. 8662 14:18:05,160 --> 14:18:08,160 You might change this to be 512 by 512. 8663 14:18:08,160 --> 14:18:11,160 What that would mean is that you have more numbers representing your image. 8664 14:18:11,160 --> 14:18:19,160 And in sense would take more computation to figure out the patterns because there is simply more information encoded in your image. 8665 14:18:19,160 --> 14:18:23,160 But two, two, four, two, four is a common starting point for images. 8666 14:18:23,160 --> 14:18:28,160 And then 32 is also a very common batch size, as we've seen in previous videos. 8667 14:18:28,160 --> 14:18:33,160 But again, this could be changed depending on the hardware you're using, depending on the model you're using. 8668 14:18:33,160 --> 14:18:35,160 You might have a batch size to 64. 8669 14:18:35,160 --> 14:18:37,160 You might have a batch size of 512. 8670 14:18:37,160 --> 14:18:39,160 It's all problem specific. 8671 14:18:39,160 --> 14:18:41,160 And that's this line here. 8672 14:18:41,160 --> 14:18:44,160 These will vary depending on the problem you're working on. 8673 14:18:44,160 --> 14:18:49,160 So in our case, our output shape is three because we have three different classes for now. 8674 14:18:49,160 --> 14:18:53,160 But again, if you have a hundred, you might have an output shape of a hundred. 8675 14:18:53,160 --> 14:18:56,160 If you have a thousand, you might have an output shape of a thousand. 8676 14:18:56,160 --> 14:19:00,160 The same premise of this whole pattern remains though. 8677 14:19:00,160 --> 14:19:07,160 Numerically encode your data, feed it into a model, and then make sure the output shape fits your specific problem. 8678 14:19:07,160 --> 14:19:15,160 And so, for this section, Computer Vision with PyTorch, we're going to be building CNNs to do this part. 8679 14:19:15,160 --> 14:19:21,160 We're actually going to do all of the parts here, but we're going to focus on building a convolutional neural network 8680 14:19:21,160 --> 14:19:25,160 to try and find patterns in data, because it's not always guaranteed that it will. 8681 14:19:25,160 --> 14:19:28,160 Finally, let's look at one more problem. 8682 14:19:28,160 --> 14:19:34,160 Say you had grayscale images of fashion items, and you have quite small images. 8683 14:19:34,160 --> 14:19:36,160 They're only 28 by 28. 8684 14:19:36,160 --> 14:19:38,160 The exact same pattern is going to happen. 8685 14:19:38,160 --> 14:19:42,160 You numerically represent it, use it as inputs to a machine learning algorithm, 8686 14:19:42,160 --> 14:19:46,160 and then hopefully your machine learning algorithm outputs the right type of clothing that it is. 8687 14:19:46,160 --> 14:19:48,160 In this case, it's a t-shirt. 8688 14:19:48,160 --> 14:19:55,160 But I've got dot dot dot here because we're going to be working on a problem that uses ten different types of items of clothing. 8689 14:19:55,160 --> 14:19:58,160 And the images are grayscale, so there's not much detail. 8690 14:19:58,160 --> 14:20:02,160 So hopefully our machine learning algorithm can recognize what's going on in these images. 
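In tensor terms, the shapes being described here look like this (batch size, height, width and colour channels are all adjustable to suit the problem):

import torch

colour_batch = torch.rand(size=(32, 224, 224, 3))  # [batch_size, height, width, colour_channels]
gray_batch = torch.rand(size=(32, 28, 28, 1))      # grayscale images have a single colour channel

print(colour_batch.shape, gray_batch.shape)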
8691 14:20:02,160 --> 14:20:05,160 There might be a boot, there might be a shirt, there might be pants, there might be a dress, 8692 14:20:05,160 --> 14:20:07,160 etc, etc. 8693 14:20:07,160 --> 14:20:14,160 But we numerically encode our images into a dimensionality of batch size, height, width, color channels. 8694 14:20:14,160 --> 14:20:23,160 This is known as NHWC, or number of images in a batch, height, width, C for color channels. 8695 14:20:23,160 --> 14:20:25,160 This is color channels last. 8696 14:20:25,160 --> 14:20:27,160 Why am I showing you two forms of this? 8697 14:20:27,160 --> 14:20:31,160 Do you notice color channels in this one is color channels first? 8698 14:20:31,160 --> 14:20:33,160 So color channels, height, width? 8699 14:20:33,160 --> 14:20:38,160 Well, because you come across a lot of different representations of data full stop, 8700 14:20:38,160 --> 14:20:42,160 but particularly image data in PyTorch and other libraries, 8701 14:20:42,160 --> 14:20:45,160 many libraries expect color channels last. 8702 14:20:45,160 --> 14:20:49,160 However, PyTorch currently, at the time of recording this video, and this may change in the future, 8703 14:20:49,160 --> 14:20:53,160 defaults to representing image data with color channels first. 8704 14:20:53,160 --> 14:20:59,160 Now this is very important because you will get errors if your dimensionality is in the wrong order. 8705 14:20:59,160 --> 14:21:05,160 And so there are ways to go in between these two, and there's a lot of debate about which format is the best. 8706 14:21:05,160 --> 14:21:09,160 It looks like color channels last is going to win over the long term, just because it's more efficient, 8707 14:21:09,160 --> 14:21:12,160 but again, that's outside the scope, but just keep this in mind. 8708 14:21:12,160 --> 14:21:19,160 We're going to write code to interact between these two, but it's the same data, just represented in a different order. 8709 14:21:19,160 --> 14:21:25,160 And so we could rearrange these shapes to how we want, color channels last or color channels first. 8710 14:21:25,160 --> 14:21:31,160 And once again, the shapes will vary depending on the problem that you're working on. 8711 14:21:31,160 --> 14:21:35,160 So with that being said, we've covered the input and output shapes. 8712 14:21:35,160 --> 14:21:37,160 How are we going to see them with code? 8713 14:21:37,160 --> 14:21:43,160 Well, of course we're going to be following the PyTorch workflow that we've done. 8714 14:21:43,160 --> 14:21:47,160 So we need to get our data ready, turn it into tensors in some way, shape or form. 8715 14:21:47,160 --> 14:21:49,160 We can do that with torchvision.transforms. 8716 14:21:49,160 --> 14:21:52,160 Oh, we haven't seen that one yet, but we will. 8717 14:21:52,160 --> 14:21:57,160 We can use torch.utils.data.Dataset and torch.utils.data.DataLoader. 8718 14:21:57,160 --> 14:22:00,160 We can then build a model or pick a pre-trained model to suit our problem. 8719 14:22:00,160 --> 14:22:06,160 We've got a whole bunch of modules to help us with that, the torch.nn module, torchvision.models. 8720 14:22:06,160 --> 14:22:11,160 And then we have an optimizer and a loss function. 8721 14:22:11,160 --> 14:22:16,160 We can evaluate the model using TorchMetrics, or we can code our own metric functions. 8722 14:22:16,160 --> 14:22:20,160 We can of course improve through experimentation, which we will see later on, 8723 14:22:20,160 --> 14:22:21,160 which we've actually done that, right?
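As a small illustration of moving between the two memory formats, here is a sketch with a random tensor (the 32, 224, 224, 3 shape is just the example value from above, not anything from the course code):

import torch

# A random image batch in NHWC (color channels last): [batch, height, width, color_channels]
images_nhwc = torch.rand(32, 224, 224, 3)

# Rearrange to NCHW (color channels first), which PyTorch image layers typically expect
images_nchw = images_nhwc.permute(0, 3, 1, 2)

print(images_nhwc.shape)  # torch.Size([32, 224, 224, 3])
print(images_nchw.shape)  # torch.Size([32, 3, 224, 224]) -> same data, different order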
8724 14:22:21,160 --> 14:22:23,160 We've done improvement through experimentation. 8725 14:22:23,160 --> 14:22:25,160 We've tried different models, we've tried different things. 8726 14:22:25,160 --> 14:22:31,160 And then finally, we can save and reload our trained model if we wanted to use it elsewhere. 8727 14:22:31,160 --> 14:22:33,160 So with that being said, we've covered the workflow. 8728 14:22:33,160 --> 14:22:36,160 This is just a high-level overview of what we're going to code. 8729 14:22:36,160 --> 14:22:41,160 You might be asking the question, what is a convolutional neural network, or a CNN? 8730 14:22:41,160 --> 14:22:43,160 Let's answer that in the next video. 8731 14:22:43,160 --> 14:22:46,160 I'll see you there. 8732 14:22:46,160 --> 14:22:47,160 Welcome back. 8733 14:22:47,160 --> 14:22:52,160 In the last video, we saw examples of computer vision input and output shapes. 8734 14:22:52,160 --> 14:22:58,160 And we kind of hinted at the fact that convolutional neural networks are deep learning models, or CNNs, 8735 14:22:58,160 --> 14:23:02,160 that are quite good at recognizing patterns in images. 8736 14:23:02,160 --> 14:23:07,160 So we left off the last video with the question, what is a convolutional neural network? 8737 14:23:07,160 --> 14:23:09,160 And where could you find out about that? 8738 14:23:09,160 --> 14:23:13,160 What is a convolutional neural network? 8739 14:23:13,160 --> 14:23:15,160 Here's one way to find out. 8740 14:23:15,160 --> 14:23:19,160 And I'm sure, as you've seen, there's a lot of resources for such things. 8741 14:23:19,160 --> 14:23:22,160 A comprehensive guide to convolutional neural networks. 8742 14:23:22,160 --> 14:23:24,160 Which one of these is the best? 8743 14:23:24,160 --> 14:23:26,160 Well, it doesn't really matter. 8744 14:23:26,160 --> 14:23:28,160 The best one is the one that you understand the best. 8745 14:23:28,160 --> 14:23:29,160 So there we go. 8746 14:23:29,160 --> 14:23:31,160 There's a great video from Code Basics. 8747 14:23:31,160 --> 14:23:34,160 I've seen that one before, simple explanation of convolutional neural network. 8748 14:23:34,160 --> 14:23:38,160 I'll leave you to research these things on your own. 8749 14:23:38,160 --> 14:23:42,160 And if you wanted to look at images, there's a whole bunch of images. 8750 14:23:42,160 --> 14:23:45,160 I prefer to learn things by writing code. 8751 14:23:45,160 --> 14:23:47,160 Because remember, this course is code first. 8752 14:23:47,160 --> 14:23:51,160 As a machine learning engineer, 99% of my time is spent writing code. 8753 14:23:51,160 --> 14:23:53,160 So that's what we're going to focus on. 8754 14:23:53,160 --> 14:23:56,160 But anyway, here's the typical architecture of a CNN. 8755 14:23:56,160 --> 14:23:58,160 In other words, a convolutional neural network. 8756 14:23:58,160 --> 14:24:01,160 If you hear me say CNN, I'm not talking about the news website. 8757 14:24:01,160 --> 14:24:05,160 In this course, I'm talking about the architecture convolutional neural network. 8758 14:24:05,160 --> 14:24:11,160 So this is some PyTorch code that we're going to be working towards building. 8759 14:24:11,160 --> 14:24:14,160 But we have some hyperparameters slash layer types here. 8760 14:24:14,160 --> 14:24:15,160 We have an input layer. 8761 14:24:15,160 --> 14:24:19,160 So we have an input layer, which takes some in channels, and an input shape. 
8762 14:24:19,160 --> 14:24:23,160 Because remember, it's very important in machine learning and deep learning to line up your 8763 14:24:23,160 --> 14:24:27,160 input and output shapes of whatever model you're using, whatever problem you're working with. 8764 14:24:27,160 --> 14:24:30,160 Then we have some sort of convolutional layer. 8765 14:24:30,160 --> 14:24:33,160 Now, what might happen in a convolutional layer? 8766 14:24:33,160 --> 14:24:39,160 Well, as you might have guessed, as what happens in many neural networks, is that the layers 8767 14:24:39,160 --> 14:24:42,160 perform some sort of mathematical operation. 8768 14:24:42,160 --> 14:24:48,160 Now, convolutional layers perform a convolving window operation across an image or across 8769 14:24:48,160 --> 14:24:49,160 a tensor. 8770 14:24:49,160 --> 14:24:53,160 And discover patterns using, let's have a look, actually. 8771 14:24:53,160 --> 14:24:59,160 Let's go, nn.Conv2d. 8772 14:24:59,160 --> 14:25:01,160 There we go. 8773 14:25:01,160 --> 14:25:02,160 This is what happens. 8774 14:25:02,160 --> 14:25:10,160 So the output of our network equals a bias plus the sum of the weight tensor over the 8775 14:25:10,160 --> 14:25:14,160 convolutional output channels, okay, times the input. 8776 14:25:14,160 --> 14:25:19,160 Now, if you want to dig deeper into what is actually going on here, you're more than welcome to 8777 14:25:19,160 --> 14:25:20,160 do that. 8778 14:25:20,160 --> 14:25:23,160 But we're going to be writing code that leverages torch.nn.Conv2d. 8779 14:25:23,160 --> 14:25:29,160 And we're going to fix up all of these hyperparameters here so that it works with our problem. 8780 14:25:29,160 --> 14:25:33,160 Now, what you need to know here is that this is a bias term. 8781 14:25:33,160 --> 14:25:34,160 We've seen this before. 8782 14:25:34,160 --> 14:25:36,160 And this is a weight matrix. 8783 14:25:36,160 --> 14:25:39,160 So a bias vector typically and a weight matrix. 8784 14:25:39,160 --> 14:25:42,160 And they operate over the input. 8785 14:25:42,160 --> 14:25:45,160 But we'll see these later on with code. 8786 14:25:45,160 --> 14:25:47,160 So just keep that in mind. 8787 14:25:47,160 --> 14:25:48,160 This is what's happening. 8788 14:25:48,160 --> 14:25:53,160 As with every layer in a neural network, some form of operation is happening on our input 8789 14:25:53,160 --> 14:25:54,160 data. 8790 14:25:54,160 --> 14:25:59,160 These operations happen layer by layer until eventually, hopefully, they can be turned into 8791 14:25:59,160 --> 14:26:02,160 some usable output. 8792 14:26:02,160 --> 14:26:04,160 So let's jump back in here. 8793 14:26:04,160 --> 14:26:09,160 Then we have a hidden activation slash nonlinear activation, because why do we use nonlinear 8794 14:26:09,160 --> 14:26:10,160 activations? 8795 14:26:10,160 --> 14:26:17,160 Well, it's because if our data is nonlinear, non-straight lines, we need the help of straight 8796 14:26:17,160 --> 14:26:21,160 and non-straight lines to model it, to draw patterns in it. 8797 14:26:21,160 --> 14:26:24,160 Then we typically have a pooling layer. 8798 14:26:24,160 --> 14:26:26,160 And I want you to take this architecture. 8799 14:26:26,160 --> 14:26:31,160 I've said typical here for a reason, because these types of architectures are changing all 8800 14:26:31,160 --> 14:26:32,160 the time. 8801 14:26:32,160 --> 14:26:34,160 So this is just one typical example of a CNN. 8802 14:26:34,160 --> 14:26:37,160 It's about as basic a CNN as you can get.
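The slide's exact code isn't reproduced in this transcript, but a minimal sketch of the kind of layer stack being described might look like the following (the hidden unit count, kernel size, pooling size and 224 by 224 RGB input are illustrative assumptions, not the values used later in the course):

import torch
from torch import nn

# A "typical" CNN in miniature: conv layer -> nonlinear activation -> pooling -> output layer
simple_cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=10, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                                            # nonlinear activation
    nn.MaxPool2d(kernel_size=2),                                          # pooling layer
    nn.Flatten(),                                                         # flatten feature maps into a vector
    nn.Linear(in_features=10 * 112 * 112, out_features=3),                # output layer: 3 classes
)

dummy_batch = torch.rand(32, 3, 224, 224)  # [batch, color_channels, height, width]
logits = simple_cnn(dummy_batch)
print(logits.shape)  # torch.Size([32, 3]) -> one raw prediction (logit) per class, per image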
8803 14:26:37,160 --> 14:26:40,160 So over time, you will start to learn to build more complex models. 8804 14:26:40,160 --> 14:26:44,160 You will not only start to learn to build them, you will just start to learn to use them, 8805 14:26:44,160 --> 14:26:47,160 as we'll see later on in the transfer learning section of the course. 8806 14:26:47,160 --> 14:26:49,160 And then we have an output layer. 8807 14:26:49,160 --> 14:26:51,160 So do you notice the trend here? 8808 14:26:51,160 --> 14:26:55,160 We have an input layer and then we have multiple hidden layers that perform some sort of mathematical 8809 14:26:55,160 --> 14:26:56,160 operation on our data. 8810 14:26:56,160 --> 14:27:02,160 And then we have an output slash linear layer that converts our output into the ideal shape 8811 14:27:02,160 --> 14:27:03,160 that we'd like. 8812 14:27:03,160 --> 14:27:05,160 So we have an output shape here. 8813 14:27:05,160 --> 14:27:08,160 And then how does this look in process? 8814 14:27:08,160 --> 14:27:11,160 Well, we put in some images, they go through all of these layers here because we've used 8815 14:27:11,160 --> 14:27:12,160 an nn.Sequential. 8816 14:27:12,160 --> 14:27:18,960 And then hopefully this forward method returns x in a usable state that 8817 14:27:18,960 --> 14:27:20,960 we can convert into class names. 8818 14:27:20,960 --> 14:27:25,960 And then we could integrate this into our computer vision app in some way, shape or form. 8819 14:27:25,960 --> 14:27:28,960 And here's the asterisk here. 8820 14:27:28,960 --> 14:27:32,960 Note, there are almost an unlimited number of ways you could stack together a convolutional 8821 14:27:32,960 --> 14:27:33,960 neural network. 8822 14:27:33,960 --> 14:27:35,960 This slide only demonstrates one. 8823 14:27:35,960 --> 14:27:38,960 So just keep that in mind, it only demonstrates one. 8824 14:27:38,960 --> 14:27:41,960 But the best way to practice this sort of stuff is not to stare at a page. 8825 14:27:41,960 --> 14:27:44,960 It's to, if in doubt, code it out. 8826 14:27:44,960 --> 14:27:49,960 So let's code, I'll see you in Google Colab. 8827 14:27:49,960 --> 14:27:50,960 Welcome back. 8828 14:27:50,960 --> 14:27:54,960 Now, we've discussed a bunch of fundamentals about computer vision problems and convolutional 8829 14:27:54,960 --> 14:27:55,960 neural networks. 8830 14:27:55,960 --> 14:27:58,960 But rather than talk to more slides, well, let's start to code them out. 8831 14:27:58,960 --> 14:28:02,960 I'm going to meet you at colab.research.google.com. 8832 14:28:02,960 --> 14:28:05,960 I'm just going to clean up some of these tabs. 8833 14:28:05,960 --> 14:28:08,960 And I'm going to start a new notebook. 8834 14:28:08,960 --> 14:28:15,960 And then I'm going to name this one, this is going to be 03 PyTorch computer vision. 8835 14:28:15,960 --> 14:28:17,960 And I'm going to call mine video. 8836 14:28:17,960 --> 14:28:24,960 So just so it has the video tag, because if we go in here, if we go to the video notebooks of 8837 14:28:24,960 --> 14:28:28,960 the PyTorch deep learning repo, the video notebooks are stored in here. 8838 14:28:28,960 --> 14:28:29,960 They've got the underscore video tag. 8839 14:28:29,960 --> 14:28:33,960 So the video notebooks have all of the code I write exactly in the video. 8840 14:28:33,960 --> 14:28:36,960 But there are some reference notebooks to go along with it. 8841 14:28:36,960 --> 14:28:40,960 Let me just write a heading here, PyTorch computer vision.
8842 14:28:40,960 --> 14:28:45,960 And I'll put a resource here, see reference notebook. 8843 14:28:45,960 --> 14:28:48,960 Now, of course, this is the one that's the ground truth. 8844 14:28:48,960 --> 14:28:54,960 It's got all of the code that we're going to be writing. 8845 14:28:54,960 --> 14:28:56,960 I'm going to put that in here. 8846 14:28:56,960 --> 14:28:58,960 Explain with text and images and whatnot. 8847 14:28:58,960 --> 14:29:03,960 And then finally, as we got see reference online book. 8848 14:29:03,960 --> 14:29:09,960 And that is at learnpytorch.io at section number three, PyTorch computer vision. 8849 14:29:09,960 --> 14:29:11,960 I'm going to put that in there. 8850 14:29:11,960 --> 14:29:14,960 And then I'm going to turn this into markdown with command mm. 8851 14:29:14,960 --> 14:29:15,960 Beautiful. 8852 14:29:15,960 --> 14:29:16,960 So let's get started. 8853 14:29:16,960 --> 14:29:18,960 I'm going to get rid of this, get rid of this. 8854 14:29:18,960 --> 14:29:20,960 How do we start this off? 8855 14:29:20,960 --> 14:29:26,960 Well, I believe there are some computer vision libraries that you should be aware of. 8856 14:29:26,960 --> 14:29:29,960 Computer vision libraries in PyTorch. 8857 14:29:29,960 --> 14:29:32,960 So this is just going to be a text based cell. 8858 14:29:32,960 --> 14:29:43,960 But the first one is torch vision, which is the base domain library for PyTorch computer vision. 8859 14:29:43,960 --> 14:29:47,960 So if we look up torch vision, what do we find? 8860 14:29:47,960 --> 14:29:50,960 We have torch vision 0.12. 8861 14:29:50,960 --> 14:29:54,960 That's the version that torch vision is currently up to at the time of recording this. 8862 14:29:54,960 --> 14:29:59,960 So in here, this is very important to get familiar with if you're working on computer vision problems. 8863 14:29:59,960 --> 14:30:03,960 And of course, in the documentation, this is just another tidbit. 8864 14:30:03,960 --> 14:30:05,960 We have torch audio for audio problems. 8865 14:30:05,960 --> 14:30:11,960 We have torch text for text torch vision, which is what we're working on torch rack for recommendation systems 8866 14:30:11,960 --> 14:30:18,960 torch data for dealing with different data pipelines torch serve, which is for serving PyTorch models 8867 14:30:18,960 --> 14:30:20,960 and PyTorch on XLA. 8868 14:30:20,960 --> 14:30:24,960 So I believe that stands for accelerated linear algebra devices. 8869 14:30:24,960 --> 14:30:26,960 You don't have to worry about these ones for now. 8870 14:30:26,960 --> 14:30:28,960 We're focused on torch vision. 8871 14:30:28,960 --> 14:30:33,960 However, if you would like to learn more about a particular domain, this is where you would go to learn more. 8872 14:30:33,960 --> 14:30:36,960 So there's a bunch of different stuff that's going on here. 8873 14:30:36,960 --> 14:30:38,960 Transforming and augmenting images. 8874 14:30:38,960 --> 14:30:42,960 So fundamentally, computer vision is dealing with things in the form of images. 8875 14:30:42,960 --> 14:30:45,960 Even a video gets converted to an image. 8876 14:30:45,960 --> 14:30:47,960 We have models and pre-trained weights. 8877 14:30:47,960 --> 14:30:53,960 So as I referenced before, you can use an existing model that works on an existing computer vision problem for your own problem. 8878 14:30:53,960 --> 14:30:57,960 We're going to cover that in section, I think it's six, for transfer learning. 
8879 14:30:57,960 --> 14:31:03,960 And then we have data sets, which is a bunch of computer vision data sets, utils, operators, a whole bunch of stuff here. 8880 14:31:03,960 --> 14:31:07,960 So PyTorch is really, really good for computer vision. 8881 14:31:07,960 --> 14:31:09,960 I mean, look at all the stuff that's going on here. 8882 14:31:09,960 --> 14:31:11,960 But that's enough talking about it. 8883 14:31:11,960 --> 14:31:12,960 Let's just put it in here. 8884 14:31:12,960 --> 14:31:14,960 Torch vision. This is the main one. 8885 14:31:14,960 --> 14:31:16,960 I'm not going to link to all of these. 8886 14:31:16,960 --> 14:31:20,960 All of the links for these, by the way, is in the book version of the course PyTorch Computer Vision. 8887 14:31:20,960 --> 14:31:23,960 And we have what we're going to cover. 8888 14:31:23,960 --> 14:31:27,960 And finally, computer vision libraries in PyTorch. 8889 14:31:27,960 --> 14:31:30,960 Torch vision, data sets, models, transforms, et cetera. 8890 14:31:30,960 --> 14:31:32,960 But let's just write down the other ones. 8891 14:31:32,960 --> 14:31:38,960 So we have torch vision, not data sets, something to be aware of. 8892 14:31:38,960 --> 14:31:47,960 So get data sets and data loading functions for computer vision here. 8893 14:31:47,960 --> 14:31:50,960 Then we have torch vision. 8894 14:31:50,960 --> 14:31:55,960 And from torch vision, models is get pre-trained computer vision. 8895 14:31:55,960 --> 14:32:00,960 So when I say pre-trained computer vision models, we're going to cover this more in transfer learning, as I said. 8896 14:32:00,960 --> 14:32:06,960 Pre-trained computer vision models are models that have been already trained on some existing vision data 8897 14:32:06,960 --> 14:32:11,960 and have trained weights, trained patterns that you can leverage for your own problems, 8898 14:32:11,960 --> 14:32:16,960 that you can leverage for your own problems. 8899 14:32:16,960 --> 14:32:20,960 Then we have torch vision.transforms. 8900 14:32:20,960 --> 14:32:35,960 And then we have functions for manipulating your vision data, which is, of course, images to be suitable for use with an ML model. 8901 14:32:35,960 --> 14:32:41,960 So remember, what do we have to do when we have image data or almost any kind of data? 8902 14:32:41,960 --> 14:32:47,960 For machine learning, we have to prepare it in a way so it can't just be pure images, so that's what transforms help us out with. 8903 14:32:47,960 --> 14:32:53,960 Transforms helps to turn our image data into numbers so we can use it with a machine learning model. 8904 14:32:53,960 --> 14:32:57,960 And then, of course, we have some, these are the torch utils. 8905 14:32:57,960 --> 14:33:02,960 This is not vision specific, it's entirety of PyTorch specific, and that's data set. 8906 14:33:02,960 --> 14:33:10,960 So if we wanted to create our own data set with our own custom data, we have the base data set class for PyTorch. 8907 14:33:10,960 --> 14:33:15,960 And then we have finally torch utils data. 8908 14:33:15,960 --> 14:33:22,960 These are just good to be aware of because you'll almost always use some form of data set slash data loader with whatever PyTorch problem you're working on. 8909 14:33:22,960 --> 14:33:29,960 So this creates a Python iterable over a data set. 8910 14:33:29,960 --> 14:33:30,960 Wonderful. 8911 14:33:30,960 --> 14:33:33,960 I think these are most of the libraries that we're going to be using in this section. 
8912 14:33:33,960 --> 14:33:37,960 Let's import some of them, hey, so we can see what's going on. 8913 14:33:37,960 --> 14:33:41,960 Let's go import PyTorch. 8914 14:33:41,960 --> 14:33:43,960 Import PyTorch. 8915 14:33:43,960 --> 14:33:46,960 So import torch. 8916 14:33:46,960 --> 14:33:49,960 We're also going to get nn, which stands for neural network. 8917 14:33:49,960 --> 14:33:50,960 What's in nn? 8918 14:33:50,960 --> 14:33:57,960 Well, in nn, of course, we have lots of layers, lots of loss functions, a whole bunch of different stuff for building neural networks. 8919 14:33:57,960 --> 14:34:02,960 We're going to also import torchvision. 8920 14:34:02,960 --> 14:34:05,960 And then we're going to go from torchvision. 8921 14:34:05,960 --> 14:34:13,960 Import datasets, because we're going to be using datasets later on to get a data set to work with from torchvision. 8922 14:34:13,960 --> 14:34:16,960 We'll import transforms. 8923 14:34:16,960 --> 14:34:23,960 You could also go from torchvision.transforms import ToTensor. 8924 14:34:23,960 --> 14:34:26,960 This is one of the main ones you'll see for computer vision problems, ToTensor. 8925 14:34:26,960 --> 14:34:28,960 You can imagine what it does. 8926 14:34:28,960 --> 14:34:29,960 But let's have a look. 8927 14:34:29,960 --> 14:34:33,960 Transforms, ToTensor. 8928 14:34:33,960 --> 14:34:36,960 Transforming and augmenting images. 8929 14:34:36,960 --> 14:34:37,960 So look where we are. 8930 14:34:37,960 --> 14:34:41,960 We're in pytorch.org slash vision slash stable slash transforms. 8931 14:34:41,960 --> 14:34:42,960 Over here. 8932 14:34:42,960 --> 14:34:44,960 So we're in the torchvision section. 8933 14:34:44,960 --> 14:34:47,960 And we're just looking at transforming and augmenting images. 8934 14:34:47,960 --> 14:34:49,960 So transforming. 8935 14:34:49,960 --> 14:34:50,960 What do we have? 8936 14:34:50,960 --> 14:34:53,960 Transforms are common image transformations available in the transforms module. 8937 14:34:53,960 --> 14:34:56,960 They can be chained together using Compose. 8938 14:34:56,960 --> 14:34:57,960 Beautiful. 8939 14:34:57,960 --> 14:35:01,960 So if we have ToTensor, what does this do? 8940 14:35:01,960 --> 14:35:05,960 Convert a PIL Image or NumPy ndarray to a tensor. 8941 14:35:05,960 --> 14:35:06,960 Beautiful. 8942 14:35:06,960 --> 14:35:08,960 That's what we want to do later on, isn't it? 8943 14:35:08,960 --> 14:35:14,960 Well, this is kind of me giving you a spoiler, but we want to convert our images into tensors so that we can use those with our models. 8944 14:35:14,960 --> 14:35:22,960 But there's a whole bunch of different transforms here, and actually one of your extra curriculum items is to read through each of these packages for 10 minutes. 8945 14:35:22,960 --> 14:35:29,960 So that's about an hour of reading, but it will definitely help you later on if you get familiar with using the PyTorch documentation. 8946 14:35:29,960 --> 14:35:32,960 After all, this course is just a momentum builder. 8947 14:35:32,960 --> 14:35:34,960 We're going to write heaps of PyTorch code. 8948 14:35:34,960 --> 14:35:38,960 But fundamentally, you'll be teaching yourself a lot of stuff by reading the documentation. 8949 14:35:38,960 --> 14:35:40,960 Let's keep going with this. 8950 14:35:40,960 --> 14:35:42,960 Where were we up to? 8951 14:35:42,960 --> 14:35:48,960 When we're getting familiar with our data, matplotlib is going to be fundamental for visualization.
8952 14:35:48,960 --> 14:35:55,960 Remember, the data explorer's motto, visualize, visualize, visualize, become one with the data. 8953 14:35:55,960 --> 14:36:00,960 So we're going to import matplotlib.pyplot as plt. 8954 14:36:00,960 --> 14:36:03,960 And then finally, let's check the versions. 8955 14:36:03,960 --> 14:36:10,960 So print torch.__version__ and print torchvision.__version__. 8956 14:36:10,960 --> 14:36:15,960 So by the time you watch this, there might be a newer version of each of these modules out. 8957 14:36:15,960 --> 14:36:18,960 If there's any errors in the code, please let me know. 8958 14:36:18,960 --> 14:36:22,960 But this is just the bare minimum version that you'll need to complete this section. 8959 14:36:22,960 --> 14:36:32,960 I believe at the moment, Google Colab is running 1.11 for torch and maybe 1.10. 8960 14:36:32,960 --> 14:36:34,960 We'll find out in a second. 8961 14:36:34,960 --> 14:36:35,960 It just connected. 8962 14:36:35,960 --> 14:36:38,960 So we're importing PyTorch. 8963 14:36:38,960 --> 14:36:40,960 Okay, there we go. 8964 14:36:40,960 --> 14:36:46,960 So my PyTorch version is 1.10 and it's got CUDA available, and torchvision is 0.11. 8965 14:36:46,960 --> 14:36:50,960 So just make sure if you're running in Google Colab, if you're running this at a later date, 8966 14:36:50,960 --> 14:36:54,960 you probably have at minimum these versions, you might even have a later version. 8967 14:36:54,960 --> 14:36:58,960 So these are the minimum versions required for this upcoming section. 8968 14:36:58,960 --> 14:37:02,960 So we've covered the base computer vision libraries in PyTorch. 8969 14:37:02,960 --> 14:37:03,960 We've got them ready to go. 8970 14:37:03,960 --> 14:37:07,960 How about in the next video, we cover getting a data set. 8971 14:37:07,960 --> 14:37:09,960 I'll see you there. 8972 14:37:09,960 --> 14:37:11,960 Welcome back. 8973 14:37:11,960 --> 14:37:15,960 So in the last video, we covered some of the fundamental computer vision libraries in PyTorch. 8974 14:37:15,960 --> 14:37:19,960 The main one being torchvision, and then modules that stem off torchvision. 8975 14:37:19,960 --> 14:37:25,960 And then of course, we've got torch.utils.data.Dataset, which is the base data set class for PyTorch, 8976 14:37:25,960 --> 14:37:29,960 and DataLoader, which creates a Python iterable over a data set. 8977 14:37:29,960 --> 14:37:32,960 So let's begin where most machine learning projects do. 8978 14:37:32,960 --> 14:37:36,960 And that is getting a data set, getting a data set. 8979 14:37:36,960 --> 14:37:38,960 I'm going to turn this into markdown. 8980 14:37:38,960 --> 14:37:46,960 And the data set that we're going to be using to demonstrate some computer vision techniques is Fashion MNIST. 8981 14:37:46,960 --> 14:37:57,960 So the data set we'll be using is Fashion MNIST, which is a take on the original MNIST data set, 8982 14:37:57,960 --> 14:38:05,960 the MNIST database, which is the Modified National Institute of Standards and Technology database, which is kind of like the hello world 8983 14:38:05,960 --> 14:38:11,960 of machine learning and computer vision. These are sample images from the MNIST test data set, 8984 14:38:11,960 --> 14:38:15,960 which are grayscale images of handwritten digits.
8985 14:38:15,960 --> 14:38:21,960 So this, I believe, was originally used for trying to find out if you could use computer vision at a postal service 8986 14:38:21,960 --> 14:38:24,960 to, I guess, recognize postcodes and whatnot. 8987 14:38:24,960 --> 14:38:27,960 I may be wrong about that, but that's what I know. 8988 14:38:27,960 --> 14:38:29,960 Yeah, 1998. 8989 14:38:29,960 --> 14:38:32,960 So all the way back in 1998, how cool is that? 8990 14:38:32,960 --> 14:38:36,960 So this was basically where convolutional neural networks were founded. 8991 14:38:36,960 --> 14:38:38,960 I'll let you read up on the history of that. 8992 14:38:38,960 --> 14:38:44,960 But neural networks started to get so good that this data set was quite easy for them to do really well on. 8993 14:38:44,960 --> 14:38:46,960 And that's when Fashion MNIST came out. 8994 14:38:46,960 --> 14:38:50,960 So this is a little bit harder if we go into here. 8995 14:38:50,960 --> 14:38:53,960 This is by Zalando Research, Fashion MNIST. 8996 14:38:53,960 --> 14:38:58,960 And it's of grayscale images of pieces of clothing. 8997 14:38:58,960 --> 14:39:06,960 So like we saw before with the input and output, what we're going to be trying to do is turning these images of clothing into numbers 8998 14:39:06,960 --> 14:39:11,960 and then training a computer vision model to recognize what the different styles of clothing are. 8999 14:39:11,960 --> 14:39:15,960 And here's a dimensionality plot of all the different items of clothing. 9000 14:39:15,960 --> 14:39:19,960 Visualizing where similar items are grouped together, there's the shoes and whatnot. 9001 14:39:19,960 --> 14:39:21,960 Is this interactive? 9002 14:39:21,960 --> 14:39:23,960 Oh no, it's a video. 9003 14:39:23,960 --> 14:39:24,960 Excuse me. 9004 14:39:24,960 --> 14:39:25,960 There we go. 9005 14:39:25,960 --> 14:39:27,960 To serious machine learning researchers. 9006 14:39:27,960 --> 14:39:29,960 We are talking about replacing MNIST. 9007 14:39:29,960 --> 14:39:31,960 MNIST is too easy. 9008 14:39:31,960 --> 14:39:32,960 MNIST is overused. 9009 14:39:32,960 --> 14:39:35,960 MNIST cannot represent modern CV tasks. 9010 14:39:35,960 --> 14:39:41,960 So even now, Fashion MNIST, I would say, has also been pretty much solved, but it's a good way to get started. 9011 14:39:41,960 --> 14:39:43,960 Now, where could we find such a data set? 9012 14:39:43,960 --> 14:39:45,960 We could download it from GitHub. 9013 14:39:45,960 --> 14:39:50,960 But if we come back to the torchvision documentation, have a look at data sets. 9014 14:39:50,960 --> 14:39:52,960 We have a whole bunch of built-in data sets. 9015 14:39:52,960 --> 14:39:56,960 And remember, this is your extra curriculum, to read through these for 10 minutes or so each. 9016 14:39:56,960 --> 14:39:58,960 But we have an example. 9017 14:39:58,960 --> 14:40:02,960 We could download ImageNet if we want. 9018 14:40:02,960 --> 14:40:05,960 We also have some base classes here for custom data sets. 9019 14:40:05,960 --> 14:40:07,960 We'll see that later on. 9020 14:40:07,960 --> 14:40:10,960 But if we scroll through, we have image classification data sets. 9021 14:40:10,960 --> 14:40:11,960 Caltech101. 9022 14:40:11,960 --> 14:40:13,960 I didn't even know what all of these are. 9023 14:40:13,960 --> 14:40:14,960 There's a lot here. 9024 14:40:14,960 --> 14:40:15,960 CIFAR100. 9025 14:40:15,960 --> 14:40:18,960 So that's an example of 100 different items.
9026 14:40:18,960 --> 14:40:22,960 So that would be a 100 class, multi-class classification problem. 9027 14:40:22,960 --> 14:40:24,960 CIFAR10 is 10 classes. 9028 14:40:24,960 --> 14:40:26,960 We have MNIST. 9029 14:40:26,960 --> 14:40:28,960 We have Fashion MNIST. 9030 14:40:28,960 --> 14:40:29,960 Oh, that's the one we're after. 9031 14:40:29,960 --> 14:40:36,960 But this is basically what you would do to download a data set from torchvision.datasets. 9032 14:40:36,960 --> 14:40:39,960 You would download the data in some way, shape, or form. 9033 14:40:39,960 --> 14:40:41,960 And then you would turn it into a data loader. 9034 14:40:41,960 --> 14:40:51,960 So ImageNet is one of the most popular, or is probably the gold standard, data set for computer vision evaluation. 9035 14:40:51,960 --> 14:40:52,960 It's quite a big data set. 9036 14:40:52,960 --> 14:40:53,960 It's got millions of images. 9037 14:40:53,960 --> 14:40:58,960 But that's the beauty of torchvision, is that it allows us to download example data sets 9038 14:40:58,960 --> 14:41:00,960 that we can practice on, 9039 14:41:00,960 --> 14:41:03,960 and even perform research on, from a built-in module. 9040 14:41:03,960 --> 14:41:07,960 So let's now have a look at the Fashion MNIST data set. 9041 14:41:07,960 --> 14:41:09,960 How might we get this? 9042 14:41:09,960 --> 14:41:12,960 So we've got some example code here, or this is the documentation. 9043 14:41:12,960 --> 14:41:15,960 torchvision.datasets.FashionMNIST. 9044 14:41:15,960 --> 14:41:16,960 We have to pass in a root. 9045 14:41:16,960 --> 14:41:19,960 So where do we want to download the data set to? 9046 14:41:19,960 --> 14:41:22,960 We also have to pass in whether we want the training version of the data set 9047 14:41:22,960 --> 14:41:24,960 or whether we want the testing version of the data set. 9048 14:41:24,960 --> 14:41:26,960 Do we want to download it? 9049 14:41:26,960 --> 14:41:27,960 Yes or no? 9050 14:41:27,960 --> 14:41:31,960 Should we transform the data in any way, shape or form? 9051 14:41:31,960 --> 14:41:36,960 So we're going to be downloading images through this function call, or this class call. 9052 14:41:36,960 --> 14:41:39,960 Do we want to transform those images in some way? 9053 14:41:39,960 --> 14:41:42,960 What do we have to do to images before we can use them with a model? 9054 14:41:42,960 --> 14:41:45,960 We have to turn them into a tensor, so we might look into that in a moment. 9055 14:41:45,960 --> 14:41:50,960 And target transform is, do we want to transform the labels in any way, shape or form? 9056 14:41:50,960 --> 14:41:54,960 So often the data sets that you download from torchvision.datasets 9057 14:41:54,960 --> 14:41:59,960 are pre-formatted in a way that they can be quite easily used with PyTorch. 9058 14:41:59,960 --> 14:42:02,960 But that won't always be the case with your own custom data sets. 9059 14:42:02,960 --> 14:42:07,960 However, what we're about to cover is just important to get an idea of what the computer vision workflow is. 9060 14:42:07,960 --> 14:42:13,960 And then later on you can start to customize how you get your data in the right format to be used with the model. 9061 14:42:13,960 --> 14:42:15,960 Then we have some different parameters here and whatnot. 9062 14:42:15,960 --> 14:42:20,960 Let's just, rather than look at the documentation, if in doubt, code it out.
9063 14:42:20,960 --> 14:42:31,960 So we'll be using Fashion MNIST, and we'll start by, I'm going to just put this here, from torchvision.datasets. 9064 14:42:31,960 --> 14:42:36,960 And we'll put the link there, and we'll start by getting the training data. 9065 14:42:36,960 --> 14:42:38,960 Set up training data. 9066 14:42:38,960 --> 14:42:43,960 I'm just going to make some code cells here so that I can code in the middle of the screen. 9067 14:42:43,960 --> 14:42:50,960 Set up training data. Training data equals datasets.FashionMNIST. 9068 14:42:50,960 --> 14:42:55,960 Because recall, we've already imported from torchvision. 9069 14:42:55,960 --> 14:43:01,960 We don't need to import this again, I'm just doing it for demonstration purposes, but from torchvision import datasets, 9070 14:43:01,960 --> 14:43:04,960 so we can just call datasets.FashionMNIST. 9071 14:43:04,960 --> 14:43:06,960 And then we're going to type in root. 9072 14:43:06,960 --> 14:43:09,960 See how the docstring comes up and tells us what's going on. 9073 14:43:09,960 --> 14:43:15,960 I personally find this a bit hard to read in Google Colab, so if I'm looking up the documentation, 9074 14:43:15,960 --> 14:43:17,960 I like to just go into here. 9075 14:43:17,960 --> 14:43:19,960 But let's code it out. 9076 14:43:19,960 --> 14:43:25,960 So root is going to be data, so where to download data to. 9077 14:43:25,960 --> 14:43:27,960 We'll see what this does in a minute. 9078 14:43:27,960 --> 14:43:28,960 Then we're going to go train. 9079 14:43:28,960 --> 14:43:31,960 We want the training version of the data set. 9080 14:43:31,960 --> 14:43:36,960 So as I said, a lot of the data sets that you find in torchvision.datasets 9081 14:43:36,960 --> 14:43:40,960 have been formatted into a training data set and a testing data set already. 9082 14:43:40,960 --> 14:43:47,960 So this Boolean tells us, do we want the training data set? 9083 14:43:47,960 --> 14:43:51,960 So if that was false, we would get the testing data set of Fashion MNIST. 9084 14:43:51,960 --> 14:43:53,960 Do we want to download it? 9085 14:43:53,960 --> 14:43:56,960 Do we want to download? 9086 14:43:56,960 --> 14:43:57,960 Yes, no. 9087 14:43:57,960 --> 14:44:00,960 So yes, we do. We're going to set that to true. 9088 14:44:00,960 --> 14:44:03,960 Now what sort of transform do we want to do? 9089 14:44:03,960 --> 14:44:07,960 So because we're going to be downloading images, and what do we have to do to our images 9090 14:44:07,960 --> 14:44:11,960 to use them with a machine learning model, we have to convert them into tensors. 9091 14:44:11,960 --> 14:44:20,960 So I'm going to pass the transform ToTensor, but we could also just go torchvision.transforms.ToTensor. 9092 14:44:20,960 --> 14:44:23,960 That would be the exact same thing as what we just did before. 9093 14:44:23,960 --> 14:44:27,960 And then the target transform, do we want to transform the labels? 9094 14:44:27,960 --> 14:44:28,960 No, we don't. 9095 14:44:28,960 --> 14:44:31,960 We're going to see how they come, or the targets, sorry. 9096 14:44:31,960 --> 14:44:34,960 PyTorch, this is another way, another naming convention. 9097 14:44:34,960 --> 14:44:37,960 It often uses target for the target that you're trying to predict. 9098 14:44:37,960 --> 14:44:42,960 So it's using data to predict the target, whereas I often use data to predict a label. 9099 14:44:42,960 --> 14:44:44,960 They're the same thing. 9100 14:44:44,960 --> 14:44:49,960 So how do we want to transform the data?
9101 14:44:49,960 --> 14:44:56,960 And how do we want to transform the labels? 9102 14:44:56,960 --> 14:45:01,960 And then we're going to do the same for the test data. 9103 14:45:01,960 --> 14:45:03,960 So we're going to go datasets. 9104 14:45:03,960 --> 14:45:05,960 You might know what to do here. 9105 14:45:05,960 --> 14:45:09,960 It's going to be the exact same code as above, except we're going to change one line. 9106 14:45:09,960 --> 14:45:11,960 We want to store it in data. 9107 14:45:11,960 --> 14:45:16,960 We want to set train to false because we want the testing version. 9108 14:45:16,960 --> 14:45:18,960 Do we want to download it? 9109 14:45:18,960 --> 14:45:19,960 Yes, we do. 9110 14:45:19,960 --> 14:45:21,960 Do we want to transform the data? 9111 14:45:21,960 --> 14:45:26,960 Yes, we do, we want to use ToTensor to convert our image data to tensors. 9112 14:45:26,960 --> 14:45:29,960 And do we want to do a target transform? 9113 14:45:29,960 --> 14:45:30,960 Well, no, we don't. 9114 14:45:30,960 --> 14:45:33,960 We want to keep the labels slash the targets as they are. 9115 14:45:33,960 --> 14:45:35,960 Let's see what happens when we run this. 9116 14:45:35,960 --> 14:45:39,960 Oh, downloading Fashion MNIST, beautiful. 9117 14:45:39,960 --> 14:45:41,960 So this is going to download all of the files. 9118 14:45:41,960 --> 14:45:42,960 What do we have? 9119 14:45:42,960 --> 14:45:47,960 Train images, train labels, lovely, test images, test labels, beautiful. 9120 14:45:47,960 --> 14:45:52,960 So that's how quickly we can get a data set by using torchvision.datasets. 9121 14:45:52,960 --> 14:45:56,960 Now, if we have a look over here, we have a data folder because we set the root to be 9122 14:45:56,960 --> 14:45:57,960 data. 9123 14:45:57,960 --> 14:46:00,960 Now, if we look what's inside here, we have FashionMNIST, exactly what we wanted. 9124 14:46:00,960 --> 14:46:05,960 Then we have the raw, and then we have a whole bunch of files here, which torchvision has 9125 14:46:05,960 --> 14:46:08,960 converted into data sets for us. 9126 14:46:08,960 --> 14:46:10,960 So let's get out of that. 9127 14:46:10,960 --> 14:46:16,960 And this process would be much the same if we used almost any data set in here. 9128 14:46:16,960 --> 14:46:19,960 They might be slightly different depending on what the documentation says and depending 9129 14:46:19,960 --> 14:46:21,560 on what the data set is. 9130 14:46:21,560 --> 14:46:27,760 But that is how easy torchvision.datasets makes it to practice on example computer vision 9131 14:46:27,760 --> 14:46:29,160 data sets. 9132 14:46:29,160 --> 14:46:31,360 So let's go back. 9133 14:46:31,360 --> 14:46:35,840 Let's check out some parameters or some attributes of our data. 9134 14:46:35,840 --> 14:46:38,960 How many samples do we have? 9135 14:46:38,960 --> 14:46:45,760 So we'll check the lengths. 9136 14:46:45,760 --> 14:46:51,160 So we have 60,000 training examples and 10,000 testing examples. 9137 14:46:51,160 --> 14:46:54,680 So what we're going to be doing is we're going to be building a computer vision model to 9138 14:46:54,680 --> 14:47:00,160 find patterns in the training data and then use those patterns to predict on the test 9139 14:47:00,160 --> 14:47:01,400 data. 9140 14:47:01,400 --> 14:47:04,080 And so let's see a first training example. 9141 14:47:04,080 --> 14:47:08,280 See the first training example. 9142 14:47:08,280 --> 14:47:12,400 So we can just index on the train data.
9143 14:47:12,400 --> 14:47:18,640 Let's get the zero index and then we're going to have a look at the image and the label. 9144 14:47:18,640 --> 14:47:19,640 Oh my goodness. 9145 14:47:19,640 --> 14:47:21,560 A whole bunch of numbers. 9146 14:47:21,560 --> 14:47:26,080 Now you see what the two tensor has done for us? 9147 14:47:26,080 --> 14:47:30,680 So we've downloaded some images and thanks to this torch vision transforms to tensor. 9148 14:47:30,680 --> 14:47:32,400 How would we find the documentation for this? 9149 14:47:32,400 --> 14:47:36,680 Well, we could go and see what this does transforms to tensor. 9150 14:47:36,680 --> 14:47:40,080 We could go to tensor. 9151 14:47:40,080 --> 14:47:41,080 There we go. 9152 14:47:41,080 --> 14:47:42,080 What does this do? 9153 14:47:42,080 --> 14:47:43,680 Convert a pill image. 9154 14:47:43,680 --> 14:47:47,700 So that's Python image library image on NumPy array to a tensor. 9155 14:47:47,700 --> 14:47:49,760 This transform does not support torch script. 9156 14:47:49,760 --> 14:47:55,080 So converts a pill image on NumPy array height with color channels in the range 0 to 255 9157 14:47:55,080 --> 14:47:57,320 to a torch float tensor of shape. 9158 14:47:57,320 --> 14:47:58,320 See here? 9159 14:47:58,320 --> 14:48:03,280 This is what I was talking about how PyTorch defaults with a lot of transforms to CHW. 9160 14:48:03,280 --> 14:48:07,960 So color channels first height then width in that range of zero to one. 9161 14:48:07,960 --> 14:48:12,520 So typically red, green and blue values are between zero and 255. 9162 14:48:12,520 --> 14:48:15,080 But neural networks like things between zero and one. 9163 14:48:15,080 --> 14:48:21,000 And in this case, it is now in the shape of color channels first, then height, then width. 9164 14:48:21,000 --> 14:48:26,520 However, some other machine learning libraries prefer height, width, then color channels. 9165 14:48:26,520 --> 14:48:27,520 Just keep that in mind. 9166 14:48:27,520 --> 14:48:29,920 We're going to see this in practice later on. 9167 14:48:29,920 --> 14:48:30,920 So we've got an image. 9168 14:48:30,920 --> 14:48:31,920 We've got a label. 9169 14:48:31,920 --> 14:48:33,800 Let's check out some more details about it. 9170 14:48:33,800 --> 14:48:34,800 Remember how we discussed? 9171 14:48:34,800 --> 14:48:36,640 Oh, there's our label, by the way. 9172 14:48:36,640 --> 14:48:45,200 So nine, we can go traindata.classes, find some information about our class names. 9173 14:48:45,200 --> 14:48:46,800 Class names. 9174 14:48:46,800 --> 14:48:48,800 Beautiful. 9175 14:48:48,800 --> 14:48:55,600 So number nine would be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. 9176 14:48:55,600 --> 14:48:59,400 So this particular tensor seems to relate to an ankle boot. 9177 14:48:59,400 --> 14:49:00,840 How would we find that out? 9178 14:49:00,840 --> 14:49:01,840 Well, one second. 9179 14:49:01,840 --> 14:49:04,560 I'm just going to show you one more thing, class to IDX. 9180 14:49:04,560 --> 14:49:07,480 Let's go traindata.class to IDX. 9181 14:49:07,480 --> 14:49:09,240 What does this give us? 9182 14:49:09,240 --> 14:49:10,240 Class to IDX. 9183 14:49:10,240 --> 14:49:16,400 This is going to give us a dictionary of different labels and their corresponding index. 9184 14:49:16,400 --> 14:49:20,880 So if our machine learning model predicted nine or class nine, we can convert that to 9185 14:49:20,880 --> 14:49:24,560 ankle boot using this attribute of the train data. 
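Pulling the dictated code together, a sketch of the download-and-inspect step might look like this (the variable names mirror the ones used in the video):

from torchvision import datasets
from torchvision.transforms import ToTensor

# Download Fashion MNIST and turn the images into tensors
train_data = datasets.FashionMNIST(
    root="data",            # where to download data to
    train=True,             # get the training split
    download=True,          # download it if it's not already on disk
    transform=ToTensor(),   # turn PIL images into tensors scaled to [0, 1]
    target_transform=None,  # leave the labels/targets as they are
)
test_data = datasets.FashionMNIST(
    root="data",
    train=False,            # get the testing split
    download=True,
    transform=ToTensor(),
    target_transform=None,
)

image, label = train_data[0]       # first training sample
class_names = train_data.classes   # e.g. ['T-shirt/top', 'Trouser', ..., 'Ankle boot']
print(image.shape, label)          # torch.Size([1, 28, 28]) 9
print(class_names[label])          # Ankle boot
print(train_data.class_to_idx)     # dictionary mapping class names to label indexes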
9186 14:49:24,560 --> 14:49:26,840 There are more attributes that you can have a look at if you like. 9187 14:49:26,840 --> 14:49:32,560 You can go traindata.dot, then I just push tab to find out a bunch of different things. 9188 14:49:32,560 --> 14:49:33,560 You can go data. 9189 14:49:33,560 --> 14:49:37,280 That'll be the images, and then I believe you can also go targets. 9190 14:49:37,280 --> 14:49:43,280 So targets, that's all the labels, which is one big long tensor. 9191 14:49:43,280 --> 14:49:45,760 Now let's check the shape. 9192 14:49:45,760 --> 14:49:49,560 Check the shape of our image. 9193 14:49:49,560 --> 14:49:53,240 So image.shape and label.shape. 9194 14:49:53,240 --> 14:49:55,240 What are we going to get from that? 9195 14:49:55,240 --> 14:49:58,040 Oh, label doesn't have a shape. 9196 14:49:58,040 --> 14:49:59,040 Why is that? 9197 14:49:59,040 --> 14:50:01,000 Well, because it's only an integer. 9198 14:50:01,000 --> 14:50:02,000 So oh, beautiful. 9199 14:50:02,000 --> 14:50:03,080 Look at that. 9200 14:50:03,080 --> 14:50:06,440 So our image shape is we have a color channel of one. 9201 14:50:06,440 --> 14:50:13,360 So let me print this out in something prettier, print image shape, which is going to be image 9202 14:50:13,360 --> 14:50:14,360 shape. 9203 14:50:14,360 --> 14:50:19,200 Remember how I said it's very important to be aware of the input and output shapes of 9204 14:50:19,200 --> 14:50:20,880 your models and your data. 9205 14:50:20,880 --> 14:50:23,720 It's all part of becoming one with the data. 9206 14:50:23,720 --> 14:50:28,920 So that is what our image shape is. 9207 14:50:28,920 --> 14:50:36,480 And then if we go next, this is print image label, which is label, but we'll index on 9208 14:50:36,480 --> 14:50:39,880 class names for label. 9209 14:50:39,880 --> 14:50:44,040 And then we'll do that wonderful. 9210 14:50:44,040 --> 14:50:47,920 So our image shape is currently in the format of color channels height width. 9211 14:50:47,920 --> 14:50:50,400 We got a bunch of different numbers that's representing our image. 9212 14:50:50,400 --> 14:50:51,760 It's black and white. 9213 14:50:51,760 --> 14:50:54,000 It only has one color channel. 9214 14:50:54,000 --> 14:50:56,880 Why do you think it only has one color channel? 9215 14:50:56,880 --> 14:51:00,600 Because it's black and white, so if we jump back into the keynote, fashion, we've already 9216 14:51:00,600 --> 14:51:04,280 discussed this, grayscale images have one color channel. 9217 14:51:04,280 --> 14:51:07,840 So that means that for black, the pixel value is zero. 9218 14:51:07,840 --> 14:51:12,000 And for white, it's some value for whatever color is going on here. 9219 14:51:12,000 --> 14:51:16,240 So if it's a very high number, say it's one, it's going to be pure white. 9220 14:51:16,240 --> 14:51:20,640 If it's like 0.001, it might be a faint white pixel. 9221 14:51:20,640 --> 14:51:23,480 But if it's exactly zero, it's going to be black. 9222 14:51:23,480 --> 14:51:27,800 So color images have three color channels for red, green and blue, grayscale have one 9223 14:51:27,800 --> 14:51:29,760 color channel. 9224 14:51:29,760 --> 14:51:34,680 But I think we've done enough of visualizing our images as numbers. 9225 14:51:34,680 --> 14:51:39,800 How about in the next video, we visualize our image as an image? 9226 14:51:39,800 --> 14:51:42,560 I'll see you there. 9227 14:51:42,560 --> 14:51:43,720 Welcome back. 
9228 14:51:43,720 --> 14:51:49,160 So in the last video, we checked the input output shapes of our data, and we downloaded 9229 14:51:49,160 --> 14:51:54,880 the fashion MNIST data set, which is comprised of images or grayscale images of T-shirts, 9230 14:51:54,880 --> 14:52:00,520 trousers, pullovers, dress, coat, sandal, shirt, sneaker, bag, ankle boot. 9231 14:52:00,520 --> 14:52:06,120 Now we want to see if we can build a computer vision model to decipher what's going on in 9232 14:52:06,120 --> 14:52:07,120 fashion MNIST. 9233 14:52:07,120 --> 14:52:14,080 So to separate, to classify different items of clothing based on their numerical representation. 9234 14:52:14,080 --> 14:52:18,800 And part of becoming one with the data is, of course, checking the input output shapes 9235 14:52:18,800 --> 14:52:19,880 of it. 9236 14:52:19,880 --> 14:52:24,480 So this is a fashion MNIST data set from Zalando Research. 9237 14:52:24,480 --> 14:52:27,320 Now if you recall, why did we look at our input and output shapes? 9238 14:52:27,320 --> 14:52:29,360 Well, this is what we looked at before. 9239 14:52:29,360 --> 14:52:34,360 We have 28 by 28 grayscale images that we want to represent as a tensor. 9240 14:52:34,360 --> 14:52:37,960 We want to use them as input into a machine learning algorithm, typically a computer vision 9241 14:52:37,960 --> 14:52:39,840 algorithm, such as a CNN. 9242 14:52:39,840 --> 14:52:45,520 And we want to have some sort of outputs that are formatted in the ideal shape that we'd 9243 14:52:45,520 --> 14:52:46,520 like. 9244 14:52:46,520 --> 14:52:49,200 So in our case, we have 10 different types of clothing. 9245 14:52:49,200 --> 14:52:54,800 So we're going to have an output shape of 10, but our input shape is what? 9246 14:52:54,800 --> 14:52:59,280 So by default, PyTorch turns tensors into color channels first. 9247 14:52:59,280 --> 14:53:03,080 So we have an input shape of none, one, 28, 28. 9248 14:53:03,080 --> 14:53:07,520 So none is going to be our batch size, which of course we can set that to whatever we'd 9249 14:53:07,520 --> 14:53:08,520 like. 9250 14:53:08,520 --> 14:53:15,040 Now input shape format is in NCHW, or in other words, color channels first. 9251 14:53:15,040 --> 14:53:18,880 But just remember, if you're working with some other machine learning libraries, you 9252 14:53:18,880 --> 14:53:21,760 may want to use color channels last. 9253 14:53:21,760 --> 14:53:24,280 So let's have a look at where that might be the case. 9254 14:53:24,280 --> 14:53:27,400 We're going to visualize our images. 9255 14:53:27,400 --> 14:53:30,480 So I make a little heading here, 1.2. 9256 14:53:30,480 --> 14:53:35,800 Now this is all part of becoming one with the data. 9257 14:53:35,800 --> 14:53:39,760 In other words, understanding its input and output shapes, how many samples there are, 9258 14:53:39,760 --> 14:53:44,440 what they look like, visualize, visualize, visualize. 9259 14:53:44,440 --> 14:53:45,440 Let's import mapplotlib. 9260 14:53:45,440 --> 14:53:56,280 I'm just going to add a few code cells here, import mapplotlib.pyplot as PLT. 9261 14:53:56,280 --> 14:54:03,720 Now let's create our image and label is our train data zero, and we're going to print 9262 14:54:03,720 --> 14:54:10,400 the image shape so we can understand what inputs are going into our mapplotlib function. 
9263 14:54:10,400 --> 14:54:15,440 And then we're going to go plt.imshow, and we're going to pass in our image and see 9264 14:54:15,440 --> 14:54:21,320 what happens, because recall what does our image look like, image? 9265 14:54:21,320 --> 14:54:24,400 Our image is this big tensor of numbers. 9266 14:54:24,400 --> 14:54:27,160 And we've got an image shape of 1, 28, 28. 9267 14:54:27,160 --> 14:54:29,840 Now what happens if we call plt.imshow? 9268 14:54:29,840 --> 14:54:31,840 What happens there? 9269 14:54:31,840 --> 14:54:40,200 Oh, we get an error, invalid shape (1, 28, 28) for image data. 9270 14:54:40,200 --> 14:54:45,440 Now as I said, this is one of the most common errors in machine learning, a shape issue. 9271 14:54:45,440 --> 14:54:50,360 So the shape of your input tensor doesn't match the expected shape of that tensor. 9272 14:54:50,360 --> 14:54:56,200 So this is one of those scenarios where our data format, so color channels first, doesn't 9273 14:54:56,200 --> 14:54:59,080 match up with what matplotlib is expecting. 9274 14:54:59,080 --> 14:55:04,280 So matplotlib expects either just height and width, so no color channel for grayscale 9275 14:55:04,280 --> 14:55:08,080 images, or it also expects the color channels to be last. 9276 14:55:08,080 --> 14:55:12,800 So we'll see that later on, but for grayscale, we can get rid of that extra dimension by 9277 14:55:12,800 --> 14:55:16,360 passing in image.squeeze. 9278 14:55:16,360 --> 14:55:18,440 So do you recall what squeeze does? 9279 14:55:18,440 --> 14:55:21,160 It's going to remove that singular dimension. 9280 14:55:21,160 --> 14:55:25,960 If we have a look at what goes on now, beautiful, we get an ankle boot. 9281 14:55:25,960 --> 14:55:31,080 Well, that's a very pixelated ankle boot, but we're only dealing with 28 by 28 pixels, 9282 14:55:31,080 --> 14:55:33,280 so not a very high definition image. 9283 14:55:33,280 --> 14:55:34,760 Let's add the title to it. 9284 14:55:34,760 --> 14:55:37,600 We're going to add in the label. 9285 14:55:37,600 --> 14:55:41,520 Beautiful. 9286 14:55:41,520 --> 14:55:43,360 So we've got the number nine here. 9287 14:55:43,360 --> 14:55:46,960 So, if we go up to here, that's an ankle boot. 9288 14:55:46,960 --> 14:55:48,640 Now let's plot this in grayscale. 9289 14:55:48,640 --> 14:55:49,760 How might we do that? 9290 14:55:49,760 --> 14:55:50,760 We can do the same thing. 9291 14:55:50,760 --> 14:55:52,320 We can go plt.imshow. 9292 14:55:52,320 --> 14:55:56,560 We're going to pass in image.squeeze. 9293 14:55:56,560 --> 14:56:00,120 And we're going to change the color map, cmap equals gray. 9294 14:56:00,120 --> 14:56:03,400 So in matplotlib, if you ever have to change the colors of your plot, you want to look 9295 14:56:03,400 --> 14:56:09,960 into the cmap property or parameter, or sometimes it's also shortened to just c. 9296 14:56:09,960 --> 14:56:14,600 But in this case, imshow uses cmap, and we want to plot a title, and we're going to pull 9297 14:56:14,600 --> 14:56:19,640 in class names and the label integer here. 9298 14:56:19,640 --> 14:56:21,240 So if we have a look at it now, 9299 14:56:21,240 --> 14:56:25,240 we have an ankle boot, and we can remove the axes too if we wanted to, plt.axis, 9300 14:56:25,240 --> 14:56:27,080 and turn that off. 9301 14:56:27,080 --> 14:56:28,600 That's going to remove the axes. 9302 14:56:28,600 --> 14:56:29,600 So there we go. 9303 14:56:29,600 --> 14:56:30,960 That's the type of images that we're dealing with.
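A rough consolidation of the plotting code dictated above might look like this (the dataset download is repeated here only so the snippet stands alone):

import matplotlib.pyplot as plt
from torchvision import datasets
from torchvision.transforms import ToTensor

train_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
class_names = train_data.classes

image, label = train_data[0]              # image shape: [1, 28, 28], color channels first
plt.imshow(image.squeeze(), cmap="gray")  # squeeze removes the single color channel for matplotlib
plt.title(class_names[label])
plt.axis(False)                           # hide the axes
plt.show()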
9304 14:56:30,960 --> 14:56:33,160 But that's only a singular image. 9305 14:56:33,160 --> 14:56:38,480 How about we harness the power of randomness and have a look at some random images from 9306 14:56:38,480 --> 14:56:39,880 our data set? 9307 14:56:39,880 --> 14:56:42,200 So how would we do this? 9308 14:56:42,200 --> 14:56:44,200 Let's go plot more images. 9309 14:56:44,200 --> 14:56:46,680 We'll set a random seed. 9310 14:56:46,680 --> 14:56:51,640 So you and I are both looking at as similar as possible images, 42. 9311 14:56:51,640 --> 14:56:57,360 Now we'll create a plot by calling plt.figure, and we're going to give it a size. 9312 14:56:57,360 --> 14:56:59,520 We might create a nine by nine grid. 9313 14:56:59,520 --> 14:57:03,040 So we want to see nine random images from our data set. 9314 14:57:03,040 --> 14:57:06,800 So rows, cols, or sorry, maybe we'll do four by four. 9315 14:57:06,800 --> 14:57:07,800 That'll give us 16. 9316 14:57:07,800 --> 14:57:16,080 We're going to go for i in range, and we're going to go one to rows times columns plus 9317 14:57:16,080 --> 14:57:17,080 one. 9318 14:57:17,080 --> 14:57:18,320 So we can print i. 9319 14:57:18,320 --> 14:57:20,280 What's that going to give us? 9320 14:57:20,280 --> 14:57:21,960 We want to see 16 images. 9321 14:57:21,960 --> 14:57:24,200 Oh, thereabouts. 9322 14:57:24,200 --> 14:57:30,360 So 16 random images, but used with a manual seed of 42, of our data set. 9323 14:57:30,360 --> 14:57:33,760 This is one of my favorite things to do with any type of data set that I'm looking 9324 14:57:33,760 --> 14:57:36,760 at, whether it be text, image, audio, doesn't matter. 9325 14:57:36,760 --> 14:57:41,800 I like to randomly have a look at a whole bunch of samples at the start so that I can 9326 14:57:41,800 --> 14:57:45,840 become one with the data. 9327 14:57:45,840 --> 14:57:50,240 With that being said, let's use this loop to grab some random indexes. 9328 14:57:50,240 --> 14:57:55,880 We can do so using torch.randint, so a random integer between zero and the length of 9329 14:57:55,880 --> 14:57:57,280 the training data. 9330 14:57:57,280 --> 14:58:01,920 This is going to give us a random integer in the range of zero and however many training 9331 14:58:01,920 --> 14:58:06,760 samples we have, which in our case is what, 60,000 or thereabouts. 9332 14:58:06,760 --> 14:58:11,160 So we want to create the size of one, and we want to get the item from that so that we 9333 14:58:11,160 --> 14:58:12,560 have a random index. 9334 14:58:12,560 --> 14:58:14,320 What is this going to give us? 9335 14:58:14,320 --> 14:58:19,800 Oh, excuse me, maybe we print that out. 9336 14:58:19,800 --> 14:58:20,800 There we go. 9337 14:58:20,800 --> 14:58:21,800 So we have random images. 9338 14:58:21,800 --> 14:58:25,320 Now, because we're using manual seed, it will give us the same numbers every time. 9339 14:58:25,320 --> 14:58:30,520 So we have three, seven, five, four, two, three, seven, five, four, two. 9340 14:58:30,520 --> 14:58:36,240 And then if we just commented out the random seed, we'll get different numbers every time. 9341 14:58:36,240 --> 14:58:39,240 But this is just to demonstrate, we'll keep the manual seed there for now. 9342 14:58:39,240 --> 14:58:43,800 You can comment that out if you want different numbers or different images, different indexes 9343 14:58:43,800 --> 14:58:45,000 each time.
9344 14:58:45,000 --> 14:58:49,640 So we'll create the image and the label by indexing on the training data at the random 9345 14:58:49,640 --> 14:58:53,560 index that we're generating. 9346 14:58:53,560 --> 14:58:55,240 And then we'll create our plot. 9347 14:58:55,240 --> 14:59:02,920 So Fig or we'll add a subplot, Fig add subplot, and we're going to go rows, calls, I. 9348 14:59:02,920 --> 14:59:06,120 So at the if index, we're going to add a subplot. 9349 14:59:06,120 --> 14:59:08,960 Remember, we set rows and columns up to here. 9350 14:59:08,960 --> 14:59:13,280 And then we're going to go PLT dot in show, we're going to show what we're going to show 9351 14:59:13,280 --> 14:59:17,400 our image, but we have to squeeze it to get rid of that singular dimension as the color 9352 14:59:17,400 --> 14:59:18,400 channel. 9353 14:59:18,400 --> 14:59:20,000 Otherwise, we end up with an issue with map plot lib. 9354 14:59:20,000 --> 14:59:21,840 We're going to use a color map of gray. 9355 14:59:21,840 --> 14:59:24,000 So it looks like the image we plotted above. 9356 14:59:24,000 --> 14:59:28,640 And then for our title, it's going to be our class names indexed with our label. 9357 14:59:28,640 --> 14:59:31,400 And then we don't want the accesses because that's going to clutter up our plot. 9358 14:59:31,400 --> 14:59:32,880 Let's see what this looks like. 9359 14:59:32,880 --> 14:59:35,720 Oh my goodness, look at that. 9360 14:59:35,720 --> 14:59:36,720 It worked first. 9361 14:59:36,720 --> 14:59:37,720 Go. 9362 14:59:37,720 --> 14:59:40,560 Usually visualizations take a fair bit of trial and error. 9363 14:59:40,560 --> 14:59:45,320 So we have ankle boots, we have shirts, we have bags, we have ankle boots, sandal, shirt, 9364 14:59:45,320 --> 14:59:46,320 pull over. 9365 14:59:46,320 --> 14:59:52,040 Oh, do you notice something about the data set right now, pull over and shirt? 9366 14:59:52,040 --> 14:59:53,400 To me, they look quite similar. 9367 14:59:53,400 --> 14:59:57,320 Do you think that will cause an issue later on when our model is trying to predict between 9368 14:59:57,320 --> 14:59:59,120 a pull over and a shirt? 9369 14:59:59,120 --> 15:00:02,080 How about if we look at some more images? 9370 15:00:02,080 --> 15:00:06,880 We'll get rid of the random seed so we can have a look at different styles. 9371 15:00:06,880 --> 15:00:13,320 So have a sandal ankle boot coat, t-shirt, top, shirt, oh, is that a little bit confusing 9372 15:00:13,320 --> 15:00:16,960 that we have a class for t-shirt and top and shirt? 9373 15:00:16,960 --> 15:00:24,080 Like I'm not sure about you, but what's the difference between a t-shirt and a shirt? 9374 15:00:24,080 --> 15:00:27,600 This is just something to keep in mind as a t-shirt and top, does that look like it could 9375 15:00:27,600 --> 15:00:30,400 be maybe even a dress? 9376 15:00:30,400 --> 15:00:32,000 Like the shape is there. 9377 15:00:32,000 --> 15:00:34,880 So this is just something to keep in mind going forward. 9378 15:00:34,880 --> 15:00:39,720 The chances are if we get confused on our, like you and I looking at our data set, if 9379 15:00:39,720 --> 15:00:43,600 we get confused about different samples and what they're labeled with, our model might 9380 15:00:43,600 --> 15:00:45,920 get confused later on. 9381 15:00:45,920 --> 15:00:50,360 So let's have a look at one more and then we'll go into the next video. 
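Putting that loop together, the 4x4 grid of random samples looks roughly like this (again assuming train_data and class_names from the narration):

import torch
import matplotlib.pyplot as plt

torch.manual_seed(42)
fig = plt.figure(figsize=(9, 9))
rows, cols = 4, 4
for i in range(1, rows * cols + 1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(rows, cols, i)             # add the i-th subplot to the grid
    plt.imshow(img.squeeze(), cmap="gray")     # drop the color channel dimension for matplotlib
    plt.title(class_names[label])
    plt.axis(False)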
9382 15:00:50,360 --> 15:00:56,200 So we have sneaker, trouser, shirt, sandal, dress, pullover, bag, bag, t-shirt, oh, that's 9383 15:00:56,200 --> 15:00:57,200 quite a difficult one. 9384 15:00:57,200 --> 15:01:00,280 It doesn't look like there's even much going on in that image. 9385 15:01:00,280 --> 15:01:05,080 But the whole premise of building machine learning models to do this would be: could you 9386 15:01:05,080 --> 15:01:10,200 write a program that would take in the shapes of these images and figure it out? Write a rule-based 9387 15:01:10,200 --> 15:01:14,320 program that would go, hey, if it looks like a rectangle with a buckle in the middle, 9388 15:01:14,320 --> 15:01:16,160 it's probably a bag? 9389 15:01:16,160 --> 15:01:22,200 I mean, you probably could after a while, but I prefer to write machine learning algorithms 9390 15:01:22,200 --> 15:01:24,080 to figure out patterns in data. 9391 15:01:24,080 --> 15:01:26,120 So let's start moving towards that. 9392 15:01:26,120 --> 15:01:30,160 We're now going to go on figuring out how we can prepare this data to be loaded into 9393 15:01:30,160 --> 15:01:31,520 a model. 9394 15:01:31,520 --> 15:01:33,200 I'll see you there. 9395 15:01:33,200 --> 15:01:36,480 All right, all right, all right. 9396 15:01:36,480 --> 15:01:42,880 So we've got 60,000 images of clothing that we'd like to build a computer vision model 9397 15:01:42,880 --> 15:01:44,680 to classify into 10 different classes. 9398 15:01:44,680 --> 15:01:49,560 And now that we've visualized a fair few of these samples, do you think that we could 9399 15:01:49,560 --> 15:01:54,760 model these with just linear lines, so straight lines, or do you think we'll need a model 9400 15:01:54,760 --> 15:01:56,640 with nonlinearity? 9401 15:01:56,640 --> 15:01:58,760 So I'm going to write that down. 9402 15:01:58,760 --> 15:02:13,640 So do you think these items of clothing images could be modeled with pure linear lines, or 9403 15:02:13,640 --> 15:02:17,360 do you think we'll need nonlinearity? 9404 15:02:17,360 --> 15:02:20,560 Don't have to answer that now. 9405 15:02:20,560 --> 15:02:21,560 We could test that out later on. 9406 15:02:21,560 --> 15:02:26,600 You might want to skip ahead and try to build a model yourself with linear lines or nonlinearities. 9407 15:02:26,600 --> 15:02:32,560 We've covered linear lines and nonlinearities before, but let's now start to prepare our 9408 15:02:32,560 --> 15:02:38,280 data even further by preparing a data loader. 9409 15:02:38,280 --> 15:02:46,600 So right now, our data is in the form of PyTorch data sets. 9410 15:02:46,600 --> 15:02:48,720 So let's have a look at it. 9411 15:02:48,720 --> 15:02:52,480 Train data. 9412 15:02:52,480 --> 15:02:53,480 There we go. 9413 15:02:53,480 --> 15:02:55,920 So we have data set, which is of fashion MNIST. 9414 15:02:55,920 --> 15:03:01,040 And then if we go test data, we see a similar thing except we have a different number of 9415 15:03:01,040 --> 15:03:02,040 data points. 9416 15:03:02,040 --> 15:03:05,680 We have the same transform on each, we've turned them into tensors. 9417 15:03:05,680 --> 15:03:09,600 So we want to convert them from a data set, which is a collection of all of our data, 9418 15:03:09,600 --> 15:03:10,600 into a data loader. 9419 15:03:10,600 --> 15:03:22,400 Now, a data loader turns our data set into a Python iterable. 9420 15:03:22,400 --> 15:03:26,680 So I'm going to turn this into Markdown, beautiful.
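To make the "Python iterable" point concrete, here is a small sketch of the difference (the full dataloader setup gets written out a little later):

# A Dataset supports len() and indexing, one (image, label) sample at a time
print(len(train_data))          # 60000
image, label = train_data[0]    # a single sample

# A DataLoader wraps the Dataset so that iterating over it yields whole batches
from torch.utils.data import DataLoader
for images, labels in DataLoader(train_data, batch_size=32):
    print(images.shape, labels.shape)   # torch.Size([32, 1, 28, 28]) torch.Size([32])
    break                               # just peek at the first batch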
9421 15:03:26,680 --> 15:03:31,240 More specifically, specific Galilee, can I spell right? 9422 15:03:31,240 --> 15:03:35,680 I don't know, we want to just code right, we're not here to learn spelling. 9423 15:03:35,680 --> 15:03:45,880 We want to turn our data into batches, or mini batches. 9424 15:03:45,880 --> 15:03:48,960 Why would we do this? 9425 15:03:48,960 --> 15:03:54,440 Well, we may get away with it by building a model to look at all 60,000 samples of our 9426 15:03:54,440 --> 15:03:56,840 current data set, because it's quite small. 9427 15:03:56,840 --> 15:04:01,320 It's only comprised of images of 28 by 28 pixels. 9428 15:04:01,320 --> 15:04:07,040 And when I say quite small, yes, 60,000 images is actually quite small for a deep learning 9429 15:04:07,040 --> 15:04:08,640 scale data set. 9430 15:04:08,640 --> 15:04:12,360 Modern data sets could be in the millions of images. 9431 15:04:12,360 --> 15:04:20,320 But if our computer hardware was able to look at 60,000 samples of 28 by 28 at one time, 9432 15:04:20,320 --> 15:04:22,240 it would need a fair bit of memory. 9433 15:04:22,240 --> 15:04:28,240 So we have RAM space up here, we have GPU memory, we have compute memory. 9434 15:04:28,240 --> 15:04:33,480 But chances are that it might not be able to store millions of images in memory. 9435 15:04:33,480 --> 15:04:40,480 So what you do is you break a data set from say 60,000 into groups of batches or mini 9436 15:04:40,480 --> 15:04:41,480 batches. 9437 15:04:41,480 --> 15:04:45,040 So we've seen batch size before, why would we do this? 9438 15:04:45,040 --> 15:04:58,000 Well, one, it is more computationally efficient, as in your computing hardware may not be able 9439 15:04:58,000 --> 15:05:06,320 to look store in memory at 60,000 images in one hit. 9440 15:05:06,320 --> 15:05:11,960 So we break it down to 32 images at a time. 9441 15:05:11,960 --> 15:05:14,800 This would be batch size of 32. 9442 15:05:14,800 --> 15:05:17,400 Now again, 32 is a number that you can change. 9443 15:05:17,400 --> 15:05:22,080 32 is just a common batch size that you'll see with many beginner style problems. 9444 15:05:22,080 --> 15:05:24,400 As you go on, you'll see different batch sizes. 9445 15:05:24,400 --> 15:05:29,200 This is just to exemplify the concept of mini batches, which is very common in deep 9446 15:05:29,200 --> 15:05:30,200 learning. 9447 15:05:30,200 --> 15:05:32,200 And why else would we do this? 9448 15:05:32,200 --> 15:05:41,120 The second point or the second main point is it gives our neural network more chances 9449 15:05:41,120 --> 15:05:45,480 to update its gradients per epoch. 9450 15:05:45,480 --> 15:05:48,880 So what I mean by this, this will make more sense when we write a training loop. 9451 15:05:48,880 --> 15:05:54,120 But if we were to just look at 60,000 images at one time, we would per epoch. 9452 15:05:54,120 --> 15:05:59,440 So per iteration through the data, we would only get one update per epoch across our entire 9453 15:05:59,440 --> 15:06:00,440 data set. 9454 15:06:00,440 --> 15:06:06,360 Whereas if we look at 32 images at a time, our neural network updates its internal states, 9455 15:06:06,360 --> 15:06:10,240 its weights, every 32 images, thanks to the optimizer. 9456 15:06:10,240 --> 15:06:12,800 This will make a lot more sense once we write our training loop. 9457 15:06:12,800 --> 15:06:17,440 But these are the two of the main reasons for turning our data into mini batches in the 9458 15:06:17,440 --> 15:06:18,960 form of a data loader. 
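A rough back-of-the-envelope for those two reasons, using the 28 by 28 grayscale image sizes discussed above (float32, 4 bytes per value):

# 1. Memory: all 60,000 images at once vs. 32 at a time
all_at_once_mb = 60_000 * 1 * 28 * 28 * 4 / 1e6   # about 188 MB just for the input tensors
one_batch_kb = 32 * 1 * 28 * 28 * 4 / 1e3         # about 100 KB per batch of 32
print(all_at_once_mb, one_batch_kb)

# 2. Gradient updates: the optimizer steps once per batch, not once per epoch
updates_per_epoch = 60_000 // 32                  # 1875 updates per epoch instead of 1
print(updates_per_epoch)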
9459 15:06:18,960 --> 15:06:22,640 Now if you'd like to learn more about the theory behind this, I would highly recommend 9460 15:06:22,640 --> 15:06:25,720 looking up Andrew Org mini batches. 9461 15:06:25,720 --> 15:06:28,720 There's a great lecture on that. 9462 15:06:28,720 --> 15:06:33,640 So yeah, large-scale machine learning, mini batch gradient descent, mini batch gradient 9463 15:06:33,640 --> 15:06:34,640 descent. 9464 15:06:34,640 --> 15:06:36,280 Yeah, that's what it's called mini batch gradient descent. 9465 15:06:36,280 --> 15:06:39,960 If you look up some results on that, you'll find a whole bunch of stuff. 9466 15:06:39,960 --> 15:06:48,080 I might just link this one, I'm going to pause that, I'm going to link this in there. 9467 15:06:48,080 --> 15:06:56,880 So for more on mini batches, see here. 9468 15:06:56,880 --> 15:07:00,920 Now to see this visually, I've got a slide prepared for this. 9469 15:07:00,920 --> 15:07:02,480 So this is what we're going to be working towards. 9470 15:07:02,480 --> 15:07:03,920 There's our input and output shapes. 9471 15:07:03,920 --> 15:07:08,480 We want to create batch size of 32 across all of our 60,000 training images. 9472 15:07:08,480 --> 15:07:12,840 And we're actually going to do the same for our testing images, but we only have 10,000 9473 15:07:12,840 --> 15:07:14,360 testing images. 9474 15:07:14,360 --> 15:07:17,640 So this is what our data set's going to look like, batched. 9475 15:07:17,640 --> 15:07:23,240 So we're going to write some code, namely using the data loader from torch.util.data. 9476 15:07:23,240 --> 15:07:26,120 We're going to pass it a data set, which is our train data. 9477 15:07:26,120 --> 15:07:28,880 We're going to give it a batch size, which we can define as whatever we want. 9478 15:07:28,880 --> 15:07:31,560 For us, we're going to use 32 to begin with. 9479 15:07:31,560 --> 15:07:35,360 And we're going to set shuffle equals true if we're using the training data. 9480 15:07:35,360 --> 15:07:38,200 Why would we set shuffle equals true? 9481 15:07:38,200 --> 15:07:43,120 Well, in case our data set for some reason has order, say we had all of the pants images 9482 15:07:43,120 --> 15:07:47,440 in a row, we had all of the T-shirt images in a row, we had all the sandal images in 9483 15:07:47,440 --> 15:07:48,440 a row. 9484 15:07:48,440 --> 15:07:51,640 We don't want our neural network to necessarily remember the order of our data. 9485 15:07:51,640 --> 15:07:56,280 We just want it to remember individual patterns between different classes. 9486 15:07:56,280 --> 15:07:59,440 So we shuffle up the data, we mix it, we mix it up. 9487 15:07:59,440 --> 15:08:02,200 And then it looks something like this. 9488 15:08:02,200 --> 15:08:06,840 So we might have batch number zero, and then we have 32 samples. 9489 15:08:06,840 --> 15:08:12,040 Now I ran out of space when I was creating these, but we got, that was fun up to 32. 9490 15:08:12,040 --> 15:08:14,440 So this is setting batch size equals 32. 9491 15:08:14,440 --> 15:08:17,700 So we look at 32 samples per batch. 9492 15:08:17,700 --> 15:08:22,720 We mix all the samples up, and we go batch, batch, batch, batch, batch, and we'll have, 9493 15:08:22,720 --> 15:08:26,160 however many batches we have, we'll have number of samples divided by the batch size. 9494 15:08:26,160 --> 15:08:31,000 So 60,000 divided by 32, what's that, 1800 or something like that? 9495 15:08:31,000 --> 15:08:33,120 So this is what we're going to be working towards. 
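And a tiny toy illustration of what shuffle=True does (this is not the course code, just a demonstration with a made-up dataset of the numbers 0 to 7):

import torch
from torch.utils.data import DataLoader, TensorDataset

toy_dataset = TensorDataset(torch.arange(8))            # samples 0..7, stored in order
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True)
for epoch in range(2):
    print([batch[0].tolist() for batch in toy_loader])  # a different sample order each pass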
9496 15:08:33,120 --> 15:08:36,320 I did want to write some code in this video, but I think to save it getting too long, we're 9497 15:08:36,320 --> 15:08:38,040 going to write this code in the next video. 9498 15:08:38,040 --> 15:08:42,960 If you would like to give this a go on your own, here's most of the code we have to do. 9499 15:08:42,960 --> 15:08:46,800 So there's the train data loader, do the same for the test data loader. 9500 15:08:46,800 --> 15:08:53,640 And I'll see you in the next video, and we're going to batchify our fashion MNIST data set. 9501 15:08:53,640 --> 15:08:54,720 Welcome back. 9502 15:08:54,720 --> 15:08:59,400 In the last video, we had a brief overview of the concept of mini batches. 9503 15:08:59,400 --> 15:09:05,720 And so rather than our computer looking at 60,000 images in one hit, we break things down. 9504 15:09:05,720 --> 15:09:07,960 We turn it into batches of 32. 9505 15:09:07,960 --> 15:09:11,280 Again, the batch size will vary depending on what problem you're working on. 9506 15:09:11,280 --> 15:09:15,280 But 32 is quite a good value to start with and try out. 9507 15:09:15,280 --> 15:09:20,000 And we do this for two main reasons, if we jump back to the code, why would we do this? 9508 15:09:20,000 --> 15:09:22,160 It is more computationally efficient. 9509 15:09:22,160 --> 15:09:27,920 So if we have a GPU with, say, 10 gigabytes of memory, it might not be able to store all 9510 15:09:27,920 --> 15:09:29,920 60,000 images in one hit. 9511 15:09:29,920 --> 15:09:35,000 In our data set, because it's quite small, it may be hour or two, but it's better practice 9512 15:09:35,000 --> 15:09:38,400 for later on to turn things into mini batches. 9513 15:09:38,400 --> 15:09:42,600 And it also gives our neural network more chances to update its gradients per epoch, 9514 15:09:42,600 --> 15:09:45,680 which will make a lot more sense once we write our training loop. 9515 15:09:45,680 --> 15:09:48,240 But for now, we've spoken enough about the theory. 9516 15:09:48,240 --> 15:09:50,640 Let's write some code to do so. 9517 15:09:50,640 --> 15:09:57,120 So I'm going to import data loader from torch dot utils dot data, import data loader. 9518 15:09:57,120 --> 15:10:02,680 And this principle, by the way, preparing a data loader goes the same for not only images, 9519 15:10:02,680 --> 15:10:08,520 but for text, for audio, whatever sort of data you're working with, mini batches will 9520 15:10:08,520 --> 15:10:13,440 follow you along or batches of data will follow you along throughout a lot of different deep 9521 15:10:13,440 --> 15:10:15,280 learning problems. 9522 15:10:15,280 --> 15:10:18,960 So set up the batch size hyper parameter. 9523 15:10:18,960 --> 15:10:23,560 Remember, a hyper parameter is a value that you can set yourself. 9524 15:10:23,560 --> 15:10:25,960 So batch size equals 32. 9525 15:10:25,960 --> 15:10:27,600 And it's practice. 9526 15:10:27,600 --> 15:10:29,040 You might see it typed as capitals. 9527 15:10:29,040 --> 15:10:34,480 You won't always see it, but you'll see it quite often a hyper parameter typed as capitals. 9528 15:10:34,480 --> 15:10:38,760 And then we're going to turn data sets into iterables. 9529 15:10:38,760 --> 15:10:40,920 So batches. 9530 15:10:40,920 --> 15:10:46,040 So we're going to create a train data loader here of our fashion MNIST data set. 9531 15:10:46,040 --> 15:10:47,720 We're going to use data loader. 9532 15:10:47,720 --> 15:10:49,680 We're going to see what the doc string is. 
9533 15:10:49,680 --> 15:10:56,280 Or actually, let's look at the documentation torch data loader. 9534 15:10:56,280 --> 15:11:01,040 This is some extra curriculum for you too, by the way, is to read this data page torch 9535 15:11:01,040 --> 15:11:05,800 utils not data because no matter what problem you're going with with deep learning or pytorch, 9536 15:11:05,800 --> 15:11:07,520 you're going to be working with data. 9537 15:11:07,520 --> 15:11:09,960 So spend 10 minutes just reading through here. 9538 15:11:09,960 --> 15:11:13,880 I think I might have already assigned this, but this is just so important that it's worth 9539 15:11:13,880 --> 15:11:15,600 going through again. 9540 15:11:15,600 --> 15:11:17,120 Read through all of this. 9541 15:11:17,120 --> 15:11:20,440 Even if you don't understand all of it, what's going on, it's just it helps you know where 9542 15:11:20,440 --> 15:11:22,400 to look for certain things. 9543 15:11:22,400 --> 15:11:23,840 So what does it take? 9544 15:11:23,840 --> 15:11:25,080 Data loader takes a data set. 9545 15:11:25,080 --> 15:11:28,280 We need to set the batch size to something is the default of one. 9546 15:11:28,280 --> 15:11:32,400 That means that it would create a batch of one image at a time in our case. 9547 15:11:32,400 --> 15:11:33,920 Do we want to shuffle it? 9548 15:11:33,920 --> 15:11:35,520 Do we want to use a specific sampler? 9549 15:11:35,520 --> 15:11:37,920 There's a few more things going on. 9550 15:11:37,920 --> 15:11:39,120 Number of workers. 9551 15:11:39,120 --> 15:11:44,120 Number of workers stands for how many cores on our machine do we want to use to load data? 9552 15:11:44,120 --> 15:11:47,240 Generally the higher the better for this one, but we're going to keep most of these as 9553 15:11:47,240 --> 15:11:51,960 the default because most of them are set to pretty good values to begin with. 9554 15:11:51,960 --> 15:11:55,160 I'll let you read more into the other parameters here. 9555 15:11:55,160 --> 15:12:00,960 We're going to focus on the first three data set batch size and shuffle true or false. 9556 15:12:00,960 --> 15:12:02,840 Let's see what we can do. 9557 15:12:02,840 --> 15:12:08,200 So data set equals our train data, which is 60,000 fashion MNIST. 9558 15:12:08,200 --> 15:12:12,760 And then we have a batch size, which we're going to set to our batch size hyper parameter. 9559 15:12:12,760 --> 15:12:15,480 So we're going to have a batch size of 32. 9560 15:12:15,480 --> 15:12:17,800 And then finally, do we want to shuffle the training data? 9561 15:12:17,800 --> 15:12:19,960 Yes, we do. 9562 15:12:19,960 --> 15:12:23,360 And then we're going to do the same thing for the test data loader, except we're not 9563 15:12:23,360 --> 15:12:25,120 going to shuffle the test data. 9564 15:12:25,120 --> 15:12:31,800 Now, you can shuffle the test data if you want, but in my practice, it's actually easier 9565 15:12:31,800 --> 15:12:36,840 to evaluate different models when the test data isn't shuffled. 9566 15:12:36,840 --> 15:12:39,640 So you shuffle the training data to remove order. 9567 15:12:39,640 --> 15:12:41,640 And so your model doesn't learn order. 9568 15:12:41,640 --> 15:12:46,960 But for evaluation purposes, it's generally good to have your test data in the same order 9569 15:12:46,960 --> 15:12:51,480 because our model will never actually see the test data set during training. 9570 15:12:51,480 --> 15:12:53,800 We're just using it for evaluation. 
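The dataloader setup being written here looks roughly like this (BATCH_SIZE, train_data and test_data named as in the narration):

from torch.utils.data import DataLoader

BATCH_SIZE = 32  # hyperparameter: how many samples per batch

train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              shuffle=True)   # shuffle so the model can't learn the order of the data

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             shuffle=False)   # no shuffle: keeps evaluation consistent between runs

print(f"Length of train_dataloader: {len(train_dataloader)} batches of {BATCH_SIZE}")
print(f"Length of test_dataloader: {len(test_dataloader)} batches of {BATCH_SIZE}")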
9571 15:12:53,800 --> 15:12:56,680 So the order doesn't really matter to the test data loader. 9572 15:12:56,680 --> 15:13:01,160 It's just easier if we don't shuffle it, because then if we evaluate it multiple times, it's 9573 15:13:01,160 --> 15:13:03,680 not been shuffled every single time. 9574 15:13:03,680 --> 15:13:05,360 So let's run that. 9575 15:13:05,360 --> 15:13:14,640 And then we're going to check it out, our train data loader and our test data loader. 9576 15:13:14,640 --> 15:13:15,640 Beautiful. 9577 15:13:15,640 --> 15:13:20,400 Instances of torch utils data, data loader, data loader. 9578 15:13:20,400 --> 15:13:25,480 And now let's check out what we've created, hey, I always like to print different attributes 9579 15:13:25,480 --> 15:13:28,840 of whatever we make, check out what we've created. 9580 15:13:28,840 --> 15:13:32,280 This is all part of becoming one with the data. 9581 15:13:32,280 --> 15:13:40,920 So print F, I'm going to go data loaders, and then pass in, this is just going to output 9582 15:13:40,920 --> 15:13:43,760 basically the exact same as what we've got above. 9583 15:13:43,760 --> 15:13:45,840 This data loader. 9584 15:13:45,840 --> 15:13:50,120 And we can also see what attributes we can get from each of these by going train data 9585 15:13:50,120 --> 15:13:51,120 loader. 9586 15:13:51,120 --> 15:13:54,680 I don't need caps lock there, train data loader, full stop. 9587 15:13:54,680 --> 15:13:55,680 And then we can go tab. 9588 15:13:55,680 --> 15:13:57,520 We've got a whole bunch of different attributes. 9589 15:13:57,520 --> 15:13:58,960 We've got a batch size. 9590 15:13:58,960 --> 15:13:59,960 We've got our data set. 9591 15:13:59,960 --> 15:14:05,760 Do we want to drop the last as in if our batch size overlapped with our 60,000 samples? 9592 15:14:05,760 --> 15:14:07,760 Do we want to get rid of the last batch? 9593 15:14:07,760 --> 15:14:10,440 Say for example, the last batch only had 10 samples. 9594 15:14:10,440 --> 15:14:11,960 Do we want to just drop that? 9595 15:14:11,960 --> 15:14:14,840 Do we want to pin the memory that's going to help later on if we wanted to load our 9596 15:14:14,840 --> 15:14:15,840 data faster? 9597 15:14:15,840 --> 15:14:18,000 A whole bunch of different stuff here. 9598 15:14:18,000 --> 15:14:22,680 If you'd like to research more, you can find all the stuff about what's going on here in 9599 15:14:22,680 --> 15:14:24,680 the documentation. 9600 15:14:24,680 --> 15:14:26,400 But let's just keep pushing forward. 9601 15:14:26,400 --> 15:14:27,800 What else do we want to know? 9602 15:14:27,800 --> 15:14:31,800 So let's find the length of the train data loader. 9603 15:14:31,800 --> 15:14:37,560 We will go length train data loader. 9604 15:14:37,560 --> 15:14:42,640 So this is going to tell us how many batches there are, batches of, which of course is batch 9605 15:14:42,640 --> 15:14:44,560 size. 9606 15:14:44,560 --> 15:14:51,120 And we want print length of test data loader. 9607 15:14:51,120 --> 15:14:59,720 We want length test data loader batches of batch size dot dot dot. 9608 15:14:59,720 --> 15:15:01,160 So let's find out some information. 9609 15:15:01,160 --> 15:15:02,160 What do we have? 9610 15:15:02,160 --> 15:15:03,760 Oh, there we go. 9611 15:15:03,760 --> 15:15:06,760 So just we're seeing what we saw before with this one. 9612 15:15:06,760 --> 15:15:08,560 But this is more interesting here. 9613 15:15:08,560 --> 15:15:09,560 Length of train data loader. 
9614 15:15:09,560 --> 15:15:13,280 Yeah, we have about 1,875 batches of 32. 9615 15:15:13,280 --> 15:15:20,560 So if we do 60,000 training samples divided by 32, yeah, it comes out to 1,875. 9616 15:15:20,560 --> 15:15:26,720 And if we did the same with 10,000 testing samples divided by 32, it comes out at 313. 9617 15:15:26,720 --> 15:15:27,720 This gets rounded up. 9618 15:15:27,720 --> 15:15:32,480 So this is what I meant, that the last batch will have maybe not 32 because 32 doesn't 9619 15:15:32,480 --> 15:15:36,160 divide evenly into 10,000, but that's okay. 9620 15:15:36,160 --> 15:15:44,920 And so this means that our model is going to look at 1,875 individual batches of 32 9621 15:15:44,920 --> 15:15:49,800 images, rather than just one big batch of 60,000 images. 9622 15:15:49,800 --> 15:15:55,480 Now of course, the number of batches we have will change if we change the batch size. 9623 15:15:55,480 --> 15:15:58,400 So with a batch size of 128, we have 469 batches of 128. 9624 15:15:58,400 --> 15:16:01,200 And if we reduce this down to one, what do we get? 9625 15:16:01,200 --> 15:16:03,040 We have a batch per sample. 9626 15:16:03,040 --> 15:16:09,040 So 60,000 batches of 1, 10,000 batches of 1, we're going to stick with 32. 9627 15:16:09,040 --> 15:16:10,520 But now let's visualize. 9628 15:16:10,520 --> 15:16:12,520 So we've got them in train data loader. 9629 15:16:12,520 --> 15:16:17,400 How would we visualize a batch or a single image from a batch? 9630 15:16:17,400 --> 15:16:18,840 So let's show a sample. 9631 15:16:18,840 --> 15:16:21,640 I'll show you how you can interact with a data loader. 9632 15:16:21,640 --> 15:16:25,040 We're going to use randomness as well. 9633 15:16:25,040 --> 15:16:31,480 So we'll set a manual seed and then we'll get a random index, random idx equals torch 9634 15:16:31,480 --> 15:16:33,560 randint. 9635 15:16:33,560 --> 15:16:37,880 We're going to go from zero to length of train features batch. 9636 15:16:37,880 --> 15:16:39,960 Oh, where did I get that from? 9637 15:16:39,960 --> 15:16:40,960 Excuse me. 9638 15:16:40,960 --> 15:16:42,320 Getting ahead of myself here. 9639 15:16:42,320 --> 15:16:48,440 I want to check out what's inside the training data loader. 9640 15:16:48,440 --> 15:16:51,320 We'll check out what's inside the training data loader because the test data loader is 9641 15:16:51,320 --> 15:16:53,360 going to be similar. 9642 15:16:53,360 --> 15:16:55,680 So we want the train features batch. 9643 15:16:55,680 --> 15:17:01,120 So I say features as in the images themselves and the train labels batch is going to be 9644 15:17:01,120 --> 15:17:05,740 the labels of our data set or the targets in pytorch terminology. 9645 15:17:05,740 --> 15:17:08,040 So next, iter, train data loader. 9646 15:17:08,040 --> 15:17:16,600 So because our data loader has 1875 batches of 32, we're going to turn it into an iterable 9647 15:17:16,600 --> 15:17:24,840 with iter and we're going to get the next batch with next and then we can go here train features 9648 15:17:24,840 --> 15:17:32,080 batch.shape and we'll get train labels batch.shape. 9649 15:17:32,080 --> 15:17:34,520 What do you think this is going to give us? 9650 15:17:34,520 --> 15:17:35,520 Well, there we go. 9651 15:17:35,520 --> 15:17:36,520 Look at that. 9652 15:17:36,520 --> 15:17:37,520 So we have a tensor. 9653 15:17:37,520 --> 15:17:40,440 In each batch, we have 32 samples. 9654 15:17:40,440 --> 15:17:45,680 So this is batch size and this is color channels and this is height and this is width.
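The batch-peeking line being typed here is roughly:

# Grab the first batch of images and labels from the (shuffled) training dataloader
train_features_batch, train_labels_batch = next(iter(train_dataloader))
print(train_features_batch.shape)  # torch.Size([32, 1, 28, 28]) -> [batch_size, color_channels, height, width]
print(train_labels_batch.shape)    # torch.Size([32]) -> one label per sample in the batch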
9655 15:17:45,680 --> 15:17:49,800 And then we have 32 labels associated with the 32 samples. 9656 15:17:49,800 --> 15:17:56,040 Now where have we seen this before, if we go back through our keynote input and output 9657 15:17:56,040 --> 15:17:57,040 shapes. 9658 15:17:57,040 --> 15:18:00,080 So we have shape equals 32, 28, 28, 1. 9659 15:18:00,080 --> 15:18:06,280 So this is color channels last, but ours is currently in color channels first. 9660 15:18:06,280 --> 15:18:11,520 Now again, I sound like a broken record here, but these will vary depending on the problem 9661 15:18:11,520 --> 15:18:12,720 you're working with. 9662 15:18:12,720 --> 15:18:17,640 If we had larger images, what would change or the height and width dimensions would change. 9663 15:18:17,640 --> 15:18:21,960 If we had color images, the color dimension would change, but the premise is still the 9664 15:18:21,960 --> 15:18:22,960 same. 9665 15:18:22,960 --> 15:18:26,880 We're turning our data into batches so that we can pass that to a model. 9666 15:18:26,880 --> 15:18:27,960 Let's come back. 9667 15:18:27,960 --> 15:18:30,240 Let's keep going with our visualization. 9668 15:18:30,240 --> 15:18:36,720 So we want to visualize one of the random samples from a batch and then we're going to 9669 15:18:36,720 --> 15:18:44,000 go image label equals train features batch and we're going to get the random IDX from 9670 15:18:44,000 --> 15:18:50,920 that and we'll get the train labels batch and we'll get the random IDX from that. 9671 15:18:50,920 --> 15:18:56,920 So we're matching up on the, we've got one batch here, train features batch, train labels 9672 15:18:56,920 --> 15:19:03,040 batch and we're just getting the image and the label at a random index within that batch. 9673 15:19:03,040 --> 15:19:07,600 So excuse me, I need to set this equal there. 9674 15:19:07,600 --> 15:19:13,080 And then we're going to go PLT dot in show, what are we going to show? 9675 15:19:13,080 --> 15:19:16,440 We're going to show the image but we're going to have to squeeze it to remove that singular 9676 15:19:16,440 --> 15:19:23,000 dimension and then we'll set the C map equal to gray and then we'll go PLT dot title, we'll 9677 15:19:23,000 --> 15:19:29,440 set the title which is going to be the class names indexed by the label integer and then 9678 15:19:29,440 --> 15:19:32,240 we can turn off the accesses. 9679 15:19:32,240 --> 15:19:37,960 You can use off here or you can use false, depends on what you'd like to use. 9680 15:19:37,960 --> 15:19:43,480 Let's print out the image size because you can never know enough about your data and 9681 15:19:43,480 --> 15:19:55,920 then print, let's also get the label, label and label shape or label size. 9682 15:19:55,920 --> 15:20:02,960 Our label will be just a single integer so it might not have a shape but that's okay. 9683 15:20:02,960 --> 15:20:03,960 Let's have a look. 9684 15:20:03,960 --> 15:20:04,960 Oh, bag. 9685 15:20:04,960 --> 15:20:07,800 See, look, that's quite hard to understand. 9686 15:20:07,800 --> 15:20:10,200 I wouldn't be able to detect that that's a bag. 9687 15:20:10,200 --> 15:20:12,840 Can you tell me that you could write a program to understand that? 9688 15:20:12,840 --> 15:20:15,640 That just looks like a warped rectangle to me. 9689 15:20:15,640 --> 15:20:19,600 But if we had to look at another one, we'll get another random, oh, we've got a random 9690 15:20:19,600 --> 15:20:23,480 seed so it's going to produce the same image each time. 
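And the single-sample visualization from that batch, sketched (train_features_batch and train_labels_batch from the snippet above, class_names as before):

import torch
import matplotlib.pyplot as plt

torch.manual_seed(42)
random_idx = torch.randint(0, len(train_features_batch), size=[1]).item()
img, label = train_features_batch[random_idx], train_labels_batch[random_idx]
plt.imshow(img.squeeze(), cmap="gray")
plt.title(class_names[label])
plt.axis("off")
print(f"Image size: {img.shape}")                     # torch.Size([1, 28, 28])
print(f"Label: {label}, label shape: {label.shape}")  # a single integer, so its shape is torch.Size([])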
9691 15:20:23,480 --> 15:20:25,760 So we have a shirt, okay, a shirt. 9692 15:20:25,760 --> 15:20:28,280 So we see the image size there, 128, 28. 9693 15:20:28,280 --> 15:20:34,240 Now, recall that the image size is, it's a single image so it doesn't have a batch dimension. 9694 15:20:34,240 --> 15:20:37,840 So this is just color channels height width. 9695 15:20:37,840 --> 15:20:44,520 We'll go again, label four, which is a coat and we could keep doing this to become more 9696 15:20:44,520 --> 15:20:45,800 and more familiar with our data. 9697 15:20:45,800 --> 15:20:52,680 But these are all from this particular batch that we created here, coat and we'll do one 9698 15:20:52,680 --> 15:20:53,680 more, another coat. 9699 15:20:53,680 --> 15:20:55,600 We'll do one more just to make sure it's not a coat. 9700 15:20:55,600 --> 15:20:56,600 There we go. 9701 15:20:56,600 --> 15:20:57,600 We've got a bag. 9702 15:20:57,600 --> 15:20:58,600 Beautiful. 9703 15:20:58,600 --> 15:21:02,200 So we've now turned our data into data loaders. 9704 15:21:02,200 --> 15:21:07,880 So we could use these to pass them into a model, but we don't have a model. 9705 15:21:07,880 --> 15:21:12,280 So I think it's time in the next video, we start to build model zero. 9706 15:21:12,280 --> 15:21:14,440 We start to build a baseline. 9707 15:21:14,440 --> 15:21:17,840 I'll see you in the next video. 9708 15:21:17,840 --> 15:21:18,840 Welcome back. 9709 15:21:18,840 --> 15:21:24,120 So in the last video, we got our data sets or our data set into data loaders. 9710 15:21:24,120 --> 15:21:31,040 So now we have 1,875 batches of 32 images off of the training data set rather than 60,000 9711 15:21:31,040 --> 15:21:33,040 in a one big data set. 9712 15:21:33,040 --> 15:21:38,960 And we have 13 or 313 batches of 32 for the test data set. 9713 15:21:38,960 --> 15:21:41,760 Then we learned how to visualize it from a batch. 9714 15:21:41,760 --> 15:21:47,280 And we saw that we have still the same image size, one color channel, 28, 28. 9715 15:21:47,280 --> 15:21:52,720 All we've done is we've turned them into batches so that we can pass them to our model. 9716 15:21:52,720 --> 15:21:55,480 And speaking of model, let's have a look at our workflow. 9717 15:21:55,480 --> 15:21:56,480 Where are we up to? 9718 15:21:56,480 --> 15:21:58,120 Well, we've got our data ready. 9719 15:21:58,120 --> 15:22:03,840 We've turned it into tensors through a combination of torch vision transforms, torch utils data 9720 15:22:03,840 --> 15:22:04,840 dot data set. 9721 15:22:04,840 --> 15:22:08,320 We didn't have to use that one because torch vision dot data sets did it for us with the 9722 15:22:08,320 --> 15:22:11,360 fashion MNIST data set, but we did use that one. 9723 15:22:11,360 --> 15:22:18,000 We did torch utils dot data, the data loader to turn our data sets into data loaders. 9724 15:22:18,000 --> 15:22:21,840 Now we're up to building or picking a pre-trained model to suit your problem. 9725 15:22:21,840 --> 15:22:23,720 So let's start simply. 9726 15:22:23,720 --> 15:22:25,720 Let's build a baseline model. 9727 15:22:25,720 --> 15:22:29,640 And this is very exciting because we're going to build our first model, our first computer 9728 15:22:29,640 --> 15:22:33,560 vision model, albeit a baseline, but that's an important step. 9729 15:22:33,560 --> 15:22:35,880 So I'm just going to write down here. 
9730 15:22:35,880 --> 15:22:46,520 When starting to build a series of machine learning modeling experiments, it's best practice 9731 15:22:46,520 --> 15:22:48,880 to start with a baseline model. 9732 15:22:48,880 --> 15:22:55,440 I'm going to turn this into markdown. 9733 15:22:55,440 --> 15:22:57,200 A baseline model. 9734 15:22:57,200 --> 15:23:01,520 So a baseline model is a simple model. 9735 15:23:01,520 --> 15:23:12,080 You will try and improve upon with subsequent models, models slash experiments. 9736 15:23:12,080 --> 15:23:22,080 So you start simply, in other words, start simply and add complexity when necessary because 9737 15:23:22,080 --> 15:23:24,240 neural networks are pretty powerful, right? 9738 15:23:24,240 --> 15:23:28,760 And so they have a tendency to almost do too well on our data set. 9739 15:23:28,760 --> 15:23:33,040 That's a concept known as overfitting, which we'll cover a little bit more later. 9740 15:23:33,040 --> 15:23:35,960 But we built a simple model to begin with, a baseline. 9741 15:23:35,960 --> 15:23:40,800 And then our whole goal will be to run experiments, according to the workflow, improve through 9742 15:23:40,800 --> 15:23:41,800 experimentation. 9743 15:23:41,800 --> 15:23:43,000 Again, this is just a guide. 9744 15:23:43,000 --> 15:23:47,200 It's not set in stone, but this is the general pattern of how things go. 9745 15:23:47,200 --> 15:23:51,520 Get data ready, build a model, fit the model, evaluate, improve the model. 9746 15:23:51,520 --> 15:23:54,440 So the first model that we build is generally a baseline. 9747 15:23:54,440 --> 15:23:57,640 And then later on, we want to improve through experimentation. 9748 15:23:57,640 --> 15:23:59,840 So let's start building a baseline. 9749 15:23:59,840 --> 15:24:03,320 But I'm going to introduce to you a new layer that we haven't seen before. 9750 15:24:03,320 --> 15:24:06,040 That is creating a flatten layer. 9751 15:24:06,040 --> 15:24:07,480 Now what is a flatten layer? 9752 15:24:07,480 --> 15:24:11,760 Well, this is best seen when we code it out. 9753 15:24:11,760 --> 15:24:15,800 So let's create a flatten model, which is just going to be nn.flatten. 9754 15:24:15,800 --> 15:24:18,000 And where could we find the documentation for this? 9755 15:24:18,000 --> 15:24:24,720 We go nn flatten, flatten in pytorch, what does it do? 9756 15:24:24,720 --> 15:24:30,000 Flattens a continuous range of dims into a tensor, for use with sequential. 9757 15:24:30,000 --> 15:24:35,200 So there's an example there, but I'd rather, if and doubt, code it out. 9758 15:24:35,200 --> 15:24:36,880 So we'll create the flatten layer. 9759 15:24:36,880 --> 15:24:42,240 And of course, all nn.flatten or nn.modules could be used as a model on their own. 9760 15:24:42,240 --> 15:24:48,480 So we're going to get a single sample. 9761 15:24:48,480 --> 15:24:52,680 So x equals train features batch. 9762 15:24:52,680 --> 15:24:54,320 Let's get the first one, zero. 9763 15:24:54,320 --> 15:24:56,000 What does this look like? 9764 15:24:56,000 --> 15:25:04,240 So it's a tensor, x, maybe we get the shape of it as well, x shape. 9765 15:25:04,240 --> 15:25:05,240 What do we get? 9766 15:25:05,240 --> 15:25:06,240 There we go. 9767 15:25:06,240 --> 15:25:07,580 So that's the shape of x. 9768 15:25:07,580 --> 15:25:09,760 Keep that in mind when we pass it through the flatten layer. 9769 15:25:09,760 --> 15:25:13,600 Do you have an inkling of what flatten might do? 
9770 15:25:13,600 --> 15:25:18,120 So our shape to begin with is what, 128, 28. 9771 15:25:18,120 --> 15:25:22,440 Now let's flatten the sample. 9772 15:25:22,440 --> 15:25:28,120 So output equals, we're going to pass it to the flatten model, x. 9773 15:25:28,120 --> 15:25:32,200 So this is going to perform the forward pass internally on the flatten layer. 9774 15:25:32,200 --> 15:25:34,760 So perform forward pass. 9775 15:25:34,760 --> 15:25:37,480 Now let's print out what happened. 9776 15:25:37,480 --> 15:25:49,160 Print, shape before flattening equals x dot shape. 9777 15:25:49,160 --> 15:25:56,640 And we're going to print shape after flattening equals output dot shape. 9778 15:25:56,640 --> 15:26:02,200 So we're just taking the output of the flatten model and printing its shape here. 9779 15:26:02,200 --> 15:26:06,400 Oh, do you notice what happened? 9780 15:26:06,400 --> 15:26:12,200 Well we've gone from 128, 28 to 1784. 9781 15:26:12,200 --> 15:26:16,200 Wow what does the output look like? 9782 15:26:16,200 --> 15:26:17,200 Output. 9783 15:26:17,200 --> 15:26:23,720 Oh, the values are now in all one big vector and if we squeeze that we can remove the extra 9784 15:26:23,720 --> 15:26:24,720 dimension. 9785 15:26:24,720 --> 15:26:27,920 So we've got one big vector of values. 9786 15:26:27,920 --> 15:26:29,840 Now where did this number come from? 9787 15:26:29,840 --> 15:26:33,800 Well, if we take this and this is what shape is it? 9788 15:26:33,800 --> 15:26:34,960 We've got color channels. 9789 15:26:34,960 --> 15:26:35,960 We've got height. 9790 15:26:35,960 --> 15:26:46,840 We've got width and now we've flattened it to be color channels, height, width. 9791 15:26:46,840 --> 15:26:53,240 So we've got one big feature vector because 28 by 28 equals what? 9792 15:26:53,240 --> 15:26:58,400 We've got one value per pixel, 784. 9793 15:26:58,400 --> 15:27:02,400 One value per pixel in our output vector. 9794 15:27:02,400 --> 15:27:06,200 Now where did we see this before? 9795 15:27:06,200 --> 15:27:10,760 If we go back to our keynote, if we have a look at Tesla's takes eight cameras and then 9796 15:27:10,760 --> 15:27:16,560 it turns it into a three dimensional vector space, vector space. 9797 15:27:16,560 --> 15:27:17,960 So that's what we're trying to do here. 9798 15:27:17,960 --> 15:27:22,200 We're trying to encode whatever data we're working with in Tesla's case. 9799 15:27:22,200 --> 15:27:24,200 They have eight cameras. 9800 15:27:24,200 --> 15:27:28,200 Now theirs has more dimensions than ours because they have the time aspect because they're 9801 15:27:28,200 --> 15:27:30,400 dealing with video and they have multiple different camera angles. 9802 15:27:30,400 --> 15:27:32,360 We're just dealing with a single image here. 9803 15:27:32,360 --> 15:27:34,520 But regardless, the concept is the same. 9804 15:27:34,520 --> 15:27:39,440 We're trying to condense information down into a single vector space. 9805 15:27:39,440 --> 15:27:43,280 And so if we come back to here, why might we do this? 9806 15:27:43,280 --> 15:27:48,280 Well, it's because we're going to build a baseline model and we're going to use a linear 9807 15:27:48,280 --> 15:27:50,360 layer as the baseline model. 9808 15:27:50,360 --> 15:27:53,480 And the linear layer can't handle multi dimensional data like this. 9809 15:27:53,480 --> 15:27:56,960 We want it to have a single vector as input. 9810 15:27:56,960 --> 15:28:00,600 Now this will make a lot more sense after we've coded up our model. 
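The flatten demo coded in this section is roughly:

from torch import nn

flatten_model = nn.Flatten()            # flattens every dimension except the first one
x = train_features_batch[0]             # a single sample: [color_channels, height, width] = [1, 28, 28]
output = flatten_model(x)               # perform the forward pass through the flatten layer

print(f"Shape before flattening: {x.shape}")      # torch.Size([1, 28, 28])
print(f"Shape after flattening: {output.shape}")  # torch.Size([1, 784]) because 28 * 28 = 784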
9811 15:28:00,600 --> 15:28:11,520 Let's do that from torch import and then we're going to go class, fashion, amnest, model 9812 15:28:11,520 --> 15:28:12,520 V zero. 9813 15:28:12,520 --> 15:28:16,520 We're going to inherit from an end dot module. 9814 15:28:16,520 --> 15:28:19,960 And inside here, we're going to have an init function in the constructor. 9815 15:28:19,960 --> 15:28:22,040 We're going to pass in self. 9816 15:28:22,040 --> 15:28:26,840 We're going to have an input shape, which we'll use a type hint, which will take an integer 9817 15:28:26,840 --> 15:28:31,040 because remember, input shape is very important for machine learning models. 9818 15:28:31,040 --> 15:28:34,600 We're going to define a number of hidden units, which will also be an integer, and then we're 9819 15:28:34,600 --> 15:28:38,280 going to define our output shape, which will be what do you think our output shape will 9820 15:28:38,280 --> 15:28:39,280 be? 9821 15:28:39,280 --> 15:28:41,920 How many classes are we dealing with? 9822 15:28:41,920 --> 15:28:43,320 We're dealing with 10 different classes. 9823 15:28:43,320 --> 15:28:47,560 So our output shape will be, I'll save that for later on. 9824 15:28:47,560 --> 15:28:53,360 I'll let you guess for now, or you might already know, we're going to initialize it. 9825 15:28:53,360 --> 15:28:56,720 And then we're going to create our layer stack. 9826 15:28:56,720 --> 15:29:02,920 self.layer stack equals nn.sequential, recall that sequential, whatever you put inside sequential, 9827 15:29:02,920 --> 15:29:06,600 if data goes through sequential, it's going to go through it layer by layer. 9828 15:29:06,600 --> 15:29:10,120 So let's create our first layer, which is going to be nn.flatten. 9829 15:29:10,120 --> 15:29:15,600 So that means anything that comes into this first layer, what's going to happen to it? 9830 15:29:15,600 --> 15:29:18,720 It's going to flatten its external dimensions here. 9831 15:29:18,720 --> 15:29:22,600 So it's going to flatten these into something like this. 9832 15:29:22,600 --> 15:29:25,800 So we're going to flatten it first, flatten our data. 9833 15:29:25,800 --> 15:29:28,640 Then we're going to pass in our linear layer. 9834 15:29:28,640 --> 15:29:33,840 And we're going to have how many n features this is going to be input shape, because we're 9835 15:29:33,840 --> 15:29:36,360 going to define our input shape here. 9836 15:29:36,360 --> 15:29:43,160 And then we're going to go out features, equals hidden units. 9837 15:29:43,160 --> 15:29:46,040 And then we're going to create another linear layer here. 9838 15:29:46,040 --> 15:29:50,040 And we're going to set up n features, equals hidden units. 9839 15:29:50,040 --> 15:29:51,560 Why are we doing this? 9840 15:29:51,560 --> 15:29:54,280 And then out features equals output shape. 9841 15:29:54,280 --> 15:29:58,840 Why are we putting the same out features here as the n features here? 9842 15:29:58,840 --> 15:30:05,600 Well, because subsequent layers, the input of this layer here, its input shape has to 9843 15:30:05,600 --> 15:30:09,000 line up with the output shape of this layer here. 9844 15:30:09,000 --> 15:30:15,560 Hence why we use out features as hidden units for the output of this nn.linear layer. 9845 15:30:15,560 --> 15:30:22,200 And then we use n features as hidden units for the input value of this hidden layer here. 9846 15:30:22,200 --> 15:30:24,160 So let's keep going. 9847 15:30:24,160 --> 15:30:25,160 Let's go def. 
9848 15:30:25,160 --> 15:30:30,040 We'll create the forward pass here, because if we subclass nn.module, we have to override 9849 15:30:30,040 --> 15:30:31,640 the forward method. 9850 15:30:31,640 --> 15:30:33,760 The forward method is going to define what? 9851 15:30:33,760 --> 15:30:38,000 It's going to define the forward computation of our model. 9852 15:30:38,000 --> 15:30:43,560 So we're just going to return self.layer stack of x. 9853 15:30:43,560 --> 15:30:49,280 So our model is going to take some input, x, which could be here, x. 9854 15:30:49,280 --> 15:30:53,400 In our case, it's going to be a batch at a time, and then it's going to pass each sample 9855 15:30:53,400 --> 15:30:54,400 through the flatten layer. 9856 15:30:54,400 --> 15:30:59,080 It's going to pass the output of the flatten layer to this first linear layer, and it's 9857 15:30:59,080 --> 15:31:04,240 going to pass the output of this linear layer to this linear layer. 9858 15:31:04,240 --> 15:31:05,240 So that's it. 9859 15:31:05,240 --> 15:31:08,640 Our model is just two linear layers with a flatten layer. 9860 15:31:08,640 --> 15:31:11,480 The flatten layer has no learnable parameters. 9861 15:31:11,480 --> 15:31:13,680 Only these two do. 9862 15:31:13,680 --> 15:31:16,400 And we have no nonlinearities. 9863 15:31:16,400 --> 15:31:18,920 So do you think this will work? 9864 15:31:18,920 --> 15:31:21,080 Does our data set need nonlinearities? 9865 15:31:21,080 --> 15:31:26,920 Well, we can find out once we fit our model to the data, but let's set up an instance 9866 15:31:26,920 --> 15:31:27,920 of our model. 9867 15:31:27,920 --> 15:31:33,000 So torch dot manual seed. 9868 15:31:33,000 --> 15:31:38,320 Let's go set up model with input parameters. 9869 15:31:38,320 --> 15:31:44,800 So we have model zero equals fashion MNIST model, which is just the same class that we 9870 15:31:44,800 --> 15:31:47,440 wrote above. 9871 15:31:47,440 --> 15:31:53,600 And here's where we're going to define the input shape equals 784. 9872 15:31:53,600 --> 15:31:56,920 Where will I get that from? 9873 15:31:56,920 --> 15:31:59,320 Well, that's here. 9874 15:31:59,320 --> 15:32:00,600 That's 28 by 28. 9875 15:32:00,600 --> 15:32:04,800 So the output of flatten needs to be the input shape here. 9876 15:32:04,800 --> 15:32:10,240 So we could put 28 by 28 there, or we're just going to put 784 and then write a comment 9877 15:32:10,240 --> 15:32:11,240 here. 9878 15:32:11,240 --> 15:32:14,440 This is 28 by 28. 9879 15:32:14,440 --> 15:32:22,400 Now if we go, I wonder if nn.linear will tell us, nn.linear will tell us what it expects 9880 15:32:22,400 --> 15:32:25,560 as in features. 9881 15:32:25,560 --> 15:32:32,760 Size of each input sample, shape, where star means any number of dimensions, including 9882 15:32:32,760 --> 15:32:38,720 none in features, linear weight, well, let's figure it out. 9883 15:32:38,720 --> 15:32:43,120 Let's see what happens if in doubt coded out, hey, we'll see what we can do. 9884 15:32:43,120 --> 15:32:47,240 In units equals, let's go with 10 to begin with. 9885 15:32:47,240 --> 15:32:53,120 How many units in the hidden layer? 9886 15:32:53,120 --> 15:32:57,200 And then the output shape is going to be what? 9887 15:32:57,200 --> 15:33:06,560 Output shape is length of class names, which will be 1 for every class. 9888 15:33:06,560 --> 15:33:07,800 Beautiful. 9889 15:33:07,800 --> 15:33:09,320 And now let's go model zero. 9890 15:33:09,320 --> 15:33:11,760 We're going to keep it on the CPU to begin with. 
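Putting the class and its instance together as narrated (a sketch; 784 = 28 * 28 flattened pixels, while 10 hidden units and len(class_names) outputs are the values chosen in the video):

import torch
from torch import nn

class FashionMNISTModelV0(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # [batch, 1, 28, 28] -> [batch, 784]
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.Linear(in_features=hidden_units, out_features=output_shape)  # in_features matches the previous out_features
        )

    def forward(self, x):
        return self.layer_stack(x)

torch.manual_seed(42)
model_0 = FashionMNISTModelV0(input_shape=784,
                              hidden_units=10,
                              output_shape=len(class_names)).to("cpu")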
9891 15:33:11,760 --> 15:33:16,880 We could write device-agnostic code, but to begin, we're going to send it to the CPU. 9892 15:33:16,880 --> 15:33:20,320 I might just put that up here, actually, to CPU. 9893 15:33:20,320 --> 15:33:25,240 And then let's have a look at model zero. 9894 15:33:25,240 --> 15:33:26,800 Wonderful. 9895 15:33:26,800 --> 15:33:29,280 So we can try to do a dummy forward pass and see what happens. 9896 15:33:29,280 --> 15:33:37,600 So let's create dummy x equals torch, rand, we'll create it as the same size of image. 9897 15:33:37,600 --> 15:33:38,600 Just a singular image. 9898 15:33:38,600 --> 15:33:43,920 So this is going to be a batch of one, color channel one, height 28, height 28. 9899 15:33:43,920 --> 15:33:50,080 And we're going to go model zero and pass through dummy x. 9900 15:33:50,080 --> 15:33:54,680 So this is going to send dummy x through the forward method. 9901 15:33:54,680 --> 15:33:56,600 Let's see what happens. 9902 15:33:56,600 --> 15:33:59,000 Okay, wonderful. 9903 15:33:59,000 --> 15:34:06,280 So we get an output of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 logits. 9904 15:34:06,280 --> 15:34:07,280 Beautiful. 9905 15:34:07,280 --> 15:34:08,280 That's exactly what we want. 9906 15:34:08,280 --> 15:34:11,600 We have one logit value per class that we have. 9907 15:34:11,600 --> 15:34:15,760 Now what would happen if we got rid of flatten? 9908 15:34:15,760 --> 15:34:18,960 Then we ran this, ran this, ran this. 9909 15:34:18,960 --> 15:34:20,520 What do we get? 9910 15:34:20,520 --> 15:34:25,560 Oh, mat one and mat two shapes cannot be multiplied. 9911 15:34:25,560 --> 15:34:29,160 So we have 28 by 28 and 7. 9912 15:34:29,160 --> 15:34:34,800 Okay, what happens if we change our input shape to 28? 9913 15:34:34,800 --> 15:34:37,560 We're getting shape mismatches here. 9914 15:34:37,560 --> 15:34:38,560 What happens here? 9915 15:34:38,560 --> 15:34:45,080 Oh, okay, we get an interesting output, but this is still not the right shape, is it? 9916 15:34:45,080 --> 15:34:46,600 So that's where the flatten layer comes in. 9917 15:34:46,600 --> 15:34:48,600 What is the shape of this? 9918 15:34:48,600 --> 15:34:52,440 Oh, we get 1, 1, 28, 10. 9919 15:34:52,440 --> 15:34:58,560 Oh, so that's why we put in flatten so that it combines it into a vector. 9920 15:34:58,560 --> 15:35:01,560 So we get rid of this, see if we just leave it in this shape? 9921 15:35:01,560 --> 15:35:05,680 We get 28 different samples of 10, which is not what we want. 9922 15:35:05,680 --> 15:35:09,200 We want to compress our image into a singular vector and pass it in. 9923 15:35:09,200 --> 15:35:13,240 So let's reinstanceuate the flatten layer and let's make sure we've got the right input 9924 15:35:13,240 --> 15:35:19,080 shape here, 28 by 28, and let's pass it through, torch size 110. 9925 15:35:19,080 --> 15:35:22,960 That's exactly what we want, 1 logit per class. 9926 15:35:22,960 --> 15:35:27,320 So this could be a bit fiddly when you first start, but it's also a lot of fun once you 9927 15:35:27,320 --> 15:35:28,800 get it to work. 9928 15:35:28,800 --> 15:35:32,400 And so just keep that in mind, I showed you what it looks like when you have an error. 9929 15:35:32,400 --> 15:35:37,320 One of the biggest errors that you're going to face in machine learning is different tensor 9930 15:35:37,320 --> 15:35:39,200 shape mismatches. 
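The dummy forward pass used here for shape-checking, roughly:

import torch

dummy_x = torch.rand([1, 1, 28, 28])  # [batch_size, color_channels, height, width]
output = model_0(dummy_x)             # calls the model's forward() method
print(output.shape)                   # torch.Size([1, 10]) -> one raw logit per class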
9931 15:35:39,200 --> 15:35:43,960 So just keep in mind the data that you're working with and then have a look at the documentation 9932 15:35:43,960 --> 15:35:48,000 for what input shape certain layers expect. 9933 15:35:48,000 --> 15:35:51,840 So with that being said, I think it's now time that we start moving towards training 9934 15:35:51,840 --> 15:35:53,360 our model. 9935 15:35:53,360 --> 15:35:56,800 I'll see you in the next video. 9936 15:35:56,800 --> 15:35:57,800 Welcome back. 9937 15:35:57,800 --> 15:36:02,400 In the last video, we created model zero, which is going to be our baseline model for 9938 15:36:02,400 --> 15:36:08,560 our computer vision problem of detecting different types of clothing in 28 by 28 gray scale 9939 15:36:08,560 --> 15:36:09,560 images. 9940 15:36:09,560 --> 15:36:14,800 And we also learned the concept of making sure our or we rehashed on the concept of 9941 15:36:14,800 --> 15:36:19,560 making sure our input and output shapes line up with where they need to be. 9942 15:36:19,560 --> 15:36:22,800 We also did a dummy forward pass with some dummy data. 9943 15:36:22,800 --> 15:36:26,880 This is a great way to troubleshoot to see if your model shapes are correct. 9944 15:36:26,880 --> 15:36:32,000 If they come out correctly and if the inputs are lining up with where they need to be. 9945 15:36:32,000 --> 15:36:37,440 And just to rehash on what our model is going to be or what's inside our model, if we check 9946 15:36:37,440 --> 15:36:44,520 model zero state dict, what we see here is that our first layer has a weight tensor. 9947 15:36:44,520 --> 15:36:51,480 It also has a bias and our next layer has a weight tensor and it also has a bias. 9948 15:36:51,480 --> 15:36:56,840 So these are of course initialized with random values, but the whole premise of deep learning 9949 15:36:56,840 --> 15:37:02,200 and machine learning is to pass data through our model and use our optimizer to update 9950 15:37:02,200 --> 15:37:07,000 these random values to better represent the features in our data. 9951 15:37:07,000 --> 15:37:10,560 And I keep saying features, but I just want to rehash on that before we move on to the 9952 15:37:10,560 --> 15:37:12,040 next thing. 9953 15:37:12,040 --> 15:37:14,400 Featuring data could be almost anything. 9954 15:37:14,400 --> 15:37:17,400 So for example, the feature of this bag could be that it's got a rounded handle at the 9955 15:37:17,400 --> 15:37:18,400 top. 9956 15:37:18,400 --> 15:37:19,400 It has a edge over here. 9957 15:37:19,400 --> 15:37:21,000 It has an edge over there. 9958 15:37:21,000 --> 15:37:25,680 Now, we aren't going to tell our model what features to learn about the data. 9959 15:37:25,680 --> 15:37:29,680 The whole premise of it is to, or the whole fun, the whole magic behind machine learning 9960 15:37:29,680 --> 15:37:33,520 is that it figures out what features to learn. 9961 15:37:33,520 --> 15:37:39,680 And so that is what the weights and bias matrices or tensors will represent is different features 9962 15:37:39,680 --> 15:37:40,840 in our images. 9963 15:37:40,840 --> 15:37:45,960 And there could be many because we have 60,000 images of 10 classes. 9964 15:37:45,960 --> 15:37:46,960 So let's keep pushing forward. 9965 15:37:46,960 --> 15:37:50,960 It's now time to set up a loss function and an optimizer. 9966 15:37:50,960 --> 15:37:58,160 Speaking of optimizers, so 3.1 set up loss optimizer and evaluation metrics. 
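To see the weight and bias tensors just mentioned (continuing from the model sketch above, so the exact key names and shapes may differ slightly in your own notebook):

for name, param in model_0.state_dict().items():
    print(name, param.shape)
# Expected: a weight of shape [10, 784] and a bias of shape [10] for the first linear layer,
# and a weight of shape [10, 10] and a bias of shape [10] for the second one.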
9967 15:37:58,160 --> 15:38:02,760 Now recall in notebook two, I'm going to turn this into markdown. 9968 15:38:02,760 --> 15:38:06,200 We created, oh, I don't need an emoji there. 9969 15:38:06,200 --> 15:38:10,160 So this is, by the way, we're just moving through this workflow. 9970 15:38:10,160 --> 15:38:11,680 We've got our data ready into tensors. 9971 15:38:11,680 --> 15:38:12,960 We've built a baseline model. 9972 15:38:12,960 --> 15:38:16,240 It's now time to pick a loss function and an optimizer. 9973 15:38:16,240 --> 15:38:20,240 So we go back to Google Chrome. 9974 15:38:20,240 --> 15:38:21,880 That's right here. 9975 15:38:21,880 --> 15:38:22,880 Loss function. 9976 15:38:22,880 --> 15:38:24,520 What's our loss function going to be? 9977 15:38:24,520 --> 15:38:35,280 Since we're working with multi-class data, our loss function will be NN dot cross entropy 9978 15:38:35,280 --> 15:38:37,280 loss. 9979 15:38:37,280 --> 15:38:43,840 And our optimizer, we've got a few options here with the optimizer, but we've had practice 9980 15:38:43,840 --> 15:38:49,480 in the past with SGD, which stands for stochastic gradient descent and the atom optimizer. 9981 15:38:49,480 --> 15:38:56,240 So our optimizer, let's just stick with SGD, which is kind of the entry level optimizer 9982 15:38:56,240 --> 15:39:05,800 torch opt in SGD for stochastic gradient descent. 9983 15:39:05,800 --> 15:39:17,400 And finally, our evaluation metric, since we're working on a classification problem, let's 9984 15:39:17,400 --> 15:39:25,160 use accuracy as our evaluation metric. 9985 15:39:25,160 --> 15:39:28,680 So recall that accuracy is a classification evaluation metric. 9986 15:39:28,680 --> 15:39:30,240 Now, where can we find this? 9987 15:39:30,240 --> 15:39:37,280 Well, if we go into learnpytorch.io, this is the beauty of having online reference material. 9988 15:39:37,280 --> 15:39:42,400 In here, neural network classification with PyTorch, in this notebook, section 02, we 9989 15:39:42,400 --> 15:39:45,960 created, do we have different classification methods? 9990 15:39:45,960 --> 15:39:47,360 Yes, we did. 9991 15:39:47,360 --> 15:39:51,840 So we've got a whole bunch of different options here for classification evaluation metrics. 9992 15:39:51,840 --> 15:39:56,280 We've got accuracy, precision, recall, F1 score, a confusion matrix. 9993 15:39:56,280 --> 15:39:57,960 Now we have some code that we could use. 9994 15:39:57,960 --> 15:40:02,160 If we wanted to use torch metrics for accuracy, we could. 9995 15:40:02,160 --> 15:40:06,960 And torch metrics is a beautiful library that has a lot of evaluation. 9996 15:40:06,960 --> 15:40:09,800 Oh, it doesn't exist. 9997 15:40:09,800 --> 15:40:11,440 What happened to torch metrics? 9998 15:40:11,440 --> 15:40:13,440 Maybe I need to fix that. 9999 15:40:13,440 --> 15:40:15,440 Link. 10000 15:40:15,440 --> 15:40:20,800 Torch metrics has a whole bunch of different PyTorch metrics. 10001 15:40:20,800 --> 15:40:23,400 So very useful library. 10002 15:40:23,400 --> 15:40:29,880 But we also coded a function in here, which is accuracy FN. 10003 15:40:29,880 --> 15:40:35,040 So we could copy this, straight into our notebook here. 10004 15:40:35,040 --> 15:40:40,200 Or I've also, if we go to the PyTorch deep learning GitHub, I'll just bring it over here. 10005 15:40:40,200 --> 15:40:43,440 I've also put it in helper functions.py. 
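The accuracy function being referred to simply measures the percentage of predictions that match the labels; a minimal version looks like this (the helper_functions.py version in the course repo may differ in details):

import torch

def accuracy_fn(y_true, y_pred):
    """Returns accuracy as a percentage by comparing predicted labels to true labels."""
    correct = torch.eq(y_true, y_pred).sum().item()  # count how many predictions match
    return (correct / len(y_pred)) * 100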
10006 15:40:43,440 --> 15:40:48,140 And this is a script of common functions that we've used throughout the course, including 10007 15:40:48,140 --> 15:40:51,000 if we find accuracy function here. 10008 15:40:51,000 --> 15:40:52,000 Calculate accuracy. 10009 15:40:52,000 --> 15:40:58,280 Now, how would we get this helper functions file, this Python file, into our notebook? 10010 15:40:58,280 --> 15:41:01,560 One way is to just copy the code itself, straight here. 10011 15:41:01,560 --> 15:41:04,080 But let's import it as a Python script. 10012 15:41:04,080 --> 15:41:09,200 So import request, and we're going to go from pathlib import path. 10013 15:41:09,200 --> 15:41:14,880 So we want to download, and this is actually what you're going to see, very common practice 10014 15:41:14,880 --> 15:41:20,000 in larger Python projects, especially deep learning and machine learning projects, is 10015 15:41:20,000 --> 15:41:24,040 different functionality split up in different Python files. 10016 15:41:24,040 --> 15:41:27,440 And that way, you don't have to keep rewriting the same code over and over again. 10017 15:41:27,440 --> 15:41:30,520 Like you know how we've written a training and testing loop a fair few times? 10018 15:41:30,520 --> 15:41:35,640 Well, if we've written it once and it works, we might want to save that to a.py file so 10019 15:41:35,640 --> 15:41:37,560 we can import it later on. 10020 15:41:37,560 --> 15:41:42,360 So let's now write some code to import this helper functions.py file into our notebook 10021 15:41:42,360 --> 15:41:43,360 here. 10022 15:41:43,360 --> 15:41:49,720 So download helper functions from learn pytorch repo. 10023 15:41:49,720 --> 15:41:57,720 So we're going to check if our helper functions.py, if this already exists, we don't want 10024 15:41:57,720 --> 15:41:59,000 to download it. 10025 15:41:59,000 --> 15:42:07,960 So we'll print helper functions.py already exists, skipping download, skipping download 10026 15:42:07,960 --> 15:42:08,960 .dot. 10027 15:42:08,960 --> 15:42:11,520 And we're going to go else here. 10028 15:42:11,520 --> 15:42:20,560 If it doesn't exist, so we're going to download it, downloading helper functions.py. 10029 15:42:20,560 --> 15:42:27,720 And we're going to create a request here with the request library equals request.get. 10030 15:42:27,720 --> 15:42:31,400 Now here's where we have to pass in the URL of this file. 10031 15:42:31,400 --> 15:42:33,240 It's not this URL here. 10032 15:42:33,240 --> 15:42:38,360 When dealing with GitHub, to get the actual URL to the files, many files, you have to 10033 15:42:38,360 --> 15:42:39,600 click the raw button. 10034 15:42:39,600 --> 15:42:43,440 So I'll just go back and show you, click raw here. 10035 15:42:43,440 --> 15:42:45,160 And we're going to copy this raw URL. 10036 15:42:45,160 --> 15:42:47,160 See how it's just text here? 10037 15:42:47,160 --> 15:42:50,920 This is what we want to download into our co-lab notebook. 10038 15:42:50,920 --> 15:42:54,520 And we're going to write it in there, request equals request.get. 10039 15:42:54,520 --> 15:42:59,760 And we're going to go with open, and here's where we're going to save our helper functions 10040 15:42:59,760 --> 15:43:01,240 .py. 10041 15:43:01,240 --> 15:43:05,920 We're going to write binary as file, F is for file. 10042 15:43:05,920 --> 15:43:10,360 We're going to go F.write, request.content. 
10043 15:43:10,360 --> 15:43:16,280 So what this is saying is Python is going to create a file called helper functions.py 10044 15:43:16,280 --> 15:43:22,160 and give it write binary permissions as F, F is for file, short for file. 10045 15:43:22,160 --> 15:43:28,240 And then we're going to say F.write, request, get that information from helper functions 10046 15:43:28,240 --> 15:43:33,600 .py here, and write your content to this file here. 10047 15:43:33,600 --> 15:43:39,440 So let's give that a shot. 10048 15:43:39,440 --> 15:43:41,840 Beautiful. 10049 15:43:41,840 --> 15:43:45,320 So downloading helper functions.py, let's have a look in here. 10050 15:43:45,320 --> 15:43:47,520 Do we have helper functions.py? 10051 15:43:47,520 --> 15:43:49,720 Yes, we do. 10052 15:43:49,720 --> 15:43:50,720 Wonderful. 10053 15:43:50,720 --> 15:43:53,800 We can import our accuracy function. 10054 15:43:53,800 --> 15:43:54,800 Where is it? 10055 15:43:54,800 --> 15:43:55,800 There we go. 10056 15:43:55,800 --> 15:43:57,760 Import accuracy function. 10057 15:43:57,760 --> 15:44:02,480 So this is very common practice when writing lots of Python code is to put helper functions 10058 15:44:02,480 --> 15:44:04,600 into.py scripts. 10059 15:44:04,600 --> 15:44:10,760 So let's import the accuracy metric. 10060 15:44:10,760 --> 15:44:13,360 Accuracy metric from helper functions. 10061 15:44:13,360 --> 15:44:15,480 Of course, we could have used torch metrics as well. 10062 15:44:15,480 --> 15:44:19,600 That's another perfectly valid option, but I just thought I'd show you what it's like 10063 15:44:19,600 --> 15:44:23,800 to import your own helper function script. 10064 15:44:23,800 --> 15:44:27,760 Of course, you can customize helper functions.py to have whatever you want in there. 10065 15:44:27,760 --> 15:44:28,760 So see this? 10066 15:44:28,760 --> 15:44:32,640 We've got from helper functions, import accuracy function. 10067 15:44:32,640 --> 15:44:33,640 What's this saying? 10068 15:44:33,640 --> 15:44:34,640 Could not be resolved. 10069 15:44:34,640 --> 15:44:37,200 Is this going to work? 10070 15:44:37,200 --> 15:44:38,280 It did. 10071 15:44:38,280 --> 15:44:43,760 And where you can go accuracy function, do we get a doc string? 10072 15:44:43,760 --> 15:44:45,560 Hmm. 10073 15:44:45,560 --> 15:44:47,720 Seems like colab isn't picking things up, but that's all right. 10074 15:44:47,720 --> 15:44:48,720 It looks like it still worked. 10075 15:44:48,720 --> 15:44:52,040 We'll find out later on if it actually works when we train our model. 10076 15:44:52,040 --> 15:44:58,000 So set up loss function and optimizer. 10077 15:44:58,000 --> 15:45:04,640 So I'm going to set up the loss function equals nn dot cross entropy loss. 10078 15:45:04,640 --> 15:45:10,000 And I'm going to set up the optimizer here as we discussed before as torch dot opt-in 10079 15:45:10,000 --> 15:45:13,120 dot SGD for stochastic gradient descent. 10080 15:45:13,120 --> 15:45:18,480 The parameters I want to optimize are the parameters from model zero, our baseline model, 10081 15:45:18,480 --> 15:45:21,440 which we had a look at before, which are all these random numbers. 10082 15:45:21,440 --> 15:45:25,360 We'd like our optimizer to tweak them in some way, shape, or form to better represent our 10083 15:45:25,360 --> 15:45:26,680 data. 10084 15:45:26,680 --> 15:45:28,680 And then I'm going to set the learning rate here. 10085 15:45:28,680 --> 15:45:30,520 How much should they be tweaked each epoch? 
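Put together, the download cell being written here comes out to something like the following. The raw URL shown is an example of the link you get from GitHub's Raw button; double-check it against the repository you're actually using.

    import requests
    from pathlib import Path

    # Example raw URL - verify this against the course repo before relying on it
    url = "https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py"

    if Path("helper_functions.py").is_file():
        print("helper_functions.py already exists, skipping download")
    else:
        print("Downloading helper_functions.py")
        request = requests.get(url)
        with open("helper_functions.py", "wb") as f:  # "wb" = write binary
            f.write(request.content)

    # Import the accuracy function we just downloaded
    from helper_functions import accuracy_fn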
10086 15:45:30,520 --> 15:45:32,880 I'm going to set it to 0.1. 10087 15:45:32,880 --> 15:45:36,080 Nice and high because our data set is quite simple. 10088 15:45:36,080 --> 15:45:37,720 It's 28 by 28 images. 10089 15:45:37,720 --> 15:45:39,240 There are 60,000 of them. 10090 15:45:39,240 --> 15:45:44,320 But again, if this doesn't work, we can always adjust this and experiment, experiment, 10091 15:45:44,320 --> 15:45:45,320 experiment. 10092 15:45:45,320 --> 15:45:46,320 So let's run that. 10093 15:45:46,320 --> 15:45:47,320 We've got a loss function. 10094 15:45:47,320 --> 15:45:49,680 Is this going to give me a doc string? 10095 15:45:49,680 --> 15:45:50,880 There we go. 10096 15:45:50,880 --> 15:45:54,040 So calculates accuracy between truth and predictions. 10097 15:45:54,040 --> 15:45:55,880 Now, where does this doc string come from? 10098 15:45:55,880 --> 15:45:59,960 Well, let's have a look, hope of functions. 10099 15:45:59,960 --> 15:46:02,520 That's what we wrote before. 10100 15:46:02,520 --> 15:46:06,880 Good on us for writing good doc strings, accuracy function. 10101 15:46:06,880 --> 15:46:12,200 Well, we're going to test all these out in the next video when we write a training loop. 10102 15:46:12,200 --> 15:46:18,600 So, oh, actually, I think we might do one more function before we write a training loop. 10103 15:46:18,600 --> 15:46:21,440 How about we create a function to time our experiments? 10104 15:46:21,440 --> 15:46:24,080 Yeah, let's give that a go in the next video. 10105 15:46:24,080 --> 15:46:27,200 I'll see you there. 10106 15:46:27,200 --> 15:46:28,200 Welcome back. 10107 15:46:28,200 --> 15:46:32,920 In the last video, we downloaded our helper functions.py script and imported our accuracy 10108 15:46:32,920 --> 15:46:35,920 function that we made in notebook two. 10109 15:46:35,920 --> 15:46:40,080 But we could really beef this up, our helper functions.py file. 10110 15:46:40,080 --> 15:46:43,080 We could put a lot of different helper functions in there and import them so we didn't have 10111 15:46:43,080 --> 15:46:44,080 to rewrite them. 10112 15:46:44,080 --> 15:46:46,560 That's just something to keep in mind for later on. 10113 15:46:46,560 --> 15:46:51,720 But now, let's create a function to time our experiments. 10114 15:46:51,720 --> 15:46:55,040 So creating a function to time our experiments. 10115 15:46:55,040 --> 15:47:00,520 So one of the things about machine learning is that it's very experimental. 10116 15:47:00,520 --> 15:47:03,160 You've probably gathered that so far. 10117 15:47:03,160 --> 15:47:04,640 So let's write here. 10118 15:47:04,640 --> 15:47:10,120 So machine learning is very experimental. 10119 15:47:10,120 --> 15:47:18,800 Two of the main things you'll often want to track are, one, your model's performance 10120 15:47:18,800 --> 15:47:24,800 such as its loss and accuracy values, et cetera. 10121 15:47:24,800 --> 15:47:29,200 And two, how fast it runs. 10122 15:47:29,200 --> 15:47:36,000 So usually you want a higher performance and a fast model, that's the ideal scenario. 10123 15:47:36,000 --> 15:47:40,400 However, you could imagine that if you increase your model's performance, you might have 10124 15:47:40,400 --> 15:47:41,680 a bigger neural network. 10125 15:47:41,680 --> 15:47:43,280 It might have more layers. 10126 15:47:43,280 --> 15:47:45,760 It might have more hidden units. 10127 15:47:45,760 --> 15:47:49,760 It might degrade how fast it runs because you're simply making more calculations. 
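In code, the setup just described looks roughly like this (assuming the model zero built earlier; the learning rate of 0.1 is the value chosen above and is worth experimenting with):

    import torch
    from torch import nn

    # Multi-class classification, so cross entropy loss
    loss_fn = nn.CrossEntropyLoss()

    # Stochastic gradient descent over model_0's parameters
    optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)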
10128 15:47:49,760 --> 15:47:52,440 So there's often a trade-off between these two. 10129 15:47:52,440 --> 15:47:58,120 And how fast it runs will really be important if you're running a model, say, on the internet 10130 15:47:58,120 --> 15:48:02,360 or say on a dedicated GPU or say on a mobile device. 10131 15:48:02,360 --> 15:48:05,280 So these are two things to really keep in mind. 10132 15:48:05,280 --> 15:48:10,080 So because we're tracking our model's performance with our loss value and our accuracy function, 10133 15:48:10,080 --> 15:48:14,240 let's now write some code to check how fast it runs. 10134 15:48:14,240 --> 15:48:18,520 And I did on purpose above, I kept our model on the CPU. 10135 15:48:18,520 --> 15:48:23,240 So we're also going to compare later on how fast our model runs on the CPU versus how 10136 15:48:23,240 --> 15:48:25,760 fast it runs on the GPU. 10137 15:48:25,760 --> 15:48:28,200 So that's something that's coming up. 10138 15:48:28,200 --> 15:48:30,040 Let's write a function here. 10139 15:48:30,040 --> 15:48:32,680 We're going to use the time module from Python. 10140 15:48:32,680 --> 15:48:37,800 So from time it, import the default timer, as I'm going to call it timer. 10141 15:48:37,800 --> 15:48:44,880 So if we go Python default timer, do we get the documentation for, here we go, time it. 10142 15:48:44,880 --> 15:48:49,640 So do we have default timer, wonderful. 10143 15:48:49,640 --> 15:48:55,240 So the default timer, which is always time.perf counter, you can read more about Python timing 10144 15:48:55,240 --> 15:48:57,040 functions in here. 10145 15:48:57,040 --> 15:49:01,040 But this is essentially just going to say, hey, this is the exact time that our code 10146 15:49:01,040 --> 15:49:02,040 started. 10147 15:49:02,040 --> 15:49:05,800 And then we're going to create another stop for when our code stopped. 10148 15:49:05,800 --> 15:49:07,640 And then we're going to compare the start and stop times. 10149 15:49:07,640 --> 15:49:11,360 And that's going to basically be how long our model took to train. 10150 15:49:11,360 --> 15:49:16,080 So we're going to go def print train time. 10151 15:49:16,080 --> 15:49:18,320 This is just going to be a display function. 10152 15:49:18,320 --> 15:49:24,480 So start, we're going to get the float type hint, by the way, start an end time. 10153 15:49:24,480 --> 15:49:29,480 So the essence of this function will be to compare start and end time. 10154 15:49:29,480 --> 15:49:36,600 And we're going to set the torch or the device here, we'll pass this in as torch dot device. 10155 15:49:36,600 --> 15:49:40,480 And we're going to set that default to none, because we want to compare how fast our model 10156 15:49:40,480 --> 15:49:42,560 runs on different devices. 10157 15:49:42,560 --> 15:49:49,000 So I'm just going to write a little doc string here, prints, difference between start and 10158 15:49:49,000 --> 15:49:51,000 end time. 10159 15:49:51,000 --> 15:49:54,680 And then of course, we could add more there for the arguments, but that's a quick one liner. 10160 15:49:54,680 --> 15:49:56,380 Tell us what our function does. 10161 15:49:56,380 --> 15:49:59,560 So total time equals end minus start. 10162 15:49:59,560 --> 15:50:05,560 And then print, we're going to write here train time on, whichever device we're using 10163 15:50:05,560 --> 15:50:08,680 might be CPU, might be GPU. 
10164 15:50:08,680 --> 15:50:16,960 Total time equals, we'll go to three and we'll say seconds, three decimal places that is 10165 15:50:16,960 --> 15:50:20,840 and return total time. 10166 15:50:20,840 --> 15:50:21,840 Beautiful. 10167 15:50:21,840 --> 15:50:32,880 So for example, we could do start time equals timer, and then end time equals timer. 10168 15:50:32,880 --> 15:50:37,640 And then we can put in here some code between those two. 10169 15:50:37,640 --> 15:50:44,200 And then if we go print train, oh, maybe we need a timer like this, we'll find out if 10170 15:50:44,200 --> 15:50:48,560 and out code it out, you know, we'll see if it works. 10171 15:50:48,560 --> 15:50:57,400 Start time and end equals end time and device equals. 10172 15:50:57,400 --> 15:51:04,400 We're running on the CPU right now, CPU, let's see if this works, wonderful. 10173 15:51:04,400 --> 15:51:07,640 So it's a very small number here. 10174 15:51:07,640 --> 15:51:13,880 So train time on CPU, very small number, because the start time is basically on this 10175 15:51:13,880 --> 15:51:19,760 exact line, comment basically it takes no time to run, then end time is on here, we get 10176 15:51:19,760 --> 15:51:25,600 3.304 times 10 to the power of negative five. 10177 15:51:25,600 --> 15:51:30,120 So quite a small number, but if we put some modeling code in here, it's going to measure 10178 15:51:30,120 --> 15:51:35,360 the start time of this cell, it's going to model our code in there, then we have the 10179 15:51:35,360 --> 15:51:39,240 end time, and then we find out how long our model took the train. 10180 15:51:39,240 --> 15:51:44,160 So with that being said, I think we've got all of the pieces of the puzzle for creating 10181 15:51:44,160 --> 15:51:47,120 some training and testing functions. 10182 15:51:47,120 --> 15:51:50,200 So we've got a loss function, we've got an optimizer, we've got a valuation metric, we've 10183 15:51:50,200 --> 15:51:55,280 got a timing function, we've got a model, we've got some data. 10184 15:51:55,280 --> 15:51:59,320 How about we train our first baseline computer vision model in the next video? 10185 15:51:59,320 --> 15:52:01,880 I'll see you there. 10186 15:52:01,880 --> 15:52:04,080 Good morning. 10187 15:52:04,080 --> 15:52:06,800 Well might not be morning wherever you are in the world. 10188 15:52:06,800 --> 15:52:10,880 It's nice and early here, I'm up recording some videos, because we have a lot of momentum 10189 15:52:10,880 --> 15:52:14,560 going with this, but look at this, I took a little break last night, I have a runtime 10190 15:52:14,560 --> 15:52:19,480 disconnected, but this is just what's going to happen if you're using Google Colab. 10191 15:52:19,480 --> 15:52:24,320 Since I use Google Colab Pro, completely unnecessary for the course, but I just found it worth 10192 15:52:24,320 --> 15:52:30,080 it for how much I use Google Colab, I get longer idle timeouts, so that means that my 10193 15:52:30,080 --> 15:52:33,440 Colab notebook will stay persistent for a longer time. 10194 15:52:33,440 --> 15:52:38,760 But of course overnight it's going to disconnect, so I click reconnect, and then if I want to 10195 15:52:38,760 --> 15:52:45,640 get back to wherever we were, because we downloaded some data from torchvision.datasets, I have 10196 15:52:45,640 --> 15:52:47,800 to rerun all of these cells. 
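Written out, the timing helper and a quick usage example look roughly like this:

    import torch
    from timeit import default_timer as timer

    def print_train_time(start: float, end: float, device: torch.device = None):
        """Prints difference between start and end time."""
        total_time = end - start
        print(f"Train time on {device}: {total_time:.3f} seconds")
        return total_time

    # Example usage: time whatever code sits between the two timer() calls
    start_time = timer()
    # ... some code to be timed ...
    end_time = timer()
    print_train_time(start=start_time, end=end_time, device="cpu")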
10197 15:52:47,800 --> 15:52:53,680 So a nice shortcut, we might have seen this before, is to just come down to where we were, 10198 15:52:53,680 --> 15:52:58,560 and if all the code above works, oh there we go, I wrote myself some notes of where we're 10199 15:52:58,560 --> 15:53:01,120 up to. 10200 15:53:01,120 --> 15:53:05,400 Let's go run before, so this is just going to run all the cells above, and we're up 10201 15:53:05,400 --> 15:53:11,640 to here, 3.3 creating a training loop, and training a model on batches of data. 10202 15:53:11,640 --> 15:53:15,880 So that's going to be a little bit interesting, and I wrote myself another reminder here, this 10203 15:53:15,880 --> 15:53:20,400 is a little bit of behind the scenes, the optimise will update a model's parameters 10204 15:53:20,400 --> 15:53:23,480 once per batch rather than once per epoch. 10205 15:53:23,480 --> 15:53:28,360 So let's hold myself to that note, and make sure I let you know. 10206 15:53:28,360 --> 15:53:32,040 So we're going to make another title here. 10207 15:53:32,040 --> 15:53:40,480 Let's go creating a training loop, and training a model on batches of data. 10208 15:53:40,480 --> 15:53:44,440 So something a little bit different to what we may have seen before if we haven't created 10209 15:53:44,440 --> 15:53:52,960 batches of data using data loader, and recall that just up above here, we've got something 10210 15:53:52,960 --> 15:53:55,360 like 1800 there, there we go. 10211 15:53:55,360 --> 15:54:00,400 So we've split our data into batches, rather than our model looking at 60,000 images of 10212 15:54:00,400 --> 15:54:07,440 fashion MNIST data at one time, it's going to look at 1875 batches of 32, so 32 images 10213 15:54:07,440 --> 15:54:14,400 at the time, of the training data set, and 313 batches of 32 of the test data set. 10214 15:54:14,400 --> 15:54:19,400 So let's go to training loop and train our first model. 10215 15:54:19,400 --> 15:54:23,960 So I'm going to write out a few steps actually, because we have to do a little bit differently 10216 15:54:23,960 --> 15:54:25,480 to what we've done before. 10217 15:54:25,480 --> 15:54:31,080 So one, we want to loop through epochs, so a number of epochs. 10218 15:54:31,080 --> 15:54:35,280 Loop through training batches, and by the way, you might be able to hear some birds singing, 10219 15:54:35,280 --> 15:54:39,080 the sun is about to rise, I hope you enjoy them as much as I do. 10220 15:54:39,080 --> 15:54:45,840 So we're going to perform training steps, and we're going to calculate calculate the 10221 15:54:45,840 --> 15:54:49,400 train loss per batch. 10222 15:54:49,400 --> 15:54:54,680 So this is going to be one of the differences between our previous training loops. 10223 15:54:54,680 --> 15:54:59,280 And this is going to, after number two, we're going to loop through the testing batches. 10224 15:54:59,280 --> 15:55:04,840 So we'll train and evaluate our model at the same step, or same loop. 10225 15:55:04,840 --> 15:55:08,120 And we're going to perform testing steps. 10226 15:55:08,120 --> 15:55:17,080 And then we're going to calculate the test loss per batch as well, per batch. 10227 15:55:17,080 --> 15:55:23,680 Wonderful, four, we're going to, of course, print out what's happening. 10228 15:55:23,680 --> 15:55:28,360 You may have seen the unofficial PyTorch optimization loop theme song. 
10229 15:55:28,360 --> 15:55:33,200 And we're going to time it all for fun, of course, because that's what our timing function 10230 15:55:33,200 --> 15:55:34,760 is for. 10231 15:55:34,760 --> 15:55:36,160 So let's get started. 10232 15:55:36,160 --> 15:55:40,000 There's a fair few steps here, but nothing that we can't handle. 10233 15:55:40,000 --> 15:55:43,640 And remember the motto, if and out, code it out. 10234 15:55:43,640 --> 15:55:46,680 Well, there's another one, if and out, run the code, but we haven't written any code to 10235 15:55:46,680 --> 15:55:48,400 run just yet. 10236 15:55:48,400 --> 15:55:52,400 So we're going to import TQDM for a progress bar. 10237 15:55:52,400 --> 15:55:57,600 If you haven't seen TQDM before, it's a very good Python progress bar that you can add 10238 15:55:57,600 --> 15:55:59,960 with a few lines of code. 10239 15:55:59,960 --> 15:56:00,960 So this is just the GitHub. 10240 15:56:00,960 --> 15:56:05,200 It's open source software, one of my favorite pieces of software, and it's going to give 10241 15:56:05,200 --> 15:56:12,160 us a progress bar to let us know how many epochs our training loop has gone through. 10242 15:56:12,160 --> 15:56:15,640 It doesn't have much overhead, but if you want to learn more about it, please refer 10243 15:56:15,640 --> 15:56:17,640 to the TQDM GitHub. 10244 15:56:17,640 --> 15:56:23,680 However, the beautiful thing is that Google CoLab has TQDM built in because it's so good 10245 15:56:23,680 --> 15:56:25,320 and so popular. 10246 15:56:25,320 --> 15:56:28,800 So we're going to import from TQDM.auto. 10247 15:56:28,800 --> 15:56:34,960 So there's a few different types of TQDM progress bars.auto is just going to recognize what 10248 15:56:34,960 --> 15:56:37,040 compute environment we're using. 10249 15:56:37,040 --> 15:56:41,120 And it's going to give us the best type of progress bar for what we're doing. 10250 15:56:41,120 --> 15:56:45,520 So for example, Google CoLab is running a Jupyter Notebook behind the scenes. 10251 15:56:45,520 --> 15:56:52,520 So the progress bar for Jupyter Notebooks is a little bit different to Python scripts. 10252 15:56:52,520 --> 15:56:58,800 So now let's set the seed and start the timer. 10253 15:56:58,800 --> 15:57:03,280 We want to write all of our training loop in this single cell here. 10254 15:57:03,280 --> 15:57:08,400 And then once it starts, once we run this cell, we want the timer to start so that we 10255 15:57:08,400 --> 15:57:12,800 can time how long the entire cell takes to run. 10256 15:57:12,800 --> 15:57:24,920 So we'll go train time start on CPU equals, we set up our timer before, beautiful. 10257 15:57:24,920 --> 15:57:28,480 Now we're going to set the number of epochs. 10258 15:57:28,480 --> 15:57:34,200 Now we're going to keep this small for faster training time so we can run more experiments. 10259 15:57:34,200 --> 15:57:39,720 So we'll keep this small for faster training time. 10260 15:57:39,720 --> 15:57:41,160 That's another little tidbit. 10261 15:57:41,160 --> 15:57:43,840 Do you notice how quickly all of the cells ran above? 10262 15:57:43,840 --> 15:57:48,600 Well, that's because we're using a relatively small data set. 10263 15:57:48,600 --> 15:57:51,960 In the beginning, when you're running experiments, you want them to run quite quickly so that 10264 15:57:51,960 --> 15:57:53,480 you can run them more often. 
10265 15:57:53,480 --> 15:57:57,760 So you can learn more about your data so that you can try different things, try different 10266 15:57:57,760 --> 15:57:59,080 models. 10267 15:57:59,080 --> 15:58:02,520 So this is why we're using number of epochs equals three. 10268 15:58:02,520 --> 15:58:07,080 We start with three so that our experiment runs in 30 seconds or a minute or so. 10269 15:58:07,080 --> 15:58:11,200 That way, if something doesn't work, we haven't wasted so much time waiting for a model to 10270 15:58:11,200 --> 15:58:12,920 train. 10271 15:58:12,920 --> 15:58:16,160 Later on, we could train it for 100 epochs if we wanted to. 10272 15:58:16,160 --> 15:58:18,680 So we're going to create a training and test loop. 10273 15:58:18,680 --> 15:58:25,160 So for epoch in TQDM range epochs, let's get this going. 10274 15:58:25,160 --> 15:58:31,960 So for TQDM to work, we just wrap our iterator with TQDM and you'll see later on how this 10275 15:58:31,960 --> 15:58:32,960 tracks the progress. 10276 15:58:32,960 --> 15:58:36,360 So I'm going to put out a little print statement here. 10277 15:58:36,360 --> 15:58:39,080 We'll go epoch. 10278 15:58:39,080 --> 15:58:41,720 This is just going to say what epoch we're on. 10279 15:58:41,720 --> 15:58:44,240 We'll go here. 10280 15:58:44,240 --> 15:58:48,800 That's something that I like to do quite often is put little print statements here and there 10281 15:58:48,800 --> 15:58:51,160 so that we know what's going on. 10282 15:58:51,160 --> 15:58:53,000 So let's set up the training. 10283 15:58:53,000 --> 15:58:55,320 We're going to have to instantiate the train loss. 10284 15:58:55,320 --> 15:58:57,960 We're going to set that to zero to begin with. 10285 15:58:57,960 --> 15:59:03,060 And we're going to cumulatively add some values to the train loss here and then we'll 10286 15:59:03,060 --> 15:59:09,160 see later on how this accumulates and we can calculate the training loss per batch. 10287 15:59:09,160 --> 15:59:12,720 Let's what we're doing up here, calculate the train loss per batch. 10288 15:59:12,720 --> 15:59:17,320 And then finally, at the end of the loop, we will divide our training loss by the number 10289 15:59:17,320 --> 15:59:22,720 of batches so we can get the average training loss per batch and that will give us the training 10290 15:59:22,720 --> 15:59:24,960 loss per epoch. 10291 15:59:24,960 --> 15:59:25,960 Now that's a lot of talking. 10292 15:59:25,960 --> 15:59:27,760 If that doesn't make sense, remember. 10293 15:59:27,760 --> 15:59:29,880 But if and out, code it out. 10294 15:59:29,880 --> 15:59:34,680 So add a loop to loop through the training batches. 10295 15:59:34,680 --> 15:59:40,880 So because our data is batchified now and I've got a crow or maybe a cooker bar sitting 10296 15:59:40,880 --> 15:59:47,800 on the roof across from my apartment, it's singing its song this morning, lovely. 10297 15:59:47,800 --> 15:59:51,320 So we're going to loop through our training batch data. 10298 15:59:51,320 --> 15:59:57,080 So I've got four batch, comma x, y, because remember our training batches come in the 10299 15:59:57,080 --> 15:59:59,000 form of X. 10300 15:59:59,000 --> 16:00:02,880 So that's our data or our images and why, which is label. 10301 16:00:02,880 --> 16:00:09,800 You could call this image label or target as part of which would, but it's convention 10302 16:00:09,800 --> 16:00:13,440 to often call your features X and your labels Y. 
10303 16:00:13,440 --> 16:00:20,160 We've seen this before. And we're going to enumerate the train data loader as well. 10304 16:00:20,160 --> 16:00:23,760 We do this so we can keep track of the number of batches we've been through. 10305 16:00:23,760 --> 16:00:25,720 So that will give us batch there. 10306 16:00:25,720 --> 16:00:31,160 I'm going to set model zero to training mode because even though that's the default, we 10307 16:00:31,160 --> 16:00:34,240 just want to make sure that it's in training mode. 10308 16:00:34,240 --> 16:00:35,600 Now we're going to do the forward pass. 10309 16:00:35,600 --> 16:00:39,560 If you remember, what are the steps in our optimization loop? 10310 16:00:39,560 --> 16:00:40,960 We do the forward pass. 10311 16:00:40,960 --> 16:00:47,720 We calculate the loss, optimizer zero grad, loss backwards, optimizer step, step, 10312 16:00:47,720 --> 16:00:48,720 step. 10313 16:00:48,720 --> 16:00:49,720 So let's do that. 10314 16:00:49,720 --> 16:00:54,440 Hey, model zero, we'll put the features through there and then we're going to calculate the 10315 16:00:54,440 --> 16:00:57,200 loss. 10316 16:00:57,200 --> 16:00:58,560 We've been through these steps before. 10317 16:00:58,560 --> 16:01:03,600 So we're not going to spend too much time on the exact steps here, but we're just going 10318 16:01:03,600 --> 16:01:04,880 to practice writing them out. 10319 16:01:04,880 --> 16:01:08,320 And of course, later on, you might be thinking, Daniel, how come we haven't functionalized 10320 16:01:08,320 --> 16:01:09,720 this training loop already? 10321 16:01:09,720 --> 16:01:12,920 We seem to write the same generic code over and over again. 10322 16:01:12,920 --> 16:01:17,120 Well, that's because we like to practice writing PyTorch code, right? 10323 16:01:17,120 --> 16:01:18,480 We're going to functionalize them later on. 10324 16:01:18,480 --> 16:01:20,040 Don't you worry about that. 10325 16:01:20,040 --> 16:01:24,400 So here's another little step that we haven't done before: we have the training loss. 10326 16:01:24,400 --> 16:01:28,920 And because we've set that to zero to begin with, we're going to accumulate the training 10327 16:01:28,920 --> 16:01:31,520 loss values every batch. 10328 16:01:31,520 --> 16:01:33,680 So we're going to just add it up here. 10329 16:01:33,680 --> 16:01:37,520 And then later on, we're going to divide it by the total number of batches to get the 10330 16:01:37,520 --> 16:01:39,920 average loss per batch. 10331 16:01:39,920 --> 16:01:45,600 So you see how this loss calculation is within the batch loop here? 10332 16:01:45,600 --> 16:01:49,760 So this means that one batch of data is going to go through the model. 10333 16:01:49,760 --> 16:01:53,440 And then we're going to calculate the loss on one batch of data. 10334 16:01:53,440 --> 16:01:57,960 And this loop is going to continue until it's been through all of the batches in the train 10335 16:01:57,960 --> 16:01:59,480 data loader. 10336 16:01:59,480 --> 16:02:03,160 So 1875 steps, or however many there were. 10337 16:02:03,160 --> 16:02:07,620 So accumulate train loss. 10338 16:02:07,620 --> 16:02:15,720 And then we're going to do optimizer zero grad, optimizer dot zero grad. 10339 16:02:15,720 --> 16:02:18,400 And then number four is what? 10340 16:02:18,400 --> 16:02:20,320 Loss backward. 10341 16:02:20,320 --> 16:02:21,320 Loss backward. 10342 16:02:21,320 --> 16:02:22,920 We'll do the backpropagation step.
10343 16:02:22,920 --> 16:02:27,540 And then finally, we've got number five, which is optimizer step. 10344 16:02:27,540 --> 16:02:35,000 So this is where I left my little note above to remind me and to also let you know, highlight 10345 16:02:35,000 --> 16:02:40,640 that the optimizer will update a model's parameters once per batch rather than once per epoch. 10346 16:02:40,640 --> 16:02:45,160 So you see how we've got a for loop inside our epoch loop here. 10347 16:02:45,160 --> 16:02:46,960 So the batch loop. 10348 16:02:46,960 --> 16:02:50,880 So this is what I meant that the optimizer, this is one of the advantages of using mini 10349 16:02:50,880 --> 16:02:55,640 batches is not only is it more memory efficient because we're not loading 60,000 images into 10350 16:02:55,640 --> 16:02:57,480 memory at a time. 10351 16:02:57,480 --> 16:03:04,440 We are updating our model's parameters once per batch rather than waiting for it to see 10352 16:03:04,440 --> 16:03:07,040 the whole data set with every batch. 10353 16:03:07,040 --> 16:03:12,000 Our model is hopefully getting slightly better. 10354 16:03:12,000 --> 16:03:17,560 So that is because the optimizer dot step call is within the batch loop rather than the 10355 16:03:17,560 --> 16:03:20,400 epoch loop. 10356 16:03:20,400 --> 16:03:24,040 So let's now print out what's happening. 10357 16:03:24,040 --> 16:03:26,440 Print out what's happening. 10358 16:03:26,440 --> 16:03:32,480 So if batch, let's do it every 400 or so batches because we have a lot of batches. 10359 16:03:32,480 --> 16:03:37,480 We don't want to print out too often, otherwise we'll just fill our screen with numbers. 10360 16:03:37,480 --> 16:03:41,520 That might not be a bad thing, but 400 seems a good number. 10361 16:03:41,520 --> 16:03:45,720 That'll be about five printouts if we have 2000 batches. 10362 16:03:45,720 --> 16:03:51,720 So print looked at, and of course you can adjust this to whatever you would like. 10363 16:03:51,720 --> 16:03:56,520 That's the flexibility of PyTorch, flexibility of Python as well. 10364 16:03:56,520 --> 16:03:58,720 So looked at how many samples have we looked at? 10365 16:03:58,720 --> 16:04:02,640 So we're going to take the batch number, multiply it by X, the length of X is going 10366 16:04:02,640 --> 16:04:07,120 to be 32 because that is our batch size. 10367 16:04:07,120 --> 16:04:13,920 Then we're going to just write down here the total number of items that we've got now 10368 16:04:13,920 --> 16:04:19,720 of data set, and we can access that by going train data loader dot data set. 10369 16:04:19,720 --> 16:04:24,880 So that's going to give us length of the data set contained within our train data loader, 10370 16:04:24,880 --> 16:04:30,320 which is you might be able to guess 60,000 or should be. 10371 16:04:30,320 --> 16:04:37,360 Now we have to, because we've been accumulating the train loss, this is going to be quite 10372 16:04:37,360 --> 16:04:41,560 high because we've been adding every single time we've calculated the loss, we've been 10373 16:04:41,560 --> 16:04:45,240 adding it to the train loss, the overall value per batch. 10374 16:04:45,240 --> 16:04:50,320 So now let's adjust if we wanted to find out, see how now we've got this line, we're outside 10375 16:04:50,320 --> 16:04:51,960 of the batch loop. 10376 16:04:51,960 --> 16:04:59,320 We want to adjust our training loss to get the average training loss per batch per epoch. 10377 16:04:59,320 --> 16:05:01,640 So we're coming back to the epoch loop here. 
10378 16:05:01,640 --> 16:05:06,120 A little bit confusing, but you just line up where the loops are, and this is going to 10379 16:05:06,120 --> 16:05:09,920 help you figure out what context you're computing in. 10380 16:05:09,920 --> 16:05:12,400 So now we are in the epoch loop. 10381 16:05:12,400 --> 16:05:21,720 So divide total train loss by length of train data loader, oh, this is so exciting, training 10382 16:05:21,720 --> 16:05:23,360 our biggest model yet. 10383 16:05:23,360 --> 16:05:29,440 So train loss equals or divide equals, we're going to reassign the train loss, we're going 10384 16:05:29,440 --> 16:05:32,880 to divide it by the length of the train data loader. 10385 16:05:32,880 --> 16:05:33,880 So why do we do this? 10386 16:05:33,880 --> 16:05:38,800 Well, because we've accumulated the train loss here for every batch in the train data 10387 16:05:38,800 --> 16:05:43,960 loader, but we want to average it out across how many batches there are in the train data 10388 16:05:43,960 --> 16:05:45,560 loader. 10389 16:05:45,560 --> 16:05:51,400 So this value will be quite high until we readjust it to find the average loss per epoch, because 10390 16:05:51,400 --> 16:05:53,320 we are in the epoch loop. 10391 16:05:53,320 --> 16:05:57,520 All right, there are a few steps going on, but that's all right, we'll figure this out, 10392 16:05:57,520 --> 16:06:01,120 or what should happening in a minute, let's code up the testing loop. 10393 16:06:01,120 --> 16:06:03,680 So testing, what do we have to do for testing? 10394 16:06:03,680 --> 16:06:06,680 Well, let's set up a test loss variable. 10395 16:06:06,680 --> 16:06:10,880 Why don't we do accuracy for testing as well? 10396 16:06:10,880 --> 16:06:14,640 Did we do accuracy for training? 10397 16:06:14,640 --> 16:06:17,720 We didn't do accuracy for training, but that's all right, we'll stick to doing accuracy for 10398 16:06:17,720 --> 16:06:18,720 testing. 10399 16:06:18,720 --> 16:06:25,600 We'll go model zero dot eval, we'll put it in evaluation mode, and we'll turn on our 10400 16:06:25,600 --> 16:06:30,720 inference mode context manager with torch dot inference mode. 10401 16:06:30,720 --> 16:06:36,240 Now we'll do the same thing for x, y in test data loader, we don't need to keep track 10402 16:06:36,240 --> 16:06:40,320 of the batches here again in the test data loader. 10403 16:06:40,320 --> 16:06:46,360 So we'll just loop through x, so features, images, and labels in our test data loader. 10404 16:06:46,360 --> 16:06:51,840 We're going to do the forward pass, because the test loop, we don't have an optimization 10405 16:06:51,840 --> 16:06:57,240 step, we are just passing our data through the model and evaluating the patterns it learned 10406 16:06:57,240 --> 16:06:58,240 on the training data. 10407 16:06:58,240 --> 16:07:01,640 So we're going to pass in x here. 10408 16:07:01,640 --> 16:07:08,600 This might be a little bit confusing, let's do this x test, y test. 10409 16:07:08,600 --> 16:07:12,840 That way we don't get confused with our x above for the training set. 10410 16:07:12,840 --> 16:07:20,000 Now we're going to calculate the loss, a cum, relatively might small that wrong app to 10411 16:07:20,000 --> 16:07:21,640 sound that out. 10412 16:07:21,640 --> 16:07:22,840 What do we have here? 10413 16:07:22,840 --> 16:07:28,120 So we've got our test loss variable that we just assigned to zero above, just up here. 10414 16:07:28,120 --> 16:07:30,240 So we're going to do test loss plus equals. 
10415 16:07:30,240 --> 16:07:32,800 We're doing this in one step here. 10416 16:07:32,800 --> 16:07:35,840 Test pred, y test. 10417 16:07:35,840 --> 16:07:40,560 So we're comparing our test prediction to our y test labels, our test labels. 10418 16:07:40,560 --> 16:07:44,800 Now we're going to back out of the for loop here, because that's all we have to do, the 10419 16:07:44,800 --> 16:07:47,480 forward pass and calculate the loss for the test data set. 10420 16:07:47,480 --> 16:07:50,960 Oh, I said we're going to calculate the accuracy. 10421 16:07:50,960 --> 16:07:51,960 Silly me. 10422 16:07:51,960 --> 16:07:53,800 So calculate accuracy. 10423 16:07:53,800 --> 16:07:57,440 Let's go test acc. 10424 16:07:57,440 --> 16:07:59,800 And we've got plus equals. 10425 16:07:59,800 --> 16:08:02,200 We can bring out our accuracy function here. 10426 16:08:02,200 --> 16:08:07,840 That's what we downloaded from our helper functions dot py before, y true equals y test. 10427 16:08:07,840 --> 16:08:13,280 And then y pred equals test pred dot argmax, dim equals one. 10428 16:08:13,280 --> 16:08:14,320 Why do we do this? 10429 16:08:14,320 --> 16:08:18,520 Well, because recall that the raw outputs of our model are going 10430 16:08:18,520 --> 16:08:25,320 to be logits and our accuracy function expects our true labels and our predictions to be 10431 16:08:25,320 --> 16:08:27,120 in the same format. 10432 16:08:27,120 --> 16:08:32,000 If our test pred is just logits, we have to call argmax to find the index of the logit with 10433 16:08:32,000 --> 16:08:35,840 the highest value, and that will be the prediction label. 10434 16:08:35,840 --> 16:08:39,840 And so then we're comparing labels to labels. 10435 16:08:39,840 --> 16:08:42,080 That's what the argmax does here. 10436 16:08:42,080 --> 16:08:49,640 So we can back out of the batch loop now, and we're going to calculate 10437 16:08:49,640 --> 16:08:58,080 the test loss average per batch. 10438 16:08:58,080 --> 16:09:05,480 So let's go here, test loss, divide equals length test data loader. 10439 16:09:05,480 --> 16:09:11,960 So because we were in the context of the batch loop, our test loss and 10440 16:09:11,960 --> 16:09:17,200 test accuracy values are accumulated every single batch. 10441 16:09:17,200 --> 16:09:22,960 So now we're just dividing them by how many batches we had, length of the test data loader, and the 10442 16:09:22,960 --> 16:09:30,960 same thing for the accuracy, calculate the test acc average per batch. 10443 16:09:30,960 --> 16:09:39,440 So this is giving us test loss and test accuracy per epoch, test acc divide equals length 10444 16:09:39,440 --> 16:09:44,400 test data loader, wonderful, we're so close to finishing this up. 10445 16:09:44,400 --> 16:09:47,880 And now we'll come back to where's our epoch loop. 10446 16:09:47,880 --> 16:09:52,960 These indentation lines are very helpful in Google Colab, we scroll down. 10447 16:09:52,960 --> 16:09:57,680 I believe if you want them, you can go settings or something like that, yeah, settings. 10448 16:09:57,680 --> 16:09:59,960 That's where you can get these lines from if you don't have them. 10449 16:09:59,960 --> 16:10:03,800 So print out what's happening. 10450 16:10:03,800 --> 16:10:12,280 We are going to print an f-string with a new line, and let's get the train loss in here. 10451 16:10:12,280 --> 16:10:16,480 Train loss, and we'll print that to four decimal places.
10452 16:10:16,480 --> 16:10:22,360 And then we'll get the test loss, of course, test loss and we'll go, we'll get that to four 10453 16:10:22,360 --> 16:10:24,640 decimal places as well. 10454 16:10:24,640 --> 16:10:35,400 And then we'll get the test ACK, test accuracy, we'll get that to four decimal places as well. 10455 16:10:35,400 --> 16:10:38,920 For f, wonderful. 10456 16:10:38,920 --> 16:10:44,400 And then finally, one more step, ooh, we've written a lot of code in this video. 10457 16:10:44,400 --> 16:10:48,440 We want to calculate the training time because that's another thing that we want to track. 10458 16:10:48,440 --> 16:10:51,240 We want to see how long our model is taken to train. 10459 16:10:51,240 --> 16:11:01,960 So train time end on CPU is going to equal the timer and then we're going to get the 10460 16:11:01,960 --> 16:11:06,920 total train time model zero so we can set up a variable for this so we can compare our 10461 16:11:06,920 --> 16:11:09,000 modeling experiments later on. 10462 16:11:09,000 --> 16:11:20,400 We're going to go print train time, start equals train time, start on CPU and equals 10463 16:11:20,400 --> 16:11:23,840 train time end on CPU. 10464 16:11:23,840 --> 16:11:32,080 And finally, the device is going to be string next model zero dot parameters. 10465 16:11:32,080 --> 16:11:37,520 So we're just, this is one way of checking where our model zero parameters live. 10466 16:11:37,520 --> 16:11:43,080 So beautiful, all right. 10467 16:11:43,080 --> 16:11:44,240 Have we got enough brackets there? 10468 16:11:44,240 --> 16:11:46,720 I don't think we do. 10469 16:11:46,720 --> 16:11:47,720 Okay. 10470 16:11:47,720 --> 16:11:48,720 There we go. 10471 16:11:48,720 --> 16:11:49,720 Whoo. 10472 16:11:49,720 --> 16:11:52,360 I'll just show you what the output of this is. 10473 16:11:52,360 --> 16:11:59,360 So next, model zero dot parameters, what does this give us? 10474 16:11:59,360 --> 16:12:05,040 Oh, can we go device here? 10475 16:12:05,040 --> 16:12:12,560 Oh, what do we have here? 10476 16:12:12,560 --> 16:12:14,160 Model zero dot parameters. 10477 16:12:14,160 --> 16:12:20,720 I thought this was a little trick. 10478 16:12:20,720 --> 16:12:29,080 And then if we go next parameter containing. 10479 16:12:29,080 --> 16:12:34,040 I thought we could get device, oh, there we go. 10480 16:12:34,040 --> 16:12:35,520 Excuse me. 10481 16:12:35,520 --> 16:12:36,520 That's how we get it. 10482 16:12:36,520 --> 16:12:37,960 That's how we get the device that it's on. 10483 16:12:37,960 --> 16:12:41,720 So let me just turn this. 10484 16:12:41,720 --> 16:12:45,280 This is what the output of that's going to be CPU. 10485 16:12:45,280 --> 16:12:47,040 That's what we're after. 10486 16:12:47,040 --> 16:12:50,320 So troubleshooting on the fly here. 10487 16:12:50,320 --> 16:12:51,720 Hopefully all of this code works. 10488 16:12:51,720 --> 16:12:53,400 So we went through all of our steps. 10489 16:12:53,400 --> 16:12:56,360 We're looping through epochs at the top level here. 10490 16:12:56,360 --> 16:12:59,280 We looped through the training batches, performed the training steps. 10491 16:12:59,280 --> 16:13:04,600 So our training loop, forward pass, loss calculation, optimizer zero grad, loss backwards, calculate 10492 16:13:04,600 --> 16:13:06,600 the loss per batch, accumulate those. 
10493 16:13:06,600 --> 16:13:11,640 We do the same for the testing batches except without the optimizer steps and print out 10494 16:13:11,640 --> 16:13:14,360 what's happening and we time it all for fun. 10495 16:13:14,360 --> 16:13:18,720 A fair bit going on here, but if you don't think there's any errors, give that a go, run 10496 16:13:18,720 --> 16:13:19,720 that code. 10497 16:13:19,720 --> 16:13:24,320 I'm going to leave this one on a cliffhanger and we're going to see if this works in the 10498 16:13:24,320 --> 16:13:25,320 next video. 10499 16:13:25,320 --> 16:13:28,880 I'll see you there. 10500 16:13:28,880 --> 16:13:29,880 Welcome back. 10501 16:13:29,880 --> 16:13:32,120 The last video was pretty full on. 10502 16:13:32,120 --> 16:13:35,520 We did a fair few steps, but this is all good practice. 10503 16:13:35,520 --> 16:13:38,800 The best way to learn PyTorch code is to write more PyTorch code. 10504 16:13:38,800 --> 16:13:41,080 So did you try it out? 10505 16:13:41,080 --> 16:13:42,080 Did you run this code? 10506 16:13:42,080 --> 16:13:43,080 Did it work? 10507 16:13:43,080 --> 16:13:44,760 Did we probably have an error somewhere? 10508 16:13:44,760 --> 16:13:46,160 Well, let's find out together. 10509 16:13:46,160 --> 16:13:47,160 You ready? 10510 16:13:47,160 --> 16:13:51,600 Let's train our biggest model yet in three, two, one, bomb. 10511 16:13:51,600 --> 16:13:54,000 Oh, of course we did. 10512 16:13:54,000 --> 16:13:55,200 What do we have? 10513 16:13:55,200 --> 16:13:57,040 What's going on? 10514 16:13:57,040 --> 16:13:58,520 Indentation error. 10515 16:13:58,520 --> 16:14:00,360 Ah, classic. 10516 16:14:00,360 --> 16:14:03,760 So print out what's happening. 10517 16:14:03,760 --> 16:14:06,440 Do we not have an indent there? 10518 16:14:06,440 --> 16:14:12,760 Oh, is that not in line with where it needs to be? 10519 16:14:12,760 --> 16:14:14,080 Excuse me. 10520 16:14:14,080 --> 16:14:15,080 Okay. 10521 16:14:15,080 --> 16:14:16,320 Why is this not in line? 10522 16:14:16,320 --> 16:14:19,280 So this is strange to me, enter. 10523 16:14:19,280 --> 16:14:23,520 How did this all get off by one? 10524 16:14:23,520 --> 16:14:25,600 I'm not sure, but this is just what you'll face. 10525 16:14:25,600 --> 16:14:29,640 Like sometimes you'll write this beautiful code that should work, but the main error 10526 16:14:29,640 --> 16:14:33,480 of your entire code is that it's off by a single space. 10527 16:14:33,480 --> 16:14:39,880 I'm not sure how that happened, but we're just going to pull this all into line. 10528 16:14:39,880 --> 16:14:43,320 We could have done this by selecting it all, but we're going to do it line by line just 10529 16:14:43,320 --> 16:14:50,720 to make sure that everything's in the right order, beautiful, and we print out what's 10530 16:14:50,720 --> 16:14:51,720 happening. 10531 16:14:51,720 --> 16:14:54,760 Three, two, one, round two. 10532 16:14:54,760 --> 16:14:55,760 We're going. 10533 16:14:55,760 --> 16:14:56,760 Okay. 10534 16:14:56,760 --> 16:14:57,840 So this is the progress bar I was talking about. 10535 16:14:57,840 --> 16:14:58,840 Look at that. 10536 16:14:58,840 --> 16:14:59,840 How beautiful is that? 10537 16:14:59,840 --> 16:15:01,440 Oh, we're going quite quickly through all of our samples. 10538 16:15:01,440 --> 16:15:03,080 I need to talk faster. 10539 16:15:03,080 --> 16:15:04,080 Oh, there we go. 10540 16:15:04,080 --> 16:15:05,080 We've got some good results. 
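For reference, the full cell that just ran (after the indentation fix) is roughly the following sketch. It assumes the objects created earlier in this section: model_0, train_dataloader, test_dataloader, loss_fn, optimizer, accuracy_fn and the timer/print_train_time helpers; the seed value is illustrative.

    from tqdm.auto import tqdm

    torch.manual_seed(42)  # seed value is illustrative
    train_time_start_on_cpu = timer()

    epochs = 3  # kept small for faster experiments

    for epoch in tqdm(range(epochs)):
        print(f"Epoch: {epoch}\n-------")
        ### Training
        train_loss = 0
        for batch, (X, y) in enumerate(train_dataloader):
            model_0.train()
            y_pred = model_0(X)                # 1. forward pass
            loss = loss_fn(y_pred, y)          # 2. calculate loss (per batch)
            train_loss += loss                 # accumulate train loss across batches
            optimizer.zero_grad()              # 3. zero the gradients
            loss.backward()                    # 4. backpropagation
            optimizer.step()                   # 5. update parameters (once per batch)

            if batch % 400 == 0:
                print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} samples")

        train_loss /= len(train_dataloader)    # average train loss per batch

        ### Testing
        test_loss, test_acc = 0, 0
        model_0.eval()
        with torch.inference_mode():
            for X_test, y_test in test_dataloader:
                test_pred = model_0(X_test)                                 # forward pass only
                test_loss += loss_fn(test_pred, y_test)                     # accumulate test loss
                test_acc += accuracy_fn(y_true=y_test,
                                        y_pred=test_pred.argmax(dim=1))     # logits -> labels
            test_loss /= len(test_dataloader)
            test_acc /= len(test_dataloader)

        print(f"\nTrain loss: {train_loss:.4f} | Test loss: {test_loss:.4f} | Test acc: {test_acc:.4f}")

    train_time_end_on_cpu = timer()
    total_train_time_model_0 = print_train_time(start=train_time_start_on_cpu,
                                                end=train_time_end_on_cpu,
                                                device=str(next(model_0.parameters()).device))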
10541 16:15:05,080 --> 16:15:09,080 We've got the tests, the train loss, the test loss and the test accuracy is pretty darn 10542 16:15:09,080 --> 16:15:10,080 good. 10543 16:15:10,080 --> 16:15:11,600 Oh my goodness. 10544 16:15:11,600 --> 16:15:15,600 This is a good baseline already, 67%. 10545 16:15:15,600 --> 16:15:19,280 So this is showing us it's about seven seconds per iteration. 10546 16:15:19,280 --> 16:15:21,600 Remember TQDM is tracking how many epochs. 10547 16:15:21,600 --> 16:15:22,720 We're going through. 10548 16:15:22,720 --> 16:15:27,880 So we have three epochs and our print statement is just saying, hey, we've looked at zero 10549 16:15:27,880 --> 16:15:33,720 out of 60,000 samples and we looked at 12,000 out of 60,000 samples and we finished on 10550 16:15:33,720 --> 16:15:41,000 an epoch two because it's zero indexed and we have a train loss of 0.4550 and a test 10551 16:15:41,000 --> 16:15:49,160 loss 476 and a test accuracy 834265 and a training time about just over 21 seconds or 10552 16:15:49,160 --> 16:15:50,920 just under 22. 10553 16:15:50,920 --> 16:15:55,600 So keep in mind that your numbers may not be the exact same as mine. 10554 16:15:55,600 --> 16:16:02,600 They should be in the same realm as mine, but due to inherent randomness of machine learning, 10555 16:16:02,600 --> 16:16:05,480 even if we set the manual seed might be slightly different. 10556 16:16:05,480 --> 16:16:10,520 So don't worry too much about that and what I mean by in the same realm, if your accuracy 10557 16:16:10,520 --> 16:16:15,880 is 25 rather than 83, well then probably something's wrong there. 10558 16:16:15,880 --> 16:16:20,120 But if it's 83.6, well then that's not too bad. 10559 16:16:20,120 --> 16:16:24,720 And the same with the train time on CPU, this will be heavily dependent, how long it takes 10560 16:16:24,720 --> 16:16:30,160 to train will be heavily dependent on the hardware that you're using behind the scenes. 10561 16:16:30,160 --> 16:16:32,360 So I'm using Google Colab Pro. 10562 16:16:32,360 --> 16:16:37,240 Now that may mean I get a faster CPU than the free version of Google Colab. 10563 16:16:37,240 --> 16:16:44,280 It also depends on what CPU is available in Google's computer warehouse where Google 10564 16:16:44,280 --> 16:16:47,320 Colab is hosting of how fast this will be. 10565 16:16:47,320 --> 16:16:49,080 So just keep that in mind. 10566 16:16:49,080 --> 16:16:53,960 If your time is 10 times that, then there's probably something wrong. 10567 16:16:53,960 --> 16:16:58,280 If your time is 10 times less than that, well, hey, keep using that hardware because that's 10568 16:16:58,280 --> 16:16:59,720 pretty darn good. 10569 16:16:59,720 --> 16:17:01,600 So let's keep pushing forward. 10570 16:17:01,600 --> 16:17:04,680 This will be our baseline that we try to improve upon. 10571 16:17:04,680 --> 16:17:10,800 So we have an accuracy of 83.5 and we have a train time of 20 or so seconds. 10572 16:17:10,800 --> 16:17:16,120 So we'll see what we can do with a model on the GPU later and then also later on a 10573 16:17:16,120 --> 16:17:18,480 convolutional neural network. 10574 16:17:18,480 --> 16:17:22,880 So let's evaluate our model where we up to what we just did. 10575 16:17:22,880 --> 16:17:23,880 We built a training loop. 10576 16:17:23,880 --> 16:17:24,880 So we've done that. 10577 16:17:24,880 --> 16:17:25,880 That was a fair bit of code. 10578 16:17:25,880 --> 16:17:30,280 But now we're up to we fit the model to the data and make a prediction. 
10579 16:17:30,280 --> 16:17:34,880 Let's do these two combined, hey, we'll evaluate our model. 10580 16:17:34,880 --> 16:17:36,240 So we'll come back. 10581 16:17:36,240 --> 16:17:42,320 Number four is make predictions and get model zero results. 10582 16:17:42,320 --> 16:17:47,600 Now we're going to create a function to do this because we want to build multiple models, 10583 16:17:47,600 --> 16:17:54,440 and that way, if we have, say, model 0, 1, 2, 3, we can pass each one to our function to evaluate 10584 16:17:54,440 --> 16:17:58,240 that model and then we can compare the results later on. 10585 16:17:58,240 --> 16:17:59,880 So that's something to keep in mind. 10586 16:17:59,880 --> 16:18:04,400 If you're going to be writing a bunch of code multiple times, you probably want to 10587 16:18:04,400 --> 16:18:09,600 functionize it, and we could definitely do that for our training and test loops. 10588 16:18:09,600 --> 16:18:11,400 But we'll see that later on. 10589 16:18:11,400 --> 16:18:14,200 So let's go def eval model. 10590 16:18:14,200 --> 16:18:19,320 So evaluate a given model. We'll pass in a model, which will be of type torch dot nn dot 10591 16:18:19,320 --> 16:18:21,720 module. 10592 16:18:21,720 --> 16:18:29,560 And we'll pass in a data loader, which will be of type torch dot utils dot data dot 10593 16:18:29,560 --> 16:18:32,480 data loader. 10594 16:18:32,480 --> 16:18:38,200 And then we'll pass in the loss function so that it can calculate the loss. 10595 16:18:38,200 --> 16:18:41,920 We could pass in an evaluation metric if we wanted to track that too. 10596 16:18:41,920 --> 16:18:44,960 So this will be torch nn dot module as well. 10597 16:18:44,960 --> 16:18:47,720 And then, oh, there we go. 10598 16:18:47,720 --> 16:18:51,560 Speaking of an evaluation function, let's pass in our accuracy function as well. 10599 16:18:51,560 --> 16:18:54,120 And I don't want an L there, I want that. 10600 16:18:54,120 --> 16:19:06,880 So we want to return a dictionary containing the results of model predicting on data loader. 10601 16:19:06,880 --> 16:19:07,880 So that's what we want. 10602 16:19:07,880 --> 16:19:10,800 We're going to return a dictionary of model results. 10603 16:19:10,800 --> 16:19:14,160 That way we could call this function multiple times with different models and different 10604 16:19:14,160 --> 16:19:20,160 data loaders and then compare the dictionaries full of results depending on which model we 10605 16:19:20,160 --> 16:19:21,720 passed in here. 10606 16:19:21,720 --> 16:19:27,000 So let's set up loss and accuracy equals zero, zero, we'll start those off. 10607 16:19:27,000 --> 16:19:32,320 This is going to be much the same as our testing loop above, except it's going 10608 16:19:32,320 --> 16:19:35,760 to be functionalized and we're going to return a dictionary. 10609 16:19:35,760 --> 16:19:41,360 So we'll turn on our context manager for inferencing with torch dot inference mode. 10610 16:19:41,360 --> 16:19:46,360 Now we're going to loop through the data loader and we'll get the x and y values. 10611 16:19:46,360 --> 16:19:51,120 So the x will be our data, the y will be our ideal labels, and we'll make predictions with 10612 16:19:51,120 --> 16:19:52,120 the model. 10613 16:19:52,120 --> 16:19:53,960 In other words, do the forward pass. 10614 16:19:53,960 --> 16:19:57,360 So we'll go y pred equals model on x.
10615 16:19:57,360 --> 16:20:02,560 Now we don't have to specify what model it is because we've got the model parameter up 10616 16:20:02,560 --> 16:20:03,560 here. 10617 16:20:03,560 --> 16:20:08,440 So we're starting to make our functions here or this function generalizable. 10618 16:20:08,440 --> 16:20:12,240 So it could be used with almost any model and any data loader. 10619 16:20:12,240 --> 16:20:20,160 So we want to accumulate the loss and accuracy values per batch because this is within the 10620 16:20:20,160 --> 16:20:22,960 batch loop here per batch. 10621 16:20:22,960 --> 16:20:28,840 And then we're going to go loss plus equals loss function, we'll pass it in the y pred 10622 16:20:28,840 --> 16:20:33,680 and the y the true label and we'll do the same with the accuracy. 10623 16:20:33,680 --> 16:20:41,360 So except this time we use our accuracy function, we'll send in y true equals y and y pred equals 10624 16:20:41,360 --> 16:20:47,520 y pred dot argmax because the raw outputs of our model are logits. 10625 16:20:47,520 --> 16:20:51,160 And if we want to convert them into labels, we could take the softmax for the prediction 10626 16:20:51,160 --> 16:20:56,760 probabilities, but we could also take the argmax and just by skipping the softmax step, the 10627 16:20:56,760 --> 16:21:03,680 argmax will get the index where the highest value load it is, dim equals one. 10628 16:21:03,680 --> 16:21:07,880 And then we're going to make sure that we're still within the context manager here. 10629 16:21:07,880 --> 16:21:11,800 So with torch inference mode, but outside the loop. 10630 16:21:11,800 --> 16:21:14,160 So that'll be this line here. 10631 16:21:14,160 --> 16:21:24,200 We're going to scale the loss and act to find the average loss slash act per batch. 10632 16:21:24,200 --> 16:21:30,400 So loss will divide and assign to the length of the data loader. 10633 16:21:30,400 --> 16:21:35,360 So that'll divide and reassign it to however many batches are in our data loader that we 10634 16:21:35,360 --> 16:21:41,200 pass into our of our model function, then we'll do the same thing for the accuracy here. 10635 16:21:41,200 --> 16:21:44,160 Length data loader, beautiful. 10636 16:21:44,160 --> 16:21:48,440 And now we're going to return a dictionary here. 10637 16:21:48,440 --> 16:21:54,760 So return, we can return the model name by inspecting the model. 10638 16:21:54,760 --> 16:21:58,760 We get an attribute of the model, which is its class name. 10639 16:21:58,760 --> 16:22:00,680 I'll show you how you can do that. 10640 16:22:00,680 --> 16:22:06,000 So this is helpful to track if you've created multiple different models and given them different 10641 16:22:06,000 --> 16:22:10,560 class names, you can access the name attribute. 10642 16:22:10,560 --> 16:22:17,160 So this only works when model was created with a class. 10643 16:22:17,160 --> 16:22:19,960 So you just have to ensure that your models have different class names. 10644 16:22:19,960 --> 16:22:24,480 If you want to do it like that, because we're going to do it like that, we can set the model 10645 16:22:24,480 --> 16:22:26,640 name to be its class name. 10646 16:22:26,640 --> 16:22:29,640 We'll get the model loss, which is just this value here. 10647 16:22:29,640 --> 16:22:34,160 After it's been scaled, we'll turn it into a single value by taking dot item. 10648 16:22:34,160 --> 16:22:39,200 And then we'll go model dot act, or we'll get model underscore act for the models accuracy. 
10649 16:22:39,200 --> 16:22:41,200 We'll do the same thing here. 10650 16:22:41,200 --> 16:22:42,200 Act. 10651 16:22:42,200 --> 16:22:47,120 I don't think we need to take the item because accuracy comes back in a different form. 10652 16:22:47,120 --> 16:22:50,840 We'll find out, if in doubt, code it out. 10653 16:22:50,840 --> 16:22:53,760 So calculate model zero results on test data set. 10654 16:22:53,760 --> 16:22:57,400 And I want to let you know that you can create your own functions here to do almost whatever 10655 16:22:57,400 --> 16:22:58,560 you want. 10656 16:22:58,560 --> 16:23:01,840 I've just decided that this is going to be helpful for the models and the data that 10657 16:23:01,840 --> 16:23:03,000 we're building. 10658 16:23:03,000 --> 16:23:07,600 But keep that in mind that your models, your data sets might be different and will likely 10659 16:23:07,600 --> 16:23:09,120 be different in the future. 10660 16:23:09,120 --> 16:23:14,200 So you can create these functions for whatever use case you need. 10661 16:23:14,200 --> 16:23:21,320 Model zero results equals a vowel model. 10662 16:23:21,320 --> 16:23:25,120 So we're just going to call our function that we've just created here. 10663 16:23:25,120 --> 16:23:27,840 Model is going to equal model zero. 10664 16:23:27,840 --> 16:23:30,840 The data loader is going to equal what? 10665 16:23:30,840 --> 16:23:35,400 The test data loader, of course, because we want to evaluate it on the test data set. 10666 16:23:35,400 --> 16:23:40,480 And we're going to send in our loss function, which is loss function that we assigned above 10667 16:23:40,480 --> 16:23:42,680 just before our training loop. 10668 16:23:42,680 --> 16:23:49,520 If we come up here, our loss function is up here, and then if we go back down, we have 10669 16:23:49,520 --> 16:23:54,600 our accuracy function is equal to our accuracy function. 10670 16:23:54,600 --> 16:23:57,920 We just pass another function in there, beautiful. 10671 16:23:57,920 --> 16:23:59,520 And let's see if this works. 10672 16:23:59,520 --> 16:24:00,520 Model zero results. 10673 16:24:00,520 --> 16:24:04,600 Did you see any typos likely or errors in our code? 10674 16:24:04,600 --> 16:24:06,320 How do you think our model did? 10675 16:24:06,320 --> 16:24:08,840 Well, let's find out. 10676 16:24:08,840 --> 16:24:11,120 Oh, there we go. 10677 16:24:11,120 --> 16:24:12,520 We got model accuracy. 10678 16:24:12,520 --> 16:24:15,320 Can you see how we could reuse this dictionary later on? 10679 16:24:15,320 --> 16:24:20,120 So if we had model one results, model two results, we could use these dictionaries and compare 10680 16:24:20,120 --> 16:24:21,120 them all together. 10681 16:24:21,120 --> 16:24:22,280 So we've got our model name. 10682 16:24:22,280 --> 16:24:29,560 Our version zero, the model has an accuracy of 83.42 and a loss of 0.47 on the test data 10683 16:24:29,560 --> 16:24:30,560 loader. 10684 16:24:30,560 --> 16:24:32,880 Again, your numbers may be slightly different. 10685 16:24:32,880 --> 16:24:34,600 They should be in the same realm. 10686 16:24:34,600 --> 16:24:37,880 But if they're not the exact same, don't worry too much. 10687 16:24:37,880 --> 16:24:44,120 If they're 20 accuracy points less and the loss is 10 times higher, then you should probably 10688 16:24:44,120 --> 16:24:47,520 go back through your code and check if something is wrong. 10689 16:24:47,520 --> 16:24:51,360 And I believe if we wanted to do a progress bar here, could we do that? 
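Before adding the progress bar discussed next, here is a minimal sketch of the eval_model function that was just built and called on model zero. It assumes the accuracy_fn helper, loss_fn and test_dataloader created earlier in the notebook; names follow what is said in the video, so adjust them if yours differ.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def eval_model(model: nn.Module,
               data_loader: DataLoader,
               loss_fn: nn.Module,
               accuracy_fn):
    """Returns a dictionary containing the results of model predicting on data_loader."""
    loss, acc = 0, 0
    with torch.inference_mode():  # context manager for faster inference
        for X, y in data_loader:
            y_pred = model(X)  # forward pass (raw logits)
            # Accumulate loss and accuracy per batch
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y,
                               y_pred=y_pred.argmax(dim=1))  # logits -> prediction labels
        # Scale loss and acc to find the average per batch
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__,  # only works when model was created with a class
            "model_loss": loss.item(),
            "model_acc": acc}

# Calculate model_0 results on the test dataset
model_0_results = eval_model(model=model_0,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn)
model_0_results
```

Because the function returns a plain dictionary, later models (model_1, model_2 and so on) can be evaluated with the same call and their dictionaries compared side by side. For the progress bar shown in a moment, one option (an assumption, since the exact edit isn't shown on screen) is to wrap the batch loop as for X, y in tqdm(data_loader) after from tqdm.auto import tqdm.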
10690 16:24:51,360 --> 16:24:52,360 TQDM. 10691 16:24:52,360 --> 16:24:55,000 Let's have a look, eh? 10692 16:24:55,000 --> 16:24:57,760 Oh, look at that progress bar. 10693 16:24:57,760 --> 16:24:58,920 That's very nice. 10694 16:24:58,920 --> 16:25:02,200 So that's nice and quick because it's only on 313 batches. 10695 16:25:02,200 --> 16:25:04,400 It goes quite quick. 10696 16:25:04,400 --> 16:25:06,600 So now, what's next? 10697 16:25:06,600 --> 16:25:11,480 Well, we've built model one, we've got a model zero, sorry, I'm getting ahead myself. 10698 16:25:11,480 --> 16:25:12,920 We've got a baseline here. 10699 16:25:12,920 --> 16:25:15,160 We've got a way to evaluate our model. 10700 16:25:15,160 --> 16:25:16,760 What's our workflow say? 10701 16:25:16,760 --> 16:25:17,760 So we've got our data ready. 10702 16:25:17,760 --> 16:25:18,760 We've done that. 10703 16:25:18,760 --> 16:25:19,760 We've picked or built a model. 10704 16:25:19,760 --> 16:25:20,760 We've picked a loss function. 10705 16:25:20,760 --> 16:25:21,760 We've built an optimizer. 10706 16:25:21,760 --> 16:25:23,320 We've created a training loop. 10707 16:25:23,320 --> 16:25:24,800 We've fit the model to the data. 10708 16:25:24,800 --> 16:25:26,000 We've made a prediction. 10709 16:25:26,000 --> 16:25:29,600 We've evaluated the model using loss and accuracy. 10710 16:25:29,600 --> 16:25:34,600 We could evaluate it by making some predictions, but we'll save that for later on as in visualizing 10711 16:25:34,600 --> 16:25:36,360 some predictions. 10712 16:25:36,360 --> 16:25:39,720 I think we're up to improving through experimentation. 10713 16:25:39,720 --> 16:25:41,280 So let's give that a go, hey? 10714 16:25:41,280 --> 16:25:45,720 Do you recall that we trained model zero on the CPU? 10715 16:25:45,720 --> 16:25:50,040 How about we build model one and start to train it on the GPU? 10716 16:25:50,040 --> 16:25:55,880 So in the next section, let's create number five, is set up device agnostic code. 10717 16:25:55,880 --> 16:26:01,840 So we've done this one together for using a GPU if there is one. 10718 16:26:01,840 --> 16:26:07,080 So my challenge to you for the next video is to set up some device agnostic code. 10719 16:26:07,080 --> 16:26:11,600 So you might have to go into CoLab if you haven't got a GPU active, change runtime type 10720 16:26:11,600 --> 16:26:16,120 to GPU, and then because it might restart the runtime, you might have to rerun all of 10721 16:26:16,120 --> 16:26:20,840 the cells above so that we get our helper functions file back and the data and whatnot. 10722 16:26:20,840 --> 16:26:27,840 So set up some device agnostic code and I'll see you in the next video. 10723 16:26:27,840 --> 16:26:28,840 How'd you go? 10724 16:26:28,840 --> 16:26:31,920 You should give it a shot, did you set up some device agnostic code? 10725 16:26:31,920 --> 16:26:34,360 I hope you gave it a go, but let's do it together. 10726 16:26:34,360 --> 16:26:35,360 This won't take too long. 10727 16:26:35,360 --> 16:26:38,680 The last two videos have been quite long. 10728 16:26:38,680 --> 16:26:43,800 So if I wanted to set device agnostic code, I want to see if I have a GPU available, do 10729 16:26:43,800 --> 16:26:44,800 I? 10730 16:26:44,800 --> 16:26:47,080 I can check it from the video SMI. 10731 16:26:47,080 --> 16:26:50,640 That fails because I haven't activated a GPU in CoLab yet. 10732 16:26:50,640 --> 16:26:55,080 I can also check here, torch CUDA is available. 
10733 16:26:55,080 --> 16:27:00,920 That will PyTorch will check if there's a GPU available with CUDA and it's not. 10734 16:27:00,920 --> 16:27:06,040 So let's fix these two because we want to start using a GPU and we want to set up device 10735 16:27:06,040 --> 16:27:07,040 agnostic code. 10736 16:27:07,040 --> 16:27:12,600 So no matter what hardware our system is running, PyTorch leverages it. 10737 16:27:12,600 --> 16:27:18,120 So we're going to select GPU here, I'm going to click save and you'll notice that our Google 10738 16:27:18,120 --> 16:27:22,360 CoLab notebook will start to reset and we'll start to connect. 10739 16:27:22,360 --> 16:27:23,360 There we go. 10740 16:27:23,360 --> 16:27:28,480 We've got a GPU on the back end, Python, three Google Compute Engine back end GPU. 10741 16:27:28,480 --> 16:27:31,800 Do we have to reset this? 10742 16:27:31,800 --> 16:27:39,120 NVIDIA SMI, wonderful, I have a Tesla T4 GPU with 16 gigabytes of memory, that is wonderful. 10743 16:27:39,120 --> 16:27:40,920 And now do we have a GPU available? 10744 16:27:40,920 --> 16:27:43,480 Oh, torch is not defined. 10745 16:27:43,480 --> 16:27:46,320 Well, do you notice the numbers of these cells? 10746 16:27:46,320 --> 16:27:52,400 One, two, that means because we've reset our runtime to have a GPU, we have to rerun 10747 16:27:52,400 --> 16:27:54,000 all the cells above. 10748 16:27:54,000 --> 16:27:58,160 So we can go run before, that's going to run all the cells above, make sure that we download 10749 16:27:58,160 --> 16:28:03,680 the data, make sure that we download the helper functions file, we go back up, we should see 10750 16:28:03,680 --> 16:28:05,640 our data may be downloading. 10751 16:28:05,640 --> 16:28:07,200 It shouldn't take too long. 10752 16:28:07,200 --> 16:28:12,280 That is another advantage of using a relatively small data set that is already saved on PyTorch 10753 16:28:12,280 --> 16:28:14,240 data sets. 10754 16:28:14,240 --> 16:28:17,600 Just keep in mind that if you use a larger data set and you have to re-download it into 10755 16:28:17,600 --> 16:28:22,800 Google Colab, it may take a while to run, and if you build bigger models, they may take 10756 16:28:22,800 --> 16:28:23,800 a while to run. 10757 16:28:23,800 --> 16:28:27,960 So just keep that in mind for your experiments going forward, start small, increase when 10758 16:28:27,960 --> 16:28:28,960 necessary. 10759 16:28:28,960 --> 16:28:34,520 So we'll re-run this, we'll re-run this, and finally we're going to, oh, there we go, 10760 16:28:34,520 --> 16:28:41,400 we've got a GPU, wonderful, but we'll write some device-agnostic code here, set up device-agnostic 10761 16:28:41,400 --> 16:28:42,720 code. 10762 16:28:42,720 --> 16:28:48,840 So import-torch, now realistically you quite often do this at the start of every notebook, 10763 16:28:48,840 --> 16:28:52,880 but I just wanted to highlight how we might do it if we're in the middle, and I wanted 10764 16:28:52,880 --> 16:28:58,880 to practice running a model on a CPU only before stepping things up and going to a GPU. 10765 16:28:58,880 --> 16:29:06,840 So device equals CUDA, this is for our device-agnostic code, if torch dot CUDA is available, and it 10766 16:29:06,840 --> 16:29:11,400 looks like this is going to return true, else use the CPU. 10767 16:29:11,400 --> 16:29:16,840 And then we're going to check device, wonderful, CUDA. 
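For reference, the device-agnostic setup just written is the standard pattern used throughout the course:

```python
import torch

# Setup device-agnostic code: use the GPU (CUDA) if it's available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
device
```

Running !nvidia-smi in a Colab cell, as done above, shows which GPU (if any) is attached to the runtime.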
10768 16:29:16,840 --> 16:29:20,840 So we've got some device-agnostic code ready to go, I think it's time we built another 10769 16:29:20,840 --> 16:29:22,280 model. 10770 16:29:22,280 --> 16:29:26,000 And I asked the question before, do you think that the data set that we're working with 10771 16:29:26,000 --> 16:29:28,600 requires nonlinearity? 10772 16:29:28,600 --> 16:29:34,840 So the shirts, and the bags, and the shoes, do we need nonlinear functions to model this? 10773 16:29:34,840 --> 16:29:40,840 Well it looks like our baseline model without nonlinearities did pretty well at modeling 10774 16:29:40,840 --> 16:29:47,680 our data, so we've got a pretty good test accuracy value, so 83%, so out of 100 images 10775 16:29:47,680 --> 16:29:53,320 it predicts the right one, 83% of the time, 83 times out of 100, it did pretty well without 10776 16:29:53,320 --> 16:29:55,400 nonlinearities. 10777 16:29:55,400 --> 16:30:00,400 Why don't we try a model that uses nonlinearities and it runs on the GPU? 10778 16:30:00,400 --> 16:30:04,640 So you might want to give that a go, see if you can create a model with nonlinear functions, 10779 16:30:04,640 --> 16:30:11,200 try nn.relu, run it on the GPU, and see how it goes, otherwise we'll do it together in 10780 16:30:11,200 --> 16:30:15,040 the next video, I'll see you there. 10781 16:30:15,040 --> 16:30:20,240 Hello everyone, and welcome back, we are making some terrific progress, let's see how far 10782 16:30:20,240 --> 16:30:24,640 we've come, we've got a data set, we've prepared our data loaders, we've built a baseline model, 10783 16:30:24,640 --> 16:30:30,160 and we've trained it, evaluated it, now it's time, oh, and the last video we set up device 10784 16:30:30,160 --> 16:30:37,080 diagnostic code, but where are we in our little framework, we're up to improving through experimentation, 10785 16:30:37,080 --> 16:30:40,720 and quite often that is building a different model and trying it out, it could be using 10786 16:30:40,720 --> 16:30:44,120 more data, it could be tweaking a whole bunch of different things. 
10787 16:30:44,120 --> 16:30:49,840 So let's get into some coding, I'm going to write it here, model one, I believe we're 10788 16:30:49,840 --> 16:30:56,520 up to section six now, model one is going to be building a better model with nonlinearity, 10789 16:30:56,520 --> 16:31:00,960 so I asked you to do the challenge in the last video to give it a go, to try and build 10790 16:31:00,960 --> 16:31:05,600 a model with nonlinearity, I hope you gave it a go, because if anything that this course, 10791 16:31:05,600 --> 16:31:09,680 I'm trying to impart on you in this course, it's to give things a go, to try things out 10792 16:31:09,680 --> 16:31:13,960 because that's what machine learning and coding is all about, trying things out, giving it 10793 16:31:13,960 --> 16:31:22,320 a go, but let's write down here, we learned about the power of nonlinearity in notebook 10794 16:31:22,320 --> 16:31:31,160 O2, so if we go to the learnpytorch.io book, we go to section number two, we'll just wait 10795 16:31:31,160 --> 16:31:36,920 for this to load, and then if we come down here, we can search for nonlinearity, the missing 10796 16:31:36,920 --> 16:31:41,920 piece nonlinearity, so I'm going to get this and just copy that in there, if you want to 10797 16:31:41,920 --> 16:31:47,040 see what nonlinearity helps us do, it helps us model nonlinear data, and in the case of 10798 16:31:47,040 --> 16:31:51,960 a circle, can we model that with straight lines, in other words, linear lines? 10799 16:31:51,960 --> 16:31:57,320 All linear means straight, nonlinear means non-straight, and so we learned that through 10800 16:31:57,320 --> 16:32:02,360 the power of linear and nonlinear functions, neural networks can model almost any kind 10801 16:32:02,360 --> 16:32:08,280 of data if we pair them in the right way, so you can go back through and read that there, 10802 16:32:08,280 --> 16:32:15,600 but I prefer to code things out and try it out on our data, so let's create a model with 10803 16:32:15,600 --> 16:32:24,960 nonlinear and linear layers, but we also saw that our model with just linear layers can 10804 16:32:24,960 --> 16:32:29,680 model our data, it's performing quite well, so that's where the experimentation side of 10805 16:32:29,680 --> 16:32:34,440 things will come into play, sometimes you won't know what a model will do, whether it 10806 16:32:34,440 --> 16:32:39,320 will work or won't work on your data set, but that is where we try different things 10807 16:32:39,320 --> 16:32:45,360 out, so we come up here, we look at our data, hmm, that looks actually quite linear to 10808 16:32:45,360 --> 16:32:49,400 me as a bag, like it's just some straight lines, you could maybe model that with just 10809 16:32:49,400 --> 16:32:54,680 straight lines, but there are some things which you could potentially classify as nonlinear 10810 16:32:54,680 --> 16:33:00,320 in here, it's hard to tell without knowing, so let's give it a go, let's write a nonlinear 10811 16:33:00,320 --> 16:33:06,800 model which is going to be quite similar to model zero here, except we're going to interspurse 10812 16:33:06,800 --> 16:33:13,080 some relu layers in between our linear layers, so recall that relu is a nonlinear activation 10813 16:33:13,080 --> 16:33:19,520 function, and relu has the formula, if something comes in and it's a negative value, relu is 10814 16:33:19,520 --> 16:33:23,800 going to turn that negative into a zero, and if something is positive, relu is just going 10815 16:33:23,800 --> 16:33:32,760 to leave it there, so 
let's create another class here, fashion MNIST model V1, and we're 10816 16:33:32,760 --> 16:33:39,080 going to subclass from nn.module, beautiful, and then we're going to initialize our model, 10817 16:33:39,080 --> 16:33:45,280 it's going to be quite the same as what we created before, we want an input shape, that's 10818 16:33:45,280 --> 16:33:50,120 going to be an integer, and then we want a number of hidden units, and that's going 10819 16:33:50,120 --> 16:33:57,560 to be an int here, and then we want an output shape, int, and I want to stress as well that 10820 16:33:57,560 --> 16:34:03,840 although we're creating a class here with these inputs, classes are as flexible as functions, 10821 16:34:03,840 --> 16:34:08,000 so if you need different use cases for your modeling classes, just keep that in mind that 10822 16:34:08,000 --> 16:34:14,680 you can build that functionality in, self dot layer stack, we're going to spell layer stack 10823 16:34:14,680 --> 16:34:21,200 correctly, and we're going to set this equal to nn dot sequential, because we just want 10824 16:34:21,200 --> 16:34:26,680 a sequential set of layers, the first one's going to be nn dot flatten, which is going 10825 16:34:26,680 --> 16:34:36,480 to be flatten inputs into a single vector, and then we're going to go nn dot linear, 10826 16:34:36,480 --> 16:34:39,720 because we want to flatten our stuff because we want it to be the right shape, if we don't 10827 16:34:39,720 --> 16:34:46,760 flatten it, we get shape issues, input shape, and then the out features of our linear layer 10828 16:34:46,760 --> 16:34:53,040 is going to be the hidden units, hidden units, I'm just going to make some code cells here 10829 16:34:53,040 --> 16:34:58,960 so that my code goes into the middle of the screen, then here is where we're going to 10830 16:34:58,960 --> 16:35:03,720 add a nonlinear layer, so this is where we're going to add in a relu function, and where 10831 16:35:03,720 --> 16:35:08,120 might we put these? Well, generally, you'll have a linear function followed by a nonlinear 10832 16:35:08,120 --> 16:35:13,800 function in the construction of neural networks. However, neural networks are as customizable 10833 16:35:13,800 --> 16:35:19,800 as you can imagine, whether they work or not is a different question. So we'll go output 10834 16:35:19,800 --> 16:35:25,360 shape here, as the out features, oh, do we miss this one up? Yes, we did. This needs 10835 16:35:25,360 --> 16:35:33,360 to be hidden units. And why is that? Well, it's because the output shape of this linear 10836 16:35:33,360 --> 16:35:38,000 layer here needs to match up with the input shape of this linear layer here. The relu 10837 16:35:38,000 --> 16:35:42,240 layer won't change the shape of our data. And you could test that out by printing the 10838 16:35:42,240 --> 16:35:47,680 different shapes if you'd like. And then we're going to finish off with another nonlinear 10839 16:35:47,680 --> 16:35:54,560 layer at the end. Relu. Now, do you think that this will improve our model's results 10840 16:35:54,560 --> 16:35:59,800 or not? Well, it's hard to tell without trying it out, right? So let's continue building 10841 16:35:59,800 --> 16:36:05,360 our model. We have to override the forward method. Self X is going to be, we'll give 10842 16:36:05,360 --> 16:36:09,360 a type in here, this is going to be a torch tensor as the input. And then we're just going 10843 16:36:09,360 --> 16:36:16,920 to return what's happening here, we go self dot layer stack X. 
So that just means that 10844 16:36:16,920 --> 16:36:20,680 X is going to pass through our layer stack here. And we could customize this, we could 10845 16:36:20,680 --> 16:36:26,720 try it just with one nonlinear activation. This is actually our previous network, just 10846 16:36:26,720 --> 16:36:31,640 with those commented out. All we've done is added in two relu functions. And so I'm 10847 16:36:31,640 --> 16:36:38,040 going to run that beautiful. And so what should we do next? Well, we shouldn't stand 10848 16:36:38,040 --> 16:36:46,240 shaded but previously we ran our last model model zero on if we go parameters. Do we run 10849 16:36:46,240 --> 16:36:54,680 this on the GPU or the CPU? On the CPU. So how about we try out our fashion MNIST model 10850 16:36:54,680 --> 16:37:00,920 or V one running on the device that we just set up which should be CUDA. Wonderful. So 10851 16:37:00,920 --> 16:37:09,160 we can instantiate. So create an instance of model one. So we want model one or actually 10852 16:37:09,160 --> 16:37:14,480 we'll set up a manual seed here so that whenever we create a new instance of a model, it's 10853 16:37:14,480 --> 16:37:18,520 going to be instantiated with random numbers. We don't necessarily have to set a random 10854 16:37:18,520 --> 16:37:25,240 seed, but we do so anyway so that our values are quite similar on your end and my end input 10855 16:37:25,240 --> 16:37:32,200 shape is going to be 784. Where does that come from? Well, that's because this is the 10856 16:37:32,200 --> 16:37:42,800 output of the flatten layer after our 28 by 28 image goes in. Then we're going to set 10857 16:37:42,800 --> 16:37:45,720 up the hidden units. We're going to use the same number of hidden units as before, which 10858 16:37:45,720 --> 16:37:51,440 is going to be 10. And then the output shape is what? We need one value, one output neuron 10859 16:37:51,440 --> 16:37:56,200 for each of our classes. So length of the class names. And then we're going to send 10860 16:37:56,200 --> 16:38:03,320 this to the target device so we can write send to the GPU if it's available. So now 10861 16:38:03,320 --> 16:38:08,040 that we've set up device agnostic code in the last video, we can just put two device 10862 16:38:08,040 --> 16:38:16,720 instead of hard coding that. And so if we check, so this was the output for model zero's device, 10863 16:38:16,720 --> 16:38:23,080 let's now check model one's device, model one parameters, and we can check where those 10864 16:38:23,080 --> 16:38:31,960 parameters live by using the device attribute. Beautiful. So our model one is now living 10865 16:38:31,960 --> 16:38:37,320 on the GPU CUDA at index zero. Index zero means that it's on the first GPU that we have 10866 16:38:37,320 --> 16:38:44,680 available. We only have one GPU available. So it's on this Tesla T for GPU. Now, we've 10867 16:38:44,680 --> 16:38:49,080 got a couple more things to do. Now that we've created another model, we can recreate if 10868 16:38:49,080 --> 16:38:53,360 we go back to our workflow, we've just built a model here. What do we have to do after 10869 16:38:53,360 --> 16:38:58,160 we built a model? We have to instantiate a loss function and an optimizer. Now we've 10870 16:38:58,160 --> 16:39:02,120 done both of those things for model zero. So that's what we're going to do in the next 10871 16:39:02,120 --> 16:39:07,040 video. But I'd like you to go ahead and try to create a loss function for our model and 10872 16:39:07,040 --> 16:39:11,920 optimizer for model one. 
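Before tackling that challenge, here is a sketch of the nonlinear model and its instantiation on the target device, pulling the last few minutes of code together. It assumes class_names and device already exist from earlier cells; the hidden unit count and shapes follow the video.

```python
import torch
from torch import nn

class FashionMNISTModelV1(nn.Module):
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # flatten inputs into a single vector (avoids shape issues)
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.ReLU(),  # nonlinear activation after the linear layer
            nn.Linear(in_features=hidden_units, out_features=output_shape),
            nn.ReLU()
        )

    def forward(self, x: torch.Tensor):
        return self.layer_stack(x)  # x passes through the layer stack

# Create an instance of model_1 and send it to the target device
torch.manual_seed(42)
model_1 = FashionMNISTModelV1(input_shape=784,  # 28*28 pixels after flattening
                              hidden_units=10,  # same number of hidden units as model_0
                              output_shape=len(class_names)  # one output neuron per class
                              ).to(device)
next(model_1.parameters()).device  # check which device the parameters live on
```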
The hint is that they can be the exact same loss function and 10873 16:39:11,920 --> 16:39:19,360 optimizer as model zero. So give that a shot and I'll see you in the next video. Welcome 10874 16:39:19,360 --> 16:39:24,600 back. In the last video, we created another model. So we're continuing with our modeling 10875 16:39:24,600 --> 16:39:29,560 experiments. And the only difference here between fashion MNIST model V1 and V0 is that 10876 16:39:29,560 --> 16:39:35,160 we've added in nonlinear layers. Now we don't know for now we could think or guess whether 10877 16:39:35,160 --> 16:39:39,520 they would help improve our model. And with practice, you can start to understand how 10878 16:39:39,520 --> 16:39:44,160 different functions will influence your neural networks. But I prefer to, if in doubt, code 10879 16:39:44,160 --> 16:39:49,960 it out, run lots of different experiments. So let's continue. We now have to create 10880 16:39:49,960 --> 16:39:58,240 a loss function, loss, optimizer, and evaluation metrics. So we've done this for model zero. 10881 16:39:58,240 --> 16:40:01,920 So we're not going to spend too much time explaining what's going on here. And we've 10882 16:40:01,920 --> 16:40:06,560 done this a fair few times now. So from helper functions, which is the script we downloaded 10883 16:40:06,560 --> 16:40:11,560 before, we're going to import our accuracy function. And we're going to set up a loss 10884 16:40:11,560 --> 16:40:16,360 function, which is we're working with multi class classification. So what loss function 10885 16:40:16,360 --> 16:40:25,040 do we typically use? And then dot cross entropy loss. And as our optimizer is going to be 10886 16:40:25,040 --> 16:40:31,280 torch dot opt in dot SGD. And we're going to optimize this time. I'll put in the params 10887 16:40:31,280 --> 16:40:37,880 keyword here, model one dot parameters. And the learning rate, we're just going to keep 10888 16:40:37,880 --> 16:40:42,400 it the same as our previous model. And that's a thing to keep a note for your experiments. 10889 16:40:42,400 --> 16:40:46,560 When you're running fair few experiments, you only really want to tweak a couple of things 10890 16:40:46,560 --> 16:40:51,120 or maybe just one thing per experiment, that way you can really narrow down what actually 10891 16:40:51,120 --> 16:40:55,920 influences your model and what improves it slash what doesn't improve it. And a little 10892 16:40:55,920 --> 16:41:05,560 pop quiz. What does a loss function do? This is going to measure how wrong our model is. 10893 16:41:05,560 --> 16:41:15,360 And what does the optimizer do? Tries to update our models parameters to reduce the 10894 16:41:15,360 --> 16:41:21,360 loss. So that's what these two functions are going to be doing. The accuracy function is 10895 16:41:21,360 --> 16:41:26,440 of a course going to be measuring our models accuracy. We measure the accuracy because that's 10896 16:41:26,440 --> 16:41:33,560 one of the base classification metrics. So we'll run this. Now what's next? We're getting 10897 16:41:33,560 --> 16:41:38,080 quite good at this. We've picked a loss function and an optimizer. Now we're going to build 10898 16:41:38,080 --> 16:41:43,960 a training loop. However, we spent quite a bit of time doing that in a previous video. 10899 16:41:43,960 --> 16:41:49,000 If we go up here, that was our vowel model function. Oh, that was helpful. We turned it 10900 16:41:49,000 --> 16:41:55,400 into a function. How about we do the same with these? 
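Before moving on to functionizing the loops, the loss function, optimizer and accuracy metric setup just described for model_1 would look something like the sketch below. The learning rate is assumed to match the earlier model's (0.1 in the course notebook), so double-check it against your own model_0 setup.

```python
import torch
from torch import nn

from helper_functions import accuracy_fn  # helper script downloaded earlier in the course

loss_fn = nn.CrossEntropyLoss()  # multi-class classification -> cross entropy loss
optimizer = torch.optim.SGD(params=model_1.parameters(),
                            lr=0.1)  # assumed: same learning rate as the previous model
```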
Why don't we make a function for 10901 16:41:55,400 --> 16:42:03,640 our training loop as well as our testing loop? So I think you can give this a go. We're going 10902 16:42:03,640 --> 16:42:09,240 to make a function in the next video for training. We're going to call that train step. And 10903 16:42:09,240 --> 16:42:14,120 we'll create a function for testing called test step. Now they'll both have to take in 10904 16:42:14,120 --> 16:42:18,360 some parameters. I'll let you figure out what they are. But otherwise, we're going to code 10905 16:42:18,360 --> 16:42:24,040 that up together in the next video. So I'll see you there. 10906 16:42:24,040 --> 16:42:28,800 So we've got a loss function ready and an optimizer. What's our next step? Well, it's 10907 16:42:28,800 --> 16:42:32,720 to create training and evaluation loops. So let's make a heading here. We're going to 10908 16:42:32,720 --> 16:42:40,680 call this functionizing training and evaluation or slash testing loops because we've written 10909 16:42:40,680 --> 16:42:48,160 similar code quite often for training and evaluating slash testing our models. Now we're 10910 16:42:48,160 --> 16:42:52,840 going to start moving towards functionizing code that we've written before because that's 10911 16:42:52,840 --> 16:42:56,640 not only a best practice, it helps reduce errors because if you're writing a training 10912 16:42:56,640 --> 16:43:01,160 loop all the time, we may get it wrong. If we've got one that works for our particular 10913 16:43:01,160 --> 16:43:05,560 problem, hey, we might as well save that as a function so we can continually call that 10914 16:43:05,560 --> 16:43:11,240 over and over and over again. So how about we, and this is going to be very rare that 10915 16:43:11,240 --> 16:43:15,920 I'm going to allow you to do this is that is we're going to copy this training and you 10916 16:43:15,920 --> 16:43:22,400 might have already attempted to create this. That is the function called, let's create 10917 16:43:22,400 --> 16:43:34,640 a function for one training loop. And we're going to call this train step. And we're going 10918 16:43:34,640 --> 16:43:39,880 to create a function for the testing loop. You're going to call this test step. Now these 10919 16:43:39,880 --> 16:43:44,280 are just what I'm calling them. You can call them whatever you want. I just understand 10920 16:43:44,280 --> 16:43:50,520 it quite easily by calling it train step. And then we can for each epoch in a range, 10921 16:43:50,520 --> 16:43:55,480 we call our training step. And then the same thing for each epoch in a range, we can call 10922 16:43:55,480 --> 16:44:01,080 a testing step. This will make a lot more sense once we've coded it out. So let's put 10923 16:44:01,080 --> 16:44:07,840 the training code here. To functionize this, let's start it off with train step. Now what 10924 16:44:07,840 --> 16:44:12,240 parameters should our train step function take in? Well, let's think about this. We 10925 16:44:12,240 --> 16:44:21,840 need a model. We need a data loader. We need a loss function. And we need an optimizer. 10926 16:44:21,840 --> 16:44:28,640 We could also put in an accuracy function here if we wanted to. And potentially it's 10927 16:44:28,640 --> 16:44:33,960 not here, but we could put in what target device we'd like to compute on and make our 10928 16:44:33,960 --> 16:44:38,560 code device agnostic. So this is just the exact same code we went through before. We 10929 16:44:38,560 --> 16:44:43,480 loop through a data loader. 
We do the forward pass. We calculate the loss. We accumulate 10930 16:44:43,480 --> 16:44:49,960 it. We zero the optimizer. We perform backpropagation in respect to the loss with the parameters 10931 16:44:49,960 --> 16:44:54,520 of the model. And then we step the optimizer to hopefully improve the parameters of our 10932 16:44:54,520 --> 16:45:00,960 model to better predict the data that we're trying to predict. So let's craft a train 10933 16:45:00,960 --> 16:45:08,520 step function here. We'll take a model, which is going to be torch nn.module, type hint. 10934 16:45:08,520 --> 16:45:16,120 And we're going to put in a data loader, which is going to be of type torch utils dot data 10935 16:45:16,120 --> 16:45:21,000 dot data loader. Now we don't necessarily need to put this in these type hints, but 10936 16:45:21,000 --> 16:45:24,520 they're relatively new addition to Python. And so you might start to see them more and 10937 16:45:24,520 --> 16:45:30,440 more. And it also just helps people understand what your code is expecting. So the loss 10938 16:45:30,440 --> 16:45:38,080 function, we're going to put in an optimizer torch dot opt in, which is a type optimizer. 10939 16:45:38,080 --> 16:45:42,200 We also want an accuracy function. We don't necessarily need this either. These are a 10940 16:45:42,200 --> 16:45:47,920 lot of nice to habs. The first four are probably the most important. And then the device. So 10941 16:45:47,920 --> 16:45:55,640 torch is going to be torch dot device equals device. So we'll just hard code that to be 10942 16:45:55,640 --> 16:46:04,560 our already set device parameter. And we'll just write in here, performs training step 10943 16:46:04,560 --> 16:46:15,360 with model, trying to learn on data loader. Nice and simple, we could make that more 10944 16:46:15,360 --> 16:46:20,400 explanatory if we wanted to, but we'll leave it at that for now. And so right at the start, 10945 16:46:20,400 --> 16:46:25,800 we're going to set up train loss and train act equals zero zero. We're going to introduce 10946 16:46:25,800 --> 16:46:30,680 accuracy here. So we can get rid of this. Let's just go through this line by line. What 10947 16:46:30,680 --> 16:46:37,400 do we need to do here? Well, we've got four batch XY in enumerate train data loader. But 10948 16:46:37,400 --> 16:46:42,640 we're going to change that to data loader up here. So we can just change this to data 10949 16:46:42,640 --> 16:46:50,480 loader. Wonderful. And now we've got model zero dot train. Do we want that? Well, no, 10950 16:46:50,480 --> 16:46:54,000 because we're going to keep this model agnostic, we want to be able to use any model with this 10951 16:46:54,000 --> 16:46:59,520 function. So let's get rid of this model dot train. We are missing one step here is 10952 16:46:59,520 --> 16:47:10,600 put data on target device. And we could actually put this model dot train up here. Put model 10953 16:47:10,600 --> 16:47:15,680 into training mode. Now, this will be the default for the model. But just in case we're 10954 16:47:15,680 --> 16:47:21,320 going to call it anyway, model dot train, put data on the target device. So we're going 10955 16:47:21,320 --> 16:47:31,680 to go XY equals X dot two device, Y dot two device. Wonderful. And the forward pass, we 10956 16:47:31,680 --> 16:47:36,760 don't need to use model zero anymore. We're just going to use model that's up here. The 10957 16:47:36,760 --> 16:47:42,200 loss function can stay the same because we're passing in a loss function up there. 
The train 10958 16:47:42,200 --> 16:47:48,400 loss can be accumulated. That's fine. But we might also accumulate now the train accuracy, 10959 16:47:48,400 --> 16:47:57,960 limit loss, and accuracy per batch. So train act equals or plus equals our accuracy function 10960 16:47:57,960 --> 16:48:08,240 on Y true equals Y and Y pred equals Y pred. So the outputs here, Y pred, we need to take 10961 16:48:08,240 --> 16:48:14,960 because the raw outputs, outputs, the raw logits from the model, because our accuracy 10962 16:48:14,960 --> 16:48:20,480 function expects our predictions to be in the same format as our true values. We need 10963 16:48:20,480 --> 16:48:24,960 to make sure that they are we can call the argmax here on the first dimension. This is 10964 16:48:24,960 --> 16:48:33,560 going to go from logits to prediction labels. We can keep the optimizer zero grab the same 10965 16:48:33,560 --> 16:48:37,600 because we're passing in an optimizer up here. We can keep the loss backwards because the 10966 16:48:37,600 --> 16:48:43,920 loss is just calculated there. We can keep optimizer step. And we could print out what's 10967 16:48:43,920 --> 16:48:50,520 happening. But we might change this up a little bit. We need to divide the total train loss 10968 16:48:50,520 --> 16:48:55,560 and accuracy. I just want to type in accuracy here because now we've added in accuracy metric 10969 16:48:55,560 --> 16:49:04,160 act. So train act divided equals length train data loader. Oh, no, sorry. We can just use 10970 16:49:04,160 --> 16:49:13,240 the data loader here, data loader, data loader. And we're not going to print out per batch 10971 16:49:13,240 --> 16:49:18,200 here. I'm just going to get rid of this. We'll make at the end of this step, we will make 10972 16:49:18,200 --> 16:49:23,760 our print out here, print. Notice how it's at the end of the step because we're outside 10973 16:49:23,760 --> 16:49:30,800 the for loop now. So we're going to here, we're accumulating the loss on the training 10974 16:49:30,800 --> 16:49:35,520 data set and the accuracy on the training data set per batch. And then we're finding 10975 16:49:35,520 --> 16:49:39,640 out at the end of the training steps. So after it's been through all the batches in 10976 16:49:39,640 --> 16:49:45,000 the data loader, we're finding out what the average loss is per batch. And the average 10977 16:49:45,000 --> 16:49:53,120 accuracy is per batch. And now we're going to go train loss is going to be the train 10978 16:49:53,120 --> 16:50:07,680 loss on 0.5. And then we're going to go train act is going to be train act. And we're going 10979 16:50:07,680 --> 16:50:20,160 to set that to 0.2 F. Get that there, percentage. Wonderful. So if all this works, we should 10980 16:50:20,160 --> 16:50:25,400 be able to call our train step function and pass it in a model, a data loader, a loss 10981 16:50:25,400 --> 16:50:30,760 function, an optimizer, an accuracy function and a device. And it should automatically 10982 16:50:30,760 --> 16:50:34,960 do all of these steps. So we're going to find that out in a later video. In the next video, 10983 16:50:34,960 --> 16:50:39,320 we're going to do the same thing we've just done for the training loop with the test step. 10984 16:50:39,320 --> 16:50:43,400 But here's your challenge for this video is to go up to the testing loop code we wrote 10985 16:50:43,400 --> 16:50:49,600 before and try to recreate the test step function in the same format that we've done here. 
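For reference before you attempt that challenge, here is the train_step function as it has just been assembled (a sketch; argument names follow the video, and device is the variable set up earlier):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_step(model: nn.Module,
               data_loader: DataLoader,
               loss_fn: nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    """Performs a training step with model trying to learn on data_loader."""
    train_loss, train_acc = 0, 0
    model.train()  # put model into training mode (the default, but call it anyway)
    for batch, (X, y) in enumerate(data_loader):
        # Put data on the target device
        X, y = X.to(device), y.to(device)

        # 1. Forward pass (model outputs raw logits)
        y_pred = model(X)

        # 2. Calculate loss and accuracy (accumulated per batch)
        loss = loss_fn(y_pred, y)
        train_loss += loss
        train_acc += accuracy_fn(y_true=y,
                                 y_pred=y_pred.argmax(dim=1))  # logits -> prediction labels

        # 3. Optimizer zero grad
        optimizer.zero_grad()

        # 4. Loss backward
        loss.backward()

        # 5. Optimizer step
        optimizer.step()

    # Divide total loss and accuracy by the number of batches (average per batch)
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train acc: {train_acc:.2f}%")
```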
So 10986 16:50:49,600 --> 16:50:56,080 give that a go. And I'll see you in the next video. Welcome back. In the last video, we 10987 16:50:56,080 --> 16:51:01,480 functionalized our training loop. So now we can call this train step function. And instead 10988 16:51:01,480 --> 16:51:06,120 of writing all this training loop code again, well, we can train our model through the art 10989 16:51:06,120 --> 16:51:11,400 of a function. Now let's do the same for our testing loop. So I issued you the challenge 10990 16:51:11,400 --> 16:51:15,680 in the last video to give it a go. I hope you did because the best way to practice 10991 16:51:15,680 --> 16:51:20,160 PyTorch code is to write more PyTorch code. Let's put in a model, which is going to be 10992 16:51:20,160 --> 16:51:28,720 torch dot nn dot module. And we're going to put in a data loader. Because we need a 10993 16:51:28,720 --> 16:51:33,360 model and we need data, the data loader is going to be, of course, the test data loader 10994 16:51:33,360 --> 16:51:38,760 here, torch dot utils dot data dot data loader. And then we're going to put in a loss function, 10995 16:51:38,760 --> 16:51:44,960 which is going to be torch dot nn dot module as well. Because we're going to use an nn 10996 16:51:44,960 --> 16:51:49,680 dot cross entropy loss. We'll see that later on. We're going to put in an accuracy function. 10997 16:51:49,680 --> 16:51:53,440 We don't need an optimizer because we're not doing any optimization in the testing loop. 10998 16:51:53,440 --> 16:51:58,280 We're just evaluating. And the device can be torch dot device. And we're going to set 10999 16:51:58,280 --> 16:52:04,600 that as a default to the target device parameter. Beautiful. So we'll put a little docstring 11000 16:52:04,600 --> 16:52:18,520 here. So performs a testing loop step on model going over data loader. Wonderful. So now 11001 16:52:18,520 --> 16:52:23,560 let's set up a test loss and a test accuracy, because we'll measure test loss and accuracy 11002 16:52:23,560 --> 16:52:28,800 with our testing loop function. And we're going to set the model into, I'll just put a comment 11003 16:52:28,800 --> 16:52:38,960 here, put the model in eval mode. So model dot eval, we don't have to use the underscore 11004 16:52:38,960 --> 16:52:44,960 zero here as in model zero, because we have a model coming in the top here. Now, what should we 11005 16:52:44,960 --> 16:52:50,680 do? Well, because we're performing a test step, we should turn on inference mode. So 11006 16:52:50,680 --> 16:52:57,840 turn on inference mode, inference mode context manager. Remember, whenever you're performing 11007 16:52:57,840 --> 16:53:02,920 predictions with your model, you should put it in model dot eval. And if you want as 11008 16:53:02,920 --> 16:53:07,320 many speedups as you can get, make sure the predictions are done within the inference 11009 16:53:07,320 --> 16:53:12,000 mode. Because remember, inference is another word for predictions, within the inference 11010 16:53:12,000 --> 16:53:18,120 mode context manager. So we're going to loop through our data loader for X and Y in data 11011 16:53:18,120 --> 16:53:24,200 loader. We don't have to specify that this is X test. For Y test, we could if we wanted 11012 16:53:24,200 --> 16:53:31,220 to. But because we're in another function here, we can just go for X, Y in data loader, 11013 16:53:31,220 --> 16:53:40,520 we can do the forward pass.
After we send the data to the target device, target device, 11014 16:53:40,520 --> 16:53:48,040 so we're going to have X, Y equals X dot two device. And the same thing with Y, we're 11015 16:53:48,040 --> 16:53:53,200 just doing best practice here, creating device agnostic code. Then what should we do? Well, 11016 16:53:53,200 --> 16:53:56,520 we should do the thing that I said before, which is the forward pass. Now that our data 11017 16:53:56,520 --> 16:54:02,320 and model be on the same device, we can create a variable here test pred equals model, we're 11018 16:54:02,320 --> 16:54:09,000 going to pass in X. And then what do we do? We can calculate the loss. So to calculate 11019 16:54:09,000 --> 16:54:17,800 the loss slash accuracy, we're going to accumulate it per batch. So we'll set up test loss equals 11020 16:54:17,800 --> 16:54:25,480 loss function. Oh, plus equals loss function. We're going to pass it in test pred and Y, 11021 16:54:25,480 --> 16:54:30,800 which is our truth label. And then the test act where you will accumulate as well, using 11022 16:54:30,800 --> 16:54:36,560 our accuracy function, we'll pass in Y true equals Y. And then Y pred, what do we have 11023 16:54:36,560 --> 16:54:43,960 to do to Y pred? Well, our test pred, we have to take the argmax to convert it from. 11024 16:54:43,960 --> 16:54:51,080 So this is going to outputs raw logits. Remember, a models raw output is referred to as logits. 11025 16:54:51,080 --> 16:55:01,080 And then here, we have to go from logits to prediction labels. Beautiful. Oh, little typo 11026 16:55:01,080 --> 16:55:07,840 here. Did you catch that one? Tab, tab. Beautiful. Oh, look how good this function is looking. 11027 16:55:07,840 --> 16:55:14,680 Now we're going to adjust the metrics. So adjust metrics and print out. You might notice 11028 16:55:14,680 --> 16:55:21,280 that we're outside of the batch loop here, right? So if we draw down from this line for 11029 16:55:21,280 --> 16:55:25,880 and we write some code here, we're still within the context manager. This is important because 11030 16:55:25,880 --> 16:55:33,800 if we want to adapt a value created inside the context manager, we have to modify it 11031 16:55:33,800 --> 16:55:39,880 still with inside that context manager, otherwise pytorch will throw an error. So try to write 11032 16:55:39,880 --> 16:55:46,800 this code if you want outside the context manager and see if it still works. So test loss, we're 11033 16:55:46,800 --> 16:55:54,400 going to adjust it to find out the average test loss and test accuracy per batch across 11034 16:55:54,400 --> 16:56:00,680 a whole step. So we're going to go length data loader. Now we're going to print out 11035 16:56:00,680 --> 16:56:06,840 what's happening. Print out what's happening. So test loss, which we put in here, well, 11036 16:56:06,840 --> 16:56:11,360 we're going to get the test loss. Let's get this to five decimal places. And then we're 11037 16:56:11,360 --> 16:56:18,040 going to go test act. And we will get that to two decimal places. You could do this as 11038 16:56:18,040 --> 16:56:24,480 many decimal as you want. You could even times it by 100 to get it in proper accuracy format. 11039 16:56:24,480 --> 16:56:31,400 And we'll put a new line on the end here. Wonderful. So now it looks like we've got functions. 11040 16:56:31,400 --> 16:56:37,840 I haven't run this cell yet for a training step and a test step. 
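Collected together, the test_step function just written looks like this (again a sketch with the same naming conventions as the video):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def test_step(model: nn.Module,
              data_loader: DataLoader,
              loss_fn: nn.Module,
              accuracy_fn,
              device: torch.device = device):
    """Performs a testing loop step on model going over data_loader."""
    test_loss, test_acc = 0, 0
    model.eval()  # put the model in eval mode
    # Turn on the inference mode context manager
    with torch.inference_mode():
        for X, y in data_loader:
            # Send the data to the target device
            X, y = X.to(device), y.to(device)

            # Forward pass (outputs raw logits)
            test_pred = model(X)

            # Accumulate loss and accuracy per batch
            test_loss += loss_fn(test_pred, y)
            test_acc += accuracy_fn(y_true=y,
                                    y_pred=test_pred.argmax(dim=1))  # logits -> prediction labels

        # Adjust metrics (still inside the inference mode context manager)
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
        print(f"Test loss: {test_loss:.5f} | Test acc: {test_acc:.2f}%\n")
```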
So how do you think we 11041 16:56:37,840 --> 16:56:42,440 could replicate if we go back up to our training loop that we wrote before? How do you think 11042 16:56:42,440 --> 16:56:51,160 we could replicate the functionality of this, except this time using our functions? Well, 11043 16:56:51,160 --> 16:56:56,280 we could still use this for epoch and TQDM range epochs. But then we would just call 11044 16:56:56,280 --> 16:57:01,840 our training step for this training code, our training step function. And we would call 11045 16:57:01,840 --> 16:57:07,800 our testing step function, passing in the appropriate parameters for our testing loop. 11046 16:57:07,800 --> 16:57:12,320 So that's what we'll do in the next video. We will leverage our two functions, train 11047 16:57:12,320 --> 16:57:18,080 step and test step to train model one. But here's your challenge for this video. Give 11048 16:57:18,080 --> 16:57:24,360 that a go. So use our training step and test step function to train model one for three 11049 16:57:24,360 --> 16:57:31,680 epochs and see how you go. But we'll do it together in the next video. Welcome back. 11050 16:57:31,680 --> 16:57:37,200 How'd you go? Did you create a training loop or a PyTorch optimization loop using our training 11051 16:57:37,200 --> 16:57:43,000 step function and a test step function? Were there any errors? In fact, I don't even know. 11052 16:57:43,000 --> 16:57:46,760 But how about we find out together? Hey, how do we combine these two functions to create 11053 16:57:46,760 --> 16:57:54,400 an optimization loop? So I'm going to go torch dot manual seed 42. And I'm going to measure 11054 16:57:54,400 --> 16:57:58,720 the time of how long our training and test loop takes. This time we're using a different 11055 16:57:58,720 --> 16:58:03,200 model. So this model uses nonlinearities and it's on the GPU. So that's the main thing 11056 16:58:03,200 --> 16:58:08,200 we want to compare is how long our model took on CPU versus GPU. So I'm going to import 11057 16:58:08,200 --> 16:58:16,640 from time it, import default timer as timer. And I'm going to start the train time. Train 11058 16:58:16,640 --> 16:58:27,880 time start on GPU equals timer. And then I'm just right here, set epochs. I'm going to 11059 16:58:27,880 --> 16:58:32,560 set epochs equal to three, because we want to keep our training experiments as close 11060 16:58:32,560 --> 16:58:38,960 to the same as possible. So we can see what little changes do what. And then it's create 11061 16:58:38,960 --> 16:58:51,280 a optimization and evaluation loop using train step and test step. So we're going to loop 11062 16:58:51,280 --> 16:58:59,960 through the epochs for epoch in TQDM. So we get a nice progress bar in epochs. Then we're 11063 16:58:59,960 --> 16:59:08,160 going to print epoch. A little print out of what's going on. Epoch. And we'll get a new 11064 16:59:08,160 --> 16:59:12,120 line. And then maybe one, two, three, four, five, six, seven, eight or something like 11065 16:59:12,120 --> 16:59:16,680 that. Maybe I'm miscounted there. But that's all right. Train step. What do we have to 11066 16:59:16,680 --> 16:59:21,080 do for this? Now we have a little doc string. We have a model. What model would we like 11067 16:59:21,080 --> 16:59:26,360 to use? We'd like to use model one. We have a data loader. What data loader would we 11068 16:59:26,360 --> 16:59:32,720 like to use? Well, we'd like to use our train data loader. 
We also have a loss function, 11069 16:59:32,720 --> 16:59:44,200 which is our loss function. We have an optimizer, which is our optimizer. And we have an accuracy 11070 16:59:44,200 --> 16:59:53,040 function, which is our accuracy function. And oops, forgot to put FM. And finally, we have 11071 16:59:53,040 --> 16:59:58,200 a device, which equals device, but we're going to set that anyway. So how beautiful is that 11072 16:59:58,200 --> 17:00:02,000 for creating a training loop? Thanks to the code that we've functionalized before. And 11073 17:00:02,000 --> 17:00:07,120 just recall, we set our optimizer and loss function in a previous video. You could bring 11074 17:00:07,120 --> 17:00:12,280 these down here if you really wanted to, so that they're all in one place, either way 11075 17:00:12,280 --> 17:00:17,720 up. But we can just get rid of that because we've already set it. Now we're going to do 11076 17:00:17,720 --> 17:00:22,280 the same thing for our test step. So what do we need here? Let's check the doc string. 11077 17:00:22,280 --> 17:00:25,920 We could put a little bit more information in this doc string if we wanted to to really 11078 17:00:25,920 --> 17:00:30,400 make our code more reusable, and so that if someone else was to use our code, or even 11079 17:00:30,400 --> 17:00:35,800 us in the future knows what's going on. But let's just code it out because we're just 11080 17:00:35,800 --> 17:00:40,480 still fresh in our minds. Model equals model one. What's our data loader going to be for 11081 17:00:40,480 --> 17:00:45,760 the test step? It's going to be our test data loader. Then we're going to set in a loss 11082 17:00:45,760 --> 17:00:49,400 function, which is going to be just the same loss function. We don't need to use an optimizer 11083 17:00:49,400 --> 17:00:56,120 here because we are only evaluating our model, but we can pass in our accuracy function. 11084 17:00:56,120 --> 17:01:00,800 Accuracy function. And then finally, the device is already set, but we can just pass 11085 17:01:00,800 --> 17:01:08,160 it in anyway. Look at that. Our whole optimization loop in a few lines of code. Isn't that beautiful? 11086 17:01:08,160 --> 17:01:12,960 So these functions are something that you could put in, like our helper functions dot 11087 17:01:12,960 --> 17:01:17,720 pi. And that way you could just import it later on. And you don't have to write your 11088 17:01:17,720 --> 17:01:22,600 training loops all over again. But we'll see a more of an example of that later on in 11089 17:01:22,600 --> 17:01:30,040 the course. So let's keep going. We want to measure the train time, right? So we're 11090 17:01:30,040 --> 17:01:34,920 going to create, once it's been through these steps, we're going to create train time end 11091 17:01:34,920 --> 17:01:41,040 on CPU. And then we're going to set that to the timer. So all this is going to do is 11092 17:01:41,040 --> 17:01:46,160 measure at value in time, once this line of code is run, it's going to run all of these 11093 17:01:46,160 --> 17:01:50,920 lines of code. So it's going to perform the training and optimization loop. And then it's 11094 17:01:50,920 --> 17:01:57,120 going to, oh, excuse me, this should be GPU. It's going to measure a point in time here. 11095 17:01:57,120 --> 17:02:01,400 So once all this codes run, measure a point in time there. 
And then finally, we can go 11096 17:02:01,400 --> 17:02:08,880 total train time for model one is equal to print train time, which is our function that 11097 17:02:08,880 --> 17:02:14,600 we wrote before. And we pass it in a start time. And it prints the difference between 11098 17:02:14,600 --> 17:02:21,120 the start and end time on a target device. So let's do that. Start equals what? Train 11099 17:02:21,120 --> 17:02:31,960 time start on GPU. The end is going to be train time end on GPU. And the device is going 11100 17:02:31,960 --> 17:02:42,000 to be device. Beautiful. So are you ready to run our next modeling experiment model one? 11101 17:02:42,000 --> 17:02:46,080 We've got a model running on the GPU, and it's using nonlinear layers. And we want to 11102 17:02:46,080 --> 17:02:53,840 compare it to our first model, which our results were model zero results. And we have total 11103 17:02:53,840 --> 17:03:00,720 train time on model zero. Yes, we do. So this is what we're going for. Does our model 11104 17:03:00,720 --> 17:03:06,920 one beat these results? And does it beat this result here? So three, two, one, do we 11105 17:03:06,920 --> 17:03:14,160 have any errors? No, we don't. Okay. Train step got an unexpected keyword loss. Oh, did 11106 17:03:14,160 --> 17:03:20,440 you catch that? I didn't type in loss function. Let's run it again. There we go. Okay, we're 11107 17:03:20,440 --> 17:03:25,480 running. We've got a progress bar. It's going to output at the end of each epoch. There 11108 17:03:25,480 --> 17:03:32,880 we go. Training loss. All right. Test accuracy, training accuracy. This is so exciting. I 11109 17:03:32,880 --> 17:03:38,240 love watching neural networks train. Okay, we're improving per epoch. That's a good sign. 11110 17:03:38,240 --> 17:03:45,720 But we've still got a fair way to go. Oh, okay. So what do we have here? Well, we didn't 11111 17:03:45,720 --> 17:03:51,240 beat our, hmm, it looks like we didn't beat our model zero results with the nonlinear 11112 17:03:51,240 --> 17:03:58,560 layers. And we only just slightly had a faster training time. Now, again, your numbers might 11113 17:03:58,560 --> 17:04:02,720 not be the exact same as what I've got here. Right? So that's a big thing about machine 11114 17:04:02,720 --> 17:04:08,200 learning is that it uses randomness. So your numbers might be slightly different. The direction 11115 17:04:08,200 --> 17:04:13,360 should be quite similar. And we may be using different GPUs. So just keep that in mind. 11116 17:04:13,360 --> 17:04:18,920 Right now I'm using a new video, SMI. I'm using a Tesla T4, which is at the time of 11117 17:04:18,920 --> 17:04:25,360 recording this video, Wednesday, April 20, 2022 is a relatively fast GPU for making 11118 17:04:25,360 --> 17:04:29,840 inference. So just keep that in mind. Your GPU in the future may be different. And your 11119 17:04:29,840 --> 17:04:35,760 CPU that you run may also have a different time here. So if these numbers are like 10 11120 17:04:35,760 --> 17:04:40,680 times higher, you might want to look into seeing if your code is there's some error. 11121 17:04:40,680 --> 17:04:44,840 If they're 10 times lower, well, hey, you're running it on some fast hardware. So it looks 11122 17:04:44,840 --> 17:04:52,720 like my code is running on CUDA slightly faster than the CPU, but not dramatically faster. 
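Before digging into why, here is roughly what the whole timed optimization loop looks like once it is written out. It assumes train_step, test_step, the dataloaders, loss_fn, optimizer, accuracy_fn and the print_train_time helper from earlier in the course; the dataloader names follow the course notebook, so adjust them if yours differ.

```python
import torch
from timeit import default_timer as timer
from tqdm.auto import tqdm

torch.manual_seed(42)
train_time_start_on_gpu = timer()

epochs = 3  # keep the experiments comparable by training for the same number of epochs
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n---------")
    train_step(model=model_1,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_1,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

train_time_end_on_gpu = timer()
total_train_time_model_1 = print_train_time(start=train_time_start_on_gpu,
                                            end=train_time_end_on_gpu,
                                            device=device)
```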
11123 17:04:52,720 --> 17:04:57,080 And that's probably akin to the fact that our data set isn't too complex and our model 11124 17:04:57,080 --> 17:05:01,840 isn't too large. What I mean by that is our model doesn't have like a vast amount of 11125 17:05:01,840 --> 17:05:07,520 layers. And our data set is only comprised of like, this is the layers our model has. 11126 17:05:07,520 --> 17:05:13,840 And our data set is only comprised of 60,000 images that are 28 by 28. So as you can imagine, 11127 17:05:13,840 --> 17:05:18,760 the more parameters in your model, the more features in your data, the higher this time 11128 17:05:18,760 --> 17:05:25,560 is going to be. And you might sometimes even find that your model is faster on CPU. So 11129 17:05:25,560 --> 17:05:32,280 this is the train time on CPU. You might sometimes find that your model's training 11130 17:05:32,280 --> 17:05:38,240 time on a CPU is in fact faster for the exact same code running on a GPU. Now, why might 11131 17:05:38,240 --> 17:05:48,520 that be? Well, let's write down this here. Let's go note. Sometimes, depending on your 11132 17:05:48,520 --> 17:06:00,560 data slash hardware, you might find that your model trains faster on CPU than GPU. Now, 11133 17:06:00,560 --> 17:06:09,160 why is this? So one of the number one reasons is that one, it could be that the overhead 11134 17:06:09,160 --> 17:06:22,360 for copying data slash model to and from the GPU outweighs the compute benefits offered 11135 17:06:22,360 --> 17:06:28,680 by the GPU. So that's probably one of the number one reasons is that you have to, for 11136 17:06:28,680 --> 17:06:35,600 data to be processed on a GPU, you have to copy it because it is by default on the CPU. 11137 17:06:35,600 --> 17:06:40,840 If you have to copy it to that GPU, you have some overhead time for doing that copy into 11138 17:06:40,840 --> 17:06:45,600 the GPU memory. And then although the GPU will probably compute faster on that data 11139 17:06:45,600 --> 17:06:50,280 once it's there, you still have that back and forth of going between the CPU and the 11140 17:06:50,280 --> 17:07:01,480 GPU. And the number two reason is that the hardware you're using has a better CPU in 11141 17:07:01,480 --> 17:07:08,800 terms of compute capability than the GPU. Now, this is quite a bit rarer. Usually if 11142 17:07:08,800 --> 17:07:14,480 you're using a GPU like a fairly modern GPU, it will be faster at computing, deep learning 11143 17:07:14,480 --> 17:07:21,000 or running deep learning algorithms than your general CPU. But sometimes these numbers 11144 17:07:21,000 --> 17:07:24,920 of compute time are really dependent on the hardware that you're running. So you'll get 11145 17:07:24,920 --> 17:07:29,360 the biggest benefits of speedups on the GPU when you're running larger models, larger 11146 17:07:29,360 --> 17:07:34,720 data sets, and more compute intensive layers in your neural networks. And so if you'd like 11147 17:07:34,720 --> 17:07:39,160 a great article on how to get the most out of your GPUs, it's a little bit technical, 11148 17:07:39,160 --> 17:07:43,880 but this is something to keep in mind as you progress as a machine learning engineer is 11149 17:07:43,880 --> 17:07:54,960 how to make your GPUs go burr. And I mean that burr from first principles. There we 11150 17:07:54,960 --> 17:08:01,320 go. Making deep learning go burr as in your GPU is going burr because it's running so 11151 17:08:01,320 --> 17:08:08,520 fast from first principles. 
So this is by Horace He who works on PyTorch. And it's 11152 17:08:08,520 --> 17:08:13,080 great. It talks about compute as a first principle. So here's what I mean by copying 11153 17:08:13,080 --> 17:08:17,080 memory and compute. There might be a fair few things you're not familiar with here, 11154 17:08:17,080 --> 17:08:21,840 but that's okay. But just be aware of bandwidth. So bandwidth costs are essentially the cost 11155 17:08:21,840 --> 17:08:26,360 paid to move data from one place to another. That's what I was talking about, copying stuff 11156 17:08:26,360 --> 17:08:32,800 from the CPU to the GPU. And then also there's one more, where is it, overhead? Overhead is 11157 17:08:32,800 --> 17:08:37,240 basically everything else. I called it overhead. There are different terms for different things. 11158 17:08:37,240 --> 17:08:43,120 This article is excellent. So I'm going to just copy this in here. And you'll find this 11159 17:08:43,120 --> 17:08:51,680 in the resources, by the way. So for more on how to make your models compute faster, 11160 17:08:51,680 --> 17:08:59,520 see here. Lovely. So right now our baseline model is performing the best in terms of results. 11161 17:08:59,520 --> 17:09:05,800 And in terms of, or actually, our model computing on the GPU is performing faster than our CPU. 11162 17:09:05,800 --> 17:09:10,360 Again, yours might be slightly different. For my case, for my particular hardware, CUDA 11163 17:09:10,360 --> 17:09:16,840 is faster. Except model zero, our baseline, is better than model one. So what's to do 11164 17:09:16,840 --> 17:09:24,400 next? Well, it's to keep experimenting, of course. I'll see you in the next video. Welcome 11165 17:09:24,400 --> 17:09:29,760 back. Now, before we move on to the next modeling experiment, let's get a results dictionary 11166 17:09:29,760 --> 17:09:35,400 for our model one, the model that we just trained. So just like we've got one for model zero, 11167 17:09:35,400 --> 17:09:39,800 let's create one of these for model one results. And we can create that with our eval_model 11168 17:09:39,800 --> 17:09:45,400 function. So we'll go right back down to where we were. I'll just get rid of this cell. 11169 17:09:45,400 --> 17:09:51,800 And let's type in here, get model one results dictionary. This is helpful. So later on, 11170 17:09:51,800 --> 17:09:56,680 we can compare all of our modeling results, because they'll all be in dictionary format. 11171 17:09:56,680 --> 17:10:05,200 So we're going to go model one results equals eval_model, with model equals model one. 11172 17:10:05,200 --> 17:10:12,040 And we can pass in a data loader, which is going to be our test data loader. Then we 11173 17:10:12,040 --> 17:10:16,840 can pass in a loss function, which is going to equal our loss function. And we can pass 11174 17:10:16,840 --> 17:10:25,680 in our accuracy function equals accuracy function. Wonderful. And then if we check out our model 11175 17:10:25,680 --> 17:10:34,760 one results, what do we get? Oh no, we get an error. Did we get the code right? That looks 11176 17:10:34,760 --> 17:10:41,280 right to me. Oh, what does this say? RuntimeError: expected all tensors to be on the same 11177 17:10:41,280 --> 17:10:49,840 device, but found at least two devices, cuda and cpu. Of course. So why did this happen? 11178 17:10:49,840 --> 17:10:54,880 Well, let's go back up to our eval_model function, wherever we defined that. Here we 11179 17:10:54,880 --> 17:11:02,200 go. Ah, I see.
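The error itself is easy to reproduce in isolation (this is a made-up minimal example, not the notebook's code): a model that lives on the GPU being fed tensors that still live on the CPU.

import torch
from torch import nn

if torch.cuda.is_available():
    layer = nn.Linear(in_features=10, out_features=2).to("cuda")   # model on the GPU
    x_cpu = torch.rand(1, 10)                                      # data still on the CPU
    try:
        layer(x_cpu)
    except RuntimeError as e:
        print(e)   # "Expected all tensors to be on the same device, but found at least two devices..."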
So this is a little gotcha in PyTorch, or in deep learning in general. There's 11180 17:11:02,200 --> 17:11:05,520 a saying in the industry that deep learning models fail silently. And this is kind of 11181 17:11:05,520 --> 17:11:13,040 one of those ones. It's because our data and our model are on different devices. So remember 11182 17:11:13,040 --> 17:11:19,560 how I said the three big errors are shape mismatches with your data and your model, device 11183 17:11:19,560 --> 17:11:24,800 mismatches, which is what we've got so far, and then data type mismatches, which is if 11184 17:11:24,800 --> 17:11:28,980 your data is in the wrong data type to be computed on. So what we're going to have to 11185 17:11:28,980 --> 17:11:35,560 do to fix this is let's bring our eval_model function down to where we were. And 11186 17:11:35,560 --> 17:11:42,000 just like we've done in our test step and train step functions, where we've created 11187 17:11:42,000 --> 17:11:47,160 device agnostic data here, we've sent our data to the target device, we'll do that exact 11188 17:11:47,160 --> 17:11:52,080 same thing in our eval_model function. And this is just a note for going forward. It's 11189 17:11:52,080 --> 17:11:58,040 always handy to, wherever you can, create device agnostic code. So we've got our new 11190 17:11:58,040 --> 17:12:07,600 eval_model function here. For X, y in our data loader, let's make our data device agnostic. So just 11191 17:12:07,600 --> 17:12:12,960 like our model is device agnostic, we've sent it to the target device, we will do the same 11192 17:12:12,960 --> 17:12:20,240 here, X.to(device), and then y.to(device). Let's see if that works. We will 11193 17:12:20,240 --> 17:12:25,200 just rerun this cell up here. I'll grab this, we're just going to write the exact same 11194 17:12:25,200 --> 17:12:30,720 code as what we did before. But now it should work because we've sent our, we could actually 11195 17:12:30,720 --> 17:12:36,520 also just pass in the target device here, device equals device. That way we can pass 11196 17:12:36,520 --> 17:12:41,920 in whatever device we want to run it on. And we're going to just add in device here, 11197 17:12:41,920 --> 17:12:50,800 device equals device. And let's see if this runs correctly. Beautiful. So if we compare 11198 17:12:50,800 --> 17:12:57,520 this to our model zero results, it looks like our baseline's still out in front. But that's 11199 17:12:57,520 --> 17:13:02,280 okay. We're going to, in the next video, start to step things up a notch and move on to convolutional 11200 17:13:02,280 --> 17:13:07,080 neural networks. This is very exciting. And by the way, just remember, if your numbers 11201 17:13:07,080 --> 17:13:12,760 here aren't exactly the same as mine, don't worry too much. If they're outlandishly different, 11202 17:13:12,760 --> 17:13:16,520 just go back through your code and see if maybe a cell hasn't been run correctly 11203 17:13:16,520 --> 17:13:20,760 or something like that. If they're a few decimal places off, that's okay. That's due 11204 17:13:20,760 --> 17:13:26,280 to the inherent randomness of machine learning and deep learning. But with that being said, 11205 17:13:26,280 --> 17:13:33,360 I'll see you in the next video. Let's get our hands on convolutional neural networks. 11206 17:13:33,360 --> 17:13:38,160 Welcome back. In the last video, we saw that our second modeling experiment, model one, 11207 17:13:38,160 --> 17:13:42,720 didn't quite beat our baseline.
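For reference, here's what the device-agnostic eval_model() from the last video looks like as one consolidated sketch. The accuracy_fn helper and the global device variable are assumed to be the ones defined earlier in the course; the body shown here is a reconstruction, not a copy of the notebook cell.

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn,
               device: torch.device = device):
    """Returns a dictionary of results from model predicting on data_loader."""
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode():
        for X, y in data_loader:
            # Make the data device agnostic (same device as the model)
            X, y = X.to(device), y.to(device)
            y_pred = model(X)
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))
        loss /= len(data_loader)
        acc /= len(data_loader)
    return {"model_name": model.__class__.__name__,
            "model_loss": loss.item(),
            "model_acc": acc}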
But now we're going to keep going with modeling experiments. 11208 17:13:42,720 --> 17:13:46,800 And we're going to move on to model two. And this is very exciting. We're going to build 11209 17:13:46,800 --> 17:13:55,200 a convolutional neural network, which are also known as CNN. CNNs are also known as 11210 17:13:55,200 --> 17:14:10,200 com net. And CNNs are known for their capabilities to find patterns in visual data. So what are 11211 17:14:10,200 --> 17:14:14,440 we going to do? Well, let's jump back into the keynote. We had a look at this slide before 11212 17:14:14,440 --> 17:14:18,720 where this is the typical architecture of a CNN. There's a fair bit going on here, but 11213 17:14:18,720 --> 17:14:23,560 we're going to step through it one by one. We have an input layer, just like any other 11214 17:14:23,560 --> 17:14:29,120 deep learning model. We have to input some kind of data. We have a bunch of hidden layers 11215 17:14:29,120 --> 17:14:34,000 in our case in a convolutional neural network, you have convolutional layers. You often have 11216 17:14:34,000 --> 17:14:38,920 hidden activations or nonlinear activation layers. You might have a pooling layer. You 11217 17:14:38,920 --> 17:14:44,080 generally always have an output layer of some sort, which is usually a linear layer. And 11218 17:14:44,080 --> 17:14:48,560 so the values for each of these different layers will depend on the problem you're working 11219 17:14:48,560 --> 17:14:53,080 on. So we're going to work towards building something like this. And you'll notice that 11220 17:14:53,080 --> 17:14:57,320 a lot of the code is quite similar to the code that we've been writing before for other 11221 17:14:57,320 --> 17:15:02,120 PyTorch models. The only difference is in here is that we're going to use different 11222 17:15:02,120 --> 17:15:09,280 layer types. And so if we want to visualize a CNN in a colored block edition, we're going 11223 17:15:09,280 --> 17:15:13,480 to code this out in a minute. So don't worry too much. We have a simple CNN. You might 11224 17:15:13,480 --> 17:15:18,080 have an input, which could be this image of my dad eating some pizza with two thumbs 11225 17:15:18,080 --> 17:15:22,920 up. We're going to preprocess that input. We're going to, in other words, turn it into 11226 17:15:22,920 --> 17:15:29,680 a tensor in red, green and blue for an image. And then we're going to pass it through a 11227 17:15:29,680 --> 17:15:36,200 combination of convolutional layers, relu layers and pooling layers. Now again, this 11228 17:15:36,200 --> 17:15:40,720 is a thing to note about deep learning models. I don't want you to get too bogged down in 11229 17:15:40,720 --> 17:15:45,600 the order of how these layers go, because they can be combined in many different ways. 11230 17:15:45,600 --> 17:15:50,400 In fact, research is coming out almost every day, every week about how to best construct 11231 17:15:50,400 --> 17:15:56,840 these layers. The overall principle is what's more important is how do you get your inputs 11232 17:15:56,840 --> 17:16:01,560 into an idolized output? That's the fun part. And then of course, we have the linear output 11233 17:16:01,560 --> 17:16:06,080 layer, which is going to output however many classes or value for however many classes 11234 17:16:06,080 --> 17:16:13,800 that we have in the case of classification. And then if you want to make your CNN deeper, 11235 17:16:13,800 --> 17:16:19,240 this is where the deep comes from deep learning, you can add more layers. 
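As a rough sketch of the layer ordering just described (not the exact model we build later in this section, just the generic pattern, with placeholder sizes):

import torch
from torch import nn

simple_cnn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),                                                            # nonlinear activation
    nn.MaxPool2d(kernel_size=2),                                          # pooling layer
    nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(in_features=8 * 16 * 16, out_features=10)                   # linear output layer
)

print(simple_cnn(torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 10]), one value per class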
So the theory behind 11236 17:16:19,240 --> 17:16:24,120 this, or the practice behind this, is that the more layers you add to your deep learning 11237 17:16:24,120 --> 17:16:30,520 model, the more chances it has to find patterns in the data. Now, how does it find these patterns? 11238 17:16:30,520 --> 17:16:35,440 Well, each one of these layers here is going to perform, just like what we've seen before, 11239 17:16:35,440 --> 17:16:41,680 a different combination of mathematical operations on whatever data we feed it. And each subsequent 11240 17:16:41,680 --> 17:16:48,240 layer receives its input from the previous layer. In this case, there are some advanced 11241 17:16:48,240 --> 17:16:52,440 networks that you'll probably come across later in your research and machine learning 11242 17:16:52,440 --> 17:16:57,640 career that use inputs from layers that are kind of over here or the way down here or 11243 17:16:57,640 --> 17:17:02,280 something like that. They're known as residual connections. But that's beyond the scope of 11244 17:17:02,280 --> 17:17:06,960 what we're covering for now. We just want to build our first convolutional neural network. 11245 17:17:06,960 --> 17:17:11,920 And so let's go back to Google Chrome. I'm going to show you my favorite website to learn 11246 17:17:11,920 --> 17:17:17,680 about convolutional neural networks. It is the CNN explainer website. And this is going 11247 17:17:17,680 --> 17:17:22,000 to be part of your extra curriculum for this video is to spend 20 minutes clicking and 11248 17:17:22,000 --> 17:17:26,040 going through this entire website. We're not going to do that together because I would 11249 17:17:26,040 --> 17:17:30,720 like you to explore it yourself. That is the best way to learn. So what you'll notice up 11250 17:17:30,720 --> 17:17:36,920 here is we have some images of some different sort. And this is going to be our input. So 11251 17:17:36,920 --> 17:17:41,600 let's start with pizza. And then we have a convolutional layer, a relu layer, a conv 11252 17:17:41,600 --> 17:17:47,440 layer, a relu layer, max pool layer, com to relu to com to relu to max pool to this 11253 17:17:47,440 --> 17:17:51,880 architecture is a convolutional neural network. And it's running live in the browser. And 11254 17:17:51,880 --> 17:17:57,680 so we pass this image, you'll notice that it breaks down into red, green and blue. And 11255 17:17:57,680 --> 17:18:01,720 then it goes through each of these layers and something happens. And then finally, we 11256 17:18:01,720 --> 17:18:07,000 have an output. And you notice that the output has 10 different classes here, because we 11257 17:18:07,000 --> 17:18:14,920 have one, two, three, four, five, six, seven, eight, nine, 10, different classes of image 11258 17:18:14,920 --> 17:18:19,600 in this demo here. And of course, we could change this if we had 100 classes, we might 11259 17:18:19,600 --> 17:18:25,560 change this to 100. But the pieces of the puzzle here would still stay quite the same. 11260 17:18:25,560 --> 17:18:30,320 And you'll notice that the class pizza has the highest output value here, because our 11261 17:18:30,320 --> 17:18:35,840 images of pizza, if we change to what is this one, espresso, it's got the highest 11262 17:18:35,840 --> 17:18:40,200 value there. So this is a pretty well performing convolutional neural network. Then we have 11263 17:18:40,200 --> 17:18:45,800 a sport car. Now, if we clicked on each one of these, something is going to happen. 
Let's 11264 17:18:45,800 --> 17:18:52,600 find out. We have a convolutional layer. So we have an input of an image here that 64 11265 17:18:52,600 --> 17:18:58,560 64 by three. This is color channels last format. So we have a kernel. And this kernel, this 11266 17:18:58,560 --> 17:19:02,000 is what happens inside a convolutional layer. And you might be going, well, there's a lot 11267 17:19:02,000 --> 17:19:06,400 going on here. And yes, of course, there is if this is the first time you ever seen this. 11268 17:19:06,400 --> 17:19:11,680 But essentially, what's happening is a kernel, which is also known as a filter, is going 11269 17:19:11,680 --> 17:19:17,240 over our image pixel values, because of course, they will be in the format of a tensor. And 11270 17:19:17,240 --> 17:19:22,800 trying to find small little intricate patterns in that data. So if we have a look here, and 11271 17:19:22,800 --> 17:19:26,200 this is why it's so valuable to go through this and just play around with it, we start 11272 17:19:26,200 --> 17:19:29,920 in a top left corner, and then slowly move along, you'll see on the output on the right 11273 17:19:29,920 --> 17:19:33,440 hand side, we have another little square. And do you notice in the middle all of those 11274 17:19:33,440 --> 17:19:38,960 numbers changing? Well, that is the mathematical operation that's happening as a convolutional 11275 17:19:38,960 --> 17:19:44,920 layer convolves over our input image. How cool is that? And you might be able to see on the 11276 17:19:44,920 --> 17:19:49,600 output there that there's some slight values for like, look around the headlight here. Do 11277 17:19:49,600 --> 17:19:57,240 you notice on the right how there's some activation? There's some red tiles there? Well, that 11278 17:19:57,240 --> 17:20:02,400 just means that potentially this layer or this hidden unit, and I want to zoom out for 11279 17:20:02,400 --> 17:20:10,960 a second, is we have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 hidden units. Each one of these is 11280 17:20:10,960 --> 17:20:15,960 going to learn a different feature about the data. And now the beauty of deep learning, 11281 17:20:15,960 --> 17:20:20,360 but also one of the curses of deep learning is that we don't actually control what each 11282 17:20:20,360 --> 17:20:26,120 one of these learns. The magic of deep learning is that it figures it out itself what is 11283 17:20:26,120 --> 17:20:32,240 best to learn. We go into here, notice that each one we click on has a different representation 11284 17:20:32,240 --> 17:20:37,800 on the right hand side. And so this is what's going to happen layer by layer as it goes 11285 17:20:37,800 --> 17:20:42,200 through the convolutional neural network. And so if you want to read about what is a convolutional 11286 17:20:42,200 --> 17:20:46,560 neural network, you can go through here. But we're going to replicate this exact neural 11287 17:20:46,560 --> 17:20:51,400 network here with PyTorch code. That's how I'd prefer to learn it. But if you want the 11288 17:20:51,400 --> 17:20:55,600 intuition behind it, the math behind it, you can check out all of these resources here. 11289 17:20:55,600 --> 17:21:01,200 That is your extra curriculum for this video. So we have an input layer, we have a convolutional 11290 17:21:01,200 --> 17:21:06,520 layer, you can see how the input gets modified by some sort of mathematical operation, which 11291 17:21:06,520 --> 17:21:12,360 is of course, the convolutional operation. 
And we have there all different numbers finding 11292 17:21:12,360 --> 17:21:17,040 different patterns and data. This is a really good example here. You notice that the outputs 11293 17:21:17,040 --> 17:21:21,880 eyes slightly changes, that'll be a trend throughout each layer. And then we can understand 11294 17:21:21,880 --> 17:21:25,880 the different hyper parameters, but I'm going to leave this for you to explore on your own. 11295 17:21:25,880 --> 17:21:30,600 In the next video, we're going to start to write PyTorch code to replicate everything 11296 17:21:30,600 --> 17:21:40,600 that's going on here. So I'm going to link this in here to find out what's happening 11297 17:21:40,600 --> 17:21:50,680 inside CNN. See this website here. So join me in the next video. This is super exciting. 11298 17:21:50,680 --> 17:21:55,880 We're going to build our first convolutional neural network for computer vision. I'll see 11299 17:21:55,880 --> 17:22:02,840 you there. Welcome back. In the last video, we went briefly through the CNN explainer 11300 17:22:02,840 --> 17:22:07,720 website, which is my favorite resource for learning about convolutional neural networks. 11301 17:22:07,720 --> 17:22:11,880 And of course, we could spend 20 minutes clicking through everything here to find out what's 11302 17:22:11,880 --> 17:22:17,560 going on with a convolutional neural network, or we could start to code one up. So how about 11303 17:22:17,560 --> 17:22:25,400 we do that? Hey, if and down, code it out. So we're going to create a convolutional neural 11304 17:22:25,400 --> 17:22:30,040 network. And what I'm going to do is I'm going to build this, or we're going to build this 11305 17:22:30,040 --> 17:22:35,880 model together in this video. And then because it's going to use layers or PyTorch layers 11306 17:22:35,880 --> 17:22:40,280 that we haven't looked at before, we're going to spend the next couple of videos stepping 11307 17:22:40,280 --> 17:22:45,480 through those layers. So just bear with me, as we code this entire model together, we'll 11308 17:22:45,480 --> 17:22:50,920 go break it down in subsequent videos. So let's build our first convolutional neural 11309 17:22:50,920 --> 17:22:55,240 network. That's a mouthful, by the way, I'm just going to probably stick to saying CNN. 11310 17:22:55,240 --> 17:23:02,280 Fashion MNIST, we're up to model V2. We're going to subclass nn.module, as we always do 11311 17:23:02,280 --> 17:23:08,840 when we're building a PyTorch model. And in here, we're going to say model architecture 11312 17:23:10,120 --> 17:23:17,720 that replicates the tiny VGG. And you might be thinking, where did you get that from, Daniel? 11313 17:23:17,720 --> 17:23:27,000 Model from CNN explainer website. And so oftentimes, when convolutional neural networks or new 11314 17:23:27,000 --> 17:23:31,320 types of architecture come out, the authors of the research paper that present the model 11315 17:23:31,320 --> 17:23:36,280 get to name the model. And so that way, in the future, you can refer to different types of 11316 17:23:36,280 --> 17:23:42,200 model architectures with just a simple name, like tiny VGG. And people kind of know what's going on. 11317 17:23:42,200 --> 17:23:50,040 So I believe somewhere on here, it's called tiny VGG, tiny VGG. We have nothing. Yeah, 11318 17:23:50,680 --> 17:23:59,400 there we go. In tiny VGG. And do we have more than one tiny, tiny, yeah, tiny VGG. 
And if we 11319 17:23:59,400 --> 17:24:08,360 look up VGG, conv net, VGG 16 was one of the original ones, VGG, very deep convolutional neural 11320 17:24:08,360 --> 17:24:14,760 networks of VGG net. There's also ResNet, which is another convolutional neural network. 11321 17:24:16,120 --> 17:24:21,960 You can also, I don't want to give you my location, Google, you can go popular CNN 11322 17:24:21,960 --> 17:24:28,680 architectures. And this will give you a fair few options. Lynette is one of the first AlexNet, 11323 17:24:28,680 --> 17:24:33,560 ZF net, whole bunch of different resources. And also, how could you find out more about a 11324 17:24:33,560 --> 17:24:38,120 convolutional neural network? What is a convolutional neural network? You can go through that. But 11325 17:24:38,120 --> 17:24:43,320 let's stop that for a moment. Let's code this one up together. So we're going to initialize our 11326 17:24:44,280 --> 17:24:50,280 class here, def init. We're going to pass it in an input shape, just like we often do. 11327 17:24:50,840 --> 17:24:57,240 We're going to put in a number of hidden units, which is an int. And we're going to put in an 11328 17:24:57,240 --> 17:25:04,360 output shape, which is an int. Wonderful. So nothing to outlandish that we haven't seen before there. 11329 17:25:04,360 --> 17:25:13,000 And we're going to go super dot init to initialize our initializer for lack of a better way of 11330 17:25:13,000 --> 17:25:18,600 putting it. Now, we're going to create our neural network in a couple of blocks this time. And 11331 17:25:18,600 --> 17:25:24,200 you might often hear in when you learn more about convolutional neural networks, or I'll just tell 11332 17:25:24,200 --> 17:25:29,480 you that things are referred to are often referred to as convolutional blocks. So if we go back to 11333 17:25:29,480 --> 17:25:36,600 our keynote, this here, this combination of layers might be referred to as a convolutional block. 11334 17:25:36,600 --> 17:25:41,880 And a convolutional block, a deeper CNN, might be comprised of multiple convolutional blocks. 11335 17:25:42,680 --> 17:25:50,440 So to add to the confusion, a block is comprised of multiple layers. And then an overall architecture 11336 17:25:50,440 --> 17:25:56,520 is comprised of multiple blocks. And so the deeper and deeper your models get, the more blocks 11337 17:25:56,520 --> 17:26:01,960 it might be comprised of, and the more layers those blocks may be comprised of within them. 11338 17:26:02,920 --> 17:26:08,360 So it's kind of like Lego, which is very fun. So let's put together an an ensequential. 11339 17:26:09,640 --> 17:26:14,840 Now, the first few layers here that we're going to create in conv block one, uh, 11340 17:26:14,840 --> 17:26:21,880 nn.com 2d. Oh, look at that. Us writing us our first CNN layer. And we have to define something 11341 17:26:21,880 --> 17:26:29,240 here, which is in channels. So this channels refers to the number of channels in your visual data. 11342 17:26:29,240 --> 17:26:33,400 And we're going to put in input shape. So we're defining the input shape. This is going to be 11343 17:26:33,400 --> 17:26:39,160 the first layer in our model. The input shape is going to be what we define when we instantiate 11344 17:26:39,160 --> 17:26:44,920 this class. And then the out channels. Oh, what's the out channels going to be? Well, it's going 11345 17:26:44,920 --> 17:26:49,960 to be hidden units, just like we've done with our previous models. 
Now the difference here 11346 17:26:49,960 --> 17:26:55,320 is that in nn.com 2d, we have a number of different hyper parameters that we can set. 11347 17:26:55,320 --> 17:26:59,080 I'm going to set some pretty quickly here, but then we're going to step back through them, 11348 17:26:59,080 --> 17:27:04,280 not only in this video, but in subsequent videos. We've got a fair bit going on here. 11349 17:27:04,280 --> 17:27:08,840 We've got in channels, which is our input shape. We've got out channels, which are our hidden units. 11350 17:27:08,840 --> 17:27:14,520 We've got a kernel size, which equals three. Or this could be a tuple as well, three by three. 11351 17:27:14,520 --> 17:27:20,280 But I just like to keep it as three. We've got a stride and we've got padding. Now, 11352 17:27:21,080 --> 17:27:25,560 because these are values, we can set ourselves. What are they referred to as? 11353 17:27:26,840 --> 17:27:31,480 Let's write this down. Values, we can set ourselves in our neural networks. 11354 17:27:32,920 --> 17:27:40,360 In our nn's neural networks are called hyper parameters. So these are the hyper parameters 11355 17:27:40,360 --> 17:27:46,200 of nn.com 2d. And you might be thinking, what is 2d for? Well, because we're working with 11356 17:27:46,200 --> 17:27:51,640 two-dimensional data, our images have height and width. There's also com 1d for one-dimensional data, 11357 17:27:51,640 --> 17:27:55,320 3d for three-dimensional data. We're going to stick with 2d for now. 11358 17:27:56,040 --> 17:28:02,040 And so what do each of these hyper parameters do? Well, before we go through what each one of them 11359 17:28:02,040 --> 17:28:07,480 do, we're going to do that when we step by step through this particular layer. What we've just done 11360 17:28:07,480 --> 17:28:14,600 is we've replicated this particular layer of the CNN explainer website. We've still got the 11361 17:28:14,600 --> 17:28:18,520 relu. We've still got another conv and a relu and a max pool and a conv and a relu and a 11362 17:28:18,520 --> 17:28:24,360 conv and a relu and a max pool. But this is the block I was talking about. This is one block here 11363 17:28:25,400 --> 17:28:29,720 of this neural network, or at least that's how I've broken it down. And this is another block. 11364 17:28:30,360 --> 17:28:34,680 You might notice that they're comprised of the same layers just stacked on top of each other. 11365 17:28:34,680 --> 17:28:39,640 And then we're going to have an output layer. And if you want to learn about where the hyper 11366 17:28:39,640 --> 17:28:45,560 parameters came from, what we just coded, where could you learn about those? Well, one, you could 11367 17:28:45,560 --> 17:28:52,920 go, of course, to the PyTorch documentation, PyTorch, and then com 2d. You can read about it there. 11368 17:28:53,640 --> 17:28:58,200 There's the mathematical operation that we talked about or briefly stepped on before, 11369 17:28:58,200 --> 17:29:05,800 or touched on, stepped on. Is that the right word? So create a conv layer. It's there. 11370 17:29:06,440 --> 17:29:10,120 But also this is why I showed you this beautiful website so that you can read about these 11371 17:29:10,120 --> 17:29:15,080 hyper parameters down here. Understanding hyper parameters. So your extra curriculum for this 11372 17:29:15,080 --> 17:29:21,560 video is to go through this little graphic here and see if you can find out what padding means, 11373 17:29:21,560 --> 17:29:25,880 what the kernel size means, and what the stride means. 
I'm not going to read through this for you. 11374 17:29:25,880 --> 17:29:31,480 You can have a look at this interactive plot. We're going to keep coding because that's what 11375 17:29:31,480 --> 17:29:36,280 we're all about here. If and out, code it out. So we're going to now add a relu layer. 11376 17:29:37,240 --> 17:29:43,400 And then after that, we're going to add another conv 2d layer. And the in channels here is going 11377 17:29:43,400 --> 17:29:50,680 to be the hidden units, because we're going to take the output size of this layer and use it as 11378 17:29:50,680 --> 17:29:56,840 the input size to this layer. We're going to keep going here. Out channels equals hidden units again 11379 17:29:56,840 --> 17:30:03,880 in this case. And then the kernel size is going to be three as well. Stride will be one. Padding 11380 17:30:03,880 --> 17:30:08,840 will be one. Now, of course, we can change all of these values later on, but just bear with me 11381 17:30:08,840 --> 17:30:14,920 while we set them how they are. We'll have another relu layer. And then we're going to finish off 11382 17:30:14,920 --> 17:30:23,240 with a nn max pool 2d layer. Again, the 2d comes from the same reason we use comf2d. We're working 11383 17:30:23,240 --> 17:30:29,080 with 2d data here. And we're going to set the kernel size here to be equal to two. And of course, 11384 17:30:29,080 --> 17:30:33,560 this can be a tuple as well. So it can be two two. Now, where could you find out about nn max 11385 17:30:33,560 --> 17:30:42,920 pool 2d? Well, we go nn max pool 2d. What does this do? applies a 2d max pooling over an input 11386 17:30:42,920 --> 17:30:50,040 signal composed of several input planes. So it's taking the max of an input. And we've got some 11387 17:30:50,040 --> 17:30:55,400 parameters here, kernel size, the size of the window to take the max over. Now, where have we 11388 17:30:55,400 --> 17:31:01,560 seen a window before? I'm just going to close these. We come back up. Where did we see a window? 11389 17:31:01,560 --> 17:31:08,760 Let's dive into the max pool layer. See where my mouse is? Do you see that two by two? Well, 11390 17:31:08,760 --> 17:31:12,760 that's a window. Now, look at the difference between the input and the output. What's happening? 11391 17:31:13,320 --> 17:31:19,080 Well, we have a tile that's two by two, a window of four. And the max, we're taking the max of that 11392 17:31:19,080 --> 17:31:23,640 tile. In this case, it's zero. Let's find the actual value. There we go. So if you look at those 11393 17:31:23,640 --> 17:31:33,800 four numbers in the middle inside the max brackets, we have 0.07, 0.09, 0.06, 0.05. And the max of 11394 17:31:33,800 --> 17:31:39,880 all those is 0.09. And you'll notice that the input and the output shapes are different. The 11395 17:31:39,880 --> 17:31:46,280 output is half the size of the input. So that's what max pooling does, is it tries to take the max 11396 17:31:46,280 --> 17:31:54,120 value of whatever its input is, and then outputs it on the right here. And so as our data, 11397 17:31:54,120 --> 17:31:59,000 this is a trend in all of deep learning, actually. As our image moves through, this is what you'll 11398 17:31:59,000 --> 17:32:04,360 notice. Notice all the different shapes here. Even if you don't completely understand what's going 11399 17:32:04,360 --> 17:32:09,640 on here, you'll notice that the two values here on the left start to get smaller and smaller as 11400 17:32:09,640 --> 17:32:14,840 they go through the model. 
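Here's a tiny standalone demo of that max pooling behaviour, using the same four numbers from the top-left window above (the rest of the values are made up):

import torch
from torch import nn

x = torch.tensor([[[[0.07, 0.09, 0.50, 0.20],
                    [0.06, 0.05, 0.10, 0.30],
                    [0.90, 0.10, 0.40, 0.40],
                    [0.20, 0.30, 0.80, 0.60]]]])   # shape: (1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2)
out = max_pool(x)
print(out)         # the top-left 2x2 window (0.07, 0.09, 0.06, 0.05) becomes 0.09
print(out.shape)   # torch.Size([1, 1, 2, 2]), height and width halved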
And what our model is trying to do here is take the input and learn a 11401 17:32:14,840 --> 17:32:20,600 compressed representation through each of these layers. So it's going to smoosh and smoosh and 11402 17:32:20,600 --> 17:32:27,800 smoosh trying to find the most generalizable patterns to get to the ideal output. And that 11403 17:32:27,800 --> 17:32:33,800 input is eventually going to be a feature vector to our final layer. So a lot going on there, 11404 17:32:33,800 --> 17:32:39,160 but let's keep coding. What we've just completed is this first block. We've got a cons layer, 11405 17:32:39,160 --> 17:32:44,120 a relu layer, a cons layer, a relu layer, and a max pool layer. Look at that, cons layer, 11406 17:32:44,120 --> 17:32:49,240 relu layer, cons layer, relu layer, max pool. Should we move on to the next block? We can do this 11407 17:32:49,240 --> 17:32:55,960 one a bit faster now because we've already coded the first one. So I'm going to do nn.sequential as 11408 17:32:55,960 --> 17:33:02,680 well. And then we're going to go nn.com2d. We're going to set the in channels. What should the 11409 17:33:02,680 --> 17:33:08,600 in channels be here? Well, we're going to set it to hidden units as well because our network is 11410 17:33:08,600 --> 17:33:13,320 going to flow just straight through all of these layers. And the output size of this is going to 11411 17:33:13,320 --> 17:33:19,640 be hidden units. And so we want the in channels to match up with the previous layers out channels. 11412 17:33:19,640 --> 17:33:28,040 So then we're going to go out channels equals hidden units as well. We're going to set the 11413 17:33:28,040 --> 17:33:36,120 kernel size, kernel size equals three, stride equals one, padding equals one, then what comes 11414 17:33:36,120 --> 17:33:43,560 next? Well, because the two blocks are identical, the con block one and com two, we can just go 11415 17:33:43,560 --> 17:33:52,200 the exact same combination of layers. And then relu and n.com2d in channels equals hidden units. 11416 17:33:53,480 --> 17:33:59,000 Out channels equals, you might already know this, hidden units. Then we have kernel size 11417 17:33:59,880 --> 17:34:06,280 equals three, oh, 32, don't want it that big, stride equals one, padding equals one, 11418 17:34:06,280 --> 17:34:13,480 and what comes next? Well, we have another relu layer, relu, and then what comes after that? 11419 17:34:13,480 --> 17:34:22,200 We have another max pool. And then max pool 2d, kernel size equals two, beautiful. Now, 11420 17:34:22,200 --> 17:34:27,720 what have we coded up so far? We've got this block, number one, that's what this one on the inside 11421 17:34:27,720 --> 17:34:33,640 here. And then we have com two, relu two, com two, relu two, max pool two. So we've built these 11422 17:34:33,640 --> 17:34:41,720 two blocks. Now, what do we need to do? Well, we need an output layer. And so what did we do before 11423 17:34:41,720 --> 17:34:49,640 when we made model one? We flattened the inputs of the final layer before we put them to the last 11424 17:34:49,640 --> 17:34:57,320 linear layer. So flatten. So this is going to be the same kind of setup as our classifier layer. 11425 17:34:57,880 --> 17:35:02,520 Now, I say that on purpose, because that's what you'll generally hear the last output layer 11426 17:35:02,520 --> 17:35:07,640 in a classification model called is a classifier layer. So we're going to have these two layers 11427 17:35:07,640 --> 17:35:12,040 are going to be feature extractors. 
In other words, they're trying to learn the patterns that 11428 17:35:12,040 --> 17:35:18,120 best represent our data. And this final layer is going to take those features and classify them 11429 17:35:18,120 --> 17:35:24,120 into our target classes. Whatever our model thinks best suits those features, or whatever our model 11430 17:35:24,120 --> 17:35:29,960 thinks those features that it learned represents in terms of our classes. So let's code it out. 11431 17:35:29,960 --> 17:35:36,600 We'll go down here. Let's build our classifier layer. This is our biggest neural network yet. 11432 17:35:37,400 --> 17:35:44,120 You should be very proud. We have an end of sequential again. And we're going to pass in 11433 17:35:44,120 --> 17:35:53,240 an end of flatten, because the output of these two blocks is going to be a multi-dimensional tensor, 11434 17:35:53,240 --> 17:36:00,200 something similar to this size 131310. So we want to flatten the outputs into a single feature 11435 17:36:00,200 --> 17:36:05,640 vector. And then we want to pass that feature vector to an nn.linear layer. And we're going to 11436 17:36:05,640 --> 17:36:13,720 go in features equals hidden units times something times something. Now, the reason I do this is 11437 17:36:13,720 --> 17:36:20,120 because we're going to find something out later on, or time zero, just so it doesn't error. But 11438 17:36:20,120 --> 17:36:25,160 sometimes calculating what you're in features needs to be is quite tricky. And I'm going to 11439 17:36:25,160 --> 17:36:30,120 show you a trick that I use later on to figure it out. And then we have out features relates 11440 17:36:30,120 --> 17:36:35,880 to our output shape, which will be the length of how many classes we have, right? One value for 11441 17:36:35,880 --> 17:36:42,280 each class that we have. And so with that being said, let's now that we've defined all of the 11442 17:36:42,280 --> 17:36:49,640 components of our tiny VGG architecture. There is a lot going on, but this is the same methodology 11443 17:36:49,640 --> 17:36:55,720 we've been using the whole time, defining some components, and then putting them together to 11444 17:36:55,720 --> 17:37:03,080 compute in some way in a forward method. So forward self X. How are we going to do this? 11445 17:37:03,640 --> 17:37:11,480 Are we going to set X is equal to self, comp block one X. So X is going to go through comp block one, 11446 17:37:11,480 --> 17:37:18,200 it's going to go through the comp 2D layer, relu layer, comp 2D layer, relu layer, max pool layer, 11447 17:37:18,200 --> 17:37:22,840 which will be the equivalent of an image going through this layer, this layer, this layer, 11448 17:37:22,840 --> 17:37:28,840 this layer, this layer, and then ending up here. So we'll set it to that. And then we can print out 11449 17:37:29,480 --> 17:37:36,680 X dot shape to get its shape. We'll check this later on. Then we pass X through comp block two, 11450 17:37:38,200 --> 17:37:42,760 which is just going to go through all of the layers in this block, which is equivalent to 11451 17:37:42,760 --> 17:37:48,520 the output of this layer going through all of these layers. And then because we've constructed a 11452 17:37:48,520 --> 17:37:54,120 classifier layer, we're going to take the output of this block, which is going to be here, and we're 11453 17:37:54,120 --> 17:37:59,960 going to pass it through our output layer, or what we've termed it, our classifier layer. 
I'll just 11454 17:37:59,960 --> 17:38:04,520 print out X dot shape here, so we can track the shape as our model moves through the architecture. 11455 17:38:04,520 --> 17:38:15,880 X equals self dot classifier X. And then we're going to return X. Look at us go. We just built 11456 17:38:15,880 --> 17:38:22,040 our first convolutional neural network by replicating what's on a CNN explainer website. 11457 17:38:22,600 --> 17:38:28,840 Now, that is actually very common practice in machine learning is to find some sort of architecture 11458 17:38:28,840 --> 17:38:35,560 that someone has found to work on some sort of problem and replicate it with code and see if it 11459 17:38:35,560 --> 17:38:41,640 works on your own problem. You'll see this quite often. And so now let's instantiate a model. 11460 17:38:42,200 --> 17:38:46,600 Go torch dot manual C. We're going to instantiate our first convolutional neural network. 11461 17:38:48,520 --> 17:38:57,640 Model two equals fashion amnest. We will go model V two. And we are going to set the input shape. 11462 17:38:57,640 --> 17:39:04,920 Now, what will the input shape be? Well, I'll come to the layer up here. The input shape 11463 17:39:04,920 --> 17:39:12,280 is the number of channels in our images. So do we have an image ready to go image shape? 11464 17:39:12,920 --> 17:39:18,280 This is the number of color channels in our image. We have one. If we had color images, 11465 17:39:18,280 --> 17:39:23,640 we would set the input shape to three. So the difference between our convolutional neural network, 11466 17:39:23,640 --> 17:39:30,520 our CNN, tiny VGG, and the CNN explainer tiny VGG is that they are using color images. So 11467 17:39:30,520 --> 17:39:36,520 their input is three here. So one for each color channel, red, green and blue. Whereas we have 11468 17:39:37,160 --> 17:39:41,720 black and white images. So we have only one color channel. So we set the input shape to one. 11469 17:39:42,360 --> 17:39:48,200 And then we're going to go hidden units equals 10, which is exactly the same as what tiny VGG 11470 17:39:48,200 --> 17:39:57,960 has used. 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. So that sets the hidden units value of each of our 11471 17:39:57,960 --> 17:40:04,440 layers. That's the power of creating an initializer with hidden units. And then finally, our output 11472 17:40:04,440 --> 17:40:09,320 shape is going to be what we've seen this before. This is going to be the length of our class names, 11473 17:40:09,320 --> 17:40:14,280 one value for each class in our data set. And of course, we're going to send this model to the 11474 17:40:14,280 --> 17:40:21,720 device. We're going to hit shift and enter. Oh, no, what did we get wrong? Out channels, 11475 17:40:21,720 --> 17:40:27,240 output shape. Where did I spell wrong? Out channels, out channels, out channels. I forgot an L. 11476 17:40:28,040 --> 17:40:34,600 Of course, typo. Oh, kernel size and other typo. Did you notice that? 11477 17:40:34,600 --> 17:40:42,600 Kernel size, kernel size, kernel size, kernel size. Where did we spell this wrong? Oh, here. 11478 17:40:44,440 --> 17:40:46,760 Kernel size. Are there any other typos? Probably. 11479 17:40:50,360 --> 17:40:55,640 A beautiful. There we go. Okay, what have we got? Initializing zero, obtenses and non-op. 11480 17:40:55,640 --> 17:41:00,120 Oh, so we've got an issue here and error here because I've got this. But this is just to, 11481 17:41:00,120 --> 17:41:07,800 there's a trick to calculating this. 
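Putting it all together, here's a consolidated sketch of the class coded across the last few minutes. The classifier's in_features is left as a placeholder, exactly as in the video; as an aside, for 28x28 FashionMNIST images with these settings (padding of 1 on every conv layer and two 2x2 max pools) the spatial size ends up at 7x7, but the video works that out later by printing shapes, so treat the zero below as deliberate for now.

import torch
from torch import nn

class FashionMNISTModelV2(nn.Module):
    """Model architecture that replicates the TinyVGG model from the CNN Explainer website."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, out_channels=hidden_units,
                      kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units,
                      kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units * 0,   # placeholder for now, see the note above
                      out_features=output_shape)
        )

    def forward(self, x):
        x = self.conv_block_1(x)
        print(x.shape)          # tracking shapes while building (removed later)
        x = self.conv_block_2(x)
        print(x.shape)
        x = self.classifier(x)
        return x

torch.manual_seed(42)                                    # seed value illustrative
device = "cuda" if torch.cuda.is_available() else "cpu"
model_2 = FashionMNISTModelV2(input_shape=1,             # 1 colour channel for grayscale images
                              hidden_units=10,
                              output_shape=10).to(device)  # = len(class_names), 10 FashionMNIST classes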
We're going to cover this in another video. But 11482 17:41:07,800 --> 17:41:13,320 pay yourself on the back. We've written a fair bit of code here. This is a convolutional neural 11483 17:41:13,320 --> 17:41:19,240 network that replicates the tiny VGG architecture on the CNN explainer website. Now, don't forget, 11484 17:41:19,240 --> 17:41:24,120 your extra curriculum is to go through this website for at least 20 minutes and read about 11485 17:41:24,120 --> 17:41:29,000 what's happening in our models. We're focused on code here. But this is particularly where you 11486 17:41:29,000 --> 17:41:33,160 want to pay attention to. If you read through this understanding hyper parameters and play around 11487 17:41:33,160 --> 17:41:37,880 with this, the next couple of videos will make a lot more sense. So read about padding, 11488 17:41:37,880 --> 17:41:43,800 read about kernel size and read about stride. I'll see you in the next video. We're going to go 11489 17:41:43,800 --> 17:41:51,080 through our network step by step. Welcome back. Now, I'm super stoked because in the last video, 11490 17:41:51,080 --> 17:41:58,120 we coded together our first ever convolutional neural network in PyTorch. So well done. We 11491 17:41:58,120 --> 17:42:03,560 replicated the tiny VGG architecture from the CNN explainer website, my favorite place for learning 11492 17:42:03,560 --> 17:42:10,600 about CNNs in the browser. So now we introduced two new layers that we haven't seen before, 11493 17:42:10,600 --> 17:42:17,800 conv2d and maxpool2d. But they all have the same sort of premise of what we've been doing so far 11494 17:42:17,800 --> 17:42:23,720 is that they're trying to learn the best features to represent our data in some way, shape or form. 11495 17:42:23,720 --> 17:42:29,640 Now, in the case of maxpool2d, it doesn't actually have any learnable parameters. It just takes 11496 17:42:29,640 --> 17:42:34,520 the max, but we're going to step through that later on. Let's use this video to step through 11497 17:42:34,520 --> 17:42:41,720 and then conv2d. We're going to do that with code. So I'll make a new heading here. 7.1 11498 17:42:43,080 --> 17:42:52,360 stepping through and then conv2d. Beautiful. Now, where could we find out what's going on 11499 17:42:52,360 --> 17:43:00,520 in an end comp2d? Well, of course, we have the documentation and then comp2d. We've got PyTorch. 11500 17:43:00,520 --> 17:43:05,800 So if you want to learn the mathematical operation that's happening, we have this value here, this 11501 17:43:06,360 --> 17:43:11,480 operation here. Essentially, it's saying the output is equal to the bias term times something, 11502 17:43:11,480 --> 17:43:18,040 plus the sum of the weight times something times the input. So do you see how just the weight 11503 17:43:18,040 --> 17:43:23,560 matrix, the weight tensor and the bias value, manipulating our input in some way equals the output? 11504 17:43:24,440 --> 17:43:33,480 Now, if we map this, we've got batch size, channels in, height, width, channels out, out, out, 11505 17:43:34,040 --> 17:43:37,640 et cetera, et cetera. But we're not going to focus too much on this. If you'd like to 11506 17:43:37,640 --> 17:43:44,280 read more into that, you can. Let's try it with code. And we're going to reproduce this particular 11507 17:43:44,280 --> 17:43:50,760 layer here, the first layer of the CNN explainer website. And we're going to do it with a dummy input. 11508 17:43:50,760 --> 17:43:55,480 In fact, that's one of my favorite ways to test things. 
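Concretely, the dummy input about to be created over the next few cells looks something like this, in PyTorch's channels-first layout and matching the 64x64x3 input size of the CNN Explainer:

import torch

torch.manual_seed(42)                               # the exact seed value is illustrative
images = torch.randn(size=(32, 3, 64, 64))          # (batch_size, colour_channels, height, width)
test_image = images[0]                              # grab a single image from the batch

print(f"Image batch shape: {images.shape}")         # torch.Size([32, 3, 64, 64])
print(f"Single image shape: {test_image.shape}")    # torch.Size([3, 64, 64])
print(f"Single image pixel values:\n{test_image}")  # just random numbers, not a real picture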
So I'm just going to link here the documentation. 11509 17:43:58,040 --> 17:44:09,400 See the documentation for an end comp2d here. And if you'd like to read through more of this, 11510 17:44:09,400 --> 17:44:12,760 of course, this is a beautiful place to learn about what's going on. 11511 17:44:12,760 --> 17:44:18,840 There's the shape how to calculate the shape, height out, width out, et cetera. That's very 11512 17:44:18,840 --> 17:44:22,840 helpful if you need to calculate input and output shapes. But I'll show you my trick for doing so 11513 17:44:22,840 --> 17:44:30,440 later on. We have here, let's create some dummy data. So I'm going to set torch manual seed. We 11514 17:44:30,440 --> 17:44:38,600 need it to be the same size as our CNN explainer data. So 64, 64, 3. But we're going to do it 11515 17:44:38,600 --> 17:44:44,520 pie torch style. This is color channels last. We're going to do color channels first. So how 11516 17:44:44,520 --> 17:44:52,920 about we create a batch of images, we're going to be writing torch dot rand n. And we're going to 11517 17:44:52,920 --> 17:45:01,480 pass in size equals 32, three, 64, 64. And then we're going to create a singular image by taking 11518 17:45:01,480 --> 17:45:09,160 the first of that. So image is zero. Now, let's get the image batch shape. Because a lot of 11519 17:45:09,160 --> 17:45:15,000 machine learning, as I've said before, and deep learning is making sure your data has the right 11520 17:45:15,000 --> 17:45:25,320 shape. So let's check images dot shape. And let's check single image shape. We're going to go test 11521 17:45:25,320 --> 17:45:32,200 image dot shape. And finally, we're going to print, what does the test image look like? 11522 17:45:34,520 --> 17:45:39,240 We'll get this on a new line, hey, new line test image, this is of course not going to be an 11523 17:45:39,240 --> 17:45:44,440 actual image is just going to be a collection of random numbers. And of course, that is what 11524 17:45:44,440 --> 17:45:48,600 our model is currently comprised of model two, if we have a look at what's on the insides, 11525 17:45:49,400 --> 17:45:54,200 we are going to see a whole bunch of random numbers. Look at all this. What do we have? 11526 17:45:54,200 --> 17:46:02,120 We scroll up is going to give us a name for something. We have comp block two, two, we have a weight, 11527 17:46:02,120 --> 17:46:08,200 we have a bias, keep going up, we go right to the top, we have another weight, keep going down, 11528 17:46:08,200 --> 17:46:13,720 we have a bias, a weight, et cetera, et cetera. Now, our model is comprised of random numbers, 11529 17:46:13,720 --> 17:46:19,080 and what we are trying to do is just like all of our other models is pass data in and adjust the 11530 17:46:19,080 --> 17:46:25,160 random numbers within these layers to best represent our data. So let's see what happens 11531 17:46:25,160 --> 17:46:33,480 if we pass some random data through one of our comp2d layers. So let's go here, we're going to 11532 17:46:33,480 --> 17:46:46,040 create a single comp2d layer. So comp layer equals, what is it equal? And then comp2d, 11533 17:46:46,040 --> 17:46:51,960 and we're going to set the in channels is equal to what? Oh, revealed the answer too quickly. 11534 17:46:52,680 --> 17:46:59,880 Three. Why is it three? Well, it's because the in channels is the same number of color channels 11535 17:46:59,880 --> 17:47:09,400 as our images. So if we have a look at our test image shape, what do we have? 
Three, it has three 11536 17:47:09,400 --> 17:47:14,280 color channels. That is the same as the value here, except the order is reversed. This is color 11537 17:47:14,280 --> 17:47:20,840 channels last, pytorch defaults to color channels first. So, or for now it does, in the future this 11538 17:47:20,840 --> 17:47:25,880 may change. So just keep that in mind. So out channels equals 10. This is equivalent to the 11539 17:47:25,880 --> 17:47:32,600 number of hidden units we have. One. Oh, I don't want that one just yet. One, two, three, four, 11540 17:47:32,600 --> 17:47:39,000 five, six, seven, eight, nine, 10. So we have that 10 there. So we have 10 there. And then we have 11541 17:47:39,000 --> 17:47:44,440 kernel size. Oh, what is the kernel? Well, it's not KFC. I can tell you that. And then we have 11542 17:47:44,440 --> 17:47:48,680 stride. And then we have padding. We're going to step through these in a second. But let's check 11543 17:47:48,680 --> 17:47:55,240 out the kernel. And this kernel can also be three by three. But it's a shortcut to just type in three. 11544 17:47:55,240 --> 17:47:59,400 So that's what it actually means. If you just type in a single number, it's equivalent to typing in 11545 17:47:59,400 --> 17:48:04,200 a tuple. Now, of course, you could find that out by reading through the documentation here. 11546 17:48:04,200 --> 17:48:09,320 But where did I get that value? Well, let's dive into this beautiful website. And let's see what 11547 17:48:09,320 --> 17:48:15,880 happening. So we have a kernel here, which is also called a filter. So the thing I'm talking about 11548 17:48:15,880 --> 17:48:21,560 is this little square here, this kernel. Oh, we can see the weights there at the top. This is how 11549 17:48:21,560 --> 17:48:28,760 beautiful this website is. So if we go over there, this is what's going to happen. This is a convolution. 11550 17:48:28,760 --> 17:48:36,760 It starts with this little square, and it moves pixel by pixel across our image. And you'll notice 11551 17:48:36,760 --> 17:48:41,000 that the output is creating some sort of number there. And you'll notice in the middle, we have a 11552 17:48:41,000 --> 17:48:49,640 mathematical operation. This operation here is what's happening here. I wait times the input. 11553 17:48:50,920 --> 17:48:54,520 That's what we've got there. Now, the beauty of PyTorch is it does all of this behind the 11554 17:48:54,520 --> 17:48:59,160 scenes for us. So again, if you'd like to dig more into the mathematical operation behind the 11555 17:48:59,160 --> 17:49:03,480 scenes, you've got the resource here. And you've also got plenty of other resources online. We're 11556 17:49:03,480 --> 17:49:09,720 going to focus on code for now. So if we keep doing this across our entire image, we get this 11557 17:49:09,720 --> 17:49:14,680 output over here. So that's the kernel. And now where did I get three by three from? Well, look at 11558 17:49:14,680 --> 17:49:23,000 this. One, two, three, one, two, three, one, two, three, three by three, we have nine squares. Now, 11559 17:49:23,000 --> 17:49:28,680 if we scroll down, this was your extracurricular for the last video, understanding hyperparameters. 11560 17:49:28,680 --> 17:49:33,000 What happens if we change the kernel size to three by three? Have a look at the red square on the 11561 17:49:33,000 --> 17:49:38,840 left. Now, if we change it to two by two, it changed again. Three by three. 
This is our kernel, 11562 17:49:38,840 --> 17:49:43,880 or also known as a filter, passing across our image, performing some sort of mathematical 11563 17:49:43,880 --> 17:49:50,040 operation. And now the whole idea of a convolutional layer is to try and make sure that this kernel 11564 17:49:50,040 --> 17:49:55,880 performs the right operation to get the right output over here. Now, what do these kernels learn? 11565 17:49:55,880 --> 17:50:00,200 Well, that is entirely up to the model. That's the beauty of deep learning is that it 11566 17:50:00,200 --> 17:50:08,040 learns how to best represent our data, hopefully, on its own by looking at more data. And then so 11567 17:50:08,040 --> 17:50:13,160 if we jump back in here, so that's the equivalent of setting kernel size three by three. What if 11568 17:50:13,160 --> 17:50:17,720 we set the stride equal to one? Have we got this in the right order? It doesn't really matter. 11569 17:50:17,720 --> 17:50:25,480 Let's go through stride next. If we go to here, what does stride say? Stride of the convolution 11570 17:50:26,200 --> 17:50:32,040 of the convolving kernel. The default is one. Wonderful. Now, if we set the stride, 11571 17:50:32,040 --> 17:50:36,440 or if we keep it at one, it's a default one, it's going to hop over, watch the red square on the 11572 17:50:36,440 --> 17:50:43,480 left. It's going to hop over one pixel at a time. So the convolution, the convolving, happens one 11573 17:50:43,480 --> 17:50:48,920 pixel at a time. That's what the stride sets. Now, watch what happens when I change the stride 11574 17:50:48,920 --> 17:50:58,680 value to the output shape. Wow. Do you notice that it went down? So we have here, the kernel size 11575 17:50:58,680 --> 17:51:04,120 is still the same. But now we're jumping over two pixels at a time. Notice how on the left, 11576 17:51:04,120 --> 17:51:09,160 two pixels become available. And then if I jump over again, two pixels. So the reason why the 11577 17:51:09,160 --> 17:51:14,920 output compresses is because we're skipping some pixels as we go across the image. And now this 11578 17:51:14,920 --> 17:51:23,560 pattern happens throughout the entire network. That's one of the reasons why you see the size 11579 17:51:23,560 --> 17:51:31,160 of our input or the size of each layer go down over time. What our convolutional layer is doing, 11580 17:51:31,160 --> 17:51:36,280 and in fact, a lot of deep learning neural networks do, is they try to compress the input 11581 17:51:36,840 --> 17:51:43,960 into some representation that best suits the data. Because it would be no point of just memorizing 11582 17:51:43,960 --> 17:51:47,960 the exact patterns, you want to compress it in some way. Otherwise, you just might as well move 11583 17:51:47,960 --> 17:51:53,480 your input data around. You want to learn generalizable patterns that you can move around. And so we 11584 17:51:53,480 --> 17:51:58,280 keep going. We've got padding equals zero. Let's see what happens here. If we change the padding 11585 17:51:58,280 --> 17:52:06,600 value, what happens? Up, down. Notice the size here. Oh, we've added two extra pixels around the 11586 17:52:06,600 --> 17:52:11,960 edge. Now if we go down, one extra pixel. Now if we go zero, now why might we do that? 11587 17:52:13,000 --> 17:52:18,680 If we add some padding on the end, well, that's so that our kernel can operate on what's going on 11588 17:52:18,680 --> 17:52:23,880 here in the corner. In case there's some information on the edges of our image. 
Then you might be 11589 17:52:23,880 --> 17:52:28,040 thinking, Daniel, there's a whole bunch of values here. How do we know what to set them? 11590 17:52:28,040 --> 17:52:31,880 Well, you notice that I've just copied exactly what is going on here. 11591 17:52:33,560 --> 17:52:39,080 There's a three by three kernel. There's no padding on the image. And the stride is just going 11592 17:52:39,080 --> 17:52:44,600 one by one. And so that's often very common in machine learning, is that when you're just getting 11593 17:52:44,600 --> 17:52:49,480 started and you're not sure what values to set these values to, you just copy some existing 11594 17:52:49,480 --> 17:52:54,360 values from somewhere and see if it works on your own problem. And then if it doesn't, well, 11595 17:52:54,360 --> 17:53:00,600 you can adjust them. So let's see what happens when we do that. So pass the data through 11596 17:53:02,120 --> 17:53:09,320 the convolutional layer. So let's see what happens. Conv output equals conv layer. 11597 17:53:09,320 --> 17:53:15,400 Let's pass it our test image. And we'll check the conv output. What happens? 11598 17:53:17,720 --> 17:53:22,840 Oh no, we get an error. Of course we get a shape error. One of the most common issues of machine 11599 17:53:22,840 --> 17:53:28,680 learning and deep learning. So this is saying that our input for the conv layer expects a four 11600 17:53:28,680 --> 17:53:35,560 dimensional tensor, except it got a three dimensional input size of 364 64. Now, how do we add an 11601 17:53:35,560 --> 17:53:44,920 extra dimension to our test image? Let's have a look. How would we add a batch dimension over on 11602 17:53:44,920 --> 17:53:53,240 the left here? We can go unsqueeze zero. So now we have a four dimensional tensor. Now, just keep 11603 17:53:53,240 --> 17:53:59,320 in mind that if you're running this layer and then com2d on a pytorch version, that is, I believe 11604 17:53:59,320 --> 17:54:08,760 they fixed this or they changed it in pytorch. What am I on? I think this Google collab instance is 11605 17:54:08,760 --> 17:54:14,680 on 1.10. I think you might not get this error if you're running 1.11. So just keep that in mind. 11606 17:54:14,680 --> 17:54:20,920 Like this should work if you're running 1.11. But if it doesn't, you can always unsqueeze here. 11607 17:54:22,440 --> 17:54:28,280 And let's see what happens. Look at that. We get another tensor output. Again, 11608 17:54:28,280 --> 17:54:32,520 this is just all random numbers though, because our test image is just random numbers. And our 11609 17:54:32,520 --> 17:54:38,280 conv layer is instantiated with random numbers. But we'll set the manual seed here. Now, if our 11610 17:54:38,280 --> 17:54:41,880 numbers are different to what's, if your numbers are different to what's on my screen, don't worry 11611 17:54:41,880 --> 17:54:48,440 too much. Why is that? Because our conv layer is instantiated with random numbers. And our test 11612 17:54:48,440 --> 17:54:54,200 image is just random numbers as well. What we're paying attention to is the input and output shapes. 11613 17:54:55,240 --> 17:54:59,640 Do you see what just happened? We put our input image in there with three channels. 11614 17:54:59,640 --> 17:55:05,000 And now because we've set out channels to be 10, we've got 10. And we've got 62, 62. And this is 11615 17:55:05,000 --> 17:55:09,960 just the batch size. It just means one image. 
So essentially our random numbers, our test image, 11616 17:55:10,520 --> 17:55:15,080 have gone through the convolutional layer that we created, have gone through this mathematical 11617 17:55:15,080 --> 17:55:19,400 operation with regards to all the values that we've set, we've put the weight tensor, well, 11618 17:55:19,400 --> 17:55:24,360 actually PyTorch created that for us. PyTorch has done this whole operation for us. Thank you, 11619 17:55:24,360 --> 17:55:29,560 PyTorch. It's gone through all of these steps across. You could code this all by hand if you want, 11620 17:55:29,560 --> 17:55:34,920 but it's a lot easier and simpler to use a PyTorch layer. And it's done this. And now it's 11621 17:55:34,920 --> 17:55:41,720 created this output. Now, whatever this output is, I don't know, it is random numbers, but this 11622 17:55:41,720 --> 17:55:47,560 same process will happen if we use actual data as well. So let's see what happens if we change 11623 17:55:47,560 --> 17:55:54,440 the values kernel size we increase. Notice how our output has gotten smaller because we're using 11624 17:55:54,440 --> 17:55:59,080 a bigger kernel to convolve across the image. What if we put this to three, three back to what it 11625 17:55:59,080 --> 17:56:04,120 was and stride of two? What do you think will happen? Well, our output size basically halves 11626 17:56:04,120 --> 17:56:08,840 because we're skipping two pixels at a time. We'll put that back to one. What do you think will 11627 17:56:08,840 --> 17:56:16,840 happen if we set padding to one? 64, 64. We get basically the same size because we've added an 11628 17:56:16,840 --> 17:56:21,960 extra pixel around the edges. So you can play around with this. And in fact, I encourage you to 11629 17:56:21,960 --> 17:56:27,400 do this is what we just did. Padding one, we just added an extra dummy zero pixel around the edges. 11630 17:56:27,400 --> 17:56:34,040 So practice with this, see what happens as you pass our test image, random numbers, 11631 17:56:34,040 --> 17:56:40,120 through a conv 2d layer with different values here. What do you think will happen if you change 11632 17:56:40,120 --> 17:56:48,840 this to 64? Give that a shot and I'll see you in the next video. Who's ready to step through 11633 17:56:48,840 --> 17:56:55,240 the nn max pool 2d layer? Put your hand up. I've got my hand up. So let's do it together, hey, 11634 17:56:55,240 --> 17:57:01,560 we've got 7.2. Now you might have already given this a shot yourself. Stepping through 11635 17:57:03,000 --> 17:57:12,200 nn max pool 2d. And this is this is what I do for a lot of different concepts that I haven't 11636 17:57:12,200 --> 17:57:17,640 gone through before is I just write some test code and see what the inputs and outputs are. 11637 17:57:18,200 --> 17:57:23,240 And so where could we find out about max pool 2d? Well, of course, we've got the documentation. 11638 17:57:23,240 --> 17:57:30,040 I'm just going to link this in here. Max pool 2d. In the simplest case, the output value of 11639 17:57:30,040 --> 17:57:38,360 layer with input size nchw output nch out w out. By the way, this is number of batches, 11640 17:57:38,360 --> 17:57:44,680 color channels, height, width. And this is the output of that layer. And kernel size, which is 11641 17:57:44,680 --> 17:57:52,600 a parameter up here, k h k w can be precisely described as out is going to be the max of some 11642 17:57:52,600 --> 17:57:59,800 value, depending on the kernel size and the stride. 
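If you'd like to sanity-check those shape changes without running a layer, the H_out/W_out formula from the torch.nn.Conv2d (and nn.MaxPool2d) documentation can be written as a tiny helper. This is just a convenience sketch, not something from the course materials:

```python
import math

def conv_output_size(size: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """H_out/W_out formula from the torch.nn.Conv2d and nn.MaxPool2d docs."""
    return math.floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

print(conv_output_size(64, kernel_size=3, stride=1, padding=0))  # 62 -> slightly smaller
print(conv_output_size(64, kernel_size=3, stride=2, padding=0))  # 31 -> roughly halved
print(conv_output_size(64, kernel_size=3, stride=1, padding=1))  # 64 -> size preserved
print(conv_output_size(64, kernel_size=64))                      # 1  -> kernel as big as the image
```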
So let's have a look at that in practice. 11643 17:57:59,800 --> 17:58:03,960 And of course, you can read further through the documentation here. I'll just grab the link for 11644 17:58:03,960 --> 17:58:13,880 this actually. So it's here. Wonderful. And let's now first try it with our test image that we 11645 17:58:13,880 --> 17:58:20,840 created above. So just highlight what the test image is. A bunch of random numbers in the same 11646 17:58:20,840 --> 17:58:27,160 shape as what a single image would be if we were to replicate the image size of the CNN explainer. 11647 17:58:28,440 --> 17:58:33,960 By the way, we'll have a look at a visual in a second of max pool here. But you can go through 11648 17:58:33,960 --> 17:58:39,640 that on your own time. Let's if in doubt, code it out. So we're going to print out the original 11649 17:58:39,640 --> 17:58:48,760 image shape without unsqueezed dimension. Because recall that we had to add an extra dimension to 11650 17:58:48,760 --> 17:58:53,800 pass it through our com2d layer. Now, if you're using a later version of PyTorch, you might not 11651 17:58:53,800 --> 17:58:58,600 get an error if you only use a three dimensional image tensor and pass it through a comp layer. 11652 17:58:59,400 --> 17:59:08,520 So we're going to pass it in test image, original shape, test image dot shape. So this is just going 11653 17:59:08,520 --> 17:59:13,800 to tell us what the line of code in the cell above tells us. But that's fine. I like to make 11654 17:59:13,800 --> 17:59:21,640 pretty printouts, you know, test image with unsqueezed dimension. So this is just going to be our test 11655 17:59:21,640 --> 17:59:28,040 image. And we're going to see what happens when we unsqueeze a dimension, unsqueeze on zero 11656 17:59:28,040 --> 17:59:34,360 for dimension. That is about to say first, but it's the zero. Now we're going to create a sample 11657 17:59:34,360 --> 17:59:44,680 nn max pool 2d layer. Because remember, even layers themselves in torch dot nn are models 11658 17:59:45,480 --> 17:59:49,960 of their own accord. So we can just create a single, this is like creating a single layer model here. 11659 17:59:50,520 --> 17:59:55,400 We'll set the kernel size equal to two. And recall, if we go back to CNN explainer, 11660 17:59:55,960 --> 18:00:01,960 kernel size equal to two results in a two by two square, a two by two kernel that's going to 11661 18:00:01,960 --> 18:00:09,960 convolve over our image, like so. And this is an example input, an example output. And you can see 11662 18:00:09,960 --> 18:00:15,400 the operation that max pooling does here. So just keep that in mind as we pass some sample data 11663 18:00:15,400 --> 18:00:21,960 through our max pool layer. And now let's pass data through it. I actually will pass it through 11664 18:00:21,960 --> 18:00:28,040 just the conv layer first, through just the conv layer. Because that's sort of how you might stack 11665 18:00:28,040 --> 18:00:32,120 things, you might put a convolutional layer and then a max pool layer on top of that convolutional 11666 18:00:32,120 --> 18:00:39,640 layer. So test image through conv. We'll create a variable here, equals our conv layer. 11667 18:00:41,880 --> 18:00:48,840 Is going to take as an input, our test image dot unsqueeze on the zero dimension again. 11668 18:00:50,280 --> 18:00:55,880 Beautiful. Now we're going to print out the shape here. 
This is just highlighting how I 11669 18:00:55,880 --> 18:01:00,440 like to troubleshoot things is I do one step, print the shape, one step, print the shape, 11670 18:01:00,440 --> 18:01:08,840 see what is happening as our data moves through various layers. So test image through conv.shape, 11671 18:01:08,840 --> 18:01:16,440 we'll see what our conv layer does to the shape of our data. And then we're going to pass data through 11672 18:01:19,480 --> 18:01:24,200 max pool layer, which is the layer we created a couple of lines above this one here. 11673 18:01:24,200 --> 18:01:34,440 So let's see what happens. Test image through current type at the moment through conv and max 11674 18:01:34,440 --> 18:01:41,400 pool. So quite a long variable name here, but this is to help us avoid confusion of what's 11675 18:01:41,400 --> 18:01:47,320 going on. So we go test image through conv. So you notice how we're taking the output of our 11676 18:01:47,320 --> 18:01:53,720 convolutional layer, this here, and we're passing it through our max pool layer, which has another 11677 18:01:53,720 --> 18:02:02,840 typo. Wonderful. And finally, we'll print out the shape, shape after going through conv layer 11678 18:02:04,120 --> 18:02:13,160 and max pool layer. What happens here? So we want test image through conv and max pool. 11679 18:02:13,960 --> 18:02:21,000 Let's see how our max pool layer manipulates our test images shape. You ready? Three, two, 11680 18:02:21,000 --> 18:02:27,240 one, let's go. What do we get? Okay. So we have the test image original shape, 11681 18:02:27,240 --> 18:02:32,840 recall that our test image is just a collection of random numbers. And of course, our conv layer 11682 18:02:33,480 --> 18:02:39,480 is going to be instantiated with random numbers. And max pool actually has no parameters. It just 11683 18:02:39,480 --> 18:02:48,920 takes the maximum of a certain range of inner tensor. So when we unsqueeze the test image as the 11684 18:02:48,920 --> 18:02:55,480 input, we get an extra dimension here. When we pass it through our conv layer. Oh, where did this 11685 18:02:55,480 --> 18:03:03,640 64 come from? 164 64 64 64. Let's go back up to our conv layer. Do you notice how that we get the 11686 18:03:03,640 --> 18:03:09,320 64 there because we changed the out channels value? If we change this back to 10, like what's in the 11687 18:03:09,320 --> 18:03:17,240 CNN explainer model? One, two, three, four, five, six, seven, eight, nine, 10. What do you think will 11688 18:03:17,240 --> 18:03:25,560 happen there? Well, we get a little highlight here. 10. Then we keep going. I'll just get rid of 11689 18:03:25,560 --> 18:03:29,080 this extra cell. We don't need to check the version anymore. We'll check the test image 11690 18:03:29,080 --> 18:03:36,760 shapes still three 64 64. But then as we pass it through the conv layer here, we get a different 11691 18:03:36,760 --> 18:03:42,120 size now. So it originally had three channels as the input for color channels, but we've upscaled 11692 18:03:42,120 --> 18:03:51,640 it to 10 so that we have 10 hidden units in our layer. And then we have 64 64. Now, again, 11693 18:03:51,640 --> 18:03:56,920 these shapes will change if we change the values of what's going on here. So we might put padding 11694 18:03:56,920 --> 18:04:05,240 to zero. What happens there? Instead of 64 64, we get 62 62. And then what happens after we pass 11695 18:04:05,240 --> 18:04:15,640 it through the conv layer and then through the max pool layer? We've got 110 64 64. 
And now we have 11696 18:04:15,640 --> 18:04:23,240 110 32 32. Now, why is that? Well, let's go back into the CNN explainer, jump into this max pool 11697 18:04:23,240 --> 18:04:27,960 layer here. Maybe this one because it's got a bit more going on. Do you notice on the left here is 11698 18:04:27,960 --> 18:04:33,400 the input? And we've got a two by two kernel here. And so the max pooling layer, what it does is it 11699 18:04:33,400 --> 18:04:39,960 takes the maximum of whatever the input is. So you'll notice the input is 60 60 in this case. 11700 18:04:40,920 --> 18:04:47,080 Whereas the output over here is 30 30. Now, why is that? Well, because the max operation here is 11701 18:04:47,080 --> 18:04:53,720 reducing it from section of four numbers. So let's get one with a few different numbers. 11702 18:04:55,720 --> 18:05:01,240 There we go. That'll do. So it's taking it from four numbers and finding the maximum value within 11703 18:05:01,240 --> 18:05:09,480 those four numbers here. Now, why would it do that? So as we've discussed before, what deep learning 11704 18:05:09,480 --> 18:05:15,560 neural network is trying to do or in this case, a CNN is take some input data and figure out 11705 18:05:15,560 --> 18:05:21,480 what features best represent whatever the input data is and compress them into a feature vector 11706 18:05:21,480 --> 18:05:28,680 that is going to be our output. Now, the reason being for that is because you could consider it 11707 18:05:28,680 --> 18:05:32,760 from a neural networks perspective is that intelligence is compression. So you're trying to 11708 18:05:32,760 --> 18:05:39,720 compress the patterns that make up actual data into a smaller vector space, go from a higher 11709 18:05:39,720 --> 18:05:46,280 dimensional space to a smaller vector space in terms of dimensionality of a tensor. But still, 11710 18:05:46,280 --> 18:05:52,520 this smaller dimensionality space represents the original data and can be used to predict on future 11711 18:05:52,520 --> 18:05:59,880 data. So that's the idea behind Max Paul is, hey, if we've got these learned features from our 11712 18:05:59,880 --> 18:06:05,960 convolutional layers, will the patterns, will the most important patterns stay around if we just 11713 18:06:05,960 --> 18:06:11,560 take the maximum of a certain section? So do you notice how the input here, we still have, 11714 18:06:11,560 --> 18:06:16,760 you can still see the outline of the car here, albeit a little bit more pixelated, 11715 18:06:16,760 --> 18:06:21,960 but just by taking the max of a certain region, we've got potentially the most important feature 11716 18:06:21,960 --> 18:06:27,880 of that little section. And now, of course, you could customize this value here. If when we 11717 18:06:27,880 --> 18:06:32,520 create our max pool layer, you could increase the kernel size to four by four. What do you think 11718 18:06:32,520 --> 18:06:38,600 will happen if we can increase it to four? So here, we've got a two by two kernel. If we increase it 11719 18:06:38,600 --> 18:06:46,600 to four by four, what happens? Ah, do you notice that we've gone from 62 to 15, we've essentially 11720 18:06:46,600 --> 18:06:53,560 divided our feature space by four, we've compressed it even further. Now, will that work? Well, 11721 18:06:53,560 --> 18:06:57,560 I'm not sure. That's part of the experimental nature of machine learning, but we're going to 11722 18:06:57,560 --> 18:07:04,680 keep it at two for now. And so this is with our tensor here 6464. 
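Putting the two layers together, the shape-stepping being described looks roughly like this (a sketch: with padding=0 as in the CNN Explainer the conv output is 62x62 and pooling halves it to 31x31; with padding=1 you would see the 64x64 to 32x32 numbers mentioned above):

```python
import torch
from torch import nn

torch.manual_seed(42)
test_image = torch.randn(size=(3, 64, 64))

conv_layer = nn.Conv2d(in_channels=3, out_channels=10,
                       kernel_size=3, stride=1, padding=0)

# A single max pooling "layer" -- layers in torch.nn are modules of their own accord.
# stride defaults to kernel_size, so a 2x2 window halves the height and width.
max_pool_layer = nn.MaxPool2d(kernel_size=2)

print(f"Test image original shape: {test_image.shape}")                              # [3, 64, 64]
print(f"Test image with unsqueezed dimension: {test_image.unsqueeze(dim=0).shape}")  # [1, 3, 64, 64]

test_image_through_conv = conv_layer(test_image.unsqueeze(dim=0))
print(f"Shape after going through conv_layer(): {test_image_through_conv.shape}")    # [1, 10, 62, 62]

test_image_through_conv_and_max_pool = max_pool_layer(test_image_through_conv)
print(f"Shape after going through conv_layer() and max_pool_layer(): "
      f"{test_image_through_conv_and_max_pool.shape}")                               # [1, 10, 31, 31]
```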
But now let's do the same as 11723 18:07:04,680 --> 18:07:09,800 what we've done above, but we'll do it with a smaller tensor so that we can really visualize 11724 18:07:09,800 --> 18:07:17,880 things. And we're going to just replicate the same operation that's going on here. So let's go here, 11725 18:07:17,880 --> 18:07:25,320 we'll create another random tensor. We'll set up the manual seed first. And we're going to create 11726 18:07:25,320 --> 18:07:35,160 a random tensor with a similar number of dimensions. Now, recall dimensions don't tell you, so this 11727 18:07:35,160 --> 18:07:43,080 is a dimension 1364 64. That is a dimension. The dimensions can have different values within 11728 18:07:43,080 --> 18:07:50,920 themselves. So we want to create a four dimensional tensor to our images. So what that means is, 11729 18:07:50,920 --> 18:07:57,560 let me just show you it's way easy to explain things when we've got code is torch dot rand n. 11730 18:07:57,560 --> 18:08:05,400 And we're going to set it up as size equals one, one, two, two. We can have a look at this random 11731 18:08:05,400 --> 18:08:12,520 tensor. It's got four dimensions. One, two, three, four. So you could have a batch size, 11732 18:08:12,520 --> 18:08:18,280 color channels, and height width, a very small image, but it's a random image here. But this is 11733 18:08:18,280 --> 18:08:24,520 quite similar to what we've got going on here, right? Four numbers. Now, what do you think will 11734 18:08:24,520 --> 18:08:31,160 happen if we create a max pool layer, just like we've done above, create a max pool layer. So we 11735 18:08:31,160 --> 18:08:36,760 go max pool layer, just repeating the code that we have in the cell above, that's all right, 11736 18:08:36,760 --> 18:08:46,280 a little bit of practice. Kernel size equals two. And then we're going to pass the random tensor 11737 18:08:46,280 --> 18:09:05,000 through the max pool layer. So we'll go max pool tensor equals max pool layer. And we're going 11738 18:09:05,000 --> 18:09:10,600 to pass it in the random tensor. Wonderful. And then we can print out some shapes and print 11739 18:09:10,600 --> 18:09:15,480 out some tenses. As we always do to visualize, visualize, visualize. So we're going to write in 11740 18:09:15,480 --> 18:09:24,680 here max pool tensor on a new line. We'll get in the max pool tensor. We'll see what this looks 11741 18:09:24,680 --> 18:09:32,760 like. And we'll also print out max pool tensor shape. And we can probably print out random tensor 11742 18:09:32,760 --> 18:09:39,160 itself, as well as its shape as well. We'll get the shape here, dot shape. And we'll do the same 11743 18:09:39,160 --> 18:09:52,920 for the random tensor. So print, get a new line, random tensor, new line, random tensor. And then 11744 18:09:52,920 --> 18:10:02,360 we'll get the shape. Random tensor shape, random tensor. Oh, a lot of coding here. That's, that's 11745 18:10:02,360 --> 18:10:06,760 the fun part about machine learning, right? You get to write lots of code. Okay. So we're 11746 18:10:06,760 --> 18:10:11,160 visualizing what's going on with our random tensor. This is what's happening within the max pool layer. 11747 18:10:11,160 --> 18:10:15,320 We've seen this from a few different angles now. So we have a random tensor of numbers, 11748 18:10:15,320 --> 18:10:21,160 and we've got a size here. But the max pool tensor, once we pass our random tensor, 11749 18:10:21,800 --> 18:10:30,840 through the max pool layer, what happens? 
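Here's roughly what that small, easy-to-visualize example looks like as code (a sketch; with the seed set you can see exactly which of the four values survives the pooling):

```python
import torch
from torch import nn

torch.manual_seed(42)

# A tiny 4-dimensional tensor: (batch_size, colour_channels, height, width) = (1, 1, 2, 2)
random_tensor = torch.randn(size=(1, 1, 2, 2))

max_pool_layer = nn.MaxPool2d(kernel_size=2)   # take the max over each 2x2 region
max_pool_tensor = max_pool_layer(random_tensor)

print(f"Random tensor:\n{random_tensor}")
print(f"Random tensor shape: {random_tensor.shape}")       # torch.Size([1, 1, 2, 2])
print(f"\nMax pool tensor:\n{max_pool_tensor}")             # only the largest of the four values remains
print(f"Max pool tensor shape: {max_pool_tensor.shape}")    # torch.Size([1, 1, 1, 1])
```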
Well, we have 0.3367, 1288, 2345, 2303. Now, 11750 18:10:30,840 --> 18:10:37,720 what's the max of all these? Well, it takes the max here is 3367. Oh, and we've got the random 11751 18:10:37,720 --> 18:10:44,360 tensor down there. We don't want that. And see how we've reduced the shape from two by two to one 11752 18:10:44,360 --> 18:10:51,560 by one. Now, what's going on here? Just for one last time to reiterate, the convolutional layer 11753 18:10:52,200 --> 18:10:59,080 is trying to learn the most important features within an image. So if we jump into here, 11754 18:10:59,080 --> 18:11:06,200 now, what are they? Well, we don't decide what a convolutional layer learns. It learns these 11755 18:11:06,200 --> 18:11:12,200 features on its own. So the convolutional layer learns those features. We pass them through a 11756 18:11:12,200 --> 18:11:18,120 relu nonlinear activation in case our data requires nonlinear functions. And then we pass 11757 18:11:18,120 --> 18:11:24,600 those learned features through a max pool layer to compress them even further. So the convolutional 11758 18:11:24,600 --> 18:11:29,880 layer can compress the features into a smaller space. But the max pooling layer really compresses 11759 18:11:29,880 --> 18:11:36,440 them. So that's the entire idea. One more time, we start with some input data. We design a neural 11760 18:11:36,440 --> 18:11:41,000 network, in this case, a convolutional neural network, to learn a compressed representation 11761 18:11:41,000 --> 18:11:46,360 of what our input data is, so that we can use this compressed representation to later on make 11762 18:11:46,360 --> 18:11:52,360 predictions on images of our own. And in fact, you can try that out if you wanted to click here 11763 18:11:52,360 --> 18:11:58,760 and add your own image. So I'd give that a go. That's your extension for this video. But now we've 11764 18:11:58,760 --> 18:12:06,520 stepped through the max pool 2D layer and the conv 2D layer. I think it's time we started to try 11765 18:12:06,520 --> 18:12:14,200 and use our tiny VGG network. This is your challenge is to create a dummy tensor and pass it through 11766 18:12:14,200 --> 18:12:21,160 this model. Pass it through its forward layer and see what happens to the shape of your dummy tensor 11767 18:12:21,160 --> 18:12:28,280 as it moves through conv block 1 and conv block 2. And I'll show you my trick to calculating 11768 18:12:28,280 --> 18:12:33,000 the in features here for this final layer, which is equivalent to this final layer here. 11769 18:12:34,120 --> 18:12:35,160 I'll see you in the next video. 11770 18:12:37,800 --> 18:12:42,120 Over the last few videos, we've been replicating the tiny VGG architecture 11771 18:12:42,120 --> 18:12:47,400 from the CNN explainer website. And I hope you know that this is this actually quite exciting 11772 18:12:47,400 --> 18:12:53,160 because years ago, this would have taken months of work. And we've just covered we've broken it 11773 18:12:53,160 --> 18:12:57,720 down over the last few videos and rebuilt it ourselves with a few lines of PyTorch code. 11774 18:12:58,760 --> 18:13:04,520 So that's just goes to show how powerful PyTorch is and how far the deep learning field has come. 11775 18:13:04,520 --> 18:13:10,040 But we're not finished yet. Let's just go over to our keynote. This is what we've done. 11776 18:13:10,040 --> 18:13:18,200 CNN explainer model. We have an input layer. We've created that. We have com2d layers. 11777 18:13:18,760 --> 18:13:23,400 We've created those. 
We have relo activation layers. We've created those. 11778 18:13:24,280 --> 18:13:29,400 And finally, we have pulling layers. And then we finish off with an output layer. 11779 18:13:30,040 --> 18:13:34,360 But now let's see what happens when we actually pass some data through this entire model. 11780 18:13:34,360 --> 18:13:41,480 And as I've said before, this is actually quite a common practice is you replicate a model 11781 18:13:41,480 --> 18:13:46,920 that you found somewhere and then test it out with your own data. So we're going to start off 11782 18:13:46,920 --> 18:13:53,240 by using some dummy data to make sure that our model works. And then we're going to pass through. 11783 18:13:53,240 --> 18:13:58,520 Oh, I've got another slide for this. By the way, here's a breakdown of torch and 11784 18:13:58,520 --> 18:14:04,280 N com2d. If you'd like to see it in text form, nothing here that we really haven't discussed before, but 11785 18:14:04,280 --> 18:14:10,200 this will be in the slides if you would like to see it. Then we have a video animation. 11786 18:14:10,200 --> 18:14:14,840 We've seen this before, though. And plus, I'd rather you go through the CNN explainer website 11787 18:14:14,840 --> 18:14:18,840 on your own and explore this different values rather than me just keep talking about it. 11788 18:14:19,480 --> 18:14:24,920 Here's what we're working towards doing. We have a fashion MNIST data set. And we have 11789 18:14:24,920 --> 18:14:30,040 our inputs. We're going to numerically encode them. We've done that already. Then we have our 11790 18:14:30,040 --> 18:14:35,960 convolutional neural network, which is a combination of convolutional layers, nonlinear activation 11791 18:14:35,960 --> 18:14:40,600 layers, pooling layers. But again, these could be comprised in many different ways, shapes and 11792 18:14:40,600 --> 18:14:46,440 forms. In our case, we've just replicated the tiny VGG architecture. And then finally, 11793 18:14:46,440 --> 18:14:52,200 we want to have an output layer to predict what class of clothing a particular input image is. 11794 18:14:52,200 --> 18:15:02,360 And so let's go back. We have our CNN model here. And we've got model two. So let's just practice 11795 18:15:02,360 --> 18:15:06,920 a dummy forward pass here. We're going to come back up a bit to where we were. We'll make sure 11796 18:15:06,920 --> 18:15:15,240 we've got model two. And we get an error here because I've times this by zero. So I'm going to 11797 18:15:15,240 --> 18:15:21,080 just remove that and keep it there. Let's see what happens if we create a dummy tensor and pass it 11798 18:15:21,080 --> 18:15:29,320 through here. Now, if you recall what our image is, do we have image? This is a fashion MNIST 11799 18:15:29,320 --> 18:15:38,280 image. So I wonder if we can go plot dot M not M show image. And I'm going to squeeze that. 11800 18:15:38,280 --> 18:15:48,440 And I'm going to set the C map equal to gray. So this is our current image. Wonderful. 11801 18:15:48,440 --> 18:15:56,040 So there's our current image. So let's create a tensor. Or maybe we just try to pass this through 11802 18:15:56,040 --> 18:16:02,600 the model and see what happens. How about we try that model image? All right, we're going to try 11803 18:16:02,600 --> 18:16:12,600 the first pass forward pass. So pass image through model. What's going to happen? Well, we get an 11804 18:16:12,600 --> 18:16:17,400 error. Another shape mismatch. We've seen this before. How do we deal with this? 
Because what 11805 18:16:17,400 --> 18:16:26,440 is the shape of our current image? 128, 28. Now, if you don't have this image instantiated, 11806 18:16:26,440 --> 18:16:33,080 you might have to go back up a few cells. Where did we create image? I'll just find this. So 11807 18:16:33,080 --> 18:16:38,280 just we created this a fairly long time ago. So I'm going to probably recreate it down the 11808 18:16:38,280 --> 18:16:45,080 bottom. My goodness, we've written a lot of code. Well, don't do us. We could create a dummy tensor 11809 18:16:45,080 --> 18:16:51,880 if we wanted to. How about we do that? And then if you want to find, oh, right back up here, 11810 18:16:51,880 --> 18:16:57,720 we have an image. How about we do that? We can just do it with a dummy tensor. That's fine. 11811 18:16:58,760 --> 18:17:03,560 We can create one of the same size. But if you have image instantiated, you can try that out. 11812 18:17:03,560 --> 18:17:10,040 So there's an image. Let's now create an image that is, or a random tensor, that is the same 11813 18:17:10,040 --> 18:17:20,840 shape as our image. So rand image tensor equals what torch dot rand n. And we're going to pass in 11814 18:17:21,400 --> 18:17:28,520 size equals 128, 28. Then if we get rand image tensor, 11815 18:17:32,440 --> 18:17:37,240 we check its shape. What do we get? So the same shape as our test image here, 11816 18:17:37,240 --> 18:17:40,280 but it's just going to be random numbers. But that's okay. We just want to highlight a point 11817 18:17:40,280 --> 18:17:45,800 here of input and output shapes. We want to make sure our model works. Can our random image tensor 11818 18:17:45,800 --> 18:17:50,360 go all the way through our model? That's what we want to find out. So we get an error here. 11819 18:17:50,360 --> 18:17:54,600 We have four dimensions, but our image is three dimensions. How do we add an extra dimension 11820 18:17:54,600 --> 18:17:58,920 for batch size? Now you might not get this error if you're running a later version of pie torch. 11821 18:17:58,920 --> 18:18:07,400 Just keep that in mind. So unsqueeze zero. Oh, expected all tensors to be on the same device, 11822 18:18:07,400 --> 18:18:12,200 but found at least two devices. Again, we're going through all the three major issues in deep 11823 18:18:12,200 --> 18:18:17,960 learning. Shape mismatch, device mismatch, data type mismatch. So let's put this on the device, 11824 18:18:17,960 --> 18:18:21,640 two target device, because we've set up device agnostic code. 11825 18:18:21,640 --> 18:18:27,960 That one and that two shapes cannot be multiplied. Oh, but we can output here. 11826 18:18:28,920 --> 18:18:34,040 That is very exciting. So what I might do is move this a couple of cells up so that we can 11827 18:18:34,040 --> 18:18:40,760 tell what's going on. I'm going to delete this cell. So where do these shapes come from? 11828 18:18:42,040 --> 18:18:46,120 Well, we printed out the shapes there. And so this is what's happened when our, 11829 18:18:46,120 --> 18:18:51,320 I'll just create our random tensor. I'll bring our random tensor up a bit too. Let's bring this up. 11830 18:18:53,080 --> 18:19:01,960 There we go. So we pass our random to image tensor through our model, and we've made sure it's 11831 18:19:01,960 --> 18:19:07,240 got four dimensions by unsqueeze zero. And we make sure it's on the same device as our model, 11832 18:19:07,240 --> 18:19:12,440 because our model has been sent to the GPU. 
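Condensed into one cell, the dummy forward pass being debugged looks something like this (assuming model_2 is the TinyVGG-style model built earlier and device is the device-agnostic setup from earlier in the course). At this point in the video the call still fails with a matrix-multiplication shape error; the fix to the classifier's in_features comes next.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # device-agnostic code, as set up earlier

torch.manual_seed(42)

# A random tensor with the same shape as a single FashionMNIST image:
# (colour_channels, height, width)
rand_image_tensor = torch.randn(size=(1, 28, 28))
print(rand_image_tensor.shape)  # torch.Size([1, 28, 28])

# Add a batch dimension (fixes the "expected 4 dimensions" error) and put the
# tensor on the same device as the model (fixes the device mismatch error).
model_2(rand_image_tensor.unsqueeze(dim=0).to(device))
```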
And this is what happens as we pass our random 11833 18:19:12,440 --> 18:19:19,000 image tensor. We've got 12828 instead of previously we've seen 6464.3, which is going to clean this 11834 18:19:19,000 --> 18:19:27,160 up a bit. And we get different shapes here. So you'll notice that as our input, if it was 6464.3 11835 18:19:27,160 --> 18:19:33,400 goes through these layers, it gets shaped into different values. Now this is going to be universal 11836 18:19:33,400 --> 18:19:38,120 across all of the different data sets you work on, you will be working with different shapes. 11837 18:19:38,120 --> 18:19:44,200 So it's important to, and also quite fun, to troubleshoot what shapes you need to use for 11838 18:19:44,200 --> 18:19:48,360 your different layers. So this is where my trick comes in. To find out the shapes for different 11839 18:19:48,360 --> 18:19:53,720 layers, I often construct my models, how we've done here, as best I can with the information 11840 18:19:53,720 --> 18:19:58,360 that I've got, such as replicating what's here. But I don't really know what the output 11841 18:19:58,360 --> 18:20:03,480 shape is going to be before it goes into this final layer. And so I recreate the model as best 11842 18:20:03,480 --> 18:20:10,040 I can. And then I pass data through it in the form of a dummy tensor in the same shape as my 11843 18:20:10,040 --> 18:20:15,000 actual data. So we could customize this to be any shape that we wanted. And then I print the 11844 18:20:15,000 --> 18:20:21,880 shapes of what's happening through each of the forward past steps. And so if we pass it through 11845 18:20:21,880 --> 18:20:27,560 this random tensor through the first column block, it goes through these layers here. And then it 11846 18:20:27,560 --> 18:20:33,240 outputs a tensor with this size. So we've got 10, because that's how many output channels we've 11847 18:20:33,240 --> 18:20:41,640 set. And then 14, 14, because our 2828 tensor has gone through a max pool 2d layer and gone through 11848 18:20:41,640 --> 18:20:47,800 a convolutional layer. And then it goes through the next block, column block two, which is because 11849 18:20:47,800 --> 18:20:52,440 we've put it in the forward method here. And then it outputs the shape. And if we go back down, 11850 18:20:53,080 --> 18:20:59,000 we have now a shape of one 10, seven, seven. So our previous tensor, the output of column block one, 11851 18:20:59,000 --> 18:21:06,200 has gone from 1414 to seven seven. So it's been compressed. So let me just write this down here, 11852 18:21:06,760 --> 18:21:13,880 output shape of column block one, just so we get a little bit more information. 11853 18:21:15,800 --> 18:21:23,160 And I'm just going to copy this, put it in here, that will become block two. 11854 18:21:23,160 --> 18:21:31,160 And then finally, I want to know if I get an output shape of classifier. 11855 18:21:31,160 --> 18:21:39,160 So if I rerun all of this, I don't get an output shape of classifier. So my model is running into 11856 18:21:39,160 --> 18:21:45,160 trouble. Once it gets to, so I get the output of conv block one, I don't get an output of classifier. 11857 18:21:45,160 --> 18:21:51,160 So this is telling me that I have an issue with my classifier layer. Now I know this, but I'm 11858 18:21:51,160 --> 18:21:57,080 not. Now I know this because, well, I've coded this model before, and the in features here, 11859 18:21:57,080 --> 18:22:00,600 we need a special calculation. So what is going on with our shapes? 
11860 18:22:02,200 --> 18:22:07,880 Mat one and mat two shapes cannot be multiplied. So do you see here, what is the rule of matrix 11861 18:22:07,880 --> 18:22:12,840 multiplication? The inner dimensions here have to match. We've got 490. Where could that number 11862 18:22:12,840 --> 18:22:21,080 have come from? And we've got 10 times 10. Now, okay, I know I've set hidden units to 10. 11863 18:22:21,080 --> 18:22:28,680 So maybe that's where that 10 came from. And what is the output layer of the output shape of conv 11864 18:22:28,680 --> 18:22:37,320 block two? So if we look, we've got the output shape of conv block two. Where does that go? 11865 18:22:38,520 --> 18:22:45,640 The output of conv block two goes into our classifier model. And then it gets flattened. 11866 18:22:45,640 --> 18:22:51,960 So that's telling us something there. And then our NN linear layer is expecting the output of 11867 18:22:51,960 --> 18:22:59,720 the flatten layer as it's in features. So this is where my trick comes into play. I pass the 11868 18:22:59,720 --> 18:23:06,600 output of conv block two into the classifier layer. It gets flattened. And then that's what 11869 18:23:06,600 --> 18:23:16,840 my NN not linear layer is expecting. So what happens if we flatten this shape here? Do we get 11870 18:23:16,840 --> 18:23:28,600 this value? Let's have a look. So if we go 10 times seven times seven, 490. Now, where was this 10? 11871 18:23:28,600 --> 18:23:38,120 Well, that's our hidden units. And where were these sevens? Well, these sevens are the output 11872 18:23:38,120 --> 18:23:45,400 of conv block two. So that's my trick. I print the shapes of previous layers and see whether or 11873 18:23:45,400 --> 18:23:52,920 not they line up with subsequent layers. So if we go time seven times seven, we're going to have 11874 18:23:52,920 --> 18:23:58,120 hidden units equals 10 times seven times seven. Where do we get the two sevens? Because that is 11875 18:23:58,120 --> 18:24:03,560 the output shape of conv block two. Do you see how this can be a little bit hard to calculate ahead 11876 18:24:03,560 --> 18:24:10,120 of time? Now, you could calculate this by hand if you went into n conv 2d. But I prefer to write 11877 18:24:10,120 --> 18:24:15,400 code to calculate things for me. You can calculate that value by hand. If you go through, 11878 18:24:16,280 --> 18:24:22,360 H out W out, you can add together all of the different parameters and multiply them and divide 11879 18:24:22,360 --> 18:24:27,640 them and whatnot. You can calculate the input and output shapes of your convolutional layers. 11880 18:24:28,200 --> 18:24:34,920 You're more than welcome to try that out by hand. But I prefer to code it out. If and out code it 11881 18:24:34,920 --> 18:24:42,200 out. Now, let's see what happens if we run our random image tensor through our model. Now, 11882 18:24:42,200 --> 18:24:47,480 do you think it will work? Well, let's find out. All we've done is we've added this little line 11883 18:24:47,480 --> 18:24:53,720 here, times seven times seven. And we've calculated that because we've gone, huh, what if we pass a 11884 18:24:53,720 --> 18:25:00,280 tensor of this dimension through a flattened layer? And what is our rule of matrix multiplication? 11885 18:25:00,280 --> 18:25:06,280 The inner dimensions here must match. And why do we know that these are matrices? Well, 11886 18:25:06,280 --> 18:25:10,840 mat one and mat two shapes cannot be multiplied. 
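Putting the trick into one place, here is a condensed sketch of the TinyVGG-style model with the temporary shape printouts in forward() and the hidden_units * 7 * 7 calculation for the classifier's in_features. The class and attribute names follow the narration (FashionMNISTModelV2, conv_block_1, conv_block_2, classifier), but treat the exact layer stack as an approximation of the cell built earlier rather than a character-for-character copy; padding=1 is what keeps the 28x28 input at 14x14 and then 7x7 after each max pool, matching the shapes printed above.

```python
import torch
from torch import nn

class FashionMNISTModelV2(nn.Module):
    """TinyVGG-style CNN: two convolutional blocks followed by a linear classifier."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(input_shape, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),   # 28x28 -> 14x14
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),   # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # The output of conv_block_2 is (hidden_units, 7, 7), so after Flatten
            # the linear layer needs in_features = hidden_units * 7 * 7 (= 490 here).
            nn.Linear(in_features=hidden_units * 7 * 7, out_features=output_shape),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv_block_1(x)
        print(f"Output shape of conv_block_1: {x.shape}")   # temporary debugging printouts
        x = self.conv_block_2(x)
        print(f"Output shape of conv_block_2: {x.shape}")
        x = self.classifier(x)
        print(f"Output shape of classifier: {x.shape}")
        return x

torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1, hidden_units=10, output_shape=10)
_ = model_2(torch.randn(size=(1, 1, 28, 28)))   # dummy forward pass to verify the shapes line up
```

Remember to comment the print lines back out before training, as the video does a little later.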
And we know that inside a linear layer 11887 18:25:10,840 --> 18:25:19,160 is a matrix multiplication. So let's now give this a go. We'll see if it works. 11888 18:25:22,040 --> 18:25:28,680 Oh, ho ho. Would you look at that? That is so exciting. We have the output shape of the classifier 11889 18:25:28,680 --> 18:25:35,960 is one and 10. We have a look, we have one number one, two, three, four, five, six, seven, eight, 11890 18:25:35,960 --> 18:25:45,400 nine, 10, one number for each class in our data set. Wow. Just like the CNN explain a website, 11891 18:25:45,400 --> 18:25:51,240 we have 10 outputs here. We just happen to have 10 classes as well. Now, this number again could be 11892 18:25:51,240 --> 18:25:55,160 whatever you want. It could be 100, could be 30, could be three, depending on how many classes 11893 18:25:55,160 --> 18:26:01,160 you have. But we have just figured out the input and output shapes of each layer in our model. 11894 18:26:01,160 --> 18:26:08,680 So that's very exciting. I think it's now time we've passed a random tensor through. How about we 11895 18:26:08,680 --> 18:26:14,280 pass some actual data through our model? In the next video, let's use our train and test step 11896 18:26:14,280 --> 18:26:19,800 functions to train our first convolutional neural network. I'll see you there. 11897 18:26:24,120 --> 18:26:28,600 Well, let's get ready to train our first CNN. So what do we need? Where are we up to in the 11898 18:26:28,600 --> 18:26:33,720 workflow? Well, we've built a model and we've stepped through it. We know what's going on, 11899 18:26:33,720 --> 18:26:39,960 but let's really see what's going on by training this CNN or see if it trains because we don't 11900 18:26:39,960 --> 18:26:46,680 always know if it will on our own data set, which is of fashion MNIST. So we're going to set up a 11901 18:26:46,680 --> 18:26:54,520 loss function and optimizer for model two. And just as we've done before, model two, turn that 11902 18:26:54,520 --> 18:27:00,200 into markdown. I'll just show you the workflow again. So this is what we're doing. We've got some 11903 18:27:00,200 --> 18:27:06,040 inputs. We've got a numerical encoding. We've built this architecture and hopefully it helps us 11904 18:27:06,040 --> 18:27:13,160 learn or it helps us make a predictive model that we can input images such as grayscale images of 11905 18:27:13,160 --> 18:27:21,320 clothing and predict. And if we look where we are at the PyTorch workflow, we've got our data ready. 11906 18:27:21,320 --> 18:27:29,000 We've built our next model. Now here's where we're up to picking a loss function and an optimizer. 11907 18:27:29,000 --> 18:27:38,120 So let's do that, hey, loss function, or we can do evaluation metrics as well. So set up loss 11908 18:27:38,120 --> 18:27:47,560 function slash eval metrics slash optimizer. And we want from helper functions, import accuracy 11909 18:27:47,560 --> 18:27:52,120 function, we don't need to reimport it, but we're going to do it anyway for completeness. Loss 11910 18:27:52,120 --> 18:27:58,520 function equals nn dot cross entropy loss, because we are working with a multi class classification 11911 18:27:58,520 --> 18:28:03,800 problem. And the optimizer, we're going to keep the same as what we've used before, torch dot 11912 18:28:03,800 --> 18:28:09,800 opt in SGD. And we'll pass it in this time, the params that we're trying to optimize are the 11913 18:28:09,800 --> 18:28:17,400 parameters of model two parameters. And we'll use a learning rate of 0.1. 
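The setup being described is short enough to show in full (a sketch assuming model_2 from above and the helper_functions.py script with accuracy_fn downloaded earlier in the course):

```python
import torch
from torch import nn
from helper_functions import accuracy_fn  # course helper script downloaded earlier

# Multi-class classification -> cross entropy loss
loss_fn = nn.CrossEntropyLoss()

# Stochastic gradient descent on model_2's (randomly initialised) parameters
optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)
```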
Run that. And just 11914 18:28:17,400 --> 18:28:25,160 to reiterate, here's what we're trying to optimize model two state dig. We have a lot of random 11915 18:28:25,160 --> 18:28:31,560 weights in model two. Have a look at all this. There's the bias, there's the weight. We're going 11916 18:28:31,560 --> 18:28:37,720 to try and optimize these to help us predict on our fashion MNIST data set. So without any further 11917 18:28:37,720 --> 18:28:44,760 ado, let's in the next video, go to the workflow, we're going to build our training loop. But thanks 11918 18:28:44,760 --> 18:28:50,920 to us before, we've now got functions to do this for us. So if you want to give this a go, 11919 18:28:50,920 --> 18:28:58,520 use our train step and test step function to train model two. Try that out. And we'll do it 11920 18:28:58,520 --> 18:29:06,920 together in the next video. We're getting so close to training our model. Let's write some code to 11921 18:29:06,920 --> 18:29:11,720 train our first thing in that model. Training and testing, I'm just going to make another heading 11922 18:29:11,720 --> 18:29:21,000 here. Model two, using our training and test functions. So we don't have to rewrite all of the 11923 18:29:21,000 --> 18:29:25,720 steps in a training loop and a testing loop, because we've already created that functionality 11924 18:29:25,720 --> 18:29:32,680 before through our train step function. There we go. Performs the training, or this should be 11925 18:29:32,680 --> 18:29:39,560 performs a training step with model trying to learn on data loader. So let's set this up. 11926 18:29:39,560 --> 18:29:45,720 We're going to set up torch manual seed 42, and we can set up a CUDA manual seed as well. 11927 18:29:46,600 --> 18:29:51,080 Just to try and make our experiments as reproducible as possible, because we're going to be using 11928 18:29:51,080 --> 18:29:56,360 CUDA, we're going to measure the time because we want to compare our models, not only their 11929 18:29:56,360 --> 18:30:02,920 performance in evaluation metrics, but how long they take to train from time it, because there's 11930 18:30:02,920 --> 18:30:10,200 no point having a model that performs really, really well, but takes 10 times longer to train. 11931 18:30:10,920 --> 18:30:16,520 Well, maybe there is, depending on what you're working on. Model two equals timer, 11932 18:30:19,000 --> 18:30:24,600 and we're going to train and test model, but the time is just something to be aware of, 11933 18:30:24,600 --> 18:30:29,800 is that usually a better performing model will take longer to train. Not always the case, but 11934 18:30:29,800 --> 18:30:36,760 just something to keep in mind. So for epoch in, we're going to use TQDM to measure the progress. 11935 18:30:37,400 --> 18:30:40,680 We're going to create a range of epochs. We're just going to train for three epochs, 11936 18:30:40,680 --> 18:30:48,760 keeping our experiment short for now, just to see how they work, epoch, and we're going to 11937 18:30:48,760 --> 18:30:54,760 print a new line here. So for an epoch in a range, we're going to do the training step, 11938 18:30:54,760 --> 18:31:00,120 which is our train step function. The model is going to be equal to model two, which is our 11939 18:31:00,120 --> 18:31:05,240 convolutional neural network, our tiny VGG. The data loader is just going to be equal to the 11940 18:31:05,240 --> 18:31:10,120 train data loader, the same one we've used before. 
The loss function is going to be equal to the 11941 18:31:10,120 --> 18:31:16,600 loss function that we've set up above, loss FN. The optimizer as well is going to be 11942 18:31:17,160 --> 18:31:22,200 the optimizer in our case, stochastic gradient descent, optimizer equals optimizer, 11943 18:31:22,200 --> 18:31:26,920 then we set up the accuracy function, which is going to be equal to our accuracy function, 11944 18:31:27,480 --> 18:31:36,360 and the device is going to be the target device. How easy was that? Now we do the same for the 11945 18:31:36,360 --> 18:31:41,640 train or the testing step, sorry, the model is going to be equal to model two, and then the data 11946 18:31:41,640 --> 18:31:51,000 loader is going to be the test data loader, and then the loss function is going to be our same 11947 18:31:51,000 --> 18:31:56,520 our same loss function. And then we have no optimizer for this, we're just going to pass in the 11948 18:31:56,520 --> 18:32:02,920 accuracy function here. And then of course, the device is going to be equal to the device. 11949 18:32:03,800 --> 18:32:12,120 And then what do we do now? Well, we can measure the end time so that we know how long the code 11950 18:32:12,120 --> 18:32:20,120 here took to run. So let's go train time end for model two. This will be on the GPU, by the way, 11951 18:32:20,120 --> 18:32:24,600 but this time it's using a convolutional neural network. And the total train time, 11952 18:32:25,800 --> 18:32:32,440 total train time for model two is going to be equal to print train time, our function that we 11953 18:32:32,440 --> 18:32:37,640 created before as well, to help us measure start and end time. So we're going to pass in train 11954 18:32:37,640 --> 18:32:46,680 to time start model two, and then end is going to be train time end model two. And then we're going 11955 18:32:46,680 --> 18:32:52,680 to print out the device that it's using as well. So you're ready? Are you ready to train our first 11956 18:32:52,680 --> 18:32:58,440 convolutional neural network? Hopefully this code works. We've created these functions before, 11957 18:32:58,440 --> 18:33:04,840 so it should be all right. But if and out, code it out, if and out, run the code, let's see what 11958 18:33:04,840 --> 18:33:13,640 happens. Oh my goodness. Oh, of course. Oh, we forgot to comment out the output shapes. 11959 18:33:13,640 --> 18:33:20,520 So we get a whole bunch of outputs for our model, because what have we done? Back up here, 11960 18:33:21,320 --> 18:33:25,800 we forgot to. So this means every time our data goes through the forward pass, it's going to 11961 18:33:25,800 --> 18:33:33,560 be printing out the output shapes. So let's just comment out these. And I think this cell is going 11962 18:33:33,560 --> 18:33:40,200 to take quite a long time to run because it's got so many printouts. Yeah, see, streaming output 11963 18:33:40,200 --> 18:33:46,600 truncated to the last 5,000 lines. So we're going to try and stop that. Okay, there we go. 11964 18:33:46,600 --> 18:33:52,280 Beautiful. That actually worked. Sometimes it doesn't stop so quickly. So we're going to rerun 11965 18:33:52,280 --> 18:34:00,040 our fashion MSV to model cell so that we comment out these print lines. And then we'll just rerun 11966 18:34:00,040 --> 18:34:07,000 these cells down here. Just go back through fingers crossed, there's no errors. And we'll train our 11967 18:34:07,000 --> 18:34:12,840 model again. Beautiful. Not as many printouts this time. So here we go. 
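For reference, the training cell being typed out looks roughly like this. It assumes the train_step(), test_step() and print_train_time() functions written in earlier videos (with the argument names the narration reads out) and DataLoaders named train_dataloader/test_dataloader; adjust the names to whatever you used.

```python
import torch
from timeit import default_timer as timer
from tqdm.auto import tqdm

torch.manual_seed(42)
torch.cuda.manual_seed(42)

train_time_start_model_2 = timer()

epochs = 3   # keep the experiment short for now
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n---------")
    train_step(model=model_2,
               data_loader=train_dataloader,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn,
               device=device)
    test_step(model=model_2,
              data_loader=test_dataloader,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn,
              device=device)

train_time_end_model_2 = timer()
total_train_time_model_2 = print_train_time(start=train_time_start_model_2,
                                            end=train_time_end_model_2,
                                            device=device)
```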
Our first CNN is training. 11968 18:34:12,840 --> 18:34:18,840 How do you think it'll go? Well, that's what we have printouts, right? So we can see the progress. 11969 18:34:18,840 --> 18:34:23,160 So you can see here all the functions that are being called behind the scenes from PyTorch. So 11970 18:34:23,160 --> 18:34:27,240 thank you to PyTorch for that. There's our, oh, our train step function was in there. 11971 18:34:28,120 --> 18:34:35,560 Train step. Wonderful. Beautiful. So there's epoch zero. Oh, we get a pretty good test accuracy. 11972 18:34:35,560 --> 18:34:41,480 How good is that? Test accuracy is climbing as well. Have we beaten our baseline? We're looking at 11973 18:34:41,480 --> 18:34:51,320 about 14 seconds per epoch here. And then the final epoch. What do we finish at? Oh, 88.5. Wow. 11974 18:34:51,320 --> 18:35:00,920 In 41.979 or 42 there about seconds. Again, your mileage may vary. Don't worry too much if these 11975 18:35:00,920 --> 18:35:06,520 numbers aren't exactly the same on your screen and same with the training time because we might 11976 18:35:06,520 --> 18:35:15,720 be using slightly different hardware. What GPU do I have today? I have a Tesla P100 GPU. You might 11977 18:35:15,720 --> 18:35:21,000 not have the same GPU. So the training time, if this training time is something like 10 times 11978 18:35:21,000 --> 18:35:28,520 higher, you might want to look into what's going on. And if these values are like 10% lower or 10% 11979 18:35:28,520 --> 18:35:33,560 higher, you might want to see what's going on with your code as well. But let's now calculate 11980 18:35:33,560 --> 18:35:38,040 our Model 2 results. I think it is the best performing model that we have so far. Let's get 11981 18:35:38,040 --> 18:35:44,040 a results dictionary. Model 2 results is so exciting. We're learning the power of convolutional neural 11982 18:35:44,040 --> 18:35:50,520 networks. Model 2 results equals a vowel model. And this is a function that we've created before. 11983 18:35:52,440 --> 18:35:57,480 So returns a dictionary containing the results of a model predicting on data loader. 11984 18:35:57,480 --> 18:36:02,520 So now let's pass in the model, which will be our trained model to, and then we'll pass in the 11985 18:36:02,520 --> 18:36:09,960 data loader, which will be our test data loader. And then, oops, excuse me, typo, our loss function 11986 18:36:09,960 --> 18:36:16,840 will be, of course, our loss function. And the accuracy function will be accuracy function. 11987 18:36:17,480 --> 18:36:23,160 And the device is already set, but we can reset anyway, device equals device. And we'll check 11988 18:36:23,160 --> 18:36:34,200 out the Model 2 results. Make some predictions. Oh, look at that. Model accuracy 88. Does that 11989 18:36:34,200 --> 18:36:43,000 beat our baseline? Model 0 results. Oh, we did beat our baseline with a convolutional neural network. 11990 18:36:43,640 --> 18:36:50,360 All right. So I feel like that's, uh, that's quite exciting. But now let's keep going on. And, uh, 11991 18:36:50,360 --> 18:36:55,000 let's start to compare the results of all of our models. I'll see you in the next video. 11992 18:36:59,000 --> 18:37:04,760 Welcome back. Now, in the last video, we trained our first convolutional neural network. And 11993 18:37:04,760 --> 18:37:10,440 from the looks of things, it's improved upon our baseline. 
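And the evaluation cell, again leaning on the eval_model() function from an earlier video (a sketch; your loss and accuracy values will differ slightly depending on hardware and seeds):

```python
import torch

torch.manual_seed(42)

# eval_model() returns a dictionary of results for a model predicting on a DataLoader
model_2_results = eval_model(model=model_2,
                             data_loader=test_dataloader,
                             loss_fn=loss_fn,
                             accuracy_fn=accuracy_fn,
                             device=device)

print(model_2_results)   # CNN results
print(model_0_results)   # baseline results from earlier, for comparison
```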
But let's make sure by comparing, 11994 18:37:10,440 --> 18:37:14,520 this is another important part of machine learning experiments is comparing the results 11995 18:37:14,520 --> 18:37:21,640 across your experiments. So and training time. Now, we've done that in a way where we've got 11996 18:37:21,640 --> 18:37:28,360 three dictionaries here of our model zero results, model one results, model two results. So how 11997 18:37:28,360 --> 18:37:36,600 about we create a data frame comparing them? So let's import pandas as PD. And we're going to 11998 18:37:36,600 --> 18:37:45,960 compare results equals PD dot data frame. And because our model results dictionaries, uh, 11999 18:37:45,960 --> 18:37:53,160 all have the same keys. Let's pass them in as a list. So model zero results, model one results, 12000 18:37:53,960 --> 18:38:00,920 and model two results to compare them. Wonderful. And what it looks like when we compare the results. 12001 18:38:00,920 --> 18:38:09,320 All righty. So recall our first model was our baseline V zero was just two linear layers. 12002 18:38:09,880 --> 18:38:18,040 And so we have an accuracy of 83.4 and a loss of 0.47. The next model was we trained on the GPU 12003 18:38:18,040 --> 18:38:26,200 and we introduced nonlinearities. So we actually found that that was worse off than our baseline. 12004 18:38:26,200 --> 18:38:32,360 But then we brought in the big guns. We brought in the tiny VGG architecture from the CNN explainer 12005 18:38:32,360 --> 18:38:38,200 website and trained our first convolutional neural network. And we got the best results so far. 12006 18:38:38,760 --> 18:38:43,080 But there's a lot more experiments that we could do. We could go back through our 12007 18:38:43,720 --> 18:38:50,920 tiny VGG and we could increase the number of hidden units. Where do we create our model up here? 12008 18:38:50,920 --> 18:38:55,880 We could increase this to say 30 and see what happens. That would be a good experiment to 12009 18:38:55,880 --> 18:39:01,400 try. And if we found that nonlinearities didn't help with our second model, we could comment out 12010 18:39:01,400 --> 18:39:07,800 the relu layers. We could of course change the kernel size, change the padding, change the max 12011 18:39:07,800 --> 18:39:12,440 pool. A whole bunch of different things that we could try here. We could train it for longer. 12012 18:39:12,440 --> 18:39:16,280 So maybe if we train it for 10 epochs, it would perform better. But these are just things to 12013 18:39:16,280 --> 18:39:20,920 keep in mind and try out. I'd encourage you to give them a go yourself. But for now, we've kept 12014 18:39:20,920 --> 18:39:26,840 all our experiments quite the same. How about we see the results we add in the training time? 12015 18:39:26,840 --> 18:39:31,480 Because that's another important thing that we've been tracking as well. So we'll add 12016 18:39:32,440 --> 18:39:41,560 training time to results comparison. So the reason why we do this is because 12017 18:39:42,520 --> 18:39:47,800 if this model is performing quite well, even compared to our CNN, so a difference in about 12018 18:39:47,800 --> 18:39:53,880 5% accuracy, maybe that's tolerable in the space that we're working, except that this model 12019 18:39:54,440 --> 18:39:59,880 might actually train and perform inference 10 times faster than this model. So that's just 12020 18:39:59,880 --> 18:40:05,080 something to be aware of. It's called the performance speed trade off. 
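Because the three results dictionaries share the same keys, building the comparison is a one-liner (names follow the narration):

```python
import pandas as pd

compare_results = pd.DataFrame([model_0_results,
                                model_1_results,
                                model_2_results])
compare_results
```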
So let's add another column 12021 18:40:05,080 --> 18:40:12,120 here, compare results. And we're going to add in, oh, excuse me, got a little error there. That's 12022 18:40:12,120 --> 18:40:17,640 all right. Got trigger happy on the shift and enter. Training time equals, we're going to add in, 12023 18:40:18,600 --> 18:40:26,760 we've got another list here is going to be total train time for model zero, and total train time 12024 18:40:27,560 --> 18:40:37,080 for model one, and total train time for model two. And then we have a look at our 12025 18:40:37,080 --> 18:40:46,040 how compare results dictionary, or sorry, compare results data frame. Wonderful. So we see, and 12026 18:40:46,040 --> 18:40:50,520 now this is another thing. I keep stressing this to keep in mind. If your numbers aren't exactly 12027 18:40:50,520 --> 18:40:55,480 of what I've got here, don't worry too much. Go back through the code and see if you've set up 12028 18:40:55,480 --> 18:40:59,480 the random seeds correctly, you might need a koodle random seed. We may have missed one of those. 12029 18:41:00,200 --> 18:41:03,560 If your numbers are out landishly different to these numbers, then you should go back through 12030 18:41:03,560 --> 18:41:08,200 your code and see if there's something wrong. And again, the training time will be highly 12031 18:41:08,200 --> 18:41:12,920 dependent on the compute environment you're using. So if you're running this notebook locally, 12032 18:41:12,920 --> 18:41:17,800 you might get faster training times. If you're running it on a different GPU to what I have, 12033 18:41:17,800 --> 18:41:23,640 NVIDIA SMI, you might get different training times. So I'm using a Tesla P100, which is quite a fast 12034 18:41:23,640 --> 18:41:28,840 GPU. But that's because I'm paying for Colab Pro, which generally gives you faster GPUs. 12035 18:41:28,840 --> 18:41:36,840 And model zero was trained on the CPU. So depending on what compute resource Google allocates to you 12036 18:41:36,840 --> 18:41:43,000 with Google Colab, this number might vary here. So just keep that in mind. These values training 12037 18:41:43,000 --> 18:41:48,760 time will be very dependent on the hardware you're using. But if your numbers are dramatically 12038 18:41:48,760 --> 18:41:52,840 different, well, then you might want to change something in your code and see what's going on. 12039 18:41:52,840 --> 18:42:01,960 And how about we finish this off with a graph? So let's go visualize our model results. And while 12040 18:42:01,960 --> 18:42:08,680 we're doing this, have a look at the data frame above. Is the performance here 10 seconds longer 12041 18:42:08,680 --> 18:42:15,480 training time worth that extra 5% of the results on the accuracy? Now in our case, we're using a 12042 18:42:15,480 --> 18:42:21,160 relatively toy problem. What I mean by toy problem is quite a simple data set to try and test this 12043 18:42:21,160 --> 18:42:27,080 out. But in your practice, that may be worth doing. If your model takes longer to train, 12044 18:42:27,080 --> 18:42:32,600 but gets quite a bit better performance, it really depends on the problem you're working with. 12045 18:42:33,240 --> 18:42:38,520 Compare results. And we're going to set the index as the model name, because I think that's 12046 18:42:38,520 --> 18:42:43,560 what we want our graph to be, not the model name. And then we're going to plot, we want to compare 12047 18:42:43,560 --> 18:42:52,760 the model accuracy. 
And we want to plot, the kind is going to be equal to bar h, horizontal bar chart. 12048 18:42:53,400 --> 18:43:02,600 We've got p x label, we're going to get accuracy as a percentage. And then we're going to go py label. 12049 18:43:02,600 --> 18:43:06,200 This is just something that you could share. If someone was asking, how did your modeling 12050 18:43:06,200 --> 18:43:10,360 experiments go on fashion MNIST? Well, here's what I've got. And then they ask you, well, 12051 18:43:10,360 --> 18:43:14,760 what's the fashion MNIST model V2? Well, you could say that's a convolutional neural network that 12052 18:43:14,760 --> 18:43:20,600 trained, that's replicates the CNN explainer website that trained on a GPU. How long did that 12053 18:43:20,600 --> 18:43:25,080 take to train? Well, then you've got the training time here. We could just do it as a vertical bar 12054 18:43:25,080 --> 18:43:32,360 chart. I did it as horizontal so that this looks a bit funny to me. So horizontal like that. 12055 18:43:32,360 --> 18:43:39,960 So the model names are over here. Wonderful. So now I feel like we've got a trained model. 12056 18:43:40,760 --> 18:43:45,960 How about we make some visual predictions? Because we've just got numbers on a page here, 12057 18:43:45,960 --> 18:43:51,880 but our model is trained on computer vision data. And the whole point of making a machine 12058 18:43:51,880 --> 18:43:57,800 learning model on computer vision data is to be able to visualize predictions. So let's give 12059 18:43:57,800 --> 18:44:02,840 that a shot, hey, in the next video, we're going to use our best performing model, fashion MNIST 12060 18:44:02,840 --> 18:44:08,040 model V2 to make predictions on random samples from the test data set. You might want to give 12061 18:44:08,040 --> 18:44:13,640 that a shot, make some predictions on random samples from the test data set, and plot them out with 12062 18:44:13,640 --> 18:44:19,400 their predictions as the title. So try that out. Otherwise, we'll do it together in the next video. 12063 18:44:19,400 --> 18:44:28,520 In the last video, we compared our models results. We tried three experiments. One was a basic linear 12064 18:44:28,520 --> 18:44:34,680 model. One was a linear model with nonlinear activations. And fashion MNIST model V2 is a 12065 18:44:34,680 --> 18:44:40,280 convolutional neural network. And we saw that from an accuracy perspective, our convolutional neural 12066 18:44:40,280 --> 18:44:45,800 network performed the best. However, it had the longest training time. And I just want to exemplify 12067 18:44:45,800 --> 18:44:50,680 the fact that the training time will vary depending on the hardware that you run on. We spoke about 12068 18:44:50,680 --> 18:44:56,200 this in the last video. However, I took a break after finishing the last video, reran all of the 12069 18:44:56,200 --> 18:45:01,320 cells that we've written, all of the code cells up here by coming back to the notebook and going 12070 18:45:01,320 --> 18:45:06,520 run all. And as you'll see, if you compare the training times here to the last video, we get 12071 18:45:06,520 --> 18:45:11,960 some different values. Now, I'm not sure exactly what hardware Google collab is using behind the 12072 18:45:11,960 --> 18:45:16,920 scenes. But this is just something to keep in mind, at least from now on, we know how to track 12073 18:45:16,920 --> 18:45:22,680 our different variables, such as how long our model takes to train and what its performance 12074 18:45:22,680 --> 18:45:30,840 values are. 
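The training-time column and the horizontal bar chart from the last couple of cells look roughly like this. The model_name/model_acc column names are an assumption here; they are whatever keys your eval_model() function returns, so adjust them to match.

```python
import matplotlib.pyplot as plt

# Add the training times tracked with print_train_time() for each experiment
compare_results["training_time"] = [total_train_time_model_0,
                                    total_train_time_model_1,
                                    total_train_time_model_2]

# Visualize accuracy per experiment (model names become the y-axis labels)
compare_results.set_index("model_name")["model_acc"].plot(kind="barh")
plt.xlabel("accuracy (%)")
plt.ylabel("model")
plt.show()
```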
But it's time to get visual. So let's create another heading; this is one of my favourite steps after training a machine learning model: make and evaluate random predictions with the best model. We're going to follow the data explorer's motto of visualize, visualize, visualize. Let's make a function called make_predictions. It will take a model, of type torch.nn.Module, some data, which can be a list, and a device of type torch.device, which we'll default to the device we've already set up. Inside, we'll create an empty list for prediction probabilities, because what we'd like to do is take random samples from the test dataset, make predictions on them with our model, and then plot those predictions, we want to visualize them. We'll put the model into evaluation mode, because if you're making predictions with your model you should turn on evaluation mode, and we'll switch on the torch.inference_mode() context manager, because prediction is another word for inference. Then we loop: for each sample in data, let's prepare the sample. Each sample is a single image, so we unsqueeze it on dim=0 to add a batch dimension and then send it to the target device; that way our data and model are on the same device and we can do a forward pass. (We could also call model.to(device) at the top of the function so we know the code is fully device agnostic.) Now the forward pass: recall that if the model has a linear layer at the end, it outputs raw logits, so the pred logit for a single sample is simply model(sample). How do we go from logit to prediction probability? Since we're working on a multi-class classification problem, we apply the softmax activation function to the pred logit, squeeze it to get rid of the extra dimension, and use dim=0. That gives us the prediction probability for a given sample. Now let's also turn our prediction probabilities into prediction labels.
Well, actually, I think we're just going to return the pred probs and see what that looks like, since we've got an empty pred_probs list up the top. One thing to remember: matplotlib needs data on the CPU, it doesn't work with tensors on the GPU, so let's get the prediction probability off the GPU for further calculations. We append the pred prob we just calculated to the pred_probs list, but we put it on the CPU first. Then, if we've done everything right, we'll have a list of prediction probabilities, one per sample, so at the end we torch.stack the pred probs to turn the list into a single tensor. This is only one way of doing things; there are many ways you could make predictions and visualize them, I'm just demonstrating one. We might need to fix the indentation so everything sits inside the function. Beautiful.

So let's try this function in action and see what happens. I'm going to import random and set the random seed to 42, then create test_samples and test_labels as empty lists, because we want a list of test samples to iterate through, and remember, when we evaluate predictions we want to compare them to the ground truth. So we want some test samples and their actual labels, so that when our model makes predictions we can compare them against those labels. So: for sample, label in random.sample of the test data, with k equal to 9, and note that this is the test dataset, not the test DataLoader. If you want a reminder of what test_data looks like, it's our dataset before being converted into a DataLoader. If we try to slice the first 10 samples directly we hit an error, "only one element tensors can be converted into Python scalars", and checking the shape of an indexed element fails too, because each element is a tuple. So we unpack it, image, label = test_data[0], and then the image has a shape while the label is just an integer. Wonderful.
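Put together, the make_predictions function described above looks roughly like this (a sketch, assuming a device variable has already been set up earlier in the notebook):

import torch

def make_predictions(model: torch.nn.Module,
                     data: list,
                     device: torch.device = device):
    """Return a stacked tensor of prediction probabilities for a list of samples."""
    pred_probs = []
    model.eval()
    with torch.inference_mode():
        for sample in data:
            # Add a batch dimension and send the sample to the target device
            sample = torch.unsqueeze(sample, dim=0).to(device)

            # Forward pass (the model outputs raw logits)
            pred_logit = model(sample)

            # Logits -> prediction probabilities (multi-class, so softmax)
            pred_prob = torch.softmax(pred_logit.squeeze(), dim=0)

            # matplotlib wants data on the CPU, so move it off the GPU
            pred_probs.append(pred_prob.cpu())

    # Stack the list of per-sample probabilities into a single tensor
    return torch.stack(pred_probs)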
So that's not the first 10 samples, but it shows what we get when we iterate through the test data: an image tensor and an associated label. And that's exactly what this line is doing, randomly sampling nine (sample, label) pairs. It could be any number you want; I'm using nine because, spoiler for later on, we're going to create a three-by-three plot, so nine is just a convenient number. So: get some random samples from the test dataset, then test_samples.append(sample) and test_labels.append(label). Then let's view the first sample's shape, test_samples[0].shape, and test_samples[0] itself, which is a tensor of image values. To plot it we can use plt.imshow with cmap="gray", squeezing the tensor first to remove the single colour-channel dimension so matplotlib is happy. There we go, beautiful. That looks to me like a shoe, a high-heeled shoe of some sort. If we set the title with plt.title(test_labels[0]) we get 5, which of course we can index into class_names to get "Sandal". Okay, beautiful. So we have nine random samples and the nine labels associated with them.

Now let's make some predictions. This is one of my favourite things to do, I can't stress it enough: randomly pick samples from the test dataset, predict on them, and do it over and over again to see what the model is doing. We'll get the prediction probabilities by calling our make_predictions function, passing in the trained model, model_2, and the data, which is the test_samples list we just built from random samples of the test dataset. So not only at the start of a problem should you become one with the data; even after you've trained a model, you'll want to become one with your model's predictions on the data and see what happens. Let's view the first couple of entries of the prediction probabilities, we don't want to print them all, just the prediction probabilities for a given sample. And so, how do we convert prediction probabilities into labels?
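A sketch of the sampling and quick visual check above (assuming test_data and class_names from earlier in the notebook):

import random
import matplotlib.pyplot as plt

random.seed(42)
test_samples = []
test_labels = []

# Randomly pick 9 (image, label) pairs from the test Dataset (not the DataLoader)
for sample, label in random.sample(list(test_data), k=9):
    test_samples.append(sample)
    test_labels.append(label)

# Quick sanity check: plot the first sample with its label as the title
plt.imshow(test_samples[0].squeeze(), cmap="gray")   # squeeze [1, 28, 28] -> [28, 28]
plt.title(class_names[test_labels[0]])
plt.show()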
Because if we have a look at test_labels, we want to compare apples to apples when we're evaluating our model, and we can't really compare prediction probabilities directly to the test labels. So we need to convert these prediction probabilities into prediction labels. How can we do that? Well, we can use argmax to take the index of whichever value is highest among each sample's prediction probabilities. Let's see that in action: convert prediction probabilities to labels by setting pred_classes equal to the argmax of pred_probs across the first dimension, and then have a look at pred_classes. Wonderful. Are they in the same format as our test labels? Yes, they are. So, if you'd like to go ahead: in the next video we're going to plot these and compare them. We'll write some matplotlib plotting code that shows the nine different samples along with their original labels and their predicted labels, so give that a shot. We've just written the code to make predictions on random samples; if you'd like them to be truly random you can comment out the seed, but I've kept the seed at 42 so that random.sample selects the same samples on your end and on mine. So in the next video, let's plot these.
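The prediction and label-conversion steps just described might look like this (a sketch, reusing the make_predictions function sketched earlier and the trained model_2 CNN):

# Make predictions on the nine random test samples with the trained model
pred_probs = make_predictions(model=model_2, data=test_samples)

# View the first two samples' prediction probabilities
pred_probs[:2]

# Convert prediction probabilities into predicted class labels
pred_classes = pred_probs.argmax(dim=1)
pred_classes   # now in the same format as test_labels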
Let's now continue following the data explorer's motto of visualize, visualize, visualize. We have some prediction classes and some labels we'd like to compare them to. You can compare them by eye, and it looks like our model is doing pretty well, but since we're making predictions on images, let's plot those images along with the predictions. I'll create a matplotlib figure with a figsize of 9 by 9, because we've got nine random samples (you could of course change this to however many you want, I've just found a three-by-three grid works well in practice), set nrows to 3 and ncols to 3, and then enumerate through the samples in test_samples. Inside the loop, each time we come to a new sample we create a subplot of nrows by ncols, with the index set to i plus 1, because subplot indices can't start at zero. Then we plot the target image with plt.imshow(sample.squeeze(), cmap="gray"), squeezing to drop the extra single dimension. Next we find the prediction label in text form, because we don't want a number, we want human-readable language like "Sandal" rather than whatever number that class happens to be, so pred_label is class_names indexed with the value of pred_classes[i]. So right now we're plotting the sample and finding its prediction; next we get the truth label, also in text form, which is class_names indexed with test_labels[i], we're just matching up our indexes here. Finally, we create a title for the plot, and here's something I like to do: if we're getting visual, we might as well get really visual, so let's change the colour of the title text depending on whether the prediction is right or wrong. The title is an f-string containing the pred label and the truth label (we could even add the prediction probability here, that might be an extension you want to try). Then we check for equality between pred and truth and change the colour of the title text; when in doubt, code it out. If the pred label equals the truth label, plt.title gets the title text with fontsize 10 and the colour green, so green text if the prediction matches the truth; else, plt.title gets the title text with fontsize 10 and the colour red.

So, does that make sense? All we're doing is enumerating through the test samples we drew randomly from the test dataset, and for each one creating a subplot, plotting the image, finding the prediction label by indexing class_names with the pred_classes value, getting the truth label, creating a title that compares the pred label to the truth, and colouring the title text depending on whether the prediction is correct. Let's see what happens. Did we get it right? Oh yes, we did. One more thing: I'm going to turn off the axes so we get a bit more real estate. I love these kinds of plots, and it helps that our model got all of these predictions right: pred sandal, truth sandal; pred trouser, truth trouser. That's pretty darn good, right? Personally, I much prefer visualizing things; numbers on a page look good, but there's nothing quite like visualizing your machine learning model's predictions, especially when it gets them right. So how about we select some different random samples up here? We could functionize all of this code so it runs in one hit, but we'll be a bit hacky for now. This time we'll sample with no seed at all, so your samples might be different from mine, nine new samples, and this time we have an ankle boot. We make predictions, step through the code again, and, oh, there we go, it got one wrong. All of the others are correct, but this is where it gets interesting: where does your model get things wrong? It predicted a dress, but this is a coat. Now, do you think this could potentially be a dress? To me, I could see it being a dress, so I kind of understand where the model is coming from. Let's make some more random predictions, maybe two more before we move on to the next video. Oh, all correct, and we're actually interested in getting some wrong here, our model seems to be too good. All correct again. Okay, one more time, and if we don't get any wrong we're moving on. Oh, there we go, two wrong. Beautiful. So it predicted a dress, and that's a shirt; okay, I can kind of see where the model might have stuffed up there, it looks a little long for a shirt to me, but I can still tell it's a shirt. And this one is predicted pullover, but the truth is a coat. So maybe, maybe there are some issues with the labels. And that's probably what you'll find in a lot of datasets, especially quite large ones.
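The plotting loop just described could be sketched like this (assuming test_samples, test_labels, pred_classes and class_names from above):

import matplotlib.pyplot as plt

plt.figure(figsize=(9, 9))
nrows, ncols = 3, 3
for i, sample in enumerate(test_samples):
    # Create a subplot for this sample (subplot indices start at 1)
    plt.subplot(nrows, ncols, i + 1)

    # Plot the target image (squeeze the single colour-channel dimension)
    plt.imshow(sample.squeeze(), cmap="gray")

    # Prediction and truth labels in human-readable text form
    pred_label = class_names[pred_classes[i]]
    truth_label = class_names[test_labels[i]]

    title_text = f"Pred: {pred_label} | Truth: {truth_label}"

    # Green title if the prediction matches the truth, red otherwise
    if pred_label == truth_label:
        plt.title(title_text, fontsize=10, color="g")
    else:
        plt.title(title_text, fontsize=10, color="r")

    plt.axis(False)
plt.show()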
Just through the sheer law of large numbers, there may be some truth labels in the datasets you work with that are simply wrong. And that's why I like to compare the model's predictions against the truth on a bunch of random samples: to ask, are our model's results actually better or worse than they appear? That's what visualizing helps you figure out; maybe the accuracy says the model is good, but when you visualize the predictions they're not so good, and vice versa. So keep playing around with this and look at some more random samples by running it again. We'll do one more for good luck and then move on to the next video. Oh, see, this is another example where some labels could be confusing. And speaking of confusing, well, that's a spoiler for the next video. Do you see how the prediction is t-shirt/top but the truth is shirt? To me, those labels kind of overlap; I don't know, what exactly is the difference between a t-shirt and a shirt? That's something you'll find as you train models: your model may end up telling you something about your data as well. So we've hinted that the model is confused between t-shirt/top and shirt. How about we plot a confusion matrix in the next video? I'll see you there.

We're up to a very exciting point in evaluating our machine learning model, and that is visualizing, visualizing, visualizing. We saw in the previous video that our model gets a little bit confused, and in fact I would personally get confused at the difference between t-shirt/top and shirt too. These kinds of insights into our model's predictions can also give us insights into whether some of our labels could be improved. And another way to check that is to make a confusion matrix. So let's do that: making a confusion matrix for further prediction evaluation. A confusion matrix is another one of my favourite ways of evaluating a classification model, because that's what we're doing here, multi-class classification. And if you recall, if we go back to section two of the learnpytorch.io book and scroll down, there's a section called "more classification evaluation metrics". Accuracy is probably the gold standard of classification evaluation, but there's also precision, recall, F1-score, and a confusion matrix. So how about we try to build one of those?
I'm going to copy that in and write down: a confusion matrix is a fantastic way of evaluating your classification models visually. So let's break down what we need to do to plot one. One, make predictions with our trained model on the test dataset. Two, make a confusion matrix, and to do so we're going to leverage torchmetrics' ConfusionMatrix. Recall that torchmetrics, which we've touched on before, is a great package with a whole bunch of evaluation metrics for machine learning models in PyTorch flavour: classification metrics, audio, image, detection, a beautiful collection of metrics, and down the page there's ConfusionMatrix. I've only touched on five or six metrics in this course, but torchmetrics has something like 25 different classification metrics, so if you want some extra curriculum you can read through those. If we go to ConfusionMatrix and look at the example code, we call torchmetrics' ConfusionMatrix, pass in the number of classes, and can normalize if we want. And do you notice how similar this looks to the PyTorch documentation? That's the beautiful thing about torchmetrics: it's created with PyTorch in mind. You could try it out on the toy code from the docs, but since we've already got code of our own, let's just bring it in. Number three is to plot it, and we've got another helper package for that: plot the confusion matrix using mlxtend. This is another of my favourite helper libraries for machine learning; it has a lot of functionality you could code up yourself, but would otherwise find yourself re-coding a few too many times, such as plotting a confusion matrix. So if we look up mlxtend's plot_confusion_matrix, this is a wonderful library. I believe it was created by Sebastian Raschka, who's a machine learning researcher and also the author of a great book, Machine Learning with PyTorch and Scikit-Learn; as a side note, I just got this book, it was released at the start of 2022 and it's a great resource for learning more about machine learning with PyTorch and scikit-learn. So shout out to Sebastian Raschka, and thank you for this package as well.
This is just going to help us plot a confusion matrix like this, with our predicted labels along the bottom and our true labels along the side. So we can copy in that code, and the ConfusionMatrix code too. The thing is, torchmetrics doesn't come installed in Google Colab, and although mlxtend does, we need a newer version of mlxtend than Colab currently ships with: we need version 0.19.0. We'll import those in a second; first, let's make some predictions across our entire test dataset. Previously we made predictions on only nine random samples; you could of course change that number, but it was still only nine. Now let's write some code to make predictions across the whole test dataset. We'll import tqdm.auto for progress bar tracking; we don't strictly need to re-import it, I believe we've already got it above, but I'll do it anyway for completeness. So, this is step one from above: make predictions with the trained model, which is model_2. Let's create an empty predictions list so we can append our predictions to it, set our model into evaluation mode, and use torch.inference_mode() as our context manager. Inside that, we'll build much the same code as we used for our testing loop, except this time we append all of the predictions to a list. We iterate through the test DataLoader, and we can give tqdm a description, "Making predictions...", you'll see what that looks like in a minute. Then we send the data and targets to the target device, X, y = X.to(device), y.to(device). Wonderful. Next we do the forward pass to get y_logit; remember, the raw outputs of a model with a linear layer at the end are referred to as logits. We don't need to calculate the loss here, but we do want to turn the predictions from logits into prediction probabilities and then into prediction labels. So y_pred equals torch.softmax of the logits; you could actually skip the softmax step and just take the argmax of the logits directly, but we'll go from prediction probabilities to pred labels for completeness, applying softmax across the class dimension,
and then taking the argmax of that across the class dimension as well. A little tidbit: if you use different dimensions here, you'll likely get different values, so check the inputs and outputs of your code to make sure you're reducing over the right dimension. Then let's put the predictions on the CPU for evaluation, because if we're going to plot anything, matplotlib will want them on the CPU, so we append y_pred.cpu() to our y_preds list. And because we end up with a list of per-batch predictions, we can concatenate that list into a single tensor. Let's print out y_preds first so I can show you what it looks like, and then y_pred_tensor = torch.cat(y_preds) will turn the list of predictions into one tensor, and we'll view the first 10. Let's see if this works. Making predictions... would you look at that, there's our progress bar going through each batch in the test DataLoader, 313 batches of 32, and here's our big list of prediction tensors. We don't really want it as a list, so if we comment out the print, the torch.cat line concatenates that list of tensors into a single tensor. Now if we have a look, there we go, one big long tensor, and if we check len(y_pred_tensor) there should be one prediction per test sample: 10,000, beautiful.
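A sketch of the whole-test-set prediction loop described above (assuming model_2, test_dataloader and device from earlier in the notebook):

import torch
from tqdm.auto import tqdm

# 1. Make predictions with the trained model across the whole test DataLoader
y_preds = []
model_2.eval()
with torch.inference_mode():
    for X, y in tqdm(test_dataloader, desc="Making predictions..."):
        # Send the data and targets to the target device
        X, y = X.to(device), y.to(device)

        # Forward pass: raw logits -> prediction probabilities -> predicted labels
        y_logit = model_2(X)
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)

        # Keep the predictions on the CPU for evaluation/plotting later
        y_preds.append(y_pred.cpu())

# Concatenate the list of per-batch predictions into a single tensor
y_pred_tensor = torch.cat(y_preds)
len(y_pred_tensor)   # should equal the number of test samples (10,000)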
So now we need to install torchmetrics, because torchmetrics doesn't come with Google Colab at the time of recording. Let me show you: if we try to import torchmetrics right now, it fails. It might come with Colab in the future, because it's a pretty useful package, so keep that in mind. Let's see whether the required packages are installed, and if not, install them. We'll finish off this video by setting up a try/except block: Python will try to import torchmetrics and mlxtend. I write it like this because you may already have torchmetrics and mlxtend if you're running this code on a local machine, but if you're running it in Google Colab, which I'm sure many of you are, we'll try the import anyway, and if it doesn't work, we're going to install them. For mlxtend I'm also going to check the version, because our plot_confusion_matrix function needs version 0.19.0 or higher. So I'll write a little statement: assert that the integer of mlxtend.__version__.split(".")[1] is greater than or equal to 19, otherwise raise an error saying "mlxtend version should be 0.19.0 or higher". After fixing a couple of typos (I forgot to convert the string to an int and left an extra bracket on the end), this is just saying: hey, the version of mlxtend you have should be 0.19 or higher, because right now Google Colab ships with 0.14 by default, though that may change in the future. So let's finish off the except block: if the condition above fails, which it should in Colab, we pip install torchmetrics quietly and pass the -U flag to upgrade mlxtend. Then we import torchmetrics and mlxtend after they've been installed and upgraded, and print the mlxtend version with mlxtend.__version__. Let's see what happens. We should see some installation happening here; this installs torchmetrics. Oh, do we still not have the upgraded version of mlxtend? Let's have a look, we may need to restart our Google Colab instance; I'll take the quiet flag off to see what it tells us. Well, let's restart our runtime. After you've run this cell, if you're using Google Colab, you may have to restart your runtime for the updated version of mlxtend to take effect. So I'm going to restart my runtime now, otherwise we won't be able to plot our confusion matrix, we need 0.19.0, and then I'm going to run all of the cells by clicking Run all. Note that if you run into any errors, you'll have to run those cells manually. Then I'll come back down to this cell and make sure I have mlxtend version 0.19.0. I'll see you in a few seconds. I'm back, and just a little heads up:
if you restart your runtime and click Run all, your Colab notebook will stop running cells as soon as it hits an error. This is that error we found in a previous video where our data and model were on different devices, so to skip past it we can just jump to the next cell and click "Run after". There we go: it runs all of the cells after that point for us, retrains our models, everything gets rerun, and then we come right back down to where we were, installing the updated version of mlxtend. While that's running I'll write a little more code: import mlxtend and check that we've got the right version. You may require a runtime restart, you may not, so after you've run this install of torchmetrics and upgrade of mlxtend, see if you can re-import mlxtend, and if you have version 0.19.0 or above we should be able to run the code. Yeah, there we go, wonderful: mlxtend 0.19.0, and our import, assert and version print all work. Beautiful. So we've got a fair bit of extra code here; in the next video let's move forward with creating the confusion matrix. I just wanted to show you how to install and upgrade packages in Google Colab if you don't have them. But now we've got predictions across our entire test dataset, and we're going to move towards using the confusion matrix to compare those predictions to the target labels of our test dataset. So I'll see you in the next video; let's plot a confusion matrix.
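The install-and-check cell described over the last couple of videos might look roughly like this (a sketch; in a notebook the ! prefix runs a shell command, and in Colab you may need to restart the runtime afterwards for the upgraded mlxtend to take effect):

# See if torchmetrics and a recent-enough mlxtend are installed; if not, install them
try:
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
    assert int(mlxtend.__version__.split(".")[1]) >= 19, \
        "mlxtend version should be 0.19.0 or higher"
except (ImportError, AssertionError):
    !pip install -q torchmetrics -U mlxtend
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")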
Welcome back. In the last video we wrote a bunch of code to import some extra libraries we need for plotting a confusion matrix. This is really useful to know, by the way: Google Colab comes with a lot of things preinstalled, but later on down the track you're definitely going to need some experience installing packages yourself, and this is one way to do it. We also made predictions across our entire test dataset, so we've got 10,000 predictions in a tensor, and what we're going to do with the confusion matrix is compare those predictions to the target labels in the test dataset. So we've done step number one, and we've prepared for steps two and three by installing torchmetrics and the newer version of mlxtend. Now let's go through step two, making a confusion matrix, and step three, plotting it. This is going to look so good; I love how good confusion matrices look.

So, because we've got torchmetrics now, we're going to import the ConfusionMatrix class, and from mlxtend we'll go into the plotting module and import plot_confusion_matrix. Recall that the documentation for both of these lives in the torchmetrics and mlxtend docs. Step two is to set up a confusion matrix instance and compare predictions to targets, because that's what evaluating a model is, right? Comparing our model's predictions to the target labels. So I'm going to set up a confusion matrix under the variable confmat by calling the ConfusionMatrix class from torchmetrics, and to create an instance of it I need to pass in the number of classes we have. Because our 10 classes are all contained within class_names (recall that class_names is the list of the different classes we're working with), I'll just pass in len(class_names). Then I can use that confmat instance to create a confusion matrix tensor: just like we do with our loss function, I pass in preds equal to our y_pred_tensor, the predictions we calculated across the whole test dataset, and target equal to test_data.targets. That's the test dataset we've seen before: test_data has a bunch of attributes, including classes and targets, the labels; PyTorch calls labels "targets", I usually refer to them as labels. So we're comparing our model's predictions on the test dataset to the test data targets. That creates our confusion matrix tensor; let's see what it actually looks like: confmat_tensor. Okay, there's a fair bit going on there, so let's turn it into a prettier version. Along the bottom will be our predicted labels, and along the side our true labels. This is where the power of mlxtend comes in: we're going to plot our confusion matrix. So we create a figure and an axes by calling the plot_confusion_matrix function we just imported, passing in conf_mat equal to our confusion matrix tensor,
but because we're working with matplotlib, it will want that as a NumPy array, so note that matplotlib likes working with NumPy. We also pass in class_names, our list of text-based class names, so we get labels for the rows and columns, and set the figsize to my favourite hand in poker, (10, 7), which also happens to be a good size for Google Colab. Look at that, that is something beautiful to see. Now, an ideal confusion matrix has all of its values concentrated on the darkened diagonal and nothing anywhere else, because that means every predicted label lines up with the true label. In our case we definitely have a very dark diagonal, but let's dig into some of the larger off-diagonal numbers. It looks like our model often predicts shirt when the true label is actually t-shirt/top, which reflects what we saw before: in a previous video the model predicted t-shirt/top when the sample was actually a shirt, and of course vice versa. What's another one? It looks like the model also predicts shirt when the item is actually a coat. This is something you can use to visually inspect your data and see whether the errors your model makes actually make sense: it gets confused predicting pullover when the label is coat, and pullover when the label is shirt, and a lot of these clothing items may genuinely look quite similar. Here's another relatively large one: it predicts sneaker when it should be ankle boot, so it's confusing two different types of shoes. So this is just a way to further evaluate your model and start asking, hmm, maybe our labels are a little bit confusing, could we make them clearer? Keep that in mind: a confusion matrix is one of the most powerful ways to visualize your classification model's predictions. A really helpful way of creating one is torchmetrics' ConfusionMatrix, and to plot it you can use plot_confusion_matrix from mlxtend; however, if you're using Google Colab, you may need to install or upgrade them first. And if you'd like more classification metrics, you've got the ones we listed earlier, and of course plenty more in torchmetrics, so give those a look.
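Roughly, the confusion matrix code described across these steps looks like this (a sketch; note that newer torchmetrics releases may also require a task="multiclass" argument to ConfusionMatrix):

from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix

# 2. Set up a confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names))
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=test_data.targets)

# 3. Plot the confusion matrix (matplotlib/mlxtend like NumPy arrays);
#    predicted labels run along the bottom, true labels down the side
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(),
    class_names=class_names,
    figsize=(10, 7)
)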
I think that's enough for this step; we've done a fair bit of evaluation. So where are we up to in our workflow? I believe it's time we saved and loaded our best trained model. Let's give that a go; I'll see you in the next video.

In the last video we created a beautiful confusion matrix with the power of torchmetrics and mlxtend. Now it's time to save and load our best model, because if we've evaluated our convolutional neural network and decided, you know what, this model is pretty good, we might want to export it to a file so we can use it somewhere else. Let's see how to do that. By the way, if we go back to our keynote workflow: evaluate the model (torchmetrics), improve through experimentation (we've been through that a fair few times; we haven't used TensorBoard yet, that'll come in a later video), and then save and reload your trained model. That's where we're up to: once we've gone through those steps enough times and we're happy, we save our model so we can use it elsewhere, and reload it to make sure it saved correctly. So let's do step 11: save and load the best performing model. You may have already done this before, and if you've been through the earlier parts of the course you definitely have, so if you want to give it a go yourself, pause the video now and try it out; we covered it in notebook number one, in the "saving and loading a PyTorch model" section, which you can work through on your own. Otherwise, let's code it out together. I'm going to start by importing Path from pathlib, because I like to create a model directory path. My model path will be a Path pointing at "models"; that's the folder over here that I want to save my models into. Then I'll call model_path.mkdir with parents=True, because I want to create the parent directories if they don't exist, and exist_ok=True, so that if we try to create the directory and it already exists, we won't get an error. Next we'll create a model save path (I'll add a few code cells here so we have more space). We need a model name: since we're in section three, I'm going to call it 03_pytorch_computer_vision_model_2, because model two is our best model, and I'm going to give it the .pth extension for PyTorch; you can also use .pt.
I like to use .pth. Then the model save path is just model_path / model_name, and if we have a look at it, we get a PosixPath: models/03_pytorch_computer_vision_model_2.pth. If we check over here we should now have a models directory; it won't have anything in it yet, and it sits alongside the data directory from before with FashionMNIST inside. This is a good way to start structuring your project directories: data, models, helper function files, and so on. Let's keep going and save the model's state dict. I'll print "Saving model to: {model_save_path}" just to give us some information about what's happening, and then we save a model by calling torch.save, passing the object we want to save via the obj parameter, which is model_2.state_dict(). Recall that the state dict holds our model's learned parameters, all the weights and biases and all that sort of jazz; when we first created model_2 these were all random numbers, but since training on our data they've been updated to represent the training images, and we can leverage them later on to make predictions. I won't scroll through them all, but that's what we're saving, and the file path f is our model save path. Let's run it, beautiful: it's saving our model to the models directory, and if we look in there, do we have a model? Yes we do. So that's how quickly we can save a model; of course, you can customize its name, where you save it, and so on.
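The saving code described above, roughly (a sketch, assuming model_2 is the trained CNN):

from pathlib import Path
import torch

# Create a models directory (if it doesn't already exist)
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# Create the model save path
MODEL_NAME = "03_pytorch_computer_vision_model_2.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# Save only the model's learned parameters (the state_dict)
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_2.state_dict(), f=MODEL_SAVE_PATH)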
Now let's see what happens when we load it back in. Because we only saved the state dict of model_2, we need to create a new instance of the class it was built from, FashionMNISTModelV2, our convolutional neural network. (If we'd saved the whole model, we could load it straight into a new variable, but I'll let you read more about the different ways of saving and loading a model in the PyTorch documentation, which I'd highly recommend.) I'll set the manual seed so that the new instance is instantiated with the same random numbers, then set up loaded_model_2 = FashionMNISTModelV2, and it's important that we instantiate it with the same parameters as our original saved model. The input shape is 1, because that's the number of colour channels in our images; our image shape is [1, 28, 28] for colour channels, height, width. We created it with 10 hidden units, so we set hidden_units=10; this matters, because if the shapes aren't the same we'll get a shape mismatch error. And the output shape is 10, or len(class_names) if you have the class_names variable handy. Then we load in the saved state dict, the one we just saved, with loaded_model_2.load_state_dict(torch.load(f=model_save_path)); this is why I like saving my paths to a variable, so I can reuse them later instead of retyping the whole path, which is definitely prone to errors. And finally we send the model to the target device with loaded_model_2.to(device). Let's run that. Wonderful. So now let's evaluate the loaded model; the results should be very much the same as our model_2_results.
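A sketch of the loading code (assuming the FashionMNISTModelV2 class, class_names, device and MODEL_SAVE_PATH from earlier):

import torch

# Create a new instance with the same architecture hyperparameters as the saved model
torch.manual_seed(42)
loaded_model_2 = FashionMNISTModelV2(input_shape=1,               # 1 colour channel
                                     hidden_units=10,
                                     output_shape=len(class_names))

# Load in the saved state_dict and send the model to the target device
loaded_model_2.load_state_dict(torch.load(f=MODEL_SAVE_PATH))
loaded_model_2 = loaded_model_2.to(device)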
Because if we deployed a model that hadn't saved correctly, well, then we'd get less than ideal results, wouldn't we? So, model equals loaded_model_2, and we're going to use our same eval_model function, by the way. And of course, we're going to evaluate it on the same test dataset that we've been using, test_dataloader. Our loss function is just the loss function that we created before, and our accuracy function is the accuracy function we've been using throughout this notebook. So now let's check out loaded_model_2_results. They should be quite similar to this one. We're going to make some predictions, and then if we go down, do we have the same numbers? Yes, we do. So we have five, six, eight, two, nine; five, six, eight, two, nine, wonderful. And three, one, three, five, eight; three, one, three, five, eight, beautiful. It looks like our loaded model gets the same results as our previously trained model before we even saved it. And if you wanted to check whether they were close programmatically, because we just looked at these visually, you can also use torch.isclose to check if the model results are close to each other. So we can go torch.isclose, and we're going to pass in torch.tensor, because we have to turn these values into a tensor. We're going to go model_2_results, and we'll compare the model loss. How about we do that? We want to make sure the loss values are the same, or very close, that is, with torch.isclose. Then torch.tensor again, and we want this one to be loaded_model_2_results, model_loss, with another bracket on the end there. And we'll see how close they are: true, wonderful. Now, if this doesn't return true, you can also adjust the tolerance levels in here. So we go atol equals, and this is going to be the absolute tolerance. If we do 1e-8, one times ten to the negative eight, it's saying, hey, we need to make sure our results are basically the same up to eight decimal places. That's probably quite a low tolerance; I would say just make sure they're at least within two decimal places. Here's that check sketched out below.
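A sketch of that programmatic check, assuming model_2_results and loaded_model_2_results are the dictionaries returned by the eval_model helper used earlier (each containing a model_loss value):

```python
import torch

# Check whether the original and loaded model loss values are close
print(torch.isclose(torch.tensor(model_2_results["model_loss"]),
                    torch.tensor(loaded_model_2_results["model_loss"]),
                    atol=1e-8))  # absolute tolerance; loosen this (e.g. 1e-2) if needed
```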
But if they're pretty close, like within two or three decimal places of each other, well, then I'd say that's close enough. You can also adjust the tolerance level here to check programmatically whether your model results are close enough. Wow, we have covered a fair bit here. We've gone through this entire workflow for a computer vision problem. I think that's enough code for this section, section three, PyTorch computer vision. I've got some exercises and some extra curriculum lined up for you, so let's have a look at those in the next video. I'll see you there. My goodness, look how much computer vision PyTorch code we've written together. We started off right up the top. We looked at the reference notebook and the online book. We checked out computer vision libraries in PyTorch, the main one being torchvision. Then we got a dataset, namely the FashionMNIST dataset. There are a bunch more datasets that we could have looked at, and in fact, I'd encourage you to try some out from torchvision.datasets: use all of the steps that we've done here on another dataset. We prepared our DataLoaders, so we turned our data into batches. We built a baseline model, which is an important step in machine learning, because the baseline model is usually relatively simple, and it's going to serve as a baseline that you're going to try and improve upon (let's just go back to the keynote) through various experiments. We then made predictions with model zero and we evaluated it. We timed our predictions to see if running our models on the GPU was faster, where we learned that sometimes a GPU won't necessarily speed up code if it's a relatively small dataset, because of the overhead of copying data between the CPU and the GPU. We tried a model with non-linearity, and we saw that it didn't really improve upon our baseline model. But then we brought in the big guns, a convolutional neural network, replicating the CNN explainer website. And by gosh, didn't we spend a lot of time here? I'd encourage you, as part of your extra curriculum, to go through this again and again. I still come back to refer to it too; I referred to it a lot making the materials for this video section and this code section. So be sure to go back and check out the CNN explainer website for more of what's going on behind the scenes of your CNNs. But we coded one using pure PyTorch. That is amazing. We compared our model results across different experiments.
We found that our convolutional neural network did the best, although it took a little bit longer to train. And we also learned that the training time values will definitely vary depending on the hardware you're using, so that's just something to keep in mind. We made and evaluated random predictions with our best model, which is an important step in visualizing, visualizing, visualizing your model's predictions, because you can get evaluation metrics, but until you start to actually visualize what's going on, well, in my case, that's how I best understand what my model is thinking. We saw a confusion matrix using two different libraries, torchmetrics and mlxtend, a great way to evaluate your classification models. And we saw how to save and load the best performing model to file, and made sure that the results of our saved model weren't too different from the model that we trained within the notebook. So now it is time, I'd love for you to practice what you've gone through. This is actually really exciting now, because you've gone through an end-to-end computer vision problem. I've got some exercises prepared. If you go to the learnpytorch.io website in section 03 and scroll down, you can read through all of this. This is all the material that we've just covered, in pure code. There are a lot of pictures in this notebook too that are helpful for learning what's going on. We have some exercises here. All of the exercises are focused on practicing the code in the sections above, and we have two resources. We also have some extra curriculum that I've put together. If you want an in-depth understanding of what's going on behind the scenes in convolutional neural networks, because we've focused a lot on code, I'd highly recommend MIT's introduction to deep computer vision lecture. You can spend 10 minutes clicking through the different options in the PyTorch vision library, torchvision. Look up the most common convolutional neural networks in the torchvision model library. And then, for a larger number of pre-trained PyTorch computer vision models, if you get deeper into computer vision, you're probably going to run into the torch image models library, otherwise known as timm, but I'm going to leave that as extra curriculum. I'm going to just link this exercises section here. Again, it's at learnpytorch.io in the exercises section. We come down, there we go. But there is also a resource here, an exercise template notebook. So we've got number one: what are three areas in industry where computer vision is currently being used?
Now, this is in the pytorch-deep-learning repo, under extras, exercises, number three. I've put out some template code here for you to fill in these different sections. Some of them are code related, some of them are just text based, but they should all be able to be completed by referencing what we've gone through in this notebook here. And just as one more thing, if we go back to pytorch-deep-learning (this will probably be updated by the time you get here), you can always find the exercises and extra curriculum by going to computer vision, then exercises and extra curriculum. Or, if we go into the extras folder and then into solutions: I've now also started to add video walkthroughs of each of the solutions. So this is me going through each of the exercises myself and coding them, and you'll get to see the unedited videos; they're just one long livestream. I've done some for 02, 03, and 04, and there will be more here by the time you watch this video. So if you'd like to see how I figure out the solutions to the exercises, you can watch those videos and go through them yourself. But first and foremost, I would highly recommend trying out the exercises on your own first. And then if you get stuck, refer to the notebook here, refer to the PyTorch documentation, and finally, you can check out what I would have coded as a potential solution. So there's number three, computer vision exercise solutions. So congratulations on going through the PyTorch computer vision section. I'll see you in the next section; we're going to look at PyTorch custom datasets, but no spoilers. I'll see you soon. Hello, hello, hello, and welcome to section number four of the Learn PyTorch for Deep Learning course. We have custom datasets with PyTorch. Now, before we dive into what we're going to cover, let's answer the most important question: where can you get help? We've been through this a few times now, but it's important to reiterate. Follow along with the code as best you can; we're going to be writing a bunch of PyTorch code. Remember the motto: if in doubt, run the code. That's in line with trying it for yourself. If you'd like to read the docstring, you can press Shift + Command + Space in Google Colab, or if you're on Windows, Command might be Control. Then, if you're still stuck, you can search for it. Two of the resources you will probably come across are Stack Overflow and the wonderful PyTorch documentation, which we've had a lot of experience with so far.
Then, of course, try again: go back through your code, if in doubt, code it out, or if in doubt, run the code. And then finally, if you're still stuck, ask a question on the pytorch-deep-learning Discussions page on GitHub. So if I click this link, we come to mrdbourke/pytorch-deep-learning; the URL is here, and we've seen this before. If you have trouble or a problem with any part of the course, you can start a discussion, and you can select the category: general, ideas, polls, Q&A. Then we can go here and put the video number in, so 99, for example, and "my code doesn't do what I'd like it to". So state your problem, then come in here and write some code, code here, and then "my question is something, something, something", click start discussion, and then we can help out. And then, if we come back to the discussions, of course, you can search for what's going on. So if you have an error and you feel like someone else might have seen this error, you can, of course, search for it and find out what's happening. Now, I just want to highlight again that the resources for this course are at learnpytorch.io. We are up to section four. This is a beautiful online book version of all the materials we are going to cover in this section, so spoiler alert, you can use this as a reference. And then, of course, on GitHub, we have the same notebook here, PyTorch custom datasets. This is the ground truth notebook, so check that out if you get stuck. So I'm just going to exit out of this. We've got PyTorch custom datasets at learnpytorch.io, and then, of course, the Discussions tab for Q&A. Now, if we jump back to the keynote, what do we have? We might be asking, what is a custom dataset? Now, we've built a fair few PyTorch deep learning neural networks so far on various datasets, such as FashionMNIST. But you might be wondering, hey, I've got my own dataset, or I'm working on my own problem. Can I build a model with PyTorch to predict on that dataset? And the answer is yes. However, you do have to go through a few preprocessing steps to make that dataset compatible with PyTorch, and that's what we're going to be covering in this section. And so I'd like to highlight the PyTorch domain libraries. Now, we've had a little bit of experience before with torchvision, such as if we wanted to classify whether a photo was of pizza, steak, or sushi, so a computer vision image classification problem. There's also text, such as classifying whether these reviews are positive or negative, and you can use torchtext for that.
But again, these are only just one problem within the vision space within the text 12664 19:47:48,120 --> 19:47:54,760 space. I want you to just understand that if you have any type of vision data, you probably 12665 19:47:54,760 --> 19:47:59,320 want to look into torch vision. And if you have any kind of text data, you probably want to look 12666 19:47:59,320 --> 19:48:05,640 into torch text. And then if you have audio, such as if you wanted to classify what song was playing, 12667 19:48:05,640 --> 19:48:12,760 this is what Shazam does, it uses the input sound of some sort of music, and then runs a neural network 12668 19:48:12,760 --> 19:48:18,200 over it to classify it to a certain song, you can look into torch audio for that. And then if you'd 12669 19:48:18,200 --> 19:48:23,960 like to recommend something such as you have an online store, or if your Netflix or something 12670 19:48:23,960 --> 19:48:29,480 like that, and you'd like to have a homepage that updates for recommendations, you'd like to look 12671 19:48:29,480 --> 19:48:35,320 into torch rec, which stands for recommendation system. And so this is just something to keep in mind. 12672 19:48:36,680 --> 19:48:43,560 Because each of these domain libraries has a data sets module that helps you work with different 12673 19:48:43,560 --> 19:48:49,800 data sets from different domains. And so different domain libraries contain data loading functions 12674 19:48:49,800 --> 19:48:56,600 for different data sources. So torch vision, let's just go into the next slide, we have problem space 12675 19:48:56,600 --> 19:49:02,120 vision for pre built data sets, so existing data sets like we've seen with fashion MNIST, 12676 19:49:02,120 --> 19:49:07,320 as well as functions to load your own vision data sets, you want to look into torch vision 12677 19:49:07,320 --> 19:49:14,200 dot data sets. So if we click on this, we have built in data sets, this is the pie torch documentation. 12678 19:49:14,200 --> 19:49:20,520 And if we go here, we have torch audio, torch text, torch vision, torch rec, torch data. Now, 12679 19:49:20,520 --> 19:49:26,600 at the time of recording, which is April 2022, this is torch data is currently in beta. But it's 12680 19:49:26,600 --> 19:49:32,600 going to be updated over time. So just keep this in mind, updated over time to add even more ways 12681 19:49:32,600 --> 19:49:38,520 to load different data resources. But for now, we're just going to get familiar with torch vision 12682 19:49:38,520 --> 19:49:45,720 data sets. If we went into torch text, there's another torch text dot data sets. And then if we 12683 19:49:45,720 --> 19:49:52,120 went into torch audio, we have torch audio dot data sets. And so you're noticing a trend here 12684 19:49:52,120 --> 19:49:57,960 that depending on the domain you're working in, whether it be vision, text, audio, or your data 12685 19:49:57,960 --> 19:50:03,880 is recommendation data, you'll probably want to look into its custom library within pie torch. 12686 19:50:03,880 --> 19:50:09,000 And of course, the bonus is torch data. It contains many different helper functions for loading data, 12687 19:50:09,000 --> 19:50:14,600 and is currently in beta as of April 2022. So 2022. So the by the time you watch this torch data 12688 19:50:14,600 --> 19:50:20,200 may be out of beta. And then that should be something that's extra curriculum on top of what we're 12689 19:50:20,200 --> 19:50:26,680 going to cover in this section. So let's keep going. 
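As a quick reference, each of those domain libraries exposes its own datasets module. A small sketch of where to look, assuming torchvision, torchtext and torchaudio are installed in your environment (each is a separate install on top of torch):

```python
import torchvision.datasets  # vision datasets, e.g. torchvision.datasets.FashionMNIST
import torchtext.datasets    # text datasets, e.g. torchtext.datasets.IMDB
import torchaudio.datasets   # audio datasets, e.g. torchaudio.datasets.SPEECHCOMMANDS

# For example, list a handful of the built-in vision datasets
print([name for name in dir(torchvision.datasets) if not name.startswith("_")][:10])
```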
So this is what we're going to work towards 12690 19:50:26,680 --> 19:50:35,800 building food vision mini. So we're going to load some data, namely some images of pizza, 12691 19:50:35,800 --> 19:50:42,040 sushi, and steak from the food 101 data set, we're going to build an image classification model, 12692 19:50:42,040 --> 19:50:48,040 such as the model that might power a food vision recognition app or a food image recognition app. 12693 19:50:48,760 --> 19:50:55,160 And then we're going to see if it can classify an image of pizza as pizza, an image of sushi as sushi, 12694 19:50:55,160 --> 19:51:00,520 and an image of steak as steak. So this is what we're going to focus on. We want to load, 12695 19:51:00,520 --> 19:51:06,680 say we had images existing already of pizza, sushi, and steak, we want to write some code 12696 19:51:06,680 --> 19:51:13,240 to load these images of food. So our own custom data set for building this food vision mini model, 12697 19:51:13,240 --> 19:51:17,960 which is quite similar to if you go to this is the project I'm working on personally, 12698 19:51:17,960 --> 19:51:27,800 neutrify.app. This is a food image recognition model. Here we go. So it's still a work in progress as 12699 19:51:27,800 --> 19:51:33,480 I'm going through it, but you can upload an image of food and neutrify will try to classify 12700 19:51:33,480 --> 19:51:41,400 what type of food it is. So do we have steak? There we go. Let's upload that. Beautiful steak. 12701 19:51:41,400 --> 19:51:45,880 So we're going to be building a similar model to what powers neutrify. And then there's the 12702 19:51:45,880 --> 19:51:50,920 macro nutrients for the steak. If you'd like to find out how it works, I've got all the links here, 12703 19:51:50,920 --> 19:51:56,360 but that's at neutrify.app. So let's keep pushing forward. We'll go back to the keynote. 12704 19:51:57,000 --> 19:52:02,360 This is what we're working towards. As I said, we want to load these images into PyTorch so that 12705 19:52:02,360 --> 19:52:07,080 we can build a model. We've already built a computer vision model. So we want to figure out 12706 19:52:07,080 --> 19:52:13,080 how do we get our own data into that computer vision model. And so of course we'll be adhering 12707 19:52:13,080 --> 19:52:20,760 to our PyTorch workflow that we've used a few times now. So we're going to learn how to load a 12708 19:52:20,760 --> 19:52:26,440 data set with our own custom data rather than an existing data set within PyTorch. We'll see how 12709 19:52:26,440 --> 19:52:31,880 we can build a model to fit our own custom data set. We'll go through all the steps that's involved 12710 19:52:31,880 --> 19:52:36,600 in training a model such as picking a loss function and an optimizer. We'll build a training loop. 12711 19:52:36,600 --> 19:52:44,520 We'll evaluate our model. We'll improve through experimentation. And then we can see save and reloading 12712 19:52:44,520 --> 19:52:50,760 our model. But we're also going to practice predicting on our own custom data, which is a very, 12713 19:52:50,760 --> 19:52:56,120 very important step whenever training your own models. So what we're going to cover broadly, 12714 19:52:57,400 --> 19:53:01,960 we're going to get a custom data set with PyTorch. As we said, we're going to become one with the 12715 19:53:01,960 --> 19:53:07,880 data. In other words, preparing and visualizing it. We'll learn how to transform data for use with 12716 19:53:07,880 --> 19:53:12,520 a model, very important step. 
We'll see how we can load custom data with pre-built functions 12717 19:53:12,520 --> 19:53:18,280 and our own custom functions. We'll build a computer vision model, aka food vision mini, 12718 19:53:18,280 --> 19:53:24,920 to classify pizza, steak, and sushi images. So a multi-class classification model. We'll compare 12719 19:53:24,920 --> 19:53:29,480 models with and without data augmentation. We haven't covered that yet, but we will later on. 12720 19:53:29,480 --> 19:53:35,400 And finally, we'll see how we can, as I said, make predictions on custom data. So this means 12721 19:53:35,400 --> 19:53:42,040 data that's not within our training or our test data set. And how are we going to do it? Well, 12722 19:53:42,040 --> 19:53:47,720 we could do it cooks or chemists. But I like to treat machine learning as a little bit of an art, 12723 19:53:47,720 --> 19:53:54,520 so we're going to be cooking up lots of code. With that being said, I'll see you in Google Colab. 12724 19:53:54,520 --> 19:54:03,800 Let's code. Welcome back to the PyTorch cooking show. Let's now learn how we can cook up some 12725 19:54:03,800 --> 19:54:11,400 custom data sets. I'm going to jump into Google Colab. So colab.research.google.com. 12726 19:54:12,760 --> 19:54:18,440 And I'm going to click new notebook. I'm just going to make sure this is zoomed in enough for 12727 19:54:18,440 --> 19:54:26,760 the video. Wonderful. So I'm going to rename this notebook 04 because we're up to section 04. 12728 19:54:27,640 --> 19:54:33,800 And I'm going to call it PyTorch custom data sets underscore video because this is going to be one 12729 19:54:33,800 --> 19:54:37,880 of the video notebooks, which has all the code that I write during the videos, which is of course 12730 19:54:37,880 --> 19:54:44,200 contained within the video notebooks folder on the PyTorch deep learning repo. So if you'd like 12731 19:54:44,200 --> 19:54:48,520 the resource or the ground truth notebook for this, I'm going to just put a heading here. 12732 19:54:49,560 --> 19:54:59,880 04 PyTorch custom data sets video notebook, make that bigger, and then put resources. 12733 19:55:01,720 --> 19:55:12,520 So book version of the course materials for 04. We'll go there, and then we'll go ground truth 12734 19:55:12,520 --> 19:55:17,800 version of notebook 04, which will be the reference notebook that we're going to use 12735 19:55:17,800 --> 19:55:24,680 for this section. Come into PyTorch custom data sets. And then we can put that in there. 12736 19:55:25,640 --> 19:55:34,040 Wonderful. So the whole synopsis of this custom data sets section is we've used some data sets 12737 19:55:34,040 --> 19:55:44,840 with PyTorch before, but how do you get your own data into PyTorch? Because that's what you 12738 19:55:44,840 --> 19:55:49,080 want to start working on, right? You want to start working on problems of your own. You want to 12739 19:55:49,080 --> 19:55:53,160 come into any sort of data that you've never worked with before, and you want to figure out how do 12740 19:55:53,160 --> 19:56:03,400 you get that into PyTorch. So one of the ways to do so is via custom data sets. And then I want 12741 19:56:03,400 --> 19:56:09,720 to put a note down here. So we're going to go zero section zero is going to be importing 12742 19:56:09,720 --> 19:56:21,080 PyTorch and setting up device agnostic code. But I want to just stress here that domain libraries. 12743 19:56:23,240 --> 19:56:31,160 So just to reiterate what we went through last video. 
So depending on what you're working on, 12744 19:56:31,160 --> 19:56:41,800 whether it be vision, text, audio, recommendation, something like that, you'll want to look into 12745 19:56:41,800 --> 19:56:52,200 each of the PyTorch domain libraries for existing data loader or data loading functions and 12746 19:56:52,200 --> 19:57:00,040 customizable data loading functions. So just keep that in mind. We've seen some of them. So if we 12747 19:57:00,040 --> 19:57:07,080 go torch vision, which is what we're going to be looking at, torch vision, we've got data sets, 12748 19:57:07,080 --> 19:57:12,440 and we've got documentation, we've got data sets for each of the other domain libraries here as 12749 19:57:12,440 --> 19:57:18,440 well. So if you're working on a text problem, it's going to be a similar set of steps to what 12750 19:57:18,440 --> 19:57:23,640 we're going to do with our vision problem when we build food vision mini. What we have is a data 12751 19:57:23,640 --> 19:57:28,360 set that exists somewhere. And what we want to do is bring that into PyTorch so we can build a 12752 19:57:28,360 --> 19:57:34,760 model with it. So let's import the libraries that we need. So we're going to import torch and 12753 19:57:35,960 --> 19:57:42,200 we'll probably import an N. So we'll import that from PyTorch. And I'm just going to check the 12754 19:57:42,200 --> 19:57:54,040 torch version here. So note, we need PyTorch 1.10.0 plus is required for this course. So if you're 12755 19:57:54,040 --> 19:57:59,480 using Google Colab at a later date, you may have a later version of PyTorch. I'm just going to 12756 19:57:59,480 --> 19:58:08,520 show you what version I'm using. Just going to let this load. We're going to get this ready. 12757 19:58:08,520 --> 19:58:13,000 We're going to also set up device agnostic code right from the start this time because this is 12758 19:58:13,000 --> 19:58:19,080 best practice with PyTorch. So this way, if we have a CUDA device available, our model is going 12759 19:58:19,080 --> 19:58:25,560 to use that CUDA device. And our data is going to be on that CUDA device. So there we go. Wonderful. 12760 19:58:25,560 --> 19:58:34,840 We've got PyTorch 1.10.0 plus CUDA. 111. Maybe that's 11.1. So let's check if CUDA.is available. 12761 19:58:34,840 --> 19:58:40,920 Now, I'm using Google Colab. We haven't set up a GPU yet. So it probably won't be available yet. 12762 19:58:40,920 --> 19:58:49,640 Let's have a look. Wonderful. So because we've started a new Colab instance, it's going to use 12763 19:58:49,640 --> 19:58:56,040 the CPU by default. So how do we change that? We come up to runtime, change runtime type. I'm going 12764 19:58:56,040 --> 19:59:02,520 to go hard there accelerator GPU. We've done this a few times now. I am paying for Google Colab Pro. 12765 19:59:02,520 --> 19:59:09,960 So one of the benefits of that is that it our Google Colab reserves faster GPUs for you. You do 12766 19:59:09,960 --> 19:59:15,160 don't need Google Colab Pro. As I've said to complete this course, you can use the free version, 12767 19:59:15,160 --> 19:59:22,840 but just recall Google Colab Pro tends to give you a better GPU just because GPUs aren't free. 12768 19:59:23,960 --> 19:59:28,600 Wonderful. So now we've got access to a GPU CUDA. What GPU do I have? 12769 19:59:30,360 --> 19:59:36,840 Nvidia SMI. I have a Tesla P100 with 16 gigabytes of memory, which will be more than enough for 12770 19:59:36,840 --> 19:59:43,400 the problem that we're going to work on in this video. 
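Here's that setup cell sketched in one place (the exact PyTorch and CUDA versions you see will depend on when you run it):

```python
import torch
from torch import nn

# The course assumes PyTorch 1.10.0+ (later versions are fine too)
print(torch.__version__)

# Setup device-agnostic code: use the GPU (CUDA) if it's available, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# In a Colab cell, running `!nvidia-smi` (note the leading "!") shows which GPU you've got
```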
So I believe that's enough to cover for 12771 19:59:43,400 --> 19:59:49,240 the first coding video. Let's in the next section, we are working with custom datasets after all. 12772 19:59:49,240 --> 19:59:51,880 Let's in the next video. Let's get some data, hey. 12773 19:59:55,320 --> 20:00:01,560 Now, as I said in the last video, we can't cover custom datasets without some data. So let's get 12774 20:00:01,560 --> 20:00:07,720 some data and just remind ourselves what we're going to build. And that is food vision mini. 12775 20:00:07,720 --> 20:00:13,320 So we need a way of getting some food images. And if we go back to Google Chrome, 12776 20:00:14,200 --> 20:00:21,240 torch vision datasets has plenty of built-in datasets. And one of them is the food 101 dataset. 12777 20:00:22,200 --> 20:00:30,520 Food 101. So if we go in here, this is going to take us to the original food 101 website. 12778 20:00:30,520 --> 20:00:37,000 So food 101 is 101 different classes of food. It has a challenging dataset of 101 different 12779 20:00:37,000 --> 20:00:44,920 food categories with 101,000 images. So that's a quite a beefy dataset. And so for each class, 12780 20:00:44,920 --> 20:00:52,920 250 manually reviewed test images are provided. So we have per class, 101 classes, 250 testing 12781 20:00:52,920 --> 20:01:00,680 images, and we have 750 training images. Now, we could start working on this entire dataset 12782 20:01:00,680 --> 20:01:06,280 straight from the get go. But to practice, I've created a smaller subset of this dataset, 12783 20:01:06,280 --> 20:01:12,280 and I'd encourage you to do the same with your own problems. Start small and upgrade when necessary. 12784 20:01:13,080 --> 20:01:18,680 So I've reduced the number of categories to three and the number of images to 10%. 12785 20:01:18,680 --> 20:01:26,840 Now, you could reduce this to an arbitrary amount, but I've just decided three is enough to begin with 12786 20:01:26,840 --> 20:01:32,280 and 10% of the data. And then if it works, hey, you could upscale that on your own accord. 12787 20:01:32,840 --> 20:01:38,200 And so I just want to show you the notebook that I use to create this dataset and as extra curriculum, 12788 20:01:38,200 --> 20:01:43,800 you could go through this notebook. So if we go into extras, 04 custom data creation, 12789 20:01:43,800 --> 20:01:50,280 this is just how I created the subset of data. So making a dataset to use with notebook number 12790 20:01:50,280 --> 20:01:58,120 four, I created it in custom image data set or image classification style. So we have a top level 12791 20:01:58,120 --> 20:02:03,240 folder of pizza, steak, and sushi. We have a training directory with pizza, steak, and sushi 12792 20:02:03,240 --> 20:02:09,960 images. And we have a test directory with pizza, steak, and sushi images as well. So you can go 12793 20:02:09,960 --> 20:02:16,200 through that to check it out how it was made. But now, oh, and also, if you go to loan pytorch.io 12794 20:02:16,200 --> 20:02:22,440 section four, there's more information here about what food 101 is. So get data. Here we go. 12795 20:02:23,080 --> 20:02:28,840 There's all the information about food 101. There's some resources, the original food 101 data set, 12796 20:02:28,840 --> 20:02:35,160 torch vision data sets, food 101, how I created this data set, and actually downloading the data. 
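As an aside, if you did want the full Food101 dataset rather than the 10% pizza, steak and sushi subset, torchvision can download it directly. This is just a sketch and isn't needed for this section: it assumes torchvision 0.12+ (where datasets.Food101 was added) and several gigabytes of free disk space.

```python
from torchvision import datasets, transforms

# Download the full Food101 dataset (all 101 classes) -- this is a large download
train_data = datasets.Food101(root="data",
                              split="train",
                              transform=transforms.ToTensor(),
                              download=True)

test_data = datasets.Food101(root="data",
                             split="test",
                             transform=transforms.ToTensor(),
                             download=True)

print(len(train_data), len(test_data))  # 750 and 250 images per class across 101 classes
```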
12797 20:02:35,160 --> 20:02:40,840 But now we're going to write some code, because this data set, the smaller version that I've created 12798 20:02:40,840 --> 20:02:46,920 is on the pytorch deep learning repo, under data. And then we have pizza, steak, sushi.zip. 12799 20:02:46,920 --> 20:02:53,320 Oh, this one is a little spoiler for one of the exercises for this section. But you'll see that 12800 20:02:53,320 --> 20:03:01,320 later. Let's go in here. Let's now write some code to get this data set from GitHub, 12801 20:03:01,320 --> 20:03:05,000 pizza, steak, sushi.zip. And then we'll explore it, we'll become one with the data. 12802 20:03:05,800 --> 20:03:12,440 So I just want to write down here, our data set is a subset of the food 101 data set. 12803 20:03:14,520 --> 20:03:23,240 Food 101 starts with 101 different classes of food. So we could definitely build computer 12804 20:03:23,240 --> 20:03:29,720 vision models for 101 classes, but we're going to start smaller. Our data set starts with three 12805 20:03:29,720 --> 20:03:41,160 classes of food, and only 10% of the images. So what's right here? And 1000 images per class, 12806 20:03:42,040 --> 20:03:54,360 which is 750 training, 250 testing. And we have about 75 training images per class, 12807 20:03:54,360 --> 20:04:03,880 and about 25 testing images per class. So why do this? When starting out ML projects, 12808 20:04:05,000 --> 20:04:13,880 it's important to try things on a small scale and then increase the scale when necessary. 12809 20:04:15,320 --> 20:04:21,800 The whole point is to speed up how fast you can experiment. 12810 20:04:21,800 --> 20:04:27,000 Because there's no point trying to experiment on things that if we try to train on 100,000 12811 20:04:27,000 --> 20:04:32,360 images to begin with, our models might train take half an hour to train at a time. So at the 12812 20:04:32,360 --> 20:04:39,240 beginning, we want to increase the rate that we experiment at. And so let's get some data. 12813 20:04:39,240 --> 20:04:45,320 We're going to import requests so that we can request something from GitHub to download this 12814 20:04:45,320 --> 20:04:52,040 URL here. Then we're also going to import zip file from Python, because our data is in the form 12815 20:04:52,040 --> 20:04:57,720 of a zip file right now. Then we're going to get path lib, because I like to use paths whenever 12816 20:04:57,720 --> 20:05:04,360 I'm dealing with file paths or directory paths. So now let's set up a path to a data folder. 12817 20:05:05,080 --> 20:05:10,200 And this, of course, will depend on where your data set lives, what you'd like to do. But I 12818 20:05:10,200 --> 20:05:15,160 typically like to create a folder over here called data. And that's just going to store all of my 12819 20:05:15,160 --> 20:05:24,440 data for whatever project I'm working on. So data path equals path data. And then we're going to go 12820 20:05:24,440 --> 20:05:34,200 image path equals data path slash pizza steak sushi. That's how we're going to have images 12821 20:05:34,200 --> 20:05:40,280 from those three classes. Pizza steak and sushi are three of the classes out of the 101 in food 12822 20:05:40,280 --> 20:05:48,840 101. So if the image folder doesn't exist, so if our data folder already exists, we don't want to 12823 20:05:48,840 --> 20:05:55,800 redownload it. But if it doesn't exist, we want to download it and unzip it. 
So if image path 12824 20:05:55,800 --> 20:06:08,120 is der, so we want to print out the image path directory already exists skipping download. 12825 20:06:09,880 --> 20:06:19,960 And then if it doesn't exist, we want to print image path does not exist, creating one. Beautiful. 12826 20:06:19,960 --> 20:06:25,960 And so we're going to go image path dot mk der to make a directory. We want to make its parents 12827 20:06:25,960 --> 20:06:30,920 if we need to. So the parent directories and we want to pass exist, okay, equals true. So we don't 12828 20:06:30,920 --> 20:06:36,920 get any errors if it already exists. And so then we can write some code. I just want to show you 12829 20:06:36,920 --> 20:06:44,920 what this does if we run it. So our target directory data slash pizza steak sushi does not exist. 12830 20:06:44,920 --> 20:06:51,560 It's creating one. So then we have now data and inside pizza steak sushi. Wonderful. But we're 12831 20:06:51,560 --> 20:06:55,640 going to fill this up with some images so that we have some data to work with. And then the whole 12832 20:06:55,640 --> 20:07:02,200 premise of this entire section will be loading this data of just images into PyTorch so that we 12833 20:07:02,200 --> 20:07:06,600 can build a computer vision model on it. But I just want to stress that this step will be very 12834 20:07:06,600 --> 20:07:11,480 similar no matter what data you're working with. You'll have some folder over here or maybe it'll 12835 20:07:11,480 --> 20:07:16,040 live on the cloud somewhere. Who knows wherever your data is, but you'll want to write code to 12836 20:07:16,040 --> 20:07:24,760 load it from here into PyTorch. So let's download pizza steak and sushi data. So I'm going to use 12837 20:07:24,760 --> 20:07:32,280 width. I'll just X over here. So we have more screen space with open. I'm going to open the data 12838 20:07:32,280 --> 20:07:39,800 path slash the file name that I'm trying to open, which will be pizza steak sushi dot zip. And I'm 12839 20:07:39,800 --> 20:07:47,160 going to write binary as F. So this is essentially saying I'm doing this in advance because I know 12840 20:07:47,160 --> 20:07:54,360 I'm going to download this folder here. So I know the the file name of it, pizza steak sushi dot zip. 12841 20:07:54,360 --> 20:08:04,040 I'm going to download that into Google collab and I want to open it up. So request equals request 12842 20:08:04,040 --> 20:08:13,320 dot get. And so when I want to get this file, I can click here. And then if I click download, 12843 20:08:13,880 --> 20:08:19,880 it's going to what do you think it's going to do? Well, let's see. If I wanted to download it 12844 20:08:19,880 --> 20:08:25,800 locally, I could do that. And then I could come over here. And then I could click upload if I 12845 20:08:25,800 --> 20:08:30,760 wanted to. So upload the session storage. I could upload it from that. But I prefer to write code 12846 20:08:30,760 --> 20:08:35,240 so that I could just run this cell over again and have the file instead of being download to 12847 20:08:35,240 --> 20:08:41,480 my local computer. It just goes straight into Google collab. So to do that, we need the URL 12848 20:08:42,040 --> 20:08:47,160 from here. And I'm just going to put that in there. It needs to be as a string. 12849 20:08:49,160 --> 20:08:57,080 Excuse me. I'm getting trigger happy on the shift and enter. Wonderful. So now I've got a request 12850 20:08:57,080 --> 20:09:04,040 to get the content that's in here. 
And GitHub can't really show this because this is a zip file 12851 20:09:04,040 --> 20:09:11,320 of images, spoiler alert. Now let's keep going. We're going to print out that we're downloading 12852 20:09:11,320 --> 20:09:20,680 pizza, stake and sushi data dot dot dot. And then I'm going to write to file the request dot content. 12853 20:09:21,400 --> 20:09:26,760 So the content of the request that I just made to GitHub. So that's request is here. 12854 20:09:26,760 --> 20:09:32,120 Using the Python request library to get the information here from GitHub. This URL could be 12855 20:09:32,120 --> 20:09:38,280 wherever your file has been stored. And then I'm going to write the content of that request 12856 20:09:38,280 --> 20:09:46,040 to my target file, which is this. This here. So if I just copy this, I'm going to write the data 12857 20:09:46,040 --> 20:09:55,400 to here data path slash pizza, stake sushi zip. And then because it's a zip file, I want to unzip it. 12858 20:09:55,400 --> 20:10:03,720 So unzip pizza, stake sushi data. Let's go with zip file. So we imported zip file up there, 12859 20:10:03,720 --> 20:10:09,480 which is a Python library to help us deal with zip files. We're going to use zip file dot zip 12860 20:10:09,480 --> 20:10:13,960 file. We're going to pass it in the data path. So just the path that we did below, 12861 20:10:14,680 --> 20:10:23,320 data path slash pizza, stake sushi dot zip. And this time, instead of giving it right permissions, 12862 20:10:23,320 --> 20:10:29,080 so that's what wb stands for, stands for right binary. I'm going to give it read permissions. 12863 20:10:29,080 --> 20:10:35,880 So I want to read this target file instead of writing it. And I'm going to go as zip ref. 12864 20:10:36,600 --> 20:10:40,520 We can call this anything really, but zip ref is kind of, you'll see this a lot in 12865 20:10:41,320 --> 20:10:48,440 different Python examples. So we're going to print out again. So unzipping pizza, stake, 12866 20:10:48,440 --> 20:10:59,560 and sushi data. Then we're going to go zip underscore ref dot extract all. And we're going to go image 12867 20:10:59,560 --> 20:11:06,840 path. So what this means is it's taking the zip ref here. And it's extracting all of the 12868 20:11:06,840 --> 20:11:14,360 information that's within that zip ref. So within this zip file, to the image path, 12869 20:11:14,360 --> 20:11:21,320 which is what we created up here. So if we have a look at image path, let's see that. 12870 20:11:22,600 --> 20:11:29,960 Image path. Wonderful. So that's where all of the contents of that zip file are going to go 12871 20:11:29,960 --> 20:11:37,240 into this file. So let's see it in action. You're ready. Hopefully it works. Three, two, one, run. 12872 20:11:37,240 --> 20:11:45,560 File is not a zip file. Oh, no, what do we get wrong? So did I type this wrong? 12873 20:11:47,560 --> 20:11:58,440 Got zip data path. Oh, we got the zip file here. Pizza, stake, sushi, zip, read data path. 12874 20:11:59,800 --> 20:12:03,480 Okay, I found the error. So this is another thing that you'll have to keep in mind. 12875 20:12:03,480 --> 20:12:08,600 And I believe we've covered this before, but I like to keep the errors in these videos so that 12876 20:12:08,600 --> 20:12:12,680 you can see where I get things wrong, because you never write code right the first time. 12877 20:12:13,240 --> 20:12:18,600 So we have this link in GitHub. We have to make sure that we have the raw link address. 
So if I 12878 20:12:18,600 --> 20:12:24,760 come down to here and copy the link address from the download button, you'll notice a slight 12879 20:12:24,760 --> 20:12:29,720 difference if we come back into here. So I'm just going to copy that there. So if we step 12880 20:12:29,720 --> 20:12:35,960 through this GitHub, Mr. D Burke pytorch deep learning, we have raw instead of blob. So that 12881 20:12:35,960 --> 20:12:41,960 is why we've had an error is that our code is correct. It's just downloading the wrong data. 12882 20:12:42,680 --> 20:12:47,080 So let's change this to the raw. So just keep that in mind, you must have raw here. 12883 20:12:47,880 --> 20:12:49,320 And so let's see if this works. 12884 20:12:52,600 --> 20:12:56,440 Do we have the correct data? Oh, we might have to delete this. Oh, there we go. 12885 20:12:56,440 --> 20:13:03,720 Test. Beautiful. Train. Pizza steak sushi. Wonderful. So it looks like we've got some data. And if we 12886 20:13:03,720 --> 20:13:09,560 open this up, what do we have? We have various JPEGs. Okay. So this is our testing data. And if 12887 20:13:09,560 --> 20:13:15,640 we click on there, we've got an image of pizza. Beautiful. So we're going to explore this a 12888 20:13:15,640 --> 20:13:21,080 little bit more in the next video. But that is some code that we've written to download data sets 12889 20:13:21,080 --> 20:13:27,720 or download our own custom data set. Now, just recall that we are working specifically on a pizza 12890 20:13:27,720 --> 20:13:33,880 steak and sushi problem for computer vision. However, our whole premise is that we have some 12891 20:13:33,880 --> 20:13:38,760 custom data. And we want to convert these. How do we get these into tenses? That's what we want 12892 20:13:38,760 --> 20:13:45,560 to do. And so the same process will be for your own problems. We'll be loading a target data set 12893 20:13:45,560 --> 20:13:51,000 and then writing code to convert whatever the format the data set is in into tenses for PyTorch. 12894 20:13:52,120 --> 20:13:55,560 So I'll see you in the next video. Let's explore the data we've downloaded. 12895 20:14:00,360 --> 20:14:06,040 Welcome back. In the last video, we wrote some code to download a target data set, our own custom 12896 20:14:06,040 --> 20:14:13,000 data set from the PyTorch deep learning data directory. And if you'd like to see how that 12897 20:14:13,000 --> 20:14:18,040 data set was made, you can go to PyTorch deep learning slash extras. It's going to be in the 12898 20:14:18,040 --> 20:14:24,600 custom data creation notebook here for 04. So I've got all the code there. All we've done is take 12899 20:14:24,600 --> 20:14:31,000 data from the food 101 data set, which you can download from this website here, or from torch 12900 20:14:31,000 --> 20:14:40,120 vision. So if we go to torch vision, food 101. We've got the data set built into PyTorch there. 12901 20:14:40,120 --> 20:14:46,680 So I've used that data set from PyTorch and broken it down from 101 classes to three classes so that 12902 20:14:46,680 --> 20:14:52,680 we can start with a small experiment. So there we go. Get the training data, data sets food 101, 12903 20:14:52,680 --> 20:15:00,280 and then I've customized it to be my own style. So if we go back to CoLab, we've now got 12904 20:15:00,280 --> 20:15:04,920 pizza steak sushi, a test folder, which will be our testing images, and a train folder, 12905 20:15:04,920 --> 20:15:10,840 which will be our training images. 
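For reference, here's the complete download-and-unzip cell from the last video sketched in one place, with the corrected raw URL. The URL below is the raw download link for the course's pizza_steak_sushi.zip at the time of writing; double-check it against the ground truth notebook if the repo layout has changed.

```python
import zipfile
from pathlib import Path

import requests

# Setup a path to a data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, create it
if image_path.is_dir():
    print(f"{image_path} directory already exists.")
else:
    print(f"{image_path} does not exist, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)

# Download pizza, steak and sushi data (note the "raw" GitHub link, not the "blob" page)
with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    print("Downloading pizza, steak and sushi data...")
    f.write(request.content)

# Unzip pizza, steak and sushi data into the image folder
with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping pizza, steak and sushi data...")
    zip_ref.extractall(image_path)
```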
This data is in standard image classification format, but we'll cover that in a second. All we're going to do in this video is kick off section number two, which is becoming one with the data, which is one of my favorite ways to refer to data preparation and data exploration. So we're becoming one with the data, and I'd just like to show you one of my favorite quotes, from Abraham Lossfunction: "If I had eight hours to build a machine learning model, I'd spend the first six hours preparing my dataset." And that's what we're going to do. Abraham Lossfunction sounds like he knows what is going on. But since we've just downloaded some data, let's explore it, hey, and we'll write some code now to walk through each of the directories. How you explore your data will depend on what data you've got. We've got a fair few different directories here, with a fair few different folders within them, so how about we walk through each of these directories and see what's going on? If you have visual data, you probably want to visualize an image, so we're going to do that in a second too. Let's write a little docstring for this helper function: "Walks through dir_path returning its contents." Now, just in case you didn't know, Abraham Lossfunction does not exist as far as I know; I did make up that quote. So we're going to use the os.walk function, and we're going to pass it our dir_path. And what does walk do? We can get the docstring here: "Directory tree generator. For each directory in the directory tree rooted at top, including top itself but excluding '.' and '..', yields a 3-tuple: dirpath, dirnames, filenames." You can step through this in the Python documentation if you'd like, but essentially, it's just going to go through our target directory, which in this case will be this one here, and walk through each of these directories, printing out some information about each one. So let's see that in action. This is one of my favorite things to do if we're working with standard image classification format data. So: there are len(dirnames) directories, and let's go len(filenames) (we say it like "length", but it's spelled len) images in, let's put in here, dirpath. A little bit confusing if you've never used walk before, but it's so exciting to see all of the information in all of your directories. Oh, we haven't run it yet. Let's check out our function now, walk_through_dir, and we're going to pass it the image path, which is what? Well, it's going to show us.
12932 20:18:05,960 --> 20:18:11,800 How beautiful. So let's compare what we've got in our printout here. There are two directories 12933 20:18:11,800 --> 20:18:17,480 and zero images in data, pizza, steak sushi. So this one here, there's zero images, but there's 12934 20:18:17,480 --> 20:18:24,520 two directories test and train wonderful. And there are three directories in data, pizza, steak, sushi, 12935 20:18:24,520 --> 20:18:31,720 test. Yes, that looks correct. Three directories, pizza, steak, sushi. And then we have zero 12936 20:18:31,720 --> 20:18:38,840 directories and 19 images in pizza, steak, sushi, slash test, steak. We have a look at this. So that 12937 20:18:38,840 --> 20:18:44,760 means there's 19 testing images for steak. Let's have a look at one of them. There we go. Now, 12938 20:18:44,760 --> 20:18:49,880 again, these are from the food 101 data set, the original food 101 data set, which is just a whole 12939 20:18:49,880 --> 20:18:55,400 bunch of images of food, 100,000 of them. There's some steak there. Wonderful. And we're trying to 12940 20:18:55,400 --> 20:19:01,240 build a food vision model to recognize what is in each image. Then if we jump down to here, 12941 20:19:01,240 --> 20:19:07,240 we have three directories in the training directory. So we have pizza, steak, sushi. And then we have 12942 20:19:07,240 --> 20:19:15,880 75 steak images, 72 sushi images and 78 pizza. So slightly different, but very much the same 12943 20:19:15,880 --> 20:19:20,680 numbers. They're not too far off each other. So we've got about 75 or so training images, 12944 20:19:20,680 --> 20:19:26,840 and we've got about 25 or so testing images per class. Now these were just randomly selected 12945 20:19:26,840 --> 20:19:34,840 from the food 101 data set 10% of three different classes. So let's keep pushing forward. And we're 12946 20:19:34,840 --> 20:19:44,440 going to set up our training and test parts. So I just want to show you, we'll just set up this, 12947 20:19:44,440 --> 20:19:52,280 and then I'll just show you the standard image classification setup, image path.train. And we're 12948 20:19:52,280 --> 20:19:57,480 going to go tester. So if you're working on image classification problem, we want to set this up 12949 20:19:57,480 --> 20:20:04,280 as test. And then if we print out the trainer and the tester, this is what we're going to be 12950 20:20:04,280 --> 20:20:09,880 trying to do. We're going to write some code to go, Hey, look at this path for our training images. 12951 20:20:09,880 --> 20:20:17,080 And look at this path for our testing images. And so this is the standard image classification 12952 20:20:17,080 --> 20:20:22,600 data format is that you have your overall data set folder. And then you have a training folder 12953 20:20:22,600 --> 20:20:27,880 dedicated to all of the training images that you might have. And then you have a testing folder 12954 20:20:27,880 --> 20:20:31,880 dedicated to all of the testing images that you might have. And you could have a validation 12955 20:20:31,880 --> 20:20:39,240 data set here as well if you wanted to. But to label each one of these images, the class name 12956 20:20:39,240 --> 20:20:46,680 is the folder name. So all of the pizza images live in the pizza directory, the same for steak, 12957 20:20:46,680 --> 20:20:52,760 and the same for sushi. 
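Here's the walk_through_dir helper and the train/test path setup from above, sketched together (assuming image_path from the download cell):

```python
import os

def walk_through_dir(dir_path):
    """Walks through dir_path returning its contents."""
    for dirpath, dirnames, filenames in os.walk(dir_path):
        print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

walk_through_dir(image_path)

# Setup the training and test paths (standard image classification layout)
train_dir = image_path / "train"
test_dir = image_path / "test"
print(train_dir, test_dir)
```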
So depending on your problem, your own data format will depend on 12958 20:20:52,760 --> 20:20:57,240 whatever you're working on, you might have folders of different text files or folders of 12959 20:20:57,240 --> 20:21:04,440 different audio files. But the premise remains, we're going to be writing code to get our data here 12960 20:21:04,440 --> 20:21:11,000 into tenses for use with PyTorch. And so where does this come from? This image data classification 12961 20:21:11,000 --> 20:21:18,920 format. Well, if we go to the torch vision dot data sets documentation, as you start to work 12962 20:21:18,920 --> 20:21:23,400 with more data sets, you'll start to realize that there are standardized ways of storing 12963 20:21:23,400 --> 20:21:28,440 specific types of data. So if we come down to here, base classes for custom data sets, 12964 20:21:28,440 --> 20:21:33,480 we'll be working towards using this image folder data set. But this is a generic data 12965 20:21:33,480 --> 20:21:40,360 loader where the images are arranged in this way by default. So I've specifically formatted our data 12966 20:21:40,360 --> 20:21:48,280 to mimic the style that this pre built data loading function is for. So we've got a root directory 12967 20:21:48,280 --> 20:21:54,520 here in case of we were classifying dog and cat images, we have root, then we have a dog folder, 12968 20:21:54,520 --> 20:22:00,360 then we have various images. And the same thing for cat, this would be dog versus cat. But the only 12969 20:22:00,360 --> 20:22:06,120 difference for us is that we have food images, and we have pizza steak sushi. If we wanted to use the 12970 20:22:06,120 --> 20:22:12,680 entire food 101 data set, we would have 101 different folders of images here, which is totally 12971 20:22:12,680 --> 20:22:18,440 possible. But to begin with, we're keeping things small. So let's keep pushing forward. As I said, 12972 20:22:18,440 --> 20:22:22,760 we're dealing with a computer vision problem. So what's another way to explore our data, 12973 20:22:22,760 --> 20:22:28,440 other than just walking through the directories themselves. Let's visualize an image, hey? But 12974 20:22:28,440 --> 20:22:33,640 we've done that before with just clicking on the file. How about we write some code to do so. 12975 20:22:35,400 --> 20:22:38,840 We'll replicate this but with code. I'll see you in the next video. 12976 20:22:42,840 --> 20:22:49,800 Welcome back. In the last video, we started to become one with the data. And we learned that we 12977 20:22:49,800 --> 20:22:56,680 have about 75 images per training class and about 25 images per testing class. And we also learned 12978 20:22:56,680 --> 20:23:04,440 that the standard image classification data structure is to have the steak images within the steak 12979 20:23:04,440 --> 20:23:09,720 folder of the training data set and the same for test, and the pizza images within the pizza 12980 20:23:09,720 --> 20:23:14,920 folder, and so on for each different image classification class that we might have. 12981 20:23:14,920 --> 20:23:19,720 So if you want to create your own data set, you might format it in such a way that your training 12982 20:23:19,720 --> 20:23:25,000 images are living in a directory with their classification name. 
So if you wanted to classify 12983 20:23:25,000 --> 20:23:30,360 photos of dogs and cats, you might create a training folder of train slash dog train slash 12984 20:23:30,360 --> 20:23:37,080 cat, put images of dogs in the dog folder, images of cats in the cat folder, and then the same for 12985 20:23:37,080 --> 20:23:42,040 the testing data set. But the premise remains, I'm going to sound like a broken record here. 12986 20:23:42,040 --> 20:23:48,120 We want to get our data from these files, whatever files they may be in, whatever data structure 12987 20:23:48,120 --> 20:23:53,080 they might be in, into tensors. But before we do that, let's keep becoming one with the data. 12988 20:23:53,080 --> 20:23:59,880 And we're going to visualize an image. So visualizing an image, and you know how much I love randomness. 12989 20:24:00,520 --> 20:24:07,480 So let's select a random image from all of the files that we have in here. And let's plot it, 12990 20:24:07,480 --> 20:24:11,720 hey, because we could just click through them and visualize them. But I like to do things with 12991 20:24:11,720 --> 20:24:20,760 code. So specifically, let's plan this out. Let's write some code to, number one, get all 12992 20:24:20,760 --> 20:24:28,760 of the image paths. We'll see how we can do that with the pathlib library. We then want to 12993 20:24:28,760 --> 20:24:36,360 pick a random image path. We can use Python's random for that. Python's random.choice will 12994 20:24:36,360 --> 20:24:45,240 pick a single image path. Then we want to get the image class name. And this is where 12995 20:24:45,240 --> 20:24:51,480 pathlib comes in handy. Class name, recall that whichever target image we pick, the class name will 12996 20:24:51,480 --> 20:24:56,600 be whichever directory that it's in. So in the case of if we picked a random image from this directory, 12997 20:24:57,320 --> 20:25:04,840 the class name would be pizza. So we can do that using, I think it's going to be pathlib.Path. 12998 20:25:04,840 --> 20:25:09,240 And then we'll get the parent folder, wherever that image lives. So the parent 12999 20:25:09,240 --> 20:25:15,320 folder, that is, the parent directory of our target random image. And we're going to get the stem of that. 13000 20:25:15,960 --> 20:25:23,000 So we have stem, stem is the last little bit here. Number four, what should we do? Well, 13001 20:25:23,000 --> 20:25:30,280 we want to open the image. So since we're working with images, let's open the image 13002 20:25:31,160 --> 20:25:38,360 with Python's PIL, which is the Python Imaging Library, though what we'll actually be using is Pillow. So if we go Python 13003 20:25:38,360 --> 20:25:45,240 Pillow, a little bit confusing when I started to learn about Python image manipulation. So Pillow 13004 20:25:45,240 --> 20:25:53,800 is a friendly PIL fork, but it's still imported as PIL. So just think of Pillow as a way to process 13005 20:25:53,800 --> 20:26:01,160 images with Python. So PIL is the Python Imaging Library by Fredrik Lundh, and Alex Clark and 13006 20:26:01,160 --> 20:26:09,880 contributors have created Pillow. So thank you, everyone. And let's go to number five. What do 13007 20:26:09,880 --> 20:26:14,120 we want to do as well? We want to, yeah, let's get some metadata about the image. We'll then show 13008 20:26:14,120 --> 20:26:22,200 the image and print metadata. Wonderful. So let's import random, because machine learning is all 13009 20:26:22,200 --> 20:26:28,120 about harnessing the power of randomness.
And I like to use randomness to explore data as well 13010 20:26:28,120 --> 20:26:36,840 as model it. So let's set the seed. So we get the same image on both of our ends. So random dot seed. 13011 20:26:38,280 --> 20:26:43,320 I'm going to use 42. You can use whatever you'd like. But if you'd like to get the same image as me, 13012 20:26:43,320 --> 20:26:54,200 I'd suggest using 42 as well. Now let's get all the image paths. So we can do this because our image 13013 20:26:54,200 --> 20:27:00,680 path list, we want to get our image path. So recall that our image path 13014 20:27:02,920 --> 20:27:08,520 is this. So this folder here, I'm just going to close all this. So this is our image path, 13015 20:27:08,520 --> 20:27:13,240 this folder here, you can also go copy path if you wanted to, we're just going to get something 13016 20:27:13,240 --> 20:27:20,440 very similar there. That's going to error out. So I'll just comment that. So it doesn't error. 13017 20:27:20,440 --> 20:27:28,040 That's our path. But we're going to keep it in the POSIX path format. And we can go list. Let's 13018 20:27:28,040 --> 20:27:34,440 create a list of image path dot glob, which stands for grab. I don't actually know what glob stands 13019 20:27:34,440 --> 20:27:42,520 for. But to me, it's like glob together. All of the images that are all of the files that suit 13020 20:27:42,520 --> 20:27:48,520 a certain pattern. So glob together for me means stick them all together. And you might be able 13021 20:27:48,520 --> 20:27:54,120 to correct me if I've got the wrong meaning there. I'd appreciate that. And so we're going to pass 13022 20:27:54,120 --> 20:28:02,840 in a certain combination. So we want star slash star. And then we want star dot jpg. Now why are 13023 20:28:02,840 --> 20:28:09,720 we doing this? Well, because we want every image path. So star is going to be this first 13024 20:28:10,680 --> 20:28:16,920 directory here. So any combination, it can be train or test. And then this star means anything for 13025 20:28:16,920 --> 20:28:24,600 what's inside tests. And let's say this first star is equal to test. This second star is equal to 13026 20:28:24,600 --> 20:28:30,280 anything here. So it could be any of pizza, steak or sushi. And then finally, this star, 13027 20:28:30,280 --> 20:28:37,240 let's say it was test pizza. This star is anything in here. And that is before dot jpg. 13028 20:28:37,800 --> 20:28:42,920 So it could be any one of these files here. Now this will make more sense once we print it out. 13029 20:28:42,920 --> 20:28:50,920 So image path list, let's have a look. There we go. So now we've got a list of every single image 13030 20:28:50,920 --> 20:28:57,800 that's within pizza steak sushi. And this is just another way that I like to visualize data is to 13031 20:28:57,800 --> 20:29:03,000 just get all of the paths and then randomly visualize it, whether it be an image or text or 13032 20:29:03,000 --> 20:29:08,360 audio, you might want to randomly listen to it. Recall that each each of the domain libraries have 13033 20:29:08,360 --> 20:29:13,800 different input and output methods for different data sets. So if we come to torch vision, we have 13034 20:29:13,800 --> 20:29:21,160 utils. So we have different ways to draw on images, reading and writing images and videos. So we 13035 20:29:21,160 --> 20:29:27,640 could load an image via read image, we could decode it, we could do a whole bunch of things. 13036 20:29:27,640 --> 20:29:33,560 I'll let you explore that as extra curriculum. 
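A minimal sketch of the glob being dictated above, assuming the image_path variable from the earlier path setup:

import random
from pathlib import Path

random.seed(42)  # so we pick the same random image each run

image_path = Path("data/pizza_steak_sushi")  # assumed data folder from earlier

# "*/*/*.jpg" matches (train or test) / (pizza, steak or sushi) / any .jpg file
image_path_list = list(image_path.glob("*/*/*.jpg"))
print(len(image_path_list))
print(image_path_list[:3])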
But now let's select a random image from here 13037 20:29:33,560 --> 20:29:42,360 and plot it. So we'll go number two, which was our step up here, pick a random image. So pick a 13038 20:29:42,360 --> 20:29:51,880 random image path. Let's get rid of this. And so we can go random image path equals random 13039 20:29:51,880 --> 20:29:58,440 dot choice, harness the power of randomness to explore our data. Let's get a random image from 13040 20:29:58,440 --> 20:30:02,840 image path list, and then we'll print out random image path, which one was our lucky image that 13041 20:30:02,840 --> 20:30:12,360 we selected. Beautiful. So we have a test pizza image is our lucky random image. And 13042 20:30:13,960 --> 20:30:18,200 because we've got a random seed, it's going to be the same one each time. Yes, it is. 13043 20:30:19,080 --> 20:30:22,840 And if we comment out the random seed, we'll get a different one each time. We've got a stake 13044 20:30:22,840 --> 20:30:29,720 image. We've got another stake image. Another stake image. Oh, three in a row, four in a row. 13045 20:30:29,720 --> 20:30:34,600 Oh, pizza. Okay, let's keep going. So we'll get the image class 13046 20:30:36,520 --> 20:30:44,760 from the path name. So the image class is the name of the directory, because our image data is 13047 20:30:44,760 --> 20:30:52,600 in standard image classification format, where the image is stored. So let's do that image class 13048 20:30:52,600 --> 20:31:03,480 equals random image path dot parent dot stem. And then we're going to print image class. What do we 13049 20:31:03,480 --> 20:31:12,120 get? So we've got pizza. Wonderful. So the parent is this folder here. And then the stem is the end 13050 20:31:12,120 --> 20:31:17,400 of that folder, which is pizza. Beautiful. Well, now what are we up to now? We're working with 13051 20:31:17,400 --> 20:31:22,440 images. Let's open up the image so we can open up the image using pill. We could also open up the 13052 20:31:22,440 --> 20:31:29,320 image with pytorch here. So with read image, but we're going to use pill to keep things a little 13053 20:31:29,320 --> 20:31:37,000 bit generic for now. So open image, image equals image. So from pill import image, and the image 13054 20:31:37,000 --> 20:31:42,040 class has an open function. And we're just going to pass it in here, the random image path. Note 13055 20:31:42,040 --> 20:31:48,360 if this is corrupt, if your images corrupt, this may error. So then you could potentially use this 13056 20:31:48,360 --> 20:31:55,400 to clean up your data set. I've imported a lot of images with image dot open of our target data 13057 20:31:55,400 --> 20:32:00,360 set here. I don't believe any of them are corrupt. But if they are, please let me know. And we'll find 13058 20:32:00,360 --> 20:32:06,440 out later on when our model tries to train on it. So let's print some metadata. So when we open our 13059 20:32:06,440 --> 20:32:15,000 image, we get some information from it. So let's go our random image path is what? Random image path. 13060 20:32:15,000 --> 20:32:22,440 We're already printing this out, but we'll do it again anyway. And then we're going to go the image 13061 20:32:22,440 --> 20:32:32,040 class is equal to what will be the image class. Wonderful. And then we can print out, we can get 13062 20:32:32,040 --> 20:32:37,320 some metadata about our images. So the image height is going to be IMG dot height. We get that 13063 20:32:37,320 --> 20:32:43,320 metadata from using the pill library. 
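Putting the steps narrated here together, roughly (this assumes the image_path_list built in the previous sketch):

import random
from PIL import Image

# 2. pick a random image path
random_image_path = random.choice(image_path_list)

# 3. the class name is the parent directory the image lives in
image_class = random_image_path.parent.stem

# 4. open the image with Pillow and 5. print some metadata
img = Image.open(random_image_path)
print(f"Random image path: {random_image_path}")
print(f"Image class: {image_class}")
print(f"Image height: {img.height} | Image width: {img.width}")
img  # displays the image in a notebook cell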
And then we're going to print out image width. And we'll get 13064 20:32:43,320 --> 20:32:50,680 IMG dot width. And then we'll print the image itself. Wonderful. And we can get rid of this, 13065 20:32:50,680 --> 20:32:55,240 and we can get rid of this. Let's now have a look at some random images from our data set. 13066 20:32:59,240 --> 20:33:05,000 Lovely. We've got an image of pizza there. Now I will warn you that the downsides of working with 13067 20:33:05,000 --> 20:33:10,680 food data is it does make you a little bit hungry. So there we've got some sushi. And then we've got 13068 20:33:10,680 --> 20:33:22,200 some more sushi. Some steak. And we have a steak, we go one more for good luck. And we finish off 13069 20:33:22,200 --> 20:33:25,880 with some sushi. Oh, that could be a little bit confusing to me. I thought that might be steak 13070 20:33:25,880 --> 20:33:31,320 to begin with. And this is the scene. Now we'll do one more. Why it's important to sort of visualize 13071 20:33:31,320 --> 20:33:35,400 your images randomly, because you never know what you're going to come across. And this way, 13072 20:33:35,400 --> 20:33:39,560 once we visualize enough images, you could do this a hundred more times. You could do this 13073 20:33:39,560 --> 20:33:45,000 20 more times until you feel comfortable to go, Hey, I feel like I know enough about the data now. 13074 20:33:45,000 --> 20:33:50,760 Let's see how well our model goes on this sort of data. So I'll finish off on this steak image. 13075 20:33:50,760 --> 20:33:56,200 And now I'll set your little challenge before the next video is to visualize an image like we've 13076 20:33:56,200 --> 20:34:03,960 done here. But this time do it with matplotlib. So try to visualize an image with matplotlib. 13077 20:34:03,960 --> 20:34:09,640 That's your little challenge before the next video. So give that a go. We want to do a random 13078 20:34:09,640 --> 20:34:14,760 image as well. So quite a similar set up to this. But instead of printing out things like this, 13079 20:34:14,760 --> 20:34:20,120 we want to visualize it using matplotlib. So try that out and we'll do it together in the next video. 13080 20:34:24,680 --> 20:34:30,760 Oh, we are well on the way to creating our own PyTorch custom data set. We've started to 13081 20:34:30,760 --> 20:34:37,800 become one with the data. But now let's continue to visualize another image. I set you the challenge 13082 20:34:37,800 --> 20:34:43,320 in the last video to try and replicate what we've done here with the pill library with matplotlib. 13083 20:34:43,320 --> 20:34:49,240 So now let's give it a go. Hey, and why use matplotlib? Well, because matplotlib and I'm going to 13084 20:34:49,240 --> 20:34:53,800 import numpy as well, because we're going to have to convert this image into an array. That was a 13085 20:34:53,800 --> 20:34:59,560 little trick that I didn't quite elaborate on. But I hope you tried to decode it out and figure 13086 20:34:59,560 --> 20:35:06,760 it out from the errors you received. But matplotlib is one of the most fundamental data science 13087 20:35:06,760 --> 20:35:11,000 libraries. So you're going to see it everywhere. So it's just important to be aware of how to plot 13088 20:35:11,000 --> 20:35:21,720 images and data with matplotlib. So turn the image into an array. So we can go image as array. And 13089 20:35:21,720 --> 20:35:29,400 I'm going to use the numpy method NP as array. 
We're going to pass it in the image, recall that 13090 20:35:29,400 --> 20:35:34,920 the image is the same image that we've just set up here. And we've already opened it with PIL. 13091 20:35:36,440 --> 20:35:46,200 And then I'm going to plot the image. So plot the image with matplotlib. plt.figure. 13092 20:35:46,200 --> 20:35:56,440 And then we can go figsize equals 10, 7. And then we're going to go plt.imshow, image as 13093 20:35:56,440 --> 20:36:03,480 array, pass it in the array of numbers. I'm going to set the title here as an f string. And then 13094 20:36:03,480 --> 20:36:11,400 I'm going to pass in image class, equals image class. Then I'm going to pass in image shape. So 13095 20:36:11,400 --> 20:36:15,240 we can get the shape here. Now this is another important thing to be aware of with your different 13096 20:36:15,240 --> 20:36:20,840 datasets when you're exploring them: what is the shape of your data? Because what's one of the 13097 20:36:20,840 --> 20:36:25,880 main errors in machine learning and deep learning? It's shape mismatch issues. So if we know the 13098 20:36:25,880 --> 20:36:31,240 shape of our data, well, we can start to go, okay, I kind of understand what shape I need my model 13099 20:36:31,240 --> 20:36:36,760 layers to be in and what shape I need my other data to be in. And I'm going to turn the axes off 13100 20:36:36,760 --> 20:36:44,520 here. Beautiful. So look at what we've got. Now I've just thrown this in here without really 13101 20:36:44,520 --> 20:36:49,720 explaining it. But we've seen this before in the computer vision section. Ah, our image shape is 13102 20:36:49,720 --> 20:36:59,800 512, 306, 3. Now the dimensions here are height is 512 pixels. The width is 306 pixels. And it has 13103 20:36:59,800 --> 20:37:08,040 three color channels. So what format is this? This is color channels last, which is the default 13104 20:37:08,040 --> 20:37:14,840 for the PIL library. It's also the default for matplotlib. But PyTorch, recall, defaults to having 13105 20:37:14,840 --> 20:37:20,360 the color channels at the start, color channels first. Now there is a lot of debate as 13106 20:37:20,360 --> 20:37:24,360 I've said over which is the best order. It looks like it's leaning towards going towards this. But 13107 20:37:24,360 --> 20:37:30,120 for now PyTorch defaults to color channels first. But that's okay. Because we can manipulate these 13108 20:37:30,120 --> 20:37:36,200 dimensions to what we need for whatever code that we're writing. And the three color channels are, what, 13109 20:37:36,200 --> 20:37:41,400 red, green and blue. So if you combine red, green and blue in some way, shape or form, 13110 20:37:41,400 --> 20:37:47,640 you get the different colors here that represent our image. And so if we have a look at our image 13111 20:37:47,640 --> 20:37:59,480 as array, our image is in numerical format. Wonderful. So okay. We've got one way to do this for 13112 20:37:59,480 --> 20:38:07,800 one image. I think we'll start moving towards scaling this up to do it for every image in our data 13113 20:38:07,800 --> 20:38:13,240 folder. So let's just finish off this video by visualizing one more image. What do we get? Same 13114 20:38:13,240 --> 20:38:19,240 premise. The image is now as an array, different numerical values. We've got a delicious looking 13115 20:38:19,240 --> 20:38:28,440 pizza here of shape 512, 512 with color channels last. And we've got the same thing up here.
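And roughly what the matplotlib version described here might look like, reusing the img and image_class variables from the previous sketch:

import numpy as np
import matplotlib.pyplot as plt

# turn the PIL image into a NumPy array so matplotlib can plot it
img_as_array = np.asarray(img)

plt.figure(figsize=(10, 7))
plt.imshow(img_as_array)
plt.title(f"Image class: {image_class} | Image shape: {img_as_array.shape} -> [height, width, color_channels]")
plt.axis(False)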
So 13116 20:38:28,440 --> 20:38:33,640 that is one way to become one with the data is to visualize different images, especially random 13117 20:38:33,640 --> 20:38:38,040 images. You could do the same thing visualizing different text samples that you're working with 13118 20:38:38,040 --> 20:38:43,480 or listening to different audio samples. It depends what domain you're working in. So now in the 13119 20:38:43,480 --> 20:38:49,880 next video, let's start working towards turning all of the images in here. Now that we visualize 13120 20:38:49,880 --> 20:38:54,600 some of them and become one with the data, we've seen that the shapes are varying in terms of 13121 20:38:54,600 --> 20:38:59,000 height and width. But they all look like they have three color channels because we have color images. 13122 20:38:59,640 --> 20:39:04,680 But now we want to write code to turn all of these images into pytorch tenses. 13123 20:39:05,480 --> 20:39:09,080 So let's start moving towards that. I'll see you in the next video. 13124 20:39:12,600 --> 20:39:18,920 Hello and welcome back. In the last video, we converted an image to a NumPy array. 13125 20:39:18,920 --> 20:39:25,400 And we saw how an image can be represented as an array. But what if we'd like to get this image 13126 20:39:25,400 --> 20:39:33,160 from our custom data set over here, pizza steak sushi into pytorch? Well, let's cover that in 13127 20:39:33,160 --> 20:39:39,640 this video. So I'm going to create a new heading here. And it's going to be transforming data. 13128 20:39:40,280 --> 20:39:45,160 And so what we'd like to do here is I've been hinting at the fact the whole time is we want 13129 20:39:45,160 --> 20:39:50,280 to get our data into tensor format, because that is the data type that pytorch accepts. 13130 20:39:50,920 --> 20:39:59,320 So let's write down here before we can use our image data with pytorch. Now this goes for images, 13131 20:39:59,320 --> 20:40:05,720 other vision data, it goes for text, it goes to audio, basically whatever kind of data set you're 13132 20:40:05,720 --> 20:40:12,760 working with, you need some way to turn it into tenses. So that's step number one. Turn your target 13133 20:40:12,760 --> 20:40:23,160 data into tenses. In our case, it's going to be a numerical representation of our images. 13134 20:40:24,600 --> 20:40:35,240 And number two is turn it into a torch dot utils dot data dot data set. So recall from a previous 13135 20:40:35,240 --> 20:40:43,880 video that we've used the data set to house all of our data in tensor format. And then subsequently, 13136 20:40:43,880 --> 20:40:54,680 we've turned our data sets, our pytorch data sets into torch dot utils dot data dot data loader. 13137 20:40:55,240 --> 20:41:02,440 And a data loader creates an iterable or a batched version of our data set. So for short, we're going 13138 20:41:02,440 --> 20:41:11,960 to call these data set and data loader. Now, as I discussed previously, if we go to the pytorch 13139 20:41:11,960 --> 20:41:19,960 documentation torch vision for torch vision, this is going to be quite similar for torch audio torch 13140 20:41:19,960 --> 20:41:25,880 text, torch rec torch data eventually when it comes out of beta, there are different ways to 13141 20:41:25,880 --> 20:41:33,400 create such data sets. So we can go into the data sets module, and then we can find built-in data 13142 20:41:33,400 --> 20:41:43,400 sets, and then also base classes for custom data sets. 
But if we go into here, image folder, 13143 20:41:43,400 --> 20:41:47,400 there's another parameter I'd like to show you, and this is going to be universal across many of 13144 20:41:47,400 --> 20:41:54,040 your different data types is the transform parameter. Now, the transform parameter is 13145 20:41:54,040 --> 20:42:01,560 a parameter we can use to pass in some transforms on our data. So when we load our data sets from an 13146 20:42:01,560 --> 20:42:08,600 image folder, it performs a transform on those data samples that we've sent in here as the target 13147 20:42:08,600 --> 20:42:13,960 data folder. Now, this is a lot more easier to understand through illustration, rather than just 13148 20:42:13,960 --> 20:42:20,280 talking about it. So let's create a transform. And the main transform we're going to be doing is 13149 20:42:20,280 --> 20:42:24,840 transforming our data, and we're turning it into tenses. So let's see what that looks like. So we're 13150 20:42:24,840 --> 20:42:29,880 going to just going to re import all of the main libraries that we're going to use. So from torch 13151 20:42:29,880 --> 20:42:38,600 utils dot data, let's import data loader. And we're going to import from torch vision. I'm going to 13152 20:42:38,600 --> 20:42:47,000 import data sets. And I'm also going to import transforms. Beautiful. And I'm going to create 13153 20:42:47,000 --> 20:42:55,160 another little heading here, this is going to be 3.1, transforming data with torch vision dot 13154 20:42:55,160 --> 20:43:01,560 transform. So the main transform we're looking to here is turning out images from JPEGs. 13155 20:43:04,200 --> 20:43:07,960 If we go into train, and then we go into any folder, we've got JPEG images. 13156 20:43:09,720 --> 20:43:13,320 And we want to turn these into tensor representation. So there's some pizza there. 13157 20:43:13,320 --> 20:43:20,040 We'll get out of this. Let's see what we can do. How about we create a transform here, 13158 20:43:20,760 --> 20:43:27,240 write a transform for image. And let's start off by calling it data transform. 13159 20:43:27,880 --> 20:43:32,840 And I'm going to show you how we can combine a few transforms together. If you want to 13160 20:43:32,840 --> 20:43:38,120 combine transforms together, you can use transforms dot compose. You can also use 13161 20:43:38,120 --> 20:43:45,560 an n dot sequential to combine transforms. But we're going to stick with transforms dot 13162 20:43:45,560 --> 20:43:53,160 compose for now. And it takes a list. And so let's just write out three transforms to begin with. 13163 20:43:53,160 --> 20:43:57,720 And then we can talk about them after we do so. So we want to resize our images 13164 20:43:59,480 --> 20:44:06,200 to 6464. Now, why might we do this? Well, do you recall in the last section computer vision, 13165 20:44:06,200 --> 20:44:13,160 we use the tiny VGG architecture. And what size were the images that the tiny VGG architecture took? 13166 20:44:14,600 --> 20:44:19,480 Well, we replicated the CNN website version or the CNN explainer website version, and they took 13167 20:44:19,480 --> 20:44:25,880 images of size 6464. So perhaps we want to leverage that computer vision model later on. 13168 20:44:25,880 --> 20:44:32,280 So we're going to resize our images to 6464. And then we're going to create another transform. 13169 20:44:32,280 --> 20:44:37,960 And so this is, I just want to highlight how transforms can help you manipulate your data in a 13170 20:44:37,960 --> 20:44:42,840 certain way. 
So if we wanted to flip the images, which is a form of data augmentation, in other 13171 20:44:42,840 --> 20:44:49,560 words, artificially increasing the diversity of our data set, we can flip the images randomly on 13172 20:44:49,560 --> 20:44:59,960 the horizontal. So transforms.RandomHorizontalFlip. And I'm going to put a probability in here 13173 20:44:59,960 --> 20:45:08,360 of p equals 0.5. So that means 50% of the time, if an image goes through this transform pipeline, 13174 20:45:09,000 --> 20:45:13,400 it will get flipped on the horizontal axis. As I said, this makes a lot more sense when we 13175 20:45:13,400 --> 20:45:19,800 visualize it. So we're going to do that very shortly. And finally, we're going to turn the image into 13176 20:45:19,800 --> 20:45:31,240 a torch tensor. So we can do this with transforms.ToTensor. And now where might you find such 13177 20:45:31,240 --> 20:45:37,480 transforms? So this transform here says ToTensor, if we have a look at the doc string, 13178 20:45:37,480 --> 20:45:42,440 we get: convert a PIL Image, which is what we're working with right now, or a NumPy array to a 13179 20:45:42,440 --> 20:45:47,080 tensor. This transform does not support torch script. If you'd like to find out what that is, 13180 20:45:47,080 --> 20:45:51,560 I'd encourage you to read the documentation for that. It's essentially turning your PyTorch code into a 13181 20:45:51,560 --> 20:45:59,320 Python script. It converts a PIL Image or a NumPy array of shape height, width, color channels in the range 13182 20:45:59,320 --> 20:46:06,280 0 to 255, which is what our values are up here. They're from 0 to 255, red, green and blue, 13183 20:46:06,920 --> 20:46:14,200 to a torch float tensor of shape color channels, height, width in the range 0 to 1. So it will 13184 20:46:14,200 --> 20:46:21,400 take our tensor values here or our NumPy array values from 0 to 255 and convert them into a torch 13185 20:46:21,400 --> 20:46:27,240 tensor in the range 0 to 1. We're going to see this later on in action. But this is our first 13186 20:46:27,240 --> 20:46:33,560 transform. So we can pass data through that. In fact, I'd encourage you to try that out. 13187 20:46:34,200 --> 20:46:40,760 See what happens when you pass in data transform. What happens when you pass in our image as 13188 20:46:40,760 --> 20:46:52,280 array? Image as array. Let's see what happens. Hey, oh, img should be PIL Image, got class NumPy 13189 20:46:52,280 --> 20:46:58,120 array. What if we just pass in our straight up image? So this is a PIL image. There we go. 13190 20:46:58,680 --> 20:47:01,880 Beautiful. So if we look at the shape of this, what do we get? 13191 20:47:01,880 --> 20:47:10,200 3, 64, 64. There's 64, 64. And what if we wanted to change this to 224, which is another common value for 13192 20:47:11,800 --> 20:47:17,400 computer vision models, 224, 224. Do you see how powerful this is? This little transforms 13193 20:47:17,400 --> 20:47:23,400 module of the torchvision library. We'll change that back to 64, 64. And then if we have a look at what 13194 20:47:23,400 --> 20:47:31,000 the dtype of our transformed tensor is, we get torch.float32. Beautiful. So now we've got a way to 13195 20:47:31,000 --> 20:47:36,360 transform our images into tensors. But we're still only doing this with one image. 13196 20:47:37,160 --> 20:47:43,640 How about we progress towards doing it for every image in our data folder here?
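Condensed, the transform pipeline dictated here looks roughly like this (passing in the PIL img from earlier; the variable names are mine):

from torchvision import transforms

# resize -> random horizontal flip -> convert to tensor (scales 0-255 to 0.0-1.0, channels first)
data_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor()
])

transformed_image = data_transform(img)  # expects a PIL image, not a NumPy array
print(transformed_image.shape)  # torch.Size([3, 64, 64])
print(transformed_image.dtype)  # torch.float32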
13197 20:47:44,840 --> 20:47:49,560 But before we do that, I'd like to visualize what this looks like. So in the next video, 13198 20:47:49,560 --> 20:47:54,040 let's write some code to visualize what it looks like to transform multiple images at a time. 13199 20:47:54,680 --> 20:47:59,160 And I think it'd be a good idea to compare the transform that we're doing to the original image. 13200 20:47:59,160 --> 20:48:04,040 So I'll see you in the next video. Let's write some visualization code. 13201 20:48:06,840 --> 20:48:13,240 Let's now follow our data explorer's motto of visualizing our transformed images. So we saw what it looks 13202 20:48:13,240 --> 20:48:18,680 like to pass one image through a data transform. And if we wanted to find more documentation on 13203 20:48:18,680 --> 20:48:25,080 torch vision transforms, where could we go? There is a lot of these. So transforming and augmenting 13204 20:48:25,080 --> 20:48:31,000 images, this is actually going to be your extra curriculum for this video. So transforms are 13205 20:48:31,000 --> 20:48:36,440 common image transformations available in the transforms module. They can be chained together 13206 20:48:36,440 --> 20:48:41,400 using compose, which is what we've already done. Beautiful. And so if you'd like to go through all 13207 20:48:41,400 --> 20:48:45,640 of these, there's a whole bunch of different transforms that you can do, including some data 13208 20:48:45,640 --> 20:48:50,200 augmentation transforms. And then if you'd like to see them visually, I'd encourage you to check 13209 20:48:50,200 --> 20:48:55,800 out illustration of transforms. But let's write some code to explore our own transform visually 13210 20:48:55,800 --> 20:49:04,520 first. So I'll leave this as a link. So I'm going up here, right here, transforms 13211 20:49:06,600 --> 20:49:17,720 help you get your images ready to be used with a model slash perform data augmentation. 13212 20:49:17,720 --> 20:49:24,280 Wonderful. So we've got a way to turn images into tenses. That's what we want for our model. 13213 20:49:24,280 --> 20:49:29,560 We want our images as pytorch tenses. The same goes for any other data type that you're working 13214 20:49:29,560 --> 20:49:35,720 with. But now I'd just like to visualize what it looks like if we plot a number of transformed 13215 20:49:35,720 --> 20:49:41,480 images. So we're going to make a function here that takes in some image paths, a transform, 13216 20:49:41,480 --> 20:49:46,440 a number of images to transform at a time and a random seed here, because we're going to harness 13217 20:49:46,440 --> 20:49:53,400 the power of randomness. And sometimes we want to set the seed. Sometimes we don't. So we have 13218 20:49:53,400 --> 20:49:59,640 an image path list that we've created before, which is just all of the image paths that we have 13219 20:49:59,640 --> 20:50:09,000 of our data set. So data, pizza, steak sushi. Now how about we select some random image paths 13220 20:50:09,000 --> 20:50:15,320 and then take the image from that path, run it through our data transform, and then compare the 13221 20:50:15,320 --> 20:50:21,080 original image of what it looks like and the transformed image and what that looks like. 13222 20:50:22,120 --> 20:50:25,640 Let's give it a try, hey? 
So I'm going to write a doc string of what this does, 13223 20:50:26,600 --> 20:50:35,880 and then selects random images from a path of images and loads slash transforms them, 13224 20:50:35,880 --> 20:50:45,240 then plots the original verse, the transformed version. So that's quite a long doc string, 13225 20:50:45,240 --> 20:50:51,800 but that'll be enough. We can put in some stuff for the image paths, transforms, and seed. We'll 13226 20:50:51,800 --> 20:51:00,280 just code this out. Let's go random seed, we'll create the seed. Maybe we do it if seed, random seed. 13227 20:51:00,280 --> 20:51:08,360 Let's put that, and we'll set seed to equal none by default. That way we can, we'll see if this works, 13228 20:51:08,360 --> 20:51:14,280 hey, if in doubt, coded out random image paths, and then we're going to go random sample from the 13229 20:51:14,280 --> 20:51:19,160 image paths and the number of sample that we're going to do. So random sample is going to, this will 13230 20:51:19,160 --> 20:51:25,240 be a list on which part in here that this is a list. So we're going to randomly sample 13231 20:51:25,240 --> 20:51:34,360 k, which is going to be n. So three images from our image path list. And then we're going to go for 13232 20:51:34,360 --> 20:51:40,280 image path, we're going to loop through the randomly sampled image parts. You know how much I love 13233 20:51:40,280 --> 20:51:46,440 harnessing the power of randomness for visualization. So for image path in random image paths, let's 13234 20:51:46,440 --> 20:51:54,920 open up that image using pill image dot open image path as f. And then we're going to create a 13235 20:51:54,920 --> 20:52:02,360 figure and an axes. And we're going to create a subplot with my plot lib. So subplots. And we 13236 20:52:02,360 --> 20:52:13,320 want it to create one row. So it goes n rows and calls. One row and n calls equals two. And then 13237 20:52:13,320 --> 20:52:20,760 on the first or the zeroth axis, we're going to plot the original image. So in show, we're just 13238 20:52:20,760 --> 20:52:27,640 going to pass it straight in f. And then if we want to go x zero, we're going to set the title. So 13239 20:52:27,640 --> 20:52:35,080 set title, we're going to set it to be the original. So we'll create this as an f string, original, 13240 20:52:35,080 --> 20:52:40,840 and then new line will create a size variable. And this is going to be f dot size. So we're just 13241 20:52:40,840 --> 20:52:48,840 getting the size attribute from our file. So we'll keep going, and we'll turn off the axes here. 13242 20:52:48,840 --> 20:52:57,400 So axis, and we're going to set that to false. Now let's transform on the first axes plot. We're 13243 20:52:57,400 --> 20:53:03,720 going to transform and plot target image. This is so that our images are going to be side by side, 13244 20:53:03,720 --> 20:53:08,760 the original and the transformed version. So there's one thing that we're going to have to do. I'll 13245 20:53:08,760 --> 20:53:14,200 just, I'll code it out in a wrong way first. I think that'll be a good way to illustrate what's 13246 20:53:14,200 --> 20:53:23,240 going on. f. So I'm just going to put a note here. Note, we will need to change shape for 13247 20:53:23,240 --> 20:53:29,080 matplotlib, because we're going to come back here. Because what does this do? What have we 13248 20:53:29,080 --> 20:53:35,880 noticed that our transform does? 
If we check the shape here, oh, excuse me, it converts our image 13249 20:53:35,880 --> 20:53:45,240 to color channels first. Whereas matplotlib prefers color channels last. So just keep that 13250 20:53:45,240 --> 20:53:51,240 in mind for when we're going forward. This code, I'm writing it, it will error on purpose. So 13251 20:53:51,800 --> 20:53:58,200 transformed image. And then we're going to go axe one as well. We're going to set the title, 13252 20:53:58,200 --> 20:54:06,040 which is going to be transformed. And then we'll create a new line and we'll say size is going to be 13253 20:54:07,560 --> 20:54:17,400 transformed image dot shape. Or probably a bit of, yeah, we could probably go shape here. And then 13254 20:54:17,400 --> 20:54:23,560 finally, we're going to go axe one, we're going to turn the axis, we're going to set that to false. 13255 20:54:23,560 --> 20:54:28,760 You can also set it to off. So you could write false, or you could write off, you might see that 13256 20:54:28,760 --> 20:54:36,200 different versions of that somewhere. And I'm going to write a super title here, which we'll see what 13257 20:54:36,200 --> 20:54:41,720 this looks like class is going to be image path. So we're getting the target image path. And we're 13258 20:54:41,720 --> 20:54:46,680 just going to get the attribute or the parent attribute, and then the stem attribute from that, 13259 20:54:46,680 --> 20:54:51,160 just like we did before, to get the class name. And then I'm going to set this to a larger font 13260 20:54:51,160 --> 20:54:57,480 size, so that we make some nice looking plots, right? If we're going to visualize our data, 13261 20:54:57,480 --> 20:55:03,240 we might as well make our plots visually appealing. So let's plot some transformed data or transformed 13262 20:55:03,240 --> 20:55:08,520 images. So image paths, we're going to set this to image part list, which is just the variable we 13263 20:55:08,520 --> 20:55:15,000 have down below, which is the part list, a list containing all of our image paths. Our transform, 13264 20:55:15,000 --> 20:55:21,160 we're going to set our transform to be equal to our data transform. So this just means that if 13265 20:55:21,160 --> 20:55:26,920 we pass the transform in, our image is going to go through that transform, and then go through all 13266 20:55:26,920 --> 20:55:31,160 of these is going to be resized, it's going to be randomly horizontally flipped, and it's going to 13267 20:55:31,160 --> 20:55:37,560 be converted to a tensor. And then so we're going to set that data transfer there or data transform, 13268 20:55:37,560 --> 20:55:43,000 sorry, and is going to be three. So we plot three images, and we'll set the seed to 42 to begin with. 13269 20:55:43,000 --> 20:55:52,600 Let's see if this works. Oh, what did we get wrong? We have invalid shape. As I said, I love seeing 13270 20:55:52,600 --> 20:55:57,960 this error, because we have seen this error many times, and we know what to do with it. We know that 13271 20:55:57,960 --> 20:56:02,920 we have to rearrange the shapes of our data in some way, shape or form. Wow, I said shape a lot 13272 20:56:02,920 --> 20:56:07,640 there. That's all right. Let's go here, permute. This is what we have to do. We have to permute, 13273 20:56:07,640 --> 20:56:12,760 we have to swap the order of the axes. So right now, our color channels is first. So we have to 13274 20:56:12,760 --> 20:56:18,360 bring this color channel axis or dimension to the end. So we need to shuffle these across. 
So 64 13275 20:56:18,360 --> 20:56:22,680 into here, 64 into here, and three on the end. We need to, in other words, turn it from color 13276 20:56:22,680 --> 20:56:29,400 channels first to color channels last. So we can do that by permuting it to have the first 13277 20:56:29,400 --> 20:56:35,080 axis come now in the zero dimension spot. And then number two was going to be in the first 13278 20:56:35,080 --> 20:56:40,360 dimension spot. And then number zero was going to be at the back end. So this is essentially going 13279 20:56:40,360 --> 20:56:53,080 from C H W, and we're just changing the order to be H W C. So the exact same data is going to be 13280 20:56:53,080 --> 20:56:57,800 within that tensor. We're just changing the order of the dimensions. Let's see if this works. 13281 20:57:00,200 --> 20:57:07,400 Look at that. Oh, I love seeing some manipulated data. We have a class of pizza and the original 13282 20:57:07,400 --> 20:57:13,160 image is there, and it's 512 by 512. But then we've resized it using our transform. Notice that 13283 20:57:13,160 --> 20:57:18,920 it's a lot more pixelated now, but that makes sense because it's only 64 64 pixels. Now, why 13284 20:57:18,920 --> 20:57:25,080 might we do such a thing? Well, one, if is this image still look like that? Well, to me, it still 13285 20:57:25,080 --> 20:57:29,400 does. But the most important thing will be does it look like that to our model? Does it still look 13286 20:57:29,400 --> 20:57:36,040 like the original to our model? Now 64 by 64, there is less information encoded in this image. 13287 20:57:36,040 --> 20:57:42,440 So our model will be able to compute faster on images of this size. However, we may lose 13288 20:57:42,440 --> 20:57:48,920 some performance because not as much information is encoded as the original image. Again, the size 13289 20:57:48,920 --> 20:57:53,480 of an image is something that you can control. You can set it to be a hyper parameter. You can 13290 20:57:53,480 --> 20:58:01,320 tune the size to see if it improves your model. But I've just decided to go 60 64 64 3 in line 13291 20:58:01,320 --> 20:58:08,760 with the CNN explainer website. So a little hint, we're going to be re replicating this model that 13292 20:58:08,760 --> 20:58:15,000 we've done before. Now you notice that our images are now the same size 64 64 3 as what the CNN 13293 20:58:15,000 --> 20:58:19,800 explainer model uses. So that's where I've got that from. But again, you could change this to 13294 20:58:19,800 --> 20:58:25,000 size to whatever you want. And we see, oh, we've got a stake image here. And you notice that our 13295 20:58:25,000 --> 20:58:30,680 image has been flipped on the horizontal. So the horizontal access, our image has just been flipped 13296 20:58:30,680 --> 20:58:37,080 same with this one here. So this is the power of torch transforms. Now there are a lot more 13297 20:58:37,080 --> 20:58:42,200 transforms, as I said, you can go through them here to have a look at what's going on. Illustrations 13298 20:58:42,200 --> 20:58:48,680 of transforms is a great place. So there's resize, there's center crop, you can crop your 13299 20:58:48,680 --> 20:58:54,600 images, you can crop five different locations, you can do grayscale, you can change the color, 13300 20:58:54,600 --> 20:59:02,280 a whole bunch of different things. I'd encourage you to check this out. That's your extra curriculum 13301 20:59:02,280 --> 20:59:09,720 for this video. 
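Here's a condensed sketch of the plotting helper being built across these videos, with the permute fix already applied (the exact function body is my reconstruction of the narration):

import random
import matplotlib.pyplot as plt
from PIL import Image

def plot_transformed_images(image_paths, transform, n=3, seed=None):
    """Selects random images from a path of images, loads/transforms them,
    then plots the original vs the transformed version."""
    if seed:
        random.seed(seed)
    random_image_paths = random.sample(image_paths, k=n)
    for image_path in random_image_paths:
        with Image.open(image_path) as f:
            fig, ax = plt.subplots(nrows=1, ncols=2)
            # original image
            ax[0].imshow(f)
            ax[0].set_title(f"Original\nSize: {f.size}")
            ax[0].axis(False)

            # transform gives [C, H, W]; matplotlib wants [H, W, C], hence the permute
            transformed_image = transform(f).permute(1, 2, 0)
            ax[1].imshow(transformed_image)
            ax[1].set_title(f"Transformed\nShape: {transformed_image.shape}")
            ax[1].axis(False)

            fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)

plot_transformed_images(image_path_list, transform=data_transform, n=3, seed=42)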
But now that we've visualized a transform, this is what I hinted at before that 13302 20:59:09,720 --> 20:59:17,720 we're going to use this transform for when we load all of our images in, using into a torch 13303 20:59:17,720 --> 20:59:23,720 data set. So I just wanted to make sure that they had been visualized first. We're going to use our 13304 20:59:23,720 --> 20:59:31,000 data transform in the next video when we load all of our data using a torch vision dot data sets 13305 20:59:31,000 --> 20:59:35,080 helper function. So let's give that a go. I'll see you in the next video. 13306 20:59:38,680 --> 20:59:43,160 Have a look at that beautiful plot. We've got some original images and some transformed 13307 20:59:43,160 --> 20:59:48,360 images. And the beautiful thing about our transformed images is that they're in tensor format, 13308 20:59:48,360 --> 20:59:52,120 which is what we need for our model. That's what we've been slowly working towards. 13309 20:59:52,120 --> 20:59:58,520 We've got a data set. And now we've got a way to turn it into tensors ready for a model. So 13310 20:59:58,520 --> 21:00:03,320 let's just visualize what another, I'll turn the seed off here so we can look at some more random 13311 21:00:03,320 --> 21:00:11,000 images. There we go. Okay, so we've got stake pixelated because we're downsizing 64, 64, 3. 13312 21:00:11,560 --> 21:00:16,760 Same thing for this one. And it's been flipped on the horizontal. And then same thing for this 13313 21:00:16,760 --> 21:00:25,960 pizza image and we'll do one more to finish off. Wonderful. So that is the premise of transforms 13314 21:00:26,600 --> 21:00:31,880 turning our images into tensors and also manipulating those images if we want to. 13315 21:00:32,760 --> 21:00:38,200 So let's get rid of this. I'm going to make another heading. We're up to section or part four now. 13316 21:00:38,200 --> 21:00:49,800 And this is going to be option one. So loading image data using image folder. And now I'm going 13317 21:00:49,800 --> 21:00:59,000 to turn that into markdown. And so let's go torch vision data sets. So recall how each one of the 13318 21:00:59,000 --> 21:01:03,480 torch vision domain libraries has its own data sets module that has built in functions for 13319 21:01:03,480 --> 21:01:09,160 helping you load data. In this case, we have an image folder. And there's a few others here if 13320 21:01:09,160 --> 21:01:16,120 you'd like to look into those. But an image folder, this class is going to help us load in data that 13321 21:01:16,120 --> 21:01:21,800 is in this format, the generic image classification format. So this is a prebuilt data sets function. 13322 21:01:22,360 --> 21:01:29,080 Just like there's prebuilt data sets, we can use prebuilt data set functions. Now option two 13323 21:01:29,080 --> 21:01:35,640 later on, this is a spoiler, is we're going to create our own custom version of a data set loader. 13324 21:01:35,640 --> 21:01:41,640 But we'll see that in a later video. So let's see how we can use image folder to load all of our 13325 21:01:42,440 --> 21:01:48,040 custom data, our custom images into tensors. So this is where the transform is going to come in 13326 21:01:48,040 --> 21:01:57,640 helpful. So let's write here, we can load image classification data using, let's write this, 13327 21:01:57,640 --> 21:02:06,520 let's write the full path name, torch vision dot data sets dot image folder. Put that in there, 13328 21:02:07,800 --> 21:02:16,120 beautiful. 
And so let's just start it out, use image folder to create data sets. Now in a previous 13329 21:02:16,120 --> 21:02:22,760 video, I hinted at the fact that we can pass a transform to our image folder class. That's going 13330 21:02:22,760 --> 21:02:30,120 to be right here. So let's see what that looks like in practice. So from torch vision, I'm going 13331 21:02:30,120 --> 21:02:35,800 to import data sets, because that's where the image folder module lives. And then we can go train 13332 21:02:35,800 --> 21:02:43,800 data equals data sets dot image folder. And we're going to pass in the root, which is our train 13333 21:02:43,800 --> 21:02:48,680 der, because we're going to do it for the training directory first. And then we're going to pass 13334 21:02:48,680 --> 21:02:54,520 in a transform, which is going to be equal to our data transform. And then we're going to pass in 13335 21:02:54,520 --> 21:02:58,280 a target transform, but we're going to leave this as none, which is the default, I believe, 13336 21:02:59,160 --> 21:03:08,120 we go up to here. Yeah, target transform is optional. So what this means is this is going to be a 13337 21:03:08,120 --> 21:03:17,000 transform for the data. And this is going to be a transform for the label slash target. 13338 21:03:17,000 --> 21:03:23,160 PyTorch likes to use target, I like to use label, but that's okay. So this means that we don't need 13339 21:03:23,160 --> 21:03:29,240 a target transform, because our labels are going to be inferred by the target directory where the 13340 21:03:29,240 --> 21:03:35,400 images live. So our pizza images are in this directory, and they're going to have pizza as the label, 13341 21:03:35,400 --> 21:03:42,840 because our data set is in standard image classification format. Now, if your data set wasn't in a 13342 21:03:42,840 --> 21:03:47,720 standard image classification format, you might use a different data loader here. A lot of them 13343 21:03:47,720 --> 21:03:54,520 will have a transform for the data. So this transform is going to run our images, whatever images are 13344 21:03:54,520 --> 21:04:00,440 loaded from these folders, through this transform that we've created here, it's going to resize them, 13345 21:04:00,440 --> 21:04:04,760 randomly flip them on the horizontal, and then turn them into tenses, which is exactly how we 13346 21:04:04,760 --> 21:04:11,560 want them for our PyTorch models. And if we wanted to transform the labels in some way, shape or form, 13347 21:04:11,560 --> 21:04:16,840 we could pass in a target transform here. But in our case, we don't need to transform the labels. 13348 21:04:18,120 --> 21:04:23,480 So let's now do the same thing for the test data. And so that's why I wanted to visualize 13349 21:04:23,480 --> 21:04:30,840 our transforms in the previous videos, because otherwise we're just passing them in as a transform. 13350 21:04:30,840 --> 21:04:35,000 So really, what's going to happen behind the scenes is all of our images are going to go 13351 21:04:35,000 --> 21:04:39,560 through these steps. And so that's what they're going to look like when we turn them into a data 13352 21:04:39,560 --> 21:04:45,480 set. So let's create the test data here or the test data set. The transform, we're going to 13353 21:04:45,480 --> 21:04:50,680 transform the test data set in the same way we've transformed our training data set. And we're 13354 21:04:50,680 --> 21:04:57,400 just going to leave that like that. 
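A minimal sketch of the ImageFolder calls being written here, assuming the train_dir/test_dir paths and data_transform from earlier:

from torchvision import datasets

# create Datasets with ImageFolder; transform runs on the images, target_transform on the labels
train_data = datasets.ImageFolder(root=train_dir,
                                  transform=data_transform,
                                  target_transform=None)  # labels come from the folder names, no transform needed
test_data = datasets.ImageFolder(root=test_dir,
                                 transform=data_transform)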
So let's now print out what our data sets look like, 13355 21:04:57,400 --> 21:05:06,520 train data, and test data. Beautiful. So we have a data set, a torch data set, 13356 21:05:06,520 --> 21:05:10,360 which is an image folder. And we have number of data points. This is going to be for the training 13357 21:05:10,360 --> 21:05:17,640 data set. We have 225. So that means about 75 images per class. And we have the root location, 13358 21:05:17,640 --> 21:05:22,600 which is the folder we've loaded them in from, which is our training directory. We've set these 13359 21:05:22,600 --> 21:05:30,360 two up before, trained and tester. And then we have a transform here, which is a standard transform, 13360 21:05:30,360 --> 21:05:36,200 a resize, followed by random horizontal flip, followed by two tensor. Then we've got basically 13361 21:05:36,200 --> 21:05:43,560 the same output here for our test directory, except we have less samples there. So let's get a few 13362 21:05:43,560 --> 21:05:48,520 little attributes from the image folder. This is one of the benefits of using a pytorch prebuilt 13363 21:05:49,160 --> 21:05:54,600 data loader, is that or data set loader is that it comes with a fair few attributes. So we could 13364 21:05:54,600 --> 21:06:00,520 go to the documentation, find this out from in here, inherits from data set folder, keep digging 13365 21:06:00,520 --> 21:06:06,840 into there, or we could just come straight into Google collab. Let's go get class names as a list. 13366 21:06:07,400 --> 21:06:12,920 Can we go train data dot and then press tab? Beautiful. So we've got a fair few things here 13367 21:06:12,920 --> 21:06:19,320 that are attributes. Let's have a look at classes. This is going to give us a list of the class names, 13368 21:06:20,440 --> 21:06:27,960 class names. This is very helpful later on. So we've got pizza steak sushi. We're trying to 13369 21:06:27,960 --> 21:06:34,200 do everything with code here. So if we have this attribute of train data dot classes, 13370 21:06:34,200 --> 21:06:38,440 we can use this list later on for when we plot images straight from our data set, 13371 21:06:38,440 --> 21:06:45,160 or make predictions on them and we want to label them. You can also get class names as a dictionary, 13372 21:06:45,160 --> 21:06:54,200 map to their integer index, that is, so we can go train data dot and press tab. We've got class 13373 21:06:54,200 --> 21:07:02,840 to ID X. Let's see what this looks like. Class decked. Wonderful. So then we've got our string 13374 21:07:02,840 --> 21:07:10,040 class names mapped to their integer. So we've got pizza is zero, steak is one, sushi is two. Now, 13375 21:07:10,040 --> 21:07:15,320 this is where the target transform would come into play. If you wanted to transform those 13376 21:07:16,520 --> 21:07:20,680 these labels here in some way, shape or form, you could pass a transform into here. 13377 21:07:20,680 --> 21:07:26,760 And then if we keep going, let's check the lengths of what's going on. Check the lengths 13378 21:07:27,320 --> 21:07:32,680 of our data set. So we've seen this before, but this is going to just give us how many samples 13379 21:07:32,680 --> 21:07:39,880 that we have length, train data, length, test data, beautiful. And then of course, if you'd like 13380 21:07:39,880 --> 21:07:44,520 to explore more attributes, you can go train data dot, and then we've got a few other things, 13381 21:07:44,520 --> 21:07:50,600 functions, images, loader, samples, targets. 
If you wanted to just see the images, you can go dot 13382 21:07:50,600 --> 21:07:55,800 samples. If you wanted to see just the labels, you can go dot targets. This is going to be all 13383 21:07:55,800 --> 21:07:59,800 of our labels. Look at that. And I believe they're going to be an order. So we're going to have 13384 21:07:59,800 --> 21:08:05,160 zero, zero, zero, one, one, one, two, two, and then if we wanted to have a look, let's say we have a 13385 21:08:05,160 --> 21:08:15,880 look at the first sample, hey, we have data, pizza, steak sushi, train, pizza. There's the image path, 13386 21:08:15,880 --> 21:08:23,560 and it's a label zero for pizza. Wonderful. So now we've done that. How about we, we've been 13387 21:08:23,560 --> 21:08:30,440 visualizing this whole time. So let's keep up that trend. And let's visualize a sample and a label 13388 21:08:30,440 --> 21:08:37,960 from the train data data set. So in this video, we've used image folder to load our images 13389 21:08:37,960 --> 21:08:43,880 into tenses. And because our data is already in standard image classification format, 13390 21:08:43,880 --> 21:08:47,800 we can use one of torch vision dot data sets prebuilt functions. 13391 21:08:49,560 --> 21:08:53,800 So let's do some more visualization in the next video. I'll see you there. 13392 21:08:53,800 --> 21:09:03,560 Welcome back. In the last video, we used data sets dot image folder to turn all of our 13393 21:09:04,360 --> 21:09:11,480 image data into tenses. And we did that with the help of our data transform, which is a little 13394 21:09:11,480 --> 21:09:18,520 pipeline up here to take in some data, or specifically an image, resize it to a value that we've set in 13395 21:09:18,520 --> 21:09:24,520 our k6464 randomly flip it along the horizontal. We don't necessarily need this, but I've just put 13396 21:09:24,520 --> 21:09:29,320 that in there to indicate what happens when you pass an image through a transforms pipeline. 13397 21:09:29,320 --> 21:09:35,880 And then most importantly, we've turned our images into a torch tensor. So that means that our data, 13398 21:09:35,880 --> 21:09:42,040 our custom data set, this is so exciting, is now compatible to be used with a pytorch model. 13399 21:09:42,040 --> 21:09:47,000 So let's keep pushing forward. We're not finished yet. We're going to visualize some samples 13400 21:09:47,000 --> 21:09:55,800 from the train data data set. So let's, how can we do this? Let's get, we can index on the train data 13401 21:09:56,440 --> 21:10:06,360 data set to get a single image and a label. So if we go, can we do train data zero? What does that 13402 21:10:06,360 --> 21:10:13,720 give us? Okay, so this is going to give us an image tensor. And it's associated label. In this 13403 21:10:13,720 --> 21:10:22,600 case, it's an image of pizza, because why it's associated label is pizza. So let's take the zero 13404 21:10:22,600 --> 21:10:29,720 zero. So this is going to be our image. And the label is going to be train data zero. And we're 13405 21:10:29,720 --> 21:10:35,640 just going to get the first index item there, which is going to be one. And then if we have a look 13406 21:10:35,640 --> 21:10:44,440 at them separately, image and label, beautiful. So now one of our target images is in tensor format, 13407 21:10:44,440 --> 21:10:49,400 exactly how we want it. And it's label is in numeric format as well, which is also exactly how 13408 21:10:49,400 --> 21:10:55,560 we want it. 
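To recap the attributes explored in this video, roughly:

class_names = train_data.classes        # ['pizza', 'steak', 'sushi']
class_dict = train_data.class_to_idx    # {'pizza': 0, 'steak': 1, 'sushi': 2}
print(class_names, class_dict)
print(len(train_data), len(test_data))  # number of samples in each split
print(train_data.samples[0])            # (image path, label) of the first sample
print(train_data.targets[:10])          # the first few labels, in order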
And then if we wanted to convert this back to a non-label, we can go class names 13409 21:10:57,160 --> 21:11:03,400 and index on that. And we see pizza. And I mean non-label as in non-numeric: we can get it back 13410 21:11:03,400 --> 21:11:09,960 to string format, which is human understandable. We can just index on class names. So let's print 13411 21:11:09,960 --> 21:11:14,920 out some information about what's going on here. Print f, we're going to go image tensor. 13412 21:11:15,720 --> 21:11:20,920 I love f strings if you haven't noticed yet. Image tensor. And we're going to set in a 13413 21:11:21,560 --> 21:11:25,560 new line, we're going to pass it in our image, which is just the image that we've got here. 13414 21:11:26,600 --> 21:11:30,520 Then we'll print some more information about that. This is still all becoming one with the 13415 21:11:30,520 --> 21:11:36,840 data, right, where we're slowly finding out information about our data set so that if errors arise later 13416 21:11:36,840 --> 21:11:41,960 on, we can go, hmm, we're getting a shape error, and I know our images are of this 13417 21:11:41,960 --> 21:11:47,240 shape, or we're getting a data type error, which is why I've got the .dtype here. And that 13418 21:11:47,240 --> 21:11:53,480 might be why we're getting a data type issue. So let's do one more with the image label, 13419 21:11:53,480 --> 21:12:00,200 label, oh, well, actually, we'll do one more. We'll do print, we'll get the label data type as well. 13420 21:12:01,160 --> 21:12:07,720 Label, this will be important to take note of later on. Type, as I said, three big issues. 13421 21:12:08,360 --> 21:12:15,800 Shape mismatch, device mismatch, and data type mismatch. Can we get the type of our label? 13422 21:12:15,800 --> 21:12:24,840 Beautiful. So we've got our image tensor and we've got its shape. It's of torch.Size([3, 64, 64]). 13423 21:12:25,400 --> 21:12:31,320 That's exactly how we want it. The data type is torch.float32, which is the default data type 13424 21:12:31,320 --> 21:12:39,400 in PyTorch. Our image label is zero and the label data type is of integer. So let's try and plot 13425 21:12:39,400 --> 21:12:45,960 this and see what it looks like, hey, using matplotlib. So first of all, what do we have to do? Well, 13426 21:12:45,960 --> 21:12:53,720 we have to rearrange the order of dimensions. In other words, matplotlib likes color channels 13427 21:12:53,720 --> 21:12:59,240 last. So let's see what this looks like. We'll go image_permute. We've done this before, 13428 21:12:59,240 --> 21:13:06,120 image.permute(1, 2, 0) means we're reordering the dimensions. Zero would usually be here, 13429 21:13:06,120 --> 21:13:10,200 except that we've taken the zero dimension, the color channels, and put it on the end 13430 21:13:10,200 --> 21:13:17,720 and shuffled the other two forward. So let's now print out different shapes. I love printing 13431 21:13:17,720 --> 21:13:22,120 out the change in shapes. It helps me really understand what's going on. Because sometimes 13432 21:13:22,120 --> 21:13:26,280 I look at a line like this and it doesn't really help me. But if I print out something of what 13433 21:13:26,280 --> 21:13:31,720 the shapes were originally and what they changed to, well, hey, that's a big help. That's what 13434 21:13:31,720 --> 21:13:37,640 Jupyter notebooks are all about, right? So this is going to be color channels first, height, 13435 21:13:38,360 --> 21:13:45,640 width.
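A quick sketch of the indexing and printouts described in this part of the video:

# index on the Dataset to get a single (image, label) pair
img, label = train_data[0][0], train_data[0][1]

print(f"Image tensor:\n{img}")
print(f"Image shape: {img.shape}")        # torch.Size([3, 64, 64]) -> [color_channels, height, width]
print(f"Image datatype: {img.dtype}")     # torch.float32
print(f"Image label: {label}")            # 0
print(f"Label datatype: {type(label)}")   # <class 'int'>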
And depending on what data you're using, if you're not using images, if you're using text, 13436 21:13:45,640 --> 21:13:51,960 still knowing the shape of your data is a very good thing. We're going to go image per mute.shape 13437 21:13:52,520 --> 21:13:58,120 and this should be everything going right is height with color channels on the end here. 13438 21:13:58,120 --> 21:14:03,640 And we're just going to plot the image. You can never get enough plotting practice. 13439 21:14:04,680 --> 21:14:12,520 Plot the image. You're going to go PLT dot figure, we'll pass in fig size equals 10, 7. 13440 21:14:13,080 --> 21:14:19,320 And then we're going to PLT dot in show. We'll pass in the permuted image, 13441 21:14:20,120 --> 21:14:27,480 image underscore permutes, and then we'll turn off the axes. And we will set the title to be 13442 21:14:27,480 --> 21:14:33,240 class names. And we're going to index on the label, just as we did before. And we're going to set 13443 21:14:33,240 --> 21:14:41,000 the font size equal to 14. So it's nice and big. Here we go. Beautiful. There is our image of pizza. 13444 21:14:41,560 --> 21:14:47,960 It is very pixelated because we're going from about 512 as the original size 512 by 512 to 64, 13445 21:14:47,960 --> 21:14:54,360 64. I would encourage you to try this out. Potentially, you could use a different image here. So we've 13446 21:14:54,360 --> 21:14:59,960 indexed on sample zero. Maybe you want to change this to just be a random image and go through these 13447 21:14:59,960 --> 21:15:05,320 steps here. And then if you'd like to see different transforms, I'd also encourage you to try 13448 21:15:05,320 --> 21:15:10,600 changing this out, our transform pipeline here, maybe increase the size and see what it looks 13449 21:15:10,600 --> 21:15:16,600 like. And if you're feeling really adventurous, you can go into torch vision and look at the 13450 21:15:16,600 --> 21:15:21,800 transforms library here and then try one of these and see what it does to our images. 13451 21:15:21,800 --> 21:15:28,520 But we're going to keep pushing forward. We are going to look at another way. Or actually, 13452 21:15:28,520 --> 21:15:37,320 I think for completeness, let's now turn, we've got a data set. We want to, we wrote up here before 13453 21:15:37,320 --> 21:15:43,400 that we wanted to turn our images into a data set, and then subsequently a torch utils data 13454 21:15:43,400 --> 21:15:49,960 data loader. So we've done this before, by batching our images, or batching our data that we've 13455 21:15:49,960 --> 21:15:56,200 been working with. So I'd encourage you to give this a shot yourself. Try to go through the next 13456 21:15:56,200 --> 21:16:02,840 video and create a train data loader using our train data, wherever that is train data, 13457 21:16:02,840 --> 21:16:10,040 and a test data loader using our test data. So give that a shot and we'll do it together in the 13458 21:16:10,040 --> 21:16:19,880 next video. We'll turn our data sets into data loaders. Welcome back. How'd you go? In the last 13459 21:16:19,880 --> 21:16:26,200 video, I issued you the challenge to turn our data sets into data loaders. So let's do that 13460 21:16:26,200 --> 21:16:30,840 together now. I hope you gave it a shot. That's the best way to practice. So turn loaded images 13461 21:16:30,840 --> 21:16:38,760 into data loaders. So we're still adhering to our PyTorch workflow here. We've got a custom 13462 21:16:38,760 --> 21:16:43,800 data set. 
We found a way to turn it into tenses in the form of data sets. And now we're going to 13463 21:16:43,800 --> 21:16:50,680 turn it into a data loader. So we can turn our data sets into iterables or batchify our data. 13464 21:16:50,680 --> 21:17:00,920 So let's write down here, a data loader is going to help us turn our data sets into iterables. 13465 21:17:01,800 --> 21:17:11,160 And we can customize the batch size, write this down. So our model can see batch size 13466 21:17:11,160 --> 21:17:19,160 images at a time. So this is very important. As we touched on in the last section computer vision, 13467 21:17:19,160 --> 21:17:25,320 we create a batch size because if we had 100,000 images, chances are if they were all in one data 13468 21:17:25,320 --> 21:17:30,360 set, there's 100,000 images in the food 101 data set. We're only working with about 200. 13469 21:17:31,080 --> 21:17:37,960 If we try to load all 100,000 in one hit, chances are our hardware may run out of memory. And so 13470 21:17:37,960 --> 21:17:45,640 that's why we matchify our images. So if we have a look at this, NVIDIA SMI, our GPU only has 16 13471 21:17:45,640 --> 21:17:52,600 gigabytes. I'm using a Tesla T4 right now, well, has about 15 gigabytes of memory. So if we tried 13472 21:17:52,600 --> 21:17:58,280 to load 100,000 images into that whilst also computing on them with a PyTorch model, 13473 21:17:58,280 --> 21:18:03,080 potentially we're going to run out of memory and run into issues. So instead, we can turn them 13474 21:18:03,080 --> 21:18:09,720 into a data loader so that our model looks at 32 images at a time and can leverage all of the 13475 21:18:09,720 --> 21:18:18,280 memory that it has rather than running out of memory. So let's turn our train and test data sets 13476 21:18:18,280 --> 21:18:25,880 into data loaders, turn train and test data sets into data loaders. Now, this is not just for image 13477 21:18:25,880 --> 21:18:36,200 data. This is for all kinds of data in PyTorch. Images, text, audio, you name it. So import data 13478 21:18:36,200 --> 21:18:42,440 loader, then we're going to create a train data loader. We're going to set it equal to data loader. 13479 21:18:42,440 --> 21:18:49,160 We're going to pass in a data set. So let's set this to train data. Let's set the batch size. 13480 21:18:49,160 --> 21:18:54,440 What should we set the batch size to? I'm going to come up here and set a laser capital variable. 13481 21:18:54,440 --> 21:19:01,880 I'm going to use 32 because 32 is a good batch size. So we'll go 32 or actually, 13482 21:19:01,880 --> 21:19:05,480 let's start small. Let's just start with a batch size of one and see what happens. 13483 21:19:05,480 --> 21:19:11,720 Batch size one, number of workers. So this parameter is going to be, this is an important one. I'm going 13484 21:19:11,720 --> 21:19:15,560 to, I potentially have covered it before, but I'm going to introduce it again. Is this going to be 13485 21:19:15,560 --> 21:19:23,720 how many cores or how many CPU cores that is used to load your data? So the higher the better usually 13486 21:19:23,720 --> 21:19:32,520 and you can set this via OS CPU count, which will count how many CPUs your compute hardware has. 13487 21:19:32,520 --> 21:19:39,160 So I'll just show you how this works. Import OS and this is a Python OS module. We can do 13488 21:19:39,160 --> 21:19:45,080 CPU count to find out how many CPUs our Google Colab instance has. 
Mine has two, 13489 21:19:45,080 --> 21:19:51,320 your number may vary, but I believe most Colab instances have two CPUs. If you're running this on 13490 21:19:51,320 --> 21:19:55,480 your local machine, you may have more. If you're running it on dedicated deep learning hardware, 13491 21:19:55,480 --> 21:20:03,240 you may even have even more, right? So generally, if you set this to one, it will use one CPU core, 13492 21:20:03,240 --> 21:20:10,600 but if you set it to OS dot CPU count, it will use as many as possible. So we're just going to 13493 21:20:10,600 --> 21:20:16,040 leave this as one right now. You can customize this to however you want. And I'm going to shuffle 13494 21:20:16,040 --> 21:20:21,240 the training data because I don't want my model to recognize any order in the training data. So I'm 13495 21:20:21,240 --> 21:20:28,440 going to mix it up. And then I'm going to create the test data loader. Data set equals test data. 13496 21:20:29,720 --> 21:20:36,520 And batch size equals one, num workers, I'm going to set this to equal one as well. Again, 13497 21:20:36,520 --> 21:20:41,560 you can customize each of these, their hyper parameters to whatever you want. Number of workers 13498 21:20:41,560 --> 21:20:47,400 generally the more the better. And then I'm going to set shuffle equals false for the test data so 13499 21:20:47,400 --> 21:20:52,760 that if we want to evaluate our models later on, our test data set is always in the same order. 13500 21:20:53,640 --> 21:20:58,440 So now let's have a look at train data loader, see what happens. And test data loader. 13501 21:21:03,400 --> 21:21:09,720 Wonderful. So we get two instances of torch utils dot data dot data loader. And now we can 13502 21:21:09,720 --> 21:21:17,240 see if we can visualize something from the train data loader, as well as the test data loader. 13503 21:21:17,240 --> 21:21:21,000 I actually maybe we just visualize something from one of them. So we're not just double 13504 21:21:21,000 --> 21:21:26,680 handling everything. We get a length here. Wonderful. Because we're using a batch size of one, 13505 21:21:26,680 --> 21:21:34,920 our lengths of our data loaders are the same as our data sets. Now, of course, this would change 13506 21:21:34,920 --> 21:21:42,360 if we set, oh, we didn't even set this to the batch size parameter batch size. Let's come down 13507 21:21:42,360 --> 21:21:48,760 here and do the same here batch size. So we'll watch this change. If we wanted to look at 32 13508 21:21:48,760 --> 21:21:54,920 images at a time, we definitely could do that. So now we have eight batches, because 22, 225 13509 21:21:54,920 --> 21:22:02,680 divided by 32 equals roughly eight. And then 75 divided by 32 also equals roughly three. And 13510 21:22:02,680 --> 21:22:07,080 remember, these numbers are going to be rounded if there are some overlaps. So let's get rid of, 13511 21:22:08,280 --> 21:22:13,320 we'll change this back to one. And we'll keep that there. We'll get rid of these two. 13512 21:22:14,040 --> 21:22:20,680 And let's see what it looks like to plot an image from our data loader. Or at least have a look at it. 13513 21:22:22,520 --> 21:22:25,800 Check out the shapes. That's probably the most important point at this time. We've already 13514 21:22:25,800 --> 21:22:32,280 plotted in our things. So let's iterate through our train data loader. And we'll grab the next one. 13515 21:22:32,280 --> 21:22:40,120 We'll grab the image and the label. And we're going to print out here. 
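Gathering the steps above into one sketch: the DataLoader creation just described, plus the shape check that comes next. It assumes `train_data` and `test_data` are the ImageFolder datasets from earlier; the batch size and worker count are the values discussed in the walkthrough, not requirements.

from torch.utils.data import DataLoader

BATCH_SIZE = 1   # start small; try 32 later and watch how the shapes change
NUM_WORKERS = 1  # os.cpu_count() would use every available CPU core (Colab often reports 2)

train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              num_workers=NUM_WORKERS,
                              shuffle=True)    # shuffle the training data

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             num_workers=NUM_WORKERS,
                             shuffle=False)    # keep test order fixed for evaluation

print(len(train_dataloader), len(test_dataloader))   # number of batches in each

# Grab one batch and check the shapes -- with BATCH_SIZE=1 this is
# torch.Size([1, 3, 64, 64]) and torch.Size([1]); with 32 it's [32, 3, 64, 64] and [32].
img_batch, label_batch = next(iter(train_dataloader))
print(f"Image shape: {img_batch.shape} -> [batch_size, color_channels, height, width]")
print(f"Label shape: {label_batch.shape}")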
So batch size will now be one. 13516 21:22:40,840 --> 21:22:46,520 You can change the batch size if you like. This is just again, another way of getting familiar 13517 21:22:46,520 --> 21:22:57,400 with the shapes of our data. So image shape. Let's go image dot shape. And we're going to 13518 21:22:57,400 --> 21:23:03,160 write down here. This shape is going to be batch size. This is what our data loader is going to 13519 21:23:03,160 --> 21:23:12,120 add to our images is going to add a batch dimension, color channels, height, width. And then print. 13520 21:23:12,840 --> 21:23:18,600 Let's check out that label shape. Same thing with the labels. It's going to add a batch 13521 21:23:18,600 --> 21:23:28,120 dimension. Label. And let's see what happens. Oh, we forgot the end of the bracket. Beautiful. 13522 21:23:28,680 --> 21:23:33,400 So we've got image shape. Our label shape is only one because we have a batch size of one. 13523 21:23:34,120 --> 21:23:41,160 And so now we've got batch size one, color channels three, height, width. And if we change this to 13524 21:23:41,160 --> 21:23:48,920 32, what do you think's going to happen? We get a batch size of 32, still three color channels, 13525 21:23:49,640 --> 21:23:55,800 still 64, still 64. And now we have 32 labels. So that means within each batch, we have 32 13526 21:23:55,800 --> 21:24:02,840 images. And we have 32 labels. We could use this with a model. I'm going to change this back to one. 13527 21:24:02,840 --> 21:24:11,000 And I think we've covered enough in terms of loading our data sets. How cool is this? 13528 21:24:11,000 --> 21:24:16,840 We've come a long way. We've downloaded a custom data set. We've loaded it into a data set using 13529 21:24:16,840 --> 21:24:23,720 image folder turned it into tenses using our data transform and now batchified our custom data set 13530 21:24:23,720 --> 21:24:29,160 in data loaders. We've used these with models before. So if you wanted to, you could go right 13531 21:24:29,160 --> 21:24:33,480 ahead and build a convolutional neural network to try and find patterns in our image tenses. 13532 21:24:34,040 --> 21:24:39,800 But in the next video, let's pretend we didn't have this data loader, 13533 21:24:41,320 --> 21:24:50,200 this image folder class available to us. How could we load our image data set so that it's 13534 21:24:50,200 --> 21:24:58,040 compatible? Like our image data set here, how could we replicate this image folder class? 13535 21:24:58,040 --> 21:25:04,600 So that we could use it with a data loader. Because data load is part of torch utils.data, 13536 21:25:04,600 --> 21:25:10,040 you're going to see these everywhere. Let's pretend we didn't have the torch vision.data sets 13537 21:25:10,040 --> 21:25:15,720 image folder helper function. And we'll see in the next video, how we can replicate that functionality. 13538 21:25:15,720 --> 21:25:25,160 I'll see you there. Welcome back. So over the past few videos, we've been working out how to get 13539 21:25:25,160 --> 21:25:31,320 how to get our data from our data folder, pizza, steak, and sushi. We've got images of different 13540 21:25:31,320 --> 21:25:35,960 food data here. And we're trying to get it into Tensor format. So we've seen how to do that 13541 21:25:35,960 --> 21:25:44,360 with an existing data loader helper function or data set function in image folder. However, 13542 21:25:44,360 --> 21:25:50,040 what if image folder didn't exist? And we need to write our own custom data loading function. 
13543 21:25:50,040 --> 21:25:56,280 Now the premise of this is although it does exist, it's going to be good practice because you might 13544 21:25:56,280 --> 21:26:01,080 come across a case where you're trying to use a data set where a prebuilt function doesn't exist. 13545 21:26:01,080 --> 21:26:08,600 So let's replicate the functionality of image folder by creating our own data loading class. 13546 21:26:08,600 --> 21:26:15,800 So we want a few things. We want to be able to get the class names as a list from our loaded data. 13547 21:26:15,800 --> 21:26:22,360 And we want to be able to get our class names as a dictionary as well. So the whole goal of this 13548 21:26:22,360 --> 21:26:29,000 video is to start writing a function or a class that's capable of loading data from here into 13549 21:26:29,000 --> 21:26:36,680 Tensor format, capable of being used with the PyTorch's data loader class, like we've done here. So we 13550 21:26:36,680 --> 21:26:41,720 want to create a data set. Let's start it off. We're going to create another heading here. This is 13551 21:26:41,720 --> 21:26:52,280 going to be number five, option two, loading image data with a custom data set. So we want a few 13552 21:26:53,720 --> 21:27:03,320 functionality steps here. Number one is one, two, be able to load images from file to one, 13553 21:27:03,320 --> 21:27:13,480 two, be able to get class names from the data set, and three, one, two, be able to get classes 13554 21:27:13,480 --> 21:27:21,720 as dictionary from the data set. And so let's briefly discuss the pros and cons of creating 13555 21:27:21,720 --> 21:27:31,000 your own custom data set. We saw option one was to use a pre-existing data set loader helping 13556 21:27:31,000 --> 21:27:36,600 function from torch vision. And it's going to be quite similar if we go torch vision data sets. 13557 21:27:39,240 --> 21:27:44,200 Quite similar if you're using other domain libraries here, there we're going to be data 13558 21:27:44,200 --> 21:27:51,720 loading utilities. But at the base level of PyTorch is torchutils.data.dataset. Now this is 13559 21:27:51,720 --> 21:27:58,920 the base data set class. So we want to build on top of this to create our own image folder loading 13560 21:27:58,920 --> 21:28:05,720 class. So what are the pros and cons of creating your own custom data set? Well, let's discuss some 13561 21:28:05,720 --> 21:28:16,760 pros. So one pro would be you can create a data set out of almost anything as long as you write 13562 21:28:16,760 --> 21:28:25,160 the right code to load it in. And another pro is that you're not limited to PyTorch pre-built 13563 21:28:25,160 --> 21:28:34,200 data set functions. A couple of cons would be that even though this is to point number one. 13564 21:28:35,240 --> 21:28:44,040 So even though you could create a data set out of almost anything, it doesn't mean that it will 13565 21:28:44,040 --> 21:28:50,120 automatically work. It will work. And of course, you can verify this through extensive testing, 13566 21:28:50,120 --> 21:28:56,200 seeing if your model actually works, if it actually loads data in the way that you want it. And another 13567 21:28:56,200 --> 21:29:04,440 con is that using a custom data set requires us to write more code. So often results in us 13568 21:29:05,080 --> 21:29:14,760 writing more code, which could be prone to errors or performance issues. 
So typically if 13569 21:29:14,760 --> 21:29:20,360 something makes it into the PyTorch standard library or the PyTorch domain libraries, 13570 21:29:22,280 --> 21:29:28,680 if functionality makes it into here, it's generally been tested many, many times. And it can kind of 13571 21:29:28,680 --> 21:29:34,280 be verified that it works quite well with, or if you do use it, it works quite well. Whereas if 13572 21:29:34,280 --> 21:29:40,040 we write our own code, sure, we can test it ourselves, but it hasn't got the robustness to begin with, 13573 21:29:40,040 --> 21:29:45,240 that is, we could fix it over time, as something that's included in say the PyTorch standard library. 13574 21:29:45,960 --> 21:29:49,960 Nonetheless, it's important to be aware of how we could create such a custom data set. 13575 21:29:50,600 --> 21:29:55,080 So let's import a few things that we're going to use. We'll import OS, because we're going to be 13576 21:29:55,080 --> 21:30:02,200 working with Python's file system over here. We're going to import path lib, because we're going to 13577 21:30:02,200 --> 21:30:06,760 be working with file paths. We'll import torch, we don't need to again, but I'm just doing this 13578 21:30:06,760 --> 21:30:14,120 for completeness. We're going to import image from pill, the image class, because we want to be 13579 21:30:14,120 --> 21:30:20,920 opening images. I'm going to import from torch utils dot data. I'm going to import data set, 13580 21:30:20,920 --> 21:30:26,600 which is the base data set. And as I said over here, we can go to data sets, click on torch utils 13581 21:30:26,600 --> 21:30:32,680 data dot data set. This is an abstract class representing a data set. And you'll find that this 13582 21:30:32,680 --> 21:30:38,680 data set links to itself. So this is the base data set class. Many of the data sets in PyTorch, 13583 21:30:38,680 --> 21:30:43,880 the prebuilt functions, subclass this. So this is what we're going to be doing it. 13584 21:30:44,520 --> 21:30:49,960 And as a few notes here, all subclasses should overwrite get item. And you should optionally 13585 21:30:49,960 --> 21:30:54,680 overwrite land. These two methods, we're going to see this in a future video. For now, we're just 13586 21:30:54,680 --> 21:31:01,880 we're just setting the scene here. So from torch vision, we're going to import transforms, because 13587 21:31:01,880 --> 21:31:08,120 we want to not only import our images, but we want to transform them into tenses. And from the 13588 21:31:08,120 --> 21:31:15,240 Python's typing module, I'm going to import tuple dict and list. So we can put type hints 13589 21:31:15,240 --> 21:31:25,080 when we create our class and loading functions. Wonderful. So this is our instance of torch vision 13590 21:31:25,080 --> 21:31:32,280 dot data sets image folder, torch vision dot data sets dot image folder. Let's have a look 13591 21:31:32,280 --> 21:31:38,760 at the train data. So we want to write a function that can replicate getting the classes from a 13592 21:31:38,760 --> 21:31:46,920 particular directory, and also turning them into an index or dictionary that is. So let's build 13593 21:31:46,920 --> 21:31:52,200 a helper function to replicate this functionality here. In other words, I'd like to write a helper 13594 21:31:52,200 --> 21:31:57,640 function that if we pass it in a file path, such as pizza steak sushi or this data folder, 13595 21:31:58,440 --> 21:32:04,680 it's going to go in here. And it's going to return the class names as a list. 
And it's also going 13596 21:32:04,680 --> 21:32:10,840 to turn them into a dictionary, because it's going to be helpful for later on when we'd like to access 13597 21:32:10,840 --> 21:32:17,320 the classes and the class to ID X. And if we really want to completely recreate image folder, 13598 21:32:17,320 --> 21:32:23,240 well, image folder has this functionality. So we'd like that too. So this is just a little high level 13599 21:32:23,240 --> 21:32:28,520 overview of what we're going to be doing. I might link in here that we're going to subclass this. 13600 21:32:29,640 --> 21:32:39,960 So all custom data sets in pie torch, often subclass this. So here's what we're going to be doing. 13601 21:32:39,960 --> 21:32:44,200 Over the next few videos, we want to be able to load images from a file. Now you could replace 13602 21:32:44,200 --> 21:32:49,560 images with whatever data that you're working with the same premise will be here. You want to be 13603 21:32:49,560 --> 21:32:53,560 able to get the class names from the data set and want to be able to get classes as a dictionary 13604 21:32:53,560 --> 21:32:59,880 from the data set. So we're going to map our samples, our image samples to that class name 13605 21:33:00,760 --> 21:33:09,000 by just passing a file path to a function that we're about to write. And some pros and cons of 13606 21:33:09,000 --> 21:33:14,520 creating a custom data set. We've been through that. Let's in the next video, start coding up a 13607 21:33:14,520 --> 21:33:23,320 helper function to retrieve these two things from our target directory. In the last video, 13608 21:33:23,320 --> 21:33:29,720 we discussed the exciting concept of creating a custom data set. And we wrote down a few things 13609 21:33:29,720 --> 21:33:34,840 that we want to get. We discussed some pros and cons. And we learned that many custom data sets 13610 21:33:34,840 --> 21:33:40,680 inherit from torch dot utils dot data data set. So that's what we'll be doing later on. In this 13611 21:33:40,680 --> 21:33:46,280 video, let's focus on writing a helper function to recreate this functionality. So I'm going to 13612 21:33:46,280 --> 21:33:55,480 title this 5.1, creating a helper function to get class names. I'm going to turn this into 13613 21:33:55,480 --> 21:34:02,280 markdown. And if I go into here, so we want to function to let's write down some steps and then 13614 21:34:02,280 --> 21:34:09,240 we'll code it out. So we'll get the class names, we're going to use OS dot scanner. So it's going 13615 21:34:09,240 --> 21:34:21,560 to scanner directory to traverse a target directory. And ideally, the directory is in standard image 13616 21:34:22,440 --> 21:34:31,320 classification format. So just like the image folder class, our custom data class is going to 13617 21:34:31,320 --> 21:34:37,720 require our data already be formatted. In the standard image classification format, such as 13618 21:34:37,720 --> 21:34:42,920 train and test for training and test images, and then images for a particular class are in a 13619 21:34:42,920 --> 21:34:49,240 particular directory. So let's keep going. And number two, what else do we want it to do? We want 13620 21:34:49,240 --> 21:34:57,800 it to raise an error if the class names aren't found. So if this happens, there might be, 13621 21:34:57,800 --> 21:35:03,240 we want this to enter the fact that there might be something wrong with the directory structure. 
13622 21:35:04,920 --> 21:35:14,120 And number three, we also want to turn the class names into our dict and a list and return them. 13623 21:35:15,720 --> 21:35:19,720 Beautiful. So let's get started. Let's set up the path directory 13624 21:35:19,720 --> 21:35:28,760 for the target directory. So our target directory is going to be what the directory we want to load 13625 21:35:28,760 --> 21:35:33,800 directory, if I could spell, we want to load our data from, let's start with the training 13626 21:35:33,800 --> 21:35:41,880 der, just for an example. So target directory, what do we get? So we're just going to use the 13627 21:35:41,880 --> 21:35:49,640 training folder as an example to begin with. And we'll go print target der, we'll put in the target 13628 21:35:49,640 --> 21:35:57,080 directory, just want to exemplify what we're doing. And then we're going to get the class names 13629 21:35:57,720 --> 21:36:03,720 from the target directory. So I'll show you the functionality of our scanner. Of course, 13630 21:36:03,720 --> 21:36:11,400 you could look this up in the Python documentation. So class names found, let's set this to be sorted. 13631 21:36:11,400 --> 21:36:21,320 And then we'll get the entry name, entry dot name for entry in list. So we're going to get OS list 13632 21:36:22,040 --> 21:36:30,760 scanner of the image path slash target directory. Let's see what happens when we do this. 13633 21:36:32,680 --> 21:36:35,240 Target directory have we got the right brackets here. 13634 21:36:35,240 --> 21:36:46,280 Now, is this going to work? Let's find out. Oh, image path slash target directory. 13635 21:36:48,600 --> 21:36:53,240 What do we get wrong? Oh, we don't need the image path there. Let's put, let's just put target 13636 21:36:53,240 --> 21:37:01,880 directory there. There we go. Beautiful. So we set up our target directory as been the training 13637 21:37:01,880 --> 21:37:07,960 to. And so if we just go, let's just do list. What happens if we just run this function here? 13638 21:37:09,160 --> 21:37:15,720 Oh, a scanner. Yeah, so there we go. So we have three directory entries. So this is where we're 13639 21:37:15,720 --> 21:37:21,560 getting entry dot name for everything in the training directory. So if we look in the training 13640 21:37:21,560 --> 21:37:27,960 directory, what do we have train? And we have one entry for pizza, one entry for sushi, one entry 13641 21:37:27,960 --> 21:37:33,240 for steak. Wonderful. So now we have a way to get a list of class names. And we could quite easily 13642 21:37:33,240 --> 21:37:38,680 turn this into a dictionary, couldn't we? Which is exactly what we want to do. We want to recreate 13643 21:37:38,680 --> 21:37:44,120 this, which we've done. And we want to recreate this, which is also done. So now let's take this 13644 21:37:44,120 --> 21:37:51,560 functionality here. And let's turn that into a function. All right, what can we do? What do we 13645 21:37:51,560 --> 21:37:58,760 call this? I'm going to call this def fine classes. And I'm going to say that it takes in a directory 13646 21:37:58,760 --> 21:38:05,320 which is a string. And it's going to return. This is where I imported typing from Python type and 13647 21:38:05,320 --> 21:38:13,400 imported tuple. And I'm going to return a list, which is a list of strings and a dictionary, 13648 21:38:13,400 --> 21:38:24,120 which is strings map to integers. Beautiful. So let's keep going. 
We want this function, given a target directory, to return these two things. So we've seen how we can get a list of the directories in a target directory by using os.scandir. So let's write: finds the class folder names in a target directory. Beautiful. And we know that it's going to return a list and a dictionary. So let's do step number one, we want to get the class names by scanning the target directory. We'll go classes, we're just going to replicate the functionality we've done above, but for any given directory here. So classes equals sorted entry.name for entry in os.scandir, and we're going to pass it the target directory, if entry.is_dir, so we're just going to make sure each entry is a directory as well. And so let's just return classes and see what happens. So find_classes, let's pass it our target directory, which is our training directory. What do we get? Beautiful. So we need to also return class_to_idx. So let's keep going. Number two is, let's raise an error if class names could not be found. So if not classes, we're going to raise a FileNotFoundError, and then let's just write in here, f, couldn't find any classes in directory. So we're just writing some error-checking code here. If we can't find a class list within our target directory, we're going to raise this error and say couldn't find any classes in directory, please check file structure. And there's another check up here that's going to help us as well, to check if the entry is a directory. So finally, let's do number three. What do we want to do? We want to create a dictionary of index labels. Why do we do this? Well, computers prefer numbers rather than strings as labels. We've already got a list of classes, so let's just create class_to_idx equals class_name: i for i, class_name in enumerate classes. Let's see what this looks like. So we go class names, and then class_to_idx, or we can just return it actually. Did we spell enumerate right? Yes, we did. So what this is going to do is map a class name to an integer, i, for i, class_name in enumerate classes. So it's going to go through this, and for i, the first one, zero, is going to be pizza. Ideally, one will be steak, two will be sushi. Let's see how this goes. Beautiful. Look at that. We've just replicated the functionality of image folder.
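Collected into one place, the helper function looks something like the following sketch. The example values in the final comment assume `train_dir` points at the pizza/steak/sushi training folder from earlier.

import os
from typing import Dict, List, Tuple

def find_classes(directory: str) -> Tuple[List[str], Dict[str, int]]:
    """Finds the class folder names in a target directory."""
    # 1. Get the class names by scanning the target directory
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())

    # 2. Raise an error if class names could not be found
    if not classes:
        raise FileNotFoundError(f"Couldn't find any classes in {directory}, please check file structure.")

    # 3. Create a dictionary of index labels (computers prefer numbers to strings)
    class_to_idx = {class_name: i for i, class_name in enumerate(classes)}
    return classes, class_to_idx

# Example usage (assuming train_dir is the training folder from earlier):
# find_classes(train_dir)  # -> (['pizza', 'steak', 'sushi'], {'pizza': 0, 'steak': 1, 'sushi': 2})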
So now we can use this helper function in our own custom 13675 21:41:56,440 --> 21:42:02,040 data set, find classes to traverse through a target directory, such as train, we could do the 13676 21:42:02,040 --> 21:42:09,160 same for test if we wanted to to. And that way, we've got a list of classes. And we've also got 13677 21:42:09,160 --> 21:42:17,720 a dictionary mapping those classes to integers. So now let's in the next video move towards sub 13678 21:42:17,720 --> 21:42:26,040 classing torch utils dot data dot data set. And we're going to fully replicate image folder. So I'll see you there. 13679 21:42:30,280 --> 21:42:35,640 In the last video, we wrote a great helper function called find classes that takes in a target 13680 21:42:35,640 --> 21:42:42,840 directory and returns a list of classes and a dictionary mapping those class names to an integer. 13681 21:42:42,840 --> 21:42:50,840 So let's move forward. And this time, we're going to create a custom data set. To replicate 13682 21:42:50,840 --> 21:42:56,520 image folder. Now we don't necessarily have to do this, right, because image folder already exists. 13683 21:42:56,520 --> 21:43:01,800 And if something already exists in the pie torch library, chances are it's going to be tested well, 13684 21:43:01,800 --> 21:43:08,200 it's going to work efficiently. And we should use it if we can. But if we needed some custom 13685 21:43:08,200 --> 21:43:14,520 functionality, we can always build up our own custom data set by sub classing torch dot utils 13686 21:43:14,520 --> 21:43:20,360 dot data data set. Or if a pre built data set function didn't exist, well, we're probably going 13687 21:43:20,360 --> 21:43:26,520 to want to subclass torch utils data dot data set anyway. And if we go into the documentation here, 13688 21:43:26,520 --> 21:43:30,040 there's a few things that we need to keep in mind when we're creating our own custom data set. 13689 21:43:30,680 --> 21:43:35,800 All data sets that represent a map from keys to data samples. So that's what we want to do. 13690 21:43:35,800 --> 21:43:41,480 We want to map keys, in other words, targets or labels to data samples, which in our case are 13691 21:43:41,480 --> 21:43:50,840 food images. So we should subclass this class here. Now to note, all subclasses should overwrite 13692 21:43:50,840 --> 21:43:57,240 get item. So get item is a method in Python, which is going to get an item or get a sample, 13693 21:43:57,800 --> 21:44:03,320 supporting fetching a data sample for a given key. So for example, if we wanted to get sample 13694 21:44:03,320 --> 21:44:08,520 number 100, this is what get item should support and should return us sample number 100. 13695 21:44:09,400 --> 21:44:15,960 And subclasses could also optionally override land, which is the length of a data set. So return 13696 21:44:15,960 --> 21:44:20,920 the size of the data set by many sampler implementations and the default options of data 13697 21:44:20,920 --> 21:44:27,800 loader, because we want to use this custom data set with data loader later on. So we should keep 13698 21:44:27,800 --> 21:44:34,040 this in mind when we're building our own custom subclasses of torch utils data data set. Let's see 13699 21:44:34,040 --> 21:44:37,560 this hands on, we're going to break it down. It's going to be a fair bit of code, but that's all right. 13700 21:44:38,280 --> 21:44:48,840 Nothing that we can't handle. 
So to create our own custom data set, we want to number one, 13701 21:44:48,840 --> 21:44:55,800 first things first is we're going to subclass subclass torch dot utils dot data dot data set. 13702 21:44:55,800 --> 21:45:07,160 Two, what do we want to do? We want to init our subclass with target directory. So the directory 13703 21:45:07,160 --> 21:45:18,920 we'd like to get data from, as well as a transform, if we'd like to transform our data. So just like 13704 21:45:18,920 --> 21:45:25,400 when we used image folder, we could pass a transform to our data set, so that we could transform the 13705 21:45:25,400 --> 21:45:32,120 data that we were loading. We want to do the same thing. And we want to create several attributes. 13706 21:45:33,560 --> 21:45:41,640 Let's write them down here. We want paths, which will be the parts of our images. What else do 13707 21:45:41,640 --> 21:45:50,040 we want? We want transform, which will be the transform we'd like to use. We want classes, 13708 21:45:50,040 --> 21:46:00,440 which is going to be a list of the target classes. And we want class to ID X, which is going to be 13709 21:46:00,440 --> 21:46:10,520 a dict of the target classes, mapped to integer labels. Now, of course, these attributes will 13710 21:46:10,520 --> 21:46:16,120 differ depending on your data set. But we're replicating image folder here. So these are just 13711 21:46:16,120 --> 21:46:22,840 some of the things that we've seen that come with image folder. But regardless of what data set 13712 21:46:22,840 --> 21:46:26,520 you're working with, there are probably some things that you want to cross them universal. 13713 21:46:26,520 --> 21:46:31,320 You probably want all the paths of where your data is coming from, the transforms you'd like to 13714 21:46:31,320 --> 21:46:37,080 perform on your data, what classes you're working with, and a map of those classes to an index. 13715 21:46:37,640 --> 21:46:44,920 So let's keep pushing forward. We want to create a function to load images, because after all, 13716 21:46:44,920 --> 21:46:51,960 we want to open some images. So this function will open an image. Number five, we want to 13717 21:46:52,600 --> 21:47:03,640 overwrite the LAN method to return the length of our data set. So just like it said in the documentation, 13718 21:47:05,160 --> 21:47:12,360 if you subclass using torch.utils.data, the data set, you should overwrite get item, 13719 21:47:12,360 --> 21:47:17,240 and you should optionally overwrite LAN. So we're going to, instead of optionally, we are going to 13720 21:47:17,240 --> 21:47:29,000 overwrite length. And number six, we want to overwrite the get item method to return a given sample 13721 21:47:29,640 --> 21:47:39,800 when passed an index. Excellent. So we've got a fair few steps here. But if they don't make 13722 21:47:39,800 --> 21:47:45,240 sense now, it's okay. Let's code it out. Remember our motto, if and doubt, code it out. And if 13723 21:47:45,240 --> 21:47:50,520 and doubt, run the code. So we're going to write a custom data set. This is so exciting, because 13724 21:47:51,800 --> 21:47:57,320 when you work with prebuilt data sets, it's pretty cool in machine learning. But when you can write 13725 21:47:57,880 --> 21:48:05,480 code to create your own data sets, and that's, well, that's magic. 
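Before writing it out in full over the next steps, here is a rough skeleton of where those six steps are heading. The class and method names come from the walkthrough; the parameter name `targ_dir` is an assumption, and every body is still to be filled in.

from torch.utils.data import Dataset

# Rough skeleton only -- each method gets written over the following steps.
class ImageFolderCustom(Dataset):                    # 1. subclass torch.utils.data.Dataset
    def __init__(self, targ_dir: str, transform=None):
        ...                                          # 2 & 3. store paths, transform, classes, class_to_idx

    def load_image(self, index: int):
        ...                                          # 4. open an image from file

    def __len__(self) -> int:
        ...                                          # 5. return the number of samples

    def __getitem__(self, index: int):
        ...                                          # 6. return one (image, label) sample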
So number one, or really number zero, is we're going to import torch.utils.data.Dataset. We don't have to rewrite this, we've already imported it, but we're going to do it anyway for completeness. Now step number one is to subclass torch.utils.data.Dataset. So just like when we built a model, where we subclassed nn.Module, this time we're going to call our class ImageFolderCustom, and we're going to inherit from Dataset. This means that all the functionality contained within torch.utils.data.Dataset, we're going to get for our own custom class. Number two, let's initialize our custom data set. There are a few things we'd like to init our subclass with: the target directory, the directory we'd like to get data from, as well as the transform, if we'd like to transform our data. So let's write an __init__ function, and we're going to go self, targ_dir, and targ_dir is going to be a string. And we're going to set a transform here, we'll set it equal to None. Beautiful. So this way we can pass in a target directory of images that we'd like to load, and we can also pass in a transform, similar to the transforms that we've created previously. So now we're up to number three, which is create several attributes. Let's see what this looks like, create class attributes. So we'll get all of the image paths. We can do this just like we've done before: self.paths equals list, pathlib.Path, because what's our target directory going to be? Well, I'll give you a spoiler alert, it's going to be a path like the test directory, or it's going to be the train directory, because we're going to use this once for our test directory and once for our train directory, just like we used the original image folder. So we're going to go through the target directory and find out all of the paths. This is getting all of the image paths that follow the file name convention of star, slash, star dot jpg. So if we have a look at this, say we passed in the test folder. Test is the folder, the first star would mean any of these three, pizza, steak, sushi, then the slash would go into, for example, the pizza directory, and the star dot jpg would mean any of the file names in there that end in .jpg. So this is getting us a list of all of the image paths within a target directory. In other words, within the test directory and within the train directory, when we call these two separately.
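As a small sketch of what that line collects, assuming `test_dir` is the test folder path used earlier in the section:

import pathlib

# Every path matching targ_dir/<class_name>/<image_name>.jpg
paths = list(pathlib.Path(test_dir).glob("*/*.jpg"))
print(len(paths), paths[:2])  # e.g. 75 image paths across pizza, steak and sushi folders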
So let's keep going, we've got all 13753 21:51:02,120 --> 21:51:06,920 of the image parts, what else did we have to do? We want to create transforms. So let's set up 13754 21:51:06,920 --> 21:51:16,840 transforms, self dot transforms equals transform. Oh, we'll just call that transform actually, 13755 21:51:16,840 --> 21:51:25,080 set up transform equals transform. So we're going to get this from here. And I put it as 13756 21:51:25,080 --> 21:51:33,480 none because it transform can be optional. So let's create classes and class to ID X attributes, 13757 21:51:33,480 --> 21:51:39,640 which is the next one on our list, which is here classes and class to ID X. Now, lucky us, 13758 21:51:39,640 --> 21:51:46,600 in the previous video, we created a function to return just those things. So let's go self dot 13759 21:51:46,600 --> 21:51:56,680 classes and self dot class to ID X equals find classes. And we're going to pass in the target 13760 21:51:56,680 --> 21:52:04,360 der or the target der from here. Now, what's next? We've done step number three, we need 13761 21:52:05,960 --> 21:52:10,520 number four is create a function to load images. All right, let's see what this looks like. So 13762 21:52:10,520 --> 21:52:20,200 number four, create a function to load images. So let's call it load image. And we're going to 13763 21:52:20,200 --> 21:52:26,280 pass in self. And we'll also pass in an index. So the index of the image we'd like to load. 13764 21:52:26,920 --> 21:52:33,800 And this is going to return an image dot image. So where does that come from? Well, previously, 13765 21:52:33,800 --> 21:52:39,480 we imported from pill. So we're going to use Python image library or pillow to import our 13766 21:52:39,480 --> 21:52:45,560 images. So we're going to give on a file path from here, such as pizza, we're going to import 13767 21:52:45,560 --> 21:52:50,840 it with the image class. And we can do that using, I believe it's image dot open. So let's give that 13768 21:52:50,840 --> 21:53:00,360 a try. I'll just write a note in here, opens an image via a path and returns it. So let's write 13769 21:53:00,360 --> 21:53:08,360 image path equals self. This is why we got all of the image paths above. So self dot paths. And 13770 21:53:08,360 --> 21:53:16,920 we're going to index it on the index. Beautiful. And then let's return image dot open image path. 13771 21:53:17,960 --> 21:53:21,560 So we're going to get a particular image path. And then we're just going to open it. 13772 21:53:22,920 --> 21:53:28,120 So now we're up to step number five, override the land method to return the length of our data set. 13773 21:53:29,400 --> 21:53:34,360 This is optional, but we're going to do it anyway. So overwrite. 13774 21:53:34,360 --> 21:53:43,400 Len. So this just wants to return how many samples we have in our data set. So let's write that 13775 21:53:43,400 --> 21:53:51,240 def, Len. So if we call Len on our data set instance, it's going to return just how many numbers there 13776 21:53:51,240 --> 21:53:59,640 are. So let's write this down. Returns the total number of samples. And this is just going to be 13777 21:53:59,640 --> 21:54:08,920 simply return length or Len of self dot paths. So for our target directory, if it was the training 13778 21:54:08,920 --> 21:54:16,360 directory, we'd return the number of image paths that this code has found out here. And same for the 13779 21:54:16,360 --> 21:54:26,360 test directory. 
So next, I'm going to go number six is we want to overwrite, we put this up here, 13780 21:54:26,360 --> 21:54:32,440 the get item method. So this is required if we want to subclass torch utils data data set. So 13781 21:54:32,440 --> 21:54:38,520 this is in the documentation here. All subclasses should override get item. So we want get item to, 13782 21:54:38,520 --> 21:54:43,960 if we pass it an index to our data set, we want it to return that particular item. So let's see 13783 21:54:43,960 --> 21:54:51,960 what this looks like. Override the get item method to return our particular sample. 13784 21:54:51,960 --> 21:55:00,680 And now this method is going to leverage get item, all of the code that we've created above. 13785 21:55:00,680 --> 21:55:06,760 So this is going to go take in self, which is the class itself. And it's going to take in an index, 13786 21:55:06,760 --> 21:55:15,160 which will be of an integer. And it's going to return a tuple of torch dot tensor and an integer, 13787 21:55:15,160 --> 21:55:21,720 which is the same thing that gets returned when we index on our training data. So if we have a 13788 21:55:21,720 --> 21:55:31,880 look image label equals train data, zero, get item is going to replicate this. We pass it an index here. 13789 21:55:33,720 --> 21:55:39,960 Let's check out the image and the label. This is what we have to replicate. So remember train 13790 21:55:39,960 --> 21:55:45,880 data was created with image folder from torch vision dot data sets. And so we will now get item 13791 21:55:45,880 --> 21:55:51,480 to return an image and a label, which is a tuple of a torch tensor, where the image is of a tensor 13792 21:55:51,480 --> 21:55:59,320 here. And the label is of an integer, which is the label here, the particular index as to which 13793 21:55:59,320 --> 21:56:08,520 this image relates to. So let's keep pushing forward. I'm going to write down here, returns one sample 13794 21:56:08,520 --> 21:56:20,360 of data, data and label, X and, or we'll just go XY. So we know that it's a tuple. Beautiful. 13795 21:56:20,360 --> 21:56:28,440 So let's set up the image. What do we want the image to be? Well, this is where we're going to 13796 21:56:28,440 --> 21:56:34,520 call on our self dot load image function, which is what we've created up here. Do you see the 13797 21:56:34,520 --> 21:56:40,120 customization capabilities of creating your own class? So we've got a fair bit of code here, 13798 21:56:40,120 --> 21:56:44,760 right? But essentially, all we're doing is we're just creating functions that is going to help us 13799 21:56:44,760 --> 21:56:50,440 load our images into some way, shape or form. Now, again, I can't stress this enough, regardless 13800 21:56:50,440 --> 21:56:55,960 of the data that you're working on, the pattern here will be quite similar. You'll just have to 13801 21:56:55,960 --> 21:57:02,200 change the different functions you use to load your data. So let's load an image of a particular 13802 21:57:02,200 --> 21:57:08,440 index. So if we pass in an index here, it's going to load in that image. Then what do we do? Well, 13803 21:57:08,440 --> 21:57:14,360 we want to get the class name, which is going to be self dot paths. And we'll get the index here, 13804 21:57:15,000 --> 21:57:23,480 and we can go parent dot name. So this expects path in format data, 13805 21:57:24,760 --> 21:57:33,240 folder slash class name slash image dot JPG. That's just something to be aware of. 
And the class 13806 21:57:33,240 --> 21:57:40,040 ID X is going to be self dot class to ID X. And we will get the class name here. 13807 21:57:42,840 --> 21:57:50,360 So now we have an image by loading in the image here. We have a class name by because our data 13808 21:57:50,360 --> 21:57:55,080 is going to be or our data is currently in standard image classification format. You may have to 13809 21:57:55,080 --> 21:57:59,320 change this depending on the format your data is in, we can get the class name from that, 13810 21:57:59,320 --> 21:58:06,920 and we can get the class ID X by indexing on our attribute up here, our dictionary of class names 13811 21:58:06,920 --> 21:58:15,720 to indexes. Now we have one small little step. This is transform if necessary. So remember our 13812 21:58:16,600 --> 21:58:22,440 transform parameter up here. If we want to transform our target image, well, let's put in if self dot 13813 21:58:22,440 --> 21:58:29,240 transform if the transform exists, let's pass the image through that transform, transform image 13814 21:58:29,240 --> 21:58:37,800 and then we're going to also return the class ID X. So do you notice how we've returned a 13815 21:58:37,800 --> 21:58:44,520 tuple here? This is going to be a torch tensor. If our transform exists and the class ID X is also 13816 21:58:44,520 --> 21:58:51,000 going to be returned, which is what we want here, X and Y, which is what gets returned here, 13817 21:58:51,000 --> 21:59:02,520 image as a tensor label as an integer. So return data label X, Y, and then if the transform doesn't 13818 21:59:02,520 --> 21:59:17,880 exist, let's just return image class ID X, return untransformed image and label. Beautiful. So 13819 21:59:17,880 --> 21:59:24,200 that is a fair bit of code there. So you can see the pro of subclassing torch utils data that data 13820 21:59:24,200 --> 21:59:29,240 set is that we can customize this in almost any way we wanted to to load whatever data that we're 13821 21:59:29,240 --> 21:59:34,840 working with, well, almost any data. However, because we've written so much code, this may be 13822 21:59:34,840 --> 21:59:38,200 prone to errors, which we're going to find out in the next video to see if it actually works. 13823 21:59:39,160 --> 21:59:43,720 But essentially, all we've done is we've followed the documentation here torch dot utils data 13824 21:59:43,720 --> 21:59:51,000 dot data set to replicate the functionality of an existing data loader function, namely image folder. 13825 21:59:51,000 --> 21:59:56,920 So if we scroll back up, ideally, if we've done it right, we should be able to write code like this, 13826 21:59:58,120 --> 22:00:02,440 passing in a root directory, such as a training directory, a particular data transform. 13827 22:00:03,080 --> 22:00:10,920 And we should get very similar instances as image folder, but using our own custom data set class. 13828 22:00:10,920 --> 22:00:20,520 So let's try that out in the next video. So now we've got a custom image folder class 13829 22:00:20,520 --> 22:00:26,280 that replicates the functionality of the original image folder, data loader class, 13830 22:00:26,280 --> 22:00:32,360 or data set class, that is, let's test it out. Let's see if it works on our own custom data. 13831 22:00:32,360 --> 22:00:43,400 So we're going to create a transform here so that we can transform our images raw jpeg images into tenses, 13832 22:00:43,400 --> 22:00:49,800 because that's the whole goal of importing data into pytorch. 
So let's set up a train transforms 13833 22:00:49,800 --> 22:00:57,560 compose. We're going to set it to equal to transforms dot compose. And I'm going to pass in a list here, 13834 22:00:57,560 --> 22:01:05,800 that it's going to be transforms, we're going to resize it to 6464. Whatever the image size will 13835 22:01:05,800 --> 22:01:12,600 reduce it down to 6464. Then we're going to go transforms dot random horizontal flip. We don't 13836 22:01:12,600 --> 22:01:18,680 need to necessarily flip them, but we're going to do it anyway, just to see if it works. And then 13837 22:01:18,680 --> 22:01:25,240 let's put in here transforms dot to tensor, because our images are getting opened as a pill image, 13838 22:01:25,240 --> 22:01:33,320 using image dot open. But now we're using the to transform transform from pytorch or torch 13839 22:01:33,320 --> 22:01:41,800 visions dot transforms. So I'll just put this here. From torch vision dot transforms, that way you 13840 22:01:41,800 --> 22:01:47,480 know where importing transforms there. And let's create one for the test data set as well, test 13841 22:01:47,480 --> 22:01:56,120 transforms, we'll set this up. Oh, excuse me, I need to just go import transforms. And let's go 13842 22:01:56,120 --> 22:02:01,720 transforms dot compose. And we'll pass in another list, we're going to do the exact same as above, 13843 22:02:01,720 --> 22:02:10,840 we'll set up resize, and we'll set the size equal to 6464. And then transforms, we're going to go 13844 22:02:10,840 --> 22:02:16,440 dot to tensor, we're going to skip the data augmentation for test data. Because typically, 13845 22:02:16,440 --> 22:02:23,160 you don't manipulate your test data in terms of data augmentation, you just convert it into a 13846 22:02:23,160 --> 22:02:29,800 tensor, rather than manipulate its orientation, shape, size, etc, etc. So let's run this. 13847 22:02:31,720 --> 22:02:41,080 And now let's see how image folder custom class works. Test out image folder custom. 13848 22:02:41,080 --> 22:02:50,280 Let's go, we'll set up the train data custom is equal to image folder custom. And then we'll set up 13849 22:02:50,280 --> 22:02:56,120 the target, which is equal to the training directory. And then we'll pass in the transform, 13850 22:02:56,120 --> 22:03:04,120 which is equal to the train transforms, which we just created above train transforms. And then 13851 22:03:04,120 --> 22:03:09,160 we're going to, I think that's all we need, actually, we only had two parameters that we're not going 13852 22:03:09,160 --> 22:03:13,880 to use a target transform, because our labels, we've got to help a function to transform our labels. 13853 22:03:13,880 --> 22:03:20,120 So test data custom is going to be image folder custom. And I'm going to set up the target to be 13854 22:03:20,120 --> 22:03:26,840 equal to the test directory. And the transform is going to be the test transforms from the cell 13855 22:03:26,840 --> 22:03:34,840 above there. And what's co lab telling me there? Oh, I'm going to set that up. Did we spell 13856 22:03:34,840 --> 22:03:40,200 something? Oh, we spelled it wrong train transforms. There we go. Beautiful. Now let's have a look at 13857 22:03:40,200 --> 22:03:49,160 our train data and test data custom. See if it worked. What do we have? Or we have an image folder 13858 22:03:49,160 --> 22:03:54,600 custom. Well, it doesn't give us as much rich information as just checking it out as it does 13859 22:03:54,600 --> 22:04:01,320 for the train data. But that's okay. 
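A sketch of this test drive, including a few of the checks that get inspected next. It assumes `train_dir` and `test_dir` are the train and test folder paths from earlier, and that `train_data` is the original ImageFolder dataset being compared against.

from torchvision import transforms

# Training transform: resize -> random horizontal flip -> tensor
train_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Test transform: no augmentation, just resize and convert to tensor
test_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

train_data_custom = ImageFolderCustom(targ_dir=train_dir, transform=train_transforms)
test_data_custom = ImageFolderCustom(targ_dir=test_dir, transform=test_transforms)

# Compare against the original ImageFolder versions
print(len(train_data_custom), len(train_data))
print(train_data_custom.classes == train_data.classes)            # ideally True
print(train_data_custom.class_to_idx == train_data.class_to_idx)  # ideally True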
We can still inspect these. So this is our original one made 13860 22:04:01,320 --> 22:04:07,960 with image folder. And we've got now train data custom and test data custom. Let's see if we can 13861 22:04:07,960 --> 22:04:13,320 get some information from there. So let's check the original length of the train data and see if 13862 22:04:13,320 --> 22:04:19,640 we can use the land method on our train data custom. Did that work? Wonderful. Now how about we do it 13863 22:04:19,640 --> 22:04:26,680 for the original test data made with image folder and our custom version made with test data or 13864 22:04:26,680 --> 22:04:32,760 image folder custom. Beautiful. That's exactly what we want. And now let's have a look at the 13865 22:04:32,760 --> 22:04:40,440 train data custom. Let's see if the classes attribute comes up. Dot classes. And we'll just leave that 13866 22:04:40,440 --> 22:04:47,080 there. We'll do the class dot ID X. Yes, it is. So this attribute here is I wonder if we get 13867 22:04:49,240 --> 22:04:54,840 information from Google co lab loading. What do we get? Oh, classes to ID X classes load image 13868 22:04:54,840 --> 22:05:02,440 paths transform. So if we go back up here, all these attributes are from here paths transform 13869 22:05:03,080 --> 22:05:09,800 classes class to ID X as well as load image. So this is all coming from the code that we wrote 13870 22:05:09,800 --> 22:05:16,200 our custom data set class. So let's keep pushing forward. Let's have a look at the class to ID X. 13871 22:05:16,200 --> 22:05:21,880 Do we get the same as what we wanted before? Yes, we do beautiful a dictionary containing our 13872 22:05:21,880 --> 22:05:30,200 string names and the integer associations. So let's now check for equality. We can do this by going 13873 22:05:31,800 --> 22:05:43,720 check for equality between original image folder data set and image folder custom data set. Now 13874 22:05:43,720 --> 22:05:49,960 we've kind of already done that here, but let's just try it out. Let's go print. Let's go train 13875 22:05:49,960 --> 22:05:58,440 data custom dot classes. Is that equal to train? Oh, I don't want three equals train data. The 13876 22:05:58,440 --> 22:06:07,960 original one classes and also print. Let's do test data custom dot classes. Is this equal to 13877 22:06:08,840 --> 22:06:16,920 test data? The original one classes. True and true. Now you could try this out. In fact, 13878 22:06:16,920 --> 22:06:24,520 it's a little exercise to try it out to compare the others. But congratulations to us, we have 13879 22:06:24,520 --> 22:06:30,440 replicated the main functionality of the image folder data set class. And so the takeaways from 13880 22:06:30,440 --> 22:06:37,800 this is that whatever data you have, PyTorch gives you a base data set class to inherit from. 13881 22:06:37,800 --> 22:06:43,720 And then you can write a function or a class that somehow interacts with whatever data you're 13882 22:06:43,720 --> 22:06:49,240 working with. So in our case, we load in an image. And then you, as long as you override the land 13883 22:06:49,240 --> 22:06:56,200 method and the get item method and return some sort of values, well, you can create your own 13884 22:06:56,200 --> 22:07:02,200 data set loading function. How beautiful is that? So that's going to help you work with your own 13885 22:07:02,200 --> 22:07:08,280 custom data sets in PyTorch. So let's keep pushing forward. 
We've seen analytically that 13886 22:07:08,280 --> 22:07:14,440 our custom data set is quite similar to the original PyTorch, torch vision dot data sets 13887 22:07:14,440 --> 22:07:20,840 image folder data set. But you know what I like to do? I like to visualize things. So let's in 13888 22:07:20,840 --> 22:07:27,720 the next video, create a function to display some random images from our trained data custom class. 13889 22:07:27,720 --> 22:07:37,640 It's time to follow the data explorer's motto of visualize, visualize, visualize. So let's 13890 22:07:37,640 --> 22:07:46,440 create another section. I'm going to write here a title called create a function to display random 13891 22:07:46,440 --> 22:07:52,200 images. And sure, we've, we've had a look at the different attributes of our custom data set. 13892 22:07:52,200 --> 22:07:57,720 We see that it gives back a list of different class names. We see that the lengths are similar 13893 22:07:57,720 --> 22:08:04,360 to the original, but there's nothing quite like visualizing some data. So let's go in here. We're 13894 22:08:04,360 --> 22:08:11,080 going to write a function, a helper function. So step number one, we need to take in a data set. 13895 22:08:11,080 --> 22:08:15,960 So one of the data sets that we just created, whether it be trained data custom or trained data. 13896 22:08:15,960 --> 22:08:29,480 And a number of other parameters, such as class names and how many images to visualize. And then 13897 22:08:29,480 --> 22:08:39,000 step number two is to prevent the display getting out of hand. Let's cap the number of 13898 22:08:39,000 --> 22:08:45,800 images to see at 10. Because look, if our data set is going to be thousands of images and we want 13899 22:08:45,800 --> 22:08:50,040 to put in a number of images to look at, let's just make sure it's the maximum is 10. That should 13900 22:08:50,040 --> 22:09:00,280 be enough. So we'll set the random seed for reproducibility. Number four is, let's get a list of random 13901 22:09:00,280 --> 22:09:08,120 samples. So we want random sample indexes, don't just get rid of this s from what do we want it from 13902 22:09:08,840 --> 22:09:15,560 from the target data set. So we want to take in a data set, and we want to count the number of 13903 22:09:15,560 --> 22:09:21,000 images we're seeing, we want to set a random seed. And do you see how much I use randomness here to 13904 22:09:21,000 --> 22:09:26,280 really get an understanding of our data? I really, really, really love harnessing the power of 13905 22:09:26,280 --> 22:09:32,280 randomness. So we want to get a random sample of indexes from all of our data set. And then we're 13906 22:09:32,280 --> 22:09:41,320 going to set up a matplotlib plot. Then we want to loop through the random sample images. 13907 22:09:41,320 --> 22:09:50,360 And plot them with matplotlib. And then as a side to this one, step seven is we need to make sure 13908 22:09:50,360 --> 22:10:00,360 the dimensions of our images line up with matplotlib. So matplotlib needs a height width color channels. 13909 22:10:00,360 --> 22:10:10,680 All right, let's take it on, hey? So number one is create a function to take in a data set. 13910 22:10:10,680 --> 22:10:16,280 So we're going to call this def, let's call it def display random images going to be one of our 13911 22:10:16,280 --> 22:10:21,480 helper functions. We've created a few type of functions like this. 
But let's take in a data set, 13912 22:10:21,480 --> 22:10:27,640 which is torch utils of type that is of type data set. Then we're going to take in classes, 13913 22:10:27,640 --> 22:10:32,920 which is going to be a list of different strings. So this is going to be our class names for 13914 22:10:32,920 --> 22:10:38,360 whichever data set we're using. I'm going to set this equal to none. And then we're going to take in 13915 22:10:38,360 --> 22:10:43,400 n, which is the number of images we'd like to plot. And I'm going to set this to 10 by default. So 13916 22:10:43,400 --> 22:10:49,000 we can see 10 images at a time, 10 random images, that is, do we want to display the shape? Let's 13917 22:10:49,000 --> 22:10:54,920 set that equal to true, so that we can display what the shape of the images, because we're passing 13918 22:10:54,920 --> 22:11:01,400 it through our transform as it goes into a data set. So we want to see what the shape of our 13919 22:11:01,400 --> 22:11:07,080 images are just to make sure that that's okay. And we can also let's set up a seed, which is 13920 22:11:07,080 --> 22:11:13,560 going to be an integer, and we'll set that to none to begin with as well. Okay, so step number two, 13921 22:11:13,560 --> 22:11:17,960 what do we have above? We have to prevent the display getting out of hand, let's cap the number 13922 22:11:17,960 --> 22:11:23,160 of images to see at 10. So we've got n is by default, it's going to be 10, but let's just make 13923 22:11:23,160 --> 22:11:32,920 sure that it stays there. Adjust display, if n is too high. So if n is greater than 10, 13924 22:11:32,920 --> 22:11:39,640 let's just readjust this, let's set n equal to 10, and display shape, we'll turn off the 13925 22:11:39,640 --> 22:11:45,720 display shape, because if we have 10 images, our display may get out of hand. So just print out 13926 22:11:45,720 --> 22:11:56,600 here for display purposes, and shouldn't be larger than 10, setting to 10, and removing 13927 22:11:57,160 --> 22:12:01,320 shape display. Now I only know this because I've had experience cooking this dish before. 13928 22:12:01,320 --> 22:12:06,840 In other words, I've written this type of code before. So you can customize the beautiful thing 13929 22:12:06,840 --> 22:12:12,040 about Python and PyTorch, as you can customize these display functions in any way you see fit. 13930 22:12:12,040 --> 22:12:17,480 So step number three, what are we doing? Set the random seed for reproducibility. Okay, 13931 22:12:17,480 --> 22:12:25,080 set the seed. So if seed, let's set random dot seed equal to that seed value, and then we can keep 13932 22:12:25,080 --> 22:12:32,760 and then we can keep going. So number four is let's get some random sample indexes. So we can do 13933 22:12:32,760 --> 22:12:39,480 that by going get random sample indexes, which is step number four here. So we've got a target 13934 22:12:39,480 --> 22:12:45,400 data set that we want to inspect. We want to get some random samples from that. So let's create a 13935 22:12:45,400 --> 22:12:54,680 random samples IDX list. And I'm going to randomly sample from a length of our data set, or sorry, 13936 22:12:54,680 --> 22:12:59,080 a range of the length of our data set. And I'll show you what this means in a second. 13937 22:13:00,440 --> 22:13:06,280 And the K, excuse me, have we got enough brackets there? I always get confused with the brackets. 13938 22:13:06,280 --> 22:13:11,960 The K is going to be n. 
So in this case, I want to randomly sample 10 images from the length of our data set, or 10 indexes. So let's just have a look at what this looks like. We'll put in here our train data custom. So this is going to take a range of the length of our train data custom, which is what, 225? We looked at that before, just up here, the length of this. So between zero and 225, we're going to get 10 indexes if we've done this correctly. Beautiful. So there's 10 random samples from our train data custom, or 10 random indexes, that is. So we're up to step number five, which was loop through the random sample indexes and plot them with matplotlib. So this is going to give us a list here. So let's go loop through random indexes and plot them with matplotlib. Beautiful. So for i, targ_sample in enumerate, let's enumerate through the random samples idx list. And then we're going to get targ_image and targ_label, because all of the samples in our target data set are in the form of tuples. So we're going to get the target image and the target label, which is going to be the data set indexed with targ_sample. We'll take that index, so it might be one of these values here, we'll index on that, and the zero index will be the image. And then we'll go on the data set as well, we'll take the targ_sample index, and then index number one will be the label of our target sample. And then number seven. Oh, excuse me, we've missed a step. That should be number six. Did you catch that? Number five is set up the plot. So we can do this quite easily by going plt dot figure. This is so that each time we iterate through another sample, we're going to have quite a big figure here. So we set up the plot outside the loop so that we can add a subplot to this original plot here. And now this is number seven, where we make sure the dimensions of our images line up with matplotlib. So if we recall, by default, PyTorch is going to give our image dimensions as color channels first, however, matplotlib prefers color channels last. So let's go adjust tensor dimensions for plotting. So let's call this targ_image_adjust, equals targ_image dot permute. And we're going to alter the order of the dimensions. So this is going to go from color channels, height, width, and we're going to change it to height, width, color channels. Beautiful. That one will probably catch you off guard a few times. But we've seen it a couple of times now.
So we're going to keep going with this 13965 22:16:29,000 --> 22:16:36,760 plot adjusted samples. So now we can add a subplot to our matplotlib plot. And we want to create, 13966 22:16:36,760 --> 22:16:44,840 we want one row of n images, this will make a lot more sense when we visualize it. And then for 13967 22:16:44,840 --> 22:16:52,840 the index, we're going to keep track of i plus one. So let's keep going. So then we're going to go 13968 22:16:52,840 --> 22:17:01,960 plot in show. And I'm going to go tug image adjust. So I'm going to plot this image here. And then 13969 22:17:01,960 --> 22:17:11,800 let's turn off the axis. And we can go if the classes variable exists, which is up here, a list 13970 22:17:11,800 --> 22:17:18,200 of classes, let's adjust the title of the plot to be the particular index in the class list. So 13971 22:17:18,200 --> 22:17:26,520 title equals f class. And then we're going to put in here classes. And we're going to index on that 13972 22:17:26,520 --> 22:17:31,480 with the target label index, which is going to come from here. Because that's going to be a new 13973 22:17:31,480 --> 22:17:42,520 numerical format. And then if display shape, let's set the title equal to title plus f. We're going 13974 22:17:42,520 --> 22:17:49,320 to go new line shape. This is going to be the shape of the image, tug image adjust dot shape. 13975 22:17:50,840 --> 22:17:57,320 And then we'll set the title to PLT dot title. So you see how if we have display shape, we're 13976 22:17:57,320 --> 22:18:02,120 just adjusting the title variable that we created here. And then we're putting the title onto the 13977 22:18:02,120 --> 22:18:09,240 plot. So let's see how this goes. That is quite a beautiful function. Let's pass in one of our 13978 22:18:09,240 --> 22:18:15,560 data sets and see what it looks like. Let's plot some random images. So which one should we start 13979 22:18:15,560 --> 22:18:23,560 with first? So let's display random images from the image folder created data sets. So this is the 13980 22:18:23,560 --> 22:18:30,200 inbuilt pytorch image folder. Let's go display random images, the function we just created above. 13981 22:18:30,200 --> 22:18:34,440 We're going to pass in the train data. And then we can pass in the number of images. Let's have 13982 22:18:34,440 --> 22:18:41,640 a look at five. And the classes is going to be the class names, which is just a list of our 13983 22:18:41,640 --> 22:18:46,360 different class names. And then we can set the seed, we want it to be random. So we'll just set 13984 22:18:46,360 --> 22:18:55,240 the seed to equal none. Oh, doesn't that look good? So this is from our original train data 13985 22:18:55,240 --> 22:19:02,440 made with image folder. So option number one up here, option one, there we go. And we've 13986 22:19:02,440 --> 22:19:08,360 passed in the class name. So this is sushi resize to 64, 64, three, same with all of the others, 13987 22:19:08,360 --> 22:19:14,840 but from different classes. Let's set the seed to 42, see what happens. I get these images, 13988 22:19:14,840 --> 22:19:21,480 we got a sushi, we got a pizza, we got pizza, sushi pizza. And then if we try a different one, 13989 22:19:21,480 --> 22:19:29,480 we just go none. We get random images again, wonderful. Now let's write the same code, 13990 22:19:29,480 --> 22:19:38,600 but this time using our train data custom data set. So display random images from the image folder 13991 22:19:38,600 --> 22:19:46,840 custom data set. 
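Before moving on, here is what the finished helper might look like as one block, assembled from the steps narrated above (the figure size and the exact wording of the messages are choices, not something fixed by the narration):

import random
from typing import List
import matplotlib.pyplot as plt
from torch.utils.data import Dataset

# 1. Take in a Dataset, plus class names, how many images to show, and a seed
def display_random_images(dataset: Dataset,
                          classes: List[str] = None,
                          n: int = 10,
                          display_shape: bool = True,
                          seed: int = None):
    # 2. Cap the display at 10 images so the plot doesn't get out of hand
    if n > 10:
        n = 10
        display_shape = False
        print("For display purposes, n shouldn't be larger than 10, setting to 10 and removing shape display.")
    # 3. Set the random seed for reproducibility
    if seed:
        random.seed(seed)
    # 4. Get a list of random sample indexes from the target dataset
    random_samples_idx = random.sample(range(len(dataset)), k=n)
    # 5. Set up the plot (one row of n images)
    plt.figure(figsize=(16, 8))
    # 6. Loop through the random indexes and plot them with matplotlib
    for i, targ_sample in enumerate(random_samples_idx):
        targ_image, targ_label = dataset[targ_sample][0], dataset[targ_sample][1]
        # 7. Rearrange dimensions for matplotlib: (C, H, W) -> (H, W, C)
        targ_image_adjust = targ_image.permute(1, 2, 0)
        plt.subplot(1, n, i + 1)
        plt.imshow(targ_image_adjust)
        plt.axis("off")
        title = ""
        if classes:
            title = f"class: {classes[targ_label]}"
        if display_shape:
            title = title + f"\nshape: {targ_image_adjust.shape}"
        plt.title(title)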
So this is the one that we created display random images. I'm going to pass 13992 22:19:46,840 --> 22:19:53,880 in train data custom, our own data set. Oh, this is exciting. Let's set any equal to 10 and just see 13993 22:19:53,880 --> 22:19:58,680 see how far we can go with with our plot. Or maybe we set it to 20 and just see if our 13994 22:19:58,680 --> 22:20:08,920 code for adjusting the plot makes sense. Class names and seed equals, I'm going to put in 42 this time. 13995 22:20:08,920 --> 22:20:13,800 There we go. For display purposes, and shouldn't be larger than 10 setting to 10 and removing shape 13996 22:20:13,800 --> 22:20:21,160 display. So we have a stake image, a pizza image, pizza, steak pizza, pizza, pizza, pizza, steak, 13997 22:20:21,160 --> 22:20:26,200 pizza. If we turn off the random seed, we should get another 10 random images here. 13998 22:20:26,200 --> 22:20:34,600 Beautiful. Look at that. Steak, steak, sushi, pizza, steak, sushi class. I'm reading out 13999 22:20:34,600 --> 22:20:41,720 the different things here. Pizza, pizza, pizza, pizza. Okay. So it looks like our custom data set 14000 22:20:41,720 --> 22:20:48,200 is working from both a qualitative standpoint, looking at the different images and a quantitative. 14001 22:20:48,200 --> 22:20:52,680 How about we change it to five and see what it looks like? Do we have a different shape? Yes, 14002 22:20:52,680 --> 22:20:59,320 we do the same shape as above. Wonderful. Okay. So we've got train data custom. 14003 22:21:00,120 --> 22:21:05,560 And we've got train data, which is made from image folder. But the premises remain, we've built up 14004 22:21:05,560 --> 22:21:10,600 a lot of different ideas. And we're looking at things from different points of view. We are 14005 22:21:10,600 --> 22:21:17,720 getting our data from the folder structure here into tensor format. So there's still one more 14006 22:21:17,720 --> 22:21:23,800 step that we have to do. And that's go from data set to data loader. So in the next video, 14007 22:21:23,800 --> 22:21:30,120 let's see how we can turn our custom loaded images, train data custom, and test data custom 14008 22:21:30,120 --> 22:21:35,240 into data loaders. So you might want to go ahead and give that a try yourself. We've done it before 14009 22:21:35,240 --> 22:21:40,440 up here. Turn loaded images into data loaders. We're going to replicate the same thing as we did 14010 22:21:40,440 --> 22:21:45,480 in here for our option number two, except this time we'll be using our custom data set. 14011 22:21:45,480 --> 22:21:54,280 I'll see you in the next video. I'll take some good looking images and even better that they're 14012 22:21:54,280 --> 22:22:00,360 from our own custom data set. Now we've got one more step. We're going to turn our data set into a 14013 22:22:00,360 --> 22:22:05,800 data loader. In other words, we're going to batchify all of our images so they can be used with the 14014 22:22:05,800 --> 22:22:12,440 model. And I gave you the challenge of trying this out yourself in the last video. So I hope 14015 22:22:12,440 --> 22:22:17,400 you gave that a go. But let's see what that might look like in here. So I'm going to go 5.4. 14016 22:22:17,960 --> 22:22:26,360 Let's go. What should we call this? So turn custom loaded images into data loaders. So this 14017 22:22:26,360 --> 22:22:33,000 is just goes to show that we can write our own custom data set class. And we can still use it 14018 22:22:33,000 --> 22:22:42,760 with PyTorch's data loader. 
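To recap the comparison just made before we move on to DataLoaders, the two visualization calls look something like this (class_names is the list of class names loaded earlier; any n above 10 gets capped by the helper):

# Random images from the torchvision ImageFolder dataset
display_random_images(train_data, n=5, classes=class_names, seed=None)

# Random images from our ImageFolderCustom dataset (n=20 is capped to 10 internally)
display_random_images(train_data_custom, n=20, classes=class_names, seed=42)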
So let's go from utils torch dot utils that is utils dot data import 14019 22:22:42,760 --> 22:22:47,080 data loader. We'll get that in here. We don't need to do that again, but I'm just doing it for 14020 22:22:47,080 --> 22:22:52,920 completeness. So we're going to set this to train data loader custom. And I'm going to create an 14021 22:22:52,920 --> 22:22:59,160 instance of data loader here. And then inside I'm going to pass the data set, which is going to be 14022 22:22:59,160 --> 22:23:05,640 train data custom. I'm just going to set a universal parameter here in capitals for batch size equals 14023 22:23:05,640 --> 22:23:11,880 32. Because we can come down here, we can set the batch size, we're going to set this equal to 32. 14024 22:23:12,840 --> 22:23:17,320 Or in other words, the batch size parameter we set up there, we can set the number of workers 14025 22:23:17,320 --> 22:23:26,040 here as well. If you set to zero, let's go see what the default is actually torch utils data loader. 14026 22:23:26,040 --> 22:23:34,680 What's the default for number of workers? Zero. Okay, beautiful. And recall that number of workers 14027 22:23:34,680 --> 22:23:41,000 is going to set how many cores load your data with a data loader. And generally higher is better. 14028 22:23:41,000 --> 22:23:46,280 But you can also experiment with this value and see what value suits your model and your 14029 22:23:46,280 --> 22:23:52,600 hardware the best. So just keep in mind that number of workers is going to alter how much 14030 22:23:52,600 --> 22:23:59,400 compute your hardware that you're running your code on uses to load your data. So by default, 14031 22:23:59,400 --> 22:24:05,880 it's set to zero. And then we're going to shuffle the training data. Wonderful. And let's do the 14032 22:24:05,880 --> 22:24:10,920 same for the test data loader. We'll create test data loader custom. And I'm going to create a 14033 22:24:10,920 --> 22:24:18,120 new instance. So let me make a few code cells here of data loader, and create a data set or pass 14034 22:24:18,120 --> 22:24:24,520 in the data set parameter as the test data custom. So again, these data sets are what we've created 14035 22:24:24,520 --> 22:24:32,600 using our own custom data set class. I'm going to set the batch size equal to batch size. And 14036 22:24:32,600 --> 22:24:37,720 let's set the number workers equal to zero. In a previous video, we've also set it to CPU count. 14037 22:24:38,680 --> 22:24:46,280 You can also set it to one. You can hard code it to four all depends on what hardware you're using. 14038 22:24:46,280 --> 22:24:54,040 I like to use OPA OS dot CPU count. And then we're not going to shuffle the test data. 14039 22:24:56,680 --> 22:25:05,160 False. Beautiful. And let's have a look at what we get here. Train data loader custom and test 14040 22:25:06,120 --> 22:25:10,840 data loader custom. And actually, I'm just going to reset this instead of being OOS CPU count. 14041 22:25:10,840 --> 22:25:14,360 I'm going to put it back to zero, just so we've got it in line with the one above. 14042 22:25:14,360 --> 22:25:22,120 And of course, numb workers, we could also set this numb workers equals zero or OS dot CPU count. 14043 22:25:22,920 --> 22:25:29,880 And then we could come down here and set this as numb workers and numb workers. 14044 22:25:30,920 --> 22:25:38,040 And let's have a look to see if it works. Beautiful. So we've got two instances of utils.data.data 14045 22:25:38,040 --> 22:25:43,960 loader. 
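In code, the DataLoader setup described here looks roughly like the following (BATCH_SIZE and NUM_WORKERS are the hyperparameters discussed in the narration, and you can experiment with both):

import os
from torch.utils.data import DataLoader

BATCH_SIZE = 32
NUM_WORKERS = 0  # the DataLoader default; could also be os.cpu_count(), depending on your hardware

train_dataloader_custom = DataLoader(dataset=train_data_custom,
                                     batch_size=BATCH_SIZE,
                                     num_workers=NUM_WORKERS,
                                     shuffle=True)   # shuffle the training data

test_dataloader_custom = DataLoader(dataset=test_data_custom,
                                    batch_size=BATCH_SIZE,
                                    num_workers=NUM_WORKERS,
                                    shuffle=False)   # don't shuffle the test data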
Now, let's just get a single sample from the train data loader here, just to make sure the 14046 22:25:43,960 --> 22:25:52,280 image shape and batch size is correct. Get image and label from custom data loader. We want image 14047 22:25:52,280 --> 22:25:59,720 custom. And I'm going to go label custom equals next. And I'm going to iter over the train data 14048 22:25:59,720 --> 22:26:10,280 loader custom. And then let's go print out the shapes. We want image custom dot shape and label 14049 22:26:10,280 --> 22:26:18,120 custom. Do we get a shape here? Beautiful. There we go. So we have shape here of 32, 14050 22:26:18,120 --> 22:26:24,840 because that is our batch size. Then we have three color channels, 64, 64, which is in line with 14051 22:26:24,840 --> 22:26:31,000 what? Which is in line with our transform that we set all the way up here. Transform. We transform 14052 22:26:31,000 --> 22:26:35,240 our image. You may want to change that to something different depending on the model you're using, 14053 22:26:35,240 --> 22:26:41,240 depending on how much data you want to be comprised within your image. Recall, generally a larger 14054 22:26:41,240 --> 22:26:47,640 image size encodes more information. And this is all coming from our original image folder 14055 22:26:47,640 --> 22:26:53,320 custom data set class. So look at us go. And I mean, this is a lot of code here or a fair bit of 14056 22:26:53,320 --> 22:26:59,240 code, right? But you could think of this as like you write it once. And then if your data set continues 14057 22:26:59,240 --> 22:27:05,960 to be in this format, well, you can use this over and over again. So you might put this, this image 14058 22:27:05,960 --> 22:27:11,960 folder custom into a helper function file over here, such as data set dot pie or something like 14059 22:27:11,960 --> 22:27:18,040 that. And then you could call it in future code instead of rewriting it all the time. And so that's 14060 22:27:18,040 --> 22:27:23,320 just exactly what pytorch is done with taught vision dot data sets dot image folder. So we've 14061 22:27:23,320 --> 22:27:27,400 got some shapes here. And if we wanted to change the batch size, what do we do? We just change it 14062 22:27:27,400 --> 22:27:32,840 like that 64. Remember, a good batch size is also a multiple of eight, because that's going to help 14063 22:27:32,840 --> 22:27:43,640 out computing. And batch size equals one. We get a batch size equal of one. We've been through a 14064 22:27:43,640 --> 22:27:49,240 fair bit. But we've covered a very important thing. And that is loading your own data with a custom 14065 22:27:49,240 --> 22:27:54,680 data set. So generally, you will be able to load your own data with an existing data loading function 14066 22:27:54,680 --> 22:28:01,160 or data set function from one of the torch domain libraries, such as torch audio, torch text, 14067 22:28:01,160 --> 22:28:07,400 torch vision, torch rack. And later on, when it's out of beta, torch data. But if you need to create 14068 22:28:07,400 --> 22:28:13,160 your own custom one, while you can subclass torch dot utils dot data, dot data set, and then add 14069 22:28:13,160 --> 22:28:19,000 your own functionality to it. So let's keep pushing forward. Previously, we touched a little bit on 14070 22:28:19,000 --> 22:28:25,880 transforming data. And you may have heard me say that torch vision transforms can be used for data 14071 22:28:25,880 --> 22:28:33,640 augmentation. And if you haven't, that is what the documentation says here. 
But data augmentation is manipulating our images in some way, shape or form, so that we can artificially increase the diversity of our training data set. So let's have a look at that more in the next video. I'll see you there. Over the last few videos, we've created functions and classes to load in our own custom data set. And we learned that one of the biggest steps in loading a custom data set is transforming your data, particularly turning your target data into tensors. And we also had a brief look at the torchvision transforms module. And we saw that there's a fair few different ways that we can transform our data, and that one of the ways that we can transform our image data is through augmentation. And so if we went into the illustration of transforms, let's have a look at all the different ways we can do it. We've got resize, which is going to change the size of the original image. We've got center crop, which will crop. We've got five crop. We've got grayscale. We've got random transforms. We've got Gaussian blur. We've got random rotation, random affine, random crop. We could keep going. And in fact, I'd encourage you to check out all of the different options here. But oh, there's auto augment. Wonderful. There's RandAugment. This is what I was hinting at. Data augmentation. Do you notice how the original image gets augmented in different ways here? So it gets artificially changed. So it gets rotated a little here. It gets darkened a little here, or maybe brightened, depending on how you look at it. It gets shifted up here. And then the colors kind of change here. And so this process is known as data augmentation, as we've hinted at. And we're going to create another section here, which is number six, other forms of transforms. And this is data augmentation. So how could you find out about what data augmentation is? Well, you could go here. What is data augmentation? And I'm sure there's going to be plenty of resources here. Wikipedia. There we go. Data augmentation in data analysis are techniques used to increase the amount of data by adding slightly modified copies of already existing data or newly created synthetic data from existing data. So I'm going to write down here, data augmentation is the process of artificially adding diversity to your training data. Now, in the case of image data, this may mean applying various image transformations to the training images. And we saw a whole bunch of those in the torchvision transforms package. But now let's have a look at one type of data augmentation in particular.
And that is trivial 14099 22:31:35,400 --> 22:31:41,160 augment. But just to illustrate this, I've got a slide here ready to go. We've got what is data 14100 22:31:41,160 --> 22:31:47,800 augmentation. And it's looking at the same image, but from different perspectives. And we do this, 14101 22:31:47,800 --> 22:31:54,600 as I said, to artificially increase the diversity of a data set. So if we imagine our original 14102 22:31:54,600 --> 22:31:59,480 images over here on the left, and then if we wanted to rotate it, we could apply a rotation 14103 22:31:59,480 --> 22:32:04,440 transform. And then if we wanted to shift it on the vertical and the horizontal axis, 14104 22:32:04,440 --> 22:32:10,280 we could apply a shift transform. And if we wanted to zoom in on the image, we could apply 14105 22:32:10,280 --> 22:32:16,200 a zoom transform. And there are many different types of transforms. As I've got a note here, 14106 22:32:16,200 --> 22:32:20,200 there are many different kinds of data augmentation, such as cropping, replacing, 14107 22:32:20,200 --> 22:32:26,120 shearing. And this slide only demonstrates a few. But I'd like to highlight another type of data 14108 22:32:26,120 --> 22:32:34,360 augmentation. And that is one used to recently train pytorch torch vision image models to state 14109 22:32:34,360 --> 22:32:42,680 of the art levels. So let's take a look at one particular type of data augmentation, 14110 22:32:43,880 --> 22:32:51,800 used to train pytorch vision models to state of the art levels. 14111 22:32:54,440 --> 22:32:59,240 Now, just in case you're not sure why we might do this, we would like to increase 14112 22:32:59,240 --> 22:33:06,520 the diversity of our training data so that our images become harder for our model to learn. Or 14113 22:33:06,520 --> 22:33:11,880 it gets a chance to view the same image from different perspectives so that when you use your 14114 22:33:11,880 --> 22:33:18,520 image classification model in practice, it's seen the same sort of images, but from many different 14115 22:33:18,520 --> 22:33:23,880 angles. So hopefully it learns patterns that are generalizable to those different angles. 14116 22:33:23,880 --> 22:33:35,720 So this practice, hopefully, results in a model that's more generalizable to unseen data. 14117 22:33:36,920 --> 22:33:48,280 And so if we go to torch vision, state of the art, here we go. So this is a recent blog post 14118 22:33:48,280 --> 22:33:51,960 by the pytorch team, how to train state of the art models, which is what we want to do, 14119 22:33:51,960 --> 22:33:57,240 state of the art means best in business, otherwise known as soda. You might see this acronym quite 14120 22:33:57,240 --> 22:34:03,160 often using torch visions latest primitives. So torch vision is the package that we've been 14121 22:34:03,160 --> 22:34:08,680 using to work with vision data. And torch vision has a bunch of primitives, which are, 14122 22:34:08,680 --> 22:34:16,280 in other words, functions that help us train really good performing models. So blog post here. 14123 22:34:16,280 --> 22:34:23,560 And if we jump into this blog post and if we scroll down, we've got some improvements here. 14124 22:34:23,560 --> 22:34:28,360 So there's an original ResNet 50 model. ResNet 50 is a common computer vision architecture. 14125 22:34:29,000 --> 22:34:35,800 So accuracy at one. So what do we have? Well, let's just say they get a boost in what the previous 14126 22:34:35,800 --> 22:34:43,880 results were. 
So if we scroll down, there is a type of data augmentation here. So if we add up 14127 22:34:43,880 --> 22:34:48,840 all of the improvements that they used, so there's a whole bunch here. Now, as your extra curriculum, 14128 22:34:48,840 --> 22:34:53,560 I'd encourage you to look at what the improvements are. You're not going to get them all the first 14129 22:34:53,560 --> 22:34:58,520 go, but that's all right. Blog posts like this come out all the time and the recipes are continually 14130 22:34:58,520 --> 22:35:04,840 changing. So even though I'm showing you this now, this may change in the future. So I just 14131 22:35:04,840 --> 22:35:09,960 scroll down to see if this table showed us what the previous results were. Doesn't look like it does. 14132 22:35:09,960 --> 22:35:15,720 Oh, no, there's the baseline. So 76 and with all these little additions, it got right up to nearly 14133 22:35:15,720 --> 22:35:21,320 81. So nearly a boost of 5% accuracy. And that's pretty good. So what we're going to have a look 14134 22:35:21,320 --> 22:35:26,280 at is trivial augment. So there's a bunch of different things such as learning rate optimization, 14135 22:35:26,280 --> 22:35:32,120 training for longer. So these are ways you can improve your model. Random erasing of image data, 14136 22:35:32,120 --> 22:35:37,720 label smoothing, you can add that as a parameter to your loss functions, such as cross entropy loss, 14137 22:35:37,720 --> 22:35:44,600 mix up and cut mix, weight decay tuning, fixed res mitigations, exponential moving average, 14138 22:35:44,600 --> 22:35:49,240 which is EMA, inference resize tuning. So there's a whole bunch of different recipe items here, 14139 22:35:49,240 --> 22:35:52,760 but we're going to focus on what we're going to break it down. Let's have a look at trivial 14140 22:35:52,760 --> 22:36:01,560 augment. So we'll come in here. Let's look at trivial augment. So if we wanted to look at 14141 22:36:01,560 --> 22:36:06,600 trivial augment, can we find it in here? Oh, yes, we can. It's right here. Trivial augment. 14142 22:36:06,600 --> 22:36:12,840 So as you'll see, if you pass an image into trivial augment, it's going to change it in a few 14143 22:36:12,840 --> 22:36:21,080 different ways. So if we go into here, let's write that down. So let's see this in action on some 14144 22:36:21,080 --> 22:36:30,040 of our own data. So we'll import from torch vision, import transforms. And we're going to create a 14145 22:36:30,040 --> 22:36:41,240 train transform, which is equal to transforms dot compose. We'll pass it in there. And this is 14146 22:36:41,240 --> 22:36:45,960 going to be very similar to what we've done before in terms of composing a transform. What do we 14147 22:36:45,960 --> 22:36:51,880 want to do? Well, let's say we wanted to resize one of our images or an image going through this 14148 22:36:51,880 --> 22:36:59,240 transform. Let's change its size to 224224, which is a common size in image classification. And 14149 22:36:59,240 --> 22:37:07,720 then it's going to go through transforms. We're going to pass in trivial augment wide. And there's 14150 22:37:07,720 --> 22:37:14,920 a parameter here, which is number of magnitude bins, which is basically a number from 0 to 31, 14151 22:37:14,920 --> 22:37:22,040 31 being the max of how intense you want the augmentation to happen. So say we, we only put this as 14152 22:37:22,040 --> 22:37:29,720 5, our augmentation would be of intensity from 0 to 5. 
And so in that case, the maximum wouldn't 14153 22:37:29,720 --> 22:37:35,480 be too intense. So if we put it to 31, it's going to be the max intensity. And what I mean by intensity 14154 22:37:35,480 --> 22:37:43,400 is say this rotation, if we go on a scale of 0 to 31, this may be a 10, whereas 31 would be 14155 22:37:43,400 --> 22:37:50,680 completely rotating. And same with all these others, right? So the lower this number, the less the 14156 22:37:50,680 --> 22:37:58,440 maximum up a bound of the applied transform will be. Then if we go transforms dot to tensor, 14157 22:37:59,800 --> 22:38:06,840 wonderful. So there we've just implemented trivial augment. How beautiful is that? That is from 14158 22:38:07,400 --> 22:38:13,880 the PyTorch torch vision transforms library. We've got trivial augment wide. And it was used 14159 22:38:13,880 --> 22:38:20,920 trivial augment to train the latest state of the art vision models in the PyTorch torch vision 14160 22:38:21,560 --> 22:38:25,960 models library or models repository. And if you wanted to look up trivial augment, how could you 14161 22:38:25,960 --> 22:38:31,320 find that? You could search it. Here is the paper if you'd like to read it. Oh, it's implemented. 14162 22:38:31,320 --> 22:38:36,120 It's actually a very, very, I would say, let's just say trivial augment. I didn't want to say 14163 22:38:36,120 --> 22:38:40,120 simple because I don't want to downplay it. Trivial augment leverages the power of randomness 14164 22:38:40,120 --> 22:38:45,480 quite beautifully. So I'll let you read more on there. I would rather try it out on our data 14165 22:38:45,480 --> 22:38:52,440 and visualize it first. Test transform. Let's go transforms compose. And you might have the 14166 22:38:52,440 --> 22:38:58,360 question of which transforms should I use with my data? Well, that's the million dollar question, 14167 22:38:58,360 --> 22:39:02,760 right? That's the same thing as asking, which model should I use for my data? There's a fair 14168 22:39:02,760 --> 22:39:09,160 few different answers there. And my best answer will be try out a few, see what work for other 14169 22:39:09,160 --> 22:39:14,680 people like we've done here by finding that trivial augment worked well for the PyTorch team. 14170 22:39:14,680 --> 22:39:19,400 Try that on your own problems. If it works well, excellent. If it doesn't work well, 14171 22:39:19,400 --> 22:39:24,760 well, you can always excuse me. We've got a spelling mistake. If it doesn't work well, 14172 22:39:24,760 --> 22:39:29,720 well, you can always set up an experiment to try something else. So let's test out our 14173 22:39:29,720 --> 22:39:34,680 augmentation pipeline. So we'll get all the image paths. We've already done this, but we're 14174 22:39:34,680 --> 22:39:39,720 going to do it anyway. Again, just to reiterate, we've covered a fair bit here. So I might just 14175 22:39:39,720 --> 22:39:46,200 rehash on a few things. We're going to get list, image path, which is our, let me just show you 14176 22:39:47,080 --> 22:39:51,720 our image path. We just want to get all of the images within this file. 14177 22:39:52,360 --> 22:39:59,960 So we'll go image path dot glob, glob together all the files and folders that match this pattern. 14178 22:39:59,960 --> 22:40:07,720 And then if we check, what do we get? We'll check the first 10. Beautiful. And then we can 14179 22:40:07,720 --> 22:40:13,160 leverage our function from the four to plot some random images, plot random images. 
14180 22:40:14,520 --> 22:40:20,280 We'll pass in or plot transformed random transformed images. That's what we want. 14181 22:40:20,280 --> 22:40:26,120 Let's see what it looks like when it goes through our trivial augment. So image paths, 14182 22:40:26,120 --> 22:40:34,040 equals image part list. This is a function that we've created before, by the way, transform equals 14183 22:40:34,040 --> 22:40:38,840 train transform, which is the transform we just created above that contains trivial augment. 14184 22:40:40,600 --> 22:40:45,080 And then we're going to put n equals three for five images. And we'll do seed equals none 14185 22:40:45,080 --> 22:40:52,200 to plot. Oh, sorry, n equals three for three images, not five. Beautiful. And we'll set the 14186 22:40:52,200 --> 22:40:58,200 seed equals none, by the way. So look at this. We've got class pizza. Now trivial augment, 14187 22:40:58,200 --> 22:41:03,480 it resized this. Now, I'm not quite sure what it did to transform it per se. Maybe it got a little 14188 22:41:03,480 --> 22:41:09,000 bit darker. This one looks like it's been the colors have been manipulated in some way, shape, 14189 22:41:09,000 --> 22:41:16,200 or form. And this one looks like it's been resized and not too much has happened to that one from 14190 22:41:16,200 --> 22:41:22,280 my perspective. So if we go again, let's have a look at another three images. So trivial augment 14191 22:41:22,280 --> 22:41:28,760 works. And what I said before, it harnesses the power of randomness. It kind of selects randomly 14192 22:41:28,760 --> 22:41:33,640 from all of these other augmentation types, and applies them at some level of intensity. 14193 22:41:34,280 --> 22:41:38,520 So all of these ones here, trivial augment is just going to select summit random, and then 14194 22:41:38,520 --> 22:41:44,280 apply them some random intensity from zero to 31, because that's what we've set on our data. 14195 22:41:44,280 --> 22:41:48,440 And of course, you can read a little bit more in the documentation, or sorry, in the paper here. 14196 22:41:49,080 --> 22:41:53,560 But I like to see it happening. So this one looks like it's been cut off over here a little bit. 14197 22:41:54,200 --> 22:41:58,760 This one again, the colors have been changed in some way, shape, or form. This one's been darkened. 14198 22:41:59,560 --> 22:42:04,440 And so do you see how we're artificially adding diversity to our training data set? So instead 14199 22:42:04,440 --> 22:42:09,800 of all of our images being this one perspective like this, we're adding a bunch of different 14200 22:42:09,800 --> 22:42:14,520 angles and telling our model, hey, you got to try and still learn these patterns, even if they've 14201 22:42:14,520 --> 22:42:20,200 been manipulated. So we'll try one more of these. So look at that one. That's pretty 14202 22:42:20,200 --> 22:42:25,000 manipulated there, isn't it? But it's still an image of stake. So that's what we're trying to 14203 22:42:25,000 --> 22:42:28,920 get our model to do is still recognize this image as an image of stake, even though it's been 14204 22:42:28,920 --> 22:42:34,360 manipulated a bit. Now, will this work or not? Hey, it might, it might not, but that's all the 14205 22:42:34,360 --> 22:42:40,760 nature of experimentation is. So play around. 
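As one block, the augmentation pipeline and the visualization call from the last few steps look roughly like this (plot_transformed_images and image_path_list come from earlier in the section; TrivialAugmentWide is available in recent versions of torchvision):

from torchvision import transforms

# Training transform with TrivialAugmentWide; 31 is the maximum augmentation intensity
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31),
    transforms.ToTensor()
])

# Test transform: no augmentation, just resize and convert to a tensor
test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

# Visualize the effect using the helper created earlier in the section
plot_transformed_images(image_paths=image_path_list,
                        transform=train_transform,
                        n=3,
                        seed=None)

Lowering num_magnitude_bins reduces the upper bound on how intense each randomly chosen augmentation can be.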
I would encourage you to go in the transforms 14206 22:42:40,760 --> 22:42:46,360 documentation like we've just done, illustrations, change this one out, trivial augment wine, 14207 22:42:46,360 --> 22:42:51,240 for another type of augmentation that you can find in here, and see what it does to some of 14208 22:42:51,240 --> 22:42:57,000 our images randomly. I've just highlighted trivial augment because it's what the PyTorch team have 14209 22:42:57,000 --> 22:43:02,440 used in their most recent blog post for their training recipe to train state-of-the-art vision 14210 22:43:02,440 --> 22:43:09,080 models. So speaking of training models, let's move forward and we've got to build our first model 14211 22:43:09,800 --> 22:43:12,200 for this section. I'll see you in the next video. 14212 22:43:15,960 --> 22:43:21,480 Welcome back. In the last video, we covered how the PyTorch team used trivial augment 14213 22:43:21,480 --> 22:43:26,280 wide, which is the latest state-of-the-art in data augmentation at the time of recording this 14214 22:43:26,280 --> 22:43:31,720 video to train their latest state-of-the-art computer vision models that are within 14215 22:43:31,720 --> 22:43:37,960 torch vision. And we saw how easily we could apply trivial augment thanks to torch vision 14216 22:43:37,960 --> 22:43:43,240 dot transforms. And we'll just see one more of those in action, just to highlight what's going on. 14217 22:43:45,720 --> 22:43:49,800 So it doesn't look like much happened to that image when we augmented, but we see this one has 14218 22:43:49,800 --> 22:43:53,720 been moved over. We've got some black space there. This one has been rotated a little, 14219 22:43:53,720 --> 22:43:59,240 and now we've got some black space there. But now's time for us to build our first 14220 22:43:59,240 --> 22:44:04,920 computer vision model on our own custom data set. So let's get started. We're going to go model zero. 14221 22:44:05,880 --> 22:44:11,080 We're going to reuse the tiny VGG architecture, which we covered in the computer vision section. 14222 22:44:11,080 --> 22:44:15,080 And the first experiment that we're going to do, we're going to build a baseline, 14223 22:44:15,080 --> 22:44:19,880 which is what we do with model zero. We're going to build it without data augmentation. 14224 22:44:19,880 --> 22:44:26,280 So rather than use trivial augment, which we've got up here, which is what the PyTorch team used 14225 22:44:26,280 --> 22:44:30,600 to train their state-of-the-art computer vision models, we're going to start by training our 14226 22:44:30,600 --> 22:44:36,120 computer vision model without data augmentation. And then so later on, we can try one to see 14227 22:44:36,680 --> 22:44:41,800 with data augmentation to see if it helps or doesn't. So let me just put a link in here, 14228 22:44:42,440 --> 22:44:48,840 CNN explainer. This is the model architecture that we covered in depth in the last section. 14229 22:44:48,840 --> 22:44:52,520 So we're not going to go spend too much time here. All you have to know is that we're going 14230 22:44:52,520 --> 22:44:58,760 to have an input of 64, 64, 3 into multiple different layers, such as convolutional layers, 14231 22:44:58,760 --> 22:45:03,480 relio layers, max pool layers. And then we're going to have some output layer that suits the 14232 22:45:03,480 --> 22:45:09,240 number of classes that we have. 
In this case, there's 10 different classes, but in our case, 14233 22:45:09,240 --> 22:45:17,800 we have three different classes, one for pizza, steak, and sushi. So let's replicate the tiny VGG 14234 22:45:17,800 --> 22:45:26,280 architecture from the CNN explainer website. And this is going to be good practice, right? 14235 22:45:26,280 --> 22:45:29,720 We're not going to spend too much time referencing their architecture. We're going to spend more 14236 22:45:29,720 --> 22:45:35,080 time coding here. But of course, before we can train a model, what do we have to do? Well, 14237 22:45:35,080 --> 22:45:43,320 let's go 7.1. We're going to create some transforms and loading data. We're going to load data for 14238 22:45:43,320 --> 22:45:51,000 model zero. Now, we could of course use some of the variables that we already have loaded. But 14239 22:45:51,000 --> 22:45:57,480 we're going to recreate them just to practice. So let's create a simple transform. And what is 14240 22:45:57,480 --> 22:46:04,840 our whole premise of loading data for model zero? We want to get our data from the data folder, 14241 22:46:05,640 --> 22:46:10,600 from pizza, steak sushi, from the training and test folders, from their respective folders, 14242 22:46:10,600 --> 22:46:15,480 we want to load these images and turn them into tenses. Now we've done this a few times now. 14243 22:46:16,120 --> 22:46:23,560 And one of the ways that we can do that is by creating a transform equals transforms dot compose. 14244 22:46:24,520 --> 22:46:32,680 And we're going to pass in, let's resize it. So transforms dot resize, we're going to resize our 14245 22:46:32,680 --> 22:46:40,520 images to be the same size as the tiny VGG architecture on the CNN explainer website. 64 14246 22:46:40,520 --> 22:46:48,360 64 three. And then we're also going to pass in another transform to tensor. So that our 14247 22:46:48,920 --> 22:46:55,480 images get resized to 64 64. And then they get converted into tenses. And particularly, 14248 22:46:55,480 --> 22:47:02,360 these values within that tensor are going to be between zero and one. So there's our transform. 14249 22:47:02,360 --> 22:47:07,000 Now we're going to load some data. If you want to pause the video here and try to load it yourself, 14250 22:47:07,000 --> 22:47:12,840 I'd encourage you to try out option one, loading image data using the image folder class, 14251 22:47:12,840 --> 22:47:20,120 and then turn that data set, that image folder data set into a data loader. So batchify it so 14252 22:47:20,120 --> 22:47:26,200 that we can use it with a pytorch model. So give that a shot. Otherwise, let's go ahead and do 14253 22:47:26,200 --> 22:47:33,800 that together. So one, we're going to load and transform data. We've done this before, 14254 22:47:33,800 --> 22:47:39,960 but let's just rehash on it what we're doing. So from torch vision import data sets, then we're 14255 22:47:39,960 --> 22:47:46,600 going to create the train data simple. And I call this simple because we're going to use at first 14256 22:47:46,600 --> 22:47:52,600 a simple transform, one with no data augmentation. And then later on for another modeling experiment, 14257 22:47:52,600 --> 22:47:58,360 we're going to create another transform one with data augmentation. So let's put this here 14258 22:47:58,360 --> 22:48:06,840 data sets image folder. And let's go the route equals the training directory. And then the 14259 22:48:06,840 --> 22:48:11,240 transform is going to be what? 
It's going to be our simple transform that we've got above. 14260 22:48:11,960 --> 22:48:17,480 And then we can put in test data simple here. And we're going to create data sets dot image 14261 22:48:17,480 --> 22:48:22,120 folder. And then we're going to pass in the route as the test directory. And we'll pass in the 14262 22:48:22,120 --> 22:48:26,760 transform is going to be the simple transform again above. So we're performing the same 14263 22:48:26,760 --> 22:48:33,080 transformation here on our training data, and on our testing data. Then what's the next step 14264 22:48:33,080 --> 22:48:42,600 we can do here? Well, we can to turn the data sets into data loaders. So let's try it out. 14265 22:48:42,600 --> 22:48:49,960 First, we're going to import OS, then from torch dot utils dot data, we're going to import data 14266 22:48:49,960 --> 22:48:58,600 loader. And then we're going to set up batch size and number of workers. So let's go batch size. 14267 22:48:58,600 --> 22:49:01,560 We're going to use a batch size of 32 for our first model. 14268 22:49:03,560 --> 22:49:09,800 Numb workers, which will be the number of excuse me, got a typo up here classic number of workers, 14269 22:49:09,800 --> 22:49:16,440 which will be the what the number of CPU cores that we dedicate towards loading our data. 14270 22:49:16,440 --> 22:49:24,200 So let's now create the data loaders. We're going to create train data loader simple, 14271 22:49:24,200 --> 22:49:32,840 which will be equal to data loader. And the data set that goes in here will be train data 14272 22:49:32,840 --> 22:49:37,880 simple. Then we can set the batch size equal to the batch size parameter that we just created, 14273 22:49:37,880 --> 22:49:43,160 or hyper parameter that is, recall a hyper parameter is something that you can set yourself. We 14274 22:49:43,160 --> 22:49:50,520 would like to shuffle the training data. And we're going to set numb workers equal to numb workers. 14275 22:49:51,240 --> 22:49:58,120 So in our case, how many calls does Google Colab have? Let's just run this. Find out how many 14276 22:49:58,120 --> 22:50:05,160 numb workers there are. I think there's going to be two CPUs. Wonderful. And then we're going to do 14277 22:50:05,160 --> 22:50:14,520 the same thing for the test data loader. Test data loader simple. We're going to go data loader. 14278 22:50:14,520 --> 22:50:19,880 We'll pass in the data set here, which is going to be the test data simple. And then we're going 14279 22:50:19,880 --> 22:50:27,720 to go batch size equals batch size. We're not going to shuffle the test data set. And then the 14280 22:50:27,720 --> 22:50:35,800 numb workers will just set it to the same thing as we've got above. Beautiful. So I hope you gave 14281 22:50:35,800 --> 22:50:41,000 that a shot, but now do you see how quickly we can get our data loaded if it's in the right format? 14282 22:50:41,640 --> 22:50:46,280 I know we spent a lot of time going through all of these steps over multiple videos and 14283 22:50:46,280 --> 22:50:51,480 writing lots of code, but this is how quickly we can get set up to load our data. We create a 14284 22:50:51,480 --> 22:50:57,000 simple transform, and then we load in and transform our data at the same time. And then we turn the 14285 22:50:57,000 --> 22:51:02,600 data sets into data loaders just like this. Now we're ready to use these data loaders with a model. 14286 22:51:03,400 --> 22:51:10,040 So speaking of models, how about we build the tiny VGG architecture in the next video? 
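Before the model, here is the data-loading setup from this step gathered into one block, roughly as narrated (train_dir and test_dir are the directory paths set up earlier in the notebook):

import os
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Simple transform: resize to 64x64 and scale pixel values to [0, 1]
simple_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

# 1. Load and transform the data with torchvision's ImageFolder
train_data_simple = datasets.ImageFolder(root=train_dir, transform=simple_transform)
test_data_simple = datasets.ImageFolder(root=test_dir, transform=simple_transform)

# 2. Turn the datasets into batched DataLoaders
BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()

train_dataloader_simple = DataLoader(train_data_simple,
                                     batch_size=BATCH_SIZE,
                                     shuffle=True,
                                     num_workers=NUM_WORKERS)
test_dataloader_simple = DataLoader(test_data_simple,
                                    batch_size=BATCH_SIZE,
                                    shuffle=False,
                                    num_workers=NUM_WORKERS)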
And in 14287 22:51:10,040 --> 22:51:15,400 fact, we've already done this in notebook number three. So if you want to refer back to the model 14288 22:51:15,400 --> 22:51:21,240 that we built there, right down here, which was model number two, if you want to refer back to 14289 22:51:21,240 --> 22:51:27,480 this section and give it a go yourself, I'd encourage you to do so. Otherwise, we'll build tiny VGG 14290 22:51:27,480 --> 22:51:36,440 architecture in the next video. Welcome back. In the last video, we got set up starting to get 14291 22:51:36,440 --> 22:51:41,560 ready to model our first custom data set. And I issued you the challenge to try and replicate 14292 22:51:41,560 --> 22:51:47,400 the tiny VGG architecture from the CNN explainer website, which we covered in notebook number 14293 22:51:47,400 --> 22:51:53,480 three. But now let's see how fast we can do that together. Hey, I'm going to write down here section 14294 22:51:53,480 --> 22:51:59,160 seven point two. And I know we've already coded this up before, but it's good practice to see what 14295 22:51:59,160 --> 22:52:07,320 it's like to build pytorch models from scratch, create tiny VGG model class. So the model is going 14296 22:52:07,320 --> 22:52:12,440 to come from here. Previously, we created our model, there would have been one big change from 14297 22:52:12,440 --> 22:52:18,600 the model that we created in section number three, which is that our model in section number three 14298 22:52:18,600 --> 22:52:24,760 used black and white images. But now the images that we have are going to be color images. So 14299 22:52:24,760 --> 22:52:30,120 there's going to be three color channels rather than one. And there might be a little bit of a 14300 22:52:30,120 --> 22:52:35,880 trick that we have to do to find out the shape later on in the classifier layer. But let's get 14301 22:52:35,880 --> 22:52:43,160 started. We've got class tiny VGG, we're going to inherit from nn.module. This is going to be 14302 22:52:44,200 --> 22:52:55,880 the model architecture copying tiny VGG from CNN explainer. And remember that it's a it's 14303 22:52:55,880 --> 22:53:00,920 quite a common practice in machine learning to find a model that works for a problem similar to 14304 22:53:00,920 --> 22:53:06,440 yours and then copy it and try it on your own problem. So I only want two underscores there. 14305 22:53:06,440 --> 22:53:12,440 We're going to initialize our class. We're going to give it an input shape, which will be an int. 14306 22:53:13,160 --> 22:53:17,960 We're going to say how many hidden units do we want, which will also be an int. And we're going 14307 22:53:17,960 --> 22:53:25,480 to have an output shape, which will be an int as well. And it's going to return something none 14308 22:53:25,480 --> 22:53:32,840 of type none. And if we go down here, we can initialize it with super dot underscore init. 14309 22:53:34,520 --> 22:53:40,520 Beautiful. And now let's create the first COM block. So COM block one, which we'll recall 14310 22:53:40,520 --> 22:53:48,920 will be this section of layers here. So COM block one, let's do an nn.sequential to do so. 14311 22:53:48,920 --> 22:53:56,440 Now we need com relu com relu max pool. So let's try this out. And then com to D. 14312 22:53:57,080 --> 22:54:04,520 The in channels is going to be the input shape of our model. The input shape parameter. 
14313 22:54:04,520 --> 22:54:09,320 The out channels is going to be the number of hidden units we have, which is from 14314 22:54:10,360 --> 22:54:15,080 Oh, I'm gonna just put enter down here input shape hidden units. We're just getting those 14315 22:54:15,080 --> 22:54:20,840 to there. Let's set the kernel size to three, which will be how big the convolving window will be 14316 22:54:20,840 --> 22:54:27,400 over our image data. There's a stride of one and the padding equals one as well. So these are the 14317 22:54:27,400 --> 22:54:34,040 similar parameters to what the CNN explainer website uses. And we're going to go and then 14318 22:54:34,840 --> 22:54:43,000 relu. And then we're going to go and then com to D. And I want to stress that even if someone 14319 22:54:43,000 --> 22:54:48,600 else uses like certain values for these, you don't have to copy them exactly. So just keep that in 14320 22:54:48,600 --> 22:54:53,960 mind. You can try out various values of these. These are all hyper parameters that you can set 14321 22:54:53,960 --> 22:55:01,880 yourself. Hidden units, out channels, equals hidden units as well. Then we're going to go kernel 14322 22:55:01,880 --> 22:55:09,240 size equals three stride equals one. And we're going to put padding equals one as well. 14323 22:55:09,240 --> 22:55:13,320 Then we're going to have another relu layer. And I believe I forgot my comma up here. 14324 22:55:15,960 --> 22:55:19,960 Another relu layer here. And we're going to finish off 14325 22:55:21,800 --> 22:55:27,880 with an N dot max pool 2D. And we're going to put in the kernel size. 14326 22:55:27,880 --> 22:55:39,160 These equals two and the stride here equals two. Wonderful. So oh, by the way, for max 14327 22:55:39,160 --> 22:55:47,240 pool 2D, the default stride value is same as the kernel size. So let's have a go here. 14328 22:55:47,720 --> 22:55:55,240 What can we do now? Well, we could just replicate this block as block two. So how about we copy this 14329 22:55:55,240 --> 22:56:00,920 down here? We've already had enough practice writing this sort of code. So we're going to 14330 22:56:00,920 --> 22:56:05,240 go comp block two, but we need to change the input shape here. The input shape of this block 14331 22:56:05,240 --> 22:56:10,200 two is going to receive the output shape here. So we need to line those up. This is going to be 14332 22:56:10,200 --> 22:56:21,480 hidden units. Hidden units. And I believe that's all we need to change there. Beautiful. So let's 14333 22:56:21,480 --> 22:56:27,240 create the classifier layer. And the classifier layer recall is going to be this output layer 14334 22:56:27,240 --> 22:56:33,240 here. So we need at some point to add a linear layer. That's going to have a number of outputs 14335 22:56:33,240 --> 22:56:37,960 equal to the number of classes that we're working with. And in this case, the number of classes is 14336 22:56:37,960 --> 22:56:45,320 10. But in our case, our custom data set, we have three classes, pizza, steak, sushi. So let's 14337 22:56:45,320 --> 22:56:51,320 create a classifier layer, which will be an end sequential. And then we're going to pass in an end 14338 22:56:51,320 --> 22:56:57,880 dot flatten to turn the outputs of our convolutional blocks into feature vector into a feature vector 14339 22:56:57,880 --> 22:57:03,880 site. And then we're going to have an end dot linear. And the end features, do you remember my 14340 22:57:03,880 --> 22:57:09,400 trick for calculating the shape in features? 
I'm going to put hidden units here for the time being. 14341 22:57:09,400 --> 22:57:16,520 Out features is going to be output shape. So I put hidden units here for the time being because 14342 22:57:16,520 --> 22:57:22,520 we don't quite yet know what the output shape of all of these operations is going to be. Of course, 14343 22:57:22,520 --> 22:57:28,040 we could calculate them by hand by looking up the formula for input and output shapes of convolutional 14344 22:57:28,040 --> 22:57:34,520 layers. So the input and output shapes are here. But I prefer to just do it programmatically and let 14345 22:57:34,520 --> 22:57:40,920 the errors tell me where I'm wrong. So we can do that by doing a forward pass. And speaking of a 14346 22:57:40,920 --> 22:57:45,880 forward pass, let's create a forward method, because every time we have to subclass an end 14347 22:57:45,880 --> 22:57:51,480 dot module, we have to override the forward method. We've done this a few times. But as you can see, 14348 22:57:51,480 --> 22:57:57,960 I'm picking up the pace a little bit because you've got this. So let's pass in the conv block one, 14349 22:57:57,960 --> 22:58:03,480 we're going to go X, then we're going to print out x dot shape. And then we're going to reassign 14350 22:58:03,480 --> 22:58:10,120 X to be self.com block two. So we're passing it through our second block of convolutional layers, 14351 22:58:10,120 --> 22:58:15,480 print X dot shape to check the shape here. Now this is where our model will probably error 14352 22:58:15,480 --> 22:58:20,760 is because the input shape here isn't going to line up in features, hidden units, because we've 14353 22:58:20,760 --> 22:58:26,600 passed all of the output of what's going through comp block one, comp block two to a flatten layer, 14354 22:58:26,600 --> 22:58:32,040 because we want a feature vector to go into our nn.linear layer, our output layer, which has an 14355 22:58:32,040 --> 22:58:38,520 out features size of output shape. And then we're going to return X. So I'm going to print x dot 14356 22:58:38,520 --> 22:58:43,240 shape here. And I just want to let you in on one little secret as well. We haven't covered this 14357 22:58:43,240 --> 22:58:48,600 before, but we could rewrite this entire forward method, this entire stack of code, 14358 22:58:48,600 --> 22:58:55,320 by going return self dot classifier, and then going from the outside in. So we could pass in 14359 22:58:55,320 --> 22:59:03,240 comp block two here, comp block two, and then self comp block one, and then X on the inside. 14360 22:59:03,960 --> 22:59:09,720 So that is essentially the exact same thing as what we've done here, except this is going to 14361 22:59:10,520 --> 22:59:17,560 benefits from operator fusion. Now this topic is beyond the scope of this course, 14362 22:59:17,560 --> 22:59:22,840 essentially, all you need to know is that operator fusion behind the scenes speeds up 14363 22:59:22,840 --> 22:59:27,960 how your GPU performs computations. So all of these are going to happen in one step, 14364 22:59:27,960 --> 22:59:33,640 rather than here, we are reassigning X every time we make a computation through these layers. 14365 22:59:33,640 --> 22:59:40,280 So we're spending time going from computation back to memory, computation back to memory, 14366 22:59:40,280 --> 22:59:44,440 whereas this kind of just chunks it all together in one hit. 
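For reference, here is a minimal sketch of the TinyVGG class described above. The layer and attribute names are assumptions based on the narration, and the classifier's in_features of hidden_units * 16 * 16 only holds for 64x64 inputs with padding=1; the transcript arrives at that number a little later via a test forward pass.

import torch
from torch import nn

class TinyVGG(nn.Module):
    """Sketch of the TinyVGG architecture from the CNN explainer website."""
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) -> None:
        super().__init__()
        self.conv_block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, out_channels=hidden_units,
                      kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, out_channels=hidden_units,
                      kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.conv_block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # in_features assumes 64x64 inputs: 64 -> 32 -> 16 after the two max pools
            nn.Linear(in_features=hidden_units * 16 * 16, out_features=output_shape)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One fused-style expression rather than reassigning x layer by layer
        # (this is the version that benefits from operator fusion)
        return self.classifier(self.conv_block_2(self.conv_block_1(x)))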
If you'd like to read 14367 22:59:44,440 --> 22:59:49,880 more about this, I'd encourage you to look up the blog post, Making Deep Learning Go Brrrr 14368 22:59:49,880 --> 22:59:58,600 From First Principles, and brrr means fast. That's why I love this post, right? 14369 22:59:58,600 --> 23:00:04,440 Because it's half satire, half legitimate GPU computer science. So if you go in here, 14370 23:00:04,440 --> 23:00:08,520 yeah, here's what we want to avoid. We want to avoid all of this transportation between 14371 23:00:08,520 --> 23:00:14,840 memory and compute. And then if we look in here, we might have operator fusion. There we go. 14372 23:00:14,840 --> 23:00:20,680 This is operator fusion, the most important optimization in deep learning compilers. So 14373 23:00:20,680 --> 23:00:25,640 I will link this, Making Deep Learning Go Brrrr From First Principles by Horace He, 14374 23:00:25,640 --> 23:00:31,400 a great blog post that I really like, right here. So if you'd like to read more on that, 14375 23:00:31,400 --> 23:00:35,800 it's also going to be in the extracurricular section of the course. So don't worry, it'll be there. 14376 23:00:35,800 --> 23:00:43,080 Now, we've got a model. Oh, where did we forget a comma? Right here, of course we did. 14377 23:00:47,000 --> 23:00:51,240 And we forgot another comma up here. Did you notice these? 14378 23:00:53,080 --> 23:00:59,480 Beautiful. Okay. So now we can create our model by creating an instance of the TinyVGG class 14379 23:00:59,480 --> 23:01:07,000 to see if our model holds up. Let's create model zero equals TinyVGG. And I'm going to pass in 14380 23:01:07,000 --> 23:01:11,000 the input shape. What is the input shape? It's going to be the number of color channels of our 14381 23:01:11,000 --> 23:01:17,640 image. So number of color channels in our image data, which is three, because we have color images. 14382 23:01:19,400 --> 23:01:24,280 And then we're going to put in hidden units equals 10, which will be the same number of 14383 23:01:24,280 --> 23:01:31,720 hidden units as the tiny VGG architecture. One, two, three, four, five, six, seven, eight, nine, 14384 23:01:31,720 --> 23:01:39,080 10. Again, we could put in 10, we could put in 100, we could put in 64, which is a good multiple 14385 23:01:39,080 --> 23:01:44,120 of eight. So let's just leave it at 10 for now. And then the output shape is going to be what? 14386 23:01:44,680 --> 23:01:49,960 It's going to be the length of our class names, because we want one output unit 14387 23:01:49,960 --> 23:01:55,800 per class. And then we're going to send it to the target device, which is of course CUDA. And then 14388 23:01:55,800 --> 23:02:05,080 we can check out our model zero here. Beautiful. So that took a few seconds, as you saw there, 14389 23:02:05,080 --> 23:02:09,160 to move to the GPU memory. So that's just something to keep in mind for when you build 14390 23:02:09,160 --> 23:02:14,200 large neural networks: moving a model from the CPU, which is the default device, to the GPU 14391 23:02:14,200 --> 23:02:19,480 takes a few seconds, and if you want to speed up its computations once it's there, techniques 14392 23:02:19,480 --> 23:02:26,680 like operator fusion can help. So we've got our architecture here. But of course, we know that 14393 23:02:26,680 --> 23:02:32,680 this potentially is wrong. And how would we find that out?
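Before answering that, here is roughly what the instantiation just described looks like in code; class_names and device are assumed to come from earlier in the notebook (the ImageFolder class names and the "cuda"/"cpu" target device).

# Instantiate TinyVGG and send it to the target device
model_0 = TinyVGG(input_shape=3,                   # number of colour channels (RGB)
                  hidden_units=10,                 # same as the CNN explainer
                  output_shape=len(class_names)    # one output unit per class
                  ).to(device)
model_0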
Well, we could find the right hidden 14394 23:02:32,680 --> 23:02:38,440 unit shape or we could find that it's wrong by passing some dummy data through our model. So 14395 23:02:38,440 --> 23:02:43,560 that's one of my favorite ways to troubleshoot a model. Let's in the next video pass some dummy 14396 23:02:43,560 --> 23:02:49,240 data through our model and see if we've implemented the forward pass correctly. And also check the 14397 23:02:49,240 --> 23:03:00,040 input and output shapes of each of our layers. I'll see you there. In the last video, we replicated 14398 23:03:00,040 --> 23:03:05,960 the tiny VGG architecture from the CNN explainer website, very similar to the model that we built 14399 23:03:05,960 --> 23:03:13,320 in section 03. But this time, we're using color images instead of grayscale images. And we did 14400 23:03:13,320 --> 23:03:18,440 it quite a bit faster than what we previously did, because we've already covered it, right? 14401 23:03:18,440 --> 23:03:21,960 And you've had some experience now building pilotage models from scratch. 14402 23:03:21,960 --> 23:03:28,680 So we're going to pick up the pace when we build our models. But let's now go and try a dummy 14403 23:03:28,680 --> 23:03:34,680 forward pass to check that our forward method is working correctly and that our input and output 14404 23:03:34,680 --> 23:03:42,440 shapes are correct. So let's create a new heading. Try a forward pass on a single image. And this 14405 23:03:42,440 --> 23:03:51,720 is one of my favorite ways to test the model. So let's first get a single image. Get a single 14406 23:03:51,720 --> 23:03:58,440 image. We want an image batch. Maybe we get an image batch, get a single image batch, because 14407 23:03:58,440 --> 23:04:07,720 we've got images that are batches already image batch. And then we'll get a label batch. And we'll 14408 23:04:07,720 --> 23:04:15,320 go next, it a train data loader. Simple. That's the data loader that we're working with for now. 14409 23:04:16,600 --> 23:04:22,040 And then we'll check image batch dot shape and label batch dot shape. 14410 23:04:25,480 --> 23:04:29,720 Wonderful. And now let's see what happens. Try a forward pass. 14411 23:04:29,720 --> 23:04:38,760 Oh, I spelled single wrong up here. Try a forward pass. We could try this on a single image trying 14412 23:04:38,760 --> 23:04:45,640 it on a same batch will result in similar results. So let's go model zero. And we're just going to 14413 23:04:45,640 --> 23:04:54,360 pass it in the image batch and see what happens. Oh, no. Of course, we get that input type, 14414 23:04:54,360 --> 23:05:00,120 torch float tensor and wait type torch CUDA float tensor should be the same or input should be. 14415 23:05:00,760 --> 23:05:06,520 So we've got tensors on a different device, right? So this is on the CPU, the image batch, 14416 23:05:06,520 --> 23:05:12,360 whereas our model is, of course, on the target device. So we've seen this error a number of times. 14417 23:05:12,360 --> 23:05:20,360 Let's see if this fixes it. Oh, we get an other error. And we kind of expected this type of error. 14418 23:05:20,360 --> 23:05:26,680 We've got runtime error amount one and mat two shapes cannot be multiplied. 32. So that looks 14419 23:05:26,680 --> 23:05:35,560 like the batch size 2560 and 10. Hmm, what is 10? Well, recall that 10 is the number of hidden 14420 23:05:35,560 --> 23:05:41,640 units that we have. So this is the size here. That's 10 there. 
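Before untangling that error, here is a sketch of the dummy forward pass being described. train_dataloader_simple is an assumed name for the non-augmented train dataloader from earlier, and the .to(device) call is the fix for the CPU/CUDA mismatch mentioned above.

# Get a single batch of images and labels
image_batch, label_batch = next(iter(train_dataloader_simple))
print(image_batch.shape, label_batch.shape)   # e.g. torch.Size([32, 3, 64, 64]), torch.Size([32])

# Try a forward pass (the data must live on the same device as the model)
output = model_0(image_batch.to(device))
print(output.shape)   # expect [32, 3] once the classifier's in_features is fixed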
So it's trying to multiply 14421 23:05:42,280 --> 23:05:48,920 a matrix of this size by this size. So 10 has got something going on with it. We need to get 14422 23:05:48,920 --> 23:05:53,960 these two numbers, the middle numbers, to satisfy the rules of matrix multiplication, 14423 23:05:53,960 --> 23:05:58,120 because that's what happens in our linear layer. We need to get these two numbers the same. 14424 23:06:00,040 --> 23:06:07,080 And so our hint and my trick is to look at the previous layer. So if that's our batch size, 14425 23:06:07,080 --> 23:06:15,240 where does this value come from? Well, could it be the fact that a tensor of this size goes 14426 23:06:15,240 --> 23:06:22,360 through the flatten layer? Recall that we have this layer up here. So we've printed out the shape 14427 23:06:22,360 --> 23:06:29,720 here of the conv block, the output of conv block one. Now this shape here is the output of conv 14428 23:06:29,720 --> 23:06:36,360 block two. So we've got this number, the output of conv block one, and then the output of conv 14429 23:06:36,360 --> 23:06:43,560 block two. So that must be the input to our classifier layer. So if we go 10 times 16 times 16, 14430 23:06:43,560 --> 23:06:55,000 what do we get? 2560. Beautiful. So we can multiply our hitting units 10 by 16 by 16, which is the 14431 23:06:55,000 --> 23:07:04,440 shape here. And we get 2560. Let's see if that works. We'll go up here, times 16 times 16. 14432 23:07:05,080 --> 23:07:10,280 And let's see what happens. We'll rerun the model, we'll rerun the image batch, and then we'll pass 14433 23:07:10,280 --> 23:07:16,920 it. Oh, look at that. Our model works. Or the shapes at least line up. We don't know if it works 14434 23:07:16,920 --> 23:07:22,280 yet. We haven't started training yet. But this is the output size. We've got the output. It's on 14435 23:07:22,280 --> 23:07:27,640 the CUDA device, of course. But we've got 32 samples with three numbers in each. Now these are going 14436 23:07:27,640 --> 23:07:32,920 to be as good as random, because we haven't trained our model yet. We've only initialized it here 14437 23:07:32,920 --> 23:07:41,560 with random weights. So we've got 32 or a batch worth of random predictions on 32 images. 14438 23:07:42,360 --> 23:07:46,920 So you see how the output shape here three corresponds to the output shape we set up here. 14439 23:07:47,640 --> 23:07:52,280 Output shape equals length class names, which is exactly the number of classes that we're dealing 14440 23:07:52,280 --> 23:07:59,880 with. But I think our number is a little bit different to what's in the CNN explainer 1616. 14441 23:07:59,880 --> 23:08:07,240 How did they end up with 1313? You know what? I think we got one of these numbers wrong, 14442 23:08:07,240 --> 23:08:13,640 kernel size, stride, padding. Let's have a look. Jump into here. If we wanted to truly replicate it, 14443 23:08:14,680 --> 23:08:20,520 is there any padding here? I actually don't think there's any padding here. So what if we go back 14444 23:08:20,520 --> 23:08:27,560 here and see if we can change this to zero and change this to zero? Zero. I'm not sure if this 14445 23:08:27,560 --> 23:08:31,800 will work, by the way. If it doesn't, it's not too bad, but we're just trying to line up the shapes 14446 23:08:31,800 --> 23:08:38,920 with the CNN explainer to truly replicate it. So the output of the COM Block 1 should be 30-30-10. 14447 23:08:38,920 --> 23:08:46,200 What are we working with at the moment? We've got 32-32-10. 
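For reference, the output size of each layer can also be checked by hand with the usual formula, output_size = floor((input_size - kernel_size + 2 * padding) / stride) + 1. A quick sketch (assuming square 64x64 inputs and the kernel/stride values used above) shows where the 16x16 and 13x13 feature maps come from:

def conv_out(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output spatial size of a conv layer (the same formula applies to max pooling)."""
    return (size - kernel_size + 2 * padding) // stride + 1

for padding in (1, 0):
    s = 64                                   # 64x64 input images
    for _ in range(2):                       # two conv blocks
        s = conv_out(s, 3, 1, padding)       # first conv in the block
        s = conv_out(s, 3, 1, padding)       # second conv in the block
        s = conv_out(s, 2, 2, 0)             # max pool, kernel_size=2, stride=2
    print(f"padding={padding}: final feature map is {s}x{s}")

# padding=1 -> 16x16, so in_features = hidden_units * 16 * 16 = 2560
# padding=0 -> 13x13, so in_features = hidden_units * 13 * 13 = 1690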
So let's see if removing the padding 14448 23:08:46,200 --> 23:08:51,960 from our convolutional layers lines our shape up with the CNN explainer. So I'm going to rerun 14449 23:08:51,960 --> 23:08:57,320 this, rerun our model. I've set the padding to zero on all of our padding hyper parameters. 14450 23:08:58,040 --> 23:09:02,600 Oh, and we get another error. We get another shape error. Of course we do, 14451 23:09:02,600 --> 23:09:09,080 because we've now got different shapes. Wow, do you see how often that these errors come up? 14452 23:09:10,200 --> 23:09:15,080 Trust me, I spend a lot of time troubleshooting these shape errors. So we now have to line up 14453 23:09:15,080 --> 23:09:23,080 these shapes. So we've got 13-13-10. Now does that equal 16-90? Let's try it out. 13-13-10. 14454 23:09:24,120 --> 23:09:30,760 16-90. Beautiful. And do our shapes line up with the CNN explainer? So we've got 30-30-10. 14455 23:09:30,760 --> 23:09:36,520 Remember, these are in PyTorch. So color channels first, whereas this is color channels last. So 14456 23:09:36,520 --> 23:09:41,640 yeah, we've got the output of our first COM Block is lining up here. That's correct. 14457 23:09:41,640 --> 23:09:46,200 And then same with the second block. How good is that? We've officially replicated the CNN explainer 14458 23:09:46,200 --> 23:09:55,240 model. So we can take this value 13-13-10 and bring it back up here. 13-13-10. Remember, 14459 23:09:55,240 --> 23:09:59,640 hidden units is 10. So we're just going to multiply it by 13-13. You could calculate 14460 23:09:59,640 --> 23:10:05,160 these shapes by hand, but my trick is I like to let the error codes give me a hint of where to go. 14461 23:10:05,160 --> 23:10:15,160 And boom, there we go. We get it working again. Some shape troubleshooting on the fly. So now 14462 23:10:15,160 --> 23:10:20,440 we've done a single forward pass on the model. We can kind of verify that our data at least flows 14463 23:10:20,440 --> 23:10:27,240 through it. What's next? Well, I'd like to show you another little package that I like to use 14464 23:10:27,240 --> 23:10:33,480 to also have a look at the input and output shapes of my model. And that is called Torch Info. So 14465 23:10:33,480 --> 23:10:39,400 you might want to give this a shot before we go into the next video. But in the next video, 14466 23:10:39,400 --> 23:10:44,280 we're going to see how we can use Torch Info to print out a summary of our model. So we're 14467 23:10:44,280 --> 23:10:51,080 going to get something like this. So this is how beautifully easy Torch Info is to use. So 14468 23:10:51,080 --> 23:10:57,160 give that a shot, install it into Google CoLab and run it in a cell here. See if you can get 14469 23:10:57,160 --> 23:11:03,800 something similar to this output for our model zero. And I'll see you in the next video. We'll try 14470 23:11:03,800 --> 23:11:13,880 that together. In the last video, we checked our model by doing a forward pass on a single batch. 14471 23:11:13,880 --> 23:11:18,920 And we learned that our forward method so far looks like it's intact and that we don't get any 14472 23:11:18,920 --> 23:11:24,440 shape errors as our data moves through the model. But I'd like to introduce to you one of my 14473 23:11:24,440 --> 23:11:31,480 favorite packages for finding out information from a PyTorch model. And that is Torch Info. 14474 23:11:31,480 --> 23:11:40,200 So let's use Torch Info to get an idea of the shapes going through our model. 
So you know how 14475 23:11:40,200 --> 23:11:46,040 much I love doing things in a programmatic way? Well, that's what Torch Info does. Before, 14476 23:11:46,040 --> 23:11:50,200 we used print statements to find out the different shapes going through our model. 14477 23:11:50,200 --> 23:11:54,920 And I'm just going to comment these out in our forward method so that when we run this later on 14478 23:11:54,920 --> 23:12:00,840 during training, we don't get excessive printouts of all the shapes. So let's see what Torch Info 14479 23:12:00,840 --> 23:12:06,520 does. And in the last video, I issued a challenge to give it a go. It's quite straightforward of 14480 23:12:06,520 --> 23:12:11,240 how to use it. But let's see it together. This is the type of output we're looking for from our 14481 23:12:11,240 --> 23:12:16,840 tiny VGG model. And of course, you could get this type of output from almost any PyTorch model. 14482 23:12:16,840 --> 23:12:23,080 But we have to install it first. And as far as I know, Google CoLab doesn't come with Torch Info 14483 23:12:23,080 --> 23:12:29,800 by default. Now, you might as well try this in the future and see if it works. But yeah, I don't 14484 23:12:29,800 --> 23:12:35,800 get this module because my Google CoLab instance doesn't have an install. No problem with that. 14485 23:12:35,800 --> 23:12:45,400 Let's install Torch Info here. Install Torch Info and then we'll import it if it's available. 14486 23:12:45,400 --> 23:12:51,320 So we're going to try and import Torch Info. If it's already installed, we'll import it. 14487 23:12:51,320 --> 23:12:58,840 And then if it doesn't work, if that try block fails, we're going to run pip install Torch Info. 14488 23:12:58,840 --> 23:13:06,440 And then we will import Torch Info. And then we're going to run down here from Torch Info, 14489 23:13:06,440 --> 23:13:12,760 import summary. And then if this all works, we're going to get a summary of our model. We're going 14490 23:13:12,760 --> 23:13:19,240 to pass it in model zero. And we have to put in an input size here. Now that is an example of the 14491 23:13:19,240 --> 23:13:24,600 size of data that will flow through our model. So in our case, let's put in an input size of 1, 14492 23:13:25,160 --> 23:13:32,440 3, 64, 64. So this is an example of putting in a batch of one image. You could potentially 14493 23:13:32,440 --> 23:13:37,560 put in 32 here if you wanted, but let's just put in a batch of a singular image. And of course, 14494 23:13:37,560 --> 23:13:43,560 we could change these values here if we wanted to, 24 to 24. But what you might notice is that if 14495 23:13:43,560 --> 23:13:50,440 it doesn't get the right input size, it produces an error. There we go. So just like we got before 14496 23:13:50,440 --> 23:13:55,400 when we printed out our input sizes manually, we get an error here. Because what Torch Info 14497 23:13:55,400 --> 23:14:00,280 behind the scenes is going to do is it's going to do a forward pass on whichever model you pass 14498 23:14:00,280 --> 23:14:06,360 it with an input size of whichever input size you give it. So let's put in the input size that 14499 23:14:06,360 --> 23:14:15,800 our model was built for. Wonderful. So what Torch Info gives us is, oh, excuse me, we didn't 14500 23:14:15,800 --> 23:14:22,680 comment out the printouts before. So just make sure we've commented out these printouts in the 14501 23:14:22,680 --> 23:14:29,240 forward method of our 20 VGG class. 
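As a sketch, the install-then-import pattern and the summary call described above look roughly like this in a Colab cell (torchinfo's summary does a test forward pass with whatever input_size you give it):

# Try to import torchinfo; install it first if it isn't available (Colab/notebook cell)
try:
    import torchinfo
except ImportError:
    !pip install torchinfo
    import torchinfo

from torchinfo import summary

# Pass a batch of one 3x64x64 image through the model and print a per-layer summary
summary(model_0, input_size=(1, 3, 64, 64))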
So I'm just going to run this, then we run that, run that, 14502 23:14:29,240 --> 23:14:35,080 just to make sure everything still works. We'll run Torch Info. There we go. So no printouts 14503 23:14:35,080 --> 23:14:40,360 from our model, but this is, look how beautiful this is. I love how this prints out. So we have 14504 23:14:40,360 --> 23:14:46,600 our tiny VGG class, and then we can see it's comprised of three sequential blocks. And then 14505 23:14:46,600 --> 23:14:51,480 inside those sequential blocks, we have different combinations of layers. We have some conv layers, 14506 23:14:52,040 --> 23:14:57,720 some relu layers, some max pool layers. And then the final layer is our classification layer 14507 23:14:57,720 --> 23:15:03,720 with a flatten and a linear layer. And we can see the shapes changing throughout our model. 14508 23:15:03,720 --> 23:15:10,760 As our data goes in and gets manipulated by the various layers. So are these in line with 14509 23:15:10,760 --> 23:15:15,960 the CNN explainer? So if we check this last one, we've already verified this before. 14510 23:15:17,320 --> 23:15:23,560 And we also get some other helpful information down here, which is total params. So you can see 14511 23:15:23,560 --> 23:15:29,400 that each of these layers has a different amount of parameters to learn. Now, recall that a parameter 14512 23:15:29,400 --> 23:15:36,280 is a value such as a weight or a bias term within each of our layers, which starts off as a random 14513 23:15:36,280 --> 23:15:42,120 number. And the whole goal of deep learning is to adjust those random numbers to better represent 14514 23:15:42,120 --> 23:15:49,000 our data. So in our case, we have just over 8000 total parameters. Now this is actually quite small. 14515 23:15:49,800 --> 23:15:53,880 In the future, you'll probably play around with models that have a million parameters or more. 14516 23:15:53,880 --> 23:16:00,280 And models now are starting to have many billions of parameters. And we also get some 14517 23:16:00,280 --> 23:16:04,840 information here, such as how much the model size would be. Now this would be very helpful, 14518 23:16:05,400 --> 23:16:09,880 depending on where we had to put our model. So what you'll notice is that as a model gets larger, 14519 23:16:10,520 --> 23:16:15,640 as more layers, it will have more parameters, more weights and bias terms that can be adjusted 14520 23:16:15,640 --> 23:16:23,000 to learn patterns and data. But its input size and its estimated total size would definitely get 14521 23:16:23,000 --> 23:16:27,880 bigger as well. So that's just something to keep in mind if you have size constraints in terms of 14522 23:16:27,880 --> 23:16:34,600 storage in your future applications. So ours is under a megabyte, which is quite small. But you 14523 23:16:34,600 --> 23:16:39,720 might find that some models in the future get up to 500 megabytes, maybe even over a gigabyte. 14524 23:16:39,720 --> 23:16:44,600 So just keep that in mind for going forward. And that's the crux of torch info, one of my 14525 23:16:44,600 --> 23:16:49,640 favorite packages, just gives you an idea of the input and output shapes of each of your layers. 14526 23:16:49,640 --> 23:16:54,760 So you can use torch info wherever you need. It should work with most of your PyTorch models. 14527 23:16:54,760 --> 23:17:00,280 Just be sure to pass it in the right input size. You can also use it to verify like we did before, 14528 23:17:00,280 --> 23:17:06,520 if the input and output shapes are correct. 
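If you only want the raw numbers rather than the full table, a small snippet (an assumption, not from the video) gives the same "Total params" figure, which should come out at just over 8,000 for this configuration:

# Count every weight and bias value in the model
total_params = sum(p.numel() for p in model_0.parameters())
trainable_params = sum(p.numel() for p in model_0.parameters() if p.requires_grad)
print(f"Total params: {total_params} | Trainable params: {trainable_params}")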
So check that out, big shout out to Tyler Yup, 14529 23:17:06,520 --> 23:17:13,640 and everyone who's created the torch info package. Now in the next video, let's move towards training 14530 23:17:13,640 --> 23:17:19,000 our tiny VGG model. We're going to have to create some training and test functions. If you want to 14531 23:17:19,000 --> 23:17:26,920 jump ahead, we've already done this. So I encourage you to go back to section 6.2 in the 14532 23:17:26,920 --> 23:17:31,880 functionalizing training and test loops. And we're going to build functions very similar to this, 14533 23:17:31,880 --> 23:17:38,520 but for our custom data set. So if you want to replicate these functions in this notebook, 14534 23:17:39,160 --> 23:17:43,240 give that a go. Otherwise, I'll see you in the next video and we'll do it together. 14535 23:17:43,240 --> 23:17:53,000 How'd you go? Did you give it a shot? Did you try replicating the train step and the test step 14536 23:17:53,000 --> 23:17:58,760 function? I hope you did. Otherwise, let's do that in this video, but this time we're going to do 14537 23:17:58,760 --> 23:18:04,280 it for our custom data sets. And what you'll find is not much, if anything, changes, because 14538 23:18:04,280 --> 23:18:10,520 we've created our train and test loop functions in such a way that they're generic. So we want 14539 23:18:10,520 --> 23:18:16,360 to create a train step function. And by generic, I mean they can be used with almost any model and 14540 23:18:16,360 --> 23:18:26,520 data loader. So train step is takes in a model and data loader and trains the model on the data 14541 23:18:26,520 --> 23:18:33,320 loader. And we also want to create another function called test step, which takes in 14542 23:18:33,320 --> 23:18:42,040 a model and a data loader and other things and evaluates the model on the data loader. And of course, 14543 23:18:42,040 --> 23:18:47,080 for the train step and for the test step, each of them respectively are going to take a training 14544 23:18:47,080 --> 23:18:53,720 data loader. I just might make this a third heading so that our outline looks nice, beautiful. 14545 23:18:54,360 --> 23:18:58,520 Section seven is turning out to be quite a big section. Of course, we want them to be 14546 23:18:58,520 --> 23:19:04,360 respectively taken their own data loader. So train takes in the train data loader, test takes in the 14547 23:19:04,360 --> 23:19:09,800 test data loader. Without any further ado, let's create the train step function. Now we've seen 14548 23:19:09,800 --> 23:19:15,880 this one in the computer vision section. So let's see what we can make here. So we need a train 14549 23:19:15,880 --> 23:19:21,880 step, which is going to take in a model, which will be a torch and then dot module. And we want 14550 23:19:21,880 --> 23:19:28,680 it also to take in a data loader, which will be a torch dot utils dot data dot data loader. 14551 23:19:29,960 --> 23:19:34,120 And then it's going to take in a loss function, which is going to be a torch and then 14552 23:19:34,120 --> 23:19:41,720 dot module as well. And then it's going to take in an optimizer, which is going to be torch 14553 23:19:41,720 --> 23:19:49,400 opt in dot optimizer. Wonderful. And then what do we do? What's the first thing that we do in 14554 23:19:49,400 --> 23:19:57,560 a training step? Well, we put the model in train mode. So let's go model dot train. 14555 23:19:58,840 --> 23:20:04,600 Then what shall we do next? 
Well, let's set up some evaluation metrics, one of them being loss 14556 23:20:04,600 --> 23:20:13,480 and one of them being accuracy. So set up train loss and train accuracy values. And we're going 14557 23:20:13,480 --> 23:20:18,360 to accumulate these per batch because we're working with batches. So we've got train loss 14558 23:20:18,360 --> 23:20:27,560 and train act equals zero, zero. Now we can loop through our data loader. So let's write loop through 14559 23:20:28,600 --> 23:20:33,240 data loader. And we'll loop through each of the batches in this because we've batchified our 14560 23:20:33,240 --> 23:20:42,520 data loader. So for batch x, y, in enumerate data loader, we want to send the data to the target 14561 23:20:42,520 --> 23:20:52,280 device. So we could even put that device parameter up here. Device equals device. We'll set that 14562 23:20:52,280 --> 23:21:04,440 to device by default. And then we can go x, y equals x dot two device. And y dot two device. 14563 23:21:05,880 --> 23:21:11,960 Beautiful. And now what do we do? Well, remember the pie torch, the unofficial pie torch optimization 14564 23:21:11,960 --> 23:21:22,040 song, we do the forward pass. So y pred equals model om x. And then number two is we calculate the 14565 23:21:22,040 --> 23:21:32,280 last. So calculate the loss. Let's go loss equals loss function. And we're going to pass it in 14566 23:21:32,280 --> 23:21:37,560 y pred y. We've done this a few times now. So that's why we're doing it a little bit faster. 14567 23:21:37,560 --> 23:21:42,520 So I hope you noticed that the things that we've covered before, I'm stepping up the pace a bit. 14568 23:21:42,520 --> 23:21:47,880 So it might be a bit of a challenge, but that's all right, you can handle it. And then, so that's 14569 23:21:47,880 --> 23:21:53,960 accumulating the loss. So we're starting from zero up here. And then each batch, we're doing a forward 14570 23:21:53,960 --> 23:22:00,200 pass, calculating the loss, and then adding it to the overall train loss. And so we're going to 14571 23:22:00,200 --> 23:22:07,640 optimize a zero grad. So zero, the gradients of the optimizer for each new batch. And then we're 14572 23:22:07,640 --> 23:22:16,040 going to perform back propagation. So loss backwards. And then five, what do we do? Optimize a step, 14573 23:22:16,040 --> 23:22:22,600 step, step. Wonderful. Look at that. Look at us coding a train loop in a minute or so. 14574 23:22:22,600 --> 23:22:31,640 Now, let's calculate the accuracy and accumulate it. Calculate the, you notice that we don't have 14575 23:22:31,640 --> 23:22:39,560 an accuracy function here. That's because accuracy is quite a straightforward metric to calculate. 14576 23:22:39,560 --> 23:22:46,280 So we'll first get the, the y pred class, because this is going to output model logits. 14577 23:22:46,280 --> 23:22:54,600 As we've seen before, the raw output of a model is logits. So to get the class, we're going to take 14578 23:22:54,600 --> 23:23:02,040 the arg max torch dot softmax. So we'll get the prediction probabilities of y pred, which is the 14579 23:23:02,040 --> 23:23:07,480 raw logits, what we've got up here, across dimension one, and then also across dimension one here. 14580 23:23:09,640 --> 23:23:15,080 Beautiful. So that should give us the labels. And then we can find out if this is wrong by 14581 23:23:15,080 --> 23:23:21,640 checking it later on. 
And then we're going to create the accuracy by taking the y pred class, 14582 23:23:21,640 --> 23:23:28,360 checking for a quality with the right labels. So this is going to give us how many of these 14583 23:23:28,360 --> 23:23:34,760 values equal true. And we want to take the sum of that, take the item of that, which is just a 14584 23:23:34,760 --> 23:23:41,640 single integer. And then we want to divide it by the length of y pred. So we're just getting the 14585 23:23:41,640 --> 23:23:48,040 total number that are right, and dividing it by the length of samples. So that's the formula for 14586 23:23:48,040 --> 23:23:53,640 accuracy. Now we can come down here outside of the batch loop, we know that because we've got this 14587 23:23:53,640 --> 23:24:03,160 helpful line drawn here. And we can go adjust metrics to get the average loss and accuracy 14588 23:24:03,160 --> 23:24:10,360 per batch. So we're going to set train loss is equal to train loss, divided by the length of 14589 23:24:10,360 --> 23:24:15,720 the data loader. So the number of batches in total. And the train accuracy is the train 14590 23:24:15,720 --> 23:24:21,720 act, divided by the length of the data loader as well. So that's going to give us the average 14591 23:24:21,720 --> 23:24:31,720 loss and average accuracy per epoch across all batches. So train act. Now that's a pretty good 14592 23:24:31,720 --> 23:24:38,760 looking function to me for a train step. Do you want to take on the test step? So pause the video, 14593 23:24:38,760 --> 23:24:44,600 give it a shot, and you'll get great inspiration from this notebook here. Otherwise, we're going 14594 23:24:44,600 --> 23:24:51,800 to do it together in three, two, one, let's do the test step. So create a test step function. 14595 23:24:53,080 --> 23:24:59,800 So we want to be able to call these functions in an epoch loop. And that way, instead of writing 14596 23:24:59,800 --> 23:25:03,560 out training and test code for multiple different models, we just write it out once, and we can 14597 23:25:03,560 --> 23:25:10,680 call those functions. So let's create def test step, we're going to do model, which is going to be 14598 23:25:11,960 --> 23:25:16,840 if I could type torch and then module. And then we're going to do data loader, 14599 23:25:17,640 --> 23:25:26,680 which is torch utils dot data, that data loader, capital L there. And then we're going to just 14600 23:25:26,680 --> 23:25:31,960 pass in a loss function here, because we don't need an optimizer for the test function. We're 14601 23:25:31,960 --> 23:25:37,160 not trying to optimize anything, we're just trying to evaluate how our model did on the training 14602 23:25:37,160 --> 23:25:42,600 dataset. And let's put in the device here, why not? That way we can change the device if we need 14603 23:25:42,600 --> 23:25:48,520 to. So put model in a val mode, because we're going to be evaluating or we're going to be testing. 14604 23:25:49,480 --> 23:26:00,360 Then we can set up test loss and test accuracy values. So test loss and test act. We're going 14605 23:26:00,360 --> 23:26:05,480 to make these zero, we're going to accumulate them per batch. But before we go through the batch, 14606 23:26:05,480 --> 23:26:12,120 let's turn on inference mode. So this is behind the scenes going to take care of a lot of pie torch 14607 23:26:12,120 --> 23:26:16,760 functionality that we don't need. That's very helpful during training, such as tracking gradients. 
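Before moving on with test_step, here is the train_step function just described pulled together into one sketch (a sketch of the walkthrough rather than the exact notebook code; device is assumed to be defined earlier in the notebook):

def train_step(model: torch.nn.Module,
               dataloader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               device=device):
    model.train()                                  # put the model in train mode
    train_loss, train_acc = 0, 0                   # accumulate metrics per batch
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)          # send data to the target device
        y_pred = model(X)                          # 1. forward pass (raw logits)
        loss = loss_fn(y_pred, y)                  # 2. calculate the loss
        train_loss += loss.item()
        optimizer.zero_grad()                      # 3. zero the gradients
        loss.backward()                            # 4. backpropagation
        optimizer.step()                           # 5. update the parameters
        # Accuracy: logits -> prediction probabilities -> predicted class
        y_pred_class = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_class == y).sum().item() / len(y_pred)
    # Average loss and accuracy across all batches
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    return train_loss, train_acc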
14608 23:26:16,760 --> 23:26:23,400 But during testing, we don't need that. So loop through data loader or data batches. 14609 23:26:23,400 --> 23:26:33,720 And we're going to go for batch x, y in enumerate data loader. You'll notice that above, we didn't 14610 23:26:33,720 --> 23:26:40,600 actually use this batch term here. And we probably won't use it here either. But I just like to go 14611 23:26:40,600 --> 23:26:48,120 through and have that there in case we wanted to use it anyway. So send data to the target device. 14612 23:26:48,120 --> 23:26:59,000 So we're going to go x, y equals x dot two device. And same with y dot two device. Beautiful. And 14613 23:26:59,000 --> 23:27:04,760 then what do we do for an evaluation step or a test step? Well, of course, we do the forward pass, 14614 23:27:05,480 --> 23:27:14,080 forward pass. And we're going to, let's call these test pred logits and get the raw outputs of our 14615 23:27:14,080 --> 23:27:21,680 model. And then we can calculate the loss on those raw outputs, calculate the loss. We get the loss 14616 23:27:21,680 --> 23:27:30,880 is equal to loss function on test pred logits versus y. And then we're going to accumulate the 14617 23:27:30,880 --> 23:27:39,200 loss. So test loss plus equals loss dot item. Remember, item just gets a single integer from 14618 23:27:39,200 --> 23:27:46,320 whatever term you call it on. And then we're going to calculate the accuracy. Now we can do this 14619 23:27:47,040 --> 23:27:53,600 exactly how we've done for the training data set or the training step. So test pred labels, 14620 23:27:53,600 --> 23:27:59,120 we're going to, you don't, I just want to highlight the fact that you actually don't need to take 14621 23:27:59,120 --> 23:28:04,640 the softmax here, you could just take the argmax directly from this. The reason why we take the 14622 23:28:04,640 --> 23:28:11,040 softmax. So you could do the same here, you could just directly take the argmax of the logits. The 14623 23:28:11,040 --> 23:28:16,000 reason why we get the softmax is just for completeness. So if you wanted the prediction probabilities, 14624 23:28:16,000 --> 23:28:22,240 you could use torch dot softmax on the prediction logits. But it's not 100% necessary to get the 14625 23:28:22,240 --> 23:28:27,360 same values. And you can test this out yourself. So try this with and without the softmax and 14626 23:28:27,360 --> 23:28:34,320 see if you get the same results. So we're going to go test accuracy. Plus equals, now we'll just 14627 23:28:34,320 --> 23:28:40,560 create our accuracy calculation on the fly test pred labels. We'll check for equality on the y, 14628 23:28:41,280 --> 23:28:46,080 then we'll get the sum of that, we'll get the item of that, and then we'll divide that by the 14629 23:28:46,080 --> 23:28:53,040 length of the test pred labels. Beautiful. So it's going to give us accuracy per batch. And so now 14630 23:28:53,040 --> 23:29:04,320 we want to adjust the metrics to get average loss and accuracy per batch. So test loss equals 14631 23:29:04,320 --> 23:29:11,120 test loss divided by length of the data loader. And then we're going to go test, 14632 23:29:11,120 --> 23:29:17,360 ac equals test, act divided by length of the data loader. And then finally, we're going to 14633 23:29:17,360 --> 23:29:28,640 return the test loss, not lost, and test accuracy. Look at us go. Now, in previous videos, that took 14634 23:29:28,640 --> 23:29:33,440 us, or in previous sections, that took us a fairly long time. 
But now we've done it in about 10 14635 23:29:33,440 --> 23:29:38,400 minutes or so. So give yourself a pat in the back for all the progress you've been making. 14636 23:29:38,960 --> 23:29:44,720 But now let's in the next video, we did this in the computer vision section as well. We created, 14637 23:29:44,720 --> 23:29:52,560 do we create a train function? Oh, no, we didn't. But we could. So let's create a function to 14638 23:29:53,120 --> 23:29:57,760 functionize this. We want to train our model. I think we did actually. Deaf train, we've done 14639 23:29:57,760 --> 23:30:05,920 so much. I'm not sure what we've done. Oh, okay. So looks like we might not have. But in the next 14640 23:30:05,920 --> 23:30:11,840 video, give yourself this challenge, create a function called train that combines these two 14641 23:30:11,840 --> 23:30:19,200 functions and loops through them both with an epoch range. So just like we've done here in the 14642 23:30:19,200 --> 23:30:26,080 previous notebook, can you functionize this? So just this step here. So you'll need to take in a 14643 23:30:26,080 --> 23:30:30,800 number of epochs, you'll need to take in a train data loader and a test data loader, a model, a 14644 23:30:30,800 --> 23:30:36,560 loss function, an optimizer, and maybe a device. And I think you should be pretty on your way to 14645 23:30:36,560 --> 23:30:41,200 all the steps we need for train. So give that a shot. But in the next video, we're going to create 14646 23:30:41,200 --> 23:30:47,600 a function that combines train step and test step to train a model. I'll see you there. 14647 23:30:51,360 --> 23:30:56,880 How'd you go? In the last video, I issued you the challenge to combine our train step function, 14648 23:30:56,880 --> 23:31:01,440 as well as our test step function together in their own function so that we could just call 14649 23:31:01,440 --> 23:31:05,840 one function that calls both of these and train a model and evaluate it, of course. 14650 23:31:05,840 --> 23:31:12,560 So let's now do that together. I hope you gave it a shot. That's what it's all about. So we're 14651 23:31:12,560 --> 23:31:18,240 going to create a train function. Now the role of this function is going to, as I said, combine 14652 23:31:18,240 --> 23:31:25,040 train step and test step. Now we're doing all of this on purpose, right, because we want to not 14653 23:31:25,040 --> 23:31:30,000 have to rewrite all of our code all the time. So we want to be functionalizing as many things as 14654 23:31:30,000 --> 23:31:35,600 possible, so that we can just import these later on, if we wanted to train more models and just 14655 23:31:35,600 --> 23:31:41,040 leverage the code that we've written before, as long as it works. So let's see if it does, 14656 23:31:41,040 --> 23:31:48,080 we're going to create a train function. I'm going to first import TQDM, TQDM.auto, 14657 23:31:48,080 --> 23:31:52,560 because I'd like to get a progress bar while our model is training. There's nothing quite like 14658 23:31:52,560 --> 23:32:01,440 watching a neural network train. So step number one is we need to create a train function that takes 14659 23:32:01,440 --> 23:32:13,120 in various model parameters, plus optimizer, plus data loaders, plus a loss function. A whole 14660 23:32:13,120 --> 23:32:18,800 bunch of different things. So let's create def train. And I'm going to pass in a model here, 14661 23:32:19,600 --> 23:32:25,600 which is going to be torch and then dot module. 
You'll notice that the inputs of this are going 14662 23:32:25,600 --> 23:32:31,680 to be quite similar to our train step and test step. I don't actually need that there. 14663 23:32:32,880 --> 23:32:39,680 So we also want a train data loader for the training data, torch dot utils dot data dot data 14664 23:32:39,680 --> 23:32:47,440 loader. And we also want a test data loader, which is going to be torch dot utils dot data 14665 23:32:47,440 --> 23:32:54,000 dot data loader. And then we want an optimizer. So the optimizer will only be used with our 14666 23:32:54,000 --> 23:32:59,360 training data set, but that's okay. We can take it as an input of the miser. And then we want a 14667 23:32:59,360 --> 23:33:04,880 loss function. This will generally be used for both our training and testing step. Because that's 14668 23:33:04,880 --> 23:33:09,520 what we're combining here. Now, since we're working with multi class classification, 14669 23:33:09,520 --> 23:33:14,640 I'm going to set our loss function to be a default of an n dot cross entropy loss. 14670 23:33:15,920 --> 23:33:21,120 Then I'm going to get epochs. I'm going to set five, we'll train for five epochs by default. 14671 23:33:21,120 --> 23:33:27,440 And then finally, I'm going to set the device equal to the device. So what do we get wrong here? 14672 23:33:30,240 --> 23:33:34,000 That's all right. We'll just keep coding. We'll ignore these little red lines. If they 14673 23:33:34,000 --> 23:33:39,680 stay around, we'll come back to them. So step number two, I'm going to create. This is a step 14674 23:33:39,680 --> 23:33:44,960 you might not have seen, but I'm going to create an empty results dictionary. Now, this is going 14675 23:33:44,960 --> 23:33:51,040 to help us track our results. Do you recall in a previous notebook, we outputted a model dictionary 14676 23:33:51,040 --> 23:33:56,880 for how a model went. So if we look at model one results, yeah, we got a dictionary like this. 14677 23:33:57,520 --> 23:34:03,520 So I'd like to create one of these on the fly, but keep track of the result every epoch. So what 14678 23:34:03,520 --> 23:34:09,840 was the loss on epoch number zero? What was the accuracy on epoch number three? So we'll show you 14679 23:34:09,840 --> 23:34:14,160 how I'll do that. We can use a dictionary and just update that while our model trains. 14680 23:34:14,880 --> 23:34:20,000 So results, I want to keep track of the train loss. So we're going to set that equal to an empty 14681 23:34:20,000 --> 23:34:25,840 list and just append to it. I also want to keep track of the train accuracy. We'll set that as 14682 23:34:25,840 --> 23:34:32,240 an empty list as well. I also want to keep track of the test loss. And I also want to keep track 14683 23:34:32,240 --> 23:34:38,080 of the test accuracy. Now, you'll notice over time that these, what you can track is actually 14684 23:34:38,080 --> 23:34:44,640 very flexible. And what your functions can do is also very flexible. So this is not the gold 14685 23:34:44,640 --> 23:34:50,480 standard of doing anything by any means. It's just one way that works. And you'll probably find in 14686 23:34:50,480 --> 23:34:57,280 the future that you need different functionality. And of course, you can code that out. So let's 14687 23:34:57,280 --> 23:35:05,440 now loop through our epochs. So for epoch in TQDM, let's create a range of our epochs above. 14688 23:35:06,160 --> 23:35:09,440 And then we can set the train loss. Have I missed a comma up here somewhere? 
14689 23:35:09,440 --> 23:35:17,840 Type annotation not supported for that type of expression. Okay, that's all right. We'll just leave 14690 23:35:17,840 --> 23:35:24,160 that there. So we're going to go train loss and train act, recall that our train step function 14691 23:35:24,800 --> 23:35:30,400 that we created in the previous video, train step returns our train loss and train act. So as I 14692 23:35:30,400 --> 23:35:36,800 said, I want to keep track of these throughout our training. So I'm going to get them from train 14693 23:35:36,800 --> 23:35:43,200 step. Then for each epoch in our range of epochs, we're going to pass in our model and perform a 14694 23:35:43,200 --> 23:35:48,080 training step. So the data loader here is of course going to be the train data loader. The 14695 23:35:48,080 --> 23:35:52,320 loss function is just going to be the loss function that we pass into the train function. 14696 23:35:52,880 --> 23:35:59,520 And then the optimizer is going to be the optimizer. And then the device is going to be device. 14697 23:36:00,160 --> 23:36:03,920 Beautiful. Look at that. We just performed a training step in five lines of code. 14698 23:36:03,920 --> 23:36:08,560 So let's keep pushing forward. It's telling us we've got a whole bunch of different things here. 14699 23:36:08,560 --> 23:36:14,240 Epox is not defined. Maybe we just have to get rid of this. We can't have the type annotation here. 14700 23:36:14,240 --> 23:36:22,160 And that'll that'll stop. That'll stop Google Colab getting angry at us. If it does anymore, 14701 23:36:22,160 --> 23:36:28,400 I'm just going to ignore it for now. Epox. Anyway, we'll leave it at that. We'll find out if there's 14702 23:36:28,400 --> 23:36:33,920 an error later on. Test loss. You might be able to find it before I do. So test step. We're going 14703 23:36:33,920 --> 23:36:39,200 to pass in the model. We're going to pass in a data loader. Now this is going to be the test data 14704 23:36:39,200 --> 23:36:46,080 loader. Look at us go. Grading training and test step functions, loss function. And then we don't 14705 23:36:46,080 --> 23:36:49,280 need an optimizer. We're just going to pass in the device. And then behind the scenes, 14706 23:36:50,160 --> 23:36:56,320 both of these functions are going to train and test our model. How cool is that? So still within 14707 23:36:56,320 --> 23:37:02,640 the loop. This is important. Within the loop, we're going to have number four is we're going to 14708 23:37:02,640 --> 23:37:09,120 print out. Let's print out what's happening. Print out what's happening. We can go print. 14709 23:37:09,120 --> 23:37:14,720 And we'll do a fancy little print statement here. We'll get the epoch. And then we will get 14710 23:37:15,280 --> 23:37:20,800 the train loss, which will be equal to the train loss. We'll get that to, let's go 14711 23:37:20,800 --> 23:37:26,880 four decimal places. How about that? And then we'll get the train accuracy, which is going to be the 14712 23:37:26,880 --> 23:37:34,720 train act. We'll get that to four, maybe three decimal of four, just for just so it looks nice. 14713 23:37:34,720 --> 23:37:39,840 It looks aesthetic. And then we'll go test loss. We'll get that coming out here. And we'll pass 14714 23:37:39,840 --> 23:37:45,120 in the test loss. We'll get that to four decimal places as well. And then finally, we'll get the 14715 23:37:45,120 --> 23:37:52,480 test accuracy. So a fairly long print statement here. But that's all right. 
We'd like to see how 14716 23:37:52,480 --> 23:37:59,200 our model is doing while it's training. Beautiful. And so again, still within the epoch, we want to 14717 23:37:59,200 --> 23:38:04,000 update our results dictionary so that we can keep track of how our model performed over time. 14718 23:38:04,640 --> 23:38:11,040 So let's pass in results. We want to update the train loss. And so this is going to be this. 14719 23:38:11,040 --> 23:38:21,840 And then we can append our train loss value. So this is just going to expend the list in here 14720 23:38:21,840 --> 23:38:27,840 with the train loss value, every epoch. And then we'll do the same thing on the train accuracy, 14721 23:38:27,840 --> 23:38:41,280 append train act. And then we'll do the same thing again with test loss dot append test loss. 14722 23:38:41,280 --> 23:38:51,280 And then we will finally do the same thing with the test accuracy test accuracy. Now, 14723 23:38:51,280 --> 23:38:56,720 this is a pretty big function. But this is why we write the code now so that we can use it 14724 23:38:56,720 --> 23:39:03,360 multiple times later on. So return the field results at the end of the epoch. So outside the 14725 23:39:03,360 --> 23:39:12,400 epochs loop. So our loop, we're outside it now. Let's return results. Now, I've probably got an 14726 23:39:12,400 --> 23:39:16,640 error somewhere here and you might be able to spot it. Okay, train data loader. Where do we get 14727 23:39:16,640 --> 23:39:23,040 that invalid syntax? Maybe up here, we don't have a comma here. Was that the issue the whole time? 14728 23:39:23,040 --> 23:39:30,720 Wonderful. You might have seen that I'm completely missed that. But we now have a train function 14729 23:39:30,720 --> 23:39:34,960 to train our model. And the train function, of course, is going to call out our train step 14730 23:39:34,960 --> 23:39:42,480 function and our test step function. So what's left to do? Well, nothing less than train and 14731 23:39:42,480 --> 23:39:50,480 evaluate model zero. So our model is way back up here. How about in the next video, we leverage 14732 23:39:50,480 --> 23:39:56,080 our functions, namely just the train function, because it's going to call our train step function 14733 23:39:56,080 --> 23:40:01,600 and our test step function and train our model. So I'm going to encourage you to give that a go. 14734 23:40:01,600 --> 23:40:05,040 You're going to have to go back to the workflow. Maybe you'll maybe already know this. 14735 23:40:06,640 --> 23:40:11,520 So what have we done? We've got our data ready and we turned it into tenses using a combination 14736 23:40:11,520 --> 23:40:16,560 of these functions. We've built and picked a model while we've built a model, which is the 14737 23:40:16,560 --> 23:40:22,720 tiny VGG architecture. Have we created a loss function yet? I don't think we have or an optimizer. 14738 23:40:23,520 --> 23:40:27,920 I don't think we've done that yet. We've definitely built a training loop though. 14739 23:40:28,960 --> 23:40:32,720 We aren't using torch metrics. We're just using accuracy, but we could use this if we want. 14740 23:40:33,280 --> 23:40:37,520 We haven't improved through experimentation yet, but we're going to try this later on and 14741 23:40:37,520 --> 23:40:43,600 then save and reload the model. We've seen this before. So I think we're up to picking a loss 14742 23:40:43,600 --> 23:40:49,760 function and an optimizer. So give that a shot. 
In the next video, we're going to create a loss 14743 23:40:49,760 --> 23:40:54,000 function and an optimizer and then leverage the functions we've spent in the last two videos 14744 23:40:54,000 --> 23:41:01,520 creating to train our first model model zero on our own custom data set. This is super exciting. 14745 23:41:02,080 --> 23:41:03,120 I'll see you in the next video. 14746 23:41:06,480 --> 23:41:09,840 Who's ready to train and evaluate model zero? Put your hand up. 14747 23:41:09,840 --> 23:41:17,920 I definitely am. So let's do it together. We're going to start off section 7.7 and we're going 14748 23:41:17,920 --> 23:41:24,480 to put in train and evaluate model zero, our baseline model on our custom data set. Now, 14749 23:41:25,120 --> 23:41:29,680 if we refer back to the PyTorch workflow, I issued you the challenge in the last video to try and 14750 23:41:29,680 --> 23:41:34,640 create a loss function and an optimizer. I hope you gave that a go, but we've already built a 14751 23:41:34,640 --> 23:41:40,480 training loop. So we're going to leverage our training loop functions, namely train, train step 14752 23:41:40,480 --> 23:41:47,440 and test step. All we need to do now is instantiate a model, choose a loss function and an optimizer 14753 23:41:47,440 --> 23:41:54,560 and pass those values to our training function. So let's do that. All right, this is so exciting. 14754 23:41:54,560 --> 23:42:05,280 Let's set the random seeds. I'm going to set torch manual seed 42 and torch cuda manual seed 42. 14755 23:42:05,280 --> 23:42:09,760 Now remember, I just want to highlight something. I read an article the other day about not using 14756 23:42:09,760 --> 23:42:16,320 random seeds. The reason why we are using random seeds is for educational purposes. So to try and 14757 23:42:16,320 --> 23:42:21,200 get our numbers on my screen and your screen as close as possible, but in practice, you quite 14758 23:42:21,200 --> 23:42:27,600 often don't use random seeds all the time. The reason why is because you want your models performance 14759 23:42:27,600 --> 23:42:34,400 to be similar regardless of the random seed that you use. So just keep that in mind going forward. 14760 23:42:34,400 --> 23:42:41,280 We're using random seeds to just exemplify how we can get similar numbers on our page. But 14761 23:42:41,280 --> 23:42:46,720 ideally, no matter what the random seed was, our models would go in the same direction. 14762 23:42:46,720 --> 23:42:53,040 That's where we want our models to eventually go. But we're going to train for five epochs. 14763 23:42:53,680 --> 23:43:00,160 And now let's create a recreate an instance of tiny VGG. We can do so because we've created the 14764 23:43:00,160 --> 23:43:07,120 tiny VGG class. So tiny VGG, which is our model zero. We don't have to do this, but we're going 14765 23:43:07,120 --> 23:43:13,600 to do it any later. So we've got all the code in one place, tiny VGG. What is our input shape 14766 23:43:13,600 --> 23:43:21,280 going to be? That is the number of color channels of our target images. And because we're dealing 14767 23:43:21,280 --> 23:43:26,080 with color images, we have an input shape of three. Previously, we used an input shape of one to 14768 23:43:26,080 --> 23:43:32,240 deal with grayscale images. I'm going to set hidden units to 10 in line with the CNN explainer website. 14769 23:43:32,880 --> 23:43:38,960 And the output shape is going to be the number of classes in our training data set. 
And then, 14770 23:43:38,960 --> 23:43:46,400 of course, we're going to send the target model to the target device. So what do we do now? 14771 23:43:47,040 --> 23:43:52,720 Well, we set up a loss function and an optimizer, loss function, and optimizer. 14772 23:43:53,360 --> 23:43:58,800 So our loss function is going to be because we're dealing with multiclass classification, 14773 23:43:58,800 --> 23:44:05,520 and then cross entropy, if I could spell cross entropy loss. And then we're going to have an 14774 23:44:05,520 --> 23:44:10,720 optimizer. This time, how about we mix things up? How about we try the atom optimizer? Now, 14775 23:44:10,720 --> 23:44:14,880 of course, the optimizer is one of the hyper parameters that you can set for your model, 14776 23:44:14,880 --> 23:44:20,240 and a hyper parameter being a value that you can set yourself. So the parameters that we want to 14777 23:44:20,240 --> 23:44:29,120 optimize are our model zero parameters. And we're going to set a learning rate of 0.001. Now, 14778 23:44:29,120 --> 23:44:34,320 recall that you can tweet this learning rate, if you like, but I believe, did I just see that 14779 23:44:34,320 --> 23:44:43,440 the default learning rate of atom is 0.001? Yeah, there we go. So Adam's default learning rate is 14780 23:44:43,440 --> 23:44:49,920 one to the power of 10 to the negative three. And so that is a default learning rate for Adam. 14781 23:44:49,920 --> 23:44:57,280 And as I said, oftentimes, different variables in the pytorch library, such as optimizers, 14782 23:44:57,280 --> 23:45:02,240 have good default values that work across a wide range of problems. So we're just going to stick 14783 23:45:02,240 --> 23:45:06,320 with the default. If you want to, you can experiment with different values of this. 14784 23:45:07,360 --> 23:45:11,680 But now let's start the timer, because we want to time our models. 14785 23:45:13,440 --> 23:45:20,560 We're going to import from time it. We want to get the default timer class. And I'm going to 14786 23:45:20,560 --> 23:45:26,960 import that as timer, just so we don't have to type out default timer. So the start time is going 14787 23:45:26,960 --> 23:45:32,320 to be timer. This is going to just put a line in the sand of what the start time is at this 14788 23:45:32,320 --> 23:45:37,680 particular line of code. It's going to measure that. And then we're going to train model zero. 14789 23:45:38,240 --> 23:45:44,800 Now this is using, of course, our train function. So let's write model zero results, and then 14790 23:45:44,800 --> 23:45:52,400 they wrote model one, but we're not up to there yet. So let's go train model equals model zero. 14791 23:45:52,400 --> 23:45:58,400 And this is just the training function that we wrote in a previous video. And the train data 14792 23:45:58,400 --> 23:46:03,840 is going to be our train data loader. And we've got train data loader simple, because we're not 14793 23:46:03,840 --> 23:46:10,240 using data augmentation for model one. And then our test data loader is going to be our test data 14794 23:46:10,240 --> 23:46:15,440 loader simple. And then we're going to set our optimizer, which is equal to the optimizer we just 14795 23:46:15,440 --> 23:46:21,360 created. Friendly atom optimizer. And the loss function is going to be the loss function that 14796 23:46:21,360 --> 23:46:28,880 we just created, which is an n cross entropy loss. 
Finally, we can send in epochs, which is going to be NUM_EPOCHS, the value we set at the start of this video to five. Of course, we could train our model for longer if we wanted to, but the whole idea when you first start training a model is to keep your experiments quick. That's why we're only training for five epochs; maybe later on you train for 10 or 20, tweak the learning rate, and try a whole bunch of different things. But let's go down here, end the timer to see how long our model took to train, and print out how long it took. In a previous section we created a helper function for this; we're just going to simplify it here and print the total training time as end time minus start time, taken to three decimal places, in seconds.

Are you ready to train our first model, our first convolutional neural network, on our own custom data set of pizza, steak and sushi images? Let's do it. Three, two, one... an error. Okay, should this be the train dataloader argument? Did you notice that? What does our train function take as input? Oh, we're not getting a docstring... there we go. We want the train dataloader parameter, and the same for the test dataloader, I believe. Let's try again. Beautiful, look at that lovely progress bar. Wow, our model is training quite fast.

All right, what do we get? We get an accuracy on the training data set of about 40%, and an accuracy on the test data set of about 50%. What's that telling us? It's telling us that about 50% of the time our model is getting the prediction correct. But we've only got three classes, so even if our model was guessing, it would get things right about 33% of the time. If you just guessed pizza every single time, you'd get a baseline accuracy of about 33%. So our model isn't doing much better than that baseline. Of course, we'd like this number to go higher, and maybe it would if we trained for longer; I'll let you experiment with that. But if you'd like to see some different methods of improving a model, recall that back in section 02 we had an improving a model section. Here we go, here are some things you might want to try. We can improve a model by adding more layers. If we come back to our TinyVGG architecture up here, we're only using two convolutional blocks.
Perhaps you'd want to add a convolutional block three. You can also add more hidden units: right now we're using 10 hidden units, so you might want to double that and see what happens. Fit for longer, which is what we just spoke about: right now we're only fitting for five epochs, so maybe you want to double that, and then double it again. Change the activation functions: maybe ReLU isn't the ideal activation function for our specific use case. Change the learning rate, which we've spoken about before: right now our learning rate is 0.001 for Adam, which is the default, but perhaps there's a better learning rate out there. Change the loss function: in our case this probably won't help much, because cross entropy loss is a pretty good loss for multiclass classification. But these are some things you could try, especially the first three, which you could try quite quickly: doubling the layers, adding more hidden units and fitting for longer. I'd give that a shot.

In the next video, we're going to take our model_0_results, which is a dictionary (or at least it should be), and plot some loss curves. This is a good way to inspect how our model is training. We've got some values here; let's plot them in the next video. I'll see you there.

In the last video, we trained our first convolutional neural network on custom data, so you should be very proud of that. It's no small feat to take our own data set of whatever we want and train a PyTorch model on it. However, we did find that it didn't perform as well as we'd like it to, and we highlighted a few different things we could try to improve it. But now let's plot our model's results using a loss curve. I'm going to write another heading down here; I believe we're up to 7.8, plot the loss curves of model zero. So what is a loss curve? I'm going to write down here: a loss curve is a way of tracking your model's progress over time. If we look up loss curves on Google, there's a great guide, by the way, which I'm going to link, but I'd rather, if in doubt, code it out than just look at guides. Loss curves: loss over time. There's our loss value on the left, and along the bottom there are steps, which could be epochs or batches or something like that. Then we've got a whole bunch of different loss curves over here. Essentially, what we want the loss to do is go down over time; that's the idea of a loss curve.
Let's go back down here. A good guide to different loss curves can be seen there, but we're not going to go through it just yet; let's focus on plotting our own model's loss curves so we can inspect them. Let's get the model zero results keys: I'm going to type model_0_results.keys(), because it's a dictionary. Let's see if we can write some code to plot these values over time. We have one value for train_loss, train_acc, test_loss and test_acc for every epoch, and of course these lists would be longer if we trained for more epochs. How about we create a function called plot_loss_curves, which takes in a results dictionary of strings mapped to lists of floats? That just means the results parameter takes in a dictionary that has strings as keys and lists of floats as values. Let's write a docstring: plots training curves of a results dictionary. Beautiful. We're in the evaluation part of our workflow here, doing something similar to what TensorBoard does; I'll let you look into that if you want to, otherwise we'll see it later on.

Let's write some plotting code using matplotlib. We want to get the loss values of the results dictionary for training and test. We'll set loss equal to results["train_loss"], which is the loss on the training data set, and then the test loss, which we get by indexing the results dictionary for "test_loss". Beautiful. Now we'll do the same for accuracy, training and test: accuracy equals results["train_acc"], and test accuracy equals results["test_acc"]. Next, let's figure out how many epochs we did; we can do that by counting the length of one of these values. We'll set epochs equal to a range over that length, because we want to plot our model's results over time (that's the whole idea of a loss curve). Now we can set up a plot: plt.figure, with the figsize set to something nice and big, because we're going to do a couple of plots (the full function is sketched just below).
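Written out in one place, the plot_loss_curves function being dictated here (and finished in the narration that follows) looks roughly like this; the figsize value is just a reasonable assumption:

from typing import Dict, List
import matplotlib.pyplot as plt

def plot_loss_curves(results: Dict[str, List[float]]):
    """Plots training curves of a results dictionary
    (expects the keys 'train_loss', 'train_acc', 'test_loss', 'test_acc')."""
    # Get the loss values of the results dictionary (training and test)
    loss = results["train_loss"]
    test_loss = results["test_loss"]

    # Get the accuracy values of the results dictionary (training and test)
    accuracy = results["train_acc"]
    test_accuracy = results["test_acc"]

    # Figure out how many epochs there were
    epochs = range(len(results["train_loss"]))

    # Set up a plot
    plt.figure(figsize=(15, 7))

    # Plot the loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, loss, label="train_loss")
    plt.plot(epochs, test_loss, label="test_loss")
    plt.title("Loss")
    plt.xlabel("Epochs")
    plt.legend()

    # Plot the accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, accuracy, label="train_accuracy")
    plt.plot(epochs, test_accuracy, label="test_accuracy")
    plt.title("Accuracy")
    plt.xlabel("Epochs")
    plt.legend()

# Usage:
# plot_loss_curves(model_0_results)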
We want two plots: one for the loss, one for the accuracy. First we'll plot the loss: plt.subplot with one row, two columns, index number one. Then plt.plot for the training loss, with a label of train_loss, and another plt.plot with epochs and test_loss, labelled test_loss. We'll add a title, "Loss", and a label on the x-axis, "Epochs", so we know how many steps we've done. The example loss curves in the guide use steps; I'm going to use epochs. They mean almost the same thing, it just depends on what scale you'd like to see your loss curves at. We'll add a legend as well so the labels appear. Now we plot the accuracy: plt.subplot(1, 2, 2) for the second plot, then plt.plot with epochs and accuracy, labelled train_accuracy, and on the same plot we'll put the test accuracy, labelled test_accuracy, so we have the training and test accuracy side by side, just like the train and test loss. Then we give this plot a title, "Accuracy", an x-axis label of "Epochs" as well, and finally plt.legend. A lot of plotting code here, but let's see what it looks like. If we've done it all right, we should be able to pass in a results dictionary and see some nice plots. Let's give it a go: I'm going to call plot_loss_curves and pass in model_0_results.

All righty then. Okay, so that's not too bad. Why do I say that? Well, because we're mainly looking for trends here, and we haven't trained our model for very long. Quantitatively, we know our model hasn't performed the way we'd like it to: we'd like the accuracy on both the train and test data sets to be higher, and of course, if the accuracy goes higher, the loss should come down. The ideal trend for a loss curve is to go down from the top left to the bottom right; in other words, the loss goes down over time. So the trend is all right here. Potentially, if we train for more epochs, which I'd encourage you to give a go, our model's loss might get lower. And the accuracy is also trending in the right way.
Our accuracy we want to go up over time, so if we train for more epochs these curves may continue to improve. They may not, though; you never really know. You can guess these things, but until you try it, you don't really know. In the next video, we're going to look at some different forms of loss curves, but before we do that, I'd encourage you to go through Google's guide on interpreting loss curves; if you search "loss curves" or "interpreting loss curves", you'll find it. As you'll see, there are many different ways that loss curves can be interpreted, but the ideal trend is for the loss to go down over time and metrics like accuracy to go up over time. So in the next video, let's cover a few different forms of loss curves: the ideal loss curve, what it looks like when your model's underfitting, and what it looks like when your model's overfitting. If you'd like a primer on those things, read through that guide. Don't worry too much if you're not sure what's happening; we're going to cover a bit more about loss curves in the next video. I'll see you there.

In the last video, we looked at our model's loss curves and also the accuracy curves. A loss curve is a way to evaluate a model's performance over time, such as over how long it was training for. And as you'll see if you Google some images of loss curves, they come in all different shapes and sizes, and there are many different ways to interpret them. This is Google's testing and debugging in machine learning guide, and I'm going to set it as extra curriculum for this section. So we're up to number eight: what should an ideal loss curve look like? We'll link that in there. Now, I'll write this down and make some space: a loss curve is one of the most helpful ways to troubleshoot a model. You want the trend of a loss curve to go down over time, and the trend of an evaluation metric like accuracy to typically go up over time. Let's go into the keynote: loss curves, a way to evaluate your model's performance over time. These are three of the main forms of loss curve you'll face, but again, there are many different types, as mentioned in the interpreting loss curves guide. Sometimes your loss goes all over the place, sometimes it explodes, sometimes your metrics are contradictory.
Sometimes your testing loss will be higher than your training loss; we'll have a look at what that means. Sometimes your model gets stuck, in other words the loss doesn't reduce. Let's have a look at some loss curves for the cases of underfitting, overfitting, and just right, which is the Goldilocks zone. Underfitting is when your model's loss on the training and test data sets could be lower. In our case, if we go back to our loss curves, of course we want the loss to be lower and the accuracy to be higher, so from this perspective it looks like our model is underfitting, and we'd probably want to train it for longer, say 10 or 20 epochs, to see if this trend continues. If the loss keeps going down, it may stop underfitting.

So underfitting is when your loss could be lower. The inverse of underfitting is called overfitting, and two of the biggest problems in machine learning are trying to reduce underfitting (in other words, make your loss lower) and trying to reduce overfitting. These are both active areas of research, because you always want your model to perform better, but you also want it to perform pretty much the same on the training set as it does on the test set. Overfitting is when your training loss is lower than your testing loss. Why is that overfitting? Because your model is essentially learning the training data too well. The loss goes down on the training data set, which is typically a good thing, but that learning isn't reflected in the testing data set: your model is memorizing patterns in the training data that don't generalize well to the test data.

This is where we come to the just right curve: ideally, we want our training loss to reduce at about the same rate as our test loss. Quite often you'll find the loss is slightly lower on the training set than on the test set, simply because the model is exposed to the training data and has never seen the test data before. So: underfitting, the model's loss could be lower; overfitting, the model is learning the training data too well. Overfitting would be equivalent to studying for a final exam by just memorizing the course materials (the training set): when it came time to the final exam, because you only memorized the course materials, you couldn't adapt those skills to questions you hadn't seen before. The final exam here is the test set.
So that's overfitting: the train loss is lower than the test loss. And just right: you probably won't see loss curves exactly this smooth, they might be a little bit jumpy, but ideally your training loss and test loss go down at a similar rate. There are more combinations of these, and if you'd like to see them, check out Google's loss curve guide; that's some extra curriculum.

Now, you probably want to know how to deal with underfitting and overfitting. Let's look at a few ways, starting with overfitting. We want to reduce overfitting, in other words, we want our model to perform just as well on the test data set as it does on the training data set. One of the best ways to reduce overfitting is to get more data: the training data set becomes larger and the model is exposed to more examples. In theory this helps, though, as with many things in machine learning, all of these come with a caveat and don't always work. So, get more data to give your model more of a chance to learn generalizable patterns in a data set. You can use data augmentation, making your model's training data harder to learn; we've seen a few examples of data augmentation. You can get better data: not just more data, but higher quality data, because if you improve the quality of your data set, your model may be able to learn better, more generalizable patterns and in turn reduce overfitting. You can use transfer learning, which we're going to cover in a later section of the course. Transfer learning is taking a model that works, taking the patterns it has learned, and applying them to your own data set. For example, if I go into the torchvision models library, many of the models in torchvision.models have already been trained on a certain data set, such as ImageNet. You can take the weights, the patterns these models have learned, and if they work well on ImageNet, which is millions of different images, you can adjust those patterns to your own problem, and oftentimes that will help with overfitting. If you're still overfitting, you can try to simplify your model. Usually this means taking away things like extra layers or hidden units: say you had 10 layers, you might reduce it to five. Why does this work? What's the theory behind it?
Well, if you simplify your model and take away complexity, you're kind of telling your model: hey, use what you've got. Because you've only got five layers now, those five layers have to work really well, since you no longer have 10. The same goes for hidden units: say you started with 100 hidden units per layer, you might reduce that to 50 and say, you had 100 before, now use those 50 and make your patterns generalizable.

You can also use learning rate decay. The learning rate is how much your optimizer updates your model's weights at each step, so learning rate decay means decaying the learning rate over time. You can look this up by searching for PyTorch learning rate scheduling. I know I'm giving you a lot of different things here, but you've got this keynote as a reference, so you'll come across these over time. So, learning rate scheduling: if we look in the PyTorch documentation for a scheduler, beautiful, this is going to adjust the learning rate over time. For example, at the start of training you might want a higher learning rate, and then as the model learns more and more patterns, you might want to reduce the learning rate so the model doesn't update its patterns too much in later epochs. The closer you get to convergence, the lower you might want to set your learning rate. Think of it like reaching for a coin at the back of a couch: at the beginning you might take big steps, but the closer you get to the coin, the smaller the steps you take to pick it out, because if you take a big step when you're really close, the coin might fall down the back of the couch. It's the same with learning rate decay: at the start of training you take bigger steps as your model works its way down the loss curve, but as you get closer and closer to the ideal position on the loss curve, you lower the learning rate more and more until you can pick up the coin; in other words, your model can converge. And then finally, use early stopping. Let's see if there's an image for early stopping.
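Here's a minimal illustration of learning rate decay with PyTorch's built-in schedulers. This isn't part of the course code, just a sketch with a placeholder model; StepLR multiplies the learning rate by gamma every step_size epochs:

import torch

model = torch.nn.Linear(10, 3)  # placeholder model, just for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(15):
    # ... a normal training loop for one epoch would go here ...
    optimizer.step()   # update the model's parameters
    scheduler.step()   # then decay the learning rate (per epoch)
    print(epoch, scheduler.get_last_lr())  # watch the learning rate shrink over time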
Early stopping. If we search for early stopping loss curves, yeah, there we go, there are heaps of different guides on early stopping with PyTorch. What it means is that you keep track of your model's testing error, and before it starts to go up, you stop your model from training, or you save the weights (the patterns) from where your model's loss was lowest. You could then set your model to train for an effectively unlimited number of training steps, and as soon as the testing error starts to increase for, say, 10 steps in a row, you go back to the earlier point and say: I think that's where our model was at its best, the testing error started to increase after that, so we'll save the model from there instead of the model from here. That's the concept of early stopping. So that's dealing with overfitting.

There are other methods to deal with underfitting. Recall that underfitting is when the loss isn't as low as we'd like it to be: our model isn't fitting the data very well. To reduce underfitting, you can add more layers or units to your model; you're trying to increase your model's ability to learn. You can again tweak the learning rate: perhaps it's too high to begin with and your model doesn't learn very well, so you adjust it, just like we discussed with reaching for that coin at the back of the couch. If your model is still underfitting, you can train for longer, giving your model more opportunities to look at the data; more epochs just means it gets to look at the training set over and over again and try to learn those patterns. However, if you train for too long, your testing error may start to go up and your model may start overfitting. So machine learning is all about a balance between underfitting and overfitting: if you try to reduce underfitting too much, you might start to overfit, and vice versa, if you try to reduce overfitting too much, your model might underfit. This is one of the most fun dances in machine learning, the balance between overfitting and underfitting. Finally, you might use transfer learning, which helps with both overfitting and underfitting.
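A rough sketch of the early stopping idea described above might look like the following. None of this is the course's code: train_one_epoch() and evaluate() are hypothetical helpers standing in for a training step and an evaluation loop that returns the test loss:

import torch

best_loss = float("inf")
epochs_without_improvement = 0
patience = 10  # stop after 10 epochs in a row without improvement

for epoch in range(1000):  # effectively "train for as long as it keeps helping"
    train_one_epoch(model, train_dataloader, loss_fn, optimizer)   # hypothetical helper
    test_loss = evaluate(model, test_dataloader, loss_fn)          # hypothetical helper

    if test_loss < best_loss:
        best_loss = test_loss
        epochs_without_improvement = 0
        torch.save(model.state_dict(), "best_model.pth")  # keep the best weights so far
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch}")
            break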
Recall that transfer learning means using a model's learned patterns from one problem and adjusting them to your own; we're going to see this later in the course. And then finally, use less regularization. Regularization is holding your model back to try to prevent overfitting, so if you do too much overfitting prevention, in other words too much regularizing of your model, you might end up underfitting. If we go back and look at the ideal curves: if you try to prevent underfitting too much (increasing your model's capability to learn), you might end up overfitting, and if you try to prevent overfitting too much, you might end up underfitting. We're going for the just right zone, and this is going to be a balance between the two throughout your entire machine learning career. In fact, probably the most prevalent area of research is trying to get models not to underfit but also not to overfit. So keep that in mind: a loss curve is a great way to evaluate your model's performance over time, and a lot of what we do with loss curves is work out whether our model is underfitting or overfitting while trying to get to that just right curve. We might not get exactly there, but we want to keep getting as close as we can.

With that being said, let's build another model in the next video. We're going to see if we can use data augmentation to prevent our model from overfitting, although that doesn't sound like the most ideal experiment we could do right now, because it looks like our model is underfitting. So, with what you've just learned about preventing underfitting, what would you do to increase this model's capability of learning patterns in the training data set? Would you train it for longer? Would you add more layers? Would you add more hidden units? Have a think, and we'll start building another model in the next video.

Welcome back. In the last video, we covered the important concept of a loss curve and how it can tell us whether our model is underfitting (its loss could be lower) or overfitting (the training loss is lower, or far lower, than the test loss). Another thing to note is that I've put training and test sets here; you could also do this with a validation data set. And the just right case, the Goldilocks zone, is when our training and test loss are quite similar over time.
Now, there was a fair bit of information in that last video, so I just want to highlight that you can get all of this in section 04, the notebook we're working on. If you come down to section 8, what should an ideal loss curve look like, we've got underfitting, overfitting and just right, how to deal with overfitting with a few options, and how to deal with underfitting with a few options as well. If you want more, you can search how to deal with overfitting or how to deal with underfitting and find a bunch of resources. It's a very fine balance that you're going to experience throughout your machine learning career.

But it's time to move on. We're going to create another model, which is TinyVGG with data augmentation this time. If we go back to the slide, data augmentation is one way of dealing with overfitting. Now, it's probably not the most ideal experiment we could run, because our model zero, our baseline model, looks like it's underfitting. But data augmentation, as we've seen before, is a way of manipulating images to artificially increase the diversity of your training data set without collecting more data. We could take our photos of pizza, sushi and steak and randomly rotate them 30 degrees, for example, and that increased diversity hopefully forces the model to learn more generalizable patterns. Again, all of these come with the caveat of not always being a silver bullet. (I should have written generalizable here rather than generalization, but it's a similar thing.)

Let's start by writing this down: now let's try another modeling experiment. This is in line with our PyTorch workflow: try a model, then another one, and another one, and so on. This time we're using the same model as before, but with some data augmentation (I was going to say slight data augmentation, but that's probably not the best word). If we come down here, we'll write section 9.1: create a transform with data augmentation. We've seen what this looks like before; we're going to use trivial augment to create our training transform, which, as we saw in a previous video, is what the PyTorch team have recently used to train their state-of-the-art computer vision models.
So train_transform_trivial is what I'm going to call my transform. I'm just going to write from torchvision import transforms; we've done this before and don't have to re-import it, but I'll do it anyway to show that we're using transforms. We're going to compose a transform here, and recall that transforms help us manipulate our data. First we resize our images to 64 by 64. Then we set up a TrivialAugmentWide transform, just like we did before, and set the number of magnitude bins to 31, which is the default. That means some data augmentation will be randomly applied to each of our images at a magnitude between 0 and 31, also randomly selected. If we lowered this to five, the upper bound on how intensely the data augmentation is applied to a given image would be lower than if we set it to 31. Our final transform is ToTensor, because we want our images in tensor format for our model. Then I'm going to create a test transform, which I'll call simple: just transforms.Compose (with a list, I should put a list here) containing a resize to 64 by 64. We don't apply data augmentation to the test data set, because we only want to evaluate our models on it; our models aren't going to be learning any generalizable patterns from the test data set, which is why we focus our data augmentation on the training data set. And I've just readjusted that; I don't want to do that. Beautiful, we've got transforms ready.

Now let's load some data using those transforms. Section 9.2: create train and test data sets and data loaders with data augmentation. We've done this before, so you might want to try it on your own; pause the video if you'd like to test it out. Create a data set and a data loader using these transforms. Recall that our data set is built from the pizza, steak and sushi train and test folders, and that our data loader batchifies that data set. So let's turn our image folders into data sets. I'm going to call this one train_data_augmented, just so we know it's been augmented; we've got a few similar variable names throughout this notebook.
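Written out, the transforms and datasets described here look roughly like this. The ToTensor in the simple test transform is assumed (the narration only mentions the resize at this point), and train_dir and test_dir are the training and test directory paths set up earlier in the section:

from torchvision import datasets, transforms

# Training transform with TrivialAugmentWide data augmentation
train_transform_trivial = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31),  # 31 is the default
    transforms.ToTensor()
])

# Test transform with no data augmentation (just resize and convert to tensor)
test_transform_simple = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

# Turn our image folders into Datasets
train_data_augmented = datasets.ImageFolder(root=train_dir,
                                            transform=train_transform_trivial)
test_data_simple = datasets.ImageFolder(root=test_dir,
                                        transform=test_transform_simple)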
So I just want to be as clear as possible. I'll just re-import torchvision.datasets. We've seen this before: ImageFolder. Rather than use our own custom class, we're going to use the existing ImageFolder class within torchvision.datasets. We have to pass in a root (I'll just pull up the docstring there), which is going to be our train_dir, which, recall, is the path to our training directory. Then I'm going to pass in the transform, which is train_transform_trivial, so our training data will be augmented thanks to that transform and transforms.TrivialAugmentWide. You know where you can find out more about TrivialAugmentWide: in the PyTorch documentation, or by searching transforms TrivialAugmentWide. And did I spell this wrong? Trivial... oh, train_transform, I spelled that wrong. Of course I did. Then for the test data, let's create test_data_simple equals datasets.ImageFolder, with the root being the test directory and the transform being test_transform_simple. Beautiful.

Now let's turn these data sets into data loaders. We're going to import os, set the batch size to 32, and set the number of workers that will load our data loaders to os.cpu_count(), so there'll be one worker per CPU on our machine. I'm going to set torch.manual_seed(42), because we're going to shuffle our training data. For the train data loader, I'm going to call it train_dataloader_augmented, equal to DataLoader. I don't need to re-import this, but I just want to show you again: from torch.utils.data import DataLoader (you can never have enough practice, right?); that's where the DataLoader class comes from. We pass in train_data_augmented as the dataset (I'll put the parameter name in for completeness), set the batch size equal to the batch size, set shuffle to True, and set num_workers equal to num_workers. Beautiful. Now let's do the same for the test data loader, which I'm going to call test_dataloader_simple. We're not using any data augmentation on the test data set, we're just turning our test images into tensors. The dataset here is test_data_simple.
We pass in the batch size equal to the batch size, so both our data loaders have a batch size of 32, keep shuffle as False, and set num_workers to num_workers. Look at us go, we've already got a data set and a data loader. This time, the data loader for the training data set is augmented, and the one for the test data set is nice and simple. This is really similar to the previous data loaders we made; the only difference in this modeling experiment is that we're adding data augmentation, namely TrivialAugmentWide. So with that being said, we've got a data set and a data loader. In the next video, let's construct and train model one. In fact, you might want to give it a go: you can use our TinyVGG class to make model one, and then use our train function to train a new TinyVGG instance with our augmented train data loader and our simple test data loader. Give that a shot, and we'll do it together in the next video. I'll see you there.

Now that we've got our data sets and data loaders with data augmentation ready, let's create another model. Section 9.3: construct and train model one. I'm just going to write what we're going to be doing: this time we'll be using the same model architecture, except we've augmented the training data, and we'd like to see how this performs compared to a model with no data augmentation, which was our baseline up here. That's what you'll generally do with your experiments: start as simple as possible and introduce complexity when required. So, create model one and send it to the target device. Because of our helpful selves previously, we can set torch.manual_seed here and create model one by leveraging the class we created before. Although we built TinyVGG from scratch earlier in this section, in subsequent coding sessions, because we've built it from scratch once and know that it works, we can just recreate it by calling the class and passing in different values. Let's get the number of classes from train_data_augmented.classes, and we'll send the model to the device. Then, if we inspect model one, let's have a look. Wonderful, now let's keep going. We can also leverage the training function we wrote; you might have tried this already.
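As a sketch, the dataloaders and the new model described here could look like this, again assuming the TinyVGG class and device variable from earlier in the section:

import os
import torch
from torch.utils.data import DataLoader

BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()  # one worker per CPU

# Turn the augmented/simple Datasets into batched DataLoaders
torch.manual_seed(42)
train_dataloader_augmented = DataLoader(dataset=train_data_augmented,
                                        batch_size=BATCH_SIZE,
                                        shuffle=True,
                                        num_workers=NUM_WORKERS)
test_dataloader_simple = DataLoader(dataset=test_data_simple,
                                    batch_size=BATCH_SIZE,
                                    shuffle=False,
                                    num_workers=NUM_WORKERS)

# Construct model_1, reusing the TinyVGG class from earlier in the section
torch.manual_seed(42)
model_1 = TinyVGG(input_shape=3,
                  hidden_units=10,
                  output_shape=len(train_data_augmented.classes)).to(device)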
So let's now train our model; I'm just going to put a heading here. Wonderful. Now that we've got a model and data loaders, what do we have to do? We have to create a loss function and an optimizer, and call on the train function we created earlier to train and evaluate our model. Beautiful. I'm going to set the random seeds, torch.manual_seed and torch.cuda.manual_seed (because we're going to be using CUDA), both to 42. I'm going to set the number of epochs, keeping many of the parameters the same: num_epochs equals five. We could of course train this model for longer by increasing the number of epochs. Now let's set up the loss function: loss_fn equals nn.CrossEntropyLoss. This just came to mind, don't forget: the loss function in PyTorch is often also called the criterion, as in the criterion you're trying to reduce, but I just like to call it the loss function. Then we'll have an optimizer; let's use the same optimizer as before, torch.optim.Adam. Recall that SGD and Adam are two of the most popular optimizers. We pass in model_1.parameters(), the parameters we're going to optimize, and set the learning rate to 0.001, which is the default for the Adam optimizer in PyTorch.

Then we start the timer: from timeit, import the default timer as timer, and start_time equals timer(). Now let's train model one. We'll collect a results dictionary as model_1_results by calling our train function: the model parameter is model_1; for the train dataloader parameter we pass in train_dataloader_augmented, our augmented training data loader; and for the test dataloader we pass in test_dataloader_simple. Then we pass our optimizer, the Adam optimizer, and our loss function, nn.CrossEntropyLoss, which we created above. We set the number of epochs equal to num_epochs, and if we really wanted to, we could set the device equal to device, our target device. Now let's end the timer and print out how long it took: end_time equals timer(), and we'll print the total training time for model one as end time minus start time (it would help if I could spell), taken to three decimal places, in seconds. So, are you ready?
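Put together, the model one training code narrated here looks roughly like this; as before, the exact keyword names on train() are taken from the narration and assumed to match the function written earlier:

import torch
from timeit import default_timer as timer

# Set the random seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Keep the training setup the same as model_0
NUM_EPOCHS = 5
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=model_1.parameters(), lr=0.001)

# Start the timer
start_time = timer()

# Train model_1 on the augmented training data
model_1_results = train(model=model_1,
                        train_dataloader=train_dataloader_augmented,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss_fn,
                        epochs=NUM_EPOCHS,
                        device=device)

# End the timer and print out how long it took
end_time = timer()
print(f"Total training time for model_1: {end_time - start_time:.3f} seconds")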
Look how quickly we built a training pipeline for model one, and how easily we created it, thanks to coding all of that up beforehand. Let's train our second model, our first model using data augmentation. Ready? Three, two, one, let's go. No errors, beautiful, we're moving nice and quickly here. So, about just over seven seconds. What GPU do I have currently? Keep in mind that I'm using Google Colab Pro, so I get preference in being allocated a faster GPU; your model's training time may be longer or shorter than mine, depending on the GPU. We get about seven seconds, but it looks like our model with data augmentation didn't perform as well as our model without data augmentation. Hmm. How long did our earlier model, the one without data augmentation, take to train? Oh, just over seven seconds as well, but we got better accuracy on the training and test data sets for model zero. So maybe data augmentation doesn't help in our case, and we kind of hinted at that, because the loss was already going down and we weren't really overfitting yet; recall that data augmentation is generally a way to help with overfitting. Maybe that wasn't the best step to try and improve our model, but let's keep evaluating it nonetheless. In the next video, we're going to plot the loss curves of model one. In fact, you might want to give that a go: we've got the plot_loss_curves function and some results in dictionary format, so try it out, plot the loss curves, and see what you see. Let's do it together in the next video. I'll see you there.

In the last video, we did the really exciting thing of training our first model with data augmentation, but we also saw that, quantitatively, it didn't give us much improvement. So let's keep evaluating our model. I'm going to make a section here. Recall that one of the best ways (not just one of my favourites) to evaluate a model's performance over time is to plot the loss curves. A loss curve helps you evaluate your model's performance over time, and it gives you a visual way to see whether your model is underfitting or overfitting. So let's plot the loss curves of model_1_results and see what happens, using the function we created before. And, oh my goodness, is that going in the right direction? It looks like our test loss is going up here. Is that where we want it to go?
Remember, the ideal direction for a loss curve is down over time, because what is loss measuring? It's measuring how wrong our model is. And the accuracy curve looks like it's all over the place as well; I mean, it's going up, kind of, but maybe we haven't trained long enough to measure these things properly. So an experiment you could do is train both of our models, model zero and model one, for more epochs and see if these loss curves flatten out. I'll pose you the question: is our model underfitting or overfitting right now, or both? If we look at the reference loss curves, the just right case is for the loss (not the accuracy) to go down over time. For me, our model is underfitting, because our loss could be lower, but it also looks like it's overfitting, because our test loss is far higher than our training loss, so it's not doing a very good job. If we go back to section 04 of the LearnPyTorch.io book, what should an ideal loss curve look like, I'd like you to start thinking of some ways we could deal with our model overfitting. Could we get more data? Could we simplify the model? Could we use transfer learning? We're going to see that later on, but you might want to jump ahead and have a look. And for underfitting, what are some other things we could try? Could we add more layers, potentially another convolutional block? Could we increase the number of hidden units per layer, say from 10 to 64 or something like that? Could we train for longer? That's probably one of the easiest things to try with our current training functions; we could train for 20 epochs. So have a go at this, use the book as a reference, try out some experiments and see if you can get these loss curves closer to the ideal shape.

In the next video, we're going to keep pushing forward and compare our model results. We've done two experiments and looked at each model's results individually, and we know they could be improved, but a good way to compare all of your experiments is to put your models' results side by side. That's what we're going to do in the next video. I'll see you there.

Now that we've looked at our models' loss curves individually, how about we compare our model results to each other? So let's have a look at comparing our model
And so I'm going to write a little note here that after evaluating our modeling 15256 24:36:13,520 --> 24:36:24,080 experiments on their own, it's important to compare them to each other. And there's a few 15257 24:36:24,080 --> 24:36:31,040 different ways to do this. There's a few different ways to do this. Number one is hard coding. 15258 24:36:32,000 --> 24:36:37,600 So like we've done, we've written functions, we've written helper functions and whatnot, 15259 24:36:37,600 --> 24:36:42,080 and manually plotted things. So I'm just going to write in here, this is what we're doing. 15260 24:36:42,080 --> 24:36:48,880 Then, of course, there are tools to do this, such as PyTorch plus TensorBoard. So I'll link to this, 15261 24:36:48,880 --> 24:36:54,720 PyTorch TensorBoard. We're going to see this in a later section of the course. TensorBoard is a 15262 24:36:54,720 --> 24:36:59,760 great resource for tracking your experiments. If you'd like to jump forward and have a look at what 15263 24:36:59,760 --> 24:37:05,040 that is in the PyTorch documentation, I'd encourage you to do so. Then another one of my favorite 15264 24:37:05,040 --> 24:37:13,840 tools is weights and biases. So these are all going to involve some code as well, but they help out 15265 24:37:13,840 --> 24:37:19,840 with automatically tracking different experiments. So weights and biases is one of my favorite, 15266 24:37:20,400 --> 24:37:25,040 and you've got platform for experiments. That's what you'll be looking at. So if you run multiple 15267 24:37:25,040 --> 24:37:30,720 experiments, you can set up weights and biases pretty easy to track your different model hub 15268 24:37:30,720 --> 24:37:37,040 parameters. So PyTorch, there we go. Import weights and biases, start a new run on weights and biases. 15269 24:37:37,760 --> 24:37:42,800 You can save the learning rate value and whatnot, go through your data and just log everything there. 15270 24:37:43,360 --> 24:37:48,240 So this is not a course about different tools. We're going to focus on just pure PyTorch, 15271 24:37:48,240 --> 24:37:51,360 but I thought I'd leave these here anyway, because you're going to come across them 15272 24:37:51,360 --> 24:37:58,000 eventually, and MLflow is another one of my favorites as well. We have ML tracking, 15273 24:37:58,000 --> 24:38:03,200 projects, models, registry, all that sort of stuff. If you'd like in to look into 15274 24:38:03,200 --> 24:38:08,400 more ways to track your experiments, there are some extensions. But for now, we're going to stick 15275 24:38:08,400 --> 24:38:13,840 with hard coding. We're just going to do it as simple as possible to begin with. And if we wanted 15276 24:38:13,840 --> 24:38:20,080 to add other tools later on, we can sure do that. So let's create a data frame for each of our model 15277 24:38:20,080 --> 24:38:26,480 results. We can do this because our model results recall are in the form of dictionaries. So model 15278 24:38:26,480 --> 24:38:33,200 zero results. But you can see what we're doing now by hard coding this, it's quite cumbersome. 15279 24:38:33,200 --> 24:38:38,720 Can you imagine if we had say 10 models or even just five models, we'd have to really 15280 24:38:38,720 --> 24:38:45,040 write a fair bit of code here for all of our dictionaries and whatnot, whereas these tools 15281 24:38:45,040 --> 24:38:51,520 here help you to track everything automatically. So we've got a data frame here. Model zero results 15282 24:38:51,520 --> 24:38:56,720 over time. These are our number of epochs. 
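As a concrete sketch of that hard-coded approach: because each results dictionary maps a metric name to a list of per-epoch values, pandas can turn it straight into a table (assuming model_0_results and model_1_results exist from the two training runs).

import pandas as pd

model_0_df = pd.DataFrame(model_0_results)  # one row per epoch
model_1_df = pd.DataFrame(model_1_results)
model_0_df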
We can notice that the training loss starts to go down. 15283 24:38:56,720 --> 24:39:02,240 The testing loss also starts to go down. And the accuracy on the training and test data set starts 15284 24:39:02,240 --> 24:39:06,800 to go up. Now, those are the trends that we're looking for. So an experiment you could try would 15285 24:39:06,800 --> 24:39:12,400 be to train this model zero for longer to see if it improved. But we're currently just interested 15286 24:39:12,400 --> 24:39:18,800 in comparing results. So let's set up a plot. I want to plot model zero results and model one 15287 24:39:18,800 --> 24:39:24,880 results on the same plot. So we'll need a plot for training loss. We'll need a plot for training 15288 24:39:24,880 --> 24:39:31,360 accuracy, test loss and test accuracy. And then we want two separate lines on each of them. One 15289 24:39:31,360 --> 24:39:37,040 for model zero and one for model one. And this particular pattern would be similar regardless if 15290 24:39:37,040 --> 24:39:42,640 we had 10 different experiments, or if we had 10 different metrics we wanted to compare, 15291 24:39:42,640 --> 24:39:47,040 you generally want to plot them all against each other to make them visual. And that's what tools 15292 24:39:47,040 --> 24:39:52,800 such as weights and biases, what TensorBoard, and what ML flow can help you to do. I'm just 15293 24:39:52,800 --> 24:39:59,920 going to get out of that, clean up our browser. So let's set up a plot here. I'm going to use 15294 24:39:59,920 --> 24:40:04,320 matplotlib. I'm going to put in a figure. I'm going to make it quite large because we want four 15295 24:40:04,320 --> 24:40:10,000 subplots, one for each of the metrics we want to compare across our different models. Now, 15296 24:40:10,000 --> 24:40:18,560 let's get number of epochs. So epochs is going to be length, or we'll turn it into a range, actually, 15297 24:40:19,360 --> 24:40:28,720 range of Len model zero DF. So that's going to give us five. Beautiful range between zero and five. 15298 24:40:29,280 --> 24:40:36,560 Now, let's create a plot for the train loss. We want to compare the train loss across model zero 15299 24:40:36,560 --> 24:40:45,440 and the train loss across model one. So we can go PLT dot subplot. Let's create a plot with two 15300 24:40:45,440 --> 24:40:50,240 rows and two columns. And this is going to be index number one will be the training loss. 15301 24:40:50,240 --> 24:40:57,280 We'll go PLT dot plot. I'm going to put in here epochs and then model zero DF. Inside here, 15302 24:40:57,280 --> 24:41:04,400 I'm going to put train loss for our first metric. And then I'm going to label it with model zero. 15303 24:41:04,400 --> 24:41:12,000 So we're comparing the train loss on each of our modeling experiments. Recall that model zero was 15304 24:41:12,000 --> 24:41:19,280 our baseline model. And that was tiny VGG without data augmentation. And then we tried out model one, 15305 24:41:19,280 --> 24:41:25,760 which was the same model. But all we did was we added a data augmentation transform to our training 15306 24:41:25,760 --> 24:41:35,360 data. So PLT will go x label. They both used the same test data set and PLT dot legend. Let's see 15307 24:41:35,360 --> 24:41:42,560 what this looks like. Wonderful. So there's our training loss across two different models. 15308 24:41:43,520 --> 24:41:49,200 So we notice that model zero is trending in the right way. 
Model one kind of exploded on epoch 15309 24:41:49,200 --> 24:41:56,080 number that would be zero, one, two, or one, depending how you're counting. Let's just say epoch number 15310 24:41:56,080 --> 24:42:00,720 two, because that's easier. The loss went up. But then it started to go back down. So again, 15311 24:42:00,720 --> 24:42:05,360 if we continued training these models, we might notice that the overall trend of the loss is 15312 24:42:05,360 --> 24:42:11,440 going down on the training data set, which is exactly what we'd like. So let's now plot, 15313 24:42:12,080 --> 24:42:18,160 we'll go the test loss. So I'm going to go test loss here. And then I'm going to change this. 15314 24:42:18,160 --> 24:42:25,520 I believe if I hold control, or command, maybe, nope, or option on my Mac keyboard, 15315 24:42:25,520 --> 24:42:29,680 yeah, so it might be a different key on Windows. But for me, I can press option and I can get a 15316 24:42:29,680 --> 24:42:35,520 multi cursor here. So I'm just going to come back in here. And that way I can backspace there 15317 24:42:35,520 --> 24:42:42,320 and just turn this into test loss. Wonderful. So I'm going to put this as test loss as the title. 15318 24:42:42,320 --> 24:42:48,960 And I need to change the index. So this will be index one, index two, index three, index four. 15319 24:42:49,600 --> 24:42:55,360 Let's see what this looks like. Do we get the test loss? Beautiful. That's what we get. 15320 24:42:55,360 --> 24:43:00,000 However, we noticed that model one is probably overfitting at this stage. So maybe the data 15321 24:43:00,000 --> 24:43:05,920 augmentation wasn't the best change to make to our model. Recall that even if you make a change 15322 24:43:05,920 --> 24:43:11,120 to your model, such as preventing overfitting or underfitting, it won't always guarantee that 15323 24:43:11,120 --> 24:43:17,680 the change takes your model's evaluation metrics in the right direction. Ideally, loss is going 15324 24:43:17,680 --> 24:43:25,280 from top left to bottom right over time. So looks like model zero is winning out here at the moment 15325 24:43:25,280 --> 24:43:32,640 on the loss front. So now let's plot the accuracy for both training and test. So I'm going to change 15326 24:43:32,640 --> 24:43:38,960 this to train. I'm going to put this as accuracy. And this is going to be index number three on the 15327 24:43:38,960 --> 24:43:47,440 plot. And do we save it as, yeah, just act? Wonderful. So I'm going to option click here on my Mac. 15328 24:43:48,080 --> 24:43:54,080 This is going to be train. And this is going to be accuracy here. And then I'll change this one to 15329 24:43:54,800 --> 24:44:00,560 accuracy. And then I'm going to change this to accuracy. And this is going to be plot number four, 15330 24:44:01,200 --> 24:44:07,680 two rows, two columns, index number four. And I'm going to option click here to have two cursors, 15331 24:44:07,680 --> 24:44:15,360 test, act. And then I'll change this to test, act. And I'm going to get rid of the legend here. 15332 24:44:15,360 --> 24:44:20,960 It takes a little bit to plot because we're doing four graphs in one hit. Wonderful. So that's 15333 24:44:20,960 --> 24:44:26,800 comparing our models. But do you see how we could potentially functionalize this to plot, however, 15334 24:44:26,800 --> 24:44:32,880 many model results that we have? 
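As a sketch of that functionalizing idea, here is one way you might wrap the four-panel comparison so it works for any number of experiments. It assumes each entry is a DataFrame like model_0_df and model_1_df above, with train_loss, test_loss, train_acc and test_acc columns.

import matplotlib.pyplot as plt

def compare_results(results_dfs: dict):
    """Plots train/test loss and accuracy for several experiments side by side.

    results_dfs maps an experiment name to a DataFrame with columns
    'train_loss', 'test_loss', 'train_acc', 'test_acc' (one row per epoch).
    """
    metrics = ["train_loss", "test_loss", "train_acc", "test_acc"]
    plt.figure(figsize=(15, 10))
    for i, metric in enumerate(metrics, start=1):
        plt.subplot(2, 2, i)
        for name, df in results_dfs.items():
            plt.plot(range(len(df)), df[metric], label=name)
        plt.title(metric)
        plt.xlabel("Epochs")
        plt.legend()

# Usage:
# compare_results({"model_0": model_0_df, "model_1": model_1_df})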
But if we had say another five models, we did another five 15335 24:44:32,880 --> 24:44:37,360 experiments, which is actually not too many experiments on a problem, you might find that 15336 24:44:37,360 --> 24:44:41,840 sometimes you do over a dozen experiments for a single modeling problem, maybe even more. 15337 24:44:42,560 --> 24:44:47,280 These graphs can get pretty outlandish with all the little lines going through. So that's 15338 24:44:47,280 --> 24:44:54,400 again what tools like TensorBoard, weights and biases and MLflow will help with. But if we have 15339 24:44:54,400 --> 24:44:59,440 a look at the accuracy, it seems that both of our models are heading in the right direction. 15340 24:44:59,440 --> 24:45:05,360 We want to go from the bottom left up in the case of accuracy. But the test accuracy that's training, 15341 24:45:05,360 --> 24:45:10,000 oh, excuse me, is this not training accuracy? I messed up that. Did you catch that one? 15342 24:45:12,000 --> 24:45:16,880 So training accuracy, we're heading in the right direction, but it looks like model one is 15343 24:45:16,880 --> 24:45:20,960 yeah, still overfitting. So the results we're getting on the training data set 15344 24:45:20,960 --> 24:45:26,160 aren't coming over to the testing data set. And that's what we really want our models to shine 15345 24:45:26,160 --> 24:45:32,880 is on the test data set. So metrics on the training data set are good. But ideally, 15346 24:45:32,880 --> 24:45:37,840 we want our models to perform well on the test data set data it hasn't seen before. 15347 24:45:38,480 --> 24:45:42,560 So that's just something to keep in mind. Whenever you do a series of modeling experiments, 15348 24:45:42,560 --> 24:45:47,920 it's always good to not only evaluate them individually, evaluate them against each other. 15349 24:45:47,920 --> 24:45:52,320 So that way you can go back through your experiments, see what worked and what didn't. 15350 24:45:52,320 --> 24:45:56,320 If you were to ask me what I would do for both of these models, I would probably train them for 15351 24:45:56,320 --> 24:46:02,400 longer and maybe add some more hidden units to each of the layers and see where the results go from 15352 24:46:02,400 --> 24:46:08,560 there. So give that a shot. In the next video, let's see how we can use our trained models to 15353 24:46:08,560 --> 24:46:14,880 make a prediction on our own custom image of food. So yes, we used a custom data set of 15354 24:46:14,880 --> 24:46:21,040 pizza steak and sushi images. But what if we had our own, what if we finished this model training 15355 24:46:21,040 --> 24:46:25,840 and we decided, you know what, this is a good enough model. And then we deployed it to an app like 15356 24:46:25,840 --> 24:46:32,320 neutrify dot app, which is a food recognition app that I'm personally working on. Then we wanted to 15357 24:46:32,320 --> 24:46:38,080 upload an image and have it be classified by our pytorch model. So let's give that a shot, see how 15358 24:46:38,080 --> 24:46:44,640 we can use our trained model to predict on an image that's not in our training data and not in our 15359 24:46:44,640 --> 24:46:54,560 testing data. I'll see you in the next video. Welcome back. In the last video, we compared our 15360 24:46:54,560 --> 24:47:00,640 modeling experiments. Now we're going to move on to one of the most exciting parts of deep learning. 15361 24:47:00,640 --> 24:47:13,280 And that is making a prediction on a custom image. 
So although we've trained a model on custom data, 15362 24:47:14,560 --> 24:47:23,200 how do you make a prediction on a sample slash image in our case? That's not in either 15363 24:47:23,200 --> 24:47:30,640 the training or testing data set. So let's say you were building a food recognition app, 15364 24:47:30,640 --> 24:47:35,360 such as neutrify, take a photo of food and learn about it. You wanted to use computer vision to 15365 24:47:35,360 --> 24:47:41,280 essentially turn foods into QR codes. So I'll just show you the workflow here. If we were to upload 15366 24:47:41,280 --> 24:47:48,000 this image of my dad giving two thumbs up for a delicious pizza. And what does neutrify predicted 15367 24:47:48,000 --> 24:47:54,160 as pizza? Beautiful. So macaronutrients that you get some nutrition information and then the time 15368 24:47:54,160 --> 24:48:00,400 taken. So we could replicate a similar process to this using our trained PyTorch model, or be it. 15369 24:48:00,400 --> 24:48:05,360 It's not going to be too great of results or performance because we've seen that we could 15370 24:48:05,360 --> 24:48:11,360 improve our models, but based on the accuracy here and based on the loss and whatnot. But let's just 15371 24:48:11,360 --> 24:48:16,720 see what it's like, the workflow. So the first thing we're going to do is get a custom image. 15372 24:48:16,720 --> 24:48:23,440 Now we could upload one here, such as clicking the upload button in Google Colab, choosing an image 15373 24:48:23,440 --> 24:48:29,200 and then importing it like that. But I'm going to do so programmatically, as you've seen before. 15374 24:48:29,200 --> 24:48:35,360 So let's write some code in this video to download a custom image. I'm going to do so using requests 15375 24:48:36,320 --> 24:48:43,040 and like all good cooking shows, I've prepared a custom image for us. So custom image path. But 15376 24:48:43,040 --> 24:48:48,240 again, you could use this process that we're going to go through with any of your own images 15377 24:48:48,240 --> 24:48:54,000 of pizza, steak or sushi. And if you wanted to train your own model on another set of custom data, 15378 24:48:54,000 --> 24:49:00,160 the workflow will be quite similar. So I'm going to download a photo called pizza dad, 15379 24:49:00,880 --> 24:49:07,280 which is my dad, two big thumbs up. And so I'm going to download it from github. So this image is 15380 24:49:07,280 --> 24:49:12,800 on the course github. And let's write some code to download the image. If it doesn't already exist 15381 24:49:13,440 --> 24:49:20,720 in our Colab instance. So if you wanted to upload a single image, you could click with this button. 15382 24:49:20,720 --> 24:49:25,120 Just be aware that like all of our other data, it's going to disappear if Colab disconnects. 15383 24:49:25,120 --> 24:49:28,480 So that's why I like to write code. So we don't have to re upload it every time. 15384 24:49:28,480 --> 24:49:38,720 So if not custom image path is file, let's open a request here or open a file going to open up 15385 24:49:38,720 --> 24:49:46,640 the custom image path with right binary permissions as F short for file. And then when downloading, 15386 24:49:47,360 --> 24:49:54,000 this is because our image is stored on github. When downloading an image or when downloading 15387 24:49:54,000 --> 24:50:00,960 from github in general, you typically want the raw link need to use the raw file link. 15388 24:50:01,760 --> 24:50:08,320 So let's write a request here equals request dot get. 
So if we go to the pytorch deep learning 15389 24:50:08,320 --> 24:50:15,120 repo, then if we go into, I believe it might be extras, not in extras, it's going to be in images, 15390 24:50:15,120 --> 24:50:19,200 that would make a lot more sense. Wouldn't it Daniel? Let's get O for pizza dad. 15391 24:50:19,200 --> 24:50:26,560 So if we have a look, this is pytorch deep learning images, O for pizza dad. There's a big version 15392 24:50:26,560 --> 24:50:32,160 of the image there. And then if we click download, just going to give us the raw link. Yeah, there we 15393 24:50:32,160 --> 24:50:36,880 go. So that's the image. Hey dad, how you doing? Is that pizza delicious? It looks like it. 15394 24:50:36,880 --> 24:50:43,920 Let's see if our model can get this right. What do you think? Will it? So of course, we want 15395 24:50:43,920 --> 24:50:50,400 our model to predict pizza for this image because it's got a pizza in it. So custom image path, 15396 24:50:51,040 --> 24:50:56,800 we're going to download that. I've just put in the raw URL above. So notice the raw 15397 24:50:57,360 --> 24:51:04,080 github user content. That's from the course github. Then I'm going to go f dot right. So file, 15398 24:51:05,040 --> 24:51:13,360 write the request content. So the content from the request, in other words, the raw file from 15399 24:51:13,360 --> 24:51:18,880 github here. Similar workflow for if you were getting another image from somewhere else on 15400 24:51:18,880 --> 24:51:26,560 the internet and else if it is already downloaded, let's just not download it. So print f custom image 15401 24:51:26,560 --> 24:51:36,240 path already exists skipping download. And let's see if this works or run the code. So downloading 15402 24:51:36,240 --> 24:51:46,080 data o four pizza dad dot jpeg. And if we go into here, we refresh. There we go. Beautiful. So our 15403 24:51:46,080 --> 24:51:52,640 data or our custom image, sorry, is now in our data folder. So if we click on this, this is inside 15404 24:51:52,640 --> 24:52:01,760 Google CoLab now. Beautiful. We got a big nice big image there. And there's a nice big pizza there. 15405 24:52:01,760 --> 24:52:07,680 So we're going to be writing some code over the next few videos to do the exact same process as 15406 24:52:07,680 --> 24:52:13,360 what we've been doing to import our custom data set for our custom image. What do we still have to 15407 24:52:13,360 --> 24:52:18,560 do? We still have to turn it into tenses. And then we have to pass it through our model. So let's see 15408 24:52:18,560 --> 24:52:28,000 what that looks like over the next few videos. We are up to one of the most exciting parts of 15409 24:52:28,000 --> 24:52:34,800 building dev learning models. And that is predicting on custom data in our case, a custom image of 15410 24:52:35,760 --> 24:52:40,640 a photo of my dad eating pizza. So of course, we're training a computer vision model on here on 15411 24:52:40,640 --> 24:52:45,840 pizza steak and sushi. So hopefully the ideal result for our model to predict on this image 15412 24:52:45,840 --> 24:52:53,200 will be pizza. So let's keep going. Let's figure out how we can get our image, our custom image, 15413 24:52:53,200 --> 24:52:59,760 our singular image into Tensor form, loading in a custom image with pytorch, creating another 15414 24:52:59,760 --> 24:53:06,160 section here. 
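For reference, the download-if-missing cell we just wrote boils down to something like the sketch below. The raw GitHub URL is my best reading of the one shown on screen, so double-check it against the course repo, and swap in your own image URL if you'd rather predict on a different photo.

import requests
from pathlib import Path

# Setup a path to the custom image (assumes the data/ folder from earlier cells exists)
data_path = Path("data/")
custom_image_path = data_path / "04-pizza-dad.jpeg"

# Download the image only if it isn't already in the Colab instance
if not custom_image_path.is_file():
    with open(custom_image_path, "wb") as f:
        # When downloading from GitHub, use the "raw" file link
        request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/04-pizza-dad.jpeg")
        print(f"Downloading {custom_image_path}...")
        f.write(request.content)
else:
    print(f"{custom_image_path} already exists, skipping download.")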
So I'm just going to write down here, we have to make sure our custom image is in the 15415 24:53:06,160 --> 24:53:18,240 same format as the data our model was trained on. So namely, that was in Tensor form with data type 15416 24:53:18,240 --> 24:53:29,120 torch float 32. And then of shape 64 by 64 by three. So we might need to change the shape of our 15417 24:53:29,120 --> 24:53:37,760 image. And then we need to make sure that it's on the right device. Command MM, beautiful. So let's 15418 24:53:37,760 --> 24:53:44,800 see what this looks like. Hey, so if I'm going to import torch vision. Now the package you use to 15419 24:53:44,800 --> 24:53:50,720 load your data will depend on the domain you're in. So let's open up the torch vision documentation. 15420 24:53:51,760 --> 24:53:56,240 We can go to models. That's okay. So if we're working with text, you might want to look in 15421 24:53:56,240 --> 24:54:01,760 here for some input and output functions, so some loading functions, torch audio, same thing. 15422 24:54:02,320 --> 24:54:07,120 Torch vision is what we're working with. Let's click into torch vision. Now we want to look into 15423 24:54:07,120 --> 24:54:12,480 reading and writing images and videos because we want to read in an image, right? We've got a 15424 24:54:12,480 --> 24:54:17,520 custom image. We want to read it in. So this is part of your extracurricular, by the way, to go 15425 24:54:17,520 --> 24:54:22,080 through these for at least 10 minutes each. So spend an hour if you're going through torch vision. 15426 24:54:22,080 --> 24:54:26,960 You could do the same across these other ones. It will just really help you familiarize yourself 15427 24:54:26,960 --> 24:54:33,120 with all the functions of PyTorch domain libraries. So we want to look here's some options for video. 15428 24:54:33,120 --> 24:54:38,480 We're not working with video. Here's some options for images. Now what do we want to do? We want 15429 24:54:38,480 --> 24:54:44,400 to read in an image. So we've got a few things here. Decode image. Oh, I've skipped over one. 15430 24:54:44,960 --> 24:54:51,520 We can write a JPEG if we wanted to. We can encode a PNG. Let's jump into this one. Read image. 15431 24:54:51,520 --> 24:54:58,560 What does it do? Read the JPEG or PNG into a three-dimensional RGB or grayscale tensor. 15432 24:54:58,560 --> 24:55:03,040 That is what we want. And then optionally converts the image to the desired format. 15433 24:55:03,040 --> 24:55:10,640 The values of the output tensor are you int eight. Okay. Beautiful. So let's see what this looks like. 15434 24:55:10,640 --> 24:55:16,400 Okay. Mode. The read mode used optionally for converting the image. Let's see what we can do 15435 24:55:16,400 --> 24:55:25,920 with this. I'm going to copy this in. So I'll write this down. We can read an image into PyTorch using 15436 24:55:25,920 --> 24:55:34,560 and go with that. So let's see what this looks like in practice. Read in custom image. I can't 15437 24:55:34,560 --> 24:55:40,640 explain to you how much I love using deep learning models to predict on custom data. So custom image. 15438 24:55:41,200 --> 24:55:45,680 We're going to call it you int eight because as we read from the documentation here, 15439 24:55:46,240 --> 24:55:52,560 it reads it in you int eight format. So let's have a look at what that looks like rather than 15440 24:55:52,560 --> 24:55:59,040 just talking about it. Torch vision.io. Read image. What's our target image path? 
15441 24:55:59,760 --> 24:56:04,240 Well, we've got custom image path up here. This is why I like to do things programmatically. 15442 24:56:04,800 --> 24:56:08,800 So if our collab notebook reset, we could just run this cell again, 15443 24:56:08,800 --> 24:56:14,960 get our custom image and then we've got it here. So custom image you int eight. Let's see what this 15444 24:56:14,960 --> 24:56:23,840 looks like. Oh, what did we get wrong? Unable to cast Python instance. Oh, does it need to be a 15445 24:56:23,840 --> 24:56:30,720 string expected a value type of string or what found POSIX path? So this the path needs to be a 15446 24:56:30,720 --> 24:56:38,800 string. Okay. If we have a look at our custom image path, what did we get wrong? Oh, we've got a 15447 24:56:38,800 --> 24:56:46,400 POSIX path. So let's convert this custom image path into a string and see what happens. Look at that. 15448 24:56:47,520 --> 24:56:55,520 That's how image in integer form. I wonder if this is plotable. Let's go PLT dot M show custom image 15449 24:56:55,520 --> 24:57:00,880 you int eight. Maybe we get a dimensionality problem here in valid shape. Okay. Let's 15450 24:57:00,880 --> 24:57:11,120 some permute it, permute, and we'll go one, two, zero. Is this going to plot? It's a fairly big image. 15451 24:57:11,680 --> 24:57:18,640 There we go. Two thumbs up. Look at us. So that is the power of torch vision.io. I owe stands for 15452 24:57:18,640 --> 24:57:24,160 input output. We were just able to read in our custom image. Now, how about we get some metadata 15453 24:57:24,160 --> 24:57:29,040 about this? Let's go. We'll print it up here, actually. I'll keep that there because that's 15454 24:57:29,040 --> 24:57:35,840 fun to plot it. Let's find the shape of our data, the data type. And yeah, we've got it in Tensor 15455 24:57:35,840 --> 24:57:41,360 format, but it's you int eight right now. So we might have to convert that to float 32. We want 15456 24:57:41,360 --> 24:57:46,720 to find out its shape. And we need to make sure that if we're predicting on a custom image, 15457 24:57:46,720 --> 24:57:52,080 the data that we're predicting on the custom image needs to be on the same device as our model. 15458 24:57:52,080 --> 24:58:00,240 So let's print out some info. Print. Let's go custom image Tensor. And this is going to be a new line. 15459 24:58:00,240 --> 24:58:08,160 And then we will go custom image you int eight. Wonderful. And then let's go custom image 15460 24:58:08,160 --> 24:58:16,240 shape. We will get the shape parameter custom image shape or attribute. Sorry. And then we also 15461 24:58:16,240 --> 24:58:21,600 want to know the data type custom image data type. But we have a kind of an inkling because the 15462 24:58:21,600 --> 24:58:29,440 documentation said it would be you int eight, you int eight, and we'll go D type. Let's have a look. 15463 24:58:30,160 --> 24:58:36,400 What do we have? So there's our image Tensor. And it's quite a big image. So custom image shape. 15464 24:58:36,880 --> 24:58:44,560 So what was our model trained on? Our model was trained on images of 64 by 64. So this image 15465 24:58:44,560 --> 24:58:49,600 encodes a lot more information than what our model was trained on. So we're going to have to 15466 24:58:49,600 --> 24:58:56,240 change that shape to pass it through our model. And then we've got an image data type here or 15467 24:58:56,240 --> 24:59:01,280 Tensor data type of torch you int eight. So maybe that's going to be some errors for us later on. 
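Here is the read-and-inspect step we just walked through as one small sketch (it assumes custom_image_path from the download cell above).

import torch
import torchvision
import matplotlib.pyplot as plt

# Read the JPEG into a uint8 tensor of shape [color_channels, height, width].
# read_image() expects a string path, hence the str() around the PosixPath.
custom_image_uint8 = torchvision.io.read_image(str(custom_image_path))

print(f"Custom image tensor:\n{custom_image_uint8}\n")
print(f"Custom image shape: {custom_image_uint8.shape}")
print(f"Custom image dtype: {custom_image_uint8.dtype}")  # torch.uint8

# matplotlib expects [height, width, color_channels], so permute before plotting
plt.imshow(custom_image_uint8.permute(1, 2, 0))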
15468 24:59:01,280 --> 24:59:07,520 So if you want to go ahead and see if you can resize this Tensor to 64 64 using a torch transform 15469 24:59:07,520 --> 24:59:12,080 or torch vision transform, I'd encourage you to try that out. And if you know how to change a 15470 24:59:12,080 --> 24:59:18,640 torch tensor from you int eight to torch float 32, give that a shot as well. So let's try 15471 24:59:18,640 --> 24:59:22,800 make a prediction on our image in the next video. I'll see you there. 15472 24:59:26,000 --> 24:59:31,440 In the last video, we loaded in our own custom image and got two big thumbs up from my dad, 15473 24:59:31,440 --> 24:59:36,720 and we turned it into a tensor. So we've got a custom image tensor here. It's quite big though, 15474 24:59:36,720 --> 24:59:40,560 and we looked at a few things of what we have to do before we pass it through our model. 15475 24:59:40,560 --> 24:59:46,720 So we need to make sure it's in the data type torch float 32, shape 64, 64, 3, and on the right 15476 24:59:46,720 --> 24:59:56,320 device. So let's make another section here. We'll go 11.2 and we'll call it making a prediction on a 15477 24:59:56,320 --> 25:00:04,240 custom image with a pie torch model with a trained pie torch model. And albeit, our models aren't 15478 25:00:04,240 --> 25:00:09,040 quite the level we would like them at yet. I think it's important just to see what it's like to 15479 25:00:09,040 --> 25:00:17,600 make a prediction end to end on some custom data, because that's the fun part, right? So try to make 15480 25:00:17,600 --> 25:00:22,800 a prediction on an image. Now, I want to just highlight something about the importance of different 15481 25:00:22,800 --> 25:00:28,240 data types and shapes and whatnot and devices, three of the biggest errors in deep learning. 15482 25:00:28,800 --> 25:00:36,240 In let's see what happens if we try to predict on you int eight format. So we'll go model one 15483 25:00:36,240 --> 25:00:44,320 dot eval and with torch dot inference mode. Let's make a prediction. We'll pass it through our model 15484 25:00:44,320 --> 25:00:49,680 one. We could use model zero if we wanted to here. They're both performing pretty poorly anyway. 15485 25:00:50,320 --> 25:00:56,080 Let's send it to the device and see what happens. Oh, no. What did we get wrong here? 15486 25:00:56,800 --> 25:01:04,080 Runtime error input type. Ah, so we've got you int eight. So this is one of our first errors 15487 25:01:04,080 --> 25:01:10,480 that we talked about. We need to make sure that our custom data is of the same data type that 15488 25:01:10,480 --> 25:01:17,040 our model was originally trained on. So we've got torch CUDA float tensor. So we've got an issue 15489 25:01:17,040 --> 25:01:24,480 here. We've got a you into eight image data or image tensor trying to be predicted on by a model 15490 25:01:24,480 --> 25:01:33,920 with its data type of torch CUDA float tensor. So let's try fix this by loading the custom image 15491 25:01:33,920 --> 25:01:42,720 and convert to torch dot float 32. So one of the ways we can do this is we'll just recreate the 15492 25:01:42,720 --> 25:01:49,520 custom image tensor. And I'm going to use torch vision dot IO dot read image. We don't have to 15493 25:01:49,520 --> 25:01:53,600 fully reload our image, but I'm going to do it anyway for completeness and a little bit of practice. 15494 25:01:54,400 --> 25:02:01,760 And then I'm going to set the type here with the type method to torch float 32. 
And then 15495 25:02:01,760 --> 25:02:10,880 let's just see what happens. We'll go custom image. Let's see what this looks like. I wonder if our 15496 25:02:10,880 --> 25:02:17,280 model will work on this. Let's just try again, we'll bring this up, copy this down to make a 15497 25:02:17,280 --> 25:02:24,960 prediction and custom image dot two device. Our image is in torch float 32 now. Let's see what 15498 25:02:24,960 --> 25:02:32,400 happens. Oh, we get an issue. Oh my goodness, that's a big matrix. Now I have a feeling that 15499 25:02:32,400 --> 25:02:39,040 that might be because our image, our custom image is of a shape that's far too large. Custom image 15500 25:02:39,040 --> 25:02:47,200 dot shape. What do we get? Oh my gosh, 4000 and 3,024. And do you notice as well that our values 15501 25:02:47,200 --> 25:02:54,320 here are between zero and one, whereas our previous images, do we have an image? There we go. That 15502 25:02:54,320 --> 25:02:59,760 our model was trained on what between zero and one. So how could we get these values to be between 15503 25:02:59,760 --> 25:03:08,720 zero and one? Well, one of the ways to do so is by dividing by 255. Now, why would we divide by 255? 15504 25:03:09,840 --> 25:03:17,120 Well, because that's a standard image format is to store the image tensor values in values from 15505 25:03:17,120 --> 25:03:24,560 zero to 255 for red, green and blue color channels. So if we want to scale them, so this is what I 15506 25:03:24,560 --> 25:03:31,280 meant by zero to 255, if we wanted to scale these values to be between zero and one, we can divide 15507 25:03:31,280 --> 25:03:38,320 them by 255. Because that is the maximum value that they can be. So let's see what happens if we do 15508 25:03:38,320 --> 25:03:47,120 that. Okay, we get our image values between zero and one. Can we plot this image? So plt dot m 15509 25:03:47,120 --> 25:03:52,800 show, let's plot our custom image. We got a permute it. So it works nicely with mapplotlib. 15510 25:03:53,760 --> 25:03:54,640 What do we get here? 15511 25:04:00,720 --> 25:04:05,680 Beautiful. We get the same image, right? But it's still quite big. Look at that. We've got a pixel 15512 25:04:05,680 --> 25:04:11,600 height of or image height of almost 4000 pixels and a width of over 3000 pixels. So we need to do 15513 25:04:11,600 --> 25:04:17,280 some adjustments further on. So let's keep going. We've got custom image to device. We've got an 15514 25:04:17,280 --> 25:04:23,200 error here. So this is a shape error. So what can we do to transform our image shape? And you 15515 25:04:23,200 --> 25:04:29,280 might have already tried this. Well, let's create a transform pipeline to transform our image shape. 15516 25:04:29,280 --> 25:04:37,760 So create transform pipeline or composition to resize the image. Because remember, what are we 15517 25:04:37,760 --> 25:04:42,560 trying to do? We're trying to get our model to predict on the same type of data it was trained on. 15518 25:04:42,560 --> 25:04:51,280 So let's go custom image transform is transforms dot compose. And we're just going to, since our 15519 25:04:51,280 --> 25:04:59,600 image is already of a tensor, let's do transforms dot resize, and we'll set the size to the same shape 15520 25:04:59,600 --> 25:05:06,560 that our model was trained on, or the same size that is. So let's go from torch vision. We don't 15521 25:05:06,560 --> 25:05:10,400 have to rewrite this. It's already imported. 
But I just want to highlight that we're using the 15522 25:05:10,400 --> 25:05:18,000 transforms package. We'll run that. There we go. We've got a transform pipeline. Now let's see what 15523 25:05:18,000 --> 25:05:25,680 happens when we transform our target image, transform target image. What happens? Custom image 15524 25:05:25,680 --> 25:05:32,240 transformed. I love printing the inputs and outputs of our different pipelines here. So let's pass 15525 25:05:32,240 --> 25:05:39,840 our custom image that we've just imported. So custom image transform, our custom image is recall 15526 25:05:39,840 --> 25:05:48,640 of shape. Quite large. We're going to pass it through our transformation pipeline. And let's 15527 25:05:48,640 --> 25:05:58,960 print out the shapes. Let's go original shape. And then we'll go custom image dot shape. And then 15528 25:05:58,960 --> 25:06:10,080 we'll go print transformed shape is going to be custom image underscore transformed dot shape. 15529 25:06:11,040 --> 25:06:18,560 Let's see the transformation. Oh, would you look at that? How good we've gone from quite a large image 15530 25:06:18,560 --> 25:06:24,160 to a transformed image here. So it's going to be squished and squashed a little. So that's what 15531 25:06:24,160 --> 25:06:30,400 happens. Let's see what happens when we plot our transformed image. We've gone from 4000 pixels 15532 25:06:30,400 --> 25:06:36,400 on the height to 64. And we've gone from 3000 pixels on the height to 64. So this is what our 15533 25:06:36,400 --> 25:06:45,440 model is going to see. Let's go custom image transformed. And we're going to permute it to be 120. 15534 25:06:47,520 --> 25:06:52,480 Okay, so quite pixelated. Do you see how this might affect the accuracy of our model? 15535 25:06:52,480 --> 25:06:58,800 Because we've gone from custom image, is this going to, oh, yeah, we need to plot dot image. 15536 25:06:58,800 --> 25:07:06,640 So we've gone from this high definition image to an image that's of far lower quality here. 15537 25:07:06,640 --> 25:07:11,600 And I can kind of see myself that this is still a pizza, but I know that it's a pizza. So just 15538 25:07:11,600 --> 25:07:15,760 keep this in mind going forward is that another way that we could potentially improve our model's 15539 25:07:15,760 --> 25:07:23,680 performance if we increased the size of the training image data. So instead of 64 64, we might want 15540 25:07:23,680 --> 25:07:30,240 to upgrade our models capability to deal with images that are of 224 224. So if we have a look 15541 25:07:30,240 --> 25:07:40,800 at what this looks like 224 224. Wow, that looks a lot better than 64 64. So that's something that 15542 25:07:40,800 --> 25:07:46,160 you might want to try out later on. But we're going to stick in line with the CNN explainer model. 15543 25:07:47,760 --> 25:07:52,240 How about we try to make another prediction? So since we transformed our 15544 25:07:53,760 --> 25:08:00,400 image to be the same size as the data our model was trained on. So with torch inference mode, 15545 25:08:00,400 --> 25:08:08,480 let's go custom image pred equals model one on custom image underscore transformed. 15546 25:08:08,480 --> 25:08:15,360 Does it work now? Oh my goodness, still not working expected all tensors on the same device. Of course, 15547 25:08:15,360 --> 25:08:21,760 that's what we forgot here. Let's go to device. Or actually, let's leave that error there. And 15548 25:08:21,760 --> 25:08:27,440 we'll just copy this code down here. 
And let's put this transformed custom image back on the right 15549 25:08:27,440 --> 25:08:35,520 device and see if we finally get a prediction to happen with our model. Oh, we still get an error. 15550 25:08:35,520 --> 25:08:42,560 Oh my goodness, what's going on here? Oh, we need to add a batch size to it. So I'm just gonna write 15551 25:08:42,560 --> 25:09:00,240 up here. This will error. No batch size. And this will error. Image not on right device. And then 15552 25:09:00,240 --> 25:09:07,120 let's try again, we need to add a batch size to our image. So if we look at custom image transformed 15553 25:09:08,880 --> 25:09:16,080 dot shape, recall that our images that passed through our model had a batch dimension. So this 15554 25:09:16,080 --> 25:09:22,320 is another place where we get shape mismatch issues, because what's going on 15555 25:09:22,320 --> 25:09:28,160 in a neural network is a lot of tensor manipulation. We want to perform 15556 25:09:28,160 --> 25:09:33,920 matrix multiplication, and matrix multiplication has rules. If the dimensions don't line up and we don't play by the rules, the matrix multiplication will 15557 25:09:33,920 --> 25:09:43,040 fail. So let's fix this by adding a batch dimension. So we can do this by going custom image transformed. 15558 25:09:43,040 --> 25:09:50,160 Let's unsqueeze it on the zeroth dimension and then check the shape. There we go. We add a single batch. 15559 25:09:50,160 --> 25:09:54,960 So that's what we want to do when we make a prediction on a single custom image. We want to pass it to 15560 25:09:54,960 --> 25:10:02,000 our model as an image or a batch of one sample. So let's finally see if this will work. 15561 25:10:03,520 --> 25:10:09,040 Let's just comment what we'll do here. Or maybe we'll just try it anyway, this should work. 15562 25:10:11,120 --> 25:10:17,440 Added a batch size. So do you see the steps we've been through so far? And we're just going to 15563 25:10:17,440 --> 25:10:26,640 unsqueeze this. Unsqueeze on the zero dimension to add a batch size. Oh, it didn't error. Oh my 15564 25:10:26,640 --> 25:10:32,960 goodness. It didn't error. Have a look at that. Yes, that's what we want. We get prediction 15565 25:10:32,960 --> 25:10:39,680 logits, because from the raw outputs of our model, we get a logit value for each of our custom classes. 15566 25:10:39,680 --> 25:10:44,720 So this could be pizza. This could be steak. And this could be sushi, depending on the order of 15567 25:10:44,720 --> 25:10:54,160 our classes. Let's just have a look. Class to IDX. Did we not get that? Class names. 15568 25:10:56,240 --> 25:11:01,760 Beautiful. So pizza, steak, sushi. We've still got a ways to go to convert this into that. 15569 25:11:01,760 --> 25:11:08,400 But I just want to highlight what we've done. So note, to make a prediction on a custom image, 15570 25:11:08,400 --> 25:11:16,000 we had to do a few things. And this is something you'll have to keep in mind for almost all of your custom data. 15571 25:11:16,000 --> 25:11:22,600 It needs to be formatted in the same way that your model was trained on. So we had to load the image 15572 25:11:22,600 --> 25:11:35,160 and turn it into a tensor. We had to make sure the image was the same data type as the model. 15573 25:11:35,160 --> 25:11:43,560 So that was torch float 32. And then we had to make sure the image was the same shape as the data 15574 25:11:43,560 --> 25:11:54,760 the model was trained on, which was 64, 64, three with a batch size. So that was one, 15575 25:11:54,760 --> 25:12:02,720 three, 64, 64.
And excuse me, this should actually be the other way around. This should be color 15576 25:12:02,720 --> 25:12:09,600 channels first, because we're dealing with pie torch here. 64. And then finally, we had to make 15577 25:12:09,600 --> 25:12:21,120 sure the image was on the same device as our model. So they are three of the big ones that we've 15578 25:12:21,120 --> 25:12:26,160 talked about so much the same data type or data type mismatch will result in a bunch of issues. 15579 25:12:26,160 --> 25:12:33,520 Shape mismatch will result in a bunch of issues. And device mismatch will also result in a bunch 15580 25:12:33,520 --> 25:12:41,760 of issues. If you want these to be highlighted, they are in the learn pie torch.io resource. We have 15581 25:12:41,760 --> 25:12:48,080 putting things together. Where do we have it? Oh, yeah, no, it's in the main takeaway section, 15582 25:12:48,080 --> 25:12:53,160 sorry, predicting on your own custom data with a trained model as possible, as long as you format 15583 25:12:53,160 --> 25:12:58,200 the data into a similar format to what the model was trained on. So make sure you take care of the 15584 25:12:58,200 --> 25:13:03,800 three big pie torch and deep learning errors. Wrong data types, wrong data shapes, and wrong 15585 25:13:03,800 --> 25:13:10,520 devices, regardless of whether that's images or audio or text, these three will follow you around. 15586 25:13:10,520 --> 25:13:17,880 So just keep them in mind. But now we've got some code to predict on custom images, but it's kind 15587 25:13:17,880 --> 25:13:22,440 of all over the place. We've got about 10 coding cells here just to make a prediction on a custom 15588 25:13:22,440 --> 25:13:29,640 image. How about we functionize this and see if it works on our pizza dad image. I'll see you in the 15589 25:13:29,640 --> 25:13:38,840 next video. Welcome back. We're now well on our way to making custom predictions on our own custom 15590 25:13:38,840 --> 25:13:44,840 image data. Let's keep pushing forward. In the last video, we finished off getting some raw model 15591 25:13:44,840 --> 25:13:50,920 logits. So the raw outputs from our model. Now, let's see how we can convert these logits into 15592 25:13:50,920 --> 25:13:59,400 prediction labels. Let's write some code. So convert logits to prediction labels. Or let's go 15593 25:14:00,040 --> 25:14:06,040 convert logits. Let's first convert them to prediction probabilities. Probabilities. 15594 25:14:07,000 --> 25:14:14,840 So how do we do that? Let's go custom image pred probes equals torch dot softmax 15595 25:14:14,840 --> 25:14:22,520 to convert our custom image pred across the first dimension. So the first dimension of this tensor 15596 25:14:22,520 --> 25:14:28,040 will be the inner brackets, of course. So just this little section here. Let's see what these 15597 25:14:28,040 --> 25:14:36,360 look like. This will be prediction probabilities. Wonderful. So you'll notice that these are quite 15598 25:14:36,360 --> 25:14:42,600 spread out. Now, this is not ideal. Ideally, we'd like our model to assign a fairly large 15599 25:14:42,600 --> 25:14:49,240 prediction probability to the target class, the right target class that is. However, since our model 15600 25:14:49,240 --> 25:14:53,720 when we trained it isn't actually performing that all that well. The prediction probabilities 15601 25:14:53,720 --> 25:14:58,680 are quite spread out across all of the classes. 
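Pulling the last few cells together, here is a sketch of the whole single-image prediction path, from uint8 file to prediction probabilities, finishing with the argmax-to-label step that comes next. The names model_1, device, class_names and custom_image_path are the objects created earlier in this section.

import torch
import torchvision
from torchvision import transforms

# 1. Load as float32 and scale pixel values from [0, 255] down to [0, 1]
custom_image = torchvision.io.read_image(str(custom_image_path)).type(torch.float32) / 255.

# 2. Resize to the same size the model was trained on (64x64)
custom_image_transform = transforms.Compose([transforms.Resize(size=(64, 64))])
custom_image_transformed = custom_image_transform(custom_image)

# 3. Predict: eval + inference mode, add a batch dimension, put the data on the same device as the model
model_1.eval()
with torch.inference_mode():
    custom_image_pred = model_1(custom_image_transformed.unsqueeze(dim=0).to(device))

# 4. Logits -> prediction probabilities -> prediction label
custom_image_pred_probs = torch.softmax(custom_image_pred, dim=1)
custom_image_pred_label = torch.argmax(custom_image_pred_probs, dim=1)
print(class_names[custom_image_pred_label.cpu()])  # e.g. "pizza"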
But nonetheless, we're just highlighting what 15602 25:14:58,680 --> 25:15:03,320 it's like to predict on custom data. So now let's convert the prediction probabilities 15603 25:15:03,320 --> 25:15:12,680 to prediction labels. Now, you'll notice that we used softmax because why we are working with 15604 25:15:12,680 --> 25:15:19,480 multi class classification data. And so we can get the custom image pred labels, the integers, 15605 25:15:20,040 --> 25:15:28,120 by taking the argmax of the prediction probabilities, custom image pred probes across the first 15606 25:15:28,120 --> 25:15:35,000 dimension as well. So let's go custom image pred labels. Let's see what they look like. 15607 25:15:35,960 --> 25:15:42,680 Zero. So the index here with the highest value is index number zero. And you'll notice that it's 15608 25:15:42,680 --> 25:15:49,240 still on the coded device. So what would happen if we try to index on our class names with 15609 25:15:49,240 --> 25:16:00,760 the custom image pred labels? Or maybe that doesn't need to be a plural. Oh, there we go. We get pizza. 15610 25:16:00,760 --> 25:16:06,440 But you might also have to change this to the CPU later on. Otherwise, you might run into some 15611 25:16:06,440 --> 25:16:11,960 errors. So just be aware of that. So you notice how we just put it to the CPU. So we get pizza. We 15612 25:16:11,960 --> 25:16:16,120 got a correct prediction. But this is as good as guessing in my opinion, because these are kind 15613 25:16:16,120 --> 25:16:22,760 of spread out. Ideally, this value would be higher, maybe something like 0.8 or above for our pizza 15614 25:16:22,760 --> 25:16:31,320 dad image. But nonetheless, our model is getting two thumbs up even on this 64 by 64 image. But 15615 25:16:31,320 --> 25:16:37,000 that's a lot of code that we've written. Let's functionize it. So we can just pass in a file path 15616 25:16:37,000 --> 25:16:42,360 and get a custom prediction from it. So putting custom image prediction together. 15617 25:16:42,360 --> 25:16:52,440 Let's go building a function. So we want the ideal outcome is, let's plot our image as well. 15618 25:16:52,440 --> 25:17:08,760 Ideal outcome is a function where we plot or where we pass an image path to and have our model predict 15619 25:17:08,760 --> 25:17:17,880 on that image and plot the image plus the prediction. So this is our ideal outcome. And I think I'm 15620 25:17:17,880 --> 25:17:24,120 going to issue this as a challenge. So give that a go, put all of our code above together. And you'll 15621 25:17:24,120 --> 25:17:27,880 just have to import the image, you'll have to process it and whatnot. I know I said we were going 15622 25:17:27,880 --> 25:17:31,720 to build a function in this video, but we're going to say that to the next video. I'd like 15623 25:17:31,720 --> 25:17:39,160 you to give that a go. So start from way back up here, import the image via torture vision.io read 15624 25:17:39,160 --> 25:17:45,400 image, format it using what we've done, change the data type, change the shape, change the device, 15625 25:17:46,040 --> 25:17:54,200 and then plot the image with its prediction as the title. So give that a go and we'll do it 15626 25:17:54,200 --> 25:18:02,760 together in the next video. How'd you go? I just realized I had a typo in the previous cell, 15627 25:18:02,760 --> 25:18:07,640 but that's all right. Did you give it a shot? Did you put together the custom image prediction 15628 25:18:07,640 --> 25:18:13,720 in a function format? I'd love it if you did. 
But if not, that's okay. Let's keep going. Let's see 15629 25:18:13,720 --> 25:18:17,640 what that might look like. And there are many different ways that you could do this. But 15630 25:18:17,640 --> 25:18:21,720 here's one of the ways that I've thought of. So we want to function that's going to 15631 25:18:21,720 --> 25:18:28,440 pred and plot a target image. We wanted to take in a torch model. And so that's going to be ideally 15632 25:18:28,440 --> 25:18:33,960 a trained model. We wanted to also take in an image path, which will be of a string. It can 15633 25:18:33,960 --> 25:18:40,680 take in a class names list so that we can index it and get the prediction label in string format. 15634 25:18:41,400 --> 25:18:46,680 So let's put this as a list of strings. And by default, this can equal none. Just in case we 15635 25:18:46,680 --> 25:18:52,280 just wanted the prediction, it wants to take in a transform so that we can pass it in some form of 15636 25:18:52,280 --> 25:18:58,280 transform to transform the image. And then it's going to take in a device, which will be by default 15637 25:18:58,280 --> 25:19:05,000 the target device. So let's write a little doc string here, makes a prediction on a target image 15638 25:19:05,000 --> 25:19:17,160 with a trained model and plots the image and prediction. Beautiful. Now what do we have to do 15639 25:19:17,160 --> 25:19:25,160 first? Let's load in the image. Load in the image just like we did before with torch vision. So 15640 25:19:25,720 --> 25:19:34,680 target image equals torch vision.io dot read image. And we'll go string on the image path, 15641 25:19:34,680 --> 25:19:40,200 which will be the image path here. And we convert it to a string just in case it doesn't get passed 15642 25:19:40,200 --> 25:19:49,880 in as a string. And then let's change it into type torch float 32. Because we want to make sure that 15643 25:19:49,880 --> 25:19:57,240 our custom image or our custom data is in the same type as what we trained our model on. So now 15644 25:19:57,240 --> 25:20:09,480 let's divide the image pixel values by 255 to get them between zero or to get them between zero 15645 25:20:10,120 --> 25:20:17,960 one as a range. So we can just do this by target image equals target image divided by 255. And we 15646 25:20:17,960 --> 25:20:22,920 could also just do this in one step up here 255. But I've just put it out there just to let you know 15647 25:20:22,920 --> 25:20:30,920 that, hey, read image imports image data as between zero and 255. So our model prefers numbers 15648 25:20:30,920 --> 25:20:37,960 between zero and one. So let's just scale it there. Now we want to transform our data if necessary. 15649 25:20:37,960 --> 25:20:43,400 In our case, it is, but it won't always be. So we want this function to be pretty generic 15650 25:20:43,400 --> 25:20:51,160 predomplot image. So if the transform exists, let's set the target image to the transform, 15651 25:20:51,160 --> 25:20:56,280 or we'll pass it through the transform that is wonderful. And the transform we're going to get 15652 25:20:56,280 --> 25:21:04,840 from here. Now what's left to do? Well, let's make sure the model is on the target device. 15653 25:21:05,960 --> 25:21:10,920 It might be by default, but if we're passing in a device parameter, we may as well make sure the 15654 25:21:10,920 --> 25:21:18,680 model is there too. And now we can make a prediction. So let's turn on a vowel slash inference mode 15655 25:21:18,680 --> 25:21:25,800 and make a prediction with our model. 
So model, we call a vowel mode, and then with torch dot 15656 25:21:25,800 --> 25:21:30,520 inference mode, because we're making a prediction, we want to turn our model into inference mode, 15657 25:21:30,520 --> 25:21:39,160 or put it in inference mode context. Let's add an extra dimension to the image. Let's go target 15658 25:21:39,160 --> 25:21:43,880 image. We could do this step above, actually, but we're just going to do it here. From kind of 15659 25:21:43,880 --> 25:21:49,080 remembering things on the fly here of what we need to do, we're adding a, this is, let's write 15660 25:21:49,080 --> 25:21:59,720 this down, this is the batch dimension. e g our model will predict on batches of one x image. 15661 25:22:00,520 --> 25:22:05,400 So we're just unsqueezing it to add an extra dimension at the zero dimension space, 15662 25:22:05,400 --> 25:22:09,400 just like we did in a previous video. Now let's make a prediction 15663 25:22:09,400 --> 25:22:16,040 on the image with an extra dimension. Otherwise, if we don't have that extra dimension, we saw 15664 25:22:16,040 --> 25:22:21,160 that we get a shape issue. So right down here, target image pred. And remember, this is going 15665 25:22:21,160 --> 25:22:29,560 to be the raw model outputs, raw logit outputs. We're going to target image pred. And yeah, 15666 25:22:30,120 --> 25:22:35,000 I believe that's all we need for the prediction. Oh wait, there was one more thing, two device. 15667 25:22:35,000 --> 25:22:44,200 Me too. Also make sure the target image is on the right device. Beautiful. So fair 15668 25:22:44,200 --> 25:22:49,160 few steps here, but nothing we can't handle. All we're really doing is replicating what we've done 15669 25:22:49,160 --> 25:22:55,000 for batches of images. But we want to make sure that if someone passed any image to our 15670 25:22:55,640 --> 25:23:00,440 pred and plot image function, that we've got functionality in here to handle that image. 15671 25:23:00,440 --> 25:23:06,360 And do we get this? Oh, we want just target image to device. Did you catch that error? 15672 25:23:06,920 --> 25:23:16,120 So let's keep going. Now let's convert the logits. Our models raw logits. Let's convert those 15673 25:23:16,120 --> 25:23:22,200 to prediction probabilities. This is so exciting. We're getting so close to making a function 15674 25:23:22,200 --> 25:23:27,160 to predict on custom data. So we'll set this to target image pred probes, which is going to be 15675 25:23:27,160 --> 25:23:33,640 torch dot softmax. And we will pass in the target image pred here. We want to get the softmax of 15676 25:23:33,640 --> 25:23:39,480 the first dimension. Now let's convert our prediction probabilities, which is what we get in the line 15677 25:23:39,480 --> 25:23:49,320 above. We want to convert those to prediction labels. So let's get the target image pred labels 15678 25:23:49,320 --> 25:23:56,200 labels equals torch dot argmax. We want to get the argmax of, or in other words, the index, 15679 25:23:56,200 --> 25:24:03,000 which is the maximum value from the pred probes of the first dimension as well. Now what should we 15680 25:24:03,000 --> 25:24:09,240 return here? Well, we don't really need to return anything. We want to create a plot. So let's plot 15681 25:24:09,240 --> 25:24:21,640 the image alongside the prediction and prediction probability. Beautiful. So plot dot in show, 15682 25:24:21,640 --> 25:24:27,240 what are we going to pass in here? We're going to pass in here our target image. 
Now we have to 15683 25:24:27,240 --> 25:24:33,000 squeeze this, I believe, because we've added an extra dimension up here. So we'll squeeze it to 15684 25:24:33,000 --> 25:24:40,120 remove that batch size. And then we still have to permute it because map plot lib likes images 15685 25:24:40,120 --> 25:24:47,560 in the format color channels last one, two, zero. So remove batch dimension. 15686 25:24:50,040 --> 25:24:59,960 And rearrange shape to be hc hwc. That is color channels last. Now if the class names parameter 15687 25:24:59,960 --> 25:25:05,720 exists, so we've passed in a list of class names, this function is really just replicating 15688 25:25:05,720 --> 25:25:11,000 everything we've done in the past 10 cells, by the way. So right back up here, we're replicating 15689 25:25:11,000 --> 25:25:15,480 all of this stuff in one function. So pretty large function, but once we've written it, 15690 25:25:15,480 --> 25:25:22,920 we can pass in our images as much as we like. So if class names exist, let's set the title 15691 25:25:22,920 --> 25:25:29,960 to our showcase that class name. So the pred is going to be class names. Let's index on that 15692 25:25:29,960 --> 25:25:36,600 pred image, or target image pred label. And this is where we'll have to put it to the CPU, 15693 25:25:36,600 --> 25:25:42,840 because if we're using a title with map plot lib, map plot lib cannot handle things that are on 15694 25:25:42,840 --> 25:25:48,920 the GPU. This is why we have to put it to the CPU. And then I believe that should be enough for 15695 25:25:48,920 --> 25:25:55,640 that. Let's add a little line in here, so that we can have it. Oh, I've missed something. 15696 25:25:56,520 --> 25:26:03,240 An outside bracket there. Wonderful. Let's add the prediction probability, because that's always 15697 25:26:03,240 --> 25:26:09,960 fun to see. So we want target image pred probs. And we want to get the maximum pred problem from 15698 25:26:09,960 --> 25:26:16,280 that. And we'll also put that on the CPU. And I think we might get this three decimal places. 15699 25:26:16,280 --> 25:26:24,600 Now this is saying, oh, pred labels, we don't need that. We need just non plural, beautiful. Now, 15700 25:26:24,600 --> 25:26:32,760 if the class names doesn't exist, let's just set the title equal to f f string, we'll go pred, 15701 25:26:34,040 --> 25:26:39,800 target image pred label. Is Google Colab still telling me this is wrong? 15702 25:26:39,800 --> 25:26:45,480 Target image pred label. Oh, no, we've still got the same thing. It just hasn't caught up with me, 15703 25:26:45,480 --> 25:26:51,160 and I'm coding a bit fast here. And then we'll pass in the prob, which will be just the same as 15704 25:26:51,160 --> 25:27:03,880 above. I could even copy this in. Beautiful. And let's now set the title to the title. And we 15705 25:27:03,880 --> 25:27:11,080 and we will turn the axes off. PLT axes false. Fair bit of code there. But this is going to be a 15706 25:27:11,080 --> 25:27:16,840 super exciting moment. Let's see what this looks like. When we pass it in a target image and a 15707 25:27:16,840 --> 25:27:23,000 target model, some class names, and a transform. Are you ready? We've got our transform ready, 15708 25:27:23,000 --> 25:27:29,160 by the way, it's back up here. Custom image transform. It's just going to resize our image. 15709 25:27:29,160 --> 25:27:36,040 So let's see. Oh, this file was updated remotely or in another tab. 
But this is going to be a super exciting moment. Let's see what this looks like when we pass in a target image, a target model, some class names, and a transform. Are you ready? We've got our transform ready, by the way; it's back up here, custom_image_transform. It's just going to resize our image. So let's see. Oh, this file was updated remotely or in another tab. Sometimes this happens, and usually Google Colab sorts itself out, but that's all right; it doesn't affect our code for now. Pred on our custom image. Are you ready? Save failed, would you like to overwrite? Yes, I would. So you might see that in Google Colab; usually it fixes itself. There we go, saved successfully. pred_and_plot_image. I was going to say, Google Colab, don't fail me now. We're about to predict on our own custom data, using a model trained on our own custom data. image_path: let's pass in custom_image_path, which is going to be the path to our pizza dad image. Let's go class_names equals class_names, which is pizza, steak and sushi. We'll pass in our transform to convert our image to the right shape and size, custom_image_transform. And then finally, the target device is going to be device. Are you ready? Let's make a prediction on custom data, one of my favorite things, one of the most fun things to do when building deep learning models. Three, two, one. How did it go? Oh no, what did we get wrong? CPU. Okay, so close, and yet so far. Has no attribute 'cpu'. Oh, maybe we need to put this on the CPU; that's where I got the square bracket wrong. So that's what we needed to change, because this is going to be potentially on the GPU: target_image_pred_label. We need to put it on the CPU. Why? Because this is going to be the title of our matplotlib plot, and matplotlib doesn't interface too well with data on a GPU. Let's try it again. Three, two, one, running. Oh, look at that. A prediction on a custom image, and it gets it right. Two thumbs up. I didn't plan this; our model is actually performing quite poorly, so this is as good as a guess to me. You might want to try this on your own image, and in fact, if you do, please share it with me, I would love to see it. But you could potentially try this with another model. See what happens? Steak. Okay, there we go. So even though model one performs worse quantitatively, it performs better qualitatively. That's the power of visualize, visualize, visualize. And if we use model zero, which also isn't performing too well, it gets it wrong with a prediction probability of 0.368, which isn't too high either. So we've talked about a couple of different ways to improve our models, and now we've even got a way to make predictions on our own custom images. So give that a shot. I'd love to see your custom predictions; upload an image here if you want, or download it into Google Colab using code that we've used before.
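If you'd like to try it on your own image, the call looks something like the following. The names here follow the transcript (custom_image_path, class_names, custom_image_transform, device), with model_1 standing in for the trained "model one" mentioned above; swap in your own image path and model as needed.

# Predict on the custom image and plot the result
pred_and_plot_image(model=model_1,                       # trained model from earlier in the section
                    image_path=custom_image_path,        # path to the downloaded custom image
                    class_names=class_names,             # ["pizza", "steak", "sushi"]
                    transform=custom_image_transform,    # resizes the image to the size the model expects
                    device=device)                       # "cuda" if available, else "cpu"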
But we've come a fairly long way. I feel like we've covered enough for custom datasets. Let's summarize what we've covered in the next video. And I've got a bunch of exercises and extra curriculum for you. So this is exciting stuff. I'll see you in the next video. In the last video, we did the very exciting thing of making a prediction on our own custom image, although it's quite pixelated. And although our model's performance quantitatively didn't turn out to be too good, qualitatively it happened to work out. But of course, there are a fair few ways that we could improve our model's performance. The main takeaway here is that we had to do a bunch of preprocessing to make sure our custom image was in the same format as what our model expected. And this is quite a lot of what I do behind the scenes for Nutrify: if you upload an image here, it gets preprocessed in a similar way to go through our image classification model and output a label like this. So let's get out of this. To summarize, I've got a colorful slide here, but we've already covered this: predicting on custom data. These are three things to make sure of, regardless of whether you're using images, text or audio. Make sure your data is in the right data type; in our case, it was torch.float32. Make sure your data is on the same device as the model; so we had to put our custom image on the GPU, which was where our model also lived. And then we had to make sure our data was in the correct shape. So the original shape was 64, 64, 3 (actually, this should be reversed, because it was color channels first, but the same principle remains). We had to add a batch dimension and rearrange it if needed. So in our case, we used images of the shape [batch_size, color_channels, height, width]. But your shape will depend on your problem, where your data and model live will depend on the device you're using, and the data type will depend on what you're working with, torch.float32 or something else.
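A quick sketch of how you might sanity-check those three things on a custom image tensor before predicting. The name custom_image is an assumption for whatever tensor you've loaded, and torch and device are assumed to exist from earlier cells.

# Check the three common failure points: dtype, device and shape
print(f"Dtype:  {custom_image.dtype}")    # want torch.float32
print(f"Device: {custom_image.device}")   # want the same device the model is on
print(f"Shape:  {custom_image.shape}")    # want [batch_size, color_channels, height, width]

# Typical fixes, if any of the above are off
custom_image = custom_image.type(torch.float32) / 255.0  # right dtype (scale to [0, 1] if it was uint8 0-255)
custom_image = custom_image.to(device)                    # right device (same as the model)
custom_image = custom_image.unsqueeze(dim=0)              # right shape (add a batch dimension at dim 0)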
So let's summarize. If we go here, main takeaways, you can read through these, but some of the big ones are: PyTorch has many built-in functions to deal with all kinds of data, from vision to text to audio to recommendation systems. So if we look at the PyTorch docs, you're going to become very familiar with these over time. We've got torchaudio, torchdata, torchtext, and torchvision is what we practiced with. And we've got a whole bunch of things here for transforming and augmenting images, datasets, utilities, operators. And TorchData is currently in beta, but that is just something to be aware of later on. It's a prototype library right now, but by the time you watch this, it might be available. It's another way of loading data, so just be aware of it for later on. And if we come back up here: if PyTorch's built-in data loading functions don't suit your requirements, you can write your own custom Dataset classes by subclassing torch.utils.data.Dataset. And we saw that way back up here in option number two. Option two, here we go: loading image data with a custom Dataset. We wrote plenty of code to do that.
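As a reminder of what that subclassing approach looks like, here is a condensed sketch of a custom Dataset in the spirit of option two. The class name, glob pattern and helper structure are simplified placeholders here, not the notebook's exact code.

from pathlib import Path
from typing import Tuple

import torch
from PIL import Image
from torch.utils.data import Dataset

class ImageFolderCustom(Dataset):
    """Loads images from a directory structured as targ_dir/class_name/image.jpg."""
    def __init__(self, targ_dir: str, transform=None):
        self.paths = list(Path(targ_dir).glob("*/*.jpg"))  # e.g. pizza/12345.jpg
        self.transform = transform
        self.classes = sorted({p.parent.name for p in self.paths})
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}

    def __len__(self) -> int:
        # Dataset subclasses return the total number of samples
        return len(self.paths)

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        # Load one image and look up its class index (label) from the parent folder name
        img = Image.open(self.paths[index])
        class_idx = self.class_to_idx[self.paths[index].parent.name]
        if self.transform:
            return self.transform(img), class_idx  # the transform should turn the PIL image into a tensor
        return img, class_idx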
And just be aware that this is just 15791 25:35:53,160 --> 25:35:57,480 one way of doing things. It's not necessarily the best. It's just a way to reference what 15792 25:35:57,480 --> 25:36:03,400 you're writing to what I would do. And there's actually now live walkthroughs of the solutions, 15793 25:36:03,400 --> 25:36:08,680 errors and all on YouTube. So if you go to this video, which is going to mute. So this is me 15794 25:36:08,680 --> 25:36:13,960 live streaming the whole thing, writing a bunch of pytorch code. If you just keep going through all 15795 25:36:13,960 --> 25:36:19,800 of that, you'll see me writing all of the solutions, running into errors, trying different things, 15796 25:36:19,800 --> 25:36:25,160 et cetera, et cetera. But that's on YouTube. You can check that out on your own time. But I feel 15797 25:36:25,160 --> 25:36:32,360 like we've covered enough exercises. Oh, by the way, this is in the extras exercises tab 15798 25:36:32,360 --> 25:36:37,640 of the pytorch deep learning repo. So extras exercises and solutions that are contained in there. 15799 25:36:39,160 --> 25:36:45,960 Far out. We've covered a lot. Look at all that. So that has been pytorch custom data sets. 15800 25:36:46,600 --> 25:36:55,240 I will see you in the next section. Holy smokes. That was a lot of pytorch code. 15801 25:36:56,360 --> 25:37:01,640 But if you're still hungry for more, there is five more chapters available at learnpytorch.io, 15802 25:37:01,640 --> 25:37:06,840 which cover transfer learning, my favorite topic, pytorch model experiment tracking, 15803 25:37:06,840 --> 25:37:11,880 pytorch paper replicating, and pytorch model deployment. How do you get your model into the 15804 25:37:11,880 --> 25:37:17,400 hands of others? And if you'd like to learn in this video style, the videos for those chapters 15805 25:37:17,400 --> 25:37:32,520 are available at zero to mastery.io. But otherwise, happy machine learning. And I'll see you next time. 1984018
