14. TF Syntax Basics - Part Two - Creating and Training the Model

Welcome, everyone, to part two of Keras syntax basics. In part two we're going to focus on creating the model, running our model, and then actually generating predictions from the model. In part three, after this, we'll focus on how to evaluate the model's performance, as well as how to predict on brand-new data sets and how to save and load our models. Let's head back to the notebook to continue where we left off last time.

All right, here I am at the notebook where we left off last time. We've read in our data, done just a little bit of analysis on it (basically visualized that pair plot), and we've scaled our feature data. So now the next step is to actually create the model with the Keras syntax. For this we need two imports. We'll say from tensorflow.keras, and this is the way the Keras API is packaged inside of TensorFlow: we just say from tensorflow.keras and then we can do any imports we want, as if we had installed the Keras library separately. The first thing we're going to import is the Sequential model, and then the next thing we're going to do is say from tensorflow.keras.layers import Dense. That's all we need to build a very simple model with Keras; essentially what we do is set up a base Sequential model and then keep adding layers to it. In this case we'll just add a simple Dense layer. I would also highly recommend that you call help on either of these, such as help(Sequential): there's really nice documentation and various examples inside it, and it actually shows you a lot of what we're going to do, which is how to construct the model and then how to add layers to it, et cetera. So there are lots of really useful examples here, not just for Sequential; if you also check out Dense, it'll give you a lot of information on the various parameters it takes, and if you scroll down you'll eventually see some examples as well.
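To make the import step concrete, here is a minimal sketch, assuming TensorFlow 2.x where the Keras API ships inside the tensorflow package (exact module paths can vary slightly between TensorFlow versions):

    from tensorflow.keras import Sequential       # the base model we keep adding layers to
    from tensorflow.keras.layers import Dense     # a standard fully connected (dense) layer

    help(Sequential)   # prints the docstring, including construction examples
    help(Dense)        # parameter documentation, e.g. units and activation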
Let's work through how we can actually do this. There are two ways of creating a Keras-based model. One way is to call Sequential and pass in a list of the actual layers you want. So I'm going to pass in a Dense layer, and if we take a look at Dense, all this means, if we expand on it, is that it's a regular densely connected neural network layer. If something is densely connected, it's a normal feed-forward network where every neuron is connected to every other neuron in the next layer. Later on we'll learn about more sophisticated layers, but the Dense layer is what we're working with right now for our basic artificial neural networks. You'll also notice that it has quite a few parameters. The two parameters we need to be aware of right now are units and activation. Units is just another word for neurons: essentially, how many neurons are going to be in this layer. Activation takes a string naming the activation function these neurons should use: should they use a sigmoid activation, the rectified linear unit, et cetera.

So for right now, let's show how we could build out a network. Imagine I wanted my first layer to be four neurons, densely connected, meaning every neuron is connected to every other neuron, and then I can set my activation to something like 'relu', which stands for rectified linear unit. If you're confused about these activation functions, make sure you go back and check out the theory portions of this section of the course; hopefully you can see the connection between what we discussed in theory and what we're actually implementing here in code. If I wanted to add another layer, I would simply keep passing these into my list, so maybe I want the next layer to have two neurons and another rectified linear unit activation. I could also pass in strings like 'sigmoid', et cetera, and you can check out the online documentation for the various string names of the different activation functions. Let's imagine I wanted one final output layer with one unit; I would just say Dense(1), and again I could play around with the activation function there. But this is just one way we could build out the model: in your call to Sequential, you pass in a list of those layers.
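A sketch of that list-style construction, using the same layer sizes as the example just described (four neurons, then two, then a single output); treat it as illustrative rather than the notebook's exact cell:

    model = Sequential([
        Dense(units=4, activation='relu'),   # first hidden layer: 4 neurons, ReLU
        Dense(units=2, activation='relu'),   # second hidden layer: 2 neurons, ReLU
        Dense(units=1),                      # single output neuron, no activation
    ])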
The other way we can do this, and this is going to be our preferred method throughout the course (you'll see why in a second), is to create an empty Sequential model and then, off that model variable, add the layers in separately, one at a time. So you would say model.add with a Dense layer and a rectified linear unit activation, and then I can just copy and paste that command and play around with the values; maybe I want a different size here, and since this is my last layer I don't actually want an activation, et cetera. So this cell and the previous cell are going to produce the exact same model.

So what's the difference between them, as far as convenience? What's really convenient about doing this on separate lines, instead of one giant list call, is that if I ever want to quickly edit or turn off a layer, I can simply comment it out and rerun the cell, and now I'll go straight from my four neurons to that last one-neuron output layer. That's a little harder to do when we're dealing with a list: you would have to delete the entry, or already have it on separate lines and be careful about how you comment things out. So that's why we're going to focus on this methodology for building out our models: we'll create an empty Sequential model and then add in the layers one by one.

Let's go ahead and do that for our particular dataset. In this case we'll have three layers with four neurons each, using a rectified linear unit, and then our very last layer will be just one final output node. The final output layer is actually pretty important, and it's going to be determined by your actual data and what you're trying to predict. Recall that with this particular dataset we're predicting a single numerical price value, so what I want is for my very last layer to be a single neuron that produces some sort of price: it's going to predict maybe four hundred and fifty dollars, or six hundred dollars, et cetera. That's why I'm choosing that final layer to be just Dense(1), where it simply tries to predict the price.
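A sketch of the preferred add-one-layer-at-a-time style for this dataset, with the three hidden layers of four ReLU neurons and the single output neuron described above; note how easy it is to disable a layer by commenting out one line:

    model = Sequential()

    # three hidden layers, four neurons each, rectified linear unit activation
    model.add(Dense(4, activation='relu'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(4, activation='relu'))   # comment out a line like this to turn a layer off

    # final output layer: one neuron producing the predicted price
    model.add(Dense(1))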
That final output is then going to be measured against the true price, and we'll do that with some sort of loss function. That's where the final line comes into play, which is compiling your model. If we take a look at compile with Shift+Tab, it again has lots of different parameters, and we're going to explore these later on in the course, but the main ones to look at right now are the optimizer and the loss. The optimizer is essentially asking how you actually want to perform this gradient descent: do you want to use RMSprop, or, as we discussed, one of the other optimization methods such as the Adam optimizer? I could also say optimizer='adam', which is the string name for the Adam optimizer, and you can reference the documentation to see what optimizers are available to you.

What's also really important here is the loss parameter, and that string changes depending on what you're actually trying to accomplish. If you take a look at our Keras syntax basics notebook and scroll down, we actually have a little section on choosing an optimizer and loss. If you're performing a multi-class classification problem, you can choose from a variety of optimizers, but the loss string you should be passing is categorical cross-entropy. If you're performing a binary classification problem, you can again choose various optimizers, but the loss here will be binary cross-entropy. In our case we are performing a regression problem, because our label is a continuous value, so we will use mean squared error as our loss function. Mean squared error makes sense because we'll essentially be taking the mean squared error of our predictions against the true values and trying to minimize it through our optimizer.

So let's come back to our notebook and follow along with that: we'll say our optimizer is 'rmsprop', and, more importantly, our loss, since we're performing a regression-based task, is 'mse', which is mean squared error. Once we run that, we have a full model ready to go, and I'll go ahead and delete the earlier demonstration cell so that we just see this one model being created. So: we have our Sequential model, we add in whatever layers we want with as many neurons and activation functions as we want, we take care to make sure our last output layer matches the actual task we're trying to solve, and, when compiling, we make sure our loss parameter matches what we're actually trying to solve.
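A sketch of the compile step with the loss choices discussed above (the strings are standard Keras identifiers; the regression line is the one used here):

    # regression (our case): continuous label, mean squared error loss
    model.compile(optimizer='rmsprop', loss='mse')

    # for reference, the other cases mentioned in the notebook:
    #   multi-class classification -> loss='categorical_crossentropy'
    #   binary classification      -> loss='binary_crossentropy'
    # 'adam' is another common optimizer string for any of these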
Once that is done, we're ready to train the model, that is, fit the model to the training data, and we can do this by calling model.fit. Again, if you do Shift+Tab here you'll notice a wide variety of parameters, some of which we'll go over in a later lecture, but the main ones I want to concern you with right now are x, y, and epochs. x is simply the features we're training on, in this case X_train, and y is the set of training labels that correspond to those training feature points, which in our case is y_train. Then there's epochs. What does that stand for? One epoch means you've gone through the entire dataset one time. As a quick note, if you take a look at our provided notebook, we give a rundown of what batches and epochs mean. An epoch is essentially an arbitrary cutoff, defined as one pass over the entire dataset; so if I've gone through the entirety of X_train one time, that is one epoch.

I'll go ahead and say that my model is going to go through the training set two hundred and fifty times, so two hundred and fifty epochs. Later on we'll discuss how to actually choose that number correctly, and how we can use Keras callbacks to add early stopping, so that we can choose some arbitrarily large number of epochs and the model itself will be smart enough to stop at the right point based on some validation loss. So keep in mind this is the simplest way we could fit; later on we'll use more sophisticated capabilities for fitting our model, dealing with things like batch sizes, callbacks, validation data, et cetera. Right now we'll keep it very simple and just fit on X_train, corresponding to y_train, for two hundred and fifty epochs.

The last thing I want to note here is that there is a verbose parameter, verbose=1, which controls the printed output during training. If I run this cell, you'll notice that as it continues training over these epochs, it prints out these little reports for me. The higher the number passed to the verbose parameter, the more information is displayed, and if you set verbose equal to zero it won't output anything. I would recommend, however, that you set verbose to some number that is not zero, so you can get an idea of where you are in training; otherwise you'll just see a cell running and you won't know whether it's on the tenth epoch or the two-hundred-and-third epoch.
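A sketch of the fit call as described, where X_train and y_train are the scaled training features and labels prepared earlier in the notebook:

    model.fit(x=X_train, y=y_train, epochs=250, verbose=1)
    # verbose=1 prints a short report per epoch; verbose=0 trains silently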
Now, this is just a very simple and small dataset, so you'll notice that we essentially finished our training quickly. You should also notice that our mean squared error is very large in the beginning, because the model starts off with just random weights and biases. But as it begins to adjust those weights and biases, and we scroll down to later epochs, you'll notice the loss slowly decreasing: it should decrease very quickly at first and then more slowly as training goes on, until towards the end it begins settling around some mean squared error value.

We can go ahead and see this by plotting it out. I'm going to come down here to a new cell and show you how we can take a look at this training history by saying model.history.history, which returns a dictionary of the corresponding historical losses. That means I can pass it to a DataFrame, say pd.DataFrame(model.history.history), run that, and get a nice data frame. I'll set that as my loss data frame, and then I can plot it out by simply saying loss_df.plot(). Run that, and I see something that looks like this. This is very typical of neural network training: you start with a very high loss during your first couple of epochs, and then, as the weights and biases start to get adjusted, you hopefully see a steady but steep decline in your loss, or your error, and eventually it levels off, where you're not really seeing any improvement in performance as you train more and more. Later on we'll be able to compare this to our validation loss to check for things like overfitting.

OK, so we just went over how to create a model as well as how to fit a model. Coming up next, we're going to dive deeper into evaluating this model against the test set, as well as how to evaluate a brand-new data point. So thanks, and I'll see you in part three.
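As a recap, a minimal sketch of the loss-curve inspection described above, assuming pandas and matplotlib are available in the notebook:

    import pandas as pd
    import matplotlib.pyplot as plt

    loss_df = pd.DataFrame(model.history.history)   # per-epoch training metrics as a DataFrame
    loss_df.plot()                                   # loss drops steeply at first, then levels off
    plt.show()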
