1
00:00:05,170 --> 00:00:09,160
Welcome everyone to part two of Keras syntax basics.
2
00:00:09,310 --> 00:00:14,740
In part two we're going to focus on creating the model, running our model, and then actually generating
3
00:00:14,740 --> 00:00:16,220
predictions from the model.
4
00:00:16,390 --> 00:00:21,820
In part three, after this, we'll focus on how to actually evaluate the model's performance as well as
5
00:00:21,820 --> 00:00:25,650
how to predict on brand new data sets and save and load our models.
6
00:00:25,660 --> 00:00:29,110
Let's head back to the notebook to continue where we left off last time.
7
00:00:29,110 --> 00:00:29,380
All right.
8
00:00:29,380 --> 00:00:31,900
Here I am at the notebook where we left off last time.
9
00:00:32,020 --> 00:00:35,670
So we've read in our data and then done just a little bit of analysis on it.
10
00:00:35,680 --> 00:00:40,790
We basically visualized that pair plot, and we've scaled our feature data.
11
00:00:40,900 --> 00:00:45,910
So now the next step is to actually create the model with the Keras syntax.
12
00:00:45,940 --> 00:00:52,660
And for this we need to do two imports. We'll say from tensorflow.keras, and this is the way the
13
00:00:52,660 --> 00:00:55,600
Keras API is packaged inside of TensorFlow.
14
00:00:55,630 --> 00:01:00,370
We just say from tensorflow.keras and then we can do any imports we want as if we had already
15
00:01:00,370 --> 00:01:06,650
installed the Keras library separately. The first one we're going to import is the Sequential model,
16
00:01:08,740 --> 00:01:15,640
and then the next thing we're going to do is say from tensorflow.keras.layers we'll import
17
00:01:15,870 --> 00:01:23,670
Dense. That's all we need to build a very simple model with Keras. Essentially what we do is we set
18
00:01:23,670 --> 00:01:27,340
up a base sequential model and then keep adding layers to it.
19
00:01:27,420 --> 00:01:30,790
In this case we'll just add a simple dense layer.
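As a quick sketch, assuming the standard TensorFlow 2 packaging just described, the two imports would look roughly like this:

    # Sequential model container and the Dense layer, imported via tensorflow.keras
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense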
20
00:01:30,810 --> 00:01:37,200
I would also highly recommend that you call help on either of these, such as help(Sequential); there's
21
00:01:37,200 --> 00:01:41,520
really nice documentation and various examples inside the documentation.
22
00:01:41,520 --> 00:01:46,050
So it actually shows you a lot of what we're going to do which is how to construct the model and then
23
00:01:46,050 --> 00:01:48,270
how to add layers to it et cetera.
24
00:01:48,270 --> 00:01:52,810
So there's lots of really cool examples here, not just for Sequential, but if you also check out Dense
25
00:01:53,670 --> 00:01:58,350
it'll give you a lot of information on the various parameters it takes. Then if you scroll down you'll
26
00:01:58,350 --> 00:02:00,390
eventually also see some examples.
27
00:02:00,600 --> 00:02:03,590
But let's go ahead and work through how we can actually do this.
28
00:02:03,600 --> 00:02:08,850
Now there's two ways of creating a Keras-based model.
29
00:02:08,890 --> 00:02:17,250
One way is to call Sequential and then pass in a list of the actual layers you want.
30
00:02:17,320 --> 00:02:23,320
So I'm going to pass in a Dense layer, and if we take a look at Dense, all this means, if we expand on
31
00:02:23,320 --> 00:02:30,400
this, is that it's a regular densely connected neural network layer. And all that means is, if something
32
00:02:30,400 --> 00:02:36,010
is densely connected it's going to be a normal feed forward network where every neuron is connected
33
00:02:36,310 --> 00:02:38,870
to every other neuron in the next layer.
34
00:02:39,010 --> 00:02:43,540
Later on we'll learn about more sophisticated layers but the dense layer is what we're working with
35
00:02:43,540 --> 00:02:46,970
right now for our basic artificial neural networks.
36
00:02:46,990 --> 00:02:50,560
You'll also notice that it has quite a few parameters inside of it.
37
00:02:50,800 --> 00:02:57,280
The two parameters that we need to be aware of right now are units and activation. Units is just
38
00:02:57,280 --> 00:02:58,680
another word for neurons.
39
00:02:58,720 --> 00:03:04,320
Essentially, how many neurons are actually going to be in this layer. And then activation takes in a string
40
00:03:04,330 --> 00:03:09,760
call for what activation function these neurons should be using. Should they be using a sigmoid activation,
41
00:03:09,950 --> 00:03:12,710
a rectified linear unit, et cetera.
42
00:03:12,760 --> 00:03:18,500
So for right now what we're gonna do is show you how we could build out a network.
43
00:03:18,550 --> 00:03:24,520
Let's imagine I wanted my first layer to be four neurons, densely connected, meaning every neuron is connected
44
00:03:24,520 --> 00:03:31,850
to every other neuron, and then we can set my activation, and I could say something like 'relu', which
45
00:03:31,850 --> 00:03:33,700
stands for rectified linear unit.
46
00:03:33,950 --> 00:03:38,090
And if you're confused about these activation functions, make sure you go back and check out the theory
47
00:03:38,090 --> 00:03:40,540
portions of this section of the course.
48
00:03:40,550 --> 00:03:44,720
Hopefully you can see the connection here between what we discussed in theory versus what we're actually
49
00:03:44,720 --> 00:03:46,370
implementing here in code.
50
00:03:46,370 --> 00:03:51,350
And if I wanted to add in another layer, I would simply keep passing these into my list. So maybe I want
51
00:03:51,350 --> 00:03:56,770
the next layer to have two neurons and another activation function of rectified linear unit.
52
00:03:56,810 --> 00:04:02,120
I could also pass in strings like 'sigmoid', etc., and you can check out the online documentation for the
53
00:04:02,120 --> 00:04:05,620
various string calls for different activation functions.
54
00:04:05,670 --> 00:04:10,190
Let's imagine I wanted one final output layer with one unit.
55
00:04:10,190 --> 00:04:14,410
I would just say Dense(1). Again, I could play around with the activation function there.
56
00:04:14,600 --> 00:04:17,090
But this is just one way we could build out the model.
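To sketch what that list-style construction might look like in the notebook (the layer sizes here just mirror the 4-2-1 example walked through above):

    # one Sequential call, passing the layers in as a list
    model = Sequential([
        Dense(units=4, activation='relu'),
        Dense(units=2, activation='relu'),
        Dense(units=1)
    ])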
57
00:04:17,210 --> 00:04:20,150
So that is option one: upon your call to Sequential,
58
00:04:20,150 --> 00:04:25,430
you actually pass in a list of those layers. The other way we can do this
59
00:04:25,820 --> 00:04:28,800
(and this is going to be our preferred method throughout the course;
60
00:04:28,910 --> 00:04:36,160
you'll see why in a second) is to create an empty Sequential model, and then off that model variable you
61
00:04:36,220 --> 00:04:44,270
add the layers in separately, one at a time. So you would say a Dense layer with an activation of rectified linear unit, and then
62
00:04:44,270 --> 00:04:50,770
I can just copy and paste this command as such and then kind of play around with the values here.
63
00:04:50,770 --> 00:04:57,580
Maybe I want this, and since this is my last layer I don't actually want an activation, etc. So this cell
64
00:04:57,760 --> 00:05:02,290
and this cell are actually going to produce the exact same model.
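A rough sketch of that second, add-one-at-a-time style, which should produce the same 4-2-1 model as the list version above:

    # start with an empty Sequential model, then add layers one by one
    model = Sequential()
    model.add(Dense(4, activation='relu'))
    model.add(Dense(2, activation='relu'))
    model.add(Dense(1))  # final layer: no activation specified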
65
00:05:02,350 --> 00:05:04,000
So what's the difference between them,
66
00:05:04,000 --> 00:05:05,640
as far as convenience?
67
00:05:05,680 --> 00:05:10,830
Well what's really convenient about doing this in separate lines like this instead of one giant list
68
00:05:10,840 --> 00:05:18,100
call is, if I ever want to quickly edit or turn off a layer, I can simply comment it out, as so, and
69
00:05:18,100 --> 00:05:23,410
then rerun the cell, and now I'll go straight from my four neurons to this last one-neuron output
70
00:05:23,410 --> 00:05:28,300
layer. That's a little harder to do here when we're dealing with a list; you would have to kind of delete
71
00:05:28,300 --> 00:05:32,440
this, or already have it on separate lines and be careful of the way we comment things.
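For example, turning off the middle layer is just a matter of commenting out one line and rerunning the cell (a sketch of the workflow described above):

    model = Sequential()
    model.add(Dense(4, activation='relu'))
    # model.add(Dense(2, activation='relu'))  # layer temporarily turned off
    model.add(Dense(1))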
72
00:05:32,560 --> 00:05:37,960
So that's why we're gonna focus on using this methodology for building out our models: we'll create an
73
00:05:37,960 --> 00:05:42,190
empty sequential model and then we'll add in the layers one by one.
74
00:05:42,190 --> 00:05:45,430
So let's go ahead and do that for our particular dataset.
75
00:05:45,640 --> 00:05:54,190
In this case what we're gonna do is we'll have three layers with four neurons each using a rectified
76
00:05:54,190 --> 00:05:59,510
linear unit and then our very last layer will be just one final output node.
77
00:05:59,740 --> 00:06:05,470
So the final output layer is actually pretty important and that's gonna be determined by your actual
78
00:06:06,070 --> 00:06:08,920
data and your actual situation of what you're trying to predict.
79
00:06:08,920 --> 00:06:15,610
Recall that with this particular dataset we're predicting a single numerical price value.
80
00:06:15,670 --> 00:06:21,290
So what I want is my very last layer to be a single neuron that produces some sort of price.
81
00:06:21,340 --> 00:06:25,710
So it's going to predict maybe four hundred and fifty dollars or six hundred dollars et cetera.
82
00:06:25,780 --> 00:06:32,980
That's why I'm choosing that final layer to just be Dense(1), where it's just going to try to predict
83
00:06:33,250 --> 00:06:34,150
the price.
84
00:06:34,150 --> 00:06:38,920
So that final output is then going to be measured against the true price.
85
00:06:38,920 --> 00:06:46,060
And we'll do that with some sort of loss function, and that's where this final line comes into play, which
86
00:06:46,060 --> 00:06:48,930
is compiling your model. And for compiling your model,
87
00:06:48,970 --> 00:06:50,610
if we take a look at Shift+Tab here,
88
00:06:50,770 --> 00:06:55,930
it again has lots of different parameters, and we're going to be exploring these later on in the
89
00:06:55,930 --> 00:06:56,710
course.
90
00:06:56,710 --> 00:07:01,960
But the main ones we want to look at right now are the optimizer and the loss function.
91
00:07:01,960 --> 00:07:06,640
The optimizer is essentially just asking you how you actually want to perform this gradient
92
00:07:06,640 --> 00:07:07,190
descent.
93
00:07:07,210 --> 00:07:12,490
Do you want to use RMSprop? Or, as we discussed, there's other methods of optimization, such as the Adam
94
00:07:12,490 --> 00:07:13,450
optimizer.
95
00:07:13,450 --> 00:07:18,370
So I could also say optimizer is equal to, and then in the string pass in 'adam', which is the string code
96
00:07:18,370 --> 00:07:23,710
for the Adam optimizer, and you can reference the documentation to see what optimizers are available for
97
00:07:23,710 --> 00:07:29,830
you. And then what's also really important here is this loss parameter, and for the loss parameter, that string
98
00:07:29,830 --> 00:07:34,900
code is going to change depending on what you're actually trying to accomplish here.
99
00:07:35,080 --> 00:07:41,260
And if you take a look at our Keras syntax basics notebook, if you scroll down, we actually have a little
100
00:07:41,260 --> 00:07:44,620
section here on choosing an optimizer and loss.
101
00:07:44,620 --> 00:07:50,800
So if you're performing a multi-class classification problem you can really choose a variety of optimizers,
102
00:07:50,800 --> 00:07:58,190
but the loss string parameter you should be calling is 'categorical_crossentropy'.
103
00:07:58,300 --> 00:08:03,670
If you're performing a binary classification problem you can again choose various optimizers, but the
104
00:08:03,670 --> 00:08:11,200
loss here will be 'binary_crossentropy'. And in our case we are performing a regression problem because
105
00:08:11,260 --> 00:08:13,750
our label is a continuous value.
106
00:08:13,810 --> 00:08:21,460
So in our case we will use mean squared error as our loss function. Mean squared error makes sense
107
00:08:21,580 --> 00:08:26,200
because we'll essentially be taking the mean squared error of our predictions against true values and
108
00:08:26,200 --> 00:08:29,770
be trying to minimize that through our optimizer call.
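As a rough summary of that notebook section, the compile call changes with the task roughly like this (the optimizer choices here are just illustrative; the loss strings are the ones named above):

    # multi-class classification
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    # binary classification
    model.compile(optimizer='adam', loss='binary_crossentropy')
    # regression (our case)
    model.compile(optimizer='rmsprop', loss='mse')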
109
00:08:29,770 --> 00:08:35,680
So let's come back to our notebook here and then follow along with that. We'll say our optimizer is equal
110
00:08:35,680 --> 00:08:44,310
to 'rmsprop', and more importantly our loss, since we're performing a regression-based task, is 'mse',
111
00:08:44,670 --> 00:08:51,120
which is our mean squared error. And once we run that we have a full model ready to go, and I'll go ahead
112
00:08:51,180 --> 00:08:55,670
and delete this cell so that we just see that model being created.
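Putting it all together, the model-creation cell for this dataset would look something like this sketch (three hidden layers of four neurons each, one output neuron, compiled with RMSprop and MSE as described):

    model = Sequential()
    model.add(Dense(4, activation='relu'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(4, activation='relu'))
    model.add(Dense(1))  # single output neuron predicting price
    model.compile(optimizer='rmsprop', loss='mse')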
113
00:08:55,680 --> 00:09:00,900
So we have our model: Sequential. We add in whatever layers we want, with as many neurons and activation
114
00:09:01,440 --> 00:09:02,790
functions as we want.
115
00:09:02,790 --> 00:09:08,790
We take care to make sure our last output layer matches the actual task we're trying to solve as well
116
00:09:08,820 --> 00:09:14,790
as, when we're compiling it, making sure our loss function, or loss parameter call, matches what we're actually
117
00:09:14,790 --> 00:09:17,450
trying to solve. Once that is done,
118
00:09:17,700 --> 00:09:23,690
we're ready to train the model, or fit the model to the training data, and we can do this by saying
119
00:09:23,690 --> 00:09:25,080
model.fit().
120
00:09:25,080 --> 00:09:30,720
And then again, if you do Shift+Tab here, you'll notice a wide variety of parameters, some of which we're
121
00:09:30,720 --> 00:09:33,330
gonna go over in a later lecture.
122
00:09:33,420 --> 00:09:39,870
But the main ones I want to concern you with right now are x, y, and then epochs.
123
00:09:39,870 --> 00:09:48,510
So x is simply what are the features that we're training on, in this case X_train, and then y is what
124
00:09:48,510 --> 00:09:53,850
are the actual training labels that correspond to those training feature points, which in our case is
125
00:09:53,850 --> 00:09:56,830
y_train. And then epochs.
126
00:09:56,830 --> 00:09:58,000
What does that stand for?
127
00:09:58,030 --> 00:10:05,130
One epoch means you've gone through the entire dataset one time. And again, as a quick note,
128
00:10:05,140 --> 00:10:11,050
if you take a look at our provided notebook, we actually provide essentially a rundown of what batches
129
00:10:11,080 --> 00:10:12,730
and epochs mean.
130
00:10:12,730 --> 00:10:20,360
So an epoch is essentially an arbitrary cutoff that's defined as one pass over the entire dataset.
131
00:10:20,380 --> 00:10:26,650
So if I've gone through the entirety of X_train one time, that is one epoch.
132
00:10:26,800 --> 00:10:32,490
So I'll go ahead and say that my model is going to go through the training set
133
00:10:32,590 --> 00:10:33,860
two hundred and fifty times.
134
00:10:33,880 --> 00:10:36,070
So two hundred fifty epochs.
135
00:10:36,070 --> 00:10:42,250
Later on we'll discuss how to actually choose that number correctly, and how we can actually use callbacks
136
00:10:42,250 --> 00:10:48,880
in Keras to add in early stopping, so that we can just choose some arbitrarily large epoch number and
137
00:10:48,880 --> 00:10:54,910
then our model itself will be smart enough to stop at a particular point that is optimized based
138
00:10:54,910 --> 00:10:56,420
off some validation loss.
139
00:10:56,440 --> 00:11:01,810
So keep in mind this is kind of the simplest way we could fit, but later on we'll become more sophisticated
140
00:11:01,840 --> 00:11:04,660
with our capabilities in actually fitting our model.
141
00:11:04,660 --> 00:11:09,750
So we'll be dealing with things like batch sizes, callbacks, validation data, etc.
142
00:11:10,060 --> 00:11:16,240
Right now we'll keep it very simple, and we'll simply fit on X_train corresponding to y_train for two
143
00:11:16,240 --> 00:11:17,560
hundred fifty epochs.
144
00:11:17,560 --> 00:11:23,710
Last thing I want to note here is that there is a verbose argument, so verbose=1, which essentially
145
00:11:23,710 --> 00:11:27,100
indicates the printed output during training.
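So the fit call we're building up to looks roughly like this, assuming the X_train and y_train arrays prepared in part one:

    # train for 250 passes over the training data, printing progress (verbose=1)
    model.fit(x=X_train, y=y_train, epochs=250, verbose=1)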
146
00:11:27,130 --> 00:11:32,260
So if I run this cell you'll notice that as it continues training over these epochs, it prints out these
147
00:11:32,260 --> 00:11:36,570
little reports for me. The higher the number on the verbose parameter,
148
00:11:36,700 --> 00:11:38,670
the more information is displayed.
149
00:11:38,800 --> 00:11:43,400
And if you set verbose equal to zero it means it won't actually output anything.
150
00:11:43,540 --> 00:11:47,360
I would recommend, however, that you set verbose equal to some number that is not zero,
151
00:11:47,410 --> 00:11:50,580
so you can get an idea of where you are in your actual training stage.
152
00:11:50,620 --> 00:11:55,390
Otherwise you just see a cell running and you won't know if it's on the tenth epoch or the two hundred
153
00:11:55,390 --> 00:11:56,400
and third epoch.
154
00:11:56,480 --> 00:11:59,270
Now this is just a very simple and small dataset.
155
00:11:59,290 --> 00:12:03,880
You'll notice that we essentially finished our training, and you should also notice that our mean squared
156
00:12:03,910 --> 00:12:08,500
error is very large in the beginning because it essentially starts off with just the random weights
157
00:12:08,500 --> 00:12:09,300
and biases.
158
00:12:09,520 --> 00:12:12,490
But as it begins to adjust these weights and biases,
159
00:12:12,490 --> 00:12:17,980
if we scroll down to later epochs, you'll notice the loss is slowly decreasing, and it should decrease
160
00:12:17,980 --> 00:12:23,650
very quickly at first and then kind of slowly as it goes further and further until towards the end it
161
00:12:23,650 --> 00:12:30,760
should begin settling toward some sort of mean squared error value. We can go ahead and see this by plotting
162
00:12:30,760 --> 00:12:31,590
it out.
163
00:12:31,630 --> 00:12:40,040
So I'm going to come down here to this new cell and show you how we can actually take a look at this
164
00:12:40,040 --> 00:12:45,840
training history by saying model.history.history, and that returns back
165
00:12:45,840 --> 00:12:55,130
this dictionary of the corresponding historical losses, which means I can pass this to a DataFrame and
166
00:12:55,130 --> 00:13:03,350
say pd.DataFrame(model.history.history), run that, and get this nice DataFrame. So I'll set that as my loss
167
00:13:03,350 --> 00:13:10,850
DataFrame, and then I can actually plot this out by simply saying loss_df.plot(). Run that and
168
00:13:11,390 --> 00:13:13,250
I see something that looks like this.
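Those two cells, sketched out (this assumes pandas was imported as pd earlier in the notebook, and that matplotlib is available for the pandas .plot() call):

    import pandas as pd

    # model.history.history is a dict of per-epoch metrics (here just 'loss')
    loss_df = pd.DataFrame(model.history.history)
    loss_df.plot()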
169
00:13:13,250 --> 00:13:16,210
So this is very typical of neural network training.
170
00:13:16,220 --> 00:13:21,630
You start with a very high loss during your first couple of epoch runs and then as the weights and biases
171
00:13:21,650 --> 00:13:27,590
start to get adjusted you hopefully see kind of a steady but steep decline in your loss or your error
172
00:13:27,980 --> 00:13:33,350
and eventually it will level off, where you're not really seeing any sort of improvement in your performance
173
00:13:33,740 --> 00:13:39,290
as you train more and more and later on we'll be able to compare this to our validation loss to actually
174
00:13:39,290 --> 00:13:41,440
check for things like overfitting.
175
00:13:41,490 --> 00:13:46,720
OK so we just went over how to create a model as well as how to fit a model.
176
00:13:46,790 --> 00:13:51,860
Coming up next we're actually going to dive deeper into the evaluation of this model against the test
177
00:13:51,860 --> 00:13:56,850
set as well as how to evaluate against a brand new data point.
178
00:13:56,900 --> 00:13:58,640
So thanks, and I'll see you in part three.