1
00:00:00,650 --> 00:00:03,440
Step five: modeling, part two.
2
00:00:03,560 --> 00:00:07,330
Choosing. As mentioned in the last lesson,
3
00:00:07,340 --> 00:00:09,100
there were three parts to modeling:
4
00:00:09,170 --> 00:00:13,310
choosing a model, tuning a model, and comparing models.
5
00:00:13,310 --> 00:00:20,990
Once you've got your data split into training, validation and test sets, you can start to go through each
6
00:00:20,990 --> 00:00:22,390
of these steps.
7
00:00:22,490 --> 00:00:28,700
In this lesson, we're going to cover some points on choosing a model, which means you choose a model and
8
00:00:28,700 --> 00:00:30,280
train it on your training data.
9
00:00:31,100 --> 00:00:36,620
You don't have to create your own algorithms from scratch: there are many pre-built machine learning models which
10
00:00:36,710 --> 00:00:40,260
you can take advantage of. When you first begin,
11
00:00:40,400 --> 00:00:46,430
your main goal will be knowing what kind of machine learning algorithm to use with what kind of problem.
12
00:00:46,910 --> 00:00:52,560
This is because some algorithms work better than others on different types of data.
13
00:00:52,610 --> 00:00:56,630
We'll have a look at this specifically when we get hands-on with our projects.
14
00:00:56,780 --> 00:01:03,920
But for now, a tidbit to remember is: if you're working with structured data, decision trees such as random
15
00:01:03,920 --> 00:01:11,060
forest and gradient boosting algorithms like CatBoost and XGBoost tend to work best, and if you're
16
00:01:11,060 --> 00:01:18,790
working with unstructured data, deep learning, neural networks and transfer learning tend to work best.
17
00:01:18,860 --> 00:01:22,140
Once you've chosen a model, your next step is to train it.
18
00:01:22,520 --> 00:01:27,680
The main goal here will be to line up the inputs and outputs.
19
00:01:27,770 --> 00:01:34,520
For example, in our heart disease problem, we want our model to look at the feature variables, the inputs,
20
00:01:35,090 --> 00:01:40,500
and then find the patterns and use them to predict the target variable.
21
00:01:40,500 --> 00:01:47,060
So remember from a previous lesson, these variables here are the feature variables. We want to use these
22
00:01:47,060 --> 00:01:49,580
to predict the target variables.
23
00:01:49,580 --> 00:01:58,600
Another common naming convention is to use X, which is the data, to predict y, which is the labels. Different
24
00:01:58,600 --> 00:02:01,450
machine learning algorithms have different ways of doing this.
25
00:02:01,900 --> 00:02:06,340
We'll see how to do it for a handful of useful ones in future projects.
26
00:02:06,580 --> 00:02:11,020
And remember, training a model takes place on the training data split.
27
00:02:11,050 --> 00:02:14,490
This is where your model learns the course material.
28
00:02:14,560 --> 00:02:19,780
We don't want to let our models cheat and see the final exam before they do their study.
29
00:02:19,900 --> 00:02:25,790
Depending on how much data you have and how complex your model is, training may take a while.
30
00:02:25,870 --> 00:02:32,710
One of your biggest goals when training a model is to minimize the time between experiments.
31
00:02:32,770 --> 00:02:38,420
So sometimes this will mean using a small portion of your data first.
32
00:02:38,590 --> 00:02:45,340
For example, if your training dataset had 100,000 examples, you might start training a model with only
33
00:02:45,340 --> 00:02:48,770
the first 10,000 and see how it goes.
34
00:02:48,790 --> 00:02:55,240
You might also decide to use a less complicated model to begin with. Deep models such as neural networks generally
35
00:02:55,360 --> 00:02:58,350
take longer to train than other kinds of models.
36
00:02:58,350 --> 00:03:02,590
Now this is something worth considering when it comes to training your own models.
37
00:03:02,590 --> 00:03:09,730
For example, if an experiment takes you three hours, or even up to a couple of days, for a small percentage
38
00:03:09,730 --> 00:03:15,050
boost in the performance of your model, you might consider: is this experiment actually worth it?
39
00:03:15,280 --> 00:03:18,160
Because machine learning is highly iterative,
40
00:03:18,160 --> 00:03:25,090
we want to minimize this experimentation time so that we can go from step 1 to step 2 to step 3.
41
00:03:25,090 --> 00:03:32,930
But again, if this looks confusing, we'll see it in practice in the hands-on projects. Some things to remember:
42
00:03:33,100 --> 00:03:36,280
Some models work better than others on different problems.
43
00:03:36,280 --> 00:03:38,730
Don't be afraid to try things, and this is really important.
44
00:03:38,730 --> 00:03:41,910
Machine learning is, as we said before, a highly iterative process.
45
00:03:41,920 --> 00:03:48,760
So some things that you try out may not work the first time, and that means you just drop that thing
46
00:03:48,760 --> 00:03:51,240
going forward and try something else.
47
00:03:51,280 --> 00:03:54,910
Start small and build up, adding complexity as you need.
48
00:03:54,910 --> 00:03:59,620
What this means is that, for example, if you had one hundred thousand examples, we're going to start on
49
00:03:59,620 --> 00:04:04,870
ten thousand. We're going to use a simple model to begin with, rather than going to the biggest
50
00:04:04,870 --> 00:04:10,690
and latest and greatest model, because what we're after is practical results, not something that's
51
00:04:10,700 --> 00:04:12,190
the best on paper.
52
00:04:12,190 --> 00:04:15,180
We want to see something that can actually be used in the real world.
53
00:04:16,030 --> 00:04:21,850
And part of your experiments will involve tuning a model which shows good initial results to get better
54
00:04:21,910 --> 00:04:22,660
results.
55
00:04:22,660 --> 00:04:27,730
Like tuning a car: if your car does well on one track, it might not do well on another track.
56
00:04:27,940 --> 00:04:29,950
So you tune it up.
57
00:04:29,950 --> 00:04:34,510
Let's have a look at how to do that, but instead of for cars, we'll do it for machine learning models.