1
00:00:00,550 --> 00:00:04,780
Hello and welcome back to the course on machine learning. In today's tutorial we will show you how to
2
00:00:04,780 --> 00:00:06,990
download the data sets for this course.
3
00:00:07,060 --> 00:00:14,380
Now, as you can probably tell, the course is large: it has got over 200 tutorials and over 35 hours of content.
4
00:00:14,620 --> 00:00:20,230
And as you can imagine, with that much training we will have to have lots and lots of data sets, and that's
5
00:00:20,230 --> 00:00:25,750
why we have decided to place all of the data sets on their own separate page.
6
00:00:25,780 --> 00:00:32,520
So in order to get the data sets you will need to go to www.superdatascience.com slash machine
7
00:00:32,530 --> 00:00:33,950
hyphen learning.
8
00:00:33,970 --> 00:00:36,300
So that's two words, machine hyphen learning.
9
00:00:36,370 --> 00:00:38,800
The website is superdatascience.com.
10
00:00:38,800 --> 00:00:46,780
And here you'll see a whole page dedicated to this course with lots and lots of data sets available
11
00:00:46,780 --> 00:00:53,800
which you can download and install onto your machine in order to follow along with the tutorials.
12
00:00:53,800 --> 00:00:59,230
So today we're going to start off with the very first one as an example and then throughout the course
13
00:00:59,230 --> 00:01:02,160
you'll be able to download the right data set for every single session.
14
00:01:02,260 --> 00:01:04,310
Depending on the session that you're doing.
15
00:01:04,600 --> 00:01:09,380
So today we're going to start off with the data pre-processing data set.
16
00:01:09,400 --> 00:01:16,090
And here we are also going to get right away the Machine Learning A-Z template folder. It is a special
17
00:01:16,090 --> 00:01:23,380
template folder we've created for you to help you store these data sets in a hierarchical fashion so
18
00:01:23,380 --> 00:01:29,340
that you can navigate all of these datasets better and that's so that they're all in the right place
19
00:01:29,340 --> 00:01:33,620
so you have a very orderly folder structure on your own machine.
20
00:01:33,640 --> 00:01:36,560
So go ahead and download these two zip files above.
21
00:01:36,610 --> 00:01:41,500
So if you just click there and click there you'll see that these are being downloaded.
22
00:01:41,500 --> 00:01:44,350
The first one is very small; it's just empty folders.
23
00:01:44,380 --> 00:01:48,980
And then the second one is the data set for the section and then whenever you get to a new section we'll
24
00:01:49,000 --> 00:01:50,760
of course remind you at the start of the section.
25
00:01:50,770 --> 00:01:56,680
But basically what you'll need to do is just download the right data set for your section that you're
26
00:01:56,680 --> 00:01:58,900
in but more on that later.
27
00:01:58,900 --> 00:02:00,880
Let's move on to today's section.
28
00:02:01,030 --> 00:02:02,970
So we've got these two zip files here.
29
00:02:03,040 --> 00:02:08,500
All you have to do is take the Machine Learning A-Z template folder and unzip it to the location
30
00:02:08,500 --> 00:02:09,750
where you want it to be.
31
00:02:09,760 --> 00:02:15,550
So I'll unzip mine to the desktop. I'm going to right-click and I'm going to extract; just click
32
00:02:15,560 --> 00:02:16,520
Extract Here.
33
00:02:16,630 --> 00:02:18,570
So that's on Windows. On Mac it's a
34
00:02:18,580 --> 00:02:23,640
similar thing: just open the zip file to unzip the folder.
35
00:02:23,800 --> 00:02:25,150
So there we go there's a folder.
36
00:02:25,150 --> 00:02:30,970
And now if you look inside here you'll see that you've got a very nice neat structure.
37
00:02:31,120 --> 00:02:37,570
You can go inside any one of these sections for instance clustering you'll see the title of the section
38
00:02:37,600 --> 00:02:39,640
and then you can go into any one of these.
39
00:02:39,970 --> 00:02:44,290
And again these are empty for now and that's because we haven't done those sections as you go through
40
00:02:44,290 --> 00:02:49,450
the course you will populate these folders with their respective data sets.
41
00:02:49,450 --> 00:02:52,680
Now we're going to go into data pre-processing so part 1.
42
00:02:52,900 --> 00:02:58,390
And here we've got this whole empty folder, so don't worry about the folders with the dashes; those
43
00:02:58,390 --> 00:03:02,510
are just titles to remind us where we're located inside the folder structure.
44
00:03:02,530 --> 00:03:10,000
So just go ahead and take your data pre-processing zip folder drag it here and right click and just
45
00:03:10,000 --> 00:03:12,410
say Extract Here.
46
00:03:14,230 --> 00:03:19,320
And again just go inside data pre-processing because we don't want it to be in its own separate folder.
47
00:03:19,390 --> 00:03:21,320
Just take all these files.
48
00:03:21,550 --> 00:03:22,960
Copy them with a right-click.
49
00:03:22,960 --> 00:03:26,290
Actually cut them and then paste them here.
50
00:03:26,350 --> 00:03:29,600
So basically you don't need this folder now because it's empty.
51
00:03:29,650 --> 00:03:31,060
Delete that folder.
52
00:03:31,090 --> 00:03:36,410
You can delete the zip file because we don't need it anymore and you can delete the other zip file as well.
53
00:03:36,700 --> 00:03:37,300
So there we go.
54
00:03:37,300 --> 00:03:40,650
Now we have our Machine Learning A-Z template folder.
55
00:03:40,720 --> 00:03:46,970
You can rename it, so let's do that: we can remove template folder from there and just say Machine Learning
56
00:03:47,050 --> 00:03:48,030
A-Z.
57
00:03:48,730 --> 00:03:55,360
And if you go in here you'll see data pre-processing and there you go see you've got your data set ready
58
00:03:55,360 --> 00:04:01,600
for the session, plus you already have all of the templates which you will be creating with Hadelin
59
00:04:01,600 --> 00:04:04,520
throughout the tutorials in this section.
60
00:04:04,570 --> 00:04:07,870
So that's pretty much what you need to do for every single section that you go through.
61
00:04:07,900 --> 00:04:12,250
And again I will remind you. And the reason for the structure: so why did we structure it like
62
00:04:12,250 --> 00:04:12,850
this.
63
00:04:12,850 --> 00:04:19,300
Why did we for instance not include all of the data sets right away inside these folders for you right.
64
00:04:19,300 --> 00:04:23,710
That feels like it would be more convenient, but at the same time there are a couple of reasons,
65
00:04:23,720 --> 00:04:30,730
so the first one is that in case we need to update something in case we need to update a certain section
66
00:04:30,730 --> 00:04:33,780
like this section for instance we need to update the dataset.
67
00:04:33,820 --> 00:04:39,880
Well in that case if we had to update it and then upload the whole folder that would take time.
68
00:04:39,880 --> 00:04:46,450
That would mean the course would not be available for longer, or that while we're updating, a lot
69
00:04:46,450 --> 00:04:50,350
of people will be getting the wrong data set and we don't want that, so if we want to update something
70
00:04:50,350 --> 00:04:56,140
now we can very quickly just update that one zip file on the Web site and that's very very quick for
71
00:04:56,140 --> 00:04:56,730
us to do.
72
00:04:56,980 --> 00:05:02,170
And the second reason is of course size: if we had put all of the datasets in here right away, this folder
73
00:05:02,170 --> 00:05:03,190
would be massive.
74
00:05:03,190 --> 00:05:09,550
So it's much more efficient to download just the section that you're doing and then proceed with those
75
00:05:10,030 --> 00:05:10,960
tutorials.
76
00:05:10,960 --> 00:05:11,550
So there we go.
77
00:05:11,560 --> 00:05:12,790
That's how you get the data set.
78
00:05:12,800 --> 00:05:17,760
And now I'll hand you over to Hadelin, who'll take you through today's data set.
79
00:05:17,770 --> 00:05:24,730
So what is this data set about? This data set contains four columns, Country, Age, Salary and Purchased, and
80
00:05:24,820 --> 00:05:31,780
10 observations, and basically this contains information about customers of some company.
81
00:05:32,080 --> 00:05:37,780
And the first three columns are information about these customers, like the country, the age and the salary,
82
00:05:38,230 --> 00:05:41,070
and the fourth column, Purchased, here tells
83
00:05:41,100 --> 00:05:45,430
whether or not the customer bought the product of the company.
84
00:05:45,430 --> 00:05:51,310
So we have to distinguish something very important here that we will distinguish for the rest of the
85
00:05:51,310 --> 00:05:52,090
course.
86
00:05:52,150 --> 00:05:57,220
It's the difference between the independent variables and the dependent variables.
87
00:05:57,490 --> 00:06:00,510
So the independent variables are the first three columns.
88
00:06:00,510 --> 00:06:05,340
Country age and salary and the dependent variable is purchased here.
89
00:06:05,350 --> 00:06:12,490
The fourth column. And in any machine learning model we are going to use some independent variables to
90
00:06:12,490 --> 00:06:14,390
predict a dependent variable.
91
00:06:14,620 --> 00:06:20,050
So that means here that with these first three columns, the three independent variables, we are going to
92
00:06:20,050 --> 00:06:22,010
predict if yes or no.
93
00:06:22,060 --> 00:06:24,670
The customer purchased a product.
94
00:06:25,060 --> 00:06:25,400
Okay.
95
00:06:25,400 --> 00:06:30,970
So that's the first distinction that we really need to understand and it's very important to do this
96
00:06:30,970 --> 00:06:36,040
section because the data pre-processing steps that we're going into in this section we will have to
97
00:06:36,040 --> 00:06:39,810
do it for all the machine learning models we are going to make.
98
00:06:39,820 --> 00:06:43,010
So it's really essential to know how to manage this.
99
00:06:43,090 --> 00:06:47,050
But don't worry it's going to be very simple and besides I'm going to give you at the end of this section
100
00:06:47,050 --> 00:06:52,930
a template that will allow us later to preprocess the data in a flash for all the machine learning
101
00:06:52,930 --> 00:06:54,600
models we're going to make.
102
00:06:54,610 --> 00:06:56,740
So I look forward to starting the steps with you.
103
00:06:56,750 --> 00:06:58,630
And until then enjoy machine learning.
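
The distinction between independent and dependent variables described above can be sketched in a few lines of Python. This is only an illustration, not the course's own template: the function name split_features_target and the three sample rows are hypothetical placeholders (the real data set has 10 observations), and only the column names Country, Age, Salary and Purchased come from the tutorial.

```python
import csv
import io

# Hypothetical placeholder rows, for illustration only.
# Only the column names are taken from the tutorial.
SAMPLE_CSV = """Country,Age,Salary,Purchased
A,30,50000,No
B,40,60000,Yes
C,35,55000,No
"""


def split_features_target(csv_text, target="Purchased"):
    """Return (X, y): X holds the independent variables, y the dependent one."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Every column except the target is treated as an independent variable.
    X = [[value for name, value in row.items() if name != target] for row in rows]
    # The target column is the dependent variable we want to predict.
    y = [row[target] for row in rows]
    return X, y


X, y = split_features_target(SAMPLE_CSV)
print(X)  # one list of [Country, Age, Salary] per customer
print(y)  # one Purchased value per customer
```

Values are kept as strings here; converting Age and Salary to numbers would be a later pre-processing step.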