All language subtitles for 010 Get the dataset-en

Hello and welcome back to the course on machine learning. In today's tutorial we will show you how to download the datasets for this course. Now, as you can probably tell, the course is large: it has got over 200 tutorials and over 35 hours of content. And as you can imagine, with that much training we will have to have lots and lots of datasets, and that's why we have decided to place all of the datasets on their own separate page.

So in order to get the datasets you will need to go to www.superdatascience.com/machine-learning. So that's two words, machine hyphen learning. The website is superdatascience.com. And here you'll see a whole page dedicated to this course, with lots and lots of datasets available which you can download onto your machine in order to follow along with the tutorials.

So today we're going to start off with the very first one as an example, and then throughout the course you'll be able to download the right dataset for every single section, depending on the section that you're doing. So today we're going to start off with the data pre-processing dataset.
And here we are also going to get, right away, the Machine Learning A-Z template folder. It is a special template folder we've created for you to help you store these datasets in a hierarchical fashion, so that you can navigate all of these datasets better and so that they're all in the right place, and you have a very orderly folder structure on your own machine.

So go ahead and download these two zip files above. If you just click there and click there, you'll see that these are being downloaded. The first one is very small; it's just empty folders. And then the second one is the dataset for the section. Whenever you get to a new section we'll of course remind you at the start of the section, but basically what you'll need to do is just download the right dataset for the section that you're in. But more on that later. Let's move on to today's section.

So we've got these two zip files here. All you have to do is take the Machine Learning A-Z template folder and unzip it to the location where you want it to be. So I'll unzip mine to the desktop. I'm going to right-click and I'm going to extract, just click Extract Here. So that's on Windows. On Mac it's a similar thing: just open the file to unzip the folder. So there we go, there's a folder.
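For anyone who prefers to script the unzip step just described instead of right-clicking, a minimal Python sketch follows. It only assumes the standard-library `zipfile` module; the helper name `unzip_to` and the archive file name in the usage part are hypothetical and are not taken from the course page.

```python
import zipfile
from pathlib import Path


def unzip_to(zip_path, dest_dir):
    """Extract every entry of zip_path into dest_dir and return the destination path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target location if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)  # empty folders stored in the archive are recreated too
    return dest


if __name__ == "__main__":
    # Hypothetical file name and locations; adjust to wherever your download landed.
    archive_path = Path.home() / "Downloads" / "Machine-Learning-A-Z-Template-Folder.zip"
    if archive_path.exists():
        print(unzip_to(archive_path, Path.home() / "Desktop"))
```

This does the same thing as "Extract Here" on Windows or double-clicking the archive on a Mac, so either route should leave the same folder structure on disk.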
And now if you look inside here you'll see that you've got a very nice, neat structure. You can go inside any one of these sections, for instance Clustering; you'll see the title of the section, and then you can go into any one of these. And again, these are empty for now, and that's because we haven't done those sections. As you go through the course you will populate these folders with their respective datasets.

Now we're going to go into Data Pre-processing, so Part 1. And here we've got this whole empty folder. Don't worry about the folders with the dashes; those are just titles to remind us where we're located inside the folder structure. So just go ahead and take your data pre-processing zip folder, drag it here, right-click and just say Extract Here.

And again, just go inside Data Pre-processing, because we don't want it to be in its own separate folder. Just take all these files, right-click, actually cut them, and then paste them here. So basically you don't need this folder now because it's empty. Delete that folder. You can delete the zip file because we don't need it anymore, and you can delete the other zip file as well. So there we go. Now we have our Machine Learning A-Z template folder.
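The extract, cut-and-paste, and delete steps above can also be sketched in Python. This is only an illustration under stated assumptions: the function name `install_dataset` and the temporary folder name `_extracted` are made up for the example, and the sketch assumes the archive may wrap its files in a single folder, as the one in the video does.

```python
import shutil
import zipfile
from pathlib import Path


def install_dataset(zip_path, section_dir):
    """Extract zip_path and place its files directly inside section_dir."""
    zip_path = Path(zip_path)
    section = Path(section_dir)
    staging = section / "_extracted"  # hypothetical temporary folder name
    staging.mkdir(exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(staging)
    entries = list(staging.iterdir())
    # If the archive wraps everything in one folder, take that folder's contents instead.
    if len(entries) == 1 and entries[0].is_dir():
        entries = list(entries[0].iterdir())
    for entry in entries:
        # Equivalent of cut and paste into the section folder.
        shutil.move(str(entry), str(section / entry.name))
    shutil.rmtree(staging)  # the wrapper folder is no longer needed
    zip_path.unlink()       # neither is the zip file
    return sorted(p.name for p in section.iterdir())
```

Calling it once per section, with that section's zip file and that section's folder, mirrors the routine the video repeats throughout the course.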
You can rename it; let's do that. We can remove "Template Folder" from there and just say Machine Learning A-Z. And if you go in here you'll see Data Pre-processing, and there you go: you've got your dataset ready for the section, plus you already have all of the templates which you will be creating with Hadelin throughout the tutorials in this section. So that's pretty much what you need to do for every single section that you go through, and again, I will remind you.

And the reason for the structure: why did we structure it like this? Why did we, for instance, not include all of the datasets right away inside these folders for you? That feels like it would be more convenient, but there are a couple of reasons. The first one is that in case we need to update something, in case we need to update a certain section, like this section for instance, we need to update the dataset. Well, in that case, if we had to update it and then upload the whole folder, that would take time. That would mean the course would be unavailable for longer, or it would mean that while we're updating a lot of people would be getting the wrong dataset, and we don't want that. So if we want to update something now, we can very quickly just update that one zip file on the website, and that's very, very quick for us to do.
And the second reason is of course size. If we had put all of the datasets in here right away, this folder would be massive. So it's much more efficient to download just the section that you're doing and then proceed with those tutorials. So there we go. That's how you get the dataset. And now I'll hand you over to Hadelin, who'll take you through today's dataset.

So what is this dataset about? This dataset contains the columns Country, Age, Salary and Purchased, and 10 observations. Basically, it contains information on the customers of some company. The first three columns are information about these customers, like the country, the age and the salary, and the fourth column, Purchased, tells whether or not the customer bought the product of the company.

So we have to distinguish something very important here that we will distinguish for the rest of the course. It's the difference between the independent variables and the dependent variable. The independent variables are the first three columns, Country, Age and Salary, and the dependent variable is Purchased here, the fourth column. And in any machine learning model we are going to use some independent variables to predict a dependent variable.
So that means here that with these first three columns, the three independent variables, we are going to predict whether or not the customer purchased the product.

Okay. So that's the first distinction that we really need to understand, and it's very important to do this section, because the data pre-processing steps that we're going through in this section will have to be done for all the machine learning models we are going to make. So it's really essential to know how to manage this. But don't worry, it's going to be very simple, and besides, I'm going to give you at the end of this section a template that will allow us later to preprocess the data in a flash for all the machine learning models we're going to make. So I look forward to starting the steps with you. And until then, enjoy machine learning.
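The split between independent variables and the dependent variable described in this tutorial can be sketched in a few lines of Python. The column names Country, Age, Salary and Purchased come from the video; the three rows below are invented for illustration and are not the values in the course's file, and the helper name `split_features_target` is likewise an assumption.

```python
import csv
import io

# Made-up rows for illustration only; these are NOT the values in the course's dataset.
SAMPLE_CSV = """Country,Age,Salary,Purchased
France,40,60000,No
Spain,30,50000,Yes
Germany,35,55000,No
"""


def split_features_target(rows, target="Purchased"):
    """Return (X, y): X holds the independent columns, y the dependent column."""
    X = [{name: value for name, value in row.items() if name != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X, y = split_features_target(rows)
print(list(X[0]))  # names of the independent variables
print(y)           # values of the dependent variable, one per observation
```

The sketch uses only the standard library so that it runs anywhere; the course itself may load and split the file with other tools.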
