7. Features In Data

We've gone through Step 1, problem definition; we've gone through Step 2, data; and in Step 3, we've defined what success means for us. Now let's get on to Step 4, which is features.

Here, the question we're trying to answer is: what do we already know about the data? Now, if you haven't worked with data before, you might hear this word "features" and be wondering what features actually means. Well, you'll hear this word come up a lot in machine learning, maybe in the form of feature learning or feature variables, or when someone asks how many features there are, or what kind of features there are.

Features is another word for different forms of data. We've already discussed different kinds of data, such as structured and unstructured, but features refers to the different forms of data within structured or unstructured data.

For example, let's go back to our predicting heart disease problem. We might want to see if things such as a person's body weight, their sex, their average resting heart rate and their chest pain rating can be used to predict whether they have heart disease or not. These things (a patient's body weight, sex, average resting heart rate and chest pain rating) are features of the data, and could also be referred to as feature variables. In other words, we want to use the feature variables to predict the target variable, which is whether or not a person has heart disease.

Now, when it comes to feature variables, again there are different kinds. You've got numerical, which means a number, like body weight. There's categorical, which means one thing or another, like sex, or whether a patient is a smoker or not. And then there's derived, which is when someone like yourself looks at the data and creates a new feature using the existing ones. For example, you might look at someone's hospital visit history timestamps, and if they've had a visit in the last year, you could make a categorical feature called "visited in last year". If someone had visited in the last year, they would get true, or in our case, yes; if not, they would get false, or in this case, no. The process of deriving features like this out of data is often referred to as feature engineering.
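As a quick illustration of deriving a feature like "visited in last year", here's a minimal pandas sketch. It isn't from the lecture itself; the column names (last_visit, visited_in_last_year) and the sample records are hypothetical, chosen just to show the pattern.

```python
import pandas as pd

# Three hypothetical patient records with the timestamp of their last hospital visit.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "last_visit": pd.to_datetime(["2024-11-02", "2022-06-17", "2025-01-09"]),
})

# Derive a categorical feature from the existing timestamp column:
# "yes" if the last visit falls within the past 365 days, otherwise "no".
cutoff = pd.Timestamp.now() - pd.Timedelta(days=365)
patients["visited_in_last_year"] = (patients["last_visit"] >= cutoff).map(
    {True: "yes", False: "no"}
)

print(patients[["patient_id", "visited_in_last_year"]])
```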
Our heart disease example is structured, but unstructured data has features too. They're just a little less obvious. If you looked at enough images of dogs, you'd start to figure out: OK, most of these creatures have four shapes coming out of their body (their legs) and a couple of circles up the front (their eyes). As a machine learning algorithm looks at different images, it would start to learn these different shapes and much more, and figure out how different pictures are similar or different to each other. And don't worry: when it comes to figuring out the different patterns between features, such as the four rectangle-like shapes coming out of a dog's body, or the circles at the front of a dog's head, you don't have to tell the machine learning algorithm what they are. The beautiful thing is, it learns them on its own.

The final thing to remember is that a feature works best within a machine learning algorithm if many of the samples have it. For our predicting heart disease problem, say we had a feature called "most eaten foods", which had a list of the foods a patient ate most often, but only 10 percent, or 10 out of 100, patient records had it. So this one here, for patient ID 4328, has a most eaten food, which is fries (not ideal), and these other patients don't have it, because remember, only 10 out of 100 examples have the most eaten food; the rest are just missing. So if you can imagine there are 100 patients here, only 10 of them will have this most eaten food column filled.

Since a machine learning algorithm learns best when all samples have similar information, we'd have to leave this feature out, or try to collect more information before using it. The proportion of samples that actually have a value for a feature is called feature coverage. In an ideal dataset, you've got complete feature coverage. So for us to be able to use this most eaten foods feature, ideally we'd want values here for all patients, or at least a lot more than 10 percent coverage, meaning many more than 10 out of 100 examples have some sort of value in this column.

We'll have plenty of practice looking at different features in coming lectures, projects and lessons. In the meantime, think about a problem you had to solve recently. What features went into it? Were they numerical or categorical? Or did you combine them into your own derived feature?
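To make the feature coverage idea concrete, here's one way you might measure it. This is an illustrative sketch rather than code from the course; the column names (body_weight, most_eaten_food) and the 10-out-of-100 split are made up to mirror the lecture's example.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: 100 patient records, but only 10 of them
# have the "most_eaten_food" column filled in.
rng = np.random.default_rng(42)
records = pd.DataFrame({
    "body_weight": rng.normal(80, 10, size=100),      # complete feature
    "most_eaten_food": ["fries"] * 10 + [None] * 90,  # sparse feature
})

# Feature coverage: the fraction of samples with a non-missing value per column.
coverage = records.notna().mean()
print(coverage)
# body_weight        1.0  -> complete coverage
# most_eaten_food    0.1  -> only 10% coverage: leave it out or collect more data
```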
