5. Types of Data

Now we've figured out what problem we're trying to solve, and we've matched our specific problem to one of the different types of machine learning problem. It's time to have a look at what data we have. As you may have guessed, the question we're trying to answer here is: what kind of data do we have?

Data comes in many different shapes and sizes, but the two main types are structured and unstructured.

Structured data is something you'd expect to see in an Excel file: rows and columns of, say, different patients' medical records and whether or not each patient has heart disease, or customer purchase transactions. It's called structured data because all of the samples (the different patient records) are typically in a similar format. One column might contain numbers of a certain type, such as a patient's average blood pressure, age or weight, and another column might record whether they have chest pain and what its level of intensity is.

Unstructured data are things like images, natural language text such as transcribed phone calls, videos and audio files. Although we can turn these into numbers and create structure, they typically come in many varying formats.
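Unstructured data can be turned into numbers to create structure. One simple way to do that, sketched here as an assumption rather than the course's own method, is a bag-of-words count: pick a fixed vocabulary and count how often each word appears, so every email becomes a row with the same columns. The `bag_of_words` helper, the vocabulary and the example emails are all hypothetical.

```python
from collections import Counter


def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Turn free-form text into a fixed-length row of word counts."""
    # Lowercase, split on whitespace and strip simple punctuation;
    # real pipelines would use a proper tokenizer.
    counts = Counter(word.strip(".,!?") for word in text.lower().split())
    # One column per vocabulary word, so every email gets the same shape.
    return [counts[word] for word in vocabulary]


vocabulary = ["meeting", "dinner", "report", "tonight"]
friend_email = "Dinner tonight? Bring snacks, dinner starts at 7!"
coworker_email = "Please send the report before the meeting."

print(bag_of_words(friend_email, vocabulary))    # [0, 2, 0, 1]
print(bag_of_words(coworker_email, vocabulary))  # [1, 0, 1, 0]
```

Two emails with completely different wording and structure end up as rows with identical columns, which is the kind of structure a machine learning model can work with.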
One picture of a dog may look completely different from another image of a dog, and the emails you write back and forth with a friend may have a completely different structure from the emails you'd write to a co-worker.

Now, within these two data types, there's static and streaming data. Static data is data that doesn't change over time. You may have a spreadsheet of patient records in .csv format, which stands for comma-separated values. That simply means all of the data is in one file, with the values separated by commas. It looks like this: if you check this table, we've got ID, comma, weight, comma, sex. If we were to read that into a DataFrame using a tool like pandas (we'll have a look at this in a future lesson), it would look something like this.

So a lot of the data you'll actually come across comes in a simple format like this, but to turn it into something a little more structured, you can convert it to this.

CSV is one of the most common static data formats, and we're going to get very used to it by the end of the course. Since these values won't really change over time, they're called static. Usually, what you'll want in machine learning is a lot of these examples. There's a saying: the more data, the better.
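To make the comma-separated format concrete, here is a minimal sketch that parses a small CSV with Python's standard-library `csv` module. The lesson itself uses pandas, where `pandas.read_csv` would load the same text into a DataFrame; the standard library is used here only so the example has no dependencies. The table's columns and values are made up for illustration.

```python
import csv
import io

# A tiny, made-up patient table in CSV form (columns and values are hypothetical).
raw_csv = """id,weight,sex
1,72,M
2,65,F
3,80,M
"""

# DictReader treats the first row as column names and splits each line on commas.
reader = csv.DictReader(io.StringIO(raw_csv))
rows = list(reader)

print(reader.fieldnames)  # ['id', 'weight', 'sex']
print(rows[1])            # {'id': '2', 'weight': '65', 'sex': 'F'}

# Every value is read back as a string; because each column holds one kind of
# value, a whole column can be converted to numbers in one step.
weights = [int(row["weight"]) for row in rows]
print(round(sum(weights) / len(weights), 2))  # 72.33
```

Because every row has the same columns in the same order, the file is structured data even though it is only plain text.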
That makes sense if you think about it. The more examples you have of something, such as the inputs and outputs of patient records (where the inputs are a patient's body parameters and the outputs are whether they have heart disease or not), the more chances you'll have to find patterns between them. The same goes for machine learning algorithms: the more examples they can look at, the more chance they have of finding patterns and then using those patterns to predict something in the future, like whether a new patient who isn't in this table has heart disease or not.

Streaming data is data that constantly changes over time. For example, say you wanted to predict how a stock price will change based on news headlines; you'd be working with streaming data. Since news headlines are updated constantly, you'll want to be the first to see how they change stocks.

Most of the work you do in practice will start on static data. Then, if your data analysis and machine learning efforts show some insights, you'll move towards streaming data when you go to deployment or production.
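The difference between static and streaming data can be sketched in a few lines. With static data, the whole dataset is available at once, so you can compute over all of it; with streaming data, values keep arriving, so the answer has to be updated as each one comes in. The `price_feed` generator is a hypothetical stand-in for a live source such as stock prices, and a running average is just one simple example of incremental processing.

```python
from typing import Iterable, Iterator


def static_average(prices: list[float]) -> float:
    """Static data: the whole dataset is available up front."""
    return sum(prices) / len(prices)


def streaming_averages(prices: Iterable[float]) -> Iterator[float]:
    """Streaming data: values arrive one at a time, so keep a running average."""
    count = 0
    total = 0.0
    for price in prices:
        count += 1
        total += price
        # Emit an updated answer as soon as each new value arrives.
        yield total / count


def price_feed() -> Iterator[float]:
    """Hypothetical stand-in for a live feed; a real one would keep producing values."""
    yield from [100.0, 102.0, 101.0, 105.0]


print(static_average([100.0, 102.0, 101.0, 105.0]))  # 102.0
for avg in streaming_averages(price_feed()):
    print(avg)  # prints 100.0, then 101.0, 101.0, 102.0
```

Both approaches reach the same final answer here, but the streaming version never needs the full dataset in hand, which is what matters once data keeps arriving in deployment.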
A common data science workflow begins by opening a .csv file in a Jupyter Notebook, a tool for building machine learning projects. Then you explore the data and perform data analysis using pandas, a Python library for data analysis; make visualizations, such as graphs comparing different data points, using Matplotlib; and then build machine learning models on the data using Scikit-learn, such as a model that uses these patterns to predict whether or not a patient has heart disease.

Don't worry if you're thinking, "What's a Jupyter Notebook? And pandas? What are we, at the zoo?" We've got dedicated sections and projects coming up for each of these tools.

For now, think about the different kinds of data you create or use every day. Are they structured or unstructured? How much data is there?