01-Introduction to data wrangling

RAFAEL IRIZARRY: The data sets used in this series have been made available to you as R objects, specifically as data frames. The US murders data, the reported heights data, the Gapminder data, and the poll data are all examples. These data sets come included in the dslabs package, and we loaded them using the data function. Furthermore, we have made the data available in what is referred to as tidy form, a concept we define later in this course.

The tidyverse packages and functions assume that the data is tidy, and this assumption is a big part of the reason these packages work so well together. We did quite a bit of work behind the scenes to get the original raw data into the tidy tables you work with.

However, in a typical data science project, it is much more common for the data to be in a file, a database, or extracted from a document, including web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and, when using the tidyverse, tidy up the data. The first step in the data analysis process usually involves several, often complicated, steps to convert data from its raw form to the tidy form that greatly facilitates the rest of the analysis. We refer to this process as data wrangling.

In this course, we cover several common steps of the data wrangling process, including importing data into R from files, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.

Some of the examples we use to demonstrate data wrangling techniques are based on the work we did to convert the raw data into the tidy data sets provided by the dslabs package and used in the series as examples.
