subtitlecat.com

Subtitles for 01-Introduction to data wrangling

RAFAEL IRIZARRY: The data sets used in this series have been made available to you as R objects, specifically as data frames. The US murders data, the reported heights data, the Gapminder data, and the poll data are all examples. These data sets come included in the dslabs package, and we loaded them using the data function. Furthermore, we have made the data available in what is referred to as tidy form, a concept we define later in this course. The tidyverse packages and functions assume that the data is tidy, and this assumption is a big part of the reason these packages work so well together.

We did quite a bit of work behind the scenes to get the original raw data into the tidy tables you work with. However, in a typical data science project, it is much more common for the data to be in a file or a database, or to be extracted from a document such as a web page, a tweet, or a PDF. In these cases, the first step is to import the data into R and, when using the tidyverse, to tidy it up.
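As a minimal sketch of the loading step mentioned above, the R code below attaches dslabs and calls data() to bring the US murders data into the workspace as a data frame. It assumes dslabs is already installed (for example with install.packages("dslabs")); the column names checked here reflect the package versions we are aware of and could differ in other releases.

```r
# Assumes dslabs is already installed, e.g. via install.packages("dslabs")
library(dslabs)

# data() copies the named data set from the attached package into the workspace
data(murders)

# The result is an ordinary data frame, ready for tidyverse functions
class(murders)
head(murders)
```

Because murders is already tidy (one row per state, one column per variable), it can be passed straight to tidyverse functions without any reshaping.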
The first stage of the data analysis process usually involves several, often complicated, steps to convert data from its raw form to the tidy form that greatly facilitates the rest of the analysis. We refer to this process as data wrangling.

In this course, we cover several common steps of the data wrangling process, including importing data into R from files, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point. Some of the examples we use to demonstrate data wrangling techniques are based on the work we did to convert the raw data into the tidy data sets provided by the dslabs package and used as examples throughout the series.
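Data wrangling, as described above, means turning raw data into tidy form. The sketch below shows one small instance of that conversion under stated assumptions: the CSV text, the country labels "A" and "B", and all values are hypothetical; read.csv(text = ...) stands in for reading a real file; and tidyr's pivot_longer() (tidyr 1.0.0 or later) reshapes a wide table, with one column per year, into a tidy table with one row per country-year observation.

```r
# Sketch only: the countries "A"/"B" and all values are hypothetical.
library(tidyr)  # provides pivot_longer() (tidyr >= 1.0.0)

# Raw data in "wide" form: one column per year. In a real project this would
# come from a file, e.g. read.csv("path/to/file.csv") (hypothetical path).
raw_text <- "country,2000,2001,2002
A,1.5,1.6,1.7
B,3.1,3.0,2.9"

# check.names = FALSE keeps the year headers as "2000" instead of "X2000"
wide <- read.csv(text = raw_text, check.names = FALSE, stringsAsFactors = FALSE)

# Tidy form: each row is one observation (country, year, value)
tidy <- pivot_longer(wide, cols = -country,
                     names_to = "year", values_to = "value")

# Former column names arrive as character strings; convert year to integer
tidy$year <- as.integer(tidy$year)

print(tidy)
```

Storing year as a variable, rather than spreading it across column headers, is what lets tidyverse functions filter, group, or plot by year directly.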