All language subtitles for 2. Data Frame Basics

af Afrikaans
ak Akan
sq Albanian
am Amharic
ar Arabic
hy Armenian
az Azerbaijani
eu Basque
be Belarusian
bem Bemba
bn Bengali
bh Bihari
bs Bosnian
br Breton
bg Bulgarian
km Cambodian
ca Catalan
ceb Cebuano
chr Cherokee
ny Chichewa
zh-CN Chinese (Simplified)
zh-TW Chinese (Traditional)
co Corsican
hr Croatian
cs Czech
da Danish
nl Dutch
en English
eo Esperanto
et Estonian
ee Ewe
fo Faroese
tl Filipino
fi Finnish
fr French
fy Frisian
gaa Ga
gl Galician
ka Georgian
de German
el Greek
gn Guarani
gu Gujarati
ht Haitian Creole
ha Hausa
haw Hawaiian
iw Hebrew
hi Hindi
hmn Hmong
hu Hungarian
is Icelandic
ig Igbo
id Indonesian Download
ia Interlingua
ga Irish
it Italian
ja Japanese
jw Javanese
kn Kannada
kk Kazakh
rw Kinyarwanda
rn Kirundi
kg Kongo
ko Korean
kri Krio (Sierra Leone)
ku Kurdish
ckb Kurdish (Soranî)
ky Kyrgyz
lo Laothian
la Latin
lv Latvian
ln Lingala
lt Lithuanian
loz Lozi
lg Luganda
ach Luo
lb Luxembourgish
mk Macedonian
mg Malagasy
ms Malay
ml Malayalam
mt Maltese
mi Maori
mr Marathi
mfe Mauritian Creole
mo Moldavian
mn Mongolian
my Myanmar (Burmese)
sr-ME Montenegrin
ne Nepali
pcm Nigerian Pidgin
nso Northern Sotho
no Norwegian
nn Norwegian (Nynorsk)
oc Occitan
or Oriya
om Oromo
ps Pashto
fa Persian
pl Polish
pt-BR Portuguese (Brazil)
pt Portuguese (Portugal)
pa Punjabi
qu Quechua
ro Romanian
rm Romansh
nyn Runyakitara
ru Russian
sm Samoan
gd Scots Gaelic
sr Serbian
sh Serbo-Croatian
st Sesotho
tn Setswana
crs Seychellois Creole
sn Shona
sd Sindhi
si Sinhalese
sk Slovak
sl Slovenian
so Somali
es Spanish
es-419 Spanish (Latin American)
su Sundanese
sw Swahili
sv Swedish
tg Tajik
ta Tamil
tt Tatar
te Telugu
th Thai
ti Tigrinya
to Tonga
lua Tshiluba
tum Tumbuka
tr Turkish
tk Turkmen
tw Twi
ug Uighur
uk Ukrainian
ur Urdu
uz Uzbek
vi Vietnamese
cy Welsh
wo Wolof
xh Xhosa
yi Yiddish
yo Yoruba
zu Zulu
Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 1 00:00:00,660 --> 00:00:04,690 Hello everyone and welcome to the lecture on data frame basics. 2 00:00:04,800 --> 00:00:09,800 We've learned a lot about vectors and their two dimensional counterpart matrices. 3 00:00:09,810 --> 00:00:11,580 Now we're going to learn about data frames. 4 00:00:11,610 --> 00:00:17,730 One of the main tools for data analysis with our matrix inputs we're limited because all data inside 5 00:00:17,730 --> 00:00:23,700 of the matrix to be of the same data type with data frames we're going to be able to organize and mix 6 00:00:23,700 --> 00:00:27,090 data types to create powerful data structures. 7 00:00:27,330 --> 00:00:32,970 Let's go ahead and jump to our studio to begin exploring some built in data frames. 8 00:00:32,970 --> 00:00:33,350 All right. 9 00:00:33,390 --> 00:00:39,450 So here in our studio I'm going to go ahead and just push the right hand windows over to the right a 10 00:00:39,450 --> 00:00:40,440 little further. 11 00:00:40,440 --> 00:00:43,880 So we just see the console and the scripting screen. 12 00:00:44,580 --> 00:00:49,070 Let's start off by seeing and exploring some built in data frames. 13 00:00:49,470 --> 00:00:52,850 One of the first ones we're going to check comes from that state data set. 14 00:00:52,850 --> 00:00:57,410 So if you begin typing state into our studio you should see a bunch of pop ups. 15 00:00:57,420 --> 00:01:03,460 Let's go ahead and explore state x 7 7. 16 00:01:03,470 --> 00:01:06,840 And then when you click enter you'll notice a data frame pops up. 17 00:01:06,870 --> 00:01:08,720 This looks similar to a matrix. 18 00:01:08,760 --> 00:01:13,220 But notice how the rows and the columns are names. 19 00:01:13,530 --> 00:01:21,060 Now data frames can also have names rows and columns versus just having the number signifying what number 20 00:01:21,060 --> 00:01:23,080 column or what number row it is. 21 00:01:23,340 --> 00:01:28,230 This sort of labeling is going to be really useful or trying to access our data from this data frame 22 00:01:28,230 --> 00:01:28,770 . 23 00:01:28,770 --> 00:01:33,870 You can begin to think about data frames as sort of an X like an Excel spreadsheet. 24 00:01:33,930 --> 00:01:37,920 It's going to be really useful for working with large amounts of data. 25 00:01:37,920 --> 00:01:41,340 Let's go ahead and explore a few more builtin data frames. 26 00:01:41,380 --> 00:01:42,990 You know go ahead and clear the cons.. 27 00:01:43,260 --> 00:01:50,760 And I'm going to import U.S. personal expenditure so this data set consists of United States personal 28 00:01:50,760 --> 00:01:54,530 expenditures in the billions of dollars. 29 00:01:54,670 --> 00:01:57,290 It's a smaller data frame but you can see the same idea. 30 00:01:57,300 --> 00:02:00,730 We have a labeled row and some labeled columns. 31 00:02:00,870 --> 00:02:06,090 Now the label rows or the labeled columns to actually need to have string or character labels that can 32 00:02:06,090 --> 00:02:08,160 be numeric just like a normal index. 33 00:02:08,160 --> 00:02:15,150 So for example in other built in data frame is the women data frame you just begin to type women and 34 00:02:15,150 --> 00:02:21,210 this dataset gives the average height and weights for American women aged 30 to 39 of a small sample 35 00:02:21,210 --> 00:02:23,320 of them. 36 00:02:23,340 --> 00:02:28,230 So if we explore this data frame a little more we noticed we have two columns here height and weight 37 00:02:28,530 --> 00:02:33,760 with various values and each row is just a numeric indexed value. 38 00:02:33,760 --> 00:02:37,180 It doesn't actually have a label. 39 00:02:37,650 --> 00:02:42,270 You might be wondering well how do you get a list of all the available built in data frames that are 40 00:02:42,720 --> 00:02:43,600 in order to do that. 41 00:02:43,620 --> 00:02:51,480 You can just type in data close parentheses on that data function and you should get a new tab to pop 42 00:02:51,480 --> 00:02:52,280 up here. 43 00:02:52,470 --> 00:02:57,810 That will give you a list of all the data sets in the builtin data sets packaged with are. 44 00:02:57,810 --> 00:03:04,200 So if we scroll down here you'll see on the left the name that you need to import and a right description 45 00:03:04,230 --> 00:03:04,770 of it. 46 00:03:05,040 --> 00:03:11,970 So for instance let's say we just scroll down here we want to explore the world's telephone's will we 47 00:03:11,970 --> 00:03:14,730 just type in world phones to see that. 48 00:03:14,760 --> 00:03:23,130 So go ahead and type in world phones enter and so we start seeing some data about the number of phones 49 00:03:23,130 --> 00:03:27,280 in the world separated by columns of the different parts of the world. 50 00:03:27,600 --> 00:03:31,100 Something you've made noticed is that some of the data frames are really big. 51 00:03:31,350 --> 00:03:36,720 If you just want to take a peek at either the top or the bottom of the data frame you can use the head 52 00:03:36,750 --> 00:03:38,060 and tail functions. 53 00:03:38,250 --> 00:03:45,660 So for example remember we had that state DOT X 7:7 data frame which was quite large because it has 54 00:03:45,660 --> 00:03:47,490 all 50 states inside of it. 55 00:03:47,660 --> 00:03:57,930 We can go ahead and call ahead on State DOT x 7 7 and this will just return the first six rows of that 56 00:03:57,930 --> 00:04:06,070 data frame and we can also call tail state x 7 7 or just passing whatever data frame we want in they'll 57 00:04:06,080 --> 00:04:09,870 shoot the last six rows. 58 00:04:09,870 --> 00:04:15,900 Some other useful built in functions for working data frames are SDR and summary. 59 00:04:16,080 --> 00:04:22,320 We can use the TR function to get the structure of a data frame which gives information on the structure 60 00:04:22,320 --> 00:04:27,710 of the data frame and the data it contains such as variable names and data types. 61 00:04:27,720 --> 00:04:34,010 So for example if I call SDR which stands for structure and passing the data frame object. 62 00:04:34,260 --> 00:04:40,030 Let's go ahead and pass in that state X 7:7 click enter. 63 00:04:40,140 --> 00:04:42,950 Notice we begin to get some information about it. 64 00:04:43,160 --> 00:04:49,730 Like what kind of rows it has what kind of columns it has the dimensions etc.. 65 00:04:49,860 --> 00:04:55,430 There's another built in function which will give you some nice testicle data called summary. 66 00:04:55,560 --> 00:05:01,350 So summary is just a generic function that's used to produce results summaries of the results of various 67 00:05:01,410 --> 00:05:05,470 model fitting functions or just of objects like a data frame. 68 00:05:05,490 --> 00:05:10,460 So if you pass in your data frame here so summerise thought X 7:7 69 00:05:13,650 --> 00:05:17,540 notice we get a summary for each of the columns in our data frame. 70 00:05:17,550 --> 00:05:24,310 So for example the population summary gives you the minimum value the median value the mean values some 71 00:05:24,330 --> 00:05:27,290 court tile values as well as the maximum value. 72 00:05:27,780 --> 00:05:33,480 Later on in the course we're going to be using summary quite a bit for not just data frames but a whole 73 00:05:33,480 --> 00:05:35,790 host of functions and objects. 74 00:05:35,790 --> 00:05:40,040 So just keep that in mind since we've explored built in data frames. 75 00:05:40,050 --> 00:05:43,920 Let's go ahead and explore how we can create our own data frames now. 76 00:05:44,780 --> 00:05:48,770 OK we'll start off by creating a data frame from some vectors. 77 00:05:48,780 --> 00:05:51,550 Let's go ahead and make up some weather data. 78 00:05:51,570 --> 00:05:57,660 I'll start off by making a vector called days and we'll just have this be the days of the week which 79 00:05:57,660 --> 00:05:58,870 we've used before. 80 00:05:59,190 --> 00:06:03,300 So Monday Tuesday Wednesday 81 00:06:06,720 --> 00:06:16,710 Thursday and Friday we have Monday Tuesday Wednesday Thursday Friday for the days then we're going to 82 00:06:16,710 --> 00:06:24,200 make a temp sector and you can actually just quickly copy and paste this from the notes and I'm going 83 00:06:24,200 --> 00:06:26,250 to make up some temperatures. 84 00:06:26,280 --> 00:06:30,190 Let's say these are in Celsius now. 85 00:06:30,540 --> 00:06:31,240 OK. 86 00:06:31,620 --> 00:06:33,090 So we have my temperatures there. 87 00:06:33,130 --> 00:06:36,060 And let's make one final vector. 88 00:06:36,060 --> 00:06:39,390 This one's going to be a vector of logical values. 89 00:06:39,390 --> 00:06:44,430 So boolean values will indicate whether or not it rained that day. 90 00:06:44,460 --> 00:06:48,670 So we'll say it drained on Tuesday on Monday. 91 00:06:48,760 --> 00:06:54,770 Did it rain on Wednesday then in Reno Thursday and it rained again on Friday. 92 00:06:55,230 --> 00:07:05,400 Okay so we can go ahead and pass in these vectors into the data dot Frain function so we can go ahead 93 00:07:05,400 --> 00:07:16,020 and say days Tim rain and now we outputted a data frame and let's go ahead and actually save this as 94 00:07:16,020 --> 00:07:25,990 DMF variable data frame and then pass in the vector's days Tim brain. 95 00:07:26,700 --> 00:07:31,530 And then when we check out our data frame we'll notice that the column names have been automatically 96 00:07:31,530 --> 00:07:40,050 set to be the same as the name of the vector variable and by default the rows are just indexed numbers 97 00:07:40,050 --> 00:07:41,040 . 98 00:07:41,460 --> 00:07:49,430 Since we've now Cranor a data frame we can actually go ahead and call structure on it as TR RDF that 99 00:07:49,440 --> 00:07:52,920 will give us some information about the structure of the data in the data frame. 100 00:07:52,920 --> 00:07:56,780 Or we can even call a summary on that data frame. 101 00:07:57,490 --> 00:08:04,980 Okay that's it for the basics of what a data frame is and what it looks like and are the main things 102 00:08:04,980 --> 00:08:07,420 you should have been able to get out of this lecture. 103 00:08:07,620 --> 00:08:08,760 Are the following. 104 00:08:09,000 --> 00:08:15,120 You can type in data to get a output of all the data sets in the package data sets. 105 00:08:15,170 --> 00:08:16,300 You can play around with. 106 00:08:16,320 --> 00:08:18,720 Are these already built in for you. 107 00:08:18,720 --> 00:08:24,850 Also remember you can use SDR and summary to check out data frames whether you create them or they're 108 00:08:24,850 --> 00:08:25,750 built in. 109 00:08:26,010 --> 00:08:33,000 And finally you can create your own data frames for passing and vectors into the data dot frame function 110 00:08:33,000 --> 00:08:33,780 . 111 00:08:34,080 --> 00:08:34,950 OK. 112 00:08:35,010 --> 00:08:36,600 Again that's it for the basics. 113 00:08:36,600 --> 00:08:41,150 Up next we're going to learn about selection and indexing data frame elements. 114 00:08:41,160 --> 00:08:43,130 Thanks everyone and I'll see you at the next lecture 12140

Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.