Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,660 --> 00:00:04,690
Hello everyone and welcome to the lecture on data frame basics.
2
00:00:04,800 --> 00:00:09,800
We've learned a lot about vectors and their two dimensional counterpart matrices.
3
00:00:09,810 --> 00:00:11,580
Now we're going to learn about data frames.
4
00:00:11,610 --> 00:00:17,730
One of the main tools for data analysis with our matrix inputs we're limited because all data inside
5
00:00:17,730 --> 00:00:23,700
of the matrix to be of the same data type with data frames we're going to be able to organize and mix
6
00:00:23,700 --> 00:00:27,090
data types to create powerful data structures.
7
00:00:27,330 --> 00:00:32,970
Let's go ahead and jump to our studio to begin exploring some built in data frames.
8
00:00:32,970 --> 00:00:33,350
All right.
9
00:00:33,390 --> 00:00:39,450
So here in our studio I'm going to go ahead and just push the right hand windows over to the right a
10
00:00:39,450 --> 00:00:40,440
little further.
11
00:00:40,440 --> 00:00:43,880
So we just see the console and the scripting screen.
12
00:00:44,580 --> 00:00:49,070
Let's start off by seeing and exploring some built in data frames.
13
00:00:49,470 --> 00:00:52,850
One of the first ones we're going to check comes from that state data set.
14
00:00:52,850 --> 00:00:57,410
So if you begin typing state into our studio you should see a bunch of pop ups.
15
00:00:57,420 --> 00:01:03,460
Let's go ahead and explore state x 7 7.
16
00:01:03,470 --> 00:01:06,840
And then when you click enter you'll notice a data frame pops up.
17
00:01:06,870 --> 00:01:08,720
This looks similar to a matrix.
18
00:01:08,760 --> 00:01:13,220
But notice how the rows and the columns are names.
19
00:01:13,530 --> 00:01:21,060
Now data frames can also have names rows and columns versus just having the number signifying what number
20
00:01:21,060 --> 00:01:23,080
column or what number row it is.
21
00:01:23,340 --> 00:01:28,230
This sort of labeling is going to be really useful or trying to access our data from this data frame
22
00:01:28,230 --> 00:01:28,770
.
23
00:01:28,770 --> 00:01:33,870
You can begin to think about data frames as sort of an X like an Excel spreadsheet.
24
00:01:33,930 --> 00:01:37,920
It's going to be really useful for working with large amounts of data.
25
00:01:37,920 --> 00:01:41,340
Let's go ahead and explore a few more builtin data frames.
26
00:01:41,380 --> 00:01:42,990
You know go ahead and clear the cons..
27
00:01:43,260 --> 00:01:50,760
And I'm going to import U.S. personal expenditure so this data set consists of United States personal
28
00:01:50,760 --> 00:01:54,530
expenditures in the billions of dollars.
29
00:01:54,670 --> 00:01:57,290
It's a smaller data frame but you can see the same idea.
30
00:01:57,300 --> 00:02:00,730
We have a labeled row and some labeled columns.
31
00:02:00,870 --> 00:02:06,090
Now the label rows or the labeled columns to actually need to have string or character labels that can
32
00:02:06,090 --> 00:02:08,160
be numeric just like a normal index.
33
00:02:08,160 --> 00:02:15,150
So for example in other built in data frame is the women data frame you just begin to type women and
34
00:02:15,150 --> 00:02:21,210
this dataset gives the average height and weights for American women aged 30 to 39 of a small sample
35
00:02:21,210 --> 00:02:23,320
of them.
36
00:02:23,340 --> 00:02:28,230
So if we explore this data frame a little more we noticed we have two columns here height and weight
37
00:02:28,530 --> 00:02:33,760
with various values and each row is just a numeric indexed value.
38
00:02:33,760 --> 00:02:37,180
It doesn't actually have a label.
39
00:02:37,650 --> 00:02:42,270
You might be wondering well how do you get a list of all the available built in data frames that are
40
00:02:42,720 --> 00:02:43,600
in order to do that.
41
00:02:43,620 --> 00:02:51,480
You can just type in data close parentheses on that data function and you should get a new tab to pop
42
00:02:51,480 --> 00:02:52,280
up here.
43
00:02:52,470 --> 00:02:57,810
That will give you a list of all the data sets in the builtin data sets packaged with are.
44
00:02:57,810 --> 00:03:04,200
So if we scroll down here you'll see on the left the name that you need to import and a right description
45
00:03:04,230 --> 00:03:04,770
of it.
46
00:03:05,040 --> 00:03:11,970
So for instance let's say we just scroll down here we want to explore the world's telephone's will we
47
00:03:11,970 --> 00:03:14,730
just type in world phones to see that.
48
00:03:14,760 --> 00:03:23,130
So go ahead and type in world phones enter and so we start seeing some data about the number of phones
49
00:03:23,130 --> 00:03:27,280
in the world separated by columns of the different parts of the world.
50
00:03:27,600 --> 00:03:31,100
Something you've made noticed is that some of the data frames are really big.
51
00:03:31,350 --> 00:03:36,720
If you just want to take a peek at either the top or the bottom of the data frame you can use the head
52
00:03:36,750 --> 00:03:38,060
and tail functions.
53
00:03:38,250 --> 00:03:45,660
So for example remember we had that state DOT X 7:7 data frame which was quite large because it has
54
00:03:45,660 --> 00:03:47,490
all 50 states inside of it.
55
00:03:47,660 --> 00:03:57,930
We can go ahead and call ahead on State DOT x 7 7 and this will just return the first six rows of that
56
00:03:57,930 --> 00:04:06,070
data frame and we can also call tail state x 7 7 or just passing whatever data frame we want in they'll
57
00:04:06,080 --> 00:04:09,870
shoot the last six rows.
58
00:04:09,870 --> 00:04:15,900
Some other useful built in functions for working data frames are SDR and summary.
59
00:04:16,080 --> 00:04:22,320
We can use the TR function to get the structure of a data frame which gives information on the structure
60
00:04:22,320 --> 00:04:27,710
of the data frame and the data it contains such as variable names and data types.
61
00:04:27,720 --> 00:04:34,010
So for example if I call SDR which stands for structure and passing the data frame object.
62
00:04:34,260 --> 00:04:40,030
Let's go ahead and pass in that state X 7:7 click enter.
63
00:04:40,140 --> 00:04:42,950
Notice we begin to get some information about it.
64
00:04:43,160 --> 00:04:49,730
Like what kind of rows it has what kind of columns it has the dimensions etc..
65
00:04:49,860 --> 00:04:55,430
There's another built in function which will give you some nice testicle data called summary.
66
00:04:55,560 --> 00:05:01,350
So summary is just a generic function that's used to produce results summaries of the results of various
67
00:05:01,410 --> 00:05:05,470
model fitting functions or just of objects like a data frame.
68
00:05:05,490 --> 00:05:10,460
So if you pass in your data frame here so summerise thought X 7:7
69
00:05:13,650 --> 00:05:17,540
notice we get a summary for each of the columns in our data frame.
70
00:05:17,550 --> 00:05:24,310
So for example the population summary gives you the minimum value the median value the mean values some
71
00:05:24,330 --> 00:05:27,290
court tile values as well as the maximum value.
72
00:05:27,780 --> 00:05:33,480
Later on in the course we're going to be using summary quite a bit for not just data frames but a whole
73
00:05:33,480 --> 00:05:35,790
host of functions and objects.
74
00:05:35,790 --> 00:05:40,040
So just keep that in mind since we've explored built in data frames.
75
00:05:40,050 --> 00:05:43,920
Let's go ahead and explore how we can create our own data frames now.
76
00:05:44,780 --> 00:05:48,770
OK we'll start off by creating a data frame from some vectors.
77
00:05:48,780 --> 00:05:51,550
Let's go ahead and make up some weather data.
78
00:05:51,570 --> 00:05:57,660
I'll start off by making a vector called days and we'll just have this be the days of the week which
79
00:05:57,660 --> 00:05:58,870
we've used before.
80
00:05:59,190 --> 00:06:03,300
So Monday Tuesday Wednesday
81
00:06:06,720 --> 00:06:16,710
Thursday and Friday we have Monday Tuesday Wednesday Thursday Friday for the days then we're going to
82
00:06:16,710 --> 00:06:24,200
make a temp sector and you can actually just quickly copy and paste this from the notes and I'm going
83
00:06:24,200 --> 00:06:26,250
to make up some temperatures.
84
00:06:26,280 --> 00:06:30,190
Let's say these are in Celsius now.
85
00:06:30,540 --> 00:06:31,240
OK.
86
00:06:31,620 --> 00:06:33,090
So we have my temperatures there.
87
00:06:33,130 --> 00:06:36,060
And let's make one final vector.
88
00:06:36,060 --> 00:06:39,390
This one's going to be a vector of logical values.
89
00:06:39,390 --> 00:06:44,430
So boolean values will indicate whether or not it rained that day.
90
00:06:44,460 --> 00:06:48,670
So we'll say it drained on Tuesday on Monday.
91
00:06:48,760 --> 00:06:54,770
Did it rain on Wednesday then in Reno Thursday and it rained again on Friday.
92
00:06:55,230 --> 00:07:05,400
Okay so we can go ahead and pass in these vectors into the data dot Frain function so we can go ahead
93
00:07:05,400 --> 00:07:16,020
and say days Tim rain and now we outputted a data frame and let's go ahead and actually save this as
94
00:07:16,020 --> 00:07:25,990
DMF variable data frame and then pass in the vector's days Tim brain.
95
00:07:26,700 --> 00:07:31,530
And then when we check out our data frame we'll notice that the column names have been automatically
96
00:07:31,530 --> 00:07:40,050
set to be the same as the name of the vector variable and by default the rows are just indexed numbers
97
00:07:40,050 --> 00:07:41,040
.
98
00:07:41,460 --> 00:07:49,430
Since we've now Cranor a data frame we can actually go ahead and call structure on it as TR RDF that
99
00:07:49,440 --> 00:07:52,920
will give us some information about the structure of the data in the data frame.
100
00:07:52,920 --> 00:07:56,780
Or we can even call a summary on that data frame.
101
00:07:57,490 --> 00:08:04,980
Okay that's it for the basics of what a data frame is and what it looks like and are the main things
102
00:08:04,980 --> 00:08:07,420
you should have been able to get out of this lecture.
103
00:08:07,620 --> 00:08:08,760
Are the following.
104
00:08:09,000 --> 00:08:15,120
You can type in data to get a output of all the data sets in the package data sets.
105
00:08:15,170 --> 00:08:16,300
You can play around with.
106
00:08:16,320 --> 00:08:18,720
Are these already built in for you.
107
00:08:18,720 --> 00:08:24,850
Also remember you can use SDR and summary to check out data frames whether you create them or they're
108
00:08:24,850 --> 00:08:25,750
built in.
109
00:08:26,010 --> 00:08:33,000
And finally you can create your own data frames for passing and vectors into the data dot frame function
110
00:08:33,000 --> 00:08:33,780
.
111
00:08:34,080 --> 00:08:34,950
OK.
112
00:08:35,010 --> 00:08:36,600
Again that's it for the basics.
113
00:08:36,600 --> 00:08:41,150
Up next we're going to learn about selection and indexing data frame elements.
114
00:08:41,160 --> 00:08:43,130
Thanks everyone and I'll see you at the next lecture
12140
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.