1
00:00:00,600 --> 00:00:06,140
Hello everyone, and welcome to the data frames exercise solutions walkthrough lecture.
2
00:00:06,390 --> 00:00:11,700
In this lecture we're going to be programming through the solutions for the data frames exercises and explaining
3
00:00:11,730 --> 00:00:12,820
as we go along.
4
00:00:13,110 --> 00:00:15,070
Let's jump to RStudio and get started.
5
00:00:15,270 --> 00:00:15,570
OK.
6
00:00:15,570 --> 00:00:21,720
So here we are in RStudio, and the first exercise question was to recreate the following data frame by creating
7
00:00:21,720 --> 00:00:25,440
vectors and using the data.frame function.
8
00:00:25,440 --> 00:00:30,570
So the data frame in question that we need to recreate is in the exercise notebook, and I have also printed
9
00:00:30,570 --> 00:00:31,710
it out here.
10
00:00:31,710 --> 00:00:34,410
So it looks like we want three rows and three columns.
11
00:00:34,410 --> 00:00:38,890
Sam, Frank, and Amy are the people as rows, so this is the index of the rows.
12
00:00:39,180 --> 00:00:43,180
And then we have an age column, a weight column, and a sex column.
13
00:00:43,220 --> 00:00:46,650
So as the instructions say we'll do it by creating vectors.
14
00:00:46,980 --> 00:00:53,380
So let's go ahead and start by making a name vector to hold the names of the people.
15
00:00:53,820 --> 00:01:03,370
So that's going to be Sam, Frank is our next one, and Amy is our last one.
16
00:01:03,610 --> 00:01:07,200
Using single or double quotes here won't really make a difference.
17
00:01:07,200 --> 00:01:10,880
The next one we'll have is age.
18
00:01:11,610 --> 00:01:17,810
And I'll make that vector have 22, 25, and 26.
19
00:01:18,210 --> 00:01:29,070
Then we have a weight column, so I'll make a vector called weight, and that will carry 150, 165, and 120.
20
00:01:29,070 --> 00:01:35,580
And then finally we have a sex column for their gender, and there we have male,
21
00:01:38,390 --> 00:01:42,820
male again and then female.
22
00:01:42,840 --> 00:01:44,160
All right, so we have our vectors.
23
00:01:44,160 --> 00:01:48,150
Now the question is how do we combine these into a data frame.
24
00:01:48,540 --> 00:01:55,350
Well, we need to call our data.frame function, and then I'm going to go ahead and say row.names
25
00:01:56,010 --> 00:02:03,570
is equal to the name vector and that's how we can assign those row names that index labeling to the
26
00:02:03,570 --> 00:02:04,980
name vector.
27
00:02:04,980 --> 00:02:10,020
Then I just need to pass in the columns that I want, which in this case are just the vectors.
28
00:02:10,020 --> 00:02:17,120
So I can say age, weight, and then that sex column.
29
00:02:17,220 --> 00:02:19,730
Let's go ahead and assign this to a data frame.
30
00:02:20,040 --> 00:02:26,760
Then we can print out df, and if we just read this, it looks like we get the exact same result.
31
00:02:26,820 --> 00:02:33,060
If you've got a match, fantastic. I'm going to go ahead and clear the console.
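As a rough sketch of what was typed in this step: the vector names and the M/F abbreviations below are assumptions, since the recording only gives them verbally.

```r
# Vectors described in the walkthrough (names and M/F coding are assumed)
name   <- c("Sam", "Frank", "Amy")
age    <- c(22, 25, 26)
weight <- c(150, 165, 120)
sex    <- c("M", "M", "F")

# row.names supplies the row labels; the other vectors become columns
df <- data.frame(row.names = name, age, weight, sex)
print(df)
```

Because the vectors are passed unnamed, data.frame uses the variable names (age, weight, sex) as the column names.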
32
00:02:33,060 --> 00:02:38,610
One other thing I want to mention before we continue onto the next exercise is in case you couldn't
33
00:02:38,610 --> 00:02:44,460
figure out how to actually set the row names using the row.names input to the data.frame function,
34
00:02:44,790 --> 00:02:51,840
you could have also set them using the rownames function that we've seen before: rownames, and then what
35
00:02:51,840 --> 00:02:56,100
we do is just pass in your data frame and then assign a vector of names.
36
00:02:56,100 --> 00:03:04,500
So for example, if we had different names such as a, b, c, just like we did for matrices, you can use this
37
00:03:04,620 --> 00:03:11,230
same functionality. So that's another way you could have set those names for the rows.
38
00:03:11,250 --> 00:03:11,830
All right.
39
00:03:12,060 --> 00:03:15,280
So we have age, weight, sex, and a, b, c.
40
00:03:15,300 --> 00:03:18,030
Let's go ahead and continue on.
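The alternative just described might look like the following sketch. The data frame is rebuilt here so the snippet runs on its own, and the labels a, b, c are the placeholder names mentioned in the lecture.

```r
# Same columns as before, built without row names this time
df <- data.frame(age    = c(22, 25, 26),
                 weight = c(150, 165, 120),
                 sex    = c("M", "M", "F"))

# Assign row labels after the fact, the same way as for a matrix
rownames(df) <- c("a", "b", "c")
print(df)
```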
41
00:03:18,070 --> 00:03:26,390
I'm going to go ahead and clear this text here, clear the console, and put in the next exercise question
42
00:03:26,390 --> 00:03:26,980
.
43
00:03:27,060 --> 00:03:30,620
So the next exercise question was to check if mtcars is a data frame using
44
00:03:30,630 --> 00:03:32,300
is.data.frame.
45
00:03:32,310 --> 00:03:38,080
So again, mtcars is a built-in data frame in R. So let's go ahead and just check the head of it.
46
00:03:38,100 --> 00:03:42,840
You don't need to import any libraries or do anything. If you type mtcars, R will automatically
47
00:03:42,840 --> 00:03:45,350
know that you're referencing that data frame.
48
00:03:45,360 --> 00:03:49,620
Quick reminder: if you want to see what other data is available for you that's built in,
49
00:03:49,620 --> 00:03:56,520
you can call data as a function, and this little pop-up will show up with names of built-in matrices, data
50
00:03:56,520 --> 00:03:58,850
frames, vectors, etc.
51
00:03:58,850 --> 00:04:00,960
We're going to close that now.
52
00:04:01,410 --> 00:04:07,830
So we want to check if mtcars is a data frame. We can always check if an object is a particular
53
00:04:07,830 --> 00:04:10,890
type of class or a particular type of data structure, etc.,
54
00:04:11,020 --> 00:04:13,540
by using the is. methodology.
55
00:04:13,560 --> 00:04:18,870
So that's is followed by a dot, and then we just say whatever we're actually checking for. In this case we're checking
56
00:04:18,870 --> 00:04:20,470
for is.data.frame.
57
00:04:20,670 --> 00:04:26,000
And we just pass in mtcars and it returns TRUE, which is good because mtcars is a built-in data frame
58
00:04:26,000 --> 00:04:26,550
.
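A small sketch of the check described above. The extra matrix check is an added illustration, not something shown in the lecture.

```r
# mtcars ships with R, so no library() call is needed
first_rows <- head(mtcars)
print(first_rows)

# is.data.frame returns a single logical value
is_df <- is.data.frame(mtcars)
print(is_df)

# For contrast (added example): a matrix is not a data frame
is_df_matrix <- is.data.frame(as.matrix(mtcars))
print(is_df_matrix)
```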
59
00:04:26,820 --> 00:04:33,130
The next exercise was to use as.data.frame to convert a matrix into a data frame.
60
00:04:33,330 --> 00:04:41,100
So just like we have these is. options, we also have as. options, and the as. options will basically try to convert
61
00:04:41,520 --> 00:04:45,200
from one object or a data type to another.
62
00:04:45,300 --> 00:04:47,970
So we're going to say as... well, actually,
63
00:04:47,970 --> 00:04:58,710
first off we want to actually set up our matrix. So the matrix in this case is this mat. We assign that
64
00:04:58,710 --> 00:05:06,730
matrix, and then we can say as.data.frame, pass in mat, and we get a data frame back.
65
00:05:06,730 --> 00:05:12,360
So if I just say mat by itself, notice the difference in the output display. Here we can see
66
00:05:12,360 --> 00:05:18,270
it's a matrix due to the bracket notation indicating rows and columns just by index or array numbers
67
00:05:18,270 --> 00:05:18,610
.
68
00:05:18,660 --> 00:05:25,880
Here we can see the data frame actually has built-in column names and a built-in row naming scheme.
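To make the difference concrete, here is a minimal sketch. The contents of mat are an assumption (the recording never states them); any numeric matrix would behave the same way.

```r
# Hypothetical matrix; the lecture's actual values are not given in the audio
mat <- matrix(1:25, nrow = 5)

# as.data.frame converts the matrix and generates default column names
df_from_mat <- as.data.frame(mat)

print(mat)          # rows and columns shown with [1,] and [,1] style indices
print(df_from_mat)  # columns get default names V1, V2, ... and rows are numbered
```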
69
00:05:25,890 --> 00:05:26,210
All right.
70
00:05:26,250 --> 00:05:28,530
Moving on to the next exercise question.
71
00:05:28,860 --> 00:05:34,790
It was to set the built-in data frame mtcars as a variable df. So let me clear the console.
72
00:05:34,980 --> 00:05:40,380
All we had to do for this step was quite simple: just say df is mtcars. And what we're going to be doing
73
00:05:40,380 --> 00:05:43,880
is referring to df for the rest of the questions.
74
00:05:44,010 --> 00:05:49,240
So the next one was to display the first six rows of df.
75
00:05:49,260 --> 00:05:51,440
Question number five how do we actually do that.
76
00:05:51,450 --> 00:05:56,510
Well, we can just call head on df, and that'll automatically display the first six rows.
77
00:05:56,510 --> 00:06:00,430
If you want to display a certain number of rows from the top of your data frame,
78
00:06:00,510 --> 00:06:06,060
you can specify a second argument in head, which is just an integer saying, OK, only display the first
79
00:06:06,060 --> 00:06:09,420
two rows, six rows, seven rows, etc.
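A short sketch of both calls. The variable names top_six and top_two are made up here so the results can be inspected.

```r
df <- mtcars

# With no second argument, head returns the first six rows
top_six <- head(df)
print(top_six)

# The second argument sets how many rows to return
top_two <- head(df, 2)
print(top_two)
```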
80
00:06:09,660 --> 00:06:17,040
The next question we had to answer was: what is the average mpg value for all the cars?
81
00:06:17,460 --> 00:06:22,000
Well, in order to answer this question, let's go ahead and check.
82
00:06:22,030 --> 00:06:27,300
Looks like we have an mpg column. Remember that one way of calling columns off a data frame is just
83
00:06:27,300 --> 00:06:29,220
by using the dollar sign.
84
00:06:29,220 --> 00:06:34,110
So using that methodology I can get a vector of the values, and if I have a vector of values,
85
00:06:34,110 --> 00:06:41,370
it means I can just call mean and say df$mpg.
86
00:06:42,030 --> 00:06:47,550
And there we have it: around 20.1 miles per gallon is the average mpg value for all the
87
00:06:47,550 --> 00:06:48,800
cars.
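As a sketch, the whole step is two lines. The name avg_mpg is introduced here only for illustration.

```r
df <- mtcars

# df$mpg pulls the column out as a numeric vector, which mean() accepts directly
avg_mpg <- mean(df$mpg)
print(avg_mpg)  # about 20.09
```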
88
00:06:48,870 --> 00:06:50,800
Let's go ahead and go on to the next question.
89
00:06:51,600 --> 00:06:53,960
OK so exercise seven.
90
00:06:53,970 --> 00:06:58,040
Question number seven was to select the rows where all cars have 6 cylinders.
91
00:06:58,090 --> 00:06:59,850
There's a couple of ways we can do this.
92
00:06:59,880 --> 00:07:06,570
One way is through bracket notation, where we can just say, since we have our data frame mtcars as df, df, and
93
00:07:07,110 --> 00:07:13,060
specify that the cyl column equals 6.
94
00:07:13,110 --> 00:07:14,770
And then we have to add an extra comma.
95
00:07:14,850 --> 00:07:17,970
Since we're looking for all the rows where that's true.
96
00:07:17,970 --> 00:07:23,740
And then that will return where the data frame cylinder column has equality with six.
97
00:07:23,760 --> 00:07:29,520
So that's one way, using bracket notation. We can also use the subset function to do this.
98
00:07:29,640 --> 00:07:38,850
So we can say subset, pass in our data frame, and then say cyl equals 6, and that'll produce the
99
00:07:38,850 --> 00:07:40,090
exact same result.
100
00:07:40,320 --> 00:07:41,740
Either method is correct.
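Both approaches, sketched side by side. The names six_bracket and six_subset are made up for this example.

```r
df <- mtcars

# Bracket notation: logical condition before the comma, nothing after it,
# which keeps every column for the matching rows
six_bracket <- df[df$cyl == 6, ]

# subset: column names can be used directly inside the condition
six_subset <- subset(df, cyl == 6)

print(six_bracket)
print(six_subset)
```

Note the double equals sign: a single = inside subset would be treated as a named argument rather than a comparison.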
101
00:07:41,760 --> 00:07:46,890
This is the method shown in the solution notebook, but you could have also done subset where cyl equals
102
00:07:47,090 --> 00:07:49,450
6. There's a couple of other ways to do this.
103
00:07:49,470 --> 00:07:55,350
And later on in the course we'll learn how to use the dplyr library to also filter
104
00:07:55,350 --> 00:07:57,800
out results using some special functions.
105
00:07:57,940 --> 00:08:01,020
For now either of these two methods would have been correct.
106
00:08:01,020 --> 00:08:09,680
Moving on to the next exercise, we had to select the columns am, gear, and carb from the data frame.
107
00:08:09,690 --> 00:08:11,510
So how do we actually do that.
108
00:08:11,690 --> 00:08:12,750
Let me clear the console.
109
00:08:12,780 --> 00:08:19,650
We know if we want to select just one column we can say bracket notation comma and then the name of
110
00:08:19,650 --> 00:08:26,640
the column, such as am, and that'll return those vector values. If we want several columns, we can just
111
00:08:26,640 --> 00:08:29,810
pass in a vector of the column names.
112
00:08:29,910 --> 00:08:39,780
So we want am, gear, and carb, and there we have it. If we scroll up to actually see this,
113
00:08:39,840 --> 00:08:42,260
This is the resulting data frame.
114
00:08:42,330 --> 00:08:46,710
So you get those three columns back along with the row names that are associated with each of those
115
00:08:46,710 --> 00:08:48,300
values.
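A sketch of single-column versus multi-column selection. The variable names am_only and three_cols are illustrative only.

```r
df <- mtcars

# One column name after the comma returns a plain vector
am_only <- df[, "am"]
print(am_only)

# A character vector of names returns a data frame with just those columns
three_cols <- df[, c("am", "gear", "carb")]
print(head(three_cols))
```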
116
00:08:48,300 --> 00:08:52,590
There's a couple of other ways to do this, but this is probably the most straightforward as far as
117
00:08:52,590 --> 00:08:55,520
using bracket notation indexing to do this.
118
00:08:55,530 --> 00:09:02,280
Moving on to the next question that was to create a new column called performance which is calculated
119
00:09:02,280 --> 00:09:04,500
by horsepower divided by weight.
120
00:09:04,740 --> 00:09:06,940
Let's go ahead and do that.
121
00:09:07,320 --> 00:09:08,930
I'm going to clear the console.
122
00:09:09,720 --> 00:09:13,150
So how do we actually create a new column in a data frame?
123
00:09:13,470 --> 00:09:14,740
Well there are several ways to do it.
124
00:09:14,760 --> 00:09:21,180
The easiest is just by specifying that column as if it already exists and then assigning it some values
125
00:09:21,180 --> 00:09:21,270
.
126
00:09:21,270 --> 00:09:24,240
In this case we want to assign horsepower divided by weight.
127
00:09:24,300 --> 00:09:30,870
So just go ahead and call those columns: horsepower off the data frame, divided by weight.
128
00:09:31,340 --> 00:09:33,790
And let's go ahead and check the head of our data frame now.
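The assignment just described can be sketched like this. In mtcars the horsepower and weight columns are named hp and wt.

```r
df <- mtcars

# Assigning to a column that does not exist yet creates it
df$performance <- df$hp / df$wt

print(head(df))
```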
129
00:09:34,350 --> 00:09:40,830
And notice we have the new performance column, and this will lead us into our next question. Our next
130
00:09:40,830 --> 00:09:46,140
question notes that the performance column has several decimal places of precision; it looks like it goes
131
00:09:46,140 --> 00:09:47,940
up to five decimal places.
132
00:09:48,090 --> 00:09:53,820
We want to figure out how to use round to reduce this accuracy to only two decimal places.
133
00:09:53,820 --> 00:09:55,360
And it says to check help(round).
134
00:09:55,440 --> 00:09:56,580
Let's go ahead and do that.
135
00:09:56,730 --> 00:10:04,560
So if we haven't seen round before, we can say help(round), and we get this nice help documentation
136
00:10:04,650 --> 00:10:06,890
on the rounding of numbers.
137
00:10:06,990 --> 00:10:12,540
There's several functions to help us round numbers, but we're looking just for round, which rounds the
138
00:10:12,540 --> 00:10:16,170
value in the first argument to the specified number of decimal places.
139
00:10:16,170 --> 00:10:24,420
So if we go ahead and copy and paste the documentation line, it looks like this, and what we end up having is
140
00:10:24,780 --> 00:10:26,250
two arguments here.
141
00:10:26,250 --> 00:10:27,600
x and digits.
142
00:10:27,610 --> 00:10:35,280
So x is the numeric vector, and what digits represents is the number of decimal places that we want to
143
00:10:35,280 --> 00:10:36,980
use.
144
00:10:36,990 --> 00:10:42,180
Let's go ahead and shift this over to the right now that we know how to use round.
145
00:10:42,210 --> 00:10:46,890
Let's go ahead and reassign performance.
146
00:10:46,890 --> 00:10:51,170
So we'll say df$performance is going to be equal to...
147
00:10:51,210 --> 00:10:55,600
And we can go ahead and say df$performance again.
148
00:10:55,770 --> 00:11:01,530
And in this case what we're going to do is pass that into round.
149
00:11:01,530 --> 00:11:06,270
And the second argument we pass in is 2, which is the digits argument. To make that really clear, we can
150
00:11:06,270 --> 00:11:09,600
just say digits equals 2.
151
00:11:10,320 --> 00:11:15,240
And now if I check the head of my data frame I notice that my performance column has been truncated
152
00:11:15,330 --> 00:11:17,640
or rounded off to two digits.
153
00:11:17,640 --> 00:11:20,250
So it's not a straight truncation it's just a rounding off.
154
00:11:20,250 --> 00:11:26,280
So for example, 30.346 gets rounded to 30.35.
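A sketch of the rounding step. The performance column is recreated first so the snippet runs on its own.

```r
df <- mtcars
df$performance <- df$hp / df$wt

# Overwrite the column with a version rounded to two decimal places
df$performance <- round(df$performance, digits = 2)

print(head(df))
```

The 30.35 mentioned above is the Valiant row: 105 / 3.46 is roughly 30.3468 before rounding.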
155
00:11:26,280 --> 00:11:29,580
Let's go ahead and move on to the next question.
156
00:11:29,610 --> 00:11:31,540
Next question, let me go ahead and enter it.
157
00:11:31,560 --> 00:11:40,080
This was: what is the average mpg for cars that have more than 100 horsepower and a weight value of more
158
00:11:40,080 --> 00:11:41,500
than 2.5.
159
00:11:41,850 --> 00:11:43,630
Let's go ahead and figure this out.
160
00:11:44,130 --> 00:11:47,330
There's a couple of ways you can solve this.
161
00:11:47,400 --> 00:11:49,880
I'll show the first method using subset.
162
00:11:50,220 --> 00:11:56,580
So our first challenge is to grab the subset of the data frame where we have more than 100 horsepower
163
00:11:56,610 --> 00:11:59,310
and a weight value of more than two point five.
164
00:11:59,310 --> 00:12:04,960
So I can say subset pass in my data frame and then pass in my condition.
165
00:12:04,960 --> 00:12:08,390
So in this case I want horsepower greater than 100.
166
00:12:08,980 --> 00:12:17,050
And so, using the and logical operator, I want weight to also be greater than 2.5.
167
00:12:17,730 --> 00:12:24,480
So if I go ahead and call that subset, I get back a subset of the data frame where this is true, and
168
00:12:24,480 --> 00:12:29,290
I can actually call columns off of that subset command.
169
00:12:29,310 --> 00:12:35,770
So I'm going to go ahead and clear this. From that subset command I can call a column off of it,
170
00:12:35,880 --> 00:12:45,600
mpg, which means I can take that whole statement and pass it into the mean function.
171
00:12:46,050 --> 00:12:51,210
And there you have 16.86 etc., which is the average miles per gallon for cars that
172
00:12:51,210 --> 00:12:56,430
have more than 100 horsepower and a weight value of more than 2.5.
173
00:12:56,430 --> 00:13:00,700
That's how you can solve this question using the subset function.
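The subset approach, sketched step by step. The intermediate names heavy_strong and avg_mpg_subset are made up for readability; the lecture nests everything in one expression.

```r
df <- mtcars

# Keep rows where both conditions hold; & combines them element by element
heavy_strong <- subset(df, hp > 100 & wt > 2.5)

# Pull the mpg column off the result and average it
avg_mpg_subset <- mean(heavy_strong$mpg)
print(avg_mpg_subset)  # about 16.86

# Equivalent one-liner, as typed in the lecture
print(mean(subset(df, hp > 100 & wt > 2.5)$mpg))
```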
174
00:13:00,720 --> 00:13:04,100
Now we could also use bracket notation to do this.
175
00:13:04,190 --> 00:13:06,930
I'll go ahead and show you how we could have done that.
176
00:13:07,170 --> 00:13:13,750
We can say df and in brackets pass in the actual conditions you want.
177
00:13:14,190 --> 00:13:21,480
So this gets a little messier because we have to specify df with dollar signs, but it's essentially the same
178
00:13:21,480 --> 00:13:21,940
logic.
179
00:13:21,990 --> 00:13:32,650
We're saying df$hp greater than 100 and df$wt greater than 2.5.
180
00:13:33,150 --> 00:13:40,930
And then what we can do off of this is put a comma, then call mpg. Whoops.
181
00:13:41,070 --> 00:13:45,550
And then we just see that we get the exact same results.
182
00:13:45,750 --> 00:13:54,190
So I can call mean on this entire thing, and this is how you would do it using bracket notation.
183
00:13:54,720 --> 00:13:59,320
Personally, subset looks a lot cleaner and is a lot more readable to me.
184
00:13:59,460 --> 00:14:03,130
But if you really like bracket notation you could have also done it this way.
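For comparison, a sketch of the bracket-notation version. The name avg_mpg_bracket is illustrative.

```r
df <- mtcars

# Inside brackets the columns must be qualified with df$
avg_mpg_bracket <- mean(df[df$hp > 100 & df$wt > 2.5, ]$mpg)
print(avg_mpg_bracket)  # same value as the subset version
```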
185
00:14:03,600 --> 00:14:08,250
As I mentioned earlier, later on we'll learn how to use the dplyr library to try to clean up these
186
00:14:08,250 --> 00:14:12,180
sort of filter instructions with a nice clean syntax.
187
00:14:12,180 --> 00:14:18,320
Finally, the last question was this: what is the mpg of the Hornet Sportabout?
188
00:14:18,390 --> 00:14:20,220
So how do we actually find that.
189
00:14:20,580 --> 00:14:28,580
Well, I can grab my data frame and then just pass in the name of that car, Hornet
190
00:14:28,590 --> 00:14:31,660
Sportabout, comma.
191
00:14:31,800 --> 00:14:37,920
So I pass this first because that's the actual row name, then a comma because I want all the columns for that.
192
00:14:37,950 --> 00:14:41,590
So if I just do this, it'll return the Hornet Sportabout row.
193
00:14:42,030 --> 00:14:47,970
And if I want the mpg of that, I can just say dollar sign mpg, and there we have 18.7,
194
00:14:48,040 --> 00:14:51,140
the mpg of the Hornet Sportabout car.
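A sketch of the row-name lookup. The names hornet_row and hornet_mpg are introduced here only for illustration.

```r
df <- mtcars

# A row name before the comma selects that row; the empty slot keeps all columns
hornet_row <- df["Hornet Sportabout", ]
print(hornet_row)

# Call the column off the one-row result
hornet_mpg <- df["Hornet Sportabout", ]$mpg
print(hornet_mpg)  # 18.7
```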
195
00:14:51,180 --> 00:14:51,860
OK.
196
00:14:52,110 --> 00:14:56,870
That's it for this lecture on the solutions walkthrough for the data frames exercise.
197
00:14:56,880 --> 00:15:02,130
If any of that was unclear, make sure you reference the notebook, work through the exercises again, or reference
198
00:15:02,130 --> 00:15:05,600
the data frames lectures from the data frame section of the course.
199
00:15:05,610 --> 00:15:07,390
Thanks everyone, and I'll see you at the next lecture.