7. Data Frame Training Exercises - Solutions Walkthrough

1 00:00:00,600 --> 00:00:06,140 Hello everyone and welcome to the data frames exercises solutions walkthrough lecture. 2 00:00:06,390 --> 00:00:11,700 In this lecture we'll be programming through the solutions for the data frames exercises and explaining 3 00:00:11,730 --> 00:00:12,820 as we go along. 4 00:00:13,110 --> 00:00:15,070 Let's jump to RStudio and get started. 5 00:00:15,270 --> 00:00:15,570 OK. 6 00:00:15,570 --> 00:00:21,720 So here we are in RStudio. The first exercise question was to recreate the following data frame by creating 7 00:00:21,720 --> 00:00:25,440 vectors and using the data.frame function. 8 00:00:25,440 --> 00:00:30,570 So the data frame in question that we need to recreate is in the exercise notebook, and I have also printed 9 00:00:30,570 --> 00:00:31,710 it out here. 10 00:00:31,710 --> 00:00:34,410 So it looks like we want three rows and three columns. 11 00:00:34,410 --> 00:00:38,890 Sam, Frank and Amy are the people as rows, so this is the index of the rows. 12 00:00:39,180 --> 00:00:43,180 And then we have an age column, a weight column and a sex column. 13 00:00:43,220 --> 00:00:46,650 So as the instructions say, we'll do it by creating vectors. 14 00:00:46,980 --> 00:00:53,380 So let's go ahead and make a name vector to hold the names of the people. 15 00:00:53,820 --> 00:01:03,370 So that's going to be Sam, Frank is our next one, and Amy is our last one. 16 00:01:03,610 --> 00:01:07,200 Using single or double quotes here won't really make a difference. 17 00:01:07,200 --> 00:01:10,880 The next one we'll have is age. 18 00:01:11,610 --> 00:01:17,810 And I'll make that vector have 22, 25 and 26. 19 00:01:18,210 --> 00:01:29,070 Then we have a weight column, so I'll make a vector called weight, and that will carry 150, 165, 120. 
20 00:01:29,070 --> 00:01:35,580 And then finally we have a sex column for their gender, and then we have male, 21 00:01:38,390 --> 00:01:42,820 male again, and then female. 22 00:01:42,840 --> 00:01:44,160 All right, so we have our vectors. 23 00:01:44,160 --> 00:01:48,150 Now the question is how do we combine these into a data frame? 24 00:01:48,540 --> 00:01:55,350 Well, we need to call our data.frame function and then I'm going to go ahead and say row.names 25 00:01:56,010 --> 00:02:03,570 is equal to the name vector, and that's how we can assign those row names, that index labeling, to the 26 00:02:03,570 --> 00:02:04,980 name vector. 27 00:02:04,980 --> 00:02:10,020 Then I just need to pass in the columns that I want, which in this case are just the vectors. 28 00:02:10,020 --> 00:02:17,120 So I can say age, weight and then that sex (gender) column. 29 00:02:17,220 --> 00:02:19,730 Let's go ahead and assign this to a data frame. 30 00:02:20,040 --> 00:02:26,760 Then we can print out df, and if we just read this, it looks like we get the exact same result. 31 00:02:26,820 --> 00:02:33,060 If you've got a match, fantastic. I'm going to go ahead and clear the console. 32 00:02:33,060 --> 00:02:38,610 One other thing I want to mention before we continue on to the next exercise is in case you couldn't 33 00:02:38,610 --> 00:02:44,460 figure out how to actually set the names of the rows (row.names) using this data.frame function, 34 00:02:44,790 --> 00:02:51,840 you could have also set it using the rownames function that we've seen before: rownames, and then what 35 00:02:51,840 --> 00:02:56,100 we do is just pass in your data frame and then assign a vector of names. 36 00:02:56,100 --> 00:03:04,500 So for example if we had different names such as A, B, C, just like we did for matrices, you can use this 37 00:03:04,620 --> 00:03:11,230 same functionality. That's another way you could have set those names for the rows. 38 00:03:11,250 --> 00:03:11,830 All right. 
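For reference, the first exercise described above can be sketched in R roughly as follows. The variable names (`Name`, `Age`, `Weight`, `Sex`) and the `"M"`/`"F"` labels are assumptions, since the lecture only describes them verbally.

```r
# Vectors describing three people (names and labels are assumed spellings)
Name   <- c("Sam", "Frank", "Amy")
Age    <- c(22, 25, 26)
Weight <- c(150, 165, 120)
Sex    <- c("M", "M", "F")

# Combine the vectors; row.names sets the row index labels
df <- data.frame(row.names = Name, Age, Weight, Sex)
print(df)

# Alternative: set the row names afterwards with rownames()
df_abc <- df
rownames(df_abc) <- c("A", "B", "C")
print(df_abc)
```

Either route ends with the same columns; only the row labels differ.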
39 00:03:12,060 --> 00:03:15,280 So we have age, weight, sex and A, B, C. 40 00:03:15,300 --> 00:03:18,030 Let's go ahead and continue on. 41 00:03:18,070 --> 00:03:26,390 I'm going to go ahead and clear this text here, clear the console, and put in the next exercise question 42 00:03:26,390 --> 00:03:26,980 . 43 00:03:27,060 --> 00:03:30,620 So the next exercise question was to check if mtcars is a data frame using 44 00:03:30,630 --> 00:03:32,300 is.data.frame. 45 00:03:32,310 --> 00:03:38,080 So again, mtcars is a built-in data frame in R, so let's go and just check the head of it. 46 00:03:38,100 --> 00:03:42,840 You don't need to import any libraries or do anything. You see, if I type mtcars, it will automatically 47 00:03:42,840 --> 00:03:45,350 know that you're referencing that data frame. 48 00:03:45,360 --> 00:03:49,620 Quick reminder: if you want to see what other data is available for you that's built in, 49 00:03:49,620 --> 00:03:56,520 you can call data as a function, and this little pop-up will show up with names of built-in matrices, data 50 00:03:56,520 --> 00:03:58,850 frames, vectors, etc. 51 00:03:58,850 --> 00:04:00,960 We're going to close that now. 52 00:04:01,410 --> 00:04:07,830 So we want to check if mtcars is a data frame. We can always check if an object is a particular 53 00:04:07,830 --> 00:04:10,890 type of class or a particular type of data structure, etc., 54 00:04:11,020 --> 00:04:13,540 by using the is. methodology. 55 00:04:13,560 --> 00:04:18,870 So we type is. and then we just say whatever we're actually checking for. In this case we're checking 56 00:04:18,870 --> 00:04:20,470 for is.data.frame. 57 00:04:20,670 --> 00:04:26,000 And we just pass in mtcars and it returns TRUE, which is good because mtcars is a built-in data frame 58 00:04:26,000 --> 00:04:26,550 . 59 00:04:26,820 --> 00:04:33,130 Next exercise was to use as.data.frame to convert a matrix into a data frame. 
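A minimal sketch of the `is.data.frame` check described above; the second call on an integer vector is an added illustration, not something shown in the lecture.

```r
# mtcars ships with base R (datasets package), so no library() call is needed
print(head(mtcars))

# is.* functions test whether an object has a given type or structure
is_df  <- is.data.frame(mtcars)   # a data frame
not_df <- is.data.frame(1:3)      # an integer vector, for contrast
print(c(is_df, not_df))
```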
60 00:04:33,330 --> 00:04:41,100 So just like we have these is. options, we also have as. options, and as. options will basically try to convert 61 00:04:41,520 --> 00:04:45,200 from one object or data type to another. 62 00:04:45,300 --> 00:04:47,970 So we're going to say as. Well, actually, 63 00:04:47,970 --> 00:04:58,710 first off we want to actually set up our matrix, so the matrix in this case is this mat. We assign that 64 00:04:58,710 --> 00:05:06,730 matrix and then we can say as.data.frame, pass in mat, and we get a data frame back. 65 00:05:06,730 --> 00:05:12,360 So if I just say mat by itself, notice the difference in the output display here. We can see 66 00:05:12,360 --> 00:05:18,270 it's a matrix due to the bracket notation indicating rows and columns just by index or array numbers 67 00:05:18,270 --> 00:05:18,610 . 68 00:05:18,660 --> 00:05:25,880 Here we can see the data frame actually has built-in column names and a built-in row naming scheme. 69 00:05:25,890 --> 00:05:26,210 All right. 70 00:05:26,250 --> 00:05:28,530 Moving on to the next exercise question. 71 00:05:28,860 --> 00:05:34,790 It was to set the built-in data frame mtcars as a variable df. So if I clear the console, 72 00:05:34,980 --> 00:05:40,380 all we had to do for this step was quite simple: just say df is mtcars, and what we're going to be doing 73 00:05:40,380 --> 00:05:43,880 is referring to df for the rest of the questions. 74 00:05:44,010 --> 00:05:49,240 So the next one was to display the first six rows of df. 75 00:05:49,260 --> 00:05:51,440 Question number five. How do we actually do that? 76 00:05:51,450 --> 00:05:56,510 Well, we can just call head on df, and that will automatically display the first six rows. 77 00:05:56,510 --> 00:06:00,430 If you want to display a certain number of rows from the top of your data frame, 
78 00:06:00,510 --> 00:06:06,060 you can specify a second argument in head, which is just an integer saying OK, only display the first 79 00:06:06,060 --> 00:06:09,420 two rows, six, seven rows, etc. 80 00:06:09,660 --> 00:06:17,040 The next question we had to answer was: what is the average mpg value for all the cars? 81 00:06:17,460 --> 00:06:22,000 Well, in order to answer this question, let's go in and check. 82 00:06:22,030 --> 00:06:27,300 Looks like we have an mpg column. Remember, one way of calling columns off a data frame is just 83 00:06:27,300 --> 00:06:29,220 by using the dollar sign. 84 00:06:29,220 --> 00:06:34,110 So using that methodology I can get a vector of the values, and having a vector of values 85 00:06:34,110 --> 00:06:41,370 means I can just call mean and say df$mpg. 86 00:06:42,030 --> 00:06:47,550 And there we have it: around 20.1 miles per gallon is the average mpg value for all the 87 00:06:47,550 --> 00:06:48,800 cars. 88 00:06:48,870 --> 00:06:50,800 Let's go on to the next question. 89 00:06:51,600 --> 00:06:53,960 OK, so exercise seven. 90 00:06:53,970 --> 00:06:58,040 Question number seven was to select the rows for all cars that have 6 cylinders. 91 00:06:58,090 --> 00:06:59,850 There's a couple of ways we can do this. 92 00:06:59,880 --> 00:07:06,570 One way is through bracket notation, where we can just say, since we have our data frame mtcars as df, df, 93 00:07:07,110 --> 00:07:13,060 specify the cyl column equals six. 94 00:07:13,110 --> 00:07:14,770 And then we have to add an extra comma, 95 00:07:14,850 --> 00:07:17,970 since we're looking for all the rows where that's true. 96 00:07:17,970 --> 00:07:23,740 And then that will return the rows where the data frame's cyl column is equal to six. 97 00:07:23,760 --> 00:07:29,520 So that's bracket notation. We can also use the subset function to do this. 
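The matrix conversion, `head`, and `mean` steps covered so far can be sketched as below. The matrix contents are hypothetical, since the lecture does not state what its matrix held.

```r
# Hypothetical matrix; the lecture does not state its exact contents
mat <- matrix(1:25, nrow = 5)
mat_df <- as.data.frame(mat)   # columns are auto-named V1, V2, ...
print(mat_df)

df <- mtcars
print(head(df))      # first six rows by default
print(head(df, 2))   # the second argument limits the number of rows

avg_mpg <- mean(df$mpg)   # df$mpg is a plain numeric vector
print(avg_mpg)
```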
98 00:07:29,640 --> 00:07:38,850 So we can say subset, pass in our data frame, and then say cyl equals 6 (with a double equals sign), and that'll produce the 99 00:07:38,850 --> 00:07:40,090 exact same result. 100 00:07:40,320 --> 00:07:41,740 Either method is correct. 101 00:07:41,760 --> 00:07:46,890 This is the method shown in the solution notebook, but you could have also done subset with cyl 102 00:07:47,090 --> 00:07:49,450 equal to six. There's a couple of other ways to do this, 103 00:07:49,470 --> 00:07:55,350 and later in the course we'll learn how to use the dplyr library to also filter 104 00:07:55,350 --> 00:07:57,800 out results using some special functions. 105 00:07:57,940 --> 00:08:01,020 For now either of these two methods would have been correct. 106 00:08:01,020 --> 00:08:09,680 Moving on to the next exercise, we had to select the columns am, gear and carb from the data frame. 107 00:08:09,690 --> 00:08:11,510 So how do we actually do that? 108 00:08:11,690 --> 00:08:12,750 We clear the console. 109 00:08:12,780 --> 00:08:19,650 We know if we want to select just one column we can say bracket notation, comma, and then the name of 110 00:08:19,650 --> 00:08:26,640 the column, such as am, and it'll return those vector values. If we want several columns, we can just 111 00:08:26,640 --> 00:08:29,810 pass in a vector of the column names. 112 00:08:29,910 --> 00:08:39,780 So we want am, gear, carb, and there we have it. We scroll up to actually see this. 113 00:08:39,840 --> 00:08:42,260 This is the resulting data frame. 114 00:08:42,330 --> 00:08:46,710 So you get those three columns back along with the row names that are associated with each of those 115 00:08:46,710 --> 00:08:48,300 values. 116 00:08:48,300 --> 00:08:52,590 There's a couple of other ways to do this, but this is probably the most straightforward as far as 117 00:08:52,590 --> 00:08:55,520 using bracket notation indexing to do this. 
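As a rough sketch of the two filtering approaches and the column selection just described (the intermediate variable names are illustrative only):

```r
df <- mtcars

# Rows where cyl equals 6: bracket notation (note the trailing comma)
six_bracket <- df[df$cyl == 6, ]
# Same filter with subset(); column names are used without the df$ prefix
six_subset <- subset(df, cyl == 6)
print(all.equal(six_bracket, six_subset))

# A single column selected this way comes back as a plain vector
am_values <- df[, "am"]
# Several columns: pass a character vector of column names
cols <- df[, c("am", "gear", "carb")]
print(head(cols))
```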
118 00:08:55,530 --> 00:09:02,280 Moving on to the next question, that was to create a new column called performance, which is calculated 119 00:09:02,280 --> 00:09:04,500 by horsepower divided by weight. 120 00:09:04,740 --> 00:09:06,940 Let's go ahead and do that. 121 00:09:07,320 --> 00:09:08,930 I'm going to clear the console. 122 00:09:09,720 --> 00:09:13,150 So how do we actually create a new column in a data frame? 123 00:09:13,470 --> 00:09:14,740 Well, there are several ways to do it. 124 00:09:14,760 --> 00:09:21,180 The easiest is just by specifying that column as if it already exists and then assigning it some values 125 00:09:21,180 --> 00:09:21,270 . 126 00:09:21,270 --> 00:09:24,240 In this case we want to assign horsepower divided by weight. 127 00:09:24,300 --> 00:09:30,870 So just go ahead and call those columns: horsepower off the data frame divided by weight. 128 00:09:31,340 --> 00:09:33,790 And let's go ahead and check the head of our data frame now. 129 00:09:34,350 --> 00:09:40,830 And notice we have the new performance column, and this will lead us into our next question. In our next 130 00:09:40,830 --> 00:09:46,140 question, notice that the performance column has several decimal places of precision, so it looks like it goes 131 00:09:46,140 --> 00:09:47,940 up to five decimal places. 132 00:09:48,090 --> 00:09:53,820 We want to figure out how to use round to reduce this accuracy to only two decimal places. 133 00:09:53,820 --> 00:09:55,360 And it says check help(round). 134 00:09:55,440 --> 00:09:56,580 Let's go ahead and do that. 135 00:09:56,730 --> 00:10:04,560 So if we haven't seen round before, we can say help(round) here, and we get this nice help documentation 136 00:10:04,650 --> 00:10:06,890 on the rounding of numbers. 
137 00:10:06,990 --> 00:10:12,540 There's several functions to help us round numbers, but we're looking just for round, which rounds the 138 00:10:12,540 --> 00:10:16,170 value in the first argument to the specified number of decimal places. 139 00:10:16,170 --> 00:10:24,420 So if we go ahead and copy and paste the documentation line, it looks like this, and what we end up having is 140 00:10:24,780 --> 00:10:26,250 two arguments here: 141 00:10:26,250 --> 00:10:27,600 x and digits. 142 00:10:27,610 --> 00:10:35,280 So x is the numeric vector, and what digits represents is the number of decimal places that we want to 143 00:10:35,280 --> 00:10:36,980 use. 144 00:10:36,990 --> 00:10:42,180 Let's go ahead and shift this over to the right. Now that we know how to use round, 145 00:10:42,210 --> 00:10:46,890 let's go ahead and reassign performance. 146 00:10:46,890 --> 00:10:51,170 So say performance is going to be equal to 147 00:10:51,210 --> 00:10:55,600 and we can go ahead and say df$performance again. 148 00:10:55,770 --> 00:11:01,530 And in this case what we're going to do is pass that into round. 149 00:11:01,530 --> 00:11:06,270 And the second argument we pass in is 2, which is the digits argument. To make that really clear we can 150 00:11:06,270 --> 00:11:09,600 just say digits equals 2. 151 00:11:10,320 --> 00:11:15,240 And now if I check the head of my data frame, I notice that my performance column has been truncated, 152 00:11:15,330 --> 00:11:17,640 or rounded off, to two digits. 153 00:11:17,640 --> 00:11:20,250 So it's not a straight truncation, it's just a rounding off. 154 00:11:20,250 --> 00:11:26,280 So for example 30.346 gets rounded to 30.35. 155 00:11:26,280 --> 00:11:29,580 Let's go ahead and move on to the next question. 156 00:11:29,610 --> 00:11:31,540 Next question, I'll go ahead and put it in. 
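The new-column and rounding steps above can be sketched as follows, assuming `df` is a copy of `mtcars` as set up earlier in the lecture.

```r
df <- mtcars

# Assigning to a column that does not exist yet creates it
df$performance <- df$hp / df$wt

# round(x, digits) rounds x to the given number of decimal places
df$performance <- round(df$performance, digits = 2)
print(head(df))
```

Naming the `digits` argument is optional, but it makes the intent of the second argument explicit.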
157 00:11:31,560 --> 00:11:40,080 This was: what is the average mpg for cars that have more than 100 horsepower and a weight value of more 158 00:11:40,080 --> 00:11:41,500 than 2.5? 159 00:11:41,850 --> 00:11:43,630 Let's go ahead and figure this out. 160 00:11:44,130 --> 00:11:47,330 There's a couple of ways you can solve this. 161 00:11:47,400 --> 00:11:49,880 I'll show the first method using subset. 162 00:11:50,220 --> 00:11:56,580 So our first challenge is to grab the subset of the data frame where we have more than 100 horsepower 163 00:11:56,610 --> 00:11:59,310 and a weight value of more than 2.5. 164 00:11:59,310 --> 00:12:04,960 So I can say subset, pass in my data frame, and then pass in my condition. 165 00:12:04,960 --> 00:12:08,390 So in this case I want hp greater than 100. 166 00:12:08,980 --> 00:12:17,050 And so using the and logical operator, I want wt to be also greater than 2.5. 167 00:12:17,730 --> 00:12:24,480 So if I go ahead and call that subset, I get back a subset of the data frame where this is true, and 168 00:12:24,480 --> 00:12:29,290 I can actually call columns off of that subset command. 169 00:12:29,310 --> 00:12:35,770 So I'm going to go in and clear this. From that subset command I can call a column off of it, 170 00:12:35,880 --> 00:12:45,600 mpg, which means I can take that whole statement and pass it into the mean function. 171 00:12:46,050 --> 00:12:51,210 And there you have 16.86 etc., which is the average miles per gallon for cars that 172 00:12:51,210 --> 00:12:56,430 have more than 100 horsepower and a weight value of more than 2.5. 173 00:12:56,430 --> 00:13:00,700 That's how you can solve this question using the subset function. 174 00:13:00,720 --> 00:13:04,100 Now we could also use bracket notation to do this. 175 00:13:04,190 --> 00:13:06,930 I'll go ahead and show you how we could have done that. 176 00:13:07,170 --> 00:13:13,750 We can say df and in brackets pass in the actual conditions you want. 
177 00:13:14,190 --> 00:13:21,480 So this gets a little messier because we have to specify df dollar signs, but it's essentially the same 178 00:13:21,480 --> 00:13:21,940 logic. 179 00:13:21,990 --> 00:13:32,650 We're saying df$hp greater than 100 and df$wt greater than 2.5. 180 00:13:33,150 --> 00:13:40,930 And then what we can do off of this is put a comma and call mpg. Whoops. 181 00:13:41,070 --> 00:13:45,550 And then we just see that we get the exact same results. 182 00:13:45,750 --> 00:13:54,190 So I can call mean on this entire thing, and this is how you would do it using bracket notation. 183 00:13:54,720 --> 00:13:59,320 Personally, subset looks a lot cleaner and is a lot more readable to me. 184 00:13:59,460 --> 00:14:03,130 But if you really like bracket notation you could have also done it this way. 185 00:14:03,600 --> 00:14:08,250 As I mentioned earlier, later on we'll learn how to use the dplyr library to try to clean up these 186 00:14:08,250 --> 00:14:12,180 sort of filter instructions with a nice clean syntax. 187 00:14:12,180 --> 00:14:18,320 Finally, the last question was this: what is the mpg of the Hornet Sportabout? 188 00:14:18,390 --> 00:14:20,220 So how do we actually find that? 189 00:14:20,580 --> 00:14:28,580 Well, I can grab my data frame and then just pass in the name of that car, Hornet 190 00:14:28,590 --> 00:14:31,660 Sportabout, comma. 191 00:14:31,800 --> 00:14:37,920 So I pass this first because that's the actual row name, then a comma because I want all the columns for that. 192 00:14:37,950 --> 00:14:41,590 So if I just do this, it'll return the Hornet Sportabout row. 193 00:14:42,030 --> 00:14:47,970 And if I want the mpg of that, I can just say dollar sign mpg, and there we have 18.7, 194 00:14:48,040 --> 00:14:51,140 the mpg of the Hornet Sportabout car. 195 00:14:51,180 --> 00:14:51,860 OK. 
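The last two exercises can be sketched as below, again assuming `df` is a copy of `mtcars`; the result variable names are illustrative.

```r
df <- mtcars

# Average mpg for cars with hp > 100 and wt > 2.5, using subset()
avg_subset <- mean(subset(df, hp > 100 & wt > 2.5)$mpg)
# Same calculation with bracket notation; every column needs the df$ prefix
avg_bracket <- mean(df[df$hp > 100 & df$wt > 2.5, ]$mpg)
print(c(avg_subset, avg_bracket))

# Look up one row by its row name, then take its mpg value
hornet_mpg <- df["Hornet Sportabout", ]$mpg
print(hornet_mpg)
```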
196 00:14:52,110 --> 00:14:56,870 That's it for this lecture on the solutions walkthrough for the data frames exercise. 197 00:14:56,880 --> 00:15:02,130 If any of that was unclear, make sure you reference the notebook, work through the exercises again, or reference 198 00:15:02,130 --> 00:15:05,600 the data frames lectures from the data frame section of the course. 199 00:15:05,610 --> 00:15:07,390 Thanks everyone and I'll see you at the next lecture.
