5. Overview of Data Frame Operations - Part 2

Hello everyone, and welcome to Part 2 of the Overview of Data Frame Operations lecture. In this lecture we're going to continue building our understanding of some of the most common data frame operations. Let's go ahead and jump to RStudio and get started.

All right, here we are in RStudio, right where we left off in the last lecture, talking about referencing columns. We're going to continue talking about data frame operations by talking about adding rows to a data frame. Let's go ahead and clear the console and talk about topic number seven, adding rows. I'll go ahead and create a data frame, df2, by calling data.frame and passing in col.name.1 set equal to 2000 and col.name.2 set equal to the string, or character, 'new'. If I take a look at df2, notice it has the same column names as the data frame we were working with before, df, except now I have two new entries: the values 2000 and 'new'. In order to bind this new row to our data frame, all we have to do is use the rbind function that we already used in the past. So I can say dfnew, call rbind, for row bind, and then pass in my original data frame, df, and then df2, which is the data frame that I want to bind as a new row.
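The row-binding step just described can be sketched as follows. The contents of df are not shown in the transcript, so the ten-row df below is a hypothetical stand-in; only the column names, the values 2000 and 'new', and the rbind call come from the lecture.

```r
# Hypothetical stand-in for the lecture's original data frame `df`
df <- data.frame(col.name.1 = 1:10, col.name.2 = letters[1:10])

# One-row data frame with the same column names as df
df2 <- data.frame(col.name.1 = 2000, col.name.2 = "new")

# rbind appends df2 as a new last row; column names must match
dfnew <- rbind(df, df2)
tail(dfnew, 3)
```

The column names of the two data frames have to agree for rbind to line the values up.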
And now if I take a look at dfnew, you'll notice that down at the bottom we have a new row with the new values, 2000 and 'new'.

Now let's shift our focus to adding new columns to a data frame. There are a couple of different ways to do this, and I'm going to show you a few of them. I'm going to go ahead and clear the console and call df, which is the original data frame: col.name.1, col.name.2, rows one through ten. I can add a new column using the dollar sign method: type df, a dollar sign, a new column name, and then assign whatever you want the column to be. So imagine I wanted a new column that was just double the values of col.name.1. I can go ahead and say two times df dollar sign col.name.1. And now if we look at df, you'll notice we have a new column, newcol, and it's just double the value of col.name.1. That's one way you can quickly create new columns on a data frame. Instead of using something like rbind or cbind that we saw earlier, it's much easier to just name your new column directly and assign some values to it. Keep in mind that the number of elements you assign should line up with the number of rows in your data frame.
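A minimal sketch of the dollar sign method described above, again assuming a hypothetical ten-row df with the lecture's two column names:

```r
# Hypothetical stand-in for the lecture's `df`
df <- data.frame(col.name.1 = 1:10, col.name.2 = letters[1:10])

# Assigning to a name that does not exist yet creates the column
df$newcol <- 2 * df$col.name.1
head(df)
```

The right-hand side here has exactly one value per row, which is the "line up" condition mentioned in the lecture.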
This sort of operation is also really useful if you want to make copies of columns. So for instance, if we take a look at df, we have col.name.1, col.name.2, and that new column. Let's say I wanted to make a copy of my new column. I could say df dollar sign newcol and have that be equal to df dollar sign newcol, but on the left-hand side, instead of calling it newcol, I'm going to call it newcol.copy. Now if I do this and check out the head of df, you'll see I have a copy of that new column.

You can also use any of the other column references we've talked about. So for instance, I could say df, brackets, comma, and then put in the string 'newcol.copy2', and then assign df dollar sign newcol to it. So now if I check the head of df, I have the second copy here: newcol, newcol.copy, and newcol.copy2. The only difference between the line that I'm highlighting and the previous operation is the way I'm addressing the column. The classic notation is the dollar sign method, but you can also use the brackets-and-comma method for denoting that new column. It's really up to you and what you feel more comfortable with.
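The two ways of copying a column can be compared side by side. This is a sketch on a hypothetical df; the names newcol.copy and newcol.copy2 are reconstructed from the audio and may not match the instructor's screen exactly.

```r
# Hypothetical stand-in for the lecture's `df`, including newcol
df <- data.frame(col.name.1 = 1:10, col.name.2 = letters[1:10])
df$newcol <- 2 * df$col.name.1

# Dollar sign method
df$newcol.copy <- df$newcol

# Brackets-and-comma method: empty row slot, column name as a string
df[, "newcol.copy2"] <- df$newcol

head(df)
```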
But the basic premise is that you refer to a column as if it's already on your data frame, and remember to assign it some new values. That's the basic way of adding new columns to your data frame.

So we've gone over adding rows and adding columns. Up next is setting column names. We already went over the fact that if we just say colnames, we can get back the names of the columns of our data frame. If we want to actually rename columns, what we can do is use colnames and pass in our data frame, and there are two things we can do. If we wanted to rename all of them at once, we could just assign a vector of new names. So I could say one, and notice I'm passing in characters here, these aren't actually integers, then two, three, four, and five. I have five columns, so we'll rename them to just those numbers, one through five. Then if we check out the head of our data frame, notice that instead of the original column names we just have 1, 2, 3, 4, and 5. So that's what you could do if you wanted to rename all the columns at once.

If you just wanted to rename a single column, what you'd end up doing is calling colnames on df and then, with brackets, selecting which column number you want to rename. So for instance, let's say I want to rename the first column. I'll say one, and then I'll assign 'new column name', passing that as a string.
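Both renaming patterns can be sketched on a small five-column data frame. The data frame below and its original names a through e are hypothetical; the colnames assignments follow the lecture.

```r
# Hypothetical five-column data frame
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)

# Rename every column at once: a character vector, one name per column
colnames(df) <- c("1", "2", "3", "4", "5")

# Rename a single column: the index is an integer, the new name a string
colnames(df)[1] <- "new column name"

colnames(df)
```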
And if I check the head of df, notice I have a new column name for that first column in my data frame. And again, a quick note here: this index is an integer, not a character.

All right, so that's the basic overview of setting column names in a data frame. We have a few more topics to go over: selecting multiple rows, selecting multiple columns, and then dealing with missing data. Let's talk about how to select multiple rows. We've already gone over how to select a single row in the first part of this lecture series, so let's talk about selecting multiple rows. It's actually quite easy, and it's really similar to how we selected multiple rows in a matrix. All you have to do is put in the name of your data frame, and then you can slice the rows you want. So if you wanted the first ten rows, you would just use slicing notation, one colon ten, followed by a comma, and that's one way of selecting the first ten rows. You can also select the first, let's say, three rows, and now it returns the first three rows. This is essentially the same as calling head on your data frame with the specific number of rows that you want returned. Now, the way you can do this is just by using head.
Remember that if we just say head of the data frame, it'll return the first six rows, but you can always specify how many of the top rows you want back by passing that as a second argument. So let's say I want the first seven rows of my data frame: I can say head of df, comma, seven, and it'll return the first seven rows.

You can also take advantage of the negative sign to select everything but a certain row. So imagine I wanted to select everything but row 2. I'm going to go ahead and clear the console here. So we have our data frame; let's just say head of the data frame. I want to select everything but the second row. Notice how the second row has a bunch of unwieldy numbers there, and we want to select everything but that second row. With brackets, if I say two, comma, that would have selected the second row, but if I say negative two, comma, that selects everything but the second row. OK. You can use the negative sign in a few other operations in R in order to say "everything but", and we'll go over those as they come up throughout the course.

Finally, I want to go over conditional selection, and in order to do this we're going to be using the mtcars data frame. So that's mtcars, and it looks like this. We can do conditional selection on a data frame by passing in the logical conditions we want to filter by.
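Before the conditional-selection syntax, here is a brief sketch of the row-selection idioms covered so far: slicing, head with a row count, and negative indexing. The df below is a hypothetical stand-in for the lecture's data frame.

```r
# Hypothetical stand-in for the lecture's `df`
df <- data.frame(col.name.1 = 1:10, col.name.2 = letters[1:10])

# Slice the first three rows; the trailing comma means "all columns"
first3 <- df[1:3, ]

# head() with a second argument returns the same top rows
first3.head <- head(df, 3)
first7 <- head(df, 7)

# A negative index drops that row and keeps everything else
no.row2 <- df[-2, ]
```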
The syntax for that is as follows. I'll say something like mtcars, brackets, and then let's say I wanted to get back every car, or every row, where the mpg was greater than 20. What I would then end up doing is passing in the name of my data frame, in this case mtcars, dollar sign, mpg, which is the column I'm interested in, and I'll say greater than 20, so we use this comparison operator. All right. If I just do this, I'll get back an error, and the error I'm getting is "undefined columns selected". What I actually need to do is remember to pass in a comma here. Students sometimes forget to pass in that comma and they get this "undefined columns selected" error. So remember that if you're getting "undefined columns selected", it's because you forgot to say, "give me back all the columns for that." And now we get back this data frame. We can break this down by thinking: we're asking R for the mtcars data frame for the rows where this statement is true, that is, where the mpg column is greater than 20. So we're just saying: give me back the rows where this is true, comma, all the columns for that.
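The conditional selection just described, including the missing-comma mistake, can be sketched with the built-in mtcars data set. The variable names keep, high.mpg, and err are illustrative, and tryCatch is used here only so the error can be captured instead of stopping the script.

```r
# Logical vector: TRUE for rows where mpg exceeds 20
keep <- mtcars$mpg > 20

# Rows where the condition holds, all columns (note the trailing comma)
high.mpg <- mtcars[keep, ]

# Without the comma, R treats the logical vector as a column selector
# and raises an error, captured here as a message string
err <- tryCatch(mtcars[keep], error = function(e) conditionMessage(e))
err
```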
You can also pass in an additional argument over here to get specific columns back. Let's go ahead and build on top of this example by filtering on two separate columns. So for example, I have mtcars. Let's go ahead and show the head of mtcars: mpg, cylinders, horsepower, etc. I can say mtcars, brackets, and put in one condition like we saw earlier, where I say mtcars mpg is greater than 20. Well, let's say I also wanted the number of cylinders to be equal to six. That's this second column, cyl, for cylinders. I would then put the and operator there, an ampersand, and say mtcars dollar sign cyl is equal to 6. Remember to put in a comma here. Sometimes it's also nice to put parentheses around your logical or comparison statements, just so it's a little easier to read and you can see the two separate conditions you're combining. And you'll see we get back the mtcars rows where the mpg was greater than 20 and the number of cylinders was equal to six.

If you wanted to only get specific columns back from this, you could add that in as a second argument over here. So for instance, let's say we only wanted to get mpg, cyl, and hp, or horsepower, back for these cars. I could pass in a vector of those column names: mpg, cyl, hp.
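A sketch of the two-condition filter and the column restriction described above, using mtcars. The result names six.cyl and six.cyl.cols are illustrative.

```r
# Two conditions combined with the element-wise and operator, &;
# parentheses are optional but make each condition easier to read
six.cyl <- mtcars[(mtcars$mpg > 20) & (mtcars$cyl == 6), ]

# Same row filter, but only the mpg, cyl, and hp columns are returned
six.cyl.cols <- mtcars[(mtcars$mpg > 20) & (mtcars$cyl == 6),
                       c("mpg", "cyl", "hp")]
six.cyl.cols
```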
And now we only get back those columns. This is the kind of filtering you're going to be doing all the time when you're importing data from a CSV and playing around with it, trying to visualize it and get an idea of what it looks like. So these are the kinds of statements that we're trying to build up to and that we'll use a lot throughout the course.

Now, this is how you can select rows, or multiple rows, based on some sort of condition. You can also use the subset function to do the exact same thing. With the subset function, you say subset and pass in your data frame, in this case mtcars. Then you basically write all the same conditions, except you don't have to worry about calling the columns off the data frame, since you already passed the data frame into the subset function and it knows what you're talking about. So you can say something like mpg greater than 20 and, whoops, excuse me, cyl equal to six. This returns the same subset as the earlier call. But notice what's nice about using subset, if you prefer to use it that way, is that you don't have to continually say mtcars, dollar sign, and the name of the column.
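The equivalence between subset and bracket notation can be checked directly on mtcars. This is a sketch; by.subset and by.bracket are illustrative names.

```r
# subset() evaluates the condition inside the data frame,
# so columns are referred to by bare name
by.subset <- subset(mtcars, mpg > 20 & cyl == 6)

# Equivalent bracket notation, with each column called off mtcars
by.bracket <- mtcars[(mtcars$mpg > 20) & (mtcars$cyl == 6), ]

all.equal(by.subset, by.bracket)
```

One difference worth knowing: when the condition evaluates to NA for a row, subset drops that row, while bracket indexing returns a row of NAs. mtcars has no missing values, so the two agree here.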
Since you passed the data frame into the subset function, it already knows what you mean when you say something like mpg or cyl. All right, so that's the basics of using subset or this sort of bracket notation. Use whatever you feel most comfortable with.

Let's go ahead and clear the console. We've already seen a few examples of how to select multiple columns, so let's just review them quickly. I'll say head of mtcars, to check the head of it one more time just so we can take a look at those column names: we have mpg, cyl, and so on. I want to select multiple columns, and there are a few ways I can do this. I can say mtcars, brackets, nothing, comma, and then a vector of the numbers that correspond to the columns I want. So for example, if I want columns 1, 2, and 3, in this case mpg, cyl, and disp, it returns back, if we scroll up here, those three columns. The other way of doing this, which we actually just saw, is by passing in the actual names as a vector, like mpg, comma, cyl, etc. And if we scroll up to see this, it just returns the columns we asked for. So it's up to you. Usually I'll probably be using the names, like this.
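Both ways of selecting multiple columns can be sketched on mtcars, whose first three columns are mpg, cyl, and disp. The result names by.index and by.name are illustrative.

```r
# Select columns by position: empty row slot, then a vector of indices
by.index <- mtcars[, c(1, 2, 3)]

# Select the same columns by name
by.name <- mtcars[, c("mpg", "cyl", "disp")]

head(by.name)
```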
That way you don't have to remember the order of the columns; you just remember the names of the columns.

Finally, let's clear the console and talk about dealing with missing data. Dealing with missing data is a pretty important skill to know, especially when you're working with data frames, and there are a couple of useful built-in functions to help you find missing data and check whether there's missing data in your data frame. Let's say we want to detect whether there are any missing data points, or NA points. When I say NA, I mean we have a null or missing data point there. If we wanted to detect them anywhere in our data frame, how can we do that? We can say is.na and then pass in our data frame. So let's say we pass in mtcars. If we pass mtcars to is.na, we get back a matrix of boolean values with the same shape as the data frame: FALSE, FALSE, FALSE, FALSE. Notice we get all FALSEs, because there is no missing information in this data frame. If for some reason one of these had an NA, or null value, we would get a TRUE somewhere in this matrix. The way you can quickly check whether you have any null or missing data in your data frame is by passing in that same call. Let me go ahead and clear the console.
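A short sketch of what is.na returns. mtcars has no missing values, so a small hypothetical data frame, df.na, is added to show where the TRUE values appear.

```r
# is.na() on a data frame returns a logical matrix of the same shape
na.table <- is.na(mtcars)
dim(na.table)

# Hypothetical data frame with one missing value in each column
df.na <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
is.na(df.na)
```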
So we say is.na, pass in your data frame, and then you can take advantage of the any function, which will check whether any of those values is TRUE. In this case none of those values are TRUE, so we get FALSE back. If any single one of the values in that is.na check on your data frame was TRUE, you would have gotten a TRUE back as a result. So this is a nice little shortcut, any of is.na of the data frame, to check whether you have any missing points anywhere in your data frame. Again, you can narrow this idea down if you want to check whether there are any in a certain column. Instead of your whole data frame, you can just pass in your column, such as, since we're dealing with mtcars, mtcars dollar sign mpg. And if you pass in the whole mtcars data frame, it's just asking: are any of them null, or NA?

If you want to replace missing data, you can do that by taking advantage of this is.na call. So we can say something like this: our data frame, in this case we'll say df, whoops, we'll put in a bracket, and say is.na, pass in your data frame, and then you can go ahead and assign the value you want to replace with. So you can say replace all null values with zero. Usually you probably won't want to run such a broad command, since it's going to do this for every single column in your data frame. But just keep that in mind.
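The any(is.na(...)) check and the blanket replacement can be sketched as follows. The data frame df.na is a hypothetical example, since mtcars itself contains no missing values.

```r
# Is anything missing anywhere in the data frame, or in one column?
any(is.na(mtcars))
any(is.na(mtcars$mpg))

# Hypothetical data frame with missing values in both columns
df.na <- data.frame(x = c(1, NA, 3), y = c(NA, 5, 6))

# Replace every NA in every column with zero
df.na[is.na(df.na)] <- 0
df.na
```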
You can use this sort of is.na check to replace null values. If you only wanted to do this for a single selected column, you'd use the exact same notation, but you would pass in column names instead. So for instance, for mtcars, let's say we had missing values in the mpg column. You would say mtcars dollar sign mpg, for the mpg column, then in brackets is.na of mtcars dollar sign mpg, and then you would assign whatever value you wanted to replace those with, such as zero. What's nice is you can then also use something like the mean of some column, such as mtcars mpg, to quickly replace null values with the average value of that column. That's a fairly common method for dealing with missing data. It really depends on what your data looks like and what the best practices are for the situation you're dealing with, but this sort of line can be really useful to quickly fill a column that's missing values with its average, or mean.

All right, I hope that helped. Remember to use the notes as a reference, or as a cheat sheet for this lecture, in order to fully understand it. OK.
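A sketch of the single-column mean replacement. Two things here are assumptions rather than part of the lecture: the copy cars.na with artificially inserted NA values (mtcars has none), and the na.rm = TRUE argument, which is needed because mean() returns NA when the column still contains missing values.

```r
# Work on a copy so the built-in data set is left untouched
cars.na <- mtcars

# Artificially introduce two missing values for illustration
cars.na$mpg[c(2, 5)] <- NA

# Replace the NAs in mpg with the mean of the non-missing values
cars.na$mpg[is.na(cars.na$mpg)] <- mean(cars.na$mpg, na.rm = TRUE)

any(is.na(cars.na$mpg))
```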
Thanks everyone, and I'll see you at the next lecture.
