Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:00,420 --> 00:00:05,250
Hello everyone and welcome to the lecture on data frame's selection and indexing.
2
00:00:05,530 --> 00:00:08,910
And this lecture we're going to learn how to grab data out of our data frame.
3
00:00:09,120 --> 00:00:12,410
Let's go ahead and jump to our studio to see how to do this.
4
00:00:12,450 --> 00:00:12,810
All right.
5
00:00:12,810 --> 00:00:19,230
So here our studio and I'm going to go ahead and just move the environment tabs and the file tabs to
6
00:00:19,230 --> 00:00:22,880
the side so we can focus on the scripting and console.
7
00:00:23,090 --> 00:00:27,200
Notice I'm going to be using the same data we used in a previous lecture.
8
00:00:27,240 --> 00:00:33,300
This weather data of the days the temperature and the rain to go ahead and pass this as ADF going to
9
00:00:33,300 --> 00:00:35,400
go ahead and run the source.
10
00:00:35,440 --> 00:00:39,750
And now on a call if it's going to reveal that data frame.
11
00:00:39,750 --> 00:00:45,030
Let's go ahead and learn how we can use this bracket notation we're familiar with to actually grab information
12
00:00:45,390 --> 00:00:52,050
from a data frame when we were learning about Matrix indexing and selection of elements from a matrix
13
00:00:52,060 --> 00:00:52,260
.
14
00:00:52,350 --> 00:00:55,350
We learned that we could just say something like Matt.
15
00:00:55,650 --> 00:01:00,750
And then the Rose who want it and the columns he wanted were actually going to be able to do the same
16
00:01:00,750 --> 00:01:02,550
thing with data frames.
17
00:01:02,550 --> 00:01:05,680
So imagine that we wanted everything from the first row.
18
00:01:05,790 --> 00:01:10,870
We would just input row number one comma and then blank in the killing.
19
00:01:11,010 --> 00:01:14,980
All the columns from the first row and that would actually return.
20
00:01:15,000 --> 00:01:20,330
Likewise the first row of our data frame and notice how the labels of the columns are kept in the results
21
00:01:20,330 --> 00:01:20,920
.
22
00:01:20,970 --> 00:01:24,900
They can also do the same thing to get everything from the first column.
23
00:01:25,080 --> 00:01:32,700
So I can say something like blank comma 1 instead of that square bracket notation and slow return the
24
00:01:32,700 --> 00:01:36,590
days notice that the levels are included here.
25
00:01:37,620 --> 00:01:42,600
Now something to note here is that one of the main points of using data frames is actually being able
26
00:01:42,600 --> 00:01:49,050
to label your columns and your rows by some sort of character or string names such as days temp or rain
27
00:01:49,050 --> 00:01:49,940
.
28
00:01:50,040 --> 00:01:55,230
We're going to be able to actually index using those names instead of having to remember which column
29
00:01:55,230 --> 00:01:57,270
numbers are which row numbers you want.
30
00:01:57,480 --> 00:01:59,220
Let's go ahead and show how we can do this.
31
00:01:59,340 --> 00:02:01,530
Going to go ahead and clear this.
32
00:02:01,530 --> 00:02:03,170
We have our data frame.
33
00:02:03,210 --> 00:02:07,240
Let's go ahead and select all range values from the rain column.
34
00:02:07,380 --> 00:02:15,330
I can say Blinkx comma and then input the string rain instead of inputting the number three for the
35
00:02:15,330 --> 00:02:21,230
third index column and that's going to return all the values of the rain call.
36
00:02:21,300 --> 00:02:26,770
For example let's say I wanted to get the first five rows for days and temps.
37
00:02:27,030 --> 00:02:33,600
Well I can say D-F if I want the first five rows That's actually all the rows in this case going to
38
00:02:33,600 --> 00:02:36,260
go ahead in the use slicing notation here.
39
00:02:36,300 --> 00:02:41,710
One colon five comma and then I want the columns days and temp.
40
00:02:41,730 --> 00:02:45,640
So I want all five row values for the columns days and temp.
41
00:02:45,780 --> 00:02:54,540
I can go ahead and pass and a vector is in the combined function with days and temp inside of it.
42
00:02:55,260 --> 00:03:00,350
And there you have that subsection of the data frame with just those two columns.
43
00:03:01,140 --> 00:03:05,030
We also have the ability to just grab all the values of a particular column.
44
00:03:05,130 --> 00:03:08,250
Now we can do this in a couple of ways.
45
00:03:08,280 --> 00:03:11,940
One way we can do this is by using the dollar sign notation.
46
00:03:11,940 --> 00:03:17,480
So you put the name of your data frame a dollar sign and then the name of the column.
47
00:03:17,670 --> 00:03:22,950
And you'll notice that our studio actually provides little clips for the column names.
48
00:03:23,160 --> 00:03:27,430
So I can just go ahead and say days will return all the values for days.
49
00:03:27,660 --> 00:03:32,630
The dollar sign and I begin to get the other column name option's temp etc..
50
00:03:32,960 --> 00:03:36,240
We can also use bracket notation to do the same thing.
51
00:03:36,270 --> 00:03:42,140
I can put India off and then just pass in a string of the column I want.
52
00:03:42,150 --> 00:03:44,050
Notice how this is different though.
53
00:03:44,400 --> 00:03:50,040
If I use the bracket notation I'm actually getting the data returns in a data frame format.
54
00:03:50,040 --> 00:03:58,230
So I keep this index notation with the column days and the row number values if I use the dollar sign
55
00:03:58,230 --> 00:04:01,130
format actually getting back a vector.
56
00:04:01,170 --> 00:04:07,050
So keep that in mind when you're using this notation to callback columns completely.
57
00:04:07,050 --> 00:04:07,550
All right.
58
00:04:07,620 --> 00:04:13,020
Let's go ahead and learn how we can use the subset function to grab a subset of values from our data
59
00:04:13,030 --> 00:04:15,280
Freyne based off some condition.
60
00:04:15,610 --> 00:04:19,120
For example imagine I wanted to grab the days where it rained.
61
00:04:19,170 --> 00:04:24,540
That means we want to grab the days where rain is equal to true but use a subset condition to do that
62
00:04:25,440 --> 00:04:29,450
you put in subset you pass in your data frame source.
63
00:04:29,450 --> 00:04:31,890
So in our case it's día.
64
00:04:32,910 --> 00:04:39,350
Then you pass in the subset argument and you go ahead and pasand some sort of condition statement that
65
00:04:39,380 --> 00:04:43,220
these conditions statements pretty much always use comparison operators.
66
00:04:43,440 --> 00:04:47,580
And we want our condition to be rain is equal to true.
67
00:04:47,580 --> 00:04:51,780
Notice that rain doesn't actually have to be a character string.
68
00:04:51,870 --> 00:04:56,190
You can just put in range without any quotes and that's because subset knows that you're going to be
69
00:04:56,190 --> 00:05:01,580
referring to column names in the court and subset.
70
00:05:01,580 --> 00:05:08,090
If we go ahead and do that we get back a data frame where the values of rain was equal to true again
71
00:05:08,110 --> 00:05:08,190
.
72
00:05:08,220 --> 00:05:12,960
Notice how the condition uses some sort of comparison operator in the above case it was the equality
73
00:05:12,960 --> 00:05:13,700
operator.
74
00:05:13,800 --> 00:05:15,130
Equals equals.
75
00:05:15,480 --> 00:05:20,310
Let's go ahead and show another example where we grab days where the temperature was greater than 23
76
00:05:20,310 --> 00:05:21,960
degrees.
77
00:05:22,170 --> 00:05:30,120
Again I say subset pass in my data frame source put in subset as my argument and then put in some sort
78
00:05:30,120 --> 00:05:31,040
of condition.
79
00:05:31,170 --> 00:05:39,370
This case I want temp column to be greater than 23 so return all instances where that's true.
80
00:05:39,660 --> 00:05:40,470
And there you have it.
81
00:05:40,470 --> 00:05:44,320
It's rows 4 and 5 and it's returned as a data frame.
82
00:05:45,030 --> 00:05:51,300
OK finally let's learn about ordering a data from we can sort the order of our data frame by using the
83
00:05:51,390 --> 00:05:56,250
order function you pass in the column you want to sort by into the order function.
84
00:05:56,430 --> 00:05:59,550
Then you use that vector to select from the data frame.
85
00:05:59,550 --> 00:06:03,000
Let's go ahead and see an example of this.
86
00:06:03,320 --> 00:06:09,430
I'm going to make a variable called sorted temp.
87
00:06:10,080 --> 00:06:17,960
I'm going to go ahead and call the order function and then Pessin ZF.
88
00:06:18,090 --> 00:06:28,740
Let's go ahead and pasand DFA temp in if I call sort of temp.
89
00:06:28,740 --> 00:06:29,490
All right.
90
00:06:29,490 --> 00:06:36,180
Notice how I get back the index notation for the temp vector.
91
00:06:36,180 --> 00:06:41,510
Basically what this means is it's the sorted temperatures by index.
92
00:06:41,550 --> 00:06:49,030
So temperature 21 should be sorted first if you're going to sort them from least to greatest.
93
00:06:49,050 --> 00:06:53,650
Then it goes twenty two point two and the remaining three were already in order.
94
00:06:53,670 --> 00:06:58,160
Let me go ahead and show you how you can use this sort of vector to actually sort your data frame.
95
00:06:58,340 --> 00:07:08,220
You would just go DMF pass in sort of temp comma and this will go ahead and sort your data frame by
96
00:07:08,220 --> 00:07:17,010
those temperature values and you can notice now how these rows values were connected to sort of temp
97
00:07:17,010 --> 00:07:17,970
.
98
00:07:18,300 --> 00:07:24,000
When we think of what's actually going on here we're essentially just asking for the index elements
99
00:07:24,090 --> 00:07:26,760
in that order and by default it's ascending.
100
00:07:26,760 --> 00:07:30,570
You can also pass a negative sign to do a descending order.
101
00:07:30,600 --> 00:07:33,060
Let me go ahead and show you how we can do that.
102
00:07:33,110 --> 00:07:45,150
All say sending 10 order that a negative term.
103
00:07:45,270 --> 00:07:48,010
Take a look at the ascending temp.
104
00:07:48,210 --> 00:07:50,290
Now it's 5 4 3 1 2.
105
00:07:50,430 --> 00:07:59,310
And when we pass that into our data frame the sending temp column or comics Scuse me because all the
106
00:07:59,310 --> 00:08:01,170
columns.
107
00:08:01,170 --> 00:08:05,310
Now we get a descending temperature again with this sort of notation.
108
00:08:05,310 --> 00:08:07,680
All you're doing is you're basically just asking.
109
00:08:07,800 --> 00:08:15,660
Give me back the rows of the data frame in this order 5 4 3 1 2 because you use the order function to
110
00:08:15,660 --> 00:08:17,900
sort by temperature.
111
00:08:17,910 --> 00:08:22,800
You also could have done this exact operation instead of using the bracket notation here by using the
112
00:08:22,800 --> 00:08:26,200
dollar sign notation to show you a quick example that.
113
00:08:26,250 --> 00:08:35,110
Let's go ahead and just say the sending temp order E-F.
114
00:08:35,890 --> 00:08:42,480
Tim I will go ahead and put a negative sign up front that is sending and then we'll actually get the
115
00:08:42,480 --> 00:08:47,170
same thing when we call for the sending as the data frame.
116
00:08:47,190 --> 00:08:52,650
All right that's it for the basics of selection and indexing of a data frame in the next lecture.
117
00:08:52,650 --> 00:08:58,200
We're going to be doing a complete overview of the various operations and capabilities of a data frame
118
00:08:58,200 --> 00:08:58,830
.
119
00:08:58,890 --> 00:09:04,200
Remember for the next lecture there's an entire notebook ready for you to reference almost as a cheat
120
00:09:04,200 --> 00:09:06,090
sheet for use in data frames.
121
00:09:06,150 --> 00:09:09,930
The next lecture is actually going to be one of the most fundamental lectures as far as learning how
122
00:09:09,930 --> 00:09:12,390
to use data frames with our.
123
00:09:12,390 --> 00:09:15,200
OK thanks everyone and I'll see you at the next lecture
13109
Can't find what you're looking for?
Get subtitles in any language from opensubtitles.com, and translate them here.