1
00:00:00,880 --> 00:00:04,550
Next up I want to talk about grouping and aggregating data.
2
00:00:04,690 --> 00:00:09,600
So within the Transform tab of the Query Editor, you'll see this Group By option on the left.
3
00:00:09,850 --> 00:00:15,250
And this allows you to aggregate, or roll up, your data at different levels of granularity.
4
00:00:15,490 --> 00:00:21,790
So some common examples of this would be doing something like transforming daily transactions into weekly
5
00:00:21,790 --> 00:00:29,880
or monthly, or rolling up transaction-level data by store, by product brand, by region, et cetera.
6
00:00:30,160 --> 00:00:36,730
So it's taking a really deep really detailed table and rolling it up into a higher level summary.
7
00:00:36,730 --> 00:00:39,090
So let's take a quick look at an example.
8
00:00:39,310 --> 00:00:45,070
I know it's a little bit tough to see but what we've got here are order quantities by order date product
9
00:00:45,070 --> 00:00:47,010
key and customer key.
10
00:00:47,170 --> 00:00:53,170
And the important thing to note here is that we have multiple orders for a given product key in this
11
00:00:53,170 --> 00:01:00,480
case, product number 214 has been ordered multiple times, on multiple dates, by multiple different customers.
12
00:01:00,540 --> 00:01:07,390
And if you wanted to transform this table into a summary of orders, or order quantities, rolled up by unique
13
00:01:07,420 --> 00:01:11,730
product keys the group by option is a great way to do that.
14
00:01:11,740 --> 00:01:17,290
So when you click that group by button you'll see a dialog box that looks something like this that basically
15
00:01:17,290 --> 00:01:19,070
allows you to tell Power BI:
16
00:01:19,270 --> 00:01:26,260
I'd like to group this table by unique product keys and the operation that I'd like to evaluate for
17
00:01:26,260 --> 00:01:30,860
those product keys is the sum of the order quantity column.
18
00:01:30,880 --> 00:01:36,400
In other words you're taking all of these duplicate rows with the same product key and multiple order
19
00:01:36,400 --> 00:01:39,400
quantities and you're compressing them down to one.
20
00:01:39,550 --> 00:01:45,040
When you do that compression, or that aggregation, how do we want to treat those order quantity values?
21
00:01:45,250 --> 00:01:49,780
In this case we're taking a simple sum and we're ending up with a table like this.
22
00:01:49,870 --> 00:01:57,010
Two columns one containing unique distinct product keys and the second containing the total quantity
23
00:01:57,250 --> 00:02:01,630
or the sum of order quantity values associated with each.
24
00:02:01,630 --> 00:02:09,340
So to recap what we've done is essentially transform a daily transaction level table into a summary
25
00:02:09,340 --> 00:02:12,370
of total quantity rolled up by product keys.
26
00:02:12,610 --> 00:02:14,620
An important thing to note here:
27
00:02:14,830 --> 00:02:21,790
As you may have noticed, any fields that aren't specified in our group by settings are lost; they're not
28
00:02:21,790 --> 00:02:27,850
preserved in that final table because we eliminated that level of granularity in the process.
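[Editor's note: the roll-up described above can be sketched outside Power BI in a few lines of plain Python. This is only an illustration of the idea, not what the Query Editor runs internally. The sample rows and the function name total_quantity_by_product are made up for the example.]

```python
from collections import defaultdict

# Hypothetical sample rows: (order_date, product_key, customer_key, order_quantity)
orders = [
    ("2017-01-01", 214, 11001, 2),
    ("2017-01-02", 214, 11002, 1),
    ("2017-01-02", 310, 11001, 1),
    ("2017-01-03", 214, 11001, 3),
    ("2017-01-03", 310, 11003, 2),
]


def total_quantity_by_product(rows):
    """Collapse transaction rows into one summed quantity per unique product key."""
    totals = defaultdict(int)
    for _order_date, product_key, _customer_key, quantity in rows:
        # Only the grouping field and the aggregated value are kept;
        # order date and customer key are dropped, as in the Group By result.
        totals[product_key] += quantity
    return dict(totals)


summary = total_quantity_by_product(orders)
print(summary)  # {214: 6, 310: 3}
```

[The result has one entry per product key, and the date and customer fields no longer appear anywhere in it.]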
29
00:02:27,850 --> 00:02:31,230
Now one more example using the advanced option.
30
00:02:31,480 --> 00:02:36,870
Consider the same exact table that we started with reusing the same group by option.
31
00:02:36,940 --> 00:02:42,340
But this time looking at the advanced version and really the only difference here between basic and
32
00:02:42,340 --> 00:02:49,440
advanced is that advanced allows you to specify multiple columns, or additional columns, to group by.
33
00:02:49,720 --> 00:02:56,620
So this time, instead of just grouping by product key, we're grouping by product key and customer key, and
34
00:02:56,620 --> 00:02:59,100
again we're evaluating that same operation.
35
00:02:59,200 --> 00:03:02,890
The sum of the order quantity values, just like before.
36
00:03:03,190 --> 00:03:10,600
This time what we end up with is a three column table product key customer key and total quantity.
37
00:03:10,600 --> 00:03:18,010
So to recap we've again transformed that daily transaction level table now into a summary of total quantity
38
00:03:18,520 --> 00:03:22,750
aggregated by both product key and customer key.
39
00:03:22,780 --> 00:03:29,640
In other words we're now looking at quantities by each unique combination of those two fields.
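[Editor's note: grouping by two fields can be sketched in plain Python by using the pair of keys as the grouping key. This is a rough illustration under made-up sample data, not Power BI's own implementation. The name total_quantity_by_product_and_customer is hypothetical.]

```python
from collections import defaultdict

# Hypothetical sample rows: (order_date, product_key, customer_key, order_quantity)
orders = [
    ("2017-01-01", 214, 11001, 2),
    ("2017-01-02", 214, 11002, 1),
    ("2017-01-02", 310, 11001, 1),
    ("2017-01-03", 214, 11001, 3),
    ("2017-01-03", 310, 11003, 2),
]


def total_quantity_by_product_and_customer(rows):
    """Sum order quantity for each unique (product_key, customer_key) combination."""
    totals = defaultdict(int)
    for _order_date, product_key, customer_key, quantity in rows:
        totals[(product_key, customer_key)] += quantity
    return dict(totals)


summary = total_quantity_by_product_and_customer(orders)

# Print the three-column result: product key, customer key, total quantity.
for (product_key, customer_key), total in sorted(summary.items()):
    print(product_key, customer_key, total)
```

[Each output row corresponds to one unique product and customer pair, so a product ordered by several customers now appears on several rows.]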
40
00:03:30,070 --> 00:03:35,710
So the best comparison that I can make if you're an Excel user this is just like creating a pivot table
41
00:03:35,820 --> 00:03:42,520
and pulling in the sum of order quantity into your values and your product key and customer fields into
42
00:03:42,520 --> 00:03:43,730
your row labels.
43
00:03:44,140 --> 00:03:46,630
So that's grouping and aggregating data
44
00:03:46,630 --> 00:03:52,770
in a nutshell. Let's actually work through an example or two in Power BI.
45
00:03:52,800 --> 00:03:56,240
So back in Power BI, I'm in my AdventureWorks report.
46
00:03:56,520 --> 00:04:01,620
Instead of adding new data I'm going to edit my queries to jump into the query editor.
47
00:04:01,920 --> 00:04:04,640
I've got my four connections, my four tables, here.
48
00:04:04,830 --> 00:04:12,750
I'm going to dig into my AW Sales 2017 data, which contains daily sales records broken down by product
49
00:04:12,780 --> 00:04:16,330
keys by customer keys territory keys.
50
00:04:16,350 --> 00:04:21,120
So this is a good candidate for using these grouping or aggregation tools.
51
00:04:21,120 --> 00:04:28,020
So just like our demo Let's go ahead and select the product key column and let's say we want to turn
52
00:04:28,110 --> 00:04:35,640
this entire table which contains multiple product keys and multiple instances of product keys which
53
00:04:35,640 --> 00:04:43,340
we can see if we sort these as you can see multiple sales for product number 214 and so on and so forth
54
00:04:43,590 --> 00:04:46,030
for all of the other products in this table.
55
00:04:46,070 --> 00:04:52,400
There's a lot here so I have to scroll quite a bit to get to them but take my word for it there are
56
00:04:52,400 --> 00:04:54,860
many, many more product IDs in here.
57
00:04:54,860 --> 00:05:01,190
So the idea is that we want to take this table collapse these rows with multiple product keys into essentially
58
00:05:01,190 --> 00:05:09,440
a summary table and just like our example in the slide let's evaluate the sum of order quantity for
59
00:05:09,440 --> 00:05:11,330
each of these product keys.
60
00:05:11,330 --> 00:05:20,740
So go ahead to Transform, click Group By, and we'll start with our basic option here, grouping by product key.
61
00:05:20,740 --> 00:05:26,800
The new column name which is the column that contains the values that are getting aggregated or rolled
62
00:05:26,800 --> 00:05:32,900
up. We can call it whatever we want; let's say total quantity, for example.
63
00:05:33,130 --> 00:05:38,650
And again that operation you have different statistical functions here different aggregator functions
64
00:05:39,190 --> 00:05:44,850
You could take the sum, you could average order values, you could translate them all to a max or min, you
65
00:05:44,860 --> 00:05:51,640
could count the rows in this case we want the sum and we want the sum of that order quantity and press
66
00:05:51,640 --> 00:05:52,710
OK.
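[Editor's note: the choice of operation in the dialog can be pictured as swapping the function applied to each group. The sketch below uses plain Python with made-up sample rows. The operation labels and the group_by helper are assumptions for illustration and are not the exact names shown in the Group By dialog.]

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (product_key, order_quantity)
rows = [(214, 2), (214, 1), (310, 1), (214, 3), (310, 2)]

# Illustrative labels mapped to the Python function that performs each aggregation.
AGGREGATIONS = {
    "sum": sum,
    "average": mean,
    "max": max,
    "min": min,
    "count rows": len,
}


def group_by(rows, operation):
    """Group quantities by product key, then apply the chosen aggregation to each group."""
    groups = defaultdict(list)
    for product_key, quantity in rows:
        groups[product_key].append(quantity)
    aggregate = AGGREGATIONS[operation]
    return {key: aggregate(values) for key, values in groups.items()}


for operation in AGGREGATIONS:
    print(operation, group_by(rows, operation))
```

[The grouping step is identical in every case; only the function applied to each group's values changes.]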
67
00:05:52,780 --> 00:06:00,070
So just like our demo this collapsed our data into a two column table with that new aggregated quantity
68
00:06:00,070 --> 00:06:06,910
field that we've named total quantity and a unique list of product keys. We can test that they're unique
69
00:06:07,180 --> 00:06:09,000
by sorting them.
70
00:06:09,050 --> 00:06:16,760
And as you can see there's only one instance of each of those IDs with the associated total quantity.
71
00:06:16,760 --> 00:06:18,640
Now let's do one more example.
72
00:06:18,670 --> 00:06:24,350
I'm just going to go back and remove those applied steps, and I want to do one more group by example
73
00:06:24,650 --> 00:06:26,590
with the advanced options instead.
74
00:06:26,900 --> 00:06:28,860
So let's select product key again.
75
00:06:29,800 --> 00:06:32,510
Group by click the Advanced button.
76
00:06:32,850 --> 00:06:38,860
And now we can add a second grouping here for whatever other fields we want to pull into this summary
77
00:06:38,860 --> 00:06:39,760
table.
78
00:06:39,760 --> 00:06:45,910
So in this case, maybe customer, maybe territory. I'll choose customer key here because I want to see the total
79
00:06:45,910 --> 00:06:49,870
sales for every combination of product and customer.
80
00:06:49,870 --> 00:06:58,540
And again my column name could be something like total quantity and it will be the sum of the order
81
00:06:58,540 --> 00:06:59,630
quantity.
82
00:06:59,730 --> 00:07:00,300
OK.
83
00:07:01,210 --> 00:07:01,890
And there you go.
84
00:07:01,900 --> 00:07:04,070
Very very similar process here.
85
00:07:04,090 --> 00:07:10,540
It's now collapsed my table into a three column table containing unique combinations of product key
86
00:07:10,690 --> 00:07:11,950
and customer key.
87
00:07:12,280 --> 00:07:13,490
So there you have it.
88
00:07:13,540 --> 00:07:16,700
Once again let's go ahead and remove those steps.
89
00:07:16,750 --> 00:07:19,490
We don't need this sort by product key.
90
00:07:19,690 --> 00:07:24,460
And then in fact if we don't even want to save anything that we've done in here we don't even need to
91
00:07:24,460 --> 00:07:26,200
go to the Close & Apply button.
92
00:07:26,200 --> 00:07:29,340
We can simply close out of that query editor.
93
00:07:29,380 --> 00:07:32,100
So there you go grouping and aggregating data.