1
00:00:00,300 --> 00:00:01,800
Instructor: Before we go any further,
2
00:00:01,800 --> 00:00:05,130
let's take a minute to discuss the previous situation.
3
00:00:05,130 --> 00:00:07,170
If this were a real-world situation,
4
00:00:07,170 --> 00:00:09,060
you would have many, many points,
5
00:00:09,060 --> 00:00:11,970
potentially forming four clusters.
6
00:00:11,970 --> 00:00:14,250
At the risk of oversimplifying the matter,
7
00:00:14,250 --> 00:00:18,840
B could represent small, expensive apartments or rip-offs.
8
00:00:18,840 --> 00:00:22,230
A would represent small, reasonably priced apartments.
9
00:00:22,230 --> 00:00:25,410
D, big, reasonably priced apartments
10
00:00:25,410 --> 00:00:29,673
and C would represent big, cheap apartments or bargains.
11
00:00:30,780 --> 00:00:34,440
All else being equal, what are we likely to observe?
12
00:00:34,440 --> 00:00:36,300
Small apartments would be cheaper
13
00:00:36,300 --> 00:00:38,970
and big apartments would be more expensive.
14
00:00:38,970 --> 00:00:40,200
Maybe the rip-offs
15
00:00:40,200 --> 00:00:42,540
were apartments in the city center,
16
00:00:42,540 --> 00:00:45,750
while the bargains were apartments in the suburbs.
17
00:00:45,750 --> 00:00:47,790
If we separate them from the rest,
18
00:00:47,790 --> 00:00:51,150
we will be left with something that looks very familiar,
19
00:00:51,150 --> 00:00:53,013
our good old regression.
20
00:00:53,910 --> 00:00:55,680
And that's how different statistical methods
21
00:00:55,680 --> 00:00:57,600
communicate with each other.
22
00:00:57,600 --> 00:01:01,410
Now, what about the initial four-cluster situation?
23
00:01:01,410 --> 00:01:03,960
Clustering in this case could help us identify
24
00:01:03,960 --> 00:01:05,970
omitted variable bias.
25
00:01:05,970 --> 00:01:07,110
In this situation,
26
00:01:07,110 --> 00:01:08,880
you could think about clustering as a method
27
00:01:08,880 --> 00:01:11,280
for exploring the data and realizing that
28
00:01:11,280 --> 00:01:13,110
one or more significant variables
29
00:01:13,110 --> 00:01:15,660
have not been included in the analysis.
30
00:01:15,660 --> 00:01:19,050
So instead of predicting price based solely on size,
31
00:01:19,050 --> 00:01:20,700
we may need to include location
32
00:01:20,700 --> 00:01:22,650
to get a better prediction.
33
00:01:22,650 --> 00:01:25,230
Okay, hopefully this lecture was useful
34
00:01:25,230 --> 00:01:26,880
not only for your clustering
35
00:01:26,880 --> 00:01:30,210
but also for your data science understanding as a whole.
36
00:01:30,210 --> 00:01:31,210
Thanks for watching.
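
The omitted-variable point from the lecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the variable names (`size`, `central`, `price`), the helper `fit_ols`, and all coefficients are hypothetical and do not come from the lecture. It compares a regression of price on size alone with one that also includes a location indicator.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # solve min ||A @ coef - y||
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return coef, r2

# Synthetic data (assumption): price depends on size AND on location.
rng = np.random.default_rng(0)
n = 400
size = rng.uniform(30, 150, n)       # hypothetical apartment sizes
central = rng.integers(0, 2, n)      # 1 = city center, 0 = suburb
price = 2.0 * size + 150.0 * central + rng.normal(0, 10, n)

# Model 1: price ~ size (location omitted)
_, r2_size_only = fit_ols(size, price)

# Model 2: price ~ size + location
coef_full, r2_full = fit_ols(np.column_stack([size, central]), price)

print("R^2, size only:      ", round(r2_size_only, 3))
print("R^2, size + location:", round(r2_full, 3))
print("coefficients (intercept, size, central):", coef_full)
```

In this made-up setup, the size-only model leaves a large share of the price variation unexplained, which is the kind of pattern that the extra clusters would hint at before location is added.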