Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated:
1
00:00:09,633 --> 00:00:10,586
We humans
2
00:00:10,610 --> 00:00:12,001
have a keen eye for
3
00:00:12,025 --> 00:00:12,964
visual treats
4
00:00:12,987 --> 00:00:15,100
and love good eye candy.
5
00:00:15,478 --> 00:00:17,690
That being said, statisticians
6
00:00:17,714 --> 00:00:19,326
were having a hard time
7
00:00:19,350 --> 00:00:20,758
with getting people to listen
8
00:00:20,782 --> 00:00:22,426
to their very important
9
00:00:22,450 --> 00:00:24,128
and relevant data.
10
00:00:24,489 --> 00:00:25,538
Frankly speaking,
11
00:00:25,562 --> 00:00:26,642
even though tables
12
00:00:26,666 --> 00:00:28,356
are easier to consume,
13
00:00:28,380 --> 00:00:30,462
they still do not taste good.
14
00:00:30,486 --> 00:00:31,141
Do they?
15
00:00:31,465 --> 00:00:32,166
So what
16
00:00:32,189 --> 00:00:32,680
to do?
17
00:00:32,704 --> 00:00:34,417
A lot of data
18
00:00:34,441 --> 00:00:35,402
that we deal with
19
00:00:35,425 --> 00:00:37,614
in real life is comparative.
20
00:00:37,638 --> 00:00:39,333
As in comparing 2 things.
21
00:00:39,358 --> 00:00:40,598
It can be about
22
00:00:40,622 --> 00:00:42,169
which student is tallest
23
00:00:42,193 --> 00:00:43,838
or which candy is cheapest
24
00:00:43,862 --> 00:00:44,711
and so on.
25
00:00:45,015 --> 00:00:46,151
This is where
26
00:00:46,174 --> 00:00:47,794
graphical representation
27
00:00:47,819 --> 00:00:49,116
comes to our rescue.
28
00:00:49,139 --> 00:00:51,658
Graphical representation of data
29
00:00:51,682 --> 00:00:53,218
allows us to understand
30
00:00:53,242 --> 00:00:55,465
the data much more easily
31
00:00:55,489 --> 00:00:58,472
and intuitively than a table.
32
00:00:58,497 --> 00:00:59,767
Our aim here,
33
00:00:59,791 --> 00:01:00,916
is to throw some light
34
00:01:00,940 --> 00:01:02,627
on 3 major types
35
00:01:02,651 --> 00:01:05,119
of graphical representation of data.
36
00:01:05,144 --> 00:01:07,052
That is the bar graph,
37
00:01:07,075 --> 00:01:09,050
the histogram and
38
00:01:09,074 --> 00:01:10,499
the frequency polygon.
39
00:01:20,904 --> 00:01:21,972
You already know
40
00:01:21,996 --> 00:01:23,522
a few things about graphs
41
00:01:23,546 --> 00:01:25,078
from your earlier classes.
42
00:01:25,102 --> 00:01:26,394
Let's build on that.
43
00:01:26,418 --> 00:01:28,934
Let's represent a table of heights
44
00:01:28,958 --> 00:01:29,681
in the form
45
00:01:29,705 --> 00:01:30,881
of a bar graph.
46
00:01:30,905 --> 00:01:32,634
Let's bring up the table
47
00:01:32,658 --> 00:01:34,418
of ungrouped data here.
48
00:01:34,735 --> 00:01:35,899
To draw the chart,
49
00:01:35,923 --> 00:01:37,738
I'll start off by drawing
50
00:01:37,763 --> 00:01:39,858
a flat horizontal line
51
00:01:39,881 --> 00:01:41,655
called the X axis.
52
00:01:41,679 --> 00:01:42,704
Where you represent
53
00:01:42,728 --> 00:01:44,181
different height values
54
00:01:44,205 --> 00:01:45,556
of all the students.
55
00:01:45,864 --> 00:01:47,227
Now, as I
56
00:01:47,251 --> 00:01:48,609
go through this data,
57
00:01:48,632 --> 00:01:50,531
I start to add a dot
58
00:01:50,555 --> 00:01:52,116
above the X axis.
59
00:01:52,453 --> 00:01:54,236
So first, I put a dot
60
00:01:54,260 --> 00:01:55,670
at 195.
61
00:01:55,695 --> 00:01:58,235
The next is at 175.
62
00:01:58,259 --> 00:02:00,611
The next at 170
63
00:02:00,635 --> 00:02:02,339
and I keep continuing.
64
00:02:02,363 --> 00:02:03,653
The fifth student
65
00:02:03,677 --> 00:02:05,677
is at 185.
66
00:02:05,701 --> 00:02:07,897
And so is the sixth student.
67
00:02:07,920 --> 00:02:10,993
So I add a dot above that.
68
00:02:11,018 --> 00:02:12,499
This can continue
69
00:02:12,523 --> 00:02:15,061
till I exhaust the complete data.
70
00:02:15,084 --> 00:02:17,003
So if you observe,
71
00:02:17,028 --> 00:02:19,105
the Y axis represents
72
00:02:19,129 --> 00:02:20,478
the number of students
73
00:02:20,502 --> 00:02:22,770
or the frequency.
74
00:02:23,248 --> 00:02:24,223
Because this is called
75
00:02:24,247 --> 00:02:25,518
a bar graph and not
76
00:02:25,541 --> 00:02:26,652
a dot graph,
77
00:02:26,677 --> 00:02:28,578
instead of using dots
78
00:02:28,602 --> 00:02:29,622
like you just did,
79
00:02:29,646 --> 00:02:30,979
you can start to draw
80
00:02:31,003 --> 00:02:33,770
rectangular bars and extend each
81
00:02:33,794 --> 00:02:36,118
till your corresponding data point.
82
00:02:36,141 --> 00:02:37,741
For example, there is just
83
00:02:37,765 --> 00:02:39,669
one student with the height of
84
00:02:39,694 --> 00:02:43,059
130 cm. So I extend the bar
85
00:02:43,082 --> 00:02:45,234
till it reaches the level of
86
00:02:45,258 --> 00:02:47,596
1 on the Y axis.
87
00:02:47,620 --> 00:02:49,384
The next data value is
88
00:02:49,408 --> 00:02:52,744
135, which has a frequency of 2.
89
00:02:52,769 --> 00:02:54,503
So I extend the bar
90
00:02:54,527 --> 00:02:56,142
till it reaches a value of
91
00:02:56,166 --> 00:02:58,141
2 on your Y axis.
92
00:02:58,165 --> 00:03:00,724
Moving on, we have 155
93
00:03:00,748 --> 00:03:02,720
which appears 7 times.
94
00:03:02,744 --> 00:03:04,228
So the bar for this value
95
00:03:04,252 --> 00:03:06,574
extends till it reaches 7
96
00:03:06,598 --> 00:03:07,888
on the Y axis.
97
00:03:07,912 --> 00:03:10,604
162 goes up to 6.
98
00:03:10,627 --> 00:03:13,304
168 goes up to 4.
99
00:03:13,328 --> 00:03:15,488
Finally 195,
100
00:03:15,512 --> 00:03:17,810
with the frequency of 1.
101
00:03:17,833 --> 00:03:19,733
Remember that the thickness
102
00:03:19,758 --> 00:03:20,797
of all these bars
103
00:03:20,820 --> 00:03:21,465
that you see
104
00:03:21,489 --> 00:03:23,272
is actually of your choice.
105
00:03:23,296 --> 00:03:25,211
But for the sake of clarity,
106
00:03:25,235 --> 00:03:26,629
you tend to maintain
107
00:03:26,653 --> 00:03:27,270
all of them
108
00:03:27,294 --> 00:03:28,871
as the same thickness.
109
00:03:29,167 --> 00:03:31,319
From this you can easily tell
110
00:03:31,343 --> 00:03:32,695
the number of students
111
00:03:32,719 --> 00:03:34,560
that have the same height
112
00:03:34,584 --> 00:03:35,960
by looking at the
113
00:03:35,985 --> 00:03:38,162
height of each of these bars.
114
00:03:38,389 --> 00:03:40,890
This becomes all the more necessary,
115
00:03:40,914 --> 00:03:42,749
when we ask questions regarding
116
00:03:42,774 --> 00:03:44,138
a particular height
117
00:03:44,162 --> 00:03:46,299
and also how many students
118
00:03:46,322 --> 00:03:47,616
have the same height.
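The counting behind the ungrouped bar graph can be sketched in Python. This is only an illustration: the list of heights is rebuilt from the six height and frequency pairs read out above, so it is a hypothetical sample rather than the full class data, and the helper names `frequency_table` and `text_bar_graph` are made up for this sketch.

```python
from collections import Counter

# Hypothetical sample: only the height/frequency pairs mentioned in the narration.
narrated = {130: 1, 135: 2, 155: 7, 162: 6, 168: 4, 195: 1}
heights = [h for h, f in narrated.items() for _ in range(f)]


def frequency_table(values):
    """Count how many times each value appears, sorted by value (the X axis)."""
    return dict(sorted(Counter(values).items()))


def text_bar_graph(table):
    """Draw one bar per value; the bar length is the frequency (the Y axis)."""
    return [f"{value:>3} | " + "#" * freq for value, freq in table.items()]


table = frequency_table(heights)
for line in text_bar_graph(table):
    print(line)
```

Each printed row plays the role of one bar, turned on its side: the label is the height and the run of `#` marks is the number of students with that height.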
119
00:03:48,054 --> 00:03:49,656
We apply the same logic
120
00:03:49,679 --> 00:03:51,577
to grouped data as well.
121
00:03:51,602 --> 00:03:53,243
In this, we have
122
00:03:53,267 --> 00:03:54,777
heights of 60 students
123
00:03:54,800 --> 00:03:57,206
grouped into classes of 10 each.
124
00:03:57,230 --> 00:04:00,192
So we draw the axes again.
125
00:04:00,216 --> 00:04:01,759
This time we have
126
00:04:01,783 --> 00:04:03,549
the classes on the X axis
127
00:04:03,573 --> 00:04:05,601
and corresponding frequencies
128
00:04:05,624 --> 00:04:06,861
on the Y axis.
129
00:04:07,134 --> 00:04:09,581
The class of 130-140
130
00:04:09,605 --> 00:04:11,492
has a frequency of 9.
131
00:04:11,516 --> 00:04:13,487
So the bar extends from
132
00:04:13,511 --> 00:04:15,019
the X axis to reach
133
00:04:15,043 --> 00:04:16,405
a level of 9
134
00:04:16,430 --> 00:04:17,538
on the Y axis.
135
00:04:17,914 --> 00:04:19,488
Similarly, for the rest
136
00:04:19,512 --> 00:04:20,631
of the frequencies.
137
00:04:21,130 --> 00:04:22,618
Making data visual
138
00:04:22,642 --> 00:04:24,638
makes it easier
139
00:04:24,662 --> 00:04:26,587
to understand. Doesn't it?
140
00:04:26,611 --> 00:04:29,525
A similar representation can happen
141
00:04:29,549 --> 00:04:32,014
using a histogram as well.
142
00:04:32,038 --> 00:04:33,541
Let's dive in.
143
00:04:44,121 --> 00:04:46,206
A histogram is just like
144
00:04:46,230 --> 00:04:47,387
this bar graph.
145
00:04:47,411 --> 00:04:48,928
But I will have to make
146
00:04:48,952 --> 00:04:50,390
a few changes here.
147
00:04:50,619 --> 00:04:52,147
Just like a bar graph
148
00:04:52,171 --> 00:04:53,897
we represent the height
149
00:04:53,922 --> 00:04:55,597
on the horizontal axis
150
00:04:55,621 --> 00:04:58,066
but using a suitable scale.
151
00:04:58,090 --> 00:05:00,818
Scale here becomes very important
152
00:05:00,842 --> 00:05:02,515
because the area
153
00:05:02,539 --> 00:05:03,992
that this bar covers
154
00:05:04,016 --> 00:05:05,055
in a histogram
155
00:05:05,080 --> 00:05:06,701
is very very important.
156
00:05:07,074 --> 00:05:08,551
We can choose the scale
157
00:05:08,575 --> 00:05:12,293
as 1 cm equivalent to 10 cm.
158
00:05:12,317 --> 00:05:14,270
So each class occupies
159
00:05:14,294 --> 00:05:16,343
a width of 1 cm
160
00:05:16,367 --> 00:05:17,646
on this graph.
161
00:05:17,670 --> 00:05:20,099
Also since the first class interval
162
00:05:20,123 --> 00:05:21,880
is not starting from zero
163
00:05:21,904 --> 00:05:24,454
but a fixed non-zero value
164
00:05:24,478 --> 00:05:25,801
we show it on a graph
165
00:05:25,825 --> 00:05:28,044
by marking a kink
166
00:05:28,068 --> 00:05:29,504
like you see here.
167
00:05:29,528 --> 00:05:31,650
This indicates a break
168
00:05:31,673 --> 00:05:33,326
on the axis. Next.
169
00:05:33,351 --> 00:05:34,467
Unlike the bar graph
170
00:05:34,491 --> 00:05:35,537
there are no gaps
171
00:05:35,561 --> 00:05:37,009
in between the rectangles
172
00:05:37,033 --> 00:05:37,675
of the graph.
173
00:05:37,699 --> 00:05:38,858
So, I will have to
174
00:05:38,882 --> 00:05:40,730
knock off all of these gaps
175
00:05:40,754 --> 00:05:42,145
and will have to keep
176
00:05:42,169 --> 00:05:44,245
only the lower class limits
177
00:05:44,269 --> 00:05:45,037
on the graph.
178
00:05:45,561 --> 00:05:48,880
Technically, it is one solid figure.
179
00:05:48,904 --> 00:05:50,138
What you see now
180
00:05:50,162 --> 00:05:52,461
is called a Histogram.
181
00:05:52,707 --> 00:05:54,244
One important thing
182
00:05:54,267 --> 00:05:55,249
that you will need to
183
00:05:55,273 --> 00:05:56,875
keep in mind about a histogram,
184
00:05:56,899 --> 00:05:58,852
is that the area of the graph
185
00:05:58,876 --> 00:06:01,222
plays a very crucial role.
186
00:06:01,690 --> 00:06:03,834
In fact, the area of the bar
187
00:06:03,858 --> 00:06:05,802
is directly proportional
188
00:06:05,826 --> 00:06:08,081
to the frequency of that data.
189
00:06:08,105 --> 00:06:10,273
Also the sum of areas
190
00:06:10,297 --> 00:06:11,698
of all the bars
191
00:06:11,721 --> 00:06:12,609
is equal to the
192
00:06:12,633 --> 00:06:15,430
total frequency of all the classes
193
00:06:15,454 --> 00:06:16,429
in the table.
194
00:06:16,737 --> 00:06:18,464
Till now we have dealt with
195
00:06:18,488 --> 00:06:20,449
classes of equal sizes.
196
00:06:20,676 --> 00:06:22,019
What if I have
197
00:06:22,043 --> 00:06:23,643
different class sizes
198
00:06:23,667 --> 00:06:25,308
on the same histogram?
199
00:06:25,545 --> 00:06:26,680
That is, what if
200
00:06:26,704 --> 00:06:28,452
I have to put all students
201
00:06:28,476 --> 00:06:30,636
with heights less than 150
202
00:06:30,660 --> 00:06:31,520
in one bracket
203
00:06:31,544 --> 00:06:32,895
and anyone with heights
204
00:06:32,919 --> 00:06:35,726
more than 170 in another bracket?
205
00:06:35,990 --> 00:06:38,014
Absolutely arbitrary.
206
00:06:38,038 --> 00:06:39,236
So that means
207
00:06:39,260 --> 00:06:41,321
class interval 130-140,
208
00:06:41,345 --> 00:06:43,772
140-150 get clubbed
209
00:06:43,796 --> 00:06:45,270
into one class interval
210
00:06:45,294 --> 00:06:48,718
of 130-150 along with their
211
00:06:48,741 --> 00:06:50,207
respective frequencies.
212
00:06:50,231 --> 00:06:53,880
Likewise class intervals of 170-180,
213
00:06:53,904 --> 00:06:56,847
180-190, 190-200
214
00:06:56,871 --> 00:06:58,425
all get clubbed
215
00:06:58,448 --> 00:07:01,556
in a class interval of 170-200.
216
00:07:01,580 --> 00:07:04,725
And in between, we have 150-160
217
00:07:04,749 --> 00:07:06,933
and 160-170.
218
00:07:06,957 --> 00:07:08,651
That remain as is.
219
00:07:08,675 --> 00:07:10,622
So, that means, now
220
00:07:10,646 --> 00:07:12,259
you have a new table
221
00:07:12,283 --> 00:07:14,910
with classes of different widths.
222
00:07:14,934 --> 00:07:16,650
The first class width is 20.
223
00:07:16,674 --> 00:07:19,131
That is 150 minus 130.
224
00:07:19,156 --> 00:07:21,068
Followed by 2 class widths
225
00:07:21,092 --> 00:07:22,092
of 10 each.
226
00:07:22,115 --> 00:07:24,629
That's 160 minus 150.
227
00:07:24,653 --> 00:07:25,844
And the last one
228
00:07:25,868 --> 00:07:27,451
with a width of 30,
229
00:07:27,475 --> 00:07:30,437
which is 200 minus 170
230
00:07:30,461 --> 00:07:31,302
that you see.
231
00:07:31,555 --> 00:07:32,859
If I were to draw
232
00:07:32,883 --> 00:07:34,457
a bar graph here,
233
00:07:34,481 --> 00:07:36,410
this is how it would look.
234
00:07:37,239 --> 00:07:38,076
For a histogram
235
00:07:38,100 --> 00:07:39,062
on the other hand,
236
00:07:39,086 --> 00:07:40,184
I mentioned that the
237
00:07:40,208 --> 00:07:42,566
areas of the bars are crucial
238
00:07:42,591 --> 00:07:44,446
for accurate representation.
239
00:07:44,470 --> 00:07:46,194
We need to pay attention
240
00:07:46,218 --> 00:07:47,723
to the width and height
241
00:07:47,747 --> 00:07:49,314
of these bars here.
242
00:07:49,338 --> 00:07:50,575
So, the widths of
243
00:07:50,599 --> 00:07:52,320
the classes are
244
00:07:52,344 --> 00:07:53,831
not all the same. Remember
245
00:07:53,855 --> 00:07:54,976
I told you that the
246
00:07:55,000 --> 00:07:56,388
area of a histogram
247
00:07:56,412 --> 00:07:58,100
has to be proportional
248
00:07:58,124 --> 00:07:59,303
to the frequency.
249
00:07:59,674 --> 00:08:01,265
So how do we do this?
250
00:08:01,289 --> 00:08:02,499
We need to bring
251
00:08:02,523 --> 00:08:04,885
all the frequencies in line
252
00:08:04,910 --> 00:08:07,037
with the minimum class width.
253
00:08:07,061 --> 00:08:09,492
The minimum class width here is
254
00:08:09,516 --> 00:08:10,383
10.
255
00:08:10,407 --> 00:08:12,441
The lengths of the rectangles
256
00:08:12,465 --> 00:08:13,972
are to be modified
257
00:08:13,996 --> 00:08:16,669
in proportion to this class size.
258
00:08:16,930 --> 00:08:17,876
For instance,
259
00:08:17,900 --> 00:08:19,766
when the class size is 20,
260
00:08:19,790 --> 00:08:22,034
as in the first case,
261
00:08:22,057 --> 00:08:23,575
the length of the rectangle
262
00:08:23,600 --> 00:08:25,874
will be 16 times 10
263
00:08:25,898 --> 00:08:27,354
divided by 20,
264
00:08:27,378 --> 00:08:28,203
which is going to be
265
00:08:28,226 --> 00:08:29,016
equivalent to 8.
266
00:08:29,040 --> 00:08:31,408
This is simple cross multiplication.
267
00:08:31,656 --> 00:08:33,958
This way the total frequency
268
00:08:33,982 --> 00:08:36,284
will be 16 in this range.
269
00:08:36,616 --> 00:08:37,885
The next 2 groups
270
00:08:37,909 --> 00:08:39,626
the class widths are the same
271
00:08:39,650 --> 00:08:41,263
as the minimum class width.
272
00:08:41,287 --> 00:08:42,690
Hence you don't need to
273
00:08:42,714 --> 00:08:43,653
change anything.
274
00:08:43,913 --> 00:08:45,350
The last one however,
275
00:08:45,374 --> 00:08:47,340
goes through the same treatment
276
00:08:47,365 --> 00:08:48,433
as the first one.
277
00:08:48,633 --> 00:08:49,704
In this instance,
278
00:08:49,728 --> 00:08:51,559
the class size is 30
279
00:08:51,583 --> 00:08:53,502
and the frequency is 8.
280
00:08:53,526 --> 00:08:54,833
So when the class size
281
00:08:54,857 --> 00:08:55,769
becomes 10,
282
00:08:55,794 --> 00:08:57,200
the length of this rectangle
283
00:08:57,224 --> 00:08:58,802
will be 8 times 10
284
00:08:58,826 --> 00:08:59,978
divided by 30.
285
00:09:00,002 --> 00:09:02,217
That is approximately 2.67.
286
00:09:02,601 --> 00:09:04,798
This histogram can now be said
287
00:09:04,823 --> 00:09:05,939
to be proportional
288
00:09:05,963 --> 00:09:06,935
to the students
289
00:09:06,959 --> 00:09:09,053
per 10 cm interval.
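The rescaling above can be written as a small Python sketch. It assumes the rule stated in the narration (frequency times minimum class width, divided by the class width); the function name `adjusted_height` and the dictionary layout are hypothetical, and only the two clubbed classes whose frequencies are stated (16 and 8) are used.

```python
def adjusted_height(frequency, class_width, min_width):
    """Bar height per `min_width` units: frequency * min_width / class_width."""
    return frequency * min_width / class_width


MIN_WIDTH = 10  # the smallest class width in the regrouped table

# (lower limit, upper limit) -> frequency, as stated in the narration
clubbed = {(130, 150): 16, (170, 200): 8}

for (lower, upper), freq in clubbed.items():
    width = upper - lower
    height = adjusted_height(freq, width, MIN_WIDTH)
    # Area check: height * (width / MIN_WIDTH) gives back the original frequency.
    print(f"{lower}-{upper}: height {height:.2f}, area {height * width / MIN_WIDTH:.0f}")
```

The area check in the loop is the point of the exercise: the bar gets shorter as the class gets wider, so that its area still stands for the same number of students.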
290
00:09:19,426 --> 00:09:20,568
Even though a bar graph
291
00:09:20,592 --> 00:09:22,002
and a histogram look alike,
292
00:09:22,026 --> 00:09:23,704
you might have noticed already
293
00:09:23,728 --> 00:09:25,454
that there are a few differences.
294
00:09:25,874 --> 00:09:28,241
In fact if I bring them together
295
00:09:28,265 --> 00:09:29,924
unless you are a statistician,
296
00:09:29,948 --> 00:09:30,944
chances are,
297
00:09:30,968 --> 00:09:32,629
that you will get confused.
298
00:09:32,653 --> 00:09:33,733
This exercise that
299
00:09:33,757 --> 00:09:34,676
we will do now,
300
00:09:34,700 --> 00:09:36,201
will help you sort out
301
00:09:36,225 --> 00:09:37,428
this confusion.
302
00:09:37,453 --> 00:09:38,363
If I ask you to
303
00:09:38,387 --> 00:09:40,629
collect data about language preferences
304
00:09:40,653 --> 00:09:42,105
of the students and
305
00:09:42,128 --> 00:09:43,155
add it to our
306
00:09:43,180 --> 00:09:44,224
original table.
307
00:09:44,435 --> 00:09:46,054
Now I will be able to
308
00:09:46,078 --> 00:09:47,192
draw a bar graph
309
00:09:47,217 --> 00:09:48,029
out of it.
310
00:09:48,420 --> 00:09:49,912
Now let's try to make
311
00:09:49,935 --> 00:09:51,388
a histogram out of the
312
00:09:51,412 --> 00:09:52,434
language data
313
00:09:52,458 --> 00:09:53,718
that we have collected.
314
00:09:54,201 --> 00:09:56,746
Is that even possible?
315
00:09:57,152 --> 00:10:00,091
Hmm. No, it is not.
316
00:10:00,719 --> 00:10:03,013
In fact, the data that you collect
317
00:10:03,037 --> 00:10:05,211
can be split into qualitative
318
00:10:05,234 --> 00:10:07,263
and quantitative data.
319
00:10:07,287 --> 00:10:08,573
If you're looking at
320
00:10:08,597 --> 00:10:09,667
the colors of the cars
321
00:10:09,691 --> 00:10:10,410
on the road,
322
00:10:10,434 --> 00:10:11,835
then the color of the car
323
00:10:11,859 --> 00:10:13,076
is data
324
00:10:13,100 --> 00:10:15,085
that is of the qualitative kind
325
00:10:15,109 --> 00:10:16,836
because this describes the
326
00:10:16,860 --> 00:10:18,914
quality of that particular data.
327
00:10:18,938 --> 00:10:20,213
Or if I ask you
328
00:10:20,236 --> 00:10:21,750
the flavor of ice cream
329
00:10:21,775 --> 00:10:22,468
that you like,
330
00:10:22,491 --> 00:10:24,957
that again is qualitative data.
331
00:10:24,982 --> 00:10:26,319
On the other hand,
332
00:10:26,343 --> 00:10:27,804
data such as heights,
333
00:10:27,829 --> 00:10:30,338
weights, roll numbers, etc.
334
00:10:30,361 --> 00:10:31,516
are data that are
335
00:10:31,540 --> 00:10:33,025
represented by numbers.
336
00:10:33,269 --> 00:10:37,086
Here, the height is 160 cm.
337
00:10:37,110 --> 00:10:39,886
160 is quantitative data
338
00:10:39,910 --> 00:10:41,306
since it refers to
339
00:10:41,330 --> 00:10:42,660
numerical data.
340
00:10:42,683 --> 00:10:44,063
From the examples
341
00:10:44,087 --> 00:10:45,305
that we have solved before,
342
00:10:45,329 --> 00:10:46,193
you can see
343
00:10:46,217 --> 00:10:47,476
that we can represent both
344
00:10:47,500 --> 00:10:50,135
qualitative and quantitative data
345
00:10:50,159 --> 00:10:51,212
on the bar graph.
346
00:10:51,236 --> 00:10:52,887
Whereas we can represent
347
00:10:52,911 --> 00:10:55,087
only quantitative data
348
00:10:55,111 --> 00:10:56,358
on a histogram.
349
00:10:56,382 --> 00:10:57,908
So the next time,
350
00:10:57,931 --> 00:10:59,356
you need to make a graph
351
00:10:59,380 --> 00:11:00,752
be sure to analyze
352
00:11:00,776 --> 00:11:02,147
what kind of data
353
00:11:02,171 --> 00:11:03,708
you are trying to represent.
354
00:11:04,069 --> 00:11:05,434
Now let's start making
355
00:11:05,458 --> 00:11:07,241
a difference table out here.
356
00:11:07,265 --> 00:11:08,597
And let's start populating
357
00:11:08,621 --> 00:11:10,343
the differences as we go about.
358
00:11:10,661 --> 00:11:11,736
Let's bring back the
359
00:11:11,761 --> 00:11:12,999
graph of heights
360
00:11:13,023 --> 00:11:14,647
from the bar graph section.
361
00:11:14,870 --> 00:11:16,173
Now we see that
362
00:11:16,197 --> 00:11:17,444
on the X axis,
363
00:11:17,468 --> 00:11:19,932
each data point is represented
364
00:11:19,956 --> 00:11:21,923
individually. For example,
365
00:11:21,947 --> 00:11:24,694
a student's height of 130 cm
366
00:11:24,718 --> 00:11:26,520
is represented individually
367
00:11:26,544 --> 00:11:29,810
as 130 cm on the X axis.
368
00:11:29,833 --> 00:11:32,020
This kind of data representation
369
00:11:32,044 --> 00:11:35,614
individually is called Discrete data.
370
00:11:35,638 --> 00:11:36,829
And we also
371
00:11:36,853 --> 00:11:37,682
know that we can
372
00:11:37,705 --> 00:11:38,952
construct a bar graph
373
00:11:38,976 --> 00:11:40,994
using grouped data as well.
374
00:11:41,018 --> 00:11:43,345
Grouped data here, refers to
375
00:11:43,369 --> 00:11:45,712
when a data point is represented
376
00:11:45,735 --> 00:11:47,586
not individually but as a
377
00:11:47,610 --> 00:11:49,667
continuous range of values.
378
00:11:49,888 --> 00:11:51,915
In case of discrete data,
379
00:11:51,939 --> 00:11:53,278
we can have gaps
380
00:11:53,302 --> 00:11:54,700
in between the values
381
00:11:54,724 --> 00:11:55,841
of data points
382
00:11:55,865 --> 00:11:56,950
on the X axis.
383
00:11:56,974 --> 00:11:58,478
The data that you collect
384
00:11:58,502 --> 00:12:00,216
can again be classified
385
00:12:00,240 --> 00:12:01,344
in one more way:
386
00:12:01,368 --> 00:12:04,256
as continuous and discrete data.
387
00:12:04,280 --> 00:12:05,314
When you're talking about
388
00:12:05,338 --> 00:12:07,517
discrete data, there can be
389
00:12:07,541 --> 00:12:08,687
gaps in the data
390
00:12:08,711 --> 00:12:09,446
that you collect.
391
00:12:09,678 --> 00:12:11,623
For example, 130 cm
392
00:12:11,648 --> 00:12:15,196
and 135 cm as heights of students
393
00:12:15,220 --> 00:12:16,615
have a gap of
394
00:12:16,639 --> 00:12:18,233
5 in between them.
395
00:12:18,257 --> 00:12:19,629
And when you're talking about
396
00:12:19,653 --> 00:12:20,850
continuous data,
397
00:12:20,874 --> 00:12:22,698
there cannot be these gaps
398
00:12:22,722 --> 00:12:23,944
that you see here.
399
00:12:23,968 --> 00:12:25,330
So, when it comes to a
400
00:12:25,354 --> 00:12:27,036
bar graph, you can represent
401
00:12:27,060 --> 00:12:30,146
both continuous and discrete data.
402
00:12:30,170 --> 00:12:31,629
But in a histogram
403
00:12:31,653 --> 00:12:33,199
you can represent only
404
00:12:33,222 --> 00:12:34,726
continuous data.
405
00:12:34,751 --> 00:12:36,605
This is another reason why
406
00:12:36,629 --> 00:12:38,279
bars of a bar graph
407
00:12:38,303 --> 00:12:39,989
are separated by a gap,
408
00:12:40,013 --> 00:12:42,238
since they are discrete values.
409
00:12:42,262 --> 00:12:43,749
Whereas in a histogram
410
00:12:43,773 --> 00:12:46,287
all the bars are clubbed together.
411
00:12:46,311 --> 00:12:47,705
We also cannot
412
00:12:47,729 --> 00:12:49,344
reorder this data
413
00:12:49,368 --> 00:12:50,914
in case of a histogram
414
00:12:50,939 --> 00:12:53,429
due to continuity of the data.
415
00:12:53,452 --> 00:12:56,167
Let's add these 2 points also
416
00:12:56,191 --> 00:12:58,319
into our comparison chart.
417
00:12:58,570 --> 00:13:00,386
Using continuous data means
418
00:13:00,409 --> 00:13:01,596
that the classes have to be
419
00:13:01,620 --> 00:13:03,057
ordered on the graph
420
00:13:03,081 --> 00:13:04,704
as they appear to us.
421
00:13:04,727 --> 00:13:05,875
On the other hand
422
00:13:05,899 --> 00:13:07,193
having discrete data
423
00:13:07,217 --> 00:13:08,156
in the bar graph
424
00:13:08,180 --> 00:13:10,023
allows you to arrange the variables
425
00:13:10,047 --> 00:13:12,097
in any way you want to.
426
00:13:12,121 --> 00:13:13,913
When I'm drawing a bar graph
427
00:13:13,937 --> 00:13:14,942
the order in which
428
00:13:14,967 --> 00:13:15,854
I show the elements
429
00:13:15,878 --> 00:13:16,900
on the X axis
430
00:13:16,924 --> 00:13:18,813
is not a problem at all.
431
00:13:18,837 --> 00:13:21,488
I can first show 130-140.
432
00:13:21,512 --> 00:13:23,719
Then show 150-160.
433
00:13:23,743 --> 00:13:26,580
And then I can have 140-150.
434
00:13:26,604 --> 00:13:28,004
But when it comes to a
435
00:13:28,028 --> 00:13:29,521
histogram, I cannot
436
00:13:29,545 --> 00:13:31,064
reorder the data.
437
00:13:31,088 --> 00:13:33,043
This is obvious because
438
00:13:33,067 --> 00:13:34,240
we are dealing with
439
00:13:34,264 --> 00:13:36,036
a continuous variable.
440
00:13:36,060 --> 00:13:37,338
This is one more
441
00:13:37,362 --> 00:13:39,046
for the comparison chart.
442
00:13:39,275 --> 00:13:40,957
As you've already seen,
443
00:13:40,981 --> 00:13:43,193
the spaces in between the bars
444
00:13:43,217 --> 00:13:45,245
are not present in the histogram.
445
00:13:45,269 --> 00:13:47,246
It essentially looks like
446
00:13:47,270 --> 00:13:48,718
one big block.
447
00:13:48,892 --> 00:13:50,742
Also the width of the bars
448
00:13:50,766 --> 00:13:52,169
need not be the same
449
00:13:52,193 --> 00:13:53,872
when it comes to a histogram.
450
00:13:54,244 --> 00:13:55,718
Also remember,
451
00:13:55,741 --> 00:13:57,207
that the area of the bar
452
00:13:57,231 --> 00:13:58,841
plays a huge role
453
00:13:58,865 --> 00:14:00,482
in a histogram and hence,
454
00:14:00,506 --> 00:14:02,712
we need to maintain uniformity
455
00:14:02,736 --> 00:14:04,575
of class width throughout.
456
00:14:04,726 --> 00:14:05,806
But in the case
457
00:14:05,831 --> 00:14:06,657
of a bar graph,
458
00:14:06,681 --> 00:14:07,644
the widths of the bars
459
00:14:07,668 --> 00:14:08,725
are immaterial
460
00:14:08,748 --> 00:14:09,992
to the interpretation
461
00:14:10,017 --> 00:14:11,065
of the bar graph.
462
00:14:21,355 --> 00:14:23,310
There is yet another visual way
463
00:14:23,334 --> 00:14:25,672
of representing quantitative data
464
00:14:25,696 --> 00:14:27,012
and its frequency.
465
00:14:27,389 --> 00:14:29,841
It's called the frequency polygon.
466
00:14:30,162 --> 00:14:31,736
Let's consider the histogram
467
00:14:31,760 --> 00:14:33,278
that we initially constructed
468
00:14:33,302 --> 00:14:35,416
with equal class intervals.
469
00:14:35,812 --> 00:14:37,711
Let me mark this point,
470
00:14:37,735 --> 00:14:38,740
which is the midpoint
471
00:14:38,764 --> 00:14:39,882
of the class interval
472
00:14:39,906 --> 00:14:41,830
of 130-140.
473
00:14:41,854 --> 00:14:43,450
I will call this point
474
00:14:43,474 --> 00:14:45,176
as the class mark.
475
00:14:45,200 --> 00:14:47,276
So class mark is a
476
00:14:47,301 --> 00:14:49,210
mathematical way of saying
477
00:14:49,233 --> 00:14:51,001
the midpoint of a class interval,
478
00:14:51,026 --> 00:14:52,170
which we obtain
479
00:14:52,194 --> 00:14:53,847
by adding the upper
480
00:14:53,871 --> 00:14:55,903
and lower limits of a class
481
00:14:55,927 --> 00:14:57,907
and dividing the sum by 2.
482
00:14:57,931 --> 00:14:59,020
If we consider
483
00:14:59,045 --> 00:15:01,449
the class interval of 150-160,
484
00:15:01,473 --> 00:15:02,820
its class mark is
485
00:15:02,844 --> 00:15:05,228
(150+160)/2
486
00:15:05,252 --> 00:15:07,380
which is going to be 155.
487
00:15:07,404 --> 00:15:09,120
Next I will highlight
488
00:15:09,144 --> 00:15:10,028
the class marks
489
00:15:10,052 --> 00:15:11,836
for all other class intervals
490
00:15:11,860 --> 00:15:12,625
as well.
491
00:15:12,934 --> 00:15:14,785
For a frequency polygon,
492
00:15:14,809 --> 00:15:15,813
all I have to do
493
00:15:15,837 --> 00:15:16,717
is to connect
494
00:15:16,741 --> 00:15:18,071
all of these dots.
495
00:15:18,095 --> 00:15:19,553
Or connect all of these
496
00:15:19,577 --> 00:15:21,359
class marks. Well.
497
00:15:21,383 --> 00:15:24,316
I said frequency polygon.
498
00:15:24,340 --> 00:15:26,119
But what is a polygon?
499
00:15:26,143 --> 00:15:27,711
A polygon is a
500
00:15:27,736 --> 00:15:29,390
multi-sided shape.
501
00:15:29,414 --> 00:15:30,659
But before all
502
00:15:30,683 --> 00:15:32,840
it is a closed shape.
503
00:15:33,207 --> 00:15:35,007
So how do we get that?
504
00:15:35,031 --> 00:15:36,673
We add a class interval
505
00:15:36,696 --> 00:15:37,920
before the first one
506
00:15:37,944 --> 00:15:38,933
in the data
507
00:15:38,957 --> 00:15:40,037
and do the same
508
00:15:40,062 --> 00:15:41,117
in the other end
509
00:15:41,141 --> 00:15:42,949
of the histogram as well.
510
00:15:42,972 --> 00:15:45,028
Since, the first class interval
511
00:15:45,053 --> 00:15:48,615
is 130-140 we add another
512
00:15:48,639 --> 00:15:50,690
with 120-130.
513
00:15:50,714 --> 00:15:53,061
This class interval will of course
514
00:15:53,085 --> 00:15:54,711
have a frequency of zero,
515
00:15:54,735 --> 00:15:55,632
since it is not
516
00:15:55,656 --> 00:15:57,364
represented in the table.
517
00:15:57,388 --> 00:15:58,608
We just have to
518
00:15:58,631 --> 00:16:00,006
mark the class mark
519
00:16:00,031 --> 00:16:01,369
for this group. That is
520
00:16:01,393 --> 00:16:04,344
(130+120)/2
521
00:16:04,368 --> 00:16:06,248
which is going to be 125.
522
00:16:06,477 --> 00:16:08,061
And then we are done.
523
00:16:08,085 --> 00:16:09,764
We can now connect the line
524
00:16:09,788 --> 00:16:11,245
to the X axis.
525
00:16:11,269 --> 00:16:12,209
Doing the same
526
00:16:12,233 --> 00:16:13,263
on the other end,
527
00:16:13,286 --> 00:16:16,116
we add 200-210
528
00:16:16,140 --> 00:16:17,898
to the frequency polygon graph.
529
00:16:17,923 --> 00:16:21,126
Marking the class mark as 205
530
00:16:21,150 --> 00:16:22,663
and closing the figure
531
00:16:22,687 --> 00:16:23,735
at both ends
532
00:16:23,759 --> 00:16:26,035
gives us the frequency polygon.
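The class marks used for the frequency polygon, including the two empty classes added at the ends, can be sketched in Python. The function names `class_mark` and `padded_class_marks` are hypothetical, and the sketch assumes equal class widths of 10 from 130 to 200, as in the equal-interval histogram.

```python
def class_mark(lower, upper):
    """Midpoint of a class interval: (lower + upper) / 2."""
    return (lower + upper) / 2


def padded_class_marks(first_lower, last_upper, width):
    """Class marks for every class, plus one empty class added at each end."""
    marks = []
    lower = first_lower - width          # extra class before the first one
    while lower < last_upper + width:    # stops after the extra class at the end
        marks.append(class_mark(lower, lower + width))
        lower += width
    return marks


marks = padded_class_marks(130, 200, 10)
print(marks)
```

The first and last entries belong to the added classes 120-130 and 200-210, whose frequency is zero, which is what lets the line come back down to the X axis and close the shape.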
533
00:16:26,324 --> 00:16:28,111
Instead of drawing the entire
534
00:16:28,135 --> 00:16:29,658
bar of a histogram,
535
00:16:29,682 --> 00:16:32,448
you just mark the frequency levels
536
00:16:32,472 --> 00:16:33,935
against the Y axis
537
00:16:33,959 --> 00:16:35,347
at the class mark.
538
00:16:35,371 --> 00:16:36,826
Just like a histogram,
539
00:16:36,850 --> 00:16:39,342
a frequency polygon's total area
540
00:16:39,366 --> 00:16:41,043
is directly proportional
541
00:16:41,067 --> 00:16:42,834
to the total frequency
542
00:16:42,858 --> 00:16:44,115
of the table.
543
00:16:44,139 --> 00:16:45,468
For the sake of convenience,
544
00:16:45,492 --> 00:16:46,892
let's bring back the histogram
545
00:16:46,916 --> 00:16:47,539
that we drew
546
00:16:47,563 --> 00:16:48,816
in the previous sections.
547
00:16:48,840 --> 00:16:50,202
Let's take the graph
548
00:16:50,226 --> 00:16:51,611
where the frequency polygon
549
00:16:51,634 --> 00:16:53,808
is drawn over the histogram.
550
00:16:53,832 --> 00:16:56,197
So if I join the class marks
551
00:16:56,222 --> 00:16:56,981
you can see
552
00:16:57,005 --> 00:16:58,393
that chunks of area
553
00:16:58,417 --> 00:17:00,656
are being left out of the calculation.
554
00:17:00,870 --> 00:17:02,032
There are also a few
555
00:17:02,057 --> 00:17:03,150
empty areas
556
00:17:03,174 --> 00:17:05,328
inside the frequency polygon.
557
00:17:05,673 --> 00:17:06,584
To prove to you
558
00:17:06,608 --> 00:17:07,527
that the area of the
559
00:17:07,551 --> 00:17:08,664
frequency polygon
560
00:17:08,688 --> 00:17:10,054
and that of the histogram
561
00:17:10,079 --> 00:17:11,001
are the same,
562
00:17:11,025 --> 00:17:12,494
I will cut the part
563
00:17:12,517 --> 00:17:14,192
which is outside the line.
564
00:17:14,216 --> 00:17:16,253
Flip it all over and see
565
00:17:16,277 --> 00:17:18,152
that it fits exactly
566
00:17:18,176 --> 00:17:20,311
into the empty area here.
567
00:17:20,336 --> 00:17:21,945
The same can be done
568
00:17:21,969 --> 00:17:23,484
for all the bars.
569
00:17:23,826 --> 00:17:25,933
So eventually, we see that
570
00:17:25,957 --> 00:17:27,350
all the triangles
571
00:17:27,375 --> 00:17:29,248
cut off by the line we drew
572
00:17:29,271 --> 00:17:30,669
are included within this
573
00:17:30,693 --> 00:17:32,808
frequency polygon. And hence,
574
00:17:32,832 --> 00:17:34,304
we can visually say
575
00:17:34,328 --> 00:17:35,472
that the total area
576
00:17:35,496 --> 00:17:36,802
of the frequency polygon
577
00:17:36,825 --> 00:17:37,648
is equal to the
578
00:17:37,673 --> 00:17:39,480
total area of the histogram
579
00:17:39,504 --> 00:17:41,133
made by the same data.
580
00:17:41,526 --> 00:17:43,038
Also the area
581
00:17:43,062 --> 00:17:45,496
is proportional to the frequency.
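The cut-and-flip argument above can also be checked with numbers. This is only a rough sketch, assuming equal class widths; the frequencies and the width of 5 are made-up values, not the ones from the histogram in the video.

```python
# Hypothetical data: frequencies of five classes, each 5 units wide.
freqs = [4, 7, 10, 6, 3]
width = 5

def histogram_area(frequencies, width):
    """Sum of bar areas: each bar is frequency (height) times class width."""
    return sum(f * width for f in frequencies)

def polygon_area(frequencies, width):
    """Area under the frequency polygon, closed with a zero-frequency
    class at each end. Consecutive class marks are one class width apart,
    so each strip between two marks is a trapezoid."""
    heights = [0] + list(frequencies) + [0]
    return sum((a + b) / 2 * width for a, b in zip(heights, heights[1:]))

print(histogram_area(freqs, width))  # 150
print(polygon_area(freqs, width))    # 150.0
```

Both areas come out equal, and both are simply the class width times the total frequency (5 x 30 here), which is why the area is proportional to the frequency.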
582
00:17:55,723 --> 00:17:56,927
So till now,
583
00:17:56,951 --> 00:17:57,875
we have learnt about
584
00:17:57,899 --> 00:17:59,411
raw data and how
585
00:17:59,435 --> 00:18:00,922
unless it has context,
586
00:18:00,946 --> 00:18:03,563
it is useless. Raw data
587
00:18:03,587 --> 00:18:05,168
can also be made useful
588
00:18:05,192 --> 00:18:06,430
by processing it.
589
00:18:06,454 --> 00:18:07,691
We process data
590
00:18:07,715 --> 00:18:09,278
by means of statistics,
591
00:18:09,302 --> 00:18:10,669
using methods such as
592
00:18:10,693 --> 00:18:13,209
creating a frequency distribution table.
593
00:18:13,233 --> 00:18:14,657
Frequency is the
594
00:18:14,680 --> 00:18:15,654
number of times
595
00:18:15,679 --> 00:18:16,892
a particular data value
596
00:18:16,916 --> 00:18:18,695
appears in a data set.
597
00:18:18,719 --> 00:18:20,278
When the heights
598
00:18:20,303 --> 00:18:21,806
are considered individually,
599
00:18:21,830 --> 00:18:24,318
we call it an ungrouped data set.
600
00:18:24,341 --> 00:18:25,653
It was too much data
601
00:18:25,678 --> 00:18:26,460
to deal with.
602
00:18:26,484 --> 00:18:27,458
So we then
603
00:18:27,482 --> 00:18:29,151
clubbed the heights to create
604
00:18:29,174 --> 00:18:31,217
grouped data and a
605
00:18:31,241 --> 00:18:33,645
grouped frequency distribution table.
606
00:18:33,669 --> 00:18:35,143
We then decided
607
00:18:35,167 --> 00:18:37,067
that numbers are altogether
608
00:18:37,091 --> 00:18:38,005
too boring,
609
00:18:38,029 --> 00:18:39,041
and came up with
610
00:18:39,065 --> 00:18:42,210
graphical methods of representing data.
611
00:18:42,234 --> 00:18:44,360
This includes bar graphs,
612
00:18:44,384 --> 00:18:47,291
histograms and frequency polygons.
613
00:18:47,685 --> 00:18:49,444
A bar graph is an excellent
614
00:18:49,468 --> 00:18:50,626
comparative tool
615
00:18:50,650 --> 00:18:52,036
and is used mostly
616
00:18:52,060 --> 00:18:53,914
in non-numerical contexts,
617
00:18:53,938 --> 00:18:56,101
such as comparing 2 items.
618
00:18:56,125 --> 00:18:57,857
The bars in a bar graph
619
00:18:57,880 --> 00:18:59,992
are typically of the same width,
620
00:19:00,016 --> 00:19:01,407
and the area they occupy
621
00:19:01,432 --> 00:19:03,209
carries no meaning.
622
00:19:03,232 --> 00:19:04,738
In contrast,
623
00:19:04,763 --> 00:19:05,882
in a histogram
624
00:19:05,906 --> 00:19:07,542
the dimensions of the bars
625
00:19:07,566 --> 00:19:08,968
are crucial.
626
00:19:09,196 --> 00:19:10,705
The area of the bar
627
00:19:10,729 --> 00:19:12,214
is directly proportional
628
00:19:12,238 --> 00:19:13,510
to its frequency.
629
00:19:13,534 --> 00:19:15,259
Consequently,
630
00:19:15,283 --> 00:19:16,920
the width of the class intervals
631
00:19:16,943 --> 00:19:18,560
must be taken into account
632
00:19:18,585 --> 00:19:20,067
whenever you're attempting
633
00:19:20,090 --> 00:19:22,169
to answer relevant questions.
634
00:19:22,193 --> 00:19:23,288
Whenever the width of the
635
00:19:23,312 --> 00:19:25,113
class intervals is non-uniform,
636
00:19:25,137 --> 00:19:27,176
use the minimum class interval
637
00:19:27,200 --> 00:19:28,767
as a standard
638
00:19:28,791 --> 00:19:30,549
and use cross multiplication
639
00:19:30,573 --> 00:19:32,439
to get an accurate representation
640
00:19:32,463 --> 00:19:34,041
of data on the graph.
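Here is a small sketch of that cross multiplication for non-uniform class widths. The class limits and frequencies are invented for illustration, and the function name `adjusted_heights` is a hypothetical helper, not something from the video.

```python
def adjusted_heights(classes):
    """Rescale each frequency to the minimum class width.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    Bar height = frequency * minimum width / width of that class,
    so that bar AREA stays proportional to frequency.
    """
    min_width = min(upper - lower for lower, upper, _ in classes)
    return [freq * min_width / (upper - lower)
            for lower, upper, freq in classes]

# Hypothetical table: widths are 10, 10, 20 and 30.
table = [(0, 10, 5), (10, 20, 8), (20, 40, 12), (40, 70, 9)]
print(adjusted_heights(table))  # [5.0, 8.0, 6.0, 3.0]
```

The class 20-40 has the highest frequency (12), but because it is twice as wide as the standard, its bar is drawn only 6 units tall.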
641
00:19:34,358 --> 00:19:36,632
When it comes to frequency polygons,
642
00:19:36,656 --> 00:19:37,753
the only thing that
643
00:19:37,777 --> 00:19:38,831
you need to do differently
644
00:19:38,855 --> 00:19:39,700
from a histogram,
645
00:19:39,724 --> 00:19:41,004
is to mark the
646
00:19:41,028 --> 00:19:42,700
class mark on the graph.
647
00:19:43,024 --> 00:19:45,287
The class mark is the midpoint
648
00:19:45,312 --> 00:19:47,105
of each class interval.
649
00:19:47,129 --> 00:19:49,259
Instead of an entire bar,
650
00:19:49,282 --> 00:19:51,136
you only make one mark.
651
00:19:51,160 --> 00:19:52,272
Then you connect
652
00:19:52,296 --> 00:19:53,522
all of these dots
653
00:19:53,547 --> 00:19:55,007
and get a line.
654
00:19:55,031 --> 00:19:56,738
To make a frequency polygon
655
00:19:56,761 --> 00:19:57,369
out of this,
656
00:19:57,393 --> 00:19:57,998
you need to
657
00:19:58,023 --> 00:19:59,067
close the figure.
658
00:19:59,090 --> 00:20:01,039
To do this, add a class
659
00:20:01,063 --> 00:20:01,999
before the first class
660
00:20:02,023 --> 00:20:03,781
and after the last class,
661
00:20:03,805 --> 00:20:04,896
with the same width
662
00:20:04,920 --> 00:20:06,361
as the width of the first
663
00:20:06,385 --> 00:20:08,570
and the last classes respectively.
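The steps for a frequency polygon can be sketched roughly like this. The class limits and frequencies are made-up heights data, and `frequency_polygon_points` is a hypothetical helper name used only for this illustration.

```python
def frequency_polygon_points(classes):
    """Return the (class mark, frequency) points of a closed frequency polygon.

    classes: list of (lower_limit, upper_limit, frequency) tuples, in order.
    A zero-frequency class is added before the first class (same width as
    the first) and after the last class (same width as the last).
    """
    first_lower, first_upper, _ = classes[0]
    last_lower, last_upper, _ = classes[-1]
    before = (first_lower - (first_upper - first_lower), first_lower, 0)
    after = (last_upper, last_upper + (last_upper - last_lower), 0)
    closed = [before] + list(classes) + [after]
    # Class mark = midpoint of the class interval.
    return [((lower + upper) / 2, freq) for lower, upper, freq in closed]

# Hypothetical grouped heights (in cm).
heights = [(140, 150, 4), (150, 160, 9), (160, 170, 5)]
print(frequency_polygon_points(heights))
# [(135.0, 0), (145.0, 4), (155.0, 9), (165.0, 5), (175.0, 0)]
```

Joining these points in order gives the polygon; the two zero-frequency points at 135 and 175 are what bring the line back down to the X axis and close the figure.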