1
00:00:08,341 --> 00:00:11,576
The future
unfolds before our eyes...
2
00:00:11,611 --> 00:00:14,212
but is it always beyond our grasp?
3
00:00:14,247 --> 00:00:17,582
What was once the province
of the gods
4
00:00:17,617 --> 00:00:21,820
has now come more clearly into view
5
00:00:21,855 --> 00:00:24,856
through mathematics and data.
6
00:00:24,891 --> 00:00:27,893
Out of some early observations
about gambling
7
00:00:27,928 --> 00:00:30,729
arose tools that guide
8
00:00:30,764 --> 00:00:32,964
our scientific understanding
of the world
9
00:00:32,999 --> 00:00:34,766
and more...
10
00:00:34,801 --> 00:00:38,136
through the power of prediction.
11
00:00:41,675 --> 00:00:43,341
From our decisions
about the weather...
12
00:00:43,376 --> 00:00:46,044
The strongest
hurricane ever on record...
13
00:00:46,079 --> 00:00:47,746
To finding
someone lost at sea...
14
00:00:47,781 --> 00:00:48,914
Commencing search pattern.
15
00:00:48,949 --> 00:00:50,348
Keep a good look out!
16
00:00:50,383 --> 00:00:52,884
Every day
mathematics and data combine
17
00:00:52,919 --> 00:00:55,854
to help us envision
what might be...
18
00:00:55,889 --> 00:00:59,524
It's the best crystal ball
that humankind can have.
19
00:00:59,559 --> 00:01:02,661
Take a trip
on the wings of probability
20
00:01:02,696 --> 00:01:04,429
into the future.
21
00:01:04,464 --> 00:01:05,464
We're thinking about luck
22
00:01:05,499 --> 00:01:06,832
or misfortune,
23
00:01:06,867 --> 00:01:09,268
but they just basically are
a question of math, right?
24
00:01:11,571 --> 00:01:14,072
"Prediction by the Numbers" --
25
00:01:14,107 --> 00:01:26,351
right now, on "NOVA."
26
00:01:26,386 --> 00:01:29,688
The Orange County
Fair, held in Southern California.
27
00:01:31,992 --> 00:01:35,560
In theory, these crowds
hold a predictive power
28
00:01:35,595 --> 00:01:38,296
that can have startling accuracy,
29
00:01:38,331 --> 00:01:43,001
but it doesn't belong to any
individual, only the group.
30
00:01:43,036 --> 00:01:44,636
And even then, it has to be viewed
31
00:01:44,671 --> 00:01:49,641
through the lens of mathematics.
32
00:01:49,676 --> 00:01:53,078
The theory is known as
the "wisdom of crowds,"
33
00:01:53,113 --> 00:01:57,582
a phenomenon first documented
about a hundred years ago.
34
00:01:57,617 --> 00:02:00,152
Statistician
Talithia Williams is here
35
00:02:00,187 --> 00:02:03,155
to see if the theory checks out,
and to spend some time
36
00:02:03,190 --> 00:02:06,358
with the fair's most beloved animal,
37
00:02:06,393 --> 00:02:09,427
Patches, a 14-year-old ox.
38
00:02:11,932 --> 00:02:14,232
It was a fair
kind of like this one
39
00:02:14,267 --> 00:02:16,268
where, in 1906,
40
00:02:16,303 --> 00:02:19,504
Sir Francis Galton came across a contest
41
00:02:19,539 --> 00:02:22,541
where you had to guess
the weight of an ox,
42
00:02:22,576 --> 00:02:24,943
like Patches
you see here behind me.
43
00:02:26,746 --> 00:02:29,281
After the ox
weight-guessing contest was over,
44
00:02:29,316 --> 00:02:34,719
Galton took all the entries home
and analyzed them statistically.
45
00:02:34,754 --> 00:02:36,388
To his surprise,
46
00:02:36,423 --> 00:02:39,357
while none of the individual
guesses were correct,
47
00:02:39,392 --> 00:02:42,194
the average of all the guesses
48
00:02:42,229 --> 00:02:45,096
was off by less than one percent.
49
00:02:45,131 --> 00:02:47,866
That's the wisdom of crowds.
50
00:02:47,901 --> 00:02:51,336
But is it still true?
51
00:02:51,371 --> 00:02:54,005
So, here's how I think
we can test that today.
52
00:02:54,040 --> 00:02:57,375
What if we ask a random sample
of people here at the fair
53
00:02:57,410 --> 00:03:00,912
if they can guess how many jelly
beans they think are in the jar?
54
00:03:00,947 --> 00:03:03,615
And then, we take those numbers
and average them
55
00:03:03,650 --> 00:03:05,116
and see if that's actually close
56
00:03:05,151 --> 00:03:07,586
to the true number of jelly beans.
57
00:03:09,422 --> 00:03:12,090
Guess how many jelly beans
are in here.
58
00:03:12,125 --> 00:03:14,025
Come on, guys, everybody's
got to have their guess.
59
00:03:14,060 --> 00:03:15,560
I see your mind churning.
60
00:03:15,595 --> 00:03:16,428
1,227.
61
00:03:16,463 --> 00:03:18,063
846.
62
00:03:18,098 --> 00:03:19,731
Probably like 925?
63
00:03:19,766 --> 00:03:21,199
I think a thousand.
64
00:03:21,234 --> 00:03:22,734
So just write your number down.
65
00:03:22,769 --> 00:03:24,169
Uh huh, there you go.
66
00:03:24,204 --> 00:03:26,538
Can I have a jelly bean?
67
00:03:28,642 --> 00:03:32,911
The 135 guesses
gathered from the crowd vary wildly.
68
00:03:32,946 --> 00:03:35,880
The range of our guesses was,
69
00:03:35,915 --> 00:03:40,185
from the smallest was 183,
the largest was 12,000.
70
00:03:40,220 --> 00:03:42,721
So you can tell,
folks were really guessing.
71
00:03:42,756 --> 00:03:49,160
But when we take the average
of our guesses, we get 1,522.
72
00:03:49,195 --> 00:03:50,996
So the question is,
73
00:03:51,031 --> 00:03:54,899
how close is our average to the
actual number of jelly beans?
74
00:03:54,934 --> 00:03:58,536
Well, now's the moment of truth.
75
00:04:09,916 --> 00:04:15,120
All right, so the real number
of jelly beans was 1,676.
76
00:04:15,155 --> 00:04:19,791
The average of our guesses was
off by less than ten percent.
77
00:04:19,826 --> 00:04:22,160
So there actually
was some wisdom in our crowd.
78
00:04:22,195 --> 00:04:25,697
Though off by about ten percent,
79
00:04:25,732 --> 00:04:27,265
the average of the crowd's estimates
80
00:04:27,300 --> 00:04:29,868
was still more accurate
81
00:04:29,903 --> 00:04:32,637
than the vast majority
of the individual guesses.
82
00:04:32,672 --> 00:04:36,508
Even so, the wisdom of crowds
does have limits.
83
00:04:36,543 --> 00:04:40,178
It can be easily undermined
by outside influences
84
00:04:40,213 --> 00:04:43,682
and tends to work best on
questions with clear answers,
85
00:04:43,717 --> 00:04:44,883
like a number.
86
00:04:44,918 --> 00:04:47,018
The steps Talithia took
87
00:04:47,053 --> 00:04:50,955
reflect a process going on
all around us these days
88
00:04:50,990 --> 00:04:53,358
in the work of statisticians.
89
00:04:53,393 --> 00:04:54,726
Thanks, everybody.
90
00:04:54,761 --> 00:04:55,960
So we collected this data,
91
00:04:55,995 --> 00:04:58,763
right, we analyzed it mathematically,
92
00:04:58,798 --> 00:05:01,366
and we got an estimate
that was pretty close
93
00:05:01,401 --> 00:05:03,435
to the actual true value.
94
00:05:03,470 --> 00:05:06,471
That's math and statistics
at work.
95
00:05:10,143 --> 00:05:14,579
We didn't always use
math and statistics to make predictions.
96
00:05:14,614 --> 00:05:19,617
The Romans studied the flights
and cries of birds.
97
00:05:19,652 --> 00:05:23,822
The Chinese cracked "oracle"
bones with a hot metal rod
98
00:05:23,857 --> 00:05:25,857
and read the results.
99
00:05:25,892 --> 00:05:28,827
19th-century Russians
used chickens.
100
00:05:30,930 --> 00:05:34,032
Throughout history,
we've sought the future
101
00:05:34,067 --> 00:05:36,534
in moles on people's faces,
102
00:05:36,569 --> 00:05:38,870
clouds in the sky,
103
00:05:38,905 --> 00:05:41,973
or a pearl cast into an iron pot.
104
00:05:42,008 --> 00:05:48,080
And that list of things used
for predicting goes on and on.
105
00:05:53,753 --> 00:05:57,789
But more recently-- that is
the last couple hundred years--
106
00:05:57,824 --> 00:06:00,892
to see into the future,
we've turned to science
107
00:06:00,927 --> 00:06:05,029
and made some remarkable predictions
108
00:06:05,064 --> 00:06:08,600
from the existence of Neptune,
109
00:06:08,635 --> 00:06:11,936
or radio waves,
110
00:06:11,971 --> 00:06:15,206
or black holes,
111
00:06:15,241 --> 00:06:19,611
to the future location
of a comet with such precision
112
00:06:19,646 --> 00:06:21,446
we could land a space probe
on it.
113
00:06:23,316 --> 00:06:26,050
But if you pop the hood
of science,
114
00:06:26,085 --> 00:06:29,354
inside you'll find a field
of applied mathematics
115
00:06:29,389 --> 00:06:32,757
that's made many of those
predictions possible:
116
00:06:32,792 --> 00:06:34,659
statistics.
117
00:06:34,694 --> 00:06:36,828
Statistics is kind of unique.
118
00:06:36,863 --> 00:06:39,998
It's not an empirical science
itself, but it's not pure math,
119
00:06:40,033 --> 00:06:41,900
but it's not philosophy either.
120
00:06:41,935 --> 00:06:45,170
It's the framework, the language,
121
00:06:45,205 --> 00:06:49,407
the rules by which we do science.
122
00:06:49,442 --> 00:06:51,276
From that, we can make decisions,
123
00:06:51,311 --> 00:06:54,646
we can make conclusions,
we can make predictions.
124
00:06:54,681 --> 00:06:56,815
That's what... that's what
statisticians try to do.
125
00:06:56,850 --> 00:06:59,751
Why I love statistics is that
126
00:06:59,786 --> 00:07:02,787
it predicts the likelihood
of future occurrences,
127
00:07:02,822 --> 00:07:06,925
which really means
it's the best crystal ball
128
00:07:06,960 --> 00:07:08,893
that humankind can have.
129
00:07:10,930 --> 00:07:14,232
Ultimately, all
the predictive power of statistics
130
00:07:14,267 --> 00:07:18,536
rests on a revolutionary insight
from about 500 years ago--
131
00:07:18,571 --> 00:07:21,873
that chance itself can be tamed
132
00:07:21,908 --> 00:07:24,543
through the mathematics
of probability.
133
00:07:26,813 --> 00:07:29,414
Viva Las Vegas!
134
00:07:29,449 --> 00:07:33,384
Here's a city full of palaces
135
00:07:33,419 --> 00:07:36,254
built on understanding
probability
136
00:07:36,289 --> 00:07:38,490
and fueled by gambling,
137
00:07:38,525 --> 00:07:40,592
which may seem a funny place
138
00:07:40,627 --> 00:07:42,760
to find mathematician Keith Devlin.
139
00:07:42,795 --> 00:07:45,096
But mathematics and gambling
140
00:07:45,131 --> 00:07:47,832
have been tied together
for centuries.
141
00:07:47,867 --> 00:07:51,269
Today in a casino, you'll find roulette,
142
00:07:51,304 --> 00:07:52,737
slot machines,
143
00:07:52,772 --> 00:07:54,572
blackjack.
144
00:07:54,607 --> 00:07:58,676
Playing craps is also known
as "rolling the bones,"
145
00:07:58,711 --> 00:08:00,545
which is more accurate
than you might think.
146
00:08:00,580 --> 00:08:02,580
Humans have been gambling
147
00:08:02,615 --> 00:08:05,383
since the beginnings
of modern civilization.
148
00:08:05,418 --> 00:08:07,719
The ancient Greeks, the ancient Egyptians,
149
00:08:07,754 --> 00:08:11,856
would use the ankle bones of
sheep as a form of early dice.
150
00:08:11,891 --> 00:08:16,327
Surprisingly, while
the Greeks laid the foundation
151
00:08:16,362 --> 00:08:19,831
for our mathematics,
they didn't spend any effort
152
00:08:19,866 --> 00:08:22,166
trying to analyze games of chance.
153
00:08:22,201 --> 00:08:24,936
It seems to have
never occurred to them,
154
00:08:24,971 --> 00:08:28,806
or indeed to anybody way up
until the 15th, 16th century,
155
00:08:28,841 --> 00:08:30,575
that you could apply mathematics
156
00:08:30,610 --> 00:08:33,144
to calculate the way
these games would come out.
157
00:08:35,582 --> 00:08:39,450
16th-century Italian
mathematician Gerolamo Cardano
158
00:08:39,485 --> 00:08:42,086
made a key early observation:
159
00:08:42,121 --> 00:08:45,623
that the more times
a game of chance is played,
160
00:08:45,658 --> 00:08:47,959
the better mathematical
probability
161
00:08:47,994 --> 00:08:49,794
predicts the outcome,
162
00:08:49,829 --> 00:08:54,399
later proven as
the law of large numbers.
163
00:08:54,434 --> 00:08:59,704
Examples of the law of large
numbers at work surround us.
164
00:08:59,739 --> 00:09:01,105
When I flip this coin,
165
00:09:01,140 --> 00:09:02,840
we have no way of knowing
166
00:09:02,875 --> 00:09:04,809
whether it's going
to come up heads or tails.
167
00:09:07,513 --> 00:09:09,013
That time it was heads.
168
00:09:09,048 --> 00:09:13,484
On the other hand, if I were
to toss a coin 100 times,
169
00:09:13,519 --> 00:09:17,522
roughly 50% of the time
it would come up heads,
170
00:09:17,557 --> 00:09:19,724
and 50% of the time
it would come up tails.
171
00:09:19,759 --> 00:09:21,159
We can't predict a single toss.
172
00:09:21,194 --> 00:09:25,630
We can predict the aggregate
behavior over 100 tosses.
173
00:09:25,665 --> 00:09:27,498
That's the law of large numbers.
174
00:09:27,533 --> 00:09:31,102
In fact, casinos are a testament
175
00:09:31,137 --> 00:09:33,871
to the iron hand of
the law of large numbers.
176
00:09:33,906 --> 00:09:35,239
The games are designed
177
00:09:35,274 --> 00:09:38,843
to give the casinos
a slight edge over the gambler.
178
00:09:38,878 --> 00:09:41,346
Take American roulette:
179
00:09:41,381 --> 00:09:44,649
on the wheel are
the numbers one through 36,
180
00:09:44,684 --> 00:09:46,718
half red and half black.
181
00:09:46,753 --> 00:09:49,420
Betting a dollar on one color
or the other
182
00:09:49,455 --> 00:09:52,457
seems like a 50-50 proposition.
183
00:09:52,492 --> 00:09:56,227
But the wheel also has
two green slots with zeros.
184
00:09:56,262 --> 00:09:57,528
If the ball lands in those,
185
00:09:57,563 --> 00:10:01,366
the casino wins all the bets
on either red or black.
186
00:10:01,401 --> 00:10:03,434
And that's the kind of edge
187
00:10:03,469 --> 00:10:07,505
that makes the casino money
over the long run.
188
00:10:07,540 --> 00:10:09,007
Customers are gambling.
189
00:10:09,042 --> 00:10:10,708
The casino is absolutely
not gambling.
190
00:10:10,743 --> 00:10:12,276
Because they may lose money,
191
00:10:12,311 --> 00:10:16,180
they may lose a lot of money
to one or two players,
192
00:10:16,215 --> 00:10:18,016
but if you have thousands
and thousands of players,
193
00:10:18,051 --> 00:10:19,517
by the law of large numbers,
194
00:10:19,552 --> 00:10:22,487
you are guaranteed to make money.
195
00:10:24,490 --> 00:10:25,890
The law of large numbers
196
00:10:25,925 --> 00:10:28,493
comes into play
outside of gambling too.
197
00:10:28,528 --> 00:10:31,396
In basketball, a field goal or
shooting percentage
198
00:10:31,431 --> 00:10:34,766
is the number of baskets made
199
00:10:34,801 --> 00:10:38,369
divided by the number of shots taken.
200
00:10:38,404 --> 00:10:41,105
But early in the season,
201
00:10:41,140 --> 00:10:43,441
when it's based on
a low number of attempts,
202
00:10:43,476 --> 00:10:45,410
that percentage can be misleading.
203
00:10:45,445 --> 00:10:48,246
At the beginning
of the season, a less skilled player
204
00:10:48,281 --> 00:10:52,216
might get off a few lucky shots
in a row.
205
00:10:52,251 --> 00:10:53,451
And at that point,
206
00:10:53,486 --> 00:10:55,119
they'd have a super-high
shooting percentage.
207
00:11:00,560 --> 00:11:02,126
Meanwhile, a very skilled player
208
00:11:02,161 --> 00:11:06,164
might miss a few at the
beginning of the season
209
00:11:06,199 --> 00:11:07,498
and have a low shooting percentage.
210
00:11:08,534 --> 00:11:10,134
But as the season goes on,
211
00:11:10,169 --> 00:11:11,703
and the total number of shots climbs,
212
00:11:11,738 --> 00:11:15,306
their shooting percentages
will soon reflect
213
00:11:15,341 --> 00:11:16,674
their true skill level.
214
00:11:18,444 --> 00:11:21,012
That's the law of large numbers
at work.
215
00:11:21,047 --> 00:11:26,317
A small sample, like just
a few shots, can be deceptive,
216
00:11:26,352 --> 00:11:29,454
while a large sample, like a lot of shots,
217
00:11:29,489 --> 00:11:31,522
gives you a better picture.
218
00:11:39,165 --> 00:11:43,034
The gambling observations that
led to the law of large numbers
219
00:11:43,069 --> 00:11:47,338
were a start, but what really
launched probability theory
220
00:11:47,373 --> 00:11:50,241
and opened up a door to a whole new way
221
00:11:50,276 --> 00:11:52,643
of thinking about the future,
222
00:11:52,678 --> 00:11:54,078
was a series of letters
223
00:11:54,113 --> 00:11:56,848
exchanged between
two French mathematicians,
224
00:11:56,883 --> 00:12:02,453
Blaise Pascal and
Pierre de Fermat in the 1650s,
225
00:12:02,488 --> 00:12:04,822
about another gambling problem
that had been kicking around
226
00:12:04,857 --> 00:12:07,358
for a few centuries.
227
00:12:07,393 --> 00:12:11,062
A simplified version of
the problem goes like this:
228
00:12:11,097 --> 00:12:14,165
two players-- let's call them
Blaise and Pierre--
229
00:12:14,200 --> 00:12:16,134
are flipping a coin.
230
00:12:16,169 --> 00:12:19,971
Blaise has chosen heads,
and Pierre tails.
231
00:12:20,006 --> 00:12:22,840
The game is the best of 5 flips,
232
00:12:22,875 --> 00:12:26,144
and each has put money
into the pot.
233
00:12:26,179 --> 00:12:30,715
They flip the coin three times,
and Blaise is ahead two to one.
234
00:12:32,151 --> 00:12:35,586
But then the game is interrupted.
235
00:12:35,621 --> 00:12:39,657
What is the fair way to split the pot?
236
00:12:39,692 --> 00:12:41,926
The question is:
how do they divide up the pot
237
00:12:41,961 --> 00:12:44,428
so that it's fair
to what might have happened
238
00:12:44,463 --> 00:12:46,430
if they'd been able to complete the game.
239
00:12:46,465 --> 00:12:51,402
Fermat suggested
imagining the possible future outcomes
240
00:12:51,437 --> 00:12:53,604
if the game had continued.
241
00:12:53,639 --> 00:12:56,307
There are just two more
coin flips,
242
00:12:56,342 --> 00:12:59,177
creating four possible
combinations.
243
00:12:59,212 --> 00:13:05,383
Heads-heads, heads-tails,
tails-heads, and tails-tails.
244
00:13:05,418 --> 00:13:09,520
In the first three,
Blaise wins with enough heads.
245
00:13:09,555 --> 00:13:12,256
Pierre only wins in the last case,
246
00:13:12,291 --> 00:13:16,394
so Fermat suggested that
a three-to-one split
247
00:13:16,429 --> 00:13:18,930
was the correct solution.
248
00:13:18,965 --> 00:13:20,631
The key breakthrough
249
00:13:20,666 --> 00:13:24,202
was imagining the future,
mathematically,
250
00:13:24,237 --> 00:13:27,305
something even Pascal had trouble with.
251
00:13:27,340 --> 00:13:31,943
Because what Fermat did was say,
"Let's look into the future,
252
00:13:31,978 --> 00:13:33,744
"look at possible futures,
253
00:13:33,779 --> 00:13:36,547
"and we'll count the way
things could have happened
254
00:13:36,582 --> 00:13:38,449
in different possible futures."
255
00:13:38,484 --> 00:13:41,118
It was a simple arithmetic issue,
256
00:13:41,153 --> 00:13:44,822
but the idea of counting things
in the future
257
00:13:44,857 --> 00:13:47,124
was just completely new,
258
00:13:47,159 --> 00:13:49,060
and Pascal couldn't wrap
his mind around it.
259
00:13:49,095 --> 00:13:53,397
Eventually Pascal
accepted Fermat's solution,
260
00:13:53,432 --> 00:13:54,999
as did others,
261
00:13:55,034 --> 00:13:58,002
and today, that exchange
of letters is regarded
262
00:13:58,037 --> 00:14:01,405
as the birth of
modern probability theory.
263
00:14:01,440 --> 00:14:04,842
People realized
the future wasn't blank.
264
00:14:04,877 --> 00:14:06,911
You didn't know exactly
what was going to happen,
265
00:14:06,946 --> 00:14:09,313
but you could calculate
with great precision
266
00:14:09,348 --> 00:14:12,116
what the likelihood
of things happening were.
267
00:14:12,151 --> 00:14:14,252
You could make all of the predictions
268
00:14:14,287 --> 00:14:16,587
we make today and take for granted.
269
00:14:16,622 --> 00:14:20,324
You could make them using mathematics.
270
00:14:20,359 --> 00:14:22,426
It was a fundamental insight,
271
00:14:22,461 --> 00:14:25,529
and one of the doors
that led to the modern world.
272
00:14:25,564 --> 00:14:29,634
Inherent in all our attempts
to predict the future--
273
00:14:29,669 --> 00:14:31,335
from the stock market
274
00:14:31,370 --> 00:14:32,837
to insurance
275
00:14:32,872 --> 00:14:35,072
to web retailers trying to figure out
276
00:14:35,107 --> 00:14:36,841
what you might buy next,
277
00:14:36,876 --> 00:14:39,977
is the idea that with the right data,
278
00:14:40,012 --> 00:14:44,148
the likelihood of future events
can be calculated.
279
00:14:48,421 --> 00:14:50,288
In fact, one of
the great success stories
280
00:14:50,323 --> 00:14:52,056
in the science of prediction
281
00:14:52,091 --> 00:14:55,259
yields a forecast
that many of us check every day
282
00:14:55,294 --> 00:14:58,763
to answer the question,
"Do I need an umbrella
283
00:14:58,798 --> 00:15:01,732
or a storm shelter?"
284
00:15:04,937 --> 00:15:06,304
The hurricane season of 2017
285
00:15:06,339 --> 00:15:09,941
will be remembered for
its ferocity and destruction.
286
00:15:09,976 --> 00:15:11,976
The strongest ever on record...
287
00:15:12,011 --> 00:15:14,111
The Puerto Rico and the San Juan
that we knew yesterday
288
00:15:14,146 --> 00:15:15,346
is no longer there.
289
00:15:17,416 --> 00:15:21,352
The storms formed and
gained in intensity with surprising speed,
290
00:15:21,387 --> 00:15:24,655
leaving forecasters
to emphasize the uncertainty
291
00:15:24,690 --> 00:15:25,856
of where they might land.
292
00:15:25,891 --> 00:15:27,825
Maria is now
a Category Three hurricane.
293
00:15:27,860 --> 00:15:30,461
Exactly what it's going to look
like, we just don't know yet.
294
00:15:30,496 --> 00:15:32,363
There's still
great uncertainty...
295
00:15:32,398 --> 00:15:35,099
In weather forecasting,
296
00:15:35,134 --> 00:15:38,636
the only certainty is uncertainty.
297
00:15:38,671 --> 00:15:40,004
One thing we know
298
00:15:40,039 --> 00:15:43,174
for sure is we cannot give you
a perfect forecast.
299
00:15:43,209 --> 00:15:49,013
Given the nature of
how we make a forecast,
300
00:15:49,048 --> 00:15:53,718
from the global observations to
equations running on computers,
301
00:15:53,753 --> 00:15:56,253
stepping out in time,
302
00:15:56,288 --> 00:15:58,489
I don't think there'll ever be
a perfect forecast.
303
00:15:58,524 --> 00:16:02,159
To fight that uncertainty,
304
00:16:02,194 --> 00:16:06,464
forecasters have turned
to more data-- lots more data.
305
00:16:06,499 --> 00:16:08,566
Here at the
National Weather Service
306
00:16:08,601 --> 00:16:10,067
Baltimore-Washington office,
307
00:16:10,102 --> 00:16:12,603
meteorologist Isha Renta
308
00:16:12,638 --> 00:16:15,873
prepares for the afternoon
launch of a weather balloon.
309
00:16:15,908 --> 00:16:17,875
Twice a day, every day,
310
00:16:17,910 --> 00:16:21,512
all across the U.S.
and around the world,
311
00:16:21,547 --> 00:16:23,414
at the very same time,
312
00:16:23,449 --> 00:16:26,584
balloons are released
to take a package of instruments
313
00:16:26,619 --> 00:16:29,653
up through the atmosphere.
314
00:16:29,688 --> 00:16:34,325
It transmits readings about
every ten meters in height.
315
00:16:34,360 --> 00:16:36,093
It's my understanding
that they have developed
316
00:16:36,128 --> 00:16:39,230
other ways to get vertical
profiles of the atmosphere,
317
00:16:39,265 --> 00:16:41,766
but still the accuracy
and the resolution
318
00:16:41,801 --> 00:16:43,567
that the weather balloon will
give you is a lot higher.
319
00:16:43,602 --> 00:16:45,803
So that's why
we still depend on them.
320
00:16:51,010 --> 00:16:53,778
The data from
Isha's weather balloon ends up
321
00:16:53,813 --> 00:16:55,613
at the National Centers
for Environmental Prediction
322
00:16:55,648 --> 00:16:57,515
in College Park, Maryland,
323
00:16:57,550 --> 00:17:00,684
the starting point for
nearly all weather forecasts
324
00:17:00,719 --> 00:17:04,188
in the United States.
325
00:17:04,223 --> 00:17:06,891
Her information becomes one drop
in a very large bucket
326
00:17:06,926 --> 00:17:11,662
of data taken in each day.
327
00:17:11,697 --> 00:17:12,963
Temperature, pressure,
wind speed,
328
00:17:12,998 --> 00:17:14,365
and direction in the atmosphere.
329
00:17:14,400 --> 00:17:16,801
Tens of thousands of point
observations are used
330
00:17:16,836 --> 00:17:20,037
every hour of every day
as kind of a starting point.
331
00:17:20,072 --> 00:17:22,440
That's where we begin the simulation,
332
00:17:22,475 --> 00:17:23,674
from those observations.
333
00:17:25,511 --> 00:17:27,244
It all becomes
part of a process,
334
00:17:27,279 --> 00:17:28,646
which has been described
335
00:17:28,681 --> 00:17:31,248
as one of the great
intellectual achievements
336
00:17:31,283 --> 00:17:34,852
of the 20th century: numerical forecasting.
337
00:17:37,790 --> 00:17:39,890
The first step
in numerical forecasting
338
00:17:39,925 --> 00:17:44,295
is to break a nearly
40-mile-thick section of the atmosphere
339
00:17:44,330 --> 00:17:47,098
into a three-dimensional grid.
340
00:17:47,133 --> 00:17:51,335
Then, each grid point is
assigned numerical values
341
00:17:51,370 --> 00:17:53,304
for different aspects of the weather,
342
00:17:53,339 --> 00:17:56,140
based on the billions of measurements
343
00:17:56,175 --> 00:17:59,210
continually pouring
into the Weather Service.
344
00:17:59,245 --> 00:18:00,544
So you'll have an understanding
345
00:18:00,579 --> 00:18:02,146
of temperature, pressure,
346
00:18:02,181 --> 00:18:04,882
and values in terms of wind
and wind direction
347
00:18:04,917 --> 00:18:06,183
at each one of these points
348
00:18:06,218 --> 00:18:07,518
within this grid that covers the globe.
349
00:18:07,553 --> 00:18:11,956
From there,
equations from the physics
350
00:18:11,991 --> 00:18:16,760
of fluids and thermodynamics
are applied to each grid point.
351
00:18:16,795 --> 00:18:19,029
Not only do you
change the characteristics
352
00:18:19,064 --> 00:18:22,199
at each grid point, but the
changes at those grid points
353
00:18:22,234 --> 00:18:24,201
affect neighboring grid points,
354
00:18:24,236 --> 00:18:27,104
and then neighboring grid points
affect other grid points.
355
00:18:27,139 --> 00:18:29,874
And so you evolve the atmosphere
through time
356
00:18:29,909 --> 00:18:31,842
in this three-dimensional space.
357
00:18:31,877 --> 00:18:35,679
And remarkably,
the approach works.
358
00:18:35,714 --> 00:18:37,515
It's amazingly crazy that it works.
359
00:18:37,550 --> 00:18:41,418
It's remarkable how well it does work,
360
00:18:41,453 --> 00:18:44,221
given that we're making
grand assumptions
361
00:18:44,256 --> 00:18:46,557
about the initial state,
so to speak,
362
00:18:46,592 --> 00:18:49,059
or the beginning state
of any forecast.
363
00:18:49,094 --> 00:18:54,165
And that initial state
turns out to be absolutely crucial.
364
00:18:56,268 --> 00:18:58,035
In the early days
of numerical forecasting,
365
00:18:58,070 --> 00:19:01,438
it seemed like
a definitive weather prediction
366
00:19:01,473 --> 00:19:04,808
extending far into the future
might soon be possible.
367
00:19:04,843 --> 00:19:08,913
But research in the 1960s
368
00:19:08,948 --> 00:19:12,583
showed that slight errors
in measuring the initial state
369
00:19:12,618 --> 00:19:17,688
grow larger over time,
leading predictions astray.
370
00:19:17,723 --> 00:19:20,291
So as you step ahead in time,
371
00:19:20,326 --> 00:19:22,393
the forecast
will become less accurate.
372
00:19:22,428 --> 00:19:27,198
Ironically, that
sensitivity to initial conditions
373
00:19:27,233 --> 00:19:30,134
also suggested a way
to improve the accuracy
374
00:19:30,169 --> 00:19:32,403
of numerical weather forecasts.
375
00:19:32,438 --> 00:19:35,839
Thanks to the power of today's computers,
376
00:19:35,874 --> 00:19:39,076
forecasters can run their
weather simulations not once
377
00:19:39,111 --> 00:19:41,645
but several times.
378
00:19:41,680 --> 00:19:45,716
For each run, they slightly
alter the initial conditions
379
00:19:45,751 --> 00:19:48,519
to reflect the inherent error
built into the measurements
380
00:19:48,554 --> 00:19:50,387
and the uncertainty in the model itself.
381
00:19:50,422 --> 00:19:55,559
The process is called ensemble forecasting,
382
00:19:55,594 --> 00:19:59,029
and the results are called
spaghetti plots.
383
00:19:59,064 --> 00:20:01,599
We're looking at
about 100 different forecasts here
384
00:20:01,634 --> 00:20:04,268
for the jet stream at about six days ago.
385
00:20:04,303 --> 00:20:05,970
We have the actual jet stream
386
00:20:06,005 --> 00:20:08,072
drawn as the white line
on here today,
387
00:20:08,107 --> 00:20:10,474
and you can see most of the
forecasts six days ago
388
00:20:10,509 --> 00:20:12,409
were well north of where we
actually find the jet stream
389
00:20:12,444 --> 00:20:13,711
this morning.
390
00:20:13,746 --> 00:20:16,680
And then we'll go to
a five-day forecast
391
00:20:16,715 --> 00:20:19,583
and a four-day forecast
and a three-day forecast
392
00:20:19,618 --> 00:20:23,087
and then down to two days
and the day of the event.
393
00:20:23,122 --> 00:20:24,822
And you can see how the model
forecasts all converge
394
00:20:24,857 --> 00:20:27,558
on that solution, which is what
you would expect them to do.
395
00:20:27,593 --> 00:20:30,194
But you go back
to the six-day forecast,
396
00:20:30,229 --> 00:20:32,997
you can see the large spread
in the ensemble solutions
397
00:20:33,032 --> 00:20:35,232
for this particular pattern.
398
00:20:35,267 --> 00:20:40,004
In the end,
meteorologists turn to statistical tools
399
00:20:40,039 --> 00:20:42,172
to analyze weather forecasts
400
00:20:42,207 --> 00:20:45,609
and often use probabilities
to express the uncertainty
401
00:20:45,644 --> 00:20:47,645
in the results.
402
00:20:47,680 --> 00:20:49,913
That's the "40% chance of rain"
you might hear
403
00:20:49,948 --> 00:20:52,483
from your local forecaster.
404
00:20:52,518 --> 00:20:54,718
Meteorology is probabilistic
at its very core,
405
00:20:54,753 --> 00:20:56,720
and I believe that
the general public knows
406
00:20:56,755 --> 00:21:00,491
there is uncertainty inherent
in everything we say,
407
00:21:00,526 --> 00:21:01,725
but we're getting better.
408
00:21:04,530 --> 00:21:09,066
Our forecasts for three days out
now are as accurate
409
00:21:09,101 --> 00:21:11,001
as one-day forecasts were
about 10 years ago.
410
00:21:11,036 --> 00:21:12,569
And this continues to improve.
411
00:21:12,604 --> 00:21:15,673
So the science has advanced
beyond my wildest dreams,
412
00:21:15,708 --> 00:21:19,577
and it's hard to even see
where it might go in the future.
413
00:21:22,147 --> 00:21:23,847
Just like in meteorology,
414
00:21:23,882 --> 00:21:25,149
for the rest of science,
415
00:21:25,184 --> 00:21:27,618
the ultimate test of our understanding
416
00:21:27,653 --> 00:21:31,588
is our ability
to make accurate predictions.
417
00:21:31,623 --> 00:21:33,724
On a grand scale, scientific theories
418
00:21:33,759 --> 00:21:36,593
like Einstein's general theory
of relativity
419
00:21:36,628 --> 00:21:38,028
have to make predictions
420
00:21:38,063 --> 00:21:40,597
that can be tested to become accepted.
421
00:21:40,632 --> 00:21:43,233
In that case, it took four years
422
00:21:43,268 --> 00:21:45,636
before a full solar eclipse
revealed
423
00:21:45,671 --> 00:21:48,939
that light passing near the sun curved,
424
00:21:48,974 --> 00:21:52,443
just as predicted
by Einstein's theory--
425
00:21:52,478 --> 00:21:54,611
the first proof he was right
426
00:21:54,646 --> 00:21:58,582
that the sun's mass distorts
the fabric of space-time--
427
00:21:58,617 --> 00:22:00,884
what we experience as gravity.
428
00:22:02,221 --> 00:22:06,890
In fact, the scientific method
demands a hypothesis
429
00:22:06,925 --> 00:22:09,793
which leads to a prediction
of results
430
00:22:09,828 --> 00:22:12,229
from a carefully designed
experiment
431
00:22:12,264 --> 00:22:13,864
that will test its claim.
432
00:22:16,268 --> 00:22:20,104
Surprisingly, it wasn't
until the 1920s and '30s
433
00:22:20,139 --> 00:22:23,273
that a British scientist,
Ronald A. Fisher,
434
00:22:23,308 --> 00:22:26,710
laid out guidelines
for designing experiments
435
00:22:26,745 --> 00:22:31,982
using statistics and probability
as a way of judging results.
436
00:22:36,221 --> 00:22:39,423
As an example,
he told the story of a lady
437
00:22:39,458 --> 00:22:41,091
who claimed to taste the difference
438
00:22:41,126 --> 00:22:43,861
between milk poured into her tea
439
00:22:47,866 --> 00:22:51,335
and tea poured into her milk.
440
00:22:56,408 --> 00:22:59,576
Fisher considered ways
to test that.
441
00:22:59,611 --> 00:23:03,514
What if he presented her
with just one cup to identify?
442
00:23:03,549 --> 00:23:05,749
If she got it right one time,
you'd probably say,
443
00:23:05,784 --> 00:23:08,352
"Well, yeah, but she had a
50-50 chance, just by guessing,
444
00:23:08,387 --> 00:23:09,887
of getting it right."
445
00:23:09,922 --> 00:23:12,523
So you'd be pretty unconvinced
that she has the skill.
446
00:23:12,558 --> 00:23:16,460
Fisher proposed
that a reasonable test of her ability
447
00:23:16,495 --> 00:23:18,729
would be eight cups,
448
00:23:18,764 --> 00:23:20,130
four with milk into tea,
449
00:23:20,165 --> 00:23:23,801
four with tea into milk,
450
00:23:23,836 --> 00:23:26,970
each presented randomly.
451
00:23:27,005 --> 00:23:33,210
The lady then had to separate
them back into the two groups.
452
00:23:33,245 --> 00:23:34,645
Why eight?
453
00:23:34,680 --> 00:23:38,248
Because that produced
70 possible combinations
454
00:23:38,283 --> 00:23:43,487
of the cups, but only one
with them separated correctly.
455
00:23:43,522 --> 00:23:46,089
If she got it right,
456
00:23:46,124 --> 00:23:48,992
that wouldn't prove
she had a special ability,
457
00:23:49,027 --> 00:23:53,163
but Fisher could conclude,
if she was just guessing,
458
00:23:53,198 --> 00:23:56,433
it was an extremely unlikely result,
459
00:23:56,468 --> 00:24:00,771
a probability
of just 1.4 percent.
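The arithmetic behind the 70 combinations and the 1.4 percent figure can be reproduced in a few lines. This is only a sketch of the counting argument; the function name `exact_match_probability` is a hypothetical helper, not something from the program.

```python
from math import comb

def exact_match_probability(total_cups: int, milk_first_cups: int) -> float:
    """Chance of naming exactly the right group of cups by pure guessing.

    Every way of choosing `milk_first_cups` cups out of `total_cups`
    is equally likely under guessing, and only one choice is fully correct.
    """
    return 1 / comb(total_cups, milk_first_cups)

# Eight cups, four of each kind, as in the test described above.
ways = comb(8, 4)
p_guessing = exact_match_probability(8, 4)

print(f"possible groupings: {ways}")
print(f"chance of a perfect score by guessing: {p_guessing:.1%}")
```

With a single cup the same calculation gives one chance in two, which is why one correct answer would convince nobody.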
460
00:24:00,806 --> 00:24:03,707
Thanks mainly to Fisher,
461
00:24:03,742 --> 00:24:07,478
that idea became enshrined
in experimental science
462
00:24:07,513 --> 00:24:12,182
as the "p-value" --
p for probability.
463
00:24:12,217 --> 00:24:15,252
If you assume your results
were just due to chance,
464
00:24:15,287 --> 00:24:18,088
that what you were testing
had no effect,
465
00:24:18,123 --> 00:24:21,825
what's the probability
you would see those results
466
00:24:21,860 --> 00:24:24,094
or something even more rare?
467
00:24:24,129 --> 00:24:26,196
If you assume that there's a process
468
00:24:26,231 --> 00:24:28,131
that is completely random,
469
00:24:28,166 --> 00:24:32,870
and you find that it's pretty
unlikely to get your data,
470
00:24:32,905 --> 00:24:34,972
then you might be suspicious
that something is happening.
471
00:24:35,007 --> 00:24:37,975
You might conclude, in fact,
that it's not a random process.
472
00:24:38,010 --> 00:24:42,179
That this is interesting to look
at what else might be going on,
473
00:24:42,214 --> 00:24:43,914
and it passes some kind
of sniff test.
474
00:24:43,949 --> 00:24:47,384
Fisher also
suggested a benchmark:
475
00:24:47,419 --> 00:24:48,785
only experimental results
476
00:24:48,820 --> 00:24:51,922
where the p-value
was under .05--
477
00:24:51,957 --> 00:24:54,291
a probability of less than
five percent--
478
00:24:54,326 --> 00:24:56,093
were worth a second look.
479
00:24:56,128 --> 00:24:57,394
In other words,
480
00:24:57,429 --> 00:25:00,731
if you assume your results
were just due to chance,
481
00:25:00,766 --> 00:25:03,734
you'd see them less than
one time out of 20.
482
00:25:03,769 --> 00:25:06,370
Not very likely.
483
00:25:06,405 --> 00:25:10,607
He called those results
"statistically significant."
484
00:25:10,642 --> 00:25:12,276
Statistically significant.
485
00:25:12,311 --> 00:25:13,577
Now this is a terrible word.
486
00:25:13,612 --> 00:25:15,379
It could be quite insignificant.
487
00:25:15,414 --> 00:25:19,583
You could be detecting
a very, very, very small effect,
488
00:25:19,618 --> 00:25:22,686
but it would be called,
in the mathematical lingo,
489
00:25:22,721 --> 00:25:23,854
"significant."
490
00:25:25,090 --> 00:25:26,523
Since Fisher's day,
491
00:25:26,558 --> 00:25:30,761
p-values have been used as a
convenient yardstick for success
492
00:25:30,796 --> 00:25:34,565
by many, including
most scientific journals.
493
00:25:34,600 --> 00:25:36,700
Since they prefer to publish successes,
494
00:25:36,735 --> 00:25:40,103
and getting published is
critical to career advancement,
495
00:25:40,138 --> 00:25:43,941
the temptation to massage and
manipulate experimental data
496
00:25:43,976 --> 00:25:47,244
into a good p-value is enormous.
497
00:25:47,279 --> 00:25:51,348
There's even a name for it:
"p-hacking."
498
00:25:51,383 --> 00:25:55,686
P-hacking is when researchers
consciously or unconsciously
499
00:25:55,721 --> 00:25:59,389
guide their data analysis to get
the results that they want,
500
00:25:59,424 --> 00:26:03,560
and since .05 is kind of the
bar for being able to publish
501
00:26:03,595 --> 00:26:06,630
and call something real,
and get all your grant money,
502
00:26:06,665 --> 00:26:10,434
it's usually guiding the results
503
00:26:10,469 --> 00:26:13,370
so that you arrive
at that p of .05.
504
00:26:15,774 --> 00:26:19,776
How much p-hacking
really goes on is hard to know.
505
00:26:19,811 --> 00:26:22,212
What may be more important
is to remember
506
00:26:22,247 --> 00:26:25,882
what was originally intended
by a p-value.
507
00:26:25,917 --> 00:26:31,355
The p-value was always meant
to be a detective, not a judge.
508
00:26:31,390 --> 00:26:33,090
If you do an experiment
509
00:26:33,125 --> 00:26:36,593
and find the result that is
statistically significant,
510
00:26:36,628 --> 00:26:40,530
that is telling you, that is
an interesting place to look
511
00:26:40,565 --> 00:26:42,733
and research and understand
further what's going on,
512
00:26:42,768 --> 00:26:45,769
not "don't study this anymore
because the matter is settled."
513
00:26:45,804 --> 00:26:50,540
In a sense,
a low p-value is an invitation
514
00:26:50,575 --> 00:26:54,244
to reproduce the experiment,
to help validate the result,
515
00:26:54,279 --> 00:26:56,546
but that doesn't always happen.
516
00:26:56,581 --> 00:27:00,584
In fact, there are few
career incentives for it.
517
00:27:00,619 --> 00:27:04,221
Journals and funders prefer novel research.
518
00:27:04,256 --> 00:27:08,625
There is no Nobel Prize
for replication.
519
00:27:08,660 --> 00:27:14,164
Another solution to p-hacking
and the overemphasis on p-values
520
00:27:14,199 --> 00:27:17,868
may simply be greater transparency.
521
00:27:17,903 --> 00:27:19,436
More and
more, what people are doing
522
00:27:19,471 --> 00:27:21,405
is publishing their data.
523
00:27:21,440 --> 00:27:25,776
And so it's becoming harder and
harder to lie with statistics,
524
00:27:25,811 --> 00:27:27,444
because people will just probe and say,
525
00:27:27,479 --> 00:27:28,912
"Well, give me the set you analyzed
526
00:27:28,947 --> 00:27:30,514
and let me see
how you got this result."
527
00:27:32,551 --> 00:27:36,453
Statistics continues
to play a fundamental role in science,
528
00:27:36,488 --> 00:27:39,189
but really anywhere data
is collected,
529
00:27:39,224 --> 00:27:40,624
you'll find statisticians
are at work,
530
00:27:40,659 --> 00:27:44,861
looking for patterns, drawing conclusions,
531
00:27:44,896 --> 00:27:47,230
and often making predictions--
532
00:27:47,265 --> 00:27:50,267
though they don't always work out.
533
00:27:51,369 --> 00:27:54,271
The presidential election
of 2016
534
00:27:54,306 --> 00:27:55,772
was a tough one for pollsters,
535
00:27:55,807 --> 00:27:59,276
the folks who conduct
and analyze opinion polls.
536
00:28:00,612 --> 00:28:03,146
Hillary Clinton was
the overwhelming favorite
537
00:28:03,181 --> 00:28:05,582
to beat Donald Trump
right up to election day.
538
00:28:05,617 --> 00:28:07,517
Trump is headed for a historic defeat.
539
00:28:07,552 --> 00:28:09,119
He's going to lose by a landslide.
540
00:28:09,154 --> 00:28:11,288
I think that she's going
to have a very good night.
541
00:28:11,323 --> 00:28:15,992
The "New York
Times" put Trump's chances at 15%.
542
00:28:16,027 --> 00:28:20,430
One pollster on election night
gave him one percent.
543
00:28:20,465 --> 00:28:24,468
A projection of a 99% chance
of winning, is that correct?
544
00:28:24,503 --> 00:28:25,869
The odds are overwhelming
545
00:28:25,904 --> 00:28:28,305
of a Hillary Clinton victory
on Tuesday.
546
00:28:28,340 --> 00:28:31,442
I would be very surprised
if anything else happened.
547
00:28:33,078 --> 00:28:37,514
And, of course,
Trump won, and Clinton lost.
548
00:28:37,549 --> 00:28:39,249
People were repeatedly told,
549
00:28:39,284 --> 00:28:40,584
"Hillary Clinton is the candidate
550
00:28:40,619 --> 00:28:43,253
most likely to win this
election," and she didn't.
551
00:28:43,288 --> 00:28:46,556
And I think that really left
people feeling almost lied to,
552
00:28:46,591 --> 00:28:49,326
almost cheated by these numbers.
553
00:28:49,361 --> 00:28:52,229
So what
was going on with the polls?
554
00:28:52,264 --> 00:28:55,332
And exactly how do
people predict elections?
555
00:28:55,367 --> 00:28:59,202
One way is just by asking people
who they'll vote for.
556
00:28:59,237 --> 00:29:03,273
One of
the great things about polling
557
00:29:03,308 --> 00:29:05,942
is that we don't have to talk to everybody
558
00:29:05,977 --> 00:29:12,249
in order to find out what
the opinions are of everybody.
559
00:29:12,284 --> 00:29:14,484
We can actually select
something called a sample.
560
00:29:16,388 --> 00:29:18,822
Sampling is a familiar idea.
561
00:29:18,857 --> 00:29:20,857
To see if the soup is right,
562
00:29:20,892 --> 00:29:23,760
you taste a teaspoon, not the whole pot.
563
00:29:23,795 --> 00:29:25,929
To test your blood at the doctor,
564
00:29:25,964 --> 00:29:28,131
they typically draw less than an ounce,
565
00:29:28,166 --> 00:29:30,267
they don't drain you dry.
566
00:29:30,302 --> 00:29:31,635
But in many circumstances,
567
00:29:31,670 --> 00:29:37,340
finding a representative sample
is harder than it sounds.
568
00:29:37,375 --> 00:29:39,643
Let's suppose that this
569
00:29:39,678 --> 00:29:42,746
is the population of about
a thousand people in a city,
570
00:29:42,781 --> 00:29:45,949
and we want to know,
"Are people for or against
571
00:29:45,984 --> 00:29:48,552
converting a park
into a dog park?"
572
00:29:48,587 --> 00:29:50,487
And so these green beads
down here
573
00:29:50,522 --> 00:29:52,322
are going to represent
people who were for it,
574
00:29:52,357 --> 00:29:55,091
and the red beads are
folks who are against it.
575
00:29:55,126 --> 00:29:58,795
Talithia's
first step is to take advantage
576
00:29:58,830 --> 00:30:03,767
of an unlikely ally
in sampling: randomness.
577
00:30:03,802 --> 00:30:05,569
The beauty of randomization
578
00:30:05,604 --> 00:30:08,605
is that as long as you throw
everything from your population
579
00:30:08,640 --> 00:30:12,042
into one pot
and randomly pull it out,
580
00:30:12,077 --> 00:30:13,643
you can be sure that you're
within
581
00:30:13,678 --> 00:30:15,178
a certain percentage points
582
00:30:15,213 --> 00:30:17,014
of the actual value that's in that pot.
583
00:30:19,217 --> 00:30:23,153
So the plan is to randomly
sample the beads-- but how many?
584
00:30:23,188 --> 00:30:27,190
That depends on how much
accuracy Talithia wants.
585
00:30:27,225 --> 00:30:31,228
One measure is
the margin of error--
586
00:30:31,263 --> 00:30:33,830
the maximum amount
the result from the sample
587
00:30:33,865 --> 00:30:37,567
can be expected to differ from
that of the whole population.
588
00:30:37,602 --> 00:30:40,670
It's the plus or minus figure,
often a percentage,
589
00:30:40,705 --> 00:30:43,506
you see in the fine print
in polls.
590
00:30:43,541 --> 00:30:47,510
But there's also confidence level.
591
00:30:47,545 --> 00:30:49,913
Inherently, there is uncertainty
592
00:30:49,948 --> 00:30:52,782
that any sample really
represents a whole population.
593
00:30:52,817 --> 00:30:55,952
The confidence level tells you
how sure you can be
594
00:30:55,987 --> 00:30:57,287
about your result.
595
00:30:57,322 --> 00:30:59,823
A 90% confidence level means,
596
00:30:59,858 --> 00:31:03,860
on average, if you ran your poll
or sample 100 times,
597
00:31:03,895 --> 00:31:06,963
90 of those times, it would be accurate,
598
00:31:06,998 --> 00:31:09,299
within the margin of error.
599
00:31:09,334 --> 00:31:13,570
Talithia knows the total number
of beads is a thousand.
600
00:31:13,605 --> 00:31:16,072
And she's settled on
a plus-or-minus five-percent
601
00:31:16,107 --> 00:31:20,110
margin of error
at a 90% confidence level.
602
00:31:20,145 --> 00:31:25,548
That means she needs a sample
size of at least 214 beads.
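The program does not say how the figure of 214 was calculated. One common textbook approach that reproduces it, assumed here, is the sample-size formula for a proportion with a finite population correction, using the worst-case split of 50-50. The function name `required_sample_size` and its parameters are illustrative.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(population: int, margin: float,
                         confidence: float, p: float = 0.5) -> int:
    """Sample size for estimating a proportion in a finite population.

    Assumption: standard normal-approximation formula with a finite
    population correction; p = 0.5 is the most cautious guess.
    """
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Size needed if the population were effectively unlimited.
    n_unlimited = z * z * p * (1 - p) / margin ** 2
    # Shrink it to account for the population being only `population` items.
    n_finite = n_unlimited / (1 + (n_unlimited - 1) / population)
    return ceil(n_finite)

beads_needed = required_sample_size(population=1000, margin=0.05, confidence=0.90)
print(beads_needed)
```

Under these assumptions, tightening the margin of error or raising the confidence level both push the required sample size up.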
603
00:31:25,583 --> 00:31:27,150
Here are the results:
604
00:31:27,185 --> 00:31:31,788
We got 103 red beads
and 111 green,
605
00:31:31,823 --> 00:31:34,791
so about 48% of our population
would vote against,
606
00:31:34,826 --> 00:31:37,460
and about 52% would vote for.
607
00:31:37,495 --> 00:31:39,963
Now, remember that margin of
error that we talked about,
608
00:31:39,998 --> 00:31:41,631
that plus-or-minus five percent?
609
00:31:41,666 --> 00:31:43,867
So once you take that into account,
610
00:31:43,902 --> 00:31:46,169
those numbers really
aren't that different at all.
611
00:31:46,204 --> 00:31:50,774
So I guess you could say,
this puppy is too close to call.
612
00:31:50,809 --> 00:31:56,546
In fact, within the
margin of error, the stats got it right.
613
00:31:56,581 --> 00:32:00,517
There were an equal number of
red and green beads in the jar.
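The meaning of the 90% confidence level can be checked by simulation: draw many random samples of 214 beads from an evenly split jar of 1,000 and count how often the sample lands within five percentage points of the true share. This is a rough sketch; the function name `share_of_samples_within_margin`, the number of trials, and the random seed are arbitrary choices made for illustration.

```python
import random

def share_of_samples_within_margin(population_size: int = 1000,
                                   sample_size: int = 214,
                                   margin: float = 0.05,
                                   trials: int = 2000,
                                   seed: int = 0) -> float:
    """Fraction of random samples whose green share is within `margin`
    of the true green share of the whole jar."""
    rng = random.Random(seed)
    # 1 = green bead (for), 0 = red bead (against); the jar is split evenly.
    greens = population_size // 2
    jar = [1] * greens + [0] * (population_size - greens)
    true_share = greens / population_size
    hits = 0
    for _ in range(trials):
        # Draw without replacement, like scooping beads out of the jar.
        sample = rng.sample(jar, sample_size)
        if abs(sum(sample) / sample_size - true_share) <= margin:
            hits += 1
    return hits / trials

coverage = share_of_samples_within_margin()
print(f"samples within the margin of error: {coverage:.1%}")
```

The printed share should come out close to nine in ten, though the exact figure depends on the seed and the number of trials.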
614
00:32:03,021 --> 00:32:05,989
While the sampling error
built in from the mathematics
615
00:32:06,024 --> 00:32:07,257
can be quantified,
616
00:32:07,292 --> 00:32:10,226
there are other errors
that can't.
617
00:32:10,261 --> 00:32:14,030
The other parts of the
error-- how we word our questions,
618
00:32:14,065 --> 00:32:15,899
how the respondents feel that day,
619
00:32:15,934 --> 00:32:18,234
the responsibility to predict
620
00:32:18,269 --> 00:32:21,905
what their behavior is going
to be somewhere down the line--
621
00:32:21,940 --> 00:32:23,440
all those sources of error
622
00:32:23,475 --> 00:32:25,909
are something that we can't calculate.
623
00:32:25,944 --> 00:32:30,914
And there's a catch
to random sampling for polls too.
624
00:32:30,949 --> 00:32:32,615
A few decades back,
625
00:32:32,650 --> 00:32:35,719
when just about every household
had a landline,
626
00:32:35,754 --> 00:32:40,323
finding a random sample meant
randomly dialing phone numbers.
627
00:32:40,358 --> 00:32:44,260
Into the 1970s
and the 1980s, we were getting,
628
00:32:44,295 --> 00:32:46,496
you know, 90% response rates.
629
00:32:46,531 --> 00:32:48,798
If we randomly chose a phone number,
630
00:32:48,833 --> 00:32:50,967
somebody on the other end
of that phone would pick it up
631
00:32:51,002 --> 00:32:52,335
and would do the interview
with us.
632
00:32:53,772 --> 00:32:55,538
Those days are over.
633
00:32:55,573 --> 00:32:58,475
Thanks to caller I.D.
and answering machines,
634
00:32:58,510 --> 00:33:01,444
people often don't answer
their landlines anymore--
635
00:33:01,479 --> 00:33:04,114
if they even have one.
636
00:33:04,149 --> 00:33:06,750
Response rates are way down.
637
00:33:06,785 --> 00:33:09,652
Only about ten percent of people
638
00:33:09,687 --> 00:33:11,321
respond to polls.
639
00:33:11,356 --> 00:33:13,289
So you're kind of crossing your fingers
640
00:33:13,324 --> 00:33:14,991
and hoping the people you reach
641
00:33:15,026 --> 00:33:18,161
are the same as the ones
that are actually going to vote.
642
00:33:18,196 --> 00:33:23,733
For example, we found in 2016,
pollsters were not reaching
643
00:33:23,768 --> 00:33:26,870
enough white voters
without college degrees.
644
00:33:26,905 --> 00:33:29,439
If there's a bias
in the data, you cannot recover from it.
645
00:33:29,474 --> 00:33:32,142
As we've seen
from some recent elections.
646
00:33:33,678 --> 00:33:37,280
After Donald
Trump's surprise win,
647
00:33:37,315 --> 00:33:39,516
many wondered if polling was broken.
648
00:33:39,551 --> 00:33:42,852
But if you look
at the polls themselves,
649
00:33:42,887 --> 00:33:44,287
and not the headlines,
650
00:33:44,322 --> 00:33:47,657
on average, polls on the
national and state level
651
00:33:47,692 --> 00:33:52,362
were off by
historically typical amounts.
652
00:33:52,397 --> 00:33:55,598
So when I hear people
say, "Oh, the polls were wrong,"
653
00:33:55,633 --> 00:33:58,635
then it probably reflects
people's interpretations
654
00:33:58,670 --> 00:34:01,337
about the polls being wrong,
655
00:34:01,372 --> 00:34:03,273
where people, for various reasons,
656
00:34:03,308 --> 00:34:06,409
looked at the polls, and they said,
657
00:34:06,444 --> 00:34:08,812
"These numbers prove to me
that Clinton's going to win."
658
00:34:08,847 --> 00:34:10,980
When we looked at the polls,
we said,
659
00:34:11,015 --> 00:34:13,817
"These numbers certainly
make her a favorite,
660
00:34:13,852 --> 00:34:16,719
"but they point toward an
election that's fairly close
661
00:34:16,754 --> 00:34:18,755
and quite uncertain, actually."
662
00:34:18,790 --> 00:34:21,257
And in 2016,
663
00:34:21,292 --> 00:34:25,328
the U.S. presidential election
was just that close.
664
00:34:25,363 --> 00:34:27,630
Trump's victory depended
on fewer votes
665
00:34:27,665 --> 00:34:30,967
than the seating capacity of
some college football stadiums--
666
00:34:31,002 --> 00:34:33,436
spread across three states:
667
00:34:33,471 --> 00:34:36,840
Pennsylvania, Wisconsin,
and Michigan.
668
00:34:36,875 --> 00:34:40,510
And there were some problems
with the polls in those states
669
00:34:40,545 --> 00:34:43,046
that led to underestimating
Trump's support,
670
00:34:43,081 --> 00:34:46,950
according to a postmortem
by a consortium of pollsters.
671
00:34:50,688 --> 00:34:53,923
Nate Silver, the founder
of the website FiveThirtyEight,
672
00:34:53,958 --> 00:34:55,959
is one of the biggest names
in polling--
673
00:34:55,994 --> 00:34:59,429
even though he doesn't
generally conduct polls.
674
00:34:59,464 --> 00:35:02,832
Our job is to
take other people's polls
675
00:35:02,867 --> 00:35:05,902
and to translate that
676
00:35:05,937 --> 00:35:09,339
in terms of a probability,
to say basically whether--
677
00:35:09,374 --> 00:35:10,507
um, who's ahead,
678
00:35:10,542 --> 00:35:13,443
which is usually pretty easy
to tell, um,
679
00:35:13,478 --> 00:35:15,845
but then how certain or
uncertain is the election
680
00:35:15,880 --> 00:35:17,780
is the more difficult part.
681
00:35:17,815 --> 00:35:20,717
Like a meteorologist,
682
00:35:20,752 --> 00:35:23,987
Nate presents his predictions
as probabilities.
683
00:35:24,022 --> 00:35:27,390
On the morning of Election Day 2016,
684
00:35:27,425 --> 00:35:29,959
he gave Clinton about
a 70% chance of winning
685
00:35:29,994 --> 00:35:33,696
and Trump about a 30% chance.
686
00:35:33,731 --> 00:35:36,332
That's like rolling
a ten-sided die
687
00:35:36,367 --> 00:35:38,201
with seven sides that are Clinton
688
00:35:38,236 --> 00:35:40,770
and three that are Trump.
689
00:35:40,805 --> 00:35:42,705
People who
make probabilistic forecasts,
690
00:35:42,740 --> 00:35:47,243
they're not saying that politics
is intrinsically random.
691
00:35:47,278 --> 00:35:51,114
They're saying that we have
imperfect knowledge of it,
692
00:35:51,149 --> 00:35:53,383
and that if you think you can be
more certain than that,
693
00:35:53,418 --> 00:35:57,887
you're probably fooling yourself
based on how accurate polls,
694
00:35:57,922 --> 00:35:59,789
other types of political data are.
695
00:36:02,260 --> 00:36:04,794
Ultimately,
interpreting a probability
696
00:36:04,829 --> 00:36:07,030
depends on the situation.
697
00:36:07,065 --> 00:36:10,633
While a 30% chance might seem slim,
698
00:36:10,668 --> 00:36:14,270
if you learned the flight
you were about to board
699
00:36:14,305 --> 00:36:16,940
crashed three
out of every ten trips,
700
00:36:16,975 --> 00:36:18,875
would you get on the plane?
701
00:36:18,910 --> 00:36:21,811
As this
plane only makes it to its destination
702
00:36:21,846 --> 00:36:23,713
seven out of ten times,
703
00:36:23,748 --> 00:36:26,449
please pay attention
to our short safety briefing.
704
00:36:26,484 --> 00:36:29,085
Or if a weather forecaster said
705
00:36:29,120 --> 00:36:31,721
there's only a 30% chance of rain,
706
00:36:31,756 --> 00:36:32,822
and then it rained--
707
00:36:32,857 --> 00:36:34,324
would you care?
708
00:36:34,359 --> 00:36:35,858
If it does rain,
709
00:36:35,893 --> 00:36:38,661
no one demands to know,
"Why did it rain?
710
00:36:38,696 --> 00:36:40,463
We have to get
to the bottom of this."
711
00:36:40,498 --> 00:36:42,465
We can say like, "It just did."
712
00:36:42,500 --> 00:36:44,000
It might have rained,
it might not have rained.
713
00:36:44,035 --> 00:36:45,401
As it happened, it did.
714
00:36:45,436 --> 00:36:48,137
I do think there's
a certain natural resistance
715
00:36:48,172 --> 00:36:50,907
to seeing things
that maybe we care about
716
00:36:50,942 --> 00:36:52,775
more than whether
it's going to rain or not,
717
00:36:52,810 --> 00:36:55,912
like elections, in that same way.
718
00:36:55,947 --> 00:36:59,249
As 2016 shows,
719
00:36:59,284 --> 00:37:01,818
predicting who will win
the U.S. presidency,
720
00:37:01,853 --> 00:37:05,622
a one-time contest
between two unique opponents,
721
00:37:05,657 --> 00:37:08,858
is far from easy.
722
00:37:08,893 --> 00:37:10,560
But in at least one field,
723
00:37:10,595 --> 00:37:14,030
there are literally decades
of detailed statistics
724
00:37:14,065 --> 00:37:16,065
on how the contests played out--
725
00:37:16,100 --> 00:37:17,900
baseball.
726
00:37:19,070 --> 00:37:22,272
Baseball has always been
a game of numbers--
727
00:37:22,307 --> 00:37:28,077
box scores, batting averages,
ERAs, RBIs.
728
00:37:28,112 --> 00:37:31,514
But while stats have always
been part of baseball,
729
00:37:31,549 --> 00:37:36,519
in the last 20 years,
their importance has skyrocketed
730
00:37:36,554 --> 00:37:39,656
due to sports analytics,
731
00:37:39,691 --> 00:37:43,994
the use of predictive models
to improve a team's performance.
732
00:37:45,396 --> 00:37:47,096
To some extent every business,
not just sports,
733
00:37:47,131 --> 00:37:50,266
is really trying to predict
the next event, you know.
734
00:37:50,301 --> 00:37:51,668
Whether you're on Wall Street,
735
00:37:51,703 --> 00:37:53,036
or if you're in the tech business,
736
00:37:53,071 --> 00:37:54,637
what's the new new thing.
737
00:37:54,672 --> 00:37:56,039
And for us,
it's future player performance.
738
00:37:57,742 --> 00:37:59,842
Billy Beane was one of the first
739
00:37:59,877 --> 00:38:03,179
to adopt the quantitative
approach in the late '90s,
740
00:38:03,214 --> 00:38:06,883
when he was the general manager
of the Oakland Athletics.
741
00:38:06,918 --> 00:38:10,019
Stuck with the low payroll
of a small-market team,
742
00:38:10,054 --> 00:38:13,389
he abandoned decades
of subjective baseball lore
743
00:38:13,424 --> 00:38:17,560
and committed the organization
to using statistical analyses
744
00:38:17,595 --> 00:38:20,096
to guide the team's
decision-making.
745
00:38:20,131 --> 00:38:22,332
It very much became a mathematical equation
746
00:38:22,367 --> 00:38:24,133
putting together a baseball team.
747
00:38:24,168 --> 00:38:29,072
Billy's stats-driven
approach started to attract attention
748
00:38:29,107 --> 00:38:30,773
when the Oakland A's finished
in the playoffs
749
00:38:30,808 --> 00:38:34,777
in four consecutive years
750
00:38:34,812 --> 00:38:38,448
and set a league record
with 20 wins in a row.
751
00:38:38,483 --> 00:38:39,749
Then it was lionized
752
00:38:39,784 --> 00:38:43,019
and even given a name in
a best-selling book and movie,
753
00:38:43,054 --> 00:38:44,520
"Moneyball."
754
00:38:44,555 --> 00:38:46,756
Brad Pitt plays Billy.
755
00:38:46,791 --> 00:38:49,759
If we win on
our budget with this team,
756
00:38:49,794 --> 00:38:53,296
we'll have changed the game.
757
00:38:54,499 --> 00:38:56,232
While "Moneyballing" didn't lead
758
00:38:56,267 --> 00:38:59,335
to a league championship
for the Oakland A's,
759
00:38:59,370 --> 00:39:01,871
it did change the game.
760
00:39:01,906 --> 00:39:04,240
Today, every
Major League Baseball team
761
00:39:04,275 --> 00:39:06,109
has a sports analytics
department,
762
00:39:06,144 --> 00:39:09,712
trying to predict and enhance
future player performance
763
00:39:09,747 --> 00:39:11,147
through data,
764
00:39:11,182 --> 00:39:12,949
analyzing everything
765
00:39:12,984 --> 00:39:16,352
from the angle and speed
of the ball coming off the bat--
766
00:39:16,387 --> 00:39:18,621
to which players should be brought up
767
00:39:18,656 --> 00:39:21,491
from the minor leagues
or traded.
768
00:39:21,526 --> 00:39:24,794
I'll never
pretend to be a math whiz,
769
00:39:24,829 --> 00:39:27,697
I just understand its powers
and its application.
770
00:39:27,732 --> 00:39:29,365
When you run a
Major League Baseball team,
771
00:39:29,400 --> 00:39:32,001
which is a great job,
772
00:39:32,036 --> 00:39:34,003
and every kid
who dreams of doing it,
773
00:39:34,038 --> 00:39:36,939
I can tell you it's
everything you've thought of.
774
00:39:36,974 --> 00:39:38,307
But when they ask me,
775
00:39:38,342 --> 00:39:39,776
"What do I have to do
to do that?"
776
00:39:39,811 --> 00:39:42,178
My answer is always the same.
777
00:39:42,213 --> 00:39:44,514
I say, "Go study
and get an A in math."
778
00:39:44,549 --> 00:39:49,318
While sports
analytics has transformed baseball,
779
00:39:49,353 --> 00:39:54,056
Moneyballing has found its way
into many unrelated fields.
780
00:39:54,091 --> 00:39:56,993
Proponents of data-driven
decision making and prediction
781
00:39:57,028 --> 00:40:00,096
have applied the approach
to areas as diverse
782
00:40:00,131 --> 00:40:04,200
as popular music and law enforcement.
783
00:40:04,235 --> 00:40:06,335
Moneyballing has been enabled
784
00:40:06,370 --> 00:40:08,471
by the vast amounts of information
785
00:40:08,506 --> 00:40:12,775
gathered through the internet,
so-called "Big Data."
786
00:40:12,810 --> 00:40:14,710
Our current output of data
787
00:40:14,745 --> 00:40:19,916
is roughly 2.5
quintillion bytes a day.
788
00:40:21,385 --> 00:40:24,754
But what about
the opposite situation,
789
00:40:24,789 --> 00:40:27,890
when there's very little data,
yet actions need to be taken--
790
00:40:27,925 --> 00:40:32,328
for example, when searching
for people lost at sea?
791
00:40:32,363 --> 00:40:35,965
How do you even begin
to predict where they might be?
792
00:40:38,236 --> 00:40:41,904
The U.S. Coast Guard's
Sector Boston Command Center.
793
00:40:41,939 --> 00:40:44,974
From this secure set of rooms,
794
00:40:45,009 --> 00:40:49,212
the Coast Guard coordinates all
operations in the Boston area,
795
00:40:49,247 --> 00:40:52,248
including national security,
drug enforcement,
796
00:40:52,283 --> 00:40:54,851
and search and rescue.
797
00:40:59,657 --> 00:41:01,757
Good morning, Coast Guard
Sector Boston Command Center,
798
00:41:01,792 --> 00:41:03,025
Mr. Fleming speaking.
799
00:41:03,060 --> 00:41:04,126
Uh, good morning, sir...
800
00:41:04,161 --> 00:41:05,995
A caller reports
801
00:41:06,030 --> 00:41:07,430
that a friend went paddleboarding
802
00:41:07,465 --> 00:41:10,366
earlier in the morning,
but he's now overdue.
803
00:41:16,774 --> 00:41:18,508
The Coast Guard initiates a search
804
00:41:18,543 --> 00:41:19,809
with a 45-foot response boat...
805
00:41:19,844 --> 00:41:21,077
Engaging...
806
00:41:21,112 --> 00:41:22,078
...out of Boston Harbor.
807
00:41:22,113 --> 00:41:24,714
Coming up.
808
00:41:24,749 --> 00:41:26,616
Unfortunately,
a paddle craft in trouble
809
00:41:26,651 --> 00:41:28,751
has grown increasingly common.
810
00:41:28,786 --> 00:41:30,920
You are required
to have a life jacket on.
811
00:41:30,955 --> 00:41:33,689
The reason for that is in 2015,
812
00:41:33,724 --> 00:41:37,460
I think we had
625 deaths nationwide--
813
00:41:37,495 --> 00:41:39,729
a number of those people
that were recovered
814
00:41:39,764 --> 00:41:41,631
were recovered without
a life jacket.
815
00:41:41,666 --> 00:41:45,434
The Command
Center also launches another boat
816
00:41:45,469 --> 00:41:48,404
out of Station Point Allerton,
in Hull.
817
00:41:48,439 --> 00:41:49,805
Short tack disconnected.
818
00:41:49,840 --> 00:41:51,040
Stand clear of lines.
819
00:41:52,710 --> 00:41:55,244
The caller said the
missing person typically paddled
820
00:41:55,279 --> 00:41:58,214
between Nantasket Beach
and Boston Light,
821
00:41:58,249 --> 00:41:59,715
about three miles away.
822
00:41:59,750 --> 00:42:01,584
But with all the unknowns--
823
00:42:01,619 --> 00:42:04,520
where he got into trouble
and how he may have drifted--
824
00:42:04,555 --> 00:42:09,025
the search area could be
as large as 20 square miles.
825
00:42:12,463 --> 00:42:15,598
Search and rescue operations
826
00:42:15,633 --> 00:42:19,402
are often based
on unique circumstances
827
00:42:19,437 --> 00:42:22,772
and require action,
despite incomplete information.
828
00:42:22,807 --> 00:42:24,507
To attack problems like that,
829
00:42:24,542 --> 00:42:27,076
statisticians turn to an idea
that originates
830
00:42:27,111 --> 00:42:29,812
with an 18th-century
English clergyman
831
00:42:29,847 --> 00:42:34,016
interested in probability--
Thomas Bayes.
832
00:42:34,051 --> 00:42:37,186
Imagine you are given a coin to flip,
833
00:42:37,221 --> 00:42:40,723
and you want to know if it
is fair, 50-50 heads or tails,
834
00:42:40,758 --> 00:42:45,494
or weighted to land
more on heads than tails.
835
00:42:45,529 --> 00:42:48,097
The traditional approach
in statistics and science
836
00:42:48,132 --> 00:42:53,102
doesn't assume either answer and
uses experiments to find out.
837
00:42:53,137 --> 00:42:58,441
In this case that involves
flipping the coin a lot.
838
00:42:58,476 --> 00:43:01,877
Or you could approach the
problem like a Bayesian.
839
00:43:01,912 --> 00:43:04,246
Unlike traditional statistics,
840
00:43:04,281 --> 00:43:06,882
that means starting
with an initial probability
841
00:43:06,917 --> 00:43:09,285
based on what you know.
842
00:43:09,320 --> 00:43:12,021
In this case, all the coins
you've ever come across
843
00:43:12,056 --> 00:43:15,358
in a lifetime of flipping coins
have been fair.
844
00:43:15,393 --> 00:43:18,928
It seems likely
this one is probably fair too.
845
00:43:18,963 --> 00:43:21,897
Next, you also flip the coin,
846
00:43:21,932 --> 00:43:24,433
updating the probability
as you go.
847
00:43:24,468 --> 00:43:27,370
Let's say it starts off
with several heads in a row.
848
00:43:27,405 --> 00:43:29,005
That might make you wonder,
849
00:43:29,040 --> 00:43:32,174
increasing your probability
estimate that it's weighted.
850
00:43:32,209 --> 00:43:36,846
But as you flip it more times,
those start to look like chance.
851
00:43:36,881 --> 00:43:38,681
In the end, your best estimate
852
00:43:38,716 --> 00:43:41,917
is that it is probably
a fair coin,
853
00:43:41,952 --> 00:43:44,754
but you are open
to any new information.
854
00:43:44,789 --> 00:43:48,157
Like it belongs to your uncle
the con man, "Crooked Larry."
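The coin story can be sketched as a small Bayesian update. The numbers are assumptions made for illustration, not values from the program: a starting belief of 99% that the coin is fair, and a single alternative in which a weighted coin lands heads 75% of the time. The function name `probability_fair` is likewise hypothetical.

```python
def probability_fair(prior_fair: float, flips: str,
                     weighted_heads: float = 0.75) -> float:
    """Update the belief that a coin is fair after a sequence of flips.

    `flips` is a string of 'H' and 'T'. Only two possibilities are
    compared: a fair coin, or a coin weighted toward heads.
    """
    belief_fair = prior_fair
    for flip in flips:
        # How likely this flip is under each explanation.
        likelihood_fair = 0.5
        likelihood_weighted = weighted_heads if flip == "H" else 1 - weighted_heads
        # Bayes' rule: reweight each belief by how well it predicted the flip.
        fair_part = likelihood_fair * belief_fair
        weighted_part = likelihood_weighted * (1 - belief_fair)
        belief_fair = fair_part / (fair_part + weighted_part)
    return belief_fair

start = 0.99
after_streak = probability_fair(start, "HHHHH")
after_more_flips = probability_fair(start, "HHHHH" + "TTHTHTTHTHTTHTH")

print(f"after five heads in a row: {after_streak:.3f}")
print(f"after fifteen more mixed flips: {after_more_flips:.3f}")
```

The opening run of heads lowers the belief that the coin is fair, and the mixed flips that follow raise it again, which mirrors the sequence described above. A new piece of information, such as learning who owns the coin, would enter as a different starting belief.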
855
00:43:50,995 --> 00:43:52,461
Sector 659.
856
00:43:52,496 --> 00:43:56,132
Our estimated time of arrival
is one-one-five-eight.
857
00:43:56,167 --> 00:44:00,369
Bayesian inference
creates a rigorous mathematical approach
858
00:44:00,404 --> 00:44:05,241
to calculating probabilities
based on new information.
859
00:44:05,276 --> 00:44:07,643
And it sits at the heart
of the Coast Guard's
860
00:44:07,678 --> 00:44:12,381
Search and Rescue Optimal
Planning System: SAROPS.
861
00:44:12,416 --> 00:44:15,051
He's been
missing since 7:30 this morning,
862
00:44:15,086 --> 00:44:16,952
so I'm going to go ahead
and do a SAROPS drift.
863
00:44:16,987 --> 00:44:18,754
SAROPS takes information
864
00:44:18,789 --> 00:44:21,991
about the last-known position
of the object of the search...
865
00:44:22,026 --> 00:44:23,626
What's the direction of the wind?
866
00:44:23,661 --> 00:44:26,195
...along with the
readings of currents and winds
867
00:44:26,230 --> 00:44:28,264
and combines them with information
868
00:44:28,299 --> 00:44:30,933
about how objects drift
in the water
869
00:44:30,968 --> 00:44:33,903
to simulate thousands of possible paths
870
00:44:33,938 --> 00:44:36,238
the target may have taken.
871
00:44:36,273 --> 00:44:38,708
These get processed into probabilities,
872
00:44:38,743 --> 00:44:40,443
indicated by color,
873
00:44:40,478 --> 00:44:43,679
and turned into search plans
to be executed.
874
00:44:43,714 --> 00:44:46,348
SAROPS is really
a workhorse for the Coast Guard.
875
00:44:46,383 --> 00:44:48,317
It does a lot of the calculations for us.
876
00:44:48,352 --> 00:44:50,419
It provides us with a lot
of valuable search patterns
877
00:44:50,454 --> 00:44:51,987
and search-planning options.
878
00:44:52,022 --> 00:44:54,356
I thought he was
pretty far off shore but, you know,
879
00:44:54,391 --> 00:44:56,892
he said he was okay, so I kept going.
880
00:44:56,927 --> 00:44:58,828
Word of the search has spread.
881
00:44:58,863 --> 00:45:03,399
A boater calls in a sighting
from earlier in the day.
882
00:45:03,434 --> 00:45:05,401
What I did is I went in and put
that information into SAROPS,
883
00:45:05,436 --> 00:45:07,002
and it changed everything.
884
00:45:07,037 --> 00:45:10,940
SAROPS quickly
recalculates all the probabilities
885
00:45:10,975 --> 00:45:13,542
and generates a new search plan.
886
00:45:13,577 --> 00:45:17,680
The area has shifted about
three miles farther out to sea.
887
00:45:19,683 --> 00:45:23,052
We are on-scene,
commencing search pattern now.
888
00:45:25,089 --> 00:45:26,122
Keep a good look out.
889
00:45:26,157 --> 00:45:27,556
Roger, coming up.
890
00:45:27,591 --> 00:45:28,791
We're assessing
the situation on scene.
891
00:45:30,661 --> 00:45:34,497
Any object you see in
the water, please take a closer look at.
892
00:45:45,709 --> 00:45:47,743
Paddleboarder, port side.
893
00:45:51,115 --> 00:45:54,416
Roger,
we have located a paddleboarder
894
00:45:54,451 --> 00:45:55,718
with zero-one person on board.
895
00:45:55,753 --> 00:45:57,453
Off the port quarter!
896
00:45:57,488 --> 00:45:59,488
Starboard side.
897
00:45:59,523 --> 00:46:00,956
I have a visual.
898
00:46:00,991 --> 00:46:02,658
All right.
899
00:46:02,693 --> 00:46:07,129
As it turns out,
the search has been a drill.
900
00:46:07,164 --> 00:46:10,466
Hours earlier, the paddleboard
was placed in the water
901
00:46:10,501 --> 00:46:13,536
by another Coast Guard ship
and allowed to drift.
902
00:46:13,571 --> 00:46:17,606
The instruments mounted on it
are there to measure wind
903
00:46:17,641 --> 00:46:20,242
and record the path it's taken,
904
00:46:20,277 --> 00:46:21,877
information that will later be used
905
00:46:21,912 --> 00:46:25,181
to tweak the drift simulations
in SAROPS,
906
00:46:25,216 --> 00:46:28,184
though the system performed
quite well today.
907
00:46:28,219 --> 00:46:30,953
The object was right
in the middle of our search patterns.
908
00:46:30,988 --> 00:46:33,389
So SAROPS was actually
dead-on accurate
909
00:46:33,424 --> 00:46:35,024
in predicting where we needed to search
910
00:46:35,059 --> 00:46:36,458
to find
the missing paddleboarder.
911
00:46:38,395 --> 00:46:40,696
To be able
to call a family and say,
912
00:46:40,731 --> 00:46:42,364
"Your family and friends
is coming home,"
913
00:46:42,399 --> 00:46:43,899
is absolutely a call
914
00:46:43,934 --> 00:46:45,401
that all of us should have
the chance to make,
915
00:46:45,436 --> 00:46:47,803
and, fortunately,
because of stuff like this,
916
00:46:47,838 --> 00:46:49,572
we do get to make that call.
917
00:46:52,610 --> 00:46:55,611
The computational
complexity of updating probabilities
918
00:46:55,646 --> 00:46:59,415
held the Bayesian approach back
for most of the 20th century.
919
00:46:59,450 --> 00:47:05,054
But today's computing power
has unleashed it on the world.
920
00:47:05,089 --> 00:47:08,691
It's in everything from your spam filter
921
00:47:08,726 --> 00:47:13,596
to the way Google searches work
to self-driving cars.
922
00:47:13,631 --> 00:47:17,700
Some even find in the Bayesian
embrace of probability
923
00:47:17,735 --> 00:47:21,537
similarities to how we learn
from experience.
924
00:47:21,572 --> 00:47:24,773
And they've built it into computers,
925
00:47:24,808 --> 00:47:26,642
making it part of a powerful new force:
926
00:47:26,677 --> 00:47:28,811
machine learning.
927
00:47:28,846 --> 00:47:31,513
In the past,
when we programmed computers,
928
00:47:31,548 --> 00:47:36,018
we tended to really write down,
in excruciating detail,
929
00:47:36,053 --> 00:47:38,687
a set of rules
that would tell the computer
930
00:47:38,722 --> 00:47:42,324
what to do in every single
contingency.
931
00:47:42,359 --> 00:47:44,526
But there's another approach--
932
00:47:44,561 --> 00:47:46,462
to treat the computer
933
00:47:46,497 --> 00:47:51,267
like a child
learning to ride a bike.
934
00:47:51,302 --> 00:47:53,669
No one teaches a child
to ride using a set of rules.
935
00:47:53,704 --> 00:47:55,971
There may be some tips,
936
00:47:56,006 --> 00:47:58,407
but ultimately,
it is trial and error--
937
00:47:58,442 --> 00:48:02,778
experience--
that's the instructor.
938
00:48:02,813 --> 00:48:04,747
The new thing,
the new kid on the block
939
00:48:04,782 --> 00:48:05,981
is machine learning,
940
00:48:06,016 --> 00:48:07,683
specifically something
called deep learning.
941
00:48:07,718 --> 00:48:10,986
Here, we don't inform
the computer through rules,
942
00:48:11,021 --> 00:48:12,354
but through examples.
943
00:48:12,389 --> 00:48:14,056
So similar to, like, a small child
944
00:48:14,091 --> 00:48:16,692
that falls down and learns
from this experience,
945
00:48:16,727 --> 00:48:18,995
we just let the computer
learn from examples.
946
00:48:21,532 --> 00:48:23,165
Suppose
you want to train a computer
947
00:48:23,200 --> 00:48:26,068
to recognize pictures of cats.
948
00:48:26,103 --> 00:48:29,338
By scanning through thousands
of labeled pictures--
949
00:48:29,373 --> 00:48:31,407
some cats, some not--
950
00:48:31,442 --> 00:48:34,109
the computer can develop
its own guidelines
951
00:48:34,144 --> 00:48:38,380
for assessing the probability
that a picture is a cat.
952
00:48:38,415 --> 00:48:39,982
And these days
953
00:48:40,017 --> 00:48:44,353
computers are doing far more
than just looking for cats.
954
00:48:44,388 --> 00:48:46,155
Some of the best computers now
955
00:48:46,190 --> 00:48:49,858
can learn how to beat
the world's best Go champion
956
00:48:49,893 --> 00:48:54,263
or to discover documents
in stacks of documents,
957
00:48:54,298 --> 00:48:56,498
work that highly paid lawyers
normally do,
958
00:48:56,533 --> 00:48:58,867
or diagnose diseases.
959
00:48:58,902 --> 00:49:00,970
At Stanford,
we recently ran a study
960
00:49:01,005 --> 00:49:03,839
to understand whether
a machine-learning algorithm
961
00:49:03,874 --> 00:49:07,676
can compete with top-notch,
Stanford-level,
962
00:49:07,711 --> 00:49:09,044
board-certified dermatologists
963
00:49:09,079 --> 00:49:13,182
in spotting things like skin cancer.
964
00:49:13,217 --> 00:49:16,685
And lo and behold, we found that
our machine-learning algorithm,
965
00:49:16,720 --> 00:49:18,187
our little box,
966
00:49:18,222 --> 00:49:21,824
is as good as the best human
doctor in finding skin cancer.
967
00:49:21,859 --> 00:49:24,994
That raises a lot of questions:
968
00:49:25,029 --> 00:49:28,130
should we trust software
over our doctors?
969
00:49:28,165 --> 00:49:31,700
Or are diagnostic programs
like Sebastian's
970
00:49:31,735 --> 00:49:34,103
the intelligent medical
assistants of tomorrow,
971
00:49:34,138 --> 00:49:37,506
a new tool but not a substitute?
972
00:49:37,541 --> 00:49:41,010
And there are other concerns.
973
00:49:41,045 --> 00:49:46,715
If you asked a person riding
a bike exactly how they do it,
974
00:49:46,750 --> 00:49:49,184
they'd be hard-pressed
to put it into words.
975
00:49:49,219 --> 00:49:50,652
The same is true
976
00:49:50,687 --> 00:49:53,789
with so-called "black box"
machine learning applications
977
00:49:53,824 --> 00:49:55,457
like Sebastian's:
978
00:49:55,492 --> 00:49:57,626
no one, including Sebastian,
979
00:49:57,661 --> 00:50:00,829
knows how it detects skin cancer.
980
00:50:00,864 --> 00:50:04,166
Like the bicyclist, it just does,
981
00:50:04,201 --> 00:50:06,869
which may be fine
for diagnostic software,
982
00:50:06,904 --> 00:50:09,104
but not for other aspects
of medicine,
983
00:50:09,139 --> 00:50:12,408
like treatment decisions.
984
00:50:12,443 --> 00:50:13,776
If what you're doing is deciding
985
00:50:13,811 --> 00:50:15,878
what dose of chemotherapy
to give a patient,
986
00:50:15,913 --> 00:50:17,946
I think most people would be uncomfortable
987
00:50:17,981 --> 00:50:19,481
with that being a black box.
988
00:50:19,516 --> 00:50:20,883
People would want to understand
989
00:50:20,918 --> 00:50:22,584
where those predictions
are coming from.
990
00:50:22,619 --> 00:50:25,387
The same can be true
991
00:50:25,422 --> 00:50:26,955
for evaluating who should
get a home loan,
992
00:50:26,990 --> 00:50:30,592
or who should get fired from
their job for poor performance,
993
00:50:30,627 --> 00:50:33,529
or who gets paroled,
994
00:50:33,564 --> 00:50:35,230
all situations
995
00:50:35,265 --> 00:50:38,967
in which black box machine
learning software is in use.
996
00:50:39,002 --> 00:50:40,436
These are algorithms
997
00:50:40,471 --> 00:50:43,605
that can have a big effect
on people's lives.
998
00:50:43,640 --> 00:50:45,607
And we have to understand,
as a society,
999
00:50:45,642 --> 00:50:47,142
what is going into those algorithms
1000
00:50:47,177 --> 00:50:48,510
and what they're based on,
1001
00:50:48,545 --> 00:50:50,879
in order to make sure that
they're not perpetuating
1002
00:50:50,914 --> 00:50:53,315
social problems that we already have.
1003
00:50:55,319 --> 00:50:59,788
We live in an age
when the fusion of data, computers,
1004
00:50:59,823 --> 00:51:03,392
probability, and statistics
1005
00:51:03,427 --> 00:51:06,095
grants us more predictive power
than we've ever known before.
1006
00:51:06,130 --> 00:51:10,165
We can see the tangible benefits,
1007
00:51:10,200 --> 00:51:12,167
and some of the dangers,
1008
00:51:12,202 --> 00:51:17,272
while also wondering
where this will all go.
1009
00:51:17,307 --> 00:51:19,241
We're really seeing a new
science of statistics
1010
00:51:19,276 --> 00:51:20,809
developing under our feet.
1011
00:51:20,844 --> 00:51:21,977
That's exciting,
1012
00:51:22,012 --> 00:51:24,746
and I think it must be a little
bit like
1013
00:51:24,781 --> 00:51:26,381
what it was like when
the theory of probability
1014
00:51:26,416 --> 00:51:28,217
was first being developed
1015
00:51:28,252 --> 00:51:30,853
by Pascal and Fermat
and people around them,
1016
00:51:30,888 --> 00:51:32,020
that people were sort of saying,
1017
00:51:32,055 --> 00:51:33,889
"My God, these are questions
that mathematics
1018
00:51:33,924 --> 00:51:35,924
can really have something
to say about."
1019
00:51:35,959 --> 00:51:37,593
I think that must have been
what it was like
1020
00:51:37,628 --> 00:51:39,595
when statistics
in its traditional form
1021
00:51:39,630 --> 00:51:41,897
was being developed in the
first part of the 20th century,
1022
00:51:41,932 --> 00:51:44,199
and suddenly
people were just asking
1023
00:51:44,234 --> 00:51:45,634
whole new kinds of questions
1024
00:51:45,669 --> 00:51:47,069
that they couldn't even
have approached before.
1025
00:51:47,104 --> 00:51:50,005
And I think we're having
another moment like that now.
1026
00:51:51,842 --> 00:51:54,943
While tomorrow
will always remain uncertain,
1027
00:51:54,978 --> 00:51:58,547
mathematics will continue
to guide the way,
1028
00:51:58,582 --> 00:52:00,916
through the power of probability,
1029
00:52:00,951 --> 00:52:04,253
and prediction by the numbers.