All language subtitles for NOVA.S45E05.Prediction.by.the.Numbers.1080p.AMZN.WEB-DL.DDP.2.0.H.264-GNOME_track3_[eng]

Original user-uploaded subtitles:

1
00:00:08,341 --> 00:00:11,576
The future unfolds before our eyes...

2
00:00:11,611 --> 00:00:14,212
but is it always beyond our grasp?

3
00:00:14,247 --> 00:00:17,582
What was once the province of the gods

4
00:00:17,617 --> 00:00:21,820
has now come more clearly into view

5
00:00:21,855 --> 00:00:24,856
through mathematics and data.

6
00:00:24,891 --> 00:00:27,893
Out of some early observations about gambling

7
00:00:27,928 --> 00:00:30,729
arose tools that guide

8
00:00:30,764 --> 00:00:32,964
our scientific understanding of the world

9
00:00:32,999 --> 00:00:34,766
and more...

10
00:00:34,801 --> 00:00:38,136
through the power of prediction.

11
00:00:41,675 --> 00:00:43,341
From our decisions about the weather...

12
00:00:43,376 --> 00:00:46,044
The strongest hurricane ever on record...

13
00:00:46,079 --> 00:00:47,746
To finding someone lost at sea...

14
00:00:47,781 --> 00:00:48,914
Commencing search pattern.

15
00:00:48,949 --> 00:00:50,348
Keep a good lookout!

16
00:00:50,383 --> 00:00:52,884
Every day, mathematics and data combine

17
00:00:52,919 --> 00:00:55,854
to help us envision what might be...

18
00:00:55,889 --> 00:00:59,524
It's the best crystal ball that humankind can have.

19
00:00:59,559 --> 00:01:02,661
Take a trip on the wings of probability

20
00:01:02,696 --> 00:01:04,429
into the future.

21
00:01:04,464 --> 00:01:05,464
We're thinking about luck

22
00:01:05,499 --> 00:01:06,832
or misfortune,

23
00:01:06,867 --> 00:01:09,268
but they just basically are a question of math, right?

24
00:01:11,571 --> 00:01:14,072
"Prediction by the Numbers" --

25
00:01:14,107 --> 00:01:26,351
right now, on "NOVA."

26
00:01:26,386 --> 00:01:29,688
The Orange County Fair, held in Southern California.

27
00:01:31,992 --> 00:01:35,560
In theory, these crowds hold a predictive power

28
00:01:35,595 --> 00:01:38,296
that can have startling accuracy,

29
00:01:38,331 --> 00:01:43,001
but it doesn't belong to any individual, only the group.

30
00:01:43,036 --> 00:01:44,636
And even then, it has to be viewed

31
00:01:44,671 --> 00:01:49,641
through the lens of mathematics.

32
00:01:49,676 --> 00:01:53,078
The theory is known as the "wisdom of crowds,"

33
00:01:53,113 --> 00:01:57,582
a phenomenon first documented about a hundred years ago.

34
00:01:57,617 --> 00:02:00,152
Statistician Talithia Williams is here

35
00:02:00,187 --> 00:02:03,155
to see if the theory checks out, and to spend some time

36
00:02:03,190 --> 00:02:06,358
with the fair's most beloved animal,

37
00:02:06,393 --> 00:02:09,427
Patches, a 14-year-old ox.

38
00:02:11,932 --> 00:02:14,232
It was a fair kind of like this one

39
00:02:14,267 --> 00:02:16,268
where, in 1906,

40
00:02:16,303 --> 00:02:19,504
Sir Francis Galton came across a contest

41
00:02:19,539 --> 00:02:22,541
where you had to guess the weight of an ox,

42
00:02:22,576 --> 00:02:24,943
like Patches you see here behind me.

43
00:02:26,746 --> 00:02:29,281
After the ox weight-guessing contest was over,

44
00:02:29,316 --> 00:02:34,719
Galton took all the entries home and analyzed them statistically.

45
00:02:34,754 --> 00:02:36,388
To his surprise,

46
00:02:36,423 --> 00:02:39,357
while none of the individual guesses were correct,

47
00:02:39,392 --> 00:02:42,194
the average of all the guesses

48
00:02:42,229 --> 00:02:45,096
was off by less than one percent.

49
00:02:45,131 --> 00:02:47,866
That's the wisdom of crowds.

50
00:02:47,901 --> 00:02:51,336
But is it still true?

51
00:02:51,371 --> 00:02:54,005
So, here's how I think we can test that today.

52
00:02:54,040 --> 00:02:57,375
What if we ask a random sample of people here at the fair

53
00:02:57,410 --> 00:03:00,912
if they can guess how many jelly beans they think are in the jar?

54
00:03:00,947 --> 00:03:03,615
And then, we take those numbers and average them

55
00:03:03,650 --> 00:03:05,116
and see if that's actually close

56
00:03:05,151 --> 00:03:07,586
to the true number of jelly beans.

57
00:03:09,422 --> 00:03:12,090
Guess how many jelly beans are in here.

58
00:03:12,125 --> 00:03:14,025
Come on, guys, everybody's got to have their guess.

59
00:03:14,060 --> 00:03:15,560
I see your mind churning.

60
00:03:15,595 --> 00:03:16,428
1,227.

61
00:03:16,463 --> 00:03:18,063
846.

62
00:03:18,098 --> 00:03:19,731
Probably like 925?

63
00:03:19,766 --> 00:03:21,199
I think a thousand.

64
00:03:21,234 --> 00:03:22,734
So just write your number down.

65
00:03:22,769 --> 00:03:24,169
Uh-huh, there you go.

66
00:03:24,204 --> 00:03:26,538
Can I have a jelly bean?

67
00:03:28,642 --> 00:03:32,911
The 135 guesses gathered from the crowd vary wildly.

68
00:03:32,946 --> 00:03:35,880
The range of our guesses was,

69
00:03:35,915 --> 00:03:40,185
from the smallest was 183, the largest was 12,000.

70
00:03:40,220 --> 00:03:42,721
So you can tell, folks were really guessing.

71
00:03:42,756 --> 00:03:49,160
But when we take the average of our guesses, we get 1,522.

72
00:03:49,195 --> 00:03:50,996
So the question is,

73
00:03:51,031 --> 00:03:54,899
how close is our average to the actual number of jelly beans?

74
00:03:54,934 --> 00:03:58,536
Well, now's the moment of truth.

75
00:04:09,916 --> 00:04:15,120
All right, so the real number of jelly beans was 1,676.

76
00:04:15,155 --> 00:04:19,791
The average of our guesses was off by less than ten percent.

77
00:04:19,826 --> 00:04:22,160
So there actually was some wisdom in our crowd.

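Talithia's check comes down to one piece of arithmetic: the size of the miss divided by the true count. A minimal Python sketch of that comparison follows. The helper name `relative_error` is illustrative, and because only four of the 135 individual guesses are read out on camera, those four stand in for "the individual guesses" here; the full list is not given.

```python
def relative_error(estimate: float, true_value: float) -> float:
    """Size of the miss as a fraction of the true value."""
    return abs(estimate - true_value) / true_value


TRUE_COUNT = 1676   # actual number of jelly beans in the jar
CROWD_MEAN = 1522   # average of the 135 collected guesses

# Only these four guesses are heard on camera; the other 131 are not given.
on_camera_guesses = [1227, 846, 925, 1000]

crowd_err = relative_error(CROWD_MEAN, TRUE_COUNT)
print(f"crowd average: off by {crowd_err:.1%}")
for g in on_camera_guesses:
    print(f"guess {g:>5}: off by {relative_error(g, TRUE_COUNT):.1%}")
```

With the quoted figures, the crowd average misses by 154 beans, about 9.2 percent, while each of the four on-camera guesses misses by more than 25 percent.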
78
00:04:22,195 --> 00:04:25,697
Though off by about ten percent,

79
00:04:25,732 --> 00:04:27,265
the average of the crowd's estimates

80
00:04:27,300 --> 00:04:29,868
was still more accurate

81
00:04:29,903 --> 00:04:32,637
than the vast majority of the individual guesses.

82
00:04:32,672 --> 00:04:36,508
Even so, the wisdom of crowds does have limits.

83
00:04:36,543 --> 00:04:40,178
It can be easily undermined by outside influences

84
00:04:40,213 --> 00:04:43,682
and tends to work best on questions with clear answers,

85
00:04:43,717 --> 00:04:44,883
like a number.

86
00:04:44,918 --> 00:04:47,018
The steps Talithia took

87
00:04:47,053 --> 00:04:50,955
reflect a process going on all around us these days

88
00:04:50,990 --> 00:04:53,358
in the work of statisticians.

89
00:04:53,393 --> 00:04:54,726
Thanks, everybody.

90
00:04:54,761 --> 00:04:55,960
So we collected this data,

91
00:04:55,995 --> 00:04:58,763
right, we analyzed it mathematically,

92
00:04:58,798 --> 00:05:01,366
and we got an estimate that was pretty close

93
00:05:01,401 --> 00:05:03,435
to the actual true value.

94
00:05:03,470 --> 00:05:06,471
That's math and statistics at work.

95
00:05:10,143 --> 00:05:14,579
We didn't always use math and statistics to make predictions.

96
00:05:14,614 --> 00:05:19,617
The Romans studied the flights and cries of birds.

97
00:05:19,652 --> 00:05:23,822
The Chinese cracked "oracle" bones with a hot metal rod

98
00:05:23,857 --> 00:05:25,857
and read the results.

99
00:05:25,892 --> 00:05:28,827
19th-century Russians used chickens.

100
00:05:30,930 --> 00:05:34,032
Throughout history, we've sought the future

101
00:05:34,067 --> 00:05:36,534
in moles on people's faces,

102
00:05:36,569 --> 00:05:38,870
clouds in the sky,

103
00:05:38,905 --> 00:05:41,973
or a pearl cast into an iron pot.

104
00:05:42,008 --> 00:05:48,080
And that list of things used for predicting goes on and on.

105
00:05:53,753 --> 00:05:57,789
But more recently-- that is the last couple hundred years--

106
00:05:57,824 --> 00:06:00,892
to see into the future, we've turned to science

107
00:06:00,927 --> 00:06:05,029
and made some remarkable predictions

108
00:06:05,064 --> 00:06:08,600
from the existence of Neptune,

109
00:06:08,635 --> 00:06:11,936
or radio waves,

110
00:06:11,971 --> 00:06:15,206
or black holes,

111
00:06:15,241 --> 00:06:19,611
to the future location of a comet with such precision

112
00:06:19,646 --> 00:06:21,446
we could land a space probe on it.

113
00:06:23,316 --> 00:06:26,050
But if you pop the hood of science,

114
00:06:26,085 --> 00:06:29,354
inside you'll find a field of applied mathematics

115
00:06:29,389 --> 00:06:32,757
that's made many of those predictions possible:

116
00:06:32,792 --> 00:06:34,659
statistics.

117
00:06:34,694 --> 00:06:36,828
Statistics is kind of unique.

118
00:06:36,863 --> 00:06:39,998
It's not an empirical science itself, but it's not pure math,

119
00:06:40,033 --> 00:06:41,900
but it's not philosophy either.

120
00:06:41,935 --> 00:06:45,170
It's the framework, the language,

121
00:06:45,205 --> 00:06:49,407
the rules by which we do science.

122
00:06:49,442 --> 00:06:51,276
From that, we can make decisions,

123
00:06:51,311 --> 00:06:54,646
we can make conclusions, we can make predictions.

124
00:06:54,681 --> 00:06:56,815
That's what... that's what statisticians try to do.

125
00:06:56,850 --> 00:06:59,751
Why I love statistics is that

126
00:06:59,786 --> 00:07:02,787
it predicts the likelihood of future occurrences,

127
00:07:02,822 --> 00:07:06,925
which really means it's the best crystal ball

128
00:07:06,960 --> 00:07:08,893
that humankind can have.

129
00:07:10,930 --> 00:07:14,232
Ultimately, all the predictive power of statistics

130
00:07:14,267 --> 00:07:18,536
rests on a revolutionary insight from about 500 years ago--

131
00:07:18,571 --> 00:07:21,873
that chance itself can be tamed

132
00:07:21,908 --> 00:07:24,543
through the mathematics of probability.

133
00:07:26,813 --> 00:07:29,414
Viva Las Vegas!

134
00:07:29,449 --> 00:07:33,384
Here's a city full of palaces

135
00:07:33,419 --> 00:07:36,254
built on understanding probability

136
00:07:36,289 --> 00:07:38,490
and fueled by gambling,

137
00:07:38,525 --> 00:07:40,592
which may seem a funny place

138
00:07:40,627 --> 00:07:42,760
to find mathematician Keith Devlin.

139
00:07:42,795 --> 00:07:45,096
But mathematics and gambling

140
00:07:45,131 --> 00:07:47,832
have been tied together for centuries.

141
00:07:47,867 --> 00:07:51,269
Today in a casino, you'll find roulette,

142
00:07:51,304 --> 00:07:52,737
slot machines,

143
00:07:52,772 --> 00:07:54,572
blackjack.

144
00:07:54,607 --> 00:07:58,676
Playing craps is also known as "rolling the bones,"

145
00:07:58,711 --> 00:08:00,545
which is more accurate than you might think.

146
00:08:00,580 --> 00:08:02,580
Humans have been gambling

147
00:08:02,615 --> 00:08:05,383
since the beginnings of modern civilization.

148
00:08:05,418 --> 00:08:07,719
The ancient Greeks, the ancient Egyptians,

149
00:08:07,754 --> 00:08:11,856
would use the ankle bones of sheep as a form of early dice.

150
00:08:11,891 --> 00:08:16,327
Surprisingly, while the Greeks laid the foundation

151
00:08:16,362 --> 00:08:19,831
for our mathematics, they didn't spend any effort

152
00:08:19,866 --> 00:08:22,166
trying to analyze games of chance.

153
00:08:22,201 --> 00:08:24,936
It seems to have never occurred to them,

154
00:08:24,971 --> 00:08:28,806
or indeed to anybody way up until the 15th, 16th century,

155
00:08:28,841 --> 00:08:30,575
that you could apply mathematics

156
00:08:30,610 --> 00:08:33,144
to calculate the way these games would come out.

157
00:08:35,582 --> 00:08:39,450
16th-century Italian mathematician Gerolamo Cardano

158
00:08:39,485 --> 00:08:42,086
made a key early observation:

159
00:08:42,121 --> 00:08:45,623
that the more times a game of chance is played,

160
00:08:45,658 --> 00:08:47,959
the better mathematical probability

161
00:08:47,994 --> 00:08:49,794
predicts the outcome,

162
00:08:49,829 --> 00:08:54,399
later proven as the law of large numbers.

163
00:08:54,434 --> 00:08:59,704
Examples of the law of large numbers at work surround us.

164
00:08:59,739 --> 00:09:01,105
When I flip this coin,

165
00:09:01,140 --> 00:09:02,840
we have no way of knowing

166
00:09:02,875 --> 00:09:04,809
whether it's going to come up heads or tails.

167
00:09:07,513 --> 00:09:09,013
That time it was heads.

168
00:09:09,048 --> 00:09:13,484
On the other hand, if I were to toss a coin 100 times,

169
00:09:13,519 --> 00:09:17,522
roughly 50% of the time it would come up heads,

170
00:09:17,557 --> 00:09:19,724
and 50% of the time it would come up tails.

171
00:09:19,759 --> 00:09:21,159
We can't predict a single toss.

172
00:09:21,194 --> 00:09:25,630
We can predict the aggregate behavior over a hundred tosses.

173
00:09:25,665 --> 00:09:27,498
That's the law of large numbers.

174
00:09:27,533 --> 00:09:31,102
In fact, casinos are a testament

175
00:09:31,137 --> 00:09:33,871
to the iron hand of the law of large numbers.

176
00:09:33,906 --> 00:09:35,239
The games are designed

177
00:09:35,274 --> 00:09:38,843
to give the casinos a slight edge over the gambler.

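Devlin's coin example can be tried directly with a simulation. The Python sketch below uses the standard `random` module with a fixed seed (an arbitrary choice, so runs repeat exactly); the helper name `heads_fraction` is illustrative. It prints the share of heads for growing numbers of tosses: a single toss is all-or-nothing, while the share settles toward 0.5 as the count grows.

```python
import random


def heads_fraction(n_tosses: int, rng: random.Random) -> float:
    """Toss a fair coin n_tosses times and return the share that came up heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (1, 10, 100, 10_000, 100_000):
    frac = heads_fraction(n, rng)
    print(f"{n:>7} tosses: {frac:.4f} heads (gap from 0.5: {abs(frac - 0.5):.4f})")
```

The exact figures depend on the seed; what does not change is that the gap from 0.5 tends to shrink roughly in proportion to one over the square root of the number of tosses.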
178
00:09:38,878 --> 00:09:41,346
Take American roulette:

179
00:09:41,381 --> 00:09:44,649
on the wheel are the numbers one through 36,

180
00:09:44,684 --> 00:09:46,718
half red and half black.

181
00:09:46,753 --> 00:09:49,420
Betting a dollar on one color or the other

182
00:09:49,455 --> 00:09:52,457
seems like a 50-50 proposition.

183
00:09:52,492 --> 00:09:56,227
But the wheel also has two green slots with zeros.

184
00:09:56,262 --> 00:09:57,528
If the ball lands in those,

185
00:09:57,563 --> 00:10:01,366
the casino wins all the bets on either red or black.

186
00:10:01,401 --> 00:10:03,434
And that's the kind of edge

187
00:10:03,469 --> 00:10:07,505
that makes the casino money over the long run.

188
00:10:07,540 --> 00:10:09,007
Customers are gambling.

189
00:10:09,042 --> 00:10:10,708
The casino is absolutely not gambling.

190
00:10:10,743 --> 00:10:12,276
Because they may lose money,

191
00:10:12,311 --> 00:10:16,180
they may lose a lot of money to one or two players,

192
00:10:16,215 --> 00:10:18,016
but if you have thousands and thousands of players,

193
00:10:18,051 --> 00:10:19,517
by the law of large numbers,

194
00:10:19,552 --> 00:10:22,487
you are guaranteed to make money.

195
00:10:24,490 --> 00:10:25,890
The law of large numbers

196
00:10:25,925 --> 00:10:28,493
comes into play outside of gambling too.

197
00:10:28,528 --> 00:10:31,396
In basketball, a field goal or shooting percentage

198
00:10:31,431 --> 00:10:34,766
is the number of baskets made

199
00:10:34,801 --> 00:10:38,369
divided by the number of shots taken.

200
00:10:38,404 --> 00:10:41,105
But early in the season,

201
00:10:41,140 --> 00:10:43,441
when it's based on a low number of attempts,

202
00:10:43,476 --> 00:10:45,410
that percentage can be misleading.

203
00:10:45,445 --> 00:10:48,246
At the beginning of the season, a less skilled player

204
00:10:48,281 --> 00:10:52,216
might get off a few lucky shots in a row.

205
00:10:52,251 --> 00:10:53,451
And at that point,

206
00:10:53,486 --> 00:10:55,119
they'd have a super-high shooting percentage.

207
00:11:00,560 --> 00:11:02,126
Meanwhile, a very skilled player

208
00:11:02,161 --> 00:11:06,164
might miss a few at the beginning of the season

209
00:11:06,199 --> 00:11:07,498
and have a low shooting percentage.

210
00:11:08,534 --> 00:11:10,134
But as the season goes on,

211
00:11:10,169 --> 00:11:11,703
and the total number of shots climbs,

212
00:11:11,738 --> 00:11:15,306
their shooting percentages will soon reflect

213
00:11:15,341 --> 00:11:16,674
their true skill level.

214
00:11:18,444 --> 00:11:21,012
That's the law of large numbers at work.

215
00:11:21,047 --> 00:11:26,317
A small sample, like just a few shots, can be deceptive,

216
00:11:26,352 --> 00:11:29,454
while a large sample, like a lot of shots,

217
00:11:29,489 --> 00:11:31,522
gives you a better picture.

218
00:11:39,165 --> 00:11:43,034
The gambling observations that led to the law of large numbers

219
00:11:43,069 --> 00:11:47,338
were a start, but what really launched probability theory

220
00:11:47,373 --> 00:11:50,241
and opened up a door to a whole new way

221
00:11:50,276 --> 00:11:52,643
of thinking about the future

222
00:11:52,678 --> 00:11:54,078
was a series of letters

223
00:11:54,113 --> 00:11:56,848
exchanged between two French mathematicians,

224
00:11:56,883 --> 00:12:02,453
Blaise Pascal and Pierre de Fermat in the 1650s,

225
00:12:02,488 --> 00:12:04,822
about another gambling problem that had been kicking around

226
00:12:04,857 --> 00:12:07,358
for a few centuries.

227
00:12:07,393 --> 00:12:11,062
A simplified version of the problem goes like this:

228
00:12:11,097 --> 00:12:14,165
two players-- let's call them Blaise and Pierre--

229
00:12:14,200 --> 00:12:16,134
are flipping a coin.

230
00:12:16,169 --> 00:12:19,971
Blaise has chosen heads, and Pierre tails.

231
00:12:20,006 --> 00:12:22,840
The game is the best of 5 flips,

232
00:12:22,875 --> 00:12:26,144
and each has put money into the pot.

233
00:12:26,179 --> 00:12:30,715
They flip the coin three times, and Blaise is ahead two to one.

234
00:12:32,151 --> 00:12:35,586
But then the game is interrupted.

235
00:12:35,621 --> 00:12:39,657
What is the fair way to split the pot?

236
00:12:39,692 --> 00:12:41,926
The question is: how do they divide up the pot

237
00:12:41,961 --> 00:12:44,428
so that it's fair to what might have happened

238
00:12:44,463 --> 00:12:46,430
if they'd been able to complete the game?

239
00:12:46,465 --> 00:12:51,402
Fermat suggested imagining the possible future outcomes

240
00:12:51,437 --> 00:12:53,604
if the game had continued.

241
00:12:53,639 --> 00:12:56,307
There are just two more coin flips,

242
00:12:56,342 --> 00:12:59,177
creating four possible combinations.

243
00:12:59,212 --> 00:13:05,383
Heads-heads, heads-tails, tails-heads, and tails-tails.

244
00:13:05,418 --> 00:13:09,520
In the first three, Blaise wins with enough heads.

245
00:13:09,555 --> 00:13:12,256
Pierre only wins in the last case,

246
00:13:12,291 --> 00:13:16,394
so Fermat suggested that a three-to-one split

247
00:13:16,429 --> 00:13:18,930
was the correct solution.

248
00:13:18,965 --> 00:13:20,631
The key breakthrough

249
00:13:20,666 --> 00:13:24,202
was imagining the future, mathematically,

250
00:13:24,237 --> 00:13:27,305
something even Pascal had trouble with.

251
00:13:27,340 --> 00:13:31,943
Because what Fermat did was say, "Let's look into the future,

252
00:13:31,978 --> 00:13:33,744
"look at possible futures,

253
00:13:33,779 --> 00:13:36,547
"and we'll count the way things could have happened

254
00:13:36,582 --> 00:13:38,449
in different possible futures."

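Fermat's counting argument is short enough to write out. The Python sketch below (the function name `fair_split` is illustrative) lists every way the remaining flips could fall, including flips that would not actually be needed once someone has won, and splits the pot in proportion to the futures each player wins. Playing out all `heads_needed + tails_needed - 1` remaining flips guarantees that exactly one player reaches their target in every listed future.

```python
from fractions import Fraction
from itertools import product


def fair_split(heads_needed: int, tails_needed: int) -> tuple[Fraction, Fraction]:
    """Split the pot by counting possible futures, as Fermat proposed.

    Every imagined future plays out all heads_needed + tails_needed - 1
    remaining flips, so exactly one player reaches their target in each.
    """
    remaining = heads_needed + tails_needed - 1
    futures = list(product("HT", repeat=remaining))
    heads_wins = sum(1 for f in futures if f.count("H") >= heads_needed)
    heads_share = Fraction(heads_wins, len(futures))
    return heads_share, 1 - heads_share


# Best of five: Blaise (heads) leads 2-1, so he needs 1 more head
# and Pierre (tails) needs 2 more tails.
blaise, pierre = fair_split(heads_needed=1, tails_needed=2)
print(f"Blaise gets {blaise!s}, Pierre gets {pierre!s}")  # Blaise gets 3/4, Pierre gets 1/4
```

Exact fractions are used so the three-to-one split comes out exactly rather than as a rounded decimal.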
255
00:13:38,484 --> 00:13:41,118
It was a simple arithmetic issue,

256
00:13:41,153 --> 00:13:44,822
but the idea of counting things in the future

257
00:13:44,857 --> 00:13:47,124
was just completely new,

258
00:13:47,159 --> 00:13:49,060
and Pascal couldn't wrap his mind around it.

259
00:13:49,095 --> 00:13:53,397
Eventually Pascal accepted Fermat's solution,

260
00:13:53,432 --> 00:13:54,999
as did others,

261
00:13:55,034 --> 00:13:58,002
and today, that exchange of letters is regarded

262
00:13:58,037 --> 00:14:01,405
as the birth of modern probability theory.

263
00:14:01,440 --> 00:14:04,842
People realized the future wasn't blank.

264
00:14:04,877 --> 00:14:06,911
You didn't know exactly what was going to happen,

265
00:14:06,946 --> 00:14:09,313
but you could calculate with great precision

266
00:14:09,348 --> 00:14:12,116
what the likelihood of things happening were.

267
00:14:12,151 --> 00:14:14,252
You could make all of the predictions

268
00:14:14,287 --> 00:14:16,587
we make today and take for granted.

269
00:14:16,622 --> 00:14:20,324
You could make them using mathematics.

270
00:14:20,359 --> 00:14:22,426
It was a fundamental insight,

271
00:14:22,461 --> 00:14:25,529
and one of the doors that led to the modern world.

272
00:14:25,564 --> 00:14:29,634
Inherent in all our attempts to predict the future--

273
00:14:29,669 --> 00:14:31,335
from the stock market

274
00:14:31,370 --> 00:14:32,837
to insurance

275
00:14:32,872 --> 00:14:35,072
to web retailers trying to figure out

276
00:14:35,107 --> 00:14:36,841
what you might buy next--

277
00:14:36,876 --> 00:14:39,977
is the idea that with the right data,

278
00:14:40,012 --> 00:14:44,148
the likelihood of future events can be calculated.

279
00:14:48,421 --> 00:14:50,288
In fact, one of the great success stories

280
00:14:50,323 --> 00:14:52,056
in the science of prediction

281
00:14:52,091 --> 00:14:55,259
yields a forecast that many of us check every day

282
00:14:55,294 --> 00:14:58,763
to answer the question, "Do I need an umbrella

283
00:14:58,798 --> 00:15:01,732
or a storm shelter?"

284
00:15:04,937 --> 00:15:06,304
The hurricane season of 2017

285
00:15:06,339 --> 00:15:09,941
will be remembered for its ferocity and destruction.

286
00:15:09,976 --> 00:15:11,976
The strongest ever on record...

287
00:15:12,011 --> 00:15:14,111
The Puerto Rico and the San Juan that we knew yesterday

288
00:15:14,146 --> 00:15:15,346
is no longer there.

289
00:15:17,416 --> 00:15:21,352
The storms formed and gained in intensity with surprising speed,

290
00:15:21,387 --> 00:15:24,655
leaving forecasters to emphasize the uncertainty

291
00:15:24,690 --> 00:15:25,856
of where they might land.

292
00:15:25,891 --> 00:15:27,825
Maria is now a Category Three hurricane.

293
00:15:27,860 --> 00:15:30,461
Exactly what it's going to look like, we just don't know yet.

294
00:15:30,496 --> 00:15:32,363
There's still great uncertainty...

295
00:15:32,398 --> 00:15:35,099
In weather forecasting,

296
00:15:35,134 --> 00:15:38,636
the only certainty is uncertainty.

297
00:15:38,671 --> 00:15:40,004
One thing we know

298
00:15:40,039 --> 00:15:43,174
for sure is we cannot give you a perfect forecast.

299
00:15:43,209 --> 00:15:49,013
Given the nature of how we make a forecast,

300
00:15:49,048 --> 00:15:53,718
from the global observations to equations running on computers,

301
00:15:53,753 --> 00:15:56,253
stepping out in time,

302
00:15:56,288 --> 00:15:58,489
I don't think there'll ever be a perfect forecast.

303
00:15:58,524 --> 00:16:02,159
To fight that uncertainty,

304
00:16:02,194 --> 00:16:06,464
forecasters have turned to more data-- lots more data.

305
00:16:06,499 --> 00:16:08,566
Here at the National Weather Service

306
00:16:08,601 --> 00:16:10,067
Baltimore-Washington office,

307
00:16:10,102 --> 00:16:12,603
meteorologist Isha Renta

308
00:16:12,638 --> 00:16:15,873
prepares for the afternoon launch of a weather balloon.

309
00:16:15,908 --> 00:16:17,875
Twice a day, every day,

310
00:16:17,910 --> 00:16:21,512
all across the U.S. and around the world,

311
00:16:21,547 --> 00:16:23,414
at the very same time,

312
00:16:23,449 --> 00:16:26,584
balloons are released to take a package of instruments

313
00:16:26,619 --> 00:16:29,653
up through the atmosphere.

314
00:16:29,688 --> 00:16:34,325
It transmits readings about every ten meters in height.

315
00:16:34,360 --> 00:16:36,093
It's my understanding that they have developed

316
00:16:36,128 --> 00:16:39,230
other ways to get vertical profiles of the atmosphere,

317
00:16:39,265 --> 00:16:41,766
but still the accuracy and the resolution

318
00:16:41,801 --> 00:16:43,567
that the weather balloon will give you is a lot higher.

319
00:16:43,602 --> 00:16:45,803
So that's why we still depend on them.

320
00:16:51,010 --> 00:16:53,778
The data from Isha's weather balloon ends up

321
00:16:53,813 --> 00:16:55,613
at the National Centers for Environmental Prediction

322
00:16:55,648 --> 00:16:57,515
in College Park, Maryland,

323
00:16:57,550 --> 00:17:00,684
the starting point for nearly all weather forecasts

324
00:17:00,719 --> 00:17:04,188
in the United States.

325
00:17:04,223 --> 00:17:06,891
Her information becomes one drop in a very large bucket

326
00:17:06,926 --> 00:17:11,662
of data taken in each day.

327
00:17:11,697 --> 00:17:12,963
Temperature, pressure, wind speed,

328
00:17:12,998 --> 00:17:14,365
and direction in the atmosphere.

329
00:17:14,400 --> 00:17:16,801
Tens of thousands of point observations are used

330
00:17:16,836 --> 00:17:20,037
every hour of every day as kind of a starting point.

331
00:17:20,072 --> 00:17:22,440
That's where we begin the simulation,

332
00:17:22,475 --> 00:17:23,674
from those observations.

333
00:17:25,511 --> 00:17:27,244
It all becomes part of a process,

334
00:17:27,279 --> 00:17:28,646
which has been described

335
00:17:28,681 --> 00:17:31,248
as one of the great intellectual achievements

336
00:17:31,283 --> 00:17:34,852
of the 20th century: numerical forecasting.

337
00:17:37,790 --> 00:17:39,890
The first step in numerical forecasting

338
00:17:39,925 --> 00:17:44,295
is to break a nearly 40-mile-thick section of the atmosphere

339
00:17:44,330 --> 00:17:47,098
into a three-dimensional grid.

340
00:17:47,133 --> 00:17:51,335
Then, each grid point is assigned numerical values

341
00:17:51,370 --> 00:17:53,304
for different aspects of the weather,

342
00:17:53,339 --> 00:17:56,140
based on the billions of measurements

343
00:17:56,175 --> 00:17:59,210
continually pouring into the Weather Service.

344
00:17:59,245 --> 00:18:00,544
So you'll have an understanding

345
00:18:00,579 --> 00:18:02,146
of temperature, pressure,

346
00:18:02,181 --> 00:18:04,882
and values in terms of wind and wind direction

347
00:18:04,917 --> 00:18:06,183
at each one of these points

348
00:18:06,218 --> 00:18:07,518
within this grid that covers the globe.

349
00:18:07,553 --> 00:18:11,956
From there, equations from the physics

350
00:18:11,991 --> 00:18:16,760
of fluids and thermodynamics are applied to each grid point.

351
00:18:16,795 --> 00:18:19,029
Not only do you change the characteristics

352
00:18:19,064 --> 00:18:22,199
at each grid point, but the changes at those grid points

353
00:18:22,234 --> 00:18:24,201
affect neighboring grid points,

354
00:18:24,236 --> 00:18:27,104
and then neighboring grid points affect other grid points.

355
00:18:27,139 --> 00:18:29,874
And so you evolve the atmosphere through time

356
00:18:29,909 --> 00:18:31,842
in this three-dimensional space.

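The neighbor-to-neighbor update described here can be illustrated with a drastically simplified toy: a single quantity (temperature) on a one-dimensional ring of grid points, stepped forward with a simple diffusion rule. This is an illustrative assumption, not how the Weather Service model works; real forecast models solve three-dimensional fluid-dynamics and thermodynamics equations for many variables at once, and the function name `step` and mixing coefficient `k` are hypothetical.

```python
def step(temps: list[float], k: float = 0.25) -> list[float]:
    """Advance a 1-D ring of grid points by one time step of simple diffusion.

    Each new value depends on the point itself and its two neighbours, so a
    change at one point spreads outward by one grid cell per step.
    (k must stay at or below 0.5 for this explicit scheme to remain stable.)
    """
    n = len(temps)
    return [
        temps[i] + k * (temps[(i - 1) % n] - 2 * temps[i] + temps[(i + 1) % n])
        for i in range(n)
    ]


grid = [0.0] * 11
grid[5] = 10.0  # one warm spot in the middle of the ring
for t in range(1, 4):
    grid = step(grid)
    print(f"step {t}: " + " ".join(f"{v:5.2f}" for v in grid))
```

After one step the warm spot has leaked into its two neighbors; after three steps it reaches three cells on either side, while the total stays at 10 because the ring has no edges for heat to escape through.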
357
00:18:31,877 --> 00:18:35,679
And remarkably, the approach works.

358
00:18:35,714 --> 00:18:37,515
It's amazingly crazy that it works.

359
00:18:37,550 --> 00:18:41,418
It's remarkable how well it does work,

360
00:18:41,453 --> 00:18:44,221
given that we're making grand assumptions

361
00:18:44,256 --> 00:18:46,557
about the initial state, so to speak,

362
00:18:46,592 --> 00:18:49,059
or the beginning state of any forecast.

363
00:18:49,094 --> 00:18:54,165
And that initial state turns out to be absolutely crucial.

364
00:18:56,268 --> 00:18:58,035
In the early days of numerical forecasting,

365
00:18:58,070 --> 00:19:01,438
it seemed like a definitive weather prediction

366
00:19:01,473 --> 00:19:04,808
extending far into the future might soon be possible.

367
00:19:04,843 --> 00:19:08,913
But research in the 1960s

368
00:19:08,948 --> 00:19:12,583
showed that slight errors in measuring the initial state

369
00:19:12,618 --> 00:19:17,688
grow larger over time, leading predictions astray.

370
00:19:17,723 --> 00:19:20,291
So as you step ahead in time,

371
00:19:20,326 --> 00:19:22,393
the forecast will become less accurate.

372
00:19:22,428 --> 00:19:27,198
Ironically, that sensitivity to initial conditions

373
00:19:27,233 --> 00:19:30,134
also suggested a way to improve the accuracy

374
00:19:30,169 --> 00:19:32,403
of numerical weather forecasts.

375
00:19:32,438 --> 00:19:35,839
Thanks to the power of today's computers,

376
00:19:35,874 --> 00:19:39,076
forecasters can run their weather simulations not once

377
00:19:39,111 --> 00:19:41,645
but several times.

378
00:19:41,680 --> 00:19:45,716
For each run, they slightly alter the initial conditions

379
00:19:45,751 --> 00:19:48,519
to reflect the inherent error built into the measurements

380
00:19:48,554 --> 00:19:50,387
and the uncertainty in the model itself.

381
00:19:50,422 --> 00:19:55,559
The process is called ensemble forecasting,

382
00:19:55,594 --> 00:19:59,029
and the results are called spaghetti plots.

383
00:19:59,064 --> 00:20:01,599
We're looking at about 100 different forecasts here

384
00:20:01,634 --> 00:20:04,268
for the jet stream at about six days ago.

385
00:20:04,303 --> 00:20:05,970
We have the actual jet stream

386
00:20:06,005 --> 00:20:08,072
drawn as the white line on here today,

387
00:20:08,107 --> 00:20:10,474
and you can see most of the forecasts six days ago

388
00:20:10,509 --> 00:20:12,409
were well north of where we actually find the jet stream

389
00:20:12,444 --> 00:20:13,711
this morning.

390
00:20:13,746 --> 00:20:16,680
And then we'll go to a five-day forecast

391
00:20:16,715 --> 00:20:19,583
and a four-day forecast and a three-day forecast

392
00:20:19,618 --> 00:20:23,087
and then down to two days and the day of the event.

393
00:20:23,122 --> 00:20:24,822
And you can see how the model forecasts all converge

394
00:20:24,857 --> 00:20:27,558
on that solution, which is what you would expect them to do.

395
00:20:27,593 --> 00:20:30,194
But you go back to the six-day forecast,

396
00:20:30,229 --> 00:20:32,997
you can see the large spread in the ensemble solutions

397
00:20:33,032 --> 00:20:35,232
for this particular pattern.

398
00:20:35,267 --> 00:20:40,004
In the end, meteorologists turn to statistical tools

399
00:20:40,039 --> 00:20:42,172
to analyze weather forecasts

400
00:20:42,207 --> 00:20:45,609
and often use probabilities to express the uncertainty

401
00:20:45,644 --> 00:20:47,645
in the results.

402
00:20:47,680 --> 00:20:49,913
That's the "40% chance of rain" you might hear

403
00:20:49,948 --> 00:20:52,483
from your local forecaster.

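The ensemble idea can be tried on the Lorenz-63 equations, a three-variable toy system commonly used to illustrate sensitivity to initial conditions; here it stands in for a real forecast model, which it is not. The Python sketch below starts each ensemble member from a slightly nudged copy of the same initial state, integrates every member forward, and reports how far the members have drifted apart, plus the share of members with x above zero, a toy analogue of a forecaster's percentage chance. All function names and settings (20 members, a nudge of about 0.001, the starting point) are illustrative assumptions.

```python
import random

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classic Lorenz-63 parameters
DT = 0.01                                  # integrator time step


def deriv(s):
    """Right-hand side of the Lorenz-63 equations."""
    x, y, z = s
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)


def rk4_step(s):
    """Advance one state by DT with the fourth-order Runge-Kutta method."""
    def nudge(base, slope, h):
        return tuple(b + h * d for b, d in zip(base, slope))

    k1 = deriv(s)
    k2 = deriv(nudge(s, k1, DT / 2))
    k3 = deriv(nudge(s, k2, DT / 2))
    k4 = deriv(nudge(s, k3, DT))
    return tuple(
        v + DT / 6 * (a + 2 * b + 2 * c + d)
        for v, a, b, c, d in zip(s, k1, k2, k3, k4)
    )


def run_ensemble(start, members=20, noise=1e-3, steps=2000, seed=1):
    """Perturb the start state for each member, integrate all members,
    and return the members' x-values after every step."""
    rng = random.Random(seed)
    ensemble = [tuple(v + rng.gauss(0.0, noise) for v in start) for _ in range(members)]
    history = []
    for _ in range(steps):
        ensemble = [rk4_step(s) for s in ensemble]
        history.append([s[0] for s in ensemble])
    return history


def spread(values):
    """Distance between the highest and lowest ensemble member."""
    return max(values) - min(values)


history = run_ensemble((1.0, 1.0, 20.0))
for n in (100, 500, 1000, 2000):
    xs = history[n - 1]
    share = sum(x > 0 for x in xs) / len(xs)
    print(f"t = {n * DT:4.1f}: spread of x = {spread(xs):8.4f}, "
          f"{share:.0%} of members have x > 0")
```

Early on, the members agree closely; further out, the tiny starting differences have grown until the members disagree widely, so the outlook is better stated as a share of members than as a single answer. That is the same pattern the spaghetti plots show: the further ahead the forecast, the wider the spread.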
404
00:20:52,518 --> 00:20:54,718
Meteorology is probabilistic at its very core,

405
00:20:54,753 --> 00:20:56,720
and I believe that the general public knows

406
00:20:56,755 --> 00:21:00,491
there is uncertainty inherent in everything we say,

407
00:21:00,526 --> 00:21:01,725
but we're getting better.

408
00:21:04,530 --> 00:21:09,066
Our forecasts for three days out now are as accurate

409
00:21:09,101 --> 00:21:11,001
as one-day forecasts were about 10 years ago.

410
00:21:11,036 --> 00:21:12,569
And this continues to improve.

411
00:21:12,604 --> 00:21:15,673
So the science has advanced beyond my wildest dreams,

412
00:21:15,708 --> 00:21:19,577
and it's hard to even see where it might go in the future.

413
00:21:22,147 --> 00:21:23,847
Just like in meteorology,

414
00:21:23,882 --> 00:21:25,149
for the rest of science,

415
00:21:25,184 --> 00:21:27,618
the ultimate test of our understanding

416
00:21:27,653 --> 00:21:31,588
is our ability to make accurate predictions.

417
00:21:31,623 --> 00:21:33,724
On a grand scale, scientific theories

418
00:21:33,759 --> 00:21:36,593
like Einstein's general theory of relativity

419
00:21:36,628 --> 00:21:38,028
have to make predictions

420
00:21:38,063 --> 00:21:40,597
that can be tested to become accepted.

421
00:21:40,632 --> 00:21:43,233
In that case, it took four years

422
00:21:43,268 --> 00:21:45,636
before a full solar eclipse revealed

423
00:21:45,671 --> 00:21:48,939
that light passing near the sun curved,

424
00:21:48,974 --> 00:21:52,443
just as predicted by Einstein's theory--

425
00:21:52,478 --> 00:21:54,611
the first proof he was right

426
00:21:54,646 --> 00:21:58,582
that the sun's mass distorts the fabric of space-time--

427
00:21:58,617 --> 00:22:00,884
what we experience as gravity.

428 00:22:02,221 --> 00:22:06,890 In fact, the scientific method demands a hypothesis 429 00:22:06,925 --> 00:22:09,793 which leads to a prediction of results 430 00:22:09,828 --> 00:22:12,229 from a carefully designed experiment 431 00:22:12,264 --> 00:22:13,864 that will test its claim. 432 00:22:16,268 --> 00:22:20,104 Surprisingly, it wasn't until the 1920s and '30s 433 00:22:20,139 --> 00:22:23,273 that a British scientist, Ronald A. Fisher, 434 00:22:23,308 --> 00:22:26,710 laid out guidelines for designing experiments 435 00:22:26,745 --> 00:22:31,982 using statistics and probability as a way of judging results. 436 00:22:36,221 --> 00:22:39,423 As an example, he told the story of a lady 437 00:22:39,458 --> 00:22:41,091 who claimed to taste the difference 438 00:22:41,126 --> 00:22:43,861 between milk poured into her tea 439 00:22:47,866 --> 00:22:51,335 and tea poured into her milk. 440 00:22:56,408 --> 00:22:59,576 Fisher considered ways to test that. 441 00:22:59,611 --> 00:23:03,514 What if he presented her with just one cup to identify? 442 00:23:03,549 --> 00:23:05,749 If she got it right one time, you'd probably say, 443 00:23:05,784 --> 00:23:08,352 "Well, yeah, but she had a 50-50 chance, just by guessing, 444 00:23:08,387 --> 00:23:09,887 of getting it right." 445 00:23:09,922 --> 00:23:12,523 So you'd be pretty unconvinced that she has the skill. 446 00:23:12,558 --> 00:23:16,460 Fisher proposed that a reasonable test of her ability 447 00:23:16,495 --> 00:23:18,729 would be eight cups, 448 00:23:18,764 --> 00:23:20,130 four with milk into tea, 449 00:23:20,165 --> 00:23:23,801 four with tea into milk, 450 00:23:23,836 --> 00:23:26,970 each presented randomly. 451 00:23:27,005 --> 00:23:33,210 The lady then had to separate them back into the two groups. 452 00:23:33,245 --> 00:23:34,645 Why eight?
453 00:23:34,680 --> 00:23:38,248 Because that produced 70 possible combinations 454 00:23:38,283 --> 00:23:43,487 of the cups, but only one with them separated correctly. 455 00:23:43,522 --> 00:23:46,089 If she got it right, 456 00:23:46,124 --> 00:23:48,992 that wouldn't prove she had a special ability, 457 00:23:49,027 --> 00:23:53,163 but Fisher could conclude, if she was just guessing, 458 00:23:53,198 --> 00:23:56,433 it was an extremely unlikely result, 459 00:23:56,468 --> 00:24:00,771 a probability of just 1.4 percent. 460 00:24:00,806 --> 00:24:03,707 Thanks mainly to Fisher, 461 00:24:03,742 --> 00:24:07,478 that idea became enshrined in experimental science 462 00:24:07,513 --> 00:24:12,182 as the "p-value" -- p for probability. 463 00:24:12,217 --> 00:24:15,252 If you assume your results were just due to chance, 464 00:24:15,287 --> 00:24:18,088 that what you were testing had no effect, 465 00:24:18,123 --> 00:24:21,825 what's the probability you would see those results 466 00:24:21,860 --> 00:24:24,094 or something even more rare? 467 00:24:24,129 --> 00:24:26,196 If you assume that there's a process 468 00:24:26,231 --> 00:24:28,131 that is completely random, 469 00:24:28,166 --> 00:24:32,870 and you find that it's pretty unlikely to get your data, 470 00:24:32,905 --> 00:24:34,972 then you might be suspicious that something is happening. 471 00:24:35,007 --> 00:24:37,975 You might conclude, in fact, that it's not a random process. 472 00:24:38,010 --> 00:24:42,179 That this is interesting to look at what else might be going on, 473 00:24:42,214 --> 00:24:43,914 and it passes some kind of sniff test. 474 00:24:43,949 --> 00:24:47,384 Fisher also suggested a benchmark: 475 00:24:47,419 --> 00:24:48,785 only experimental results 476 00:24:48,820 --> 00:24:51,922 where the p-value was under .05-- 477 00:24:51,957 --> 00:24:54,291 a probability of less than five percent-- 478 00:24:54,326 --> 00:24:56,093 were worth a second look. 
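Fisher's counts can be checked directly. The minimal sketch below enumerates every way a pure guesser could pick four of the eight cups as "milk first". It then computes the p-value as described above: the probability of a result at least as extreme as the one observed, assuming she is only guessing. Which cups were truly milk-first is an arbitrary labelling chosen for the example. By symmetry, that choice does not affect the answer.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

CUPS, MILK_FIRST = 8, 4


def p_value(correct_milk_first):
    """P(a pure guesser puts at least this many milk-first cups in the milk-first group)."""
    truth = set(range(MILK_FIRST))  # hypothetical labelling: cups 0-3 had milk poured first
    splits = list(combinations(range(CUPS), MILK_FIRST))  # every way to pick 4 of 8 cups
    as_extreme = sum(len(truth & set(s)) >= correct_milk_first for s in splits)
    return Fraction(as_extreme, len(splits))


print(comb(CUPS, MILK_FIRST))                   # 70 possible splits
print(p_value(4), f"{float(p_value(4)):.4f}")   # 1/70 0.0143 -> below 0.05
print(p_value(3), f"{float(p_value(3)):.4f}")   # 17/70 0.2429 -> not below 0.05
```

A single mistake, with three of the four milk-first cups placed correctly, already gives 17/70, roughly 24%. Under this design, only a perfect split clears Fisher's 0.05 benchmark.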
479 00:24:56,128 --> 00:24:57,394 In other words, 480 00:24:57,429 --> 00:25:00,731 if you assume your results were just due to chance, 481 00:25:00,766 --> 00:25:03,734 you'd see them less than one time out of 20. 482 00:25:03,769 --> 00:25:06,370 Not very likely. 483 00:25:06,405 --> 00:25:10,607 He called those results "statistically significant." 484 00:25:10,642 --> 00:25:12,276 Statistically significant. 485 00:25:12,311 --> 00:25:13,577 Now this is a terrible word. 486 00:25:13,612 --> 00:25:15,379 It could be quite insignificant. 487 00:25:15,414 --> 00:25:19,583 You could be detecting a very, very, very small effect, 488 00:25:19,618 --> 00:25:22,686 but it would be called, in the mathematical lingo, 489 00:25:22,721 --> 00:25:23,854 "significant." 490 00:25:25,090 --> 00:25:26,523 Since Fisher's day, 491 00:25:26,558 --> 00:25:30,761 p-values have been used as a convenient yardstick for success 492 00:25:30,796 --> 00:25:34,565 by many, including most scientific journals. 493 00:25:34,600 --> 00:25:36,700 Since they prefer to publish successes, 494 00:25:36,735 --> 00:25:40,103 and getting published is critical to career advancement, 495 00:25:40,138 --> 00:25:43,941 the temptation to massage and manipulate experimental data 496 00:25:43,976 --> 00:25:47,244 into a good p-value is enormous. 497 00:25:47,279 --> 00:25:51,348 There's even a name for it: "p-hacking." 498 00:25:51,383 --> 00:25:55,686 P-hacking is when researchers consciously or unconsciously 499 00:25:55,721 --> 00:25:59,389 guide their data analysis to get the results that they want, 500 00:25:59,424 --> 00:26:03,560 and since .05 is kind of the-the bar for being able to publish 501 00:26:03,595 --> 00:26:06,630 and call something real, and get all your grant money, 502 00:26:06,665 --> 00:26:10,434 it's usually guiding the results 503 00:26:10,469 --> 00:26:13,370 so that you arrive at that p of .05. 504 00:26:15,774 --> 00:26:19,776 How much p-hacking really goes on is hard to know. 
505 00:26:19,811 --> 00:26:22,212 What may be more important is to remember 506 00:26:22,247 --> 00:26:25,882 what was originally intended by a p-value. 507 00:26:25,917 --> 00:26:31,355 The p-value was always meant to be a detective, not a judge. 508 00:26:31,390 --> 00:26:33,090 If you do an experiment 509 00:26:33,125 --> 00:26:36,593 and find the result that is statistically significant, 510 00:26:36,628 --> 00:26:40,530 that is telling you, that is an interesting place to look 511 00:26:40,565 --> 00:26:42,733 and research and understand further what's going on, 512 00:26:42,768 --> 00:26:45,769 not "don't study this anymore because the matter is settled." 513 00:26:45,804 --> 00:26:50,540 In a sense, a low p-value is an invitation 514 00:26:50,575 --> 00:26:54,244 to reproduce the experiment, to help validate the result, 515 00:26:54,279 --> 00:26:56,546 but that doesn't always happen. 516 00:26:56,581 --> 00:27:00,584 In fact, there are few career incentives for it. 517 00:27:00,619 --> 00:27:04,221 Journals and funders prefer novel research. 518 00:27:04,256 --> 00:27:08,625 There is no Nobel Prize for replication. 519 00:27:08,660 --> 00:27:14,164 Another solution to p-hacking and the overemphasis on p-values 520 00:27:14,199 --> 00:27:17,868 may simply be greater transparency. 521 00:27:17,903 --> 00:27:19,436 More and more, what people are doing 522 00:27:19,471 --> 00:27:21,405 is publishing their data. 523 00:27:21,440 --> 00:27:25,776 And so it's becoming harder and harder to lie with statistics, 524 00:27:25,811 --> 00:27:27,444 because people will just probe and say, 525 00:27:27,479 --> 00:27:28,912 "Well, give me the set you analyzed 526 00:27:28,947 --> 00:27:30,514 and let me see how you got this result." 
527 00:27:32,551 --> 00:27:36,453 Statistics continues to play a fundamental role in science, 528 00:27:36,488 --> 00:27:39,189 but really anywhere data is collected, 529 00:27:39,224 --> 00:27:40,624 you'll find statisticians are at work, 530 00:27:40,659 --> 00:27:44,861 looking for patterns, drawing conclusions, 531 00:27:44,896 --> 00:27:47,230 and often making predictions-- 532 00:27:47,265 --> 00:27:50,267 though they don't always work out. 533 00:27:51,369 --> 00:27:54,271 The presidential election of 2016 534 00:27:54,306 --> 00:27:55,772 was a tough one for pollsters, 535 00:27:55,807 --> 00:27:59,276 the folks who conduct and analyze opinion polls. 536 00:28:00,612 --> 00:28:03,146 Hillary Clinton was the overwhelming favorite 537 00:28:03,181 --> 00:28:05,582 to beat Donald Trump right up to election day. 538 00:28:05,617 --> 00:28:07,517 Trump is headed for a historic defeat. 539 00:28:07,552 --> 00:28:09,119 He's going to lose by a landslide. 540 00:28:09,154 --> 00:28:11,288 I think that she's going to have a very good night. 541 00:28:11,323 --> 00:28:15,992 The "New York Times" put Trump's chances at 15%. 542 00:28:16,027 --> 00:28:20,430 One pollster on election night gave him one percent. 543 00:28:20,465 --> 00:28:24,468 A projection of a 99% chance of winning, is that correct? 544 00:28:24,503 --> 00:28:25,869 The odds are overwhelming 545 00:28:25,904 --> 00:28:28,305 of a Hillary Clinton victory on Tuesday. 546 00:28:28,340 --> 00:28:31,442 I would be very surprised if anything else happened. 547 00:28:33,078 --> 00:28:37,514 And, of course, Trump won, and Clinton lost. 548 00:28:37,549 --> 00:28:39,249 People were repeatedly told, 549 00:28:39,284 --> 00:28:40,584 "Hillary Clinton is the candidate 550 00:28:40,619 --> 00:28:43,253 most likely to win this election," and she didn't. 551 00:28:43,288 --> 00:28:46,556 And I think that really left people feeling almost lied to, 552 00:28:46,591 --> 00:28:49,326 almost cheated by these numbers.
553 00:28:49,361 --> 00:28:52,229 So what was going on with the polls? 554 00:28:52,264 --> 00:28:55,332 And exactly how do people predict elections? 555 00:28:55,367 --> 00:28:59,202 One way is just by asking people who they'll vote for. 556 00:28:59,237 --> 00:29:03,273 One of the great things about polling 557 00:29:03,308 --> 00:29:05,942 is that we don't have to talk to everybody 558 00:29:05,977 --> 00:29:12,249 in order to find out what the opinions are of everybody. 559 00:29:12,284 --> 00:29:14,484 We can actually select something called a sample. 560 00:29:16,388 --> 00:29:18,822 Sampling is a familiar idea. 561 00:29:18,857 --> 00:29:20,857 To see if the soup is right, 562 00:29:20,892 --> 00:29:23,760 you taste a teaspoon, not the whole pot. 563 00:29:23,795 --> 00:29:25,929 To test your blood at the doctor, 564 00:29:25,964 --> 00:29:28,131 they typically draw less than an ounce, 565 00:29:28,166 --> 00:29:30,267 they don't drain you dry. 566 00:29:30,302 --> 00:29:31,635 But in many circumstances, 567 00:29:31,670 --> 00:29:37,340 finding a representative sample is harder than it sounds. 568 00:29:37,375 --> 00:29:39,643 Let's suppose that this 569 00:29:39,678 --> 00:29:42,746 is the population of about a thousand people in a city, 570 00:29:42,781 --> 00:29:45,949 and we want to know, "Are people for or against 571 00:29:45,984 --> 00:29:48,552 converting a park into a dog park?" 572 00:29:48,587 --> 00:29:50,487 And so these green beads down here 573 00:29:50,522 --> 00:29:52,322 are going to represent people who were for it, 574 00:29:52,357 --> 00:29:55,091 and the red beads are folks who are against it. 575 00:29:55,126 --> 00:29:58,795 Talithia's first step is to take advantage 576 00:29:58,830 --> 00:30:03,767 of an unlikely ally in sampling: randomness. 
577 00:30:03,802 --> 00:30:05,569 The beauty of randomization 578 00:30:05,604 --> 00:30:08,605 is that as long as you throw everything from your population 579 00:30:08,640 --> 00:30:12,042 into one pot and randomly pull it out, 580 00:30:12,077 --> 00:30:13,643 you can be sure that you're within 581 00:30:13,678 --> 00:30:15,178 a certain percentage points 582 00:30:15,213 --> 00:30:17,014 of the actual value that's in that pot. 583 00:30:19,217 --> 00:30:23,153 So the plan is to randomly sample the beads-- but how many? 584 00:30:23,188 --> 00:30:27,190 That depends on how much accuracy Talithia wants. 585 00:30:27,225 --> 00:30:31,228 One measure is the margin of error-- 586 00:30:31,263 --> 00:30:33,830 the maximum amount the result from the sample 587 00:30:33,865 --> 00:30:37,567 can be expected to differ from that of the whole population. 588 00:30:37,602 --> 00:30:40,670 It's the plus or minus figure, often a percentage, 589 00:30:40,705 --> 00:30:43,506 you see in the fine print in polls. 590 00:30:43,541 --> 00:30:47,510 But there's also confidence level. 591 00:30:47,545 --> 00:30:49,913 Inherently, there is uncertainty 592 00:30:49,948 --> 00:30:52,782 that any sample really represents a whole population. 593 00:30:52,817 --> 00:30:55,952 The confidence level tells you how sure you can be 594 00:30:55,987 --> 00:30:57,287 about your result. 595 00:30:57,322 --> 00:30:59,823 A 90% confidence level means, 596 00:30:59,858 --> 00:31:03,860 on average, if you ran your poll or sample 100 times, 597 00:31:03,895 --> 00:31:06,963 90 of those times, it would be accurate, 598 00:31:06,998 --> 00:31:09,299 within the margin of error. 599 00:31:09,334 --> 00:31:13,570 Talithia knows the total number of beads is a thousand. 600 00:31:13,605 --> 00:31:16,072 And she's settled on a plus-or-minus five-percent 601 00:31:16,107 --> 00:31:20,110 margin of error at a 90% confidence level. 
602 00:31:20,145 --> 00:31:25,548 That means she needs a sample size of at least 214 beads. 603 00:31:25,583 --> 00:31:27,150 Here are the results: 604 00:31:27,185 --> 00:31:31,788 We got 103 red beads and 111 green, 605 00:31:31,823 --> 00:31:34,791 so about 48% of our population would vote against, 606 00:31:34,826 --> 00:31:37,460 and about 52% would vote for. 607 00:31:37,495 --> 00:31:39,963 Now, remember that margin of error that we talked about, 608 00:31:39,998 --> 00:31:41,631 that plus-or-minus five percent? 609 00:31:41,666 --> 00:31:43,867 So once you take that into account, 610 00:31:43,902 --> 00:31:46,169 those numbers really aren't that different at all. 611 00:31:46,204 --> 00:31:50,774 So I guess you could say, this puppy is too close to call. 612 00:31:50,809 --> 00:31:56,546 In fact, within the margin of error, the stats got it right. 613 00:31:56,581 --> 00:32:00,517 There were an equal number of red and green beads in the jar. 614 00:32:03,021 --> 00:32:05,989 While the sampling error built in from the mathematics 615 00:32:06,024 --> 00:32:07,257 can be quantified, 616 00:32:07,292 --> 00:32:10,226 there are other errors that can't. 617 00:32:10,261 --> 00:32:14,030 The other parts of the error-- how we word our questions, 618 00:32:14,065 --> 00:32:15,899 how the respondents feel that day, 619 00:32:15,934 --> 00:32:18,234 their ability to predict 620 00:32:18,269 --> 00:32:21,905 what their behavior is going to be somewhere down the line-- 621 00:32:21,940 --> 00:32:23,440 all those sources of error 622 00:32:23,475 --> 00:32:25,909 are something that we can't calculate. 623 00:32:25,944 --> 00:32:30,914 And there's a catch to random sampling for polls too. 624 00:32:30,949 --> 00:32:32,615 A few decades back, 625 00:32:32,650 --> 00:32:35,719 when just about every household had a landline, 626 00:32:35,754 --> 00:32:40,323 finding a random sample meant randomly dialing phone numbers.
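The 214-bead figure matches a standard sample-size formula for a proportion. That formula assumes the most conservative split, p = 0.5, and applies a finite-population correction for the 1,000-bead jar. The sketch below makes that arithmetic explicit. As a rough check on what "90% confidence" means, it then re-runs a hypothetical 50/50 bead poll thousands of times and counts how often the sample lands within five points of the truth. The function names and the simulation setup are illustrative assumptions, not the method used on screen.

```python
import math
import random
from statistics import NormalDist


def z_score(confidence):
    """Two-sided critical value, about 1.645 for 90% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size(population, margin, confidence, p=0.5):
    """Smallest n whose margin of error is within `margin` (finite-population corrected)."""
    n0 = z_score(confidence) ** 2 * p * (1 - p) / margin ** 2  # size for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def margin_of_error(n, population, confidence, p=0.5):
    """Half-width of the interval for a proportion, sampling without replacement."""
    fpc = math.sqrt((population - n) / (population - 1))
    return z_score(confidence) * math.sqrt(p * (1 - p) / n) * fpc


def coverage(n, population=1000, margin=0.05, trials=2000, seed=0):
    """Re-run the bead poll many times; how often is it within the margin of the truth?"""
    rng = random.Random(seed)
    jar = ["red"] * (population // 2) + ["green"] * (population // 2)  # equal split, as in the jar
    hits = 0
    for _ in range(trials):
        draw = rng.sample(jar, n)
        share_green = draw.count("green") / n
        hits += abs(share_green - 0.5) <= margin
    return hits / trials


n = sample_size(1000, 0.05, 0.90)
print(n)  # 214
print(f"margin with n = {n}: {margin_of_error(n, 1000, 0.90):.4f}")
print(f"green share in the broadcast sample: {111 / 214:.3f}")
print(f"polls within 5 points of the truth: {coverage(n):.1%}")
```

The simulated share should come out close to 90%. It can miss slightly because of simulation noise and because bead counts are whole numbers.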
627 00:32:40,358 --> 00:32:44,260 Into the 1970s and the 1980s, we were getting, 628 00:32:44,295 --> 00:32:46,496 you know, 90% response rates. 629 00:32:46,531 --> 00:32:48,798 If we randomly chose a phone number, 630 00:32:48,833 --> 00:32:50,967 somebody on the other end of that phone would pick it up 631 00:32:51,002 --> 00:32:52,335 and would do the interview with us. 632 00:32:53,772 --> 00:32:55,538 Those days are over. 633 00:32:55,573 --> 00:32:58,475 Thanks to caller I.D. and answering machines, 634 00:32:58,510 --> 00:33:01,444 people often don't answer their landlines anymore-- 635 00:33:01,479 --> 00:33:04,114 if they even have one. 636 00:33:04,149 --> 00:33:06,750 Response rates are way down. 637 00:33:06,785 --> 00:33:09,652 Only about ten percent of people 638 00:33:09,687 --> 00:33:11,321 respond to polls. 639 00:33:11,356 --> 00:33:13,289 So you're kind of crossing your fingers 640 00:33:13,324 --> 00:33:14,991 and hoping the people you reach 641 00:33:15,026 --> 00:33:18,161 are the same as the ones that are actually going to vote. 642 00:33:18,196 --> 00:33:23,733 For example, we found in 2016, pollsters were not reaching 643 00:33:23,768 --> 00:33:26,870 enough white voters without college degrees. 644 00:33:26,905 --> 00:33:29,439 If there's a bias in the data, you cannot recover from it. 645 00:33:29,474 --> 00:33:32,142 As we've seen from some recent elections. 646 00:33:33,678 --> 00:33:37,280 After Donald Trump's surprise win, 647 00:33:37,315 --> 00:33:39,516 many wondered if polling was broken. 648 00:33:39,551 --> 00:33:42,852 But if you look at the polls themselves, 649 00:33:42,887 --> 00:33:44,287 and not the headlines, 650 00:33:44,322 --> 00:33:47,657 on average, polls on the national and state level 651 00:33:47,692 --> 00:33:52,362 were off by historically typical amounts. 
652 00:33:52,397 --> 00:33:55,598 So when I hear people say, "Oh, the polls were wrong," 653 00:33:55,633 --> 00:33:58,635 then it probably reflects people's interpretations 654 00:33:58,670 --> 00:34:01,337 about the polls being wrong, 655 00:34:01,372 --> 00:34:03,273 where people, for various reasons, 656 00:34:03,308 --> 00:34:06,409 looked at the polls, and they said, 657 00:34:06,444 --> 00:34:08,812 "These numbers prove to me that Clinton's going to win." 658 00:34:08,847 --> 00:34:10,980 When we looked at the polls, we said, 659 00:34:11,015 --> 00:34:13,817 "These numbers certainly make her a favorite, 660 00:34:13,852 --> 00:34:16,719 "but they point toward an election that's fairly close 661 00:34:16,754 --> 00:34:18,755 and quite uncertain, actually." 662 00:34:18,790 --> 00:34:21,257 And in 2016, 663 00:34:21,292 --> 00:34:25,328 the U.S. presidential election was just that close. 664 00:34:25,363 --> 00:34:27,630 Trump's victory depended on fewer votes 665 00:34:27,665 --> 00:34:30,967 than the seating capacity of some college football stadiums-- 666 00:34:31,002 --> 00:34:33,436 spread across three states: 667 00:34:33,471 --> 00:34:36,840 Pennsylvania, Wisconsin, and Michigan. 668 00:34:36,875 --> 00:34:40,510 And there were some problems with the polls in those states 669 00:34:40,545 --> 00:34:43,046 that led to underestimating Trump's support, 670 00:34:43,081 --> 00:34:46,950 according to a postmortem by a consortium of pollsters. 671 00:34:50,688 --> 00:34:53,923 Nate Silver, the founder of the website FiveThirtyEight, 672 00:34:53,958 --> 00:34:55,959 is one of the biggest names in polling-- 673 00:34:55,994 --> 00:34:59,429 even though he doesn't generally conduct polls.
674 00:34:59,464 --> 00:35:02,832 Our job is to take other people's polls 675 00:35:02,867 --> 00:35:05,902 and to translate that 676 00:35:05,937 --> 00:35:09,339 in terms of a probability, to say basically whether-- 677 00:35:09,374 --> 00:35:10,507 um, who's ahead, 678 00:35:10,542 --> 00:35:13,443 which is usually pretty easy to tell, um, 679 00:35:13,478 --> 00:35:15,845 but then how certain or uncertain is the election 680 00:35:15,880 --> 00:35:17,780 is the more difficult part. 681 00:35:17,815 --> 00:35:20,717 Like a meteorologist, 682 00:35:20,752 --> 00:35:23,987 Nate presents his predictions as probabilities. 683 00:35:24,022 --> 00:35:27,390 On the morning of Election Day 2016, 684 00:35:27,425 --> 00:35:29,959 he gave Clinton about a 70% chance of winning 685 00:35:29,994 --> 00:35:33,696 and Trump about a 30% chance. 686 00:35:33,731 --> 00:35:36,332 That's like rolling a ten-sided die 687 00:35:36,367 --> 00:35:38,201 with seven sides that are Clinton 688 00:35:38,236 --> 00:35:40,770 and three that are Trump. 689 00:35:40,805 --> 00:35:42,705 People who make probabilistic forecasts, 690 00:35:42,740 --> 00:35:47,243 they're not saying that politics is intrinsically random. 691 00:35:47,278 --> 00:35:51,114 They're saying that we have imperfect knowledge of it, 692 00:35:51,149 --> 00:35:53,383 and that if you think you can be more certain than that, 693 00:35:53,418 --> 00:35:57,887 you're probably fooling yourself based on how accurate polls, 694 00:35:57,922 --> 00:35:59,789 other types of political data are. 695 00:36:02,260 --> 00:36:04,794 Ultimately, interpreting a probability 696 00:36:04,829 --> 00:36:07,030 depends on the situation. 697 00:36:07,065 --> 00:36:10,633 While a 30% chance might seem slim, 698 00:36:10,668 --> 00:36:14,270 if you learned the flight you were about to board 699 00:36:14,305 --> 00:36:16,940 crashed three out of every ten trips, 700 00:36:16,975 --> 00:36:18,875 would you get on the plane? 
701 00:36:18,910 --> 00:36:21,811 As this plane only makes it to its destination 702 00:36:21,846 --> 00:36:23,713 seven out of ten times, 703 00:36:23,748 --> 00:36:26,449 please pay attention to our short safety briefing. 704 00:36:26,484 --> 00:36:29,085 Or if a weather forecaster said 705 00:36:29,120 --> 00:36:31,721 there's only a 30% chance of rain, 706 00:36:31,756 --> 00:36:32,822 and then it rained-- 707 00:36:32,857 --> 00:36:34,324 would you care? 708 00:36:34,359 --> 00:36:35,858 If it does rain, 709 00:36:35,893 --> 00:36:38,661 no one demands to know, "Why did it rain? 710 00:36:38,696 --> 00:36:40,463 We have to get to the bottom of this." 711 00:36:40,498 --> 00:36:42,465 We can say like, "It just did." 712 00:36:42,500 --> 00:36:44,000 It might have rained, it might not have rained. 713 00:36:44,035 --> 00:36:45,401 As it happened, it did. 714 00:36:45,436 --> 00:36:48,137 I do think there's a certain natural resistance 715 00:36:48,172 --> 00:36:50,907 to seeing things that maybe we care about 716 00:36:50,942 --> 00:36:52,775 more than whether it's going to rain or not, 717 00:36:52,810 --> 00:36:55,912 like elections, in that same way. 718 00:36:55,947 --> 00:36:59,249 As 2016 shows, 719 00:36:59,284 --> 00:37:01,818 predicting who will win the U.S. presidency, 720 00:37:01,853 --> 00:37:05,622 a one-time contest between two unique opponents, 721 00:37:05,657 --> 00:37:08,858 is far from easy. 722 00:37:08,893 --> 00:37:10,560 But in at least one field, 723 00:37:10,595 --> 00:37:14,030 there are literally decades of detailed statistics 724 00:37:14,065 --> 00:37:16,065 on how the contests played out-- 725 00:37:16,100 --> 00:37:17,900 baseball. 726 00:37:19,070 --> 00:37:22,272 Baseball has always been a game of numbers-- 727 00:37:22,307 --> 00:37:28,077 box scores, batting averages, ERAs, RBIs. 
728 00:37:28,112 --> 00:37:31,514 But while stats have always been part of baseball, 729 00:37:31,549 --> 00:37:36,519 in the last 20 years, their importance has skyrocketed 730 00:37:36,554 --> 00:37:39,656 due to sports analytics, 731 00:37:39,691 --> 00:37:43,994 the use of predictive models to improve a team's performance. 732 00:37:45,396 --> 00:37:47,096 To some extent every business, not just sports, 733 00:37:47,131 --> 00:37:50,266 is really trying to predict the next event, you know. 734 00:37:50,301 --> 00:37:51,668 Whether you're on Wall Street, 735 00:37:51,703 --> 00:37:53,036 or if you're in the tech business, 736 00:37:53,071 --> 00:37:54,637 what's the new new thing. 737 00:37:54,672 --> 00:37:56,039 And for us, it's future player performance. 738 00:37:57,742 --> 00:37:59,842 Billy Beane was one of the first 739 00:37:59,877 --> 00:38:03,179 to adopt the quantitative approach in the late '90s, 740 00:38:03,214 --> 00:38:06,883 when he was the general manager of the Oakland Athletics. 741 00:38:06,918 --> 00:38:10,019 Stuck with the low payroll of a small-market team, 742 00:38:10,054 --> 00:38:13,389 he abandoned decades of subjective baseball lore 743 00:38:13,424 --> 00:38:17,560 and committed the organization to using statistical analyses 744 00:38:17,595 --> 00:38:20,096 to guide the team's decision-making. 745 00:38:20,131 --> 00:38:22,332 It very much became a mathematical equation 746 00:38:22,367 --> 00:38:24,133 putting together a baseball team. 747 00:38:24,168 --> 00:38:29,072 Billy's stats-driven approach started to attract attention 748 00:38:29,107 --> 00:38:30,773 when the Oakland A's made the playoffs 749 00:38:30,808 --> 00:38:34,777 in four consecutive years 750 00:38:34,812 --> 00:38:38,448 and set a league record with 20 wins in a row.
751 00:38:38,483 --> 00:38:39,749 Then it was lionized 752 00:38:39,784 --> 00:38:43,019 and even given a name in a best-selling book and movie, 753 00:38:43,054 --> 00:38:44,520 "Moneyball." 754 00:38:44,555 --> 00:38:46,756 Brad Pitt plays Billy. 755 00:38:46,791 --> 00:38:49,759 If we win on our budget with this team, 756 00:38:49,794 --> 00:38:53,296 we'll have changed the game. 757 00:38:54,499 --> 00:38:56,232 While "Moneyballing" didn't lead 758 00:38:56,267 --> 00:38:59,335 to a league championship for the Oakland A's, 759 00:38:59,370 --> 00:39:01,871 it did change the game. 760 00:39:01,906 --> 00:39:04,240 Today, every Major League Baseball team 761 00:39:04,275 --> 00:39:06,109 has a sports analytics department, 762 00:39:06,144 --> 00:39:09,712 trying to predict and enhance future player performance 763 00:39:09,747 --> 00:39:11,147 through data, 764 00:39:11,182 --> 00:39:12,949 analyzing everything 765 00:39:12,984 --> 00:39:16,352 from the angle and speed of the ball coming off the bat-- 766 00:39:16,387 --> 00:39:18,621 to which players should be brought up 767 00:39:18,656 --> 00:39:21,491 from the minor leagues or traded. 768 00:39:21,526 --> 00:39:24,794 I'll never pretend to be a math whiz, 769 00:39:24,829 --> 00:39:27,697 I just understand its powers and its application. 770 00:39:27,732 --> 00:39:29,365 When you run a Major League Baseball team, 771 00:39:29,400 --> 00:39:32,001 which is a great job, 772 00:39:32,036 --> 00:39:34,003 and every kid who dreams of doing it, 773 00:39:34,038 --> 00:39:36,939 I can tell you it's everything you've thought of. 774 00:39:36,974 --> 00:39:38,307 But when they ask me, 775 00:39:38,342 --> 00:39:39,776 "What do I have to do to do that?" 776 00:39:39,811 --> 00:39:42,178 My answer is always the same. 777 00:39:42,213 --> 00:39:44,514 I say, "Go study and get an A in math." 
778 00:39:44,549 --> 00:39:49,318 While sports analytics has transformed baseball, 779 00:39:49,353 --> 00:39:54,056 Moneyballing has found its way into many unrelated fields. 780 00:39:54,091 --> 00:39:56,993 Proponents of data-driven decision making and prediction 781 00:39:57,028 --> 00:40:00,096 have applied the approach to areas as diverse 782 00:40:00,131 --> 00:40:04,200 as popular music and law enforcement. 783 00:40:04,235 --> 00:40:06,335 Moneyballing has been enabled 784 00:40:06,370 --> 00:40:08,471 by the vast amounts of information 785 00:40:08,506 --> 00:40:12,775 gathered through the internet, so-called "Big Data." 786 00:40:12,810 --> 00:40:14,710 Our current output of data 787 00:40:14,745 --> 00:40:19,916 is roughly 2.5 quintillion bytes a day. 788 00:40:21,385 --> 00:40:24,754 But what about the opposite situation, 789 00:40:24,789 --> 00:40:27,890 when there's very little data, yet actions need to be taken-- 790 00:40:27,925 --> 00:40:32,328 for example when searching for people lost at sea? 791 00:40:32,363 --> 00:40:35,965 How do you even begin to predict where they might be? 792 00:40:38,236 --> 00:40:41,904 The U.S. Coast Guard's Sector Boston Command Center. 793 00:40:41,939 --> 00:40:44,974 From this secure set of rooms, 794 00:40:45,009 --> 00:40:49,212 the Coast Guard coordinates all operations in the Boston area, 795 00:40:49,247 --> 00:40:52,248 including national security, drug enforcement, 796 00:40:52,283 --> 00:40:54,851 and search and rescue. 797 00:40:59,657 --> 00:41:01,757 Good morning, Coast Guard Sector Boston Command Center, 798 00:41:01,792 --> 00:41:03,025 Mr. Fleming speaking. 799 00:41:03,060 --> 00:41:04,126 Uh, good morning, sir... 800 00:41:04,161 --> 00:41:05,995 A caller reports 801 00:41:06,030 --> 00:41:07,430 that a friend went paddleboarding 802 00:41:07,465 --> 00:41:10,366 earlier in the morning, but he's now overdue. 
803 00:41:16,774 --> 00:41:18,508 The Coast Guard initiates a search 804 00:41:18,543 --> 00:41:19,809 with a 45-foot response boat... 805 00:41:19,844 --> 00:41:21,077 Engaging... 806 00:41:21,112 --> 00:41:22,078 ...out of Boston Harbor. 807 00:41:22,113 --> 00:41:24,714 Coming up. 808 00:41:24,749 --> 00:41:26,616 Unfortunately, a paddle craft in trouble 809 00:41:26,651 --> 00:41:28,751 has grown increasingly common. 810 00:41:28,786 --> 00:41:30,920 You are required to have a life jacket on. 811 00:41:30,955 --> 00:41:33,689 The reason for that is in 2015, 812 00:41:33,724 --> 00:41:37,460 I think we had 625 deaths nationwide-- 813 00:41:37,495 --> 00:41:39,729 a number of those people that were recovered 814 00:41:39,764 --> 00:41:41,631 were recovered without a life jacket. 815 00:41:41,666 --> 00:41:45,434 The Command Center also launches another boat 816 00:41:45,469 --> 00:41:48,404 out of Station Point Allerton, in Hull. 817 00:41:48,439 --> 00:41:49,805 Short tack disconnected. 818 00:41:49,840 --> 00:41:51,040 Stand clear of lines. 819 00:41:52,710 --> 00:41:55,244 The caller said the missing person typically paddled 820 00:41:55,279 --> 00:41:58,214 between Nantasket Beach and Boston Light, 821 00:41:58,249 --> 00:41:59,715 about three miles away. 822 00:41:59,750 --> 00:42:01,584 But with all the unknowns-- 823 00:42:01,619 --> 00:42:04,520 where he got into trouble and how he may have drifted-- 824 00:42:04,555 --> 00:42:09,025 the search area could be as large as 20 square miles. 825 00:42:12,463 --> 00:42:15,598 Search and rescue operations 826 00:42:15,633 --> 00:42:19,402 are often based on unique circumstances 827 00:42:19,437 --> 00:42:22,772 and require action, despite incomplete information. 
828 00:42:22,807 --> 00:42:24,507 To attack problems like that, 829 00:42:24,542 --> 00:42:27,076 statisticians turn to an idea that originates 830 00:42:27,111 --> 00:42:29,812 with an 18th-century English clergyman 831 00:42:29,847 --> 00:42:34,016 interested in probability-- Thomas Bayes. 832 00:42:34,051 --> 00:42:37,186 Imagine you are given a coin to flip, 833 00:42:37,221 --> 00:42:40,723 and you want to know if it is fair, 50-50 heads or tails, 834 00:42:40,758 --> 00:42:45,494 or weighted to land more on heads than tails. 835 00:42:45,529 --> 00:42:48,097 The traditional approach in statistics and science 836 00:42:48,132 --> 00:42:53,102 doesn't assume either answer and uses experiments to find out. 837 00:42:53,137 --> 00:42:58,441 In this case that involves flipping the coin a lot. 838 00:42:58,476 --> 00:43:01,877 Or you could approach the problem like a Bayesian. 839 00:43:01,912 --> 00:43:04,246 Unlike traditional statistics, 840 00:43:04,281 --> 00:43:06,882 that means starting with an initial probability 841 00:43:06,917 --> 00:43:09,285 based on what you know. 842 00:43:09,320 --> 00:43:12,021 In this case, all the coins you've ever come across 843 00:43:12,056 --> 00:43:15,358 in a lifetime of flipping coins have been fair. 844 00:43:15,393 --> 00:43:18,928 It seems likely that this one is fair too. 845 00:43:18,963 --> 00:43:21,897 Next, you also flip the coin, 846 00:43:21,932 --> 00:43:24,433 updating the probability as you go. 847 00:43:24,468 --> 00:43:27,370 Let's say it starts off with several heads in a row. 848 00:43:27,405 --> 00:43:29,005 That might make you wonder, 849 00:43:29,040 --> 00:43:32,174 increasing your probability estimate that it's weighted. 850 00:43:32,209 --> 00:43:36,846 But as you flip it more times, those start to look like chance.
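The coin story maps onto Bayes' rule applied one flip at a time. Several choices in the minimal sketch below are illustrative, not figures from the program: the two competing hypotheses, the 1% prior, and the assumption that a weighted coin lands heads 75% of the time. Four opening heads raise the probability that the coin is weighted. A long, evenly split run afterwards pushes it back down.

```python
def update(p_weighted, flip, heads_if_weighted=0.75):
    """Apply Bayes' rule once: turn P(weighted) into P(weighted | this flip)."""
    if flip == "H":
        like_weighted, like_fair = heads_if_weighted, 0.5
    else:
        like_weighted, like_fair = 1 - heads_if_weighted, 0.5
    joint_weighted = like_weighted * p_weighted
    joint_fair = like_fair * (1 - p_weighted)
    return joint_weighted / (joint_weighted + joint_fair)


def posterior_trail(flips, prior):
    """Update after every flip, keeping each intermediate belief."""
    trail = []
    p = prior
    for flip in flips:
        p = update(p, flip)
        trail.append(p)
    return trail


PRIOR = 0.01  # assumption: a lifetime of fair coins -> 1% prior that this one is weighted
flips = "HHHH" + "HT" * 21 + "TTTT"  # four heads first, then 25 heads and 25 tails overall
trail = posterior_trail(flips, PRIOR)
print(f"after 4 heads:  P(weighted) = {trail[3]:.4f}")
print(f"after 50 flips: P(weighted) = {trail[-1]:.6f}")
```

Because each update uses only the current belief and the newest flip, any new evidence, such as learning who owns the coin, can be folded in the same way by adjusting the probability and carrying on.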
851 00:43:36,881 --> 00:43:38,681 In the end, your best estimate 852 00:43:38,716 --> 00:43:41,917 is that it is probably a fair coin, 853 00:43:41,952 --> 00:43:44,754 but you are open to any new information. 854 00:43:44,789 --> 00:43:48,157 Like it belongs to your uncle the con man, "Crooked Larry." 855 00:43:50,995 --> 00:43:52,461 Sector 659. 856 00:43:52,496 --> 00:43:56,132 Our estimated time of arrival is one-one-five-eight. 857 00:43:56,167 --> 00:44:00,369 Bayesian inference creates a rigorous mathematical approach 858 00:44:00,404 --> 00:44:05,241 to calculating probabilities based on new information. 859 00:44:05,276 --> 00:44:07,643 And it sits at the heart of the Coast Guard's 860 00:44:07,678 --> 00:44:12,381 Search and Rescue Optimal Planning System: SAROPS. 861 00:44:12,416 --> 00:44:15,051 He's been missing since 7:30 this morning, 862 00:44:15,086 --> 00:44:16,952 so I'm going to go ahead and do a SAROPS drift. 863 00:44:16,987 --> 00:44:18,754 SAROPS takes information 864 00:44:18,789 --> 00:44:21,991 about the last-known position of the object of the search... 865 00:44:22,026 --> 00:44:23,626 What's the direction of the wind? 866 00:44:23,661 --> 00:44:26,195 ...along with the readings of currents and winds 867 00:44:26,230 --> 00:44:28,264 and combines them with information 868 00:44:28,299 --> 00:44:30,933 about how objects drift in the water 869 00:44:30,968 --> 00:44:33,903 to simulate thousands of possible paths 870 00:44:33,938 --> 00:44:36,238 the target may have taken. 871 00:44:36,273 --> 00:44:38,708 These get processed into probabilities, 872 00:44:38,743 --> 00:44:40,443 indicated by color, 873 00:44:40,478 --> 00:44:43,679 and turned into search plans to be executed. 874 00:44:43,714 --> 00:44:46,348 SAROPS is really a workhorse for the Coast Guard. 875 00:44:46,383 --> 00:44:48,317 It does a lot of the calculations for us. 
876 00:44:48,352 --> 00:44:50,419 It provides us with a lot of valuable search patterns 877 00:44:50,454 --> 00:44:51,987 and search-planning options. 878 00:44:52,022 --> 00:44:54,356 I thought he was pretty far offshore but, you know, 879 00:44:54,391 --> 00:44:56,892 he said he was okay, so I kept going. 880 00:44:56,927 --> 00:44:58,828 Word of the search has spread. 881 00:44:58,863 --> 00:45:03,399 A boater calls in a sighting from earlier in the day. 882 00:45:03,434 --> 00:45:05,401 What I did is I went in and put that information into SAROPS, 883 00:45:05,436 --> 00:45:07,002 and it changed everything. 884 00:45:07,037 --> 00:45:10,940 SAROPS quickly recalculates all the probabilities 885 00:45:10,975 --> 00:45:13,542 and generates a new search plan. 886 00:45:13,577 --> 00:45:17,680 The area has shifted about three miles farther out to sea. 887 00:45:19,683 --> 00:45:23,052 We are on-scene, commencing search pattern now. 888 00:45:25,089 --> 00:45:26,122 Keep a good lookout. 889 00:45:26,157 --> 00:45:27,556 Roger, coming up. 890 00:45:27,591 --> 00:45:28,791 We're assessing the situation on scene. 891 00:45:30,661 --> 00:45:34,497 Any object you see in the water, please take a closer look at. 892 00:45:45,709 --> 00:45:47,743 Paddleboarder, port side. 893 00:45:51,115 --> 00:45:54,416 Roger, we have located a paddleboarder 894 00:45:54,451 --> 00:45:55,718 with zero-one person on board. 895 00:45:55,753 --> 00:45:57,453 Off the port quarter! 896 00:45:57,488 --> 00:45:59,488 Starboard side. 897 00:45:59,523 --> 00:46:00,956 I have a visual. 898 00:46:00,991 --> 00:46:02,658 All right. 899 00:46:02,693 --> 00:46:07,129 As it turns out, the search has been a drill. 900 00:46:07,164 --> 00:46:10,466 Hours earlier, the paddleboard was placed in the water 901 00:46:10,501 --> 00:46:13,536 by another Coast Guard ship and allowed to drift.
902 00:46:13,571 --> 00:46:17,606 The instruments mounted on it are there to measure wind 903 00:46:17,641 --> 00:46:20,242 and record the path it's taken, 904 00:46:20,277 --> 00:46:21,877 information that will later be used 905 00:46:21,912 --> 00:46:25,181 to tweak the drift simulations in SAROPS, 906 00:46:25,216 --> 00:46:28,184 though the system performed quite well today. 907 00:46:28,219 --> 00:46:30,953 The object was right in the middle of our search patterns. 908 00:46:30,988 --> 00:46:33,389 So SAROPS was actually dead-on accurate 909 00:46:33,424 --> 00:46:35,024 in predicting where we needed to search 910 00:46:35,059 --> 00:46:36,458 to find the missing paddleboarder. 911 00:46:38,395 --> 00:46:40,696 To be able to call a family and say, 912 00:46:40,731 --> 00:46:42,364 "Your family and friends are coming home," 913 00:46:42,399 --> 00:46:43,899 is absolutely a call 914 00:46:43,934 --> 00:46:45,401 that all of us should have the chance to make, 915 00:46:45,436 --> 00:46:47,803 and, fortunately, because of stuff like this, 916 00:46:47,838 --> 00:46:49,572 we do get to make that call. 917 00:46:52,610 --> 00:46:55,611 The computational complexity of updating probabilities 918 00:46:55,646 --> 00:46:59,415 held the Bayesian approach back for most of the 20th century. 919 00:46:59,450 --> 00:47:05,054 But today's computing power has unleashed it on the world. 920 00:47:05,089 --> 00:47:08,691 It's in everything from your spam filter 921 00:47:08,726 --> 00:47:13,596 to the way Google searches work to self-driving cars. 922 00:47:13,631 --> 00:47:17,700 Some even find, in the Bayesian embrace of probability, 923 00:47:17,735 --> 00:47:21,537 similarities to how we learn from experience. 924 00:47:21,572 --> 00:47:24,773 And they've built it into computers, 925 00:47:24,808 --> 00:47:26,642 making it part of a powerful new force: 926 00:47:26,677 --> 00:47:28,811 machine learning.
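The narration describes SAROPS as a three-step process. It simulates many drift paths from the last-known position, turns them into probabilities, and recalculates when a sighting comes in. That description matches the general pattern of a Monte Carlo, particle-style Bayesian update. The sketch below is a toy illustration of that pattern only. It is not SAROPS code, and every detail is a hypothetical assumption:

- straight-line drift with a 3% wind leeway
- Gaussian noise
- a 2 km grid
- positions in km, with +x pointing offshore
- a sighting modeled as a Gaussian likelihood with a 1 km error

```python
import math
import random


def simulate_paths(n_paths, hours, current, wind, leeway=0.03, noise=0.3, seed=0):
    """Monte Carlo drift (toy model): each path is a list of (x, y) positions in km, one per hour.

    current and wind are (x, y) velocities in km/h; leeway is the assumed fraction
    of the wind velocity that pushes the drifting object.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        x, y = 0.0, 0.0  # last-known position
        # Each path gets its own uncertain drift velocity...
        vx = current[0] + leeway * wind[0] + rng.gauss(0.0, noise)
        vy = current[1] + leeway * wind[1] + rng.gauss(0.0, noise)
        path = [(x, y)]
        for _ in range(hours):
            # ...plus hour-to-hour random wobble.
            x += vx + rng.gauss(0.0, noise)
            y += vy + rng.gauss(0.0, noise)
            path.append((x, y))
        paths.append(path)
    return paths


def sighting_weights(paths, hour, seen_at, sigma=1.0):
    """Bayes' rule: reweight each path by the likelihood that it explains a sighting."""
    raw = []
    for path in paths:
        px, py = path[hour]
        d2 = (px - seen_at[0]) ** 2 + (py - seen_at[1]) ** 2
        raw.append(math.exp(-d2 / (2.0 * sigma**2)))  # assumed Gaussian sighting error
    total = sum(raw)
    if total == 0.0:
        raise ValueError("sighting is inconsistent with every simulated path")
    return [w / total for w in raw]


def cell_probabilities(paths, weights, cell_km=2.0):
    """Bin each path's final position into grid cells and sum the path weights per cell."""
    probs = {}
    for path, w in zip(paths, weights):
        x, y = path[-1]
        cell = (math.floor(x / cell_km), math.floor(y / cell_km))
        probs[cell] = probs.get(cell, 0.0) + w
    return probs


paths = simulate_paths(5000, hours=6, current=(1.0, 0.0), wind=(0.0, 20.0))
prior = [1.0 / len(paths)] * len(paths)
before = cell_probabilities(paths, prior)

# Hypothetical boater report: the object was seen at hour 3, farther offshore (+x) than most paths.
posterior = sighting_weights(paths, hour=3, seen_at=(5.0, 2.0))
after = cell_probabilities(paths, posterior)

print("most probable cell before the sighting:", max(before, key=before.get))
print("most probable cell after the sighting: ", max(after, key=after.get))
```

The reported sighting lies on the offshore edge of the simulated cloud. Reweighting therefore favors faster-drifting paths, and the probability mass for the object's current position moves farther offshore. This is qualitatively like the roughly three-mile shift described in the drill. The fixed `seed` only makes the run reproducible.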
927 00:47:28,846 --> 00:47:31,513 In the past, when we programmed computers, 928 00:47:31,548 --> 00:47:36,018 we tended to really write down, in excruciating detail, 929 00:47:36,053 --> 00:47:38,687 a set of rules that would tell the computer 930 00:47:38,722 --> 00:47:42,324 what to do in every single contingency. 931 00:47:42,359 --> 00:47:44,526 But there's another approach-- 932 00:47:44,561 --> 00:47:46,462 to treat the computer 933 00:47:46,497 --> 00:47:51,267 like a child learning to ride a bike. 934 00:47:51,302 --> 00:47:53,669 No one teaches a child to ride using a set of rules. 935 00:47:53,704 --> 00:47:55,971 There may be some tips, 936 00:47:56,006 --> 00:47:58,407 but ultimately, it is trial and error-- 937 00:47:58,442 --> 00:48:02,778 experience-- that's the instructor. 938 00:48:02,813 --> 00:48:04,747 The new thing, the new kid on the block 939 00:48:04,782 --> 00:48:05,981 is machine learning, 940 00:48:06,016 --> 00:48:07,683 specifically something called deep learning. 941 00:48:07,718 --> 00:48:10,986 Here, we don't inform the computer through rules, 942 00:48:11,021 --> 00:48:12,354 but through examples. 943 00:48:12,389 --> 00:48:14,056 So similar to, like, a small child 944 00:48:14,091 --> 00:48:16,692 that falls down and learns from this experience, 945 00:48:16,727 --> 00:48:18,995 we just let the computer learn from examples. 946 00:48:21,532 --> 00:48:23,165 Suppose you want to train a computer 947 00:48:23,200 --> 00:48:26,068 to recognize pictures of cats. 948 00:48:26,103 --> 00:48:29,338 By scanning through thousands of labeled pictures-- 949 00:48:29,373 --> 00:48:31,407 some cats, some not-- 950 00:48:31,442 --> 00:48:34,109 the computer can develop its own guidelines 951 00:48:34,144 --> 00:48:38,380 for assessing the probability that a picture is a cat. 952 00:48:38,415 --> 00:48:39,982 And these days 953 00:48:40,017 --> 00:48:44,353 computers are doing far more than just looking for cats.
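The cat example above describes supervised learning: the computer is given labeled examples and derives its own rule for estimating a probability. The program is talking about deep learning on real images. The sketch below swaps in a much simpler model, logistic regression trained by stochastic gradient descent. It also replaces pixels with made-up two-number feature vectors, which are an assumption and do not come from the program. The sketch shows only the core idea that the decision rule is learned from labeled data rather than written by hand.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function, mapping any score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    """Probability that feature vector x belongs to the positive ("cat") class."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(examples, epochs=200, lr=0.1):
    """Learn weights from labeled (features, label) pairs by stochastic gradient descent on log loss."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = predict_proba((w, b), x) - y  # gradient of the log loss w.r.t. the score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Hypothetical two-number "features" standing in for measurements taken from each picture.
rng = random.Random(42)
examples = []
for _ in range(100):
    examples.append(([rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)], 1))    # labeled "cat"
    examples.append(([rng.gauss(-2.0, 1.0), rng.gauss(-2.0, 1.0)], 0))  # labeled "not a cat"

model = train(examples)
accuracy = sum((predict_proba(model, x) >= 0.5) == bool(y) for x, y in examples) / len(examples)
print(f"training accuracy: {accuracy:.2f}")
print(f"P(cat) for a new picture at [1.5, 2.5]: {predict_proba(model, [1.5, 2.5]):.3f}")
```

No rule about what makes a "cat" appears in the code; the weights come entirely from the labeled examples. A deep network replaces the two hand-picked features and the single weighted sum with many learned layers. That depth is one reason its decisions are harder to explain.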
954 00:48:44,388 --> 00:48:46,155 Some of the best computers now 955 00:48:46,190 --> 00:48:49,858 can learn how to beat the world's best Go champion 956 00:48:49,893 --> 00:48:54,263 or to discover documents in stacks of documents, 957 00:48:54,298 --> 00:48:56,498 work that highly paid lawyers normally do, 958 00:48:56,533 --> 00:48:58,867 or diagnose diseases. 959 00:48:58,902 --> 00:49:00,970 At Stanford, we recently ran a study 960 00:49:01,005 --> 00:49:03,839 to understand whether a machine-learning algorithm 961 00:49:03,874 --> 00:49:07,676 can compete with top-notch, Stanford-level, 962 00:49:07,711 --> 00:49:09,044 board-certified dermatologists 963 00:49:09,079 --> 00:49:13,182 in spotting things like skin cancer. 964 00:49:13,217 --> 00:49:16,685 And lo and behold, we found that our machine-learning algorithm, 965 00:49:16,720 --> 00:49:18,187 our little box, 966 00:49:18,222 --> 00:49:21,824 is as good as the best human doctor in finding skin cancer. 967 00:49:21,859 --> 00:49:24,994 That raises a lot of questions: 968 00:49:25,029 --> 00:49:28,130 should we trust software over our doctors? 969 00:49:28,165 --> 00:49:31,700 Or are diagnostic programs like Sebastian's 970 00:49:31,735 --> 00:49:34,103 the intelligent medical assistants of tomorrow, 971 00:49:34,138 --> 00:49:37,506 a new tool but not a substitute? 972 00:49:37,541 --> 00:49:41,010 And there are other concerns. 973 00:49:41,045 --> 00:49:46,715 If you asked a person riding a bike exactly how they do it, 974 00:49:46,750 --> 00:49:49,184 they'd be hard-pressed to put it into words. 975 00:49:49,219 --> 00:49:50,652 The same is true 976 00:49:50,687 --> 00:49:53,789 with so-called "black box" machine learning applications 977 00:49:53,824 --> 00:49:55,457 like Sebastian's: 978 00:49:55,492 --> 00:49:57,626 no one, including Sebastian, 979 00:49:57,661 --> 00:50:00,829 knows how it detects skin cancer. 
980 00:50:00,864 --> 00:50:04,166 Like the bicyclist, it just does, 981 00:50:04,201 --> 00:50:06,869 which may be fine for diagnostic software, 982 00:50:06,904 --> 00:50:09,104 but not for other aspects of medicine, 983 00:50:09,139 --> 00:50:12,408 like treatment decisions. 984 00:50:12,443 --> 00:50:13,776 If what you're doing is deciding 985 00:50:13,811 --> 00:50:15,878 what dose of chemotherapy to give a patient, 986 00:50:15,913 --> 00:50:17,946 I think most people would be uncomfortable 987 00:50:17,981 --> 00:50:19,481 with that being a black box. 988 00:50:19,516 --> 00:50:20,883 People would want to understand 989 00:50:20,918 --> 00:50:22,584 where those predictions are coming from. 990 00:50:22,619 --> 00:50:25,387 The same can be true 991 00:50:25,422 --> 00:50:26,955 for evaluating who should get a home loan, 992 00:50:26,990 --> 00:50:30,592 or who should get fired from their job for poor performance, 993 00:50:30,627 --> 00:50:33,529 or who gets paroled, 994 00:50:33,564 --> 00:50:35,230 all situations 995 00:50:35,265 --> 00:50:38,967 in which black box machine learning software is in use. 996 00:50:39,002 --> 00:50:40,436 These are algorithms 997 00:50:40,471 --> 00:50:43,605 that can have a big effect on people's lives. 998 00:50:43,640 --> 00:50:45,607 And we have to understand, as a society, 999 00:50:45,642 --> 00:50:47,142 what is going into those algorithms 1000 00:50:47,177 --> 00:50:48,510 and what they're based on, 1001 00:50:48,545 --> 00:50:50,879 in order to make sure that they're not perpetuating 1002 00:50:50,914 --> 00:50:53,315 social problems that we already have. 1003 00:50:55,319 --> 00:50:59,788 We live in an age when the fusion of data, computers, 1004 00:50:59,823 --> 00:51:03,392 probability, and statistics 1005 00:51:03,427 --> 00:51:06,095 grants us more predictive power than we've ever known before.
1006 00:51:06,130 --> 00:51:10,165 We can see the tangible benefits, 1007 00:51:10,200 --> 00:51:12,167 and some of the dangers, 1008 00:51:12,202 --> 00:51:17,272 while also wondering where this will all go. 1009 00:51:17,307 --> 00:51:19,241 We're really seeing a new science of statistics 1010 00:51:19,276 --> 00:51:20,809 developing under our feet. 1011 00:51:20,844 --> 00:51:21,977 That's exciting, 1012 00:51:22,012 --> 00:51:24,746 and I think it must be a little bit like 1013 00:51:24,781 --> 00:51:26,381 what it was like when the theory of probability 1014 00:51:26,416 --> 00:51:28,217 was first being developed 1015 00:51:28,252 --> 00:51:30,853 by Pascal and Fermat and people around them, 1016 00:51:30,888 --> 00:51:32,020 that people were sort of saying, 1017 00:51:32,055 --> 00:51:33,889 "My God, these are questions that mathematics 1018 00:51:33,924 --> 00:51:35,924 can really have something to say about." 1019 00:51:35,959 --> 00:51:37,593 I think that must have been what it was like 1020 00:51:37,628 --> 00:51:39,595 when statistics in its traditional form 1021 00:51:39,630 --> 00:51:41,897 was being developed in the first part of the 20th century, 1022 00:51:41,932 --> 00:51:44,199 and suddenly people were just asking 1023 00:51:44,234 --> 00:51:45,634 whole new kinds of questions 1024 00:51:45,669 --> 00:51:47,069 that they couldn't even have approached before. 1025 00:51:47,104 --> 00:51:50,005 And I think we're having another moment like that now. 1026 00:51:51,842 --> 00:51:54,943 While tomorrow will always remain uncertain, 1027 00:51:54,978 --> 00:51:58,547 mathematics will continue to guide the way, 1028 00:51:58,582 --> 00:52:00,916 through the power of probability, 1029 00:52:00,951 --> 00:52:04,253 and prediction by the numbers.
