003 Exploring the Dataset (SQuAD) in Python

In this lecture, we will begin looking at our question answering notebook. As usual, we begin by installing transformers and datasets.

The next step is to import the function load_dataset. We then call this function, passing in the string "squad". We'll call the result raw_datasets.

Note that the dataset already comes split into two parts, one for train and one for validation. The train set has about 88,000 samples, while the validation set has about 10,000. Note that each dataset comes with the columns id, title, context, question, and answers.

So far in this course, we haven't had much need for the id column, but in this section it will be crucial. On the other hand, we won't be using the title.

Let's look at one of the titles anyway, just to see what it is. As you can see, it says University of Notre Dame.

Now let's look at the context for the same sample. As you can see, it's the same context I showed you in the previous lecture.

Now let's look at the corresponding question. It asks: what is in front of the Notre Dame Main Building?

Finally, let's look at the answers. The answer is "a copper statue of Christ".
Now, as you've seen, the answers are stored in a list, and the column is the plural "answers", indicating that there can be multiple answers per input. I stated that, although this is always possible, it is not the case for the train set. So in this line, we simply check how many samples in the train set have a list of answers with any length not equal to one.

As you can see, after we apply this filter, we find that zero samples have an answer list with length not equal to one. This means that for the train set, every input has precisely one answer.

The next step is to check out one of the answers from the validation set. As you can see, this particular sample has three answers. So for the validation set, it is possible for one question to have multiple answers.

Let's now check the corresponding context to see why this question may have multiple answers. We find the text: "The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California." This explains the three possible answers we saw above. In fact, there are probably even more valid answers. For instance, we could say "San Francisco Bay Area" or simply "Bay Area", although these are not in the dataset.

Now, just for completeness' sake, let's check out the question.
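
The answer-count check described above can be sketched on plain dictionaries. This is an illustrative sketch under assumptions: `answer_count`, `samples_without_exactly_one_answer`, and `make_sample` are hypothetical helpers, and the three answer strings in the toy validation record are spans picked from the quoted sentence for illustration, since the lecture does not read out the dataset's actual answers.

```python
def answer_count(sample):
    """Number of reference answers stored for one SQuAD-style record."""
    return len(sample["answers"]["text"])


def samples_without_exactly_one_answer(samples):
    """Keep only the records whose answer list has a length other than one."""
    return [s for s in samples if answer_count(s) != 1]


# Context sentence quoted in the lecture.
context = (
    "The game was played on February 7, 2016, at Levi's Stadium in the "
    "San Francisco Bay Area at Santa Clara, California."
)


def make_sample(sample_id, answer_texts):
    """Build a toy record; answer_start is computed from the context."""
    return {
        "id": sample_id,
        "context": context,
        "question": "Where did Super Bowl 50 take place?",
        "answers": {
            "text": list(answer_texts),
            "answer_start": [context.find(t) for t in answer_texts],
        },
    }


# Hypothetical toy splits: one answer per train record, several per
# validation record, mirroring what the lecture reports.
toy_train = [make_sample("toy-train-0", ["Levi's Stadium"])]
toy_validation = [
    make_sample(
        "toy-val-0",
        [
            "Santa Clara, California",
            "Levi's Stadium",
            "Levi's Stadium in the San Francisco Bay Area at Santa Clara, California",
        ],
    )
]

print(len(samples_without_exactly_one_answer(toy_train)))
print(answer_count(toy_validation[0]))

# With the real dataset, the equivalent check would likely be:
# raw_datasets["train"].filter(lambda x: len(x["answers"]["text"]) != 1)
```

An empty result on the train split is what tells us every training question has exactly one reference answer.
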
As you can see, the question is, as expected, "Where did Super Bowl 50 take place?"

Now, one weird aspect of this dataset is that in some cases where there are multiple answers, the answers are actually all the same. Let's check the first sample to see an example. As you can see, we get "Denver Broncos" three times.

Now, you might think that they could be different answers if the same string shows up multiple times in the context. But in this case, we see that the answer's start index is the same in all three cases.
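
One way to confirm that repeated answers really are duplicates is to compare (text, start index) pairs rather than the text alone. The sketch below is an assumption-laden illustration: `distinct_answers` is a hypothetical helper, and both toy contexts are invented sentences, not the SQuAD passage.

```python
def distinct_answers(sample):
    """Return the unique (text, answer_start) pairs, in first-seen order.

    Two answers count as the same only if both the string and its
    character offset in the context match.
    """
    texts = sample["answers"]["text"]
    starts = sample["answers"]["answer_start"]
    return list(dict.fromkeys(zip(texts, starts)))


# Hypothetical record: the same span annotated three times.
dup_context = "The Denver Broncos won Super Bowl 50."
dup_start = dup_context.find("Denver Broncos")
duplicated = {
    "context": dup_context,
    "answers": {
        "text": ["Denver Broncos"] * 3,
        "answer_start": [dup_start] * 3,
    },
}

# Hypothetical contrast: the same string at two different offsets,
# which would count as two distinct answers.
rep_context = "Denver Broncos fans cheered as the Denver Broncos won."
first = rep_context.find("Denver Broncos")
second = rep_context.find("Denver Broncos", first + 1)
repeated = {
    "context": rep_context,
    "answers": {
        "text": ["Denver Broncos", "Denver Broncos"],
        "answer_start": [first, second],
    },
}

print(len(distinct_answers(duplicated)))
print(len(distinct_answers(repeated)))
```

A single pair for the first record matches the lecture's observation that all three answers share the same start index.
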
