All language subtitles for 03_supervised-learning-part-2.en

So supervised learning algorithms learn to predict input-to-output, or X-to-Y, mappings. In the last video you saw that regression algorithms, which are a type of supervised learning algorithm, learn to predict numbers out of infinitely many possible numbers. There's a second major type of supervised learning algorithm called a classification algorithm. Let's take a look at what this means.

Take breast cancer detection as an example of a classification problem. Say you're building a machine learning system so that doctors can have a diagnostic tool to detect breast cancer. This is important because early detection could potentially save a patient's life. Using a patient's medical records, your machine learning system tries to figure out if a tumor, that is, a lump, is malignant, meaning cancerous or dangerous, or if that tumor is benign, meaning that it's just a lump that isn't cancerous and isn't that dangerous. Some of my friends have actually been working on this specific problem.

So maybe your dataset has tumors of various sizes, and these tumors are labeled as either benign, which I will designate in this example with a 0, or malignant, which I will designate in this example with a 1.
You can then plot your data on a graph like this, where the horizontal axis represents the size of the tumor and the vertical axis takes on only two values, 0 or 1, depending on whether the tumor is benign (0) or malignant (1). One reason that this is different from regression is that we're trying to predict only a small number of possible outputs or categories. In this case there are two possible outputs, 0 or 1, benign or malignant. This is different from regression, which tries to predict any number out of infinitely many possible numbers. And so the fact that there are only two possible outputs is what makes this classification.

Because there are only two possible outputs, or two possible categories, in this example, you can also plot this dataset on a line like this. Right now, I'm going to use two different symbols to denote the category, using a circle, an O, to denote the benign examples and a cross to denote the malignant examples. And if a new patient walks in for a diagnosis and they have a lump that is this size, then the question is, will your system classify this tumor as benign or malignant?
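To make the question above concrete, here is a minimal Python sketch of a classifier with one input, the tumor size. The sizes and labels are made-up illustration data, and the rule used (a single size cutoff chosen to minimize training mistakes) is an assumption for the sake of the example, not the method used in the lecture.

```python
# Hypothetical training data: tumor sizes (cm) with labels 0 = benign, 1 = malignant.
sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_threshold(sizes, labels):
    """Return the size cutoff that misclassifies the fewest training examples."""
    points = sorted(set(sizes))
    # Candidate cutoffs: midpoints between neighboring distinct sizes.
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(cutoff):
        # An example is wrong when "larger than cutoff" disagrees with its label.
        return sum((size > cutoff) != bool(label)
                   for size, label in zip(sizes, labels))

    return min(candidates, key=errors)


def classify(size, cutoff):
    """Predict 1 (malignant) if the tumor is larger than the cutoff, else 0 (benign)."""
    return 1 if size > cutoff else 0


cutoff = fit_threshold(sizes, labels)
print(cutoff)                 # 3.0
print(classify(2.2, cutoff))  # 0, i.e. benign
print(classify(4.8, cutoff))  # 1, i.e. malignant
```

The output is always one of two categories, 0 or 1, never a value in between, which is the property that separates this from regression.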
It turns out that in classification problems you can also have more than two possible output categories. Maybe your learning algorithm can output multiple types of cancer diagnosis if the tumor turns out to be malignant. So let's call two different types of cancer type 1 and type 2. In this case the algorithm would have three possible output categories it could predict. And by the way, in classification, the terms output classes and output categories are often used interchangeably. So when I say class or category when referring to the output, it means the same thing.

So to summarize, classification algorithms predict categories. Categories don't have to be numbers. They can be non-numeric: for example, a classifier can predict whether a picture is that of a cat or a dog, and it can predict if a tumor is benign or malignant. Categories can also be numbers, like 0, 1 or 0, 1, 2. But what makes classification different from regression when you're interpreting the numbers is that classification predicts a small, finite, limited set of possible output categories, such as 0, 1 and 2, but not all possible numbers in between, like 0.5 or 1.7.
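One way to see that numeric categories are just names for a finite set is to map category labels to integer ids and back. The sketch below assumes the three categories from the example (benign, type 1, type 2); the label strings and helper names are hypothetical.

```python
# Hypothetical class names for the three-category example.
CLASSES = ["benign", "malignant type 1", "malignant type 2"]

# Map each category name to an integer id and back again.
class_to_id = {name: i for i, name in enumerate(CLASSES)}
id_to_class = {i: name for name, i in class_to_id.items()}


def encode(names):
    """Turn category names into integer class ids."""
    return [class_to_id[name] for name in names]


def decode(ids):
    """Turn integer class ids back into category names."""
    return [id_to_class[i] for i in ids]


diagnoses = ["benign", "malignant type 2", "benign", "malignant type 1"]
encoded = encode(diagnoses)
print(encoded)             # [0, 2, 0, 1]
print(decode(encoded))     # the original four names
print(1.7 in id_to_class)  # False: 1.7 is not one of the categories
```

The ids 0, 1 and 2 carry no notion of "in between": a value such as 0.5 or 1.7 simply does not correspond to any category.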
In the example of supervised learning that we've been looking at, we had only one input value, the size of the tumor. But you can also use more than one input value to predict an output. Here's an example: instead of just knowing the tumor size, say you also have each patient's age in years. Your new dataset now has two inputs, age and tumor size. In this new dataset we're going to use circles to show patients whose tumors are benign and crosses to show the patients with a tumor that was malignant.

So when a new patient comes in, the doctor can measure the patient's tumor size and also record the patient's age. And so given this, how can we predict if this patient's tumor is benign or malignant? Well, given a dataset like this, what the learning algorithm might do is find some boundary that separates out the malignant tumors from the benign ones. So the learning algorithm has to decide how to fit a boundary line through this data. The boundary line found by the learning algorithm would help the doctor with the diagnosis. In this case the tumor is more likely to be benign.

From this example we have seen how two inputs, the patient's age and tumor size, can be used. In other machine learning problems, often many more input values are required.
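The idea of fitting a boundary line through two-input data can be sketched in a few lines of Python. The lecture does not say which algorithm finds the boundary, so this example swaps in a simple nearest-centroid rule: the boundary is the line halfway between the average benign point and the average malignant point. The patient data and all function names are hypothetical.

```python
# Hypothetical training data: (age in years, tumor size in cm).
benign = [(30, 1.0), (40, 1.5), (35, 2.0), (50, 1.0)]
malignant = [(60, 4.0), (55, 4.5), (70, 3.5), (65, 5.0)]


def centroid(points):
    """Average each coordinate over a list of (age, size) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit_boundary(benign, malignant):
    """Return (w_age, w_size, b) for the line w_age*age + w_size*size + b = 0.

    The line is the perpendicular bisector of the segment joining the two
    class centroids, so each side of it is closer to one centroid.
    """
    c0 = centroid(benign)
    c1 = centroid(malignant)
    w_age = c1[0] - c0[0]
    w_size = c1[1] - c0[1]
    b = (c0[0] ** 2 + c0[1] ** 2 - c1[0] ** 2 - c1[1] ** 2) / 2
    return w_age, w_size, b


def predict(age, size, boundary):
    """Predict 1 (malignant) on the malignant side of the line, else 0 (benign)."""
    w_age, w_size, b = boundary
    return 1 if w_age * age + w_size * size + b > 0 else 0


boundary = fit_boundary(benign, malignant)
print(predict(45, 2.0, boundary))  # 0, i.e. benign
print(predict(68, 4.2, boundary))  # 1, i.e. malignant
```

With more inputs the same idea carries over: each patient becomes a longer tuple of features, and the boundary becomes a plane or hyperplane instead of a line.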
My friends who worked on breast cancer detection use many additional inputs, like the thickness of the tumor clump, uniformity of the cell size, uniformity of the cell shape and so on.

So to recap, supervised learning maps input x to output y, where the learning algorithm learns from the "right answers". The two major types of supervised learning are regression and classification. In a regression application, like predicting prices of houses, the learning algorithm has to predict numbers from infinitely many possible output numbers. Whereas in classification the learning algorithm has to make a prediction of a category, out of a small set of possible outputs.

So you now know what supervised learning is, including both regression and classification. I hope you're having fun. Next, there's a second major type of machine learning called unsupervised learning. Let's go on to the next video to see what that is.
