08 - Feature Combination and Dimensionality Reduction (English transcript)

In this clip, let's first move to discussing feature combination. Now, it's quite possible that the raw data you're working with contains very granular information which doesn't have much predictive power. Feature combination may involve aggregating and bringing features together to get a feature with more predictive power. You might find that in the real world, some features naturally work better when they're considered together. A feature by itself may not contain much information, but when considered in conjunction with another feature, the features in combination might contain information that is relevant to your model. It's quite possible that the original feature is too raw or too granular; bringing features together can help improve the predictive power of features.

Let's say you're building a machine learning model to predict traffic patterns in a city, say Bangalore. You might get information from the day of the week, and you might also get information from the time of day. But when taken in conjunction, when you use a feature cross, day of the week plus time of day, you might get a resulting feature that has more predictive power. If you're looking at traffic on a Friday evening at 6 PM, you know it's going to be terrible. But if you're looking at the same time, 6 PM, on a Sunday, traffic is quite likely not as bad. Or let's say you want to combine features to predict temperature. You can take into account the current season, whether it's spring, summer, fall, or winter, and you can also take into account the time of day. When taken together, you might find that the feature combination is more than the sum of its parts.
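To make the idea concrete, here is a minimal sketch of a day-of-week-by-time-of-day feature cross in pandas; the column names and sample values are invented for illustration and are not from the original clip:

    import pandas as pd

    # Hypothetical traffic observations; the columns and values
    # are made up for this example.
    df = pd.DataFrame({
        "day_of_week": ["Friday", "Friday", "Sunday", "Sunday"],
        "hour_of_day": [18, 9, 18, 9],
        "avg_speed_kmph": [12, 35, 40, 45],
    })

    # Cross the two weak features into a single combined feature.
    df["day_x_hour"] = df["day_of_week"] + "_" + df["hour_of_day"].astype(str)

    # One-hot encode the crossed feature so a model can consume it.
    crossed = pd.get_dummies(df["day_x_hour"], prefix="cross")
    print(crossed)

Each crossed value, such as Friday_18, now gets its own column, which lets even a simple linear model learn a separate weight for Friday evenings versus Sunday evenings, something neither raw feature allows on its own.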
And finally, let's move on to the last component that we'll discuss within feature engineering, and that is dimensionality reduction. When you're working with data in the real world, you will find that a common problem to have is that you have too much data. This is a curse and not a blessing, and it's often referred to as the curse of dimensionality. This is where you would apply preprocessing algorithms to reduce the complexity of raw features; the specific aim of these algorithms is to reduce the number of input features so you have fewer features to work with.

Having too many features in your data is what the curse of dimensionality refers to, and it leads to several problems. You have problems visualizing your data, and you encounter problems during training as well as during prediction. When you work with higher-dimensionality data, machine learning models find it hard to find patterns within your data, leading to poor-quality, overfitted models. Overfitted models are those that perform well in training but poorly in the real world, in production. Dimensionality reduction explicitly aims to solve the curse of dimensionality while preserving as much information as possible from the underlying features; you don't want to lose too much information.

Dimensionality reduction is a form of unsupervised learning: you're working with an unlabeled corpus of data. Based on the kind of data you're working with, there are many different techniques you can use. When you're working with linear data, you can choose principal component analysis, which involves reorienting your original data so that it is projected along newer, better axes. If you're working with non-linear data, you can apply manifold learning techniques. These involve unrolling complex forms of data in higher dimensions to express the data in a simpler form with lower dimensionality. Manifold learning techniques are similar to unrolling a carpet so that it's represented in two dimensions. Both approaches are sketched below.
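Here is a minimal sketch of both techniques using scikit-learn; the swiss-roll dataset and the parameter choices are illustrative assumptions, not part of the original clip:

    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import PCA
    from sklearn.manifold import Isomap

    # Illustrative data: a 3-D "swiss roll", a classic non-linear manifold.
    X, _ = make_swiss_roll(n_samples=1000, random_state=0)

    # Linear case: PCA reorients the data along the directions of
    # highest variance and keeps only the top components.
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X)
    print("PCA output shape:", X_pca.shape)
    print("Variance preserved:", pca.explained_variance_ratio_.sum())

    # Non-linear case: Isomap "unrolls" the roll into two dimensions,
    # much like the carpet analogy.
    iso = Isomap(n_components=2, n_neighbors=10)
    X_iso = iso.fit_transform(X)
    print("Isomap output shape:", X_iso.shape)

The explained_variance_ratio_ attribute shows how much of the original information PCA preserves, which is exactly the trade-off described above: fewer features, but without losing too much information.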
Latent semantic analysis is a topic modeling and dimensionality reduction technique that you can use to work with text data. And if you're working with images, autoencoders can find efficient, lower-dimensionality representations for your images.
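As a rough sketch of latent semantic analysis, assuming scikit-learn is available: LSA is commonly implemented as a truncated SVD over a TF-IDF term-document matrix. The tiny corpus below is invented purely for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    # Tiny invented corpus, just to show the shapes involved.
    docs = [
        "traffic is heavy on friday evenings",
        "friday evening traffic in the city is terrible",
        "temperatures drop in winter",
        "summer afternoons are the warmest time of day",
    ]

    # Build the TF-IDF term-document matrix.
    tfidf = TfidfVectorizer()
    X = tfidf.fit_transform(docs)           # shape: (4 docs, many terms)

    # LSA: project the documents onto a small number of latent topics.
    lsa = TruncatedSVD(n_components=2, random_state=0)
    X_lsa = lsa.fit_transform(X)            # shape: (4 docs, 2 topics)
    print(X_lsa.shape)

Each document is reduced from a vector over the whole vocabulary to just two latent-topic coordinates, which is the same reduce-the-input-features goal applied to text.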
