004 Dataset Splitting

1
00:00:00,530 --> 00:00:03,290
In this video, we will explain dataset splitting.

2
00:00:03,590 --> 00:00:07,430
The dataset must be split into two parts: a training set and a test set.

3
00:00:09,760 --> 00:00:14,590
Because this data is used for training, the training set has the most data. Following the completion

4
00:00:14,590 --> 00:00:19,030
of the training process, the test set is used to evaluate the performance of the trained model.

5
00:00:19,030 --> 00:00:22,270
The data is split to prevent overfitting and to evaluate the model.

6
00:00:25,650 --> 00:00:30,810
Overfitting occurs when a model performs well on the training set but performs poorly on data that the

7
00:00:30,810 --> 00:00:32,490
model has never seen before.

8
00:00:33,700 --> 00:00:42,160
Commonly, the size distribution of training and testing sets is 67% training and 33% testing, 75% training

9
00:00:42,160 --> 00:00:43,870
and 25% testing,

10
00:00:44,890 --> 00:00:47,380
or 90% training and 10% testing.

11
00:00:47,650 --> 00:00:51,380
The YOLO model has hyperparameters. To obtain optimal values,

12
00:00:51,400 --> 00:00:54,460
these hyperparameters must be tuned during the training process.

13
00:00:54,460 --> 00:00:59,650
Data is required to test the tuned hyperparameters. However, because the data used is not from the

14
00:00:59,650 --> 00:01:03,910
training set, an additional component, namely the validation set, is required.

15
00:01:04,810 --> 00:01:08,710
Typically, the validation set is 10% to 20% of the training set.

16
00:01:10,080 --> 00:01:13,620
We have provided a Python program for performing dataset splitting.

17
00:01:13,650 --> 00:01:17,280
The program will be explained in the section on training custom objects.

18
00:01:19,190 --> 00:01:19,910
See you then.
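
The split described in this video can be sketched in a few lines of Python. This is only a minimal illustration, not the program provided with the course: the function name `split_dataset`, its parameters, the fixed seed, and the `image_000.jpg` file names are all assumptions made for the example. It sets aside a test set first, then takes the validation set as a fraction of the remaining training data, in line with the 10% to 20% figure mentioned above.

```python
import random


def split_dataset(items, test_fraction=0.25, val_fraction=0.1, seed=42):
    """Shuffle items and split them into train, validation, and test lists.

    test_fraction is taken from the whole dataset. val_fraction is taken
    from the data that remains after the test set has been removed.
    (Hypothetical helper for illustration only.)
    """
    if not 0 < test_fraction < 1 or not 0 <= val_fraction < 1:
        raise ValueError("test_fraction and val_fraction are out of range")

    # Copy before shuffling so the caller's list is left untouched.
    shuffled = list(items)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    # Hold out the test set first; it is never used during training.
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]

    # Carve the validation set out of the remaining training data.
    n_val = round(len(remaining) * val_fraction)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


if __name__ == "__main__":
    # Assumed file names; a real dataset would list its own image paths.
    files = [f"image_{i:03d}.jpg" for i in range(100)]
    train, val, test = split_dataset(files, test_fraction=0.25, val_fraction=0.2)
    print(len(train), len(val), len(test))
```

With 100 files, a 25% test fraction, and a 20% validation fraction, this yields 60 training, 15 validation, and 25 test files. Shuffling before slicing matters when the files are stored in a meaningful order, for example grouped by class, because an unshuffled slice could leave a class out of one of the sets.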