subtitlecat.com

All language subtitles for 02 - Measuring Performance - Introduction.en

Afrikaans

Akan

Albanian

Amharic

Arabic Download

Armenian

Azerbaijani

Basque

Belarusian

Bemba

Bengali

Bihari

Bosnian

Breton

Bulgarian

Cambodian

Catalan

Cebuano

Cherokee

Chichewa

Chinese (Simplified)

Chinese (Traditional)

Corsican

Croatian

Czech

Danish

Dutch

English

Esperanto

Estonian

Ewe

Faroese

Filipino

Finnish

French

Frisian

Galician

Georgian

German

Greek

Guarani

Gujarati

Haitian Creole

Hausa

Hawaiian

Hebrew

Hindi

Hmong

Hungarian

Icelandic

Igbo

Indonesian

Interlingua

Irish

Italian

Japanese

Javanese

Kannada

Kazakh

Kinyarwanda

Kirundi

Kongo

Korean

Krio (Sierra Leone)

Kurdish

Kurdish (Soranî)

Kyrgyz

Laothian

Latin

Latvian

Lingala

Lithuanian

Lozi

Luganda

Luo

Luxembourgish

Macedonian

Malagasy

Malay

Malayalam

Maltese

Maori

Marathi

Mauritian Creole

Moldavian

Mongolian

Myanmar (Burmese)

Montenegrin

Nepali

Nigerian Pidgin

Northern Sotho

Norwegian

Norwegian (Nynorsk)

Occitan

Oriya

Oromo

Pashto

Persian

Polish

Portuguese (Brazil)

Portuguese (Portugal)

Punjabi

Quechua

Romanian

Romansh

Runyakitara

Russian

Samoan

Scots Gaelic

Serbian

Serbo-Croatian

Sesotho

Setswana

Seychellois Creole

Shona

Sindhi

Sinhalese

Slovak

Slovenian

Somali

Spanish Download

Spanish (Latin American)

Sundanese

Swahili

Swedish

Tajik

Tamil

Tatar

Telugu

Thai

Tigrinya

Tonga

Tshiluba

Tumbuka

Turkish

Turkmen

Twi

Uighur

Ukrainian

Urdu

Uzbek

Vietnamese

Welsh

Wolof

Xhosa

Yiddish

Yoruba

Zulu

Would you like to inspect the original subtitles? These are the user uploaded subtitles that are being translated: 1 00:00:09,000 --> 00:00:14,000 focus on how to measure and compare performance of different data structures and algorithms. 2 00:00:14,000 --> 00:00:21,000 This specific module is a bit special, in that it presents a toolbox that we will use intensively throughout 3 00:00:21,000 --> 00:00:28,000 the rest of the course. Some of the concepts presented may seem odd or abstract at first, and they might 4 00:00:28,000 --> 00:00:34,000 take some time to get used to, but hold on, once you are through this module and we have established a common 5 00:00:34,000 --> 00:00:40,000 language for performance complexity comparison, the remaining modules will be much more concrete. 6 00:00:40,000 --> 00:00:46,000 We just need this toolbox first. So, without further ado, let's consider 7 00:00:46,000 --> 00:00:54,000 a few different ways of measure performance. So, one strategy is to use a stopwatch to simply measure the 8 00:00:54,000 --> 00:01:00,000 time it takes for a given action to complete. That was actually what we did in the introduction, 9 00:01:00,000 --> 00:01:05,000 comparing two implementations on an iPhone and a laptop respectively. 10 00:01:05,000 --> 00:01:12,000 However, that approach comes with some drawbacks, because a lot of things are in play when measuring one 11 00:01:12,000 --> 00:01:18,000 thing as the hardware the executes the implementation, where more capable hardware may result in a faster 12 00:01:18,000 --> 00:01:27,000 execution time, even though the algorithm implemented in itself is just the same or maybe even slightly worse 13 00:01:27,000 --> 00:01:33,000 than an algorithm running on some less capable hardware. Other things can influence the measure time, too, 14 00:01:33,000 --> 00:01:42,000 such as the compiler optimizations, and possible simultaneously running programs that may eat resources from our machine. 15 00:01:42,000 --> 00:01:50,000 Measuring execution time with a stopwatch has another drawback, namely the fact that it is only a single measuring point. 16 00:01:50,000 --> 00:01:57,000 Specifically, that single measurement does not say anything about how the execution time may behave for 17 00:01:57,000 --> 00:02:04,000 smaller or larger inputs; for example, how the execution time may grow for our log reader when more log 18 00:02:04,000 --> 00:02:10,000 lines are supposed to be analyzed. We can achieve a better understanding on the nature of the performance 19 00:02:10,000 --> 00:02:17,000 behavior for varying input sizes by considering the number of instructions executed by the machine when 20 00:02:17,000 --> 00:02:23,000 running a given program. If we're shown that each instruction takes a certain fraction of a second to 21 00:02:23,000 --> 00:02:31,000 execute, then the total time usage is just a matter of multiplying this per instruction time with the total 22 00:02:31,000 --> 00:02:39,000 number of instructions. For example, consider the function from the intro that just reads the provided log 23 00:02:39,000 --> 00:02:46,000 file and counts the number of log lines. We could assume that the two first lines in the function uses one 24 00:02:46,000 --> 00:02:53,000 instruction each, summarizing to two instructions. And for the sake of simplicity, that the contents of the 25 00:02:53,000 --> 00:03:04,000 for loop uses four instructions. If the log file contains 100,000 lines, we thus end up with an expression 26 00:03:04,000 --> 00:03:10,000 for the total number of instructions executed that looks like this. 27 00:03:10,000 --> 00:03:19,000 The last + 1 comes from the return statement. If we, instead of assuming 100,000 log lines are using the 28 00:03:19,000 --> 00:03:28,000 letter N to symbolize any number of log lines, then we get a more general expression that looks like this. 29 00:03:28,000 --> 00:03:34,000 Now we can get an idea of the complexity of an algorithm for a given input size simply by inserting an 30 00:03:34,000 --> 00:03:42,000 appropriate number of N in the formal layer. An interesting question we could now ask is how this expression 31 00:03:42,000 --> 00:03:51,000 grows when N increases, and this leads us to the third approach of measuring performance, namely, to look at 32 00:03:51,000 --> 00:03:58,000 the nature of the curve that the instruction count formally from before represents. 33 00:03:58,000 --> 00:04:04,000 And remember that the number of instructions directly relates to the execution time. 34 00:04:04,000 --> 00:04:13,000 Consider the following examples of such curves. The formula from before represented a straight line like this. 35 00:04:13,000 --> 00:04:21,000 A straight line means that the execution time is somewhat reliable, so that an increment in the input size, 36 00:04:21,000 --> 00:04:32,000 or N as we named it, of 500, say, from 1000 to 1500, results in the same execution time growth than an 37 00:04:32,000 --> 00:04:41,000 increment of the input size from, say, 10,000 to 10,500. Here is another example of a line where the 38 00:04:41,000 --> 00:04:47,000 execution time increases slightly faster. Often when the curve looks like this we are happy, 39 00:04:47,000 --> 00:04:53,000 because then our algorithm does not get relatively more cumbersome for a larger input. 40 00:04:53,000 --> 00:05:01,000 We also say that the execution time grows linearly with the input size, and linearity is good. 41 00:05:01,000 --> 00:05:10,000 An example of an even better looking curve is this one. And as you can see, increasing the input size with 42 00:05:10,000 --> 00:05:19,000 a certain amount has a higher relative impact on the execution time for smaller inputs than for larger inputs. 43 00:05:19,000 --> 00:05:28,000 That is, increasing the input size with, say, 500 again, will result in almost no additional execution time 44 00:05:28,000 --> 00:05:33,000 when the input is sufficiently large. Of course, this is a quite ideal performance behavior for an 45 00:05:33,000 --> 00:05:41,000 algorithm, and as we shall see later on, there are ways of searching for data that has exactly this property. 46 00:05:41,000 --> 00:05:49,000 And this basically means that the extra time we need to find the needle in the haystack only increases a tiny 47 00:05:49,000 --> 00:05:57,000 bit, even if the haystack grows significantly in size. On the other end of the scale is a curve like this. 48 00:05:57,000 --> 00:06:02,000 As you can see, this curve has opposite properties than a formal one. 49 00:06:02,000 --> 00:06:09,000 Even though the execution time only grows slowly for small input sizes, the execution time will explode in 50 00:06:09,000 --> 00:06:17,000 size whenever the input gets large enough, and notice that this explosion can happen even if the execution 51 00:06:17,000 --> 00:06:23,000 time is very small for small inputs. Naturally this is the worst imaginable behavior for an algorithm, 52 00:06:23,000 --> 00:06:30,000 because this effectively sets a probably rather low upper limit for how large inputs we can handle. 53 00:06:30,000 --> 00:06:38,000 So, considering the curves that represent the instruction count can give us a pretty good understanding of 54 00:06:38,000 --> 00:06:48,000 how the execution time grows when the input size grows. With that mindset in hand, we can start considering 55 00:06:48,000 --> 00:06:55,000 other performance characteristics. For example, we could consider a best-case scenario for an algorithm, 56 00:06:55,000 --> 00:07:02,000 that is, a curve that represents the most optimistic simulation for an execution, a guarantee that states 57 00:07:02,000 --> 00:07:09,000 what the shortest possible execution time can ever be. We can also consider the worst-case scenario, 58 00:07:09,000 --> 00:07:14,000 that is, an upper limit for how bad an algorithm can perform for a given input size. 59 00:07:14,000 --> 00:07:20,000 When talking about algorithms, this curve is very interesting, simply because there is often a desire to 60 00:07:20,000 --> 00:07:28,000 solve problems as fast as possible, not only when the input is nice, but also in the worst possible situations. 61 00:07:28,000 --> 00:07:34,000 And then, it is useful to know how bad the algorithm will possibly behave. 62 00:07:34,000 --> 00:07:42,000 As a last example, consider a performance behavior that is nice, except for certain spikes, as shown in this curve. 63 00:07:42,000 --> 00:07:49,000 If we are running the corresponding algorithm numerous times, multiple times, it can be interesting to 64 00:07:49,000 --> 00:07:55,000 consider the amortized performance, that is, what is the real cost when summarizing the execution times 65 00:07:55,000 --> 00:08:01,000 across multiple executions. Do the spikes contribute significantly to the execution time in the total 66 00:08:01,000 --> 00:08:09,500 picture, or can they, so to speak, be balanced out? So, that was four ways to consider performance measuring. 67 00:08:09,500 --> 00:08:18,000 In the next clip, we will zoom in on how to come from program source code to its related execution time curve. 9116