00:00:00.000 --> 00:00:02.465
[MUSIC PLAYING]
00:00:24.287 --> 00:00:25.370
DAVID J. MALAN: All right.
00:00:25.370 --> 00:00:28.730
This is CS50's Introduction
to Programming with Python.
00:00:28.730 --> 00:00:30.050
My name is David Malan.
00:00:30.050 --> 00:00:32.210
And over these past many
weeks have we focused
00:00:32.210 --> 00:00:36.020
on functions and variables early
on, then conditionals, and loops,
00:00:36.020 --> 00:00:39.030
and exceptions, a bit of
libraries, unit tests, file
00:00:39.030 --> 00:00:42.050
I/O, regular expressions,
object-oriented programming,
00:00:42.050 --> 00:00:43.520
and really, et cetera.
00:00:43.520 --> 00:00:45.770
And indeed, that's
where we focus today, is
00:00:45.770 --> 00:00:48.710
on all the more that you can
do with Python and programming
00:00:48.710 --> 00:00:52.410
more generally beyond some of
those fundamental concepts as well.
00:00:52.410 --> 00:00:55.340
In fact, if you start to flip
through the documentation for Python
00:00:55.340 --> 00:00:59.570
in all of its forms, all of which is as
always accessible at docs.python.org,
00:00:59.570 --> 00:01:03.770
you'll see additional documentation
on Python's own tutorial and library,
00:01:03.770 --> 00:01:05.510
its reference, its how-to.
00:01:05.510 --> 00:01:09.540
And among all of those various
documents as well as others more online,
00:01:09.540 --> 00:01:12.530
you'll see that there's some tidbits
that we didn't quite touch on.
00:01:12.530 --> 00:01:15.560
And indeed, even though we
themed these past several weeks
00:01:15.560 --> 00:01:19.430
around fairly broad topics that
are rather essential for doing
00:01:19.430 --> 00:01:22.160
typical types of problems
in Python, it turns out
00:01:22.160 --> 00:01:25.460
there's quite a number of other features
as well, that we didn't necessarily
00:01:25.460 --> 00:01:27.950
touch on, that didn't
necessarily fit within any
00:01:27.950 --> 00:01:30.560
of those overarching
concepts, or might have
00:01:30.560 --> 00:01:33.770
been a little too much too soon if we
did them too early on in the course.
00:01:33.770 --> 00:01:36.320
And so, in today, our
final lecture, well, we
00:01:36.320 --> 00:01:38.990
focus really on all the more
that you can do with Python
00:01:38.990 --> 00:01:42.740
and hopefully whet your appetite for
teaching yourself all the more too.
00:01:42.740 --> 00:01:45.680
For instance, among
Python's various data types,
00:01:45.680 --> 00:01:49.130
there's this other one that we haven't
had occasion to yet use, namely, a set.
00:01:49.130 --> 00:01:52.040
In mathematics, a set is
typically a collection of values
00:01:52.040 --> 00:01:53.820
wherein there are no duplicates.
00:01:53.820 --> 00:01:55.170
So it's not quite a list.
00:01:55.170 --> 00:01:58.790
It's a bit more special than that
in that somehow any duplicates are
00:01:58.790 --> 00:01:59.990
eliminated for you.
00:01:59.990 --> 00:02:03.200
Well, it turns out within Python,
this is an actual data type
00:02:03.200 --> 00:02:05.690
that you yourself can use in your code.
00:02:05.690 --> 00:02:08.330
And via the documentation
here, might you
00:02:08.330 --> 00:02:10.580
be able to glean that
it's a useful tool
00:02:10.580 --> 00:02:13.430
if you want to somehow
automatically filter out duplicates.
00:02:13.430 --> 00:02:16.250
So let me go ahead and
go over to VS Code here.
00:02:16.250 --> 00:02:20.210
And let me go ahead and show you a file
that I created a bit of in advance,
00:02:20.210 --> 00:02:23.300
whereby we have a file
here called houses.py.
00:02:23.300 --> 00:02:26.420
And in houses.py, I already
went ahead and whipped up
00:02:26.420 --> 00:02:29.990
a big list of students
inside of which is
00:02:29.990 --> 00:02:33.470
a number of dictionaries, each of
which represents a student's name
00:02:33.470 --> 00:02:35.310
and house respectively.
00:02:35.310 --> 00:02:37.080
Now, this is a pretty
sizable dictionary.
00:02:37.080 --> 00:02:39.560
And so, it lends itself to
iteration over the same.
00:02:39.560 --> 00:02:42.810
And suppose that the goal here
was quite simply to figure out,
00:02:42.810 --> 00:02:46.735
well, what are the unique houses at
Hogwarts in the world of Harry Potter?
00:02:46.735 --> 00:02:49.610
It would be nice, perhaps, to not
have to know these kinds of details
00:02:49.610 --> 00:02:50.570
or look them up online.
00:02:50.570 --> 00:02:54.500
Here we have a set of students, albeit
not exhaustive, with all of the houses.
00:02:54.500 --> 00:02:58.550
But among these students here, what are
the unique houses in which they live?
00:02:58.550 --> 00:03:00.800
Well, I could certainly, as
a human, just eyeball this
00:03:00.800 --> 00:03:03.650
and tell you that it's, well,
Gryffindor, Slytherin, and Ravenclaw.
00:03:03.650 --> 00:03:06.620
But how can we go about doing it
programmatically for these students
00:03:06.620 --> 00:03:07.520
as well?
00:03:07.520 --> 00:03:09.360
Well, let's take one
approach first here.
00:03:09.360 --> 00:03:11.240
Let me go into houses.py.
00:03:11.240 --> 00:03:15.170
And let me propose that we first
how about create an empty list
00:03:15.170 --> 00:03:20.390
called houses in which I'm going to
accumulate each of the houses uniquely.
00:03:20.390 --> 00:03:24.920
So every time I iterate through
this list of dictionaries,
00:03:24.920 --> 00:03:28.830
I'm only going to add a house to this
list if I haven't seen it before.
00:03:28.830 --> 00:03:30.020
So how do I express that?
00:03:30.020 --> 00:03:34.130
Well, let me iterate over all of the
students with for student in students,
00:03:34.130 --> 00:03:35.550
as we've done in the past.
00:03:35.550 --> 00:03:37.200
And let me ask you a question now.
00:03:37.200 --> 00:03:40.400
So if the current student's house--
00:03:40.400 --> 00:03:43.550
and notice that I'm indexing
into the current student
00:03:43.550 --> 00:03:46.580
because I know they are a
dictionary or dict object,
00:03:46.580 --> 00:03:51.830
and if that student's house
is not in my houses list,
00:03:51.830 --> 00:03:56.480
then, indented, am I going
to say houses.append,
00:03:56.480 --> 00:03:58.190
because again, houses is a list.
00:03:58.190 --> 00:04:02.120
And I'm going to append that
particular house to the list.
00:04:02.120 --> 00:04:04.130
Then at the very bottom
here, let me go ahead
00:04:04.130 --> 00:04:07.880
and do something somewhat interesting
here and say, for each of the houses
00:04:07.880 --> 00:04:11.360
that I've accumulated in,
I could just say houses.
00:04:11.360 --> 00:04:14.917
But if I just say houses, what was the
point of accumulating them all at once?
00:04:14.917 --> 00:04:16.709
I could just do this
whole thing in a loop.
00:04:16.709 --> 00:04:19.190
Let's at least go about
and sort those houses
00:04:19.190 --> 00:04:22.550
with sorted, which is going
to sort the strings alphabetically.
00:04:22.550 --> 00:04:25.520
And let's go ahead therein
and print each of the houses.
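The list-based approach just described can be sketched roughly as follows. The student names are illustrative assumptions, since the transcript gives only the houses and the order in which they appear.

```python
# Student names are assumed for illustration; the houses follow the order
# mentioned in the lecture (three Gryffindors, then Slytherin, then Ravenclaw).
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
    {"name": "Padma", "house": "Ravenclaw"},
]

houses = []  # accumulates each house only once
for student in students:
    # Append the house only if it has not been seen before
    if student["house"] not in houses:
        houses.append(student["house"])

# sorted returns a new list with the strings in alphabetical order
for house in sorted(houses):
    print(house)
```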
00:04:25.520 --> 00:04:27.260
Let me go ahead now
in my terminal window
00:04:27.260 --> 00:04:29.715
and run Python of
houses.py and hit Enter.
00:04:29.715 --> 00:04:30.590
And there we have it.
00:04:30.590 --> 00:04:34.220
Gryffindor, Ravenclaw,
Slytherin in alphabetical order,
00:04:34.220 --> 00:04:37.280
even though in the list
of dictionaries up here,
00:04:37.280 --> 00:04:40.960
technically the order in which we
saw these was Gryffindor, Gryffindor,
00:04:40.960 --> 00:04:43.190
Gryffindor, Slytherin, Ravenclaw.
00:04:43.190 --> 00:04:46.620
So indeed, my code seems to
have sorted them properly.
00:04:46.620 --> 00:04:48.110
So this is perfectly fine.
00:04:48.110 --> 00:04:50.600
And it's one way of
solving this problem.
00:04:50.600 --> 00:04:55.070
But it turns out we could use more
that's built into the language Python
00:04:55.070 --> 00:04:56.690
to solve this problem ourself.
00:04:56.690 --> 00:05:00.470
Here I'm rather reinventing a
wheel, really the notion of a set
00:05:00.470 --> 00:05:02.610
wherein duplicates
are eliminated for me.
00:05:02.610 --> 00:05:04.580
So let me go ahead and
clear my terminal window
00:05:04.580 --> 00:05:07.580
and perhaps change the type
of object I'm using here.
00:05:07.580 --> 00:05:09.650
Instead of a list, which
could also be written
00:05:09.650 --> 00:05:11.960
like this to create an
empty list, let me go ahead
00:05:11.960 --> 00:05:15.740
and create an empty set,
whereby I call a function called
00:05:15.740 --> 00:05:18.500
set that's going to return
to me some object in Python
00:05:18.500 --> 00:05:21.950
that represents this notion of a set
wherein duplicates are automatically
00:05:21.950 --> 00:05:22.730
eliminated.
00:05:22.730 --> 00:05:24.650
And now, I can tighten up my code.
00:05:24.650 --> 00:05:27.370
Because I don't have to use
this if condition myself.
00:05:27.370 --> 00:05:29.590
I think I can just do
something like this.
00:05:29.590 --> 00:05:32.650
Inside of my loop, let me do houses.add.
00:05:32.650 --> 00:05:35.800
So it's not append for a
set, it's append for a list.
00:05:35.800 --> 00:05:39.220
But it's add to a set
per the documentation.
00:05:39.220 --> 00:05:42.490
Then let me go ahead and add
this current student's house.
00:05:42.490 --> 00:05:45.110
And now, I think the rest
of my code can be the same.
00:05:45.110 --> 00:05:48.670
I'm just now trusting per the
documentation for set in Python
00:05:48.670 --> 00:05:50.860
that it's going to filter
out duplicates for me.
00:05:50.860 --> 00:05:55.240
And I can just blindly add, add, add,
add all of these houses to the set
00:05:55.240 --> 00:05:57.850
and any duplicates already
there will be gone.
00:05:57.850 --> 00:06:00.550
Python of houses.py and Enter.
00:06:00.550 --> 00:06:04.480
And voila, we're back in business
with just those three there as well.
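A minimal sketch of the set-based version might look like this. As before, the student names are assumed for illustration; only the houses come from the lecture.

```python
# Student names are assumed for illustration; only the houses come from the lecture.
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
    {"name": "Padma", "house": "Ravenclaw"},
]

houses = set()  # an empty set; duplicates are dropped automatically
for student in students:
    # No membership check is needed: adding an existing value has no effect
    houses.add(student["house"])

# A set has no guaranteed order, so sort before printing
for house in sorted(houses):
    print(house)
```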
00:06:04.480 --> 00:06:08.890
Let me pause here to see if there's any
questions now on this use of set, which
00:06:08.890 --> 00:06:11.380
is just another data type
that's available to you,
00:06:11.380 --> 00:06:14.620
another class in the world of
Python that you can reach for when
00:06:14.620 --> 00:06:16.750
solving some problem like this.
00:06:16.750 --> 00:06:19.330
STUDENT: How can we
locate an item in a set,
00:06:19.330 --> 00:06:22.148
for example, find
Gryffindor in that set?
00:06:22.148 --> 00:06:24.190
DAVID J. MALAN: How do
you find an item in a set?
00:06:24.190 --> 00:06:28.030
You can use very similar syntax
as we've done for a list before.
00:06:28.030 --> 00:06:34.630
You can use syntax like if
Gryffindor in houses then,
00:06:34.630 --> 00:06:36.980
and you can answer a
question along those lines.
00:06:36.980 --> 00:06:40.790
So you can use in and not in
and similar functions as well.
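As a small sketch of that answer, membership in a set can be tested with in and not in. The set below is written out by hand for illustration, and "Hufflepuff" is simply an example of a value that is not in it.

```python
# A hand-written set standing in for the one built from the students
houses = {"Gryffindor", "Ravenclaw", "Slytherin"}

if "Gryffindor" in houses:
    print("Gryffindor is in the set")

# "Hufflepuff" is used here only as an example of a missing value
if "Hufflepuff" not in houses:
    print("Hufflepuff is not in the set")
```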
00:06:40.790 --> 00:06:42.250
Other questions on set?
00:06:42.250 --> 00:06:45.730
STUDENT: What happens if
you have a similar house name?
00:06:45.730 --> 00:06:48.520
Let's say instead of
Slytherin, it is maybe
00:06:48.520 --> 00:06:52.000
an O instead of an I.
Will the for loop loop
00:06:52.000 --> 00:06:56.800
throughout each of those
letters in the house name?
00:06:56.800 --> 00:06:59.270
DAVID J. MALAN: It would
compare the strings.
00:06:59.270 --> 00:07:01.810
So if Slytherin appears
more than once but is
00:07:01.810 --> 00:07:04.720
slightly misspelled or
capitalized, if I heard you right,
00:07:04.720 --> 00:07:08.230
those would appear to
be distinct strings.
00:07:08.230 --> 00:07:11.420
So you would get both versions
of Slytherin in the result.
00:07:11.420 --> 00:07:14.500
However, we've seen in the past
how we can clean up users' data
00:07:14.500 --> 00:07:15.910
if indeed it might be messy.
00:07:15.910 --> 00:07:19.030
We could force everything to
uppercase, or everything to lowercase,
00:07:19.030 --> 00:07:22.270
or we could use capitalize,
the function built into strs,
00:07:22.270 --> 00:07:25.300
or title case that would handle
some of the cleanup for us.
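One way to sketch that cleanup is to normalize each string before adding it to the set. The messy input list is hypothetical, and the use of strip to remove stray whitespace is an assumption added here; the lecture names only uppercase, lowercase, capitalize, and title case.

```python
# Hypothetical messy input: the same house with inconsistent case and spacing
raw_houses = ["Slytherin", "slytherin", " SLYTHERIN ", "Ravenclaw"]

houses = set()
for house in raw_houses:
    # strip removes surrounding whitespace; title capitalizes each word
    houses.add(house.strip().title())

for house in sorted(houses):
    print(house)
```

Normalizing case this way would not reconcile a genuine misspelling, such as an O in place of an I; those would still count as distinct strings.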
00:07:25.300 --> 00:07:28.930
In this case, because the data is not
coming from humans using the input
00:07:28.930 --> 00:07:31.600
function, I wrote the code
in advance, it's safer
00:07:31.600 --> 00:07:33.760
to assume that I got the houses right.
00:07:33.760 --> 00:07:37.000
But that's absolutely a risk
if it's coming from users.
00:07:37.000 --> 00:07:39.850
Allow me to turn our attention
back to some of the other features
00:07:39.850 --> 00:07:43.390
here that we can leverage in Python if
we dig further into the documentation
00:07:43.390 --> 00:07:45.130
and read up more on its features.
00:07:45.130 --> 00:07:47.380
Well, in some languages,
there's this notion
00:07:47.380 --> 00:07:51.520
of global variables, whereby you
can define a variable that's either
00:07:51.520 --> 00:07:54.280
local to a function, as
we've seen many times,
00:07:54.280 --> 00:07:58.450
or if you put a variable outside
of all of your functions,
00:07:58.450 --> 00:08:01.060
perhaps near the top of your
file, that would generally
00:08:01.060 --> 00:08:03.340
be considered a global variable.
00:08:03.340 --> 00:08:06.350
Or in the world of Python, it
might be specific to the module.
00:08:06.350 --> 00:08:09.610
But for all intents and purposes, it's
going to behave for a given program
00:08:09.610 --> 00:08:11.080
as though it is global.
00:08:11.080 --> 00:08:13.120
However, it turns out
that if you do this
00:08:13.120 --> 00:08:16.750
when solving some problem down the line,
whereby you have multiple functions
00:08:16.750 --> 00:08:20.590
and you do have one or more variables
that are outside of those functions,
00:08:20.590 --> 00:08:26.470
you might not be able to change those
variables as easily as you might think.
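The difficulty can be sketched as follows. This is not the lecture's own code: the variable and function names are hypothetical, and the global statement shown at the end is one mechanism Python provides for this situation, used here as an assumption about where the discussion is headed.

```python
balance = 0  # defined outside every function; the name is hypothetical


def show():
    # Reading a module-level variable inside a function works as expected
    return balance


def deposit_broken(n):
    # Assigning to balance makes Python treat it as local to this function,
    # so reading it on the right-hand side raises UnboundLocalError
    balance = balance + n


def deposit(n):
    # The global statement tells Python to use the module-level variable
    global balance
    balance += n


try:
    deposit_broken(100)
except UnboundLocalError:
    print("could not change balance from inside the function")

deposit(100)
print("Balance:", show())
```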
00:08:26.470 --> 00:08:28.930
So indeed, let me go
back to VS Code here.
00:08:28.930 --> 00:08:32.289
And in just a moment, I'm going to go
ahead and create a new file, how about
00:08:32.289 --> 00:08:34.419
called bank.py.
00:08:34.419 --> 00:08:36.730
Let's go ahead and implement
the notion of a bank
00:08:36.730 --> 00:08:40.960
wherein we can store things
like money in various forms.
00:08:40.960 --> 00:08:42.710
And let me go ahead and do this.