1
00:00:00,200 --> 00:00:02,700
One of the most important
2
00:00:02,700 --> 00:00:04,530
responsibilities
for those of us in
3
00:00:04,530 --> 00:00:07,170
data-centered
careers involves how
4
00:00:07,170 --> 00:00:09,180
we protect our organizations,
5
00:00:09,180 --> 00:00:11,655
manage and protect data.
6
00:00:11,655 --> 00:00:13,380
This has a lot to do with
7
00:00:13,380 --> 00:00:14,610
communication exchanges
8
00:00:14,610 --> 00:00:16,815
between a company
and its customers.
9
00:00:16,815 --> 00:00:18,210
As you've been learning,
10
00:00:18,210 --> 00:00:20,840
almost all communication
generates data,
11
00:00:20,840 --> 00:00:22,650
whether it's a shopping receipt,
12
00:00:22,650 --> 00:00:24,105
confirmation of an order,
13
00:00:24,105 --> 00:00:27,165
or even earning customer
loyalty points.
14
00:00:27,165 --> 00:00:30,935
Businesses have a big
responsibility to their customers,
15
00:00:30,935 --> 00:00:32,540
especially when it comes to
16
00:00:32,540 --> 00:00:35,455
maintaining and
protecting user privacy.
17
00:00:35,455 --> 00:00:37,280
Any data gathered from
18
00:00:37,280 --> 00:00:39,440
individuals or consumers
is referred to
19
00:00:39,440 --> 00:00:43,480
as personally identifiable
information or PII.
20
00:00:43,480 --> 00:00:46,820
PII permits the identity
of an individual to be
21
00:00:46,820 --> 00:00:50,000
inferred by either direct
or indirect means.
22
00:00:50,000 --> 00:00:52,355
This includes things
like biometric records,
23
00:00:52,355 --> 00:00:54,380
usernames, and Social Security
24
00:00:54,380 --> 00:00:56,650
or national
identification numbers.
25
00:00:56,650 --> 00:00:58,850
Because this
information is often
26
00:00:58,850 --> 00:01:00,710
associated with
medical, financial,
27
00:01:00,710 --> 00:01:01,790
and employment records,
28
00:01:01,790 --> 00:01:05,860
PII is sensitive and must
be managed with great care.
29
00:01:05,860 --> 00:01:08,090
After all, when
someone's personal data
30
00:01:08,090 --> 00:01:09,695
is improperly handled,
31
00:01:09,695 --> 00:01:12,500
they become vulnerable
to identity theft,
32
00:01:12,500 --> 00:01:14,620
fraud, and other issues.
33
00:01:14,620 --> 00:01:17,525
Recently, there have been
great efforts to take
34
00:01:17,525 --> 00:01:18,560
a wider view of
35
00:01:18,560 --> 00:01:22,015
data collection practices
and protect individuals.
36
00:01:22,015 --> 00:01:25,700
Industries are trending
towards aggregate information.
37
00:01:25,700 --> 00:01:28,310
This is data from a
significant number of
38
00:01:28,310 --> 00:01:31,525
users that has eliminated
personal information.
39
00:01:31,525 --> 00:01:34,550
By aggregating the
data and removing PII,
40
00:01:34,550 --> 00:01:36,710
this protects people
and gives them
41
00:01:36,710 --> 00:01:39,110
more control over
their own data.
42
00:01:39,110 --> 00:01:42,200
Similarly, as more industries
become interconnected,
43
00:01:42,200 --> 00:01:45,455
the amount of data available
to them increases.
44
00:01:45,455 --> 00:01:47,450
Just as with aggregate
information,
45
00:01:47,450 --> 00:01:48,785
the more data collected,
46
00:01:48,785 --> 00:01:50,690
the more likely it
is that it will be
47
00:01:50,690 --> 00:01:53,359
representative of
a wider population
48
00:01:53,359 --> 00:01:55,370
rather than a single user.
49
00:01:55,370 --> 00:01:57,500
A key thing to keep
in mind is that
50
00:01:57,500 --> 00:02:00,710
data gathering is a
task managed by humans,
51
00:02:00,710 --> 00:02:03,440
and that process can be informed
52
00:02:03,440 --> 00:02:04,595
by different backgrounds,
53
00:02:04,595 --> 00:02:07,270
experiences, beliefs,
and worldviews.
54
00:02:07,270 --> 00:02:10,250
These and other types
of biases can affect
55
00:02:10,250 --> 00:02:12,215
the way that data
is communicated
56
00:02:12,215 --> 00:02:13,835
and how the results are shared,
57
00:02:13,835 --> 00:02:17,170
which in turn can have an
impact on business decisions.
58
00:02:17,170 --> 00:02:19,475
Effective data
professionals know that,
59
00:02:19,475 --> 00:02:21,620
whether collecting,
analyzing, interpreting,
60
00:02:21,620 --> 00:02:23,695
or communicating sensitive data,
61
00:02:23,695 --> 00:02:27,345
bias should always
be considered.
62
00:02:27,345 --> 00:02:29,090
So be very careful when
63
00:02:29,090 --> 00:02:31,880
interpreting data where
there is a clear source
64
00:02:31,880 --> 00:02:36,740
of bias and be on the lookout
for subtle biases as well.
65
00:02:36,740 --> 00:02:40,325
In addition to thinking
through bias in the data,
66
00:02:40,325 --> 00:02:42,230
data professionals
should also try
67
00:02:42,230 --> 00:02:44,105
to emphasize that there can be
68
00:02:44,105 --> 00:02:46,670
a multitude of possible
interpretations
69
00:02:46,670 --> 00:02:49,085
for every data insight.
70
00:02:49,085 --> 00:02:52,025
The main trick is to avoid
71
00:02:52,025 --> 00:02:53,690
jumping to conclusions until
72
00:02:53,690 --> 00:02:55,660
you've really done
your homework.
73
00:02:55,660 --> 00:02:58,070
One method of
addressing bias is to
74
00:02:58,070 --> 00:03:00,530
make sure that the
data that you're
75
00:03:00,530 --> 00:03:02,945
working with has the
same characteristics
76
00:03:02,945 --> 00:03:05,795
as the greater population
that you're interested in.
77
00:03:05,795 --> 00:03:09,310
In data analytics, this
is called a sample.
78
00:03:09,310 --> 00:03:12,940
A good sample is a
segment of a population
79
00:03:12,940 --> 00:03:16,510
that is representative of
that entire population.
80
00:03:16,510 --> 00:03:17,650
Here's an example.
81
00:03:17,650 --> 00:03:19,450
A clothing company is
82
00:03:19,450 --> 00:03:22,555
analyzing sales in their
highest growth market.
83
00:03:22,555 --> 00:03:25,240
They want to determine
what color shirts will
84
00:03:25,240 --> 00:03:28,240
be most popular in
the upcoming season.
85
00:03:28,240 --> 00:03:29,650
One person notes that
86
00:03:29,650 --> 00:03:31,840
red and blue shirts
accounted for 80 percent of
87
00:03:31,840 --> 00:03:33,640
their sales in this market over
88
00:03:33,640 --> 00:03:36,330
the past three months.
This is a big number.
89
00:03:36,330 --> 00:03:39,460
So they suggest ordering
lots of red and blue shirts,
90
00:03:39,460 --> 00:03:42,040
but another person
points out that
91
00:03:42,040 --> 00:03:44,380
the local sports team's
colors are red and blue,
92
00:03:44,380 --> 00:03:47,425
and this team had recently
won a championship.
93
00:03:47,425 --> 00:03:49,150
It's very likely that sales
94
00:03:49,150 --> 00:03:50,690
of red and blue shirts will have
95
00:03:50,690 --> 00:03:52,550
spiked as consumers purchase
96
00:03:52,550 --> 00:03:54,445
tees to support the local team.
97
00:03:54,445 --> 00:03:56,060
Plus, they note that
98
00:03:56,060 --> 00:03:58,040
although this market
is high-growth,
99
00:03:58,040 --> 00:04:00,020
it only represents 40 percent
100
00:04:00,020 --> 00:04:02,075
of the retailer's total sales.
101
00:04:02,075 --> 00:04:03,875
With all this
information in mind,
102
00:04:03,875 --> 00:04:06,635
decision-makers at
this retailer instead
103
00:04:06,635 --> 00:04:08,840
choose to evaluate
color popularity
104
00:04:08,840 --> 00:04:11,705
over a full year and
across all markets.
105
00:04:11,705 --> 00:04:15,070
This will provide a much
more complete picture.
106
00:04:15,070 --> 00:04:18,610
We'll investigate more about
bias later in this program,
107
00:04:18,610 --> 00:04:20,060
and as you progress,
108
00:04:20,060 --> 00:04:22,100
you'll discover many
more strategies
109
00:04:22,100 --> 00:04:24,110
for ensuring that
you're aware of bias
110
00:04:24,110 --> 00:04:26,315
and proactively
working to counter
111
00:04:26,315 --> 00:04:29,700
it in all of your data work.