Wild Wide AI: responsible data science

Who shoots first — the new race or the human?

Michel Kana, Ph.D
Towards Data Science

--

Data Science can do a lot of good for us: it improves lives, makes things more efficient and effective, and leads to better experiences. There are, however, some missteps that data-driven analysis has already exhibited. Here are a few examples where data science tools were intentionally or unintentionally misused:

  • In 2012, a team of investigative journalists from The Wall Street Journal found out that Staples, a multinational office supply retailing corporation, offered lower prices to buyers who live in more affluent neighborhoods. Staples’ intention was to offer discounts to customers who lived closer to their competitors’ stores. However, their competitors tended to build stores in richer neighborhoods, and location correlates with social status, so the result was price discrimination against customers in less affluent areas. Neither Staples nor its customers knew about this side effect until the journalists brought it to light (source).
  • In 2015, the AdFisher project demonstrated via simulation that synthetic male online profiles were shown ads for high-paying jobs significantly more often than synthetic female profiles, a clear case of employment discrimination based on gender (source). The study surfaced the lack of responsibility, intentional or not, in data-driven algorithmic pipelines.
  • In 2016, investigators from ProPublica discovered that the software used by judges in court to predict future crimes was often incorrect, and it was racially biased: black defendants were almost twice as likely as white defendants to be labeled higher risk yet not re-offend. The tool made the opposite mistake for white defendants: they were much more likely than black defendants to be labeled lower risk yet go on to commit other crimes. ProPublica’s study was very influential: they published the dataset, the methodology, and the data processing code in the form of a Jupyter Notebook on GitHub. This striking result speaks to the opacity and lack of fairness of these types of tools, especially when they are used in the public sector, in government, and in the judicial system (source).

Is Data Science impartial?

It is often claimed that data science is algorithmic and therefore cannot be biased. And yet, the examples above show all the traditional evils of discrimination exhibiting themselves in the data science ecosystem. Bias is inherent in both the data and the process, and it gets propagated and amplified.

Transparency is an idea, a mindset, a set of mechanisms that can help prevent discrimination, enable public debate, and establish trust. When we do data science, we interact with society. Decisions have to be made in an environment where participants and the public trust the process. Technology alone won’t solve the issue; user engagement and policy efforts are important as well.

Data responsibility

Aspects of responsibility in the data science ecosystem include fairness, transparency, diversity, and data protection. The area of responsible data science is very new, yet it is already front and center at the top machine learning conferences, because these are difficult but interesting and relevant problems.

Moritz Hardt

What is Fairness?

Philosophers, lawyers, and sociologists have been asking this question for many years. In the data science context, we usually solve a predictive analytics task: predicting future performance or behavior based on some past or present observations (a dataset). Statistical bias occurs when models used to solve such tasks do not fit the data very well; a biased model is imprecise and does not summarize the data correctly. Societal bias happens when the data or the model does not represent the world correctly. One example is non-representative data: if police patrol the same neighborhoods over and over, we only get crime data from those particular neighborhoods. Societal bias can also be caused by how we define the world. Is it the world as it is that we are trying to impact with predictive analytics, or the world as it should be? And who should determine what the world should be like?
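As a quick illustration of societal bias from non-representative data, here is a minimal sketch (not from the article; all numbers are made up) in which two neighborhoods have the same underlying crime rate, but recorded incidents come almost entirely from the more heavily patrolled one:

```python
import numpy as np

# Toy sketch: two neighborhoods with the SAME true crime rate.
rng = np.random.default_rng(0)
true_rate = 0.05                           # identical underlying rate in both neighborhoods
population = 10_000

# Police patrol neighborhood A ten times more often than neighborhood B,
# so only a fraction of incidents in B are ever recorded.
observed_share = {"A": 1.0, "B": 0.1}

recorded = {
    hood: rng.binomial(population, true_rate * share)
    for hood, share in observed_share.items()
}

# Naive "data-driven" conclusion: A looks roughly ten times more criminal than B,
# although the underlying world is identical. A model could fit these records
# perfectly (no statistical bias) and still misrepresent the world (societal bias).
for hood, count in recorded.items():
    print(hood, count / population)
```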

What is discrimination?

In most legal systems, there are two concepts defining discrimination:

  • Disparate treatment is the illegal practice of treating an entity, such as a credit applicant or an employee, differently based on a protected characteristic such as race, gender, age, religion, sexual orientation, or national origin. Disparate treatment arises in a context where some benefit is to be granted or some harm is to be inflicted on the individual being treated, for example sentencing them, admitting them to college, or granting them credit. There is an actual tangible positive or negative impact.
  • Disparate impact is the result of systematic disparate treatment, where a disproportionate adverse impact is observed on members of a protected class. Different countries protect different classes or sub-populations.

When we talk about discrimination, we use terms that can be uncomfortable, such as race, gender, or sexual orientation. Political correctness in the extreme sense has no place in these debates about responsible data science. We have to be able to name concepts in order to talk about them, and once we can talk about them, we can take corrective action.

Technical definition of fairness

Let’s consider a vendor who assigns outcomes to members of a population. This is the most basic case, a binary classification. Positive outcomes may be: offered employment, accepted to school, offered a loan, offered a discount. Negative outcomes may be: denied employment, rejected from school, denied a loan, not offered a discount. What we worry about in fairness is how outcomes are assigned to members of the population. Let’s assume that 40% got the positive outcome. Some sub-population, however, may be treated differently by this process. Let’s assume that we know ahead of time what the sub-population is, for example red-haired people. We can thus divide our population into two groups: people with red hair and people without red hair. Suppose we observe that while 40% of the whole population got the positive outcome, only 20% of red-haired people received it, versus 60% of everyone else. Here, according to some definitions, we observe disparate impact on the group of red-haired individuals. Another way to put it is that statistical parity fails. This is a baseline definition of fairness, without any conditioning, and there is quite some sophistication in applying such an assessment in real life, for example in courts. This basic definition of fairness, written into many laws around the globe, dictates that the demographics of the individuals receiving any given outcome should match the demographics of the underlying population.
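To make the arithmetic concrete, here is a minimal sketch in Python (not from the article; the group split, the rates, and the 80% rule-of-thumb threshold are illustrative assumptions) that checks statistical parity on the red-hair example:

```python
import numpy as np

# Toy data mirroring the example: 1 = positive outcome, group flag = has red hair.
rng = np.random.default_rng(1)
red_hair = rng.random(1_000) < 0.5                 # hypothetical 50/50 split
outcome = np.where(red_hair,
                   rng.random(1_000) < 0.20,       # ~20% positive rate for red-haired people
                   rng.random(1_000) < 0.60)       # ~60% positive rate for everyone else

p_red = outcome[red_hair].mean()
p_other = outcome[~red_hair].mean()

parity_difference = p_red - p_other                # 0 under perfect statistical parity
disparate_impact_ratio = p_red / p_other           # 1 under perfect statistical parity

print(f"positive rate (red hair): {p_red:.2f}")
print(f"positive rate (others):   {p_other:.2f}")
print(f"parity difference:        {parity_difference:+.2f}")
print(f"disparate impact ratio:   {disparate_impact_ratio:.2f}")

# A common rule of thumb (the 'four-fifths rule') flags a ratio below 0.8.
print("statistical parity fails" if disparate_impact_ratio < 0.8 else "no flag")
```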

Assessing disparate impact

The vendor could say that he did not intend to, or did not even look at, hair color, which happens to be the sensitive attribute in the dataset. Instead, the vendor would say that he decided to give the positive outcome to people whose hair is long. The vendor denies the accusation, claiming not to discriminate based on hair color. The point is that the vendor has nonetheless adversely impacted red-haired people. It is not the intention that we care about, but the effect on the sub-population. In other words, blinding is not a legal or ethical excuse: removing hair color from the vendor’s outcome assignment process does not prevent discrimination from occurring. Disparate impact is legally assessed on the impact, not on the intention.
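The point that blinding does not help can be simulated directly. In the sketch below (all distributions are assumptions for illustration), the decision rule uses only hair length and never hair color, yet the red-haired group still receives far fewer positive outcomes because hair length acts as a proxy for the sensitive attribute:

```python
import numpy as np

# Toy sketch: the vendor never sees hair color, only hair length.
rng = np.random.default_rng(2)
n = 10_000
red_hair = rng.random(n) < 0.5

# Assumption for illustration: red-haired people in this population tend to wear
# their hair shorter, so hair length is correlated with the sensitive attribute.
hair_length_cm = np.where(red_hair,
                          rng.normal(15, 5, n),    # shorter on average
                          rng.normal(30, 5, n))    # longer on average

# "Blind" decision rule: positive outcome if hair is long; hair color is never used.
positive = hair_length_cm > 25

p_red, p_other = positive[red_hair].mean(), positive[~red_hair].mean()
print(f"positive rate (red hair): {p_red:.2f}")    # far below the rest
print(f"positive rate (others):   {p_other:.2f}")
print(f"disparate impact ratio:   {p_red / p_other:.2f}")
```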

Mitigating disparate impact

If we detect a violation of statistical parity, we may want to mitigate it. In an environment where we have a fixed number of positive outcomes to assign, we have to swap some outcomes: take a positive outcome from somebody in the non-red-haired group and give it to someone in the red-haired group. Not everyone will agree with swapping outcomes, because an individual who used to get the positive outcome would no longer get it. This objection leads to the notion of individual fairness, which stipulates that any two individuals who are similar with respect to a particular task should receive similar outcomes. There is a tension between group and individual fairness that is not easy to resolve.
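One way to picture the swap-based mitigation is the sketch below (my own toy construction under assumed scores, not a method from the article): it promotes the highest-scoring red-haired individuals who were rejected and demotes the lowest-scoring non-red-haired individuals who were accepted, keeping the total number of positive outcomes fixed until both groups sit near the overall 40% rate.

```python
import numpy as np

# Toy sketch: restore statistical parity by swapping outcomes
# while keeping the total number of positive outcomes fixed.
rng = np.random.default_rng(3)
n = 1_000
red_hair = rng.random(n) < 0.5
score = rng.random(n) + np.where(red_hair, 0.0, 0.3)   # vendor's ranking favors non-red hair
positive = np.zeros(n, dtype=bool)
positive[np.argsort(-score)[: int(0.4 * n)]] = True    # top 40% get the positive outcome

target = positive.mean()                                # overall positive rate (40%)
deficit = int(round((target - positive[red_hair].mean()) * red_hair.sum()))

# Give positives to the best-scoring red-haired people currently rejected ...
rejected_red = np.where(red_hair & ~positive)[0]
promote = rejected_red[np.argsort(-score[rejected_red])[:deficit]]
# ... and take them from the lowest-scoring non-red-haired people currently accepted.
accepted_other = np.where(~red_hair & positive)[0]
demote = accepted_other[np.argsort(score[accepted_other])[:deficit]]

positive[promote] = True
positive[demote] = False
print(f"red-haired positive rate after repair: {positive[red_hair].mean():.2f}")
print(f"other positive rate after repair:      {positive[~red_hair].mean():.2f}")
```

The individuals demoted in this sketch are exactly those who would object on individual fairness grounds.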

Individual vs group fairness

An example in which individual fairness and group fairness were taken to the Supreme Court is the Ricci v. DeStefano case of 2009. Firefighters took a test for promotion, and the department threw out the test results because none of the black firefighters scored high enough to be promoted. The fire department was afraid of being sued for discrimination and disparate impact if it accepted the results and did not promote any black firefighter. But then a lawsuit was brought by the firefighters who would have been eligible for promotion but were not promoted as a result of discarding the test. Theirs was an individual fairness argument, a disparate treatment argument: they argued that race was used to negatively impact them. The case was ruled in favor of the white firefighters, in favor of individual fairness.

Individual fairness is equality: everybody gets the same box to reach the tree. Group fairness is the equity view: everybody gets as many boxes as they need to reach the tree. Equity costs more because society has to invest more. These are two intrinsically different world views, and we cannot logically decide which one is better; there isn’t a better one, just two different points of view. They go back to whether we believe that the world as it is, is the world as it should be. The truth is likely somewhere in the middle. It is important to understand which kinds of mitigation are consistent with which kinds of belief systems.

Formal definition of fairness

Friedler et al. tease out the difference between beliefs about fairness and the mechanisms that logically follow from those beliefs in their 2016 paper. The construct space is intrinsically the state of the world. It is made of things we cannot directly measure, such as intelligence, grit, propensity to commit crime, and risk-aversion. We nevertheless want to measure intelligence and grit when we decide whom to admit to college, and we want to know a person’s propensity to re-offend and their risk-aversion in the justice system. These are raw properties that are exhibited but not directly accessible. Instead, we look at the observed space, which contains proxies that are, to a greater or lesser degree, aligned with the properties we want to measure: SAT score as a proxy for intelligence, high-school GPA for grit, family history for propensity to commit crime, and age for risk-aversion. The decision space is then made of what we would like to decide: performance in college and recidivism.

Fairness is defined here as a mapping from the construct space to the decision space, via the observed space. Individual fairness (equality) believes that the observed space faithfully represents the construct space; for example, high-school GPA is a good measure of grit. Under this belief, the mapping from construct space to decision space has low distortion. Group fairness (equity), however, says that there is a systematic distortion, caused by structural bias in society, when going from the construct space to the observed space. Furthermore, this distortion aligns with group structure, with membership in protected groups. In other words, society systematically discriminates.
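The two world views can be contrasted in a small simulation (a sketch under assumed numbers, not Friedler et al.’s formalism): the construct attribute is identically distributed across groups, but the observed proxy carries a systematic group-dependent distortion, so decisions made in the observed space show disparate impact.

```python
import numpy as np

# Toy sketch of the construct -> observed -> decision pipeline (all numbers assumed).
rng = np.random.default_rng(4)
n = 10_000
protected = rng.random(n) < 0.5

# Construct space: "grit" is identically distributed in both groups.
grit = rng.normal(0.0, 1.0, n)

# Observed space: the proxy (e.g. high-school GPA) is grit plus noise,
# but the protected group suffers a systematic negative distortion.
distortion = np.where(protected, -0.8, 0.0)
gpa_proxy = grit + distortion + rng.normal(0.0, 0.3, n)

# Decision space: admit the top 40% by the observed proxy.
admit = gpa_proxy > np.quantile(gpa_proxy, 0.60)

print(f"admit rate (protected):   {admit[protected].mean():.2f}")
print(f"admit rate (unprotected): {admit[~protected].mean():.2f}")
# Equal construct distributions, unequal decisions: the equity world view argues
# that the structural distortion must be corrected before deciding.
```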

to be continued …

References: lecture on responsible data science by Prof. Julia Stoyanovich (New York University), given at Harvard University; selected chapters from “The Age of Surveillance Capitalism” by Shoshana Zuboff; “What worries me about AI”, a post by François Chollet.
