Are we really fixing the bias problem in AI?
The most recent issue of the MIT Sloan Management Review features four articles that, while discussing very different topics, all end up touching on the same issue: the bias problem. As multiple studies have shown, humans are biased, and so is Artificial Intelligence.
The article Using Artificial Intelligence to Promote Diversity comments on the fact that, being developed by humans and trained on human data, AI systems tend to perpetuate stereotypes. To counter this tendency, the article lists three types of actions currently being adopted in AI research:
designing for inclusion (e.g. hiring more women developers);
training systems with “better” data (e.g. upsampling underrepresented groups);
giving bots a variety of voices (e.g. avoiding using only female voices).
The suggested actions do not seem to really target the bias problem. Rather, they appear to arbitrarily identify groups to protect (i.e. on the basis of criteria such as ethnicity, social class, gender identity, sexual preference and/or religious beliefs) and then constrain the models to reduce their biases against those groups.
The problem in this approach is that the group identification may be biased itself. Leaving aside that every historical period and every society have different sensibilities about which groups deserve protection, even assuming complete fairness in the decision process, it is obvious that some groups will get more attention than others. In particular, I suspect that all those groups whose identification requires defining complex and entangled factors (e.g. the factors listed above, plus less evident factors such as age, height, weight, experience, personality, appearance, behavior, etc.) will be left behind. In this sense, constraining models to avoid biases against arbitrarily identified groups does not solve the bias problem, but it simply shifts it elsewhere.
This would not be an issue if AI were not going to drastically affect every aspect of our lives. In the second of the four articles, New Ways to Gauge Talent and Potential, Josh Bersin and Tomas Chamorro-Premuzic introduce three emerging assessment methods for recruitment:
gamified assessment (i.e. let candidates enjoy the interview by playing a game that measures their performance and attitude);
digital interviews (i.e. use AI agents to standardize the interviews and to make the evaluation more rigorous);
candidate data mining (i.e. investigate the candidate’s reputation, followership and level of authority on social networks and the Web).
In a short paragraph about the ethics of talent identification, the authors mention some common solutions to the discrimination problem, including blinding the system to factors such as skin color. Despite the good intentions, such a solution again involves a lot of arbitrariness. For instance, even if we recognize that a social class is penalized in a recruitment process, who is going to define the set of attributes needed to constrain the model so that it is no longer biased against that social class? Will these attributes help all of the class members, or only some of them? And, even more importantly, who guarantees that other participants will not be affected by these constraints? By nature, AI systems constantly search for discriminative factors (i.e. factors that distinguish, not that discriminate) to help create decision boundaries. Once the obvious factors of unfairness are removed from the equation, the system will simply find other, non-obvious factors, making the bias problem less evident but no less dangerous. Mathematically, if we want p(y | x, G) to be independent of G (where G is a “protected group”), then once we enforce the group-protection constraint it is very likely that the distribution p(y | x, D) (where D is an unprotected group, possibly very hard to identify) will change too. Should we not care about that?
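The bias-shift argument can be made concrete with a minimal toy simulation (all distributions, thresholds and group definitions below are hypothetical, invented purely for illustration): equalizing positive rates across a protected attribute G also changes the decision distribution for a hidden group D that is merely correlated with G and that nobody set out to protect.

```python
import random

random.seed(42)

# Hypothetical toy population. g is a "protected" attribute the system is
# constrained on; d is a hidden, unprotected attribute correlated with g.
population = []
for _ in range(20000):
    g = random.random() < 0.5
    d = random.random() < (0.8 if g else 0.2)       # d correlates with g
    score = random.gauss(0.55 if g else 0.45, 0.1)  # scores are biased by g
    population.append((score, g, d))

def rate(decide, subset):
    """Fraction of the subset receiving a positive decision."""
    return sum(decide(s, g) for s, g, _ in subset) / len(subset)

def plain(score, g):
    """Unconstrained rule: one threshold for everyone."""
    return score > 0.5

def fair(score, g):
    """Group-protected rule: per-group thresholds chosen so that
    p(y | G=0) and p(y | G=1) come out roughly equal."""
    return score > (0.55 if g else 0.45)

d_members = [p for p in population if p[2]]
print(f"p(y | D) under the plain rule:  {rate(plain, d_members):.3f}")
print(f"p(y | D) under the 'fair' rule: {rate(fair, d_members):.3f}")
```

Even though D never appears in the constraint, its positive-decision rate moves once the per-group thresholds are imposed: the protection of G has shifted the bias onto D rather than removed it.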
Well, someone may say that, in the end, we can use the system’s decisions merely as recommendations, leaving the final word to humans. This would be reassuring if recommendation systems did not strongly condition human opinions. The third article, The Hidden Side Effects of Recommendation Systems, shows how recommendations shape music preferences. Specifically, the authors organized an experiment to assess how music recommendations affect the willingness to pay for a song. Music consumers were found to be more likely to pay for a song when its recommended rating was higher, even when that rating was randomly generated. It is true that recruiters are not music consumers, but how well can they resist an “unbiased AI recommendation system”?
Disclaimer: This article is in no way meant to discourage fairness in AI. It actually aims to encourage a serious and unbiased debate about methods to make systems more robust and more respectful of everyone, even of those groups that are currently not aware of being discriminated against. In this sense, it is my personal opinion that pluralism of approaches is the key to avoiding unethical bias shifts. A recent paper by Chen et al. (2018) has shown that there is a trade-off between fairness and accuracy, and favoring the former over the latter may have high costs in certain high-stakes decisions (e.g. healthcare and criminal justice). The authors claim that there exists a fair way to increase fairness in AI without necessarily sacrificing accuracy: collecting larger and more representative data.
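Part of the accuracy gap between groups comes from estimation variance, which shrinks as more data is collected; this is the intuition behind collecting larger, more representative datasets. A minimal sketch of that variance effect (the estimator, distributions and sample sizes are all hypothetical):

```python
import random
import statistics

random.seed(0)

def avg_error(n_train, trials=1000):
    """Average absolute error of a sample-mean estimator fit on n_train points."""
    true_mean = 0.5
    errors = []
    for _ in range(trials):
        sample = [random.gauss(true_mean, 0.2) for _ in range(n_train)]
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

# A majority group with plenty of data vs. a minority group with scarce data:
# the minority's statistics are estimated far less accurately. Collecting more
# minority data closes the gap without touching the majority group.
print(f"error with 1000 samples: {avg_error(1000):.4f}")
print(f"error with   20 samples: {avg_error(20):.4f}")
```

The error for the data-scarce group is several times larger, so upsampling or collecting data for underrepresented groups can reduce unfairness without constraining the model at all.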
And, yes, I have not forgotten that there were four articles. The fourth one is for us, AI researchers and developers. The article is titled Can We Solve AI’s ‘Trust Problem’?. To make a long story short, it shows that US citizens do not trust AI (e.g. only 9% of the interviewed subjects trusted AI in financial services; only 4% trusted it in the hiring process). This lack of trust seems to derive from AI makers’ lofty promises followed by underwhelming delivery. Maybe we should be fairer and admit that we are still far from solving the bias problem. Admit that we are still scratching the surface of this issue... Maybe being unbiased at least in this would be a very first step.
Thanks to Adam Fisch for the helpful suggestions.
MIT Sloan Management Review, Winter 2019, Vol. 60, No. 2
“Using Artificial Intelligence to Promote Diversity”, Paul R. Daugherty, H. James Wilson and Rumman Chowdhury. https://sloanreview.mit.edu/article/using-artificial-intelligence-to-promote-diversity/
“New Ways to Gauge Talent and Potential”, Josh Bersin and Tomas Chamorro-Premuzic. https://sloanreview.mit.edu/article/new-ways-to-gauge-talent-and-potential/
“The Hidden Side Effects of Recommendation Systems”, Gediminas Adomavicius, Jesse Bockstedt, Shawn P. Curley, Jingjing Zhang, and Sam Ransbotham. https://sloanreview.mit.edu/article/the-hidden-side-effects-of-recommendation-systems/
“Why Is My Classifier Discriminatory?”, Irene Y. Chen, Fredrik D. Johansson, David Sontag. https://arxiv.org/pdf/1805.12002.pdf
“Can We Solve AI’s ‘Trust Problem’?”, Thomas H. Davenport. https://sloanreview.mit.edu/article/can-we-solve-ais-trust-problem/