"Wrong side of the tracks": Big Data and Protected Categories

When we use machine learning for public policy, we find that many useful variables are associated with others on which it would be ethically problematic to base decisions. This problem becomes particularly acute in the Big Data era, when predictions are often made in the absence of strong theories of the underlying causal mechanisms. We describe the dangers to democratic decision-making when high-performance algorithms fail to provide an explicit account of causation. We then demonstrate how information theory allows us to degrade predictions so that they decorrelate from protected variables with minimal loss of accuracy. Enforcing total decorrelation, however, is at best a near-term solution. The role of causal argument in ethical debate calls for the development of new, interpretable machine-learning algorithms that reference causal mechanisms.
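
The information-theoretic idea mentioned above can be illustrated with a toy calculation. The sketch below is not the authors' procedure; it only shows the quantities involved, under assumed data. The table `joint` is a hypothetical joint distribution over a binary protected attribute (rows) and a binary prediction (columns), and the function names are illustrative. It measures the dependence between the two as mutual information, then replaces the joint distribution with the product of its marginals. That product is the independent distribution closest to the original in Kullback-Leibler divergence, and the divergence paid equals the original mutual information.

```python
import math

def marginals(joint):
    """Return (row, col) marginals of a joint table joint[a][y]."""
    row = [sum(r) for r in joint]          # distribution of the protected attribute a
    col = [sum(c) for c in zip(*joint)]    # distribution of the prediction y
    return row, col

def mutual_information(joint):
    """Mutual information I(A; Y) in bits for a joint table joint[a][y]."""
    row, col = marginals(joint)
    mi = 0.0
    for a, r in enumerate(joint):
        for y, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[a] * col[y]))
    return mi

def independent_projection(joint):
    """Product of marginals: an independent table with the same marginals."""
    row, col = marginals(joint)
    return [[ra * cy for cy in col] for ra in row]

def kl_divergence(p, q):
    """KL(p || q) in bits for two tables of the same shape."""
    return sum(pi * math.log2(pi / qi)
               for pr, qr in zip(p, q)
               for pi, qi in zip(pr, qr) if pi > 0)

# Hypothetical toy data: rows are protected groups, columns are predictions.
joint = [[0.30, 0.20],
         [0.10, 0.40]]

decorrelated = independent_projection(joint)
mi_before = mutual_information(joint)         # dependence before decorrelation
mi_after = mutual_information(decorrelated)   # approximately zero afterwards
cost = kl_divergence(joint, decorrelated)     # information given up
```

In this toy setting the cost of full decorrelation is exactly the mutual information that was present to begin with, which is one way to read "minimal loss": no independent distribution sits closer to the original in this divergence.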
