A Causal Framework for Discovering and Removing Direct and Indirect Discrimination

Anti-discrimination is an increasingly important task in data science. In this paper, we investigate the problem of discovering both direct and indirect discrimination in historical data, and of removing the discriminatory effects before the data is used for predictive analysis (e.g., building classifiers). We use a causal network to capture the causal structure of the data. We then model direct and indirect discrimination as path-specific effects, which explicitly distinguish the two types of discrimination as causal effects transmitted along different paths in the network. Based on this model, we propose an effective algorithm for discovering direct and indirect discrimination, as well as an algorithm for precisely removing both types of discrimination while retaining good data utility. Unlike previous work, our approach ensures that predictive models built from the modified data will not incur discrimination in decision making. Experiments on real datasets show the effectiveness of our approach.
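To make the idea of path-specific effects concrete, the following is a minimal sketch on a toy causal network, not the algorithms proposed in the paper. It assumes three binary variables: a protected attribute C, a proxy attribute Z, and a decision E, with edges C -> E, C -> Z, and Z -> E, and no unobserved confounding. All probability values and all names (`P_Z1_GIVEN_C`, `P_E1_GIVEN_CZ`, `p_positive`) are hypothetical and chosen only for illustration. The direct effect switches C along the edge C -> E while Z still responds to the reference value of C; the indirect effect does the opposite.

```python
# Toy causal network (all numbers are illustrative assumptions):
#   C -> E        direct path from the protected attribute to the decision
#   C -> Z -> E   indirect path through a proxy attribute Z
# C = 0 is the reference group, C = 1 is the comparison group.

# P(Z = 1 | C = c), hypothetical values
P_Z1_GIVEN_C = {0: 0.3, 1: 0.7}

# P(E = 1 | C = c, Z = z), hypothetical values
P_E1_GIVEN_CZ = {
    (0, 0): 0.2,
    (0, 1): 0.5,
    (1, 0): 0.3,
    (1, 1): 0.6,
}


def p_z(z: int, c: int) -> float:
    """P(Z = z | C = c) for binary Z."""
    p1 = P_Z1_GIVEN_C[c]
    return p1 if z == 1 else 1.0 - p1


def p_positive(c_direct: int, c_via_z: int) -> float:
    """P(E = 1) when the edge C -> E sees `c_direct`
    and the path C -> Z -> E sees `c_via_z`."""
    return sum(
        P_E1_GIVEN_CZ[(c_direct, z)] * p_z(z, c_via_z)
        for z in (0, 1)
    )


# Reference world: C = 0 is transmitted along every path.
baseline = p_positive(c_direct=0, c_via_z=0)

# Effect transmitted only along the direct edge C -> E.
direct_effect = p_positive(c_direct=1, c_via_z=0) - baseline

# Effect transmitted only along the path C -> Z -> E.
indirect_effect = p_positive(c_direct=0, c_via_z=1) - baseline

# Effect of switching C along all paths at once.
total_effect = p_positive(c_direct=1, c_via_z=1) - baseline

print(f"direct:   {direct_effect:.2f}")
print(f"indirect: {indirect_effect:.2f}")
print(f"total:    {total_effect:.2f}")
```

In this toy table the two path-specific effects happen to sum to the total effect because C and Z do not interact in their influence on E; in general they need not add up, which is one reason to measure the two paths separately.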
