Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence

Decision makers in banking, insurance or employment mitigate many of their risks by telling “good” individuals and “bad” individuals apart. Laws codify societal understandings of which factors are legitimate grounds for differential treatment (and when and in which contexts) and which constitute unfair discrimination; the latter include gender, ethnicity and age. Discrimination-aware data mining (DADM) embodies the hope that information technology supporting the decision process can also keep it free from unjust grounds. However, constraining data mining to exclude a fixed enumeration of potentially discriminatory features is insufficient. We argue for complementing it with exploratory DADM, where discriminatory patterns are discovered and flagged rather than suppressed. This article discusses the relative merits of constraint-oriented and exploratory DADM from a conceptual viewpoint. In addition, we use the case of loan applications to assess empirically how well both discrimination-aware data mining approaches suit two of their typical usage scenarios: prevention and detection. Using Mechanical Turk, 215 US-based participants were randomly assigned to the role of a bank clerk (discrimination prevention) or of a citizen/policy advisor (discrimination detection). They were asked to recommend or predict the approval or denial of a loan across three experimental conditions: discrimination-unaware data mining, exploratory DADM (eDADM) and constraint-oriented DADM (cDADM). The discrimination-aware tool support in the eDADM and cDADM treatments led to significantly higher proportions of correct decisions, which were also justified more accurately. There is significant evidence that the relative advantage of discrimination-aware techniques depends on their intended usage. For users focussed on making and justifying their decisions in non-discriminatory ways, cDADM produced more accurate and less discriminatory results than eDADM. For users focussed on monitoring decisions for discrimination and justifying these conclusions, eDADM yielded more accurate results than cDADM.
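To make the contrast between suppressing and flagging concrete, the following Python sketch applies both styles to the same classification rules. It is a minimal illustration under stated assumptions, not the tooling used in the study: the sensitive items, the toy loan records and the threshold `alpha` are hypothetical, and rules are scored with the extended lift (elift) measure from the discrimination-discovery work of Pedreschi, Ruggieri and Turini, elift(A,B → C) = conf(A,B → C) / conf(B → C), where A is the sensitive part of a rule's premise and B its remaining context. The constraint-oriented view drops every rule that mentions a sensitive item; the exploratory view keeps all rules and flags those whose elift reaches `alpha`.

```python
from dataclasses import dataclass

# Hypothetical potentially discriminatory items (illustrative, not from the study).
SENSITIVE = frozenset({"gender=female", "age=young"})


@dataclass(frozen=True)
class Rule:
    premise: frozenset  # left-hand side items, e.g. {"gender=female", "purpose=car"}
    outcome: str        # right-hand side item, e.g. "loan=deny"


def confidence(records, premise, outcome):
    """Share of records covering `premise` that also contain `outcome` (None if none cover it)."""
    covered = [r for r in records if premise <= r]
    if not covered:
        return None
    return sum(outcome in r for r in covered) / len(covered)


def elift(records, rule):
    """Extended lift conf(A,B -> C) / conf(B -> C); A = sensitive items, B = rest of the premise."""
    base = confidence(records, rule.premise - SENSITIVE, rule.outcome)
    full = confidence(records, rule.premise, rule.outcome)
    if not base or full is None:  # undefined if the context is uncovered or never yields the outcome
        return None
    return full / base


def constraint_oriented(rules):
    """cDADM-style view: suppress every rule whose premise mentions a sensitive item."""
    return [r for r in rules if not (r.premise & SENSITIVE)]


def exploratory(records, rules, alpha=1.2):
    """eDADM-style view: keep every rule; flag sensitive rules whose elift reaches alpha."""
    report = []
    for r in rules:
        e = elift(records, r) if r.premise & SENSITIVE else None
        report.append((r, e, e is not None and e >= alpha))
    return report


# Toy loan records, each a set of attribute=value items (hypothetical data).
records = [
    frozenset({"gender=female", "purpose=car", "loan=deny"}),
    frozenset({"gender=female", "purpose=car", "loan=deny"}),
    frozenset({"gender=male", "purpose=car", "loan=grant"}),
    frozenset({"gender=male", "purpose=car", "loan=deny"}),
]
rules = [
    Rule(frozenset({"gender=female", "purpose=car"}), "loan=deny"),
    Rule(frozenset({"purpose=car"}), "loan=deny"),
]

print(constraint_oriented(rules))  # only the rule without a sensitive item survives
for rule, e, flagged in exploratory(records, rules):
    print(sorted(rule.premise), "->", rule.outcome, e, "FLAG" if flagged else "")
```

With these toy records, {gender=female, purpose=car} → loan=deny has confidence 1.0 against 0.75 for its context {purpose=car} → loan=deny, so its elift is about 1.33: the constraint-oriented view hides the rule, whereas the exploratory view keeps it and marks it for a human to inspect.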

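The abstract reports significantly higher proportions of correct decisions with discrimination-aware tool support but does not name the test behind that claim. Assuming a Pearson chi-square test of homogeneity on a conditions × outcome contingency table (an assumption, not a detail given here), the comparison can be sketched as below. The counts are invented purely to exercise the code and are not the study's data. A table with three conditions and two outcomes has (3 - 1)(2 - 1) = 2 degrees of freedom, and for an even number of degrees of freedom the chi-square upper tail has the closed form exp(-x/2) · Σ_{i<df/2} (x/2)^i / i!, so no statistics library is needed.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with even df (Erlang closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Hypothetical (correct, incorrect) counts per condition: NOT the study's data.
table = [
    [20, 20],  # discrimination-unaware data mining
    [30, 10],  # exploratory DADM (eDADM)
    [30, 10],  # constraint-oriented DADM (cDADM)
]
stat, df = chi_square_homogeneity(table)
p = chi2_sf_even_df(stat, df)
print(f"chi2={stat:.2f}, df={df}, p={p:.4f}")  # chi2=7.50, df=2, p=0.0235
```

A significant result only says that the proportions differ somewhere across the three conditions; locating the difference (for example, eDADM versus cDADM) needs follow-up comparisons. For odd degrees of freedom, `scipy.stats.chi2_contingency` computes the same statistic and p-value, but note that by default it applies Yates' continuity correction when there is exactly one degree of freedom.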