Proxy Discrimination in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs

Machine learnt systems inherit biases against protected classes and historically disparaged groups from their training data. Usually these biases are not explicit; they rely on subtle correlations discovered by training algorithms, and are therefore difficult to detect. We formalize proxy discrimination in data-driven systems, a class of properties indicative of bias, as the presence of protected class correlates that have causal influence on the system's output. We evaluate an implementation on a corpus of social datasets, demonstrating how to validate systems against these properties and how to repair violations where they occur.
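The paper's formal definitions are not reproduced on this page, but the core idea, flagging model inputs that are both associated with a protected class and causally influential on the output, can be sketched roughly. In the sketch below, the association and influence measures (absolute Pearson correlation, and an intervention-style estimate that permutes one column and counts output changes) are illustrative stand-ins chosen for brevity, not the paper's actual definitions, and the thresholds are arbitrary.

```python
import numpy as np

def association(feature, protected):
    """Absolute Pearson correlation between a feature and the protected attribute."""
    return abs(np.corrcoef(feature, protected)[0, 1])

def influence(model, X, col, rng):
    """Intervention-style influence: how often the model's output changes when
    the column is replaced by values drawn from its marginal distribution."""
    X_int = X.copy()
    X_int[:, col] = rng.permutation(X[:, col])
    return np.mean(model(X) != model(X_int))

def find_proxies(model, X, protected, assoc_thresh=0.5, infl_thresh=0.1, seed=0):
    """Flag columns that are both correlated with the protected attribute
    and influential on the model's output: candidate proxies."""
    rng = np.random.default_rng(seed)
    proxies = []
    for col in range(X.shape[1]):
        a = association(X[:, col], protected)
        i = influence(model, X, col, rng)
        if a >= assoc_thresh and i >= infl_thresh:
            proxies.append((col, a, i))
    return proxies

# Toy example: column 0 mirrors the protected attribute and drives the output;
# column 1 is independent noise, so only column 0 should be flagged.
rng = np.random.default_rng(1)
protected = rng.integers(0, 2, 500)
X = np.column_stack([protected + 0.01 * rng.normal(size=500),
                     rng.normal(size=500)])
model = lambda X: (X[:, 0] > 0.5).astype(int)
flagged = find_proxies(model, X, protected)
print([c for c, _, _ in flagged])
```

The paper's notion is stronger than this: it requires a decomposition of the program in which a protected-class correlate has causal influence on the output, whereas the sketch only screens individual input columns.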
