Discovering Unwarranted Associations in Data-Driven Applications with the FairTest Testing Toolkit

In today's data-driven world, programmers routinely incorporate user data into complex algorithms, heuristics, and application pipelines. While often beneficial, this practice can have unintended and detrimental consequences, such as the discriminatory effects identified in Staples' online pricing algorithm and the racially offensive labels recently found in Google's image tagger. We argue that such effects are bugs that should be tested for and debugged in a manner similar to functionality, performance, and security bugs. We describe FairTest, a testing toolkit that detects unwarranted associations between an algorithm's outputs (e.g., prices or labels) and user subpopulations, including protected groups (e.g., defined by race or gender). FairTest reports any statistically significant associations to programmers as potential bugs and ranks them by their strength while accounting for known explanatory factors. We designed FairTest for ease of use by programmers and integrated it into the evaluation framework of SciPy, a popular library for data analytics. We used FairTest experimentally to identify unfair disparate impact, offensive labeling, and disparate rates of algorithmic error in six applications and datasets. As examples, our results reveal subtle biases against older populations in the distribution of error in a real predictive health application, and offensive racial labeling in an image tagger.
