The Price of Fair PCA: One Extra Dimension

We investigate whether the standard dimensionality-reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show that, on several real-world data sets, PCA has a higher reconstruction error on population A than on population B (for example, women versus men, or lower- versus higher-educated individuals). This can happen even when the data set contains a similar number of samples from A and B. This motivates our study of dimensionality-reduction techniques that maintain similar fidelity for A and B. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low-dimensional representation of the data that is nearly optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently produce a fair low-dimensional representation of the data.
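
To make the disparity concrete, here is a minimal sketch of how one might measure per-group reconstruction error under ordinary PCA. The synthetic data, the group construction, and the avg_reconstruction_error helper are illustrative assumptions, not the paper's experimental setup or its Fair PCA algorithm; the sketch uses scikit-learn's PCA.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two synthetic populations of equal size lying near different rank-5
# subspaces of R^20; group B has roughly 3x the scale of group A.
n, dim, rank = 500, 20, 5
X_a = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, dim))
X_b = 3.0 * rng.normal(size=(n, rank)) @ rng.normal(size=(rank, dim))
X = np.vstack([X_a, X_b])

# Ordinary PCA fits the pooled data, minimizing the *total* reconstruction
# error with no constraint on how that error splits across the two groups.
pca = PCA(n_components=rank).fit(X)

def avg_reconstruction_error(Z):
    """Average squared reconstruction error of the rows of Z."""
    Z_hat = pca.inverse_transform(pca.transform(Z))
    return float(np.mean(np.sum((Z - Z_hat) ** 2, axis=1)))

# The larger-scale group dominates the top principal components, so the
# smaller-scale group is reconstructed far less faithfully, even though
# both groups contribute the same number of samples.
print("avg error, group A:", avg_reconstruction_error(X_a))
print("avg error, group B:", avg_reconstruction_error(X_b))
```

Informally, Fair PCA asks for a single projection under which neither group's reconstruction error is disproportionately large, for instance by minimizing the maximum of the per-group losses rather than their sum; the sketch above only diagnoses the disparity that such an objective is meant to remove.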
