Unbiased Subdata Selection for Fair Classification: A Unified Framework and Scalable Algorithms

Classification is an important problem in modern data analytics, with applications across many domains. Unlike conventional classification approaches, fair classification addresses unintentional bias against sensitive features (e.g., gender, race). Because fairness measures are highly nonconvex, existing methods often cannot model fairness exactly, which can lead to inferior fair classification outcomes. This paper fills the gap by developing a novel unified framework that jointly optimizes accuracy and fairness. The proposed framework is versatile: it can exactly incorporate the different fairness measures studied in the literature, and it applies to many classifiers, including deep classification models. Specifically, we first prove Fisher consistency of the proposed framework. We then show that many classification models within this framework can be recast as mixed-integer convex programs, which can be solved effectively by off-the-shelf solvers when the instance sizes are moderate and can serve as benchmarks for evaluating approximation algorithms. We prove that, when the classification outcomes are known, the resulting problem, termed “unbiased subdata selection,” is strongly polynomial-solvable and can be used to enhance classification fairness by selecting more representative data points. This motivates us to develop an iterative refining strategy (IRS) for large-scale instances, in which we alternate between improving the classification accuracy and conducting unbiased subdata selection. We study the convergence properties of IRS and derive its approximation bound. More broadly, the framework can be leveraged to improve classification models on unbalanced data by taking the F1 score into consideration. Finally, we demonstrate numerically that the proposed framework consistently yields better fair classification outcomes than existing methods.
