Classification with Strategically Withheld Data

Machine learning techniques can be useful in applications such as credit approval and college admission. However, to be classified more favorably in such contexts, an agent may decide to strategically withhold some of her features, such as bad test scores. This is a missing-data problem with a twist: which data are missing depends on the chosen classifier, because the classifier itself is what may create the incentive to withhold certain feature values. We address the problem of training classifiers that are robust to this behavior. We design three classification methods: MINCUT, HILL-CLIMBING (HC), and Incentive-Compatible Logistic Regression (IC-LR). We show that MINCUT is optimal when the true distribution of data is fully known. However, it can produce complex decision boundaries and is hence prone to overfitting in some cases. Based on a characterization of truthful classifiers (i.e., those that give no incentive to strategically hide features), we devise a simpler alternative, HC, which consists of a hierarchical ensemble of out-of-the-box classifiers, trained using a specialized hill-climbing procedure that we show to be convergent. For several reasons, neither MINCUT nor HC is effective at utilizing a large number of complementarily informative features. To address this, we present IC-LR, a modification of logistic regression that removes the incentive to strategically drop features. We also show that our algorithms perform well in experiments on real-world data sets, and present insights into their relative performance in different settings.
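To make the incentive-compatibility idea concrete, below is a minimal sketch in the spirit of IC-LR. It is not the authors' exact formulation (the abstract does not spell it out); it assumes features are one-hot encoded so that a withheld feature contributes a block of zeros, and it constrains all logistic-regression weights to be non-negative via projected gradient descent, so revealing a feature can never lower an agent's score.

```python
# Hypothetical sketch of an incentive-compatible logistic regression.
# Assumptions (not stated in the abstract): features are one-hot encoded,
# a withheld feature maps to an all-zeros block, and weights are kept
# non-negative so revealing a feature can only raise the positive score.
import numpy as np

def train_ic_lr(X, y, lr=0.1, epochs=500):
    """X: (n, d) one-hot matrix with zeros for withheld features; y in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        grad_w = X.T @ (p - y) / n              # logistic-loss gradient
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
        w = np.maximum(w, 0.0)                  # project onto w >= 0
    return w, b

def predict(w, b, X):
    """Positive classification iff the (monotone) score is non-negative."""
    return (X @ w + b >= 0.0).astype(int)
```

Under these assumptions, the score X @ w + b is monotone non-decreasing in the set of revealed features, since each revealed feature adds a non-negative term while a withheld one adds zero. An agent who prefers the positive label therefore can never gain by hiding a feature, which is exactly the truthfulness property the abstract attributes to IC-LR.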
