Active Learning for Cost-Sensitive Classification

We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs. Our algorithm, COAL, makes predictions by regressing to each label's cost and predicting the label with the smallest estimated cost. On a new example, it uses the set of regressors that perform well on past data to compute a range of possible costs for each label. It queries only the labels that could still be the best, ignoring the sure losers. We prove that COAL can be efficiently implemented for any regression family that admits squared-loss optimization; it also enjoys strong guarantees on both predictive performance and labeling effort. We empirically compare COAL to passive learning and several active learning baselines, showing significant improvements in labeling effort and test cost on real-world datasets.
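To make the query rule concrete, the following is a minimal sketch in Python. It approximates the set of good regressors with a simple confidence interval around a per-label ridge-regression estimate; the names (PerLabelRegressor, labels_to_query) and the interval-width heuristic are illustrative assumptions, not the paper's construction, which defines the cost ranges via the regressors' empirical squared-loss performance.

```python
import numpy as np

class PerLabelRegressor:
    """Ridge regression of observed costs for one label (a squared-loss oracle)."""

    def __init__(self, dim, lam=1.0):
        self.A = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)
        self.n = 0

    def update(self, x, cost):
        """Incorporate one (feature, observed cost) pair."""
        self.A += np.outer(x, x)
        self.b += cost * x
        self.n += 1

    def predict_range(self, x, radius=0.1):
        """Point prediction plus a crude interval standing in for the
        min/max cost over the set of near-optimal regressors."""
        w = np.linalg.solve(self.A, self.b)
        pred = float(w @ x)
        # Width shrinks with data; a stand-in for the version-space width,
        # not the paper's actual deviation bound.
        width = radius / np.sqrt(self.n + 1)
        lo = float(np.clip(pred - width, 0.0, 1.0))
        hi = float(np.clip(pred + width, 0.0, 1.0))
        return lo, hi

def labels_to_query(regressors, x):
    """Query exactly the labels that could still be the best: label y
    survives if its lowest plausible cost is no larger than the smallest
    highest plausible cost over all labels."""
    ranges = [r.predict_range(x) for r in regressors]
    threshold = min(hi for (_, hi) in ranges)
    return [y for y, (lo, _) in enumerate(ranges) if lo <= threshold]

# Example: 3 labels, 5-dimensional features, simulated costs in [0, 1].
rng = np.random.default_rng(0)
regs = [PerLabelRegressor(dim=5) for _ in range(3)]
for _ in range(50):
    x = rng.normal(size=5)
    for r in regs:
        r.update(x, cost=rng.uniform())
print(labels_to_query(regs, rng.normal(size=5)))
```

On each round, the learner would query the true costs only for the labels returned by labels_to_query, update those labels' regressors, and predict the label whose point estimate is smallest; labels whose entire cost range lies above the threshold are the "sure losers" and are never queried.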
