Active learning for cost-sensitive classification using logistic regression model

Active learning selectively labels the most informative examples in order to reduce data collection cost. While active learning has been well studied for balanced classification problems, little research has addressed the cost-sensitive scenario, in which different misclassification errors incur different costs. In this paper, we investigate the problem of active learning for cost-sensitive classification. We first propose a general active learning framework named GEM, which chooses the examples that lead to the minimum generalization error. We then incorporate the misclassification cost into the expected loss calculation under the proposed framework, and derive a model estimation rule based on the Newton-Raphson method, using logistic regression as the base model. Finally, we present the complete active learning algorithm for cost-sensitive classification. Extensive experiments on benchmark data sets from the UCI repository demonstrate the effectiveness of the proposed algorithm.
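The abstract does not give the GEM selection rule or the model estimation rule in detail, so the following Python sketch is only one plausible reading of the pipeline, not the paper's algorithm. It assumes binary labels in {0, 1}, an L2-regularized logistic regression fitted by Newton-Raphson, per-class costs `c_fp` and `c_fn` applied as example weights, and a query rule in the style of expected error reduction: each pool example is scored by the expected cost-sensitive loss on the remaining pool after retraining with each possible label. All function names, parameters, and default values here are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg_newton(X, y, weights=None, l2=1e-2, n_iter=25, tol=1e-8):
    """Weighted, L2-regularized logistic regression fitted by Newton-Raphson."""
    n, d = X.shape
    if weights is None:
        weights = np.ones(n)
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        # Gradient and Hessian of the weighted negative log-likelihood plus L2 term.
        grad = X.T @ (weights * (p - y)) + l2 * w
        hess = (X * (weights * p * (1.0 - p))[:, None]).T @ X + l2 * np.eye(d)
        step = np.linalg.solve(hess, grad)
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w


def expected_cost(w, X_pool, c_fp, c_fn):
    """Sum over the pool of the expected cost of the cheaper decision."""
    p = sigmoid(X_pool @ w)
    # Predicting positive risks c_fp * (1 - p); predicting negative risks c_fn * p.
    return np.minimum(c_fp * (1.0 - p), c_fn * p).sum()


def select_query(X_lab, y_lab, X_pool, c_fp=1.0, c_fn=5.0, l2=1e-2):
    """Return the pool index whose labeling minimizes the expected future cost."""
    cost_w = np.where(y_lab == 1, c_fn, c_fp)  # assumption: costs act as example weights
    w = fit_logreg_newton(X_lab, y_lab, cost_w, l2)
    p_pool = sigmoid(X_pool @ w)
    scores = np.empty(len(X_pool))
    for i in range(len(X_pool)):
        rest = np.delete(X_pool, i, axis=0)
        X_new = np.vstack([X_lab, X_pool[i]])
        score = 0.0
        # Average over both possible labels, weighted by the current model's belief.
        for label, prob in ((1, p_pool[i]), (0, 1.0 - p_pool[i])):
            y_new = np.append(y_lab, label)
            w_new = fit_logreg_newton(
                X_new, y_new, np.where(y_new == 1, c_fn, c_fp), l2
            )
            score += prob * expected_cost(w_new, rest, c_fp, c_fn)
        scores[i] = score
    return int(np.argmin(scores))
```

Retraining twice per candidate is expensive for large pools; the sketch favors clarity over efficiency, and a practical implementation would likely subsample the pool or use an approximate update.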
