Cost-sensitive classification with inadequate labeled data

Learning cost-sensitive models from datasets that contain few labeled but plentiful unlabeled examples is a practical and challenging problem, because labeled data are often difficult, time-consuming, and/or expensive to obtain. To address this problem, in this paper we propose two strategies, based on Expectation Maximization (EM), for learning cost-sensitive classifiers from training data that mix labeled and unlabeled examples. The first method, Direct-EM, uses EM to build a semi-supervised classifier and then directly computes the optimal class label for each test example from the class probabilities produced by the learned model. The second method, CS-EM, modifies EM by incorporating the misclassification cost into the probability estimation process. Extensive experiments evaluating the effectiveness of both methods show that, when only a small number of labeled training examples is available, CS-EM outperforms the competing methods on the majority of the selected UCI datasets across different cost ratios, especially when the cost ratio is high.
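As a rough illustration of the Direct-EM pipeline, the sketch below first fits a generative classifier with an EM-style self-training loop over labeled and unlabeled data, then applies the standard minimum-expected-cost decision rule: predict the class j that minimizes the expected cost, i.e. the sum over true classes i of P(i|x) * C(i, j). This is a minimal sketch under stated assumptions, not the authors' implementation: GaussianNB stands in for whatever base learner the paper uses, the loop assigns hard pseudo-labels rather than the soft responsibilities of true EM, and the cost-matrix values are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def em_semi_supervised(X_lab, y_lab, X_unlab, n_iter=10):
    """EM-style self-training: the E-step pseudo-labels the unlabeled pool,
    the M-step refits the classifier on labeled + pseudo-labeled data.
    (Hard pseudo-labels simplify the soft E-step of true EM.)"""
    clf = GaussianNB().fit(X_lab, y_lab)
    for _ in range(n_iter):
        y_pseudo = clf.predict(X_unlab)              # E-step
        clf = GaussianNB().fit(                      # M-step
            np.vstack([X_lab, X_unlab]),
            np.concatenate([y_lab, y_pseudo]),
        )
    return clf

def min_expected_cost_labels(proba, cost_matrix):
    """Minimum-expected-cost decisions: argmin_j sum_i P(i|x) * C[i, j],
    where C[i, j] is the cost of predicting class j when the truth is i."""
    return (proba @ cost_matrix).argmin(axis=1)

# Illustrative usage with a cost ratio of 5 (missing class 1 is 5x as
# costly as the reverse error); the matrix values are assumptions.
# clf = em_semi_supervised(X_lab, y_lab, X_unlab)
# cost = np.array([[0.0, 1.0],
#                  [5.0, 0.0]])
# y_pred = min_expected_cost_labels(clf.predict_proba(X_test), cost)
```

CS-EM differs in that, per the abstract, the misclassification cost enters the probability estimation process itself during EM, rather than only at decision time as in this sketch.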
