Classification is a data mining technique used to predict group membership for data instances. Cost-sensitive classification is a relatively new field of research in the data mining and machine learning communities; cost-sensitive classifiers perform classification under a cost-based model rather than the conventional error-based model. The error-based classifier AdaBoost is a simple algorithm that reweights the training instances to build multiple classifiers during the training phase, without considering the cost of misclassification. During classification, it collects a weighted vote from each classifier generated in training and assigns the new sample (example) to the class that receives the maximum vote. Intuitively, combining multiple models should give more robust predictions than a single model in situations where misclassification costs are considered. Boosting has been shown to be an effective method of combining multiple models to enhance the predictive accuracy of a single model, so it is natural to expect that boosting might also reduce misclassification costs. This paper surveys the existing cost-sensitive boosters, proposes five new extensions, and compares their results. A few future extensions are also noted.

General Terms: AdaBoost, Cost-sensitive classifiers, Data Mining, Misclassification cost.
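The AdaBoost procedure described above (reweighting training instances, then classifying by a weighted vote) can be sketched as follows. This is an illustrative minimal sketch using 1-D threshold "stumps" as the base classifiers, not the paper's implementation; the cost-sensitive variants the paper studies modify the instance-weight update to incorporate misclassification costs.

```python
# Minimal AdaBoost sketch on 1-D data with threshold stumps.
# Illustrative only; cost-sensitive boosters alter the weight update below.
import math

def train_adaboost(xs, ys, rounds=10):
    """ys are +1/-1 labels; returns a list of (threshold, polarity, alpha)."""
    n = len(xs)
    w = [1.0 / n] * n                      # uniform instance weights
    ensemble = []
    for _ in range(rounds):
        # pick the stump (threshold, polarity) with the lowest weighted error
        best = None
        for t in sorted(set(xs)):
            for pol in (+1, -1):
                err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                          if (pol if xi >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)   # this classifier's vote weight
        ensemble.append((t, pol, alpha))
        # reweight: increase weights of misclassified instances
        w = [wi * math.exp(-alpha * yi * (pol if xi >= t else -pol))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]                  # renormalize
    return ensemble

def predict(ensemble, x):
    """Weighted vote over all stumps; the sign decides the class."""
    vote = sum(alpha * (pol if x >= t else -pol) for t, pol, alpha in ensemble)
    return 1 if vote >= 0 else -1
```

A cost-sensitive extension would, for instance, multiply each instance's weight update by its misclassification cost so that costly errors dominate later rounds.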