Roulette Sampling for Cost-Sensitive Learning

In this paper, we propose a new and general preprocessor algorithm, called CSRoulette, which converts any cost-insensitive classification algorithms into cost-sensitive ones. CSRouletteis based on cost proportional roulette sampling technique (called CPRSin short). CSRouletteis closely related to Costing, another cost-sensitive meta-learning algorithm, which is based on rejection sampling. Unlike rejection sampling which produces smaller samples, CPRScan generate different size samples. To further improve its performance, we apply ensemble (bagging) on CPRS; the resulting algorithm is called CSRoulette. Our experiments show that CSRouletteoutperforms Costing and other meta-learning methods in most datasets tested. In addition, we investigate the effect of various sample sizes and conclude that reduced sample sizes (as in rejection sampling) cannot be compensated by increasing the number of bagging iterations.

[1]  Robert C. Holte,et al.  Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria , 2000, ICML.

[2]  Qiang Yang,et al.  Decision trees with minimal costs , 2004, ICML.

[3]  Russell Greiner,et al.  Budgeted learning of nailve-bayes classifiers , 2002, UAI 2002.

[4]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[5]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[6]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[7]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[8]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[9]  Peter D. Turney Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm , 1994, J. Artif. Intell. Res..

[10]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[11]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[12]  Foster J. Provost,et al.  Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction , 2003, J. Artif. Intell. Res..

[13]  Kai Ming Ting,et al.  Inducing Cost-Sensitive Trees via Instance Weighting , 1998, PKDD.

[14]  D. J. Newman,et al.  UCI Repository of Machine Learning Database , 1998 .

[15]  Russell Greiner,et al.  Budgeted Learning of Naive-Bayes Classifiers , 2003, UAI.

[16]  Andrew McCallum,et al.  Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data , 2004, J. Mach. Learn. Res..

[17]  John Langford,et al.  Cost-sensitive learning by cost-proportionate example weighting , 2003, Third IEEE International Conference on Data Mining.

[18]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .