MetaCost: a general method for making classifiers cost-sensitive

Research in machine learning, statistics, and related fields has produced a wide variety of algorithms for classification. However, most of these algorithms assume that all errors have the same cost, which is seldom the case in KDD problems. Individually making each classification learner cost-sensitive is laborious and often non-trivial. In this paper we propose a principled method for making an arbitrary classifier cost-sensitive by wrapping a cost-minimizing procedure around it. This procedure, called MetaCost, treats the underlying classifier as a black box, requiring no knowledge of its functioning or change to it. Unlike stratification, MetaCost is applicable to any number of classes and to arbitrary cost matrices. Empirical trials on a large suite of benchmark databases show that MetaCost almost always produces large cost reductions compared to the cost-blind classifier used (C4.5RULES) and to two forms of stratification. Further tests identify the key components of MetaCost and those that can be varied without substantial loss. Experiments on a larger database indicate that MetaCost scales well.
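The abstract describes the wrapper only at a high level. Below is a minimal sketch of that idea in Python, assuming a scikit-learn-style base learner that exposes fit/predict_proba and integer class labels 0..K-1. The three steps (estimate class probabilities with a bagged ensemble, relabel each training example with the class of minimum expected cost, retrain the cost-blind learner on the relabeled data) follow the procedure MetaCost is built on; the names metacost and n_bags, and the choice to average over all bags, are illustrative assumptions rather than the authors' exact specification.

import numpy as np

def metacost(base_learner_factory, X, y, cost_matrix, n_bags=10, rng=None):
    """Relabel the training set with the class of minimum expected cost,
    then retrain the base learner on the relabeled data.

    cost_matrix[i, j] = cost of predicting class i when the true class is j.
    Assumes labels are integers 0..K-1 and that every class appears in each
    bootstrap sample, so predict_proba columns line up across bags.
    """
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    n, n_classes = len(X), cost_matrix.shape[0]

    # 1. Estimate P(j | x) by averaging a bagged ensemble of base learners.
    probs = np.zeros((n, n_classes))
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)                 # bootstrap resample
        model = base_learner_factory().fit(X[idx], y[idx])
        probs += model.predict_proba(X)
    probs /= n_bags

    # 2. Relabel each example with the Bayes-optimal class under the costs:
    #    argmin_i  sum_j  P(j | x) * C(i, j)
    expected_cost = probs @ cost_matrix.T                # shape (n, n_classes)
    y_relabelled = expected_cost.argmin(axis=1)

    # 3. Retrain the (cost-blind) base learner once on the relabeled data.
    return base_learner_factory().fit(X, y_relabelled)

Because the relabeling rule picks the class minimizing the expected cost under the estimated probabilities, the retrained classifier ends up approximating the cost-optimal decision rule directly, while the base learner itself is never modified; any learner with the assumed fit/predict_proba interface (for example, a decision-tree or rule learner) can be plugged in via base_learner_factory.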
