A method for controlling errors in two-class classification

In data mining, two types of errors may occur when records are classified into two categories A and B: a record of A may be misclassified as one of B, and conversely a record of B may be misclassified as one of A. Tight control over these two errors is needed in many practical situations since, for example, an error of one type may have far more serious consequences than an error of the other type. Most previous work on two-class classifiers does not allow for such control. We describe a general approach that supports tight error control for two-class classification and that can use any two-class classification method as part of the decision mechanism. The main idea is to construct from the given training data a family of classifiers and then, using the training data once more, to estimate for each of the two classes the distribution of certain vote totals produced by that family. The error control is achieved via the two estimated distributions. The approach has been tested on several well-known classification problems. In each case, the two estimated distributions were very close to those obtained from verification data, so good error control can indeed be achieved.
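As a rough illustration of the idea, and not the paper's actual construction, the following Python sketch builds a family of classifiers by bagging decision trees, uses the training data once more to estimate the two vote-total distributions, and derives a decision threshold that meets assumed error targets. The dataset, the choice of bagged trees as the classifier family, the single-threshold decision rule, and the names alpha, beta, and t are all assumptions made for this sketch.

```python
# Minimal sketch of the vote-total scheme, assuming bagged decision trees as
# the classifier family; alpha, beta, and the threshold rule are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)          # label 0 = class A, 1 = class B
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Construct a family of classifiers from the training data (here: bootstrap
#    replicates of the training set; an odd count avoids tied votes).
rng = np.random.default_rng(0)
family = []
for _ in range(51):
    idx = rng.integers(0, len(X_train), len(X_train))
    family.append(DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx]))

def vote_total(X_):
    """Number of family members voting for class B, per record."""
    return np.sum([clf.predict(X_) for clf in family], axis=0)

# 2. Use the training data once more to estimate the two distributions of vote
#    totals, one for the A-records and one for the B-records.
v = vote_total(X_train)
votes_A, votes_B = v[y_train == 0], v[y_train == 1]

# 3. Derive a threshold t from the two distributions: declaring B when the vote
#    total exceeds t keeps the estimated A->B error below alpha iff
#    t >= Q_A(1 - alpha), and the estimated B->A error below beta iff
#    t <= Q_B(beta).
alpha, beta = 0.02, 0.02                            # illustrative error targets
t_lo, t_hi = np.quantile(votes_A, 1 - alpha), np.quantile(votes_B, beta)
if t_lo > t_hi:
    raise ValueError("targets not jointly achievable; relax alpha or beta")
t = (t_lo + t_hi) / 2

# 4. Classify test records and report the realized error rates.
pred = (vote_total(X_test) > t).astype(int)
print("A->B error:", np.mean(pred[y_test == 0] == 1))
print("B->A error:", np.mean(pred[y_test == 1] == 0))
```

In this sketch, any threshold in the interval [Q_A(1 - alpha), Q_B(beta)] controls both estimated errors at once; if the interval is empty, the two targets cannot both be met with a single threshold, and one must either relax the targets or introduce an undecided zone.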
