Automatic model selection in cost-sensitive boosting

Abstract. This paper introduces SSTBoost, a predictive classification methodology designed to steer the accuracy of a modified boosting algorithm towards required sensitivity and specificity constraints. SSTBoost is demonstrated in practice on the automated diagnosis of melanoma from a set of skin lesions (42 melanomas and 110 naevi) described by geometric and colorimetric features. A cost-sensitive variant of the AdaBoost algorithm is combined with a procedure for the automatic selection of optimal cost parameters. Within each boosting step, errors on false negatives and false positives receive different weights, and the weights of positive and negative examples are updated differently. Given only a target region in the ROC space, the method also fully automates the selection of the cost parameter ratio, which is typically difficult to define. On the melanoma diagnosis problem, SSTBoost outperformed, in both accuracy and stability, a battery of specialized automatic systems based on different types of multiple classifier combinations, as well as a panel of expert dermatologists. The method can thus be applied to the early diagnosis of melanoma or to other problems that require automated cost-sensitive classification.
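
The abstract combines two ingredients: a cost-sensitive AdaBoost variant in which false negatives and false positives carry different costs, and an automatic search for the cost parameter that places the classifier inside a target sensitivity/specificity region. The following is a minimal sketch of both ideas under stated assumptions; it is not the SSTBoost algorithm itself, whose exact update and search rules are not given in this abstract. Assumed here: a single cost parameter w (errors on positives cost w, errors on negatives cost 1 - w), decision stumps as base learners, a weight update in which the cost scales the exponent (with the step size alpha chosen to minimise the usual convexity bound on the AdaBoost normaliser), and a plain bisection on w. All function names (`cost_boost`, `select_cost`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]          # (feature index, threshold, polarity)
Ensemble = List[Tuple[float, Stump]]    # (alpha, stump) pairs


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] > thr else -pol


def fit_stump(X, y, cw) -> Tuple[Stump, float]:
    """Return the stump maximising the cost-weighted edge r = sum_i cw_i * y_i * h(x_i)."""
    best, best_r = (0, 0.0, 1), -math.inf
    for j in range(len(X[0])):
        vals = sorted({x[j] for x in X})
        # one threshold below every value, plus midpoints between consecutive values
        for thr in [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
            for pol in (1, -1):
                r = sum(c * t * stump_predict((j, thr, pol), x)
                        for x, t, c in zip(X, y, cw))
                if r > best_r:
                    best, best_r = (j, thr, pol), r
    return best, best_r


def cost_boost(X, y, w: float, rounds: int = 20) -> Ensemble:
    """Hypothetical cost-sensitive AdaBoost: errors on positives cost w, on negatives 1 - w."""
    cost = [w if t == 1 else 1.0 - w for t in y]
    D = [1.0 / len(y)] * len(y)
    ensemble: Ensemble = []
    for _ in range(rounds):
        # cw_i = c_i * D_i, so r is the D-weighted mean of u_i = c_i * y_i * h(x_i) in [-1, 1]
        stump, r = fit_stump(X, y, [c * d for c, d in zip(cost, D)])
        h = [stump_predict(stump, x) for x in X]
        r = min(r, 1.0 - 1e-12)   # guard: r == 1 would divide by zero below
        # alpha minimises the bound (1+r)/2 * e^-alpha + (1-r)/2 * e^alpha on the normaliser
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        ensemble.append((alpha, stump))
        # the cost scales the exponent, so costly mistakes gain weight faster
        D = [d * math.exp(-alpha * c * t * p) for d, c, t, p in zip(D, cost, y, h)]
        z = sum(D)
        D = [d / z for d in D]
    return ensemble


def predict(ensemble: Ensemble, x: Sequence[float]) -> int:
    score = sum(a * stump_predict(s, x) for a, s in ensemble)
    return 1 if score >= 0 else -1


def sens_spec(ensemble: Ensemble, X, y) -> Tuple[float, float]:
    p = [predict(ensemble, x) for x in X]
    tp = sum(1 for pi, t in zip(p, y) if pi == t == 1)
    tn = sum(1 for pi, t in zip(p, y) if pi == t == -1)
    return tp / y.count(1), tn / y.count(-1)


def select_cost(X, y, min_sens: float, min_spec: float,
                rounds: int = 20, max_iter: int = 12) -> Tuple[float, Ensemble]:
    """Hypothetical bisection on w toward {sensitivity >= min_sens, specificity >= min_spec}.

    Evaluates on the training set for brevity; a real application would use
    held-out data. Returns the last candidate if the region is never reached.
    """
    lo, hi, w = 0.0, 1.0, 0.5
    model = cost_boost(X, y, w, rounds)
    for _ in range(max_iter):
        sens, spec = sens_spec(model, X, y)
        if sens >= min_sens and spec >= min_spec:
            break                      # inside the target region
        if sens < min_sens:
            lo = w                     # too many false negatives: raise w
        else:
            hi = w                     # too many false positives: lower w
        w = (lo + hi) / 2
        model = cost_boost(X, y, w, rounds)
    return w, model
```

For example, on four one-dimensional points with alternating labels (`y = [-1, 1, -1, 1]`), a single boosting round at w = 0.5 gives sensitivity 1.0 and specificity 0.5, whereas w = 0.25 gives 0.5 and 1.0, which shows how lowering w trades sensitivity for specificity. The bisection assumes this trade-off is roughly monotone in w, which is not guaranteed in general.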
