Cost-Sensitive Ensemble of Support Vector Machines for Effective Detection of Microcalcification in Breast Cancer Diagnosis

This paper presents a new approach to cost-sensitive classification based on a Boosting ensemble of support vector machines (SVMs). Unlike conventional Boosting methods, which adjust the distribution of training instances to minimize the misclassification rate, the presented approach adjusts the training data distribution to minimize the expected cost of classification. The approach is applied to microcalcification (MC) detection, a typical cost-sensitive classification problem in breast cancer diagnosis. Its performance is evaluated by means of receiver operating characteristic (ROC) curves and the expected cost of classification. Experimental results consistently show that the ROC curve of the SVM ensemble classifier lies very close to the curve enveloping the ROC curves of the base classifiers, which indicates that the ensemble consistently improves on the classification performance of the individual base classifiers. Furthermore, the results demonstrate that the presented method not only increases the area under the ROC curve (AUC) but also reduces the expected classification cost.
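The difference between minimizing the misclassification rate and minimizing the expected cost can be illustrated with a minimal sketch. It is not the paper's exact update rule, which the abstract does not specify. It assumes labels in {+1, -1}, two hypothetical cost parameters `cost_fn` (missed positive) and `cost_fp` (false alarm), and a generic AdaBoost-style reweighting in which a misclassified instance is up-weighted in proportion to its cost. The function names `expected_cost` and `reweight` are illustrative only.

```python
import math

def expected_cost(y_true, y_pred, cost_fn, cost_fp):
    """Average misclassification cost per instance (labels are +1 / -1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == -1:
            total += cost_fn   # missed positive (false negative)
        elif t == -1 and p == 1:
            total += cost_fp   # false alarm (false positive)
    return total / len(y_true)

def reweight(weights, y_true, y_pred, cost_fn, cost_fp, alpha):
    """One boosting-style update of the training distribution.

    Assumption: misclassified instances are scaled by their cost times
    exp(alpha); correctly classified ones are scaled by exp(-alpha).
    The result is renormalized so the weights sum to 1.
    """
    updated = []
    for w, t, p in zip(weights, y_true, y_pred):
        if t == p:
            updated.append(w * math.exp(-alpha))
        else:
            cost = cost_fn if t == 1 else cost_fp
            updated.append(w * cost * math.exp(alpha))
    z = sum(updated)
    return [w / z for w in updated]

# Toy data: one false negative (index 1) and one false positive (index 2).
y_true = [1, 1, -1, -1]
y_pred = [1, -1, 1, -1]
cost = expected_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0)
new_w = reweight([0.25] * 4, y_true, y_pred, cost_fn=5.0, cost_fp=1.0, alpha=0.5)
```

With these illustrative costs, the missed positive ends up with a larger weight than the false alarm, so the next base classifier in the ensemble would concentrate on the more expensive error. A plain misclassification-rate update would treat the two errors identically.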
