Support vector machine ensembles for breast cancer type prediction from mid-FTIR micro-calcification spectra

Abstract In recent years, Fourier transform infrared (FTIR) spectroscopy has been shown to be a promising tool for cancer diagnostics. Applying FTIR spectroscopy as a routine tool for biomedical diagnostics of tissue samples requires robust and reliable classifiers. The number of available tissue samples is frequently limited, so data sets are small, often containing fewer than 100 samples. This can result in unstable classifiers that perform poorly on unseen data. In this work we present a way to overcome this limitation by aggregating several support vector machines (SVMs) into an ensemble. Different ensemble systems, including bagging, boosting and tree-based models, were investigated for an FTIR data set acquired from different types and stages of breast cancer. An ensemble system correctly predicted 88.9% of the unseen multi-class test set, whereas a single classifier achieved a predictive performance of only 66.7%. These results show that applying SVM ensembles in biomedical diagnostics using FTIR spectroscopy can be highly beneficial.
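To illustrate the idea of aggregating several classifiers trained on a small data set, the following is a minimal sketch of bagging (bootstrap resampling followed by majority voting). It is not the authors' implementation: a nearest-centroid classifier stands in for the SVM base learner so that the example needs only the Python standard library, and the toy data, the function names and the parameter `n_models` are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_centroid(samples, labels):
    """Base learner (a stand-in for an SVM): one mean vector per class."""
    sums = {}
    counts = Counter(labels)
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(model), key=lambda y: dist2(model[y]))


def bagging_fit(samples, labels, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap replicate."""
    rng = random.Random(seed)
    n = len(samples)
    models = []
    for _ in range(n_models):
        # Draw n indices with replacement from the training set.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            train_centroid([samples[i] for i in idx], [labels[i] for i in idx])
        )
    return models


def bagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three well-separated classes, six samples each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.3, 0.3), (-0.3, -0.3)]
centers = {"A": (0.0, 0.0), "B": (5.0, 5.0), "C": (10.0, 0.0)}
samples, labels = [], []
for name, (cx, cy) in centers.items():
    for dx, dy in offsets:
        samples.append([cx + dx, cy + dy])
        labels.append(name)

models = bagging_fit(samples, labels)
for point in ([0.2, 0.1], [4.8, 5.3], [9.7, -0.2]):
    print(point, bagging_predict(models, point))
```

Each base learner sees a slightly different resampling of the same small training set, and the vote averages out the instability of any single model. With an SVM library available, the two centroid functions would be replaced by the library's training and prediction calls while the resampling and voting logic would stay the same.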
