Recursive feature elimination in Raman spectra with support vector machines

Irrelevant and correlated data points in a Raman spectrum can degrade classifier performance. We introduce support vector machine (SVM)-based recursive feature elimination into the field of Raman spectroscopy and demonstrate its performance on a data set of spectra of clinically relevant microorganisms in urine samples, along with patient samples. Because the original technique is only suitable for two-class problems, we adapt it to the multi-class setting. We show that a large number of spectral points can be removed without notably degrading the prediction accuracy of the resulting model.
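
The elimination loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the multi-class adaptation is done, so the sketch assumes one-vs-rest linear SVMs and ranks each spectral point by the sum of its squared weights over all class-wise classifiers. The function name `multiclass_svm_rfe`, the use of scikit-learn's `LinearSVC`, and the synthetic "spectra" are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC


def multiclass_svm_rfe(X, y, n_keep, step=1, C=1.0):
    """Return the column indices of the n_keep features that survive elimination."""
    remaining = np.arange(X.shape[1])
    while remaining.size > n_keep:
        # One-vs-rest linear SVMs: coef_ holds one weight vector per class
        # (a single row when there are only two classes).
        svm = LinearSVC(C=C, max_iter=10000, random_state=0)
        svm.fit(X[:, remaining], y)
        # Assumed multi-class criterion: squared weights summed over classifiers.
        score = (svm.coef_ ** 2).sum(axis=0)
        # Drop the lowest-ranked features, never going below n_keep.
        n_drop = min(step, remaining.size - n_keep)
        remaining = np.delete(remaining, np.argsort(score)[:n_drop])
    return remaining


# Synthetic stand-in for spectra: 3 classes, 50 "wavenumber" channels,
# of which only channels 5, 20 and 35 carry class information.
rng = np.random.default_rng(0)
informative = [5, 20, 35]
y = np.repeat(np.arange(3), 60)
X = rng.normal(size=(y.size, 50))
for k, channel in enumerate(informative):
    X[y == k, channel] += 5.0  # class k is shifted in its own channel

selected = multiclass_svm_rfe(X, y, n_keep=6, step=4)
print(sorted(selected.tolist()))
```

Removing several points per iteration (`step` > 1) trades ranking resolution for speed, which matters when a spectrum has many hundreds of points. For an honest accuracy estimate, the elimination should be run inside each cross-validation fold rather than once on the full data set.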
