Classification of Microorganisms via Raman Spectroscopy Using Gaussian Processes

Automatic categorization of microorganisms is a complex task that requires advanced techniques to achieve accurate performance. In this paper, we aim to identify microorganisms based on Raman spectroscopy. Empirical studies in recent years have shown that powerful machine learning methods such as Support Vector Machines (SVMs) are suitable for this task. Our work focuses on the Gaussian process (GP) classifier, which is new to this field, provides fully probabilistic outputs, and allows for efficient hyperparameter optimization. We also investigate how to incorporate prior knowledge about possible signal variations by transferring known concepts from invariant kernel theory to the GP framework. To validate the suitability of the GP classifier, we compare it with state-of-the-art learners on a large-scale Raman spectra dataset and show that it significantly outperforms all other tested classifiers, including the SVM. Our results further show that incorporating prior knowledge leads to a significant performance gain when only small amounts of training data are used.
