How to pre-process Raman spectra for reliable and stable models?

Raman spectroscopy combined with chemometrics is becoming increasingly important for answering biological questions. This is because Raman spectroscopy is non-invasive and label-free, and water does not significantly corrupt Raman spectra. However, besides the Raman fingerprint information, Raman spectra contain other contributions, such as fluorescence background, Gaussian noise, cosmic spikes and further effects that depend on the experimental parameters. These contributions have to be removed prior to the analysis to ensure that it is based on the Raman signal and not on other effects. Here we present a comprehensive study of the influence of pre-processing procedures on statistical models. We show that a large number of possible and physically meaningful pre-processing procedures lead to poor results. Furthermore, we introduce a method based on genetic algorithms (GAs) that selects the spectral pre-processing according to the analysis task at hand, without trying all possible pre-processing approaches (grid search). This is demonstrated for the two most common tasks, namely multivariate calibration (one model) and classification (two models). The presented approach is, however, generally applicable whenever there is a computable measure of model quality that can be optimized. The suggested GA procedure yields models that have a higher precision and are more stable against corrupting effects.
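The core idea, choosing a pre-processing pipeline by letting a GA optimize a computable quality measure instead of running a full grid search, can be illustrated with a minimal, stdlib-only Python sketch. Everything in it is an illustrative assumption, not the authors' method: the candidate steps (a median/MAD despiking filter for cosmic spikes, a moving-average smoother for noise, a simplified SNIP-style peak-clipping baseline for the fluorescence background), the search space `SPACE`, the GA operators (tournament selection, uniform crossover, gene resampling, elitism) and their parameters, and the toy `fitness` (negative mean squared error against a simulated pure band). In a real application the fitness would be a cross-validated model score, such as a calibration prediction error or a classification accuracy.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

def despike(y: Sequence[float], half_width: int = 2, threshold: float = 5.0) -> List[float]:
    """Replace points far from their local median (e.g. cosmic spikes) by that median."""
    out = list(y)
    for i in range(len(y)):
        window = y[max(0, i - half_width): i + half_width + 1]
        med = statistics.median(window)
        mad = statistics.median(abs(v - med) for v in window) or 1e-12
        if abs(y[i] - med) > threshold * mad:
            out[i] = med
    return out

def smooth(y: Sequence[float], window: int) -> List[float]:
    """Centered moving average (noise reduction); window=1 returns the input unchanged."""
    h = window // 2
    out = []
    for i in range(len(y)):
        seg = y[max(0, i - h): i + h + 1]
        out.append(sum(seg) / len(seg))
    return out

def snip_baseline(y: Sequence[float], iterations: int) -> List[float]:
    """Simplified SNIP-style peak clipping: an estimate of the (fluorescence) background."""
    b = list(y)
    for p in range(1, iterations + 1):
        prev = b[:]
        for i in range(p, len(b) - p):
            b[i] = min(prev[i], 0.5 * (prev[i - p] + prev[i + p]))
    return b

def preprocess(y: Sequence[float], cfg: dict) -> List[float]:
    """Apply one candidate pre-processing pipeline described by cfg."""
    y = despike(y) if cfg["despike"] else list(y)
    y = smooth(y, cfg["window"])
    if cfg["snip_iter"] > 0:  # 0 means: no baseline correction
        y = [a - b for a, b in zip(y, snip_baseline(y, cfg["snip_iter"]))]
    return y

def genetic_search(space: Dict[str, list], fitness: Callable[[dict], float],
                   pop_size: int = 12, generations: int = 15,
                   mutation_rate: float = 0.2, seed: int = 0) -> Tuple[dict, float, List[float]]:
    """Maximize fitness over the discrete pipeline space with a simple GA."""
    rng = random.Random(seed)
    keys = list(space)
    cache: Dict[tuple, float] = {}
    def decode(ch: tuple) -> dict:  # chromosome = tuple of option indices
        return {k: space[k][g] for k, g in zip(keys, ch)}
    def score(ch: tuple) -> float:  # each pipeline is evaluated at most once
        if ch not in cache:
            cache[ch] = fitness(decode(ch))
        return cache[ch]
    def tournament(pop: List[tuple]) -> tuple:  # binary tournament selection
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    pop = [tuple(rng.randrange(len(space[k])) for k in keys) for _ in range(pop_size)]
    history = []  # best score per generation (non-decreasing thanks to elitism)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        nxt = pop[:2]  # elitism: the two best pipelines survive unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            for j, k in enumerate(keys):  # mutation: resample a gene
                if rng.random() < mutation_rate:
                    child[j] = rng.randrange(len(space[k]))
            nxt.append(tuple(child))
        pop = nxt
    best = max(pop, key=score)
    history.append(score(best))
    return decode(best), score(best), history

# Hypothetical toy data: one Lorentzian band on a curved background, plus noise and a spike.
data_rng = random.Random(42)
N = 200
TRUE = [1.0 / (1.0 + ((i - 100) / 4.0) ** 2) for i in range(N)]
RAW = [t + 0.5 + 0.004 * i + 2e-5 * (i - 100) ** 2 + data_rng.gauss(0, 0.03) for i, t in enumerate(TRUE)]
RAW[60] += 3.0  # simulated cosmic spike
SPACE = {"despike": [False, True], "window": [1, 3, 5, 7, 9], "snip_iter": [0, 10, 20, 40]}

def fitness(cfg: dict) -> float:
    # Negative MSE to the known pure band; stands in for a cross-validated model score.
    y = preprocess(RAW, cfg)
    return -sum((a - b) ** 2 for a, b in zip(y, TRUE)) / N

best_cfg, best_score, history = genetic_search(SPACE, fitness)
print(best_cfg, round(best_score, 5))
```

Because fitness values are cached, each pipeline is scored at most once, so on large search spaces the GA can evaluate far fewer pipelines than an exhaustive grid search. On this 40-element toy space a grid search would be just as cheap; the space is kept small only so the sketch runs quickly.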
