Improving discrimination of Raman spectra by optimising preprocessing strategies on the basis of the ability to refine the relationship between variance components

Abstract: Discriminating samples into predefined groups is a common task in many fields, such as medicine and environmental and forensic studies. Its success depends strongly on how well the groups are separated. Separation is best when the group means lie much further apart than the observations within each group, i.e. when the between-groups variance exceeds the within-groups variance pooled over all groups. The task is particularly demanding for signals (e.g. spectra), which require considerable effort to prepare so that the informative features are exposed and fit for the purpose of the data analysis. Preprocessing strategies address this by highlighting the features relevant to further analysis (e.g. discrimination): they remove unwanted variation and degrading effects such as noise or baseline drift, and they standardise the signals. The aim of the research was to develop an automated procedure for choosing the preprocessing strategy best suited to discrimination. The authors propose a novel criterion for assessing the quality of a preprocessing strategy: the ratio of the between-groups to within-groups variance on the first latent variable derived from regularised MANOVA, which can expose group differences in highly multidimensional data. The search for the best preprocessing strategy was carried out using a grid search and a much more efficient genetic algorithm. The adequacy of this concept, which markedly supports discrimination analysis, was verified by assessing its performance on two forensic comparison problems, discrimination between differently aged bloodstains and between various car paints described by Raman spectra, using the likelihood ratio framework, a recommended tool for discriminating samples in forensic science.
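The criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the scores on the first latent variable have already been obtained (the regularised MANOVA step is not shown), and it assumes the variances are computed as mean squares with g - 1 and n - g degrees of freedom, which the abstract does not specify. The function names `variance_ratio` and `pick_best` and the strategy labels are hypothetical.

```python
import numpy as np


def variance_ratio(scores, labels):
    """Between-groups to within-groups variance ratio for 1-D scores.

    Assumption: variances are mean squares with (g - 1) and (n - g)
    degrees of freedom; the source does not state the exact estimator.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, g = scores.size, groups.size
    grand_mean = scores.mean()

    ss_between = 0.0
    ss_within = 0.0
    for k in groups:
        x = scores[labels == k]
        # Spread of the group mean around the grand mean, weighted by size.
        ss_between += x.size * (x.mean() - grand_mean) ** 2
        # Spread of the observations around their own group mean.
        ss_within += ((x - x.mean()) ** 2).sum()

    return (ss_between / (g - 1)) / (ss_within / (n - g))


def pick_best(candidates, labels):
    """Exhaustive (grid-style) selection of the strategy with the largest ratio.

    `candidates` maps a hypothetical strategy name to the latent-variable
    scores obtained after applying that preprocessing strategy.
    """
    ratios = {name: variance_ratio(s, labels) for name, s in candidates.items()}
    best = max(ratios, key=ratios.get)
    return best, ratios


labels = [0, 0, 0, 1, 1, 1]
candidates = {
    "raw": [1, 2, 3, 2, 3, 4],                 # groups overlap
    "smoothed+baseline": [1, 2, 3, 7, 8, 9],   # groups well separated
}
best, ratios = pick_best(candidates, labels)
```

In this toy example the second candidate gives the larger ratio, so it would be preferred. A genetic algorithm, as mentioned in the abstract, would replace the exhaustive loop in `pick_best` when the number of candidate strategies is too large to enumerate.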
