Finding relevant spectral regions between spectroscopic techniques by use of cross model validation and partial least squares regression.

In this paper, we extend the concept of cross model validation (CMV) to multiple X and Y variables, where different spectroscopic techniques serve as the X and Y data in a regression context. For the first dataset, on marzipan samples, the main objective was to find significant regions in the spectral data and to discuss the issue of false discovery, i.e. combinations of variables that are erroneously found to be significant. A permutation test within the framework of CMV showed that no regression coefficients in the partial least squares regression (PLSR) model between FT-IR and VIS/NIR spectra were significant at the 5% level. We believe the reason is that CMV acts as a strong filter against spurious correlations. Corresponding CH- and OH-bands between FT-IR and NIR spectra gave significant regions. For the second dataset, the results from CMV are interpreted in more detail with chemical background knowledge in mind. Most of the significant regions found between the Raman and NIR spectra could be interpreted from the chemical composition of the oil mixtures. Some regions were more difficult to interpret, which could be due to systematic baseline effects in the NIR data.
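The idea of combining CMV with a permutation test can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: a single y column instead of a full Y spectrum, jackknife t-values as the inner significance test, a fixed critical value of 2.262 (two-sided 5% for 9 degrees of freedom, matching 10 inner segments), the rule that a variable counts as significant only if it is flagged in every outer segment, and simulated data. All function names are hypothetical.

```python
import numpy as np

def pls1_coefs(X, y, n_comp):
    """Regression coefficients of a single-y PLS model (NIPALS) on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    W = np.zeros((p, n_comp))
    P = np.zeros((p, n_comp))
    q = np.zeros(n_comp)
    for a in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p_a = Xc.T @ t / tt
        q[a] = yc @ t / tt
        Xc = Xc - np.outer(t, p_a)   # deflate X
        yc = yc - q[a] * t           # deflate y
        W[:, a], P[:, a] = w, p_a
    return W @ np.linalg.solve(P.T @ W, q)

def jackknife_flags(X, y, n_comp, n_seg=10, t_crit=2.262):
    """Flag variables whose coefficient exceeds t_crit jackknife standard errors.

    Assumption: t_crit=2.262 is only appropriate for n_seg=10 (9 degrees of freedom).
    """
    b_full = pls1_coefs(X, y, n_comp)
    segments = np.array_split(np.arange(len(y)), n_seg)
    b_sub = np.array([
        pls1_coefs(np.delete(X, s, axis=0), np.delete(y, s), n_comp)
        for s in segments
    ])
    se = np.sqrt(((b_sub - b_full) ** 2).sum(axis=0) * (n_seg - 1) / n_seg)
    return np.abs(b_full) > t_crit * se

def cmv_flags(X, y, n_comp=2, n_outer=5):
    """Outer CMV loop: repeat the significance test with each outer segment left out.

    Assumed decision rule: keep a variable only if it is flagged in every outer segment.
    """
    outer = np.array_split(np.arange(len(y)), n_outer)
    hits = [
        jackknife_flags(np.delete(X, s, axis=0), np.delete(y, s), n_comp)
        for s in outer
    ]
    return np.all(hits, axis=0)

# Simulated data (assumption): only variables 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 15))
y = 3 * X[:, 0] + 3 * X[:, 1] + 0.1 * rng.normal(size=100)

real_flags = cmv_flags(X, y)
# Permuting y breaks the X-y relation; anything flagged here is a false discovery.
perm_flags = cmv_flags(X, rng.permutation(y))

print("flagged with real y:    ", np.flatnonzero(real_flags))
print("flagged with permuted y:", np.flatnonzero(perm_flags))
```

In practice the permutation step would be repeated many times to estimate how often a variable is flagged by chance; a single permutation is shown here only to keep the sketch short.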
