A data-driven functional projection approach for the selection of feature ranges in spectra with ICA or cluster analysis

Prediction problems based on spectra are frequently encountered in chemometrics. In addition to accurate predictions, it is often necessary to identify which wavelengths in the spectra contribute effectively to the quality of the prediction. This requires selecting wavelengths (or wavelength intervals), a problem related to variable selection. This paper shows how this problem may be tackled in the specific case of smooth (for example, infrared) spectra. The functional character of the spectra (their smoothness) is taken into account through a functional variable projection procedure. Unlike standard approaches, the projection is performed on a basis driven by the spectra themselves, so that it best fits their characteristics. The methodology is illustrated by two examples of functional projection, using Independent Component Analysis (ICA) and functional variable clustering, respectively. Their performance is illustrated on two standard infrared spectra benchmarks.
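The abstract does not spell out the clustering procedure. As a minimal sketch, assuming a contiguity-constrained agglomerative clustering of wavelengths (a hypothetical simplification, not necessarily the procedure used in this paper), the idea of a data-driven projection basis can be illustrated with NumPy. Adjacent wavelengths are merged into intervals, each spectrum is projected on the indicator functions of those intervals, and the resulting features are ranked against the target. The names `contiguous_clusters`, `project` and `rank_intervals` are hypothetical, and absolute Pearson correlation is an assumed stand-in for whatever relevance criterion is actually used.

```python
import numpy as np


def project(X, intervals):
    """Least-squares projection of each spectrum on the indicator functions
    of the intervals: the coefficient for an interval is the mean of the
    spectrum over that wavelength range."""
    return np.column_stack([X[:, a:b].mean(axis=1) for a, b in intervals])


def contiguous_clusters(X, n_clusters):
    """Greedily merge adjacent wavelengths (columns of X) into n_clusters
    contiguous intervals, always merging the two neighbouring intervals whose
    mean profiles across samples are most correlated (hypothetical sketch).

    X: array of shape (n_samples, n_wavelengths).
    Returns a list of (start, stop) column ranges, stop exclusive.
    """
    n_wavelengths = X.shape[1]
    if not 1 <= n_clusters <= n_wavelengths:
        raise ValueError("n_clusters must lie between 1 and the number of wavelengths")
    # Start with one interval per wavelength.
    intervals = [(j, j + 1) for j in range(n_wavelengths)]
    while len(intervals) > n_clusters:
        profiles = project(X, intervals)
        # Correlation of each interval's mean profile with its right neighbour's.
        corr = np.array([
            np.corrcoef(profiles[:, k], profiles[:, k + 1])[0, 1]
            for k in range(len(intervals) - 1)
        ])
        # Constant profiles give NaN correlations; never prefer those merges.
        corr = np.where(np.isnan(corr), -np.inf, corr)
        k = int(np.argmax(corr))
        intervals[k] = (intervals[k][0], intervals[k + 1][1])
        del intervals[k + 1]
    return intervals


def rank_intervals(F, y):
    """Rank projected features by absolute Pearson correlation with y
    (an assumed stand-in for a relevance criterion), most relevant first."""
    scores = np.array([abs(np.corrcoef(F[:, j], y)[0, 1]) for j in range(F.shape[1])])
    return np.argsort(-scores)


# Synthetic spectra: three wavelength bands, each driven by its own latent factor.
rng = np.random.default_rng(0)
n_samples = 200
z = rng.standard_normal((n_samples, 3))
bands = [(0, 15), (15, 40), (40, 60)]
X = np.zeros((n_samples, 60))
for k, (a, b) in enumerate(bands):
    X[:, a:b] = z[:, [k]]
X += 0.1 * rng.standard_normal(X.shape)
# The target depends only on the latent factor of the middle band.
y = 2.0 * z[:, 1] + 0.1 * rng.standard_normal(n_samples)

intervals = contiguous_clusters(X, 3)
F = project(X, intervals)
order = rank_intervals(F, y)
print(intervals)
print(intervals[order[0]])
```

In this toy setting the greedy merging should recover the three bands, and the middle band should rank first because it alone drives the target. The design choice of keeping clusters contiguous is what turns a variable clustering into a selection of wavelength ranges rather than of scattered wavelengths; real spectra would need a stopping rule for the number of intervals, which this sketch simply takes as a parameter.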
