On the use of a self organising map as feature compressor in the building of calibration models: Application to FTIR-spectrophotometry

Abstract
Considerable attention has been given to strategies for variable selection in spectroscopic analysis. Here we introduce a different approach: using a self-organising map as a feature compressor, which also helps reduce the dimensionality of the problem. The method is straightforward and requires no prior knowledge of which spectral regions contain the relevant variables or information, so it is generally applicable. We coupled the method to multiple linear regression, principal component analysis and partial least squares and used it to quantitatively analyse two-component liquid samples by FTIR spectroscopy. The predicted concentrations of the species in the mixture agreed closely with the true values: the correlation coefficients between estimated and real concentrations were 0.997 for methanol and 0.995 for p-xylene. Furthermore, applying the feature compression step made the calibration models more stable, in that they better estimated concentrations not present in the training set.
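
The abstract does not specify how the map compresses the spectral variables. One plausible reading, sketched below purely as an assumption, treats each wavelength as an input vector (its absorbances across the training spectra). A 1-D self-organising map groups similar wavelengths, each group is averaged into a single compressed feature, and multiple linear regression is then fitted on those features. The function names (train_som, fit_compressor, compress, fit_mlr, predict_mlr), the map size, the training schedule and the synthetic Gaussian-band spectra are all hypothetical illustrations, not the authors' settings or data.

```python
import numpy as np


def train_som(data, n_units=8, n_iter=2000, lr0=0.5, seed=0):
    """Train a 1-D self-organising map on the rows of `data`; return its codebook.
    (Hypothetical helper: map size and schedule are illustrative assumptions.)"""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    # Initialise the codebook with randomly chosen input rows.
    codebook = data[rng.choice(n_samples, n_units, replace=False)].astype(float)
    grid = np.arange(n_units)
    sigma0 = n_units / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
        x = data[rng.integers(n_samples)]
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))  # Gaussian neighbourhood
        codebook += lr * h[:, None] * (x - codebook)
    return codebook


def fit_compressor(X, n_units=8, seed=0):
    """Assign each wavelength (column of X) to its best-matching SOM unit."""
    profiles = X.T  # one row per wavelength: its absorbance across training spectra
    codebook = train_som(profiles, n_units=n_units, seed=seed)
    dist = ((profiles[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(dist, axis=1)


def compress(X, labels):
    """Replace each group of wavelengths sharing a SOM unit by its mean absorbance."""
    return np.column_stack([X[:, labels == u].mean(axis=1) for u in np.unique(labels)])


def fit_mlr(F, y):
    """Multiple linear regression with intercept, solved by least squares."""
    A = np.column_stack([np.ones(len(F)), F])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, F):
    return coef[0] + F @ coef[1:]


# Synthetic two-component "spectra" (illustrative only, not the paper's data).
rng = np.random.default_rng(1)
w = np.linspace(0.0, 1.0, 200)
pure = np.vstack([np.exp(-((w - 0.3) / 0.05) ** 2),   # hypothetical component 1 band
                  np.exp(-((w - 0.7) / 0.05) ** 2)])  # hypothetical component 2 band
C_train = rng.uniform(0.0, 1.0, size=(40, 2))
C_test = rng.uniform(0.0, 1.0, size=(20, 2))
X_train = C_train @ pure + rng.normal(0.0, 0.002, size=(40, 200))
X_test = C_test @ pure + rng.normal(0.0, 0.002, size=(20, 200))

# The compressor is learned on the training spectra only, then reused on test spectra.
labels = fit_compressor(X_train, n_units=8)
F_train, F_test = compress(X_train, labels), compress(X_test, labels)
coef = fit_mlr(F_train, C_train[:, 0])
r = np.corrcoef(predict_mlr(coef, F_test), C_test[:, 0])[0, 1]
print(F_train.shape[1], "compressed features, r =", round(r, 4))
```

Here the 200 wavelength variables collapse to at most eight averaged features before regression, which is the dimensionality reduction the abstract refers to. The same compressed features could equally be passed to principal component or partial least squares regression instead of MLR.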
