Leveraging Multiple Linear Regression for Wavelength Selection

Abstract

Wavelength selection is often used with multivariate calibration methods to lower the prediction error for the calibrated sample properties. As a result, there is a plethora of wavelength selection methods to choose from, each with unique advantages and disadvantages. All wavelength selection methods involve a range of tuning parameters, which makes them cumbersome or complex and hence difficult to work with. The goal of this study is to provide a simple process for selecting wavelengths for multivariate calibration while attempting to standardize the values of the algorithm's five adjustable tuning parameters across data sets. The proposed method uses multiple linear regression (MLR) as an indicator of which wavelengths should subsequently be used to form a multivariate calibration model by a method such as partial least squares (PLS). From a collection of MLR models formed from randomly selected wavelengths, the models that fall within thresholds on two indicators, the root mean square error of calibration (RMSEC) as a bias indicator and the L2 norm of the model vector as a variance indicator, are evaluated to ascertain the most frequently selected wavelengths. A portion of the most frequently selected wavelengths is retained and used to build a PLS calibration model. The proposed wavelength selection method is compared with PLS models based on full spectra. Evaluation on several near-infrared (NIR) data sets shows that PLS models based on MLR-selected wavelengths give lower prediction errors. Of the five adjustable parameters of the wavelength selection method, three could be standardized across the data sets, while the other two required minor tuning. Recommendations are provided regarding alternative wavelength selection algorithms.
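The screening step described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the parameter names (`n_models`, `n_wl`, `rmsec_pct`, `norm_pct`, `n_keep`) are hypothetical, the two thresholds are assumed to be percentile cut-offs, and mean-centring is used in place of an explicit MLR intercept. The abstract does not say which five tuning parameters the method uses or how its thresholds are set.

```python
import numpy as np


def mlr_wavelength_frequencies(X, y, n_models=2000, n_wl=5,
                               rmsec_pct=10.0, norm_pct=90.0, seed=0):
    """Count how often each wavelength appears in random MLR models that pass
    both the RMSEC (bias) and L2-norm (variance) screens.

    All parameters are hypothetical illustrations: n_models random MLR models,
    n_wl wavelengths per model, and percentile cut-offs rmsec_pct / norm_pct.
    """
    n_samples, n_vars = X.shape
    if n_wl >= n_samples:
        raise ValueError("n_wl must be smaller than the number of samples")
    rng = np.random.default_rng(seed)
    # Mean-centre X and y so the intercept drops out of the model vector
    # (assumption; residuals match an MLR fit with an intercept).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    subsets = np.empty((n_models, n_wl), dtype=np.int64)
    rmsec = np.empty(n_models)
    l2 = np.empty(n_models)
    for k in range(n_models):
        # Random wavelength subset, no repeats within one model.
        idx = rng.choice(n_vars, size=n_wl, replace=False)
        b, *_ = np.linalg.lstsq(Xc[:, idx], yc, rcond=None)
        resid = yc - Xc[:, idx] @ b
        # RMSEC taken here as sqrt(mean squared residual); dividing by the
        # degrees of freedom instead is an equally plausible convention.
        rmsec[k] = np.sqrt(np.mean(resid ** 2))
        l2[k] = np.linalg.norm(b)
        subsets[k] = idx

    # Keep only models below both cut-offs: low bias AND low variance.
    keep = ((rmsec <= np.percentile(rmsec, rmsec_pct))
            & (l2 <= np.percentile(l2, norm_pct)))
    # Frequency of each wavelength among the retained models.
    return np.bincount(subsets[keep].ravel(), minlength=n_vars)


def select_top(counts, n_keep):
    """Indices of the n_keep most frequently selected wavelengths
    (ties broken by lower wavelength index)."""
    return np.argsort(-counts, kind="stable")[:n_keep]
```

For example, with spectra `X` (samples × wavelengths) and reference values `y`, `sel = select_top(mlr_wavelength_frequencies(X, y), n_keep=20)` would pick 20 wavelengths, and scikit-learn's `PLSRegression(n_components=...).fit(X[:, sel], y)` could then build the final calibration. The L2-norm screen is meant to discard models whose coefficients blow up, which is where overfitting shows on collinear NIR spectra. On uncorrelated synthetic data it can instead reject the best models, so setting `norm_pct=100.0` switches it off for such toy checks.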
