Predictive-property-ranked variable reduction with final complexity adapted models in partial least squares modeling for multiple responses.

For partial least-squares regression with one response (PLS1), many variable-reduction methods have been developed; only a few, however, address multiple-response partial least-squares (PLS2) modeling. The calibration performance of PLS1 models can be improved by eliminating uninformative variables. Many variable-reduction methods are based on various PLS-model-related parameters, called predictor-variable properties. Recently, these methods were extended with an important adaptation in which the model complexity is optimized. This approach was called Predictive-Property-Ranked Variable Reduction with Final Complexity Adapted Models, denoted PPRVR-FCAM or simply FCAM. In this study, variable reduction for PLS2 models is investigated using an adapted FCAM method, FCAM-PLS2. The utility and effectiveness of four new predictor-variable properties, derived from the multiple-response PLS2 regression coefficients, are studied on six data sets consisting of ultraviolet-visible (UV-vis) spectra, near-infrared (NIR) spectra, NMR spectra, and two simulated sets, one with correlated and one with uncorrelated responses. The four properties are the mean of the absolute values and the norm of the PLS2 regression coefficients, together with the significances of these two quantities. All four properties were found to be applicable for variable reduction with the FCAM-PLS2 method, and the models obtained with them have similar predictive abilities. The norm of the PLS2 regression coefficients is the most selective property: it retains a small number of variables that are informative for the responses. The significance of the mean of the PLS2 regression coefficients was found to be the least selective property.
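The abstract names the coefficient-based properties but gives no formulas. The following is a minimal sketch, assuming that B is the p × m matrix of PLS2 regression coefficients (one row per predictor variable, one column per response) and that each property is computed per variable across the m responses. A compact PLS2 fit (weights taken as the dominant singular vector of X_a^T Y, the direction the NIPALS PLS2 inner loop converges to) is included only so the example is self-contained. The function names `pls2_coefficients`, `coefficient_properties` and `rank_variables` are hypothetical, not taken from the paper. The significance variants and the FCAM elimination and complexity-adaptation loop are not reproduced, since the abstract does not describe how they are computed.

```python
import numpy as np


def pls2_coefficients(X, Y, n_components):
    """Return the (p x m) PLS2 regression-coefficient matrix for mean-centred data.

    Each weight vector is the dominant left singular vector of X_a^T Y,
    the direction the NIPALS PLS2 inner loop converges to.
    """
    Xa = X - X.mean(axis=0)                  # centred (and later deflated) predictors
    Yc = Y - Y.mean(axis=0)                  # centred responses
    p, m = X.shape[1], Y.shape[1]
    W = np.zeros((p, n_components))          # X weights
    P = np.zeros((p, n_components))          # X loadings
    C = np.zeros((m, n_components))          # Y loadings
    for a in range(n_components):
        U, _, _ = np.linalg.svd(Xa.T @ Yc, full_matrices=False)
        w = U[:, 0]                          # unit-length weight vector
        t = Xa @ w                           # score vector of component a
        tt = t @ t
        W[:, a] = w
        P[:, a] = Xa.T @ t / tt
        C[:, a] = Yc.T @ t / tt
        Xa = Xa - np.outer(t, P[:, a])       # deflate X before the next component
    # B = W (P^T W)^{-1} C^T maps centred X directly to centred Y
    return W @ np.linalg.solve(P.T @ W, C.T)


def coefficient_properties(B):
    """Hypothetical per-variable properties of a (p x m) PLS2 coefficient matrix."""
    mean_abs = np.abs(B).mean(axis=1)        # mean of |b_jk| over the m responses
    norm = np.linalg.norm(B, axis=1)         # Euclidean norm of row j of B
    return mean_abs, norm


def rank_variables(prop):
    """Indices of variables sorted from most to least important."""
    return np.argsort(-prop, kind="stable")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 8
    X = rng.normal(size=(n, p))
    B_true = np.zeros((p, 2))
    B_true[:3] = [[1.0, 0.5], [-0.8, 1.0], [0.6, -0.7]]   # only x0..x2 informative
    Y = X @ B_true + 0.1 * rng.normal(size=(n, 2))
    B = pls2_coefficients(X, Y, n_components=4)
    mean_abs, norm = coefficient_properties(B)
    print("ranking by mean |b|:", rank_variables(mean_abs))
    print("ranking by norm    :", rank_variables(norm))
```

On this simulated example, with only the first three variables informative for both responses, both properties should place variables 0, 1 and 2 at the top of the ranking. A variable-reduction method would then discard the lowest-ranked variables and refit; how many are removed per step, and how the final complexity is chosen, are exactly what FCAM specifies and are not modeled here.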
