Representative subset selection in modified iterative predictor weighting (mIPW)-PLS models for parsimonious multivariate calibration

A new sample selection technique, subset selection based on weighted X (instrumental responses) and y (concentrations) joint distances (SSWD), is proposed in this work. SSWD takes into account the actual contributions of both the X and y spaces to the calibration, and is therefore sensitive to variation in both the instrumental signals and the concentrations. To avoid interference from redundant wavelengths, uninformative wavelengths are first eliminated using modified iterative predictor weighting PLS (mIPW-PLS); SSWD is then performed on the mIPW-selected wavelengths to extract the optimal calibration subset. mIPW-SSWD-PLS has no subjective parameters to adjust, which makes it convenient to apply in practice. To validate the effectiveness and generality of the strategy, it was applied to two different sets of near-infrared (NIR) spectra. The results indicate that the final model was constructed economically, because only the most informative wavelengths and samples were used. The study shows that the proposed method helps reduce both the interference and the complexity of multivariate calibration involving large and complex matrices.
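
The idea of selecting samples by a joint X and y distance can be illustrated with a minimal sketch. It is not the SSWD algorithm itself: the abstract does not state how the weights are derived or how samples are picked, so the sketch assumes placeholder weights `w_x` and `w_y`, range-scaled Euclidean distances, and a Kennard-Stone-style max-min selection rule. The function names are hypothetical, and `X` is assumed to already contain only the wavelengths retained by the variable selection step.

```python
import numpy as np


def pairwise_dist(a):
    """Euclidean distance between every pair of rows of a 2-D array."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def joint_distance_subset(X, y, n_select, w_x=0.5, w_y=0.5):
    """Pick n_select sample indices by max-min selection on a joint distance.

    ASSUMPTION: w_x and w_y are placeholder weights. The source does not
    say how SSWD weights the X and y contributions, nor that it uses a
    max-min rule; both are illustrative choices made here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    if not 2 <= n_select <= len(X):
        raise ValueError("n_select must be between 2 and the number of samples")

    dx = pairwise_dist(X)
    dy = pairwise_dist(y)
    # Scale each block by its largest distance so units do not dominate.
    d = w_x * dx / dx.max() + w_y * dy / dy.max()

    # Start from the two mutually most distant samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]

    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        # Add the sample that is farthest from everything chosen so far.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectra = rng.normal(size=(30, 8))   # synthetic "spectra", not real data
    conc = rng.normal(size=30)           # synthetic "concentrations"
    print(joint_distance_subset(spectra, conc, n_select=10))
```

Because the distance combines both blocks, two samples with nearly identical spectra but different concentrations are still treated as distinct, which is the property the abstract attributes to a joint X and y criterion.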
