Subset selection strategy

A new technique for representative subset selection is presented. The proposed method unambiguously selects the most important objects in the calibration set and uses this subset for model development without significant deterioration in predictive ability. The method is called boundary subset selection, and it is an inherent part of the Simple Interval Calculation (SIC) approach. SIC is a method for linear modeling based on the assumption that errors are bounded. The primary consequence of SIC is an object status classification (OSClas) that reveals the most influential objects and also identifies the most stable and reliable ones. The OSClas is used as the main tool for representative subset selection. The results are compared with those of the widely used Kennard–Stone algorithm and the D-optimal design procedure on three real-world examples. Copyright © 2008 John Wiley & Sons, Ltd.
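
For orientation, the Kennard–Stone algorithm named above as a comparison baseline can be sketched as follows. This is not the boundary subset selection method of the paper, and it is not the authors' code; it is a minimal illustration of the standard max-min distance rule, assuming Euclidean distance on the raw variables. The function name `kennard_stone` and the toy data are hypothetical.

```python
import numpy as np

def kennard_stone(X, k):
    """Return indices of k rows of X chosen by the Kennard-Stone max-min rule.

    Assumption: Euclidean distance on the raw columns of X.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of objects")
    # Pairwise Euclidean distances between all objects.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Start with the two objects that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    # min_dist[m] is the distance from object m to its nearest selected object.
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < k:
        min_dist[selected] = -1.0  # exclude objects that are already selected
        nxt = int(np.argmax(min_dist))  # object farthest from the current subset
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return selected

# Toy data (hypothetical): five objects described by two variables.
X_demo = [[0, 0], [1, 0], [0, 1], [10, 10], [5, 5]]
subset = kennard_stone(X_demo, 3)
```

On the toy data, the first two picks are the extreme objects and the third is the object lying farthest from both, which is the space-filling behavior the algorithm is known for. The full pairwise distance matrix keeps the sketch short but needs memory quadratic in the number of objects.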
