An active learning representative subset selection method using net analyte signal.

Building an accurate calibration model for spectroscopic measurements requires representative samples. In general, however, whether a sample is representative is not known until its concentration has been measured, and that reference measurement is both time-consuming and expensive. This paper presents a method for deciding whether a sample should be added to the calibration set. The selection is based on the difference in the Euclidean norm of the net analyte signal (NAS) vector between candidate samples and samples already selected. First, the concentrations and spectra of a group of samples are used to compute the projection matrix, the NAS vectors, and their scalar values. Next, the NAS vector of each candidate sample is computed by multiplying the projection matrix by the sample's spectrum, and its scalar value is obtained as the Euclidean norm of that vector. The distance between each candidate and the selected set is then computed, and the candidates with the largest distance are added to the selected set sequentially. Finally, the analyte concentration of each selected sample is measured so that the sample can be used for calibration. A validation test shows that the presented method is more efficient than random selection. As a result, the time and money spent on reference measurements are greatly reduced.
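The selection steps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the projection matrix is built, so the sketch takes it as the projector onto a principal component regression vector, which is one common NAS formulation. It also takes "distance to the selected set" to mean the gap to the nearest already-selected NAS norm. The function names `nas_projection`, `nas_norms`, and `select_by_nas` are hypothetical, and the spectra are not mean-centered.

```python
import numpy as np


def nas_projection(X, c, n_components):
    """Projection matrix onto a regression vector b (assumed NAS formulation).

    X: (n_samples, n_wavelengths) spectra with known concentrations c.
    """
    # A truncated-SVD pseudo-inverse yields a principal component
    # regression vector b such that X @ b approximates c.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = n_components
    b = Vt[:k].T @ ((U[:, :k].T @ c) / s[:k])
    # Rank-one orthogonal projector onto the direction of b.
    return np.outer(b, b) / (b @ b)


def nas_norms(P, spectra):
    """Scalar NAS value of each spectrum: Euclidean norm of P @ x."""
    return np.linalg.norm(spectra @ P.T, axis=1)


def select_by_nas(P, selected_spectra, candidate_spectra, n_select):
    """Sequentially pick candidates whose NAS norm is farthest from the selected set.

    Assumption: the distance of a candidate to the set is the smallest
    absolute difference between its NAS norm and any selected NAS norm.
    """
    selected = list(nas_norms(P, selected_spectra))
    cand = nas_norms(P, candidate_spectra)
    remaining = list(range(len(cand)))
    chosen = []
    for _ in range(min(n_select, len(remaining))):
        dists = [min(abs(cand[i] - v) for v in selected) for i in remaining]
        best = remaining[int(np.argmax(dists))]
        chosen.append(best)
        selected.append(cand[best])  # the pick now counts as selected
        remaining.remove(best)
    return chosen
```

Only the samples returned by `select_by_nas` would then be sent for reference measurement. The choice of `n_components` and of the distance rule are free parameters in this sketch and would need to be matched to the paper's actual definitions.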
