Selection of representative calibration sample sets for near-infrared reflectance spectroscopy to predict nitrogen concentration in grasses

Abstract

The effect of using smaller, representative calibration sets was explored and discussed. The data set consisted of near-infrared reflectance (NIR) spectra of grass samples collected over several years and covering a wide range of species and cultivars. Partial least squares regression (PLSR), a chemometric method, was applied to the NIR spectra to determine the nitrogen (N) concentration of these samples. Two sample selection methods based on the NIR spectra were compared: the method proposed by Puchwein and the CADEX (computer aided design of experiments) algorithm. Both methods yield a calibration set that is evenly distributed in the spectral space, and both require minimal prior knowledge. For comparison, samples were also selected at random using four procedures: complete random, cultivar random (year fixed), year random (cultivar fixed), and interaction random (cultivar × year fixed), to assess the influence of these factors on sample selection. Puchwein's method performed best, giving the lowest root mean square error of prediction (RMSEP), followed by CADEX, interaction random, year random, cultivar random, and complete random selection. Of the 118 samples in the complete calibration set, 19 were selected as the minimal number of representative samples. The RMSEP values obtained with the subsets selected by Puchwein's method and by CADEX, and with the full calibration set, were 0.099% N, 0.109% N, and 0.092% N, respectively. These results indicate that selecting representative calibration samples can substantially improve the cost-effectiveness of NIR analysis: reducing the number of analyzed calibration samples by more than 80% greatly lowers the laboratory workload with no significant loss in prediction accuracy.
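PLSR is the calibration method named in the abstract, but the abstract does not describe the software, spectral preprocessing, or number of latent variables used. As an illustration only, the following is a minimal single-response PLS (PLS1) sketch using the NIPALS algorithm in pure Python; the names `fit_pls1` and `predict_pls1` are hypothetical and this is not presented as the study's implementation.

```python
"""Minimal PLS1 regression (NIPALS) in pure Python: an illustrative sketch only."""
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Model = Tuple[Vector, float, List[Tuple[Vector, Vector, float]]]


def fit_pls1(X: Matrix, y: Vector, n_components: int) -> Model:
    """Fit PLS1; returns (x_mean, y_mean, [(w, p, q) for each component])."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    # Mean-centred working copies, so the caller's data are left unchanged.
    E = [[v - mu for v, mu in zip(row, x_mean)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector w = E^T f, scaled to unit length.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # the response is fully explained; no further components
        w = [v / norm for v in w]
        # Scores t = E w.
        t = [sum(e * wj for e, wj in zip(row, w)) for row in E]
        tt = sum(v * v for v in t)
        # X loadings p = E^T t / (t^T t) and y loading q = f^T t / (t^T t).
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt
        # Deflate X and y before extracting the next component.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def predict_pls1(model: Model, x: Vector) -> float:
    """Predict one sample by deflating it component by component."""
    x_mean, y_mean, components = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ei * wi for ei, wi in zip(e, w))
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat
```

In practice each row of `X` would be a (preprocessed) spectrum and `y` the reference N concentration, with the number of components chosen by validation. When as many components are used as the centred `X` has rank, PLS1 reproduces ordinary least squares, which makes a convenient sanity check.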
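CADEX (listed as reference [20], Kennard and Stone's "Computer Aided Design of Experiments") is commonly described as a max-min distance rule: start with the two samples farthest apart, then repeatedly add the sample whose distance to its nearest already-selected sample is largest. The sketch below assumes each sample is already reduced to a short feature vector (for example, principal-component scores of its spectrum) and uses Euclidean distance; some implementations use Mahalanobis distance instead. The name `kennard_stone` is hypothetical.

```python
import math
from typing import List, Sequence


def kennard_stone(points: Sequence[Sequence[float]], n_select: int) -> List[int]:
    """Return indices of n_select samples chosen by the max-min distance rule."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Start with the pair of samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: math.dist(points[ij[0]], points[ij[1]]),
    )
    selected = [first, second]
    # Distance from every sample to its nearest selected sample.
    nearest = [min(math.dist(p, points[first]), math.dist(p, points[second]))
               for p in points]
    while len(selected) < n_select:
        # Add the unselected sample that lies farthest from the selected set.
        candidates = [i for i in range(n) if i not in selected]
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        # Refresh nearest-selected distances with the newly added sample.
        nearest = [min(d, math.dist(p, points[best])) for d, p in zip(nearest, points)]
    return selected


points = [[0.0], [1.0], [2.0], [3.0], [10.0]]
print(kennard_stone(points, 4))  # [0, 4, 3, 1]
```

Because each new sample fills the largest remaining gap, the selected set tends to cover the boundary and interior of the spectral space evenly, which matches the property the abstract attributes to CADEX.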
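Puchwein's method (listed as reference [12]) is usually summarised as follows: order the samples by their distance from the centre of the spectral score space, keep the outermost sample, discard every sample within a limiting distance of it, and repeat with the next remaining sample; the limiting distance is increased stepwise until the desired number of samples remains. The sketch below assumes the input points are standardized principal-component scores (so Euclidean distance stands in for Mahalanobis distance) and omits the PCA step; the names `puchwein_select` and `puchwein_for_size` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def puchwein_select(points: Sequence[Sequence[float]], limit: float) -> List[int]:
    """Keep samples from the outside in, dropping neighbours closer than `limit`."""
    dims = len(points[0])
    centre = [sum(p[k] for p in points) / len(points) for k in range(dims)]
    # Visit samples from the farthest-from-centre inwards.
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], centre), reverse=True)
    selected: List[int] = []
    eliminated = set()
    for i in order:
        if i in eliminated:
            continue
        selected.append(i)
        # Discard every remaining sample closer than `limit` to the one just kept.
        for j in order:
            if j not in eliminated and math.dist(points[i], points[j]) < limit:
                eliminated.add(j)
    return selected


def puchwein_for_size(points: Sequence[Sequence[float]], n_target: int,
                      step: float) -> Tuple[List[int], float]:
    """Grow the limiting distance in multiples of `step` until <= n_target remain."""
    if n_target < 1 or step <= 0:
        raise ValueError("n_target must be >= 1 and step must be positive")
    k = 1
    while True:
        chosen = puchwein_select(points, k * step)
        if len(chosen) <= n_target:
            return chosen, k * step
        k += 1
```

For example, with one-dimensional scores `[[0.0], [1.0], [2.0], [3.0], [10.0]]`, a target of 3 samples and a step of 0.5, the loop stops at a limiting distance of 1.5 and keeps samples 4, 0 and 2. Working from the outside in ensures the extreme samples, which matter most for the range of a calibration, are always retained.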
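The abstract does not define exactly how the four random procedures were carried out. One plausible reading, stated here purely as an assumption, is that the "fixed" factor defines strata and samples are drawn at random within them: complete random uses a single stratum, cultivar random (year fixed) stratifies by year, year random (cultivar fixed) stratifies by cultivar, and interaction random stratifies by cultivar × year. The hypothetical helper `stratified_random` below draws one sample per stratum in turn until the requested number is reached.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def stratified_random(labels: Sequence[dict], stratum: Callable[[dict], object],
                      n_select: int, seed: int = 0) -> List[int]:
    """Draw n_select sample indices at random, spreading picks across strata."""
    rng = random.Random(seed)
    groups: Dict[object, List[int]] = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[stratum(lab)].append(i)
    for members in groups.values():
        rng.shuffle(members)  # random order within each stratum
    picked: List[int] = []
    # Take one sample from each stratum in turn until enough are picked.
    while len(picked) < n_select:
        progressed = False
        for members in groups.values():
            if members and len(picked) < n_select:
                picked.append(members.pop())
                progressed = True
        if not progressed:
            raise ValueError("n_select exceeds the number of samples")
    return picked


# Hypothetical sample labels; the strata correspond to the four procedures.
labels = [{"cultivar": c, "year": y} for c in "AB" for y in (2001, 2002, 2003)]
complete = stratified_random(labels, lambda s: None, 3)
cultivar_random = stratified_random(labels, lambda s: s["year"], 3)
year_random = stratified_random(labels, lambda s: s["cultivar"], 2)
interaction = stratified_random(labels, lambda s: (s["cultivar"], s["year"]), 4)
```

Under this reading, the more factors are fixed, the more evenly the random subset covers the design, which would be consistent with the ranking reported in the abstract (interaction random ahead of year random, cultivar random, and complete random).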
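The comparison in the abstract rests on RMSEP and on the size of the calibration subset. The sketch below assumes the common textbook definition, RMSEP = sqrt(Σ(ŷᵢ − yᵢ)² / n) over an independent validation set (the abstract does not spell out its formula or validation design), and checks the quoted reduction: keeping 19 of 118 calibration samples removes about 83.9% of the reference analyses. The function names are hypothetical.

```python
import math
from typing import Sequence


def rmsep(reference: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of prediction over a validation set."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - r) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


def reduction_percent(n_full: int, n_subset: int) -> float:
    """Percentage of calibration samples that no longer need reference analysis."""
    return 100.0 * (n_full - n_subset) / n_full


print(round(reduction_percent(118, 19), 1))  # 83.9
```

Because RMSEP is expressed in the units of the response, the values in the abstract (0.099% N, 0.109% N, and 0.092% N) can be compared directly with the N concentrations being predicted.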

[1] Wynne W. Chin, et al. Structural equation modeling analysis with small samples using partial least squares, 1999.

[2] H. J. H. Macfie, et al. A robust PLS procedure, 1992.

[3] Tormod Næs, et al. A user-friendly guide to multivariate calibration and classification, 2002.

[4] H. Martens, et al. Light scattering and light absorbance separated by extended multiplicative signal correction. Application to near-infrared transmission analysis of powder mixtures, 2003, Analytical Chemistry.

[5] Tormod Næs, et al. Near Infra-Red Spectroscopy: Bridging the Gap between Data Analysis and NIR Applications, 1995.

[6] Avraham Lorber, et al. The effect of interferences and calibration design on accuracy: Implications for sensor and sample selection, 1988.

[7] T. Hirschfeld, et al. Unique-sample selection via near-infrared spectral subtraction, 1985, Analytical Chemistry.

[8] P. Geladi, et al. Linearization and Scatter-Correction for Near-Infrared Reflectance Spectra of Meat, 1985.

[9] Tormod Næs, et al. Selection of Samples for Calibration in Near-Infrared Spectroscopy. Part II: Selection Based on Spectral Measurements, 1990.

[10] Fadia Nasser, et al. A Monte Carlo Study Investigating the Impact of Item Parceling on Measures of Fit in Confirmatory Factor Analysis, 2003.

[11] T. Næs. The design of calibration in near infra-red reflectance analysis by clustering, 1987.

[12] G. Puchwein. Selection of calibration samples for near-infrared spectrometry by factor analysis of spectra, 1988.

[13] F. Rius, et al. Selection of the best calibration sample subset for multivariate regression, 1996, Analytical Chemistry.

[14] Ronald D. Snee, et al. Validation of Regression Models: Methods and Examples, 1977.

[15] John S. Shenk, et al. Population Definition, Sample Selection, and Calibration Procedures for Near Infrared Reflectance Spectroscopy, 1991.

[16] Wei Tang, et al. Ensembling neural networks: Many could be better than all, 2002, Artif. Intell.

[17] Howard Mark, et al. Chemometrics in Spectroscopy, 2007.

[18] F. Xavier Rius, et al. Constructing D-optimal designs from a list of candidate samples, 1997.

[19] William R. Dillon, et al. Offending Estimates in Covariance Structure Analysis: Comments on the Causes of and Solutions to Heywood Cases, 1987.

[20] L. A. Stone, et al. Computer Aided Design of Experiments, 1969.

[21] A. Savitzky, et al. Smoothing and Differentiation of Data by Simplified Least Squares Procedures, 1964.

[22] H. W. Marsh, et al. Is More Ever Too Much? The Number of Indicators per Factor in Confirmatory Factor Analysis, 1998, Multivariate Behavioral Research.