Optimal Sample Size for Predicting Viability of Cabbage and Radish Seeds Based on Near-Infrared Spectra of Single Seeds

This study presents and discusses the effect of the number of seeds in the training (calibration) set on the ability to predict the viability of cabbage and radish seeds. Classification models were developed with the supervised classification method extended canonical variates analysis (ECVA). Calibration subsets of different sizes were selected either at random, with several iterations, or with the spectra-based sample selection algorithms DUPLEX and CADEX. An independent test set was used to validate the resulting models. For both cabbage and radish, a calibration set of 200 seeds was optimal. At this sample size, the misclassification rates for the random method (averaged over 10 iterations), DUPLEX and CADEX were 8%, 6% and 7%, respectively, for cabbage, and 3%, 3% and 2%, respectively, for radish. These rates are similar to those obtained when all 600 seeds were used for calibration (6% for cabbage and 2% for radish). The number of seeds in the calibration set can therefore be reduced by up to 67% (from 600 to 200 seeds) without significant loss of classification accuracy, which improves the cost-effectiveness of near-infrared (NIR) spectral analysis. Wavelength regions important for discriminating between viable and non-viable seeds were identified using interval ECVA (iECVA) models, ECVA weight plots and the difference between the mean spectra of viable and non-viable seeds.
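The abstract names two spectra-based selection algorithms without describing them. The following is a minimal sketch in Python with NumPy of CADEX (the Kennard-Stone algorithm) and Snee's DUPLEX. It assumes Euclidean distances computed on whatever feature rows are passed in, for instance preprocessed spectra or PCA scores. The abstract does not say which space the authors used. The function names `cadex` and `duplex` and their parameters are hypothetical, and this is not the authors' code.

```python
# Hypothetical sketch: function names and the Euclidean distance space are
# assumptions for illustration, not the implementation used in the study.
import numpy as np


def _pairwise_dist(X):
    """Euclidean distance matrix between the rows of X (memory-light Gram form)."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off


def _farthest_pair(D, pool):
    """Return the two distinct members of `pool` that are farthest apart."""
    sub = D[np.ix_(pool, pool)]  # fancy indexing returns a copy
    np.fill_diagonal(sub, -1.0)  # never pair a sample with itself
    a, b = np.unravel_index(np.argmax(sub), sub.shape)
    return [pool[a], pool[b]]


def cadex(X, n_select):
    """Kennard-Stone (CADEX): indices of n_select samples spread evenly over X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    D = _pairwise_dist(X)
    selected = _farthest_pair(D, list(range(n)))
    remaining = [k for k in range(n) if k not in selected]
    # Distance from every sample to its nearest already-selected sample.
    min_d = D[:, selected].min(axis=1)
    while len(selected) < n_select:
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(min_d[remaining]))]
        selected.append(pick)
        remaining.remove(pick)
        min_d = np.minimum(min_d, D[:, pick])
    return selected


def duplex(X, n_cal):
    """Snee's DUPLEX: split X into n_cal calibration samples and the rest."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    n_other = n - n_cal
    if n_cal < 2 or n_other < 2:
        raise ValueError("each set needs at least 2 samples")
    D = _pairwise_dist(X)
    remaining = list(range(n))
    sets = []
    for _ in range(2):
        # Seed each set with the farthest-apart pair still unassigned.
        pair = _farthest_pair(D, remaining)
        for p in pair:
            remaining.remove(p)
        sets.append(pair)
    cal, other = sets
    turn_cal = True
    while remaining and len(cal) < n_cal and len(other) < n_other:
        target = cal if turn_cal else other
        # Add the unassigned sample farthest from the current members of `target`.
        min_d = D[np.ix_(remaining, target)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        target.append(pick)
        remaining.remove(pick)
        turn_cal = not turn_cal
    # Once one set is full, the other set receives every leftover sample.
    if len(cal) >= n_cal:
        other.extend(remaining)
    else:
        cal.extend(remaining)
    return cal, other
```

Given 600 candidate spectra in a hypothetical array `spectra`, `cadex(spectra, 200)` would return a 200-seed calibration subset, the size the study found optimal, on which a classifier such as ECVA could then be trained. CADEX spreads the chosen samples over the whole spectral space. DUPLEX alternates picks between two sets so that both cover it. This CADEX variant starts from the two most distant samples; some implementations instead start from the sample nearest the mean.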
