Wheat Kernel Variety Identification Based on a Large Near-Infrared Spectral Dataset and a Novel Deep Learning-Based Feature Selection Method

Near-infrared (NIR) hyperspectroscopy is an emerging nondestructive sensing technology for inspecting crop seeds. A large spectral dataset of more than 140,000 wheat kernels from 30 varieties was prepared for classification. Feature selection is a critical step in the analysis of large spectral datasets. A novel convolutional neural network-based feature selector (CNN-FS) was proposed to select the spectral channels most strongly related to the classification target. A convolutional neural network with attention (CNN-ATT) framework was designed for one-dimensional data classification. Popular machine learning models, including support vector machine (SVM) and partial least squares discriminant analysis, were used as benchmark classifiers. Features selected by conventional feature selection algorithms were used for comparison. Results showed that the designed CNN-ATT outperformed the benchmark classifiers. The proposed CNN-FS found a subset of features that represented the raw dataset better than the subsets chosen by conventional selectors. The CNN-ATT achieved an accuracy of 93.01% using the full spectra and kept its high precision (90.20%) when trained on the 60-channel features obtained via the CNN-FS method. The proposed methods have great potential for analyzing other large spectral datasets. The proposed feature selection structure can be extended to design other model-based selectors. The combination of NIR hyperspectroscopic technology and the proposed models has great potential for automatic nondestructive classification of single wheat kernels.
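
To make the idea of reducing full spectra to a small channel subset concrete, the sketch below ranks channels with a simple Fisher-style filter score and keeps the top k. This is not the CNN-FS method, whose internals the abstract does not describe. It only illustrates the kind of conventional selector the abstract uses for comparison. The function names, the synthetic data, and the choice of score are all assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def fisher_scores(spectra, labels):
    """Per-channel ratio of between-class to within-class variance."""
    classes = sorted(set(labels))
    scores = []
    for ch in range(len(spectra[0])):
        column = [row[ch] for row in spectra]
        overall = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [v for v, y in zip(column, labels) if y == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        # Small constant guards against division by zero.
        scores.append(between / (within + 1e-12))
    return scores


def select_top_k(spectra, labels, k):
    """Return the sorted indices of the k highest-scoring channels."""
    scores = fisher_scores(spectra, labels)
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 3 "varieties", 10 channels, and only
# channels 3 and 7 carry variety-dependent signal.
rng = random.Random(0)
informative = [3, 7]
spectra, labels = [], []
for variety in (0, 1, 2):
    for _ in range(50):
        row = [rng.gauss(0.0, 1.0) for _ in range(10)]
        for ch in informative:
            row[ch] += 5.0 * variety
        spectra.append(row)
        labels.append(variety)

selected = select_top_k(spectra, labels, k=2)
print(selected)
```

A filter like this scores each channel independently of any classifier, whereas a model-based selector such as the one the abstract proposes derives channel relevance from a trained network.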
