mRMR-based feature selection for classification of cotton foreign matter using hyperspectral imaging

An mRMR-based two-stage framework was used for optimal wavelength selection. A total of 12 wavelengths were selected for cotton foreign matter classification. The selected wavelengths were highly correlated with the properties of foreign matter. The selected wavelengths achieved a hit rate of 86% or higher across different classifiers.

Different types of cotton foreign matter cause various levels of damage to textile products and decrease the monetary value of cotton. Hyperspectral imaging has shown the capability to discriminate foreign matter, but the large amount of information it produces, most of which is correlated and redundant, limits classification accuracy and processing speed. The goal of this study was to explore a new feature selection method (the minimum Redundancy Maximum Relevance algorithm, mRMR) to select optimal wavelengths from the visible to near-infrared spectra of hyperspectral imaging data for cotton foreign matter classification. A spectral dataset containing 480 samples was collected from hyperspectral reflectance images of cotton lint and 15 types of foreign matter. Each sample was represented by a mean spectrum containing 256 wavelengths ranging from 400 nm to 1000 nm. The dataset was pre-processed to remove noise, and the number of wavelengths was reduced from 256 to 223 by removing those with a signal-to-noise ratio lower than 10 dB. The optimal wavelengths were selected from the pre-processed dataset using a two-stage approach. In the first stage, the features were ranked using the minimum Redundancy Maximum Relevance algorithm, and only the top-ranked features were passed on to the second stage. In the second stage, sequential backward elimination was applied to the top-ranked wavelengths to select the optimal wavelengths for foreign matter classification.
The generality of the selected wavelengths was evaluated by comparing classification performance across Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and Artificial Neural Network (ANN) classifiers. A total of 12 wavelengths were selected as the optimal feature set for foreign matter classification. Eight wavelengths from the visible range were related to the natural or artificial pigments of foreign matter, and the other four from the near-infrared range were related to the proteins or nutrients in foreign matter. The selected wavelengths achieved average classification rates of 91.25%, 86.67%, and 86.67% for the LDA, SVM, and ANN classifiers, respectively, indicating the generality of the selected features. This study explored a new method for optimal wavelength selection in hyperspectral imaging, and the selected wavelengths can be used with different classifiers for cotton foreign matter classification.
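
The two-stage selection described above (mRMR ranking followed by sequential backward elimination) can be illustrated with a minimal sketch. It is not the implementation used in the study: the equal-width discretization, the difference form of the mRMR criterion (relevance minus mean redundancy), the nearest-centroid classifier with leave-one-out scoring, the stopping rule, and the synthetic data are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random
from collections import Counter

def discretize(column, bins=5):
    """Equal-width binning of one feature column (an assumed discretization)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in column]

def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[v]))
               for (x, v), c in pab.items())

def mrmr_rank(X, y, k, bins=5):
    """Stage 1: greedily pick k features by relevance minus mean redundancy."""
    cols = [discretize([row[j] for row in X], bins) for j in range(len(X[0]))]
    relevance = [mutual_information(c, y) for c in cols]
    selected = [max(range(len(cols)), key=relevance.__getitem__)]
    while len(selected) < k:
        def score(j):
            redundancy = sum(mutual_information(cols[j], cols[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        rest = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (a stand-in
    for the classifiers named in the text)."""
    hits = 0
    for i in range(len(X)):
        sums, counts = {}, Counter()
        for r, (row, label) in enumerate(zip(X, y)):
            if r == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(features))
            for t, j in enumerate(features):
                acc[t] += row[j]
            counts[label] += 1
        def dist(label):
            return sum((X[i][j] - sums[label][t] / counts[label]) ** 2
                       for t, j in enumerate(features))
        hits += min(sums, key=dist) == y[i]
    return hits / len(X)

def backward_eliminate(X, y, features, evaluate):
    """Stage 2: drop one feature at a time while the score does not decrease
    (an assumed stopping rule)."""
    current, best = list(features), evaluate(X, y, features)
    while len(current) > 1:
        trials = [(evaluate(X, y, [f for f in current if f != d]), d)
                  for d in current]
        score, drop = max(trials)
        if score < best:
            break
        best = score
        current.remove(drop)
    return current, best

# Synthetic data: 3 classes, 10 features. Features 0 and 1 are jointly
# informative, feature 2 is a near-copy of feature 0, the rest are noise.
rng = random.Random(0)
y = [c for c in range(3) for _ in range(30)]
X = []
for c in y:
    f0 = (0.0 if c == 0 else 1.0) + rng.gauss(0, 0.05)
    f1 = (1.0 if c == 2 else 0.0) + rng.gauss(0, 0.05)
    f2 = f0 + rng.gauss(0, 0.01)
    X.append([f0, f1, f2] + [rng.gauss(0, 0.2) for _ in range(7)])

ranked = mrmr_rank(X, y, k=5)
final, accuracy = backward_eliminate(X, y, ranked, loo_accuracy)
print("top-ranked features:", ranked)
print("after backward elimination:", final, "LOO accuracy:", accuracy)
```

In this sketch the ranking stage keeps the candidate pool small so that the more expensive wrapper stage only evaluates a handful of subsets, which mirrors the motivation for passing only the top-ranked wavelengths to sequential backward elimination.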
