A feature-selection algorithm based on Support Vector Machine-Multiclass for hyperspectral visible spectral analysis

Abstract: Food quality and safety is one of the world’s most pressing concerns. High-precision spectral devices are a major technology trend in food inspection because they are accurate and nondestructive, but a common obstacle is how to extract informative variables from raw data without losing significant information. This article proposes a novel feature-selection algorithm named Support Vector Machine-Multiclass Forward Feature Selection (SVM-MFFS). SVM-MFFS adopts a wrapper-based forward feature selection strategy, explores the stability of spectral variables, and uses a classical SVM as the classification and regression model to select the most relevant wavelengths from hundreds of spectral variables. We compare SVM-MFFS with the Successive Projections Algorithm and Uninformative Variable Elimination in an experiment on identifying different brands of sesame oil. The results show that SVM-MFFS outperforms both methods in accuracy, Receiver Operating Characteristic curve, Prediction and Cumulative Stability, and that it can provide a reliable and rapid method for food quality inspection.
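
The wrapper and forward-selection strategy named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SVM-MFFS: the stability criterion is not specified in the abstract and is omitted here, and a leave-one-out nearest-centroid scorer stands in for the SVM so that the example runs with the standard library alone. The names `forward_select` and `loo_centroid_accuracy`, the stopping rule, and the toy data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def loo_centroid_accuracy(X: List[Row], y: List[int], cols: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen columns.

    Assumption: this is a stand-in for the SVM scorer. Any function with the
    same signature (e.g. cross-validated SVM accuracy) could replace it.
    """
    correct = 0
    for i in range(len(X)):
        # Build per-class centroids from every sample except sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        # Predict the class whose centroid is closest (squared Euclidean distance).
        def dist(label: int) -> float:
            return sum((X[i][c] - sums[label][k] / counts[label]) ** 2
                       for k, c in enumerate(cols))

        pred = min(sums, key=dist)
        correct += pred == y[i]
    return correct / len(X)


def forward_select(
    X: List[Row],
    y: List[int],
    score_fn: Callable[[List[Row], List[int], List[int]], float] = loo_centroid_accuracy,
    max_features: int = 3,
) -> Tuple[List[int], float]:
    """Greedy wrapper: repeatedly add the column that most improves the score."""
    selected: List[int] = []
    best_score = 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        # Score each candidate column when added to the current subset.
        scored = [(score_fn(X, y, selected + [c]), c) for c in remaining]
        # Highest score wins; ties go to the lowest column index.
        score, col = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score


# Toy "spectra": 6 samples, 4 wavelengths, 3 classes.
# Only column 2 separates the classes; the other columns are uninformative.
X = [
    [0.3, 0.7, 0.10, 0.5],
    [0.6, 0.2, 0.12, 0.4],
    [0.4, 0.6, 0.50, 0.5],
    [0.5, 0.3, 0.52, 0.4],
    [0.3, 0.2, 0.90, 0.5],
    [0.6, 0.7, 0.92, 0.4],
]
y = [0, 0, 1, 1, 2, 2]

selected, best = forward_select(X, y)
print(selected, best)
```

On this toy data the loop keeps the single informative column and stops once no further column raises the leave-one-out accuracy. Passing a different `score_fn` is the only change needed to wrap another classifier, which is the defining property of a wrapper method.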
