A spectral envelope approach towards effective SVM-RFE on infrared data

Unsupervised feature selection towards effective SVM-RFE on IR data is considered. Unsupervised feature selection is guided by spectral envelope functions of IR data. Spectral windows are induced from peaks of the spectral envelope functions. SVM-RFE is applied to individual spectral windows. Promising results are observed across three different NIR/MIR application domains.

Infrared spectroscopy data are characterized by a very large number of variables. Infrared spectroscopy in the mid-infrared (MIR) and near-infrared (NIR) bands is widely used in many fields. Handling this type of data effectively requires suitable dimensionality reduction methods. In this paper, a dimensionality reduction method designed to enable effective Support Vector Machine Recursive Feature Elimination (SVM-RFE) on NIR/MIR datasets is presented. The method exploits the information content at peaks of the spectral envelope functions that characterize NIR/MIR spectral datasets. Experimental evaluation across different NIR/MIR application domains shows that the proposed method is useful for inducing compact and accurate SVM classifiers for qualitative NIR/MIR applications with stringent interpretability or processing-time requirements.
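The window-induction step described above can be illustrated with a minimal sketch. This section does not define the spectral envelope function, so the sketch assumes a simple stand-in: the pointwise maximum across all spectra. The function names (`envelope`, `peak_indices`, `spectral_windows`) and the `half_width` parameter are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import List, Tuple


def envelope(spectra: List[List[float]]) -> List[float]:
    """Assumed stand-in for a spectral envelope: pointwise max over spectra."""
    return [max(column) for column in zip(*spectra)]


def peak_indices(env: List[float]) -> List[int]:
    """Indices of strict local maxima of the envelope (endpoints excluded)."""
    return [
        i for i in range(1, len(env) - 1)
        if env[i - 1] < env[i] > env[i + 1]
    ]


def spectral_windows(env: List[float], half_width: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) index ranges centred on each envelope peak.

    half_width is a hypothetical tuning parameter; windows are clipped
    to the valid index range of the spectrum.
    """
    last = len(env) - 1
    return [
        (max(0, p - half_width), min(last, p + half_width))
        for p in peak_indices(env)
    ]


# Two toy spectra sampled at ten wavelengths (illustrative values only).
toy_spectra = [
    [0, 1, 3, 1, 0, 0, 2, 5, 2, 0],
    [0, 2, 2, 1, 0, 1, 1, 4, 3, 0],
]
toy_envelope = envelope(toy_spectra)
toy_windows = spectral_windows(toy_envelope, half_width=1)
```

Under these assumptions, each resulting window would then be passed on its own to a feature ranker such as SVM-RFE, so that elimination runs over a few variables around each peak rather than over the full spectrum.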
