Efficient Feature Selection for PTR-MS Fingerprinting of Agroindustrial Products

We recently introduced the Random Forest - Recursive Feature Elimination (RF-RFE) algorithm for feature selection. In this paper we apply it to identify the relevant features in the spectra (fingerprints) produced by Proton Transfer Reaction - Mass Spectrometry (PTR-MS) analysis of four agro-industrial products: two datasets of berry cultivars and two of typical cheeses, all from northern Italy. We compare the method with the more traditional Support Vector Machine - Recursive Feature Elimination (SVM-RFE), extended to handle multiclass problems. Using replicated experiments, we estimate unbiased generalization errors for both methods. We also analyze the stability of the two methods and find that RF-RFE is more stable than SVM-RFE when selecting small subsets of features. Our results further show that, on PTR-MS datasets, RF-RFE outperforms SVM-RFE at finding small subsets of features with high discrimination levels.
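The recursive elimination loop shared by RF-RFE and SVM-RFE can be illustrated with a minimal sketch. This is not the authors' implementation: the `class_separation` score is a toy stand-in for the random forest (or SVM) importance ranking, and the names `rfe`, `n_keep`, and `drop_fraction` are hypothetical, chosen only for illustration.

```python
from statistics import mean


def class_separation(X, y, features):
    """Toy importance score: spread of the per-class means of each feature.

    Assumption: this is only a stand-in for the ranking that a random
    forest or an SVM would supply in RF-RFE or SVM-RFE.
    """
    labels = sorted(set(y))
    scores = {}
    for f in features:
        class_means = [
            mean(row[f] for row, lab in zip(X, y) if lab == c) for c in labels
        ]
        scores[f] = max(class_means) - min(class_means)
    return scores


def rfe(X, y, n_keep, importance=class_separation, drop_fraction=0.5):
    """Re-score the surviving features, drop the weakest, repeat.

    Stops as soon as exactly n_keep features remain.
    """
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        scores = importance(X, y, features)
        ranked = sorted(features, key=lambda f: scores[f], reverse=True)
        n_drop = max(1, int(len(features) * drop_fraction))
        n_next = max(n_keep, len(features) - n_drop)
        features = sorted(ranked[:n_next])
    return features


# Made-up three-class data: only columns 1 and 3 separate the classes.
X = [
    [0.1, 1.0, 0.5, 0.0],
    [0.2, 1.1, 0.4, 0.1],
    [0.1, 3.0, 0.5, 2.0],
    [0.2, 3.1, 0.6, 2.1],
    [0.1, 5.0, 0.5, 4.0],
    [0.2, 5.1, 0.4, 4.1],
]
y = ["a", "a", "b", "b", "c", "c"]

selected = rfe(X, y, n_keep=2)
print(selected)
```

Because the ranking is recomputed on the surviving features at every step, the importance of a feature can change as others are removed; this is what distinguishes recursive elimination from a single ranking pass. To avoid the selection bias that would otherwise inflate accuracy estimates, a loop like this would be run inside each resampling replicate rather than once on the full dataset.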
