Differential profiling of volatile organic compound biomarker signatures utilizing a logical statistical filter-set and novel hybrid evolutionary classifiers

A growing body of work on molecular signatures has shown that volatile organic compounds (VOCs), the small molecules associated with an individual's odor and breath, can be monitored to reveal the identity and presence of a unique individual, as well as that individual's overall physiological status. To meet the analysis requirements of differential VOC profiling via gas chromatography/mass spectrometry (GC/MS), our group has developed a novel informatics platform, the Metabolite Differentiation and Discovery Lab (MeDDL). In its current version, MeDDL is a comprehensive tool for time-series spectral registration and alignment, visualization, comparative analysis, and machine learning that facilitates the efficient analysis of multiple large-scale biomarker discovery studies. The MeDDL toolset can identify a large differential subset of registered peaks whose intensities can be used as features for classification. This initial screening typically yields result sets that are too large to incorporate into a portable, electronic-nose-based system and that include VOCs not amenable to classification; consequently, it is also important to identify an optimal subset of these peaks, both to increase classification accuracy and to decrease the cost of the final system. MeDDL's learning tools include a classifier similar to a k-nearest-neighbor classifier, used in conjunction with a genetic algorithm (GA) that simultaneously optimizes the classifier and the subset of features. The GA uses receiver operating characteristic (ROC) curves to select for classifiers with maximal area under the ROC curve. Experimental results on over a dozen recognition problems show many examples of classifiers and feature sets that produce perfect ROC curves.
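The GA-driven feature selection described above can be illustrated with a minimal sketch. This is not MeDDL's implementation: the function names (`auc`, `knn_scores`, `fitness`, `evolve`), the leave-one-out scoring, the per-feature penalty of 0.001, and the GA settings are all assumptions made for illustration, and only the feature mask is evolved here (the neighbor count `k` is held fixed, whereas the abstract describes optimizing the classifier as well).

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(X, y, mask, k):
    """Leave-one-out score per sample: fraction of positive labels among
    its k nearest neighbors, using only the features selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    scores = []
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        scores.append(sum(label for _, label in dists[:k]) / k)
    return scores

def fitness(X, y, mask, k):
    """AUC of the masked classifier, minus an assumed small cost per feature."""
    if not any(mask):
        return 0.0
    return auc(knn_scores(X, y, mask, k), y) - 0.001 * sum(mask)

def evolve(X, y, k=3, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve a 0/1 feature mask with truncation selection, one-point
    crossover, and bit-flip mutation; the top half survives unchanged."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m, k), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
        pop = elite + children
    return max(pop, key=lambda m: fitness(X, y, m, k))

# Toy data (made up): column 0 separates the classes, columns 1-2 are noise.
X = [[0.0, 5, 1], [0.1, 1, 4], [0.2, 3, 2], [1.0, 4, 3], [1.1, 2, 5], [1.2, 5, 1]]
y = [0, 0, 0, 1, 1, 1]
best_mask = evolve(X, y, k=1, pop_size=10, generations=10, seed=1)
```

In a setting like the one described, each column of `X` would hold the intensity of one registered peak across samples, and the returned mask would mark the peaks retained for the classifier. The feature penalty is one simple way to favor smaller peak subsets when several masks reach the same AUC.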
