Bioinformatic-driven search for metabolic biomarkers in disease

The search for and validation of novel disease biomarkers requires the combined strength of professional study planning and execution, modern profiling technologies, and the associated bioinformatics tools for data analysis and interpretation. Biomarkers have considerable impact on patient care and are urgently needed to advance the diagnosis, prognosis and treatment of disease. This survey article highlights emerging bioinformatics methods for biomarker discovery in clinical metabolomics, focusing on data preprocessing and consolidation and on the data-driven search, verification, prioritization and biological interpretation of putative metabolic candidate biomarkers in disease. In particular, it reviews data mining tools suitable for omic data gathered from the most frequently used types of experimental designs, such as case-control or longitudinal biomarker cohort studies, and delineates case examples of selected discovery steps in more detail. The review demonstrates that clinical bioinformatics has evolved into an essential element of biomarker discovery, translating innovations and successes in profiling technologies and bioinformatics into clinical application.
