On consensus biomarker selection

Background

Recent developments in mass spectrometry technology have enabled the analysis of complex peptide mixtures. Considerable effort is currently devoted to identifying biomarkers in human body fluids such as serum or plasma, on which new diagnostic tests for various diseases could be based. Recent studies have used a variety of biomarker selection procedures. These procedures often produce different biomarker lists, and as a consequence the resulting patient classification may also vary.

Results

We propose a new approach to the biomarker selection problem: apply several competing feature ranking procedures and compute a consensus list of features from their outcomes. We validate our methods on two proteomic datasets for the diagnosis of ovarian and prostate cancer.

Conclusion

The proposed methodology can improve the classification results and at the same time provide a unified biomarker list for further biological examination and interpretation.
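To make the idea of a consensus list concrete, the sketch below merges several ranked feature lists with a Borda count. This is only an illustration: the abstract does not say which aggregation rule the authors use, so the Borda count is an assumption, and the function name `borda_consensus` and the feature labels (`mz_2450` and so on) are hypothetical.

```python
from collections import defaultdict


def borda_consensus(rankings):
    """Merge ranked feature lists into one consensus list (Borda count).

    Assumption: Borda count stands in for the aggregation rule, which the
    abstract does not specify. Each ranking lists features best-first.
    """
    features = set().union(*rankings)
    n = len(features)
    scores = defaultdict(float)
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            # The top feature earns n points, the next n - 1, and so on.
            scores[feature] += n - position
        # A feature absent from this ranking earns no points from it.
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(features, key=lambda f: (-scores[f], f))


# Hypothetical outputs of three feature ranking procedures.
rankings = [
    ["mz_2450", "mz_8100", "mz_4300", "mz_1200"],
    ["mz_8100", "mz_2450", "mz_1200", "mz_4300"],
    ["mz_2450", "mz_4300", "mz_8100", "mz_1200"],
]
consensus = borda_consensus(rankings)
print(consensus)
```

In this toy input the three procedures disagree on the order, yet the aggregated list still places the feature ranked first by two of them at the top. A top-k prefix of such a list could then serve as the unified biomarker list passed to a classifier.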
