Definition of Valid Proteomic Biomarkers: A Bayesian Solution

Clinical proteomics suffers from high expectations raised by reports of apparent biomarkers, most of which could not subsequently be substantiated through validation. This has brought into focus the need for improved methods of identifying a panel of clearly defined biomarkers. To examine this problem, urinary proteome data were collected from healthy adult males and females and analysed to find biomarkers that differentiate between the sexes. We argue that models incorporating sparsity in the variables are desirable for biomarker selection: proteomics data typically contain a very large number of variables (peptides) but few samples, which can make the selection process unstable. This motivates the use of a two-level hierarchical Bayesian probit regression model for variable selection, with a prior that favours sparsity. The classification performance of this method is shown to improve on that of the Probabilistic K-Nearest Neighbour model.
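
The abstract does not specify the sampler, so what follows is only a minimal sketch under stated assumptions of how a two-level hierarchical Bayesian probit model with a sparsity-favouring prior can be fitted. It assumes Albert and Chib latent-variable data augmentation for the probit likelihood, a zero-mean normal prior on each peptide coefficient whose variance has an exponential hyperprior (the normal-exponential, i.e. Laplace, hierarchy known from the Bayesian lasso), and a fixed shrinkage parameter `lam`. This is one common way to build such a prior, not necessarily the exact prior used in this work; the function names `sample_latent` and `probit_lasso_gibbs` and all default settings are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_latent(mean, y, rng):
    """Draw z_i ~ N(mean_i, 1), truncated to z_i > 0 if y_i = 1 and z_i <= 0 if y_i = 0."""
    z = np.empty_like(mean, dtype=float)
    for i, (m, label) in enumerate(zip(mean, y)):
        p0 = _STD_NORMAL.cdf(-m)  # P(z_i <= 0) before truncation
        lo, hi = (p0, 1.0) if label == 1 else (0.0, p0)
        u = min(max(rng.uniform(lo, hi), 1e-12), 1 - 1e-12)  # keep inv_cdf inside (0, 1)
        z[i] = m + _STD_NORMAL.inv_cdf(u)  # inverse-CDF draw from the truncated region
    return z


def probit_lasso_gibbs(X, y, lam=1.0, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for probit regression with a hypothetical normal-exponential sparsity prior.

    Level 1: beta_j | tau2_j ~ N(0, tau2_j); level 2: tau2_j ~ Exp(lam**2 / 2).
    Returns post-burn-in draws of beta with shape (n_iter - burn_in, p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    tau2 = np.ones(p)
    XtX = X.T @ X
    draws = []
    for it in range(n_iter):
        # 1. Latent utilities given current coefficients (data augmentation step).
        z = sample_latent(X @ beta, y, rng)
        # 2. beta | z, tau2 ~ N(P^{-1} X^T z, P^{-1}) with precision P = X^T X + diag(1/tau2).
        prec = XtX + np.diag(1.0 / tau2)
        chol = np.linalg.cholesky(prec)  # P = L L^T
        mean = np.linalg.solve(prec, X.T @ z)
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(p))  # L^{-T} eps has cov P^{-1}
        # 3. 1/tau2_j | beta_j ~ InverseGaussian(lam / |beta_j|, lam^2); small variances shrink beta_j.
        inv_tau2 = rng.wald(lam / np.maximum(np.abs(beta), 1e-10), lam ** 2)
        tau2 = 1.0 / inv_tau2
        if it >= burn_in:
            draws.append(beta.copy())
    return np.array(draws)
```

In use, candidate biomarkers could be ranked by the magnitude of the posterior mean coefficient, or by how consistently a peptide's coefficient keeps the same sign across draws; peptides whose coefficients are shrunk towards zero are treated as uninformative. Fixing `lam` keeps the sketch short; a fuller treatment would place a hyperprior on it or tune it.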