Cross-platform Analysis of Cancer Biomarkers: A Bayesian Network Approach to Incorporating Mass Spectrometry and Microarray Data

Many studies have reported inconsistent cancer biomarkers because of bioinformatics artifacts. In this paper, we use multiple data sets from microarrays, mass spectrometry, protein sequences, and other biological knowledge to improve the reliability of cancer biomarkers. We present a novel Bayesian network (BN) model that integrates and cross-annotates multiple data sets related to prostate cancer. The main contribution of this study is a method designed to find cancer biomarkers whose presence is supported by multiple data sources and biological knowledge. Relevant biological knowledge is explicitly encoded into the model parameters, and the biomarker-finding problem is formulated as a Bayesian inference problem. Besides diagnostic accuracy, we introduce reliability as another quality measure of the biological relevance of biomarkers. Based on the proposed BN model, we develop an empirical scoring scheme and a simulation algorithm for inferring biomarkers. Fourteen genes/proteins, including prostate-specific antigen (PSA), are identified as reliable serum biomarkers that are insensitive to the model assumptions. The computational results show that our method finds biologically relevant biomarkers with the highest reliability while maintaining competitive predictive power. In addition, combining biological knowledge and data from multiple platforms greatly reduces the number of putative biomarkers, allowing more focused clinical studies.
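
To make the idea of formulating biomarker finding as Bayesian inference concrete, the following is a minimal sketch, not the BN model of this paper. It assumes each data source (for example microarray, mass spectrometry, sequence annotation) contributes a likelihood ratio for a candidate gene/protein, and that the sources are conditionally independent given the candidate's true status. The function name, the prior, and all numbers are hypothetical and chosen only for illustration.

```python
from math import prod


def posterior_biomarker_probability(prior, likelihood_ratios):
    """Posterior probability that a candidate is a true biomarker.

    Assumption (illustrative only): each evidence source i supplies a
    likelihood ratio P(evidence_i | biomarker) / P(evidence_i | not biomarker),
    and the sources are conditionally independent, so the ratios multiply.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if any(lr < 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be non-negative")

    prior_odds = prior / (1.0 - prior)
    # Bayes' rule in odds form: posterior odds = prior odds * product of ratios.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical candidate: prior of 0.1, supported by two sources (ratios > 1)
# and mildly contradicted by a third (ratio < 1).
example = posterior_biomarker_probability(0.1, [4.0, 3.0, 0.75])
```

In this toy setting, a candidate supported by several sources ends up with a higher posterior than one supported by a single source, which mirrors the abstract's emphasis on biomarkers backed by multiple platforms. A full Bayesian network would also model dependencies between the sources, which this sketch ignores.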
