iBAG: integrative Bayesian analysis of high-dimensional multiplatform genomics data

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current data integration approaches are limited in that they do not consider the fundamental biological relationships that exist among the data obtained from different platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses hierarchical modeling to combine the data obtained from multiple platforms into one model.

Results: We assess the performance of our methods using several synthetic and real examples. Simulations show our integrative methods to have higher power to detect disease-related genes than non-integrative methods. Using The Cancer Genome Atlas glioblastoma dataset, we apply the iBAG model to integrate gene expression and methylation data to study their associations with patient survival. Our proposed method discovers multiple methylation-regulated genes that are related to patient survival, most of which have important biological functions in other diseases but have not been previously studied in glioblastoma.

Availability: http://odin.mdacc.tmc.edu/~vbaladan/

Contact: veera@mdanderson.org

Supplementary information: Supplementary data are available at Bioinformatics online.
