Integrative Bayesian Analysis of High-Dimensional Multi-platform Genomics Data

Motivation: Analyzing data from multi-platform genomics experiments combined with patients' clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches are limited in that they do not consider the fundamental biological relationships that exist among the data obtained from different platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms into one model.

Results: We assess the performance of our methods using several synthetic and real examples. Simulations show that our integrative methods have higher power to detect disease-related genes than non-integrative methods. Using The Cancer Genome Atlas glioblastoma dataset, we apply the iBAG model to integrate gene expression and methylation data and study their associations with patient survival. Our proposed method discovers multiple methylation-regulated genes that are related to patient survival, most of which have important biological functions in other diseases but have not been previously studied in glioblastoma.

Availability: http://odin.mdacc.tmc.edu/~vbaladan/

Contact: veera@mdanderson.org

Supplementary Information: Supplementary data are available at Bioinformatics online.
