Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM

Motivation: High-throughput data is providing a comprehensive view of the molecular changes in cancer tissues. New technologies allow for the simultaneous genome-wide assay of the state of genome copy number variation, gene expression, DNA methylation and epigenetics of tumor samples and cancer cell lines. Analyses of current data sets find that genetic alterations can differ between patients but often involve common pathways. It is therefore critical to identify relevant pathways involved in cancer progression and detect how they are altered in different patients.

Results: We present a novel method for inferring patient-specific genetic activities incorporating curated pathway interactions among genes. A gene is modeled by a factor graph as a set of interconnected variables encoding the expression and known activity of a gene and its products, allowing the incorporation of many types of omic data as evidence. The method predicts the degree to which a pathway's activities (e.g. internal gene states, interactions or high-level ‘outputs’) are altered in the patient using probabilistic inference. Compared with a competing pathway activity inference approach called SPIA, our method identifies altered activities in cancer-related pathways with fewer false positives in both a glioblastoma multiforme (GBM) and a breast cancer dataset. PARADIGM identified consistent pathway-level activities for subsets of the GBM patients that are overlooked when genes are considered in isolation. Further, grouping GBM patients based on their significant pathway perturbations divides them into clinically relevant subgroups having significantly different survival outcomes. These findings suggest that therapeutics might be chosen that target genes at critical points in the commonly perturbed pathway(s) of a group of patients.

Availability: Source code is available at http://sbenz.github.com/Paradigm

Contact: jstuart@soe.ucsc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
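
To make the factor-graph idea in the Results concrete, the following is a minimal sketch of how one gene might be modeled as a chain of interconnected variables with omic measurements attached as evidence. It is not the PARADIGM implementation. The three-state discretization, the chain `dna -> mrna -> protein -> active`, the noise level `EPSILON`, and the names `agree` and `activity_posterior` are all assumptions made for illustration, and inference is done by brute-force enumeration rather than by whatever algorithm the released source code uses.

```python
from itertools import product

STATES = (-1, 0, 1)  # assumed discretization: down, unchanged, up
EPSILON = 0.05       # assumed noise level, illustrative only


def agree(child, parent, eps=EPSILON):
    """Factor value: high when child matches parent, low otherwise."""
    return 1.0 - eps if child == parent else eps / 2.0


def activity_posterior(copy_number_obs=None, expression_obs=None):
    """Posterior over one gene's hidden activity, by enumeration.

    Hypothetical hidden chain: dna -> mrna -> protein -> active.
    Observations, when given, attach to dna and mrna via noisy factors.
    """
    weights = {s: 0.0 for s in STATES}
    for dna, mrna, protein, active in product(STATES, repeat=4):
        w = 1.0  # uniform prior on the dna variable
        w *= agree(mrna, dna)
        w *= agree(protein, mrna)
        w *= agree(active, protein)
        if copy_number_obs is not None:
            w *= agree(copy_number_obs, dna)
        if expression_obs is not None:
            w *= agree(expression_obs, mrna)
        weights[active] += w
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


if __name__ == "__main__":
    # Amplified copy number and elevated expression observed for one patient.
    print(activity_posterior(copy_number_obs=1, expression_obs=1))
```

Calling `activity_posterior()` with no evidence returns a uniform distribution, while supplying observations shifts probability toward the matching activity state. In this toy chain the expression measurement sits closer to the activity variable than the copy number measurement does, so it dominates when the two disagree.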