DART: Denoising Algorithm based on Relevance network Topology improves molecular pathway activity inference

Background
Inferring molecular pathway activity is an important step towards reducing the complexity of genomic data, understanding the heterogeneity in clinical outcome, and obtaining molecular correlates of cancer imaging traits. Increasingly, approaches to pathway activity inference combine molecular profiles (e.g. gene or protein expression) with independent and highly curated structural interaction data (e.g. protein interaction networks) or, more generally, with prior-knowledge pathway databases. However, it is unclear how best to use the pathway knowledge information in the context of the molecular profiles of any given study.

Results
We present an algorithm called DART (Denoising Algorithm based on Relevance network Topology), which filters out noise before estimating pathway activity. Using simulated and real multidimensional cancer genomic data, and by comparing DART to other algorithms which do not assess the relevance of the prior pathway information, we demonstrate that substantial improvement in pathway activity predictions can be made if prior pathway information is denoised before predictions are made. We also show that genes encoding hubs in expression correlation networks represent more reliable markers of pathway activity. Using the NetPath resource of signalling pathways in the context of breast cancer gene expression data, we further demonstrate that DART leads to more robust inferences about pathway activity correlations. Finally, we show that DART identifies a hypothesized association between oestrogen signalling and mammographic density in ER+ breast cancer.

Conclusions
Evaluating the consistency of prior information of pathway databases in molecular tumour profiles may substantially improve the subsequent inference of pathway activity in clinical tumour specimens. This de-noising strategy should be incorporated in approaches which attempt to infer pathway activity from prior pathway models.
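The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea it describes: check whether the genes of a prior pathway signature actually co-vary in the data in the direction the prior predicts, discard those that do not, and give more weight to hub genes when scoring activity. The function name `denoised_pathway_activity`, the fixed correlation cut-off of 0.5, and the degree-based weighting are illustrative assumptions, not the published DART implementation.

```python
import numpy as np


def denoised_pathway_activity(expr, directions, corr_threshold=0.5):
    """Sketch of relevance-network denoising (assumed details, not the DART code).

    expr:       genes x samples array for the genes in one prior pathway signature
    directions: +1 / -1 per gene, the prior expectation (up / down on activation)
    Returns (activity score per sample, network degree per gene).
    """
    expr = np.asarray(expr, dtype=float)
    directions = np.asarray(directions, dtype=float)

    # Standardise each gene across samples so genes are comparable.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)

    # Gene-gene Pearson correlations; ignore self-correlation.
    corr = np.corrcoef(z)
    np.fill_diagonal(corr, 0.0)

    # Prior expectation: two genes regulated in the same direction should be
    # positively correlated, genes regulated in opposite directions negatively.
    expected_sign = np.outer(directions, directions)

    # Keep an edge only if it is strong (assumed threshold) and agrees with the prior.
    adjacency = (np.abs(corr) >= corr_threshold) & (np.sign(corr) == expected_sign)
    degree = adjacency.sum(axis=1)
    if degree.sum() == 0:
        raise ValueError("no prior-consistent edges; the signature looks irrelevant here")

    # Genes with no consistent edges get weight 0 (pruned); hubs get more weight.
    weights = degree / degree.sum()
    activity = (weights * directions) @ z
    return activity, degree


# Toy usage: three genes track a hidden activity, the fourth is pure noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
toy_expr = np.vstack([
    latent + 0.1 * rng.normal(size=200),
    latent + 0.1 * rng.normal(size=200),
    -latent + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
toy_activity, toy_degree = denoised_pathway_activity(toy_expr, [1, 1, -1, 1])
```

In this toy setting the noise gene ends up with no edges and therefore does not contribute to the score, which is the behaviour the denoising step is meant to achieve. A real analysis would need a statistically justified edge threshold rather than a fixed cut-off.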
