Network enrichment analysis: extension of gene-set enrichment analysis to gene networks

Background: Gene-set enrichment analyses (GEA or GSEA) are commonly used for the biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in the form of gene interaction networks is now widely available, but GEA does not use this topological information, so there is a need for methods that exploit it in high-throughput data analysis.

Results: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and genes in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm that yields the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study.

Conclusions: The results indicate that the NEA method is more powerful than traditional GEA, primarily because the relationships between gene sets are captured more strongly by network connectivity than by simple overlaps.
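The idea described above, counting network links between an experimental gene-set and a functional category and comparing that count with its distribution over randomized networks, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the fast randomization algorithm, so the sketch substitutes the standard degree-preserving edge-swap procedure, and the function names (`count_links`, `rewire`, `nea_z_score`), the z-score summary, and the toy network are all hypothetical.

```python
import random
from statistics import mean, pstdev


def count_links(edges, set_a, set_b):
    """Count edges with one endpoint in set_a and the other in set_b."""
    n = 0
    for u, v in edges:
        if (u in set_a and v in set_b) or (v in set_a and u in set_b):
            n += 1
    return n


def _norm(u, v):
    """Store an undirected edge with its endpoints in a fixed order."""
    return (u, v) if u <= v else (v, u)


def rewire(edges, n_swaps, rng):
    """Randomize a simple undirected network by repeated edge swaps.

    Each accepted swap turns edges (a, b), (c, d) into (a, d), (c, b),
    so every node keeps its degree. This is a generic stand-in for the
    paper's randomization step (assumption), not the authors' algorithm.
    """
    edge_list = [_norm(u, v) for u, v in edges]
    present = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        e1, e2 = _norm(a, d), _norm(c, b)
        if e1 == e2 or e1 in present or e2 in present:
            continue  # would create a duplicate edge
        present.discard(edge_list[i])
        present.discard(edge_list[j])
        present.update((e1, e2))
        edge_list[i], edge_list[j] = e1, e2
    return edge_list


def nea_z_score(edges, gene_set, category, n_perm=200, seed=0):
    """Return (observed link count, null mean, z-score) for one category."""
    rng = random.Random(seed)
    observed = count_links(edges, gene_set, category)
    null = [
        count_links(rewire(edges, 10 * len(edges), rng), gene_set, category)
        for _ in range(n_perm)
    ]
    mu, sd = mean(null), pstdev(null)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    return observed, mu, z


# Toy network with made-up gene labels, for illustration only.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"),
             ("g5", "g6"), ("g6", "g1"), ("g1", "g3"), ("g2", "g5")]
print(nea_z_score(toy_edges, {"g1", "g2"}, {"g3", "g5"}))
```

Note that the experimental set and the category in the toy call share no genes, so a plain overlap statistic would be zero, while the link count can still be positive. That is the situation in which the abstract argues network connectivity carries extra information.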
