Statistical Assessment of Crosstalk Enrichment between Gene Groups in Biological Networks

Motivation: Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important part of bioinformatics research. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for the functional annotation of experimental gene sets.

Results: We present CrossTalkZ, a statistical method and software tool that assesses the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate how reliably four different network randomization methods recover crosstalk within known biological pathways, and conclude that the methods preserving second-order topological network properties perform best. Finally, we show how CrossTalkZ can annotate experimental gene sets using known pathway annotations, and that it outperforms gene enrichment analysis (GEA) at this task.

Availability and Implementation: CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++. It is easy to use and fast, accepts various input file formats, and reports several statistics: z-score, p-value, false discovery rate, and a test of normality for the null distributions.
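The abstract does not give the underlying formulas, so what follows is only a minimal Python sketch of the general idea, not the CrossTalkZ C++ implementation. It rests on several assumptions. The network is treated as an undirected simple graph given as an edge list. Crosstalk between two groups is counted as the number of links joining them. The null distribution comes from degree-preserving double-edge swaps, which is one common randomization and not necessarily any of the four methods the paper evaluates. The p-value uses a one-sided normal approximation, and the false discovery rate uses the Benjamini-Hochberg procedure. All function names (`count_crosstalk`, `degree_preserving_shuffle`, `crosstalk_z`, `benjamini_hochberg`) and the toy network are hypothetical.

```python
# Hypothetical sketch, NOT CrossTalkZ's implementation. Assumes an undirected
# simple graph, degree-preserving edge swaps as the null model, a normal
# approximation for p-values, and Benjamini-Hochberg for the FDR.
import math
import random
from statistics import mean, stdev


def count_crosstalk(edges, group_a, group_b):
    """Number of undirected links joining a member of group_a to a member of group_b."""
    a, b = set(group_a), set(group_b)
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire a simple undirected graph with double-edge swaps.

    Each accepted swap replaces links (a, b) and (c, d) by (a, d) and (c, b),
    so every node keeps its degree. Swaps that would create self-loops or
    duplicate links are rejected.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # choose one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: self-loop or no change
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would duplicate an existing link
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def crosstalk_z(edges, group_a, group_b, n_random=100, seed=0):
    """Observed crosstalk, z-score against randomized networks, one-sided p-value."""
    rng = random.Random(seed)
    observed = count_crosstalk(edges, group_a, group_b)
    null = [
        count_crosstalk(degree_preserving_shuffle(edges, 10 * len(edges), rng),
                        group_a, group_b)
        for _ in range(n_random)
    ]
    mu, sd = mean(null), stdev(null)
    if sd == 0:
        return observed, float("nan"), float("nan")
    z = (observed - mu) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1): enrichment
    return observed, z, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from largest p-value to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy network: two groups wired as a complete bipartite block,
# plus a ring of 20 unrelated nodes.
a_group, b_group = list(range(5)), list(range(5, 10))
edges = [(u, v) for u in a_group for v in b_group]
ring = list(range(10, 30))
edges += [(ring[k], ring[(k + 1) % len(ring)]) for k in range(len(ring))]
observed, z, p = crosstalk_z(edges, a_group, b_group)
print(observed, round(z, 1), p)
print(benjamini_hochberg([p, 0.04, 0.5]))
```

In this toy network the observed crosstalk is 25 links, far above what degree-preserving rewiring produces, so the z-score is large and positive. The normal-tail p-value is only trustworthy when the randomized counts are roughly Gaussian. That is presumably why CrossTalkZ reports a test of normality for the null distributions.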
