Improved Functional Enrichment Analysis of Biological Networks using Scalable Modularity Based Clustering

The past decade has seen rapid growth in the application of mathematical and computational tools for extracting insight from biological networks and, of particular interest here, for visualising the community structure within such networks. Clustering approaches have proven useful for uncovering structural and functional sub-groups within protein interaction networks. However, many commonly used clustering methods for identifying functionally relevant substructures within molecular networks do not perform well as network size increases. We tested the performance of algorithms in terms of both their ability to identify functionally relevant sub-clusters within networks of varying size and their computational performance. Our studies suggest that many algorithms perform well on smaller networks but fail to scale with network size. A spectral modularity-based clustering algorithm with a fine-tuning step provided both scalability and improved identification of clusters enriched for functional annotation (e.g. disease) in real proteomic interaction datasets.
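
As a minimal illustration of the quantity that modularity-based clustering seeks to maximise, the sketch below computes the standard Newman modularity Q of a given partition of an undirected, unweighted network. It is not the spectral algorithm or the fine-tuning step evaluated in this work; the toy graph (two triangles joined by a single edge) and the names `modularity`, `edges`, `split` and `merged` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity Q for an undirected, unweighted graph without self-loops.

    edges      -- list of (u, v) pairs, each undirected edge listed once
    membership -- dict mapping each node to a community label
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both ends in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            intra[membership[u]] += 1
    # Q = sum over communities of (fraction of intra-community edges)
    #     minus (expected fraction under random rewiring that preserves degrees)
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy network: two triangles joined by one bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in a single community
```

On this toy graph the two-triangle partition scores higher than the single-community partition, which is the behaviour a modularity-maximising method relies on when deciding whether a split is worthwhile.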
