Supplemental Methods For: Identifying Robust Communities and Multi-community Nodes by Combining Top-down and Bottom-up Approaches to Clustering

Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods typically detect only non-overlapping communities. The communities they detect may also be unstable and difficult to replicate, because these methods are sensitive to noise and to parameter settings. Together, these limitations restrict our ability to detect biological communities, and therefore our ability to understand biological functions. To address them and detect robust, overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy, which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections as well as on global information about the network structure. The method can quantify the stability of each community, automatically identify the number of communities, and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.
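The abstract does not spell out how local connections and global network information are combined. As an illustration only, the sketch below assumes one plausible rule: each node scores every label carried by its neighbours as the observed count minus the count expected from that label's network-wide frequency, then adopts the highest-scoring label. The function names (`update_label`, `cluster`) and parameters (`global_freq`, `n_iter`, `seed`) are hypothetical. The sketch also assumes an unweighted, undirected adjacency dict that lists every node as a key, and it uses asynchronous updates with global frequencies refreshed once per sweep. It is not the published SpeakEasy implementation and omits the stability and overlap estimation described above.

```python
import random
from collections import Counter


def update_label(node, labels, adjacency, global_freq, rng=random):
    """Pick the label most over-represented among `node`'s neighbours.

    Local information: how many neighbours currently carry each label.
    Global information: how many would carry it if neighbour labels simply
    followed each label's network-wide frequency (`global_freq`).
    This scoring rule is an assumption made for illustration.
    """
    neighbours = adjacency[node]
    if not neighbours:
        return labels[node]  # an isolated node keeps its own label
    observed = Counter(labels[n] for n in neighbours)
    k = len(neighbours)
    # Score = observed count minus the count expected from global frequency.
    scores = {lab: cnt - k * global_freq.get(lab, 0.0)
              for lab, cnt in observed.items()}
    best = max(scores.values())
    # Break exact ties at random so no label is systematically favoured.
    return rng.choice([lab for lab, s in scores.items() if s == best])


def cluster(adjacency, n_iter=20, seed=0):
    """Asynchronous label propagation using the local-vs-global score above."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    labels = {v: i for i, v in enumerate(nodes)}  # start: one label per node
    for _ in range(n_iter):
        # Refresh the global label frequencies once per sweep.
        counts = Counter(labels.values())
        global_freq = {lab: c / len(nodes) for lab, c in counts.items()}
        rng.shuffle(nodes)  # visit nodes in a random order each sweep
        for v in nodes:
            labels[v] = update_label(v, labels, adjacency, global_freq, rng)
    return labels
```

Because the expected count grows with a label's global frequency, labels that are common everywhere are penalised. A node therefore prefers a label that is concentrated in its own neighbourhood, even if another label is more frequent there in absolute terms. For example, for two 4-node cliques joined by a single edge, assigning each clique its own label is a fixed point of this rule.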
