Jerarca: Efficient Analysis of Complex Networks Using Hierarchical Clustering

Background

Extracting useful information from complex biological networks is a major goal in many fields, especially genomics and proteomics. We have shown in several works that iterative hierarchical clustering, as implemented in the UVCluster program, is a powerful tool for analyzing many of those networks. However, the computation time required by UVCluster analyses has significantly limited its use.

Methodology/Principal Findings

We describe the suite Jerarca, designed to efficiently convert networks of interacting units into dendrograms by means of iterative hierarchical clustering. Jerarca is divided into three main sections. First, weighted distances among units are computed using up to three different approaches: a more efficient version of UVCluster and two new, related algorithms called RCluster and SCluster. Second, Jerarca builds dendrograms from those distances using well-known phylogenetic algorithms, such as UPGMA or Neighbor-Joining. Finally, Jerarca provides optimal partitions of the trees using statistical criteria based on the distribution of intra- and intercluster connections. Outputs compatible with the phylogenetic software MEGA and the Cytoscape package are generated, so the results can be easily visualized.

Conclusions/Significance

Jerarca has four main advantages over UVCluster: 1) improved speed, through a new, more efficient version of the UVCluster algorithm; 2) additional, alternative strategies for iterative hierarchical clustering; 3) automatic evaluation of the hierarchical trees to obtain optimal partitions; and 4) outputs compatible with popular software such as MEGA and Cytoscape.
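To illustrate the second stage described above (building a dendrogram from pairwise distances with UPGMA), the following is a minimal sketch in plain Python, not Jerarca's own code. It assumes the distances among units have already been computed; the function name `upgma`, the `frozenset`-keyed distance dictionary, and the four-unit example are hypothetical choices made for illustration. The tree is returned as a Newick string, a text format commonly read by phylogenetic tree viewers.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree and return it as a Newick string.

    names: list of unique unit labels (hypothetical input format)
    dist:  dict mapping frozenset({a, b}) -> distance between units a and b
    """
    # Each active cluster: label -> (newick_subtree, number_of_units, node_height)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = dict(dist)  # working copy, extended as clusters are merged

    while len(clusters) > 1:
        # Pick the closest pair of active clusters (ties resolved by label order).
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        (ta, na, ha), (tb, nb, hb) = clusters.pop(a), clusters.pop(b)

        # The new internal node sits at half the distance between the two clusters.
        h = d[frozenset((a, b))] / 2.0
        new = f"({ta}:{h - ha:.3f},{tb}:{h - hb:.3f})"

        # UPGMA update: size-weighted average of the distances to the merged pair.
        for c in clusters:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[new] = (new, na + nb, h)

    (tree, _, _), = clusters.values()
    return tree + ";"


# Hypothetical distances among four units: A and B are close, C and D fairly close.
units = ["A", "B", "C", "D"]
example = {frozenset(p): 8.0 for p in combinations(units, 2)}
example[frozenset(("A", "B"))] = 2.0
example[frozenset(("C", "D"))] = 4.0
print(upgma(units, example))
```

The sketch rescans all cluster pairs at every merge, which is adequate for small examples only; it also does not cover the distance computation or the tree-partitioning stages.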
