Detecting Non-Uniform Clusters in Large-Scale Interaction Graphs

Graph clustering becomes more difficult as graph size and complexity increase. In interaction graphs in particular, clusters are small, and the underlying interaction data are not only complex but also noisy, owing to missing information and experimental error. Graphs representing such data consist of (possibly overlapping) clusters of non-uniform size, with some false-positive and false-negative links. In this article, we propose a new approach based on the assumption that clusters in protein-protein interaction (PPI) networks resemble corrupted cliques. Because the members of a clique all have the same number of links within it, the members of a corrupted clique should have roughly similar degrees; the problem can therefore be reduced to looking for clusters only among nodes of approximately similar degree. We implemented this idea using a soft version of the Farthest-Point-First (FPF) clustering algorithm, with the Jaccard distance function modified to handle slightly overlapping clusters. The resulting program, StripClust, was tested on a synthetic network and on the yeast PPI network.
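The idea of clustering only among nodes of similar degree can be illustrated with a minimal sketch. This is not the StripClust implementation: it uses the plain (hard) Farthest-Point-First heuristic and the standard Jaccard distance on closed neighborhoods, whereas the abstract describes a soft variant with a modified distance whose details are not given here. The function names, the degree bounds `lo` and `hi`, and the number of centers `k` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """Standard Jaccard distance: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def closed_neighborhoods(adj: Graph) -> Dict[int, Set[int]]:
    """Each node's neighbors plus the node itself (assumed representation)."""
    return {v: set(nbrs) | {v} for v, nbrs in adj.items()}


def degree_strip(adj: Graph, lo: int, hi: int) -> List[int]:
    """Nodes whose degree lies in [lo, hi]; the bounds are hypothetical."""
    return sorted(v for v in adj if lo <= len(adj[v]) <= hi)


def fpf_centers(nodes: List[int], nbhd: Dict[int, Set[int]], k: int) -> List[int]:
    """Plain Farthest-Point-First: repeatedly pick the node farthest
    from the centers chosen so far."""
    centers = [nodes[0]]
    dist = {v: jaccard_distance(nbhd[v], nbhd[centers[0]]) for v in nodes}
    while len(centers) < min(k, len(nodes)):
        nxt = max(nodes, key=lambda v: dist[v])
        centers.append(nxt)
        for v in nodes:
            dist[v] = min(dist[v], jaccard_distance(nbhd[v], nbhd[nxt]))
    return centers


def assign(nodes: List[int], nbhd: Dict[int, Set[int]],
           centers: List[int]) -> Dict[int, List[int]]:
    """Hard assignment of each node to its nearest center."""
    clusters: Dict[int, List[int]] = {c: [] for c in centers}
    for v in nodes:
        best = min(centers, key=lambda c: jaccard_distance(nbhd[v], nbhd[c]))
        clusters[best].append(v)
    return clusters


def cluster_strip(adj: Graph, lo: int, hi: int, k: int) -> Dict[int, List[int]]:
    """Cluster only the nodes inside one degree strip."""
    nodes = degree_strip(adj, lo, hi)
    if not nodes:
        return {}
    nbhd = closed_neighborhoods(adj)
    return assign(nodes, nbhd, fpf_centers(nodes, nbhd, k))
```

For example, on a graph made of one 4-clique with a single missing link (a corrupted clique) and one intact 4-clique, calling `cluster_strip(adj, 2, 3, 2)` separates the two groups, because nodes of the same corrupted clique still share most of their neighborhood. A soft version would instead allow a node to belong to more than one cluster.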
