Application of Graph Clustering and Visualisation Methods to Analysis of Biomolecular Data

In this paper we present an approach that integrates graph clustering and visualisation methods for semi-supervised discovery of biologically significant features in biomolecular data sets. We describe several clustering algorithms that have been custom-designed for the analysis of biomolecular data. They share an iterated two-step approach: thresholds and other clustering parameters are computed first, then connected graph components are identified and, if needed, the clustering parameters are adjusted for processing individual subgraphs.
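The iterated two-step scheme can be illustrated with a minimal sketch. It is not the authors' implementation; the abstract does not specify how thresholds are computed or adjusted. The sketch assumes a weighted similarity graph and a fixed threshold increment, and the names `threshold_cluster`, `max_size` and `step` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components (as sets) of an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def threshold_cluster(nodes, weights, threshold, max_size, step):
    """Hypothetical two-step clustering sketch.

    Step 1: keep only edges whose similarity is >= threshold.
    Step 2: split the graph into connected components; any component
            larger than max_size is re-clustered on its own with a
            stricter threshold (threshold + step).
    Assumes node labels are hashable and sortable, and step > 0.
    """
    if step <= 0:
        raise ValueError("step must be positive so the recursion terminates")
    nodes = set(nodes)
    kept = [
        (u, v)
        for (u, v), w in weights.items()
        if u in nodes and v in nodes and w >= threshold
    ]
    clusters = []
    for component in connected_components(sorted(nodes), kept):
        if len(component) > max_size and len(component) > 1:
            # Adjust the parameter for this subgraph only.
            clusters.extend(
                threshold_cluster(component, weights, threshold + step, max_size, step)
            )
        else:
            clusters.append(component)
    return clusters


# Toy similarity graph: two tight pairs joined by weak links.
weights = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.4,
    ("c", "d"): 0.85,
    ("d", "e"): 0.3,
}
clusters = threshold_cluster("abcde", weights, threshold=0.3, max_size=2, step=0.2)
```

With the initial threshold all five nodes form one component; because it exceeds `max_size`, it is reprocessed with a stricter threshold, which separates the two strongly linked pairs and leaves the weakly attached node on its own.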
