IncGraph: Incremental graphlet counting for topology optimisation

Motivation: Graphlets are small network patterns whose occurrences can be counted to characterise the structure (topology) of a network. A topology optimisation process could iteratively modify a network and track its graphlet counts in order to achieve certain topological properties. Until now, however, graphlets have not been suitable as a metric for topology optimisation: when millions of minor changes are made to the network structure, recalculating all graphlet counts after each edge modification becomes computationally intractable.

Results: IncGraph is a method for calculating the differences in graphlet counts with respect to the previous state of the network, which is much more efficient than recounting graphlet occurrences from scratch after every edge modification. Compared with static counting approaches, IncGraph reduces execution time by several orders of magnitude. We demonstrate the usefulness of this approach by developing a graphlet-based metric to optimise gene regulatory networks. Because IncGraph can quickly quantify the topological impact of small changes to a network, it opens novel research opportunities to study topological changes in evolving or online networks, or to develop graphlet-based criteria for topology optimisation.

Availability: IncGraph is freely available as an open-source R package on CRAN (incgraph). The development version is also available on GitHub (rcannood/incgraph).
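The incremental idea described in the abstract can be illustrated with a minimal Python sketch. It is not the IncGraph algorithm or the incgraph API: the function names `count_from_scratch` and `toggle_edge` are hypothetical, and the example assumes an undirected simple graph and tracks only the two connected three-node graphlets (triangles and open wedges, i.e. induced three-node paths). The point it shows is that the change in counts caused by one edge modification depends only on the neighbourhoods of the two endpoints, so it can be computed without revisiting the rest of the network.

```python
from itertools import combinations


def count_from_scratch(adj):
    """Recount triangles and open wedges by scanning every node's neighbour pairs."""
    triangles = 0
    wedges = 0
    for nbrs in adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                triangles += 1  # each triangle is seen once per corner
            else:
                wedges += 1     # open wedge centred on the current node
    return {"triangle": triangles // 3, "wedge": wedges}


def toggle_edge(adj, u, v):
    """Add edge (u, v) if absent, remove it if present; return the count deltas.

    Assumes u != v and that both nodes are already keys of adj.
    """
    removing = v in adj[u]
    if removing:
        adj[u].discard(v)
        adj[v].discard(u)
    # From here on the edge is absent, so the deltas describe adding it.
    common = len(adj[u] & adj[v])
    delta = {
        # every common neighbour closes one triangle
        "triangle": common,
        # new wedges centred on u and v, minus the wedges that became triangles
        "wedge": len(adj[u]) + len(adj[v]) - 3 * common,
    }
    if removing:
        delta = {name: -value for name, value in delta.items()}
    else:
        adj[u].add(v)
        adj[v].add(u)
    return delta


# Path 0-1-2-3: update the counts incrementally after adding edge (0, 2).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
counts = count_from_scratch(adj)
delta = toggle_edge(adj, 0, 2)
counts = {name: counts[name] + delta[name] for name in counts}
assert counts == count_from_scratch(adj)
```

In this sketch the full recount touches every node, whereas the update touches only the neighbours of `u` and `v`, which is where the savings come from when many single-edge modifications are evaluated in sequence.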
