Extending the Applicability of Graphlets to Directed Networks

With recent advances in high-throughput cell biology, the amount of cellular biological data has grown drastically. Such data are often modeled as graphs (also called networks), and studying these graphs can lead to new insights into molecule-level organization. One way to understand their structure is to analyze the smaller components that constitute them, namely network motifs and graphlets. Graphlets are particularly well suited to comparing networks and assessing their similarity because of the rich topological information they offer. However, they are almost always used as small undirected graphs of up to five nodes, which limits their applicability to directed networks. Yet many interesting biological networks, such as metabolic, cell signaling, and transcriptional regulatory networks, are intrinsically directional, and metrics that ignore edge direction may gravely hinder information extraction. Our main purpose in this work is to extend the applicability of graphlets to directed networks by taking edge direction into account, thus providing a powerful basis for the analysis of directed biological networks. We tested our approach on two network sets, one composed of synthetic graphs and another of real directed biological networks, and verified that the networks were grouped more accurately using directed graphlets than undirected graphlets. Directed graphlets also offer substantially more topological information than simple graph metrics such as degree distribution or reciprocity. Enumerating graphlets in large networks is, however, a computationally demanding task. Our implementation addresses this concern by using a state-of-the-art data structure, the g-trie, which greatly reduces the necessary computation. We compared our tool to other state-of-the-art methods and verified that it is the fastest general tool for graphlet counting.
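
To illustrate why edge direction matters, the following minimal sketch counts connected three-node induced subgraphs by isomorphism class, once with direction and once without. It is a brute-force illustration only and does not use the g-trie; the function names `canonical_form` and `count_3node_subgraphs` and the toy edge list are assumptions made for this example, not part of the authors' tool.

```python
from itertools import combinations, permutations


def canonical_form(nodes, edges):
    """Return the smallest sorted edge tuple over all relabelings of `nodes`.

    Two small subgraphs are isomorphic exactly when their canonical forms match.
    Brute force over permutations is acceptable only for very small subgraphs.
    """
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        form = tuple(sorted((relabel[u], relabel[v]) for u, v in edges))
        if best is None or form < best:
            best = form
    return best


def count_3node_subgraphs(edge_list, directed=True):
    """Count connected 3-node induced subgraphs, grouped by isomorphism class."""
    edges = set()
    for u, v in edge_list:
        if u == v:
            continue  # ignore self-loops
        edges.add((u, v))
        if not directed:
            edges.add((v, u))  # treat every edge as bidirectional
    nodes = sorted({n for e in edges for n in e})

    counts = {}
    for trio in combinations(nodes, 3):
        induced = [(u, v) for u in trio for v in trio if (u, v) in edges]
        # Three nodes are (weakly) connected iff at least two of the
        # three unordered node pairs are adjacent.
        if len({frozenset(e) for e in induced}) < 2:
            continue
        key = canonical_form(trio, induced)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Toy network: a feed-forward loop (a, b, c) and a directed cycle (x, y, z).
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("x", "y"), ("y", "z"), ("z", "x")]

directed_counts = count_3node_subgraphs(toy_edges, directed=True)
undirected_counts = count_3node_subgraphs(toy_edges, directed=False)

print(len(directed_counts), "directed classes")
print(len(undirected_counts), "undirected classes")
```

In this toy network the directed count separates the feed-forward loop from the cycle, whereas the undirected count sees two identical triangles. This loss of information is what motivates taking edge direction into account.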
