A sampling framework for counting temporal motifs

Pattern counting in graphs is fundamental to many network science tasks, and there are many scalable methods for approximating counts of small patterns, often called motifs, in large graphs. However, modern graph datasets now contain richer structure, and incorporating temporal information in particular has become a critical part of network analysis. Temporal motifs, which generalize small subgraph patterns by imposing a temporal ordering on their edges, are an emerging part of the network analysis toolbox. Yet there are no algorithms for fast estimation of temporal motif counts; moreover, we show that even counting simple temporal star motifs is NP-complete. Thus, there is a need for fast, approximate algorithms. Here, we present the first frequency estimation algorithms for counting temporal motifs. More specifically, we develop a sampling framework that sits as a layer on top of existing exact counting algorithms and enables fast, accurate, and memory-efficient estimates of temporal motif counts. Our results show that we can achieve speedups of one to two orders of magnitude with minimal and controllable loss in accuracy on a number of datasets.
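The abstract describes the sampling layer only at a high level. As an illustration, here is a minimal Python sketch, not the authors' algorithm, of how a sampling layer can wrap an exact counter. It assumes a δ-temporal motif instance is a sequence of k edges with strictly increasing timestamps, spanning at most δ, that matches the motif's edges under one injective mapping of motif nodes to graph nodes. A brute-force enumerator stands in for the existing exact counting algorithms, and the wrapper is a generic Horvitz-Thompson estimator: it shifts a grid of windows of length c·δ by a random offset, keeps each window with probability p, counts exactly inside kept windows, and reweights each instance found by the inverse of its inclusion probability. The names `motif_instances`, `count_exact`, and `estimate_sampled`, and the parameters `p` and `c`, are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def _matches(combo, motif):
    """True if the time-ordered edges in `combo` realize the motif edges
    under one consistent, injective mapping of motif nodes to graph nodes."""
    fwd, bwd = {}, {}
    for (u, v, _), (a, b) in zip(combo, motif):
        for x, y in ((a, u), (b, v)):
            if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
                return False
    return True


def motif_instances(edges, motif, delta):
    """Brute-force exact enumeration (stand-in for a real exact counter):
    yield the time span of every delta-temporal motif instance.
    Assumption: timestamps within an instance must strictly increase."""
    edges = sorted(edges, key=lambda e: e[2])
    for combo in combinations(edges, len(motif)):  # keeps time order
        times = [t for _, _, t in combo]
        if any(a >= b for a, b in zip(times, times[1:])):
            continue
        span = times[-1] - times[0]
        if span <= delta and _matches(combo, motif):
            yield span


def count_exact(edges, motif, delta):
    return sum(1 for _ in motif_instances(edges, motif, delta))


def estimate_sampled(edges, motif, delta, p=0.5, c=4.0, rng=None):
    """Unbiased Horvitz-Thompson estimate from randomly shifted,
    independently sampled time windows of length c * delta."""
    if not (0 < p <= 1 and c > 1 and delta > 0):
        raise ValueError("need 0 < p <= 1, c > 1, delta > 0")
    rng = rng if rng is not None else random.Random()
    w = c * delta                  # window length, longer than any instance
    shift = rng.uniform(0, w)      # random offset of the window grid
    windows = defaultdict(list)
    for u, v, t in edges:
        windows[math.floor((t - shift) / w)].append((u, v, t))
    total = 0.0
    for window_edges in windows.values():
        if rng.random() >= p:      # keep each window with probability p
            continue
        # Exact counting runs only on the edges of a kept window.
        for span in motif_instances(window_edges, motif, delta):
            # P(instance fits in one window) = 1 - span / w; kept w.p. p.
            total += 1.0 / (p * (1.0 - span / w))
    return total


if __name__ == "__main__":
    edges = [(1, 2, 1), (2, 3, 2), (3, 1, 3),
             (1, 2, 10), (2, 3, 11), (3, 1, 20),
             (4, 5, 30), (5, 6, 31), (6, 4, 32)]
    triangle = [(0, 1), (1, 2), (2, 0)]   # a->b, then b->c, then c->a
    print(count_exact(edges, triangle, delta=5))  # 2
    rng = random.Random(0)
    runs = [estimate_sampled(edges, triangle, 5, rng=rng) for _ in range(2000)]
    print(sum(runs) / len(runs))  # close to 2 on average
```

Because every window is longer than δ, an instance of span d avoids all window boundaries with probability 1 − d/(cδ) under the random shift, and its window is kept with probability p; dividing by both keeps the estimate unbiased. Lowering p skips more windows (less exact-counting work) at the cost of higher variance, which is one way a speed/accuracy trade-off like the one the abstract describes can be made controllable.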
