Enumerating subgraph instances using map-reduce

The theme of this paper is how to find all instances of a given “sample” graph in a larger “data graph,” using a single round of map-reduce. For the simplest sample graph, the triangle, we improve upon the best known such algorithm. We then examine the general case, considering both the communication cost between mappers and reducers and the total computation cost at the reducers. To minimize communication cost, we exploit the techniques of [3] for computing multiway joins (evaluating conjunctive queries) in a single map-reduce round. We show several methods for translating a sample graph into a union of conjunctive queries with as few queries as possible. We also address the optimization of computation cost. Many serial algorithms are shown to be “convertible,” in the sense that it is possible to partition the data graph, explore each partition in a separate reducer, and have the total computation cost at the reducers be of the same order as the computation cost of the serial algorithm.
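To make the single-round setting concrete, the following is a minimal Python sketch of the plain multiway-join approach to triangle enumeration, simulated in one process. It is not the improved triangle algorithm the abstract announces. The function name `enumerate_triangles`, the bucket count `b`, and the bucket function `bucket` are all assumptions introduced for illustration, and nodes are assumed to be mutually comparable (for example, integers). The triangle is treated as the join E(X,Y) ∧ E(Y,Z) ∧ E(X,Z) with X < Y < Z, and each reducer is identified by a triple of buckets.

```python
from collections import defaultdict


def enumerate_triangles(edges, b=2):
    """Simulate one map-reduce round; return sorted triangles (x, y, z) with x < y < z."""

    def bucket(v):
        # Hypothetical bucket function: any fixed map from nodes to 0..b-1 works.
        return hash(v) % b

    # Map phase: ship each edge to every reducer where it could play
    # the role E(X,Y), E(Y,Z), or E(X,Z).
    reducers = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        x, y = (u, v) if u < v else (v, u)
        hx, hy = bucket(x), bucket(y)
        for w in range(b):
            reducers[(hx, hy, w)].add((x, y))  # role E(X,Y), Z bucket unknown
            reducers[(w, hx, hy)].add((x, y))  # role E(Y,Z), X bucket unknown
            reducers[(hx, w, hy)].add((x, y))  # role E(X,Z), Y bucket unknown

    # Reduce phase: each reducer (i, j, k) reports only triangles whose
    # nodes hash to exactly (i, j, k), so every triangle is emitted once.
    triangles = []
    for (i, j, k), local_edges in reducers.items():
        succ = defaultdict(set)  # smaller endpoint -> larger endpoints
        for x, y in local_edges:
            succ[x].add(y)
        for x, y in local_edges:
            if bucket(x) != i or bucket(y) != j:
                continue
            for z in succ[y]:
                if bucket(z) == k and z in succ[x]:
                    triangles.append((x, y, z))
    return sorted(triangles)
```

In this sketch each edge is sent to at most 3b of the b³ reducers, which is the communication cost that the choice of b trades against reducer-side work. Calling `enumerate_triangles([(1, 2), (2, 3), (1, 3)], b=2)` should report the single triangle `(1, 2, 3)`.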

[1] Christos Faloutsos, et al. DOULION: counting triangles in massive graphs with a coin, 2009, KDD.

[2] Jeffrey D. Ullman, et al. Cluster Computing, Recursion and Datalog, 2010, Datalog.

[3] Jeffrey D. Ullman, et al. Optimizing Multiway Joins in a Map-Reduce Environment, 2011, IEEE Transactions on Knowledge and Data Engineering.

[4] Jennifer Widom, et al. Constraint checking with partial information, 1994, PODS.

[5] Sergei Vassilvitskii, et al. Counting triangles and the curse of the last reducer, 2011, WWW.

[6] Charalampos E. Tsourakakis, et al. Colorful triangle counting and a MapReduce implementation, 2011, Inf. Process. Lett.

[7] N. Alon. On the number of subgraphs of prescribed type of graphs with a given number of edges, 1981.

[8] Madhav V. Marathe, et al. Subgraph Enumeration in Large Social Contact Networks Using Parallel Color Coding and Streaming, 2010, 39th International Conference on Parallel Processing.

[9] Jonathan Cohen, et al. Graph Twiddling in a MapReduce World, 2009, Computing in Science & Engineering.

[10] Noga Alon, et al. Biomolecular network motif counting and discovery by color coding, 2008, ISMB.

[11] M. Ancona, et al. Cluster computing, 2003, Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing.

[12] Andrzej Lingas, et al. Counting and detecting small subgraphs via equations and matrix multiplication, 2011, SODA '11.

[13] V. S. Subrahmanian, et al. A budget-based algorithm for efficient subgraph matching on Huge Networks, 2011, IEEE 27th International Conference on Data Engineering Workshops.

[14] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[15] Thomas Schank, et al. Algorithmic Aspects of Triangle-Based Network Analysis, 2007.

[16] P. Erdős. Asymmetric graphs, 2022.

[17] Dániel Marx, et al. Size Bounds and Query Plans for Relational Joins, 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.

[18] Noga Alon, et al. Finding and counting given length cycles, 1997, Algorithmica.

[19] V. S. Subrahmanian, et al. COSI: Cloud Oriented Subgraph Identification in Massive Social Networks, 2010, International Conference on Advances in Social Networks Analysis and Mining.

[20] M. A. Perles. On the number of subgraphs of prescribed type of graphs with a given number of edges, 2007.

[21] Z. Meral Özsoyoglu, et al. Some Results on the Containment and Minimization of (in) Equality Queries, 1994, Inf. Process. Lett.

[22] Jeffrey D. Ullman, et al. Map-reduce extensions and recursive queries, 2011, EDBT/ICDT '11.

[23] Jure Leskovec, et al. The life and death of online groups: predicting group growth and longevity, 2012, WSDM '12.