Inexact subgraph isomorphism in MapReduce

Inexact subgraph matching based on type-isomorphism was introduced by Berry et al. [J. Berry, B. Hendrickson, S. Kahan, P. Konecny, Software and algorithms for graph queries on multithreaded architectures, in: Proc. IEEE International Parallel and Distributed Computing Symposium, IEEE, 2007, pp. 1-14] as a generalization of the exact subgraph matching problem. Enumerating small subgraph patterns in very large graphs is a core problem in the analysis of social networks, bioinformatics data sets, and other applications. This paper describes a MapReduce algorithm for subgraph type-isomorphism matching. The MapReduce computing framework is designed for distributed computing on massive data sets, and the new algorithm leverages MapReduce techniques to enable processing of graphs with billions of vertices. The paper also introduces a new class of walk-level constraints for narrowing the set of matches. Constraints meeting criteria defined in the paper are useful for specifying more precise patterns and for improving algorithm performance. Results are provided on a variety of graphs, with size ranging up to billions of vertices and edges, including graphs that follow a power law degree distribution.

[1]  Jonathan W. Berry,et al.  Software and Algorithms for Graph Queries on Multithreaded Architectures , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[2]  Christos Faloutsos,et al.  Fast best-effort pattern matching in large attributed graphs , 2007, KDD '07.

[3]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[4]  Jonathan Cohen,et al.  Graph Twiddling in a MapReduce World , 2009, Computing in Science & Engineering.

[5]  Lawrence B. Holder,et al.  Substructure Discovery Using Minimum Description Length and Background Knowledge , 1993, J. Artif. Intell. Res..

[6]  Jonathan W. Berry,et al.  Listing triangles in expected linear time on a class of power law graphs. , 2010 .

[7]  Lawrence B. Holder,et al.  Graph-Based Data Mining , 2000, IEEE Intell. Syst..

[8]  Sherry Marcus,et al.  Graph-based technologies for intelligence analysis , 2004, CACM.

[9]  Christos Faloutsos,et al.  R-MAT: A Recursive Model for Graph Mining , 2004, SDM.

[10]  Krishna P. Gummadi,et al.  Measurement and analysis of online social networks , 2007, IMC '07.

[11]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[12]  Christos Faloutsos,et al.  Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication , 2005, PKDD.

[13]  George Karypis,et al.  Finding Frequent Patterns in a Large Sparse Graph* , 2005, Data Mining and Knowledge Discovery.

[14]  Madhav V. Marathe,et al.  SAHAD: Subgraph Analysis in Massive Networks Using Hadoop , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.

[15]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[16]  Nils J. Nilsson,et al.  Principles of Artificial Intelligence , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Horst Bunke,et al.  Inexact graph matching for structural pattern recognition , 1983, Pattern Recognit. Lett..

[18]  Tamara G. Kolda,et al.  An In-depth Study of Stochastic Kronecker Graphs , 2011, 2011 IEEE 11th International Conference on Data Mining.

[19]  Huajun Chen,et al.  MapReduce-Based Pattern Finding Algorithm Applied in Motif Detection for Prescription Compatibility Network , 2009, APPT.

[20]  Dennis Shasha,et al.  Algorithmics and applications of tree and graph searching , 2002, PODS.

[21]  V. S. Subrahmanian,et al.  COSI: Cloud Oriented Subgraph Identification in Massive Social Networks , 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[22]  V. S. Subrahmanian,et al.  DOGMA: A Disk-Oriented Graph Matching Algorithm for RDF Databases , 2009, SEMWEB.

[23]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[24]  Christos Faloutsos,et al.  Kronecker Graphs: An Approach to Modeling Networks , 2008, J. Mach. Learn. Res..

[25]  Steven J. Plimpton,et al.  MapReduce in MPI for Large-scale graph algorithms , 2011, Parallel Comput..

[26]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[27]  Jian Pei,et al.  Data Mining: Concepts and Techniques, 3rd edition , 2006 .