Abstract: Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop
Analyzing patterns in large-scale graphs, such as social networks (e.g. Facebook, LinkedIn, Twitter), has many applications, including community identification, blog analysis, and intrusion and spam detection. Currently, it is impossible to process large-scale graphs with millions or even billions of edges on a single computer. In this paper, we take advantage of MapReduce, a programming model for processing large datasets, to detect important graph patterns using open-source Hadoop on Amazon EC2. The aim of this paper is to show how MapReduce-based cloud computing scales on real-world data when applied to graph pattern detection. We implement Cohen's MapReduce graph algorithms to enumerate patterns including triangles, rectangles, trusses and barycentric clusters, using real-world data taken from Stanford SNAP. In addition, we create a visualization algorithm to display the detected graph patterns. We also discuss the performance of the MapReduce graph algorithms.
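To make the triangle enumeration mentioned above concrete, the following is a minimal in-memory sketch of a two-round MapReduce job in Python. It is not the paper's Hadoop implementation. The helper `run_mapreduce` and all function names are hypothetical, and the sketch assumes a simple undirected graph whose vertices are comparable integers. As far as we understand it, Cohen's algorithm bins each edge under its lower-degree endpoint to balance load; for brevity this sketch bins by the lower vertex id instead.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Round 1: bin each edge under its lower-numbered endpoint, then emit
# "open triads" (pairs of neighbors that share that endpoint as apex).
def map_edge(edge):
    u, v = sorted(edge)
    if u != v:  # assumption: self-loops are ignored
        yield u, v


def reduce_triads(apex, neighbors):
    for a, b in combinations(sorted(set(neighbors)), 2):
        yield ("triad", (a, b), apex)


# Round 2: join open triads with the edge list on the vertex pair.
def map_join(record):
    if record[0] == "triad":
        _, pair, apex = record
        yield pair, ("apex", apex)
    else:
        _, edge = record
        yield tuple(sorted(edge)), ("edge", None)


def reduce_triangles(pair, values):
    # A triad closes into a triangle only if the pair is itself an edge.
    if any(tag == "edge" for tag, _ in values):
        for tag, apex in values:
            if tag == "apex":
                yield (apex, pair[0], pair[1])


def enumerate_triangles(edges):
    edges = list(edges)
    triads = run_mapreduce(edges, map_edge, reduce_triads)
    tagged = triads + [("edge", e) for e in edges]
    return run_mapreduce(tagged, map_join, reduce_triangles)


if __name__ == "__main__":
    sample = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(enumerate_triangles(sample))
```

Because the apex of every triad is the smallest vertex of the candidate triangle, each triangle is reported exactly once. On a real cluster the two rounds would be separate Hadoop jobs, and the grouping step would be performed by the shuffle phase rather than by an in-memory dictionary.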
[1] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[2] Charalampos E. Tsourakakis. Fast counting of triangles in real-world networks: proofs, algorithms and observations, 2008.
[3] Jimmy J. Lin, et al. Design patterns for efficient graph algorithms in MapReduce, 2010, MLG '10.
[4] Jonathan Cohen, et al. Graph Twiddling in a MapReduce World, 2009, Computing in Science & Engineering.