论文信息 - Large graph processing in the cloud

Large graph processing in the cloud

As the study of graphs, such as web and social graphs, becomes increasingly popular, the requirements of efficiency and programming flexibility of large graph processing tasks challenge existing tools. We propose to demonstrate Surfer, a large graph processing engine designed to execute in the cloud. Surfer provides two basic primitives for programmers - MapReduce and propagation. MapReduce, originally developed by Google, processes different key-value pairs in parallel, and propagation is an iterative computational pattern that transfers information along the edges from a vertex to its neighbors in the graph. These two primitives are complementary in graph processing. MapReduce is suitable for processing flat data structures, such as vertex-oriented tasks, and propagation is optimized for edge-oriented tasks on partitioned graphs. To further improve the programmability of large graph processing, Surfer consists of a small set of high level building blocks that use these two primitives. Developers may also construct custom building blocks. Surfer further provides a GUI (Graphical User Interface) using which developers can visually create large graph processing tasks. Surfer transforms a task into an execution plan composed of MapReduce and propagation operations. It then automatically applies various optimizations to improve the efficiency of distributed execution. Surfer also provides a visualization tool to monitor the detailed execution dynamics of the execution plan to show the interesting tradeoffs between MapReduce and propagation. We demonstrate our system in two ways: first, we demo the ease-of-programming features of the system; second, we show the efficiency of the system with a series of applications on a social network. We find that Surfer is simple to use and is highly efficient for large graph-based tasks.

[1] Vipin Kumar,et al. Parallel Multilevel k-way Partitioning Scheme for Irregular Graphs , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[2] Vipin Kumar,et al. A Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering , 1998, J. Parallel Distributed Comput..

[3] Rajeev Motwani,et al. The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[4] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[5] Gediminas Adomavicius,et al. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions , 2005, IEEE Transactions on Knowledge and Data Engineering.

[6] Danyel Fisher,et al. Visualizing the Signatures of Social Roles in Online Discussion Groups , 2007, J. Soc. Struct..

[7] Yuan Yu,et al. Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[8] Charalampos E. Tsourakakis,et al. HADI : Fast Diameter Estimation and Mining in Massive Graphs with Hadoop , 2008 .

[9] Jingren Zhou,et al. SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[10] Christos Faloutsos,et al. PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[11] Bingsheng He,et al. Wave Computing in the Cloud , 2009, HotOS.