Resource provisioning for memory-intensive graph processing

Graph processing has attracted much attention in recent years, particularly since the development of Google's Pregel, which was followed by the open-source counterparts Apache Giraph and GraphLab. These systems enable the distributed processing of large and complex graphs, such as web graphs and social networks. However, the efficacy of such distributed processing depends heavily on resource provisioning, even in clouds with increasingly abundant resources. In this paper, we present resource provisioning models for memory-intensive graph processing applications. In particular, we profile their memory usage patterns while considering their types and sizes. This profiling model makes it possible to determine the "right" number of resources and workers (or containers in a graph processing framework). Because the appropriate provisioning level depends on the user's objective, we further provide a model that identifies the Pareto frontier of resource provisioning trade-offs between performance and cost. We use a graph drawing application (GILA [4]), implemented on Apache Giraph and Hadoop YARN, as a case study. Experimental results demonstrate a performance increase of 15% to 35%, with a cost trade-off, through the optimization of worker count and the use of Pareto-optimal resource selection.
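The notion of a Pareto frontier over performance and cost can be illustrated with a minimal sketch. This is not the model used in the paper; it only shows the general idea of keeping the provisioning options that no other option beats on both runtime and cost. The `Config` type, the option names, and all numbers below are hypothetical values chosen for illustration.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """A hypothetical provisioning option (names and values are illustrative)."""
    name: str         # label for the option, e.g. a worker count
    runtime_s: float  # execution time in seconds (lower is better)
    cost: float       # monetary cost of the run (lower is better)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.runtime_s <= b.runtime_s and a.cost <= b.cost
    strictly_better = a.runtime_s < b.runtime_s or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return the non-dominated options, sorted by increasing cost."""
    frontier = [c for c in configs
                if not any(dominates(other, c) for other in configs)]
    return sorted(frontier, key=lambda c: c.cost)


# Made-up measurements, not results from the paper.
candidates = [
    Config("4 workers", 900.0, 1.0),
    Config("8 workers", 600.0, 1.6),
    Config("12 workers", 650.0, 2.4),  # slower and costlier than "8 workers"
    Config("16 workers", 480.0, 3.0),
]

frontier = pareto_frontier(candidates)
print([c.name for c in frontier])
```

In this toy data set, "12 workers" is dropped because "8 workers" is both faster and cheaper; the remaining options each represent a different trade-off, and the choice among them is left to the user's objective.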

[1] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.

[2] Archana Ganapathi, et al. Statistics-driven workload modeling for the Cloud, 2010, 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010).

[3] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[4] Hinge Antoine, et al. Distributed Graph Layout with Spark, 2015.

[5] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.

[6] Prashant J. Shenoy, et al. Resource overbooking and application profiling in shared hosting platforms, 2002, OSDI '02.

[7] Michael Burch, et al. Consistently GPU-Accelerated Graph Visualization, 2015, VINCI.

[8] Walter Didimo, et al. A Million Edge Drawing for a Fistful of Dollars, 2015, Graph Drawing.

[9] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.

[10] G. Bruce Berriman, et al. Scientific workflow applications on Amazon EC2, 2010, 2009 5th IEEE International Conference on E-Science Workshops.

[11] Joseph M. Hellerstein, et al. Distributed GraphLab: A Framework for Machine Learning in the Cloud, 2012, Proc. VLDB Endow.

[12] Walter Didimo, et al. A Distributed Multilevel Force-Directed Algorithm, 2016, IEEE Transactions on Parallel and Distributed Systems.

[13] Carlo Curino, et al. Apache Hadoop YARN: yet another resource negotiator, 2013, SoCC.

[14] Marco Mellia, et al. A Distributed Architecture for the Monitoring of Clouds and CDNs: Applications to Amazon AWS, 2014, IEEE Transactions on Network and Service Management.

[15] Luke M. Leslie, et al. Exploiting Performance and Cost Diversity in the Cloud, 2013, 2013 IEEE Sixth International Conference on Cloud Computing.

[16] Rio Yokota, et al. Scalable Force Directed Graph Layout Algorithms Using Fast Multipole Methods, 2012, 2012 11th International Symposium on Parallel and Distributed Computing.

[17] Holger Giese, et al. Implementing Graph Transformations in the Bulk Synchronous Parallel Model, 2014, Software Engineering & Management.

[18] G. Bruce Berriman, et al. An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2, 2012, Journal of Grid Computing.

[19] Emmanouel A. Varvarigos, et al. SuMo: Analysis and Optimization of Amazon EC2 Instances, 2014, Journal of Grid Computing.

[20] Walter Didimo, et al. A Distributed Force-Directed Algorithm on Giraph: Design and Experiments, 2016, ArXiv.

[21] Albert Y. Zomaya, et al. Profiling Applications for Virtual Machine Placement in Clouds, 2011, 2011 IEEE 4th International Conference on Cloud Computing.

[22] Filip Lievens, et al. Designing Pareto-optimal selection systems: formalizing the decisions required for selection system development, 2011, The Journal of applied psychology.

[23] Albert Y. Zomaya, et al. Executing Large Scale Scientific Workflow Ensembles in Public Clouds, 2015, 2015 44th International Conference on Parallel Processing.