GPU-Accelerated Cloud Computing for Data-Intensive Applications

Recently, many large-scale data-intensive applications have emerged from Internet and scientific domains. These applications pose significant challenges to the performance, scalability, and programmability of existing data management systems. The challenges are even greater when such systems run on emerging parallel and distributed hardware and software platforms. In this chapter, we study the use of graphics processing units (GPUs) for MapReduce and general graph processing in the Cloud to support these data-intensive applications. In particular, we report our experiences in developing system prototypes and discuss open problems in the interplay between data-intensive applications and system platforms.
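To make the MapReduce side of this discussion concrete, the following is a minimal sketch of word count organized the way a GPU might execute it. It assumes a count-then-write scheme, which is a common way for many threads to emit map output concurrently without locks. Pass 1 counts each record's output, an exclusive prefix sum turns those counts into private write offsets, and pass 2 writes into disjoint slices. The code runs sequentially in plain Python: each per-record loop iteration stands in for an independent GPU thread. The function names `map_count`, `map_emit`, and `gpu_style_word_count` are hypothetical and do not belong to any particular framework.

```python
from itertools import accumulate, groupby
from operator import itemgetter


def map_count(record):
    """Pass 1 (hypothetical): how many (key, value) pairs this record will emit."""
    return len(record.split())


def map_emit(record, out, offset):
    """Pass 2 (hypothetical): write (word, 1) pairs starting at a precomputed offset."""
    for i, word in enumerate(record.split()):
        out[offset + i] = (word, 1)


def gpu_style_word_count(records):
    # Pass 1: each "thread" sizes its own output independently.
    counts = [map_count(r) for r in records]
    # Exclusive prefix sum: record i writes at offsets[i], so slices never overlap
    # and no locks are needed when the map outputs are written.
    offsets = list(accumulate(counts, initial=0))[:-1]
    out = [None] * sum(counts)
    # Pass 2: each "thread" fills its own disjoint slice of the output buffer.
    for record, offset in zip(records, offsets):
        map_emit(record, out, offset)
    # Shuffle: sort by key so that equal keys become contiguous.
    out.sort(key=itemgetter(0))
    # Reduce: one reduction per contiguous key group.
    return {key: sum(v for _, v in group)
            for key, group in groupby(out, key=itemgetter(0))}
```

For example, `gpu_style_word_count(["a b a", "b c"])` yields `{"a": 2, "b": 2, "c": 1}`. Sorting to group keys, rather than using a shared hash table, keeps the shuffle phase free of concurrent writes to shared buckets. On a real GPU the same idea would typically rely on a parallel prefix sum and a parallel sort.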
