CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters

The use of asymmetric multi-core processors with on-chip computational accelerators is becoming common in a variety of environments ranging from scientific computing to enterprise applications. The focus of current research has been on making efficient use of individual systems, and porting applications to asymmetric processors. In this paper, we take the next step by investigating the use of multi-core-based systems, especially the popular Cell processor, in a cluster setting. We present CellMR, an efficient and scalable implementation of the MapReduce framework for asymmetric Cell-based clusters. The novelty of CellMR lies in its adoption of a streaming approach to supporting MapReduce, and its adaptive resource scheduling schemes: Instead of allocating workloads to the components once, CellMR slices the input into small work units and streams them to the asymmetric nodes for efficient processing. Moreover, CellMR removes I/O bottlenecks by design, using a number of techniques, such as double-buffering and asynchronous I/O, to maximize cluster performance. Our evaluation of CellMR using typical MapReduce applications shows that it achieves 50.5% better performance compared to the standard nonstreaming approach, introduces a very small overhead on the manager irrespective of application input size, scales almost linearly with increasing number of compute nodes (a speedup of 6.9 on average, when using eight nodes compared to a single node), and adapts effectively the parameters of its resource management policy between applications with varying computation density.

[1]  Andrew J. Hutton,et al.  Lustre: Building a File System for 1,000-node Clusters , 2003 .

[2]  Francisco J. Cazorla,et al.  A Flexible Heterogeneous Multi-Core Architecture , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[3]  Alexandros Stamatakis,et al.  RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[4]  David A. Bader,et al.  FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine , 2007, HiPC.

[5]  Michael Lang,et al.  Entering the petaflop era: The architecture and performance of Roadrunner , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.

[6]  Christoforos E. Kozyrakis,et al.  Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[7]  Philip S. Yu,et al.  CellSort: High Performance Sorting on the Cell Processor , 2007, VLDB.

[8]  Robert Strzodka,et al.  Exploring weak scalability for FEM calculations on a GPU-enhanced cluster , 2007, Parallel Comput..

[9]  Fabrizio Petrini,et al.  Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[10]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[11]  Teresa H. Y. Meng,et al.  Merge: a programming model for heterogeneous multi-core systems , 2008, ASPLOS.

[12]  Jack J. Dongarra,et al.  The PlayStation 3 for High-Performance Scientific Computing , 2008, Computing in Science & Engineering.

[13]  Dimitrios S. Nikolopoulos,et al.  Dma-based prefetching for i/o-intensive workloads on the cell architecture , 2008, CF '08.

[14]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[15]  Karthikeyan Sankaralingam,et al.  MapReduce for the Cell B.E. Architecture , 2007 .

[16]  Michael Kistler,et al.  Accelerating computing with the cell broadband engine processor , 2008, Conf. Computing Frontiers.

[17]  Ravi Rajwar,et al.  The impact of performance asymmetry in emerging multicore architectures , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[18]  Marcin Zukowski,et al.  Vectorized data processing on the cell broadband engine , 2007, DaMoN '07.

[19]  Douglas Thain,et al.  Distributed computing in practice: the Condor experience , 2005, Concurr. Pract. Exp..

[20]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[21]  Dan Walsh,et al.  Design and implementation of the Sun network filesystem , 1985, USENIX Conference Proceedings.

[22]  Randy H. Katz,et al.  Improving MapReduce Performance in Heterogeneous Environments , 2008, OSDI.

[23]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[24]  Jason N. Dale,et al.  Cell Broadband Engine Architecture and its first implementation - A performance view , 2007, IBM J. Res. Dev..