A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures

MapReduce is a suitable and efficient parallel programming pattern for big data analysis. In recent years, many frameworks and languages have implemented this pattern to achieve high performance in data mining applications, particularly on distributed memory architectures (e.g., clusters). However, processor manufacturers now offer powerful processing capabilities on single machines (e.g., multi-core systems), so these applications may also exploit parallelism at this architectural level. This paper targets code reuse and programming effort reduction, since current solutions do not provide a single interface that covers both architectural levels. We therefore propose a unified domain-specific language, together with transformation rules, for generating code for Hadoop and Phoenix++, which we selected as state-of-the-art MapReduce implementations for distributed and shared memory architectures, respectively. Our solution reduces programming effort by 41.84% to 95.43% without significant performance losses (below a 3% threshold) compared to Hadoop and Phoenix++.
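To give a sense of the target-level code such a unified interface abstracts away, the sketch below shows the standard word-count application written directly against the Hadoop MapReduce Java API. This is not the proposed DSL (its syntax is not reproduced here); it is the kind of back-end code that transformation rules would have to emit for the distributed memory case.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Map phase: tokenize each input line and emit (word, 1) pairs.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();
    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Job configuration: wire mapper, combiner, reducer, key/value types, and paths.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The explicit Writable key/value types, mapper/reducer class wiring, and job configuration shown above (and their Phoenix++ C++ counterparts for shared memory) are precisely the per-framework boilerplate that a single high-level MapReduce interface with code-generation rules can hide from the application programmer.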
