Paper information - XRT: Programming-Language Independent MapReduce on Shared-Memory Systems

XRT: Programming-Language Independent MapReduce on Shared-Memory Systems

Increasing processor core-counts have created an opportunity for efficient parallel processing of large datasets on shared-memory systems. When compared to clusters of networked commodity hardware, shared-memory systems have the potential to provide better per-core performance, a more straightforward development environment and reduced operational overhead. This paper presents XRT, a high-performance and programming-language independent MapReduce runtime for shared-memory systems. XRT is built to be simple to use, pedantic about resource usage and capable of utilizing disk-based data structures for processing datasets too large to fit in memory. To our knowledge, XRT is the first MapReduce runtime explicitly designed for programming-language independent MapReduce. Moreover, XRT is the first MapReduce runtime for shared-memory systems taking advantage of disk-based data structures for processing datasets which cannot fit in memory. Benchmarks of three common data processing problems demonstrate the disk-based capabilities as well as the excellent speedup profile of XRT as system core-counts increase.

Herna Viktor | Erik G. Selin
