Evaluating the Performance and Scalability of MapReduce Applications on X10

MapReduce has been shown to be a simple and efficient way to harness the massive resources of clusters. Recently, researchers have proposed using languages and runtimes based on the partitioned global address space (PGAS) model to ease the programming of large-scale clusters. In this paper, we present an empirical study of how effectively MapReduce applications run on X10, a typical PGAS language runtime. By tuning the performance of two applications on the X10 platform, we eliminate several performance bottlenecks related to I/O processing. We also identify the problems that remain and propose approaches to remedying them. Our final performance evaluation on a small-scale multicore cluster shows that the MapReduce applications written in X10 notably outperform their Hadoop counterparts in most cases. Detailed analysis reveals that the main performance advantages come from simplified task management and a simplified data storage scheme.
