Enhancing X10 performance by auto-tuning the managed Java back-end

X10 is a programming language designed specifically with productivity and scalability in mind. In the era of distributed multi-core systems, X10 gives programmers the high-level abstraction that such systems require. In this paper we present an auto-tuning solution that enhances the performance of X10 programs that use the Java back-end. Our auto-tuner is based on OpenTuner, an extensible framework for building auto-tuning applications. We report improved running times for the benchmark programs shipped with X10 and for the well-known LULESH benchmark. The auto-tuning experiments recorded a maximum performance improvement of 50% for LULESH, while the average improvement across the set of benchmarks is 25%. We analyze the internal changes that a Java Virtual Machine (JVM) undergoes as a result of our auto-tuning. Finally, a study of the input sensitivity of the tuned programs shows that the tuned JVM configurations continue to deliver enhanced performance as the input size of a program varies.
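To make the idea of tuning JVM configurations concrete, the following is a minimal sketch of an empirical search over JVM flags. It does not use OpenTuner and is not the tuner described in this paper: the search space, the plain random search, and the names `SEARCH_SPACE`, `to_jvm_flags`, `random_search`, and `fake_measure` are all assumptions made for illustration. The flags shown are standard HotSpot options, picked arbitrarily rather than taken from the paper's results.

```python
import random

# Hypothetical search space: a handful of HotSpot options chosen for
# illustration only; the paper's actual parameter set is not given here.
SEARCH_SPACE = {
    "heap_mb": [512, 1024, 2048, 4096],
    "gc": ["-XX:+UseParallelGC", "-XX:+UseG1GC", "-XX:+UseSerialGC"],
    "tiered": [True, False],
    "new_ratio": [1, 2, 4, 8],
}


def sample_config(rng):
    """Pick one value per parameter uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def to_jvm_flags(config):
    """Translate a configuration dictionary into JVM command-line flags."""
    tiered = "-XX:+TieredCompilation" if config["tiered"] else "-XX:-TieredCompilation"
    return [
        f"-Xmx{config['heap_mb']}m",
        config["gc"],
        tiered,
        f"-XX:NewRatio={config['new_ratio']}",
    ]


def random_search(measure, trials=50, seed=0):
    """Keep the configuration with the lowest measured running time.

    `measure` takes a list of JVM flags and returns a running time in
    seconds. Random search stands in for a real tuner's search techniques.
    """
    rng = random.Random(seed)
    best_config, best_time = None, float("inf")
    for _ in range(trials):
        config = sample_config(rng)
        elapsed = measure(to_jvm_flags(config))
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


def fake_measure(flags):
    """Stand-in cost model so the sketch runs without a JVM (made-up numbers)."""
    cost = 10.0
    if "-XX:+UseParallelGC" in flags:
        cost -= 2.0
    if "-Xmx4096m" in flags:
        cost -= 1.0
    return cost


if __name__ == "__main__":
    config, seconds = random_search(fake_measure)
    print(to_jvm_flags(config), seconds)
```

In an actual tuning run, `measure` would launch the compiled X10 program on the JVM with the given flags and return the observed wall-clock time, typically averaged over several runs to reduce noise.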
