Optimizing Chapel for Single-Node Environments

This paper investigates how Chapel's performance compares with that of other parallel frameworks. We give specific examples of how programmers can restructure their single-node (single-locale) Chapel programs to run faster, and we identify language changes that would make these performance gains easier to obtain. Specifically, we compare the intranode performance of Chapel programs with that of OpenMP programs in C/C++ through case studies that profile the LULESH, MiniMD, SSCA#2, and CLOMP benchmarks. Our optimization techniques improved the runtime performance of the Chapel benchmarks by factors of 3x, 5.3x, 6.3x, and 4.8x, respectively, and the optimized versions outperformed their OpenMP counterparts by factors of 2x for LULESH, 1.6x for SSCA#2, and 4.8x for CLOMP.
