Optimizing PGAS Overhead in a Multi-locale Chapel Implementation of CoMD

Chapel supports distributed computing with an underlying PGAS memory address space. While it provides abstractions for writing simple and elegant distributed code, the type system currently lacks a notion of locality, i.e., a description of an object's access behavior in relation to its actual location. This often necessitates programmer intervention to avoid redundant non-local data access. Moreover, due to insufficient locality information, the compiler ends up using “wide” pointers (which can point to non-local data) for objects referenced in an otherwise completely local manner, adding to the runtime overhead.

In this work we describe CoMD-Chapel, our distributed Chapel implementation of the CoMD benchmark. We demonstrate that optimizing data access through replication and localization is crucial for achieving performance comparable to the reference implementation. We discuss limitations of existing scope-based locality optimizations and argue instead for a more general (and robust) type-based approach. Lastly, we evaluate code performance and scaling characteristics. The fully optimized version of CoMD-Chapel performs to within 62%–87% of the reference implementation.
