APAT: an access pattern analysis tool for distributed arrays

Distributed arrays reduce programming effort through implicit communication. Relying solely on this abstraction, however, leads to fine-grained communication and the associated performance overhead. A variety of optimization techniques can mitigate such overheads, but applying them requires a thorough understanding of how distributed arrays are accessed, which can be very challenging in realistic use cases. We present the Access Pattern Analysis Tool (APAT) for distributed arrays. APAT is a framework that can be integrated into a language's software stack to efficiently collect access logs and analyze them. We show that APAT can help discover optimization opportunities that lead to improvements of up to 35%.
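
To make the kind of analysis concrete, the following is a minimal, hypothetical sketch of how a collected access log could be summarized to expose fine-grained remote accesses and aggregation opportunities. It is not APAT's actual implementation: the (accessing_rank, owning_rank, index) log format, the function name analyze_access_log, and the "longest contiguous remote run" heuristic are all assumptions made for illustration.

    from collections import defaultdict

    def analyze_access_log(log):
        """Summarize a distributed-array access log.

        `log` is an iterable of (accessing_rank, owning_rank, index) tuples,
        one per element access (a hypothetical format). Returns, per rank,
        the local and remote access counts and the longest contiguous run of
        remote indices, a hint that fine-grained accesses could be aggregated
        into a single bulk transfer.
        """
        local = defaultdict(int)
        remote = defaultdict(int)
        longest_run = defaultdict(int)
        current_run = defaultdict(int)
        last_index = {}

        for rank, owner, idx in log:
            if rank == owner:
                local[rank] += 1
                current_run[rank] = 0          # a local access breaks the remote run
            else:
                remote[rank] += 1
                # extend the run only if this remote index follows the previous one
                if last_index.get(rank) == idx - 1:
                    current_run[rank] += 1
                else:
                    current_run[rank] = 1
                longest_run[rank] = max(longest_run[rank], current_run[rank])
            last_index[rank] = idx

        return {r: {"local": local[r],
                    "remote": remote[r],
                    "longest_remote_run": longest_run[r]}
                for r in set(local) | set(remote)}

    # Example: rank 0 reads indices 100..103 owned by rank 1, producing four
    # fine-grained remote accesses that form one contiguous run (an obvious
    # candidate for aggregation), followed by two local accesses.
    print(analyze_access_log([(0, 1, 100), (0, 1, 101), (0, 1, 102), (0, 1, 103),
                              (0, 0, 1), (0, 0, 2)]))

Run on the example log, the sketch reports 4 remote and 2 local accesses for rank 0 with a longest remote run of 4, the kind of signal that points to replacing element-wise implicit communication with a bulk transfer.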
