Algorithmic species revisited: A program code classification based on array references

The shift towards parallel processor architectures has made programming, performance prediction and code generation increasingly challenging. Abstract representations of program code (i.e. classifications) have been introduced to address this challenge. An example is 'algorithmic species', a memory access pattern classification of loop nests. It provides an architecture-agnostic, structured view of program code, allowing programmers and compilers to, for example, make parallelisation decisions or perform memory hierarchy optimisations. The existing algorithmic species theory is based on the polyhedral model and is limited to static affine loop nests. In this work, we first present a revised theory of algorithmic species that overcomes this limitation. The theory consists of a 5-tuple characterisation of individual array references and a corresponding merging operation. Second, we present an extension of this theory named SPECIES+, which provides a more detailed 6-tuple characterisation. With this, we are able to retain relevant access pattern information not captured by the original algorithmic species, such as column-major versus row-major matrix accesses. We implement both new theories in a tool, enabling automatic classification of program code.
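To make the column-major versus row-major distinction concrete, the following is a minimal illustrative sketch and not the paper's classification or its tuple format. It assumes a matrix stored in row-major order and lists the flat memory offsets that a two-level loop nest touches under each traversal order. The function names `flat_offsets` and `inner_stride` are hypothetical and chosen only for this example.

```python
def flat_offsets(rows, cols, column_wise=False):
    """Return the flat offsets touched by a 2-level loop nest, in loop order.

    Assumption: the matrix is stored in row-major order, so element (i, j)
    lives at offset i * cols + j.
    """
    offsets = []
    if column_wise:
        # Outer loop over columns, inner loop over rows.
        for j in range(cols):
            for i in range(rows):
                offsets.append(i * cols + j)
    else:
        # Outer loop over rows, inner loop over columns.
        for i in range(rows):
            for j in range(cols):
                offsets.append(i * cols + j)
    return offsets


def inner_stride(offsets):
    """Distance between the first two accesses, i.e. the innermost-loop stride."""
    return offsets[1] - offsets[0]


row_order = flat_offsets(2, 3)                    # [0, 1, 2, 3, 4, 5]
col_order = flat_offsets(2, 3, column_wise=True)  # [0, 3, 1, 4, 2, 5]
print(inner_stride(row_order), inner_stride(col_order))  # prints "1 3"
```

Both loop nests read every element exactly once, so a classification that records only which elements are accessed cannot tell them apart. The innermost stride (1 versus the row length) is the kind of additional information that separates the two traversal orders.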
