Optimizing numerical code by means of the transitive closure of dependence graphs

A challenging task in numerical programming for modern computer systems is to exploit the parallelism available in the architecture effectively and to manage the CPU caches so as to increase performance. Loop nest tiling both coarsens the grain of parallel code and improves code locality. In this paper, we explore a new way to generate tiled code and to derive the free schedule of tiles by means of the transitive closure of loop nest dependence graphs. The resulting multi-threaded code executes each tile as soon as its operands are available. To design the approach, loop dependences are represented in the form of tuple relations. The discussed techniques are implemented in the source-to-source TRACO compiler. An experimental study, carried out on multi-core architectures, demonstrates considerable speed-up of the tiled numerical codes generated by the presented approach.
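To make the notions of tile-level dependences, transitive closure, and free schedule concrete, the following is a minimal sketch in Python. It is not the algorithm implemented in TRACO: it assumes a square two-dimensional iteration space with uniform dependence distance vectors and enumerates every iteration explicitly, whereas the approach described above operates on parameterized tuple relations. All function names (tile_of, tile_dependences, transitive_closure, has_cycle, free_schedule) are hypothetical and chosen for illustration only.

```python
from itertools import product


def tile_of(point, size):
    """Map an iteration point to the identifier of the square tile containing it."""
    return tuple(c // size for c in point)


def tile_dependences(n, size, distances):
    """Collect dependences between distinct tiles of an n x n iteration space.

    Assumption: dependences are uniform and given as distance vectors.
    """
    edges = set()
    for point in product(range(n), repeat=2):
        for d in distances:
            target = tuple(p + q for p, q in zip(point, d))
            if all(0 <= c < n for c in target):
                src, dst = tile_of(point, size), tile_of(target, size)
                if src != dst:
                    edges.add((src, dst))
    return edges


def transitive_closure(edges):
    """Naive fixed-point computation of the transitive closure of a finite relation."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def has_cycle(closure):
    """A tile that transitively depends on itself makes the tiling invalid."""
    return any(a == b for a, b in closure)


def free_schedule(tiles, edges):
    """Assign each tile the earliest step at which all its predecessors are done."""
    preds = {t: set() for t in tiles}
    for src, dst in edges:
        preds[dst].add(src)
    time, remaining, step = {}, set(tiles), 0
    while remaining:
        ready = {t for t in remaining if all(p in time for p in preds[t])}
        if not ready:
            raise ValueError("cyclic tile dependences: no valid schedule")
        for t in ready:
            time[t] = step
        remaining -= ready
        step += 1
    return time


if __name__ == "__main__":
    n, size = 4, 2
    tiles = set(product(range(n // size), repeat=2))
    edges = tile_dependences(n, size, [(1, 0), (0, 1)])
    print(has_cycle(transitive_closure(edges)))
    print(sorted(free_schedule(tiles, edges).items()))
```

In this sketch, tiles that receive the same step have no dependences among themselves and could be executed by different threads. With distance vectors (1, -1) and (0, 1) the same rectangular tiles depend on each other cyclically; the closure exposes this, which indicates that such tiles would have to be corrected before any schedule can be derived.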
