Integrating Data Layout Transformations with the Polyhedral Model

In the polyhedral model, classical loop transformations and statement reordering transformations have been unified and formalized as affine scheduling problems that target optimization goals such as locality and parallelism. More recently, data layout transformations have delivered significant performance benefits by improving spatial locality, multi-core parallelism, and vector parallelism. However, the integration of data layout transformations into the polyhedral model has received relatively little attention thus far. In this paper, we report on our work in progress on integrating data layout transformations into the polyhedral model, with a focus on affine representations of data layout transformations. We also demonstrate the potential benefit of treating loop and data layout transformations as a single integrated optimization problem. Standard decoupled approaches instead pick a specific phase order (e.g., data layout transformations before loop transformations) and can thereby miss opportunities to co-optimize the two.
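To make the idea of an affine data layout transformation concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a hypothetical two-deep loop nest computing `C[i][j] = A[i][j] + B[j][i]` on square row-major arrays, represents both array accesses and layouts as 2x2 integer matrices restricted to permutations, and uses a toy cost (the sum of innermost-loop address strides) that is purely illustrative. All names (`ACCESSES`, `innermost_stride`, `best_joint`, and so on) are invented for this example.

```python
from itertools import product

# 2x2 linear maps (permutations only, an assumption of this sketch).
IDENTITY = ((1, 0), (0, 1))
SWAP = ((0, 1), (1, 0))

N = 8
EXTENTS = (N, N)  # logical extents of every array (hypothetical)

# Access functions for the toy statement C[i][j] = A[i][j] + B[j][i],
# mapping the iteration vector (i, j) to a logical array index.
ACCESSES = {"A": IDENTITY, "B": SWAP, "C": IDENTITY}


def matvec(matrix, vec):
    """Multiply an integer matrix by an integer vector."""
    return tuple(sum(m * v for m, v in zip(row, vec)) for row in matrix)


def row_major_address(index, extents):
    """Linearize a multi-dimensional index in row-major order."""
    addr = 0
    for i, n in zip(index, extents):
        addr = addr * n + i
    return addr


def innermost_stride(access, layout, extents, inner_dim):
    """Address step of one reference when the innermost loop advances by 1."""
    step = tuple(1 if d == inner_dim else 0 for d in range(2))
    logical = matvec(access, step)            # change in logical index
    stored = matvec(layout, logical)          # change in storage index
    stored_extents = matvec(layout, extents)  # extents after the layout map
    return row_major_address(stored, stored_extents)


def total_cost(inner_dim, layouts):
    """Toy cost: sum of innermost strides over all references."""
    return sum(
        innermost_stride(ACCESSES[name], layouts[name], EXTENTS, inner_dim)
        for name in ACCESSES
    )


def best_loop_only():
    """Choose only the innermost loop; every array keeps its original layout."""
    layouts = {name: IDENTITY for name in ACCESSES}
    return min((total_cost(d, layouts), d) for d in (0, 1))


def best_joint():
    """Search loop order and per-array layout together."""
    best = None
    for inner_dim in (0, 1):
        for choice in product((IDENTITY, SWAP), repeat=len(ACCESSES)):
            layouts = dict(zip(ACCESSES, choice))
            cost = total_cost(inner_dim, layouts)
            if best is None or cost < best[0]:
                best = (cost, inner_dim, layouts)
    return best


if __name__ == "__main__":
    print("loop-only best (cost, inner_dim):", best_loop_only())
    cost, inner_dim, layouts = best_joint()
    print("joint best cost:", cost, "inner_dim:", inner_dim)
```

In this toy setting, no loop order gives unit stride to all three references while every array keeps its original layout, whereas the joint search finds a combination in which every reference is unit stride. The example only illustrates that loop order and layout share one search space; it does not model dependences, tiling, or the phase-ordering effects discussed in the abstract.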
