Loop Nest Tiling for Image Processing and Communication Applications

Loop nest tiling is one of the most important loop nest optimizations. This paper presents a practical framework for automatic tiling of affine loop nests to reduce time of application execution which is crucial for the quality of image processing and communication systems. Our framework is derived via a combination of the Polyhedral and Iteration Space Slicing models and uses the transitive closure of loop nest dependence graphs. To describe and implement the approach in the source-to-source TRACO compiler, loop dependences are presented in the form of tuple relations. We expose the applicability of the framework to generate tiled code for image analysis, encoding and communication program loop nests from the UTDSP Benchmark Suite. Experimental results demonstrate the speed-up of optimized tiled programs generated by means of the approach implemented in TRACO.

[1]  Albert Cohen,et al.  A Note on the Performance Distribution of Affine Schedules , 2008 .

[2]  Albert Cohen,et al.  Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time , 2007, International Symposium on Code Generation and Optimization (CGO'07).

[3]  Albert Cohen,et al.  Polyhedral AST Generation Is More Than Scanning Polyhedra , 2015, ACM Trans. Program. Lang. Syst..

[4]  Albert Cohen,et al.  The Polyhedral Model Is More Widely Applicable Than You Think , 2010, CC.

[5]  Albert Cohen,et al.  Predictive modeling in a polyhedral optimization space , 2011, CGO 2011.

[6]  Marek Palkowski,et al.  Coarse-Grained Loop Parallelization for Image Processing and Communication Applications , 2010, IP&C.

[7]  Tei-Wei Kuo,et al.  Real-time partitioned scheduling on multi-core systems with local and global memories , 2013, 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC).

[8]  Paul Feautrier,et al.  Some efficient solutions to the affine scheduling problem. I. One-dimensional time , 1992, International Journal of Parallel Programming.

[9]  Martin Griebl,et al.  Automatic Parallelization of Loop Programs for Distributed Memory Architectures , 2004 .

[10]  Marek Palkowski,et al.  Free scheduling for statement instances of parameterized arbitrarily nested affine loops , 2012, Parallel Comput..

[11]  William Pugh,et al.  The Omega Library interface guide , 1995 .

[12]  Uday Bondhugula,et al.  A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.

[13]  Marek Palkowski,et al.  Perfectly Nested Loop Tiling Transformations Based on the Transitive Closure of the Program Dependence Graph , 2014, ACS.

[14]  Sean Hsien-en Peng,et al.  UTDSP, a VLIW programmable DSP processor , 2000 .

[15]  Cédric Bastoul,et al.  Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[16]  Björn Franke,et al.  Code transformation and instruction set extension , 2009, TECS.

[17]  Jingling Xue,et al.  On Tiling as a Loop Transformation , 1997, Parallel Process. Lett..

[18]  William Pugh,et al.  An Exact Method for Analysis of Value-based Array Data Dependences , 1993, LCPC.

[19]  Monica S. Lam,et al.  An affine partitioning algorithm to maximize parallelism and minimize communication , 1999, ICS '99.

[20]  Marek Palkowski,et al.  Impact of Variable Privatization on Extracting Synchronization-Free Slices for Multi-core Computers , 2012, Facing the Multicore-Challenge.

[21]  Paul Feautrier,et al.  Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time , 1992, International Journal of Parallel Programming.

[22]  Albert Cohen,et al.  Coarse-Grained Loop Parallelization: Iteration Space Slicing vs Affine Transformations , 2009, 2009 Eighth International Symposium on Parallel and Distributed Computing.

[23]  François Irigoin,et al.  Supernode partitioning , 1988, POPL '88.