Symbolic loop parallelization for balancing I/O and memory accesses on processor arrays

Loop parallelization techniques for massively parallel processor arrays that use one-level tiling are often either I/O- or memory-bound, exceeding the capabilities of the target architecture. Furthermore, if the number of available processing elements is only known at runtime, as in adaptive systems, static approaches fail. To solve these problems, we present a hybrid compile/runtime technique to symbolically parallelize loop nests with uniform dependences on multiple levels. At compile time, two novel transformations are performed: (a) symbolic hierarchical tiling followed by (b) symbolic multi-level scheduling. By tuning the size of the tiles on multiple levels, a trade-off between the required I/O bandwidth and memory is possible, which makes it easier to satisfy resource constraints. The resulting schedules are symbolic with respect to the number of tiles; thus, the number of processing elements to map onto does not need to be known at compile time. At runtime, when this number is known, a simple prolog chooses a feasible schedule with respect to I/O and memory constraints that is latency-optimal for the chosen tile size. In this way, our approach dynamically chooses latency-optimal and feasible schedules while avoiding expensive re-compilations.
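To make the runtime selection step more concrete, here is a toy sketch in Python. It assumes a 1-D loop of `n` iterations and a hypothetical two-level tiling in which each processing element executes an inner tile of `inner` iterations sequentially, while the outer level makes repeated passes over the array. The names `SymbolicSchedule`, `make_candidate` and `prolog`, as well as every cost formula, are illustrative assumptions, not the models from the paper. They only mimic the described trade-off: small inner tiles need little local memory but cause more I/O, and large inner tiles do the opposite. Each candidate's costs remain functions of the number of processing elements `P`, which is only bound at runtime.

```python
"""Toy sketch of a runtime prolog that picks a precomputed symbolic schedule.

All cost formulas are hypothetical placeholders that only mimic the
I/O-versus-memory trade-off of multi-level tiling; they are not the
paper's models.
"""
from dataclasses import dataclass
from math import ceil
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class SymbolicSchedule:
    """A schedule whose costs stay symbolic in P, the number of PEs."""
    name: str
    latency: Callable[[int], int]       # cycles, as a function of P
    memory_words: Callable[[int], int]  # local memory needed per PE
    io_words: Callable[[int], int]      # words exchanged with the array border


def make_candidate(n: int, inner: int) -> SymbolicSchedule:
    """Hypothetical two-level tiling of a 1-D loop with n iterations.

    Each PE runs an inner tile of `inner` iterations sequentially; the outer
    level makes ceil(n / (P * inner)) passes over the whole array.
    """
    def passes(p: int) -> int:
        return ceil(n / (p * inner))

    return SymbolicSchedule(
        name=f"inner={inner}",
        # Placeholder: each pass costs its tile length plus P cycles of fill.
        latency=lambda p: passes(p) * (inner + p),
        # Placeholder: a PE buffers one inner tile, independent of P.
        memory_words=lambda p: inner,
        # Placeholder: every PE spills one word per pass.
        io_words=lambda p: passes(p) * p,
    )


def prolog(candidates: Sequence[SymbolicSchedule], p: int,
           mem_limit: int, io_limit: int) -> Optional[SymbolicSchedule]:
    """Runtime step: bind P, drop infeasible candidates, pick minimal latency."""
    feasible = [c for c in candidates
                if c.memory_words(p) <= mem_limit and c.io_words(p) <= io_limit]
    return min(feasible, key=lambda c: c.latency(p), default=None)


if __name__ == "__main__":
    # "Compile time": build the candidates once, symbolic in P.
    candidates = [make_candidate(1024, t) for t in (4, 16, 64, 256)]
    # "Runtime": P becomes known only now.
    for p in (8, 32):
        best = prolog(candidates, p, mem_limit=64, io_limit=100)
        if best is None:
            print(f"P={p}: no feasible schedule")
        else:
            print(f"P={p}: {best.name}, latency {best.latency(p)}")
```

In this toy model, with `n = 1024` and `P = 8`, a memory limit of 64 words selects the `inner=64` candidate, while tightening the limit to 32 words falls back to `inner=16`, which needs less local memory but more I/O passes. No recompilation is involved; only the cheap feasibility filter and the `min` over the precomputed candidates run once `P` is known.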

Jürgen Teich | Frank Hannig | Michael Witterauf | Alexandru Tanase