Optimal semi-oblique tiling

For 2-D iteration space tiling, we address the problem of determining the tile parameters that minimize the total execution time under the bulk synchronous parallel (BSP) model. We consider uniform dependency computations, tiled so that at least one of the tile boundaries is parallel to the domain boundary. We determine the optimal tile size in closed form, together with the optimal number of processors and the optimal slope of the oblique tile boundary. Our predictions are validated on several examples, including a sequence alignment problem specialized to similar sequences using Fickett's “k-band” algorithm, for which our optimal semi-oblique tiling improves on orthogonal tiling by a factor of 2.5. Our optimal solution requires a block-cyclic distribution of tiles to processors; the best one can obtain with a block distribution alone (as many authors require) is 3 times slower.
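The difference between the two tile-to-processor distributions mentioned above can be illustrated with a minimal sketch. It assumes tiles are indexed along a single dimension, and the function names, tile count, processor count, and block size are hypothetical choices for illustration; the abstract does not specify the mapping actually used.

```python
def block_owner(tile: int, num_tiles: int, num_procs: int) -> int:
    """Block distribution: each processor owns one contiguous run of tiles."""
    block = -(-num_tiles // num_procs)  # ceiling division
    return tile // block


def block_cyclic_owner(tile: int, block: int, num_procs: int) -> int:
    """Block-cyclic distribution: blocks of `block` tiles are dealt round-robin."""
    return (tile // block) % num_procs


# Hypothetical sizes, chosen only to make the two patterns visible.
num_tiles, num_procs = 12, 3
block_map = [block_owner(t, num_tiles, num_procs) for t in range(num_tiles)]
cyclic_map = [block_cyclic_owner(t, 2, num_procs) for t in range(num_tiles)]

print(block_map)   # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(cyclic_map)  # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```

Under the block mapping each processor touches one contiguous region, whereas under the block-cyclic mapping every processor revisits the tile sequence several times.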
