Data alignments for modular time-space mappings of BLAS-like algorithms

Modular time-space transformations have recently been proposed for algorithm mappings that cannot be described by affine functions. This paper extends affine data alignments to a new class of data alignments, called expanded modular data alignments (EMDAs), for algorithms that are mapped by modular time-space transformations. An EMDA is a set of modular data alignments (MDAs), each described by an affine function modulo a constant vector. With an EMDA, multiple copies of a data array are mapped onto the target processors by different MDAs, so the alignment can be used efficiently with modular time-space transformations, which may require several operations to access the same data at the same time. Conditions on EMDAs that guarantee local access to data entries are provided. They cover the initial data alignment, data movement during the computation, and the number of copies required to avoid unnecessary communication. The conditions can be used either to derive the EMDA for a given modular mapping or to generate a modular mapping for a given EMDA, so that no communication due to data misalignment occurs. Several examples show that EMDAs are well suited to modular time-space mappings.
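
As a rough illustration of what "an affine function modulo a constant vector" can look like, the following Python sketch places the entries of two n x n arrays on an n x n processor grid and checks that every operand is local when it is needed. The Cannon-style skewed placement, the schedule k = (i + j + t) mod n, the shift directions, and all function names are assumptions chosen for this sketch; they are not taken from the paper, and the sketch uses a single copy of each array rather than a full EMDA.

```python
def modular_alignment(matrix, offset, modulus):
    """Build a map index -> processor as (matrix * index + offset) mod modulus.

    The modulus is applied componentwise, one entry per processor dimension.
    """
    def place(index):
        return tuple(
            (sum(a * x for a, x in zip(row, index)) + c) % m
            for row, c, m in zip(matrix, offset, modulus)
        )
    return place


def cannon_style_alignments(n):
    """Assumed example alignments on an n x n processor grid.

    A[i][k] is placed on processor (i, (k - i) mod n).
    B[k][j] is placed on processor ((k - j) mod n, j).
    """
    align_a = modular_alignment([[1, 0], [-1, 1]], [0, 0], [n, n])
    align_b = modular_alignment([[1, -1], [0, 1]], [0, 0], [n, n])
    return align_a, align_b


def all_accesses_local(n):
    """Check that each processor finds both operands locally at every step.

    Assumed schedule: at step t, processor (i, j) multiplies A[i][k] by
    B[k][j] with k = (i + j + t) mod n. Between steps, A moves one column
    to the left and B moves one row up, both with wraparound.
    """
    align_a, align_b = cannon_style_alignments(n)
    for t in range(n):
        for i in range(n):
            for j in range(n):
                k = (i + j + t) % n
                row_a, col_a = align_a((i, k))
                row_b, col_b = align_b((k, j))
                # Position of each operand after t shifts.
                if (row_a, (col_a - t) % n) != (i, j):
                    return False
                if ((row_b - t) % n, col_b) != (i, j):
                    return False
    return True
```

Calling `all_accesses_local(4)` exercises every processor and time step for a 4 x 4 grid. A purely affine placement without the modulo could not wrap the skewed columns back onto the grid, which is the kind of mapping the modular form is meant to capture.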
