Generation of Injective and Reversible Modular Mappings

A modular mapping consists of a linear transformation followed by modulo operations. It is characterized by a transformation matrix and a vector of moduli, called the modulus vector. Modular mappings are useful for deriving parallel versions of algorithms with commutative operations, as well as versions intended for execution on processor arrays with toroidal networks. To preserve algorithm correctness, modular mappings must be injective. Previous work characterizes injective modular mappings of rectangular index sets. This paper provides a technique to generate modular mappings that satisfy these injectivity conditions and extends the results to general index sets. For an n-dimensional rectangular index set, the technique has O(n² n!) complexity. To facilitate generation of efficient code, modular mappings must also be reversible, i.e., have easily described inverses. An O(n²) method is provided to generate reversible modular mappings. This method reduces the search space by fixing entries of the modulus vector, while keeping the number of fixed entries small so that few solutions are excluded. For general index sets defined by linear inequalities, injectivity can be checked by formulating and solving a set of linear inequalities, and a modified Fourier-Motzkin elimination is proposed to solve them. To generate an injective modular mapping of an index set defined by linear inequalities, this paper proposes a technique that attempts to minimize the values of the entries of the modulus vector. Several examples, including BLAS routines, illustrate the application of these methods.
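
As a minimal illustration of the definition above, the following Python sketch applies a modular mapping i -> (T·i) mod m componentwise and verifies injectivity over a small rectangular index set by exhaustive enumeration. The function names (modular_map, is_injective_on_rectangle) and the example matrix are illustrative assumptions, not taken from the paper, and the brute-force check is only for intuition; it is not the paper's O(n² n!) generation technique.

import itertools
import numpy as np

def modular_map(T, m, i):
    # Apply the modular mapping i -> (T @ i) mod m, componentwise.
    return tuple(np.mod(T @ np.asarray(i), m))

def is_injective_on_rectangle(T, m, bounds):
    # Check injectivity by enumerating the rectangular index set
    # {0..bounds[0]-1} x ... x {0..bounds[n-1]-1} (exhaustive, so exponential in n).
    seen = set()
    for i in itertools.product(*(range(b) for b in bounds)):
        image = modular_map(T, m, i)
        if image in seen:
            return False  # two index points map to the same image: not injective
        seen.add(image)
    return True

# Example: a 2-D mapping onto a 4 x 4 toroidal processor array.
T = np.array([[1, 1],
              [0, 1]])      # transformation matrix
m = np.array([4, 4])        # modulus vector
print(is_injective_on_rectangle(T, m, bounds=(4, 4)))   # prints True for this choice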
