Generalized Schemes for Access and Alignment of Data in Parallel Processors with Self-Routing Interconnection Networks

Abstract In this paper, we give a generalized solution to the problem of conflict-free access to various templates of matrix data stored across the memory units of a parallel processor. The important features of our method are: (a) a compact representation of the skewing scheme, (b) simple address computation, (c) use of self-routing schemes to set up the interconnection network, and (d) a general framework for the study of skewing schemes. In our method, each template access of interest is a linear permutation on the processor address, and the linear permutation involved determines which types of templates are accessible. For parallel access to the most important templates, namely rows, columns, the main diagonal, and square blocks, the interconnection network needs to realize only the class of linear-complement permutations. It is known that these permutations can be self-routed efficiently when a Benes or Omega network serves as the interconnection network; this compares favorably with schemes proposed by other researchers, which assume that a crossbar is available for processor-memory interconnection. Hence, the approach given in this paper can be used to solve the data alignment problem for existing parallel machines such as the IBM RP3, the Cedar multiprocessor, and the NYU Ultracomputer. It is a generalized solution to the data skewing problem and encompasses previous efforts by other researchers as special cases.
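To make the idea of a linear skewing scheme concrete, the following is a minimal sketch and not the paper's own construction. It assumes a 4 x 4 matrix stored across 4 memory modules, with element (i, j) placed in module A·i XOR B·j, where A and B are 2 x 2 bit matrices over GF(2). The particular matrices A and B, and all function names, are illustrative assumptions chosen so that rows, columns, the main diagonal, and 2 x 2 blocks each touch every module exactly once. The check is done by brute force.

```python
from itertools import product

N_BITS = 2                      # address width; the matrix is N x N with N = 2**N_BITS
N = 1 << N_BITS                 # 4 rows, 4 columns, 4 memory modules
BLOCK = 1 << (N_BITS // 2)      # side of a square block (2 x 2 here)

# Hypothetical GF(2) matrices, one bitmask per output bit (index 0 = low bit).
A = [0b01, 0b10]                # identity
B = [0b10, 0b11]                # invertible, and A + B is invertible too


def matvec(rows, v):
    """Multiply a GF(2) matrix (list of row bitmasks) by the bit vector v."""
    out = 0
    for k, row in enumerate(rows):
        parity = bin(row & v).count("1") & 1
        out |= parity << k
    return out


def module(i, j):
    """Memory module holding element (i, j) under the assumed linear scheme."""
    return matvec(A, i) ^ matvec(B, j)


def conflict_free(cells):
    """True if every cell in the template maps to a different module."""
    cells = list(cells)
    return len({module(i, j) for i, j in cells}) == len(cells)


def templates():
    """Yield (name, cells) for rows, columns, main diagonal, and square blocks."""
    for r in range(N):
        yield "row", [(r, j) for j in range(N)]
    for c in range(N):
        yield "column", [(i, c) for i in range(N)]
    yield "main diagonal", [(k, k) for k in range(N)]
    for bi, bj in product(range(0, N, BLOCK), repeat=2):
        yield "block", [(bi + a, bj + b) for a, b in product(range(BLOCK), repeat=2)]


ALL_OK = all(conflict_free(cells) for _, cells in templates())

if __name__ == "__main__":
    for i in range(N):
        print([module(i, j) for j in range(N)])
    print("all templates conflict-free:", ALL_OK)
```

In this sketch the module number is a linear function of the address bits, so the whole scheme is described by two small bit matrices, which is the kind of compact representation and simple address computation the abstract refers to. The routing of fetched data through a Benes or Omega network is not modeled here.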
