Efficient mapping of algorithms to single-stage interconnections

In this paper, we consider the problem of restructuring, or transforming, algorithms to use a single-stage interconnection network efficiently. Every algorithm allows some freedom in how it is mapped onto a machine. We use this freedom to show that superior interconnection efficiency can be obtained by implementing the interconnections the algorithm requires within the context of the algorithm as a whole, rather than attempting to satisfy each interconnection request individually. The interconnection considered is the bidirectional shuffle-shift network. Two algorithm transformations are shown to be useful for implementing several lower triangular and tridiagonal system algorithms on the shuffle-shift network: of the 14 algorithms considered, 85% could be implemented on this network. The transformations developed to produce these results are described; they are general purpose and can be applied to a much larger class of algorithms.
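To make the network's connection capabilities concrete, the following is a minimal sketch, in Python, of the index mappings a bidirectional shuffle-shift network is generally taken to provide on N = 2^m processing elements: the perfect shuffle, its inverse (unshuffle), and end-around shifts by +1 and -1. The function names and the small demonstration are illustrative assumptions, not the paper's implementation.

# A minimal sketch (not the paper's code) of the four single-stage connections
# of a bidirectional shuffle-shift network on n = 2**m processing elements.

def shuffle(i: int, n: int) -> int:
    """Perfect shuffle: rotate the m-bit index of PE i left by one bit."""
    m = n.bit_length() - 1          # n is assumed to be a power of two
    return ((i << 1) | (i >> (m - 1))) & (n - 1)

def unshuffle(i: int, n: int) -> int:
    """Inverse shuffle: rotate the m-bit index of PE i right by one bit."""
    m = n.bit_length() - 1
    return ((i >> 1) | ((i & 1) << (m - 1))) & (n - 1)

def shift(i: int, n: int, d: int = 1) -> int:
    """End-around nearest-neighbour shift by d = +1 or -1."""
    return (i + d) % n

def apply_connection(data, conn):
    """Move every element through one single-stage connection in parallel."""
    n = len(data)
    out = [None] * n
    for i, x in enumerate(data):
        out[conn(i, n)] = x
    return out

if __name__ == "__main__":
    n = 8
    data = list(range(n))
    step1 = apply_connection(data, shuffle)                  # one parallel shuffle step
    step2 = apply_connection(step1, lambda i, n: shift(i, n))  # one parallel +1 shift step
    print(step1)   # [0, 4, 1, 5, 2, 6, 3, 7]
    print(step2)

Each call to apply_connection models one parallel transfer through the single-stage network; a data movement that is not one of these four mappings must be composed from several such steps, which is the kind of cost the algorithm transformations described above are intended to keep low.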
