Adapting shuffle-exchange like parallel processing organizations to work as systolic arrays

Abstract: This paper presents parallel algorithms for tree computations and for linear recurrence systems of the form $y_i = a_i y_{i-1} + b_i$. The algorithms are designed to run on a network of parallel processors connected in the shuffle-exchange or cube-connected cycles pattern, where the number of processors may be much smaller than the size of the problem being solved. They follow from showing that the graph representing the computation can be restructured to match both the number of processors employed and the pattern by which those processors are connected. By accepting inputs in a systolic fashion, the algorithms make efficient use of both parallel time and processors. With $P$ processors connected by either of these networks, their running time is $T = O(n/P + \log P)$, which is the maximum possible speedup for $P = O(n/\log n)$. Thus, the shuffle-exchange and cube-connected cycles parallel processing organizations can be adapted to work as general systolic systems for the solution of an important class of computational problems.
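To make the time bound concrete, here is a minimal sketch of how a recurrence $y_i = a_i y_{i-1} + b_i$ can be solved in $O(n/P + \log P)$ parallel steps. It is not the algorithm of the paper, which maps the computation onto shuffle-exchange or cube-connected cycles networks; it is the standard blocked prefix technique, simulated sequentially, and the function names, the 0-based indexing (`y[i]` holds $y_{i+1}$), and the explicit initial value `y0` are all assumptions made for illustration. Each step $y \mapsto a_i y + b_i$ is treated as an affine map, and affine maps compose associatively, which is what allows the work to be split across processors.

```python
from typing import List, Sequence, Tuple

Affine = Tuple[float, float]  # (a, b) stands for the map y -> a*y + b


def compose(first: Affine, second: Affine) -> Affine:
    """Return the single affine map equal to applying `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def solve_recurrence_blocked(a: Sequence[float], b: Sequence[float],
                             y0: float, num_procs: int) -> List[float]:
    """Hypothetical helper: returns [y_1, ..., y_n] for y_i = a_i*y_{i-1} + b_i."""
    n = len(a)
    if n == 0:
        return []
    p = max(1, min(num_procs, n))
    size = -(-n // p)  # ceiling of n / p
    blocks = [range(s, min(s + size, n)) for s in range(0, n, size)]

    # Phase 1, O(n/P) per processor: collapse each block into one affine map.
    summaries: List[Affine] = []
    for blk in blocks:
        f: Affine = (1.0, 0.0)  # identity map
        for i in blk:
            f = compose(f, (a[i], b[i]))
        summaries.append(f)

    # Phase 2, O(log P) rounds: inclusive scan over the block summaries by
    # recursive doubling. Each round reads only the previous round's values,
    # so all positions could be updated at the same time.
    prefix = list(summaries)
    step = 1
    while step < len(prefix):
        prefix = [compose(prefix[j - step], prefix[j]) if j >= step else prefix[j]
                  for j in range(len(prefix))]
        step *= 2

    # Phase 3, O(n/P) per processor: each block starts from the value entering
    # it and replays its own steps.
    y: List[float] = [0.0] * n
    for k, blk in enumerate(blocks):
        if k == 0:
            cur = y0
        else:
            pa, pb = prefix[k - 1]
            cur = pa * y0 + pb
        for i in blk:
            cur = a[i] * cur + b[i]
            y[i] = cur
    return y
```

Phases 1 and 3 each cost about $n/P$ steps per processor and phase 2 costs about $\log P$ rounds, which matches the shape of the bound stated in the abstract. When $P = O(n/\log n)$, the $\log P$ term is dominated by $n/P$, so the total is $O(n/P)$.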
