O (log N / log log N) Randomized Routing in Degree-log N "Hypermeshes"

Given the clear and pressing need for improved computer system performance, there are several means of achieving it. In the simplest approach, current computer architectures are reimplemented using faster technologies. Although this approach will always be exploited, physical, technological, and economic limitations make it incapable of providing all the needed computational power. Instead, parallelism must be exploited to obtain truly significant performance improvements. Parallelism is a two-dimensional problem. Along one dimension we find pure data parallelism, as might be found in typical array algorithms involving vectors and matrices. Along the other dimension we find concurrency, where independent processes work on facets of an algorithm which may not lend themselves to array processing. Assuming the use of the fastest reasonable technology, any further increase in performance requires the efficient exploitation of parallelism in one form or another. The performance of computers can be made incrementally extensible by exploiting VLSI technology to build concurrent/parallel computers: ensembles of processing nodes connected by a network. Low-latency communication elements are required to support fine-grain or medium-grain parallel computation. Communication between nodes of a multicomputer need not be slower than the communication between the processor and memory of a conventional computer. A VLSI-based network controller can provide node-to-node communication times that approach the main memory access times of sequential computers. A VLSI chip is subject to several technological constraints. Whenever each node of a multicomputer system is implemented as a VLSI chip or a printed circuit board, packaging constraints limit the number of connections that can be made available for communication links. Some key issues which must be considered when designing a high-performance network controller based on VLSI technology are also discussed.
New variations on the 2-d mesh interconnection computer which can be implemented
