A bandwidth-efficient implementation of mesh with multiple broadcasting

This paper presents the mesh with virtual buses, a bandwidth-efficient implementation of the mesh with multiple broadcasting, a model on which many computational problems can be solved with reduced time complexity. The new system provides a low-latency, high-bandwidth communication mechanism without the main disadvantage of the dual-network approach, namely possible bandwidth waste. Virtual buses are built on request from the communication links of the base point-to-point mesh network. When no bus requests are made, the full network bandwidth remains available for point-to-point communication. In contrast, the bandwidth assigned to conventional real buses is wasted when bus requests are infrequent. This paper introduces a wormhole router design equipped with virtual bus functions, describes the connections among these routers, and presents the various virtual bus transactions. We demonstrate the effectiveness of the virtual bus by showing that a representative semigroup computation can be solved efficiently on it: in our experiment, vector maxima finding is accelerated by a factor of 2.66. We also explore the network characteristics through distribution-driven simulation of the system. These evaluations indicate that the mesh with virtual buses is a promising approach to low-latency, high-bandwidth communication, with applications in many parallel computations, including parallel simulation, computer graphics, and real-time systems.
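
To illustrate why buses help a semigroup computation such as maximum finding, the following Python sketch compares step counts on a square mesh with and without one broadcast bus per row and column. It is a simplified illustration under stated assumptions, not the algorithm or the router evaluated in this paper: the function names, the unit cost per neighbor hop and per bus broadcast, and the block size of roughly the square root of the side length are all hypothetical choices, and the sketch is not intended to reproduce the reported speedup.

```python
import math

def mesh_max_point_to_point(grid):
    """Maximum over a square grid using neighbor links only.

    Assumed cost model: one step per hop. Each row passes a running
    maximum along its n-1 links, then one column does the same.
    Returns (maximum, steps).
    """
    n = len(grid)
    row_max = [max(row) for row in grid]  # state after n-1 hops along each row
    return max(row_max), 2 * (n - 1)      # plus n-1 hops along one column

def line_max_with_bus(values, block):
    """Maximum over one row or column that owns a single broadcast bus.

    Processors first reduce inside blocks over neighbor links, then each
    block leader broadcasts once on the bus (assumed cost: one step each).
    Returns (maximum, steps).
    """
    leaders = [max(values[i:i + block]) for i in range(0, len(values), block)]
    local_steps = min(block, len(values)) - 1  # hops inside a block
    bus_steps = len(leaders)                   # one broadcast per leader
    return max(leaders), local_steps + bus_steps

def mesh_max_with_buses(grid):
    """Two-phase maximum: all rows in parallel, then a single column."""
    n = len(grid)
    block = max(1, math.isqrt(n))  # hypothetical block size, about sqrt(n)
    row_results = [line_max_with_bus(row, block) for row in grid]
    row_steps = max(steps for _, steps in row_results)  # rows run in parallel
    column = [m for m, _ in row_results]  # each row now knows its own maximum
    col_max, col_steps = line_max_with_bus(column, block)
    return col_max, row_steps + col_steps

if __name__ == "__main__":
    side = 16
    grid = [[(r * side + c) * 7 % 101 for c in range(side)] for r in range(side)]
    print(mesh_max_point_to_point(grid))
    print(mesh_max_with_buses(grid))
```

Under this cost model the bus-assisted variant needs on the order of the square root of the side length per phase, whereas the point-to-point variant needs a number of steps linear in the side length. Both functions compute the result directly and only count the steps a mesh would take; they do not model contention, wormhole routing, or the setup cost of a virtual bus.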
