A framework for the design, synthesis and cycle-accurate simulation of multiprocessor networks

This paper introduces a framework for the design, synthesis and cycle-accurate simulation of parallel computing networks of 128 or more processors. To characterize the network accurately, we present a bottom-up design methodology in which each component is designed in a hardware description language and synthesized to an FPGA to estimate the performance of the final ASIC implementation. The components are then integrated to form a parallel computing network and simulated with a cycle-accurate simulator, with network traffic described by command files. This enables us to simulate various switching techniques, three of which are presented in this paper: wormhole switching, circuit switching and a newly introduced technique called predictive circuit switching. In our experiments, four representative traffic patterns are generated for simulation and, to show the flexibility of the model, we vary the cable lengths, and thus their latencies, for all four test cases. Our results show that this hardware design, synthesis and cycle-accurate simulation methodology provides a useful method for evaluating design tradeoffs in parallel networks. A non-blocking queue with up to 128 internal queues and a real-time bandwidth scheduler for up to 128 ports were designed in hardware, and their synthesis results are presented. From our network simulation results, we conclude that predictive circuit switching exceeds the performance of packet switching for highly predictable traffic, such as collective communications, and for heavily loaded unpredictable traffic with small packet sizes. As expected, predictive circuit switching significantly underperforms both packet and circuit switching for unpredictable traffic.
