A design methodology for efficient application-specific on-chip interconnects

As the level of chip integration continues to advance rapidly, the demand for efficient interconnects, whether on-chip or off-chip, is growing just as quickly. Traditional interconnects such as buses, point-to-point wires, and regular topologies may share resources poorly in the time and space domains, leading to either high contention or low resource utilization. In this paper, we propose a design methodology for constructing networks for special-purpose computer systems whose communication characteristics are well-behaved (known in advance). A temporal and spatial model is proposed to define a sufficient condition for contention-free communication. Based on this model, a recursive bisection technique systematically partitions a parallel system so that the required number of links and switches is minimized while contention remains low. Results show that the methodology can generate on-chip networks that use up to 60 percent fewer resources than meshes or tori while providing blocking performance closer to that of a fully connected crossbar.
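The abstract does not state the formal contention-free condition, so the following is only an illustrative sketch of one common reading of a combined temporal and spatial check: two messages contend if they are active at the same time (temporal overlap) and their routes use a common link (spatial overlap). The `Message` type, its fields, and the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Tuple

# A directed link between two named nodes (assumed representation).
Link = Tuple[str, str]


class Message(NamedTuple):
    """One known communication event (hypothetical model, not the paper's)."""
    start: int               # first cycle the message holds its links (inclusive)
    end: int                 # cycle at which the links are released (exclusive)
    links: FrozenSet[Link]   # links occupied by the message's route


def overlaps_in_time(a: Message, b: Message) -> bool:
    """True if the half-open intervals [start, end) intersect."""
    return a.start < b.end and b.start < a.end


def shares_link(a: Message, b: Message) -> bool:
    """True if the two routes have at least one link in common."""
    return not a.links.isdisjoint(b.links)


def contending_pairs(messages: List[Message]) -> List[Tuple[Message, Message]]:
    """Return every pair that overlaps in both time and space."""
    return [
        (a, b)
        for a, b in combinations(messages, 2)
        if overlaps_in_time(a, b) and shares_link(a, b)
    ]


def is_contention_free(messages: List[Message]) -> bool:
    """Under this sketch's assumptions, no contending pair means no contention."""
    return not contending_pairs(messages)
```

Under these assumptions, a design tool could remove or merge links between partitions as long as `is_contention_free` still holds for the known communication schedule; how the paper's recursive bisection actually does this is not described in the abstract.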
