Implementing DSP Algorithms with On-Chip Networks

Many DSP algorithms are computationally intensive and are typically implemented using an ensemble of processing elements (PEs) operating in parallel. The results from each PE must be communicated to other PEs, and for many applications the cost of implementing this communication is very high. Given a DSP algorithm with high communication complexity, it is natural to use a network-on-chip (NoC) to implement the communication. We address two key optimization problems that arise in this context: placement, i.e., assigning computations to PEs on the NoC, and scheduling, i.e., constructing a detailed cycle-by-cycle scheme for implementing the communication between PEs on the NoC.
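
To make the two problems concrete, the following is a minimal sketch and not the method developed in this work. It assumes a 2D mesh NoC with PEs numbered row by row, a placement cost equal to traffic volume times hop count, and a simplified scheduling model in which each PE can send at most one and receive at most one message per cycle. All function names and the traffic data are hypothetical.

```python
from itertools import permutations


def mesh_hops(a, b, width):
    """Manhattan distance between PEs a and b on a mesh with `width` columns (assumed topology)."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def placement_cost(placement, traffic, width):
    """Sum of (words sent) x (hops travelled) over all communicating task pairs."""
    return sum(words * mesh_hops(placement[src], placement[dst], width)
               for (src, dst), words in traffic.items())


def best_placement(tasks, traffic, width):
    """Exhaustive search over task-to-PE assignments; only viable for a handful of tasks."""
    pes = range(len(tasks))
    return min((dict(zip(tasks, perm)) for perm in permutations(pes)),
               key=lambda p: placement_cost(p, traffic, width))


def greedy_schedule(transfers):
    """Put each (src_pe, dst_pe) transfer in the earliest cycle where its
    sender is not already sending and its receiver is not already receiving."""
    cycles = []  # cycles[t] holds the transfers issued in cycle t
    for src, dst in transfers:
        for slot in cycles:
            if all(src != s and dst != d for s, d in slot):
                slot.append((src, dst))
                break
        else:
            cycles.append([(src, dst)])
    return cycles


# Hypothetical example: a four-stage pipeline placed on a 2x2 mesh.
tasks = ["a", "b", "c", "d"]
traffic = {("a", "b"): 4, ("b", "c"): 4, ("c", "d"): 4}
placement = best_placement(tasks, traffic, width=2)
schedule = greedy_schedule([(0, 1), (0, 2), (3, 1), (2, 3)])
```

Exhaustive placement and greedy scheduling are used here only because they are easy to read; neither scales to realistic designs, and a real NoC schedule would also have to account for link contention along each route.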
