Embedded dynamic programming networks for networks-on-chip

Relentless technology downscaling and recent advances in three-dimensional integrated circuit (3D-IC) technology offer a promising path to heterogeneous systems-on-chip (SoCs) and homogeneous chip multiprocessors (CMPs) built on the network-on-chip (NoC) paradigm, with improved scalability, modularity and performance. In many such systems, the major design and implementation challenges lie in scheduling and managing the communication resources rather than the computing resources. Past research has focused mainly on complex design-time approaches or simple heuristic run-time approaches to on-chip network resource management, which rely on only local or partial information about the network. This can lead to poor utilization of communication resources and erode the benefits of the emerging technologies and design methods. Efficient run-time resource management in large-scale on-chip systems therefore becomes critical.

This thesis proposes a design methodology for a novel run-time resource management infrastructure that can be realized efficiently as a distributed architecture closely coupled with the distributed NoC infrastructure. The proposed infrastructure exploits global information about the network status to optimize and manage on-chip communication resources at run time.

There are four major contributions in this thesis. First, it presents a novel deadlock detection method that uses run-time transitive closure (TC) computation to discover deadlock-equivalence sets, which imply loops of requests in NoCs. In contrast to state-of-the-art approximation and heuristic approaches, this detection scheme, the TC-network, guarantees the discovery of all true deadlocks without false alarms.
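The detection principle can be illustrated in software. The following is a minimal sequential sketch, not the distributed hardware TC-network: it assumes the request dependencies are available as a boolean wait-for matrix `wait_for[i][j]` (node i waits on node j), and the function names `transitive_closure` and `deadlock_sets` are hypothetical. Warshall's algorithm computes the closure; a node lies on a loop of requests exactly when it can reach itself, and nodes that reach each other are grouped together, which we assume corresponds to the deadlock-equivalence sets described above.

```python
from typing import List, Set


def transitive_closure(adj: List[List[bool]]) -> List[List[bool]]:
    """Warshall's algorithm: tc[i][j] is True iff j is reachable from i via one or more edges."""
    n = len(adj)
    tc = [row[:] for row in adj]  # copy so the caller's matrix is not modified
    for k in range(n):            # allow node k as an intermediate hop
        for i in range(n):
            if tc[i][k]:
                for j in range(n):
                    if tc[k][j]:
                        tc[i][j] = True
    return tc


def deadlock_sets(wait_for: List[List[bool]]) -> List[Set[int]]:
    """Group mutually reachable nodes that lie on at least one loop of requests.

    Hypothetical helper: a node i is on a loop iff tc[i][i] is True.
    """
    tc = transitive_closure(wait_for)
    n = len(tc)
    seen: Set[int] = set()
    groups: List[Set[int]] = []
    for i in range(n):
        if tc[i][i] and i not in seen:
            group = {j for j in range(n) if tc[i][j] and tc[j][i]}
            seen |= group
            groups.append(group)
    return groups


# Four routers: 0 -> 1 -> 2 -> 0 form a loop of requests; 3 waits on 0 but is not in the loop.
W = [[False, True, False, False],
     [False, False, True, False],
     [True, False, False, False],
     [True, False, False, False]]
print(deadlock_sets(W))  # [{0, 1, 2}]
```

Because the closure is exact rather than heuristic, a node is reported only when it truly lies on a loop: router 3 is blocked behind the loop but is not part of it, so it is not reported.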
Second, it investigates the advantages of implementing future on-chip systems using three-dimensional (3D) integration and presents the design, fabrication and test results of a TC-network implemented in a fully stacked three-layer 3D architecture using a through-silicon via (TSV) complementary metal-oxide-semiconductor (CMOS) technology. The test results demonstrate the effectiveness of the TC-network for deadlock detection, with minimal computational delay, in a large-scale network. Third, it introduces an adaptive strategy to diffuse heat effectively throughout the three-dimensional network-on-chip (3D-NoC) geometry. This strategy employs a dynamic programming technique to select and optimize the direction in which data is moved through the NoC. The work also produced a tool, based on the accurate HotSpot thermal model and a SystemC cycle-accurate model, to simulate the thermal system and evaluate the proposed approach. Fourth, it …
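The dynamic programming idea behind the thermal strategy can be sketched as a Bellman-Ford style cost-to-go computation on a 3D mesh. This is an illustrative sketch under stated assumptions, not the thesis's formulation: the per-hop cost `1 + alpha * temp[m]` (one hop plus a temperature penalty for entering neighbour m), the parameter `alpha`, and the functions `neighbours` and `thermal_cost_to_go` are all hypothetical, and temperatures are treated as a fixed input rather than run-time sensor or HotSpot data. Each router then forwards towards the neighbour with the smallest cost-to-go, steering traffic away from hot regions.

```python
from itertools import product
from typing import Dict, Iterator, Tuple

Node = Tuple[int, int, int]  # (x, y, z) coordinates in a 3D mesh


def neighbours(n: Node, dims: Tuple[int, int, int]) -> Iterator[Node]:
    """Yield the up to six mesh neighbours of n (-x, +x, -y, +y, -z, +z)."""
    for axis in range(3):
        for step in (-1, 1):
            m = list(n)
            m[axis] += step
            if 0 <= m[axis] < dims[axis]:
                yield (m[0], m[1], m[2])


def thermal_cost_to_go(temp: Dict[Node, float], dims: Tuple[int, int, int],
                       dest: Node, alpha: float = 0.1):
    """Value iteration: V(n) = min over neighbours m of [1 + alpha*temp[m] + V(m)], V(dest) = 0.

    The hop cost is an assumed, illustrative temperature-penalised metric.
    """
    nodes = list(product(*(range(d) for d in dims)))
    V = {n: float("inf") for n in nodes}
    V[dest] = 0.0
    for _ in range(len(nodes) - 1):  # Bellman-Ford needs at most |N| - 1 rounds
        changed = False
        for n in nodes:
            if n == dest:
                continue
            best = min(1.0 + alpha * temp[m] + V[m] for m in neighbours(n, dims))
            if best < V[n]:
                V[n] = best
                changed = True
        if not changed:  # converged early
            break
    # Each router picks the neighbour minimising hop cost plus cost-to-go (first one on ties).
    next_hop = {n: min(neighbours(n, dims), key=lambda m: 1.0 + alpha * temp[m] + V[m])
                for n in nodes if n != dest}
    return V, next_hop


# Example: a 3x3x2 mesh with a single hot router at (1, 1, 0).
dims = (3, 3, 2)
temp = {n: 0.0 for n in product(*(range(d) for d in dims))}
temp[(1, 1, 0)] = 100.0
V, next_hop = thermal_cost_to_go(temp, dims, dest=(2, 1, 0))
print(V[(0, 1, 0)], next_hop[(0, 1, 0)])  # 4.0 (0, 0, 0)
```

In the example, the direct route from (0, 1, 0) through the hot router (1, 1, 0) costs 12, whereas detours through cooler routers (in the same layer or via the upper layer) cost 4, so the selected next hop avoids the hot spot.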
