A survey and taxonomy of on-chip monitoring of multicore systems-on-chip

Billion-transistor systems-on-chip increasingly require dynamic management of their hardware components and careful coordination of the tasks they carry out. Diverse real-time monitoring functions support this objective by collecting important system metrics, such as the throughput of processing elements, communication latency, or per-application resource utilization. Evaluating these metrics online can drive localized or global decisions that attempt to improve aspects of system behavior, performance, quality-of-service, and power and thermal effects under nominal conditions. This work provides a comprehensive categorization of the monitoring approaches used in multiprocessor SoCs. Because adaptive systems are encountered in many disciplines, it is important to present the prominent research efforts in developing online monitoring methods. To this end, we offer a taxonomy that groups strongly related techniques that designers increasingly use to produce more efficient and adaptive chips. The classification helps in understanding and comparing architectural mechanisms that can be used in such systems, and it lets one envisage the innovations required to build truly adaptive and intelligent systems-on-chip.
