Theorem proving based Formal Verification of Distributed Dynamic Thermal Management schemes

Abstract: Distributed Dynamic Thermal Management (DDTM) schemes are widely used to manage elevated chip temperatures in many-core systems. Traditionally, DDTM schemes are analyzed using simulation or emulation, but the non-exhaustive and incomplete nature of these techniques may compromise the reliability of the chip. Recently, model checking has been proposed for formally verifying simple DDTM schemes but, despite several abstractions, the analysis is limited to fewer than 100 cores due to the state-space explosion problem. As a more scalable approach for next-generation many-core systems, we propose a theorem-proving-based methodology for the formal verification of DDTM schemes. The proposed approach allows the specification and verification of both functional and timing properties for any number of cores and for all times. For this purpose, the paper provides a higher-order-logic formalization of a generic DDTM scheme. The generic model can be specialized to formally specify most existing DDTM schemes and thus to formally verify their thermal properties, such as temperature bounds and balancing, and the time to reach thermal stability, as higher-order-logic theorems. As an illustrative example, the paper presents a formal model and analysis of a Distributed Task Migration based DDTM scheme for many-core systems.
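
To make the phrase "for any number of cores and for all times" concrete, the following is a minimal sketch of what such a theorem can look like. It is a toy model written in Lean 4, chosen here only for illustration; it is not the paper's formalization. The names `Chip`, `step`, `run`, and `tmax` are hypothetical, and the capping rule stands in for a real DDTM policy purely as an assumption.

```lean
-- Hypothetical toy model: a chip with `n` cores maps each core index
-- to a temperature, measured in natural-number units.
abbrev Chip (n : Nat) := Fin n → Nat

-- Assumed management step: every core's temperature is capped at `tmax`.
-- A real DDTM scheme (e.g. task migration) would be far more involved.
def step {n : Nat} (tmax : Nat) (c : Chip n) : Chip n :=
  fun i => min (c i) tmax

-- The chip state after `t` management steps.
def run {n : Nat} (tmax : Nat) (c : Chip n) : Nat → Chip n
  | 0 => c
  | t + 1 => step tmax (run tmax c t)

-- After one step, every core of a chip of any size `n` is within the bound.
theorem step_bounded {n : Nat} (tmax : Nat) (c : Chip n) (i : Fin n) :
    step tmax c i ≤ tmax := by
  show min (c i) tmax ≤ tmax
  exact Nat.min_le_right ..

-- The bound holds at every time after the first step, for every `n`.
theorem run_bounded {n : Nat} (tmax : Nat) (c : Chip n) (t : Nat) (i : Fin n) :
    run tmax c (t + 1) i ≤ tmax := by
  show step tmax (run tmax c t) i ≤ tmax
  exact step_bounded tmax (run tmax c t) i
```

Both theorems quantify over the core count `n` and, in `run_bounded`, over the time `t`, so no state space is enumerated. That is the scalability argument the abstract makes against model checking.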
