Synergistic timing speculation for multi-threaded programs

In this paper, we address the problem of timing speculation for multi-threaded workloads executing on a multi-core processor. Our approach is based on a new observation: path sensitization delays are heterogeneous across the threads of a multi-threaded program. Leveraging this heterogeneity, we propose Synergistic Timing Speculation (SynTS) to jointly optimize the energy and execution time of multi-threaded applications. In particular, SynTS couples a sampling-based online error probability estimation technique with a polynomial-time algorithm to optimally determine the voltage, frequency, and amount of timing speculation for each thread. Our experimental evaluations, based on detailed cross-layer simulations, demonstrate that SynTS reduces the energy-delay product (EDP) by up to 21% compared to existing timing speculation schemes.
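As a rough illustration of how per-thread error probability estimation and a polynomial-time selection step can fit together, the Python sketch below uses a deliberately simplified model assumed only for this example; it is not the SynTS algorithm. All of the following are hypothetical: each thread supplies a list of sampled sensitized-path delays, and its error probability at clock period T is estimated as the fraction of samples longer than T; every timing error costs `REPLAY_PENALTY` extra recovery cycles; supply voltage is held fixed, so only the clock period is chosen (1.0 is treated as the nominal period); threads meet at a single barrier, and a thread waiting at the barrier draws no power. Because the region ends when the slowest thread arrives, trying each per-thread execution time as a candidate deadline and letting every thread independently pick its lowest-energy period that meets that deadline finds the minimum-EDP assignment under this model, with a search step that runs in O((nk)^2) time for n threads and k candidate periods.

```python
from typing import List, Sequence, Tuple

# Hypothetical model parameters (assumptions for illustration only).
REPLAY_PENALTY = 5.0  # extra cycles spent recovering from one timing error
E_CYCLE = 1.0         # dynamic energy per executed cycle (arbitrary units)
P_LEAK = 0.2          # leakage power while a thread is running (arbitrary units)


def error_probability(delay_samples: Sequence[float], period: float) -> float:
    """Sampling-based estimate: fraction of sampled sensitized-path delays
    that would miss the clock edge at the given period."""
    if not delay_samples:
        raise ValueError("need at least one delay sample")
    late = sum(1 for d in delay_samples if d > period)
    return late / len(delay_samples)


def thread_cost(cycles: int, delay_samples: Sequence[float],
                period: float) -> Tuple[float, float]:
    """Return (execution time, energy) of one thread at one clock period."""
    p_err = error_probability(delay_samples, period)
    # Each erroneous cycle is assumed to cost REPLAY_PENALTY recovery cycles.
    effective_cycles = cycles * (1.0 + REPLAY_PENALTY * p_err)
    exec_time = effective_cycles * period
    energy = effective_cycles * E_CYCLE + P_LEAK * exec_time
    return exec_time, energy


def choose_periods(threads: List[Tuple[int, Sequence[float]]],
                   periods: Sequence[float]) -> Tuple[float, List[float]]:
    """Pick one clock period per thread to minimise the energy-delay product
    of a barrier-synchronised region (toy model, fixed voltage)."""
    # options[i] holds (time, energy, period) for every period of thread i.
    options = [[(*thread_cost(c, s, T), T) for T in periods] for c, s in threads]
    # The region ends when the slowest thread reaches the barrier, so the
    # optimal region delay equals one of the per-thread execution times.
    deadlines = sorted({t for opts in options for (t, _, _) in opts})
    best_edp, best_periods = float("inf"), []
    for deadline in deadlines:
        picks = []
        for opts in options:
            # Lowest-energy period of this thread that still meets the deadline.
            feasible = [(e, T, t) for (t, e, T) in opts if t <= deadline]
            if not feasible:
                break  # some thread cannot finish by this deadline
            picks.append(min(feasible))
        else:
            energy = sum(e for e, _, _ in picks)
            delay = max(t for _, _, t in picks)
            if energy * delay < best_edp:
                best_edp, best_periods = energy * delay, [T for _, T, _ in picks]
    return best_edp, best_periods


if __name__ == "__main__":
    threads = [
        (100, [0.95, 0.85, 0.70, 0.60]),  # thread A: long sensitized paths
        (100, [0.70, 0.60, 0.50, 0.75]),  # thread B: short sensitized paths
    ]
    edp, chosen = choose_periods(threads, [1.0, 0.9, 0.8])
    print(edp, chosen)  # 23600.0 [1.0, 0.8]
```

In this toy input, none of thread B's sampled delays exceed 0.8, so B can speculate aggressively without errors while thread A stays at the nominal period. Running both threads at 1.0 would instead give an EDP of (120 + 120) × 100 = 24000, which hints at why exploiting per-thread delay heterogeneity can pay off.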
