Optimizing Soft Error Reliability Through Scheduling on Heterogeneous Multicore Processors

Reliability against soft errors is an increasingly important issue as process technology continues to shrink. In this paper, we show that applications exhibit different reliability characteristics on big, high-performance cores versus small, power-efficient cores, and that there is significant opportunity to improve system reliability through reliability-aware scheduling on heterogeneous multicore processors. We monitor the reliability characteristics of all running applications and dynamically schedule them onto the different core types of a heterogeneous multicore to maximize system reliability. On a heterogeneous multicore processor with two big cores and two small cores, reliability-aware scheduling improves reliability by 25.4 percent on average (and up to 60.2 percent) compared to performance-optimized scheduling, while degrading performance by only 6.3 percent. We also introduce a novel system-level reliability metric for multiprogram workloads on (heterogeneous) multicores. We provide a trade-off analysis among reliability-, power-, and performance-optimized scheduling, and evaluate reliability-aware scheduling under performance constraints and for unprotected L1 caches. We also extend our scheduling mechanisms to multithreaded programs. The hardware cost of supporting our reliability-aware scheduler is limited to 296 bytes per core.
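To make the scheduling idea concrete, the following is a minimal sketch of how applications might be mapped to core types once a per-application vulnerability estimate is available for each core type. It is not the scheduler from the paper: the function name `pick_big_core_apps`, the sample numbers, and the use of a plain sum of per-application vulnerabilities as the objective are all assumptions made for illustration, and the paper's own system-level reliability metric is not reproduced here.

```python
from itertools import combinations


def pick_big_core_apps(vuln_big, vuln_small, n_big):
    """Choose which applications run on big cores.

    vuln_big / vuln_small map an application name to a hypothetical
    vulnerability estimate (lower is better) on a big or small core.
    Exhaustively tries every way of placing n_big applications on big
    cores and returns (apps_on_big, total) for the smallest summed
    vulnerability. The summed objective is an illustrative assumption.
    """
    apps = sorted(vuln_big)
    best = None
    for on_big in combinations(apps, n_big):
        total = sum(
            vuln_big[a] if a in on_big else vuln_small[a] for a in apps
        )
        if best is None or total < best[1]:
            best = (set(on_big), total)
    return best


# Hypothetical estimates for four applications on a 2-big, 2-small system.
vuln_big = {"A": 9.0, "B": 4.0, "C": 7.0, "D": 3.0}
vuln_small = {"A": 3.0, "B": 3.5, "C": 2.0, "D": 2.5}

on_big, total = pick_big_core_apps(vuln_big, vuln_small, n_big=2)
print(sorted(on_big), total)  # prints ['B', 'D'] 12.0
```

With these made-up numbers, applications A and C become much more vulnerable on a big core, so the sketch places them on the small cores. Exhaustive search is only practical for a handful of applications; a performance constraint, as evaluated in the paper, would have to be added as an extra filter on the candidate assignments.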
