Comprehensive and Efficient Design Parameter Selection for Soft Error Resilient Processors via Universal Rules

Soft errors have been significantly degrading the reliability of current processors whose feature sizes and supply voltages are fast scaling down. In this paper, we propose two effective approaches to characterize processor reliability against soft errors at presilicon stage. By utilizing a rule search strategy named Patient Rule Induction Method (PRIM), we are capable of generating a set of selective rules on key design parameters. These rules quantify the design space subregion with the lowest effective soft error rate (SER), thus providing useful guidelines in designing reliable processors. Furthermore, we also propose to use Classification and Regression Trees (CART) to partition the design space into a number of small subregions each being associated with a representative SER value. This gives the processor designer a global view of the SER distribution, enabling a comprehensive analysis over the entire design space. More importantly, both approaches generate “universal” models whose effectiveness is validated with a set of test programs unseen to training. Compared to traditional application-specific design space studies, our models' cross-program capability can save great training effort in the era of multithreading. Finally, a case study on multiprocessors is performed to simultaneously balance multiple design metrics, including reliability, performance, and power.

[1]  Tao Li,et al.  Characterizing Microarchitecture Soft Error Vulnerability Phase Behavior , 2006, 14th IEEE International Symposium on Modeling, Analysis, and Simulation.

[2]  Anand Sivasubramaniam,et al.  Characterizing the soft error vulnerability of multicores running multithreaded applications , 2010, SIGMETRICS '10.

[3]  Irith Pomeranz,et al.  Transient-fault recovery for chip multiprocessors , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[4]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[5]  Sanjay J. Patel,et al.  Examining ACE analysis reliability estimates using fault-injection , 2007, ISCA '07.

[6]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[7]  Kevin Skadron,et al.  CMP design space exploration subject to physical constraints , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[8]  J. Fortes,et al.  Sim-SODA : A Unified Framework for Architectural Level Software Reliability Analysis , 2006 .

[9]  Xiaodong Li,et al.  Online Estimation of Architectural Vulnerability Factor for Soft Errors , 2008, 2008 International Symposium on Computer Architecture.

[10]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[11]  Anand Sivasubramaniam,et al.  Mechanisms for bounding vulnerabilities of processor structures , 2007, ISCA '07.

[12]  Ronald G. Dreslinski,et al.  The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.

[13]  Shubhendu S. Mukherjee,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[14]  Sally A. McKee,et al.  Efficiently exploring architectural design spaces via predictive modeling , 2006, ASPLOS XII.

[15]  Bin Li,et al.  Reliability-Constrained Processor Performance Optimization via Design Parameter Selection , 2009 .

[16]  Michael F. P. O'Boyle,et al.  Microarchitectural Design Space Exploration Using an Architecture-Centric Approach , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[17]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[18]  Bin Li,et al.  Universal rules guided design parameter selection for soft error resilient processors , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.

[19]  James L. Walsh,et al.  IBM experiments in soft fails in computer electronics (1978-1994) , 1996, IBM J. Res. Dev..

[20]  Bin Li,et al.  Two-level soft error vulnerability prediction on SMT/CMP architectures , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).

[21]  David M. Brooks,et al.  Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.

[22]  Ramon Canal,et al.  Design space exploration for multicore architectures: a power/performance/thermal view , 2006, ICS '06.

[23]  David R. Kaeli,et al.  Eliminating microarchitectural dependency from Architectural Vulnerability , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[24]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[25]  Bin Li,et al.  Versatile prediction and fast estimation of Architectural Vulnerability Factor from processor performance metrics , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[26]  David M. Brooks,et al.  CPR: Composable performance regression for scalable multiprocessor models , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[27]  Bin Li,et al.  Efficient Microarchitectural Vulnerabilities Prediction Using Boosted Regression Trees and Patient Rule Inductions , 2010, IEEE Transactions on Computers.

[28]  Tao Li,et al.  Managing multi-core soft-error reliability through utility-driven cross domain optimization , 2008, 2008 International Conference on Application-Specific Systems, Architectures and Processors.

[29]  Tao Li,et al.  Combined circuit and microarchitecture techniques for effective soft error robustness in SMT processors , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[30]  Bin Li,et al.  Predicting Architectural Vulnerability on Multithreaded Processors under Resource Contention and Sharing , 2013, IEEE Transactions on Dependable and Secure Computing.

[31]  S. T. Buckland,et al.  An Introduction to the Bootstrap. , 1994 .

[32]  David M. Brooks,et al.  Illustrative Design Space Studies with Microarchitectural Regression Models , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[33]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[34]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[35]  David R. Kaeli,et al.  Using hardware vulnerability factors to enhance AVF analysis , 2010, ISCA.

[36]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .

[37]  Irith Pomeranz,et al.  Transient-fault recovery using simultaneous multithreading , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[38]  Tao Li,et al.  Modeling and Analyzing the Effect of Microarchitecture Design Parameters on Microprocessor Soft Error Vulnerability , 2008, 2008 IEEE International Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems.

[39]  Sudhanva Gurumurthi,et al.  Dynamic prediction of architectural vulnerability from microarchitectural state , 2007, ISCA '07.

[40]  Joel S. Emer,et al.  Techniques to reduce the soft error rate of a high-performance microprocessor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[41]  C. Pipper,et al.  [''R"--project for statistical computing]. , 2008, Ugeskrift for laeger.

[42]  Xin Fu,et al.  An Analysis of Microarchitecture Vulnerability to Soft Errors on Simultaneous Multithreaded Architectures , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[43]  Tao Li,et al.  Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[44]  Arijit Biswas,et al.  Computing architectural vulnerability factors for address-based structures , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[45]  Xiaodong Li,et al.  SoftArch: an architecture-level tool for modeling and analyzing soft errors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[46]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.

[47]  Nicholas I. Fisher,et al.  Bump hunting in high-dimensional data , 1999, Stat. Comput..