Efficient Microarchitectural Vulnerabilities Prediction Using Boosted Regression Trees and Patient Rule Inductions

The shrinking processor feature size, lower threshold voltage, and increasing clock frequency make modern processors highly vulnerable to transient faults. Architectural Vulnerability Factor (AVF) reflects the possibility that a transient fault eventually causes a visible error in the program output, and it indicates a system's susceptibility to transient faults. Therefore, the awareness of the AVF, especially at early design stage, is greatly helpful to achieve a trade-off between system performance and reliability. However, tracking the AVF during program execution is extremely costly, which makes accurate AVF prediction extraordinarily attractive to computer architects. In this paper, we propose to use Boosted Regression Trees (BRT), a nonparametric tree-based predictive modeling scheme, to identify the correlation across workloads, execution phases, and processor configurations between a key processor structure's AVF and various performance metrics. The proposed method not only makes an accurate prediction but also quantitatively illustrates individual performance variable's importance to the AVF. A quantitative comparison between our model and conventional linear regression is performed in terms of model stability, showing that our model is more stable when the model size varies. Moreover, to reduce the prediction complexity, we also utilize a technique named Patient Rule Induction Method (PRIM) to extract some simple selecting rules on important metrics. Applying these rules during runtime can fast identify execution intervals with a relatively high AVF. A case study that enables PRIM-based ROB redundancy has been performed to demonstrate a possible application of the trained PRIM rules.

[1]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[2]  David I. August,et al.  Design and evaluation of hybrid fault-detection systems , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[3]  David M. Brooks,et al.  Roughness of microarchitectural design topologies and its implications for optimization , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[4]  J. Fortes,et al.  Sim-SODA : A Unified Framework for Architectural Level Software Reliability Analysis , 2006 .

[5]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[6]  Todd M. Austin,et al.  A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor , 2003, MICRO.

[7]  Arijit Biswas,et al.  Computing architectural vulnerability factors for address-based structures , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[8]  Irith Pomeranz,et al.  Transient-fault recovery for chip multiprocessors , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[9]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[10]  Tao Li,et al.  Characterizing Microarchitecture Soft Error Vulnerability Phase Behavior , 2006, 14th IEEE International Symposium on Modeling, Analysis, and Simulation.

[11]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[12]  Xin Fu,et al.  An Analysis of Microarchitecture Vulnerability to Soft Errors on Simultaneous Multithreaded Architectures , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[13]  Anand Sivasubramaniam,et al.  Mechanisms for bounding vulnerabilities of processor structures , 2007, ISCA '07.

[14]  Xiaodong Li,et al.  Online Estimation of Architectural Vulnerability Factor for Soft Errors , 2008, 2008 International Symposium on Computer Architecture.

[15]  Xiaodong Li,et al.  SoftArch: an architecture-level tool for modeling and analyzing soft errors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[16]  Michael F. P. O'Boyle,et al.  Microarchitectural Design Space Exploration Using an Architecture-Centric Approach , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[17]  Sudhanva Gurumurthi,et al.  Dynamic prediction of architectural vulnerability from microarchitectural state , 2007, ISCA '07.

[18]  Arijit Biswas,et al.  Computing Accurate AVFs using ACE Analysis on Performance Models: A Rebuttal , 2008, IEEE Computer Architecture Letters.

[19]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[20]  Sanjay J. Patel,et al.  Examining ACE analysis reliability estimates using fault-injection , 2007, ISCA '07.

[21]  Joel S. Emer,et al.  Techniques to reduce the soft error rate of a high-performance microprocessor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[22]  Tao Li,et al.  Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[23]  L. Breiman Heuristics of instability and stabilization in model selection , 1996 .

[24]  Irith Pomeranz,et al.  Transient-fault recovery using simultaneous multithreading , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[25]  Nicholas I. Fisher,et al.  Bump hunting in high-dimensional data , 1999, Stat. Comput..

[26]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[27]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.

[28]  Bin Li,et al.  Versatile prediction and fast estimation of Architectural Vulnerability Factor from processor performance metrics , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[29]  V. Barnett,et al.  Applied Linear Statistical Models , 1975 .

[30]  James L. Walsh,et al.  IBM experiments in soft fails in computer electronics (1978-1994) , 1996, IBM J. Res. Dev..

[31]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.