Evaluating and Accelerating High-Fidelity Error Injection for HPC
Mattan Erez | Sangkug Lym | Chun-Kai Chang | Michael B. Sullivan | Nicholas Kelly