PEPPA-X: Finding Program Test Inputs to Bound Silent Data Corruption Vulnerability in HPC Applications
Guanpeng Li | Shengjian Guo | Aabid Shamji | Md Hasanur Rahman
[1] Roger Johansson,et al. On the Impact of Hardware Faults - An Investigation of the Relationship between Workload Inputs and Failure Mode Distributions , 2012, SAFECOMP.
[2] Lei Xu,et al. Life after Speech Recognition: Fuzzing Semantic Misinterpretation for Voice Assistant Applications , 2019, NDSS.
[3] Martin Schulz,et al. IPAS: Intelligent protection against silent output corruption in scientific applications , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[4] Sriram Sankar,et al. Silent Data Corruptions at Scale , 2021, ArXiv.
[5] Franck Cappello,et al. Towards End-to-end SDC Detection for HPC Applications Equipped with Lossy Compression , 2020, 2020 IEEE International Conference on Cluster Computing (CLUSTER).
[6] Martin Schulz,et al. REFINE: Realistic Fault Injection via Compiler-based Instrumentation for Accuracy, Portability and Speed , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] LADR , 2018, Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing.
[8] Franck Cappello,et al. Addressing failures in exascale computing , 2014, Int. J. High Perform. Comput. Appl.
[9] Adwait Jog,et al. Enabling Software Resilience in GPGPU Applications via Partial Thread Protection , 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE).
[10] Peng Li,et al. SAVIOR: Towards Bug-Driven Hybrid Testing , 2019, 2020 IEEE Symposium on Security and Privacy (SP).
[11] Corina S. Pasareanu,et al. DifFuzz: Differential Fuzzing for Side-Channel Analysis , 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).
[12] Dong Li,et al. Quantitatively Modeling Application Resilience with the Data Vulnerability Factor , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Cen Zhang,et al. MUZZ: Thread-aware Grey-box Fuzzing for Effective Bug Hunting in Multithreaded Programs , 2020, USENIX Security Symposium.
[14] Harshitha Menon,et al. DisCVar: discovering critical variables using algorithmic differentiation for transient faults , 2018, PPoPP.
[15] Guanpeng Li,et al. A Tale of Two Injectors: End-to-End Comparison of IR-Level and Assembly-Level Fault Injection , 2019, 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE).
[16] C. Constantinescu,et al. Intermittent faults and effects on reliability of integrated circuits , 2008, 2008 Annual Reliability and Maintainability Symposium.
[17] Cornelius Aschermann,et al. Ijon: Exploring Deep State Spaces via Fuzzing , 2020, 2020 IEEE Symposium on Security and Privacy (SP).
[18] Yang Liu,et al. Cerebro: context-aware adaptive fuzzing for effective vulnerability detection , 2019, ESEC/SIGSOFT FSE.
[19] Johan Karlsson,et al. One Bit is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors , 2017, 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[20] Karthik Pattabiraman,et al. Modeling Soft-Error Propagation in Programs , 2018, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[21] Evgenia Smirni,et al. Practical Resilience Analysis of GPGPU Applications in the Presence of Single- and Multi-Bit Faults , 2021, IEEE Transactions on Computers.
[22] Amin Ansari,et al. Shoestring: probabilistic soft error reliability on the cheap , 2010, ASPLOS XV.
[23] D. Kaeli,et al. ArmorAll: Compiler-based Resilience Targeting GPU Applications , 2020, ACM Trans. Archit. Code Optim..
[24] Karthik Pattabiraman,et al. Quantifying the Accuracy of High-Level Fault Injection Techniques for Hardware Faults , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[25] Dong Li,et al. Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[26] Bihuan Chen,et al. Hawkeye: Towards a Desired Directed Grey-box Fuzzer , 2018, CCS.
[27] Daniel P. Siewiorek,et al. Observations on the Effects of Fault Manifestation as a Function of Workload , 1992, IEEE Trans. Computers.
[28] Christof Fetzer,et al. SpecFuzz: Bringing Spectre-type vulnerabilities to the surface , 2019, USENIX Security Symposium.
[29] Sarita V. Adve,et al. Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults , 2012, ASPLOS XVII.
[30] J. Karlsson,et al. The Effects of Workload Input Domain On Fault Injection Results , 1999.
[31] Near-Zero Downtime Recovery From Transient-Error-Induced Crashes , 2021, IEEE Transactions on Parallel and Distributed Systems.
[32] G. B. Mathews. On the Partition of Numbers , 1896.
[33] Sarita V. Adve,et al. Low-cost program-level detectors for reducing silent data corruptions , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[34] Evgenia Smirni. Practical Reliability Analysis of GPGPUs in the Wild: From Systems to Applications , 2019, ICPE.
[35] Nicolas Wu,et al. Reasoning about effect interaction by fusion , 2021, Proc. ACM Program. Lang..
[36] R. Haupt. Optimum population size and mutation rate for a simple real genetic algorithm that optimizes array factors , 2000, IEEE Antennas and Propagation Society International Symposium. Transmitting Waves of Progress to the Next Millennium. 2000 Digest. Held in conjunction with: USNC/URSI National Radio Science Meeting (C.
[37] Dong Li,et al. MOARD: Modeling Application Resilience to Transient Faults on Data Objects , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[38] Karthik Pattabiraman,et al. Modeling Input-Dependent Error Propagation in Programs , 2018, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[39] Martin Schulz,et al. FlipTracker: Understanding Natural Error Resilience in HPC Applications , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[40] Dinghao Wu,et al. SQUIRREL: Testing Database Management Systems with Language Validity and Coverage Feedback , 2020, CCS.
[41] Dong Li,et al. PARIS: Predicting Application Resilience Using Machine Learning , 2018, J. Parallel Distributed Comput..
[42] SUGAR: Speeding Up GPGPU Application Resilience Estimation with Input Sizing , 2021, Proc. ACM Meas. Anal. Comput. Syst..
[43] Darko Marinov,et al. Minotaur: Adapting Software Testing Techniques for Hardware Errors , 2019, ASPLOS.
[44] Bin Nie,et al. Fault Site Pruning for Practical Reliability Analysis of GPGPU Applications , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[45] Valerio Pascucci,et al. Understanding a program's resiliency through error propagation , 2021, PPoPP.
[46] Pradip Bose,et al. Understanding Error Propagation in GPGPU Applications , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[47] MemLock , 2020, Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering.
[48] John Shalf,et al. DOE Advanced Scientific Computing Advisory Subcommittee (ASCAC) Report: Top Ten Exascale Research Challenges , 2014.
[49] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[50] Jacob A. Abraham,et al. Quantitative evaluation of soft error injection techniques for robust system design , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[51] Santosh Pande,et al. LADR: low-cost application-level detector for reducing silent output corruptions , 2018, HPDC.
[52] Franck Cappello,et al. FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural Networks , 2020, IEEE Transactions on Parallel and Distributed Systems.
[53] Abdul Rehman Anwer,et al. GPU-trident: efficient modeling of error propagation in GPU programs , 2020, SC.
[54] Gokcen Kestor,et al. Understanding the propagation of transient errors in HPC applications , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[55] Craig B. Zilles,et al. A characterization of instruction-level error derating and its implications for error detection , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[56] Edward J. McCluskey,et al. Control-flow checking by software signatures , 2002, IEEE Trans. Reliab..
[57] Bin Nie,et al. Machine Learning Models for GPU Error Prediction in a Large Scale HPC System , 2018, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[58] Minotaur , 2019, Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems.
[59] Ravishankar K. Iyer,et al. Hauberk: Lightweight Silent Data Corruption Error Detector for GPGPU , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[60] K. Mohror,et al. DisCVar: discovering critical variables using algorithmic differentiation for transient faults , 2018, PPoPP.
[61] Isil Dillig,et al. Singularity: pattern fuzzing for worst case complexity , 2018, ESEC/SIGSOFT FSE.
[62] Yang Liu,et al. MEMLOCK: Memory Usage Guided Fuzzing , 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).