Studying error propagation on application data structure and hardware

[1]  David Kaeli,et al.  Characterizing and Exploiting Soft Error Vulnerability Phase Behavior in GPU Applications , 2022, IEEE Transactions on Dependable and Secure Computing.

[2]  J. A. Moríñigo,et al.  Error resilience of three GMRES implementations under fault injection , 2021, The Journal of Supercomputing.

[3]  Valerio Pascucci,et al.  Understanding a program's resiliency through error propagation , 2021, PPoPP.

[4]  Abdul Rehman Anwer,et al.  GPU-Trident: Efficient Modeling of Error Propagation in GPU Programs , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.

[5]  M. Erez,et al.  Runtime-Guided ECC Protection using Online Estimation of Memory Vulnerability , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.

[6]  Heng Yin,et al.  Chaser: An Enhanced Fault Injection Tool for Tracing Soft Errors in MPI Applications , 2020, 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[7]  Valerio Pascucci,et al.  SpotSDC: Revealing the Silent Data Corruption Propagation in High-Performance Computing Systems , 2020, IEEE Transactions on Visualization and Computer Graphics.

[8]  Jörg Nolte,et al.  Real-Time Dynamic Hardware Reconfiguration for Processors with Redundant Functional Units , 2020, 2020 IEEE 23rd International Symposium on Real-Time Distributed Computing (ISORC).

[9]  Ignacio Laguna,et al.  Detecting and reproducing error-code propagation bugs in MPI implementations , 2020, PPoPP.

[10]  Ricardo Reis,et al.  Evaluation of Compilers Effects on OpenMP Soft Error Resiliency , 2019, 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).

[11]  Marc Casas,et al.  A Vulnerability Factor for ECC-protected Memory , 2019, 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design (IOLTS).

[12]  Dong Li,et al.  MOARD: Modeling Application Resilience to Transient Faults on Data Objects , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[13]  A. Bosio,et al.  SyRA: Early System Reliability Analysis for Cross-Layer Soft Errors Resilience in Memory Arrays of Microprocessor Systems , 2019, IEEE Transactions on Computers.

[14]  Davy Pissoort,et al.  An Improved Data Error Detection Technique for Dependable Embedded Software , 2018, 2018 IEEE 23rd Pacific Rim International Symposium on Dependable Computing (PRDC).

[15]  Stephen W. Keckler,et al.  Optimizing Software-Directed Instruction Replication for GPU Error Detection , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.

[16]  Seyyed Amir Asghari,et al.  Enhancing transient fault tolerance in embedded systems through an OS task level redundancy approach , 2018, Future Gener. Comput. Syst.

[17]  Martin Schulz,et al.  FlipTracker: Understanding Natural Error Resilience in HPC Applications , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.

[18]  Ricardo Reis,et al.  Evaluation of Compiler Optimization Flags Effects on Soft Error Resiliency , 2018, 2018 31st Symposium on Integrated Circuits and Systems Design (SBCCI).

[19]  Paolo Rech,et al.  Code-Dependent and Architecture-Dependent Reliability Behaviors , 2018, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[20]  Karthik Pattabiraman,et al.  Modeling Input-Dependent Error Propagation in Programs , 2018, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[21]  Sam Ainsworth,et al.  Parallel Error Detection Using Heterogeneous Cores , 2018, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[22]  Karthik Pattabiraman,et al.  Modeling Soft-Error Propagation in Programs , 2018, 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[23]  Yang Jun,et al.  A method of soft error propagation based on cellular automata , 2018, 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA).

[24]  Xavier Martorell,et al.  Analysis of the Impact Factors on Data Error Propagation in HPC Applications , 2018, 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP).

[25]  Rajesh K. Gupta,et al.  Reliability-Aware Data Placement for Heterogeneous Memory Architecture , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[26]  György Györök,et al.  Duplicated control unit based embedded fault-masking systems , 2017, 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY).

[27]  Paolo Rech,et al.  Register File Criticality and Compiler Optimization Effects on Embedded Microprocessor Reliability , 2017, IEEE Transactions on Nuclear Science.

[28]  Johan Karlsson,et al.  One Bit is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors , 2017, 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[29]  Aviral Shrivastava,et al.  Protecting Caches from Soft Errors , 2017, ACM Trans. Embed. Comput. Syst.

[30]  Gokcen Kestor,et al.  Exploring the Effect of Compiler Optimizations on the Reliability of HPC Applications , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[31]  Olga V. Mamoutova,et al.  On design of cache with efficient soft error protection , 2017, 2017 IEEE 37th International Conference on Electronics and Nanotechnology (ELNANO).

[32]  Gianfranco Politano,et al.  Cross-layer system reliability assessment framework for hardware faults , 2016, 2016 IEEE International Test Conference (ITC).

[33]  Peng-Sheng Chen,et al.  A Software-Based Redundant Execution Programming Model for Transient Fault Detection and Correction , 2016, 2016 45th International Conference on Parallel Processing Workshops (ICPPW).

[34]  Jeffrey S. Vetter,et al.  Reducing soft-error vulnerability of caches using data compression , 2016, 2016 International Great Lakes Symposium on VLSI (GLSVLSI).

[35]  Mehdi Baradaran Tahoori,et al.  Online soft-error vulnerability estimation for memory arrays , 2016, 2016 IEEE 34th VLSI Test Symposium (VTS).

[36]  Eric Cheng,et al.  CLEAR: Cross-layer exploration for architecting resilience: Combining hardware and software techniques to tolerate soft errors in processor cores , 2016, 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC).

[37]  Xavier Martorell,et al.  Analyzing Data-Error Propagation Effects in High-Performance Computing , 2016, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).

[38]  Gokcen Kestor,et al.  Understanding the propagation of transient errors in HPC applications , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[39]  Karthik Pattabiraman,et al.  LLFI: An Intermediate Code-Level Fault Injection Tool for Hardware Faults , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[40]  Osman S. Unsal,et al.  NanoCheckpoints: A Task-Based Asynchronous Dataflow Framework for Efficient and Scalable Checkpoint/Restart , 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.

[41]  Dong Li,et al.  Quantitatively Modeling Application Resilience with the Data Vulnerability Factor , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[42]  Gabriel L. Nazar,et al.  Adaptive Low-Power Architecture for High-Performance and Reliable Embedded Computing , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[43]  Christos D. Antonopoulos,et al.  GemFI: A Fault Injection Tool for Studying the Behavior of Applications on Unreliable Substrates , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[44]  Franck Cappello,et al.  Toward Exascale Resilience: 2014 update , 2014, Supercomput. Front. Innov.

[45]  Xavier Vera,et al.  Reducing DUE-FIT of caches by exploiting acoustic wave detectors for error recovery , 2013, 2013 IEEE 19th International On-Line Testing Symposium (IOLTS).

[46]  Rolf Riesen,et al.  Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing , 2012, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.

[47]  Mahmut T. Kandemir,et al.  Thread vulnerability in parallel applications , 2012, J. Parallel Distributed Comput.

[48]  Timothy A. Davis,et al.  The University of Florida sparse matrix collection , 2011, TOMS.

[49]  R. Leveugle,et al.  Microprocessor soft error rate prediction based on cache memory analysis , 2011, 2011 12th European Conference on Radiation and Its Effects on Components and Systems.

[50]  Padma Raghavan,et al.  Characterizing the impact of soft errors on iterative methods in scientific computing , 2011, ICS '11.

[51]  Ben H. H. Juurlink,et al.  Protective redundancy overhead reduction using instruction vulnerability factor , 2010, Conf. Computing Frontiers.

[52]  Arshad Jhumka,et al.  Towards Understanding the Importance of Variables in Dependable Software , 2010, 2010 European Dependable Computing Conference.

[53]  Shuai Wang,et al.  On the Exploitation of Narrow-Width Values for Improving Register File Reliability , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[54]  David R. Kaeli,et al.  Eliminating microarchitectural dependency from Architectural Vulnerability , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[55]  Sarita V. Adve,et al.  Understanding the propagation of hard errors to software and implications for resilient system design , 2008, ASPLOS.

[56]  R.C. Baumann,et al.  Radiation-induced soft errors in advanced semiconductor technologies , 2005, IEEE Transactions on Device and Materials Reliability.

[57]  Shekhar Y. Borkar,et al.  Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.

[58]  Wei Zhang,et al.  Computing cache vulnerability to transient errors and its implication , 2005, 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'05).

[59]  Wei Zhang,et al.  Compiler-guided register reliability improvement against soft errors , 2005, EMSOFT.

[60]  Mehdi Baradaran Tahoori,et al.  An analytical approach for soft error rate estimation in digital circuits , 2005, 2005 IEEE International Symposium on Circuits and Systems.

[61]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.

[62]  Mehdi Baradaran Tahoori,et al.  Balancing Performance and Reliability in the Memory Hierarchy , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.

[63]  Joel S. Emer,et al.  The soft error problem: an architectural perspective , 2005, 11th International Symposium on High-Performance Computer Architecture.

[64]  Babak Falsafi,et al.  Fingerprinting: bounding soft-error-detection latency and bandwidth , 2004, IEEE Micro.

[65]  Joel S. Emer,et al.  Techniques to reduce the soft error rate of a high-performance microprocessor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.

[66]  Joel Emer,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.

[67]  Shubhendu S. Mukherjee,et al.  Measuring Architectural Vulnerability Factors , 2003, IEEE Micro.

[68]  Massimo Violante,et al.  An accurate analysis of the effects of soft errors in the instruction and data caches of a pipelined microprocessor , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[69]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[70]  Neeraj Suri,et al.  On the placement of software mechanisms for detection of data errors , 2002, Proceedings International Conference on Dependable Systems and Networks.

[71]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multi-threading alternatives , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[72]  R. Baumann Soft errors in advanced semiconductor devices-part I: the three radiation sources , 2001.

[73]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture.

[74]  Kishor S. Trivedi,et al.  A cache error propagation model , 1997, Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems.

[75]  Jingjing Gu,et al.  Vulnerability Analysis of Instructions for SDC-Causing Error Detection , 2019, IEEE Access.

[76]  Babak Falsafi,et al.  The Granularity of Soft-Error Containment in Shared-Memory Multiprocessors , 2006.

[77]  James L. Walsh,et al.  Field testing for cosmic ray soft errors in semiconductor memories , 1996, IBM J. Res. Dev.

[78]  N. R. Wagner Fingerprinting , 1983, 1983 IEEE Symposium on Security and Privacy.