Experimental and Analytical Study of Xeon Phi Reliability
Israel Koren | Philippe Olivier Alexandre Navaux | Laércio Lima Pilla | Heather M. Quinn | Nathan DeBardeleben | Paolo Rech | Sean Blanchard | Daniel A. G. de Oliveira
[1] Joel Emer,et al. SASSIFI: Evaluating Resilience of GPU Applications, 2015.
[2] B. L. Bhuva,et al. Comparison of Combinational and Sequential Error Rates for a Deep Submicron Process , 2011, IEEE Transactions on Nuclear Science.
[3] Stephen W. Keckler,et al. SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation , 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[4] Sudhakar Yalamanchili,et al. Reliability-performance tradeoffs between 2.5D and 3D-stacked DRAM processors , 2016, 2016 IEEE International Reliability Physics Symposium (IRPS).
[5] J-C. Laprie,et al. Dependable Computing and Fault Tolerance: Concepts and Terminology, 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, 'Highlights from Twenty-Five Years'.
[6] John Shalf,et al. DOE Advanced Scientific Computing Advisory Subcommittee (ASCAC) Report: Top Ten Exascale Research Challenges, 2014.
[7] William M. Jones,et al. Towards Building Resilient Scientific Applications: Resilience Analysis on the Impact of Soft Error and Transient Error Tolerance with the CLAMR Hydrodynamics Mini-App , 2015, 2015 IEEE International Conference on Cluster Computing.
[8] A. Oates,et al. Characterization of Single Bit and Multiple Cell Soft Error Events in Planar and FinFET SRAMs , 2016, IEEE Transactions on Device and Materials Reliability.
[9] Meeta Sharma Gupta,et al. Understanding Soft Error Resiliency of Blue Gene/Q Compute Chip through Hardware Proton Irradiation and Software Fault Injection , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Franck Cappello,et al. Exploring Partial Replication to Improve Lightweight Silent Data Corruption Detection for HPC Applications , 2016, Euro-Par.
[11] Dong Li,et al. Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Luigi Carro,et al. Understanding GPU errors on large-scale HPC systems and the implications for system design and operation , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[13] Steven M. Guertin,et al. Using Benchmarks for Radiation Testing of Microprocessors and FPGAs , 2015, IEEE Transactions on Nuclear Science.
[14] Yo-Hwan Koh,et al. A low power and highly reliable 400Mbps mobile DDR SDRAM with on-chip distributed ECC , 2007, 2007 IEEE Asian Solid-State Circuits Conference.
[15] Robert Baumann,et al. Soft errors in advanced computer systems , 2005, IEEE Design & Test of Computers.
[16] S. Pontarelli,et al. A New Hardware/Software Platform and a New 1/E Neutron Source for Soft Error Studies: Testing FPGAs at the ISIS Facility , 2007, IEEE Transactions on Nuclear Science.
[17] Dimitris Gizopoulos,et al. GUFI: A framework for GPUs reliability assessment , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[18] M. Baze,et al. Comparison of error rates in combinational and sequential logic, 1997.
[19] Michael Nicolaidis. Time redundancy based soft-error tolerance to rescue nanometer technologies , 1999, Proceedings 17th IEEE VLSI Test Symposium (Cat. No.PR00146).
[20] Eduardo Pinheiro,et al. DRAM errors in the wild: a large-scale field study , 2009, SIGMETRICS '09.
[21] Luigi Carro,et al. GPGPUs: How to combine high computational power with high reliability , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[22] Jean-Claude Laprie,et al. Dependable computing: concepts, limits, challenges, 1995.
[23] Dhiraj K. Pradhan,et al. Single element correction in sorting algorithms with minimum delay overhead , 2009, 2009 10th Latin American Test Workshop.
[24] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[25] Pradip Bose,et al. Understanding Error Propagation in GPGPU Applications , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[26] Melvin A. Breuer,et al. Multi-media applications and imprecise computation , 2005, 8th Euromicro Conference on Digital System Design (DSD'05).
[27] L. Carro,et al. An Efficient and Experimentally Tuned Software-Based Hardening Strategy for Matrix Multiplication on GPUs , 2013, IEEE Transactions on Nuclear Science.
[28] David Blaauw,et al. Using Low Cost Erasure and Error Correction Schemes to Improve Reliability of Commodity DRAM Systems , 2016, IEEE Transactions on Computers.
[29] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[30] Hans Werner Meuer,et al. Top500 Supercomputer Sites, 1997.
[31] Thiago Santini,et al. Evaluation and Mitigation of Radiation-Induced Soft Errors in Graphics Processing Units , 2016, IEEE Transactions on Computers.
[32] Jacob A. Abraham,et al. Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.
[33] Bo Fang,et al. GPU-Qin: A methodology for evaluating the error resilience of GPGPU applications , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[34] Gokcen Kestor,et al. Understanding the propagation of transient errors in HPC applications , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[35] Mauricio Hanzich,et al. Mimetic seismic wave modeling including topography on deformed staggered grids, 2014.
[36] Cristian Constantinescu,et al. Impact of deep submicron technology on dependability of VLSI circuits , 2002, Proceedings International Conference on Dependable Systems and Networks.
[37] Luigi Carro,et al. Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[38] Franck Cappello,et al. Addressing failures in exascale computing , 2014, Int. J. High Perform. Comput. Appl.
[39] Laura Monroe,et al. GPU Behavior on a Large HPC Cluster , 2013, Euro-Par Workshops.
[40] R.C. Baumann,et al. Radiation-induced soft errors in advanced semiconductor technologies , 2005, IEEE Transactions on Device and Materials Reliability.
[41] Robyn R. Lutz,et al. Analyzing software requirements errors in safety-critical, embedded systems , 1993, [1993] Proceedings of the IEEE International Symposium on Requirements Engineering.
[42] Ravishankar K. Iyer,et al. An experimental study of soft errors in microprocessors , 2005, IEEE Micro.
[43] Claus Braun,et al. Efficacy and efficiency of algorithm-based fault-tolerance on GPUs , 2013, 2013 IEEE 19th International On-Line Testing Symposium (IOLTS).