From experiment to design - fault characterization and detection in parallel computer systems using computational accelerators
[1] Ravishankar K. Iyer,et al. Hierarchical Simulation Approach to Accurate Fault Modeling for System Dependability Evaluation , 1999, IEEE Trans. Software Eng..
[2] Ravishankar K. Iyer,et al. DEPEND: A Simulation-Based Environment for System Level Dependability Analysis , 1997, IEEE Trans. Computers.
[3] James L. Walsh,et al. IBM experiments in soft fails in computer electronics (1978-1994) , 1996, IBM J. Res. Dev..
[4] Stephanie Forrest,et al. A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.
[5] Jun Yang,et al. Frequent value compression in data caches , 2000, MICRO 33.
[6] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[7] Pedro J. Gil,et al. A prototype of a VHDL-based fault injection tool: description and application , 2002, J. Syst. Archit..
[8] Jason Cong,et al. Application-specific instruction generation for configurable processor architectures , 2004, FPGA '04.
[9] Cristian Constantinescu,et al. Trends and Challenges in VLSI Circuit Reliability , 2003, IEEE Micro.
[10] Sanjay J. Patel,et al. Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.
[11] James L. Walsh,et al. Field testing for cosmic ray soft errors in semiconductor memories , 1996, IBM J. Res. Dev..
[12] John A. Gunnels,et al. Extending stability beyond CPU millennium: a micron-scale atomistic simulation of Kelvin-Helmholtz instability , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[13] Dan Tsafrir,et al. System noise, OS clock ticks, and fine-grained parallel applications , 2005, ICS '05.
[14] Barry W. Johnson,et al. System-level modeling in the ADEPT environment of a distributed computer system for real-time applications , 1995, Proceedings of 1995 IEEE International Computer Performance and Dependability Symposium.
[15] Huntington W. Curtis,et al. Accelerated testing for cosmic soft-error rate , 1996, IBM J. Res. Dev..
[16] Tipp Moseley,et al. Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[17] Mahmut T. Kandemir,et al. Analyzing heap error behavior in embedded JVM environments , 2004, International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004..
[18] Miguel Castro,et al. Fast byte-granularity software fault isolation , 2009, SOSP '09.
[19] Edward J. McCluskey,et al. Error detection by duplicated instructions in super-scalar processors , 2002, IEEE Trans. Reliab..
[20] Kevin Skadron,et al. A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors , 2007, GH '07.
[21] Eduardo Pinheiro,et al. DRAM errors in the wild: a large-scale field study , 2009, SIGMETRICS '09.
[22] Diana Marculescu,et al. Multiple Transient Faults in Combinational and Sequential Circuits: A Systematic Approach , 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[23] Matthias Hauswirth,et al. Automating performance testing of interactive Java applications , 2010, AST '10.
[24] L. Borucki,et al. Comparison of accelerated DRAM soft error rates measured at component and system level , 2008, 2008 IEEE International Reliability Physics Symposium.
[25] David Lie,et al. Using VMM-based sensors to monitor honeypots , 2006, VEE '06.
[26] Fred L. Yang,et al. Simulation of faults causing analog behavior in digital circuits , 1992 .
[27] Bianca Schroeder,et al. Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You? , 2007, FAST.
[28] Ryuji Kan,et al. Validation of hardware error recovery mechanisms for the SPARC64 V microprocessor , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[29] Rakesh Kumar,et al. Algorithmic approaches to low overhead fault detection for sparse linear algebra , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[30] Sayed Mohammad Kia,et al. Micro embedded monitoring for security in application specific instruction-set processors , 2005, CASES '05.
[31] Inderpal S. Bhandari,et al. Orthogonal Defect Classification - A Concept for In-Process Measurements , 1992, IEEE Trans. Software Eng..
[32] Jan Vitek,et al. Efficient intrusion detection using automaton inlining , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).
[33] Ravishankar K. Iyer,et al. Error sensitivity of the Linux kernel executing on PowerPC G4 and Pentium 4 processors , 2004, International Conference on Dependable Systems and Networks, 2004.
[34] Ravishankar K. Iyer,et al. FAMAS: FAult Modeling via Adaptive Simulation , 1997, Proceedings Tenth International Conference on VLSI Design.
[35] James F. Ziegler,et al. Terrestrial cosmic rays , 1996, IBM J. Res. Dev..
[36] Ravishankar K. Iyer,et al. NFTAPE: a framework for assessing dependability in distributed systems with lightweight fault injectors , 2000, Proceedings IEEE International Computer Performance and Dependability Symposium. IPDS 2000.
[37] G. C. Messenger,et al. Collection of Charge on Junction Nodes from Ion Tracks , 1982, IEEE Transactions on Nuclear Science.
[38] Charng-Da Lu,et al. Assessing Fault Sensitivity in MPI Applications , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[39] Ravishankar K. Iyer,et al. Measurement-based analysis of fault and error sensitivities of dynamic memory , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[40] Joel Emer,et al. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[41] David A. Wagner,et al. Intrusion detection via static analysis , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.
[42] Satoshi Matsuoka,et al. A high-performance fault-tolerant software framework for memory on commodity GPUs , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[43] Ram Chillarege,et al. Understanding large system failures-a fault injection experiment , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[44] Henrique Madeira,et al. RIFLE: A General Purpose Pin-level Fault Injector , 1994, EDCC.
[45] Kishor S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications , 1984 .
[46] Edward J. McCluskey,et al. Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.
[47] Ravishankar K. Iyer,et al. Hauberk: Lightweight Silent Data Corruption Error Detector for GPGPU , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[48] Jean Arlat,et al. Dependability of COTS Microkernel-Based Systems , 2002, IEEE Trans. Computers.
[49] Volodymyr Kindratenko,et al. On testing GPU memory for hard and soft errors , 2011 .
[50] Ravishankar K. Iyer,et al. Automated Derivation of Application-aware Error Detectors using Static Analysis , 2007, 13th IEEE International On-Line Testing Symposium (IOLTS 2007).
[51] David Kaeli,et al. Virtual machine monitor-based lightweight intrusion detection , 2011, OPSR.
[52] Edward J. McCluskey,et al. Word-voter: a new voter design for triple modular redundant systems , 2000, Proceedings 18th IEEE VLSI Test Symposium.
[53] D.A. Rennels,et al. Fault Injection Campaign for a Fault Tolerant Duplex Framework , 2007, 2007 IEEE Aerospace Conference.
[54] Carl E. Landwehr,et al. Basic concepts and taxonomy of dependable and secure computing , 2004, IEEE Transactions on Dependable and Secure Computing.
[55] William G. Griswold,et al. Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).
[56] Algirdas Avizienis,et al. The N-Version Approach to Fault-Tolerant Software , 1985, IEEE Transactions on Software Engineering.
[57] Rajeev Rastogi,et al. Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD 2000.
[58] R. Butler. Outlier Discordancy Tests in the Normal Linear Model , 1983 .
[59] Jack J. Dongarra,et al. The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..
[60] Ravishankar K. Iyer,et al. Measurement-Based Analysis of Error Latency , 1987, IEEE Transactions on Computers.
[61] Raymond T. Ng,et al. Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.
[62] Philip K. Chan,et al. Learning Patterns from Unix Process Execution Traces for Intrusion Detection , 1997 .
[63] Amin Ansari,et al. Shoestring: probabilistic soft error reliability on the cheap , 2010, ASPLOS 2010.
[64] Huiyang Zhou,et al. Understanding software approaches for GPGPU reliability , 2009, GPGPU-2.
[65] Xin Li,et al. A Memory Soft Error Measurement on Production Systems , 2007, USENIX Annual Technical Conference.
[66] Daniel P. Siewiorek,et al. Fault Injection Experiments Using FIAT , 1990, IEEE Trans. Computers.
[67] George M. Castillo,et al. Single event upset testing of commercial off-the-shelf electronics for launch vehicle applications , 2011, 2011 Aerospace Conference.
[68] Todd M. Austin,et al. DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[69] P. Rousseeuw,et al. Computing depth contours of bivariate point clouds , 1996 .
[70] Hans P. Muhlfeld,et al. Cosmic ray soft error rates of 16-Mb DRAM memory chips , 1998, IEEE J. Solid State Circuits.
[71] Bernd Becker,et al. A study of cognitive resilience in a JPEG compressor , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[72] Grigore Rosu,et al. Mop: an efficient and generic runtime verification framework , 2007, OOPSLA.
[73] Timothy J. Slegel,et al. IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.
[74] N. Hengartner,et al. Predicting the number of fatal soft errors in Los Alamos national laboratory's ASC Q supercomputer , 2005, IEEE Transactions on Device and Materials Reliability.
[75] Eun Ha Kim,et al. Implementing an Effective Test Automation Framework , 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[76] David I. August,et al. SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.
[77] David I. August,et al. Automatic Instruction-Level Software-Only Recovery , 2006, IEEE Micro.
[78] Jacob A. Abraham,et al. Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.
[79] Ravishankar K. Iyer,et al. An architectural framework for providing reliability and security support , 2004, International Conference on Dependable Systems and Networks, 2004.
[80] Janak H. Patel,et al. Reliability of scrubbing recovery-techniques for memory systems , 1990 .
[81] Sarita V. Adve,et al. Using likely program invariants to detect hardware errors , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[82] Kang G. Shin,et al. Measurement and Application of Fault Latency , 1986, IEEE Transactions on Computers.
[83] Sanjay J. Patel,et al. ReStore: symptom based soft error detection in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[84] Neha Narula,et al. Native Client: A Sandbox for Portable, Untrusted x86 Native Code , 2009, IEEE Symposium on Security and Privacy.
[85] Sarita V. Adve,et al. Understanding the propagation of hard errors to software and implications for resilient system design , 2008, ASPLOS.
[86] Ravishankar K. Iyer,et al. FINE: A Fault Injection and Monitoring Environment for Tracing the UNIX System Behavior under Faults , 1993, IEEE Trans. Software Eng..
[87] Hovav Shacham,et al. On the effectiveness of address-space randomization , 2004, CCS '04.
[88] Ravishankar K. Iyer,et al. Quantitative Analysis of Long-Latency Failures in System Software , 2009, 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing.
[89] Jean-Claude Laprie,et al. Dependable computing: concepts, limits, challenges , 1995 .
[90] Vijay S. Pande,et al. Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU , 2009, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[91] Henrique Madeira,et al. Emulation of Software Faults: A Field Data Study and a Practical Approach , 2006, IEEE Transactions on Software Engineering.
[92] Karthikeyan Sankaralingam,et al. Relax: an architectural framework for software recovery of hardware faults , 2010, ISCA.
[93] H.H.K. Tang,et al. Measurement of the flux and energy spectrum of cosmic-ray induced neutrons on the ground , 2004, IEEE Transactions on Nuclear Science.
[94] Neeraj Suri,et al. On the placement of software mechanisms for detection of data errors , 2002, Proceedings International Conference on Dependable Systems and Networks.
[95] Jacob A. Abraham,et al. FERRARI: A Flexible Software-Based Fault and Error Injection System , 1995, IEEE Trans. Computers.
[96] Kevin Skadron,et al. The visual vulnerability spectrum: characterizing architectural vulnerability for graphics hardware , 2006, GH '06.
[97] Ravishankar K. Iyer,et al. FOCUS: An Experimental Environment for Fault Sensitivity Analysis , 1992, IEEE Trans. Computers.
[98] Ravishankar K. Iyer,et al. Microprocessor sensitivity to failures: control vs. execution and combinational vs. sequential logic , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[99] Brad Calder,et al. Phase tracking and prediction , 2003, ISCA '03.
[100] Salvatore J. Stolfo,et al. A data mining framework for building intrusion detection models , 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).
[101] Daniel Pierre Bovet,et al. Understanding the Linux Kernel , 2000 .
[102] Milos Krstic,et al. FPGA implementation of hardware voter , 2001, 5th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Service. TELSIKS 2001. Proceedings of Papers (Cat. No.01EX517).