Analyzing heap error behavior in embedded JVM environments

Recent studies have shown that transient hardware errors caused by external factors such as alpha particles and cosmic ray strikes can be responsible for a large percentage of system downtime. Denser process technologies, increasing clock speeds, and the low supply voltages used in embedded systems can worsen this problem. In many embedded environments, extensive error protection in hardware may be undesirable because of (i) form-factor or power-consumption limitations and/or (ii) the need to keep costs low. In addition, a mismatch between the hardware protection granularity and the field access granularity can lead to false alarms and error cancellations. Consequently, software-based approaches that identify and possibly rectify these errors seem promising. Toward this goal, we look specifically at enhancing the software's ability to detect heap memory errors in a Java-based embedded system. Using several embedded Java applications, we first study the tradeoffs between reliability, performance, and memory space overhead for two schemes that perform error checks at object and field granularities. We also study the impact of object characteristics (e.g., lifetime, re-use intervals, and access frequency) on error propagation. Considering the pros and cons of these two schemes, we then investigate two hybrid strategies that attempt to balance memory space overhead, performance overhead, and reliability. Our experimental results show that both the granularity of error protection and its frequency can significantly impact static and dynamic overheads as well as error detection ability.
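To make the contrast between object-granularity and field-granularity checks concrete, the following is a minimal Java sketch. It is not the scheme evaluated in this work: the class names (HeapCheckSketch, ObjectGuard, FieldGuard), the XOR and complement check words, and the flipBit fault-injection helper are all assumptions chosen for illustration. It only shows why the two granularities trade memory space, check cost, and detection coverage differently.

```java
// Hypothetical illustration only: names and check codes are assumptions,
// not the mechanism used in the work described above.
public class HeapCheckSketch {

    // XOR of all words; used as a simple (weak) check code.
    static int xorAll(int[] a) {
        int x = 0;
        for (int v : a) x ^= v;
        return x;
    }

    /** Object granularity: ONE check word covers every field of the object. */
    public static final class ObjectGuard {
        private final int[] fields;
        private int check;

        public ObjectGuard(int[] initial) {
            fields = initial.clone();
            check = xorAll(fields);
        }

        public void write(int i, int v) {
            // Incremental update: remove the old value, add the new one.
            check ^= fields[i] ^ v;
            fields[i] = v;
        }

        public int read(int i) {
            // Every read scans the whole object, so cost grows with object size.
            if (xorAll(fields) != check) {
                throw new IllegalStateException("object-level check failed");
            }
            return fields[i];
        }

        /** Simulates a transient bit flip that bypasses write(). */
        public void flipBit(int i, int bit) {
            fields[i] ^= (1 << bit);
        }
    }

    /** Field granularity: one check word PER field (here, its complement). */
    public static final class FieldGuard {
        private final int[] fields;
        private final int[] checks;

        public FieldGuard(int[] initial) {
            fields = initial.clone();
            checks = new int[fields.length];
            for (int i = 0; i < fields.length; i++) checks[i] = ~fields[i];
        }

        public void write(int i, int v) {
            fields[i] = v;
            checks[i] = ~v;
        }

        public int read(int i) {
            // Constant-time check, but only the accessed field is verified.
            if (checks[i] != ~fields[i]) {
                throw new IllegalStateException("field-level check failed");
            }
            return fields[i];
        }

        /** Simulates a transient bit flip that bypasses write(). */
        public void flipBit(int i, int bit) {
            fields[i] ^= (1 << bit);
        }
    }

    public static void main(String[] args) {
        ObjectGuard og = new ObjectGuard(new int[] {1, 2, 3});
        FieldGuard fg = new FieldGuard(new int[] {1, 2, 3});
        og.flipBit(2, 0); // corrupt field 2 in both objects
        fg.flipBit(2, 0);

        // Field-level: reading an uncorrupted field does not notice the error.
        System.out.println("field-level read of field 0: " + fg.read(0));
        try {
            og.read(0); // object-level: any read notices the corruption
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the object-level guard spends one extra word per object but scans every field on each read, while the field-level guard doubles the storage and checks only the field being accessed, so a corrupted field that is never read goes unnoticed. A single XOR word can also miss multiple flips that cancel each other, which is a limitation of this illustrative code rather than a statement about the schemes studied here.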
