Zapmem: A Framework for Testing the Effect of Memory Corruption Errors on Operating System Kernel Reliability

While monolithic operating system kernels are composed of many subsystems, during runtime they all share a common address space, making fault propagation a serious issue. The code quality of each subsystem is different, as OS development is a complex task commonly divided by different groups with different degrees of expertise. Since the memory space into which this code runs is shared, the occurrence of bugs or errors in one of the subsystems may propagate to others and affect general OS reliability. It is necessary, then, to test how errors propagate between the different kernel subsystems and how they affect reliability. This work presents a simple new technique to inject memory corruption faults and Zapmem, a fault injection tool which uses such technique to test the effect on reliability from memory corruption of statically allocated kernel data. Zapmem associates the runtime memory addresses to the corresponding high level (source code) memory structure definitions, which indicate which kernel subsystem allocated that memory region, and the tool has minimal intrusiveness, as our technique does not require kernel instrumentation. The efficacy of our approach and preliminary results are also presented.

[1]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[2]  Neeraj Suri,et al.  Error propagation profiling of operating systems , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[3]  Henrique Madeira,et al.  Xception: Software Fault Injection and Monitoring in Processor Functional Units1 , 1995 .

[4]  Ieee Standards Board IEEE Standard for a High Performance Serial Bus-Amendment 1 , 2000 .

[5]  Ravishankar K. Iyer,et al.  Characterization of linux kernel behavior under errors , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[6]  Herbert Bos,et al.  Construction of a Highly Dependable Operating System , 2006, 2006 Sixth European Dependable Computing Conference.

[7]  João Carreira,et al.  Why do some (weird) people inject faults? , 1998, SOEN.

[8]  Jean Arlat,et al.  Characterization of the impact of faulty drivers on the robustness of the Linux kernel , 2004, International Conference on Dependable Systems and Networks, 2004.