HeapMon: a Low Overhead, Automatic, and Programmable Memory Bug Detector

Detecting memory-related bugs is an important part of the software development cycle, yet few reliable and efficient tools exist for this purpose. Most available tools and techniques either impose a high performance overhead or require a high degree of human intervention. This paper presents HeapMon, a novel hardware/software approach to detecting memory bugs, such as reads from uninitialized or unallocated memory locations. The approach requires no human intervention and has only minor storage and execution time overheads. HeapMon relies on a helper thread that runs on a separate processor in a chip multiprocessor (CMP) system. The thread monitors the status of each word on the heap by associating state bits with it. These state bits indicate whether the word is unallocated, allocated but uninitialized, or allocated and initialized. The state bits are updated when the word is allocated, initialized, or deallocated, and they are checked on reads and writes. Bugs are detected as illegal operations, such as writes to unallocated memory regions and reads from unallocated or uninitialized regions. When a bug is detected, its type, PC, and address are logged so that developers can pinpoint the bug’s nature and location. The hardware support for HeapMon consists of one extra state bit for each cached word, communication queues between the application thread and the helper thread, and a small private cache for the helper thread. We test the effectiveness of our approach with existing and injected memory bugs. Our experimental results show that HeapMon effectively detects and identifies most forms of heap memory bugs. To study the performance overheads of the new mechanism, we test it on SPEC 2000 benchmarks. Our results show that the overhead of our approach is significantly lower than that imposed by existing tools. The storage overhead is 3.1% of the cache size and 6.2% of the allocated heap memory size. Although the architectural support for HeapMon is simple, its execution time overhead is only 8% on average and less than 26% in the worst case.
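
The per-word state tracking described in the abstract can be illustrated with a small software model. The sketch below is not HeapMon's implementation: it does not model the helper thread, the communication queues, or the cache state bit. The class and method names (`HeapStateTracker`, `on_alloc`, `on_free`, `on_read`, `on_write`), the 4-byte word size, and the explicit size argument on deallocation are all assumptions made for illustration. It only shows how three word states and the listed illegal operations might fit together.

```python
from dataclasses import dataclass, field
from enum import Enum


class WordState(Enum):
    """The three per-word states named in the abstract."""
    UNALLOCATED = 0
    ALLOCATED_UNINITIALIZED = 1
    ALLOCATED_INITIALIZED = 2


@dataclass
class BugReport:
    """What gets logged on detection: bug type, PC, and address."""
    kind: str
    pc: int
    address: int


@dataclass
class HeapStateTracker:
    """Hypothetical software model of per-word heap state tracking."""
    word_size: int = 4  # assumption: 4-byte words
    states: dict = field(default_factory=dict)  # word index -> WordState
    log: list = field(default_factory=list)     # detected BugReport entries

    def _words(self, address, size):
        # Word indices covered by the byte range [address, address + size).
        first = address // self.word_size
        last = (address + size - 1) // self.word_size
        return range(first, last + 1)

    def _state(self, word):
        # Words never seen by on_alloc are treated as unallocated.
        return self.states.get(word, WordState.UNALLOCATED)

    def on_alloc(self, address, size):
        for word in self._words(address, size):
            self.states[word] = WordState.ALLOCATED_UNINITIALIZED

    def on_free(self, address, size):
        # Simplification: the caller supplies the block size.
        for word in self._words(address, size):
            self.states.pop(word, None)

    def on_write(self, pc, address):
        word = address // self.word_size
        if self._state(word) is WordState.UNALLOCATED:
            self.log.append(BugReport("write to unallocated memory", pc, address))
        else:
            # The first write to an allocated word initializes it.
            self.states[word] = WordState.ALLOCATED_INITIALIZED

    def on_read(self, pc, address):
        word = address // self.word_size
        state = self._state(word)
        if state is WordState.UNALLOCATED:
            self.log.append(BugReport("read from unallocated memory", pc, address))
        elif state is WordState.ALLOCATED_UNINITIALIZED:
            self.log.append(BugReport("read from uninitialized memory", pc, address))


def demo():
    """Replay a short, made-up access trace and return the logged bug kinds."""
    tracker = HeapStateTracker()
    tracker.on_alloc(0x1000, 8)
    tracker.on_read(pc=0x400, address=0x1000)   # read before any write
    tracker.on_write(pc=0x404, address=0x1000)  # legal, initializes the word
    tracker.on_read(pc=0x408, address=0x1000)   # legal
    tracker.on_free(0x1000, 8)
    tracker.on_write(pc=0x40C, address=0x1004)  # write after deallocation
    return [report.kind for report in tracker.log]
```

In this model, `demo()` logs one read from uninitialized memory and one write to unallocated memory, and the two legal accesses produce no report. Under the assumed 4-byte word, one extra bit per cached word is 1/32 of the cache, about 3.1%, and two bits per heap word (enough to encode three states) is 2/32, about 6.2%. Both values are consistent with the storage figures quoted above, though the two-bit encoding is an inference rather than something the abstract states.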
