Backwards-Compatible Bounds Checking for Arrays and Pointers in C Programs

This paper presents a new approach to enforcing array bounds and pointer checking in the C language. Checking is rigorous in the sense that the result of pointer arithmetic must refer to the same object as the original pointer (this object is sometimes called the "intended referent"). The novel aspect of this work is that checked code can inter-operate without restriction with unchecked code, without interface problems, with some effective checking, and without false alarms. This "backwards compatibility" property allows the overheads of checking to be confined to suspect modules, and also facilitates the use of libraries for which source code is not available. The paper describes the scheme and its prototype implementation as an extension to the GNU C compiler, presents experimental results to evaluate its effectiveness, and discusses performance issues and the effectiveness of some simple optimisations.

Introduction and related work

C is unusual among programming languages in providing the programmer with the full power of pointers. Languages in the Pascal/Algol family have arrays and pointers, with the restriction that arithmetic on pointers is disallowed. Languages like BCPL allow arbitrary operations on pointers, but lack types and so require clumsy scaling by object sizes.

An advantage of the Pascal/Algol approach is that array references can be checked at run time fairly efficiently; in fact, so efficiently that there is a good case for bounds checking in production code. Bounds checking is easy for arrays because the array subscript syntax specifies both the address calculation and the array within which the resulting pointer should point.

A pointer in C can be used in a context divorced from the name of the storage region for which it is valid (its intended referent), and this has prevented a fully satisfactory bounds checking mechanism from being developed.

There is overwhelming evidence that bounds checking is desirable, and a number of schemes have been presented. The main difference between our work and Kendall's bcc and Steffen's rtcc is that in our scheme the representation of pointers is unchanged. This is crucial, since it means that inter-operation with non-checked modules and libraries still works (and much checking is still possible). Compared with interpretative schemes like Sabre-C, we offer the potential for much higher performance.

Patil and Fischer present a sophisticated technique with very low overheads, using a second CPU to perform checking in parallel. Unfortunately, their scheme requires function interfaces to be changed to carry information about pointers, so it also has the inter-operation problem.

Another approach is exemplified by the commercially available checking package Purify. Purify processes the binary representation of the software, so it can handle binary-only code. Each memory access instruction is modified to maintain a bit map of valid storage regions and of whether each byte has been initialised. Accesses to unallocated or uninitialised locations are reported as errors. Purify catches many important bugs and is fairly efficient. However, Purify does not catch abuse of pointer arithmetic that yields a pointer to a valid region which is not the intended referent. Fischer and Patil provide evidence for the importance of this refinement.

Our goals in this paper are to describe a method of bounds checking C programs that fulfills the following criteria:

- Backwards compatibility: the ability to mix checked code and unchecked libraries, for which the source may be proprietary or otherwise unavailable.
- Works with all common C programming styles.