Space Efficient Conservative Garbage Collection

We call a garbage collector conservative if it has only partial information about the location of pointers, and is thus forced to treat arbitrary bit patterns as though they might be pointers, in at least some cases. We show that some very inexpensive, but previously unused, techniques can have dramatic impact on the effectiveness of conservative garbage collectors in reclaiming memory. Our most significant observation is that static data that appears to point to the heap should not result in misidentified references to the heap. The garbage collector has enough information to allocate around such references. We also observe that programming style has a significant impact on the amount of spuriously retained storage, typically even if the collector is not terribly conservative. Some fairly common C and C++ programming styles significantly decrease the effectiveness of any garbage collector. These observations suffice to explain some of the different assessments of conservative collection that have appeared in the literature.

Copyright © by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., by fax or at permissions@acm.org.

This originally appeared in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, SIGPLAN Notices, June.

Introduction

Garbage collectors reclaim storage that has been allocated by a client program, but is no longer accessible by following pointers from program variables. For a
recent survey of the problem and of garbage collection techniques, see [11].

Conservative garbage collectors can operate with only minimal information about the layout of the client program's data. Instead of relying on compiler-provided information on the location of pointers, they assume that any bit pattern that could be a valid pointer in fact is a valid pointer. Generally this is safe only under the assumption that objects do not move. However, hybrids that rely on some exact pointer information to move some objects are both possible and often used.

It is possible to construct conservative garbage collectors that utilize many of the same performance improvement techniques as conventional collectors. Generational conservative collectors have been constructed, as have concurrent collectors that greatly reduce client pause times.

Conservative collectors have been used successfully even with fairly large conventional C programs. Such collectors have also been used as a debugging tool for programs that explicitly deallocate storage.

Conservative garbage collection also makes it possible to easily compile other programming languages that require garbage collection into efficient C, thus providing a portable implementation that can take advantage of the manufacturers' C compilers to obtain competitive performance. Programming language implementations that rely on conservative collection in this manner include the only commonly available implementations of Modula-3 (SRC Modula-3) and Sather, as well as portable implementations of Scheme, ML, Common Lisp (AKCL), Mesa, and CLU. The correctness of such an approach can be guaranteed with minimal restrictions on C compiler optimizations. In fact, most current systems use standard C compilers and ignore any possibility of unsafe compiler optimizations.

These collectors vary greatly in their degree of conservativism, i.e. in how much information about data structure layout they maintain. Some maintain complete information on the location of pointers in the
heap, and only scan the stack conservatively. Others also treat the heap conservatively. Thus the following observations apply to different degrees.

Most applications of such collectors have encountered few problems. In particular, the Xerox Portable Common Runtime system [4] is used routinely to run more than a million lines of Cedar/Mesa code that have been compiled to C.

Nonetheless, a few negative results have been reported in the literature. In particular, several authors have reported significant memory leakage under some circumstances, i.e. significant amounts of inaccessible memory were not reclaimed. Some of the negative performance results in the literature are probably partially attributable to such leakage. Other papers point to the dangers of such leakage, but do not cite specific empirical results.

We note that garbage collection with minimal leakage is fundamentally an optimization problem, and not an absolute issue of correctness. The notion of a zero-leakage garbage collector is ill-defined. As has been pointed out before, programming language definitions rarely, if ever, define a notion of accessible memory. Indeed, it is hard to see how to do so without disallowing essential compiler optimizations. Thus the notion of reclaiming all inaccessible storage is ill-defined. Indeed, the traditional interpretation as pointer reachability in a given implementation is both dependent on the implementation and not optimal in any real sense. There are many cases in which pointer-accessible structures can be safely discarded.

This is not an indictment of automatic garbage collection. C malloc implementations usually provide no useful bound on space usage either. In the worst case they are subject to disastrous fragmentation overhead.

Thus the goal of any garbage collector has to be to retain as little memory as it can, subject to the constraint that all memory that will be accessed in the future must be retained. Like many compiler optimizations, a failure by the run-time system to solve this problem well is likely
to lead to unacceptable results. Also like many compiler optimizations, it is important to give the programmer a reasonable idea of what programming styles are likely to result in unacceptable performance.

The remainder of this paper addresses these two issues. First, the next two sections present some empirical results on the causes of spurious memory retention by conservative collectors, and discuss refinements for such collectors that can greatly reduce such retention. Our approach will be to reduce the probability that nonpointer data will be mistakenly identified as pointers. We then conclude with a discussion of programming techniques that can greatly alter the amount of memory retained as the result of a misidentification.

The only detailed previously published discussion of these issues appears to be by Wentworth [6]. He discusses the circumstances under which spurious retention is likely to be unacceptable. Some minimal empirical results also appear elsewhere in the literature. Some measurements of overall space usage are given by Zorn [16], but he does not analyze the causes of excess space consumption. He does not specifically discuss techniques for reducing such retention.

Pointer Misidentification

The most apparent potential source of excess memory retention by conservative collectors is the misidentification of, for example, integers as pointers. If the collector finds an integer variable that happens to contain the address of a valid, but inaccessible, object, and the run-time system has no way to determine that it is indeed an integer, then that object, and other garbage objects referenced by it, will be retained. This can easily happen while, for example, trying to garbage collect C data structures. The probability of such misidentification increases if more of the address space is occupied by the heap, since this increases the probability that a random piece of data will happen to be the address of an object.

In some environments it is essential to recognize a pointer to the interior of an object as valid,
forcing the containing object to be retained. This is often required if the source language requires that array elements can be passed by reference. It potentially allows arbitrary portable (fully ANSI-conforming) C programs to be garbage collected. This requirement greatly increases the chance of misidentification.

(Collecting ANSI-conforming C programs also requires some minimal additional constraints on compiler optimizations. Note that the standard does not define, and hence renders unportable, the results of many kinds of commonly used pointer arithmetic, e.g. pointer hashing, many of which are actually benign for conservative garbage collection. Interestingly, interior pointers rarely need to be recognized if old C programs are run with garbage collection; such programs normally also maintain a pointer to the base of the object, in anticipation of having to explicitly deallocate it.)

Some simple ad hoc techniques can often greatly decrease the misidentification probability. It is desirable to design the allocator to avoid allocating objects at addresses that are likely to collide with other data. On a machine that ensures that pointers are stored at word boundaries in memory (where a pointer is a word long), an adequate solution sometimes consists of properly positioning the heap in the address space. If the high-order bits of addresses are neither all zeros nor all ones, then conflicts with integer data are unlikely. Similarly, likely character codes and floating point values can be avoided.

If pointers are not guaranteed to be properly aligned, then all possible alignments must be considered by the collector, thus greatly increasing the number of false pointers. This situation is particularly unpleasant, since the concatenation of the low-order half word of an integer with the high-order half word of the next integer can easily be a valid heap address (see figure), even if small integers by themselves are not valid heap addresses (as on most machines). Experience with one such collector indicates that the impact of this problem can be
gre

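The alignment problem discussed in the Pointer Misidentification section can be made concrete with a small sketch. It is an illustration only, not the collector described in this paper: the `heap_range` structure, the heap bounds, the 32-bit big-endian word layout, and all function names are assumptions chosen for the example. The scan treats any word whose value falls inside the heap as a candidate pointer, and the `stride` parameter models whether only word-aligned positions or all alignments must be examined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap bounds [start, end); not taken from any real collector. */
struct heap_range {
    uint32_t start;
    uint32_t end;
};

/* Store a 32-bit value in big-endian byte order, so that the example
   behaves identically on every host. */
static void store_be32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Load a 32-bit big-endian value from an arbitrary byte offset. */
static uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Conservative test: a word is a candidate pointer if its value lies
   anywhere inside the heap. */
static int looks_like_heap_pointer(uint32_t word, struct heap_range heap) {
    return word >= heap.start && word < heap.end;
}

/* Count candidate pointers in buf, examining one word every `stride` bytes.
   A stride of 4 models aligned pointers; a stride of 2 or 1 models a
   collector that must consider other alignments as well. */
static int count_candidates(const unsigned char *buf, size_t len,
                            size_t stride, struct heap_range heap) {
    int count = 0;
    assert(stride > 0);
    for (size_t off = 0; off + 4 <= len; off += stride) {
        if (looks_like_heap_pointer(load_be32(buf + off), heap)) {
            count++;
        }
    }
    return count;
}
```

Under these assumptions, take two adjacent integers 9 and 1 and a heap occupying addresses 0x00090000 up to 0x000A0000. A word-aligned scan finds no candidates, since neither small integer is a heap address. A scan at half-word stride finds one: the bytes at offset 2 combine the low-order half of 9 with the high-order half of 1 and read as 0x00090000. Moving the same hypothetical heap to 0x40090000, so that the high-order address bits are neither all zeros nor all ones, yields no candidates at any stride for this data.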
[1]  David M. Ungar, et al. Generation Scavenging: A non-disruptive high performance storage reclamation algorithm, 1984, SDE 1.

[2]  Hans-Juergen Boehm, et al. Garbage collection in an uncooperative environment, 1988, Softw. Pract. Exp.

[3]  Daniel G. Bobrow, et al. Combining generational and conservative garbage collection: framework and implementations, 1989, POPL '90.

[4]  Carl H. Hauser, et al. The portable common runtime approach to interoperability, 1989, SOSP '89.

[5]  David L. Detlefs, et al. Concurrent garbage collection for C, 1990.

[6]  E. P. Wentworth. Pitfalls of conservative garbage collection, 1990, Softw. Pract. Exp.

[7]  Scott Shenker, et al. Mostly parallel garbage collection, 1991, PLDI '91.

[8]  Stephen M. Omohundro. The Sather Language and Libraries, 1991, TOOLS.

[9]  Robert O. Hastings, et al. Fast detection of memory leaks and access errors, 1991.

[10]  John R. Rose, et al. Integrating the Scheme and C languages, 1992, LFP '92.

[11]  Paul R. Wilson, et al. Uniprocessor Garbage Collection Techniques, 1992, IWMM.

[12]  Emmanuel Chailloux. A Conservative Garbage Collector with Ambiguous Roots for Static Typechecking Languages, 1992, IWMM.

[13]  Olivier Ridoux, et al. Dynamic Memory Management for Sequential Logic Programming Languages, 1992, IWMM.

[14]  Regis Cridlig, et al. An optimizing ML to C compiler, 1992.

[15]  Benjamin Goldberg, et al. Polymorphic type reconstruction for garbage collection without tags, 1992, LFP '92.

[16]  Benjamin G. Zorn, et al. The measured cost of conservative garbage collection, 1993, Softw. Pract. Exp.