Virtual Cache Line: A New Technique to Improve Cache Exploitation for Recursive Data Structures

Recursive data structures (lists, trees, graphs, etc.) are used throughout scientific and commercial software. The common approach is to allocate storage to the individual nodes of such structures dynamically, maintaining the logical connection between them via pointers. Once such a data structure goes through a sequence of updates (inserts and deletes), it may get scattered all over memory yielding poor spatial locality, which in turn introduces many cache misses. In this paper we present the new concept of Virtual Cache Lines (VCLs). Basically, the mechanism keeps groups of consecutive nodes in close proximity, forming virtual cache lines, while allowing the groups to be stored arbitrarily far away from each other. Virtual cache lines increase the spatial locality of the given data structure resulting in better locality of references. Furthermore, since the spatial locality is improved, software prefetching becomes much more attractive. Indeed, we also present a software prefetching algorithm that can be used when dealing with VCLs resulting in even higher data cache performance. Our results show that the average performance of linked list operations, like scan, insert, and delete can be improved by more than 200% even in architectures that do not support prefetching, like the Intel Pentium. Moreover, when using prefetching one can gain additional 100% improvement. We believe that given a program that manipulates certain recursive data structures, compilers will be able to generate VCL-based code. Also, until this vision becomes true, VCLs can be used to build more efficient user libraries, operating-systems and applications programs.

[1]  Michael J. Flynn,et al.  Reducing Cache Miss Rates Using Prediction Caches , 1996 .

[2]  Reinhard Wilhelm,et al.  Solving shape-analysis problems in languages with destructive updating , 1998, TOPL.

[3]  Todd C. Mowry,et al.  Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.

[4]  Wen-mei W. Hwu,et al.  Achieving High Instruction Cache Performance With An Optimizing Compiler , 1989, The 16th Annual International Symposium on Computer Architecture.

[5]  David R. Musser,et al.  STL tutorial and reference guide , 2001 .

[6]  Reinhard Wilhelm,et al.  Parametric shape analysis via 3-valued logic , 1999, POPL '99.

[7]  Michael E. Wolf,et al.  The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.

[8]  R. Bayer,et al.  Organization and maintenance of large ordered indices , 1970, SIGFIDET '70.

[9]  Susan Horwitz,et al.  Fast and accurate flow-insensitive points-to analysis , 1997, POPL '97.

[10]  Tom Shanley PowerPC system architecture , 1995, PC system architecture series.

[11]  Karl Pettis,et al.  Profile guided code positioning , 1990, PLDI '90.

[12]  Ken Kennedy,et al.  Software prefetching , 1991, ASPLOS IV.

[13]  Monica S. Lam,et al.  The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.

[14]  Todd C. Mowry,et al.  Tolerating latency through software-controlled data prefetching , 1994 .

[15]  Chandra Krintz,et al.  Cache-conscious data placement , 1998, ASPLOS VIII.

[16]  David Bernstein,et al.  Compiler techniques for data prefetching on the PowerPC , 1995, PACT.

[17]  David R. Musser,et al.  STL tutorial and reference guide - C++ programming with the standard template library , 1996, Addison-Wesley professional computing series.

[18]  David A. Wood,et al.  Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.

[19]  D. B. Davis,et al.  Intel Corp. , 1993 .

[20]  James R. Larus,et al.  Using generational garbage collection to implement cache-conscious data placement , 1998, ISMM '98.

[21]  T. Mowry,et al.  Comparative evaluation of latency reducing and tolerating techniques , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.