Utilization of Separate Caches to Eliminate Cache Pollution Caused by Memory Management Functions

Data-intensive service functions, such as memory allocation and deallocation, data prefetching, and data relocation, can pollute the processor cache in conventional systems because the same CPU (and therefore the same cache) executes both application code and system services. In this paper, we show the improvements in cache performance that can result from eliminating this pollution by using separate caches for memory management functions. For the purposes of our study, we simulate separate hardware units for the application and for the memory management services using two Unix processes: one process executes the application code (simulating the main CPU), while the other executes the memory management code. We collected address traces for the two processes and used the Dinero IV cache simulator to evaluate the expected cache behavior. A second goal of this paper is to examine the cache performance of different memory allocators. We compare two allocators: a widely used segregated-list-based allocator (originally due to Doug Lea) and our own binary-tree-based allocator, called the Address-Ordered Binary Tree.
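To make the pollution mechanism concrete, the Python sketch below is a toy trace-driven model, not the two-process Dinero IV setup used in the study. It interleaves a hypothetical application that repeatedly re-reads a 4 KB working set with hypothetical memory-manager references streaming through a 64 KB metadata region. It then counts application misses when both streams share one 8 KB, 2-way, 32-byte-line LRU cache, and compares that with giving each stream its own cache. All sizes, the workload, and the names (`LRUCache`, `make_trace`, `app_misses_shared`, `app_misses_separate`) are assumptions chosen for illustration.

```python
from collections import OrderedDict

LINE = 32  # bytes per cache line (hypothetical parameter)


class LRUCache:
    """Set-associative cache with LRU replacement (toy model, not Dinero IV)."""

    def __init__(self, size_bytes, line_bytes=LINE, assoc=2):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (line_bytes * assoc)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = addr // self.line_bytes
        ways = self.sets[line % self.num_sets]
        if line in ways:
            ways.move_to_end(line)      # mark as most recently used
            return True
        if len(ways) >= self.assoc:
            ways.popitem(last=False)    # evict the least recently used line
        ways[line] = None
        return False


def make_trace(sweeps=10, app_bytes=4096, mm_base=0x100000,
               mm_bytes=64 * 1024, mm_per_app=4):
    """Interleave a hypothetical application loop with allocator traffic.

    The application re-reads a 4 KB working set; after each application
    reference the memory manager touches `mm_per_app` lines of a 64 KB
    metadata region (standing in for, e.g., free-list walks).
    """
    trace, mm_off = [], 0
    for _ in range(sweeps):
        for addr in range(0, app_bytes, LINE):
            trace.append(("app", addr))
            for _ in range(mm_per_app):
                trace.append(("mm", mm_base + mm_off))
                mm_off = (mm_off + LINE) % mm_bytes
    return trace


def app_misses_shared(trace, cache_bytes=8192):
    """One CPU, one cache: allocator references compete with the application."""
    cache, misses = LRUCache(cache_bytes), 0
    for who, addr in trace:
        hit = cache.access(addr)  # every reference goes through the same cache
        if who == "app" and not hit:
            misses += 1
    return misses


def app_misses_separate(trace, cache_bytes=8192):
    """Separate caches: allocator references never reach the application's cache."""
    app_cache, mm_cache, misses = LRUCache(cache_bytes), LRUCache(cache_bytes), 0
    for who, addr in trace:
        if who == "app":
            if not app_cache.access(addr):
                misses += 1
        else:
            mm_cache.access(addr)
    return misses


if __name__ == "__main__":
    trace = make_trace()
    print("application misses, shared cache:   ", app_misses_shared(trace))
    print("application misses, separate caches:", app_misses_separate(trace))
```

With these parameters the shared configuration reports 1280 application misses (four allocator lines land in each set between reuses, so every application reference misses), while the separate configuration reports 128, which are compulsory misses only. The numbers illustrate the mechanism and are not results from the paper.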

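The abstract names the Address-Ordered Binary Tree allocator without describing it. The Python sketch below shows one plausible reading, which is our assumption rather than the authors' implementation. Free blocks are kept in a binary search tree keyed by start address. Each node caches the largest free-block size in its subtree, so a first-fit search can skip subtrees that cannot satisfy a request. A freed block is coalesced with its address-adjacent free neighbors. The tree is unbalanced and keeps its metadata outside the managed range, unlike a real allocator that stores headers in the heap. `AddressOrderedTree`, `malloc`, and `free` are hypothetical names.

```python
class Node:
    """Free block [addr, addr + size); max_size = largest block in this subtree."""
    def __init__(self, addr, size):
        self.addr, self.size, self.max_size = addr, size, size
        self.left = self.right = None

def _fix(node):
    """Recompute the cached largest-block size after a child or size changed."""
    node.max_size = max([node.size] + [c.max_size for c in (node.left, node.right) if c])

def _insert(node, addr, size):
    if node is None:
        return Node(addr, size)
    if addr < node.addr:
        node.left = _insert(node.left, addr, size)
    else:
        node.right = _insert(node.right, addr, size)
    _fix(node)
    return node

def _pop_min(node):
    """Detach the lowest-addressed node; return (new_subtree_root, that_node)."""
    if node.left is None:
        return node.right, node
    node.left, lowest = _pop_min(node.left)
    _fix(node)
    return node, lowest

def _delete_root(node):
    """Unlink node; its in-order successor (if any) takes its place."""
    if node.left is None or node.right is None:
        return node.left or node.right
    right, succ = _pop_min(node.right)
    succ.left, succ.right = node.left, right
    _fix(succ)
    return succ

def _delete(node, addr):  # addr must be present in the tree
    if addr == node.addr:
        return _delete_root(node)
    if addr < node.addr:
        node.left = _delete(node.left, addr)
    else:
        node.right = _delete(node.right, addr)
    _fix(node)
    return node

def _alloc(node, n):
    """Lowest-addressed block that fits; max_size prunes subtrees that cannot."""
    if node is None or node.max_size < n:
        return node, None
    if node.left is not None and node.left.max_size >= n:
        node.left, addr = _alloc(node.left, n)
    elif node.size >= n:
        node.size -= n
        addr = node.addr + node.size  # carve from the tail: the key is unchanged
        if node.size == 0:
            return _delete_root(node), addr
    else:
        node.right, addr = _alloc(node.right, n)
    _fix(node)
    return node, addr

class AddressOrderedTree:
    """Hypothetical allocator managing the address range [base, base + size)."""
    def __init__(self, base, size):
        self.root = Node(base, size)

    def malloc(self, n):
        self.root, addr = _alloc(self.root, n)
        return addr  # None if no free block is large enough

    def free(self, addr, n):
        """Return [addr, addr + n) and coalesce with address-adjacent free blocks."""
        below = above = None
        node = self.root
        while node is not None:  # find the nearest free blocks on either side
            if node.addr < addr:
                below, node = node, node.right
            else:
                above, node = node, node.left
        if above is not None and addr + n == above.addr:
            self.root = _delete(self.root, above.addr)
            n += above.size
        if below is not None and below.addr + below.size == addr:
            self.root = _delete(self.root, below.addr)
            addr, n = below.addr, below.size + n
        self.root = _insert(self.root, addr, n)

if __name__ == "__main__":
    heap = AddressOrderedTree(0, 1024)
    a, b = heap.malloc(100), heap.malloc(200)
    heap.free(a, 100)
    heap.free(b, 200)  # coalesces with both neighbors back into one block
    print(heap.root.addr, heap.root.size)
```

Keeping the tree ordered by address means the neighbors needed for coalescing are found with a single root-to-leaf walk, and the cached `max_size` lets a request that cannot be satisfied fail at the root. The demo prints `0 1024`, the single coalesced block.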