Generating Miss Rate Curves with Low Overhead Using Existing Hardware

Tom Walsh
Master of Science
Graduate Department of Computer Science
University of Toronto
2009

Miss Rate Curves (MRCs) for main memory have been proposed as a representation of memory utilization for use in a range of memory-management optimizations. Various techniques exist for generating them; however, every real-world method of MRC generation must trade off overhead against accuracy. Proposals for new hardware techniques exist, but have yet to be implemented in actual hardware. We propose using the Intel PEBS (Precise Event-Based Sampling) performance monitoring capability to generate MRCs on existing commodity hardware. We use PEBS to generate MRCs and compare them against MRCs generated through instrumentation, finding the PEBS MRCs to be good but imperfect approximations, while keeping average PEBS overheads below 5%. We were unable to show that PEBS is better or worse than existing techniques, but believe we have shown the promise of general-purpose performance monitoring hardware for this task and motivated future research and development in this area.
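For readers unfamiliar with MRCs, the sketch below shows one standard way to compute an MRC from a complete page-access trace, such as one collected through instrumentation: Mattson's LRU stack-distance method. An access hits in an LRU-managed memory of m pages exactly when its stack distance is at most m, so one pass over the trace yields the miss rate for every memory size at once. This is an illustration under stated assumptions (pure LRU replacement, a trace given as a list of page numbers, hypothetical function names). It is not the PEBS-based method described above, and it does not model the sampling that PEBS performs.

```python
def lru_stack_distances(trace):
    """Return the 1-based LRU stack distance of each access (None on a cold miss)."""
    stack = []  # index 0 holds the most recently used page
    distances = []
    for page in trace:
        if page in stack:
            depth = stack.index(page) + 1  # 1-based depth in the LRU stack
            stack.pop(depth - 1)
        else:
            depth = None  # first reference: misses at every memory size
        stack.insert(0, page)  # page becomes most recently used
        distances.append(depth)
    return distances


def miss_rate_curve(trace, max_pages):
    """Miss rate under LRU for each memory size 1..max_pages (assumes pure LRU)."""
    if not trace:
        return [0.0] * max_pages
    n = len(trace)
    # Histogram of stack distances; cold misses and distances > max_pages are
    # misses at every size we report, so they are simply never counted as hits.
    hist = [0] * (max_pages + 1)
    for d in lru_stack_distances(trace):
        if d is not None and d <= max_pages:
            hist[d] += 1
    curve = []
    hits = 0
    for size in range(1, max_pages + 1):
        hits += hist[size]  # an access hits at this size iff its distance <= size
        curve.append((n - hits) / n)
    return curve


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 3, 4, 1]
    print(miss_rate_curve(trace, 4))  # [1.0, 1.0, 0.625, 0.5]
```

The list-based stack keeps the example short but costs O(n·m) time. Practical tools typically use a balanced tree or similar structure for the stack-depth lookup, which is part of why full-trace MRC generation is expensive and why low-overhead sampled approaches such as PEBS are attractive.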
