Fine-grained data usage analysis by access sampling: seeing the data that is not there

Estimating active data usage is a basic problem in memory system analysis, management, and optimization. Fine-grained usage analysis is costly because it requires monitoring every data access. This paper presents an efficient fine-grained analysis based on access sampling. By taking random samples at a fixed rate, e.g. 1% of cache misses, the analysis infers the amount of data accessed in the unsampled remainder of the trace. Because it deduces the total amount of data accessed by inspecting only a subset of the accesses, it is, in effect, seeing the data that is not there. The paper presents the analysis and evaluates it on 8 program traces. The error of data-size prediction is 33% at 1% sampling and 6% at 10% sampling. The new technique is significantly more accurate than two previous models: one based on skewed distributions, i.e. the "80-20" law, and the other the well-known Good-Turing frequency estimation.
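For orientation, the Good-Turing baseline named in the abstract can be sketched as follows. This is a minimal illustration and not the paper's technique: the function names, the Bernoulli sampling scheme, and the coverage-based distinct-count estimate (distinct items seen divided by estimated coverage) are assumptions made for this example. Good-Turing estimates the probability mass of items never seen in a sample as the fraction of sampled accesses that hit items seen exactly once.

```python
import random
from collections import Counter


def sample_accesses(trace, rate, seed=0):
    """Keep each access independently with probability `rate` (hypothetical helper)."""
    rng = random.Random(seed)
    return [addr for addr in trace if rng.random() < rate]


def good_turing_missing_mass(sample):
    """Good-Turing estimate of the probability mass of unseen items: N1 / N,
    where N1 is the number of items seen exactly once and N is the sample size."""
    if not sample:
        raise ValueError("sample is empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


def estimate_distinct(sample):
    """Coverage-based estimate of the number of distinct items in the full trace:
    distinct items seen, divided by the estimated coverage (1 - missing mass).
    This simple estimator is an assumption of the sketch, not the paper's model."""
    distinct = len(set(sample))
    coverage = 1.0 - good_turing_missing_mass(sample)
    if coverage <= 0.0:
        # Every sampled item was seen once: the sample gives no upper bound.
        return float("inf")
    return distinct / coverage
```

As a usage example, `estimate_distinct(sample_accesses(trace, 0.01))` would give a rough data-size estimate from a 1% sample of a list of addresses `trace`. The abstract reports that the paper's analysis is significantly more accurate than this kind of baseline.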
