MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms

While most research papers on computer architectures include some performance measurements, these performance numbers tend to be distrusted. Up to the point that, after so many research articles on data cache architectures, for instance, few researchers have a clear view of what are the best data cache mechanisms. To illustrate the usefulness of a fair quantitative comparison, we have picked a target architecture component for which lots of optimizations have been proposed (data caches), and we have implemented most of the performance-oriented hardware data cache optimizations published in top conferences in the past 4 years. Beyond the comparison of data cache ideas, our goals are twofold: (1) to clearly and quantitatively evaluate the effect of methodology shortcomings, such as model precision, benchmark selection, trace selection..., on assessing and comparing research ideas, and to outline how strong is the methodology effect in many cases, (2) to outline that the lack of interoperable simulators and not disclosing simulators at publication time make it difficult if not impossible to fairly assess the benefit of research ideas. This study is part of a broader effort, called MicroLib, an open library of modular simulators aimed at promoting the disclosure and sharing of simulator models.

[1]  Seung-Moon Yoo,et al.  L1 data cache decomposition for energy efficiency , 2001, ISLPED'01: Proceedings of the 2001 International Symposium on Low Power Electronics and Design (IEEE Cat. No.01TH8581).

[2]  K. Kavi Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .

[3]  J.W.C. Fu,et al.  Stride Directed Prefetching In Scalar Processors , 1992, [1992] Proceedings the 25th Annual International Symposium on Microarchitecture MICRO 25.

[4]  Rajiv Gupta,et al.  Enabling partial cache line prefetching through data compression , 2003, 2003 International Conference on Parallel Processing, 2003. Proceedings..

[5]  Jim Handy,et al.  The cache memory book , 1993 .

[6]  Dirk Grunwald,et al.  Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[7]  John Paul Shen,et al.  Non-vital loads , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[8]  Trevor N. Mudge,et al.  A performance comparison of contemporary DRAM architectures , 1999, ISCA.

[9]  M. Martonosi,et al.  Timekeeping in the memory system: predicting and optimizing memory behavior , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[10]  Dirk Grunwald,et al.  A stateless, content-directed data prefetching mechanism , 2002, ASPLOS X.

[11]  C. Green,et al.  ANALYZING AND IMPLEMENTING SDRAM AND SGRAM CONTROLLERS , 1998 .

[12]  G. Tyson,et al.  Eager writeback-a technique for improving bandwidth utilization , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[13]  David I. August,et al.  Microarchitectural exploration with Liberty , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[14]  Norman P. Jouppi,et al.  Cacti 3. 0: an integrated cache timing, power, and area model , 2001 .

[15]  James E. Smith,et al.  Data Cache Prefetching Using a Global History Buffer , 2005, IEEE Micro.

[16]  Roland E. Wunderlich,et al.  SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[17]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[18]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[19]  Zhao Zhang,et al.  A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality , 2000, MICRO 33.

[20]  Tomas Rokicki,et al.  Indexing Memory Banks to Maximize Page Mode Hit Percentage and Minimize Memory Latency , 2003 .

[21]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[22]  Jun Yang,et al.  Frequent Value Locality and Value-Centric Data Cache Design , 2000, ASPLOS.

[23]  Margaret Martonosi,et al.  TCP: tag correlating prefetchers , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[24]  Babak Falsafi,et al.  Dead-block prediction & dead-block correlating prefetchers , 2001, ISCA 2001.