Reducing energy of virtual cache synonym lookup using Bloom filters

Virtual caches are employed as L1 caches in both high-performance and embedded processors to meet their short-latency requirements. However, they also introduce the synonym problem: because distinct virtual addresses can map to the same physical address, the same physical cache line can reside at multiple locations in the cache, leading to potential data-consistency issues. To guarantee correctness, common hardware solutions either serially look up all possible synonym locations in the L1, consuming additional energy, or employ a reverse map in the L2 cache, which incurs a large area overhead. Such preventive mechanisms remain indispensable even though synonyms may not always be present during execution. In this paper, we study the synonym issue using a workload of Windows applications and propose a technique based on Bloom filters to reduce synonym lookup energy. By tracking the address stream with Bloom filters, we can confidently exclude addresses that were never observed and thus eliminate unnecessary synonym lookups, saving energy in the L1 cache. Bloom filters have a very small area overhead, making our implementation a feasible and attractive solution for synonym detection. Our results show that synonyms in these applications actually constitute less than 0.1% of all cache misses. Applying our technique reduces the dynamic energy consumed in the L1 data cache by up to 32.5%; when leakage energy is taken into account, the savings are up to 27.6%.
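To make the filtering idea concrete, here is a minimal Python sketch. It is not the paper's design; every parameter and name in it is a hypothetical assumption chosen for illustration: a virtually indexed, physically tagged 32 KB, 4-way L1 with 64-byte lines (128 sets) and 4 KB pages, so the 7-bit set index extends one bit past the page offset and each physical line has two possible sets. The filter is assumed to be keyed on the physical line address and updated on every L1 fill. After a normal L1 miss, the synonym search is skipped whenever the Bloom filter reports that the physical line was never brought in; because Bloom filters have no false negatives, skipping is always safe, and a false positive only costs an unnecessary probe.

```python
import hashlib

# Hypothetical L1 geometry (assumed, not from the paper): 32 KB, 4-way,
# 64-byte lines -> 128 sets; 4 KB pages. Index bits 6..12 extend one bit
# past the 12-bit page offset, so a physical line can live in 2 sets.
LINE_OFFSET_BITS = 6
INDEX_BITS = 7
PAGE_OFFSET_BITS = 12


class BloomFilter:
    """Plain Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: int) -> list[int]:
        # Double hashing from two 64-bit halves of a SHA-256 digest; the odd
        # stride keeps positions distinct when num_bits is a power of two.
        digest = hashlib.sha256(key.to_bytes(8, "little")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: int) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def candidate_sets(paddr: int) -> list[int]:
    """Every L1 set in which the line holding `paddr` could reside."""
    in_page_bits = PAGE_OFFSET_BITS - LINE_OFFSET_BITS
    # Index bits inside the page offset are identical for every virtual alias.
    fixed = (paddr >> LINE_OFFSET_BITS) & ((1 << in_page_bits) - 1)
    # Index bits above the page offset ("colors") may differ between aliases.
    color_bits = LINE_OFFSET_BITS + INDEX_BITS - PAGE_OFFSET_BITS
    return [(color << in_page_bits) | fixed for color in range(1 << color_bits)]


class SynonymFilter:
    """Hypothetical helper: skips synonym probes for never-observed lines."""

    def __init__(self, bloom: BloomFilter) -> None:
        self.bloom = bloom
        self.probes = 0   # extra set lookups actually performed
        self.skipped = 0  # misses whose synonym search was filtered out

    def on_fill(self, paddr: int) -> None:
        # Record every physical line brought into the L1.
        self.bloom.add(paddr >> LINE_OFFSET_BITS)

    def sets_to_probe(self, vaddr: int, paddr: int) -> list[int]:
        """Sets to search for a synonym after a normal L1 miss."""
        if not self.bloom.might_contain(paddr >> LINE_OFFSET_BITS):
            self.skipped += 1  # line never filled, so no synonym can exist
            return []
        # The normal lookup already compared physical tags in this set.
        own_set = (vaddr >> LINE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        others = [s for s in candidate_sets(paddr) if s != own_set]
        self.probes += len(others)
        return others


sf = SynonymFilter(BloomFilter(num_bits=4096, num_hashes=3))
print(sf.sets_to_probe(0x2040, 0x5040))  # empty filter: [] (search skipped)
sf.on_fill(0x5040)                       # filled via alias 0x3040, i.e. set 65
print(sf.sets_to_probe(0x2040, 0x5040))  # alias 0x2040 indexes set 1: [65]
```

One simplification worth noting: a plain Bloom filter cannot delete entries, so bits for evicted lines stay set and the filter only becomes more conservative over time. A practical design would need some reset or counting scheme; that choice is an assumption of this sketch, not a detail taken from the paper.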
