DEW: A fast level 1 cache simulation approach for embedded processors with FIFO replacement policy

Increasing the speed of cache simulation to obtain hit/miss rates enables performance estimation, cache exploration for embedded systems and energy estimation. Previously, such simulations, particularly exact approaches, have been exclusively for caches which utilize the least recently used (LRU) replacement policy. In this paper, we propose a new, fast and exact cache simulation method for the First In First Out(FIFO) replacement policy. This method, called DEW, is able to simulate multiple level 1 cache configurations (different set sizes, associativities, and block sizes) with FIFO replacement policy. DEW utilizes a binomial tree based representation of cache configurations and a novel searching method to speed up simulation over single cache simulators like Dinero IV. Depending on different cache block sizes and benchmark applications, DEW operates around 8 to 40 times faster than Dinero IV. Dinero IV compares 2.17 to 19.42 times more cache ways than DEW to determine accurate miss rates.

[1]  Sharad Malik,et al.  Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.

[2]  Santosh G. Abraham,et al.  Set-associative cache simulation using generalized binomial trees , 1995, TOCS.

[3]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[4]  Donald E. Thomas,et al.  High level cache simulation for heterogeneous multiprocessors , 2004, Proceedings. 41st Design Automation Conference, 2004..

[5]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[6]  Nozomu Togawa,et al.  Exact and fast L1 cache simulation for embedded systems , 2009, 2009 Asia and South Pacific Design Automation Conference.

[7]  Josep Llosa,et al.  A fast and accurate framework to analyze and optimize cache memory behavior , 2004, TOPL.

[8]  Aleksandar Milenkovic,et al.  Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite , 2004, ACM-SE 42.

[9]  Gürhan Küçük,et al.  AccuPower: an accurate power estimation tool for superscalar microprocessors , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[10]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[11]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[12]  Vittorio Zaccaria,et al.  A design framework to efficiently explore energy-delay tradeoffs , 2001, Ninth International Symposium on Hardware/Software Codesign. CODES 2001 (IEEE Cat. No.01TH8571).

[13]  Alan Jay Smith,et al.  Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.

[14]  Xianfeng Li,et al.  Design space exploration of caches using compressed traces , 2004, ICS '04.

[15]  Sri Parameswaran,et al.  Finding optimal L1 cache configuration for embedded systems , 2006, Asia and South Pacific Conference on Design Automation, 2006..