HC-Sim: A fast and exact L1 cache simulator with scratchpad memory co-simulation support

The configuration of L1 caches has a significant impact on the performance and energy consumption of an embedded system. An embedded system is normally designed for a specific application or domain of applications, and simulating the application(s) is the most popular way to find the optimal L1 cache configuration. However, the simulation-based approach suffers from long simulation times because it must exhaustively simulate all configurations, which are characterized by three parameters: the number of cache sets, the associativity, and the cache line size. In previous work, the most time-consuming step was determining the hit or miss status of a cache access under each configuration, which was done by a linear search of a long linked list based on the inclusion property. In this work, we propose a novel simulator, HC-Sim, which adopts elaborate data structures, including a centralized hash table and a novel miss counter structure, to effectively reduce the search time. On average, HC-Sim achieves a 2.56X speedup over the fastest existing approach (SuSeSim). In addition, we implement HC-Sim using the dynamic binary instrumentation tool Pin, which provides scalability for simulating larger applications by eliminating the overhead of generating and storing a huge trace file. Furthermore, HC-Sim can simulate an L1 cache and a scratchpad memory (SPM) simultaneously, which helps designers explore a design space that covers both L1 cache configurations and SPM sizes.
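The inclusion property mentioned above can be illustrated with a minimal sketch. It is not HC-Sim's implementation: it assumes LRU replacement and a fixed number of sets and line size, and it uses the plain linear search that the abstract attributes to previous work. The function name `lru_misses_all_assoc` and its parameters are hypothetical. Under LRU, the contents of an a-way set are exactly the a most recently used tags of that set, so the depth at which a tag is found tells us, in one pass, which associativities hit.

```python
from collections import defaultdict


def lru_misses_all_assoc(addresses, num_sets, line_size, max_assoc):
    """Single pass over a trace; returns {assoc: miss count} for 1..max_assoc.

    Illustrative sketch only (assumes LRU replacement); names are hypothetical.
    """
    stacks = defaultdict(list)       # set index -> tags, most recently used first
    depth_hist = [0] * max_assoc     # depth_hist[d]: accesses found at stack depth d
    total = 0

    for addr in addresses:
        total += 1
        block = addr // line_size
        set_idx = block % num_sets
        tag = block // num_sets
        stack = stacks[set_idx]

        if tag in stack:
            depth = stack.index(tag)  # linear search, the costly step
            stack.pop(depth)
            depth_hist[depth] += 1    # a hit for every associativity > depth
        elif len(stack) == max_assoc:
            stack.pop()               # deeper entries would miss for every assoc

        stack.insert(0, tag)          # the accessed tag becomes most recent

    misses = {}
    hits = 0
    for assoc in range(1, max_assoc + 1):
        hits += depth_hist[assoc - 1]
        misses[assoc] = total - hits
    return misses


if __name__ == "__main__":
    trace = [0, 16, 0, 32, 16, 0]
    print(lru_misses_all_assoc(trace, num_sets=1, line_size=16, max_assoc=2))
```

A full design-space exploration would repeat this over every set count and line size of interest; the linear search inside the loop is the cost that the abstract says HC-Sim's hash table and miss counter structure are meant to reduce.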
