Chip multiprocessors have become the standard architecture for medium- and high-performance processors. These devices raise new research questions. One is how to explore the design space of a cache-memory hierarchy so that memory accesses do not become a limiting factor on system performance. Simulation of system workloads is a widely accepted method for evaluating proposed cache organizations. However, cycle-accurate simulation of multicore devices requires a significant amount of time, which limits the number of configurations that can be analyzed. A memory access trace file generated from a cycle-accurate simulation can instead be used to analyze multiple cache configurations in much less time. This paper introduces Moola, a multicore, trace-based cache simulator with cycle-accurate timing within the cache-memory subsystem. Moola is suitable for experimenting with different cache configurations, including different types of last-level cache (LLC) implementations, and for demonstrating to researchers and students how access congestion at the shared LLC can adversely impact system performance. Moola is highly configurable at run time through configuration files and command-line arguments. An analysis of congestion effects in the LLC is provided as an example of how Moola can be used to analyze current cache constructs.

Keywords: computer architecture, multicore processors, cache simulation.