Paper Information - Performance Evaluation of a Novel CMP Cache Structure for Hybrid Workloads

The Chip Multiprocessor (CMP) architecture offers parallel multi-thread execution and fast retrieval of shared data that is cached on-chip. To obtain the best possible performance from a CMP, the cache architecture must be optimised to reduce the time lost during remote cache and off-chip memory accesses. Many researchers have proposed CMP cache architectures to improve system performance, but they have not considered the parallel execution of mixed single-thread and multi-thread workloads. In this paper, we propose SPS2, a hybrid workload-aware cache architecture in which each processor has both a private and a shared L2 cache. We describe the corresponding SPS2 cache coherence protocol with a state transition graph. Performance evaluation demonstrates that the proposed SPS2 cache structure performs better than traditional private L2 and shared L2 designs when hybrid workloads are applied.

Xuemei Zhao | Karl Sammut | Fangpo He
