Paper information - 32-core CMP with multi-sliced L2: 2 and 4 cores sharing a L2 slice

The market is moving toward multiple cores on the same chip (chip multiprocessors, CMPs) with a multi-sliced L2 cache in which each slice is shared by 2 cores. CMPs with 8 cores are already available, and future CMPs will have more than 8 cores. It is therefore interesting to have more than 2 cores sharing each L2 slice. The idea is to evaluate future CMPs in which 4 processors share the same L2 slice, and to compare them both with present designs in which 2 processors share a slice and with designs that have 1 processor per L2 slice. We construct a model and evaluate it with full-system simulation, using 32 processors, under the SPLASH-2 benchmarks. Previous results show that execution time improves by about 8.7% for FMM and by up to 40.3% for Radiosity.
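The three configurations compared in the abstract can be illustrated with a minimal sketch of how 32 cores might be grouped onto L2 slices when 1, 2, or 4 cores share a slice. The contiguous core numbering and the names `NUM_CORES`, `slice_count`, and `slice_of` are assumptions made for illustration; the abstract does not specify how cores are assigned to slices.

```python
NUM_CORES = 32  # core count used in the evaluation described above


def slice_count(cores_per_slice: int) -> int:
    """Number of L2 slices when `cores_per_slice` cores share one slice."""
    if cores_per_slice <= 0 or NUM_CORES % cores_per_slice != 0:
        raise ValueError("cores_per_slice must evenly divide NUM_CORES")
    return NUM_CORES // cores_per_slice


def slice_of(core_id: int, cores_per_slice: int) -> int:
    """L2 slice index for a core, assuming contiguous grouping (hypothetical)."""
    if not 0 <= core_id < NUM_CORES:
        raise ValueError("core_id out of range")
    slice_count(cores_per_slice)  # validates the sharing degree
    return core_id // cores_per_slice


if __name__ == "__main__":
    # Sharing degrees named in the abstract: 1, 2, and 4 cores per L2 slice.
    for degree in (1, 2, 4):
        print(f"{degree} core(s) per slice: {slice_count(degree)} slices, "
              f"core 5 maps to slice {slice_of(5, degree)}")
```

Under this assumed grouping, moving from 2 to 4 cores per slice halves the number of slices from 16 to 8, so each slice serves twice as many cores.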

Mario Donato Marino
