32-core CMP with multi-sliced L2: 2 and 4 cores sharing an L2 slice

The market is moving toward chip multiprocessors (CMPs), which place multiple cores on the same chip, with a multi-sliced L2 cache in which each slice is shared by 2 cores. CMPs with 8 cores are already available, and future CMPs will have more. This makes it interesting to let more than 2 cores share an L2 slice. We therefore evaluate future CMPs in which 4 cores share each L2 slice, and compare them with present designs in which 2 cores share a slice and with designs that have 1 core per L2. We construct a model and evaluate it with full-system simulation of 32 processors running the SPLASH-2 benchmarks. Previous results show that execution time improves by between about 8.7% (FMM) and 40.3% (Radiosity).
