Workload adaptive shared memory multicore processors with reconfigurable interconnects

Interconnection networks for multicore processors are typically designed generically, so that a single design serves a wide range of workloads. This leaves considerable room to improve performance with interconnects that adapt to different workloads and to different phases within a program. This paper proposes one such interconnection network for medium-scale (up to 32 cores) shared memory multicore processors, together with the software-level support needed to use it effectively. The proposed architecture partitions the cores on the chip into groups called clusters. Reconfigurable logic inserted between clusters either isolates them from one another or applies one of several policies for inter-cluster communication. Experiments show that cluster isolation can improve the overall throughput of a multicore processor by as much as 60% for multiprogramming workloads consisting of two and four applications. The area overhead of the additional logic is shown to be minimal.
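
As a rough illustration of the clustering idea, the sketch below models clusters connected in a linear chain, with a two-state switch between each pair of neighbouring clusters. The chain topology, the `LinkMode` names, and the functions `make_clusters` and `reachable_cores` are assumptions made for this example only; the abstract does not specify the topology or the set of communication policies.

```python
from enum import Enum


class LinkMode(Enum):
    """Hypothetical states of the logic between two neighbouring clusters."""
    ISOLATED = 0  # no traffic crosses between the two clusters
    SHARED = 1    # requests are forwarded to the neighbouring cluster


def make_clusters(num_cores, cluster_size):
    """Partition core ids 0..num_cores-1 into consecutive clusters."""
    return [list(range(start, min(start + cluster_size, num_cores)))
            for start in range(0, num_cores, cluster_size)]


def reachable_cores(clusters, link_modes, core):
    """Return the cores that a broadcast from `core` would reach.

    link_modes[i] is the mode of the link between cluster i and
    cluster i + 1 (a linear chain, assumed for illustration).
    """
    home = next(i for i, members in enumerate(clusters) if core in members)

    # Extend to the left while the links are open.
    lo = home
    while lo > 0 and link_modes[lo - 1] is LinkMode.SHARED:
        lo -= 1

    # Extend to the right while the links are open.
    hi = home
    while hi < len(clusters) - 1 and link_modes[hi] is LinkMode.SHARED:
        hi += 1

    return [c for members in clusters[lo:hi + 1] for c in members]


clusters = make_clusters(num_cores=8, cluster_size=2)
modes = [LinkMode.SHARED, LinkMode.ISOLATED, LinkMode.SHARED]
print(reachable_cores(clusters, modes, core=0))
print(reachable_cores(clusters, modes, core=5))
```

With the middle link isolated, the chip behaves as two independent four-core groups, so two co-scheduled applications placed in different groups would not see each other's interconnect traffic. This is the kind of isolation the abstract credits for the throughput gain on multiprogramming workloads.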
