Ever since the industry turned to parallelism instead of frequency scaling to improve processor performance, multicore processors have continued to scale to ever larger numbers of cores. Some believe that multicores will have 1000 cores or more by the middle of the next decade. However, the promised performance gains will be realized only if the inherent scaling challenges of these chips are overcome. One major scaling challenge is whether cache coherence can remain efficient with large numbers of cores. Meanwhile, recent advances in nanophotonic device manufacturing are making CMOS-integrated optics a reality: an interconnect technology that can provide significantly more bandwidth at lower power than conventional electrical interconnects. The contributions of this paper are two-fold. (1) It presents ATAC, a new manycore architecture that augments an electrical mesh network with an optical network that performs highly efficient broadcasts. (2) It introduces ACKwise, a novel directory-based cache coherence protocol that provides high performance and scalability on any large-scale manycore interconnection network with broadcast capability. Performance evaluation studies using analytical models show that (i) a 1024-core ATAC chip using ACKwise achieves a speedup of 3.9× over a similarly sized pure electrical mesh manycore with a conventional limited directory protocol; (ii) the ATAC chip with ACKwise achieves a speedup of 1.35× over the electrical mesh chip with ACKwise; and (iii) a pure electrical mesh chip with ACKwise achieves a speedup of 2.9× over the same chip using a conventional limited directory protocol.
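The abstract names ACKwise but does not describe how it works. The following is a minimal Python sketch, assuming ACKwise behaves like a limited directory with k sharer pointers per cache line that, once more than k cores share the line, stops tracking sharer identities, keeps an exact sharer count, and invalidates by broadcast; on a write, the directory then waits only for acknowledgments from the actual sharers. The class `AckwiseEntry`, the function `broadcast_fallback_acks`, and the parameter `k` are hypothetical names chosen for illustration. This is not the paper's implementation, and the contrast case is an assumed broadcast fallback that does not keep a count.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class AckwiseEntry:
    """Hypothetical directory entry for one cache line (ACKwise-style sketch).

    Assumption: up to k exact sharer pointers; on overflow a global bit is set
    and only an exact sharer count is kept.
    """
    k: int
    pointers: Set[int] = field(default_factory=set)
    global_bit: bool = False
    sharer_count: int = 0

    def add_sharer(self, core: int) -> None:
        """Record a read miss from `core`.

        Assumption: after overflow, callers never re-add a core that already
        holds the line, since identities are no longer tracked.
        """
        if not self.global_bit and core in self.pointers:
            return  # already tracked exactly: nothing to do
        self.sharer_count += 1
        if self.global_bit:
            return  # overflowed: only the count is maintained
        if len(self.pointers) < self.k:
            self.pointers.add(core)
        else:
            # Pointer overflow: drop identities, keep the exact count.
            self.global_bit = True
            self.pointers.clear()

    def write_invalidate(self) -> Tuple[Optional[Set[int]], int]:
        """Return (unicast targets, or None for broadcast; acks to wait for)."""
        if self.global_bit:
            # Broadcast the invalidation; only the true sharers acknowledge.
            targets, acks = None, self.sharer_count
        else:
            targets, acks = set(self.pointers), len(self.pointers)
        # After the write completes, no sharers remain to be tracked.
        self.pointers.clear()
        self.global_bit = False
        self.sharer_count = 0
        return targets, acks


def broadcast_fallback_acks(num_sharers: int, k: int, num_cores: int) -> int:
    """Assumed contrast case: a limited directory that broadcasts on overflow
    without a sharer count must collect an ack from every other core."""
    return num_sharers if num_sharers <= k else num_cores - 1


if __name__ == "__main__":
    entry = AckwiseEntry(k=4)
    for core in range(10):
        entry.add_sharer(core)
    targets, acks = entry.write_invalidate()
    print(targets, acks)  # None 10
    print(broadcast_fallback_acks(10, k=4, num_cores=1024))  # 1023
```

With 10 sharers, k = 4, and 1024 cores, the sketch waits for 10 acknowledgments rather than 1023. Under the stated assumptions, this illustrates how keeping an exact count lets a directory exploit a cheap broadcast network without paying for an acknowledgment from every core.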