Atomic Coherence: Leveraging nanophotonics to build race-free cache coherence protocols

This paper advocates Atomic Coherence, a framework that simplifies cache coherence protocol specification, design, and verification by decoupling races from the protocol's operation. Atomic Coherence requires conflicting coherence requests to the same addresses be serialized with a mutex before they are issued. Once issued, requests follow a predictable race-free path. Because requests are guaranteed not to race, coherence protocols are simpler and protocol extensions are straightforward. Our implementation of Atomic Coherence uses optical mutexes because optics provides very low latency. We begin with a state-of-the-art non-atomic MOEFSI protocol and demonstrate that an atomic implementation is much simpler while imposing less than a 2% performance penalty. We then show how, in the absence of races, it is easy to add support for speculative coherence and improve performance by up to 70%. Similar performance gains may be possible in a non-atomic protocol, but not without considerable effort in race management.

[1]  Larry Carter,et al.  Universal classes of hash functions (Extended Abstract) , 1977, STOC '77.

[2]  J. Goodman Using cache memory to reduce processor-memory traffic , 1983, ISCA '83.

[3]  Dante Del Corso Microcomputer buses and links , 1986 .

[4]  James K. Archibald,et al.  Cache coherence protocols: evaluation using a multiprocessor simulation model , 1986, TOCS.

[5]  David B. Gustavson The Scalable Coherent Interface and related standards projects , 1992, IEEE Micro.

[6]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[7]  Erik Hagersten,et al.  Gigaplane: A High Performance Bus for Large SMPs , 2003 .

[8]  James R. Goodman,et al.  Efficient Synchronization: Let Them Eat QOLB /sup1/ , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[9]  D. Lenoski,et al.  The SGI Origin: A ccnuma Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[10]  James R. Goodman,et al.  Efficient Synchronization: Let Them Eat QOLB , 1997, International Symposium on Computer Architecture.

[11]  J. Goodman Using cache memory to reduce processor-memory traffic , 1983, ISCA '98.

[12]  Erik Hagersten,et al.  WildFire: a scalable path for SMPs , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[13]  M. Hill,et al.  Multicast snooping: a new coherence method using a multicast address network , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).

[14]  Babak Falsafi,et al.  Memory sharing predictor: the key to a speculative coherent DSM , 1999, ISCA.

[15]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[16]  David J. Lilja,et al.  So many states, so little time: verifying memory coherence in the Cray X1 , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[17]  Milo M. K. Martin,et al.  Simulating a $ 2 M Commercial Server on a $ 2 K PC T , 2001 .

[18]  Token Coherence: decoupling performance and correctness , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[19]  Ying Chen,et al.  Heuristics for Complexity-Effective Verification of a Cache Coherence Protocol Implementation , 2003 .

[20]  Mikko H. Lipasti,et al.  Exploring, defining, and exploiting recent store value locality , 2003 .

[21]  Leslie Lamport,et al.  Checking Cache-Coherence Protocols with TLA+ , 2003, Formal Methods Syst. Des..

[22]  Jaehyuk Huh,et al.  Coherence decoupling: making use of incoherence , 2004, ASPLOS XI.

[23]  Andreas Moshovos RegionScout: exploiting coarse grain sharing in snoop-based coherence , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[24]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[25]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[26]  Mark D. Hill,et al.  Virtual hierarchies to support server consolidation , 2007, ISCA '07.

[27]  S. Mikhrin,et al.  Quantum dot laser with 75 nm broad spectrum of emission. , 2007, Optics letters.

[28]  David F. Welch,et al.  Large-scale photonic integrated circuits for long-haul transmission and switching , 2007 .

[29]  Shuguang Feng,et al.  Self-calibrating Online Wearout Detection , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[30]  Josep Torrellas,et al.  Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[31]  Milo M. K. Martin,et al.  Token tenure: PATCHing token counting using directory-based cache coherence , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[32]  Jung Ho Ahn,et al.  Corona: System Implications of Emerging Nanophotonic Technology , 2008, 2008 International Symposium on Computer Architecture.

[33]  Li Shang,et al.  Leveraging on-chip networks for data cache migration in chip multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[34]  Mikko H. Lipasti,et al.  Light speed arbitration and flow control for nanophotonic interconnects , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[35]  David H. Albonesi,et al.  Phastlane: a rapid transit optical routing network , 2009, ISCA '09.

[36]  Meng Zhang,et al.  Fractal Coherence: Scalably Verifiable Cache Coherence , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[37]  John Kim,et al.  FlexiShare: Channel sharing for an energy-efficient nanophotonic crossbar , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[38]  Leslie Lamport,et al.  The Wildfire Challenge Problem , 2016 .