Decoupling Translation Lookaside Buffer Coherence from Cache Coherence

Many multicore and manycore architectures support hardware cache coherence. However, most of them rely on software techniques to maintain Translation Lookaside Buffer (TLB) coherence, namely the TLB shootdown routine, a costly procedure known to scale poorly. The TSAR architecture is a manycore architecture that includes hardware TLB coherence, but its TLB coherence mechanism is tightly coupled to the cache coherence protocol, which results in unnecessary TLB invalidations. We propose to improve this existing TLB coherence scheme by adding a hardware module that separates data from metadata for cache lines containing address translations. This eliminates the need to invalidate TLB entries when a line containing a translation is evicted from the L1 cache. Our solution does not modify the cache coherence protocol, does not lengthen the critical path in the L1 cache, and even yields small memory savings. Performance results show that our solution eliminates 90% to 95% of TLB scan operations and 50% to 80% of TLB flushes. This in turn reduces execution time by 5% to 20% on a 16-core architecture.
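
The difference between the coupled and decoupled schemes can be illustrated with a toy model. The sketch below is not the TSAR hardware or the proposed module: the `Core` class, its LRU L1, the `pte_tags` set standing in for the retained metadata, and the `run` helper are all hypothetical names chosen for illustration. It assumes that, in the coupled scheme, evicting an L1 line that holds a translation forces a TLB scan, whereas in the decoupled scheme the line's tag is kept so that only a true coherence invalidation triggers the scan.

```python
from collections import OrderedDict


class Core:
    """Toy model of one core: an LRU L1 cache plus a TLB (illustrative only)."""

    def __init__(self, l1_lines, decoupled):
        self.l1 = OrderedDict()      # line address -> True, kept in LRU order
        self.l1_lines = l1_lines     # L1 capacity in lines
        self.decoupled = decoupled   # True: metadata kept apart from data
        self.tlb = {}                # vpn -> address of the line holding its PTE
        self.pte_tags = set()        # assumed metadata store (decoupled only)
        self.tlb_scans = 0           # number of TLB scan operations performed

    def _scan_tlb(self, line):
        # Scan the TLB and drop every entry whose PTE lives in `line`.
        self.tlb_scans += 1
        self.tlb = {v: l for v, l in self.tlb.items() if l != line}

    def _evict(self, victim):
        if victim not in self.tlb.values():
            return                   # ordinary data line: nothing to do
        if self.decoupled:
            self.pte_tags.add(victim)  # keep the tag, drop only the data
        else:
            self._scan_tlb(victim)     # coupled: eviction invalidates the TLB

    def access(self, line):
        # Plain L1 access with LRU replacement.
        if line in self.l1:
            self.l1.move_to_end(line)
            return
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            self._evict(victim)
        self.l1[line] = True

    def tlb_fill(self, vpn, pte_line):
        # A TLB miss reads the PTE through the L1, then caches the translation.
        self.access(pte_line)
        self.tlb[vpn] = pte_line

    def coherence_invalidate(self, line):
        # A real invalidation (e.g. the page table was written) always
        # reaches the TLB, in both schemes.
        self.l1.pop(line, None)
        self.pte_tags.discard(line)
        self._scan_tlb(line)


def run(decoupled):
    """Fill one translation, then force its PTE line out of a 2-line L1."""
    core = Core(l1_lines=2, decoupled=decoupled)
    core.tlb_fill(vpn=1, pte_line=100)
    core.access(200)
    core.access(300)                 # evicts line 100 from the L1
    return core
```

In this model, `run(decoupled=False)` performs one TLB scan and loses the translation as soon as line 100 is evicted, while `run(decoupled=True)` performs no scan and keeps the translation until `coherence_invalidate(100)` is called. The model says nothing about timing or area; it only shows which events reach the TLB under each assumption.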
