Teaching old caches new tricks: RegionTracker and predictor virtualization

On-chip last-level caches are growing to tens of megabytes to accommodate applications with large memory footprints and to compensate for high memory latencies and limited off-chip bandwidth. This paper reviews two ongoing research efforts that exploit such large caches: coarse-grain cache management and predictor virtualization. Coarse-grain cache management collects and stores cache information at the granularity of large memory regions (e.g., 1KB to 8KB). This coarse view of memory access behaviour enables optimizations that were not previously possible with conventional caches. Predictor virtualization is motivated by the observation that on-chip storage has become large enough to allocate, on demand, a small percentage of its capacity for purposes other than storing program data and instructions. It uses conventional caches to store program metadata, i.e., information about program behaviour, which can drive several optimizations that improve performance and power. The paper summarizes the progress made and the ongoing activity in these two research efforts.
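To make the idea of region-granularity tracking concrete, the following is a minimal sketch, not the design described in the paper. It assumes 64-byte cache blocks (the abstract does not state a block size) and uses the 1KB low end of the region range quoted above; the `RegionTable` class and its methods are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

BLOCK_BYTES = 64      # assumed cache block size (not given in the abstract)
REGION_BYTES = 1024   # low end of the 1KB to 8KB range mentioned in the abstract
BLOCKS_PER_REGION = REGION_BYTES // BLOCK_BYTES


def region_of(addr: int) -> int:
    """Region number that contains the byte address."""
    return addr // REGION_BYTES


def block_in_region(addr: int) -> int:
    """Index of the block within its region."""
    return (addr % REGION_BYTES) // BLOCK_BYTES


class RegionTable:
    """Hypothetical per-region summary: one presence bit per cached block."""

    def __init__(self) -> None:
        self.present = defaultdict(int)  # region number -> bit vector

    def insert(self, addr: int) -> None:
        self.present[region_of(addr)] |= 1 << block_in_region(addr)

    def evict(self, addr: int) -> None:
        region = region_of(addr)
        self.present[region] &= ~(1 << block_in_region(addr))
        if self.present[region] == 0:
            del self.present[region]  # no cached blocks left in this region

    def cached_blocks(self, addr: int) -> int:
        """Number of blocks currently cached in the region holding addr."""
        return bin(self.present.get(region_of(addr), 0)).count("1")
```

Under these assumptions each region covers 16 blocks, so a single table entry summarizes what would otherwise take 16 separate per-block lookups. That kind of summary is the coarse view of memory access behaviour the abstract refers to.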
