PreTrans: Reducing TLB CAM-search via page number prediction and speculative pre-translation

The need for fast address translation within a tight timing window (after effective-address computation but before the L1 tag check) imposes many design constraints on the TLB; relaxing these constraints can lower its energy cost. In this paper, we observe that (1) data accesses commonly use base-displacement addressing, in which the effective address is computed as the sum of a base register and a displacement, and (2) the effective page number is predictable once the base address is known. Furthermore, address translations can easily be cached alongside the predicted page numbers, enabling speculative translation that filters accesses to the TLB. These two observations enable our PreTrans design, in which (a) a speculative translation is obtained from the base address alone, and (b) the translation becomes available at the same time as the effective (virtual) address. PreTrans replaces most of the energy-expensive CAM lookups in the TLB with RAM lookups, yielding significant TLB power savings.
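To make the mechanism concrete, the following is a minimal software sketch of the idea, not the paper's actual hardware: it assumes 4 KiB pages, a small direct-mapped RAM prediction table, and a stub CAM TLB with an identity mapping, and all names (pretrans_translate, tlb_cam_lookup, etc.) are illustrative. The predicted page number is derived from the base address alone, the cached translation is read from the RAM table in parallel with the effective-address add, and the prediction is verified against the computed virtual page number; only on a mismatch does the access fall back to the CAM-searched TLB.

/* Illustrative sketch of speculative pre-translation keyed by the predicted
 * page number.  All structures and names here are hypothetical. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT    12
#define PAGE_OFFSET   ((1ULL << PAGE_SHIFT) - 1)
#define TABLE_ENTRIES 64                  /* small, direct-mapped RAM table */

typedef struct {
    bool     valid;
    uint64_t vpn;                         /* tag: virtual page number       */
    uint64_t pfn;                         /* cached physical frame number   */
} entry_t;

static entry_t table[TABLE_ENTRIES];

/* Stand-in for the energy-expensive CAM-searched TLB (identity map here). */
static uint64_t tlb_cam_lookup(uint64_t vpn)
{
    return vpn;
}

/* Translate one base+displacement access; returns the physical address. */
static uint64_t pretrans_translate(uint64_t base, int64_t disp)
{
    /* Prediction from the base address alone: assume the access stays on
     * the base's page, so the table can be read before the add finishes. */
    uint64_t pred_vpn = base >> PAGE_SHIFT;
    entry_t *e = &table[pred_vpn % TABLE_ENTRIES];   /* RAM read, no CAM search */

    uint64_t vaddr = base + (uint64_t)disp;          /* effective-address add   */
    uint64_t vpn   = vaddr >> PAGE_SHIFT;

    uint64_t pfn;
    if (e->valid && e->vpn == vpn) {
        pfn = e->pfn;                     /* prediction hit: TLB access filtered */
    } else {
        pfn = tlb_cam_lookup(vpn);        /* mispredict or cold: normal TLB      */
        e->valid = true;                  /* train the prediction table          */
        e->vpn   = vpn;
        e->pfn   = pfn;
    }
    return (pfn << PAGE_SHIFT) | (vaddr & PAGE_OFFSET);
}

int main(void)
{
    uint64_t base = 0x7f12345000ULL;
    /* Repeated accesses off the same base hit the RAM table after the first
     * access, so only the first one pays for the CAM search. */
    for (int64_t d = 0; d < 256; d += 64)
        printf("VA %#llx -> PA %#llx\n",
               (unsigned long long)(base + d),
               (unsigned long long)pretrans_translate(base, d));
    return 0;
}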
