Phantom-BTB: a virtualized branch target buffer design

Modern processors use branch target buffers (BTBs) to predict the target addresses of branches so that they can fetch ahead in the instruction stream, increasing concurrency and performance. Ideally, a BTB would be large enough to capture the entire working set of the application, yet small enough for fast access and practical dedicated on-chip storage. Depending on the application, these requirements are at odds. This work introduces a BTB design that accommodates large instruction footprints without dedicating expensive on-chip resources. In the proposed Phantom-BTB (PBTB) design, a conventional BTB is augmented with a virtual table that collects branch target information as the application runs. The virtual table has no fixed dedicated storage. Instead, it is transparently allocated, on demand, in the on-chip caches at cache-line granularity. Entries in the virtual table are proactively prefetched and installed in the dedicated conventional BTB, thereby increasing its perceived capacity. Experimental results with commercial workloads under full-system simulation demonstrate that PBTB improves IPC over a 1K-entry BTB by 6.9% on average and by up to 12.7%, with a storage overhead of only 8%. Overall, the virtualized design performs within 1% of a conventional 4K-entry, single-cycle-access BTB, while its dedicated storage is 3.6 times smaller.
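The interaction between the small dedicated BTB and the cache-resident virtual table can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `PhantomBTBSketch`, the grouping of entries by code region (`pc >> region_bits`), the LRU replacement, and all sizes are assumptions chosen for illustration, and the on-chip cache is stood in for by a plain dictionary of "lines".

```python
from collections import OrderedDict


class PhantomBTBSketch:
    """Toy model: a small dedicated BTB backed by a virtual table.

    Assumptions (not from the paper): entries are grouped into virtual
    "cache lines" by code region, both levels use LRU replacement, and a
    dedicated-BTB miss triggers a fill of the whole matching line.
    """

    def __init__(self, btb_entries=2, entries_per_line=4, region_bits=6):
        self.btb_entries = btb_entries
        self.entries_per_line = entries_per_line
        self.region_bits = region_bits
        self.btb = OrderedDict()      # dedicated BTB: pc -> target (LRU order)
        self.virtual_table = {}       # stand-in for cache lines: region -> entries

    def _install(self, pc, target):
        # Insert into the dedicated BTB, evicting least recently used entries.
        self.btb[pc] = target
        self.btb.move_to_end(pc)
        while len(self.btb) > self.btb_entries:
            self.btb.popitem(last=False)

    def update(self, pc, target):
        # Record a resolved branch in the dedicated BTB and the virtual table.
        self._install(pc, target)
        region = pc >> self.region_bits
        line = self.virtual_table.setdefault(region, OrderedDict())
        line[pc] = target
        line.move_to_end(pc)
        while len(line) > self.entries_per_line:
            line.popitem(last=False)

    def predict(self, pc):
        # Hit in the dedicated BTB: return the stored target.
        if pc in self.btb:
            self.btb.move_to_end(pc)
            return self.btb[pc]
        # Miss: no prediction now, but fetch the virtual line for this region
        # and install its entries so that nearby branches hit later.
        line = self.virtual_table.get(pc >> self.region_bits)
        if line:
            for line_pc, line_target in line.items():
                self._install(line_pc, line_target)
        return None


pbtb = PhantomBTBSketch(btb_entries=2)
pbtb.update(0x100, 0x200)    # two branches in the same region
pbtb.update(0x104, 0x300)
pbtb.update(0x1000, 0x10)    # two unrelated branches push them out
pbtb.update(0x2000, 0x20)
first = pbtb.predict(0x100)  # miss, triggers a fill from the virtual table
second = pbtb.predict(0x104) # hit, thanks to the fill
```

In this sketch the miss on `0x100` returns no prediction, yet it brings `0x104` back into the dedicated BTB before that branch is looked up, which is the sense in which the virtual table raises the perceived capacity of the small BTB.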
