In-line interrupt handling and lock-up free translation lookaside buffers (TLBs)

The effects of the general-purpose precise interrupt mechanisms in use for the past few decades have received very little attention. When modern out-of-order processors handle interrupts precisely, they typically begin by flushing the pipeline to make the CPU available to execute handler instructions. In doing so, the CPU flushes many instructions that have already been brought into the reorder buffer. Some of these instructions may have reached a very deep stage in the pipeline, so flushing them wastes significant work. In addition, refetching and reexecuting the flushed instructions costs several cycles and wastes energy for each exception detected. This paper concentrates on improving the performance of precisely handling software-managed translation lookaside buffer (TLB) interrupts, one of the most frequently occurring types of interrupt. The paper presents a novel method of in-lining the interrupt handler within the reorder buffer. Since the first-level interrupt handlers of TLBs are usually small, they could potentially fit in the reorder buffer along with the user-level code already there. As a result, the instructions that would otherwise be flushed from the pipe need not be refetched and reexecuted. In-lining also allows instructions independent of the exceptional instruction to continue to execute in parallel with the handler code. In-lining the TLB interrupt handler in this way provides lock-up free TLBs. This paper proposes two schemes, prepend and append, for in-lining the interrupt handler into the available reorder buffer space. The two schemes are implemented on a performance model of the Alpha 21264 processor built by Alpha designers at the Palo Alto Design Center (PADC), California. We compare the overhead and performance impact of handling TLB interrupts by the traditional scheme, the append in-lined scheme, and the prepend in-lined scheme.
For small, medium, and large memory footprints, we quantify the overhead by comparing the number and pipeline state of instructions flushed, the energy savings, and the performance improvements. We find that lock-up free TLBs reduce the overhead of refetching and reexecuting the flushed instructions by 30-95 percent, reduce execution time by 5-25 percent, and reduce the energy wasted by 30-90 percent.
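
The core decision described above, whether a small handler fits into the free reorder buffer space or the pipeline must be flushed, can be illustrated with a toy model. This is a minimal sketch under stated assumptions and not the paper's simulator: the names `ReorderBuffer` and `handle_tlb_miss`, the buffer sizes, and the fall-back-to-flush policy when space is insufficient are all hypothetical, and the sketch does not distinguish between the prepend and append placements.

```python
from dataclasses import dataclass


@dataclass
class ReorderBuffer:
    capacity: int   # total entries (hypothetical size)
    occupied: int   # entries holding in-flight user-level instructions

    def free_slots(self) -> int:
        return self.capacity - self.occupied


def handle_tlb_miss(rob: ReorderBuffer, handler_len: int, at_risk: int) -> dict:
    """Choose between in-lining the handler and a traditional flush.

    handler_len: number of instructions in the TLB miss handler.
    at_risk: in-flight instructions a traditional flush would discard.
    Assumption: if the handler does not fit, fall back to flushing.
    """
    if handler_len <= rob.free_slots():
        # In-line: handler instructions share the buffer with user code,
        # so nothing is flushed and nothing needs to be refetched.
        rob.occupied += handler_len
        return {"mode": "inline", "flushed": 0}
    # Traditional scheme: discard the at-risk instructions; they must be
    # refetched and reexecuted after the handler completes.
    rob.occupied -= at_risk
    return {"mode": "flush", "flushed": at_risk}


if __name__ == "__main__":
    roomy = ReorderBuffer(capacity=80, occupied=60)
    print(handle_tlb_miss(roomy, handler_len=10, at_risk=25))

    crowded = ReorderBuffer(capacity=80, occupied=75)
    print(handle_tlb_miss(crowded, handler_len=10, at_risk=25))
```

In this toy model the first call in-lines the handler because ten free entries are available, while the second falls back to a flush and reports the 25 discarded instructions as wasted work.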
