Adding Fast Interrupts to Superscalar Processors

The hardware cost of taking an interrupt is increasing as processors become more superscalar. Using FLIP, an aggressively superscalar processor that we have designed and tested in Verilog, we demonstrate that interrupts can be fast and inexpensive. We trace individual signals through FLIP’s pipeline stages to show that fast interrupts require negligible new hardware. Except for linkage information, interrupts reuse existing branch mechanisms: an asynchronous interrupt acts as an immediate jump instruction, while a synchronous interrupt acts as a mispredicted branch. Although we concentrate on user-level interrupts, we show that kernel-level interrupts can be handled identically with the addition of protection mode bits identifying the protection mode of every outstanding instruction. In blending fast interrupts into the superscalar processor, we address two new problems. The first arises from fast synchronous interrupts. Because most instructions can cause an interrupt, the processor must be able to revert to its state prior to most instructions, not just mispredicted branches. This ubiquity of reversion leads us to design a new renaming data structure that can revert to the state prior to any outstanding instruction by updating a single pointer. The entire structure consists of the outstanding renaming bindings plus a simple scan circuit to look up the latest binding. The second problem arises from the interaction of mispredicted branches and asynchronous interrupts. An asynchronous interrupt can sometimes vanish if a branch that dynamically precedes the interrupt handler mispredicts. We offer a simple solution in which the processor remembers an outstanding interrupt and replays it in case of a preceding misprediction.
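The revertible renaming structure described above can be sketched in software. This is a minimal illustrative model, not FLIP's actual hardware: the class name, the dictionary of committed bindings, and the buffer sizes are all assumptions; the backward scan models the abstract's "simple scan circuit", and rolling back to the state before any outstanding instruction is a single pointer update, as the abstract claims.

```python
class RenameBuffer:
    """Hypothetical model of a pointer-revertible renaming structure.

    Outstanding (logical -> physical) bindings are kept in program
    order.  Lookup scans newest-first (the "scan circuit"); reverting
    to the state prior to any outstanding instruction only moves the
    tail pointer.
    """

    def __init__(self, committed):
        self.committed = dict(committed)  # architectural register map
        self.bindings = []                # outstanding bindings, program order
        self.tail = 0                     # one past the newest valid binding

    def rename(self, logical, physical):
        # Each renamed instruction appends one binding; its tag is the
        # binding's position, so the tag also names the rollback point.
        del self.bindings[self.tail:]     # discard bindings squashed earlier
        self.bindings.append((logical, physical))
        self.tail = len(self.bindings)
        return self.tail - 1

    def lookup(self, logical):
        # Scan outstanding bindings newest-first, then fall back to the
        # committed architectural map.
        for i in range(self.tail - 1, -1, -1):
            log, phys = self.bindings[i]
            if log == logical:
                return phys
        return self.committed[logical]

    def revert_before(self, tag):
        # Synchronous interrupt at the instruction with this tag:
        # revert to the state prior to it by updating a single pointer.
        self.tail = tag
```

For example, after renaming `r1` and then `r2`, a synchronous interrupt at the second instruction reverts `r2` to its committed binding while the older `r1` binding survives:

```python
rb = RenameBuffer({"r1": "p1", "r2": "p2"})
t0 = rb.rename("r1", "p5")
t1 = rb.rename("r2", "p6")
rb.revert_before(t1)          # interrupt at the second instruction
rb.lookup("r2")               # falls back to committed "p2"
rb.lookup("r1")               # older binding "p5" still visible
```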
