Using Software-extended Architectures for Software Simultaneous Multithreading

A software-extended architecture (SEA) enhances a hardware architecture by placing a high-performance dynamic instruction-set translator between the application binary and the processor, improving processor utilization and enabling new functionality with no changes to either the processor or the binaries. Our prototype implementation of a software-extended Alpha 21164 can provide new system functionality while adding only 1% to 30% to the running time of an application. Using this prototype, we have implemented software simultaneous multithreading (SSMT), a new software technique that lets programs make greater use of the processor pipeline. SSMT merges instruction streams from independent processes to increase instruction-level parallelism. Experiments with SSMT on the software-extended Alpha 21164 show that processor throughput can be improved by up to 30% on real programs, despite the small number of issue slots on this processor.
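To illustrate why merging independent instruction streams can raise instruction-level parallelism, the following is a minimal toy sketch, not the translator described here. It assumes an in-order machine with a hypothetical issue width of 2 and unit instruction latency; the names `rename`, `merge`, and `cycles` and the instruction encoding are invented for illustration and do not model the Alpha 21164.

```python
from typing import List, Tuple

# Hypothetical encoding: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def rename(stream: List[Instr], tag: str) -> List[Instr]:
    """Prefix every register with a per-process tag so streams cannot collide."""
    return [(f"{tag}.{d}", tuple(f"{tag}.{s}" for s in srcs)) for d, srcs in stream]


def merge(a: List[Instr], b: List[Instr]) -> List[Instr]:
    """Interleave two streams instruction by instruction (round-robin)."""
    merged: List[Instr] = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            merged.append(a[i])
        if i < len(b):
            merged.append(b[i])
    return merged


def cycles(stream: List[Instr], width: int = 2) -> int:
    """Count cycles on an assumed in-order machine with unit latency.

    Each cycle issues instructions in order until the issue width is
    reached or an instruction needs a value produced in the same cycle.
    """
    ready_at = {}  # register -> first cycle in which its value is usable
    cycle = 0
    i = 0
    while i < len(stream):
        cycle += 1
        issued = 0
        while i < len(stream) and issued < width:
            dest, srcs = stream[i]
            if any(ready_at.get(s, 0) > cycle for s in srcs):
                break  # operand not ready yet: stall until the next cycle
            ready_at[dest] = cycle + 1
            issued += 1
            i += 1
    return cycle


# A fully dependent chain: each instruction consumes the previous result.
chain: List[Instr] = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
proc_a = rename(chain, "A")
proc_b = rename(chain, "B")

separate = cycles(proc_a) + cycles(proc_b)  # processes run one after another
together = cycles(merge(proc_a, proc_b))    # processes merged into one stream
print(separate, together)
```

Under these assumptions, each dependent chain alone fills only one issue slot per cycle, so the two processes take 8 cycles run separately and 4 cycles when merged. Real gains depend on the actual instruction mix, latencies, and translation overhead.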
