Paper information - MARS-Multiprocessor architecture reconciling symbolic with numerical processing-a CPU ensemble with zero-delay branch/jump

The design of the CPU (central processing unit) chips for the MARS project is described. The chips are the IFU (instruction fetch unit), the IPU (integer processing unit), and the LPU (list processing unit). The IFU interleaves instruction fetch with execution and thereby coordinates execution among the datapath chips. The IPU is the main computing engine for integer operations and operand address calculation. By using dual instruction buffers, a reserved phase for branch/jump target fetch, and instruction decode peeping, the architecture supports almost-zero-delay branches and super-zero-delay jumps. The LPU handles the Lisp runtime environment, dynamic type checking, and fast list access. In this architecture, the critical path of complex register file access and ALU operation is distributed over the LPU and IPU, and list tracing executes quickly through the nondelayed car and cdr instructions.
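
The list tracing that the abstract attributes to the LPU can be illustrated with a minimal software sketch. It only shows what car and cdr with dynamic type checking mean at the language level. The `Cons` class and the helpers `make_list` and `nth` are hypothetical names introduced here, and nothing below reflects the actual MARS cell encoding, tag layout, or instruction timing.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Cons:
    """A list cell with two fields. Hypothetical layout, not the LPU's encoding."""
    car: "Value"
    cdr: "Value"


# A value is an integer, a cons cell, or None (standing in for nil).
Value = Union[int, Cons, None]


def car(v: Value) -> Value:
    """Return the head of a cons cell after a dynamic type check."""
    if not isinstance(v, Cons):
        raise TypeError(f"car: expected a cons cell, got {type(v).__name__}")
    return v.car


def cdr(v: Value) -> Value:
    """Return the tail of a cons cell after a dynamic type check."""
    if not isinstance(v, Cons):
        raise TypeError(f"cdr: expected a cons cell, got {type(v).__name__}")
    return v.cdr


def make_list(*items: Value) -> Value:
    """Build a nil-terminated list by consing the items from right to left."""
    head: Value = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def nth(lst: Value, n: int) -> Value:
    """Trace n cdr links, then take the car of the cell reached."""
    for _ in range(n):
        lst = cdr(lst)
    return car(lst)
```

In software, every step of `nth` pays for a type check followed by a memory access. The abstract's claim is that the LPU performs the equivalent check and access in hardware without a delay slot, which this sketch does not model.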
