PipeRench implementation of the instruction path coprocessor

The paper demonstrates how an Instruction Path Coprocessor (I-COP) can be efficiently implemented using the PipeRench reconfigurable architecture. An I-COP is a programmable on-chip coprocessor that operates on the core processor's instructions to transform them into a new format that can be more efficiently executed. The I-COP can be used to implement many sophisticated hardware code modification techniques. We show how four specific techniques can be mapped to the PipeRench pipelined computation model. The experimental results show that a PipeRench I-COP used to perform trace construction and trace optimizations for a trace cache fill unit not only achieves good performance gains but can potentially be implemented in less than 10 mm² (assuming 0.18 micron technology) or approximately 3% of the die area of a current high-end microprocessor. We believe these results demonstrate the usefulness and feasibility of the I-COP concept.
