Long pipelines in single-chip digital signal processors: concepts and case study

The effectiveness of long pipelines in single-chip digital signal processors for complex algorithms was studied using a processor model with 25 pipeline stages. The processor is based on a Harvard architecture, and pipelining is used to reduce the instruction cycle time relative to current signal processors. Key features of the processor model are data-stationary pipeline control, local resolution of pipeline hazards with buffering, multiple branch prediction, a mixed relative-incremental addressing scheme, and asynchronous communication between the pipeline and its environment. The processor is implemented as a software model. The results show that high pipeline utilization can be achieved for a variety of algorithms, leading to significantly higher performance than that of conventional single-chip signal processors with a Harvard architecture.
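To make the notion of pipeline utilization concrete, the following is a minimal sketch of a first-order stall model. It is not the processor model described above. The function name, the workload figures, and the assumption that each mispredicted branch wastes `depth - 1` issue slots are all hypothetical and chosen only for illustration.

```python
def pipeline_utilization(instructions, branch_fraction, mispredict_rate, flush_penalty):
    """Fraction of issue slots doing useful work in a simple stall model.

    Assumption (not from the paper): the only source of wasted cycles is
    branch misprediction, and each misprediction costs `flush_penalty` cycles.
    """
    mispredicted = instructions * branch_fraction * mispredict_rate
    wasted_cycles = mispredicted * flush_penalty
    return instructions / (instructions + wasted_cycles)


# Hypothetical workload: 10% branches, 5% of them mispredicted,
# on a 25-stage pipeline where a flush is assumed to cost depth - 1 cycles.
depth = 25
utilization = pipeline_utilization(10_000, 0.10, 0.05, depth - 1)
print(f"utilization = {utilization:.3f}")
```

Under these assumed figures the model shows why accurate branch prediction matters more as the pipeline gets deeper: the flush penalty scales with depth, so the same misprediction rate wastes a larger share of cycles.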
