Overcoming the limitations of conventional vector processors

Despite their superior performance for multimedia applications, vector processors have three limitations that hinder their widespread acceptance. First, the complexity and size of the centralized vector register file limits the number of functional units. Second, precise exceptions for vector instructions are difficult to implement. Third, vector processors require an expensive on-chip memory system that supports high bandwidth at low access latency.This paper introduces CODE, a scalable vector microarchitecture that addresses these three shortcomings. It is designed around a clustered vector register file and uses a separate network for operand transfers across functional units. With extensive use of decoupling, it can hide the latency of communication across functional units and provides 26% performance improvement over a centralized organization. CODE scales efficiently to 8 functional units without requiring wide instruction issue capabilities. A renaming table makes the clustered register file transparent at the instruction set level. Renaming also enables precise exceptions for vector instructions at a performance loss of less than 5%. Finally, decoupling allows CODE to tolerate large increases in memory latency at sub-linear performance degradation without using on-chip caches. Thus, CODE can use economical, off-chip, memory systems.

[1]  Mateo Valero,et al.  Vector architectures: past, present and future , 1998, ICS '98.

[2]  Mateo Valero,et al.  Decoupled vector architectures , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[3]  Norman P. Jouppi,et al.  The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning , 1999, International Journal of Parallel Programming.

[4]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[5]  Marc Tremblay,et al.  The MAJC Architecture: A Synthesis of Parallelism and Scalability , 2000, IEEE Micro.

[6]  William J. Dally,et al.  Register organization for media processing , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[7]  Christoforos E. Kozyrakis,et al.  Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks , 2002, MICRO.

[8]  Andrew R. Pleszkun,et al.  Implementing Precise Interrupts in Pipelined Processors , 1988, IEEE Trans. Computers.

[9]  Joseph A. Fisher,et al.  Clustered Instruction-Level Parallel Processors , 1998 .

[10]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[11]  Matthew Mattina,et al.  Tarantula: a vector extension to the alpha architecture , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[12]  Geoffrey Brown,et al.  Lx: a technology platform for customizable VLIW embedded processing , 2000, ISCA '00.

[13]  William J. Dally,et al.  Imagine: Media Processing with Streams , 2001, IEEE Micro.

[14]  Gurindar S. Sohi,et al.  Characterizing and predicting value degree of use , 2002, MICRO.

[15]  John Wawrzynek,et al.  Vector microprocessors , 1998 .

[16]  Richard Crisp,et al.  Direct RAMbus technology: the new main memory standard , 1997, IEEE Micro.

[17]  James E. Smith,et al.  An instruction set and microarchitecture for instruction level distributed processing , 2002, ISCA.

[18]  Norman P. Jouppi,et al.  The multicluster architecture: reducing cycle time through partitioning , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[19]  David A. Patterson,et al.  Scalable Vector Media-processors for Embedded Systems , 2002 .

[20]  Mateo Valero,et al.  Out-of-order vector architectures , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[21]  William J. Dally,et al.  Exploring the VLSI scalability of stream processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[22]  Hiroshi Takahara,et al.  NEC SX Series and Its Applications to Weather and Climate Modeling , 2002 .