The vector-thread architecture

The vector-thread (VT) architectural paradigm unifies the vector and multithreaded compute models. The VT abstraction provides the programmer with a control processor and a vector of virtual processors (VPs). The control processor can use vector-fetch commands to broadcast instructions to all the VPs, or each VP can use thread-fetches to direct its own control flow. Seamless intermixing of the vector and threaded control mechanisms allows a VT architecture to encode application parallelism and locality flexibly and compactly, and a VT machine exploits both to improve performance and efficiency. We present SCALE, an instantiation of the VT architecture designed for low-power, high-performance embedded systems. We evaluate the SCALE prototype design using detailed simulation of a broad range of embedded applications and show that its performance is competitive with larger and more complex processors.
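
The two control mechanisms described above can be illustrated with a small toy model. The sketch below is not SCALE code and does not reflect the SCALE instruction set; the names `VirtualProcessor`, `ControlProcessor`, `vector_fetch`, `make_kernel`, and `run_example` are hypothetical, chosen only for illustration. It assumes a block of instructions can be modeled as a Python function that updates one VP's registers and optionally returns the next block, which stands in for a thread-fetch. The VPs are also run one after another here, whereas a real VT machine would execute them in parallel.

```python
from typing import Callable, Dict, List, Optional

# A "block" stands in for a group of instructions. It mutates one VP's
# registers and may return another block (a modeled thread-fetch), or
# None to stop that VP. This representation is an assumption of the sketch.
Block = Callable[[Dict[str, int]], Optional["Block"]]


class VirtualProcessor:
    """Toy VP: a private register file plus a loop that follows thread-fetches."""

    def __init__(self, index: int) -> None:
        self.regs: Dict[str, int] = {"idx": index}

    def run(self, block: Optional[Block]) -> None:
        # Keep executing whichever block the VP itself fetched next.
        while block is not None:
            block = block(self.regs)


class ControlProcessor:
    """Toy control processor that owns a vector of VPs."""

    def __init__(self, num_vps: int) -> None:
        self.vps: List[VirtualProcessor] = [VirtualProcessor(i) for i in range(num_vps)]

    def vector_fetch(self, block: Block) -> None:
        # Broadcast the same block to every VP (sequentially in this sketch).
        for vp in self.vps:
            vp.run(block)


def make_kernel(data: List[int]) -> Block:
    """Build a kernel whose loop trip count depends on each VP's own data."""

    def loop(regs: Dict[str, int]) -> Optional[Block]:
        if regs["x"] <= 1:
            return None          # this VP is done; no further thread-fetch
        regs["x"] //= 2
        regs["steps"] += 1
        return loop              # thread-fetch: this VP directs its own control flow

    def start(regs: Dict[str, int]) -> Optional[Block]:
        regs["x"] = data[regs["idx"]]
        regs["steps"] = 0
        return loop

    return start


def run_example(data: List[int]) -> List[int]:
    cp = ControlProcessor(len(data))
    cp.vector_fetch(make_kernel(data))   # one broadcast starts every VP
    return [vp.regs["steps"] for vp in cp.vps]
```

Calling `run_example([1, 2, 8, 5])` issues a single vector-fetch, after which each VP runs a different number of loop iterations depending on its own input. That mix of a broadcast followed by per-VP control flow is what the abstract refers to as intermixing the vector and threaded mechanisms.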
