Implementing an efficient vector instruction set in a chip multi-processor using micro-threaded pipelines

This paper examines a combination of two techniques. The first, the use of a vector instruction set, has a long history dating back to pipelined vector supercomputers such as the Cray-1 and its successors. The second, multi-threading, is also well understood. The novel approach proposed here combines both vertical and horizontal micro-threading with vector instruction descriptors. We show that a family of threads can represent a vector instruction, including any dependencies between the instances of that family, which correspond to the loop iterations. This technique incurs very low overhead when implementing an n-way loop and tolerates high memory latency. Using micro-threading to handle dependencies between threads makes it possible to trade off instruction-level parallelism against loop parallelism. The paper describes how instruction classes may be instanced as independent parallel micro-threads and illustrates the speed-up obtainable compared with a conventional loop.
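
As a rough illustration of the idea that a family of threads can stand in for a loop with dependencies between its iterations, the following is a minimal software sketch in Python. It is not the hardware mechanism described in the paper: the names `run_family` and `dot_product`, the `(start, limit, step)` descriptor, and the use of operating-system threads with events are all assumptions made for illustration. Each instance performs its independent work first and blocks only when it needs the value produced by its predecessor.

```python
import threading


def run_family(start, limit, step, body, initial):
    """Run one thread per index in range(start, limit, step).

    Hypothetical model: body(i, get_prev) returns the value that
    iteration i hands to its successor; get_prev() blocks until the
    predecessor's value is available.
    """
    indices = list(range(start, limit, step))
    ready = [threading.Event() for _ in indices]
    shared = [None] * len(indices)

    def instance(k, i):
        def get_prev():
            if k == 0:
                return initial          # first instance has no predecessor
            ready[k - 1].wait()         # wait for the previous iteration
            return shared[k - 1]

        shared[k] = body(i, get_prev)
        ready[k].set()                  # release the next iteration

    threads = [threading.Thread(target=instance, args=(k, i))
               for k, i in enumerate(indices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[-1] if shared else initial


def dot_product(a, b):
    """Dot product as a family of threads with a chained dependency."""
    def body(i, get_prev):
        product = a[i] * b[i]           # independent of other iterations
        return get_prev() + product     # depends on the previous iteration
    return run_family(0, len(a), 1, body, 0)
```

In this sketch the multiplications can proceed in any order, while the additions are serialised through the dependency chain, which loosely mirrors the trade-off between independent and dependent work within a loop that the abstract refers to.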
