Paper information - Instruction level redundant number computations for fast data intensive processing in asynchronous processors

Instruction level parallelism (ILP) is strictly limited by various dependencies. In particular, data dependency is a major performance bottleneck in data intensive applications. This paper addresses how to accelerate instruction sequences that are serialized by data dependencies. We propose a new computer architecture that supports redundant number computation at the instruction level. To design and implement the scheme, we also propose an extended data-path and additional instructions. Exploiting instruction level redundant number computations (IL-RNC) at the architectural level makes it possible to eliminate carry propagation. As a result, the execution of instructions that are serialized by inherent data dependencies is accelerated. Simulations with data intensive processing benchmarks show that the proposed architecture achieves a speedup of about 1.2 to 1.35 times over a conventional counterpart. The proposed architecture model can be used effectively for data intensive processing in microprocessors, digital signal processors, and multimedia processors.
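The idea of removing carry propagation from a chain of dependent additions can be illustrated with a small software sketch. It uses the carry-save form, one common redundant number representation; the abstract does not say which representation or data-path the paper actually uses, so the 32-bit width and the names `csa`, `rnc_add`, `to_binary`, and `dependent_chain` are all assumptions made for illustration only.

```python
MASK = (1 << 32) - 1  # assumed 32-bit word width (not stated in the abstract)


def csa(a, b, c):
    """3:2 carry-save step: each output bit depends only on the
    same bit position of the inputs, so no carry ripples across the word."""
    partial_sum = a ^ b ^ c
    carries = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum & MASK, carries & MASK


def rnc_add(acc, x):
    """Add a binary operand x to a redundant (sum, carry) pair.
    Hypothetical stand-in for a redundant-number add instruction."""
    s, c = acc
    return csa(s, c, x)


def to_binary(acc):
    """Convert the redundant pair back to ordinary binary.
    This is the only step that needs a carry-propagating addition."""
    s, c = acc
    return (s + c) & MASK


def dependent_chain(values):
    """Accumulate values where every step depends on the previous result,
    keeping the running result in redundant form until the end."""
    acc = (0, 0)
    for v in values:
        acc = rnc_add(acc, v)
    return to_binary(acc)


if __name__ == "__main__":
    data = [7, 25, 1000, 0xFFFFFFF0]
    print(dependent_chain(data) == (sum(data) & MASK))
```

In this sketch each dependent step costs only a constant-depth bitwise operation, and the single carry-propagating addition is deferred to `to_binary`. Whether the proposed architecture defers conversion in the same way is not stated in the abstract.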

Jeong-Gun Lee | Euiseok Kim | Dong-Ik Lee
