Instruction level redundant number computations for fast data intensive processing in asynchronous processors

Instruction-level parallelism (ILP) is strictly limited by various dependencies. In particular, data dependencies are a major performance bottleneck in data-intensive applications. In this paper, we address how to accelerate the execution of instruction sequences that are serialized by data dependencies. We propose a new computer architecture that supports redundant number computation at the instruction level. To design and implement the scheme, we also propose an extended data-path and additional instructions. Architectural exploitation of instruction-level redundant number computations (IL-RNC) makes it possible to eliminate carry propagations. As a result, the execution of instructions that are serialized by inherent data dependencies is accelerated. Simulations with data-intensive processing benchmarks show that the proposed architecture achieves a speedup of about 1.2 to 1.35 times over a conventional counterpart. The proposed architecture model can be used effectively for data-intensive processing in microprocessors, digital signal processors, and multimedia processors.
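The abstract does not specify which redundant number representation the architecture uses. As a rough illustration of the general idea only, the sketch below assumes a carry-save form, in which a value is held as a (sum, carry) pair of words, and an assumed 32-bit width. The names `csa`, `rn_add`, `to_binary`, and `dependent_sum` are hypothetical and are not taken from the paper. Each dependent addition uses only bitwise operations, so no carry ripples across the word until the single final conversion.

```python
MASK = (1 << 32) - 1  # assumed 32-bit data-path width (illustrative only)


def csa(x, y, z):
    """3:2 carry-save step: return (s, c) with s + c == x + y + z (mod 2**32).

    Every bit position is computed independently, so there is no carry chain.
    """
    s = x ^ y ^ z                                    # per-bit sum
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK  # per-bit carry, shifted left
    return s, c


def rn_add(acc, operand):
    """Add a binary operand to an accumulator kept in redundant (sum, carry) form."""
    s, c = acc
    return csa(s, c, operand & MASK)


def to_binary(acc):
    """Convert back to ordinary binary; the only carry-propagating addition."""
    s, c = acc
    return (s + c) & MASK


def dependent_sum(values):
    """Chain of data-dependent additions: each step needs the previous result."""
    acc = (0, 0)
    for v in values:
        acc = rn_add(acc, v)  # carry-free step
    return to_binary(acc)     # one carry propagation at the very end


if __name__ == "__main__":
    print(dependent_sum([1, 2, 3, 4]))
```

In this sketch a chain of n dependent additions pays for one carry-propagating addition instead of n, which is the kind of saving the abstract attributes to eliminating carry propagations. How the proposed data-path and instructions achieve this is not described here.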
