On the design of high performance digital arithmetic units
Several new algorithms for enhancing the performance of pipelined digital computers have been developed and evaluated. The design of one such computer, the S-1 Mark IIA, which embodies most of these algorithms, is discussed in detail. The relationship and importance of the new algorithms to the overall performance of such a machine are analyzed.
An algorithm for the very rapid pipelined computation of medium precision approximations (about 30 bits for the Mark IIA) to elementary functions is described. This method uses table lookup and two parallel multiplications to triple the precision available from direct table lookup. Current RAM technology permits the effective use of this algorithm for non-trivial word sizes. The method is applied to reciprocal, square-root, exponential, logarithm, arctangent, sine, cosine, and the error function.
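The abstract does not give the details of the method, so the following is only a minimal sketch of one generic way that table lookup plus two multiplications can roughly triple the precision of a direct lookup: a second-order Taylor expansion about the midpoint of each table interval. The index width `K`, the function names, and the choice of Taylor coefficients are all assumptions made for illustration, not the scheme used in the Mark IIA. In hardware the square `dx * dx` would presumably come from a small squarer or table so that the two coefficient products could proceed in parallel.

```python
K = 10  # index bits (illustrative assumption); 3 * K is about 30 bits

def build_tables(f, df, d2f, k=K):
    """Tabulate f, f' and f''/2 at the midpoint of each of 2**k intervals of [1, 2)."""
    n = 1 << k
    tables = []
    for i in range(n):
        x0 = 1.0 + (i + 0.5) / n          # interval midpoint
        tables.append((f(x0), df(x0), 0.5 * d2f(x0)))
    return tables

def approx(tables, x, k=K):
    """Second-order approximation of f(x) for x in [1, 2)."""
    n = 1 << k
    i = int((x - 1.0) * n)                # top k fraction bits select the entry
    dx = x - (1.0 + (i + 0.5) / n)        # remaining low-order bits, |dx| <= 2**-(k+1)
    c0, c1, c2 = tables[i]
    p1 = c1 * dx                          # first multiplication
    p2 = c2 * (dx * dx)                   # second multiplication (dx*dx assumed precomputed)
    return c0 + p1 + p2

# Example: reciprocal on [1, 2)
RECIP = build_tables(lambda x: 1.0 / x,
                     lambda x: -1.0 / x**2,
                     lambda x: 2.0 / x**3)
```

With a 10-bit index, the table entry alone is accurate to about 11 bits, while the truncation error of the corrected value is bounded by the cubic term, on the order of 2^-33 for the reciprocal.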
A floating-point addition algorithm which has a much shorter latency than previous approaches is developed and analyzed. This algorithm lends itself to the efficient simultaneous calculation of floating-point sums and differences, which is of great value in computing FFTs and related algorithms. The algorithm resolves floating-point addition into one of two independent cases, each of which can be implemented in fewer logic gate delays than previous algorithms.
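The two cases are not spelled out in the abstract. One commonly described split, assumed here purely for illustration, sends effective subtractions with an exponent difference of at most 1 down a "near" path and everything else down a "far" path. The sketch below uses exact integers (not a hardware model) to show why such a split is attractive: the far path may need a long alignment shift but at most a one-bit normalization shift, while the near path needs at most a one-bit alignment shift but may need a long normalization shift. The significand width `P` and the function names are hypothetical.

```python
P = 24  # significand width in bits (illustrative assumption)

def choose_path(d, subtract):
    """Pick the case from the exponent difference d >= 0 and the effective operation."""
    return "near" if (subtract and d <= 1) else "far"

def shifts(ea, sa, eb, sb, subtract):
    """Return (path, alignment_shift, normalization_shift) for |a| +/- |b|.

    Each operand is (exponent, significand) with 2**(P-1) <= significand < 2**P.
    A positive normalization shift is a left shift; -1 is a one-bit right shift.
    """
    if (ea, sa) < (eb, sb):               # make a the operand of larger magnitude
        ea, sa, eb, sb = eb, sb, ea, sa
    d = ea - eb                           # alignment shift applied to the smaller operand
    big = sa << d                         # exact integer model, scaled to 2**eb
    r = big - sb if subtract else big + sb
    norm = (P + d) - r.bit_length() if r else 0
    return choose_path(d, subtract), d, norm
```

For example, subtracting two nearly equal values with the same exponent takes the near path with no alignment shift and a long left normalization, whereas adding values whose exponents differ by 10 takes the far path with a 10-bit alignment shift and at most one bit of normalization.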
Previous techniques for sorting on pipelined machines are analyzed and a new algorithm based on Quicksort is developed. This new algorithm is significantly faster and simpler than previous pipelined sorting techniques.
The use of skewed data representations to increase the performance of interleaved memories for many algorithms is well known. However, a large price is paid in convenience by the use of such techniques. A new approach which allows the use of normal data representations but which has all of the performance advantages of the skewed representations is described. This technique is particularly valuable since the hardware used to implement it can also serve as a queue to minimize the effects of temporary stoppages in the instruction and operand fetching and arithmetic execution hardware.
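As background for the well-known skewed representation only (the new approach itself is not detailed in the abstract), the sketch below compares the memory bank hit by each element of a matrix column under a normal row-major layout and under a simple skewed layout. The bank count `BANKS`, the matrix width `N`, and the skew rule `(i + j) mod BANKS` are illustrative assumptions.

```python
BANKS = 4  # number of interleaved memory banks (illustrative assumption)
N = 4      # matrix is N x N; N a multiple of BANKS is the troublesome case

def bank_normal(i, j):
    """Row-major layout: word address i*N + j, banks interleaved on low address bits."""
    return (i * N + j) % BANKS

def bank_skewed(i, j):
    """Skewed layout: row i is rotated by i banks."""
    return (i + j) % BANKS

def column_banks(bank_of, j):
    """Banks touched when reading column j from top to bottom."""
    return [bank_of(i, j) for i in range(N)]

normal_col = column_banks(bank_normal, 0)   # every access lands in the same bank
skewed_col = column_banks(bank_skewed, 0)   # accesses are spread across all banks
```

Under the normal layout a column walk hits one bank repeatedly and serializes, while the skewed layout spreads the same walk over all banks. The price is that software must compute the rotated position of every element, which is the inconvenience the paragraph above refers to.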
Several other new techniques for performance enhancement are also described and analyzed, and fruitful directions for future work in this area are discussed.