Binary Floating-Point Unit Design

Since 1990, many floating-point units have been designed around a fused multiply-add dataflow. This type of design has a large performance advantage over a separate multiplier and adder: a single compound operation effectively completes two dependent operations, a multiplication and the addition that consumes its product, per cycle. Although the fused multiply-add dataflow is now common in today’s microprocessors, many of its details have never been discussed in papers. This chapter describes the implementation of the different parts of the dataflow, including the counter tree, suppression of sign extension encoding, leading zero anticipation, and end-around-carry adder design. It illustrates algorithms and implementation details used in today’s floating-point units that have been passed down from designer to designer and have become the folklore of floating-point unit design.
