A 33 Mflops Floating Point Processor Using Redundant Binary Representation

A 33 MFLOPS single precision floating point processor that uses the redundant binary representation in a multiplier (FMUL) and a divider (FDIV) will be reported*. Each execution unit operates independently without execution pipelining. The processor design uses a minimum number of pipeline stages and maximizes concurrent operation of data I/O and the three independent floating point units. Three pipeline stages have been used: read, execution, and write. The execution units, an arithmetic unit (FAU), FMUL, and FDIV, are connected through a seven-port register (FREG) to minimize data transfer between execution units. An exponent data path and normalizing and rounding circuitry are implemented in each of the units to make concurrent operation possible. Redundant binary representation was adopted to remove carry propagation. Each of the redundant binary digits, [-1, 0, 1], has been encoded into two binary bits. This has made it possible to perform multiplication faster than the combination of the Booth algorithm and a Wallace tree, and division faster than SRT division. In FMUL, partial products are generated using a modified redundant binary Booth recoding technique, where four bits of the binary multiplier are treated as two redundant binary digits and recoded using the modified Booth algorithm in REC [1]. The number of partial product generators (PPG) is thus half the number of PPGs in the modified Booth algorithm. The partial products are then added using a redundant binary adder (RBA) tree, which has been embedded into an array structure to simplify layout design (Figure 2). In FMUL, the digits [-1, 0, 1] are encoded as [(11), (00), (01)]. Each RBA has two inputs, Xi = (Xsi Xai) and Yi = (Ysi Yai), and one output, Zi = (Zsi Zai), which makes a repetitive layout easier to achieve than with a Wallace tree. The i-th RBA receives information on a carry from the (i-1)-th RBA; that is, Pi-1, which is 0 if both of the inputs to the (i-1)-th RBA, Xi-1 and Yi-1, are non-negative and 1 otherwise.
Thus, Pi-1 indicates the sign of the potential carry from the (i-1)-th RBA. With this information, each RBA generates an intermediate sum and a carry, Ci, which are chosen so that the carry does not propagate. The delay of one RBA is 4.2ns. Multiplication, including rounding and normalization, is performed in 45ns. FDIV (Figure 4) has adopted redundant binary non-restoring division where a partial remainder is represented in redundant
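
The carry-free addition performed by an RBA can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `ENCODE`, `rb_value`, and `rb_add` are hypothetical, digits are stored least significant first, and the rule table is one standard choice that is consistent with the description of Pi-1 above. The text does not give the exact circuit-level rules used in the chip.

```python
# Sign/absolute-value encoding quoted in the text for FMUL:
# -1 -> (1, 1), 0 -> (0, 0), 1 -> (0, 1), i.e. (Xs, Xa).
ENCODE = {-1: (1, 1), 0: (0, 0), 1: (0, 1)}


def rb_value(digits):
    """Integer value of a redundant binary number (least significant digit first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def rb_add(x, y):
    """Add two equal-length redundant binary numbers without carry propagation.

    Assumed rule table (not taken from the paper): each position produces an
    intermediate sum and a carry from its own digits and from P(i-1) only.
    """
    assert len(x) == len(y)
    n = len(x)
    interim = [0] * n
    carry = [0] * n  # carry[i] is the carry out of position i
    for i in range(n):
        # P(i-1): 0 if both lower-position inputs are non-negative, else 1.
        # Position 0 has no lower neighbour, so it is treated as P = 0.
        p = 0 if i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0) else 1
        t = x[i] + y[i]
        if t == 2:
            carry[i], interim[i] = 1, 0
        elif t == 1:
            # An incoming carry can only be >= 0 when p == 0, so leave room for it.
            carry[i], interim[i] = (1, -1) if p == 0 else (0, 1)
        elif t == 0:
            carry[i], interim[i] = 0, 0
        elif t == -1:
            # An incoming carry can only be <= 0 when p == 1, so leave room for it.
            carry[i], interim[i] = (0, -1) if p == 0 else (-1, 1)
        else:  # t == -2
            carry[i], interim[i] = -1, 0
    # Final digit: intermediate sum plus the carry from the next lower position.
    z = [interim[i] + (carry[i - 1] if i > 0 else 0) for i in range(n)]
    z.append(carry[n - 1])  # extra top digit holds the last carry
    return z
```

In this model, `rb_add([1, 1], [1, 1])` returns `[0, 1, 1]`, which is 3 + 3 = 6. Every output digit depends only on positions i and i-1, which is why the delay of an RBA stage does not grow with the word length.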

[1] Naofumi Takagi, et al. Design of high speed MOS multiplier and divider using redundant binary representation. 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH), 1987.