An Optimization on Shortening the Delay of the Floating-Point Division with SRT Algorithm

Based on the traditional structure of the SRT division algorithm, an optimized structure is proposed that shortens the delay of the critical path. The optimized structure allows two independent parts of the traditional structure, namely computing the input value of the quotient selection table and looking up the quotient selection table, to work in parallel. In this paper, both the traditional and the optimized structures are implemented in the Verilog hardware description language. The designs are then synthesized with the Design Compiler synthesis tool (using a 0.18 micron CMOS standard cell library) to obtain delay and area. Synthesis results show that, compared with the traditional structures, the optimized structures shorten the delay for every radix tested, at the cost of additional area. Specifically, with the optimized structures, radix-4 has approximately 13.30% shorter delay (about 0.27 ns) but requires approximately 5.02% more area; radix-8 has approximately 22.31% shorter delay (about 0.54 ns) but requires approximately 31.94% more area; and radix-16 has approximately 12.41% shorter delay (about 0.33 ns) but requires approximately 259.59% more area.
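
For readers unfamiliar with the recurrence behind SRT division, the following is a minimal software sketch of one textbook radix-4 form with digit set {-2, ..., 2}. It is not the hardware structure evaluated in this paper. The function names `select_digit` and `srt_radix4_divide`, the operand range [1/2, 1), and the initial scaling by 1/4 are assumptions made for illustration. In particular, `select_digit` computes the quotient digit exactly, whereas a hardware design would read it from a quotient selection table indexed by truncated estimates. The two commented steps in the loop correspond to the two parts named above (forming the table input and selecting the digit), shown here in their serial, traditional order.

```python
from fractions import Fraction
import math


def select_digit(shifted, d):
    # Stand-in for the quotient selection table (assumption): take the
    # integer nearest to shifted / d and clamp it to the digit set {-2..2}.
    nearest = math.floor(shifted / d + Fraction(1, 2))
    return max(-2, min(2, nearest))


def srt_radix4_divide(x, d, iterations=8):
    """Approximate x / d for x, d in [1/2, 1) with a radix-4 recurrence."""
    x, d = Fraction(x), Fraction(d)
    assert Fraction(1, 2) <= x < 1 and Fraction(1, 2) <= d < 1
    w = x / 4                 # initial partial remainder, |w| <= (2/3) * d
    q_total = Fraction(0)
    digits = []
    for j in range(1, iterations + 1):
        shifted = 4 * w                  # step 1: form the selection input
        q = select_digit(shifted, d)     # step 2: select the quotient digit
        w = shifted - q * d              # next partial remainder
        digits.append(q)
        q_total += Fraction(q, 4 ** j)
    # Undo the initial scaling by 1/4 before returning the quotient.
    return 4 * q_total, digits, w


quotient, digits, remainder = srt_radix4_divide(Fraction(3, 4), Fraction(5, 8))
print(float(quotient), digits)
```

In this sketch the partial remainder stays within (2/3)·d in magnitude, so after n iterations the returned quotient differs from x/d by at most (8/3)·4^(-n). Because step 2 depends on the result of step 1, the two steps sit in series on the critical path of each iteration, which is the dependency the optimized structure is designed to relax.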