A 2.1GHz 6.5mW 64-bit Unified PopCount/BitScan Datapath Unit for 65nm High-Performance Microprocessor Execution Cores

This paper describes a unified popcount/bitscanforward/bitscanreverse datapath circuit designed for 2.1GHz operation with total power consumption of 6.5 mW, targeted for 65 nm 64-bit microprocessor execution cores. The unified datapath uses a hybrid 3:2 compressor-based Wallace tree to count the number of '1's in the 64-bit input, along with a novel encoding scheme that enables reuse of the same tree to identify the bit-location of the 1st set bit when scanning the input in the forward and reverse directions. This circuit thus combines the functions of 3 separate units, enabling 26% reduction in total energy and 20% lower area, while achieving single-cycle latency & throughput.

[1]  S. Hsu,et al.  A 110 GOPS/W 16-bit multiplier and reconfigurable PLA loop in 90-nm CMOS , 2005, IEEE Journal of Solid-State Circuits.

[2]  P. Bai,et al.  A 65nm logic technology featuring 35nm gate lengths, enhanced channel strain, 8 Cu interconnect layers, low-k ILD and 0.57 /spl mu/m/sup 2/ SRAM cell , 2004, IEDM Technical Digest. IEEE International Electron Devices Meeting, 2004..