Divide and concatenate: a scalable hardware architecture for universal MAC

We present a cryptographic architecture optimization technique called divide-and-concatenate based on two observations: (i) the area of a multiplier and associated data path decreases quadratically and their speeds increase gradually as their operand size is reduced. (ii) in hash functions, message authentication codes and related cryptographic algorithms, two functions are equivalent if they have the same collision probability property. In the proposed approach we divide a 2w-bit data path into two w-bit data paths and concatenate their results to construct an equivalent 2w-bit data path. We applied this technique on NH hash. When compared to the 100% overhead associated with duplicating a straightforward 32-bit pipelined NH hash data path, the divide-and-concatenate approach yields a 94% increase in throughput with only 40% hardware overhead. The NH hash associated message authentication code UMAC architecture with collision probability 2-32 that uses four equivalent 8-bit divide-and-concatenate NH hash data paths yields a throughput of 79.2 Gbps with only 3840 FPGA slices when implemented on a Xilinx FPGA.