Divide-and-concatenate: an architecture-level optimization technique for universal hash functions

The authors present an architectural optimization technique called divide-and-concatenate for hardware implementations of universal hash functions. It rests on three observations: 1) the area of a multiplier and its associated data path decreases quadratically, and their speed increases gradually, as the operand size is reduced; 2) multiplication is at the core of universal hash functions, and multipliers consume most of the area of universal hash function hardware; and 3) two universal hash functions are equivalent if they have the same collision-probability property. In the proposed approach, the authors divide a 2w-bit data path (with collision probability $2^{-2w}$) into two w-bit data paths (each with collision probability $2^{-w}$), apply one message word to both w-bit data paths, and concatenate their results to construct an equivalent 2w-bit data path (with collision probability $2^{-2w}$). The divide-and-concatenate technique is complementary to all circuit-, logic-, and architecture-optimization techniques. The authors applied this technique to a linear congruential universal hash (LCH) family. Duplicating a straightforward 32-bit LCH data path carries 100% hardware overhead; by comparison, the divide-and-concatenate approach using four equivalent 8-bit data paths yields a 101% increase in throughput with only 52% hardware overhead.
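The divide-and-concatenate idea can be illustrated with a minimal software sketch. The abstract does not define the LCH family, so the sketch assumes a simple multiply-accumulate form, (sum of m_i * k_i + offset) mod 2^w, as a stand-in. The function names `lch_path` and `divide_and_concatenate` and all key and offset values are hypothetical. It also assumes each narrow path uses its own independently chosen key, which is what lets the per-path collision probabilities multiply ($2^{-w} \cdot 2^{-w} = 2^{-2w}$ for two paths).

```python
def lch_path(words, keys, offset, w):
    """Toy w-bit data path: (sum(m_i * k_i) + offset) mod 2^w.

    Assumed stand-in for one LCH data path; not the paper's exact definition.
    """
    mask = (1 << w) - 1
    acc = offset & mask
    for m, k in zip(words, keys):
        # A w-bit multiply-accumulate, truncated to w bits at every step.
        acc = (acc + (m & mask) * (k & mask)) & mask
    return acc


def divide_and_concatenate(words, key_sets, offsets, w):
    """Feed the same message words to several w-bit paths and concatenate.

    Each path has its own (assumed independent) key set and offset.
    The result is len(key_sets) * w bits wide.
    """
    digest = 0
    for keys, offset in zip(key_sets, offsets):
        digest = (digest << w) | lch_path(words, keys, offset, w)
    return digest


# Two 8-bit paths producing one 16-bit digest (illustrative values only).
message = [3, 5]
key_sets = [[7, 9], [11, 13]]
offsets = [1, 2]
digest = divide_and_concatenate(message, key_sets, offsets, w=8)
print(hex(digest))
```

In hardware the narrow paths would run in parallel, which is where the area and speed benefits of smaller multipliers come from; the loop here only models the function being computed, not the timing.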
