Find Me a Hash

We're accustomed to hearing about the unreasonable effectiveness of mathematics: delightful, and unexpected, applications of theory to the real world. In the world of the Internet, we've seen it in the use of number theory in public-key cryptography (the Diffie-Hellman system, the RSA algorithm, elliptic curve cryptosystems) and in the use of graph theory in network design. In the world of Internet data security, we currently face the opposite situation: a problem in search of mathematical theory. The problem is hash functions.
A hash function is an easy-to-compute compression function that takes a variable-length input and converts it to a fixed-length output. The hashes in which we are interested, called cryptographic hash functions, are "one-way", which is to say that they should be easy to compute and "hard", or computationally expensive, to invert. Hash functions are used as a compact representation of a longer piece of data (a digital fingerprint) and to provide message integrity. To provide integrity, the hash value h_0 of a particular piece of data is computed at an initial time t_0. When the data needs to be used later, at time t_1, the hash h_1 is recomputed. If the two hashes are equal, then the data has not been altered.
Ralph Merkle, a co-inventor of public-key cryptography, calls hashes the "duct tape" of cryptography. Among other things, hashes are used to ascertain software integrity, in digital signatures, in message authentication, and as one-time passwords; they are employed in many Internet protocols, including SSL/TLS (the transport-layer protocol that enables secure Web transactions), IPsec, and SSH. Because hash functions "shrink" data, collisions between hashes are inevitable.
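
The integrity check described above (compute h_0 at time t_0, recompute h_1 at time t_1, compare) can be sketched in a few lines. This is only an illustration under stated assumptions: the choice of SHA-256 via Python's standard `hashlib` module, the helper names `fingerprint` and `is_unaltered`, and the sample payload are all hypothetical and do not come from the article.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, recorded_hash: str) -> bool:
    """Recompute the hash (h_1) and compare it with the recorded one (h_0)."""
    return fingerprint(data) == recorded_hash


# Time t_0: record the fingerprint of the original data.
original = b"installer v1.0 contents"  # hypothetical payload
h_0 = fingerprint(original)

# Time t_1: recompute and compare before using the data.
print(is_unaltered(original, h_0))                    # same bytes
print(is_unaltered(b"installer v1.0 c0ntents", h_0))  # one byte changed

# Inputs of very different lengths give digests of the same length.
print(len(fingerprint(b"")), len(fingerprint(b"x" * 10_000)))
```

The first comparison succeeds and the second fails, and both digests in the last line are 64 hex characters long, which is the "variable-length input, fixed-length output" behavior in practice. The check is only as trustworthy as the stored value h_0: if an attacker can replace h_0 along with the data, the comparison proves nothing.
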
There are three fundamental properties that a cryptographic hash should satisfy: pre-image resistance (sometimes called non-invertibility), meaning that it should be computationally infeasible to find an input that hashes to a specified output; second pre-image resistance, meaning that it should be computationally infeasible to find a second input that hashes to the same output as a specified input; and collision resistance, meaning that it should be computationally infeasible to find two different inputs that hash to the same output. In 1979 Merkle [10, pp. 12–13] and Gideon Yuval [12] independently observed that because of the "birthday" paradox, the well-known result that in a group of twenty-three people, the probability that two people share the same birthday …
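
To make the birthday effect and the notion of a collision concrete, here is a small sketch. It computes the shared-birthday probability directly, assuming 365 equally likely birthdays, and then searches for a collision in a deliberately weakened hash: SHA-256 truncated to its first two bytes. The truncation, the function names, and the counter-based messages are assumptions chosen for illustration; none of them comes from the article, and nothing here says anything about the strength of full-length SHA-256.

```python
import hashlib
import itertools


def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share one of `days` equally likely birthdays."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def find_truncated_collision(n_bytes: int = 2):
    """Find two different inputs whose SHA-256 digests agree on the first `n_bytes` bytes."""
    seen = {}  # truncated digest -> first message that produced it
    for counter in itertools.count():
        message = str(counter).encode()
        short = hashlib.sha256(message).digest()[:n_bytes]
        if short in seen:
            return seen[short], message, short
        seen[short] = message


print(round(birthday_collision_probability(23), 3))  # 0.507

first, second, short = find_truncated_collision(2)
print(first, second, short.hex())
```

With a 16-bit output there are only 65,536 possible values, so a collision is guaranteed within 65,537 distinct inputs. The search normally stops far earlier, after a few hundred inputs, for the same reason that twenty-three people are enough to make a shared birthday more likely than not.
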

[1]  Antoon Bosselaers,et al.  Collisions for the Compression Function of MD5 , 1994, EUROCRYPT.

[2]  Ralph C. Merkle,et al.  A fast software one-way hash function , 1990, Journal of Cryptology.

[3]  Gideon Yuval,et al.  How to Swindle Rabin , 1979, Cryptologia.

[4]  Antoine Joux,et al.  Collisions in SHA-0 , 2004, CRYPTO 2004.

[5]  Susan Landau,et al.  Polynomials in the Nation's Service: Using Algebra to Design the Advanced Encryption Standard , 2004, Am. Math. Mon..

[6]  Antoine Joux,et al.  Collisions of SHA-0 and Reduced SHA-1 , 2005, EUROCRYPT.

[7]  Alfred Menezes,et al.  Handbook of Applied Cryptography , 2018 .

[8]  Xiaoyun Wang,et al.  Finding Collisions in the Full SHA-1 , 2005, CRYPTO.

[9]  Ivan Damgård,et al.  A Design Principle for Hash Functions , 1989, CRYPTO.

[10]  S. Landau Standing the Test of Time : The Data Encryption Standard , 2000 .

[11]  F. P. Secrecy , 1994, RES: Anthropology and Aesthetics.