First Analysis of Keccak

We apply known automated cryptanalytic tools to the Keccak-f[1600] permutation: a triangulation tool to solve the CICO problem, and cube testers to detect structure in the algebraic description of reduced-round Keccak-f[1600]. The applicability of our tools was notably limited by the strength of the inverse permutation.

Unless otherwise stated, we consider the Keccak permutation used in the Keccak submission to SHA-3, that is, the function called Keccak-f[1600] in [2].

1 Solving the CICO problem

1.1 Preliminaries

Assume we try to detect non-randomness in a function f with n-bit input and m-bit output. Consider the following problem: find a solution to

    f(x) = y    (1)

such that the first q bits of x and of y are zero, where q ≤ min(n, m). Brute-force search, which works for any f, requires about 2^q computations of f: the first q bits of x are set to zero, and each trial then meets the output condition with probability 2^-q. This bound remains the same even if n = m and f is invertible. For a "good" hash transformation, we expect this problem to require the same workload. Although non-trivial solutions do not imply collision or preimage weaknesses, they are a first sign of non-ideal behavior.

Such a weakness was previously discovered in a reduced version of MD6 [3]: for 26 rounds of the compression function four bits can be fixed, two bits for 30 rounds, and one bit for 33 rounds. It has recently been suggested that so many rounds can be broken because of slow diffusion in MD6.

A more general problem has been proposed by the designers of Keccak [2]. Assume that n = m and define X ⊆ {0,1}^n as a set of possible inputs and Y ⊆ {0,1}^n as a set of possible outputs. Then find a solution to Eq. (1) with (x, y) ∈ X × Y; [2, §4.2.4] calls this the CICO (Constrained-Input Constrained-Output) problem.

1.2 Triangulation algorithm

The triangulation algorithm was proposed in [4] as a tool for solving the systems of non-linear equations that appear in differential attacks (see Appendix A).
Given the constraints on the internal variables, the algorithm outputs a special set of variables, called free variables. These variables can be assigned randomly; this assignment, together with the pre-fixed variables, completely and efficiently determines the whole execution. The fewer variables are fixed, the better the algorithm works.

Consider the application of the triangulation algorithm to the analysis of hash functions. Evidently, the initial value and the message completely determine the entire execution of the transformation. Following the terminology of Gaussian elimination, such variables are called free variables, since they can be assigned randomly and independently. While it is trivial to find a set of free variables when there are no constraints, it becomes harder if some variables are pre-fixed and positioned far from one another. The fixed bits of the input and the output (CICO) are an example of such a case.

Underlying ideas. The triangulation algorithm iteratively searches for a variable that is involved in only one equation and that can be expressed as a function of the other variables in that equation. If such a variable is found, it can be determined in the last step, once all the other variables are known. The equation and the variable are then removed from consideration (in Gaussian elimination terminology, they are put on the diagonal), and the process continues.

Applications. So far, the properties of the algorithm have not been carefully investigated. However, based on empirical observations, we conjecture that it stops working if the distance between fixed variables is twice the number of rounds needed for full diffusion. If the diffusion is different in the backward direction, the bound may change. Thus the efficiency of the triangulation algorithm applied to Keccak is determined by its diffusion properties.
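
The elimination loop described under "Underlying ideas" can be illustrated with a minimal Python sketch. It is not the tool of [4]: the function name triangulate, the three-round toy system, and all variable names are hypothetical. The sketch also assumes that every equation can be solved for any one of its variables, which need not hold for a real round function.

```python
from collections import Counter


def triangulate(equations, fixed=frozenset()):
    """Toy triangulation over a system given as {equation name: set of variables}.

    Assumption (not from the paper): each equation can be solved for any
    single variable it involves. Fixed variables are treated as constants.
    Returns (free variables, evaluation order, unresolved equations).
    """
    fixed = set(fixed)
    remaining = {name: set(vs) - fixed for name, vs in equations.items()}
    diagonal = []  # (equation, variable) pairs, in removal order

    while remaining:
        # Count in how many remaining equations each variable still occurs.
        counts = Counter(v for vs in remaining.values() for v in vs)
        pick = next(
            ((name, v)
             for name, vs in remaining.items()
             for v in sorted(vs)
             if counts[v] == 1),
            None,
        )
        if pick is None:
            break  # every variable occurs in at least two equations: stuck
        diagonal.append(pick)
        del remaining[pick[0]]  # the variable disappears with its equation

    determined = {v for _, v in diagonal}
    all_vars = set().union(*equations.values()) - fixed
    free = sorted(all_vars - determined)
    # A variable removed first is computed last, so reverse the removal order.
    return free, diagonal[::-1], sorted(remaining)


# Hypothetical 3-round chain: s1 = F(x0, x1), s2 = F(s1, x2), y = F(s2, x3),
# with one input word (x0) and the output word (y) pre-fixed, as in CICO.
equations = {
    "r1": {"x0", "x1", "s1"},
    "r2": {"s1", "x2", "s2"},
    "r3": {"s2", "x3", "y"},
}
free, order, stuck = triangulate(equations, fixed={"x0", "y"})
print("free variables:", free)
print("evaluation order:", order)
print("unresolved equations:", stuck)
```

In this toy system the sketch returns two free variables for five unknowns and three equations. Each remaining unknown is then computed from a single equation, working from the fixed output back toward the fixed input. When every unfixed variable occurs in at least two equations, the loop stops and reports the unresolved equations, which corresponds to the situation in which the algorithm "stops working".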