DFC Update

This document reports an update on DFC. We answer a question about the rationale for the CP Confusion Permutation. We give new implementation results for DFC. In particular, we present an impressively fast implementation which takes 323 cycles on Compaq's 21164 Alpha microprocessor. On the new 21264 we expect to reach software encryption rates over 500 Mbps. We also discuss making DFC scalable, so that the block size or the number of rounds, in the encryption or in the key schedule, can be varied. Finally, we describe how the key schedule of DFC may be slightly changed in order to fix a minor drawback noticed by Coppersmith.

Since DFC was proposed in [5, 6], several issues have been raised and several advances made. The present report addresses the following.

1. A rationale for CP was needed.
2. New implementation results were obtained.
3. Criticisms were raised on the number of rounds.
4. Weak keys were identified.

1 Rationale of the Design of CP

During the first AES workshop, the question of the rationale for the CP Confusion Permutation was raised. The somewhat provocative answer given was that CP could be replaced by anything else (even the identity function) as far as the decorrelation analysis is concerned but, as discussed in the next section, this is not enough to guarantee real security, and CP actually plays some role in the security.

Decorrelation provides provable security against some classes of attacks, and the frontier between these attacks and other potential ones which might not be covered by this theory is quite sharp. Conservative designs use heuristic security, for which the frontier is usually smooth. We believe that we should use both approaches: combining decorrelation designs, which provably protect against some classes of attacks, with conservative design, which increases the difficulty of other attacks. This is the purpose of the CP function.

In the DFC design we wanted to mix several simple arithmetic operations over mixed algebraic structures.
We chose a CP which combines XOR and addition (as is proposed in, for example, RC5 [7]). We also introduced some nonalgebraic randomness by means of a look-up table, and we wanted that table to be limited to 256 bytes in order to minimize memory requirements (for smart cards). We did not want to introduce rotations, which are painful on the 6805 as well as on Alpha or (Ultra)Sparc. We also used random translations by constants. The original report [6] gives some rationale for the choice of the constants.

2 New Implementations

We have been optimizing our implementations provided in the AES CD-ROM2. Using programming tricks to help the compilers produce optimized code, together with a fast carry scheme, we obtained roughly a 30% speed improvement for our 64-bit C implementation on Pentium Pro (1262 cycles), UltraSparc (910 cycles) and Alpha processors (565 cycles). Harley made an implementation of DFC on ARM which encrypts within 710 cycles (C language plus asm opcode) or 560 cycles (assembly code). We have optimized the Java implementation as well, using the JDK-2 and "just-in-time" compilation.

Harley wrote an impressive implementation of DFC on the Alpha architecture. This implementation is in ANSI-C but requires that long types be 64-bit integers. It uses the Alpha assembly instruction umulh if it is available (this instruction returns the 64 most significant bits of a 64x64-bit unsigned multiplication) and otherwise falls back to generic replacement code for the multiplication. On 21164a microprocessors, we measured one encryption within 323 cycles for m = 128 and r = 8. We measured 232 cycles on a prototype of the new generation 21264. A pure C implementation (not using umulh) encrypts one block within 526 cycles on the 21164a. This implementation is given in the Appendix. All these results are reported in Table 1.
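
To illustrate what a generic replacement for umulh has to compute, the following is a minimal sketch of the high 64 bits of a 64x64-bit unsigned product, written with 64-bit arithmetic only. It is not the code from the Appendix: the function name mulhi64 and the use of the C99 type uint64_t (instead of a 64-bit long) are assumptions made for this illustration, and the schoolbook split into 32-bit halves is just one common way to obtain the result.

```c
#include <stdint.h>

/* Hypothetical helper (not from the DFC sources): returns the 64 most
 * significant bits of the 128-bit product a * b, as umulh would. */
static uint64_t mulhi64(uint64_t a, uint64_t b)
{
    const uint64_t mask32 = 0xFFFFFFFFu;

    /* Split both operands into 32-bit halves. */
    uint64_t a_lo = a & mask32, a_hi = a >> 32;
    uint64_t b_lo = b & mask32, b_hi = b >> 32;

    /* Four partial products; each one fits in 64 bits. */
    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of everything that lands in bits 32..63 of the product.
     * Each term is below 2^32, so the sum cannot overflow 64 bits;
     * its upper half is the carry into bit 64. */
    uint64_t mid = (lo_lo >> 32) + (hi_lo & mask32) + (lo_hi & mask32);

    /* High word: top partial product, upper halves of the cross
     * products, and the carry out of the middle column. */
    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}
```

A routine of this kind needs four multiplications and several additions where umulh needs a single instruction, which is consistent with the gap reported above between the pure C version (526 cycles) and the umulh version (323 cycles) on the 21164a, although other factors may contribute to that gap as well.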

3 Possible Variations on DFC

In order to address several issues on DFC (namely, the low number of rounds and key scheduling issues), we discuss possible adjustments to DFC. The present report does not aim to propose a specific variant but to show that the known problems could easily be fixed.