Bitslice Implementation of AES

Network applications need to be fast and at the same time provide security. In order to minimize the overhead of the security algorithm on the performance of the application, the speeds of encryption and decryption of the algorithm are critical. To obtain maximum performance from the algorithm, efficient techniques for its implementation must be used and the implementation must be tuned for the specific hardware on which it is running. Bitslice is a non-conventional but efficient way to implement DES in software. It involves breaking down of DES into logical bit operations so that N parallel encryptions are possible on a single N-bit microprocessor. This results in tremendous throughput. AES is a symmetric block cipher introduced by NIST as a replacement for DES. It is rapidly becoming popular due to its good security features, efficiency, performance and simplicity. In this paper we present an implementation of AES using the bitslice technique. We analyze the impact of the architecture of the microprocessor on the performance of bitslice AES. We consider three processors; the Intel Pentium 4, the AMD Athlon 64 and the Intel Core 2. We optimize the implementation to best utilize the superscalar architecture and SIMD instruction set present in the processors.