Exploiting bit-level parallelism in GPGPUs: A case study on KeeLoq exhaustive key search attack

Graphics Processing Units (GPUs) are increasingly popular in high-performance computing because they deliver computational power for massively parallel problems at a reduced cost. However, the programming model exposed by GPGPU software development tools is often insufficient on its own to achieve full performance, and a major rethinking of algorithmic choices is needed. In this paper, we showcase this effect on a case study drawn from the cryptography application domain. Cryptographic primitives are increasingly pervasive in modern embedded systems. Small, efficient cryptosystems have been effectively employed to design and implement keyless password-based access control systems in various wireless authentication applications. The security margin provided by these lightweight ciphers should be accurately examined in light of the speed and area constraints imposed by the target environment. We present a redesign of the ASIC-oriented KEELOQ implementation that performs efficient exhaustive key search attacks while closely fitting the parallel programming model exposed by modern GPUs. In particular, the bitslicing technique allows the intrinsic parallelism of word-oriented SIMD computations to be exploited effectively: each bit of a machine word carries the state of an independent cipher instance, so a single bitwise instruction advances as many key candidates as the word has bits. By adapting the algorithm implementation to a platform radically different from the one it was designed for, we achieved a ×40 speedup in computation time with respect to a single-core CPU brute-force attack, employing only consumer-grade hardware. This speedup points to a significant weakening of the cipher's security margin, since it shows that anyone with off-the-shelf hardware is able to circumvent the security measures in place.
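The bitslicing idea mentioned in the abstract can be sketched in a few lines. The following is a minimal illustration in plain C rather than the GPU code of the paper, and it rests on several assumptions: the round structure (528 rounds, feedback taps 0 and 16, nonlinear function constant 0x3A5C742E indexed by state bits 1, 9, 20, 26, 31) follows the commonly published KeeLoq description, not anything stated in this abstract; all function names are hypothetical; and the nonlinear function is evaluated with a generic multiplexer tree chosen for clarity, where a tuned implementation would more likely use a minimized Boolean formula. Each 64-bit word holds one state bit for 64 independent key candidates, so one pass over the cipher tests 64 keys.

```c
#include <assert.h>
#include <stdint.h>

#define KEELOQ_NLF    0x3A5C742Eu  /* truth table of the 5-input nonlinear function */
#define KEELOQ_ROUNDS 528

typedef uint64_t slice_t;          /* one bit position across 64 independent lanes */

/* Scalar reference (assumed standard KeeLoq): one key per call. */
uint32_t keeloq_encrypt_ref(uint32_t x, uint64_t key)
{
    for (int r = 0; r < KEELOQ_ROUNDS; r++) {
        unsigned idx = ((x >> 1) & 1u) | (((x >> 9) & 1u) << 1)
                     | (((x >> 20) & 1u) << 2) | (((x >> 26) & 1u) << 3)
                     | (((x >> 31) & 1u) << 4);
        uint32_t fb = (x ^ (x >> 16) ^ (uint32_t)(key >> (r & 63))
                       ^ (KEELOQ_NLF >> idx)) & 1u;
        x = (x >> 1) | (fb << 31);
    }
    return x;
}

/* Bitsliced nonlinear function: a multiplexer tree over the truth table.
 * a selects on the least significant index bit, e on the most significant. */
slice_t nlf_bitsliced(slice_t a, slice_t b, slice_t c, slice_t d, slice_t e)
{
    slice_t t[32];
    const slice_t sel[5] = { a, b, c, d, e };
    for (int i = 0; i < 32; i++)
        t[i] = ((KEELOQ_NLF >> i) & 1u) ? ~(slice_t)0 : 0;
    for (int level = 0, n = 16; level < 5; level++, n >>= 1)
        for (int i = 0; i < n; i++)
            t[i] = (t[2 * i] & ~sel[level]) | (t[2 * i + 1] & sel[level]);
    return t[0];
}

/* Bitsliced encryption: s[i] holds state bit i, k[i] holds key bit i,
 * each for 64 lanes at once. The result replaces s. */
void keeloq_encrypt_bitsliced(slice_t s[32], const slice_t k[64])
{
    for (int r = 0; r < KEELOQ_ROUNDS; r++) {
        slice_t fb = s[0] ^ s[16] ^ k[r & 63]
                   ^ nlf_bitsliced(s[1], s[9], s[20], s[26], s[31]);
        for (int i = 0; i < 31; i++)   /* the register shift is a plain move */
            s[i] = s[i + 1];
        s[31] = fb;
    }
}

/* Transpose 64 values into slices: lane j of out[i] is bit i of v[j]. */
void to_slices(const uint64_t v[64], int nbits, slice_t *out)
{
    for (int i = 0; i < nbits; i++) {
        out[i] = 0;
        for (int lane = 0; lane < 64; lane++)
            out[i] |= ((v[lane] >> i) & 1u) << lane;
    }
}

/* Extract the 32-bit state of a single lane from the slices. */
uint32_t lane_to_word(const slice_t s[32], int lane)
{
    assert(lane >= 0 && lane < 64);
    uint32_t x = 0;
    for (int i = 0; i < 32; i++)
        x |= (uint32_t)((s[i] >> lane) & 1u) << i;
    return x;
}

/* Encrypt one plaintext under 64 consecutive keys in a single bitsliced pass
 * and count the lanes that disagree with the scalar reference. */
int keeloq_bitslice_mismatches(uint32_t plaintext, uint64_t base_key)
{
    uint64_t keys[64], pts[64];
    slice_t k[64], s[32];
    int bad = 0;
    for (int lane = 0; lane < 64; lane++) {
        keys[lane] = base_key + (uint64_t)lane;
        pts[lane] = plaintext;
    }
    to_slices(keys, 64, k);
    to_slices(pts, 32, s);
    keeloq_encrypt_bitsliced(s, k);
    for (int lane = 0; lane < 64; lane++)
        bad += lane_to_word(s, lane) != keeloq_encrypt_ref(plaintext, keys[lane]);
    return bad;
}
```

In an exhaustive search built along these lines, the resulting slices would be compared against a known ciphertext with bitwise operations, so a whole batch of keys is rejected without ever leaving the sliced representation. On a GPU, each thread would presumably run such a loop on its own batch of keys using the native register width; that mapping is an assumption here and is not shown.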
