Fast parallel table lookups to accelerate symmetric-key cryptography

Table lookups are among the most frequently used operations in symmetric-key ciphers. Particularly in newer algorithms such as the Advanced Encryption Standard (AES), table lookups account for the largest fraction of execution time, varying between 34% and 72% for the five representative ciphers we consider: AES, Blowfish, Twofish, MARS, and RC4. To accelerate and parallelize these table lookups, we describe a new parallel table lookup (ptlu) instruction. Our synthesis results indicate that such an instruction can be added to a basic RISC processor with no impact on cycle time. We compare the performance of the ptlu instruction with the speedups available through more conventional architectural techniques such as multiple-issue execution. We find that the performance benefits of the ptlu instruction can be far higher than those obtained by increasing the number of instructions executed per cycle in superscalar or VLIW processors.
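To make the idea concrete, the following C sketch contrasts a conventional scalar lookup sequence with a software model of a parallel lookup. It is an illustration only, not the ptlu instruction as defined in the paper: the abstract does not specify the number of tables, the index width, or how results are combined, so the four 256-entry tables, the XOR combination, and every name here (`lookup_tables`, `lookup_scalar`, `lookup_parallel_model`, `combine_xor`, `fill_demo_tables`) are assumptions. The table contents are placeholder values, not real cipher tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TABLES 4    /* assumption: one table per byte of a 32-bit word */
#define TABLE_SIZE 256  /* assumption: 8-bit indices */

/* Hypothetical container for the lookup tables. */
typedef struct {
    uint32_t t[NUM_TABLES][TABLE_SIZE];
} lookup_tables;

/* Fill the tables with deterministic placeholder values (not cipher tables). */
void fill_demo_tables(lookup_tables *tabs)
{
    assert(tabs != NULL);
    for (size_t lane = 0; lane < NUM_TABLES; lane++) {
        for (size_t i = 0; i < TABLE_SIZE; i++) {
            tabs->t[lane][i] = ((uint32_t)i * 0x01010101u)
                             ^ ((uint32_t)(lane + 1) * 0x9E3779B9u);
        }
    }
}

/* Scalar baseline: for each byte, extract the index, load one table entry,
 * and fold it into the result. On a basic RISC processor each of these
 * steps costs separate instructions. */
uint32_t lookup_scalar(const lookup_tables *tabs, uint32_t word)
{
    uint32_t result = 0;
    for (size_t lane = 0; lane < NUM_TABLES; lane++) {
        uint8_t index = (uint8_t)(word >> (8 * lane));
        result ^= tabs->t[lane][index];
    }
    return result;
}

/* Software model of a parallel lookup: every byte of `word` indexes its own
 * table and all results are produced together. The loop only models what
 * hardware with one read port per table could do in a single operation. */
void lookup_parallel_model(const lookup_tables *tabs, uint32_t word,
                           uint32_t out[NUM_TABLES])
{
    for (size_t lane = 0; lane < NUM_TABLES; lane++) {
        out[lane] = tabs->t[lane][(uint8_t)(word >> (8 * lane))];
    }
}

/* Combine the per-lane results with XOR (an assumed combination step). */
uint32_t combine_xor(const uint32_t lanes[NUM_TABLES])
{
    uint32_t result = 0;
    for (size_t lane = 0; lane < NUM_TABLES; lane++) {
        result ^= lanes[lane];
    }
    return result;
}
```

Both paths compute the same value; the difference is that the scalar version serializes index extraction and loads, whereas the modeled parallel version exposes all lookups as one step that dedicated hardware could perform at once.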
