Fixslicing AES-like Ciphers: New bitsliced AES speed records on ARM-Cortex M and RISC-V

The fixslicing implementation strategy was originally introduced as a new representation for the hardware-oriented GIFT block cipher, in order to achieve very efficient constant-time software implementations. In this article, we show that the fundamental idea underlying fixslicing is not specific to GIFT and can be applied to other ciphers as well. In particular, we study the benefits of fixslicing for AES and show that it reduces the number of operations required by the linear layer by 41% compared to the fastest bitsliced implementation previously known on 32-bit platforms. Overall, fixsliced AES-128 reaches 83 and 98 cycles per byte on ARM Cortex-M and E31 RISC-V processors respectively (assuming precomputed round keys), improving on the previous records on those platforms by 17% and 20%. To highlight that our work also directly benefits masked implementations that rely on bitslicing, we report results for implementations integrating first-order masking that outperform the fastest results reported in the literature on ARM Cortex-M4 by 12%. Finally, we demonstrate the genericity of the fixslicing technique for AES-like designs by applying it to the Skinny-128 tweakable block ciphers.
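The abstract does not spell out the mechanism, so the following is a minimal sketch, assuming the core idea is that ShiftRows is never applied as data movement. Instead, the state is kept in a permuted byte order, the remaining operations are adapted to that order, and because ShiftRows has order 4 the representation falls back into the standard order every four rounds. This is a byte-level Python illustration, not the paper's bitsliced 32-bit code. The function names (`reference_round`, `deferred_round`, `encrypt_rounds`, `encrypt_rounds_deferred`) and the index-map bookkeeping are hypothetical choices made for this example.

```python
# Byte-level sketch of "defer ShiftRows" (hypothetical illustration, not the
# paper's bitsliced code). Layout is AES column-major: state[r + 4*c] is row r, column c.

def xtime(a):
    # Multiply by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

def gmul(a, b):
    # Shift-and-add multiplication in GF(2^8).
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def sbox_entry(x):
    # AES S-box: multiplicative inverse (x^254; 0 maps to 0) then affine map.
    inv = 0
    if x:
        inv = 1
        for _ in range(254):
            inv = gmul(inv, x)
    s = inv
    for k in range(1, 5):
        s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
    return s ^ 0x63

SBOX = [sbox_entry(x) for x in range(256)]

# SR[i] is the source index for destination i: row r is rotated left by r.
SR = [r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)]

def mix_column(col):
    a0, a1, a2, a3 = col
    return [
        gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3,
        a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3,
        a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3),
        gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2),
    ]

def reference_round(state, key):
    s = [SBOX[b] for b in state]                 # SubBytes
    s = [s[SR[i]] for i in range(16)]            # ShiftRows moves the data
    out = []
    for c in range(4):                           # MixColumns on real columns
        out += mix_column(s[4 * c:4 * c + 4])
    return [out[i] ^ key[i] for i in range(16)]  # AddRoundKey

def deferred_round(state, perm, key):
    # Invariant: the real AES state byte i is state[perm[i]].
    s = [SBOX[b] for b in state]                 # SubBytes commutes with any byte permutation
    perm = [perm[SR[i]] for i in range(16)]      # ShiftRows only updates the index map
    for c in range(4):
        idx = [perm[4 * c + r] for r in range(4)]  # where real column c currently lives
        for j, v in zip(idx, mix_column([s[j] for j in idx])):
            s[j] = v
    for i in range(16):
        # A real implementation would store round keys pre-permuted instead.
        s[perm[i]] ^= key[i]
    return s, perm

def encrypt_rounds(block, keys):
    s = list(block)
    for k in keys:
        s = reference_round(s, k)
    return s

def encrypt_rounds_deferred(block, keys):
    s, perm = list(block), list(range(16))
    for k in keys:
        s, perm = deferred_round(s, perm, k)
    # Read the state back in standard order; perm is the identity after 4 rounds.
    return [s[perm[i]] for i in range(16)], perm
```

Running both versions on the same block and four round keys should give identical outputs, with `perm` returning to the identity. In a bitsliced layout, reading MixColumns "through" a different byte order would presumably translate into different rotation amounts rather than extra work, which is where savings in the linear layer could come from; this byte-level sketch does not model instruction counts.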
