Anole: A Highly Efficient Dynamically Reconfigurable Crypto-Processor for Symmetric-Key Algorithms

This paper presents Anole, a dynamically reconfigurable processing array for symmetric-key algorithms. Its processing elements and the interconnections between them are designed to support a variety of block and stream ciphers. Without sacrificing flexibility, three key techniques are presented to increase energy efficiency (throughput/power, i.e., the number of operations per unit of energy consumed) and area efficiency (throughput/area). First, a distributed control network supports multithreading on the reconfigurable fabric at low cost, thereby maximizing the utilization of computing resources in the spatial domain. Second, a concurrent computation and reconfiguration scheme integrates configuration contexts with the processing data so that both are executed simultaneously in the datapath. The resulting immediate switching between configurations increases the utilization of hardware resources in the temporal domain. Third, compressing and organizing the configuration contexts further reduces the context memory size and the configuration time. Anole is implemented in TSMC 65-nm technology, occupies 7.75 mm$^2$ of silicon, and runs at 400 MHz. Experiments show that Anole outperforms field-programmable gate array (FPGA) and general-purpose processor implementations by more than two orders of magnitude in both energy and area efficiency. Compared with state-of-the-art reconfigurable solutions, Anole achieves on average $16.5\times$ higher energy efficiency and $9.4\times$ higher area efficiency.
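The two figures of merit defined above can be made concrete with a minimal sketch. The `Design` class, the `improvement` helper, and all numeric values below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not measurements of Anole or of any design it is compared against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Hypothetical record of one implementation's headline figures."""
    name: str
    throughput_gbps: float  # sustained cipher throughput, Gb/s
    power_w: float          # power consumption, W
    area_mm2: float         # silicon area, mm^2

    @property
    def energy_efficiency(self) -> float:
        # throughput / power: Gb/s per W, i.e. gigabits processed per joule
        return self.throughput_gbps / self.power_w

    @property
    def area_efficiency(self) -> float:
        # throughput / area: Gb/s per mm^2
        return self.throughput_gbps / self.area_mm2


def improvement(candidate: Design, baseline: Design) -> tuple[float, float]:
    """Return (energy-efficiency ratio, area-efficiency ratio) vs. a baseline."""
    return (
        candidate.energy_efficiency / baseline.energy_efficiency,
        candidate.area_efficiency / baseline.area_efficiency,
    )


# Placeholder numbers, NOT taken from the paper.
candidate = Design("candidate", throughput_gbps=8.0, power_w=0.5, area_mm2=4.0)
baseline = Design("baseline", throughput_gbps=2.0, power_w=1.0, area_mm2=8.0)

energy_gain, area_gain = improvement(candidate, baseline)
print(f"energy efficiency gain: {energy_gain:.1f}x")
print(f"area efficiency gain:   {area_gain:.1f}x")
```

A fair comparison across designs would also have to account for differences in process technology; this sketch assumes both designs are already reported on a common footing.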
