DW-AES: A Domain-Wall Nanowire-Based AES for High Throughput and Energy-Efficient Data Encryption in Non-Volatile Memory

Big-data storage poses significant challenges to the anonymization of sensitive information against data sniffing. Not only is the encryption bandwidth limited by I/O traffic, but the transfer of data between the processor and the memory also exposes the input-output mapping of intermediate computations on I/O channels that are susceptible to semi-invasive and non-invasive attacks. Limited by their simplistic cell-level logic, existing logic-in-memory computing architectures cannot perform the complete encryption process within the memory at reasonable throughput and energy efficiency. In this paper, a block-level in-memory architecture for the Advanced Encryption Standard (AES) is proposed. The proposed technique, called DW-AES, maps all AES operations directly to domain-wall nanowires. The entire encryption process can be completed within a homogeneous, high-density, and standby-power-free non-volatile spintronic memory array without exposing intermediate results to the external I/O interface. Domain-wall nanowire-based pipelining and multi-issue pipelining methods are also proposed to increase the throughput of the baseline DW-AES with insignificant area overhead and a negligible difference in leakage power and energy consumption. The experimental results show that DW-AES can reduce leakage power and area by orders of magnitude compared with existing CMOS ASIC accelerators. It has an energy efficiency of 22 pJ/b, which is 5× and 3× better than the CMOS ASIC and memristive CMOL-based implementations, respectively. Under the same area budget, the proposed DW-AES achieves 4.6× higher throughput than the latest CMOS ASIC AES with similar power consumption. The throughput improvement increases to 11× for pipelined DW-AES, at the expense of doubling the power consumption.
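
To put the reported energy figure in per-block terms, the short sketch below converts 22 pJ/b into energy per 128-bit AES block and back-computes the per-bit energies implied for the two baselines. It assumes that "5× better" and "3× better" mean 5× and 3× less energy per bit, and that the factors are rounded, so the baseline values are rough estimates rather than numbers reported in the abstract. The function and constant names are illustrative only.

```python
# Back-of-the-envelope arithmetic on the energy figures quoted in the abstract.
# Assumption: "N x better" energy efficiency means N x less energy per bit.

BITS_PER_AES_BLOCK = 128  # AES block size, fixed by the standard


def energy_per_block_pj(pj_per_bit, bits=BITS_PER_AES_BLOCK):
    """Energy in pJ to encrypt one block at a given per-bit energy."""
    return pj_per_bit * bits


def implied_baseline_pj_per_bit(dw_aes_pj_per_bit, improvement_factor):
    """Per-bit energy a baseline would need for DW-AES to be
    `improvement_factor` times more efficient (estimate only)."""
    return dw_aes_pj_per_bit * improvement_factor


DW_AES_PJ_PER_BIT = 22.0  # reported in the abstract

dw_block_pj = energy_per_block_pj(DW_AES_PJ_PER_BIT)
cmos_pj_per_bit = implied_baseline_pj_per_bit(DW_AES_PJ_PER_BIT, 5)
cmol_pj_per_bit = implied_baseline_pj_per_bit(DW_AES_PJ_PER_BIT, 3)

print(f"DW-AES: {dw_block_pj:.0f} pJ per 128-bit block")
print(f"Implied CMOS ASIC: ~{cmos_pj_per_bit:.0f} pJ/b")
print(f"Implied CMOL: ~{cmol_pj_per_bit:.0f} pJ/b")
```

Under these assumptions, one 128-bit block costs about 2.8 nJ on DW-AES, and the implied baselines are roughly 110 pJ/b (CMOS ASIC) and 66 pJ/b (CMOL).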
