论文信息 - PIMA-Logic: A Novel Processing-in-Memory Architecture for Highly Flexible and Energy-Efficient Logic Computation

PIMA-Logic: A Novel Processing-in-Memory Architecture for Highly Flexible and Energy-Efficient Logic Computation

In this paper, we propose PIMA-Logic, as a novel Processing-in-Memory Architecture for highly flexible and efficient Logic computation. Instead of integrating complex logic units in cost-sensitive memory, PIMA-Logic exploits a hardware-friendly approach to implement Boolean logic functions between operands either located in the same row or the same column within entire memory arrays. Furthermore, it can efficiently process more complex logic functions between multiple operands to further reduce the latency and power-hungry data movement. The proposed architecture is developed based on Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) array and it can simultaneously work as a non-volatile memory and a reconfigurable in-memory logic. The device-to-architecture co-simulation results show that PIMA-Logic can achieve up to 56% and 31.6% improvements with respect to overall energy and delay on combinational logic benchmarks compared to recent Pinatubo architecture. We further implement an in-memory data encryption engine based on PIMA-Logic as a case study. With AES application, it shows 77.2% and 21% lower energy consumption compared to CMOS-ASIC and recent RIMPA implementation, respectively.

[1] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.

[2] Kaushik Roy,et al. Spin-Transfer Torque Devices for Logic and Memory: Prospects and Perspectives , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[3] Onur Mutlu,et al. Chapter Four - Simple Operations in Memory to Reduce Data Movement , 2017, Adv. Comput..

[4] Kaushik Roy,et al. A framework for simulating hybrid MTJ/CMOS circuits: Atoms to system approach , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[5] Shaahin Angizi,et al. RIMPA: A New Reconfigurable Dual-Mode In-Memory Processing Architecture with Spin Hall Effect-Driven Domain Wall Motion Device , 2017, 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).

[6] D. Ralph,et al. Spin transfer torque devices utilizing the giant spin Hall effect of tungsten , 2012, 1208.1711.

[7] W. Kang,et al. In-memory processing paradigm for bitwise logic operations in STT-MRAM , 2017, 2017 IEEE International Magnetics Conference (INTERMAG).

[8] Rui Zhang,et al. Threshold network synthesis and optimization and its application to nanotechnologies , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[9] Yoshishige Suzuki,et al. Novel voltage controlled MRAM (VCM) with fast read/write circuits for ultra large last level cache , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).

[10] Tao Zhang,et al. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[11] Tajana Simunic,et al. MPIM: Multi-purpose in-memory processing using configurable resistive memory , 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC).

[12] Cong Xu,et al. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[13] Deliang Fan,et al. A Low Power Current-Mode Flash ADC with Spin Hall Effect based Multi-Threshold Comparator , 2016, ISLPED.

[14] Sanu Mathew,et al. 340 mV–1.1 V, 289 Gbps/W, 2090-Gate NanoAES Hardware Accelerator With Area-Optimized Encrypt/Decrypt GF(2 4 ) 2 Polynomials in 22 nm Tri-Gate CMOS , 2015, IEEE Journal of Solid-State Circuits.

[15] B. Baas,et al. Toward More Accurate Scaling Estimates of CMOS Circuits from 180 nm to 22 nm , 2012 .

[16] Cong Xu,et al. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[17] Onur Mutlu,et al. Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[18] Miao Hu,et al. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[19] David Blaauw,et al. Compute Caches , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[20] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[21] G.A. Jullien,et al. A method of majority logic reduction for quantum cellular automata , 2004, IEEE Transactions on Nanotechnology.

[22] Chip-Hong Chang,et al. DW-AES: A Domain-Wall Nanowire-Based AES for High Throughput and Energy-Efficient Data Encryption in Non-Volatile Memory , 2016, IEEE Transactions on Information Forensics and Security.

[23] H. Kanaya,et al. 4Gbit density STT-MRAM using perpendicular MTJ realized with compact cell structure , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).

[24] Z. Abid,et al. Efficient CMOL Gate Designs for Cryptography Applications , 2009, IEEE Transactions on Nanotechnology.