HMC-MAC: Processing-in Memory Architecture for Multiply-Accumulate Operations with Hybrid Memory Cube

Many studies focus on implementing processing-in-memory (PIM) on the logic die of the hybrid memory cube (HMC). The multiply-accumulate (MAC) operation is used heavily in digital signal processing (DSP) systems. This paper proposes HMC-MAC, a novel PIM architecture that implements the MAC operation in the HMC. In the conventional HMC, the vault controllers operate independently to maximize parallelism, and HMC-MAC builds on the conventional HMC with only minor architectural modifications. Therefore, a large number of MAC operations can be processed in parallel. In HMC-MAC, MAC operations can be carried out simultaneously on up to 128 KB of data. The correctness of HMC-MAC is verified by simulation, and HMC-MAC outperforms conventional CPU-based MAC execution when the MAC operation is executed consecutively at least six times.
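To illustrate why independent vaults allow many MAC operations to proceed in parallel, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical device with 32 vaults (the count is an assumption chosen for illustration). Two operand vectors are split into contiguous per-vault slices, each slice accumulates its own partial sum of `a[i] * b[i]` with no dependence on other slices, and the partial sums are then reduced into a single result. The names `vault_mac`, `partition`, and `hmc_mac_sketch` are hypothetical, and the sketch models only the data partitioning, not HMC timing, packet handling, or logic-die hardware.

```python
from typing import List, Sequence

# Assumption for illustration only: the number of independent vaults.
NUM_VAULTS = 32


def vault_mac(a: Sequence[int], b: Sequence[int], acc: int = 0) -> int:
    """Multiply-accumulate over one vault's slice: acc += a[i] * b[i]."""
    for x, y in zip(a, b):
        acc += x * y
    return acc


def partition(n: int, parts: int) -> List[range]:
    """Split indices 0..n-1 into `parts` contiguous, nearly equal ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def hmc_mac_sketch(a: Sequence[int], b: Sequence[int],
                   num_vaults: int = NUM_VAULTS) -> int:
    """Hypothetical model: each vault computes a partial MAC on its own
    slice. The per-vault calls share no state, so real hardware could run
    them concurrently; here they run sequentially. Partials are summed."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have equal length")
    partials = [vault_mac(a[r.start:r.stop], b[r.start:r.stop])
                for r in partition(len(a), num_vaults)]
    return sum(partials)


if __name__ == "__main__":
    # 2 * (1 + 2 + ... + 100) = 10100
    print(hmc_mac_sketch(list(range(1, 101)), [2] * 100))
```

Because each vault touches only its own slice, adding vaults shortens the per-vault work without any cross-vault communication until the final reduction, which mirrors the parallelism argument above.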