Paper information - HMC-MAC: Processing-in Memory Architecture for Multiply-Accumulate Operations with Hybrid Memory Cube

HMC-MAC: Processing-in Memory Architecture for Multiply-Accumulate Operations with Hybrid Memory Cube

Many studies focus on implementing processing-in-memory (PIM) on the logic die of the hybrid memory cube (HMC) architecture. The multiply-accumulate (MAC) operation is heavily used in digital signal processing (DSP) systems. This paper proposes HMC-MAC, a novel PIM architecture that implements the MAC operation inside the HMC. The vault controllers of a conventional HMC work independently to maximize parallelism, and HMC-MAC builds on the conventional HMC with few architectural changes. As a result, a large number of MAC operations can be processed in parallel: in HMC-MAC, the MAC operation can be carried out simultaneously on as much as 128 KB of data. The correctness of HMC-MAC is verified by simulation, and it outperforms the conventional CPU-based MAC operation when the MAC operation is executed consecutively at least six times.
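
To make the idea of vault-level parallelism concrete, the following is a minimal software sketch of a MAC computation split across independent partitions, each standing in for one vault. It is not the paper's hardware design: the names `mac`, `vault_mac`, `split`, and `partitioned_mac` are hypothetical, the vault count of 32 is an assumption not stated in the abstract, and the partitions run sequentially here, whereas in hardware the vault controllers would operate concurrently.

```python
from typing import List, Sequence

# Assumption: 32 vaults. The abstract does not state the vault count.
NUM_VAULTS = 32


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def vault_mac(a: Sequence[int], b: Sequence[int]) -> int:
    """Accumulate the products of one partition (one hypothetical vault)."""
    acc = 0
    for x, y in zip(a, b):
        acc = mac(acc, x, y)
    return acc


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks = []
    start = 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partitioned_mac(a: Sequence[int], b: Sequence[int],
                    num_vaults: int = NUM_VAULTS) -> int:
    """Compute sum(a[i] * b[i]) as per-partition partial sums, then reduce."""
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    # Both operands are split identically, so chunk k of `a` pairs with
    # chunk k of `b`. Each pair is independent of all the others.
    partials = [
        vault_mac(chunk_a, chunk_b)
        for chunk_a, chunk_b in zip(split(a, num_vaults), split(b, num_vaults))
    ]
    return sum(partials)
```

Because each partition touches only its own slice of the operands, the partial sums do not depend on one another, which is the property that lets independent vault controllers work in parallel before a final reduction.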

Ki-Seok Chung | Dong-Ik Jeon | Kyeong-Bin Park
