TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory
Jung Ho Ahn | Minsoo Rhu | Eojin Lee | Jaehyun Park | Byeongho Kim | Sungmin Yun
[1] O Seongil, et al. Row-buffer decoupling: A case for low-latency DRAM microarchitecture, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[2] O Seongil, et al. Defect Analysis and Cost-Effective Resilience Architecture for Future DRAM Devices, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[3] Carole-Jean Wu, et al. RecSSD: near data processing for solid state drive based recommendation inference, 2021, ASPLOS.
[4] Yuan Xie, et al. DRISA: A DRAM-based Reconfigurable In-Situ Accelerator, 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Sukhan Lee, et al. CiDRA: A cache-inspired DRAM resilience architecture, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[6] Andrew B. Kahng, et al. CACTI-IO: CACTI with off-chip power-area-timing models, 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[7] Minsoo Rhu, et al. Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training, 2020, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[8] John Kim, et al. NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units, 2019, ASPLOS.
[9] Nam Sung Kim, et al. NetDIMM: Low-Latency Near-Memory Network Interface Architecture, 2019, MICRO.
[10] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, ArXiv.
[11] Onur Mutlu, et al. Ramulator: A Fast and Extensible DRAM Simulator, 2016, IEEE Computer Architecture Letters.
[12] Jie Yang, et al. Mixed-Precision Embedding Using a Cache, 2020, ArXiv.
[13] Yuan Xie, et al. SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Sudhakar Yalamanchili, et al. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[15] Developing a Recommendation Benchmark for MLPerf Training and Inference, 2020, ArXiv.
[16] Wei Lin, et al. Characterizing Deep Learning Training Workloads on Alibaba-PAI, 2019, 2019 IEEE International Symposium on Workload Characterization (IISWC).
[17] William J. Dally, et al. Principles and Practices of Interconnection Networks, 2004.
[18] Sachin Katti, et al. Bandana: Using Non-volatile Memory for Storing Deep Learning Models, 2018, MLSys.
[19] Jinjun Xiong, et al. Application-Transparent Near-Memory Processing Architecture with Memory Channel Network, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Carole-Jean Wu, et al. The Architectural Implications of Facebook's DNN-Based Personalized Recommendation, 2019, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Yuan Xie, et al. MEDAL: Scalable DIMM based Near Data Processing Accelerator for DNA Seeding Algorithm, 2019, MICRO.
[22] Martin D. Schatz, et al. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing, 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[23] Dong Li, et al. Processing-in-Memory for Energy-Efficient Neural Network Training: A Heterogeneous Approach, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Christoforos E. Kozyrakis, et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory, 2017, ASPLOS.
[25] Jung Ho Ahn, et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[26] William J. Dally, et al. Scatter-add in data parallel architectures, 2005, 11th International Symposium on High-Performance Computer Architecture.
[27] Cody Coleman, et al. MLPerf Inference Benchmark, 2019, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[28] Eunhyeok Park, et al. McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM, 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[29] Minsoo Rhu, et al. TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning, 2019, MICRO.
[30] O Seongil, et al. CIDR: A Cache Inspired Area-Efficient DRAM Resilience Architecture against Permanent Faults, 2015, IEEE Computer Architecture Letters.
[31] Fabrice Devaux, et al. The true Processing In Memory accelerator, 2019, 2019 IEEE Hot Chips 31 Symposium (HCS).
[32] David P. Luebke, et al. CUDA: Scalable parallel programming for high-performance scientific computing, 2008, 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro.
[33] F. Lemmermeyer. Error-correcting Codes, 2005.
[34] Alexander Heinecke, et al. Optimizing Deep Learning Recommender Systems Training on CPU Cluster Architectures, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[35] Carole-Jean Wu, et al. DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[36] Jung Ho Ahn, et al. Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[37] O Seongil, et al. Reducing memory access latency with asymmetric DRAM bank organizations, 2013, ISCA.
[38] Paul Covington, et al. Deep Neural Networks for YouTube Recommendations, 2016, RecSys.
[39] Minsoo Rhu, et al. Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[40] Sung Kyu Lim, et al. FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction, 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).
[41] Hyoung-Joo Kim, et al. A 3.2 Gbps/pin 8 Gbit 1.0 V LPDDR4 SDRAM With Integrated ECC Engine for Sub-1 V DRAM Core Operation, 2015, IEEE Journal of Solid-State Circuits.
[42] Dimin Niu, et al. iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[43] John D. Leidel, et al. PIMS: a lightweight processing-in-memory accelerator for stencil computations, 2019, MEMSYS.
[44] Chia-Lin Yang, et al. Improving DRAM latency with dynamic asymmetric subarray, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[45] O Seongil, et al. Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product, 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
[46] Xuan Zhang, et al. Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM, 2021, IEEE Micro.
[47] Hankyu Chi, et al. 23.2 A 1.1V 1ynm 6.4Gb/s/pin 16Gb DDR5 SDRAM with a Phase-Rotator-Based DLL, High-Speed SerDes and RX/TX Equalization Scheme, 2019, 2019 IEEE International Solid-State Circuits Conference (ISSCC).
[48] William J. Dally, et al. Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems, 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[49] Babak Falsafi, et al. The Mondrian Data Engine, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[50] Tony Tung, et al. Scaling Memcache at Facebook, 2013, NSDI.
[51] Jung Ho Ahn, et al. TRiM: Tensor Reduction in Memory, 2021, IEEE Computer Architecture Letters.
[52] Oscar Plata, et al. NATSA: A Near-Data Processing Accelerator for Time Series Analysis, 2020, 2020 IEEE 38th International Conference on Computer Design (ICCD).