ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast
Shouyi Yin | Shaojun Wei | Leibo Liu | Zhaoshi Li | Weiyi Sun
[1] Keshav Pingali, et al. Parallel graph analytics, 2016, Commun. ACM.
[2] Kai Li, et al. Retrospective: virtual memory mapped network interface for the SHRIMP multicomputer, 1994, ISCA '98.
[3] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[4] Ramyad Hadidi, et al. GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[5] Christoforos E. Kozyrakis, et al. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[6] David A. Wood, et al. A Primer on Memory Consistency and Cache Coherence, Second Edition, 2020.
[7] Minsoo Rhu, et al. TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning, 2019, MICRO.
[8] Christoforos E. Kozyrakis, et al. A case for intelligent RAM, 1997, IEEE Micro.
[9] Margaret Martonosi, et al. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[10] Dean M. Tullsen, et al. Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[11] Michael F. Deering, et al. FBRAM: a new form of memory optimized for 3D graphics, 1994, SIGGRAPH.
[12] Sparsh Mittal. A Survey of Techniques for Cache Locking, 2016, ACM Trans. Design Autom. Electr. Syst.
[13] Hyuk-Jae Lee, et al. An Effective DRAM Address Remapping for Mitigating Rowhammer Errors, 2019, IEEE Transactions on Computers.
[14] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[15] David A. Patterson, et al. Computer Architecture, Fifth Edition: A Quantitative Approach, 2011.
[16] William J. Dally, et al. Smart Memories: a modular reconfigurable architecture, 2000, ISCA '00.
[17] Milan M. Jovanovic, et al. An overview of reflective memory systems, 1999, IEEE Concurr.
[18] Mingyu Gao, et al. HRL: Efficient and flexible reconfigurable logic for near-data processing, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[19] Kiyoung Choi, et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[20] Mike Ignatowski, et al. TOP-PIM: throughput-oriented programmable processing in memory, 2014, HPDC '14.
[21] Richard B. Gillett. Memory Channel Network for PCI, 1996, IEEE Micro.
[22] Yanzhi Wang, et al. GraphQ: Scalable PIM-Based Graph Processing, 2019, MICRO.
[23] Jung Ho Ahn, et al. Chameleon: Versatile and practical near-DRAM acceleration architecture for large memory systems, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.
[25] Kunle Olukotun, et al. Simplifying Scalable Graph Processing with a Domain-Specific Language, 2014, CGO '14.
[26] Rajeev Balasubramonian, et al. Innovations in the Memory System, 2019, Synthesis Lectures on Computer Architecture.
[27] Feifei Li, et al. Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads, 2014, IEEE Micro.
[28] Valeria Bertacco, et al. GraphVine: Exploiting Multicast for Scalable Graph Analytics, 2020, 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[29] William J. Dally, et al. Scaling the Power Wall: A Path to Exascale, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[30] Harish Patil, et al. Pin: building customized program analysis tools with dynamic instrumentation, 2005, PLDI '05.
[31] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.
[32] Onur Mutlu, et al. Ramulator: A Fast and Extensible DRAM Simulator, 2016, IEEE Computer Architecture Letters.
[33] Onur Mutlu, et al. A case for exploiting subarray-level parallelism (SALP) in DRAM, 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[34] Dong Li, et al. DESTINY: A tool for modeling emerging 3D NVM and eDRAM caches, 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[35] Kiyoung Choi, et al. A scalable processing-in-memory accelerator for parallel graph processing, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[36] Christoforos E. Kozyrakis, et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems, 2013, ISCA.
[37] Babak Falsafi, et al. The Mondrian data engine, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[38] Shashank Shekhar, et al. Opportunity for compute partitioning in pursuit of energy-efficient systems, 2016, LCTES.
[39] Frederic T. Chong, et al. Active pages: a computation model for intelligent memory, 1998, ISCA.
[40] Jinjun Xiong, et al. Application-Transparent Near-Memory Processing Architecture with Memory Channel Network, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[41] Jung Ho Ahn, et al. NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[42] Mattan Erez, et al. Near Data Acceleration with Concurrent Host Access, 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[43] Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, 2006.
[44] Yuan Xie, et al. MEDAL: Scalable DIMM based Near Data Processing Accelerator for DNA Seeding Algorithm, 2019, MICRO.