PAC: Paged Adaptive Coalescer for 3D-Stacked Memory
暂无分享,去创建一个
[1] William J. Dally,et al. Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] Krste Asanovic,et al. The RISC-V Instruction Set Manual Volume 2: Privileged Architecture Version 1.7 , 2015 .
[3] Sandia Report,et al. Toward a New Metric for Ranking High Performance Computing Systems , 2013 .
[4] Sudhakar Yalamanchili,et al. Demystifying the characteristics of 3D-stacked memories: A case study for Hybrid Memory Cube , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[5] David H. Bailey,et al. NAS parallel benchmark results , 1993, IEEE Parallel & Distributed Technology: Systems & Applications.
[6] Scott A. Mahlke,et al. WarpPool: Sharing requests with inter-warp coalescing for throughput processors , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] David Kroft,et al. Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.
[8] Doe Hyun Yoon,et al. Adaptive granularity memory systems: A tradeoff between storage efficiency and throughput , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[9] Aamer Jaleel,et al. CoLT: Coalesced Large-Reach TLBs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[10] J. Thomas Pawlowski,et al. Hybrid memory cube (HMC) , 2011, 2011 IEEE Hot Chips 23 Symposium (HCS).
[11] Christoforos E. Kozyrakis,et al. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[12] Mingyu Gao,et al. HRL: Efficient and flexible reconfigurable logic for near-data processing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[13] Antonino Tumeo,et al. MAC: Memory Access Coalescer for 3D-Stacked Memory , 2019, ICPP.
[14] J. Jeddeloh,et al. Hybrid memory cube new DRAM architecture increases density and performance , 2012, 2012 Symposium on VLSI Technology (VLSIT).
[15] Jaeha Kim,et al. Memory-centric system interconnect design with Hybrid Memory Cubes , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[16] Josep Torrellas,et al. Scalable Cache Miss Handling for High Memory-Level Parallelism , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[17] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[18] Kiyoung Choi,et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[19] Abhishek Bhattacharjee,et al. Large-reach memory management unit caches , 2013, MICRO.
[20] Yong Chen,et al. HMC-Sim: A Simulation Framework for Hybrid Memory Cube Devices , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[21] Kenneth E. Batcher,et al. Sorting networks and their applications , 1968, AFIPS Spring Joint Computing Conference.
[22] Margaret Martonosi,et al. TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs , 2013, TACO.
[23] Richard W. Vuduc,et al. Many-Thread Aware Prefetching Mechanisms for GPGPU Applications , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[24] Marc Casas,et al. Data Prefetching on In-order Processors , 2018, 2018 International Conference on High Performance Computing & Simulation (HPCS).
[25] Ramyad Hadidi,et al. GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[26] Satish Narayanasamy,et al. InvisiMem: Smart memory defenses for memory bus side channel , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[27] Thomas F. Wenisch,et al. Selective GPU caches to eliminate CPU-GPU HW cache coherence , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[28] David A. Bader,et al. Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors , 2005, HiPC.
[29] Hans-Peter Kriegel,et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.
[30] Maya Gokhale,et al. Hybrid memory cube performance characterization on data-centric workloads , 2015, IA3@SC.
[31] Dongrui Fan,et al. High performance comparison-based sorting algorithm on many-core GPUs , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[32] Mateo Valero,et al. VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[33] Tejas Karkhanis,et al. Active Memory Cube: A processing-in-memory architecture for exascale systems , 2015, IBM J. Res. Dev..
[34] Christoforos E. Kozyrakis,et al. Practical Near-Data Processing for In-Memory Analytics Frameworks , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[35] Fabio Checconi,et al. A Throughput-Optimized Optical Network for Data-Intensive Computing , 2014, IEEE Micro.
[36] Dong Chen,et al. Locality analysis through static parallel sampling , 2018, PLDI.
[37] Kevin Skadron,et al. Dymaxion: Optimizing memory access patterns for heterogeneous systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[38] Paul Rosenfeld,et al. Performance Exploration of the Hybrid Memory Cube , 2014 .
[39] David A. Patterson,et al. The GAP Benchmark Suite , 2015, ArXiv.
[40] Rajeev Balasubramonian,et al. MemZip: Exploring unconventional benefits from memory compression , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[41] Alejandro Duran,et al. Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP , 2009, 2009 International Conference on Parallel Processing.
[42] William J. Dally,et al. Architecting an Energy-Efficient DRAM System for GPUs , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[43] Gabriel Zachmann,et al. GPU-ABiSort: optimal parallel sorting on stream architectures , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[44] David A. Wood,et al. Supporting x86-64 address translation for 100s of GPU lanes , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[45] Gabriel H. Loh,et al. Increasing TLB reach by exploiting clustering in page translations , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[46] Bahar Asgari,et al. Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube , 2017, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[47] David R. Kaeli,et al. Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures , 2011, IEEE Transactions on Parallel and Distributed Systems.
[48] John D. Leidel,et al. Memory Coalescing for Hybrid Memory Cube , 2018, ICPP.
[49] Valentin Puente,et al. CMP off-chip bandwidth scheduling guided by instruction criticality , 2013, ICS '13.