ProfDP: A Lightweight Profiler to Guide Data Placement in Heterogeneous Memory Systems
Ludmila Cherkasova | Shasha Wen | Xu Liu | Felix Xiaozhu Lin
[1] David Eklov, et al. Bandwidth Bandit: Quantitative characterization of memory contention, 2012, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[2] Nathan R. Tallent, et al. Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Xu Liu, et al. memif: Towards Programming Heterogeneous Memory Asynchronously, 2016, ASPLOS.
[4] Jin Xiong, et al. Exploiting Program Semantics to Place Data in Hybrid Memory, 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[5] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs, 2010, Concurr. Comput. Pract. Exp.
[6] John Shalf, et al. NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures, 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[7] Nathan R. Tallent, et al. Performance analysis for parallel programs from multicore to petascale, 2010.
[8] Ian Karlin, et al. LULESH 2.0 Updates and Changes, 2013.
[9] Jaejin Lee, et al. Performance characterization of the NAS Parallel Benchmarks in OpenCL, 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[10] Nathan Froyd, et al. Scalability analysis of SPMD codes using expectations, 2007, ICS '07.
[11] Zhen Fang, et al. Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[12] Jun Li, et al. Quartz: A Lightweight Performance Emulator for Persistent Memory Software, 2015, Middleware.
[13] Dong Li, et al. Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[14] Xu Liu, et al. Characterizing emerging heterogeneous memory, 2016, ISMM.
[15] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[16] Ricardo Bianchini, et al. Page placement in hybrid memory systems, 2011, ICS '11.
[17] Martin Dimitrov, et al. A framework for application guidance in virtual memory systems, 2013, VEE '13.
[18] Karsten Schwan, et al. Data tiering in heterogeneous memory systems, 2016, EuroSys.
[19] Jeffrey S. Vetter, et al. Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory, 2016, HPDC.
[20] Dong Li, et al. PORPLE: An Extensible Optimizer for Portable Data Placement on GPU, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[21] M. Lankhorst, et al. Low-cost and nanoscale non-volatile memory concept for future silicon chips, 2005, Nature Materials.
[22] Dimitrios S. Nikolopoulos, et al. Software-managed energy-efficient hybrid DRAM/NVM main memory, 2015, Conf. Computing Frontiers.
[23] Gabriel H. Loh, et al. 3D-Stacked Memory Architectures for Multi-core Processors, 2008, 2008 International Symposium on Computer Architecture.
[24] Hao Luo, et al. HOTL: a higher order theory of locality, 2013, ASPLOS '13.
[25] Avinash Sodani, et al. Knights Landing (KNL): 2nd Generation Intel® Xeon Phi processor, 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[26] Nathan R. Tallent, et al. Binary analysis for measurement and attribution of program performance, 2009, PLDI '09.
[27] Rachata Ausavarungnirun, et al. Row buffer locality aware caching policies for hybrid memories, 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).
[28] Guoyang Chen, et al. Coherence-Free Multiview: Enabling Reference-Discerning Data Placement on GPU, 2016, ICS.
[29] Joseph Antony, et al. Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport, 2006, HiPC.
[30] Peter Marwedel, et al. Scratchpad memory: a design alternative for cache on-chip memory in embedded systems, 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign (CODES 2002).
[31] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[32] Paul E. McKenney. Differential Profiling, 1999, Softw. Pract. Exp.
[33] Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, 2006.
[34] John M. Mellor-Crummey, et al. Pinpointing data locality bottlenecks with low overhead, 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[35] Bo Wu, et al. ScaAnalyzer: a tool to identify memory scalability bottlenecks in parallel programs, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Stephen W. Keckler, et al. Page Placement Strategies for GPUs within Heterogeneous Memory Systems, 2015, ASPLOS.
[37] Simon David Hammond, et al. memkind: An Extensible Heap Memory Manager for Heterogeneous Memory Platforms and Mixed Memory Policies, 2015.
[38] Gokcen Kestor, et al. RTHMS: a tool for data placement on hybrid memory system, 2017, ISMM.