Understanding Object-level Memory Access Patterns Across the Spectrum
Wei Xue | Xiaosong Ma | Youngjae Kim | Chao Wang | Sudharshan S. Vazhkudai | Xu Ji | Nosayba El-Sayed | Daniel Sánchez
[1] Chao Wang,et al. NVMalloc: Exposing an Aggregate SSD Store as a Memory Partition in Extreme-Scale Machines , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[2] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.
[3] Guy E. Blelloch,et al. Brief announcement: the problem based benchmark suite , 2012, SPAA '12.
[4] Nathan R. Tallent,et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs , 2010, Concurr. Comput. Pract. Exp.
[5] Gokcen Kestor,et al. RTHMS: a tool for data placement on hybrid memory system , 2017, ISMM.
[6] Mark Johnson,et al. NCBI BLAST: a better web interface , 2008, Nucleic Acids Res.
[7] Peter M. Kogge,et al. On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications , 2007, IEEE Transactions on Computers.
[8] R. Govindarajan,et al. Probabilistic Shared Cache Management (PriSM) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[9] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[10] Matthew L. Seidl,et al. Predicting References to Dynamically Allocated Objects (CU-CS-826-97) , 1997 .
[11] Xiaofeng Gao,et al. Exploiting Stability to Reduce Time-Space Cost for Memory Tracing , 2003, International Conference on Computational Science.
[12] Benjamin G. Zorn,et al. Using lifetime predictors to improve memory allocation performance , 1993, PLDI '93.
[13] Eddie Kohler,et al. Cache craftiness for fast multicore key-value storage , 2012, EuroSys '12.
[14] Xiaofeng Gao,et al. Reducing overheads for acquiring dynamic memory traces , 2005, Proceedings of the IEEE International Workload Characterization Symposium, 2005.
[15] Easwaran Raman,et al. Recursive data structure profiling , 2005, MSP '05.
[16] Karsten Schwan,et al. Data tiering in heterogeneous memory systems , 2016, EuroSys.
[17] Luiz André Barroso,et al. Memory system characterization of commercial workloads , 1998, ISCA.
[18] Berk Hess,et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers , 2015 .
[19] Vivien Quéma,et al. Traffic management: a holistic approach to memory placement on NUMA systems , 2013, ASPLOS '13.
[20] Simon D. Hammond,et al. Multi-Level Memory Policies: What You Add Is More Important Than What You Take Out , 2016, MEMSYS.
[21] Alaa R. Alameldeen,et al. Base-Victim Compression: An Opportunistic Cache Compression Architecture , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[22] Thomas F. Wenisch,et al. Temporal Streaming of Shared Memory , 2005, ISCA 2005.
[23] Sally A. McKee,et al. METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[24] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[25] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[27] Kathryn S. McKinley,et al. Reconsidering custom memory allocation , 2002, OOPSLA '02.
[28] Stijn Eyerman,et al. A first-order mechanistic model for architectural vulnerability factor , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[29] Babak Falsafi,et al. Cuckoo directory: A scalable directory for many-core systems , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[30] Clark Verbrugge,et al. Dynamic Data Structure Analysis for Java Programs , 2006, 14th IEEE International Conference on Program Comprehension (ICPC'06).
[31] Hans Werner Meuer,et al. Top500 Supercomputer Sites , 1997 .
[32] Guy E. Blelloch,et al. Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.
[33] Zhang Jing,et al. Data locality characterization of OLTP applications and its effects on cache performance , 2010, 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE).
[34] Aamer Jaleel,et al. Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[35] Onur Mutlu,et al. The Dirty-Block Index , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[36] Michael Stonebraker,et al. OLTP through the looking glass, and what we found there , 2008, SIGMOD Conference.
[37] Qiang Wu,et al. Exposing memory access regularities using object-relative memory profiling , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[38] Lizhong Chen,et al. Futility Scaling: High-Associativity Cache Partitioning , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[39] John M. Mellor-Crummey,et al. A data-centric profiler for parallel programs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[40] Thomas F. Wenisch,et al. Temporal streaming of shared memory , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[41] Bernd Hamann,et al. Dissecting On-Node Memory Access Performance: A Semantic Approach , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[42] Daniel Sánchez,et al. Tailbench: a benchmark suite and evaluation methodology for latency-critical applications , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[43] Gu-Yeon Wei,et al. ReVIVaL: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency , 2008, 2008 International Symposium on Computer Architecture.
[44] Akanksha Jain,et al. Back to the Future: Leveraging Belady's Algorithm for Improved Cache Replacement , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[45] Krishna M. Kavi,et al. Gleipnir: a memory profiling and tracing tool , 2013, CARN.
[46] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[47] Ben Zorn,et al. Predicting References to Dynamically Allocated Objects , 1997 .
[48] Alex Zelinsky,et al. Learning OpenCV: Computer Vision with the OpenCV Library (Bradski, G.R. et al.; 2008) [On the Shelf] , 2009, IEEE Robotics & Automation Magazine.
[49] Rastislav Bodík,et al. An efficient profile-analysis framework for data-layout optimizations , 2002, POPL '02.
[50] David R. Kaeli,et al. Profile-guided I/O partitioning , 2003, ICS '03.
[51] Eddie Kohler,et al. Speedy transactions in multicore in-memory databases , 2013, SOSP.
[52] Chandra Krintz,et al. Cache-conscious data placement , 1998, ASPLOS VIII.
[53] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[54] Simon D. Hammond,et al. Analyzing allocation behavior for multi-level memory , 2016, MEMSYS.
[55] Erich Strohmaier,et al. Quantifying Locality In The Memory Access Patterns of HPC Applications , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[56] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.
[57] Trishul M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality , 2001, PLDI '01.
[58] Joseph Antony,et al. Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport , 2006, HiPC.
[59] Yutao Zhong,et al. Predicting whole-program locality through reuse distance analysis , 2003, PLDI.
[60] Yi Yang,et al. Locality Principle Revisited: A Probability-Based Quantitative Approach , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[61] Wu-chun Feng,et al. The design, implementation, and evaluation of mpiBLAST , 2003 .
[62] Mahmut T. Kandemir,et al. Managing GPU Concurrency in Heterogeneous Architectures , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[63] Gary R. Bradski,et al. Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library , 2016 .
[64] Daniel Sánchez,et al. Whirlpool: Improving Dynamic Cache Management with Static Data Classification , 2016, ASPLOS.