Jesús Labarta | Antonio J. Peña | Harald Servat | Judit Giménez | Hans-Christian Hoppe
[1] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[2] Pavan Balaji,et al. Toward the efficient use of multiple explicitly managed memory subsystems , 2014, 2014 IEEE International Conference on Cluster Computing (CLUSTER).
[3] Jack J. Dongarra,et al. High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems , 2016, Int. J. High Perform. Comput. Appl.
[4] James C. Browne,et al. Enhancing performance optimization of multicore chips and multichip nodes with data structure metrics , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[5] Lars Koesterke,et al. PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Pavan Balaji,et al. A Framework for Tracking Memory Accesses in Scientific Applications , 2014, 2014 43rd International Conference on Parallel Processing Workshops.
[7] Jesús Labarta,et al. Unveiling Internal Evolution of Parallel Application Computation Phases , 2011, 2011 International Conference on Parallel Processing.
[8] P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .
[9] Arnaldo Carvalho de Melo,et al. The New Linux 'perf' Tools , 2010 .
[10] Mateo Valero,et al. Quantifying the Potential Task-Based Dataflow Parallelism in MPI Applications , 2011, Euro-Par.
[11] Laura Carrington,et al. ADAMANT: Tools to Capture, Analyze, and Manage Data Movement , 2016, ICCS.
[12] Xu Liu,et al. StructSlim: A lightweight profiler to guide structure splitting , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[13] Bernd Hamann,et al. Dissecting On-Node Memory Access Performance: A Semantic Approach , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] John M. Mellor-Crummey,et al. A data-centric profiler for parallel programs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[15] Kristof Beyls,et al. Refactoring for Data Locality , 2009, Computer.
[16] Jesús Labarta,et al. DiP: A Parallel Program Development Environment , 1996, Euro-Par, Vol. II.
[17] Robert Richter,et al. Incorporating Instruction-Based Sampling into AMD CodeAnalyst , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[18] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[19] Juan Gonzalez,et al. Low-Overhead Detection of Memory Access Patterns and Their Time Evolution , 2015, Euro-Par.
[20] Jack J. Dongarra,et al. A Portable Programming Interface for Performance Evaluation on Modern Processors , 2000, Int. J. High Perform. Comput. Appl.
[21] Margaret Martonosi,et al. MemSpy: analyzing memory system bottlenecks in programs , 1992, SIGMETRICS '92/PERFORMANCE '92.
[22] Balaram Sinharoy,et al. IBM POWER7 performance modeling, verification, and evaluation , 2011 .
[23] Chao Wang,et al. NVMalloc: Exposing an Aggregate SSD Store as a Memory Partition in Extreme-Scale Machines , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[24] Brian J. N. Wylie,et al. Memory Profiling using Hardware Counters , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[25] Michael Laurenzano,et al. PEBIL: Efficient static binary instrumentation for Linux , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).