Timemory: Modular Performance Analysis for HPC
Samuel Williams | Leonid Oliker | Jonathan R. Madsen | Muaaz G. Awan | Hugo Brunie | Jack Deslippe | Rahul Gayatri | Yunsong Wang | Charlene Yang
[1] Robert Dietrich, et al. OMPT: An OpenMP Tools Application Programming Interface for Performance Analysis, 2013, IWOMP.
[2] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[3] James Demmel, et al. Precimonious: Tuning assistant for floating-point precision, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[4] A. Dell'Acqua, et al. Geant4 - A simulation toolkit, 2003.
[5] Gerhard Wellein, et al. LIKWID: Lightweight Performance Tools, 2011, CHPC.
[6] Jack J. Dongarra, et al. Collecting Performance Data with PAPI-C, 2009, Parallel Tools Workshop.
[7] Anthony Di Franco, et al. A comprehensive study of real-world numerical bug characteristics, 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).
[8] Barton P. Miller, et al. Anywhere, any-time binary instrumentation, 2011, PASTE '11.
[9] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[10] Hal Finkel, et al. ClangJIT: Enhancing C++ with Just-in-Time Compilation, 2019, 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC).
[11] Samuel Williams, et al. Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis, 2014, PMBS@SC.
[12] Josef Weidendorfer, et al. The Case for a Common Instrumentation Interface for HPC Codes, 2019, 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools).
[13] Nan Ding, et al. An Instruction Roofline Model for GPUs, 2019, 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).
[14] Martin Schulz, et al. Caliper: Performance Introspection for HPC Software Stacks, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[16] Daniel Sunderland, et al. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, 2014, J. Parallel Distributed Comput.
[17] Vladimir Getov, et al. PMPI: High-Level Message Passing in Fortran 77 and C, 1997, HPCN Europe.
[18] Dirk Schmidl, et al. Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir, 2011, Parallel Tools Workshop.
[19] Jeffrey S. Vetter, et al. NVIDIA Tensor Core Programmability, Performance & Precision, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[20] David Poliakoff, et al. Gotcha: An Function-Wrapping Interface for HPC Tools, 2017, ESPT/VPA@SC.
[21] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs, 2010, Concurr. Comput. Pract. Exp.
[22] Gerhard Wellein, et al. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010, 2010 39th International Conference on Parallel Processing Workshops.
[23] Nectarios Koziris, et al. Reliable and Efficient Performance Monitoring in Linux, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[24] Dhabaleswar K. Panda, et al. MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU, 2017, EuroMPI/USA.