APMT: an automatic hardware counter-based performance modeling tool for HPC applications
Wei Xue | Victor W. Lee | Nan Ding | Weimin Zheng
[1] Jack J. Dongarra,et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[2] Brinkley Sprunt,et al. Pentium 4 Performance-Monitoring Features , 2002, IEEE Micro.
[3] Sheri A. Mickelson,et al. Improved parallel performance of the CICE model in CESM1 , 2015, Int. J. High Perform. Comput. Appl..
[4] Saturnino Garcia,et al. Kremlin: rethinking and rebooting gprof for the multicore age , 2011, PLDI '11.
[5] Torsten Hoefler,et al. Using automated performance modeling to find scalability bugs in complex codes , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[6] Vivek K. Pallipuram,et al. Subjective versus objective: classifying analytical models for productive heterogeneous performance prediction , 2014, The Journal of Supercomputing.
[7] George Ho,et al. PAPI: A Portable Interface to Hardware Performance Counters , 1999 .
[8] Vivek K. Pallipuram,et al. A regression‐based performance prediction framework for synchronous iterative algorithms on general purpose graphical processing unit clusters , 2014, Concurr. Comput. Pract. Exp..
[9] Jeffrey S. Vetter,et al. Aspen: A domain specific language for performance modeling , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] John C. Gyllenhaal,et al. A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization , 1999, ISCA.
[11] Olivia R. Liu Sheng,et al. Examining the Technology Acceptance Model Using Physician Acceptance of Telemedicine Technology , 1999, J. Manag. Inf. Syst..
[12] Gerhard Wellein,et al. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.
[13] Alan Stewart. A programming model for BSP with partitioned synchronisation , 2010, Formal Aspects of Computing.
[14] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009 .
[15] Torsten Hoefler,et al. Using Compiler Techniques to Improve Automatic Performance Modeling , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[16] Mark A. Taylor,et al. CAM-SE: A scalable spectral element dynamical core for the Community Atmosphere Model , 2012, Int. J. High Perform. Comput. Appl..
[17] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[18] John M. Levesque,et al. Practical performance portability in the Parallel Ocean Program (POP) , 2005, Concurr. Pract. Exp..
[19] Nathan R. Tallent,et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs , 2010, Concurr. Comput. Pract. Exp..
[20] Thomas Willhalm,et al. Quantifying the Performance Impact of Memory Latency and Bandwidth for Big Data Workloads , 2015, 2015 IEEE International Symposium on Workload Characterization.
[21] Dirk Schmidl,et al. Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir , 2011, Parallel Tools Workshop.
[22] Juan Touriño,et al. XARK: An extensible framework for automatic recognition of computational kernels , 2008, TOPL.
[23] Jack Doweck,et al. Inside Intel® Core microarchitecture , 2006, 2006 IEEE Hot Chips 18 Symposium (HCS).
[24] Torsten Hoefler,et al. PEMOGEN: Automatic adaptive performance modeling during program runtime , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[25] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[26] Xu Ji,et al. CESMTuner: An Auto-tuning Framework for the Community Earth System Model , 2014, 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS).
[27] Qiang Wu,et al. Evaluating Sampling Based Hotspot Detection , 2009, ARCS.
[28] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..
[29] Prasanna Balaprakash,et al. AutoMOMML: Automatic Multi-objective Modeling with Machine Learning , 2016, ISC.
[30] Torsten Hoefler,et al. Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).
[31] Kirk W. Cameron,et al. MuMMI: Multiple Metrics Modeling Infrastructure , 2013, 2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing.
[32] Allen D. Malony,et al. Overhead Compensation in Performance Profiling , 2004, Parallel Process. Lett..
[33] Martin Schulz,et al. A regression-based approach to scalability prediction , 2008, ICS '08.
[34] Gerhard Wellein,et al. LIKWID: Lightweight Performance Tools , 2011, CHPC.
[35] Martin Schulz,et al. Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[36] David Black-Schaffer,et al. Micro-architecture independent analytical processor performance and power modeling , 2015, 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[37] Mark A. Taylor,et al. Performance of the Community Earth System Model , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[38] Bernd Mohr,et al. The Scalasca performance toolset architecture , 2010, Concurr. Comput. Pract. Exp..
[39] Matthias Hauswirth,et al. Accuracy of performance counter measurements , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[40] Rushil Anirudh,et al. Performance Modeling under Resource Constraints Using Deep Transfer Learning , 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[41] Kuo-Chan Huang,et al. An Improved Model for Predicting HPL Performance , 2007, GPC.
[42] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[43] Sally A. McKee,et al. Methods of inference and learning for performance modeling of parallel applications , 2007, PPoPP.
[44] Michael M. Resch,et al. Towards Efficient Execution of MPI Applications on the Grid: Porting and Optimization Issues , 2003, Journal of Grid Computing.
[45] Sathish S. Vadhiyar,et al. Matching Application Signatures for Performance Predictions Using a Single Execution , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[46] Andrzej Nowak,et al. The overhead of profiling using PMU hardware counters , 2014 .