Automatic loop kernel analysis and performance modeling with Kerncraft
Gerhard Wellein | Georg Hager | Jan Eitzinger | Julian Hammer
[1] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[2] Roger W. Hockney, et al. F1/2: a Parameter to Characterize Memory and Communication Bottlenecks, 1989, Parallel Comput.
[3] William Jalby, et al. MAQAO: Modular Assembler Quality Analyzer and Optimizer for Itanium 2, 2005.
[4] Gerhard Wellein, et al. Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering, 2012, Euro-Par Workshops.
[5] William Kahan, et al. Pracniques: further remarks on reducing truncation errors, 1965, CACM.
[6] Samuel Williams, et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[7] Gerhard Wellein, et al. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010, 2010 39th International Conference on Parallel Processing Workshops.
[8] Ken Kennedy, et al. Estimating Interlock and Improving Balance for Pipelined Architectures, 1988, J. Parallel Distributed Comput.
[9] Gerhard Wellein, et al. LIKWID: Lightweight Performance Tools, 2011, CHPC.
[10] Gerhard Wellein, et al. Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model, 2014, ICS.
[11] Samuel Williams, et al. ExaSAT: An exascale co-design tool for performance modeling, 2015, Int. J. High Perform. Comput. Appl.
[12] Gerhard Wellein, et al. Exploring performance and power properties of modern multi‐core chips via simple machine models, 2012, Concurr. Comput. Pract. Exp.
[13] Gerhard Wellein, et al. likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes, 2011, Parallel Tools Workshop.
[14] Dietmar Fey, et al. Execution-Cache-Memory Performance Model: Introduction and Validation, 2015, ArXiv.
[15] H. T. Kung. Memory requirements for balanced computer architectures, 1986, ISCA '86.
[16] Samuel Williams, et al. Roofline Model Toolkit: A Practical Tool for Architectural and Program Analysis, 2014, PMBS@SC.
[17] Thomas Ilsche, et al. An Energy Efficiency Feature Survey of the Intel Haswell Processor, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[18] Paul D. Hovland, et al. Generating Performance Bounds from Source Code, 2010, 2010 39th International Conference on Parallel Processing Workshops.
[19] Gerhard Wellein, et al. Performance Analysis of the Kahan-Enhanced Scalar Product on Current Multicore Processors, 2015, PPAM.