Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors
[1] Gerhard Wellein,et al. Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels , 2017, ArXiv.
[2] Ronald N. Kalla,et al. IBM Power9 Processor Architecture , 2017, IEEE Micro.
[3] Frederico Pratas,et al. Cache-aware Roofline model: Upgrading the loft , 2014, IEEE Computer Architecture Letters.
[4] Gerhard Wellein,et al. Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECT , 2019, Supercomput. Front. Innov..
[5] Gerhard Wellein,et al. Analytic performance modeling and analysis of detailed neuron simulations , 2019, Int. J. High Perform. Comput. Appl..
[6] Georg Hager,et al. On the accuracy and usefulness of analytic energy models for contemporary multicore processors , 2018, ISC.
[7] Gerhard Wellein,et al. Chip‐level and multi‐node analysis of energy‐optimized lattice Boltzmann CFD simulations , 2016, Concurr. Comput. Pract. Exp..
[8] Gerhard Wellein,et al. Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model , 2014, ICS.
[9] Sadaf R. Alam,et al. An Exploration of Performance Attributes for Symbolic Modeling of Emerging Processing Devices , 2007, HPCC.
[10] Rainald Löhner,et al. Practical applicability of optimizations and performance models to complex stencil-based loop kernels in CFD , 2019, Int. J. High Perform. Comput. Appl..
[11] Gerhard Wellein,et al. Introduction to High Performance Computing for Scientists and Engineers , 2010, Chapman and Hall / CRC computational science series.
[12] Dietmar Fey,et al. An ECM-based Energy-Efficiency Optimization Approach for Bandwidth-Limited Streaming Kernels on Recent Intel Xeon Processors , 2016, 2016 4th International Workshop on Energy Efficient Supercomputing (E2SC).
[13] Barbara I. Wohlmuth,et al. Performance and Scalability of Hierarchical Hybrid Multigrid Solvers for Stokes Systems , 2015, SIAM J. Sci. Comput..
[14] Jack J. Dongarra,et al. Collecting Performance Data with PAPI-C , 2009, Parallel Tools Workshop.
[15] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..
[16] Georg Hager,et al. Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations , 2016.
[17] Georg Ofenbeck,et al. Applying the roofline model , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[18] Roger W. Hockney,et al. F1/2: a Parameter to Characterize Memory and Communication Bottlenecks , 1989, Parallel Comput..
[19] Ananta Tiwari,et al. Understanding the performance of stencil computations on Intel's Xeon Phi , 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).
[20] Gerhard Wellein,et al. Exploring performance and power properties of modern multi‐core chips via simple machine models , 2012, Concurr. Comput. Pract. Exp..
[21] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[22] Thomas Rauber,et al. Applicability of the ECM Performance Model to Explicit ODE Methods on Current Multi-core Processors , 2018, ISC.