Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors
Gerhard Wellein | Holger Fehske | Johannes Hofmann | Georg Hager | Christie L. Alappat | Alan R. Bishop
[1] Gerhard Wellein, et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization, 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[2] Gerhard Wellein, et al. A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication, 2019, ACM Trans. Parallel Comput.
[3] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.
[4] Courtenay T. Vaughan, et al. Evaluating the Intel Skylake Xeon Processor for HPC Workloads, 2018, 2018 International Conference on High Performance Computing & Simulation (HPCS).
[5] Gerhard Wellein, et al. High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations, 2015, J. Comput. Phys.
[6] Gerhard Wellein, et al. Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs, 2020, ISC.
[7] Subhash Saini, et al. Performance Evaluation of an Intel Haswell- and Ivy Bridge-Based Supercomputer Using Scientific and Engineering Applications, 2016, 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[8] Aamer Jaleel, et al. Adaptive insertion policies for high performance caching, 2007, ISCA '07.
[9] Gerhard Wellein, et al. likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes, 2011, Parallel Tools Workshop.
[10] Constantine Bekas, et al. Stochastic Matrix-Function Estimators: Scalable Big-Data Kernels with High Performance, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[11] Carl Staelin, et al. lmbench: Portable Tools for Performance Analysis, 1996, USENIX Annual Technical Conference.
[12] Georg Hager, et al. On the accuracy and usefulness of analytic energy models for contemporary multicore processors, 2018, ISC.
[13] C. W. Glass, et al. Performance Modeling of the HPCG Benchmark, 2014, PMBS@SC.
[14] David E. Keyes, et al. Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations, 2015, ACM Trans. Parallel Comput.
[15] Subhash Saini, et al. Performance Evaluation of Intel Broadwell Nodes Based Supercomputer Using Computational Fluid Dynamics and Climate Applications, 2017, 2017 IEEE 19th International Conference on High Performance Computing and Communications Workshops (HPCCWS).
[16] C. T. Vaughan, et al. Evaluating the Marvell ThunderX2 Server Processor for HPC Workloads, 2019, 2019 International Conference on High Performance Computing & Simulation (HPCS).
[17] A. Y. Suhov. An Accurate Polynomial Approximation of Exponential Integrators, 2014, J. Sci. Comput.
[18] Robert Schöne, et al. Main memory and cache performance of Intel Sandy Bridge and AMD Bulldozer, 2014, MSPC@PLDI.
[19] Simon McIntosh-Smith, et al. A performance analysis of the first generation of HPC-optimized Arm processors, 2019, Concurr. Comput. Pract. Exp.
[20] Gerhard Wellein, et al. An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors, 2017, ISC.
[21] Gerhard Wellein, et al. Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks, 2016, ARCS.