Evaluating the Intel Skylake Xeon Processor for HPC Workloads

Despite significant advances in porting scientific applications to novel architectures such as compute-optimized graphics processors, many-core processors and accelerators, and even special-purpose function units, the vast majority of scientific calculations are still performed on high-performance, commodity server processors. Even for applications that have been ported to new architectures, frequent serial sections still require strong server-class processor cores to run as fast as possible. In this paper we report on a set of benchmark studies that evaluate Intel's latest Skylake Xeon server processor. Skylake represents a significant change in the Xeon product line, with wider SIMD vector units, a redesigned cache architecture, and an increased number of memory channels. The wider vector units provide a 2x improvement for some compute-intensive applications, and the combined memory changes can provide close to 2x the memory bandwidth. We evaluate these new hardware features on several HPC-relevant mini-applications and benchmarks, including STREAM, LULESH, XSBench, HPCG, and SW4Lite. Together, the new hardware features provide up to a 1.8x speedup on HPC benchmark codes compared with the previous-generation Haswell processor core, providing much greater utility to the broad range of HPC applications that rely on this class of compute node.
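
The "2x" and "close to 2x" figures above can be sanity-checked with simple peak-rate arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the abstract: a Haswell socket with 4 channels of DDR4-2133 and 256-bit vectors, and a Skylake socket with 6 channels of DDR4-2666 and 512-bit vectors. The function names are hypothetical, and the results are theoretical peaks rather than measured values such as those STREAM would report.

```python
# Back-of-the-envelope peak ratios. All hardware parameters below are
# ASSUMED typical configurations, not figures taken from the paper.

def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth per socket in GB/s.

    Each DDR4 channel is 64 bits (8 bytes) wide, so the peak is
    channels * transfers/s * 8 bytes.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


def simd_lanes(vector_bits, element_bits=64):
    """Number of elements (double precision by default) per vector register."""
    return vector_bits // element_bits


# Assumed configurations (hypothetical, for illustration).
haswell_bw = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=2133)
skylake_bw = peak_bandwidth_gbs(channels=6, mega_transfers_per_s=2666)
bandwidth_ratio = skylake_bw / haswell_bw

haswell_lanes = simd_lanes(256)  # 256-bit vectors: 4 doubles
skylake_lanes = simd_lanes(512)  # 512-bit vectors: 8 doubles
simd_ratio = skylake_lanes / haswell_lanes

if __name__ == "__main__":
    print(f"Peak bandwidth: {haswell_bw:.1f} -> {skylake_bw:.1f} GB/s "
          f"(ratio {bandwidth_ratio:.2f}x)")
    print(f"Doubles per vector: {haswell_lanes} -> {skylake_lanes} "
          f"(ratio {simd_ratio:.1f}x)")
```

Under these assumptions the vector width doubles exactly, while peak memory bandwidth rises by somewhat less than 2x. Both are upper bounds, which is consistent with application-level speedups that fall below them.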
