Analytical Performance Modeling and Validation of Intel's Xeon Phi Architecture

Modeling the performance of scientific applications on emerging hardware plays a central role in achieving extreme-scale computing goals. Analytical models that capture the interaction between application and hardware characteristics are attractive because even a reasonably accurate model can guide performance tuning before the hardware becomes available. In this paper, we develop a hardware model of Intel's second-generation Xeon Phi architecture, code-named Knights Landing (KNL), for the SKOPE framework. We validate the KNL hardware model by projecting the performance of minibenchmarks and application kernels. The results show that our KNL model projects performance with prediction errors of 10% to 20%. The hardware model also provides informative recommendations for code transformations and tuning.
