Subjective versus objective: classifying analytical models for productive heterogeneous performance prediction

Heterogeneous analytical models are valuable tools that facilitate optimal application tuning via runtime prediction; however, they require several man-hours of effort to understand and apply for meaningful performance prediction. Consequently, developers face the challenge of selecting adequate performance models that best fit their design goals and level of system knowledge. In this research, we present a classification that enables users to select a set of easy-to-use and reliable analytical models for high-quality performance prediction. These models, which target general-purpose graphics processing unit (GPGPU)-based systems, fall into two primary analytical classes: subjective-analytical and objective-analytical. Subjective-analytical models predict an application's computation and communication components by describing the system with a minimal set of qualitative relations among system parameters, whereas objective-analytical models predict these components by measuring pertinent hardware events with micro-benchmarks. We categorize, enhance, and characterize existing analytical models for GPGPU computation and for network-level and interconnect communication to enable fast and reliable application performance prediction. We also explore a hybrid approach, a suitable combination of the two analytical classes, for high-quality performance prediction, and report prediction accuracy of up to 95% across several tested GPGPU cluster configurations. Ultimately, this research aims to provide a collection of easy-to-select analytical models that support straightforward and accurate performance prediction before large-scale implementation.
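The abstract does not spell out the models' equations, so the following is only a minimal sketch of how the two classes might combine in a hybrid prediction. It assumes that the runtime of one compute-then-communicate step is T_pred = T_comp + T_comm with no overlap between the two. The compute term uses a roofline-style bound built from nominal parameters (standing in for a subjective-analytical model), and the communication term uses a simple latency-bandwidth model whose parameters would come from micro-benchmarks (standing in for an objective-analytical model). The names `SubjectiveGPU`, `ObjectiveNetwork`, `hybrid_prediction`, and `accuracy_percent`, the accuracy definition, and all numeric values are hypothetical illustrations, not the models or data of this work.

```python
# Hypothetical sketch of a hybrid runtime prediction: T_pred = T_comp + T_comm.
# Class/function names and both model forms are illustrative assumptions,
# not the analytical models classified in this work.
from dataclasses import dataclass


@dataclass
class SubjectiveGPU:
    """Subjective-style compute model: uses only nominal (data-sheet)
    parameters and a qualitative relation between them, no measurement."""
    peak_flops: float     # assumed peak arithmetic rate, FLOP/s
    mem_bandwidth: float  # assumed peak device-memory bandwidth, bytes/s

    def compute_time(self, flops: float, bytes_moved: float) -> float:
        # Roofline-style bound (a stand-in): the kernel is limited by
        # whichever resource, arithmetic or memory traffic, takes longer.
        return max(flops / self.peak_flops, bytes_moved / self.mem_bandwidth)


@dataclass
class ObjectiveNetwork:
    """Objective-style communication model: parameters are meant to come from
    micro-benchmark measurements (e.g. a ping-pong test), not a spec sheet."""
    latency: float           # measured per-message latency, s (assumed)
    seconds_per_byte: float  # measured inverse bandwidth, s/byte (assumed)

    def comm_time(self, message_bytes: float) -> float:
        # Simple latency-bandwidth model: fixed startup cost plus per-byte cost.
        return self.latency + message_bytes * self.seconds_per_byte


def hybrid_prediction(gpu: SubjectiveGPU, net: ObjectiveNetwork,
                      flops: float, bytes_moved: float,
                      message_bytes: float) -> float:
    """Predicted time for one compute-then-communicate step, assuming no
    overlap between GPU computation and network transfer."""
    return gpu.compute_time(flops, bytes_moved) + net.comm_time(message_bytes)


def accuracy_percent(predicted: float, measured: float) -> float:
    """Accuracy as 100 * (1 - relative error); one common definition, assumed."""
    if measured <= 0:
        raise ValueError("measured time must be positive")
    return 100.0 * (1.0 - abs(predicted - measured) / measured)


if __name__ == "__main__":
    # Illustrative values only: 1 TFLOP/s, 100 GB/s, 2 us latency, 1 GB/s link.
    gpu = SubjectiveGPU(peak_flops=1e12, mem_bandwidth=1e11)
    net = ObjectiveNetwork(latency=2e-6, seconds_per_byte=1e-9)
    t = hybrid_prediction(gpu, net, flops=2e9, bytes_moved=4e8, message_bytes=1e6)
    print(f"predicted step time: {t * 1e3:.3f} ms")
```

With these made-up inputs the kernel is memory-bound (4 ms for memory traffic versus 2 ms for arithmetic), and the 1 MB message adds about 1 ms, giving roughly 5 ms per step. Comparing such a prediction against a measured time with `accuracy_percent` is one way to express figures like the accuracy reported above, under the assumed relative-error definition.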
