Memory Bandwidth Prediction for HPC Applications in NUMA Architecture

This paper proposes a statistical method for memory bandwidth prediction on NUMA architectures. Memory bandwidth is expressed as the total transferred data divided by the execution time. We first split memory bandwidth into components and measure data for each component separately. We then predict memory bandwidth for unknown configurations based on these component measurements. Since memory bandwidth usage varies with the problem size and the number of processors, we use these two values as parameters and perform the estimation in both dimensions, which lets us predict memory bandwidth usage for unseen problem sizes and processor counts (see the sketch below). The prediction method requires only a few data points and works dynamically without any source code inspection. We verified the approach on a NUMA architecture with several regular and irregular high performance computing applications. The values for transferred data and elapsed time are analyzed for increasing input sizes and growing numbers of processors, and conclusions on memory bandwidth usage are derived from these data.
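To make the estimation step concrete, the following minimal sketch fits the two bandwidth components, transferred data D(n, p) and execution time T(n, p), as simple surfaces over problem size n and processor count p, and then predicts bandwidth B = D / T for an unmeasured (n, p) pair. The bilinear model form, the function names, and the sample measurements are illustrative assumptions, not the paper's exact statistical model.

```python
# Hedged sketch: predict memory bandwidth B(n, p) = D(n, p) / T(n, p)
# from a few measured points by fitting simple surfaces in the two
# dimensions (problem size n, processor count p). Model form, names,
# and numbers are illustrative assumptions, not the paper's method.
import numpy as np

def fit_surface(n, p, y):
    """Fit y ~ c0 + c1*n + c2*p + c3*n*p with ordinary least squares."""
    A = np.column_stack([np.ones_like(n), n, p, n * p])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def predict(coeffs, n, p):
    """Evaluate the fitted surface at a new (n, p) point."""
    return coeffs[0] + coeffs[1] * n + coeffs[2] * p + coeffs[3] * n * p

# A few measured points (placeholder values): problem size, processor
# count, total transferred bytes, and elapsed seconds per run.
n    = np.array([1e6, 2e6, 4e6, 1e6, 2e6, 4e6])
p    = np.array([2.0, 2.0, 2.0, 8.0, 8.0, 8.0])
data = np.array([8e9, 16e9, 32e9, 8.4e9, 16.8e9, 33.6e9])  # bytes moved
time = np.array([4.0, 8.1, 16.3, 1.2, 2.4, 4.9])           # seconds

c_data = fit_surface(n, p, data)
c_time = fit_surface(n, p, time)

# Predict bandwidth (GB/s) for an unmeasured configuration.
n_new, p_new = 3e6, 16.0
bw = predict(c_data, n_new, p_new) / predict(c_time, n_new, p_new) / 1e9
print(f"predicted bandwidth ~ {bw:.1f} GB/s")
```

Fitting D and T separately, rather than B directly, mirrors the component-wise measurement described above; only a handful of runs at small sizes and processor counts are needed to obtain the coefficients.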
