Evaluation of memory performance in NUMA architectures using Stochastic Reward Nets

Abstract: Understanding memory performance on multi-core platforms is a prerequisite for optimizing it. To this end, this paper presents analytical models based on Stochastic Reward Nets (SRNs) that model and evaluate the memory performance of Non-Uniform Memory Access (NUMA) multi-core architectures. The approach takes the details of the architecture into account and first proposes a monolithic SRN model that evaluates memory performance in terms of the mean memory response time. Because the monolithic model suffers from state space explosion as the number of cores and memory controllers grows, two approximate models are presented that can evaluate large-scale NUMA architectures. The SRNs are validated through measurements on two NUMA multi-core platforms: a 64-core AMD Opteron server and a 72-core Intel system. The results demonstrate that the proposed models accurately compute the mean memory response time on NUMA architectures. They also provide information that runtime systems and application designers can use to optimize the execution of parallel applications on such architectures.
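
As a rough illustration of the metric named in the abstract, the sketch below computes a mean memory response time for a far simpler model than the paper's SRNs. It is not the authors' model. It assumes a single memory controller with exponentially distributed service times, shared by a fixed number of cores that each issue one request at a time after an exponentially distributed compute phase. This is a birth-death Markov chain of the kind an SRN is ultimately solved as, and the response time follows from Little's law. The function name and all parameters are hypothetical.

```python
def mean_memory_response_time(n_cores, request_rate, service_rate):
    """Mean response time of one memory controller shared by n_cores cores.

    Illustrative assumptions (not taken from the paper):
      - each core issues a request at rate `request_rate` while computing,
        then stalls until the request is served;
      - the controller serves one request at a time at rate `service_rate`;
      - n_cores >= 1 and both rates are positive.
    """
    # State k = number of outstanding requests at the controller (0..n_cores).
    # Birth rate from state k is (n_cores - k) * request_rate; the death rate
    # is service_rate. Build unnormalized steady-state weights from the
    # balance equations of the birth-death chain.
    weights = [1.0]
    for k in range(n_cores):
        weights.append(weights[-1] * (n_cores - k) * request_rate / service_rate)

    total = sum(weights)
    probs = [w / total for w in weights]

    # Mean number of outstanding requests and controller throughput.
    mean_outstanding = sum(k * p for k, p in enumerate(probs))
    throughput = service_rate * (1.0 - probs[0])

    # Little's law: response time = mean population / throughput.
    return mean_outstanding / throughput


if __name__ == "__main__":
    for cores in (1, 4, 16, 64):
        r = mean_memory_response_time(cores, request_rate=0.5, service_rate=2.0)
        print(f"{cores:3d} cores: mean response time = {r:.3f}")
```

With a single core there is no contention, so the result equals the mean service time `1 / service_rate`; adding cores increases queueing at the controller and the response time grows. The state space of this toy chain is linear in the number of cores, whereas a model that tracks several controllers and access paths jointly grows much faster, which is the state space explosion the abstract refers to.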
