Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL

The energy consumed by modern supercomputing systems continues to grow at an alarming rate. The Message Passing Interface (MPI) has been the de facto programming model for parallel applications, and MPI libraries have been designed to achieve the best communication performance on modern architectures. However, the performance and energy trade-offs of these designs have not been studied. Hence, it is critical to understand the energy-consumption characteristics of MPI routines and the performance-energy trade-offs of the various protocols and designs used in MPI libraries. The first hurdle in achieving this objective is to design a framework that can measure the energy consumed by various components during communication operations. The RAPL interface allows users to measure energy across several domains of the Intel Sandy Bridge processor in a low-overhead, non-intrusive manner. However, this interface has certain limitations and cannot be used directly to obtain fine-grained energy profiles of MPI operations. In this paper, we propose a novel methodology to address these limitations. We propose a new shared-memory-window-based solution to accurately measure the aggregate energy consumed by all processes engaged in MPI operations. Using the proposed framework, we demonstrate the impact of various communication protocols and progress mechanisms on energy consumption. Our evaluations demonstrate that kernel-based solutions can lead to lower energy consumption for intra-node communication operations. Further, our framework reveals possible energy bottlenecks in scaling important collective operations such as MPI_Allreduce. In addition, we use the proposed framework to study the energy-consumption characteristics of MPI calls in the NAS IS benchmark and find that the choice of progress mechanism can yield about 6% processor energy savings.
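
The paper's measurement harness is not reproduced here, but the following minimal C sketch illustrates the two ingredients the abstract describes: reading the Sandy Bridge package-energy counter through the Linux msr driver, and publishing per-socket deltas in an MPI-3 shared-memory window so they can be aggregated per node. The MSR addresses and bit layouts follow Intel's documented RAPL interface; the harness structure (a single node leader, barriers around the measured operation, a single-socket node) is an illustrative assumption, not the authors' actual design.

    /* Hypothetical sketch, not the authors' framework. Assumes the msr
     * kernel module is loaded and the process may read /dev/cpu/N/msr
     * (typically requires root). MSR addresses are Intel Sandy Bridge. */
    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <mpi.h>

    #define MSR_RAPL_POWER_UNIT   0x606
    #define MSR_PKG_ENERGY_STATUS 0x611

    static uint64_t read_msr(int fd, uint32_t reg) {
        uint64_t value = 0;
        pread(fd, &value, sizeof(value), reg);  /* MSRs are read at their address offset */
        return value;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        /* Group the processes that share this node, so one leader per
         * package can read the socket-wide RAPL counter. */
        MPI_Comm node_comm;
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);
        int node_rank;
        MPI_Comm_rank(node_comm, &node_rank);

        /* One shared slot per node process; leaders deposit their package's
         * energy delta, and any rank can later sum across the window. */
        double *my_slot;
        MPI_Win win;
        MPI_Win_allocate_shared(sizeof(double), sizeof(double), MPI_INFO_NULL,
                                node_comm, &my_slot, &win);
        *my_slot = 0.0;

        int fd = -1;
        double energy_unit = 0.0;
        uint64_t e_before = 0;
        if (node_rank == 0) {  /* single-socket node assumed for brevity */
            fd = open("/dev/cpu/0/msr", O_RDONLY);
            uint64_t units = read_msr(fd, MSR_RAPL_POWER_UNIT);
            /* Energy status unit lives in bits 12:8; counts are 1/2^ESU J. */
            energy_unit = 1.0 / (double)(1 << ((units >> 8) & 0x1F));
            e_before = read_msr(fd, MSR_PKG_ENERGY_STATUS) & 0xFFFFFFFF;
        }

        MPI_Barrier(node_comm);
        /* ... MPI operation under measurement, e.g. an MPI_Allreduce ... */
        MPI_Barrier(node_comm);

        if (node_rank == 0) {
            uint64_t e_after = read_msr(fd, MSR_PKG_ENERGY_STATUS) & 0xFFFFFFFF;
            /* The 32-bit counter wraps; a real harness must handle wraparound. */
            *my_slot = (double)(e_after - e_before) * energy_unit;
            close(fd);
            printf("package energy: %f J\n", *my_slot);
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

On a multi-socket node, one leader per package would open the corresponding /dev/cpu/N/msr device and write into its own window slot; summing the slots then yields the node-level aggregate, which is the property the abstract's shared-memory-window design targets.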
