Energy efficient communications in quantum chemistry applications

Designers of modern supercomputing platforms are increasingly aware of the operational costs and reliability issues that arise from the high power consumption of such systems. At the same time, developers of high-performance applications are taking proactive steps to reduce energy consumption without a significant loss in performance. One way to save energy during application execution is to change the processor frequency dynamically when the processor is not busy, such as during certain communication stages. Previously, the authors proposed a runtime procedure that identifies communication phases in parallel applications so that frequency scaling can be applied efficiently and with little overhead. The present work applies this phase detection procedure to parallel electronic structure calculations performed by the widely used GAMESS package. The high computational intensity of these calculations and the GAMESS communication model, which distinguishes between computation and communication processes, motivated the investigations in this paper. These investigations have led to several insights into the role of process-core mapping in the application of dynamic frequency scaling during communications.
