Modeling and Predicting Power Consumption of High Performance Computing Jobs

Power is becoming an increasingly important concern for large supercomputing centers. Due to cost concerns, data centers are limited in their ability to expand their power infrastructure to support increased compute power on the machine-room floor. At Los Alamos National Laboratory it is projected that future-generation supercomputers will be power-limited rather than budget-limited. That is, it will be less costly to acquire a large number of nodes than it will be to upgrade an existing data-center and machine-room power infrastructure to run that large number of nodes at full power. That said, it is often the case that more power infrastructure is allocated to existing supercomputers than these machines typically draw. In the power-limited systems of the future, machines will in principle be capable of drawing more power than they have available. Thus, power capping at the node/job level must be used to ensure the total system power draw remains below the available level. In this paper, we present a statistically grounded framework with which to predict (with uncertainty) how much power a given job will need and use these predictions to provide an optimal node-level power capping strategy. We model the power drawn by a given job (and subsequently by the entire machine) using hierarchical Bayesian modeling with hidden Markov and Dirichlet process models. We then demonstrate how this model can be used within a power-management scheme to minimize the effect of power capping.
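
To make the modeling idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the two ingredients named in the abstract: a truncated stick-breaking (Dirichlet process) prior over hidden power states, and a simple "sticky" hidden Markov chain over those states used to simulate plausible per-job power traces. The traces are then used to pick a quantile-based node-level cap instead of a worst-case allocation. All function names, state counts, wattages, and quantile choices here are hypothetical and chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, num_states, rng):
    """Draw state weights from a truncated stick-breaking (GEM(alpha)) construction."""
    betas = rng.beta(1.0, alpha, size=num_states)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    return weights / weights.sum()  # renormalize the truncated weights

def simulate_job_power(num_steps, num_states, alpha, rng):
    """Simulate one job's power trace from an HMM whose state weights have a DP-style prior."""
    weights = stick_breaking_weights(alpha, num_states, rng)
    # Hypothetical per-state mean node power (watts) and observation noise.
    state_means = rng.uniform(150.0, 350.0, size=num_states)
    # "Sticky" transitions: mostly remain in the current state, otherwise resample from weights.
    trans = 0.9 * np.eye(num_states) + 0.1 * np.tile(weights, (num_states, 1))
    trans /= trans.sum(axis=1, keepdims=True)
    states = np.zeros(num_steps, dtype=int)
    states[0] = rng.choice(num_states, p=weights)
    for t in range(1, num_steps):
        states[t] = rng.choice(num_states, p=trans[states[t - 1]])
    return state_means[states] + rng.normal(0.0, 5.0, size=num_steps)

# Simulate many plausible traces for a job and cap at a high quantile of the peak draw,
# so the cap is rarely exceeded while avoiding a worst-case (TDP) power allocation.
traces = np.array([simulate_job_power(500, 20, alpha=2.0, rng=rng) for _ in range(200)])
cap = np.quantile(traces.max(axis=1), 0.95)
print(f"Suggested per-node power cap: {cap:.1f} W")

In the paper's framework the corresponding quantities are learned from measured power data rather than simulated from fixed constants; the sketch only shows how a DP-style prior over power states and a Markov chain over those states combine to yield a predictive distribution from which a cap can be read off.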
