Predicting Job Power Consumption Based on RJMS Submission Data in HPC Systems

Power-aware scheduling is a promising solution for monitoring the electrical power consumption of High-Performance Computing (HPC) facilities. Such a solution needs a reliable estimate of each job's power consumption, supplied to the Resources and Jobs Management System (RJMS) at submission time. In practice, the data available for inference is restricted because much of it is unavailable or even untrustworthy. In this work we propose an instance-based model that uses only the submission logs and user-provided job data. The GID and the number of tasks per node appear to be good features for predicting a job's average power consumption. Moreover, we extend this model to a production context with online computation, using instance re-weighting to make a practical global power prediction from job submission data. The performance of the online model is excellent on COBALT's data. Without any doubt, this model will be a good candidate for achieving consistent power-aware scheduling at other computing centers with similarly informative inputs.
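To make the idea of an online instance-based predictor more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that past jobs are grouped by the pair (GID, tasks per node), that the prediction is a weighted mean of the average power observed for jobs in the same group, and that re-weighting is an exponential decay applied to older instances. The class name `OnlinePowerPredictor`, the `decay` parameter, the fallback to a global mean, and the choice of watts as the unit are all hypothetical.

```python
class OnlinePowerPredictor:
    """Hypothetical online predictor keyed on (gid, tasks_per_node).

    Assumption: older jobs are down-weighted by a constant decay factor,
    so each prediction is an exponentially weighted mean of past
    observations of average job power (here in watts).
    """

    def __init__(self, decay=0.9, default_watts=0.0):
        self.decay = decay                  # weight multiplier applied to older instances
        self.default_watts = default_watts  # returned when nothing has been observed yet
        self._sums = {}                     # key -> decayed sum of observed power
        self._weights = {}                  # key -> decayed sum of instance weights
        self._global_sum = 0.0
        self._global_weight = 0.0

    def predict(self, gid, tasks_per_node):
        """Return the predicted average power for a job at submission time."""
        key = (gid, tasks_per_node)
        weight = self._weights.get(key, 0.0)
        if weight > 0.0:
            return self._sums[key] / weight
        if self._global_weight > 0.0:
            # Unseen (gid, tasks_per_node) pair: fall back to the global mean.
            return self._global_sum / self._global_weight
        return self.default_watts

    def update(self, gid, tasks_per_node, measured_watts):
        """Incorporate the measured average power of a finished job."""
        key = (gid, tasks_per_node)
        # Decay the existing instances of this group, then add the new one with weight 1.
        self._sums[key] = self.decay * self._sums.get(key, 0.0) + measured_watts
        self._weights[key] = self.decay * self._weights.get(key, 0.0) + 1.0
        self._global_sum = self.decay * self._global_sum + measured_watts
        self._global_weight = self.decay * self._global_weight + 1.0
```

In such a scheme, `predict` would be called when a job is submitted and `update` once its measured average power becomes available, so the model tracks changes in a group's behavior without retraining from scratch.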
