Eco-Efficient Resource Management in HPC Clusters through Computer Intelligence Techniques

High Performance Computing Clusters (HPCCs) are common platforms for solving both current challenges and high-dimensional problems faced by IT service providers. However, operating HPCCs carries a substantial and growing economic and environmental impact, owing to the large amount of energy they consume. In this paper, a two-stage holistic optimisation mechanism is proposed to manage HPCCs in an eco-efficient manner. The first stage logically optimises the resources of the HPCC through reactive and proactive strategies, while the second stage optimises hardware allocation by leveraging a genetic fuzzy system tailored to the underlying equipment. The model uses multiobjective evolutionary algorithms to find optimal trade-offs among quality of service, direct and indirect operating costs, and environmental impact that meet the administrator's preferences. Experiments were conducted using both actual workloads from the Scientific Modelling Cluster of the University of Oviedo and synthetically generated workloads, and they provide statistical evidence supporting the adoption of the new mechanism.
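The abstract does not detail how a single configuration is chosen from the trade-offs the evolutionary search produces. Below is a minimal sketch of the general idea, not the authors' implementation. It makes several assumptions. Each candidate cluster-management configuration is scored on three hypothetical objectives, all minimised: a QoS penalty, operating cost and CO2 emissions. The Pareto dominance test keeps only the non-dominated trade-offs. Multiobjective evolutionary algorithms such as NSGA-II rank their populations with the same test, but the evolutionary search itself is omitted here. Finally, a hypothetical normalised weighted sum stands in for the administrator's preferences. All names, numbers and weights are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector, all minimised: (qos_penalty, operating_cost, co2_kg).
Objectives = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep the candidates that no other candidate dominates (the trade-off set)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def pick_by_preference(front: List[Objectives], weights: Sequence[float]) -> Objectives:
    """Assumed preference model: min-max normalise each objective over the front,
    then return the candidate with the lowest weighted sum."""
    n = len(weights)
    lows = [min(c[i] for c in front) for i in range(n)]
    highs = [max(c[i] for c in front) for i in range(n)]

    def score(c: Objectives) -> float:
        total = 0.0
        for i, w in enumerate(weights):
            span = highs[i] - lows[i]
            # An objective that is constant across the front contributes nothing.
            total += w * ((c[i] - lows[i]) / span if span > 0 else 0.0)
        return total

    return min(front, key=score)


# Illustrative (made-up) candidate configurations.
candidates: List[Objectives] = [
    (0.10, 120.0, 40.0),  # keep every node powered on: best QoS, highest cost and emissions
    (0.30, 80.0, 25.0),   # moderate node shutdown
    (0.60, 60.0, 18.0),   # aggressive node shutdown
    (0.35, 90.0, 30.0),   # dominated by the moderate-shutdown configuration
]

front = pareto_front(candidates)
green_choice = pick_by_preference(front, (0.2, 0.4, 0.4))  # cost/environment-focused admin
qos_choice = pick_by_preference(front, (0.8, 0.1, 0.1))    # QoS-focused admin
print(front, green_choice, qos_choice)
```

With these made-up numbers, the fourth configuration drops out of the front. The cost- and environment-focused weights select the aggressive-shutdown configuration, and the QoS-focused weights select the configuration that keeps every node on. The weighted sum is only one simple way to encode preferences. Other rules could be used in its place, such as constraints on a minimum QoS level.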
