An Energy Manager for High Performance Computer Clusters

This paper presents a general energy management system for HPC clusters and cloud infrastructures that powers off cluster nodes when they are not in use and powers them back on when they are needed. The system can be integrated with different HPC cluster middleware, such as Batch-Queuing Systems or Cloud Management Systems, through a set of connectors. It also supports different mechanisms for powering nodes on and off, such as Wake-on-LAN, Power Distribution Units, the Intelligent Platform Management Interface, or other infrastructure-specific mechanisms. While some existing Batch-Queuing Systems provide energy-saving mechanisms, other popular choices lack this feature. Cloud management middleware does not generally provide this feature out of the box, and incorporating it requires modifying the middleware. The advantage of our approach is that it can be integrated with different resource management middleware without any modification of that middleware. The paper describes the successful integration of the proposed system with the popular Torque/PBS management system and with the OpenNebula open-source cloud management tool. Two real use cases are presented, involving two different HPC clusters. These use cases show significant energy and cost savings of 38% and 16%.
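The connector and power-mechanism split described above can be illustrated with a minimal sketch. This is not the paper's implementation: every name here (`Connector`, `PowerMechanism`, `EnergyManager`, `idle_timeout`, and the in-memory stand-ins) is hypothetical, and the idle-timeout policy is an assumption chosen only to show how the manager can stay independent of both the middleware and the power hardware.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Connector(ABC):
    """Hypothetical adapter to a resource manager (batch queue or cloud manager)."""

    @abstractmethod
    def idle_nodes(self) -> set[str]: ...

    @abstractmethod
    def pending_node_demand(self) -> int: ...


class PowerMechanism(ABC):
    """Hypothetical adapter to a power backend (Wake-on-LAN, IPMI, a PDU, ...)."""

    @abstractmethod
    def power_on(self, node: str) -> None: ...

    @abstractmethod
    def power_off(self, node: str) -> None: ...


@dataclass
class EnergyManager:
    connector: Connector
    power: PowerMechanism
    idle_timeout: float  # seconds a node must stay idle before power-off (assumed policy)
    powered_off: set[str] = field(default_factory=set)
    idle_since: dict[str, float] = field(default_factory=dict)

    def step(self, now: float) -> None:
        """Run one polling cycle at time `now` (seconds)."""
        idle = self.connector.idle_nodes() - self.powered_off
        # Forget nodes that became busy, then start timers for newly idle ones.
        self.idle_since = {n: t for n, t in self.idle_since.items() if n in idle}
        for node in idle:
            self.idle_since.setdefault(node, now)

        # Wake as many powered-off nodes as the middleware reports are needed.
        demand = self.connector.pending_node_demand()
        for node in sorted(self.powered_off)[:demand]:
            self.power.power_on(node)
            self.powered_off.discard(node)
        if demand > 0:
            return  # do not power anything off while work is waiting

        # Power off nodes that have been idle for at least the timeout.
        for node in sorted(idle):
            if now - self.idle_since[node] >= self.idle_timeout:
                self.power.power_off(node)
                self.powered_off.add(node)
                del self.idle_since[node]


# In-memory stand-ins for experimentation; real connectors would query the middleware.
@dataclass
class StaticConnector(Connector):
    idle: set[str]
    demand: int = 0

    def idle_nodes(self) -> set[str]:
        return set(self.idle)

    def pending_node_demand(self) -> int:
        return self.demand


@dataclass
class RecordingPower(PowerMechanism):
    log: list[tuple[str, str]] = field(default_factory=list)

    def power_on(self, node: str) -> None:
        self.log.append(("on", node))

    def power_off(self, node: str) -> None:
        self.log.append(("off", node))
```

Because the manager only talks to the two abstract interfaces, supporting another middleware or power backend in this sketch means adding one subclass, with no change to the manager or to the middleware itself.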
