Scheduling Grid workloads on multicore clusters to minimize energy and maximize performance

Energy is a significant and growing component of the cost of running a large computing facility. A grid workload consisting of millions of jobs running on thousands of processors may consume millions of kilowatt-hours of electricity. However, because a grid workload generally consists of many independent sequential processes, we may shape its execution to satisfy energy constraints. By varying the number and frequency of processors available, a scheduler may trade off energy against performance. In this paper, we explore energy and performance tradeoffs in the scheduling of grid workloads on large clusters. We build upon previous work by showing the interaction of intelligent job assignment, automated node scaling, and frequency scaling on multicore clusters. An unexpected result is that, even though low frequency is the most efficient mode of operating a single node, the careful application of frequency scaling can reduce overall energy consumption even further by reducing the number of nodes powered on.
