Memory-based scheduling of scientific computing clusters

This study examines how increased memory utilisation affects throughput and energy consumption in scientific computing, especially in high-energy physics. Our aim is to minimise the energy consumed by a set of jobs without increasing the processing time. Earlier tests indicated that, especially in data analysis, throughput can increase by more than 100% and energy consumption can decrease by 50% when multiple jobs are processed in parallel per CPU core. Because jobs are heterogeneous, no single number of parallel jobs is optimal for all workloads. Scheduling based on memory utilisation is a better solution, but finding an optimum memory threshold is not straightforward. We therefore developed a fuzzy logic-based algorithm that dynamically adapts the memory threshold to the overall load. In this way, memory consumption can be kept stable across different workloads while achieving significantly higher throughput and energy efficiency than traditional approaches that use a fixed number of jobs or a fixed memory threshold.
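
To make the idea of a load-adaptive memory threshold concrete, the following is a minimal sketch and not the algorithm developed in this study. It assumes a load value normalised to [0, 1], three hypothetical fuzzy sets (low, medium, high), three hypothetical rules (raise, keep, or lower the threshold) and a weighted-average defuzzification. All names, breakpoints, step sizes and limits are illustrative assumptions.

```python
# Illustrative sketch only: every constant and name below is an assumption,
# not a value or identifier taken from the study.

STEP = 0.10           # assumed maximum threshold change per update
MIN_THRESHOLD = 0.30  # assumed lower bound (fraction of physical memory)
MAX_THRESHOLD = 0.95  # assumed upper bound (fraction of physical memory)


def ramp_down(x, lo, hi):
    """Membership that is 1 at or below lo, 0 at or above hi, linear between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def ramp_up(x, lo, hi):
    """Membership that is 0 at or below lo, 1 at or above hi, linear between."""
    return 1.0 - ramp_down(x, lo, hi)


def triangle(x, lo, peak, hi):
    """Triangular membership that is 1 at peak and 0 outside (lo, hi)."""
    return min(ramp_up(x, lo, peak), ramp_down(x, peak, hi))


def adjust_threshold(threshold, load):
    """Return a new memory threshold given the current one and the load.

    Hypothetical rules:
      load is low    -> raise the threshold by STEP
      load is medium -> keep the threshold
      load is high   -> lower the threshold by STEP
    The rule outputs are combined as a membership-weighted average.
    """
    low = ramp_down(load, 0.2, 0.5)
    medium = triangle(load, 0.2, 0.5, 0.8)
    high = ramp_up(load, 0.5, 0.8)

    total = low + medium + high  # equals 1 for these overlapping sets
    delta = (low * STEP + medium * 0.0 + high * -STEP) / total

    return min(MAX_THRESHOLD, max(MIN_THRESHOLD, threshold + delta))


def can_start_job(memory_used_fraction, threshold):
    """Admit a new job only while memory use is below the threshold."""
    return memory_used_fraction < threshold
```

In this sketch a scheduler would call `adjust_threshold` periodically with a measured load and then use `can_start_job` to decide whether another job may be started on the node. Because the fuzzy sets overlap, the threshold changes gradually as the load moves between regions instead of jumping between fixed values.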
