Multi-objective job placement in clusters

One of the key decisions made by both MapReduce and HPC cluster management frameworks is where to place jobs within a cluster. To make this decision, they consider factors such as resource constraints within a node or the proximity of data to a process. However, they fail to account for the degree of collocation on the cluster's nodes. A tight process placement can create contention for intra-node shared resources, such as shared caches, memory, disk, or network bandwidth. A loose placement creates less contention but exacerbates network delays and increases cluster-wide power consumption. Finding the best job placement is challenging because, among many possible placements, we need to find one that gives an acceptable trade-off between performance and power consumption. We propose to tackle the problem via multi-objective optimization. Our solution balances conflicting objectives specified by the user and efficiently finds a suitable job placement.
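To make the trade-off concrete, the following is a minimal sketch of how a multi-objective choice between placements might look. It is not the method of this work: the `Placement` type, the candidate names, the runtime and power figures, and the weighted-sum selection rule are all hypothetical, chosen only to illustrate Pareto dominance over two objectives that are both minimized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """A candidate job placement with two objectives to minimize (hypothetical)."""
    name: str
    runtime_s: float  # assumed estimate of job completion time, in seconds
    power_w: float    # assumed estimate of cluster-wide power draw, in watts


def dominates(a: Placement, b: Placement) -> bool:
    """True if a is no worse than b on both objectives and better on at least one."""
    no_worse = a.runtime_s <= b.runtime_s and a.power_w <= b.power_w
    better = a.runtime_s < b.runtime_s or a.power_w < b.power_w
    return no_worse and better


def pareto_front(candidates: list[Placement]) -> list[Placement]:
    """Keep only the placements that no other candidate dominates."""
    return [p for p in candidates
            if not any(dominates(q, p) for q in candidates)]


def pick(front: list[Placement], w_runtime: float, w_power: float) -> Placement:
    """Choose from the front by a weighted sum of objectives scaled to [0, 1].

    The weights stand in for user-specified priorities; this selection rule
    is an illustrative assumption, not the one used in the paper.
    """
    rt_min = min(p.runtime_s for p in front)
    pw_min = min(p.power_w for p in front)
    # Fall back to 1.0 when all values are equal, to avoid dividing by zero.
    rt_span = (max(p.runtime_s for p in front) - rt_min) or 1.0
    pw_span = (max(p.power_w for p in front) - pw_min) or 1.0

    def score(p: Placement) -> float:
        return (w_runtime * (p.runtime_s - rt_min) / rt_span
                + w_power * (p.power_w - pw_min) / pw_span)

    return min(front, key=score)


# Made-up candidates: tighter packing saves power but slows the job down.
CANDIDATES = [
    Placement("tight", runtime_s=120.0, power_w=400.0),
    Placement("medium", runtime_s=100.0, power_w=550.0),
    Placement("loose", runtime_s=95.0, power_w=800.0),
    Placement("wasteful", runtime_s=110.0, power_w=850.0),
]

if __name__ == "__main__":
    front = pareto_front(CANDIDATES)
    print([p.name for p in front])
    print(pick(front, w_runtime=0.5, w_power=0.5).name)
```

In this toy setting the "wasteful" placement is discarded because another candidate is both faster and cheaper in power, while the remaining placements represent genuine trade-offs. Shifting the weights toward power or toward runtime moves the selected placement toward the tight or the loose end of the front, respectively.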
