Paper Information - Heterogeneous MacroTasking (HeMT) for Parallel Processing in the Public Cloud

Using tiny, equal-sized tasks (Homogeneous microTasking, HomT) has long been regarded as an effective way of load balancing in parallel computing systems. When combined with nodes pulling in work upon becoming idle, HomT has the desirable property of automatically adapting its load distribution to the processing capacities of participating nodes: more powerful nodes finish their work sooner and therefore pull in additional work faster. As a result, HomT is deemed especially desirable in settings with heterogeneous (and possibly dynamically changing) processing capacities. However, HomT incurs additional scheduling and I/O overheads that can make this load balancing scheme costly in some scenarios. In this paper, we first analyze these advantages and disadvantages of HomT. We then propose an alternative load balancing scheme, Heterogeneous MacroTasking (HeMT), wherein the workload is intentionally partitioned according to the nodes' processing capacities. Our goal is to study when HeMT is able to overcome the performance disadvantages of HomT. We implement a prototype of HeMT within the Apache Spark application framework, with complementary enhancements to the Apache Mesos cluster manager. Spark's built-in scheduler, when parameterized appropriately, implements HomT. Our experimental results show that HeMT outperforms HomT when accurate workload-specific estimates of nodes' processing capacities are learned. As a representative result, Spark with HeMT offers about 10% better average completion times for realistic data processing workloads than the default system.
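The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the paper's Spark/Mesos prototype; it is a minimal simulation under stated assumptions: each node has a fixed speed, every task pays a constant overhead (`per_task_overhead`, a hypothetical stand-in for scheduling and I/O costs), and all function and parameter names are invented for illustration.

```python
import heapq


def homt_makespan(total_work, speeds, num_tasks, per_task_overhead):
    """Pull-based microtasking: whichever node is idle first pulls the
    next equal-sized task. Returns the time at which the last task ends."""
    task_size = total_work / num_tasks
    # Min-heap of (time at which the node becomes idle, node index).
    idle_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle_at)
    finish = 0.0
    for _ in range(num_tasks):
        t, i = heapq.heappop(idle_at)
        done = t + per_task_overhead + task_size / speeds[i]
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, i))
    return finish


def hemt_makespan(total_work, speeds, estimated_speeds, per_task_overhead):
    """One macrotask per node, sized in proportion to the node's
    *estimated* capacity; actual run time depends on the true speed."""
    total_est = sum(estimated_speeds)
    shares = [total_work * e / total_est for e in estimated_speeds]
    return max(per_task_overhead + share / speed
               for share, speed in zip(shares, speeds))


if __name__ == "__main__":
    speeds = [1.0, 3.0]  # assumed true capacities of two nodes
    homt = homt_makespan(120.0, speeds, num_tasks=120, per_task_overhead=0.05)
    hemt_good = hemt_makespan(120.0, speeds, speeds, per_task_overhead=0.05)
    hemt_bad = hemt_makespan(120.0, speeds, [1.0, 1.0], per_task_overhead=0.05)
    print(f"HomT: {homt:.2f}")
    print(f"HeMT (accurate estimates): {hemt_good:.2f}")
    print(f"HeMT (equal-split estimates): {hemt_bad:.2f}")
```

In this toy setting, HeMT with accurate estimates pays the overhead once per node and finishes first, while HeMT with poor estimates is limited by the slow node and falls behind HomT. This mirrors the abstract's qualitative claim only; the numbers carry no relation to the paper's measurements.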
