Paper Information - NO2: Speeding up Parallel Processing of Massive Compute-Intensive Tasks

Large-scale computing frameworks, whether hosted in the cloud or deployed on a high-end local cluster, have become indispensable software infrastructure for numerous enterprise and scientific applications. Tasks executed on these frameworks are generally classified as either data-intensive or compute-intensive. However, most existing frameworks, led by MapReduce, are mainly suited to data-intensive tasks. Their task schedulers assume that the proportion of data I/O completed reflects a task's progress and state. Unfortunately, this assumption does not hold for most compute-intensive tasks. Because their estimates of task progress are biased, traditional frameworks cannot cut off outliers in a timely manner, and execution time is therefore considerably prolonged when they run compute-intensive tasks. We propose a new framework designed for compute-intensive tasks. By using instrumentation and an automatic instrumentation point selector, our framework estimates the progress of a compute-intensive task without resorting to data I/O. We employ a clustering method to identify outliers at runtime and then perform speculative execution or aborting, speeding up task execution by up to 25%. Moreover, our improvement over bare instrumentation keeps overhead within 0.1%, and aborting-based execution increases average CPU usage by only 10%. This low overhead and resource consumption make our framework practical for use in production environments.
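
The idea of flagging outliers from instrumentation-based progress rather than data I/O can be illustrated with a minimal sketch. The abstract does not name the clustering method or the progress metric, so everything here is an assumption: progress is taken to be the number of instrumentation points a task has reached per second, the clustering is a plain one-dimensional 2-means, and `two_means_1d`, `find_outliers`, and the `min_gap` threshold are hypothetical names chosen for illustration.

```python
from statistics import mean


def two_means_1d(values, iterations=20):
    """Split 1-D values into a low and a high cluster with plain 2-means.

    Assumption: the paper's clustering method is unspecified; 2-means is a stand-in.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:
            break  # all values sit in one cluster
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centres have converged
        lo, hi = new_lo, new_hi
    return lo, hi


def find_outliers(hit_counts, elapsed_seconds, min_gap=0.5):
    """Return ids of tasks whose instrumentation hit rate falls in the slow cluster.

    hit_counts: {task_id: instrumentation points reached so far} (hypothetical metric)
    min_gap: hypothetical threshold; the slow centre must be below min_gap * fast centre
    """
    rates = {t: hits / elapsed_seconds for t, hits in hit_counts.items()}
    lo, hi = two_means_1d(list(rates.values()))
    if hi == 0 or lo >= min_gap * hi:
        return []  # tasks progress at similar rates, nothing to flag
    return sorted(t for t, r in rates.items() if abs(r - lo) <= abs(r - hi))


if __name__ == "__main__":
    counts = {"t1": 980, "t2": 1010, "t3": 1005, "t4": 120}
    print(find_outliers(counts, elapsed_seconds=10))
```

In a scheduler, the returned task ids would be the candidates for speculative re-execution or aborting; how the actual framework makes that choice is not described in the abstract.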

Weimin Zheng | Yongwei Wu | Jinglei Ren | Weichao Guo | Xun Zhao
