Detecting bottlenecks in parallel DAG-based data flow programs

In recent years, several frameworks have been introduced to facilitate massively parallel data processing on shared-nothing architectures such as compute clouds. While these frameworks generally offer good support for task deployment and fault tolerance, they provide little assistance in finding reasonable degrees of parallelism for the tasks to be executed. However, since cloud billing models make using many resources for a short period of time cost the same as using few resources for a long time, choosing proper levels of parallelism is crucial for achieving short processing times while maintaining good resource utilization, and therefore good cost efficiency. In this paper, we present and evaluate a solution for detecting CPU and I/O bottlenecks in parallel DAG-based data flow programs, assuming capacity-constrained communication channels. Detecting such bottlenecks provides an important foundation for manually or automatically scaling out and tuning parallel data flow programs in order to increase performance and cost efficiency.
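The core idea can be illustrated with a minimal sketch. Assuming each task reports its CPU utilization and each communication channel reports its throughput relative to its capacity, a resource that runs near saturation is a bottleneck candidate. The names, data layout, and thresholds below are illustrative assumptions, not the paper's actual detection algorithm:

```python
# Hypothetical sketch of saturation-based bottleneck detection in a DAG data
# flow program. Thresholds and measurement format are assumptions for
# illustration only.

CPU_SATURATION = 0.95  # assumed threshold: a task near-fully using its CPU
IO_SATURATION = 0.95   # assumed threshold: a channel running near capacity


def find_bottlenecks(tasks, channels):
    """Flag saturated resources as bottleneck candidates.

    tasks:    {task_name: cpu_utilization in [0, 1]}
    channels: {(src_task, dst_task): (throughput, capacity)}, same units
    Returns (cpu_candidates, io_candidates).
    """
    cpu_candidates = [t for t, util in tasks.items() if util >= CPU_SATURATION]
    io_candidates = [c for c, (tp, cap) in channels.items()
                     if tp / cap >= IO_SATURATION]
    return cpu_candidates, io_candidates


# Example measurements for a three-stage pipeline read -> join -> write
tasks = {"read": 0.40, "join": 0.98, "write": 0.30}
channels = {("read", "join"): (120.0, 125.0),   # MB/s: near capacity
            ("join", "write"): (30.0, 125.0)}   # MB/s: mostly idle
cpu_bn, io_bn = find_bottlenecks(tasks, channels)
# "join" saturates its CPU; the read->join channel runs near its capacity
```

In this toy example, scaling out the "join" task (increasing its degree of parallelism) would be the natural response to the detected CPU bottleneck, which is exactly the kind of tuning decision the paper's detection results are meant to inform.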
