Maximizing Active Storage Resources with Deadlock Avoidance in Workflow-Based Computations

Workflow-based workloads usually consist of multiple instances of the same workflow, where a workflow is a set of jobs with control or data dependencies that carries out a well-defined scientific computation task and each instance acts on its own input data. To maximize performance, a high degree of concurrency is usually sought by running multiple instances simultaneously. However, because storage is limited on most systems, deadlock caused by oversubscribed storage requests is a potential problem. To address it, we integrate two novel concepts with the traditional problem of deadlock avoidance and propose two algorithms that maximize active (not just allocated) resource utilization and minimize makespan. Our approach is based on the well-known banker's algorithm, but our algorithms distinguish between active and inactive resources, a distinction that previous approaches do not make. The central idea is to leverage data-flow information to dynamically approximate the localized maximum claim (i.e., the resource requirements of the remaining jobs of an instance) in order to improve either interinstance or intrainstance concurrency while still avoiding deadlock. Through simulation-based studies, we show that our proposed algorithms outperform the classic banker's algorithm and the more recent Lang's algorithm in terms of makespan and active storage resource utilization.
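
For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of the classic banker's safety check for a single resource type (storage), not the two algorithms proposed in the paper. The function names `is_safe` and `can_grant`, the integer storage units, and the per-instance lists are assumptions made for this example. Each instance here declares one fixed maximum claim; the paper's localized maximum claim, approximated from data-flow information, is not modeled.

```python
from typing import List


def is_safe(free: int, allocated: List[int], max_claim: List[int]) -> bool:
    """Classic banker's safety check for one resource type (storage units).

    Hypothetical inputs: allocated[i] is the storage currently held by
    workflow instance i, and max_claim[i] is its declared maximum need.
    """
    remaining = [m - a for m, a in zip(max_claim, allocated)]
    finished = [False] * len(allocated)
    progress = True
    while progress:
        progress = False
        for i, done in enumerate(finished):
            # An instance can run to completion if its remaining need fits
            # in the free storage; it then releases everything it holds.
            if not done and remaining[i] <= free:
                free += allocated[i]
                finished[i] = True
                progress = True
    # The state is safe only if every instance can finish in some order.
    return all(finished)


def can_grant(request: int, who: int, free: int,
              allocated: List[int], max_claim: List[int]) -> bool:
    """Grant a request only if the resulting state is still safe."""
    if request > free or allocated[who] + request > max_claim[who]:
        return False
    trial = list(allocated)  # tentative allocation; the inputs are not mutated
    trial[who] += request
    return is_safe(free - request, trial, max_claim)


if __name__ == "__main__":
    allocated = [3, 2, 2]   # storage held by three instances (10 units total)
    max_claim = [9, 4, 7]   # declared maximum claims
    print(can_grant(1, 0, 3, allocated, max_claim))
    print(can_grant(2, 1, 3, allocated, max_claim))
```

In this sketch a request is refused whenever granting it could leave some instance unable to finish, even if the requested storage is currently free. That conservatism is what the abstract's active versus inactive distinction and localized claims aim to relax.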
