Enforcing QoS in scientific workflow systems enacted over Cloud infrastructures

The ability to support Quality of Service (QoS) constraints is an important requirement in some scientific applications. With the increasing use of Cloud computing infrastructures, where access to resources is shared, dynamic and provisioned on demand, identifying how QoS constraints can be supported becomes an important challenge. Access to dedicated resources is often not possible in existing Cloud deployments, and many commercial providers offer only limited QoS guarantees (often restricted to error rate and availability, rather than specific metrics such as latency or access time). We propose a workflow system architecture that enforces QoS for the simultaneous execution of multiple scientific workflows over a shared infrastructure (such as a Cloud environment). Our approach involves multiple pipeline workflow instances, each with its own QoS requirements. Each workflow is composed of a number of stages, and each stage is mapped to one or more physical resources. A stage involves a combination of data access, computation and data transfer capability. A token bucket-based data throttling framework is embedded into the workflow system architecture: each stage of a workflow instance regulates the amount of data it injects into the shared resources, allowing bursts of data while keeping workflow streams isolated from one another. We demonstrate our approach using the Montage workflow, for which we develop a Reference net model.
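The token bucket mechanism named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `TokenBucket`, the parameters `rate` and `capacity`, and the explicit `now` timestamp argument are assumptions introduced here for illustration. The sketch shows the two properties the abstract relies on: a stage may inject a burst of up to `capacity` units at once, while its long-run injection rate is bounded by `rate`.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical per-stage throttle; one token stands for one unit of data."""
    rate: float          # tokens added per second (sustained injection rate)
    capacity: float      # bucket size (largest burst that can be injected)
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # timestamp of the most recent refill, in seconds

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)

    def try_consume(self, amount: float, now: float) -> bool:
        """Return True and spend tokens if `amount` units may be injected now."""
        self._refill(now)
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


# Each workflow instance stage owns its own bucket, so a burst in one
# stream does not use up the allowance of another.
stage_a = TokenBucket(rate=10.0, capacity=50.0, tokens=50.0)
stage_b = TokenBucket(rate=10.0, capacity=50.0, tokens=50.0)

burst_ok = stage_a.try_consume(40.0, now=0.0)    # burst fits within capacity
too_soon = stage_a.try_consume(20.0, now=0.5)    # not enough tokens refilled yet
later_ok = stage_a.try_consume(20.0, now=1.0)    # enough tokens after one second
other_ok = stage_b.try_consume(50.0, now=0.0)    # stage_b is unaffected by stage_a
```

Passing the timestamp explicitly keeps the sketch deterministic; a real deployment would more likely read a monotonic clock and block or queue data when `try_consume` returns `False`.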
