Optimisation of the enactment of fine-grained distributed data-intensive work flows

The emergence of data-intensive science as the fourth science paradigm has posed a data deluge challenge for enacting scientific workflows. The scientific community is facing an imminent flood of data from the next generation of experiments and simulations, while also dealing with the heterogeneity and complexity of data, applications and execution environments. New scientific workflows involve execution on distributed and heterogeneous computing resources across organisational and geographical boundaries, processing gigabytes of live data streams and petabytes of archived and simulation data, in various formats and from multiple sources. Managing the enactment of such workflows requires not only larger storage space and faster machines, but also the capability to support the scalability and diversity of the users, applications, data, computing resources and enactment technologies. We argue that the enactment process can be made efficient using optimisation techniques in an appropriate architecture. This architecture should support the creation of diversified applications and their enactment on diversified execution environments, with a standard interface, i.e. a workflow language. The workflow language should be both human readable and suitable for communication between the enactment environments. The data-streaming model central to this architecture provides a scalable approach to large-scale data exploitation. Data flow between computational elements in the scientific workflow is implemented as streams. To cope with the exploratory nature of scientific workflows, the architecture should support fast workflow prototyping, and the re-use of workflows and workflow components. Above all, the enactment process should be easy to repeat and automate. In this thesis, we present a candidate data-intensive architecture that includes an intermediate workflow language, named DISPEL.
We create a new fine-grained measurement framework to capture performance-related data during enactments, and design a performance database to organise them systematically. We propose a new enactment strategy to demonstrate that optimisation of data-streaming workflows can be automated by exploiting performance data gathered during previous enactments.
