Towards a Cost Model for Scheduling Scientific Workflows Activities in Cloud Environments

Cloud computing has emerged as a new paradigm that enables scientists to benefit from distributed resources such as hardware and software. Clouds offer an opportunity for scientists who need a high performance computing infrastructure to execute their scientific experiments. Most experiments modeled as scientific workflows execute many activities and process large amounts of data, so parallel techniques are often a key factor. However, parallelizing a scientific workflow in a cloud environment is not trivial. One complex task is to define the number and types of virtual machines and to design the parallel execution strategy. Because there are many options for configuring such an environment, doing this manually is hard, and a poor choice may degrade performance. As a first step, this paper proposes a cost model based on quality of service (QoS) concepts for clouds to help determine an adequate environment configuration according to the restrictions imposed by scientists.
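To make the idea concrete, the following Python sketch shows one way a cost model could rank candidate environment configurations. It is not the model proposed in this paper: the VM types, prices, the linear per-VM overhead term, per-started-hour billing and the weighted time/cost score are all hypothetical assumptions. The scientist's restrictions are represented as a hard deadline and budget, and the remaining configurations are ranked by a weighted sum of normalized makespan and monetary cost, two common QoS criteria.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import List, Optional


@dataclass(frozen=True)
class VMType:
    """A hypothetical virtual machine type offered by a cloud provider."""
    name: str
    cores: int
    price_per_hour: float  # hypothetical price, billed per started hour


@dataclass(frozen=True)
class Config:
    """A candidate environment: `count` VMs of one type, with estimated QoS values."""
    vm_type: VMType
    count: int
    est_hours: float  # estimated workflow makespan
    est_cost: float   # estimated monetary cost


def estimate(vm: VMType, count: int, core_hours: float, overhead_h: float) -> Config:
    # Assumption: activities split evenly over all cores, plus a per-VM
    # coordination overhead (e.g. data staging) that grows linearly with `count`.
    hours = core_hours / (vm.cores * count) + overhead_h * count
    # Assumption: each VM is billed for every started hour.
    cost = math.ceil(hours) * count * vm.price_per_hour
    return Config(vm, count, hours, cost)


def choose_config(vm_types: List[VMType], max_vms: int, core_hours: float,
                  deadline_h: float, budget: float, w_time: float, w_cost: float,
                  overhead_h: float = 0.0625) -> Optional[Config]:
    """Return the configuration with the lowest weighted time/cost score that
    meets the deadline and budget, or None if no configuration does."""
    candidates = [estimate(vm, n, core_hours, overhead_h)
                  for vm, n in product(vm_types, range(1, max_vms + 1))]
    feasible = [c for c in candidates
                if c.est_hours <= deadline_h and c.est_cost <= budget]
    if not feasible:
        return None
    # Normalize each criterion by its worst feasible value so the weights are comparable.
    max_t = max(c.est_hours for c in feasible) or 1.0  # guard against division by zero
    max_c = max(c.est_cost for c in feasible) or 1.0   # e.g. if all prices are zero
    return min(feasible,
               key=lambda c: w_time * c.est_hours / max_t + w_cost * c.est_cost / max_c)


if __name__ == "__main__":
    types = [VMType("small", 1, 0.125), VMType("large", 4, 0.375)]
    best = choose_config(types, max_vms=8, core_hours=40.0,
                         deadline_h=6.0, budget=5.0, w_time=0.5, w_cost=0.5)
    print(best)
```

With these made-up numbers, no small-VM configuration meets both the deadline and the budget, while two, three or four large VMs each cost 4.5 units because of per-hour billing; the sketch therefore selects four large VMs, the fastest of the tied options. Tightening the budget to 4.0 leaves no feasible configuration, and the function returns None.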
