TPS: A Task Placement Strategy for Big Data Workflows

Workflow makespan is the total execution time of a workflow running in the cloud. It depends significantly on how the workflow's tasks and datasets are allocated and placed in a distributed computing environment such as a cloud. Data and task allocation strategies that minimize makespan help scientific users receive their results on time. The main goal of a task placement algorithm is to minimize the total amount of data moved between virtual machines during workflow execution. In this paper, we do the following: 1) formalize the task placement problem for big data workflows; 2) propose a task placement strategy (TPS) that considers both initial input datasets and intermediate datasets when calculating the dependency between workflow tasks; and 3) perform extensive experiments in a distributed environment to demonstrate that the proposed strategy is an effective task distribution and placement tool.
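To make the stated objective concrete, the following is a minimal sketch of how the total data movement of a given placement could be measured. It is not the TPS algorithm itself, which the abstract does not detail. The function name `total_data_movement`, the data layout, and the dataset sizes are all illustrative assumptions. Data counts as "moved" whenever a dataset (initial or intermediate) resides on a different virtual machine than the task that consumes it.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (not from the paper):
#   intermediate: (producer_task, consumer_task, size) for data passed between tasks
#   initial:      (dataset, consumer_task, size) for initial input datasets
Intermediate = Tuple[str, str, float]
Initial = Tuple[str, str, float]


def total_data_movement(
    intermediate: List[Intermediate],
    initial: List[Initial],
    dataset_vm: Dict[str, str],
    placement: Dict[str, str],
) -> float:
    """Sum the sizes of all datasets that must cross a VM boundary."""
    moved = 0.0
    # Intermediate data moves when producer and consumer run on different VMs.
    for producer, consumer, size in intermediate:
        if placement[producer] != placement[consumer]:
            moved += size
    # Initial input data moves when it is stored on a different VM than its consumer.
    for dataset, consumer, size in initial:
        if dataset_vm[dataset] != placement[consumer]:
            moved += size
    return moved


# Toy workflow t1 -> t2 -> t3 with one initial dataset d1 stored on vm1.
intermediate = [("t1", "t2", 4.0), ("t2", "t3", 1.0)]
initial = [("d1", "t1", 10.0)]
dataset_vm = {"d1": "vm1"}

placement_a = {"t1": "vm1", "t2": "vm1", "t3": "vm2"}
placement_b = {"t1": "vm2", "t2": "vm1", "t3": "vm1"}

cost_a = total_data_movement(intermediate, initial, dataset_vm, placement_a)
cost_b = total_data_movement(intermediate, initial, dataset_vm, placement_b)
```

Under these assumed sizes, `placement_a` keeps the large initial dataset and the heaviest intermediate edge local, so it moves less data than `placement_b`. A placement strategy in the spirit described above would prefer it.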
