Scheduling Workflow Applications Based on Multi-source Parallel Data Retrieval in Distributed Computing Networks

Many scientific experiments are carried out in collaboration with researchers around the world in order to use existing infrastructure and to conduct experiments at massive scale. Data produced by such experiments are thus replicated and cached at multiple geographic locations. This gives rise to new challenges when selecting distributed data and compute resources so that the execution of applications is time- and cost-efficient. Existing heuristic techniques select the ‘best’ data source for retrieving data to a compute resource and subsequently perform task-resource assignment. However, this scheduling approach, which is based only on single-source data retrieval, may not give time-efficient schedules when: (i) tasks are interdependent on data, (ii) the average size of data processed by most tasks is large, and (iii) data transfer time exceeds task computation time by at least one order of magnitude. To address these characteristics of data-intensive applications, we propose to leverage the presence of replicated data sources and retrieve data in parallel from multiple locations, thereby achieving time-efficient schedules. In this article, we propose two multi-source data-retrieval-based scheduling heuristics that assign interdependent tasks to compute resources based on both data retrieval time and task computation time. We carry out experiments using real applications and deploy them on emulated as well as real environments. By combining data retrieval with a task-resource mapping technique, we show that our heuristics produce time-efficient schedules that are better than those of existing heuristic-based techniques for scheduling application workflows.
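The contrast between single-source and multi-source retrieval can be illustrated with a minimal sketch. This is not the heuristic proposed in the article; it is an idealized model that assumes independent links with fixed bandwidths, no protocol overhead, and a file split across replicas in proportion to bandwidth. All function names, resource names, and numbers are hypothetical.

```python
# Idealized comparison of single-source vs. multi-source parallel retrieval.
# Assumptions (not from the article): fixed, independent link bandwidths,
# no overhead, and chunk sizes proportional to each replica's bandwidth.

def single_source_time(size_mb, bandwidths):
    """Time to fetch the whole file from the fastest single replica."""
    return size_mb / max(bandwidths)


def multi_source_time(size_mb, bandwidths):
    """Time when every replica serves a share proportional to its bandwidth,
    so that all partial transfers finish at the same moment."""
    return size_mb / sum(bandwidths)


def pick_resource(size_mb, links, compute_time):
    """Choose the compute resource with the smallest estimated finish time,
    where finish time = parallel retrieval time + task computation time."""
    estimates = {
        resource: multi_source_time(size_mb, links[resource]) + compute_time[resource]
        for resource in links
    }
    best = min(estimates, key=estimates.get)
    return best, estimates[best]


# Hypothetical bandwidths (MB/s) from each replica to each compute resource.
links = {"r1": [10.0, 5.0, 5.0], "r2": [20.0, 2.0]}
# Hypothetical task computation times (s) on each resource.
compute_time = {"r1": 4.0, "r2": 6.0}

if __name__ == "__main__":
    size_mb = 1000.0
    for resource, bandwidths in links.items():
        print(resource,
              single_source_time(size_mb, bandwidths),
              multi_source_time(size_mb, bandwidths))
    print(pick_resource(size_mb, links, compute_time))
```

In this toy model, retrieval dominates computation, so the choice of resource is driven mainly by aggregate bandwidth from all replicas rather than by the single fastest link or the fastest processor.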
