Optimizing execution time predictions of scientific workflow applications in the Grid through evolutionary programming

Planning the execution of scientific workflow applications in the Grid requires predicting workflow execution time in advance so that execution can be optimized. However, predicting the execution time of such applications is very complex, mainly because of the different structures of workflows, the possible parallel execution of workflow tasks on multiple resources, and the dynamic and heterogeneous nature of the Grid. In this paper, we describe an optimized method for predicting the execution time of workflow applications in the Grid, extending previous work by Nadeem et al. (2009) [4]. We characterize workflows in terms of attributes that describe their structure and their performance during different stages of execution. The overall performance of a workflow is modeled through templates of workflow attributes. An optimized method based on evolutionary programming searches for suitable templates. Three different induction models are used to generate predictions, and their accuracy is compared. Results from experiments with three real-world workflow applications on a real Grid demonstrate the effectiveness of our approach. We also compare the proposed approach with our previous method, based on supervised exhaustive search, by Nadeem and Fahringer (2009) [4].
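The idea of searching for templates of workflow attributes with an evolutionary method can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute names (`app`, `site`, `size`), the toy run history, the mean-of-matching-runs predictor, the leave-one-out error measure, and the bit-flip mutation with elitist selection are all assumptions chosen only to make the example self-contained.

```python
import random
from statistics import mean

# Hypothetical history of past runs: (workflow attributes, runtime in seconds).
# In this toy data the runtime depends on "app" and "size" but not on "site".
HISTORY = [
    ({"app": a, "site": s, "size": n}, rate * n)
    for a, rate in (("A", 10.0), ("B", 5.0))
    for n in (10, 20)
    for s in ("s1", "s2")
]
ATTRS = ["app", "site", "size"]


def predict(template, query, history):
    """Mean runtime of past runs that equal `query` on every template attribute."""
    matches = [t for attrs, t in history
               if all(attrs[k] == query[k] for k in template)]
    return mean(matches) if matches else None


def loo_error(template, history):
    """Leave-one-out mean absolute error of the template-based predictor."""
    errors = []
    for i, (attrs, runtime) in enumerate(history):
        rest = history[:i] + history[i + 1:]
        guess = predict(template, attrs, rest)
        if guess is None:  # no similar run: fall back to the overall mean
            guess = mean([t for _, t in rest])
        errors.append(abs(guess - runtime))
    return mean(errors)


def evolve(attrs, history, generations=20, pop_size=6, seed=0):
    """Evolve attribute bit masks; each mask selects the attributes of a template."""
    rng = random.Random(seed)

    def to_template(mask):
        return tuple(a for a, bit in zip(attrs, mask) if bit)

    def fitness(mask):
        # Lower error is better; the mask itself breaks ties deterministically.
        return (loo_error(to_template(mask), history), mask)

    pop = [tuple(rng.random() < 0.5 for _ in attrs) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for mask in pop:
            i = rng.randrange(len(attrs))  # mutation: flip one random bit
            children.append(mask[:i] + (not mask[i],) + mask[i + 1:])
        # Elitist selection: parents and children compete for survival.
        pop = sorted(set(pop + children), key=fitness)[:pop_size]
    best = to_template(pop[0])
    return best, loo_error(best, history)


if __name__ == "__main__":
    template, error = evolve(ATTRS, HISTORY)
    print("best template:", template, "leave-one-out error:", error)
```

In this toy history the template `("app", "size")` has zero leave-one-out error, while the template that uses all three attributes finds no similar run once one is held out and falls back to the overall mean. A template that is too specific can therefore predict worse than a coarser one, which is why the choice of template is treated as a search problem.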

[1] Ian Taylor, et al. Distributed computing with Triana on the Grid: Research Articles, 2005.

[2] Bertram Ludäscher, et al. Kepler: an extensible system for design and execution of scientific workflows, 2004.

[3] Thomas Fahringer, et al. Using Templates to Predict Execution Time of Scientific Workflow Applications in the Grid, 2009, 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid.

[4] Radu Prodan, et al. Soft Benchmarks-Based Application Performance Prediction Using a Minimum Training Set, 2006, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06).

[5] Alberto Gómez, et al. A review of machine learning in dynamic scheduling of flexible manufacturing systems, 2001, Artificial Intelligence for Engineering Design, Analysis and Manufacturing.

[6] Goldberg, et al. Genetic algorithms, 1993, Robust Control Systems with Genetic Algorithms.

[7] Jack Belzer, et al. Encyclopedia of Computer Science and Technology, 2002.

[8] Zbigniew Michalewicz, et al. Genetic Algorithms + Data Structures = Evolution Programs, 1996, Springer Berlin Heidelberg.

[9] Richard Gibbons, et al. A Historical Application Profiler for Use by Parallel Schedulers, 1997, JSSPP.

[10] Ming Wu, et al. Network bandwidth predictor (NBP): a system for online network performance forecasting, 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).

[11] Erol Gelenbe, et al. A performance model of block structured parallel programs, 1986.

[12] Radu Prodan, et al. Overhead Analysis of Scientific Workflows in Grid Environments, 2008, IEEE Transactions on Parallel and Distributed Systems.

[13] Michael F. P. O'Boyle, et al. Fast compiler optimisation evaluation using code-feature based performance prediction, 2007, CF '07.

[14] Jun Qin, et al. Advanced data flow support for scientific grid workflow applications, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[15] Ian J. Taylor, et al. Distributed computing with Triana on the Grid, 2005, Concurr. Pract. Exp.

[16] Ian Foster, et al. Predicting application run times with historical information, 2004, J. Parallel Distributed Comput.

[17] Andreas Wombacher, et al. Piloting an Empirical Study on Measures for Workflow Similarity, 2006, 2006 IEEE International Conference on Services Computing (SCC'06).

[18] Juan Chen, et al. Improving a Local Learning Technique for Queue Wait Time Predictions, 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).

[19] Matthew R. Pocock, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows, 2004, Bioinform.

[20] Philippe Nain, et al. Evaluation of parallel execution of program tree structures, 1984, SIGMETRICS '84.

[21] Xingfu Wu, et al. Using kernel couplings to predict parallel application performance, 2002, Proceedings 11th IEEE International Symposium on High Performance Distributed Computing.

[22] Geoffrey C. Fox, et al. Examining the Challenges of Scientific Workflows, 2007, Computer.

[23] David E. Goldberg, et al. Genetic Algorithms in Search Optimization and Machine Learning, 1988.

[24] Johan Montagnat, et al. A Probabilistic Model to Analyse Workflow Performance on Production Grids, 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).

[25] Hui Li, et al. Predicting job start times on clusters, 2004, IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004.

[26] Stephen A. Jarvis, et al. An Investigation into the Application of Different Performance Prediction Methods to Distributed Enterprise Applications, 2005, The Journal of Supercomputing.

[27] Jun Qin, et al. ASKALON: a Grid application development and computing environment, 2005, The 6th IEEE/ACM International Workshop on Grid Computing, 2005.