A novel statistical time-series pattern based interval forecasting strategy for activity durations in workflow systems

Forecasting workflow activity durations is essential to supporting satisfactory QoS in workflow systems. Traditionally, a workflow system is designed to automate processes in a specific application domain where activities are of a similar nature. Hence, a single forecasting strategy is employed by the workflow system and applied uniformly to all its workflow activities. However, with the newly emerging requirement to serve as middleware services for high performance computing infrastructures such as grid and cloud computing, more and more workflow systems are designed to be general purpose and to support workflow applications from many different domains. Consequently, the forecasting strategies in workflow systems must adapt to different workflow applications, which are normally executed repeatedly, such as data/computation intensive scientific applications (mainly with long-duration activities) and instance intensive business applications (mainly with short-duration activities). In this paper, based on a systematic analysis of these issues, we propose a novel statistical time-series pattern based interval forecasting strategy in two versions: a complex version for long-duration activities and a simple version for short-duration activities. The strategy consists of four major functional components: duration series building, duration pattern recognition, duration pattern matching and duration interval forecasting. Specifically, a novel hybrid non-linear time-series segmentation algorithm is designed to facilitate the discovery of duration-series patterns. Experimental results on real world examples and simulated test cases show that our strategy forecasts activity duration intervals for both long-duration and short-duration activities better than some representative time-series forecasting strategies used in traditional workflow systems.
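To make the four components more concrete, the following is a minimal Python sketch of how such a pipeline could be wired together. It is not the algorithm proposed in the paper, whose hybrid non-linear segmentation is not specified in this abstract. As stand-in assumptions, the sketch uses a simple greedy segmentation bounded by a maximum standard deviation, matches the latest durations to the pattern with the nearest mean, and forecasts an interval of mean ± z·σ with z = 1.28 (roughly an 80% central interval if durations within a pattern are normally distributed). All function names, parameters and sample values are hypothetical.

```python
from statistics import mean, pstdev


def segment_by_max_stdev(series, max_sdev):
    """Greedy stand-in segmentation (an assumption, not the paper's algorithm).

    Extends the current segment while its population standard deviation
    stays at or below max_sdev; otherwise starts a new segment.
    """
    segments, current = [], []
    for value in series:
        candidate = current + [value]
        if current and pstdev(candidate) > max_sdev:
            segments.append(current)
            current = [value]
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def recognise_patterns(segments):
    """Summarise each segment as a (mean, standard deviation) pattern."""
    return [(mean(seg), pstdev(seg)) for seg in segments]


def match_pattern(patterns, latest_durations):
    """Pick the pattern whose mean is closest to the mean of the latest durations."""
    latest_mean = mean(latest_durations)
    return min(patterns, key=lambda p: abs(p[0] - latest_mean))


def forecast_interval(pattern, z=1.28):
    """Return (low, high) = mean -/+ z * sigma for the matched pattern."""
    mu, sigma = pattern
    return mu - z * sigma, mu + z * sigma


# Hypothetical duration series (e.g. seconds per activity run).
duration_series = [10, 11, 10, 12, 30, 31, 29, 30, 11, 10]
segments = segment_by_max_stdev(duration_series, max_sdev=2.0)
patterns = recognise_patterns(segments)
matched = match_pattern(patterns, latest_durations=[29, 31])
low, high = forecast_interval(matched)
print(segments)
print(round(low, 3), round(high, 3))
```

The bound `max_sdev` controls how tightly each pattern is defined: a smaller value yields more, shorter segments and narrower forecast intervals, at the cost of fewer samples per pattern.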
