Predicting temporal violations for parallel business cloud workflows

Workflow temporal violations, namely intermediate runtime delays, occur frequently and seriously affect the on-time completion of massive concurrent requests. Accurate prediction of temporal violations in cloud workflows is therefore critical, since the prediction serves as an essential reference for temporal violation prevention and handling strategies. Conventional studies focus mainly on the time delays of a single workflow activity or a single workflow instance and overlook how delays propagate among them. This is a serious problem because time delays can propagate in a cloud workflow system through resource sharing and the dependencies among workflow activities. This paper first proposes a novel temporal violation transmission model, inspired by an epidemic model, to capture the dynamics of time delay propagation. It then presents a novel temporal violation prediction strategy that estimates the number of temporal violations that may occur and determines the number of violations that must be handled to achieve the target service-level agreement, namely the on-time completion rate. To the best of our knowledge, this is the first attempt to predict cloud workflow temporal violations at the workflow build-time stage by analyzing the propagation of temporal violations. Experimental results demonstrate that the strategy makes highly accurate predictions and scales to a large batch of parallel workflows running in the cloud.
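
To make the epidemic analogy concrete, the sketch below iterates a textbook SIS (susceptible-infected-susceptible) model in which "infected" stands for workflow instances that are currently delayed. This is an illustration only and not the transmission model proposed in the paper: the function names, the propagation rate `beta`, the recovery rate `gamma`, and the unit time step are all assumptions introduced here. The second function shows the arithmetic of turning a predicted violation count into the number of violations that would need handling to meet a target on-time completion rate.

```python
import math


def simulate_sis(n_total, n_violating, beta, gamma, steps):
    """Iterate a classic SIS model with a unit time step (forward Euler).

    Hypothetical mapping, not taken from the paper:
      n_total     -- number of parallel workflow instances
      n_violating -- instances that are delayed at the start
      beta        -- assumed delay propagation rate per step
      gamma       -- assumed rate at which delayed instances recover
    Returns the number of delayed instances after each step.
    """
    delayed = float(n_violating)
    history = [delayed]
    for _ in range(steps):
        on_time = n_total - delayed
        newly_delayed = beta * on_time * delayed / n_total
        recovered = gamma * delayed
        # Clamp so the count stays within [0, n_total].
        delayed = min(max(delayed + newly_delayed - recovered, 0.0), float(n_total))
        history.append(delayed)
    return history


def violations_to_handle(predicted_violations, n_total, target_on_time_rate):
    """Violations that must be handled so the on-time rate meets the target."""
    allowed = n_total * (1.0 - target_on_time_rate)
    return max(0, math.ceil(predicted_violations - allowed))


if __name__ == "__main__":
    history = simulate_sis(n_total=1000, n_violating=10, beta=0.5, gamma=0.1, steps=200)
    print(round(history[-1]), violations_to_handle(history[-1], 1000, 0.5))
```

With `beta` greater than `gamma`, the continuous SIS model settles at the endemic level `n_total * (1 - gamma / beta)`, and the unit-step iteration above approaches the same value for the moderate rates used here. In practice the rates would have to be estimated from workflow execution data, which this sketch does not attempt.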
