Monere: Monitoring of Service Compositions for Failure Diagnosis

Service-oriented computing has enabled developers to build large, cross-domain service compositions more routinely. These systems inhabit complex, multi-tier operating environments that pose many challenges to their reliable operation. Unanticipated failures at runtime can be time-consuming to diagnose and may propagate across administrative boundaries. It has been argued that measuring readily available data about system operation can significantly improve the failure management capabilities of such systems. We have built Monere, an online monitoring system for cross-domain Web service compositions, and used it in a controlled experiment with human operators to determine how such an approach affects diagnosis times for system-level failures. This paper gives an overview of how Monere instruments relevant components across all layers of a service composition and exploits the structure of BPEL workflows to obtain structural cross-domain dependency graphs. Our experiments show a reduction in diagnosis time of more than 20%. However, further analysis shows that this benefit depends on certain conditions, which yields insights into promising directions for effectively supporting failure diagnosis in large Web service compositions.
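The abstract says Monere exploits the structure of BPEL workflows to obtain dependency graphs but does not describe the mechanism. The following is a minimal sketch of how such a graph might be derived, assuming only the standard WS-BPEL 2.0 `<invoke partnerLink="..." operation="...">` syntax. Each invoke becomes an edge from the process to a partner operation, and a hypothetical `hosts` map (standing in for deployment information that this sketch does not parse) adds an edge from that operation down to the machine or domain serving it. The names `dependency_graph`, `hosts`, and `SAMPLE` and the example hosts are illustrative assumptions; this is not Monere's actual algorithm.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# WS-BPEL 2.0 executable process namespace (OASIS specification).
BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"

# Hypothetical, stripped-down BPEL process used only for illustration.
SAMPLE = f"""
<process name="OrderProcess" xmlns="{BPEL_NS}">
  <sequence>
    <receive partnerLink="client" operation="order" createInstance="yes"/>
    <invoke partnerLink="inventory" operation="reserve"/>
    <flow>
      <invoke partnerLink="payment" operation="charge"/>
      <invoke partnerLink="shipping" operation="schedule"/>
    </flow>
    <reply partnerLink="client" operation="order"/>
  </sequence>
</process>
""".strip()


def dependency_graph(bpel_xml: str, hosts: dict[str, str]) -> dict[str, list[str]]:
    """Derive process -> operation -> host edges from <invoke> activities.

    `hosts` is a hypothetical partnerLink -> host/domain mapping; partner
    links missing from it get no infrastructure-level edge.
    """
    root = ET.fromstring(bpel_xml)
    process = root.get("name")
    graph: dict[str, set[str]] = defaultdict(set)
    # iter() walks nested activities (sequence, flow, ...) in document order.
    for invoke in root.iter(f"{{{BPEL_NS}}}invoke"):
        link = invoke.get("partnerLink")
        service = f"{link}:{invoke.get('operation')}"
        # Workflow-level edge: the process depends on this partner operation.
        graph[process].add(service)
        # Cross-layer edge: the partner operation depends on its host.
        if link in hosts:
            graph[service].add(hosts[link])
    return {node: sorted(deps) for node, deps in graph.items()}


if __name__ == "__main__":
    hosts = {"inventory": "warehouse.example.org", "payment": "bank.example.com"}
    for node, deps in dependency_graph(SAMPLE, hosts).items():
        print(node, "->", deps)
```

Only `<invoke>` is treated as a dependency here because it is where the process calls out to partner services; `<receive>` and `<reply>` describe the process's own interface to its client. A fuller treatment could also use the control structure (`<sequence>` versus `<flow>`) to distinguish ordered from concurrent dependencies.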