Mining Invariants from SaaS Application Logs (Practical Experience Report)

The increasing popularity of Software as a Service (SaaS) stresses the need for solutions that predict failures and avoid service interruptions, which invariably result in SLA violations and severe loss of revenue. A promising approach to continuously monitoring the correct functioning of the system is to check that its execution conforms to a set of invariants, i.e., properties that must hold when the system is deemed to run correctly. In this paper we propose a framework and a tool to automatically discover invariants from application logs and to detect their violations online. The framework has been applied to 9 months of log events from a real-world SaaS application. Results show that the proposed tool is able to automatically select 12 invariants, under a stringent goodness-of-fit criterion, out of more than 500 potential relationships. We also show the usefulness of our approach in detecting runtime issues from logs in the form of violations of the selected invariants; these correspond to silent errors that usually go unnoticed by system maintenance personnel, even though they could be symptoms of upcoming service failures.
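The two steps named in the abstract, discovering invariants from logs and detecting their violations, can be illustrated with a minimal sketch. The abstract does not state the form of the invariants or the goodness-of-fit statistic, so everything below is an assumption made for illustration: invariants are taken to be pairwise linear relationships between per-window event counts, the selection criterion is a threshold on R², and the names `fit_line`, `mine_invariants`, `is_violated`, the event types, and the thresholds are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b. Returns (a, b, r_squared) or None."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series carries no linear relationship
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return a, b, r2


def mine_invariants(counts, min_r2=0.95):
    """Keep every pair of event types whose linear fit reaches min_r2.

    counts maps an event type to its list of per-window counts
    (all lists have the same length). min_r2 is an assumed threshold.
    """
    invariants = []
    for x_name, y_name in combinations(sorted(counts), 2):
        fit = fit_line(counts[x_name], counts[y_name])
        if fit is not None and fit[2] >= min_r2:
            a, b, r2 = fit
            invariants.append(
                {"x": x_name, "y": y_name, "a": a, "b": b, "r2": r2}
            )
    return invariants


def is_violated(invariant, window, tolerance):
    """True if the window's counts deviate from the fit by more than tolerance."""
    predicted = invariant["a"] * window[invariant["x"]] + invariant["b"]
    return abs(window[invariant["y"]] - predicted) > tolerance


# Hypothetical per-window event counts extracted from a log.
history = {
    "login": [10, 20, 30, 40, 50],
    "session_open": [10, 21, 29, 41, 50],
    "error": [3, 1, 4, 1, 5],
}
selected = mine_invariants(history)

# Online check of a new window against each selected invariant.
new_window = {"login": 40, "session_open": 5, "error": 2}
alerts = [inv for inv in selected if is_violated(inv, new_window, tolerance=5)]
```

In this sketch only the pairs that fit well on historical windows are retained, which mirrors the idea of selecting a small set of invariants from many candidate relationships; a new window that breaks a retained relationship is reported even if no explicit failure was logged.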
