A Logging Approach for Effective Dependability Evaluation of Complex Systems

The dependability evaluation and management of complex systems is often based on field data collected from event logs. Nevertheless, key decisions about how logs are produced and managed are usually left to the late stages of development, which leads to heterogeneous, inaccurate, and redundant logs and, in turn, reduces trust in the logs themselves. This paper proposes to enrich traditional logging with a set of rules, to be followed at design time, that are specifically conceived to improve the quality of logged failure data and to ease the coalescence of redundant or equivalent data. A tool for processing our rule-based logs has been developed to show the feasibility of the approach. The tool is applied to a real-world case study to evaluate the effectiveness of the approach compared with traditional logging.
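To make the idea of coalescing redundant or equivalent log data concrete, the sketch below assumes that design-time rules force every entry to carry a timestamp, the emitting component, and an event type drawn from a fixed vocabulary. The `LogEntry` fields, the `coalesce` function, and the `window` parameter are all hypothetical illustrations, and the grouping uses a classic time-window (tupling) heuristic, not necessarily the algorithm implemented by the authors' tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical rule-based fields, assumed to be fixed at design time.
    timestamp: float  # seconds since an arbitrary epoch
    component: str    # identifier of the emitting component
    event: str        # event type from a closed vocabulary, e.g. "SERVICE_ERROR"


def coalesce(entries: List[LogEntry], window: float) -> List[List[LogEntry]]:
    """Group entries that report the same (component, event) pair and arrive
    at most `window` seconds after the previous entry of that group.

    A simple time-window heuristic (assumption): entries outside the window
    open a new group, i.e. they are treated as a distinct failure report.
    """
    open_groups: Dict[Tuple[str, str], List[LogEntry]] = {}
    groups: List[List[LogEntry]] = []
    for e in sorted(entries, key=lambda x: x.timestamp):
        key = (e.component, e.event)
        group = open_groups.get(key)
        if group is not None and e.timestamp - group[-1].timestamp <= window:
            # Redundant or equivalent report: merge it into the open group.
            group.append(e)
        else:
            # First report of this kind, or too far from the last one.
            group = [e]
            groups.append(group)
            open_groups[key] = group
    return groups


if __name__ == "__main__":
    log = [
        LogEntry(0.0, "db", "SERVICE_ERROR"),
        LogEntry(2.0, "db", "SERVICE_ERROR"),
        LogEntry(3.0, "web", "SERVICE_ERROR"),
        LogEntry(30.0, "db", "SERVICE_ERROR"),
    ]
    print([len(g) for g in coalesce(log, window=5.0)])  # [2, 1, 1]
```

Because the rules guarantee a uniform component and event field, equivalent entries can be matched by simple key equality rather than by parsing free-text messages; the choice of `window` remains a tuning parameter that trades merging truly redundant reports against collapsing distinct failures.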
