Industry Practices and Event Logging: Assessment of a Critical Software Development Process

Practitioners widely recognize the importance of event logging for a variety of tasks, such as accounting, system measurement, and troubleshooting. Nevertheless, despite the importance of the tasks that rely on logs collected under real workload conditions, event logging lacks systematic design and implementation practices: the implementation of the logging mechanism relies strongly on human expertise. This paper presents a measurement study of event logging practices in a critical industrial domain. We assess a software development process at Selex ES, a leading Finmeccanica company in electronic and information solutions for critical systems. Our study combines source code analysis, inspection of around 2.3 million log entries, and direct feedback from the development team to gain process-wide insights ranging from programming practices to logging objectives and issues impacting log analysis. The findings of our study were extremely valuable for prioritizing event logging reengineering tasks at Selex ES.
