LogGC: garbage collecting audit log

System-level audit logs capture the interactions between applications and the runtime environment. They are highly valuable for forensic analysis that aims to identify the root cause of an attack, which may have occurred long ago, or to determine the ramifications of an attack in order to recover from it. A key challenge of audit log-based forensics in practice is the sheer size of the log files generated, which can grow at a rate of gigabytes per day. In this paper, we propose LogGC, an audit logging system with garbage collection (GC) capability. We identify and overcome the unique challenges of garbage collection in the context of computer forensic analysis, which make LogGC different from traditional memory GC techniques. We also develop techniques that instrument user applications at a small number of selected places to emit additional system events, which substantially reduces the false dependences between system events and improves GC effectiveness. Our results show that LogGC can reduce audit log size by a factor of 14 for regular user systems and a factor of 37 for server systems, without affecting the accuracy of forensic analysis.
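To make the idea of garbage collecting an audit log concrete, the following is a minimal sketch of reachability-based pruning. It is not LogGC's actual algorithm, which the abstract does not specify. It assumes a simplified log in which each event records an information flow from one object to another, and a hypothetical set of "live" objects (for example, files that still exist). The names `Event` and `collect_garbage` and the sample object labels are illustrative assumptions.

```python
from typing import Iterable, List, NamedTuple, Set


class Event(NamedTuple):
    """One audit record: information flowed from `src` to `dst`."""
    src: str
    dst: str


def collect_garbage(events: List[Event], live: Iterable[str]) -> List[Event]:
    """Keep only events that can still influence a live object.

    The log is scanned from newest to oldest. An event is kept when its
    destination is live or has already been seen feeding a kept event,
    which respects the time order of the flows.
    """
    useful: Set[str] = set(live)
    kept: List[Event] = []
    for event in reversed(events):
        if event.dst in useful:
            kept.append(event)
            useful.add(event.src)  # the source now matters to a live object
    kept.reverse()  # restore chronological order
    return kept


# Hypothetical log: the first three events lead to a file that still exists;
# the last two only touch a temporary file and a process that left no trace.
log = [
    Event("proc:wget", "file:/tmp/a.tmp"),
    Event("file:/tmp/a.tmp", "proc:sort"),
    Event("proc:sort", "file:/home/u/out.txt"),
    Event("proc:gcc", "file:/tmp/cc1.s"),
    Event("file:/tmp/cc1.s", "proc:as"),
]
pruned = collect_garbage(log, live={"file:/home/u/out.txt"})
```

In this sketch the two events involving `file:/tmp/cc1.s` are dropped because nothing live depends on them. Coarse-grained events can make unrelated objects appear connected, which is the kind of false dependence the abstract says application instrumentation is meant to reduce.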
