GenLog: Accurate Log Template Discovery for Stripped X86 Binaries

Log analysis plays an important role in diagnosing computer failures. As logs grow ever larger and more complex, analyzing them manually has become cumbersome. For this reason, recent research has focused on techniques for automatically analyzing large log files. However, log messages are text with program-specific formats, and it is very challenging for automated analysis to understand their semantic meaning. Current state-of-the-art approaches depend on the quality of the observed log messages or on the source code that produces them. In this paper, we propose GenLog, a method that extracts log templates from stripped executables (neither source code nor debugging information needs to be available). GenLog finds all log-related functions in a binary through a combined bottom-up and top-down slicing method, reconstructs the memory buffers in which log messages are constructed, and identifies the components of log messages using data flow analysis and taint propagation analysis. GenLog can analyze large binaries and is suitable for commercial off-the-shelf (COTS) software and dynamic libraries. We evaluated GenLog on four x86 executables, one of which is Nginx. The experiments show that GenLog identifies the template of log messages in test log files with a precision of 99.9%.
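
To make the notion of a log template concrete: once the constant parts of a log message have been recovered from a binary, each variable component can be treated as a wildcard, and raw log lines can be matched against the recovered templates. The sketch below assumes templates are printf-style format strings and handles only a common subset of conversions. The functions `template_to_regex` and `match_template` and the entries in `EXAMPLE_TEMPLATES` are hypothetical illustrations, not GenLog's own matching procedure or real Nginx log formats.

```python
from __future__ import annotations

import re

# Matches "%%" or a printf-style conversion (flags, width, precision, length);
# group 1 is the conversion character. Only a common subset is supported here.
_SPEC = re.compile(r"%(?:%|[-+ #0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z)?([diuxXsfc]))")

# Regex fragment for each variable component (non-capturing inside).
_FIELD = {
    "d": r"-?\d+", "i": r"-?\d+", "u": r"\d+",
    "x": r"[0-9a-f]+", "X": r"[0-9A-F]+",
    "f": r"-?\d+(?:\.\d+)?", "c": r".", "s": r".*?",
}

# Hypothetical templates, as they might be recovered from a binary.
EXAMPLE_TEMPLATES = [
    "connect() to %s failed (%d: %s)",
    "worker process %d exited with code %d",
]


def template_to_regex(fmt: str) -> re.Pattern:
    """Turn a format string into a regex: constant text is escaped and each
    conversion becomes one capture group (a variable component)."""
    parts, pos = [], 0
    for m in _SPEC.finditer(fmt):
        parts.append(re.escape(fmt[pos:m.start()]))  # constant text
        if m.group(0) == "%%":
            parts.append("%")  # literal percent sign, not a field
        else:
            parts.append("(" + _FIELD[m.group(1)] + ")")
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))  # trailing constant text
    return re.compile("".join(parts))


def match_template(line: str, templates: list[str]) -> tuple[int, list[str]] | None:
    """Return (template index, extracted fields) for the first template that
    matches the whole line, or None if no template matches."""
    for i, fmt in enumerate(templates):
        m = template_to_regex(fmt).fullmatch(line)
        if m:
            return i, list(m.groups())
    return None


if __name__ == "__main__":
    print(match_template("worker process 4242 exited with code 0", EXAMPLE_TEMPLATES))
```

This sketch uses first-match-wins and a lazy `.*?` for `%s`; when several templates could match the same line, a real tool would need a tie-breaking rule, for example preferring the template with the most constant text.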
