Position-based automatic reverse engineering of network protocols

Abstract Automatic protocol reverse engineering is a process of extracting protocol message formats and protocol state machine without access to the specification of target protocol. Protocol reverse engineering is useful for addressing many problems of network management and security, such as network management, honey-pot systems, intrusion detection, Botnet detection and prevention, and so on. Currently, protocol reverse engineering is mainly a manual and painstaking process which is time-consuming and error-prone. In this paper, we present a novel approach for automatic reverse engineering application-layer network protocols. We extract protocol keywords from network traces based on their support rates and variances of positions, reconstruct message formats, and infer protocol state machines. We implement our approach in a prototype system called AutoReEngine and evaluate it over four text-based protocols (HTTP, POP3, SMTP and FTP) and two binary protocols (DNS and NetBIOS). The results show that our AutoReEngine outperforms the existing algorithms.

[1]  Helen J. Wang,et al.  Tupni: automatic reverse engineering of input formats , 2008, CCS.

[2]  Niels Provos,et al.  A Virtual Honeypot Framework , 2004, USENIX Security Symposium.

[3]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[4]  Xuxian Jiang,et al.  Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution , 2008, NDSS.

[5]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[6]  Helen J. Wang,et al.  Discoverer: Automatic Protocol Reverse Engineering from Network Traces , 2007, USENIX Security Symposium.

[7]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[8]  Guofei Gu,et al.  A Taxonomy of Botnet Structures , 2007, ACSAC.

[9]  Dawn Xiaodong Song,et al.  Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering , 2009, CCS.

[10]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[11]  Helen J. Wang,et al.  Shield: vulnerability-driven network filters for preventing known vulnerability exploits , 2004, SIGCOMM 2004.

[12]  Wanlei Zhou,et al.  Generating regular expression signatures for network traffic classification in trusted network management , 2012, J. Netw. Comput. Appl..

[13]  Marc Dacier,et al.  ScriptGen: an automated script generation tool for Honeyd , 2005, 21st Annual Computer Security Applications Conference (ACSAC'05).

[14]  Stefan Savage,et al.  Unexpected means of protocol inference , 2006, IMC '06.

[15]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[16]  Nasser Yazdani,et al.  Mutual information-based feature selection for intrusion detection systems , 2011, J. Netw. Comput. Appl..

[17]  Christopher Krügel,et al.  Prospex: Protocol Specification Extraction , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[18]  Randy H. Katz,et al.  Protocol-Independent Adaptive Replay of Application Dialog , 2006, NDSS.

[19]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.