Protocol Formats Reverse Engineering Based on Association Rules in Wireless Environment

With the wide deployment of wireless networks, attackers may exploit Wi-Fi vulnerabilities to transfer data covertly or use covert communication channels to spread malicious code. Protocol format reverse engineering can be used to detect such attacks. However, previous work has focused on application-layer protocol analysis and can hardly work in scenarios where the captured data is available only in binary form, because such data lacks semantic information. In this paper, we propose a novel protocol format reverse engineering framework that uses the association rules of feature sequences to identify unknown protocols in captured binary data. We first convert the captured binary data into a bit stream and segment it into frames. An improved AC (Aho-Corasick) algorithm is then used to analyze the binary sequences. Next, we extract the feature sequences and analyze their association rules to detect potential unknown protocols. The experimental results show that our framework can identify 100% of ARP packets and 98% of ICMP packets in the captured binary data.
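The abstract does not describe how its AC algorithm was improved. The sketch below therefore shows only the standard Aho-Corasick automaton, run over a bit string whose symbols are '0' and '1'. It is a minimal illustration built on several assumptions. The helper names (`build_automaton`, `search`, `bytes_to_bits`) are hypothetical. The two patterns are the IPv4 (0x0800) and ARP (0x0806) EtherType values, chosen only as example feature sequences. The sample frame is an LLC/SNAP header followed by the ARP EtherType, as it appears in the body of an 802.11 data frame.

```python
from collections import deque


def bytes_to_bits(data: bytes) -> str:
    """Render captured bytes as a bit string, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def build_automaton(patterns):
    """Build a standard Aho-Corasick automaton: trie edges, failure links, outputs."""
    goto = [{}]     # goto[state][symbol] -> child state
    fail = [0]      # failure link of each state (0 is the root)
    output = [[]]   # patterns recognised on reaching each state
    for pattern in patterns:
        state = 0
        for symbol in pattern:
            if symbol not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][symbol] = len(goto) - 1
            state = goto[state][symbol]
        output[state].append(pattern)
    # Breadth-first order guarantees a state's failure target is finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for symbol, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and symbol not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(symbol, 0)
            # Inherit the patterns that end at the failure target (proper suffixes).
            output[child] = output[child] + output[fail[child]]
    return goto, fail, output


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, output = automaton
    state, hits = 0, []
    for i, symbol in enumerate(text):
        while state and symbol not in goto[state]:
            state = fail[state]
        state = goto[state].get(symbol, 0)
        for pattern in output[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


# Illustrative (hypothetical) feature sequences: IPv4 and ARP EtherTypes as 16-bit strings.
IPV4_ETHERTYPE = f"{0x0800:016b}"
ARP_ETHERTYPE = f"{0x0806:016b}"
automaton = build_automaton([IPV4_ETHERTYPE, ARP_ETHERTYPE])

# LLC/SNAP header (AA AA 03 00 00 00) followed by the ARP EtherType.
frame = bytes.fromhex("aaaa030000000806")
hits = search(bytes_to_bits(frame), automaton)
print(hits)  # [(48, '0000100000000110')]
```

The failure links let the scan continue without backtracking, so the automaton finds every pattern in a single pass over the bit stream. This makes multi-pattern matching attractive when many candidate feature sequences must be checked against long captures.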
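Association-rule mining in the support/confidence sense can be illustrated with a small Apriori-style sketch. Here each transaction is assumed to be the set of feature sequences matched in one frame. The feature labels, the thresholds (`min_support=0.5`, `min_confidence=0.9`) and the function names are hypothetical. The abstract does not state which mining algorithm or thresholds the authors used. A rule X -> Y is kept when support(X ∪ Y) / support(X) reaches the confidence threshold.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {itemset: support} for frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    support = {}
    candidates = {frozenset([item]) for t in sets for item in t}
    k = 1
    while candidates:
        # Support = fraction of transactions that contain the candidate itemset.
        frequent = {}
        for c in candidates:
            s = sum(1 for t in sets if c <= t) / n
            if s >= min_support:
                frequent[c] = s
        support.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return support


def association_rules(support, min_confidence):
    """Return (lhs, rhs, support, confidence) for every rule meeting min_confidence."""
    rules = []
    for itemset, sup in support.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so support[lhs] exists.
                confidence = sup / support[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules


# Hypothetical per-frame feature sets (labels are illustrative only).
transactions = [
    {"llc_snap", "ethertype_0806", "arp_hwtype_1"},
    {"llc_snap", "ethertype_0806", "arp_hwtype_1"},
    {"llc_snap", "ethertype_0800", "ip_proto_1"},
    {"llc_snap", "ethertype_0800", "ip_proto_1"},
]
support = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(support, min_confidence=0.9)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

With this toy data, a rule such as `ethertype_0806 -> arp_hwtype_1` has confidence 1.0. By contrast, `llc_snap` alone predicts neither protocol, because it appears in every frame.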
