Automatic Network Protocol Automaton Extraction

Protocol reverse engineering is the process of reconstructing the protocol context of the communication sessions produced by an implementation. It involves translating a sequence of packets into protocol messages, grouping the messages into sessions, and modeling the transitions of the protocol state machine. It is well known to be invaluable for many network security applications, including intrusion prevention and detection, traffic normalization, and penetration testing. However, current practice in deriving protocol specifications is either mostly manual or limited to automatically reverse engineering the message format, leaving the protocol state machine unrecovered. Although regular expressions offer superior expressive power and flexibility, the regular expressions that describe application protocols are written manually and require a thorough understanding of the protocol itself. At present there is no effective method for automatically classifying, recognizing, and controlling both known applications and unknown applications that may appear in the future. This paper presents a novel approach to modeling network application specifications. The approach completes automatic protocol reverse engineering by inferring the protocol state machine, and then translates the resulting finite state machines (FSMs) into corresponding regular expressions to enrich and update the pattern database. It uses grammatical inference and is motivated by the observation that an implementation of a protocol is inherently a state transition process, so a state machine captures its essence exactly. Its main significance is that it describes a variety of stateful protocols, both known and unknown, with a common method: modeling the protocol's state transitions. The approach has been implemented in a system and evaluated on real-world implementations of three protocols (HTTP, SMTP, and FTP), and the extracted protocol models are compared with those of other recent systems, such as l7-filter.
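The abstract does not specify the inference algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes sessions have already been reduced to sequences of message-type tokens, builds a prefix tree acceptor (a common starting point for grammatical inference), and translates that acyclic automaton into a regular expression. The function names `build_pta` and `pta_to_regex`, the `;` token terminator, and the sample SMTP-like sessions are all hypothetical choices made for illustration.

```python
import re

def build_pta(sessions):
    """Build a prefix tree acceptor: one state per distinct session prefix."""
    transitions = {0: {}}  # state -> {symbol: next_state}
    accepting = set()      # states at which an observed session ended
    for session in sessions:
        state = 0
        for symbol in session:
            if symbol not in transitions[state]:
                new_state = len(transitions)
                transitions[state][symbol] = new_state
                transitions[new_state] = {}
            state = transitions[state][symbol]
        accepting.add(state)
    return transitions, accepting

def pta_to_regex(transitions, accepting, state=0):
    """Translate the acyclic automaton into an equivalent regular expression.

    Each token is assumed to be written as the token text followed by ';'.
    """
    branches = [
        re.escape(symbol) + ";" + pta_to_regex(transitions, accepting, nxt)
        for symbol, nxt in sorted(transitions[state].items())
    ]
    if not branches:
        return ""
    body = "(?:" + "|".join(branches) + ")"
    # An accepting state with outgoing edges may stop here or continue.
    return body + "?" if state in accepting else body

# Hypothetical message-type sequences standing in for observed sessions.
sessions = [
    ["HELO", "MAIL", "RCPT", "DATA", "QUIT"],
    ["HELO", "QUIT"],
    ["EHLO", "MAIL", "RCPT", "RCPT", "DATA", "QUIT"],
]
transitions, accepting = build_pta(sessions)
pattern = re.compile(pta_to_regex(transitions, accepting))
```

A prefix tree acceptor matches only the sessions it was built from. A grammatical inference algorithm would go further and merge compatible states so that the model generalizes, for example to any number of repeated `RCPT` messages; that step is omitted here.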
