Inferring Protocol State Machine from Network Traces: A Probabilistic Approach

Application-level protocol specifications (i.e., how a protocol should behave) are helpful for network security management, including intrusion detection and intrusion prevention. The knowledge of protocol specifications is also an effective way of detecting malicious code. However, current methods for obtaining unknown protocol specifications highly rely on manual operations, such as reverse engineering which is a major instrument for extracting application-level specifications but is time-consuming and laborious. Several works have focus their attentions on extracting protocol messages from real-world trace automatically, and leave protocol state machine unsolved. In this paper, we propose Veritas, a system that can automatically infer protocol state machine from real-world network traces. The main feature of Veritas is that it has no prior knowledge of protocol specifications, and our technique is based on the statistical analysis on the protocol formats. We also formally define a new model - probabilistic protocol state machine (P-PSM), which is a probabilistic generalization of protocol state machine. In our experiments, we evaluate a text-based protocol and two binary-based protocols to test the performance of Veritas. Our results show that the protocol state machines that Veritas infers can accurately represent 92% of the protocol flows on average. Our system is general and suitable for both text-based and binary-based protocols. Veritas can also be employed as an auxiliary tool for analyzing unknown behaviors in real-world applications.

[1]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[2]  P. Jaccard THE DISTRIBUTION OF THE FLORA IN THE ALPINE ZONE.1 , 1912 .

[3]  Zhenkai Liang,et al.  Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation , 2007, USENIX Security Symposium.

[4]  Thomas W. Reps,et al.  Extracting Output Formats from Executables , 2006, 2006 13th Working Conference on Reverse Engineering.

[5]  Francisco Casacuberta,et al.  Probabilistic finite-state machines - part I , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Stefan Savage,et al.  Unexpected means of protocol inference , 2006, IMC '06.

[7]  Niccolo Cascarano,et al.  GT: picking up the truth from the ground for internet traffic , 2009, CCRV.

[8]  田端 利宏,et al.  Network and Distributed System Security Symposiumにおける研究動向の調査 , 2004 .

[9]  Marc Dacier,et al.  ScriptGen: an automated script generation tool for Honeyd , 2005, 21st Annual Computer Security Applications Conference (ACSAC'05).

[10]  Helen J. Wang,et al.  Generic Application-Level Protocol Analyzer and its Language , 2007, NDSS.

[11]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[12]  J. Dunn Well-Separated Clusters and Optimal Fuzzy Partitions , 1974 .

[13]  Patrick Haffner,et al.  ACAS: automated construction of application signatures , 2005, MineNet '05.

[14]  M. Kendall,et al.  Kendall's advanced theory of statistics , 1995 .

[15]  Christopher Krügel,et al.  Prospex: Protocol Specification Extraction , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[16]  Dawn Xiaodong Song,et al.  Fig: Automatic Fingerprint Generation , 2007, NDSS.

[17]  Anja Feldmann,et al.  Dynamic Application-Layer Protocol Analysis for Network Intrusion Detection , 2006, USENIX Security Symposium.

[18]  Helen J. Wang,et al.  Discoverer: Automatic Protocol Reverse Engineering from Network Traces , 2007, USENIX Security Symposium.

[19]  Vern Paxson,et al.  Semi-automated discovery of application session structure , 2006, IMC '06.