Using Segment-Based Alignment to Extract Packet Structures from Network Traces

Many security applications, from understanding unfamiliar protocols to fuzz-testing and guarding against potential attacks, rely on analysing network protocols. In many situations we have access to neither a specification nor an implementation of the protocol, and must instead work from raw data "sniffed" from the network. In such cases, one of the key challenges is to discern the underlying packet structures from the raw data. This task is commonly carried out by using alignment algorithms to identify commonalities (e.g. field delimiters) between packets, and most approaches have used variants of the Needleman-Wunsch algorithm to perform byte-wise alignment. However, byte-wise alignment can produce poor results when messages are heterogeneous, or when protocol fields are separated by long variable-length fields. In this paper, we present an alternative alignment algorithm known as segment-based alignment. We show that this technique can produce accurate results on traces from several common protocols, and that its results tend to be more intuitive than those produced by state-of-the-art techniques.
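
For readers unfamiliar with the byte-wise baseline, the following is a minimal Python sketch of Needleman-Wunsch global alignment applied to two raw messages. It is an illustration only: the scoring scheme (+1 per match, -1 per mismatch, -1 per gap position, i.e. a linear gap penalty), the function names `needleman_wunsch` and `render`, and the two sample request lines are assumptions chosen for the example, not the parameters used in this paper or by the tools it compares against.

```python
from typing import List, Optional, Tuple

# An aligned message is a list of byte values, with None marking a gap column.
Aligned = List[Optional[int]]


def needleman_wunsch(a: bytes, b: bytes, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, Aligned, Aligned]:
    """Globally align two messages byte by byte (illustrative linear gap penalty)."""
    n, m = len(a), len(b)

    def pair(x: int, y: int) -> int:
        # Substitution score for placing byte x opposite byte y.
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: Aligned = []
    out_b: Aligned = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    out_a.reverse()
    out_b.reverse()
    return score[n][m], out_a, out_b


def render(aligned: Aligned) -> str:
    """Show an aligned message as text, using '-' for gap columns."""
    return "".join("-" if c is None else chr(c) for c in aligned)


if __name__ == "__main__":
    # Two hypothetical HTTP request lines with a variable-length path field.
    total, row_a, row_b = needleman_wunsch(b"GET /index.html HTTP/1.1",
                                           b"GET /a HTTP/1.1")
    print(total)  # 4
    print(render(row_a))
    print(render(row_b))
```

In this example the fixed tokens ("GET /" and " HTTP/1.1") line up, but the shorter message has to absorb nine gap columns opposite the longer path. With only two messages this is harmless; across many heterogeneous messages with long variable fields, such gap placement is one way byte-wise alignment can blur field boundaries, which is the kind of weakness the abstract refers to.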
