NetPlier: Probabilistic Network Protocol Reverse Engineering from Message Traces

Network protocol reverse engineering is an important challenge with many security applications. A popular class of methods leverages network message traces. These methods rely on pairwise sequence alignment and/or tokenization, and they have various limitations, such as difficulty handling a large number of messages and dealing with inherent uncertainty. In this paper, we propose a novel probabilistic method for network-trace-based protocol reverse engineering. It first uses multiple sequence alignment to align all messages and then reduces the problem to identifying the keyword field from the set of aligned fields; the keyword field determines the type of a message. The identification is probabilistic: a random variable for each field indicates the likelihood that the field is the true keyword. A joint distribution is constructed over these random variables and the observations of the messages. Probabilistic inference is then performed to determine the most likely keyword field, which allows messages to be clustered by their true types and enables the recovery of message formats and the state machine. Our evaluation on 10 protocols shows that our technique substantially outperforms the state of the art, and our case studies show the unique advantages of our technique in IoT protocol reverse engineering and malware analysis.
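The central step, choosing the keyword field by probabilistic inference over the aligned fields, can be illustrated with a deliberately simplified sketch. It assumes the messages have already been aligned into equal-length tuples of fields, with `None` marking gaps inserted by the alignment. It places a uniform prior over which field is the keyword and treats three hypothetical observations (presence, per-cluster format consistency, and parsimony) as independent likelihoods. These observations, the helper names (`observation_scores`, `keyword_posterior`, `cluster_by`), and the toy trace are illustrative assumptions, not NetPlier's actual constraints; exact normalization over the candidate fields stands in for the richer joint model described above.

```python
import math
from collections import Counter, defaultdict

GAP = None   # assumed marker for a gap inserted by multiple sequence alignment
EPS = 1e-6   # floor so no single observation drives a likelihood to exactly zero


def gap_pattern(msg):
    """Which aligned fields are gaps in this message (a crude stand-in for 'format')."""
    return tuple(v is GAP for v in msg)


def cluster_by(messages, field):
    """Group aligned messages by the value they carry in `field`."""
    clusters = defaultdict(list)
    for msg in messages:
        clusters[msg[field]].append(msg)
    return dict(clusters)


def observation_scores(messages, field):
    """Hypothetical observations, each read as P(obs | `field` is the keyword)."""
    n = len(messages)
    clusters = cluster_by(messages, field)
    # 1. Presence: a keyword should appear (not be a gap) in every message.
    presence = sum(m[field] is not GAP for m in messages) / n
    # 2. Format consistency: messages of one type should share one format, so
    #    count messages that match their cluster's most common gap pattern.
    consistent = sum(
        Counter(gap_pattern(m) for m in group).most_common(1)[0][1]
        for group in clusters.values()
    )
    consistency = consistent / n
    # 3. Parsimony: penalize fields (IDs, payloads) that put every message
    #    into its own cluster.
    parsimony = 1 - (len(clusters) - 1) / n
    return [presence, consistency, parsimony]


def keyword_posterior(messages):
    """P(field i is the keyword | observations), assuming a uniform prior,
    exactly one keyword field, and independent observations."""
    num_fields = len(messages[0])
    log_scores = [
        sum(math.log(max(s, EPS)) for s in observation_scores(messages, f))
        for f in range(num_fields)
    ]
    top = max(log_scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy aligned trace (hypothetical): fields = [command, session id, key, value].
messages = [
    ("GET", "s1", "k1", GAP),
    ("GET", "s2", "k2", GAP),
    ("SET", "s3", "k3", "v3"),
    ("SET", "s4", "k4", "v4"),
    ("DEL", "s5", "k5", GAP),
]
posterior = keyword_posterior(messages)
keyword = max(range(len(posterior)), key=posterior.__getitem__)
types = cluster_by(messages, keyword)
print(keyword, sorted(types))  # prints: 0 ['DEL', 'GET', 'SET']
```

Because exactly one field is assumed to be the keyword, the posterior reduces to a normalization over the candidate fields. The parsimony term is what keeps unique-per-message fields such as session IDs from winning on format consistency alone, and the presence term keeps an optional field from being chosen just because its gaps coincide with a message format.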
