FlowSifter: A counting automata approach to layer 7 field extraction for deep flow inspection

In this paper, we introduce FlowSifter, a systematic framework for online application protocol field extraction. FlowSifter introduces a new grammar model, Counting Regular Grammars (CRGs), and a corresponding automata model, Counting Automata (CAs). The CRG and CA models add counters with update functions and transition guards to regular grammars and finite state automata, respectively. These additions give CRGs and CAs the ability to parse and extract fields from context-sensitive application protocols, and they also enable fast, stackless approximate parsing of recursive structures. These new models enable FlowSifter to generate optimized Layer 7 field extractors from simple extraction specifications. In our experiments, we compare FlowSifter against BinPAC and UltraPAC, the freely available state-of-the-art field extractors. Our experiments show that, compared to UltraPAC parsers, FlowSifter extractors run 84% faster and use only 12% as much memory.
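The abstract describes counting automata only at a high level. As a rough illustration, the following is a minimal Python sketch, under our own assumptions, of a deterministic automaton whose transitions carry a counter guard and a counter update function. The `Transition` record, the `run` driver, and the two example machines (`LENGTH_PREFIXED`, `NESTED`) are hypothetical and are not FlowSifter's actual CA representation, specification language, or API. The first machine extracts length-prefixed fields such as `5:hello`, a context-sensitive pattern that a plain finite automaton cannot handle for unbounded lengths. The second replaces a stack with a single depth counter to pull out top-level `{...}` groups; with one bracket kind this is exact, but a lone counter cannot check that each closer matches its opener, so with several bracket kinds it only approximates the grammar.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Counters = Dict[str, int]

@dataclass
class Transition:
    # Hypothetical transition record (illustration only).
    src: str
    on_byte: Callable[[int], bool]           # predicate on the input byte
    guard: Callable[[Counters], bool]        # predicate on counter values
    update: Callable[[Counters, int], None]  # counter update function
    dst: str
    emit: bool = False    # append this byte to the field being extracted
    close: bool = False   # finish the current field after this step

def run(start: str, counters: List[str], ts: List[Transition], data: bytes) -> List[bytes]:
    """Run a deterministic counting automaton and return the extracted fields."""
    state, c = start, {name: 0 for name in counters}
    fields: List[bytes] = []
    cur = bytearray()
    for b in data:
        # Take the first transition whose byte predicate and guard both hold.
        for t in ts:
            if t.src == state and t.on_byte(b) and t.guard(c):
                break
        else:
            raise ValueError(f"no transition from {state!r} on byte {b!r}")
        t.update(c, b)
        if t.emit:
            cur.append(b)
        if t.close:
            fields.append(bytes(cur))
            cur.clear()
        state = t.dst
    return fields

COLON, LBRACE, RBRACE = ord(":"), ord("{"), ord("}")

# Length-prefixed fields "<decimal length>:<bytes>", using counter n.
LENGTH_PREFIXED = [
    Transition("len", lambda b: 48 <= b <= 57, lambda c: True,
               lambda c, b: c.update(n=c["n"] * 10 + b - 48), "len"),
    # ':' with n == 0 yields an empty field; with n > 0 it enters the body.
    Transition("len", lambda b: b == COLON, lambda c: c["n"] == 0,
               lambda c, b: None, "len", close=True),
    Transition("len", lambda b: b == COLON, lambda c: c["n"] > 0,
               lambda c, b: None, "body"),
    # Copy exactly n bytes, decrementing n; the guard n == 1 ends the field.
    Transition("body", lambda b: True, lambda c: c["n"] > 1,
               lambda c, b: c.update(n=c["n"] - 1), "body", emit=True),
    Transition("body", lambda b: True, lambda c: c["n"] == 1,
               lambda c, b: c.update(n=0), "len", emit=True, close=True),
]

# Top-level {...} groups, tracking nesting depth in counter d instead of a stack.
NESTED = [
    Transition("out", lambda b: b == LBRACE, lambda c: True,
               lambda c, b: c.update(d=1), "in"),
    Transition("out", lambda b: b != LBRACE, lambda c: True,
               lambda c, b: None, "out"),
    Transition("in", lambda b: b == LBRACE, lambda c: True,
               lambda c, b: c.update(d=c["d"] + 1), "in", emit=True),
    Transition("in", lambda b: b == RBRACE, lambda c: c["d"] > 1,
               lambda c, b: c.update(d=c["d"] - 1), "in", emit=True),
    Transition("in", lambda b: b == RBRACE, lambda c: c["d"] == 1,
               lambda c, b: c.update(d=0), "out", close=True),
    Transition("in", lambda b: b not in (LBRACE, RBRACE), lambda c: True,
               lambda c, b: None, "in", emit=True),
]

print(run("len", ["n"], LENGTH_PREFIXED, b"5:hello3:abc"))  # [b'hello', b'abc']
print(run("out", ["d"], NESTED, b"{a{b}c}x{d}"))            # [b'a{b}c', b'd']
```

The guards (`n > 1`, `n == 1`, `d == 1`) are what let a fixed, finite set of states decide where a field ends. A pure finite automaton would need one state per possible length or depth.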
