Towards a Fast Regular Expression Matching Method Over Compressed Traffic

Deep Packet Inspection (DPI) has become a critical component of network traffic detection applications, and regular expression matching, the core technique of DPI, is widely used for comprehensive traffic analysis. However, web services tend to compress their traffic to reduce the amount of data transmitted, which makes it difficult for regular expression matching to achieve wire-speed processing. In this paper, we propose Twins, a fast regular expression matching method over compressed traffic that leverages the returned states encoding in the compression to skip bytes that would otherwise be scanned. In our evaluation, Twins skips about 90% of the compressed data and achieves 1.5 Gbps throughput, a 2.7× to 3.4× performance improvement over the state-of-the-art work.
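The abstract does not spell out how Twins encodes or reuses states, so the following is only a minimal sketch of the general idea behind state-based skipping over LZ77-style back-references, not the Twins algorithm itself. It assumes the compressed stream has already been parsed into literal bytes and `(distance, length)` pointers, and it uses a simple KMP-style DFA for a single literal pattern as a stand-in for a regular expression DFA. The names `build_kmp_dfa`, `scan_tokens`, and the token format are hypothetical. The observation it relies on is that a DFA is deterministic: if the state before a copied byte equals the state recorded before the corresponding source byte, every later state in the copy can be read from the recorded states instead of being recomputed.

```python
from typing import List, Tuple, Union

# Hypothetical token format: an int is a literal byte,
# a tuple is an LZ77-style (distance, length) back-reference.
Token = Union[int, Tuple[int, int]]


def build_kmp_dfa(pattern: bytes) -> List[List[int]]:
    """Build a DFA for one non-empty literal pattern (stand-in for a regex DFA).

    State j means "the last j bytes match the first j bytes of the pattern";
    state len(pattern) is the accepting state.
    """
    m = len(pattern)
    dfa = [[0] * 256 for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    x = 0  # restart state: where the DFA would be after a mismatch
    for j in range(1, m + 1):
        dfa[j] = dfa[x][:]              # mismatch transitions copied from restart state
        if j < m:
            dfa[j][pattern[j]] = j + 1  # match transition
            x = dfa[x][pattern[j]]
    return dfa


def scan_tokens(tokens: List[Token], dfa: List[List[int]], accept: int):
    """Scan a token stream, reusing recorded states inside back-references.

    Returns (decompressed bytes, match end offsets, number of bytes whose
    DFA transition was skipped).
    """
    out = bytearray()
    states = [0]   # states[i] is the DFA state before byte i of the output
    matches = []   # end offsets (exclusive) of pattern occurrences
    skipped = 0
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
            s = dfa[states[-1]][tok]
            states.append(s)
            if s == accept:
                matches.append(len(out))
            continue
        dist, length = tok
        if dist < 1 or dist > len(out):
            raise ValueError("back-reference points outside the output")
        src = len(out) - dist
        synced = False
        for i in range(length):
            b = out[src + i]
            out.append(b)
            pos = len(out) - 1
            # Same state before the same byte: all following states must agree.
            if not synced and states[pos] == states[src + i]:
                synced = True
            if synced:
                s = states[src + i + 1]   # reuse the recorded state, no transition
                skipped += 1
            else:
                s = dfa[states[pos]][b]   # ordinary DFA transition
            states.append(s)
            if s == accept:
                matches.append(len(out))
    return bytes(out), matches, skipped


pattern = b"abc"
dfa = build_kmp_dfa(pattern)
# "abc" as literals, then a pointer that copies 6 bytes from distance 3.
tokens: List[Token] = [ord("a"), ord("b"), ord("c"), (3, 6)]
text, matches, skipped = scan_tokens(tokens, dfa, len(pattern))
print(text, matches, skipped)
```

Storing one state per output byte is the simplest choice for a sketch; a practical system would need a more compact encoding of the recorded states, which is beyond what this example attempts.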
