Compressing Regular Expression Sets for Deep Packet Inspection

The ability to generate security-related alerts while analyzing network traffic in real time has become a key mechanism in many networking devices. This functionality relies on the application of large sets of regular expressions describing attack signatures to each individual packet. Implementing an engine of this form capable of operating at line speed is considerably difficult and the corresponding performance problems have been attacked from several points of view. In this work we propose a novel approach complementing earlier proposals: we suggest transforming the starting set of regular expressions to another set of expressions which is much smaller yet classifies network traffic in the same categories as the starting set. Key component of the transformation is an evolutionary search based on Genetic Programming: a large population of expressions represented as abstract syntax trees evolves by means of mutation and crossover, evolution being driven by fitness indexes tailored to the desired classification needs and which minimize the length of each expression. We assessed our proposals on real datasets composed of up to 474 expressions and the outcome has been very good, resulting in compressions in the order of 74%. Our results are highly encouraging and demonstrate the power of evolutionary techniques in an important application domain.

[1]  Tarek El-Ghazawi,et al.  Reconfigurable Computing: Architectures, Tools, and Applications , 2014, Lecture Notes in Computer Science.

[2]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[3]  Jan Korenek,et al.  Reduction of FPGA resources for regular expression matching by relation similarity , 2011, 14th IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems.

[4]  Patrick Crowley,et al.  A workload for evaluating deep packet inspection architectures , 2008, 2008 IEEE International Symposium on Workload Characterization.

[5]  Randy Smith,et al.  Efficient signature matching with multiple alphabet compression tables , 2008, SecureComm.

[6]  Ron K. Cytron,et al.  A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching , 2006, ISCA 2006.

[7]  Jan Korenek,et al.  Netbench: Framework for Evaluation of Packet Processing Algorithms , 2011, 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems.

[8]  Patrick Crowley,et al.  A hybrid finite automaton for practical deep packet inspection , 2007, CoNEXT '07.

[9]  Eric Torng,et al.  Fast Regular Expression Matching Using Small TCAM , 2014, IEEE/ACM Transactions on Networking.

[10]  Patrick Crowley,et al.  Efficient regular expression evaluation: theory to practice , 2008, ANCS '08.

[11]  Patrick Crowley,et al.  An improved algorithm to accelerate regular expression evaluation , 2007, ANCS '07.

[12]  T. V. Lakshman,et al.  Fast and memory-efficient regular expression matching for deep packet inspection , 2006, 2006 Symposium on Architecture For Networking And Communications Systems.

[13]  Patrick Crowley,et al.  Algorithms to accelerate multiple regular expressions matching for deep packet inspection , 2006, SIGCOMM 2006.

[14]  Eric Medvet,et al.  Automatic Synthesis of Regular Expressions from Examples , 2014, Computer.

[15]  Ron K. Cytron,et al.  A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[16]  Srihari Cadambi,et al.  Memory-Efficient Regular Expression Search Using State Merging , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[17]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[18]  Ali A. Ghorbani,et al.  Toward developing a systematic approach to generate benchmark datasets for intrusion detection , 2012, Comput. Secur..

[19]  Stamatis Vassiliadis,et al.  Synthesis of Regular Expressions Targeting FPGAs: Current Status and Open Issues , 2007, ARC.