A novel JSON-based regular expression language for pattern matching in the Internet of Things

The Internet of Things works by constantly sensing physical properties in the user's vicinity, such as ambient light, sound, motion, and temperature. The sensors produce huge volumes of data that must be sifted efficiently for the relevant events that trigger certain actions. In addition, the data must be filtered to ensure that privacy-sensitive, confidential information is not leaked. Efficient and expressive pattern matching is thus a key enabling technology for the full realization of ambient and humanized computing. The bulk of research in this area has focused on specialized hardware and on reducing the memory footprint. Unfortunately, there has been little work, if any, on optimizing the core elements of pattern matching: the regular expression (RE) language and the compilation process that converts patterns into internal data structures. The importance of writing good REs, so that compilation does not produce unrealizable data structures, is comparatively poorly understood. In this research, we empirically compare different RE processing engines and demonstrate in practice that the compilation phase is highly memory-intensive and time-consuming compared with the matching phase, and is therefore worth exploring for new techniques and optimizations. As a second contribution, we propose a novel technique for defining regular expressions using JavaScript Object Notation (JSON). Our evaluation with carefully constructed patterns shows that the performance of the proposed technique is on par with competing approaches. It is also less ambiguous, extensible, more expressive, and better suited to defining large and complex patterns.
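The abstract does not show the proposed JSON syntax, so the following is only an illustrative sketch of the general idea of describing a pattern as a JSON tree and translating it into a conventional regular expression. The node types (`literal`, `digit`, `seq`, `alt`, `repeat`), their field names, and the helper functions are all hypothetical and are not taken from the paper; Python's standard `re` engine stands in for whichever engines the authors evaluated.

```python
import json
import re


def to_regex(node):
    """Translate one node of the hypothetical JSON pattern tree into regex source."""
    kind = node["type"]
    if kind == "literal":
        # Escape the text so that it is matched verbatim.
        return re.escape(node["value"])
    if kind == "digit":
        return r"\d"
    if kind == "seq":
        # Concatenate the children in order.
        return "".join(to_regex(child) for child in node["items"])
    if kind == "alt":
        # Non-capturing group of alternatives.
        return "(?:" + "|".join(to_regex(child) for child in node["options"]) + ")"
    if kind == "repeat":
        inner = to_regex(node["item"])
        lo = node.get("min", 0)
        hi = node.get("max")  # None means no upper bound
        bound = "{%d,%s}" % (lo, "" if hi is None else hi)
        return "(?:" + inner + ")" + bound
    raise ValueError("unknown node type: %r" % kind)


def compile_json_pattern(text):
    """Parse the JSON description and compile it with Python's re module."""
    return re.compile(to_regex(json.loads(text)))


# Hypothetical pattern: "temp=" followed by 1 to 3 digits and a unit letter.
PATTERN_JSON = """
{
  "type": "seq",
  "items": [
    {"type": "literal", "value": "temp="},
    {"type": "repeat", "min": 1, "max": 3, "item": {"type": "digit"}},
    {"type": "alt", "options": [
      {"type": "literal", "value": "C"},
      {"type": "literal", "value": "F"}
    ]}
  ]
}
"""

matcher = compile_json_pattern(PATTERN_JSON)
```

Under these assumptions, `matcher.search("sensor temp=21C ok")` finds the reading, while inputs such as `"temp=C"` or `"temp=1234F"` do not match. Naming each construct explicitly, rather than relying on punctuation, is one way such a format could reduce ambiguity in large patterns, at the cost of verbosity.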
