A Lightweight Framework for Regular Expression Verification

Regular expressions and finite state automata are widely used in programs for pattern searching and string matching. Despite this popularity, regular expressions are difficult to understand and verify, even for experienced programmers. Conventional testing techniques remain challenging to apply because large regular expressions are routinely used for security purposes such as input validation and network intrusion detection. In this paper, we present a lightweight verification framework for regular expressions. Instead of requiring a large number of test cases, the framework takes requirements written as natural language descriptions and automatically synthesizes formal specifications from them. It then checks the equivalence between the synthesized specifications and the target regular expressions; when the two differ, errors are detected and counterexamples are reported. We have built a web application prototype and demonstrated its usability with two case studies.
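
To make the equivalence-checking step concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes the synthesized specification is itself available as a regular expression, and it replaces a full automata-based equivalence check with a bounded search: it enumerates every string over a small alphabet up to a fixed length and reports the first string on which the two patterns disagree. The function name `find_counterexample`, the alphabet, and the length bound are all hypothetical choices made for illustration.

```python
import re
from itertools import product


def find_counterexample(spec_pattern, target_pattern, alphabet, max_len):
    """Bounded disagreement search between two regular expressions.

    Returns (string, accepted_by_spec, accepted_by_target) for the first
    string of length <= max_len on which the patterns disagree, or None
    if they agree on every enumerated string. A None result does not
    prove equivalence; it only covers the enumerated strings.
    """
    spec = re.compile(spec_pattern)
    target = re.compile(target_pattern)
    # Enumerate by increasing length so the reported string is a shortest one.
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            in_spec = spec.fullmatch(candidate) is not None
            in_target = target.fullmatch(candidate) is not None
            if in_spec != in_target:
                return candidate, in_spec, in_target
    return None


# Hypothetical requirement: "exactly three digits".
# The target expression mistakenly accepts three or more digits.
result = find_counterexample(r"[0-9]{3}", r"[0-9]{3,}", alphabet="01", max_len=4)
print(result)  # ('0000', False, True)
```

The reported string is rejected by the specification but accepted by the target expression, which is the kind of counterexample the abstract describes. A complete check would compare the automata of the two languages instead of enumerating strings.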
